path: root/kernel
Age | Commit message | Author | Files | Lines
2019-11-13 | ftrace: Add ftrace_find_direct_func() | Steven Rostedt (VMware) | 1 | -1/+78
As the function_graph tracer modifies the return address to insert a trampoline to trace the return of a function, it must be aware of a direct caller, as when it gets called, the function's return address may not be on the stack where it expects. It may have to see if that return address points to a direct caller and adjust if it is. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-11-13 | ftrace: Add register_ftrace_direct() | Steven Rostedt (VMware) | 2 | -5/+272
Add the start of the functionality to allow other trampolines to use the ftrace mcount/fentry/nop location. This adds two new functions: register_ftrace_direct() and unregister_ftrace_direct(). Both take two parameters: the first is the instruction address of where the mcount/fentry/nop exists, and the second is the trampoline that this location should call. This will handle cases where ftrace is already used on that same location, and will make it still work, where the registered direct-called trampoline will get called after all the registered ftrace callers are handled. Currently, it will not allow for IP_MODIFY functions to be called at the same locations, which include some kprobes and live kernel patching. At this point, no architecture supports this. This is only the start of implementing the framework. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-11-13 | time: Rename tsk->real_start_time to ->start_boottime | Peter Zijlstra | 1 | -1/+1
Since it stores CLOCK_BOOTTIME, not, as the name suggests, CLOCK_REALTIME, let's rename ->real_start_time to ->start_boottime. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | locking/lockdep: Update the comment for __lock_release() | Dan Carpenter | 1 | -3/+1
This changes "to the list" to "from the list" and also deletes the obsolete comment about the "@nested" argument. The "nested" argument was removed in this commit, earlier this year: 5facae4f3549 ("locking/lockdep: Remove unused @nested argument from lock_release()"). Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lkml.kernel.org/r/20191104091252.GA31509@mwanda Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | perf/aux: Allow using AUX data in perf samples | Alexander Shishkin | 3 | -3/+207
AUX data can be used to annotate perf events such as performance counters or tracepoints/breakpoints by including it in sample records when PERF_SAMPLE_AUX flag is set. Such samples would be instrumental in debugging and profiling by providing, for example, a history of instruction flow leading up to the event's overflow. The implementation makes use of grouping an AUX event with all the events that wish to take samples of the AUX data, such that the former is the group leader. The samplees should also specify the desired size of the AUX sample via attr.aux_sample_size. AUX capable PMUs need to explicitly add support for sampling, because it relies on a new callback to take a snapshot of the buffer without touching the event states. Signed-off-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | perf/core: Fix unlock balance in perf_init_event() | Qian Cai | 1 | -1/+1
Commit: 66d258c5b048 ("perf/core: Optimize perf_init_event()") introduced an unlock imbalance in perf_init_event() where it calls "goto again" and then only repeat rcu_read_unlock(). Signed-off-by: Qian Cai <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Fixes: 66d258c5b048 ("perf/core: Optimize perf_init_event()") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | Merge branch 'perf/urgent' into perf/core, to pick up fixes | Ingo Molnar | 1 | -4/+19
Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | perf/core: Fix missing static inline on perf_cgroup_switch() | Ben Dooks (Codethink) | 1 | -1/+1
It looks like a "static inline" has been missed in front of the empty definition of perf_cgroup_switch() under certain configurations. Fixes the following sparse warning: kernel/events/core.c:1035:1: warning: symbol 'perf_cgroup_switch' was not declared. Should it be static? Signed-off-by: Ben Dooks (Codethink) <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Mark Rutland <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | perf/core: Consistently fail fork on allocation failures | Alexander Shishkin | 1 | -1/+1
Commit: 313ccb9615948 ("perf: Allocate context task_ctx_data for child event") makes the inherit path skip over the current event in case of task_ctx_data allocation failure. This, however, is inconsistent with allocation failures in perf_event_alloc(), which would abort the fork. Correct this by returning an error code on task_ctx_data allocation failure and failing the fork in that case. Signed-off-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | perf/aux: Disallow aux_output for kernel events | Alexander Shishkin | 1 | -0/+7
Commit ab43762ef0109 ("perf: Allow normal events to output AUX data") added 'aux_output' bit to the attribute structure, which relies on AUX events and grouping, neither of which is supported for the kernel events. This notwithstanding, attempts have been made to use it in the kernel code, suggesting the necessity of an explicit hard -EINVAL. Fix this by rejecting attributes with aux_output set for kernel events. Signed-off-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | perf/core: Reattach a misplaced comment | Alexander Shishkin | 1 | -4/+3
A comment is in a wrong place in perf_event_create_kernel_counter(). Fix that. Signed-off-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | perf/aux: Fix the aux_output group inheritance fix | Alexander Shishkin | 1 | -1/+1
Commit f733c6b508bc ("perf/core: Fix inheritance of aux_output groups") adds a NULL pointer dereference in case inherit_group() races with perf_release(), which causes the below crash:
> BUG: kernel NULL pointer dereference, address: 000000000000010b
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 3b203b067 P4D 3b203b067 PUD 3b2040067 PMD 0
> Oops: 0000 [#1] SMP KASAN
> CPU: 0 PID: 315 Comm: exclusive-group Tainted: G B 5.4.0-rc3-00181-g72e1839403cb-dirty #878
> RIP: 0010:perf_get_aux_event+0x86/0x270
> Call Trace:
> ? __perf_read_group_add+0x3b0/0x3b0
> ? __kasan_check_write+0x14/0x20
> ? __perf_event_init_context+0x154/0x170
> inherit_task_group.isra.0.part.0+0x14b/0x170
> perf_event_init_task+0x296/0x4b0
Fix this by skipping over events that are getting closed, in the inheritance path. Signed-off-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Fixes: f733c6b508bc ("perf/core: Fix inheritance of aux_output groups") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | perf/core: Disallow uncore-cgroup events | Peter Zijlstra | 1 | -0/+9
While discussing uncore event scheduling, I noticed we do not in fact seem to disallow making uncore-cgroup events. Such events make no sense whatsoever because the cgroup is a CPU local state where uncore counts across a number of CPUs. Disallow them. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | sched/pelt: Fix update of blocked PELT ordering | Vincent Guittot | 1 | -9/+20
update_cfs_rq_load_avg() can call cpufreq_update_util() to trigger an update of the frequency. Make sure that RT, DL and IRQ PELT signals have been updated before calling cpufreq. Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: 371bf4273269 ("sched/rt: Add rt_rq utilization tracking") Fixes: 3727e0e16340 ("sched/dl: Add dl_rq utilization tracking") Fixes: 91c27493e78d ("sched/irq: Add IRQ utilization tracking") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-13 | sched/core: Avoid spurious lock dependencies | Peter Zijlstra | 1 | -1/+2
While seemingly harmless, __sched_fork() does hrtimer_init(), which, when DEBUG_OBJECTS is enabled, can end up doing allocations. This then results in the following lock order: rq->lock, then zone->lock.rlock, then batched_entropy_u64.lock. This in turn causes deadlocks when we do wakeups while holding that batched_entropy lock -- as the random code does. Solve this by moving __sched_fork() out from under rq->lock. This is safe because nothing there relies on rq->lock, as also evident from the other __sched_fork() callsite. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Qian Cai <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-12 | Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -1/+26
Pull x86 TSX Async Abort and iTLB Multihit mitigations from Thomas Gleixner: "The performance deterioration department is not proud at all of presenting the seventh installment of speculation mitigations and hardware misfeature workarounds:
1) TSX Async Abort (TAA) - 'The Annoying Affair'
TAA is a hardware vulnerability that allows unprivileged speculative access to data which is available in various CPU internal buffers by using asynchronous aborts within an Intel TSX transactional region. The mitigation depends on a microcode update providing a new MSR which allows to disable TSX in the CPU. CPUs which have no microcode update can be mitigated by disabling TSX in the BIOS if the BIOS provides a tunable. Newer CPUs will have a bit set which indicates that the CPU is not vulnerable, but the MSR to disable TSX will be available nevertheless as it is an architected MSR. That means the kernel provides the ability to disable TSX on the kernel command line, which is useful as TSX is a truly useful mechanism to accelerate side channel attacks of all sorts.
2) iTLB Multihit (NX) - 'No eXcuses'
iTLB Multihit is an erratum where some Intel processors may incur a machine check error, possibly resulting in an unrecoverable CPU lockup, when an instruction fetch hits multiple entries in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type. A malicious guest running on a virtualized system can exploit this erratum to perform a denial of service attack. The workaround is that KVM marks huge pages in the extended page tables as not executable (NX). If the guest attempts to execute in such a page, the page is broken down into 4k pages which are marked executable. The workaround comes with a mechanism to recover these shattered huge pages over time.
Both issues come with full documentation in the hardware vulnerabilities section of the Linux kernel user's and administrator's guide. Thanks to all patch authors and reviewers who had the extraordinary privilege to be exposed to this nuisance. Special thanks to Borislav Petkov for polishing the final TAA patch set and to Paolo Bonzini for shepherding the KVM iTLB workarounds and providing also the backports to stable kernels for those!"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs
  Documentation: Add ITLB_MULTIHIT documentation
  kvm: x86: mmu: Recovery of shattered NX large pages
  kvm: Add helper function for creating VM worker threads
  kvm: mmu: ITLB_MULTIHIT mitigation
  cpu/speculation: Uninline and export CPU mitigations helpers
  x86/cpu: Add Tremont to the cpu vulnerability whitelist
  x86/bugs: Add ITLB_MULTIHIT bug infrastructure
  x86/tsx: Add config options to set tsx=on|off|auto
  x86/speculation/taa: Add documentation for TSX Async Abort
  x86/tsx: Add "auto" option to the tsx= cmdline parameter
  kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
  x86/speculation/taa: Add sysfs reporting for TSX Async Abort
  x86/speculation/taa: Add mitigation for TSX Async Abort
  x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
  x86/cpu: Add a helper function x86_read_arch_cap_msr()
  x86/msr: Add the IA32_TSX_CTRL MSR
2019-11-12 | cgroup: use cgrp->kn->id as the cgroup ID | Tejun Heo | 4 | -55/+29
cgroup ID is currently allocated using a dedicated per-hierarchy idr and used internally and exposed through tracepoints and bpf. This is confusing because there are tracepoints and other interfaces which use the cgroupfs ino as IDs. The preceding changes made kn->id exposed as ino as 64bit ino on supported archs or ino+gen (low 32bits as ino, high gen). There's no reason for cgroup to use different IDs. The kernfs IDs are unique and userland can easily discover them and map them back to paths using standard file operations. This patch replaces cgroup IDs with kernfs IDs. * cgroup_id() is added and all cgroup ID users are converted to use it. * kernfs_node creation is moved to earlier during cgroup init so that cgroup_id() is available during init. * While at it, s/cgroup/cgrp/ in psi helpers for consistency. * Fallback ID value is changed to 1 to be consistent with root cgroup ID. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Cc: Namhyung Kim <[email protected]>
2019-11-12 | kernfs: use 64bit inos if ino_t is 64bit | Tejun Heo | 1 | -4/+17
Each kernfs_node is identified with a 64bit ID. The low 32bit is exposed as ino and the high gen. While this already allows using inos as keys by looking up with wildcard generation number of 0, it's adding unnecessary complications for 64bit ino archs which can directly use kernfs_node IDs as inos to uniquely identify each cgroup instance. This patch exposes IDs directly as inos on 64bit ino archs. The conversion is mostly straightforward.
* 32bit ino archs behave the same as before. 64bit ino archs now use the whole 64bit ID as ino and the generation number is fixed at 1.
* 64bit inos still use the same idr allocator which guarantees that the lower 32bits identify the current live instance uniquely and the high 32bits are incremented whenever the low bits wrap. As the upper 32bits are no longer used as gen and we don't wanna start ino allocation with 33rd bit set, the initial value for highbits allocation is changed to 0 on 64bit ino archs.
* blktrace exposes two 32bit numbers - (INO,GEN) pair - to identify the issuing cgroup. Userland builds FILEID_INO32_GEN fids from these numbers to look up the cgroups. To remain compatible with the behavior, always output (LOW32,HIGH32) which will be constructed back to the original 64bit ID by __kernfs_fh_to_dentry().
Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Cc: Namhyung Kim <[email protected]>
2019-11-12 | kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id() | Tejun Heo | 1 | -1/+1
kernfs_find_and_get_node_by_ino() looks up the kernfs_node matching the specified ino. On top of that, kernfs_get_node_by_id() and kernfs_fh_get_inode() implement full ID matching by testing the rest of ID. On the surface, confusingly, the two are slightly different in that the latter uses 0 gen as wildcard while the former doesn't - does it mean that the latter can't uniquely identify inodes w/ 0 gen? In practice, this is a distinction without a difference because generation number starts at 1. There are no actual IDs with 0 gen, so it can always safely be used as wildcard. Let's simplify the code by renaming kernfs_find_and_get_node_by_ino() to kernfs_find_and_get_node_by_id(), moving all lookup logic into it, and removing now unnecessary kernfs_get_node_by_id(). Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]>
2019-11-12 | kernfs: convert kernfs_node->id from union kernfs_node_id to u64 | Tejun Heo | 4 | -40/+34
kernfs_node->id is currently a union kernfs_node_id which represents either a 32bit (ino, gen) pair or u64 value. I can't see much value in the usage of the union - all that's needed is a 64bit ID which the current code is already limited to. Using a union makes the code unnecessarily complicated and prevents using 64bit ino without adding practical benefits. This patch drops union kernfs_node_id and makes kernfs_node->id a u64. ino is stored in the lower 32bits and gen in the upper 32bits. Accessors - kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the ino and gen. This makes ID handling less cumbersome and will allow using 64bit inos on supported archs. This patch doesn't make any functional changes. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Alexei Starovoitov <[email protected]>
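The ID layout this commit describes (ino in the lower 32 bits, gen in the upper 32 bits of a single 64-bit value) can be sketched in user-space C. The helper names below are hypothetical stand-ins chosen for illustration; they are not the kernel's kernfs accessors and only show the bit arithmetic.

```c
#include <stdint.h>

/* Hypothetical helpers, not the kernel accessors: pack (ino, gen) into one u64. */
static inline uint64_t sketch_make_id(uint32_t ino, uint32_t gen)
{
	/* gen occupies the upper 32 bits, ino the lower 32 bits */
	return ((uint64_t)gen << 32) | ino;
}

/* Lower 32 bits: the inode number. */
static inline uint32_t sketch_id_ino(uint64_t id)
{
	return (uint32_t)id;
}

/* Upper 32 bits: the generation number. */
static inline uint32_t sketch_id_gen(uint64_t id)
{
	return (uint32_t)(id >> 32);
}
```

With a plain integer ID, comparing two IDs is a single 64-bit comparison, and either half can still be recovered when a 32-bit (ino, gen) pair is needed.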
2019-11-12 | sched/Kconfig: Fix spelling mistake in user-visible help text | Srivatsa S. Bhat (VMware) | 1 | -1/+1
Fix a spelling mistake in the help text for PREEMPT_RT. Signed-off-by: Srivatsa S. Bhat (VMware) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/157204450499.10518.4542293884417101528.stgit@srivatsa-ubuntu
2019-11-12 | time: Fix spelling mistake in comment | Mukesh Ojha | 1 | -1/+1
witin => within Signed-off-by: Mukesh Ojha <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-11-12 | time: Optimize ns_to_timespec64() | Arnd Bergmann | 1 | -9/+12
ns_to_timespec64() calls div_s64_rem(), which is a rather slow function on 32-bit architectures, as it cannot take advantage of the do_div() optimizations for constant arguments. Open-code the div_s64_rem() function in ns_to_timespec64(), so a constant divider can be passed into the optimized div_u64_rem() function. Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
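The open-coding idea can be illustrated in user-space C: handle the sign by hand so that the division itself is always unsigned with a compile-time constant divisor. This is only a sketch under assumptions. `struct sketch_ts64` and `sketch_ns_to_ts64()` are hypothetical names, and ordinary `/` and `%` stand in for the kernel's div_u64_rem(); the do_div() optimization itself is not reproduced here.

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for struct timespec64. */
struct sketch_ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

static struct sketch_ts64 sketch_ns_to_ts64(int64_t nsec)
{
	struct sketch_ts64 ts = { 0, 0 };

	if (nsec > 0) {
		/* Unsigned division by a constant divisor. */
		ts.tv_sec = (int64_t)((uint64_t)nsec / SKETCH_NSEC_PER_SEC);
		ts.tv_nsec = (long)((uint64_t)nsec % SKETCH_NSEC_PER_SEC);
	} else if (nsec < 0) {
		/*
		 * For negative times tv_sec points to the earlier second and
		 * tv_nsec counts forward from it, so tv_nsec stays non-negative.
		 * -(nsec + 1) avoids overflow for the most negative input.
		 */
		uint64_t n = (uint64_t)(-(nsec + 1));

		ts.tv_sec = -(int64_t)(n / SKETCH_NSEC_PER_SEC) - 1;
		ts.tv_nsec = (long)(SKETCH_NSEC_PER_SEC - (n % SKETCH_NSEC_PER_SEC) - 1);
	}
	return ts;
}
```

For example, an input of -1 ns maps to tv_sec = -1 and tv_nsec = 999999999, which is the normalized form with a non-negative nanosecond field.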
2019-11-12 | ntp/y2038: Remove incorrect time_t truncation | Arnd Bergmann | 1 | -1/+1
A cast to 'time_t' was accidentally left in place during the conversion of __do_adjtimex() to 64-bit timestamps, so the resulting value is incorrectly truncated. Remove the cast so the 64-bit time gets propagated correctly. Fixes: ead25417f82e ("timex: use __kernel_timex internally") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
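The effect of a leftover narrowing cast can be shown with a small user-space sketch. It assumes a 32-bit time_t, modeled here as int32_t; the function names are hypothetical and the code is not taken from __do_adjtimex().

```c
#include <stdint.h>

/*
 * Hypothetical model of the bug: a 64-bit seconds value passes through
 * a cast to a 32-bit time type. Converting an out-of-range value to
 * int32_t is implementation-defined; common compilers keep the low 32 bits.
 */
static int64_t sketch_with_time32_cast(int64_t secs)
{
	return (int32_t)secs;
}

/* The fix amounts to dropping the cast so all 64 bits are propagated. */
static int64_t sketch_without_cast(int64_t secs)
{
	return secs;
}
```

Values that fit in 32 bits survive either way, which is why such a cast can go unnoticed until timestamps beyond January 2038 are involved.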
2019-11-11 | cpuidle: Use nanoseconds as the unit of time | Rafael J. Wysocki | 1 | -1/+1
Currently, the cpuidle subsystem uses microseconds as the unit of time which (among other things) causes the idle loop to incur some integer division overhead for no clear benefit. In order to allow cpuidle to measure time in nanoseconds, add two new fields, exit_latency_ns and target_residency_ns, to represent the exit latency and target residency of an idle state in nanoseconds, respectively, to struct cpuidle_state and initialize them with the help of the corresponding values in microseconds provided by drivers. Additionally, change cpuidle_governor_latency_req() to return the idle state exit latency constraint in nanoseconds. Also measure idle state residency (last_residency_ns in struct cpuidle_device and time_ns in struct cpuidle_driver) in nanoseconds and update the cpuidle core and governors accordingly. However, the menu governor still computes typical intervals in microseconds to avoid integer overflows. Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Doug Smythies <[email protected]> Tested-by: Doug Smythies <[email protected]>
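Initializing the nanosecond fields from driver-provided microsecond values could look roughly like the sketch below. The struct is a simplified stand-in for struct cpuidle_state; the microsecond field names and the helper function are assumptions made for illustration, and only exit_latency_ns and target_residency_ns are named in the commit message.

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_USEC 1000ULL

/* Simplified stand-in for struct cpuidle_state. */
struct sketch_idle_state {
	unsigned int exit_latency;      /* microseconds, from the driver (assumed name) */
	unsigned int target_residency;  /* microseconds, from the driver (assumed name) */
	uint64_t exit_latency_ns;       /* nanoseconds, as named in the commit */
	uint64_t target_residency_ns;   /* nanoseconds, as named in the commit */
};

/* One-time conversion at setup, so later comparisons need no division. */
static void sketch_init_ns_fields(struct sketch_idle_state *s)
{
	s->exit_latency_ns = s->exit_latency * SKETCH_NSEC_PER_USEC;
	s->target_residency_ns = s->target_residency * SKETCH_NSEC_PER_USEC;
}
```

The multiplication happens once per state, while the hot path can then compare nanosecond timestamps directly against the precomputed fields.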
2019-11-11 | Merge branch 'for-5.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup | Linus Torvalds | 1 | -1/+1
Pull cgroup fix from Tejun Heo: "There's an inadvertent preemption point in ptrace_stop() which was reliably triggering for a test scenario significantly slowing it down. This contains Oleg's fix to remove the unwanted preemption point"
* 'for-5.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop()
2019-11-11 | kheaders: explain why include/config/autoconf.h is excluded from md5sum | Masahiro Yamada | 1 | -2/+9
This comment block explains why include/generated/compile.h is omitted, but nothing about include/generated/autoconf.h, which might be more difficult to understand. Add more comments. Signed-off-by: Masahiro Yamada <[email protected]>
2019-11-11 | kheaders: remove the last bashism to allow sh to run it | Masahiro Yamada | 2 | -7/+8
'pushd' ... 'popd' is the last bash-specific code in this script. One way to avoid it is to run the code in a sub-shell. With that addressed, you can run this script with sh. I replaced $(BASH) with $(CONFIG_SHELL), and I changed the hashbang to #!/bin/sh. Signed-off-by: Masahiro Yamada <[email protected]>
2019-11-11 | kheaders: optimize header copy for in-tree builds | Masahiro Yamada | 1 | -7/+9
This script copies headers by the cpio command twice; first from srctree, and then from objtree. However, when we are building in-tree, we know the srctree and the objtree are the same. That is, all the headers copied by the first cpio are overwritten by the second one. Skip the first cpio when we are building in-tree. Signed-off-by: Masahiro Yamada <[email protected]>
2019-11-11 | kheaders: optimize md5sum calculation for in-tree builds | Masahiro Yamada | 1 | -16/+16
This script computes md5sum of headers in srctree and in objtree. However, when we are building in-tree, we know the srctree and the objtree are the same. That is, we end up with the same computation twice. In fact, the first two lines of kernel/kheaders.md5 are always the same for in-tree builds. Unify the two md5sum calculations. For in-tree builds ($building_out_of_srctree is empty), we check only two directories, "include", and "arch/$SRCARCH/include". For out-of-tree builds ($building_out_of_srctree is 1), we check 4 directories, "$srctree/include", "$srctree/arch/$SRCARCH/include", "include", and "arch/$SRCARCH/include" since we know they are all different. Signed-off-by: Masahiro Yamada <[email protected]>
2019-11-11 | kheaders: remove unneeded 'cat' command piped to 'head' / 'tail' | Masahiro Yamada | 1 | -4/+4
The 'head' and 'tail' commands can take a file path directly. So, you do not need to run 'cat'. cat kernel/kheaders.md5 | head -1 ... is equivalent to: head -1 kernel/kheaders.md5 and the latter saves forking one process. While I was here, I replaced 'head -1' with 'head -n 1'. I also replaced '==' with '=' since we do not have a good reason to use the bashism. Signed-off-by: Masahiro Yamada <[email protected]>
2019-11-11 | dma-debug: increase HASH_SIZE | Eric Dumazet | 1 | -2/+2
With modern NICs, it is not unusual to have about 256,000 active dma mappings, and a hash size of 1024 buckets is too small. Forcing full cache line per bucket does not seem useful, especially now that we have contention on free_entries_lock for allocations and freeing of entries. Better use the space to fit more buckets. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-11-11 | dma-debug: reorder struct dma_debug_entry fields | Eric Dumazet | 1 | -7/+7
Move all fields used during exact match lookups to the first cache line. This makes debug_dma_mapping_error() and friends about 50% faster. Since it removes two 32bit holes, force a cacheline alignment on struct dma_debug_entry. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-11-11 | dma-mapping: merge the generic remapping helpers into dma-direct | Christoph Hellwig | 2 | -61/+48
Integrate the generic dma remapping implementation into the main flow. This prepares for architectures like xtensa that use an uncached segment for pages in the kernel mapping, but can also remap highmem from CMA. To simplify that implementation we now always deduce the page from the physical address via the DMA address instead of the virtual address. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Max Filippov <[email protected]>
2019-11-11 | dma-direct: provide mmap and get_sgtable method overrides | Christoph Hellwig | 4 | -47/+75
For dma-direct we know that the DMA address is an encoding of the physical address that we can trivially decode. Use that fact to provide implementations that do not need the arch_dma_coherent_to_pfn architecture hook. Note that we can still support mmap of non-coherent memory only if the architecture provides a way to set an uncached bit in the page tables. This must be true for architectures that use the generic remap helpers, but other architectures can also manually select it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Max Filippov <[email protected]>
2019-11-11 | stacktrace: Get rid of unneeded '!!' pattern | Jiri Slaby | 1 | -2/+2
My commit b0c51f158455 ("stacktrace: Don't skip first entry on noncurrent tasks") adds one or zero to skipnr by "!!(current == tsk)". But the C99 standard says: The == (equal to) and != (not equal to) operators are ... Each of the operators yields 1 if the specified relation is true and 0 if it is false. So there is no need to prepend the above expression by "!!" -- remove it. Reported-by: Joe Perches <[email protected]> Signed-off-by: Jiri Slaby <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
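The point about `==` already yielding 0 or 1 can be shown with a small user-space sketch. The helper and its parameters are hypothetical: tasks are modeled as opaque pointers, and the function only mirrors the "add one or zero to skipnr" pattern described in the message.

```c
/*
 * Hypothetical helper: adds 1 to skipnr when both pointers refer to the
 * same task. The comparison result is already an int equal to 0 or 1,
 * so no "!!" normalization is needed.
 */
static unsigned int sketch_adjust_skipnr(unsigned int skipnr,
					 const void *current_task,
					 const void *tsk)
{
	return skipnr + (current_task == tsk);
}
```

Writing `!!(current_task == tsk)` would produce the same value, because `!!` only normalizes values that are not already 0 or 1.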
2019-11-11 | irq_work: Slightly simplify IRQ_WORK_PENDING clearing | Frederic Weisbecker | 1 | -4/+3
Instead of fetching the value of flags and performing an xchg() to clear a bit, just use atomic_fetch_andnot(), which is better suited to doing that job in one operation while keeping the full ordering. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E . McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
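Clearing one flag bit in a single fully ordered read-modify-write can be sketched with C11 atomics. C11 has no atomic_fetch_andnot(), so this sketch swaps in atomic_fetch_and() with an inverted mask; the flag names and bit values are assumptions for illustration, not the kernel's definitions.

```c
#include <stdatomic.h>

#define SKETCH_PENDING 0x1u  /* assumed bit values, for illustration only */
#define SKETCH_BUSY    0x2u

/*
 * Clear PENDING in one atomic operation (sequentially consistent by
 * default in C11) and return the flags as they look after clearing.
 */
static unsigned int sketch_clear_pending(atomic_uint *flags)
{
	unsigned int old = atomic_fetch_and(flags, ~SKETCH_PENDING);

	return old & ~SKETCH_PENDING;
}
```

Compared with a separate load followed by an exchange, there is no window between reading the flags and writing them back in which another CPU's update could be lost.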
2019-11-11 | irq_work: Fix irq_work_claim() memory ordering | Frederic Weisbecker | 1 | -15/+7
When irq_work_claim() finds IRQ_WORK_PENDING flag already set, we just return and don't raise a new IPI. We expect the destination to see and handle our latest updates thanks to the pairing atomic_xchg() in irq_work_run_list(). But cmpxchg() doesn't guarantee a full memory barrier upon failure. So it's possible that the destination misses our latest updates. So use atomic_fetch_or() instead, which is unconditionally fully ordered, performs exactly what we want here, and simplifies the code. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E . McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
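The claim pattern described here can be sketched with C11 atomics, where atomic_fetch_or() is sequentially consistent by default whether or not the bits were already set. The names and bit values below are assumptions for illustration; this is a user-space analogue, not the kernel's irq_work_claim().

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_PENDING 0x1u  /* assumed bit values, for illustration only */
#define SKETCH_BUSY    0x2u
#define SKETCH_CLAIMED (SKETCH_PENDING | SKETCH_BUSY)

/*
 * Try to claim the work item. The fetch-or is an unconditional
 * read-modify-write, so it carries its ordering even when the claim
 * fails, unlike a compare-and-exchange that fails its comparison.
 */
static bool sketch_work_claim(atomic_uint *flags)
{
	unsigned int old = atomic_fetch_or(flags, SKETCH_CLAIMED);

	/* If PENDING was already set, someone else queued the work. */
	return !(old & SKETCH_PENDING);
}
```

A caller that gets `false` back can rely on the already pending run to observe the data it wrote before calling the claim function.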
2019-11-11 | irq_work: Convert flags to atomic_t | Frederic Weisbecker | 4 | -12/+12
We need to convert flags to atomic_t in order to later fix an ordering issue on atomic_cmpxchg() failure. This will allow us to use atomic_fetch_or(). Also clarify the nature of those flags. [ mingo: Converted two more usage sites the original patch missed. ] Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E . McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-11sched/core: Further clarify sched_class::set_next_task()Peter Zijlstra6-11/+17
It turns out there really is something special about the first set_next_task() invocation. Specifically, the 'change' pattern really should not cause balance callbacks. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: f95d4eaee6d0 ("sched/{rt,deadline}: Fix set_next_task vs pick_next_task") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-11sched/fair: Use mul_u32_u32()Peter Zijlstra1-2/+1
While reading the code I encountered another site where we should be using mul_u32_u32() because GCC just won't take a hint. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
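For readers unfamiliar with the helper, the following sketch shows what a 32x32->64 widening multiply computes, written in portable C. It assumes the behaviour of the generic fallback (cast one operand to 64 bits, then multiply); architectures may override the kernel helper with hand-written assembly, which this sketch does not model. The name mul_u32_u32_sketch() is hypothetical.

```c
#include <stdint.h>

/*
 * Multiply two 32-bit values and keep the full 64-bit product.
 * Casting one operand first makes the multiplication happen in
 * 64 bits; without the cast the product would be truncated to
 * 32 bits before being widened.
 */
static inline uint64_t mul_u32_u32_sketch(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}
```

For example, with both operands equal to 0xFFFFFFFF the full product is 0xFFFFFFFE00000001, whereas a plain 32-bit multiply would wrap around to 1.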
2019-11-11sched/core: Simplify sched_class::pick_next_task()Peter Zijlstra7-35/+16
Now that the indirect class call never uses the last two arguments of pick_next_task(), remove them. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-11sched/core: Optimize pick_next_task()Peter Zijlstra4-5/+8
Ever since we moved the sched_class definitions into their own files, the constant expression {fair,idle}_sched_class.pick_next_task() is not in fact a compile time constant anymore and results in an indirect call (barring LTO). Fix that by exposing pick_next_task_{fair,idle}() directly, this gets rid of the indirect call (and RETPOLINE) on the fast path. Also remove the unlikely() from the idle case, it is in fact /the/ way we select idle -- and that is a very common thing to do. Performance for will-it-scale/sched_yield improves by 2% (as reported by 0-day). Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
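The shape of this optimization can be sketched outside the kernel as follows. Everything here is a simplified assumption for illustration: the types, the nr_fair counter, the class table, and all function names are hypothetical and do not mirror the real scheduler structures. The sketch only contrasts a fast path that names its callees directly with a slow path that goes through function pointers.

```c
#include <stddef.h>

struct task { int id; };

/* Hypothetical, heavily simplified run queue. */
struct rq {
	int nr_running;          /* all runnable tasks */
	int nr_fair;             /* runnable tasks in the fair class */
	struct task *rt_task;    /* next higher-priority task, or NULL */
	struct task *fair_task;  /* next fair task, or NULL */
	struct task idle;        /* always available */
};

static struct task *pick_rt(struct rq *rq)   { return rq->rt_task; }
static struct task *pick_fair(struct rq *rq) { return rq->nr_fair ? rq->fair_task : NULL; }
static struct task *pick_idle(struct rq *rq) { return &rq->idle; }

/* Slow path: classes in priority order, reached through pointers. */
typedef struct task *(*pick_fn)(struct rq *);
static const pick_fn class_table[] = { pick_rt, pick_fair, pick_idle };

static struct task *pick_next(struct rq *rq)
{
	/*
	 * Fast path: every runnable task is in the fair class, so call
	 * the fair and idle pickers by name. These are direct calls the
	 * compiler can resolve (and inline) at build time.
	 */
	if (rq->nr_running == rq->nr_fair) {
		struct task *p = pick_fair(rq);

		if (!p)
			p = pick_idle(rq); /* the normal way idle gets selected */
		return p;
	}

	/* Slow path: indirect calls through the class table. */
	for (size_t i = 0; i < sizeof(class_table) / sizeof(class_table[0]); i++) {
		struct task *p = class_table[i](rq);

		if (p)
			return p;
	}
	return NULL;
}
```

In a single translation unit like this one a compiler may resolve the table calls as well; the difference described in the message arises when the class structures are defined in separate files and no link-time optimization is applied.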
2019-11-11sched/core: Make pick_next_task_idle() more consistentPeter Zijlstra2-4/+5
Only pick_next_task_fair() needs the @prev and @rf argument; these are required to implement the cpu-cgroup optimization. None of the other pick_next_task() methods need this. Make pick_next_task_idle() more consistent. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-11sched/fair: Better document newidle_balance()Peter Zijlstra1-0/+5
Whilst chasing the pick_next_task() race, there was some confusion about the newidle_balance() return values. Document them. [ mingo: Minor edits. ] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-11-11Merge tag 'v5.4-rc7' into sched/core, to pick up fixesIngo Molnar46-350/+836
Signed-off-by: Ingo Molnar <[email protected]>
2019-11-11Merge tag 'v5.4-rc7' into perf/core, to pick up fixesIngo Molnar17-89/+224
Signed-off-by: Ingo Molnar <[email protected]>
2019-11-10Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds1-6/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A small set of fixes for timekeeping and clocksource drivers: - VDSO data was updated conditional on the availability of a VDSO capable clocksource. This causes the VDSO functions which do not depend on a VDSO capable clocksource to operate on stale data. Always update unconditionally. - Prevent a double free in the mediatek driver - Use the proper helper in the sh_mtu2 driver so it won't attempt to initialize non-existing interrupts" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping/vsyscall: Update VDSO data unconditionally clocksource/drivers/sh_mtu2: Do not loop using platform_get_irq_by_name() clocksource/drivers/mediatek: Fix error handling
2019-11-10Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds7-59/+113
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "Two fixes for scheduler regressions: - Plug a subtle race condition which was introduced with the rework of the next task selection functionality. The change of task properties became unprotected which can be observed inconsistently causing state corruption. - A trivial compile fix for CONFIG_CGROUPS=n" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix pick_next_task() vs 'change' pattern race sched/core: Fix compilation error when cgroup not selected
2019-11-10Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixlet from Thomas Gleixner: "A trivial fix for a kernel doc regression where an argument change was not reflected in the documentation" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irq/irqdomain: Update __irq_domain_alloc_fwnode() function documentation