path: root/kernel
Age | Commit message | Author | Files | Lines
2016-05-06 | sched: Allow per-cpu kernel threads to run on online && !active | Peter Zijlstra (Intel) | 1 | -7/+42
In order to enable symmetric hotplug, we must mirror the online && !active state of cpu-down on the cpu-up side. However, to retain sanity, limit this state to per-cpu kthreads. Aside from the change to set_cpus_allowed_ptr(), which allows moving the per-cpu kthreads on, the other critical piece is the cpu selection for pinned tasks in select_task_rq(). This avoids dropping into select_fallback_rq(). select_fallback_rq() cannot be allowed to select !active cpus because it's used to migrate user tasks away, and we do not want to move user tasks onto cpus that are in transition. Requested-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Thomas Gleixner <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Jan H. Schönherr <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-05-06 | Merge tag 'keys-next-20160505' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next | James Morris | 1 | -3/+4
2016-05-05 | perf/arm: Special-case heterogeneous CPUs | Mark Rutland | 1 | -1/+7
Commit: 26657848502b7847 ("perf/core: Verify we have a single perf_hw_context PMU") forcefully prevents multiple PMUs from sharing perf_hw_context, as this generally doesn't make sense. It is a common bug for uncore PMUs to use perf_hw_context rather than perf_invalid_context, which this detects. However, systems exist with heterogeneous CPUs (and hence heterogeneous HW PMUs), for which sharing perf_hw_context is necessary, and possible in some limited cases. To make this work we have to perform some gymnastics, as we did in these commits: 66eb579e66ecfea5 ("perf: allow for PMU-specific event filtering") c904e32a69b7c779 ("arm: perf: filter unschedulable events") To allow those systems to work, we must allow PMUs for heterogeneous CPUs to share perf_hw_context, though we must still disallow sharing otherwise to detect the common misuse of perf_hw_context. This patch adds a new PERF_PMU_CAP_HETEROGENEOUS_CPUS for this, updates the core logic to account for this, and makes use of it in the arm_pmu code that is used for systems with heterogeneous CPUs. Comments are added to make the rationale clear and hopefully avoid accidental abuse. Signed-off-by: Mark Rutland <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/20160426103346.GA20836@leverpostej Signed-off-by: Ingo Molnar <[email protected]>
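As a rough illustration of what the new capability enables (a sketch, not the actual arm_pmu code; driver and function names are hypothetical), a CPU PMU driver on a heterogeneous system can opt in before registering:

    #include <linux/perf_event.h>

    /* Advertise that this PMU only covers a subset of the CPUs, so
     * sharing perf_hw_context with the other CPU PMU(s) is legitimate
     * rather than the usual uncore-PMU bug this check catches. */
    static int my_cpu_pmu_register(struct pmu *pmu)
    {
            pmu->capabilities |= PERF_PMU_CAP_HETEROGENEOUS_CPUS;
            pmu->task_ctx_nr = perf_hw_context;

            return perf_pmu_register(pmu, "my_cpu_pmu", -1);
    }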
2016-05-05 | perf/core: Let userspace know if the PMU supports address filters | Alexander Shishkin | 1 | -0/+26
Export an additional common attribute for PMUs that support address range filtering to let the perf userspace identify such PMUs in a uniform way. Signed-off-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/1461771888-10409-8-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | perf/core: Introduce address range filtering | Alexander Shishkin | 1 | -16/+607
Many instruction tracing PMUs out there support address range-based filtering, which would, for example, generate trace data only for a given range of instruction addresses, which is useful for tracing individual functions, modules or libraries. Other PMUs may also utilize this functionality to allow filtering in or filtering out code at certain address ranges. This patch introduces the interface for userspace to specify these filters and for the PMU drivers to apply these filters to hardware configuration. The user interface is an ASCII string that is passed via an ioctl() and specifies address ranges within certain object files or within the kernel. There is no special treatment for kernel modules yet, but it might be a worthy pursuit. The PMU driver interface basically adds two extra callbacks to the PMU driver structure: one validates the filter configuration proposed by the user against what the hardware is actually capable of doing, and the other translates hardware-independent filter configuration into something that can be programmed into the hardware. Signed-off-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Mathieu Poirier <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/1461771888-10409-6-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <[email protected]>
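A minimal userspace sketch of the resulting interface (the fd, object path, offsets, and filter string are assumptions; the exact grammar each PMU accepts is driver-defined):

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/perf_event.h>

    /* perf_fd is assumed to come from perf_event_open() on a PMU that
     * implements the new filter callbacks. The string requests trace
     * data only for 0x1000 bytes at offset 0x4000 of libfoo.so. */
    static int set_addr_filter(int perf_fd)
    {
            const char *filter = "filter 0x4000/0x1000@/usr/lib/libfoo.so";

            if (ioctl(perf_fd, PERF_EVENT_IOC_SET_FILTER, filter) < 0) {
                    perror("PERF_EVENT_IOC_SET_FILTER");
                    return -1;
            }
            return 0;
    }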
2016-05-05 | perf/core: Extend perf_event_aux_ctx() to optionally iterate through more events | Alexander Shishkin | 1 | -10/+13
Trace filtering code needs an iterator that can go through all events in a context, including inactive and filtered, to be able to update their filters' address ranges based on mmap or exec events. This patch changes perf_event_aux_ctx() to optionally do this. Signed-off-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/1461771888-10409-5-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | perf/core: Move set_filter() out of CONFIG_EVENT_TRACING | Alexander Shishkin | 1 | -23/+22
For instruction trace filtering, namely, for communicating filter definitions from userspace, I'd like to re-use the SET_FILTER code that the tracepoints are using currently. To that end, move the relevant code out from behind the CONFIG_EVENT_TRACING dependency. Signed-off-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/1461771888-10409-2-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | Merge branch 'perf/urgent' into perf/core, to pick up fixes | Ingo Molnar | 12 | -25/+135
Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | locking/atomics: Flip atomic_fetch_or() arguments | Peter Zijlstra | 1 | -2/+2
All the atomic operations have their arguments the wrong way around; make atomic_fetch_or() consistent and flip them. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
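A sketch of the flip (assuming the pre-change argument order; after the change, the operand comes first and the atomic_t pointer second, like atomic_add() and friends):

    #include <linux/atomic.h>

    static atomic_t v = ATOMIC_INIT(0);

    static void example(void)
    {
            int old;

            /* old convention: old = atomic_fetch_or(&v, 0x1); */

            /* new convention: operand first, pointer second */
            old = atomic_fetch_or(0x1, &v);
            (void)old;
    }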
2016-05-05 | locking/pvqspinlock: Robustify init_qspinlock_stat() | Davidlohr Bueso | 1 | -8/+14
Specifically around the debugfs file creation calls; I have no idea if they could ever possibly fail, but this is core code (debug aside) so let's at least check the return value and report anything fishy. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Waiman Long <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
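A sketch of the resulting pattern (simplified; the real function creates one debugfs file per counter and unwinds on failure):

    #include <linux/debugfs.h>
    #include <linux/errno.h>
    #include <linux/printk.h>

    static int __init init_qspinlock_stat(void)
    {
            struct dentry *d = debugfs_create_dir("qlockstat", NULL);

            if (!d)
                    goto fail;

            /* ... one checked debugfs_create_file() per stat counter ... */

            return 0;
    fail:
            pr_warn("Could not create 'qlockstat' debugfs entries\n");
            return -ENOMEM;
    }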
2016-05-05 | locking/pvqspinlock: Avoid double resetting of stats | Davidlohr Bueso | 1 | -2/+0
... remove the redundant second iteration; this is most likely a copy/paste buglet. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | Merge tag 'v4.6-rc6' into locking/core, to pick up fixes | Ingo Molnar | 15 | -48/+188
Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | sched/fair: Fix comment in calculate_imbalance() | Dietmar Eggemann | 1 | -3/+4
The comment in calculate_imbalance() was introduced in commit: 2dd73a4f09be ("[PATCH] sched: implement smpnice") which described the logic as it was then, but a later commit: b18855500fc4 ("sched/balancing: Fix 'local->avg_load > sds->avg_load' case in calculate_imbalance()") complicated this logic some more, so that the comment does not match anymore. Update the comment to match the code. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | sched/fair: Remove stale power aware scheduling comments | Dietmar Eggemann | 1 | -10/+3
Commit 8e7fbcbc22c1 ("sched: Remove stale power aware scheduling remnants and dysfunctional knobs") deleted the power aware scheduling support. This patch gets rid of the remaining power aware scheduling related comments in the code as well. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | sched/fair: Update rq clock before updating nohz CPU load | Matt Fleming | 1 | -0/+1
If we're accessing rq_clock() (e.g. in sched_avg_update()) we should update the rq clock before calling cpu_load_update(), otherwise any time calculations will be stale. All other paths currently call update_rq_clock(). Signed-off-by: Matt Fleming <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Wanpeng Li <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | sched/debug: Print out idle balance values even on !CONFIG_SCHEDSTATS kernels | Wanpeng Li | 1 | -5/+5
The max_idle_balance_cost and avg_idle values, which are tracked and are used to capture short idle incidents, are not associated with schedstats; however, these two values aren't printed out on !CONFIG_SCHEDSTATS kernels. Fix this by moving the value printout out of the CONFIG_SCHEDSTATS section. Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | sched/fair: Optimize sum computation with a lookup table | Yuyang Du | 1 | -8/+13
__compute_runnable_contrib() uses a loop to compute the sum, whereas a table lookup can do it faster, in a constant amount of time. The program to generate the constants is located at: Documentation/scheduler/sched-avg.txt Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Morten Rasmussen <[email protected]> Acked-by: Vincent Guittot <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
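The shape of the optimization, sketched (table values here are placeholders; the real constants come from the sched-avg.txt program, and decay_load()/LOAD_AVG_* are the existing sched/fair definitions):

    /* Precomputed sums of the geometric series for whole multiples of
     * LOAD_AVG_PERIOD (= 32) periods; values abbreviated. */
    static const u32 __accumulated_sum_N32[] = {
            0, 23371, 35056, /* ... 11 entries, generated offline ... */
    };

    static u32 __compute_runnable_contrib(u64 n)
    {
            u32 contrib;

            if (likely(n <= LOAD_AVG_PERIOD))
                    return runnable_avg_yN_sum[n];  /* direct O(1) lookup */
            if (unlikely(n >= LOAD_AVG_MAX_N))
                    return LOAD_AVG_MAX;

            /* whole 32-period blocks from the table, remainder decayed */
            contrib = __accumulated_sum_N32[n / LOAD_AVG_PERIOD];
            n %= LOAD_AVG_PERIOD;
            contrib = decay_load(contrib, n);
            return contrib + runnable_avg_yN_sum[n];
    }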
2016-05-05 | sched/fair: Rename SCHED_LOAD_SHIFT to NICE_0_LOAD_SHIFT and remove SCHED_LOAD_SCALE | Yuyang Du | 2 | -13/+13
After cleaning up the sched metrics, there are two definitions that are ambiguous and confusing: SCHED_LOAD_SHIFT and SCHED_LOAD_SCALE. Resolve this: - Rename SCHED_LOAD_SHIFT to NICE_0_LOAD_SHIFT, which better reflects what it is. - Replace SCHED_LOAD_SCALE use with SCHED_CAPACITY_SCALE and remove SCHED_LOAD_SCALE. Suggested-by: Ben Segall <[email protected]> Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] [ Rewrote the changelog and fixed the build on 32-bit kernels. ] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | sched/fair: Generalize the load/util averages resolution definition | Yuyang Du | 2 | -9/+10
Integer metrics need fixed point arithmetic. In sched/fair, a few metrics, e.g., weight, load, load_avg, util_avg, freq, and capacity, may have different fixed point ranges, which makes their update and usage error-prone. In order to avoid the errors relating to the fixed point range, we define a basic fixed point range, and then formalize all metrics to be based on that basic range. The basic range is 1024 or (1 << 10). Further, one can recursively apply the basic range to have a larger range. As pointed out by Ben Segall, weight (visible to user, e.g., NICE-0 has 1024) and load (e.g., NICE_0_LOAD) have independent ranges, but they must be well calibrated. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | sched/core: Enable increased load resolution on 64-bit kernels | Peter Zijlstra | 1 | -4/+6
Mike ran into the low load resolution limitation on his big machine. So reenable these bits; nobody could ever reproduce/analyze the reported power usage claim and Google has been running with this for years as well. Reported-by: Mike Galbraith <[email protected]> Tested-by: Mike Galbraith <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | locking/lockdep, sched/core: Implement a better lock pinning scheme | Peter Zijlstra | 8 | -62/+123
The problem with the existing lock pinning is that each pin is of value 1; this means you can simply unpin if you know it's pinned, without having any extra information. This scheme generates a random (16 bit) cookie for each pin and requires this same cookie to unpin. This means you have to keep the cookie in context. No objsize difference for !LOCKDEP kernels. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
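The new API shape, as a minimal caller sketch (using the scheduler's rq->lock, as in the patch itself):

    #include <linux/lockdep.h>

    /* Pinning now returns a 16-bit cookie; unpinning must present the
     * same cookie, so an unpin can no longer succeed by guesswork. */
    static void example(struct rq *rq)
    {
            struct pin_cookie cookie;

            cookie = lockdep_pin_lock(&rq->lock);
            /* ... code that must not drop rq->lock ... */
            lockdep_unpin_lock(&rq->lock, cookie);
    }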
2016-05-05 | sched/core: Introduce 'struct rq_flags' | Peter Zijlstra | 3 | -56/+62
In order to be able to pass around more than just the IRQ flags in the future, add a rq_flags structure. No difference in code generation for the x86_64-defconfig build I tested. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | sched/core: Move task_rq_lock() out of line | Peter Zijlstra | 2 | -63/+69
It's a rather large function; inlining it doesn't seem to make much sense:

 $ size defconfig-build/kernel/sched/core.o{.orig,}
    text    data     bss     dec     hex filename
   56533   21037    2320   79890   13812 defconfig-build/kernel/sched/core.o.orig
   55733   21037    2320   79090   134f2 defconfig-build/kernel/sched/core.o

The 'perf bench sched messaging' micro-benchmark shows a visible improvement of 4-5%:

 $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i ; done
 $ perf stat --null --repeat 25 -- perf bench sched messaging -g 40 -l 5000

 pre:
   4.582798193 seconds time elapsed ( +- 1.41% )
   4.733374877 seconds time elapsed ( +- 2.10% )
   4.560955136 seconds time elapsed ( +- 1.43% )
   4.631062303 seconds time elapsed ( +- 1.40% )

 post:
   4.364765213 seconds time elapsed ( +- 0.91% )
   4.454442734 seconds time elapsed ( +- 1.18% )
   4.448893817 seconds time elapsed ( +- 1.41% )
   4.424346872 seconds time elapsed ( +- 0.97% )

Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-05 | Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes | Ingo Molnar | 6 | -27/+72
Signed-off-by: Ingo Molnar <[email protected]>
2016-05-04 | seccomp: Fix comment typo | Mickaël Salaün | 1 | -1/+1
Drop accidentally repeated word in comment. Signed-off-by: Mickaël Salaün <[email protected]> Cc: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Will Drewry <[email protected]>
2016-05-04 | signals/sigaltstack: Report current flag bits in sigaltstack() | Andy Lutomirski | 1 | -1/+2
sigaltstack()'s reported previous state uses a somewhat odd convention, but the concept of flag bits is new, and we can do the flag bits sensibly. Specifically, let's just report them directly. This will allow saving and restoring the sigaltstack state using sigaltstack() to work correctly. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Al Viro <[email protected]> Cc: Amanieu d'Antras <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Stas Sergeev <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/94b291ec9fd47741a9264851e316e158ded0b00d.1462296606.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2016-05-04 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 11 | -63/+188
Conflicts: net/ipv4/ip_gre.c Minor conflicts between tunnel bug fixes in net and ipv6 tunnel cleanups in net-next. Signed-off-by: David S. Miller <[email protected]>
2016-05-03 | Merge tag 'trace-fixes-v4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace | Linus Torvalds | 1 | -2/+7
Pull tracing fix from Steven Rostedt: "Chunyu Hu noticed that if one writes into the trigger files within the ftrace subsystem of events that it can cause an oops. This file is only writable by root, but still is a bug that needs to be fixed"

* tag 'trace-fixes-v4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Don't display trigger file for events that can't be enabled
2016-05-03 | tracing: Use temp buffer when filtering events | Steven Rostedt (Red Hat) | 4 | -8/+185
Filtering of events requires the data to be written to the ring buffer before it can be decided to filter or not. This is because the parameters of the filter are based on the result that is written to the ring buffer and not on the parameters that are passed into the trace functions. The ftrace ring buffer is optimized for writing into the ring buffer and committing. The discard procedure used when filtering decides the event should be discarded is much more heavy weight. Thus, using a temporary buffer when filtering events can speed things up drastically.

Without a temp buffer we have:

 # trace-cmd start -p nop
 # perf stat -r 10 hackbench 50
       0.790706626 seconds time elapsed ( +- 0.71% )

 # trace-cmd start -e all
 # perf stat -r 10 hackbench 50
       1.566904059 seconds time elapsed ( +- 0.27% )

 # trace-cmd start -e all -f 'common_preempt_count==20'
 # perf stat -r 10 hackbench 50
       1.690598511 seconds time elapsed ( +- 0.19% )

 # trace-cmd start -e all -f 'common_preempt_count!=20'
 # perf stat -r 10 hackbench 50
       1.707486364 seconds time elapsed ( +- 0.30% )

The first run above is without any tracing, just to get a base figure. hackbench takes ~0.79 seconds to run on the system. The second run enables tracing all events where nothing is filtered. This increases the time by 100%, and hackbench takes 1.57 seconds to run. The third run filters all events where the preempt count will equal "20" (this should never happen), thus all events are discarded. This takes 1.69 seconds to run. This is 10% slower than just committing the events! The last run enables all events and filters where the filter will commit all events, and this takes 1.70 seconds to run. The filtering overhead is approximately 10%. Thus, the discard and commit of an event from the ring buffer may take about the same time.

With this patch, the numbers change:

 # trace-cmd start -p nop
 # perf stat -r 10 hackbench 50
       0.778233033 seconds time elapsed ( +- 0.38% )

 # trace-cmd start -e all
 # perf stat -r 10 hackbench 50
       1.582102692 seconds time elapsed ( +- 0.28% )

 # trace-cmd start -e all -f 'common_preempt_count==20'
 # perf stat -r 10 hackbench 50
       1.309230710 seconds time elapsed ( +- 0.22% )

 # trace-cmd start -e all -f 'common_preempt_count!=20'
 # perf stat -r 10 hackbench 50
       1.786001924 seconds time elapsed ( +- 0.20% )

The first run is again the base with no tracing. The second run is all tracing with no filtering. It is a little slower, but that may be well within the noise. The third run shows that discarding all events only took 1.3 seconds. This is a speed up of 23%! The discard is much faster than even the commit. The one downside is shown in the last run. Events that are not discarded by the filter will take longer to add; this is due to the extra copy of the event. Cc: Alexei Starovoitov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-05-03 | tracing: Don't display trigger file for events that can't be enabled | Chunyu Hu | 1 | -2/+7
Currently, register functions for events will be called through the 'reg' field of the event class directly, without any check, when setting up triggers. Triggers for events that don't support registration through debugfs (events under events/ftrace are for trace-cmd to read the event format, and most of them don't have a register function, except events/ftrace/functionx) can't be enabled at all, and an oops will be hit when setting up a trigger for those events, so just not creating them is an easy way to avoid the oops. Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] # 3.14+ Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework") Signed-off-by: Chunyu Hu <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-05-03 | signals/sigaltstack: Implement SS_AUTODISARM flag | Stas Sergeev | 2 | -3/+9
This patch implements the SS_AUTODISARM flag that can be OR-ed with SS_ONSTACK when forming ss_flags. When this flag is set, sigaltstack will be disabled when entering the signal handler; more precisely, after saving sas to uc_stack. When leaving the signal handler, the sigaltstack is restored by uc_stack. When this flag is used, it is safe to switch from sighandler with swapcontext(). Without this flag, the subsequent signal will corrupt the state of the switched-away sighandler. To detect the support of this functionality, one can do: err = sigaltstack(SS_DISABLE | SS_AUTODISARM); if (err && errno == EINVAL) unsupported(); Signed-off-by: Stas Sergeev <[email protected]> Cc: Al Viro <[email protected]> Cc: Aleksa Sarai <[email protected]> Cc: Amanieu d'Antras <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Heinrich Schuchardt <[email protected]> Cc: Jason Low <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Moore <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
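The changelog's probe, fleshed out into a runnable sketch (assumes userspace headers new enough to define SS_AUTODISARM):

    #include <errno.h>
    #include <signal.h>

    /* Returns 1 if the kernel accepts SS_AUTODISARM, 0 if it reports
     * EINVAL, i.e. treats the flag as an invalid mode. */
    static int have_ss_autodisarm(void)
    {
            stack_t ss = { .ss_flags = SS_DISABLE | SS_AUTODISARM };

            if (sigaltstack(&ss, NULL) == -1 && errno == EINVAL)
                    return 0;
            return 1;
    }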
2016-05-03 | signals/sigaltstack: Prepare to add new SS_xxx flags | Stas Sergeev | 1 | -10/+6
This patch adds SS_FLAG_BITS - the mask that splits sigaltstack mode values and bit-flags. Since there are no bit-flags yet, the mask is defined as 0. The flags are added by subsequent patches. With every new flag, the mask should have the appropriate bit cleared. This makes sure that if some flag is tried on a kernel that doesn't support it, the -EINVAL error will be returned, because such a flag will be treated as an invalid mode rather than a bit-flag. That way the existence of particular features can be probed at run-time. This change was suggested by Andy Lutomirski: https://lkml.org/lkml/2016/3/6/158 Signed-off-by: Stas Sergeev <[email protected]> Cc: Al Viro <[email protected]> Cc: Amanieu d'Antras <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-02 | tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic | Steven Rostedt (Red Hat) | 1 | -61/+10
Nothing sets TRACE_EVENT_FL_USE_CALL_FILTER anymore. Remove it. Signed-off-by: Steven Rostedt <[email protected]>
2016-05-02 | Merge getxattr prototype change into work.lookups | Al Viro | 1 | -1/+1
The rest of work.xattr stuff isn't needed for this branch
2016-05-02 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds | 3 | -36/+71
Pull networking fixes from David Miller:

 1) MODULE_FIRMWARE firmware string not correct for iwlwifi 8000 chips, from Sara Sharon.
 2) Fix SKB size checks in batman-adv stack on receive, from Sven Eckelmann.
 3) Leak fix on mac80211 interface add error paths, from Johannes Berg.
 4) Cannot invoke napi_disable() with BH disabled in myri10ge driver, fix from Stanislaw Gruszka.
 5) Fix sign extension problem when computing feature masks in net_gso_ok(), from Marcelo Ricardo Leitner.
 6) lan78xx driver doesn't count packets and packet lengths in its statistics properly, fix from Woojung Huh.
 7) Fix the buffer allocation sizes in pegasus USB driver, from Petko Manolov.
 8) Fix refcount overflows in bpf, from Alexei Starovoitov.
 9) Unified dst cache handling introduced a preempt warning in ip_tunnel, fix by resetting rather than setting the cached route. From Paolo Abeni.
10) Listener hash collision test fix in soreuseport, from Craig Gallek

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  gre: do not pull header in ICMP error processing
  net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
  tipc: only process unicast on intended node
  cxgb3: fix out of bounds read
  net/smscx5xx: use the device tree for mac address
  soreuseport: Fix TCP listener hash collision
  net: l2tp: fix reversed udp6 checksum flags
  ip_tunnel: fix preempt warning in ip tunnel creation/updating
  samples/bpf: fix trace_output example
  bpf: fix check_map_func_compatibility logic
  bpf: fix refcnt overflow
  drivers: net: cpsw: use of_phy_connect() in fixed-link case
  dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive
  drivers: net: cpsw: don't ignore phy-mode if phy-handle is used
  drivers: net: cpsw: fix segfault in case of bad phy-handle
  drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config
  MAINTAINERS: net: Change maintainer for GRETH 10/100/1G Ethernet MAC device driver
  gre: reject GUE and FOU in collect metadata mode
  pegasus: fixes reported packet length
  pegasus: fixes URB buffer allocation size;
  ...
2016-05-02 | genirq: Allow the affinity of a percpu interrupt to be set/retrieved | Marc Zyngier | 1 | -1/+25
In order to prepare the genirq layer for the concept of partitioned percpu interrupts, let's allow an affinity to be associated with such an interrupt. We introduce:

 - irq_set_percpu_devid_partition(): flag an interrupt as a percpu-devid interrupt, and associate it with an affinity
 - irq_get_percpu_devid_partition(): allow the affinity of that interrupt to be retrieved.

This will allow a driver to discover which CPUs the per-cpu interrupt can actually fire on. Signed-off-by: Marc Zyngier <[email protected]> Cc: Mark Rutland <[email protected]> Cc: [email protected] Cc: Jason Cooper <[email protected]> Cc: Will Deacon <[email protected]> Cc: Rob Herring <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
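A sketch of the intended driver usage (the irq number and partition mask are assumptions):

    #include <linux/cpumask.h>
    #include <linux/irq.h>

    /* Flag 'irq' as percpu-devid but existing only on the CPUs in
     * 'part_mask', then read the affinity back to learn where the
     * interrupt can actually fire. */
    static int setup_partitioned_ppi(unsigned int irq,
                                     const struct cpumask *part_mask)
    {
            struct cpumask fire_mask;
            int err;

            err = irq_set_percpu_devid_partition(irq, part_mask);
            if (err)
                    return err;

            return irq_get_percpu_devid_partition(irq, &fire_mask);
    }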
2016-05-02 | irqdomain: Allow domain matching on irq_fwspec | Marc Zyngier | 1 | -9/+10
When iterating over the irq domain list, we try to match a domain either by calling a match() function or by comparing a number of fields passed as parameters. Both approaches are a bit restrictive:

 - match() is DT specific and only takes a device node
 - the fallback case only deals with the fwnode_handle

It would be useful if we had a per-domain function that would actually perform the matching check on the whole of the irq_fwspec structure. This would allow a domain to triage matching attempts that need to extend beyond the fwnode. Let's introduce irq_find_matching_fwspec(), which takes a full-blown irq_fwspec structure, and call into a select() function implemented by the irqdomain. irq_find_matching_fwnode() is made a wrapper around irq_find_matching_fwspec() in order to preserve compatibility. Signed-off-by: Marc Zyngier <[email protected]> Cc: Mark Rutland <[email protected]> Cc: [email protected] Cc: Jason Cooper <[email protected]> Cc: Will Deacon <[email protected]> Cc: Rob Herring <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
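A sketch of a select() implementation (MY_FWSPEC_CLASS is hypothetical; a real driver would key on whatever extra fwspec parameters distinguish its domain):

    #include <linux/irqdomain.h>

    #define MY_FWSPEC_CLASS 1  /* hypothetical discriminator */

    /* Match on the whole fwspec: the fwnode as before, plus a parameter
     * that tells this domain apart from others behind the same fwnode. */
    static int my_domain_select(struct irq_domain *d, struct irq_fwspec *fwspec,
                                enum irq_domain_bus_token bus_token)
    {
            if (d->fwnode != fwspec->fwnode)
                    return 0;

            return fwspec->param_count > 0 &&
                   fwspec->param[0] == MY_FWSPEC_CLASS;
    }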
2016-05-02 | genirq: Add error code reporting to irq_{reserve,destroy}_ipi | Matt Redfearn | 1 | -14/+17
Make these functions return appropriate error codes when something goes wrong. Previously irq_destroy_ipi() returned void, making it impossible to notify the caller if the request could not be fulfilled. Patch 1 in the series added another condition in which this could fail, in addition to the existing ones. irq_reserve_ipi() returned an unsigned int, meaning it could only return 0 on failure and give the caller no indication as to why the request failed. As time goes on, there are likely to be further conditions added in which these functions can fail. These APIs and the IPI IRQ domain are new in 4.6 and the number of existing call sites is low; changing the API now has little impact on the code, while making it easier for these functions to grow over time. Signed-off-by: Matt Redfearn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Qais Yousef <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-05-02 | genirq: Make irq_destroy_ipi take a cpumask of IPIs to destroy | Matt Redfearn | 1 | -4/+14
Previously irq_destroy_ipi() would destroy IPIs to all CPUs that were configured by irq_reserve_ipi(). This change makes it possible to destroy just a subset of the IPIs. This may be useful to remove IPIs to CPUs that have been hot removed so that the IRQ numbers allocated within the IPI domain can be re-used. The original behaviour is restored by passing the complete mask that the IPI was created with. There are currently no users of this function that would break from the API change. Signed-off-by: Matt Redfearn <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Qais Yousef <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
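For example (a sketch assuming the post-series signatures; the virq and CPU numbers are placeholders), a hotplug path could now release only the IPI aimed at a departed CPU:

    #include <linux/cpumask.h>
    #include <linux/irq.h>

    /* Destroy just the IPI targeting 'dead_cpu' so its irq number can
     * be reused; passing the full creation mask keeps the old behaviour. */
    static int cleanup_ipi_for_cpu(unsigned int ipi_virq, unsigned int dead_cpu)
    {
            struct cpumask gone;

            cpumask_clear(&gone);
            cpumask_set_cpu(dead_cpu, &gone);

            return irq_destroy_ipi(ipi_virq, &gone);
    }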
2016-04-29 | tracing: Remove unused function trace_current_buffer_lock_reserve() | Steven Rostedt (Red Hat) | 1 | -10/+0
trace_current_buffer_lock_reserve() has no more users. Remove it. Signed-off-by: Steven Rostedt <[email protected]>
2016-04-29 | tracing: Remove one use of trace_current_buffer_lock_reserve() | Steven Rostedt (Red Hat) | 1 | -8/+10
The only user of trace_current_buffer_lock_reserve() is in the boot up self tests. Restructure the code a little to have that code use what everything else uses: trace_event_buffer_lock_reserve(). Signed-off-by: Steven Rostedt <[email protected]>
2016-04-30 | livepatch: make object/func-walking helpers more robust | Miroslav Benes | 1 | -0/+3
Current object-walking helper checks the presence of obj->funcs to determine the end of objs array in klp_object structure. This is somewhat fragile because one can easily forget about funcs definition during livepatch creation. In such a case the livepatch module is successfully loaded and all objects after the incorrect one are omitted. This is very confusing. Let's make the helper more robust and check also for the other external member, name. Thus the helper correctly stops on an empty item of the array. We need to have a check for obj->funcs in klp_init_object() to make it work. The same applies to a func-walking helper. As a benefit we'll check for new_func member definition during the livepatch initialization. There is no such check anywhere in the code now. [[email protected]: fix shortlog] Signed-off-by: Miroslav Benes <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Acked-by: Jessica Yu <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
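The resulting helper shape, sketched (modulo the exact names in the tree):

    /* Stop on the array terminator only when both identifying members
     * are empty, so a forgotten .funcs no longer ends the walk early. */
    #define klp_for_each_object(patch, obj) \
            for (obj = patch->objs; obj->funcs || obj->name; obj++)

    #define klp_for_each_func(obj, func) \
            for (func = obj->funcs; func->old_name || func->new_func; func++)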
2016-04-29 | tracing: Have trace_buffer_unlock_commit() call the _regs version with NULL | Steven Rostedt (Red Hat) | 2 | -15/+9
There's no real difference between trace_buffer_unlock_commit() and trace_buffer_unlock_commit_regs() except that the former passes NULL to ftrace_stack_trace() instead of regs. Have the former be a static inline of the latter which passes NULL for regs. Signed-off-by: Steven Rostedt <[email protected]>
2016-04-29 | tracing: Remove unused function trace_current_buffer_discard_commit() | Steven Rostedt (Red Hat) | 1 | -8/+0
The function trace_current_buffer_discard_commit() has no callers, remove it. Signed-off-by: Steven Rostedt <[email protected]>
2016-04-29 | tracing: Move trace_buffer_unlock_commit{_regs}() to local header | Steven Rostedt (Red Hat) | 2 | -2/+10
The functions trace_buffer_unlock_commit() and the _regs() version are only used within the kernel/trace directory. Move them to the local header and remove the export as well. Signed-off-by: Steven Rostedt <[email protected]>
2016-04-29 | tracing: Fold filter_check_discard() into its only user | Steven Rostedt (Red Hat) | 2 | -20/+6
The function filter_check_discard() is small and only called by one user; its code can be folded into that one caller, making the code a bit less complex. Signed-off-by: Steven Rostedt <[email protected]>
2016-04-29 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds | 2 | -3/+7
Merge fixes from Andrew Morton: "20 fixes"

* emailed patches from Andrew Morton <[email protected]>:
  Documentation/sysctl/vm.txt: update numa_zonelist_order description
  lib/stackdepot.c: allow the stack trace hash to be zero
  rapidio: fix potential NULL pointer dereference
  mm/memory-failure: fix race with compound page split/merge
  ocfs2/dlm: return zero if deref_done message is successfully handled
  Ananth has moved
  kcov: don't profile branches in kcov
  kcov: don't trace the code coverage code
  mm: wake kcompactd before kswapd's short sleep
  .mailmap: add Frank Rowand
  mm/hwpoison: fix wrong num_poisoned_pages accounting
  mm: call swap_slot_free_notify() with page lock held
  mm: vmscan: reclaim highmem zone if buffer_heads is over limit
  numa: fix /proc/<pid>/numa_maps for THP
  mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
  mailmap: fix Krzysztof Kozlowski's misspelled name
  thp: keep huge zero page pinned until tlb flush
  mm: exclude HugeTLB pages from THP page_mapped() logic
  kexec: export OFFSET(page.compound_head) to find out compound tail page
  kexec: update VMCOREINFO for compound_order/dtor
2016-04-28 | Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -17/+38
Pull perf fixes from Ingo Molnar: "x86 PMU driver fixes plus a core code race fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix incorrect lbr_sel_mask value
  perf/x86/intel/pt: Don't die on VMXON
  perf/core: Fix perf_event_open() vs. execve() race
  perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX
  perf/core: Make sysctl_perf_cpu_time_max_percent conform to documentation
  perf/x86/intel/rapl: Add missing Haswell model
  perf/x86/intel: Add model number for Skylake Server to perf
2016-04-28 | Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -3/+36
Pull locking fixes from Ingo Molnar: "Two lockdep fixes"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Fix lock_chain::base size
  locking/lockdep: Fix ->irq_context calculation
2016-04-28 | kcov: don't profile branches in kcov | Andrey Ryabinin | 1 | -0/+1
Profiling 'if' statements in __sanitizer_cov_trace_pc() leads to unbounded recursion and a crash: __sanitizer_cov_trace_pc() -> ftrace_likely_update() -> __sanitizer_cov_trace_pc() -> ... Define DISABLE_BRANCH_PROFILING to disable this tracer. Signed-off-by: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
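The fix amounts to defining the guard before any instrumentable header is pulled in; a sketch of the top of kernel/kcov.c:

    #define pr_fmt(fmt) "kcov: " fmt

    /* Must precede the includes so likely()/unlikely() in this file
     * expand without the ftrace_likely_update() instrumentation. */
    #define DISABLE_BRANCH_PROFILING
    #include <linux/compiler.h>
    #include <linux/types.h>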