path: root/kernel
Age | Commit message | Author | Files | Lines
2016-02-29sched/rt: Fix PI handling vs. sched_setscheduler()Peter Zijlstra3-50/+111
Andrea Parri reported: > I found that the following scenario (with CONFIG_RT_GROUP_SCHED=y) is not > handled correctly: > > T1 (prio = 20) > lock(rtmutex); > > T2 (prio = 20) > blocks on rtmutex (rt_nr_boosted = 0 on T1's rq) > > T1 (prio = 20) > sys_set_scheduler(prio = 0) > [new_effective_prio == oldprio] > T1 prio = 20 (rt_nr_boosted = 0 on T1's rq) > > The last step is incorrect as T1 is now boosted (c.f., rt_se_boosted()); > in particular, if we continue with > > T1 (prio = 20) > unlock(rtmutex) > wakeup(T2) > adjust_prio(T1) > [prio != rt_mutex_getprio(T1)] > dequeue(T1) > rt_nr_boosted = (unsigned long)(-1) > ... > T1 prio = 0 > > then we end up leaving rt_nr_boosted in an "inconsistent" state. > > The simple program attached could reproduce the previous scenario; note > that, as a consequence of the presence of this state, the "assertion" > > WARN_ON(!rt_nr_running && rt_nr_boosted) > > from dec_rt_group() may trigger. So normally we dequeue/enqueue tasks in sched_setscheduler(), which would ensure the accounting stays correct. However in the early PI path we fail to do so. So this was introduced at around v3.14, by: c365c292d059 ("sched: Consider pi boosting in setscheduler()") which fixed another problem exactly because that dequeue/enqueue, joy. Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and have it preserve runqueue location with that option. This requires decoupling the on_rt_rq() state from being on the list. In order to allow for explicit movement during the SAVE/RESTORE, introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in these cases to preserve other invariants. Respecting the SAVE/RESTORE flags also has the (nice) side-effect that things like sys_nice()/sys_sched_setaffinity() also do not reorder FIFO tasks (whereas they used to before this patch). Reported-by: Andrea Parri <[email protected]> Tested-by: Andrea Parri <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
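For reference, a minimal sketch of the dequeue/enqueue pattern the SAVE/RESTORE flags are used for in __sched_setscheduler(); the surrounding details are illustrative rather than the exact diff:

    queued = task_on_rq_queued(p);
    running = task_current(rq, p);
    if (queued)
            dequeue_task(rq, p, DEQUEUE_SAVE);      /* keep runqueue position */
    if (running)
            put_prev_task(rq, p);

    __setscheduler(rq, p, attr, pi);

    if (running)
            p->sched_class->set_curr_task(rq);
    if (queued)
            enqueue_task(rq, p, ENQUEUE_RESTORE);   /* restore, do not requeue */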
2016-02-29sched/core: Remove duplicated sched_group_set_shares() prototypeDongsheng Yang1-1/+0
Signed-off-by: Dongsheng Yang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29sched/fair: Consolidate nohz CPU load update codeFrederic Weisbecker1-23/+25
Lets factorize a bit of code there. We'll even have a third user soon. While at it, standardize the idle update function name against the others. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Byungchul Park <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E . McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29sched/fair: Avoid using decay_load_missed() with a negative valueByungchul Park1-2/+10
decay_load_missed() cannot handle negative values, so we need to prevent using the function with a negative value. Reported-by: Dietmar Eggemann <[email protected]> Signed-off-by: Byungchul Park <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: 59543275488d ("sched/fair: Prepare __update_cpu_load() to handle active tickless") Link: http://lkml.kernel.org/r/20160115070749.GA1914@X58A-UD3R Signed-off-by: Ingo Molnar <[email protected]>
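A sketch of the idea, using the names from kernel/sched/fair.c of that era (not necessarily the literal diff): decay the regular and tickless contributions separately so decay_load_missed() never sees a negative value:

    old_load = this_rq->cpu_load[i];
    old_load = decay_load_missed(old_load, pending_updates - 1, i);
    if (tickless_load) {
            old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
            /*
             * old_load cannot go negative here: the decayed tickless_load
             * is never larger than the original tickless_load.
             */
            old_load += tickless_load;
    }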
2016-02-29Merge branch 'sched/urgent' into sched/core, to pick up fixes before ↵Ingo Molnar17-317/+546
applying new changes Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29sched/deadline: Always calculate end of period on sched_yield()Peter Zijlstra1-9/+13
Steven noticed that occasionally a sched_yield() call would not result in a wait for the next period edge as expected. It turns out that when we call update_curr_dl() and end up with delta_exec <= 0, we will bail early and fail to throttle. Further inspection of the yield code revealed that yield_task_dl() clearing dl.runtime is wrong too; it will not account the last bit of runtime, which could result in dl.runtime < 0, which in turn means that replenish would gift us with too much runtime. Fix both issues by not relying on the dl.runtime value for yield. Reported-by: Steven Rostedt <[email protected]> Tested-by: Steven Rostedt <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Clark Williams <[email protected]> Cc: Daniel Bristot de Oliveira <[email protected]> Cc: John Kacur <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29sched/cgroup: Fix cgroup entity load tracking tear-downPeter Zijlstra3-20/+23
When a cgroup's CPU runqueue is destroyed, it should remove its remaining load accounting from its parent cgroup. The current site for doing so is unsuited because it's far too late and unordered against other cgroup removal (->css_free() will be, but we're also in an RCU callback). Put it in the ->css_offline() callback, which is the start of cgroup destruction, right after the group has been made unavailable to userspace. The ->css_offline() callbacks are called in hierarchical order after the following v4.4 commit: aa226ff4a1ce ("cgroup: make sure a parent css isn't offlined before its children") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Li Zefan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29perf: Export perf_event_sysfs_show()Thomas Gleixner1-0/+1
Required to use it in modular perf drivers. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Harish Chegondi <[email protected]> Cc: Jacob Pan <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-29Merge tag 'v4.5-rc6' into perf/core, to pick up fixesIngo Molnar8-170/+260
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-28Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-128/+240
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A rather largish series of 12 patches addressing a maze of race conditions in the perf core code from Peter Zijlstra" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Robustify task_function_call() perf: Fix scaling vs. perf_install_in_context() perf: Fix scaling vs. perf_event_enable() perf: Fix scaling vs. perf_event_enable_on_exec() perf: Fix ctx time tracking by introducing EVENT_TIME perf: Cure event->pending_disable race perf: Fix race between event install and jump_labels perf: Fix cloning perf: Only update context time when active perf: Allow perf_release() with !event->ctx perf: Do not double free perf: Close install vs. exit race
2016-02-28Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixlet from Thomas Gleixner: "A trivial printk typo fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix trivial typo in printk() message
2016-02-25Merge tag 'trace-fixes-v4.5-rc5-2' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Another small bug reported to me by Chunyu Hu. When perf added a "reg" function to the function tracing event (not a tracepoint), it caused that event to be displayed as a tracepoint and could cause errors in tracepoint handling. That was solved by adding a flag to ignore ftrace non-tracepoint events. But that flag was missed when displaying events in available_events, which should only contain tracepoint events. This broke a documented way to enable all events with: cat available_events > set_event As the function non-tracepoint event would cause that to error out. The commit here fixes that by having the available_events file not list events that have the ignore flag set" * tag 'trace-fixes-v4.5-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix showing function event in available_events
2016-02-25Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - Two fixes for compatibility with the ACPI 6.1 specification. Without these fixes multi-interface DIMMs will fail to be probed, and address range scrub commands to find memory errors will give results that the kernel will mis-interpret. For multi-interface DIMMs Linux will accept either the original 6.0 implementation or 6.1. For address range scrub we'll only support 6.1 since ACPI formalized this DSM differently than the original example [1] implemented in v4.2. The expectation is that production systems will only ever ship the ACPI 6.1 address range scrub command definition. - The wider async address range scrub work targeting 4.6 discovered that the original synchronous implementation in 4.5 is not sizing its return buffer correctly. - Arnd caught that my recent fix to the size of the pfn_t flags missed updating the flags variable used in the pmem driver. - Toshi found that we mishandle the memremap() return value in devm_memremap(). * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm: use 'u64' for pfn flags devm_memremap: Fix error value when memremap failed nfit: update address range scrub commands to the acpi 6.1 format libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing nfit: fix multi-interface dimm handling, acpi6.1 compatibility
2016-02-25rcu: Use simple wait queues where possible in rcutreePaul Gortmaker3-30/+31
As of commit dae6e64d2bcfd ("rcu: Introduce proper blocking to no-CBs kthreads GP waits") the RCU subsystem started making use of wait queues. Here we convert all additions of RCU wait queues to use simple wait queues, since they don't need the extra overhead of the full wait queue features. Originally this was done for RT kernels[1], since we would get things like... BUG: sleeping function called from invalid context at kernel/rtmutex.c:659 in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt Pid: 8, comm: rcu_preempt Not tainted Call Trace: [<ffffffff8106c8d0>] __might_sleep+0xd0/0xf0 [<ffffffff817d77b4>] rt_spin_lock+0x24/0x50 [<ffffffff8106fcf6>] __wake_up+0x36/0x70 [<ffffffff810c4542>] rcu_gp_kthread+0x4d2/0x680 [<ffffffff8105f910>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff810c4070>] ? rcu_gp_fqs+0x80/0x80 [<ffffffff8105eabb>] kthread+0xdb/0xe0 [<ffffffff8106b912>] ? finish_task_switch+0x52/0x100 [<ffffffff817e0754>] kernel_thread_helper+0x4/0x10 [<ffffffff8105e9e0>] ? __init_kthread_worker+0x60/0x60 [<ffffffff817e0750>] ? gs_change+0xb/0xb ...and hence simple wait queues were deployed on RT out of necessity (as simple wait uses a raw lock), but mainline might as well take advantage of the more streamline support as well. [1] This is a carry forward of work from v3.10-rt; the original conversion was by Thomas on an earlier -rt version, and Sebastian extended it to additional post-3.10 added RCU waiters; here I've added a commit log and unified the RCU changes into one, and uprev'd it to match mainline RCU. Signed-off-by: Daniel Wagner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: Boqun Feng <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-02-25rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lockDaniel Wagner3-5/+18
rcu_nocb_gp_cleanup() is called while holding rnp->lock. Currently, this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() will not enable the IRQs. lockdep is happy. By switching over using swait this is not true anymore. swake_up_all() enables the IRQs while processing the waiters. __do_softirq() can now run and will eventually call rcu_process_callbacks() which wants to grap nrp->lock. Let's move the rcu_nocb_gp_cleanup() call outside the lock before we switch over to swait. If we would hold the rnp->lock and use swait, lockdep reports following: ================================= [ INFO: inconsistent lock state ] 4.2.0-rc5-00025-g9a73ba0 #136 Not tainted --------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes: (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0 {IN-SOFTIRQ-W} state was registered at: [<ffffffff81109b9f>] __lock_acquire+0xd5f/0x21e0 [<ffffffff8110be0f>] lock_acquire+0xdf/0x2b0 [<ffffffff81841cc9>] _raw_spin_lock_irqsave+0x59/0xa0 [<ffffffff81136991>] rcu_process_callbacks+0x141/0x3c0 [<ffffffff810b1a9d>] __do_softirq+0x14d/0x670 [<ffffffff810b2214>] irq_exit+0x104/0x110 [<ffffffff81844e96>] smp_apic_timer_interrupt+0x46/0x60 [<ffffffff81842e70>] apic_timer_interrupt+0x70/0x80 [<ffffffff810dba66>] rq_attach_root+0xa6/0x100 [<ffffffff810dbc2d>] cpu_attach_domain+0x16d/0x650 [<ffffffff810e4b42>] build_sched_domains+0x942/0xb00 [<ffffffff821777c2>] sched_init_smp+0x509/0x5c1 [<ffffffff821551e3>] kernel_init_freeable+0x172/0x28f [<ffffffff8182cdce>] kernel_init+0xe/0xe0 [<ffffffff8184231f>] ret_from_fork+0x3f/0x70 irq event stamp: 76 hardirqs last enabled at (75): [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60 hardirqs last disabled at (76): [<ffffffff8184116f>] _raw_spin_lock_irq+0x1f/0x90 softirqs last enabled at (0): [<ffffffff810a8df2>] copy_process.part.26+0x602/0x1cf0 softirqs last disabled at (0): [< (null)>] (null) other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(rcu_node_1); <Interrupt> lock(rcu_node_1); *** DEADLOCK *** 1 lock held by rcu_preempt/8: #0: (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0 stack backtrace: CPU: 0 PID: 8 Comm: rcu_preempt Not tainted 4.2.0-rc5-00025-g9a73ba0 #136 Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014 0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0 0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b 0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f Call Trace: [<ffffffff818379e0>] dump_stack+0x4f/0x7b [<ffffffff8110813b>] print_usage_bug+0x1db/0x1e0 [<ffffffff8102fa4f>] ? save_stack_trace+0x2f/0x50 [<ffffffff811087ad>] mark_lock+0x66d/0x6e0 [<ffffffff81107790>] ? check_usage_forwards+0x150/0x150 [<ffffffff81108898>] mark_held_locks+0x78/0xa0 [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff81108a28>] trace_hardirqs_on_caller+0x168/0x220 [<ffffffff81108aed>] trace_hardirqs_on+0xd/0x10 [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60 [<ffffffff810fd1c7>] swake_up_all+0xb7/0xe0 [<ffffffff811386e1>] rcu_gp_kthread+0xab1/0xeb0 [<ffffffff811089bf>] ? trace_hardirqs_on_caller+0xff/0x220 [<ffffffff81841341>] ? _raw_spin_unlock_irq+0x41/0x60 [<ffffffff81137c30>] ? rcu_barrier+0x20/0x20 [<ffffffff810d2014>] kthread+0x104/0x120 [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff810d1f10>] ? 
kthread_create_on_node+0x260/0x260 [<ffffffff8184231f>] ret_from_fork+0x3f/0x70 [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260 Signed-off-by: Daniel Wagner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: Boqun Feng <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-02-25wait.[ch]: Introduce the simple waitqueue (swait) implementationPeter Zijlstra (Intel)2-1/+124
The existing wait queue code supports custom wake-up callbacks, wake flags, a wake key (passed to the callback) and exclusive flags that allow wakers to be tagged as exclusive, for limiting the number of wakers. In a lot of cases, none of these features are used, and hence we can benefit from a slimmed down version that lowers memory overhead and reduces runtime overhead. The concept originated from -rt, where waitqueues are a constant source of trouble, as we can't convert the head lock to a raw spinlock due to fancy and long-lasting callbacks. With the removal of custom callbacks, we can use a raw lock for queue list manipulations, hence allowing the simple wait support to be used in -rt. [Patch is from PeterZ which is based on Thomas version. Commit message is written by Paul G. Daniel: - Fixed some compile issues - Added non-lazy implementation of swake_up_locked as suggested by Boqun Feng.] Originally-by: Thomas Gleixner <[email protected]> Signed-off-by: Daniel Wagner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: Boqun Feng <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
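A minimal usage sketch of the swait API as introduced here (the example names are made up; later kernels renamed swake_up() to swake_up_one()):

    #include <linux/swait.h>

    static DECLARE_SWAIT_QUEUE_HEAD(my_wait);
    static bool done;

    /* waiter: sleep until 'done' becomes true */
    static void wait_for_it(void)
    {
            swait_event(my_wait, done);
    }

    /* waker: the head lock is a raw spinlock, so this is usable from
     * hard interrupt context and on -rt */
    static void complete_it(void)
    {
            done = true;
            swake_up(&my_wait);
    }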
2016-02-25perf: Robustify task_function_call()Peter Zijlstra1-20/+20
Since there is no serialization between task_function_call() doing task_curr() and the other CPU doing context switches, we could end up not sending an IPI even if we had to. And I'm not sure I still buy my own argument we're OK. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-25perf: Fix scaling vs. perf_install_in_context()Peter Zijlstra1-45/+70
Completely reworks perf_install_in_context() (again!) in order to ensure that there will be no ctx time hole between add_event_to_ctx() and any potential ctx_sched_in(). Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-25perf: Fix scaling vs. perf_event_enable()Peter Zijlstra1-19/+23
Similar to the perf_enable_on_exec(), ensure that event timings are consistent across perf_event_enable(). Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-25perf: Fix scaling vs. perf_event_enable_on_exec()Peter Zijlstra1-0/+1
The recent commit 3e349507d12d ("perf: Fix perf_enable_on_exec() event scheduling") caused this by moving task_ctx_sched_out() from before __perf_event_mask_enable() to after it. The overlooked consequence of that change is that task_ctx_sched_out() would update the ctx time fields, and now __perf_event_mask_enable() uses stale time. In order to fix this, explicitly stop our context's time before enabling the event(s). Reported-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: 3e349507d12d ("perf: Fix perf_enable_on_exec() event scheduling") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-25perf: Fix ctx time tracking by introducing EVENT_TIMEPeter Zijlstra1-12/+30
Currently any ctx_sched_in() call will re-start the ctx time tracking, this means that calls like: ctx_sched_in(.event_type = EVENT_PINNED); ctx_sched_in(.event_type = EVENT_FLEXIBLE); will have a hole in their ctx time tracking. This is likely harmless but can confuse things a little. By adding EVENT_TIME, we can have the first ctx_sched_in() (is_active: 0 -> !0) start the time and any further ctx_sched_in() will leave the timestamps alone. Secondly, this allows for an early disable like: ctx_sched_out(.event_type = EVENT_TIME); which would update the ctx time (if the ctx is active) and any further calls to ctx_sched_out() would not further modify the ctx time. For ctx_sched_in() any 0 -> !0 transition will automatically include EVENT_TIME. For ctx_sched_out(), any transition that clears EVENT_ALL will automatically clear EVENT_TIME. These two rules ensure that under normal circumstances we need not bother with EVENT_TIME and get natural ctx time behaviour. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
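Roughly, the flag layout this introduces (values as in kernel/events/core.c at the time; treat the details as illustrative):

    enum event_type_t {
            EVENT_FLEXIBLE  = 0x1,
            EVENT_PINNED    = 0x2,
            EVENT_TIME      = 0x4,
            EVENT_ALL       = EVENT_FLEXIBLE | EVENT_PINNED,
    };

    /*
     * ctx_sched_in():  an is_active 0 -> !0 transition also sets EVENT_TIME,
     *                  so the ctx clock is started exactly once.
     * ctx_sched_out(): clearing EVENT_ALL also clears EVENT_TIME, so the
     *                  clock is updated and stopped exactly once.
     */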
2016-02-25perf: Cure event->pending_disable racePeter Zijlstra1-3/+3
Because event_sched_out() checks event->pending_disable _before_ actually disabling the event, it can happen that the event fires after it checks but before it gets disabled. This would leave event->pending_disable set and the queued irq_work will try and process it. However, if the event trigger was during schedule(), the event might have been de-scheduled by the time the irq_work runs, and perf_event_disable_local() will fail. Fix this by checking event->pending_disable _after_ we call event->pmu->del(). This depends on the latter being a compiler barrier, such that the compiler does not lift the load and re-creates the problem. Tested-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
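The ordering after the fix, sketched (event_sched_out() in kernel/events/core.c, simplified):

    event->pmu->del(event, 0);              /* the event can no longer fire */
    event->oncpu = -1;

    event->state = PERF_EVENT_STATE_INACTIVE;
    if (event->pending_disable) {           /* checked only after ->del() */
            event->pending_disable = 0;
            event->state = PERF_EVENT_STATE_OFF;
    }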
2016-02-25perf: Fix race between event install and jump_labelsPeter Zijlstra1-8/+41
perf_install_in_context() relies upon the context switch hooks to have scheduled in events when the IPI misses its target -- after all, if the task has moved from the CPU (or wasn't running at all), it will have to context switch to run elsewhere. This however doesn't appear to be happening. It is possible for the IPI to not happen (task wasn't running) only to later observe the task running with an inactive context. The only possible explanation is that the context switch hooks are not called. Therefore put in a sync_sched() after toggling the jump_label to guarantee all CPUs will have them enabled before we install an event. A simple if (0->1) sync_sched() will not in fact work, because any further increment can race and complete before the sync_sched(). Therefore we must jump through some hoops. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
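A sketch of those hoops: only the first increment, serialized by a mutex, may enable the key and synchronize; every later increment takes a fast path. The perf_sched_count/perf_sched_mutex names below are assumptions that merely follow the description above:

    if (atomic_inc_not_zero(&perf_sched_count))
            return;                         /* key already enabled and synced */

    mutex_lock(&perf_sched_mutex);
    if (!atomic_read(&perf_sched_count)) {
            static_branch_enable(&perf_sched_events);
            /*
             * Make sure every CPU observes the enabled key, and hence runs
             * the context switch hooks, before any install can proceed.
             */
            synchronize_sched();
    }
    atomic_inc(&perf_sched_count);
    mutex_unlock(&perf_sched_mutex);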
2016-02-25perf: Fix cloningPeter Zijlstra1-15/+14
Alexander reported that when the 'original' context gets destroyed, no new clones happen. This can happen irrespective of the ctx switch optimization, any task can die, even the parent, and we want to continue monitoring the task hierarchy until we either close the event or no tasks are left in the hierarchy. perf_event_init_context() will attempt to pin the 'parent' context during clone(). At that point current is the parent, and since current cannot have exited while executing clone(), its context cannot have passed through perf_event_exit_task_context(). Therefore perf_pin_task_context() cannot observe ctx->task == TASK_TOMBSTONE. However, since inherit_event() does: if (parent_event->parent) parent_event = parent_event->parent; it looks at the 'original' event when it does: is_orphaned_event(). This can return true if the context that contains the this event has passed through perf_event_exit_task_context(). And thus we'll fail to clone the perf context. Fix this by adding a new state: STATE_DEAD, which is set by perf_release() to indicate that the filedesc (or kernel reference) is dead and there are no observers for our data left. Only for STATE_DEAD will is_orphaned_event() be true and inhibit cloning. STATE_EXIT is otherwise preserved such that is_event_hup() remains functional and will report when the observed task hierarchy becomes empty. Reported-by: Alexander Shishkin <[email protected]> Tested-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-25perf: Only update context time when activePeter Zijlstra1-6/+6
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-25perf: Allow perf_release() with !event->ctxPeter Zijlstra1-3/+13
In the err_file: fput(event_file) case, the event will not yet have been attached to a context. However perf_release() does assume it has been. Cure this. Tested-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-25perf: Do not double freePeter Zijlstra1-1/+6
In case of: err_file: fput(event_file), we'll end up calling perf_release() which in turn will free the event. Do not then free the event _again_. Tested-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-25perf: Close install vs. exit racePeter Zijlstra1-9/+26
Consider the following scenario: CPU0 CPU1 ctx = find_get_ctx(); perf_event_exit_task_context() mutex_lock(&ctx->mutex); perf_install_in_context(ctx, ...); /* NO-OP */ mutex_unlock(&ctx->mutex); ... perf_release() WARN_ON_ONCE(event->state != STATE_EXIT); Since the event doesn't pass through perf_remove_from_context() because perf_install_in_context() NO-OPs because the ctx is dead, and perf_event_exit_task_context() will not observe the event because its not attached yet, the event->state will not be set. Solve this by revalidating ctx->task after we acquire ctx->mutex and failing the event creation as a whole. Tested-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-24tracing: Fix showing function event in available_eventsSteven Rostedt (Red Hat)1-1/+2
The ftrace:function event is only displayed for parsing the function tracer data. It is not used to enable function tracing, and does not include an "enable" file in its event directory. Originally, this event was kept separate from other events because it did not have a ->reg parameter. But perf added a "reg" parameter for its use which caused issues, because it made the event available in contexts it was not compatible with. Commit 9b63776fa3ca9 "tracing: Do not enable function event with enable" added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function event from being enabled by normal trace events. But this commit missed keeping the function event from being displayed by the "available_events" directory, which is used to show what events can be enabled by set_event. One documented way to enable all events is to: cat available_events > set_event But because the function event is displayed in the available_events, this now causes an INVALID error: cat: write error: Invalid argument Reported-by: Chunyu Hu <[email protected]> Fixes: 9b63776fa3ca9 "tracing: Do not enable function event with enable" Cc: [email protected] # 3.4+ Signed-off-by: Steven Rostedt <[email protected]>
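The shape of the fix, as a sketch (only the flag name is taken from the text above; the iterator details are illustrative): the available_events iterator simply skips ignored events:

    /* in the available_events seq_file iterator */
    if (call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)
            continue;       /* e.g. ftrace:function -- cannot be set via set_event */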
2016-02-23devm_memremap: Fix error value when memremap failedToshi Kani1-1/+3
devm_memremap() returns an ERR_PTR() value in case of error. However, it returns NULL when memremap() failed. This causes the caller, such as the pmem driver, to proceed and oops later. Change devm_memremap() to return ERR_PTR(-ENXIO) when memremap() failed. Signed-off-by: Toshi Kani <[email protected]> Cc: Andrew Morton <[email protected]> Cc: <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Dan Williams <[email protected]>
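The corrected error path, roughly (kernel/memremap.c):

    addr = memremap(offset, size, flags);
    if (!addr) {
            devres_free(ptr);
            return ERR_PTR(-ENXIO);   /* previously fell through and returned NULL */
    }

    *ptr = addr;
    devres_add(dev, ptr);
    return addr;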
2016-02-22Merge tag 'trace-fixes-v4.5-rc5' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Two more small fixes. One is by Yang Shi who added a READ_ONCE_NOCHECK() to the scan of the stack made by the stack tracer. As the stack tracer scans the entire kernel stack, KASAN triggers seeing it as a "stack out of bounds" error. As the scan is looking at the contents of the stack from parent functions. The NOCHECK() tells KASAN that this is done on purpose, and is not some kind of stack overflow. The second fix is to the ftrace selftests, to retrieve the PID of executed commands from the shell with '$!' and not by parsing 'jobs'" * tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing, kasan: Silence Kasan warning in check_stack of stack_tracer ftracetest: Fix instance test to use proper shell command for pids
2016-02-22mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel ↵Kees Cook1-3/+1
mappings It may be useful to debug writes to the readonly sections of memory, so provide a cmdline "rodata=off" to allow for this. This can be expanded in the future to support "log" and "write" modes, but that will need to be architecture-specific. This also makes KDB software breakpoints more usable, as read-only mappings can now be disabled on any kernel. Suggested-by: H. Peter Anvin <[email protected]> Signed-off-by: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: David Brown <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Emese Revfy <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mathias Krause <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: PaX Team <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: linux-arch <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
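One way such a boot switch is typically wired up; the names below are illustrative, not necessarily what the patch uses:

    static bool rodata_enabled = true;

    static int __init set_debug_rodata(char *str)
    {
            return strtobool(str, &rodata_enabled);
    }
    __setup("rodata=", set_debug_rodata);

    static void mark_readonly(void)
    {
            if (rodata_enabled)
                    mark_rodata_ro();   /* the usual read-only kernel mappings */
            else
                    pr_info("Kernel memory protection disabled.\n");
    }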
2016-02-20Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A handful of CPU hotplug related fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Plug potential memory leak in CPU_UP_PREPARE perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug state perf/core: Remove bogus UP_CANCELED hotplug state perf/x86/amd/uncore: Plug reference leak
2016-02-20kernel/resource.c: fix muxed resource handling in __request_region()Simon Guinot1-2/+3
In __request_region, if a conflict with a BUSY and MUXED resource is detected, then the caller goes to sleep and waits for the resource to be released. A pointer to the conflicting resource is kept. At wake-up this pointer is used as a parent to retry requesting the region. A first problem is that this pointer might well be invalid (if for example the conflicting resource has already been freed). Another problem is that the next call to __request_region() fails to detect a remaining conflict. The previously conflicting resource is passed as a parameter and __request_region() will look for a conflict among the children of this resource and not at the resource itself. It is likely to succeed anyway, even if there is still a conflict. Instead, the parent of the conflicting resource should be passed to __request_region(). As a fix, this patch doesn't update the parent resource pointer in the case where we have to wait for a muxed region right after. Reported-and-tested-by: Vincent Pelletier <[email protected]> Signed-off-by: Simon Guinot <[email protected]> Tested-by: Vincent Donnefort <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2016-02-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <[email protected]>: mm: slab: free kmem_cache_node after destroy sysfs file ipc/shm: handle removed segments gracefully in shm_mmap() MAINTAINERS: update Kselftest Framework mailing list devm_memremap_release(): fix memremap'd addr handling mm/hugetlb.c: fix incorrect proc nr_hugepages value mm, x86: fix pte_page() crash in gup_pte_range() fsnotify: turn fsnotify reaper thread into a workqueue job Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread" mm: fix regression in remap_file_pages() emulation thp, dax: do not try to withdraw pgtable from non-anon VMA
2016-02-19tracing, kasan: Silence Kasan warning in check_stack of stack_tracerYang Shi1-1/+5
When enabling stack trace via "echo 1 > /proc/sys/kernel/stack_tracer_enabled", the below KASAN warning is triggered: BUG: KASAN: stack-out-of-bounds in check_stack+0x344/0x848 at addr ffffffc0689ebab8 Read of size 8 by task ksoftirqd/4/29 page:ffffffbdc3a27ac0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() page dumped because: kasan: bad access detected CPU: 4 PID: 29 Comm: ksoftirqd/4 Not tainted 4.5.0-rc1 #129 Hardware name: Freescale Layerscape 2085a RDB Board (DT) Call trace: [<ffffffc000091300>] dump_backtrace+0x0/0x3a0 [<ffffffc0000916c4>] show_stack+0x24/0x30 [<ffffffc0009bbd78>] dump_stack+0xd8/0x168 [<ffffffc000420bb0>] kasan_report_error+0x6a0/0x920 [<ffffffc000421688>] kasan_report+0x70/0xb8 [<ffffffc00041f7f0>] __asan_load8+0x60/0x78 [<ffffffc0002e05c4>] check_stack+0x344/0x848 [<ffffffc0002e0c8c>] stack_trace_call+0x1c4/0x370 [<ffffffc0002af558>] ftrace_ops_no_ops+0x2c0/0x590 [<ffffffc00009f25c>] ftrace_graph_call+0x0/0x14 [<ffffffc0000881bc>] fpsimd_thread_switch+0x24/0x1e8 [<ffffffc000089864>] __switch_to+0x34/0x218 [<ffffffc0011e089c>] __schedule+0x3ac/0x15b8 [<ffffffc0011e1f6c>] schedule+0x5c/0x178 [<ffffffc0001632a8>] smpboot_thread_fn+0x350/0x960 [<ffffffc00015b518>] kthread+0x1d8/0x2b0 [<ffffffc0000874d0>] ret_from_fork+0x10/0x40 Memory state around the buggy address: ffffffc0689eb980: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4 ffffffc0689eba00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffc0689eba80: 00 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 ^ ffffffc0689ebb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0689ebb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 The stacker tracer traverses the whole kernel stack when saving the max stack trace. It may touch the stack red zones to cause the warning. So, just disable the instrumentation to silence the warning. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-02-18Merge branch 'for-linus' of ↵Linus Torvalds2-35/+5
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fixes from Jiri Kosina: - regression (from 4.4) fix for ordering issue, introduced by an earlier ftrace change, that broke live patching of modules. The fix replaces the ftrace module notifier by direct call in order to make the ordering guaranteed and well-defined. The patch, from Jessica Yu, has been acked both by Steven and Rusty - error message fix from Miroslav Benes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: ftrace/module: remove ftrace module notifier livepatch: change the error message in asm/livepatch.h header files
2016-02-18devm_memremap_release(): fix memremap'd addr handlingToshi Kani1-1/+1
The pmem driver calls devm_memremap() to map a persistent memory range. When the pmem driver is unloaded, this memremap'd range is not released so the kernel will leak a vma. Fix devm_memremap_release() to handle a given memremap'd address properly. Signed-off-by: Toshi Kani <[email protected]> Acked-by: Dan Williams <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-02-17ftrace/module: remove ftrace module notifierJessica Yu2-35/+5
Remove the ftrace module notifier in favor of directly calling ftrace_module_enable() and ftrace_release_mod() in the module loader. Hard-coding the function calls directly in the module loader removes dependence on the module notifier call chain and provides better visibility and control over what gets called when, which is important to kernel utilities such as livepatch. This fixes a notifier ordering issue in which the ftrace module notifier (and hence ftrace_module_enable()) for coming modules was being called after klp_module_notify(), which caused livepatch modules to initialize incorrectly. This patch removes dependence on the module notifier call chain in favor of hard coding the corresponding function calls in the module loader. This ensures that ftrace and livepatch code get called in the correct order on patch module load and unload. Fixes: 5156dca34a3e ("ftrace: Fix the race between ftrace and insmod") Signed-off-by: Jessica Yu <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Acked-by: Rusty Russell <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Reviewed-by: Miroslav Benes <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2016-02-17futex: Remove requirement for lock_page() in get_futex_key()Mel Gorman1-8/+91
When dealing with key handling for shared futexes, we can drastically reduce the usage/need of the page lock. 1) For anonymous pages, the associated futex object is the mm_struct which does not require the page lock. 2) For inode based, keys, we can check under RCU read lock if the page mapping is still valid and take reference to the inode. This just leaves one rare race that requires the page lock in the slow path when examining the swapcache. Additionally realtime users currently have a problem with the page lock being contended for unbounded periods of time during futex operations. Task A get_futex_key() lock_page() ---> preempted Now any other task trying to lock that page will have to wait until task A gets scheduled back in, which is an unbound time. With this patch, we pretty much have a lockless futex_get_key(). Experiments show that this patch can boost/speedup the hashing of shared futexes with the perf futex benchmarks (which is good for measuring such change) by up to 45% when there are high (> 100) thread counts on a 60 core Westmere. Lower counts are pretty much in the noise range or less than 10%, but mid range can be seen at over 30% overall throughput (hash ops/sec). This makes anon-mem shared futexes much closer to its private counterpart. Signed-off-by: Mel Gorman <[email protected]> [ Ported on top of thp refcount rework, changelog, comments, fixes. ] Signed-off-by: Davidlohr Bueso <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Chris Mason <[email protected]> Cc: Darren Hart <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17futex: Rename barrier references in ordering guaranteesDavidlohr Bueso1-17/+17
Ingo suggested we rename how we reference barriers A and B regarding futex ordering guarantees. This patch replaces, for both barriers, MB (A) with smp_mb(); (A), such that: - We explicitly state that the barriers are SMP, and - We standardize how we reference these across futex.c helping readers follow what barrier does what and where. Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Davidlohr Bueso <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Chris Mason <[email protected]> Cc: Darren Hart <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17perf/core: Remove unused arguments from a bunch of functionsThomas Gleixner1-11/+10
No functional change, just less confusing to read. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17Merge branch 'perf/urgent' into perf/core, to queue up dependent patchIngo Molnar9-118/+250
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17perf/core: Plug potential memory leak in CPU_UP_PREPAREThomas Gleixner1-1/+1
If CPU_UP_PREPARE is called, it is not guaranteed that a previously allocated and assigned hash has been freed already, but perf_event_init_cpu() unconditionally allocates and assigns a new hash if the swhash is referenced. By overwriting the pointer the existing hash is no longer accessible. Verify that there is no hash assigned on this CPU before allocating and assigning a new one. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
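A sketch of the guard in perf_event_init_cpu(): only allocate when no hash is installed yet (swevent_hlist_deref() is assumed to be the existing accessor; details simplified):

    mutex_lock(&swhash->hlist_mutex);
    if (swhash->hlist_refcount > 0 && !swevent_hlist_deref(swhash)) {
            struct swevent_hlist *hlist;

            hlist = kzalloc_node(sizeof(*hlist), GFP_KERNEL, cpu_to_node(cpu));
            WARN_ON(!hlist);
            rcu_assign_pointer(swhash->swevent_hlist, hlist);
    }
    mutex_unlock(&swhash->hlist_mutex);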
2016-02-17perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug stateThomas Gleixner1-1/+0
If CPU_DOWN_PREPARE fails, the perf hotplug notifier is called for CPU_DOWN_FAILED and calls perf_event_init_cpu(), which checks whether the swhash is referenced. If so, it allocates a new hash and stores the pointer in the per-CPU data structure. But at this point the CPU is still online, so there must be a valid hash already. By overwriting the pointer the existing hash is no longer accessible. Remove the CPU_DOWN_FAILED state, as there is nothing to (re)allocate. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17perf/core: Remove bogus UP_CANCELED hotplug stateThomas Gleixner1-1/+0
If CPU_UP_PREPARE fails the perf hotplug code calls perf_event_exit_cpu(), which is a pointless exercise. The cpu is not online, so the smp function calls return -ENXIO. So the result is a list walk to call noops. Remove it. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17sched/deadline: Fix trivial typo in printk() messageSteven Rostedt1-1/+1
It's "too much" not "to much". Signed-off-by: Steven Rostedt <[email protected]> Acked-by: Juri Lelli <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17sched/core: Remove dead statement in __schedule()Byungchul Park1-1/+0
Remove an unnecessary assignment to a variable that is not used any more. ( This has no runtime effects as GCC is smart enough to optimize this out. ) Signed-off-by: Byungchul Park <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Edited the changelog. ] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-14Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-6/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull lockdep fix from Thomas Gleixner: "A single fix for the stack trace caching logic in lockdep, where the duplicate avoidance managed to store no back trace at all" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Fix stack trace caching logic
2016-02-13nohz: Implement wide kick on top of irq workFrederic Weisbecker1-8/+4
This simplifies the code and allows the wide kick to be performed even when IRQs are disabled, without an asynchronous level in the middle. This comes at the cost of some more overhead in the perf and POSIX CPU timer slow paths, which is probably not very important for nohz_full users. Requested-by: Peter Zijlstra <[email protected]> Reviewed-by: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Viresh Kumar <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
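Roughly, the wide kick then becomes a loop queueing per-CPU irq work (names follow kernel/time/tick-sched.c of that era, but treat the details as illustrative):

    /* kick one nohz_full CPU; irq_work_queue_on() is safe with IRQs disabled */
    void tick_nohz_full_kick_cpu(int cpu)
    {
            if (!tick_nohz_full_cpu(cpu))
                    return;
            irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
    }

    void tick_nohz_full_kick_all(void)
    {
            int cpu;

            if (!tick_nohz_full_running)
                    return;

            preempt_disable();
            for_each_cpu_and(cpu, tick_nohz_full_mask, cpu_online_mask)
                    tick_nohz_full_kick_cpu(cpu);
            preempt_enable();
    }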