path: root/kernel
Age | Commit message | Author | Files | Lines
2013-07-31 | printk: move braille console support into separate braille.[ch] files | Joe Perches | 4 | -31/+110
Create files with prototypes and static inlines for braille support. Make braille_console functions return 1 on success. Correct the CONFIG_A11Y_BRAILLE_CONSOLE=n _braille_console_setup() return value to NULL. Signed-off-by: Joe Perches <[email protected]> Reviewed-by: Samuel Thibault <[email protected]> Cc: Ming Lei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-31 | printk: add console_cmdline.h | Joe Perches | 2 | -9/+17
Add an include file for the console_cmdline struct so that the braille console driver can be separated. Signed-off-by: Joe Perches <[email protected]> Cc: Samuel Thibault <[email protected]> Cc: Ming Lei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-31 | printk: move to separate directory for easier modification | Joe Perches | 3 | -1/+3
Make it easier to break up printk into bite-sized chunks. Remove printk path/filename from comment. Signed-off-by: Joe Perches <[email protected]> Cc: Samuel Thibault <[email protected]> Cc: Ming Lei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-31 | mm: sched: numa: fix NUMA balancing when !SCHED_DEBUG | Dave Kleikamp | 1 | -2/+2
Commit 3105b86a9fee ("mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG") defined numabalancing_enabled to control the enabling and disabling of automatic NUMA balancing, but it is never used. I believe the intention was to use this in place of sched_feat_numa(NUMA). Currently, if SCHED_DEBUG is not defined, sched_feat_numa(NUMA) will never be changed from the initial "false". Signed-off-by: Dave Kleikamp <[email protected]> Acked-by: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-31 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds | 1 | -1/+5
Pull networking fixes from David Miller:

 1) Fix association failures not triggering a connect-failure event in cfg80211, from Johannes Berg.
 2) Eliminate a potential NULL deref with older iptables tools when configuring xt_socket rules, from Eric Dumazet.
 3) Missing RTNL locking in wireless regulatory code, from Johannes Berg.
 4) Fix OOPS caused by firmware loading races in ath9k_htc, from Alexey Khoroshilov.
 5) Fix usb URB leak in usb_8dev CAN driver, also from Alexey Khoroshilov.
 6) VXLAN namespace teardown fails to unregister devices, from Stephen Hemminger.
 7) Fix multicast settings getting dropped by firmware in qlcnic driver, from Sucheta Chakraborty.
 8) Add sysctl range enforcement for tcp_syn_retries, from Michal Tesar.
 9) Fix a nasty bug in bridging where an active timer would get reinitialized with a setup_timer() call. From Eric Dumazet.
 10) Fix use after free in new mlx5 driver, from Dan Carpenter.
 11) Fix freed pointer reference in ipv6 multicast routing on namespace cleanup, from Hannes Frederic Sowa.
 12) Some usbnet drivers report TSO and SG in their feature set, but the usbnet layer doesn't really support them. From Eric Dumazet.
 13) Fix crash on EEH errors in tg3 driver, from Gavin Shan.
 14) Drop cb_lock when requesting modules in genetlink, from Stanislaw Gruszka.
 15) Kernel stack leaks in cbq scheduler and af_key pfkey messages, from Dan Carpenter.
 16) FEC driver erroneously signals NETDEV_TX_BUSY on transmit leading to endless loops, from Uwe Kleine-König.
 17) Fix hangs from loading mvneta driver, from Arnaud Patard.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
  mlx5: fix error return code in mlx5_alloc_uuars()
  mvneta: Try to fix mvneta when compiled as module
  mvneta: Fix hang when loading the mvneta driver
  atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
  genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACE
  af_key: more info leaks in pfkey messages
  net/fec: Don't let ndo_start_xmit return NETDEV_TX_BUSY without link
  net_sched: Fix stack info leak in cbq_dump_wrr().
  igb: fix vlan filtering in promisc mode when not in VT mode
  ixgbe: Fix Tx Hang issue with lldpad on 82598EB
  genetlink: release cb_lock before requesting additional module
  net: fec: workaround stop tx during errata ERR006358
  qlcnic: Fix diagnostic interrupt test for 83xx adapters.
  qlcnic: Fix setting Guest VLAN
  qlcnic: Fix operation type and command type.
  qlcnic: Fix initialization of work function.
  Revert "atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring"
  atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
  net/tg3: Fix warning from pci_disable_device()
  net/tg3: Fix kernel crash
  ...
2013-07-31 | tracing: Add comment to describe special break case in probe_remove_event_call() | Steven Rostedt (Red Hat) | 1 | -0/+6
The "break" used in the do_for_each_event_file() is used as an optimization as the loop is really a double loop. The loop searches all event files for each trace_array. There's only one matching event file per trace_array and after we find the event file for the trace_array, the break is used to jump to the next trace_array and start the search there. As this is not a standard way of using "break" in C code, it requires a comment right before the break to let people know what is going on. Signed-off-by: Steven Rostedt <[email protected]>
2013-07-31 | tracing: trace_remove_event_call() should fail if call/file is in use | Oleg Nesterov | 1 | -2/+33
Change trace_remove_event_call(call) to return the error if this call is active. This is what the callers assume but can't verify outside of the tracing locks. Both trace_kprobe.c/trace_uprobe.c need the additional changes, unregister_trace_probe() should abort if trace_remove_event_call() fails. The caller is going to free this call/file so we must ensure that nobody can use them after trace_remove_event_call() succeeds. debugfs should be fine after the previous changes and event_remove() does TRACE_REG_UNREGISTER, but still there are 2 reasons why we need the additional checks: - There could be a perf_event(s) attached to this tp_event, so the patch checks ->perf_refcount. - TRACE_REG_UNREGISTER can be suppressed by FTRACE_EVENT_FL_SOFT_MODE, so we simply check FTRACE_EVENT_FL_ENABLED protected by event_mutex. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
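For readers following the two tracing entries above, a rough sketch of the in-use check this commit describes, including the double-loop "break" the comment entry documents; the helper names and flags are taken from the changelog text and may not match the final patch exactly:

    /* Sketch only: refuse removal while the call is still in use. */
    static int probe_remove_event_call(struct ftrace_event_call *call)
    {
        struct trace_array *tr;
        struct ftrace_event_file *file;

        if (call->perf_refcount)                /* perf event(s) still attached */
            return -EBUSY;

        do_for_each_event_file(tr, file) {
            if (file->event_call != call)
                continue;
            /* SOFT_MODE can suppress TRACE_REG_UNREGISTER, so also refuse
             * while the file is still ENABLED (checked under event_mutex). */
            if (file->flags & FTRACE_EVENT_FL_ENABLED)
                return -EBUSY;
            /* only one matching file per trace_array: move on to the next tr */
            break;
        } while_for_each_event_file();

        __trace_remove_event_call(call);
        return 0;
    }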
2013-07-31 | cgroup: convert cgroup_ida to cgroup_idr | Li Zefan | 1 | -6/+27
This enables us to lookup a cgroup by its id. v4: - add a comment for idr_remove() in cgroup_offline_fn(). v3: - on success, idr_alloc() returns the id but not 0, so fix the BUG_ON() in cgroup_init(). - pass the right value to idr_alloc() so that the id for dummy cgroup is 0. Signed-off-by: Li Zefan <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
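A minimal sketch of what the idr conversion buys, assuming an idr named cgroup_idr hangs off the root; the field names, label, and exact call sites are illustrative:

    /* allocation: idr_alloc() returns the new id (>= 0) on success, not 0;
     * starting at 0 lets the dummy root cgroup keep id 0 */
    cgrp->id = idr_alloc(&root->cgroup_idr, cgrp, 0, 0, GFP_KERNEL);
    if (cgrp->id < 0)
        goto err_free_cgrp;                     /* label illustrative */

    /* elsewhere: lookup by id, which a plain ida could not provide */
    cgrp = idr_find(&root->cgroup_idr, id);

    /* teardown, e.g. from cgroup_offline_fn() */
    idr_remove(&root->cgroup_idr, cgrp->id);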
2013-07-31 | cgroup: more naming cleanups | Li Zefan | 2 | -21/+21
Consistently use @cset for css_set variables and @cgrp for cgroup variables. Signed-off-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2013-07-31 | cgroup: remove struct cgroup_seqfile_state | Li Zefan | 1 | -32/+13
We can use struct cfent instead. v2: - remove cgroup_seqfile_release(). Signed-off-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2013-07-31 | cgroup: remove sparse tags from offline_css() | Li Zefan | 1 | -3/+2
This should have been removed in commit d7eeac1913ff ("cgroup: hold cgroup_mutex before calling css_offline"). While at it, update the comments. Signed-off-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2013-07-31 | cgroup: fix a leak when percpu_ref_init() fails | Li Zefan | 1 | -1/+3
ss->css_free() is not called when percpu_ref_init() fails. Signed-off-by: Li Zefan <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
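The fix is essentially one extra call on the error path; a sketch (the surrounding label and the css_free() argument follow the cgroup code of that era and are shown only for illustration):

    css = ss->css_alloc(cgrp);
    if (IS_ERR(css)) {
        err = PTR_ERR(css);
        goto err_free_all;
    }
    err = percpu_ref_init(&css->refcnt, css_release);
    if (err) {
        ss->css_free(cgrp);     /* previously missing: the css leaked here */
        goto err_free_all;
    }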
2013-07-30 | ftrace: Check module functions being traced on reload | Steven Rostedt (Red Hat) | 1 | -9/+62
There's been a nasty bug that would show up and not give much info. The bug displayed the following warning:

 WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230()
 Pid: 20903, comm: bash Tainted: G O 3.6.11+ #38405.trunk
 Call Trace:
  [<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230
  [<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0
  [<ffffffff811401cc>] ? kfree+0x2c/0x110
  [<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150
  [<ffffffff81149f1e>] __fput+0xae/0x220
  [<ffffffff8114a09e>] ____fput+0xe/0x10
  [<ffffffff8105fa22>] task_work_run+0x72/0x90
  [<ffffffff810028ec>] do_notify_resume+0x6c/0xc0
  [<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
  [<ffffffff815c0f88>] int_signal+0x12/0x17
 ---[ end trace 793179526ee09b2c ]---

It was finally narrowed down to unloading a module that was being traced. It was actually more than that. When functions are being traced, there's a table of all functions that have a ref count of the number of active tracers attached to that function. When a function trace callback is registered to a function, the function's record ref count is incremented. When it is unregistered, the function's record ref count is decremented. If an inconsistency is detected (ref count goes below zero) the above warning is shown and the function tracing is permanently disabled until reboot. The ftrace callback ops holds a hash of functions that it filters on (and/or filters off). If the hash is empty, the default means to filter all functions (for the filter_hash) or to disable no functions (for the notrace_hash). When a module is unloaded, it frees the function records that represent the module functions. These records exist on their own pages, that is, function records for one module will not exist on the same page as function records for other modules or even the core kernel. Now when a module unloads, the records that represent its functions are freed. When the module is loaded again, the records are recreated with a default ref count of zero (unless there's a callback that traces all functions, then they will also be traced, and the ref count will be incremented). The problem is that if an ftrace callback hash includes functions of the module being unloaded, those hash entries will not be removed. If the module is reloaded in the same location, the hash entries still point to the functions of the module but the module's ref counts do not reflect that. With the help of Steve and Joern, we found a reproducer, using the uinput module and the uinput_release function:

 cd /sys/kernel/debug/tracing
 modprobe uinput
 echo uinput_release > set_ftrace_filter
 echo function > current_tracer
 rmmod uinput
 modprobe uinput
 # check /proc/modules to see if loaded in same addr, otherwise try again
 echo nop > current_tracer
 [BOOM]

The above loads the uinput module, which creates a table of functions that can be traced within the module. We add uinput_release to the filter_hash to trace just that function. Enable function tracing, which increments the ref count of the record associated with uinput_release. Remove uinput, which frees the records including the one that represents uinput_release. Load the uinput module again (and make sure it's at the same address). This recreates the function records all with a ref count of zero, including uinput_release. Disable function tracing, which will decrement the ref count for uinput_release which is now zero because of the module removal and reload, and we have a mismatch (below zero ref count). The solution is to check all currently tracing ftrace callbacks to see if any are tracing any of the module's functions when a module is loaded (it already does that with callbacks that trace all functions). If a callback happens to have a module function being traced, it increments that record's ref count and starts tracing that function. There may be a strange side effect with this, where tracing module functions on unload and then reloading a new module may have that new module's functions being traced. This may be something that confuses the user, but it's not a big deal. Another approach is to disable all callback hashes on module unload, but this leaves some ftrace callbacks that may not be registered, but can still have hashes tracing the module's functions where ftrace doesn't know about it. That situation can cause the same bug. This solution solves that case too. Another benefit of this solution is that it is possible to trace a module's function on unload and load. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Jörn Engel <[email protected]> Reported-by: Dave Jones <[email protected]> Reported-by: Steve Hodgson <[email protected]> Tested-by: Steve Hodgson <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
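A heavily simplified sketch of the reload-side fix described above; the iterator and the list walk are hypothetical, only the idea (re-apply every registered ops' hash to the module's freshly recreated records) follows the changelog:

    /* hypothetical shape of a module-load hook */
    void ftrace_module_check_traced(struct module *mod)
    {
        struct dyn_ftrace *rec;
        struct ftrace_ops *ops;

        for_each_module_ftrace_rec(mod, rec) {            /* hypothetical iterator */
            for (ops = ftrace_ops_list; ops; ops = ops->next) {
                if (ftrace_ops_test(ops, rec->ip))        /* ops' hash names this ip */
                    rec->flags++;                         /* restore the refcount lost on
                                                             unload and trace it again */
            }
        }
    }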
2013-07-30 | watchdog: Make it work under full dynticks | Frederic Weisbecker | 1 | -8/+0
A perf event can be used without forcing the tick to stay alive if it doesn't use a frequency but a sample period and if it doesn't throttle (raise a storm of events). Since the lockup detector neither uses a perf event frequency nor should ever throttle due to its high period, it can now run concurrently with the full dynticks feature. So remove the hack that disabled the watchdog. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Don Zickus <[email protected]> Cc: Srivatsa S. Bhat <[email protected]> Cc: Anish Singh <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-07-30 | perf: Implement finer grained full dynticks kick | Frederic Weisbecker | 1 | -8/+9
Currently the full dynticks subsystem keeps the tick alive as long as there are perf events running. This prevents the tick from being stopped as long as features such as the lockup detector are running. As a temporary fix, the lockup detector is disabled by default when full dynticks is built, but this is not a long term viable solution. To fix this, only keep the tick alive when an event configured with a frequency rather than a period is running on the CPU, or when an event throttles on the CPU. These are the only purposes of the perf tick, especially now that the rotation of flexible events is handled from a separate hrtimer. The tick can be shut down the rest of the time. Original-patch-by: Peter Zijlstra <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-07-30 | perf: Account freq events per cpu | Frederic Weisbecker | 1 | -0/+7
This is going to be used by the full dynticks subsystem as a finer-grained information to know when to keep and when to stop the tick. Original-patch-by: Peter Zijlstra <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
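A sketch of the per-CPU bookkeeping this entry adds and of how the dynticks kick described in the neighbouring entry can consume it; the counter name is illustrative, and the real accounting also has to consider throttling:

    static DEFINE_PER_CPU(atomic_t, perf_freq_events);       /* illustrative name */

    static void account_event_cpu(struct perf_event *event, int cpu)
    {
        if (event->attr.freq)                 /* frequency-based sampling needs the tick */
            atomic_inc(&per_cpu(perf_freq_events, cpu));
    }

    /* consumed by the full dynticks code: the tick may stop only when no
     * frequency-based event is active on this CPU (throttling aside) */
    bool perf_event_can_stop_tick(void)
    {
        return atomic_read(this_cpu_ptr(&perf_freq_events)) == 0;
    }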
2013-07-30 | perf: Migrate per cpu event accounting | Frederic Weisbecker | 1 | -0/+2
When an event is migrated, move the event per-cpu accounting accordingly so that branch stack and cgroup events work correctly on the new CPU. Original-patch-by: Peter Zijlstra <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-07-30 | perf: Split the per-cpu accounting part of the event accounting code | Frederic Weisbecker | 1 | -32/+55
This way we can use the per-cpu handling separately. This is going to be used to fix the event migration code accounting. Original-patch-by: Peter Zijlstra <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-07-30 | perf: Factor out event accounting code to account_event()/__free_event() | Frederic Weisbecker | 1 | -32/+47
Gather all the event accounting code to a single place, once all the prerequisites are completed. This simplifies the refcounting. Original-patch-by: Peter Zijlstra <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-07-30 | perf: Sanitize get_callchain_buffer() | Frederic Weisbecker | 2 | -20/+23
In case of allocation failure, get_callchain_buffer() keeps the refcount incremented for the current event. As a result, when get_callchain_buffers() returns an error, we must clean up what it did by cancelling its last refcount with a call to put_callchain_buffers(). This is a hack in order to be able to call free_event() after that failure. The original purpose of that was to simplify the failure path. But this error handling is actually counter intuitive, ugly and not very easy to follow, because one expects the resources used to perform a service to be cleaned up by the callee in case of failure, not by the caller. So let's clean this up by cancelling the refcount from get_callchain_buffer() in case of failure, and correctly free the event accordingly in perf_event_alloc(). Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
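A sketch of the callee-side cleanup the changelog argues for; the names follow kernel/events/callchain.c of that era, but treat the details as an approximation rather than the exact patch:

    int get_callchain_buffers(void)
    {
        int count, err = 0;

        mutex_lock(&callchain_mutex);

        count = atomic_inc_return(&nr_callchain_events);
        if (WARN_ON_ONCE(count < 1)) {
            err = -EINVAL;
            goto exit;
        }
        if (count > 1)                          /* buffers already allocated */
            goto exit;

        err = alloc_callchain_buffers();
    exit:
        if (err)
            atomic_dec(&nr_callchain_events);   /* undo our own ref on failure,
                                                   instead of leaving it to callers */
        mutex_unlock(&callchain_mutex);
        return err;
    }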
2013-07-30 | perf: Fix branch stack refcount leak on callchain init failure | Frederic Weisbecker | 1 | -6/+6
On callchain buffers allocation failure, free_event() is called and all the accounting performed in perf_event_alloc() for that event is cancelled. But if the event has branch stack sampling, it is unaccounted as well from the branch stack sampling events refcounts. This is a bug because this accounting is performed after the callchain buffer allocation. As a result, the branch stack sampling events refcount can become negative. To fix this, move the branch stack event accounting before the callchain buffer allocation. Reported-by: Peter Zijlstra <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-07-30 | generic-ipi: Kill unnecessary variable - csd_flags | Xie XiuQi | 1 | -13/+1
After commit 8969a5ede0f9e17da4b943712429aef2c9bcd82b ("generic-ipi: remove kmalloc()"), wait = 0 can be guaranteed, and all callsites of generic_exec_single() do an unconditional csd_lock() now. So csd_flags is unnecessary now. Remove it. Signed-off-by: Xie XiuQi <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Jens Axboe <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Rusty Russell <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-07-30 | mutex: Fix w/w mutex deadlock injection | Maarten Lankhorst | 1 | -2/+2
The check needs to be for > 1, because ctx->acquired is already incremented. This will prevent ww_mutex_lock_slow from returning -EDEADLK and not locking the mutex. It caused a lot of false gpu lockups on radeon with CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y because a function that shouldn't be able to return -EDEADLK did. Signed-off-by: Maarten Lankhorst <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
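The off-by-one in sketch form (the injection helper name follows the mutex debug code of that era; this is shown only to illustrate why the comparison must be against 1):

    #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
        /* ctx->acquired has already been incremented for the mutex currently
         * being acquired, so deadlock injection only makes sense once a second
         * w/w mutex is held: compare against 1, not 0. */
        if (!ret && ctx->acquired > 1)
            return ww_mutex_deadlock_injection(lock, ctx);
    #endif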
2013-07-30 | sched: Ensure update_cfs_shares() is called for parents of continuously-running tasks | Peter Zijlstra | 1 | -0/+1
We typically update a task_group's shares within the dequeue/enqueue path. However, continuously running tasks sharing a CPU are not subject to these updates as they are only put/picked. Unfortunately, when we reverted f269ae046 (in 17bc14b7), we lost the augmenting periodic update that was supposed to account for this; resulting in a potential loss of fairness. To fix this, re-introduce the explicit update in update_cfs_rq_blocked_load() [called via entity_tick()]. Reported-by: Max Hailperin <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Reviewed-by: Paul Turner <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Cc: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2013-07-30 | aio: convert the ioctx list to table lookup v3 | Benjamin LaHaise | 1 | -1/+1
On Wed, Jun 12, 2013 at 11:14:40AM -0700, Kent Overstreet wrote: > On Mon, Apr 15, 2013 at 02:40:55PM +0300, Octavian Purdila wrote: > > When using a large number of threads performing AIO operations the > > IOCTX list may get a significant number of entries which will cause > > significant overhead. For example, when running this fio script: > > > > rw=randrw; size=256k ;directory=/mnt/fio; ioengine=libaio; iodepth=1 > > blocksize=1024; numjobs=512; thread; loops=100 > > > > on an EXT2 filesystem mounted on top of a ramdisk we can observe up to > > 30% CPU time spent by lookup_ioctx: > > > > 32.51% [guest.kernel] [g] lookup_ioctx > > 9.19% [guest.kernel] [g] __lock_acquire.isra.28 > > 4.40% [guest.kernel] [g] lock_release > > 4.19% [guest.kernel] [g] sched_clock_local > > 3.86% [guest.kernel] [g] local_clock > > 3.68% [guest.kernel] [g] native_sched_clock > > 3.08% [guest.kernel] [g] sched_clock_cpu > > 2.64% [guest.kernel] [g] lock_release_holdtime.part.11 > > 2.60% [guest.kernel] [g] memcpy > > 2.33% [guest.kernel] [g] lock_acquired > > 2.25% [guest.kernel] [g] lock_acquire > > 1.84% [guest.kernel] [g] do_io_submit > > > > This patchs converts the ioctx list to a radix tree. For a performance > > comparison the above FIO script was run on a 2 sockets 8 core > > machine. This are the results (average and %rsd of 10 runs) for the > > original list based implementation and for the radix tree based > > implementation: > > > > cores 1 2 4 8 16 32 > > list 109376 ms 69119 ms 35682 ms 22671 ms 19724 ms 16408 ms > > %rsd 0.69% 1.15% 1.17% 1.21% 1.71% 1.43% > > radix 73651 ms 41748 ms 23028 ms 16766 ms 15232 ms 13787 ms > > %rsd 1.19% 0.98% 0.69% 1.13% 0.72% 0.75% > > % of radix > > relative 66.12% 65.59% 66.63% 72.31% 77.26% 83.66% > > to list > > > > To consider the impact of the patch on the typical case of having > > only one ctx per process the following FIO script was run: > > > > rw=randrw; size=100m ;directory=/mnt/fio; ioengine=libaio; iodepth=1 > > blocksize=1024; numjobs=1; thread; loops=100 > > > > on the same system and the results are the following: > > > > list 58892 ms > > %rsd 0.91% > > radix 59404 ms > > %rsd 0.81% > > % of radix > > relative 100.87% > > to list > > So, I was just doing some benchmarking/profiling to get ready to send > out the aio patches I've got for 3.11 - and it looks like your patch is > causing a ~1.5% throughput regression in my testing :/ ... <snip> I've got an alternate approach for fixing this wart in lookup_ioctx()... Instead of using an rbtree, just use the reserved id in the ring buffer header to index an array pointing the ioctx. It's not finished yet, and it needs to be tidied up, but is most of the way there. -ben -- "Thought is the essence of where you are now." -- kmo> And, a rework of Ben's code, but this was entirely his idea kmo> -Kent bcrl> And fix the code to use the right mm_struct in kill_ioctx(), actually free memory. Signed-off-by: Benjamin LaHaise <[email protected]>
2013-07-30 | freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes | Colin Cross | 2 | -1/+12
Calling freeze_processes sets a global flag that will cause any process that calls try_to_freeze to enter the refrigerator. It skips sending a signal to the current task, but if the current task ever hits try_to_freeze, all threads will be frozen and the system will deadlock. Set a new flag, PF_SUSPEND_TASK, on the task that calls freeze_processes. The flag notifies the freezer that the thread is involved in suspend and should not be frozen. Also add a WARN_ON in thaw_processes if the caller does not have the PF_SUSPEND_TASK flag set to catch if a different task calls thaw_processes than the one that called freeze_processes, leaving a task with PF_SUSPEND_TASK permanently set on it. Threads that spawn off a task with PF_SUSPEND_TASK set (which swsusp does) will also have PF_SUSPEND_TASK set, preventing them from freezing while they are helping with suspend, but they need to be dead by the time suspend is triggered, otherwise they may run when userspace is expected to be frozen. Add a WARN_ON in thaw_processes if more than one thread has the PF_SUSPEND_TASK flag set. Reported-and-tested-by: Michael Leun <[email protected]> Signed-off-by: Colin Cross <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
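In sketch form, the flag handling amounts to a couple of lines at either end of the freeze/thaw path (placement here is approximate):

    /* in freeze_processes(): mark the suspending task so the freezer skips it */
    current->flags |= PF_SUSPEND_TASK;

    /* in thaw_processes(): only the task that froze the system should thaw it,
     * and only one task should ever carry the flag */
    WARN_ON(!(current->flags & PF_SUSPEND_TASK));
    current->flags &= ~PF_SUSPEND_TASK;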
2013-07-29 | ftrace: Consolidate some duplicate code for updating ftrace ops | Steven Rostedt (Red Hat) | 1 | -6/+10
When ftrace ops modifies the functions that it will trace, the update to the function mcount callers may need to be modified. Consolidate the two places that do the checks to see if an update is required with a wrapper function for those checks. Signed-off-by: Steven Rostedt <[email protected]>
2013-07-29 | tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private | Oleg Nesterov | 1 | -32/+15
Change remove_event_file_dir() to clear ->i_private for every file we are going to remove. We need to check file->dir != NULL because event_create_dir() can fail. debugfs_remove_recursive(NULL) is fine but the patch moves it under the same check anyway for readability. spin_lock(d_lock) and the "d_inode != NULL" check are not needed afaics, but I do not understand this code well enough. tracing_open_generic_file() and tracing_release_generic_file() can go away; ftrace_enable_fops and ftrace_event_filter_fops() use tracing_open_generic() but only to check tracing_disabled. This fixes all races with event_remove() or instance_delete(). f_op->read/write/whatever can never use the freed file/call, all event/* files were changed to check and use ->i_private under event_mutex. Note: this does not fix other problems, event_remove() can destroy the active ftrace_event_call; we need more changes but those changes are completely orthogonal. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
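A sketch of the removal side described here, close to the shape the changelog implies; the d_subdirs walk and locking follow the VFS of that era and may differ from the final patch:

    static void remove_event_file_dir(struct ftrace_event_file *file)
    {
        struct dentry *dir = file->dir;
        struct dentry *child;

        if (dir) {
            spin_lock(&dir->d_lock);            /* possibly unneeded, see above */
            list_for_each_entry(child, &dir->d_subdirs, d_u.d_child) {
                if (child->d_inode)             /* can be NULL if already removed */
                    child->d_inode->i_private = NULL;
            }
            spin_unlock(&dir->d_lock);

            debugfs_remove_recursive(dir);
        }

        list_del(&file->list);
        remove_subsystem(file->system);
        kmem_cache_free(file_cachep, file);
    }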
2013-07-29 | tracing: Introduce remove_event_file_dir() | Oleg Nesterov | 1 | -24/+23
Preparation for the next patch. Extract the common code from remove_event_from_tracers() and __trace_remove_event_dirs() into the new helper, remove_event_file_dir(). The patch looks more complicated than it actually is, it also moves remove_subsystem() up to avoid the forward declaration. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2013-07-29 | tracing: Change f_start() to take event_mutex and verify i_private != NULL | Oleg Nesterov | 1 | -4/+9
trace_format_open() and trace_format_seq_ops are racy, nothing protects ftrace_event_call from trace_remove_event_call(). Change f_start() to take event_mutex and verify i_private != NULL, change f_stop() to drop this lock. This fixes nothing, but now we can change debugfs_remove("format") callers to nullify ->i_private and fix the problem. Note: the usage of event_mutex is sub-optimal but simple, we can change this later. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2013-07-29 | tracing: Change event_filter_read/write to verify i_private != NULL | Oleg Nesterov | 2 | -18/+25
event_filter_read/write() are racy, ftrace_event_call can be already freed by trace_remove_event_call() callers. 1. Shift mutex_lock(event_mutex) from print/apply_event_filter to the callers. 2. Change the callers, event_filter_read() and event_filter_write(), to read i_private under this mutex and abort if it is NULL. This fixes nothing, but now we can change debugfs_remove("filter") callers to nullify ->i_private and fix the problem. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2013-07-29 | tracing: Change event_enable/disable_read() to verify i_private != NULL | Oleg Nesterov | 1 | -10/+20
tracing_open_generic_file() is racy, ftrace_event_file can be already freed by rmdir or trace_remove_event_call(). Change event_enable_read() and event_disable_read() to read and verify "file = i_private" under event_mutex. This fixes nothing, but now we can change debugfs_remove("enable") callers to nullify ->i_private and fix the problem. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
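The three tracing entries directly above share one reader-side pattern; a sketch, with the inner read helper purely illustrative:

    static ssize_t event_enable_read(struct file *filp, char __user *ubuf,
                                     size_t cnt, loff_t *ppos)
    {
        struct ftrace_event_file *file;
        ssize_t ret = -ENODEV;

        mutex_lock(&event_mutex);
        file = file_inode(filp)->i_private;     /* nullified by removal, under event_mutex */
        if (file)
            ret = do_enable_read(file, ubuf, cnt, ppos);    /* illustrative helper */
        mutex_unlock(&event_mutex);

        return ret;
    }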
2013-07-29 | tracing: Turn event/id->i_private into call->event.type | Oleg Nesterov | 1 | -5/+13
event_id_read() is racy, ftrace_event_call can be already freed by trace_remove_event_call() callers. Change event_create_dir() to pass "data = call->event.type", this is all event_id_read() needs. ftrace_event_id_fops no longer needs tracing_open_generic(). We add the new helper, event_file_data(), to read ->i_private, it will have more users. Note: currently ACCESS_ONCE() and "id != 0" check are not needed, but we are going to change event_remove/rmdir to clear ->i_private. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
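The helper mentioned above is small enough to show in full; a sketch consistent with the changelog:

    /* read ->i_private; NULL means the event is being (or has been) removed */
    static void *event_file_data(struct file *filp)
    {
        return ACCESS_ONCE(file_inode(filp)->i_private);
    }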
2013-07-29 | rcu: Have the RCU tracepoints use the tracepoint_string infrastructure | Steven Rostedt (Red Hat) | 2 | -50/+69
Currently, RCU tracepoints save only a pointer to strings in the ring buffer. When displayed via the /sys/kernel/debug/tracing/trace file they are referenced like the printf "%s" that looks at the address in the ring buffer and prints out the string it points to. This requires that the strings are constant and persistent in the kernel. The problem with this is for tools like trace-cmd and perf that read the binary data from the buffers but have no access to the kernel memory to find out what string is represented by the address in the buffer. By using the tracepoint_string infrastructure, the RCU tracepoint strings can be exported such that userspace tools can map the addresses to the strings.

 # cat /sys/kernel/debug/tracing/printk_formats
 0xffffffff81a4a0e8 : "rcu_preempt"
 0xffffffff81a4a0f4 : "rcu_bh"
 0xffffffff81a4a100 : "rcu_sched"
 0xffffffff818437a0 : "cpuqs"
 0xffffffff818437a6 : "rcu_sched"
 0xffffffff818437a0 : "cpuqs"
 0xffffffff818437b0 : "rcu_bh"
 0xffffffff818437b7 : "Start context switch"
 0xffffffff818437cc : "End context switch"
 0xffffffff818437a0 : "cpuqs"
 [...]

Now userspace tools can display:

 rcu_utilization: Start context switch
 rcu_dyntick: Start 1 0
 rcu_utilization: End context switch
 rcu_batch_start: rcu_preempt CBs=0/5 bl=10
 rcu_dyntick: End 0 140000000000000
 rcu_invoke_callback: rcu_preempt rhp=0xffff880071c0d600 func=proc_i_callback
 rcu_invoke_callback: rcu_preempt rhp=0xffff880077b5b230 func=__d_free
 rcu_dyntick: Start 140000000000000 0
 rcu_invoke_callback: rcu_preempt rhp=0xffff880077563980 func=file_free_rcu
 rcu_batch_end: rcu_preempt CBs-invoked=3 idle=>c<>c<>c<>c<
 rcu_utilization: End RCU core
 rcu_grace_period: rcu_preempt 9741 start
 rcu_dyntick: Start 1 0
 rcu_dyntick: End 0 140000000000000
 rcu_dyntick: Start 140000000000000 0

Instead of:

 rcu_utilization: ffffffff81843110
 rcu_future_grace_period: ffffffff81842f1d 9939 9939 9940 0 0 3 ffffffff81842f32
 rcu_batch_start: ffffffff81842f1d CBs=0/4 bl=10
 rcu_future_grace_period: ffffffff81842f1d 9939 9939 9940 0 0 3 ffffffff81842f3c
 rcu_grace_period: ffffffff81842f1d 9939 ffffffff81842f80
 rcu_invoke_callback: ffffffff81842f1d rhp=0xffff88007888aac0 func=file_free_rcu
 rcu_grace_period: ffffffff81842f1d 9939 ffffffff81842f95
 rcu_invoke_callback: ffffffff81842f1d rhp=0xffff88006aeb4600 func=proc_i_callback
 rcu_future_grace_period: ffffffff81842f1d 9939 9939 9940 0 0 3 ffffffff81842f32
 rcu_future_grace_period: ffffffff81842f1d 9939 9939 9940 0 0 3 ffffffff81842f3c
 rcu_invoke_callback: ffffffff81842f1d rhp=0xffff880071cb9fc0 func=__d_free
 rcu_grace_period: ffffffff81842f1d 9939 ffffffff81842f80
 rcu_invoke_callback: ffffffff81842f1d rhp=0xffff88007888ae80 func=file_free_rcu
 rcu_batch_end: ffffffff81842f1d CBs-invoked=4 idle=>c<>c<>c<>c<
 rcu_utilization: ffffffff8184311f

Signed-off-by: Steven Rostedt <[email protected]>
2013-07-29 | rcu: Simplify RCU_STATE_INITIALIZER() macro | Steven Rostedt (Red Hat) | 2 | -11/+7
The RCU_STATE_INITIALIZER() macro is used only in the rcutree.c file as well as the rcutree_plugin.h file. It is passed as an rvalue to a variable of a similar name. A per_cpu variable is also created with a similar name as well. The uses of RCU_STATE_INITIALIZER() can be simplified to remove some of the duplicated code. Currently the three users of this macro have this format:

 struct rcu_state rcu_sched_state = RCU_STATE_INITIALIZER(rcu_sched, call_rcu_sched);
 DEFINE_PER_CPU(struct rcu_data, rcu_sched_data);

Notice that "rcu_sched" is used three times. This is the same with the other two users. This can be condensed to just:

 RCU_STATE_INITIALIZER(rcu_sched, call_rcu_sched);

by moving the rest into the macro itself. This also opens the door to allow the RCU tracepoint strings and their addresses to be exported so that userspace tracing tools can translate the contents of the pointers of the RCU tracepoints. The change will allow for helper code to be placed in the RCU_STATE_INITIALIZER() macro to export the name that is used. Signed-off-by: Steven Rostedt <[email protected]>
2013-07-29 | rcu: Add const annotation to char * for RCU tracepoints and functions | Steven Rostedt (Red Hat) | 7 | -11/+11
All the RCU tracepoints and functions that reference char pointers do so with just 'char *' even though they do not modify the contents of the string itself. This will cause warnings if a const char * is used in one of these functions. The RCU tracepoints store the pointer to the string to refer back to them when the trace output is displayed. As this can be minutes, hours or even days later, those strings had better be constant. This change also opens the door to allow the RCU tracepoint strings and their addresses to be exported so that userspace tracing tools can translate the contents of the pointers of the RCU tracepoints. Signed-off-by: Steven Rostedt <[email protected]>
2013-07-29 | cpuset: relocate a misplaced comment | Zhao Hongjiang | 1 | -6/+6
Comment for cpuset_css_offline() was on top of cpuset_css_free(). Move it. Signed-off-by: Zhao Hongjiang <[email protected]> Acked-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2013-07-29 | cpuset: get rid of the useless forward declaration of cpuset | Zhao Hongjiang | 1 | -1/+0
Get rid of the useless forward declaration of struct cpuset, since the structure is defined right below it. Signed-off-by: Zhao Hongjiang <[email protected]> Acked-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2013-07-29 | Revert "cpuidle: Quickly notice prediction failure for repeat mode" | Rafael J. Wysocki | 1 | -7/+2
Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for repeat mode), because it has been identified as the source of a significant performance regression in v3.8 and later as explained by Jeremy Eder: We believe we've identified a particular commit to the cpuidle code that seems to be impacting performance of variety of workloads. The simplest way to reproduce is using netperf TCP_RR test, so we're using that, on a pair of Sandy Bridge based servers. We also have data from a large database setup where performance is also measurably/positively impacted, though that test data isn't easily share-able. Included below are test results from 3 test kernels: kernel reverts ----------------------------------------------------------- 1) vanilla upstream (no reverts) 2) perfteam2 reverts e11538d1f03914eb92af5a1a378375c05ae8520c 3) test reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4 e11538d1f03914eb92af5a1a378375c05ae8520c In summary, netperf TCP_RR numbers improve by approximately 4% after reverting 69a37beabf1f0a6705c08e879bdd5d82ff6486c4. When 69a37beabf1f0a6705c08e879bdd5d82ff6486c4 is included, C0 residency never seems to get above 40%. Taking that patch out gets C0 near 100% quite often, and performance increases. The below data are histograms representing the %c0 residency @ 1-second sample rates (using turbostat), while under netperf test. - If you look at the first 4 histograms, you can see %c0 residency almost entirely in the 30,40% bin. - The last pair, which reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4, shows %c0 in the 80,90,100% bins. Below each kernel name are netperf TCP_RR trans/s numbers for the particular kernel that can be disclosed publicly, comparing the 3 test kernels. We ran a 4th test with the vanilla kernel where we've also set /dev/cpu_dma_latency=0 to show overall impact boosting single-threaded TCP_RR performance over 11% above baseline. 
3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0): TCP_RR trans/s 54323.78 ----------------------------------------------------------- 3.10-rc2 vanilla RX (no reverts) TCP_RR trans/s 48192.47 Receiver %c0 0.0000 - 10.0000 [ 1]: * 10.0000 - 20.0000 [ 0]: 20.0000 - 30.0000 [ 0]: 30.0000 - 40.0000 [ 59]: *********************************************************** 40.0000 - 50.0000 [ 1]: * 50.0000 - 60.0000 [ 0]: 60.0000 - 70.0000 [ 0]: 70.0000 - 80.0000 [ 0]: 80.0000 - 90.0000 [ 0]: 90.0000 - 100.0000 [ 0]: Sender %c0 0.0000 - 10.0000 [ 1]: * 10.0000 - 20.0000 [ 0]: 20.0000 - 30.0000 [ 0]: 30.0000 - 40.0000 [ 11]: *********** 40.0000 - 50.0000 [ 49]: ************************************************* 50.0000 - 60.0000 [ 0]: 60.0000 - 70.0000 [ 0]: 70.0000 - 80.0000 [ 0]: 80.0000 - 90.0000 [ 0]: 90.0000 - 100.0000 [ 0]: ----------------------------------------------------------- 3.10-rc2 perfteam2 RX (reverts commit e11538d1f03914eb92af5a1a378375c05ae8520c) TCP_RR trans/s 49698.69 Receiver %c0 0.0000 - 10.0000 [ 1]: * 10.0000 - 20.0000 [ 1]: * 20.0000 - 30.0000 [ 0]: 30.0000 - 40.0000 [ 59]: *********************************************************** 40.0000 - 50.0000 [ 0]: 50.0000 - 60.0000 [ 0]: 60.0000 - 70.0000 [ 0]: 70.0000 - 80.0000 [ 0]: 80.0000 - 90.0000 [ 0]: 90.0000 - 100.0000 [ 0]: Sender %c0 0.0000 - 10.0000 [ 1]: * 10.0000 - 20.0000 [ 0]: 20.0000 - 30.0000 [ 0]: 30.0000 - 40.0000 [ 2]: ** 40.0000 - 50.0000 [ 58]: ********************************************************** 50.0000 - 60.0000 [ 0]: 60.0000 - 70.0000 [ 0]: 70.0000 - 80.0000 [ 0]: 80.0000 - 90.0000 [ 0]: 90.0000 - 100.0000 [ 0]: ----------------------------------------------------------- 3.10-rc2 test RX (reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4 and e11538d1f03914eb92af5a1a378375c05ae8520c) TCP_RR trans/s 47766.95 Receiver %c0 0.0000 - 10.0000 [ 1]: * 10.0000 - 20.0000 [ 1]: * 20.0000 - 30.0000 [ 0]: 30.0000 - 40.0000 [ 27]: *************************** 40.0000 - 50.0000 [ 2]: ** 50.0000 - 60.0000 [ 0]: 60.0000 - 70.0000 [ 2]: ** 70.0000 - 80.0000 [ 0]: 80.0000 - 90.0000 [ 0]: 90.0000 - 100.0000 [ 28]: **************************** Sender: 0.0000 - 10.0000 [ 1]: * 10.0000 - 20.0000 [ 0]: 20.0000 - 30.0000 [ 0]: 30.0000 - 40.0000 [ 11]: *********** 40.0000 - 50.0000 [ 0]: 50.0000 - 60.0000 [ 1]: * 60.0000 - 70.0000 [ 0]: 70.0000 - 80.0000 [ 3]: *** 80.0000 - 90.0000 [ 7]: ******* 90.0000 - 100.0000 [ 38]: ************************************** These results demonstrate gaining back the tendency of the CPU to stay in more responsive, performant C-states (and thus yield measurably better performance), by reverting commit 69a37beabf1f0a6705c08e879bdd5d82ff6486c4. Requested-by: Jeremy Eder <[email protected]> Tested-by: Len Brown <[email protected]> Cc: 3.8+ <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2013-07-28 | Merge tag 'trace-fixes-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace | Linus Torvalds | 3 | -128/+95
Pull tracing fixes from Steven Rostedt: "Oleg is working on fixing a very tight race between opening a event file and deleting that event at the same time (both must be done as root). I also found a bug while testing Oleg's patches which has to do with a race with kprobes using the function tracer. There's also a deadlock fix that was introduced with the previous fixes"

* tag 'trace-fixes-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Remove locking trace_types_lock from tracing_reset_all_online_cpus()
  ftrace: Add check for NULL regs if ops has SAVE_REGS set
  tracing: Kill trace_cpu struct/members
  tracing: Change tracing_fops/snapshot_fops to rely on tracing_get_cpu()
  tracing: Change tracing_entries_fops to rely on tracing_get_cpu()
  tracing: Change tracing_stats_fops to rely on tracing_get_cpu()
  tracing: Change tracing_buffers_fops to rely on tracing_get_cpu()
  tracing: Change tracing_pipe_fops() to rely on tracing_get_cpu()
  tracing: Introduce trace_create_cpu_file() and tracing_get_cpu()
2013-07-26 | sysctl: range checking in do_proc_dointvec_ms_jiffies_conv | Francesco Fusco | 1 | -1/+5
Integer sysctl values that are expressed in ms have to be represented internally as jiffies. The msecs_to_jiffies() function returns an unsigned long, which gets assigned to the integer. This patch prevents the value from being assigned if it is bigger than INT_MAX, done in a similar way as in cba9f3 ("Range checking in do_proc_dointvec_(userhz_)jiffies_conv"). Signed-off-by: Francesco Fusco <[email protected]> CC: Andrew Morton <[email protected]> CC: [email protected] Signed-off-by: David S. Miller <[email protected]>
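A sketch of the guarded conversion, mirroring the referenced userhz/jiffies variants; the read direction is left out here because it is unchanged:

    static int do_proc_dointvec_ms_jiffies_conv(bool *negp, unsigned long *lvalp,
                                                int *valp, int write, void *data)
    {
        if (write) {
            unsigned long jif = msecs_to_jiffies(*negp ? -*lvalp : *lvalp);

            if (jif > INT_MAX)
                return 1;               /* reject values that overflow the int */
            *valp = (int)jif;
        }
        /* read direction omitted: it converts jiffies back to msecs as before */
        return 0;
    }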
2013-07-26 | tracing: Add __tracepoint_string() to export string pointers | Steven Rostedt (Red Hat) | 2 | -0/+22
There are several tracepoints (mostly in RCU) that reference a string pointer and use the print format of "%s" to display the string that exists in the kernel, instead of copying the actual string to the ring buffer (saves time and ring buffer space). But this has an issue with userspace tools that read the binary buffers: they have the address of the string but no access to what the string itself is. The end result is just output that looks like:

 rcu_dyntick: ffffffff818adeaa 1 0
 rcu_dyntick: ffffffff818adeb5 0 140000000000000
 rcu_dyntick: ffffffff818adeb5 0 140000000000000
 rcu_utilization: ffffffff8184333b
 rcu_utilization: ffffffff8184333b

The above is pretty useless when read by the userspace tools. Ideally we would want something that looks like this:

 rcu_dyntick: Start 1 0
 rcu_dyntick: End 0 140000000000000
 rcu_dyntick: Start 140000000000000 0
 rcu_callback: rcu_preempt rhp=0xffff880037aff710 func=put_cred_rcu 0/4
 rcu_callback: rcu_preempt rhp=0xffff880078961980 func=file_free_rcu 0/5
 rcu_dyntick: End 0 1

trace_printk(), which also stores only the address of the string format instead of recording the string into the buffer itself, exports the mapping of kernel addresses to format strings via the printk_format file in the debugfs tracing directory. The tracepoint strings can use this same method and output the format to the same file, and the userspace tools will be able to decipher the address without any modification. The tracepoint strings need their own section to save the strings because the trace_printk section will cause the trace_printk() buffers to be allocated if anything exists within the section. trace_printk() is only used for debugging and should never exist in the kernel, so we can not use the trace_printk sections. Add a new tracepoint_str section that will also be examined by the output of the printk_format file. Cc: Paul E. McKenney <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
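A usage sketch, assuming the new infrastructure provides a tracepoint_string() helper that places the literal in the new section and exposes its address via printk_formats (this is how the RCU entries above appear to adopt it):

    /* wrap once per file for brevity */
    #define TPS(x) tracepoint_string(x)

    /* the ring buffer records only the pointer; userspace tools resolve it
     * through /sys/kernel/debug/tracing/printk_formats */
    trace_rcu_utilization(TPS("Start context switch"));
    trace_rcu_utilization(TPS("End context switch"));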
2013-07-26 | tracing: Remove locking trace_types_lock from tracing_reset_all_online_cpus() | Steven Rostedt (Red Hat) | 1 | -2/+1
Commit a82274151af "tracing: Protect ftrace_trace_arrays list in trace_events.c" added taking the trace_types_lock mutex in trace_events.c as there were several locations that needed it for protection. Unfortunately, it also encapsulated a call to tracing_reset_all_online_cpus() which also takes the trace_types_lock, causing a deadlock. This happens when a module has tracepoints and has been traced. When the module is removed, the trace events module notifier will grab the trace_types_lock, do a bunch of clean ups, and also clear the buffer by calling tracing_reset_all_online_cpus. This doesn't happen often, which explains why it wasn't caught right away. Commit a82274151af was marked for stable, which means this must be sent to stable too. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Arend van Spriel <[email protected]> Tested-by: Arend van Spriel <[email protected]> Cc: Alexander Z Lam <[email protected]> Cc: Vaibhav Nagarnaik <[email protected]> Cc: David Sharp <[email protected]> Cc: [email protected] # 3.10 Signed-off-by: Steven Rostedt <[email protected]>
2013-07-26 | PM / Sleep: increase ftrace coverage in suspend/resume | Brandt, Todd E | 1 | -2/+2
Change where ftrace is disabled and re-enabled during system suspend/resume to allow tracing of device driver pm callbacks. Ftrace will now be turned off when suspend reaches disable_nonboot_cpus() instead of at the very beginning of system suspend. Ftrace was disabled during suspend/resume back in 2008 by Steven Rostedt as he discovered there was a conflict in the enable_nonboot_cpus() call (see commit f42ac38 "ftrace: disable tracing for suspend to ram"). This change preserves his fix by disabling ftrace, but only at the function where it is known to cause problems. The new change allows tracing of the device level code for better debug. [rjw: Changelog] Signed-off-by: Todd Brandt <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2013-07-25 | mutex: Avoid label warning when !CONFIG_MUTEX_SPIN_ON_OWNER | Davidlohr Bueso | 1 | -2/+2
Fengguang reported the following warning when optimistic spinning is disabled (ie: make allnoconfig): kernel/mutex.c:599:1: warning: label 'done' defined but not used Remove the 'done' label altogether. Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2013-07-25 | Merge branch 'timers/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/urgent | Ingo Molnar | 1 | -3/+2
Pull nohz fixes from Frederic Weisbecker. Signed-off-by: Ingo Molnar <[email protected]>
2013-07-24 | nohz: fix compile warning in tick_nohz_init() | Li Zhong | 1 | -2/+0
cpu is not used after commit 5b8621a68fdcd2baf1d3b413726f913a5254d46a Signed-off-by: Li Zhong <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Li Zhong <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Kevin Hilman <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2013-07-24 | nohz: Do not warn about unstable tsc unless user uses nohz_full | Steven Rostedt | 1 | -1/+2
If the user enables CONFIG_NO_HZ_FULL and runs the kernel on a machine with an unstable TSC, it will produce a WARN_ON dump as well as taint the kernel. This is a bit extreme for a kernel that just enables a feature but doesn't use it. The warning should only happen if the user tries to use the feature by either adding nohz_full to the kernel command line, or by enabling CONFIG_NO_HZ_FULL_ALL that makes nohz used on all CPUs at boot up. Note, this second feature should not (yet) be used by distros or anyone that doesn't care if NO_HZ is used or not. Signed-off-by: Steven Rostedt <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Li Zhong <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Kevin Hilman <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2013-07-24 | workqueue: allow work_on_cpu() to be called recursively | Lai Jiangshan | 1 | -10/+22
If @fn calls work_on_cpu() again, lockdep will complain:

 > [ INFO: possible recursive locking detected ]
 > 3.11.0-rc1-lockdep-fix-a #6 Not tainted
 > ---------------------------------------------
 > kworker/0:1/142 is trying to acquire lock:
 > ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
 >
 > but task is already holding lock:
 > ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
 >
 > other info that might help us debug this:
 > Possible unsafe locking scenario:
 >
 > CPU0
 > ----
 > lock((&wfc.work));
 > lock((&wfc.work));
 >
 > *** DEADLOCK ***

This is a false-positive lockdep report. In this situation, the two "wfc"s of the two work_on_cpu() calls are different; they are both on stack, so flush_work() can't deadlock. To fix this, we need to avoid the lockdep checking in this case, thus we introduce an internal __flush_work() which skips the lockdep annotation. tj: Minor comment adjustment. Signed-off-by: Lai Jiangshan <[email protected]> Reported-by: "Srivatsa S. Bhat" <[email protected]> Reported-by: Alexander Duyck <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
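A sketch of the split the fix describes: the body moves into an internal helper and only the public flush_work() keeps the lockdep annotation that triggered the false positive (internal names follow kernel/workqueue.c of that era):

    static bool __flush_work(struct work_struct *work)
    {
        struct wq_barrier barr;

        if (start_flush_work(work, &barr)) {
            wait_for_completion(&barr.done);
            return true;
        }
        return false;
    }

    bool flush_work(struct work_struct *work)
    {
        lock_map_acquire(&work->lockdep_map);   /* lockdep annotation kept only here */
        lock_map_release(&work->lockdep_map);
        return __flush_work(work);
    }

    /* work_on_cpu() then flushes via __flush_work(&wfc.work), avoiding the
     * recursive-lock report when @fn itself calls work_on_cpu(). */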
2013-07-24 | ftrace: Add check for NULL regs if ops has SAVE_REGS set | Steven Rostedt (Red Hat) | 1 | -4/+14
If an ftrace ops is registered with the SAVE_REGS flag set, and there's already an ops registered to one of its functions but without the SAVE_REGS flag, there's a small race window where the SAVE_REGS ops gets added to the list of callbacks to call for that function before the callback trampoline gets set to save the regs. The problem is, the function is not currently saving regs, which opens a small race window where the ops that is expecting regs to be passed to it won't get them. This can cause a crash if the callback were to reference the regs, as SAVE_REGS guarantees that regs will be set. To fix this, we add a check in the loop case where it checks if the ops has the SAVE_REGS flag set, and if so, it will ignore it if regs is not set. Signed-off-by: Steven Rostedt <[email protected]>
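A sketch of the added guard in the list-walking trampoline; the exact placement in the real patch may differ:

    do_for_each_ftrace_op(op, ftrace_ops_list) {
        if (ftrace_ops_test(op, ip)) {
            if (FTRACE_WARN_ON(!op->func))
                break;
            /* During the transition window this ops can be on the list before
             * regs saving is actually enabled: skip it rather than hand it the
             * NULL regs it does not expect. */
            if ((op->flags & FTRACE_OPS_FL_SAVE_REGS) && !regs)
                continue;
            op->func(ip, parent_ip, op, regs);
        }
    } while_for_each_ftrace_op(op);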