path: root/kernel
Age | Commit message | Author | Files changed, lines -/+
2014-07-18 | ftrace: Remove function_trace_stop check from list func | Steven Rostedt (Red Hat) | 1 file, -6/+2
function_trace_stop is no longer used to stop function tracing. Remove the check from __ftrace_ops_list_func(). Also, call FTRACE_WARN_ON() instead of setting function_trace_stop if an ops has no func to call. Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-18 | ftrace: Do no disable function tracing on enabling function tracing | Steven Rostedt (Red Hat) | 1 file, -7/+0
When function tracing is being updated, function_trace_stop is set to keep from tracing the updates. This was fine when function tracing was done from stop machine. But it is no longer done that way and this can cause real tracing to be missed. Remove it. Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-18 | ftrace-graph: Remove usage of ftrace_stop() in ftrace_graph_stop() | Steven Rostedt (Red Hat) | 1 file, -5/+0
All archs now use ftrace_graph_is_dead() to stop function graph tracing. Remove the usage of ftrace_stop() as that is no longer needed. Cc: Frederic Weisbecker <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-18 | kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64 | Masami Hiramatsu | 1 file, -5/+9
On ia64 and ppc64, function pointers do not point to the entry address of the function, but to the address of a function descriptor (which contains the entry address and misc data). Since the kprobes code passes the function pointer stored by NOKPROBE_SYMBOL() to kallsyms_lookup_size_offset() for initializing its blacklist, it fails and reports many errors, such as: Failed to find blacklist 0001013168300000 Failed to find blacklist 0001013000f0a000 [...] To fix this bug, use arch_deref_entry_point() to get the function entry address for kallsyms_lookup_size_offset() instead of the raw function pointer. Suzuki also pointed out that the blacklist entries should be updated as well. Reported-by: Tony Luck <[email protected]> Fixed-by: Suzuki K. Poulose <[email protected]> Tested-by: Tony Luck <[email protected]> Tested-by: Michael Ellerman <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Acked-by: Michael Ellerman <[email protected]> (for powerpc) Acked-by: Benjamin Herrenschmidt <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: [email protected] Cc: Paul Mackerras <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Fenghua Yu <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Chris Wright <[email protected]> Cc: [email protected] Cc: Kevin Hao <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: David S. Miller <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
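A minimal sketch of the described fix, in the spirit of the blacklist-population loop (the iterator and entry variables here are illustrative, not the exact upstream names):

    unsigned long entry, size, offset;

    /* Follow the function descriptor on ia64/ppc64 to get the real entry address. */
    entry = arch_deref_entry_point((void *)*iter);
    if (!kallsyms_lookup_size_offset(entry, &size, &offset)) {
        pr_err("Failed to find blacklist %lx\n", entry);
        continue;
    }
    /* Record the dereferenced entry address, not the raw function pointer. */
    ent->start_addr = entry;
    ent->end_addr = entry + size;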
2014-07-17 | Merge tag 'trace-fixes-v3.16-rc5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace | Linus Torvalds | 4 files, -8/+19
Pull tracing fixes from Steven Rostedt: "A few more fixes for ftrace infrastructure. I was cleaning out my INBOX and found two fixes from zhangwei from a year ago that were lost in my mail. These fix an inconsistency between trace_puts() and the way trace_printk() works. The reason this is important to fix is because when trace_printk() doesn't have any arguments, it turns into a trace_puts(). Not being able to enable a stack trace against trace_printk() because it does not have any arguments is quite confusing. Also, the fix is rather trivial and low risk. While porting some changes to PowerPC I discovered that it still has the function graph tracer filter bug that if you also enable stack tracing the function graph tracer filter is ignored. I fixed that up. Finally, Martin Lau fixed a bug that would cause readers of the ftrace ring buffer to block forever even though it was supposed to be NONBLOCK" This also includes the fix from an earlier pull request: "Oleg Nesterov fixed a memory leak that happens if a user creates a tracing instance, sets up a filter in an event, and then removes that instance. The filter allocates memory that is never freed when the instance is destroyed" * tag 'trace-fixes-v3.16-rc5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Fix polling on trace_pipe tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs tracing: Fix graph tracer with stack tracer on other archs tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs tracing: instance_rmdir() leaks ftrace_event_file->filter
2014-07-17 | ftrace-graph: Remove dependency of ftrace_stop() from ftrace_graph_stop() | Steven Rostedt (Red Hat) | 2 files, -5/+35
ftrace_stop() is going away as it disables parts of function tracing that affects users that should not be affected. But ftrace_graph_stop() is built on ftrace_stop(). Here's another example of killing all of function tracing because something went wrong with function graph tracing. Instead of disabling all users of function tracing on function graph error, disable only function graph tracing. A new function is created called ftrace_graph_is_dead(). This is called in strategic paths to prevent function graph from doing more harm and allowing at least a warning to be printed before the system crashes. NOTE: ftrace_stop() is still used until all the archs are converted over to use ftrace_graph_is_dead(). After that, ftrace_stop() will be removed. Reviewed-by: Masami Hiramatsu <[email protected]> Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-17 | PM / Sleep: Remove ftrace_stop/start() from suspend and hibernate | Steven Rostedt (Red Hat) | 2 files, -8/+0
ftrace_stop() and ftrace_start() were added to the suspend and hibernate process because there was some function within the work flow that caused the system to reboot if it was traced. This function has recently been found (restore_processor_state()). Now there's no reason to disable function tracing while we are going into suspend or hibernate, which means that being able to trace this will help tremendously in debugging any issues with suspend or hibernate. This also means that the ftrace_stop/start() functions can be removed and simplify the function tracing code a bit. Link: http://lkml.kernel.org/r/[email protected] Acked-by: "Rafael J. Wysocki" <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-17 | arch, locking: Ciao arch_mutex_cpu_relax() | Davidlohr Bueso | 5 files, -16/+13
The arch_mutex_cpu_relax() function, introduced by 34b133f, is hacky and ugly. It was added a few years ago to address the fact that common cpu_relax() calls include yielding on s390, and thus impact the optimistic spinning functionality of mutexes. Nowadays we use this function well beyond mutexes: rwsem, qrwlock, mcs and lockref. Since the macro that defines the call is in the mutex header, any users must include mutex.h and the naming is misleading as well. This patch (i) renames the call to cpu_relax_lowlatency ("relax, but only if you can do it with very low latency") and (ii) defines it in each arch's asm/processor.h local header, just like for regular cpu_relax functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax, and thus we can take it out of mutex.h. While this can seem redundant, I believe it is a good choice as it allows us to move out arch specific logic from generic locking primitives and enables future(?) archs to transparently define it, similarly to System Z. Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Aurelien Jacquiot <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Bharat Bhushan <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Liqin <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Howells <[email protected]> Cc: David S. Miller <[email protected]> Cc: Deepthi Dharwar <[email protected]> Cc: Dominik Dingel <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hirokazu Takata <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Joe Perches <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Joseph Myers <[email protected]> Cc: Kees Cook <[email protected]> Cc: Koichi Yasutake <[email protected]> Cc: Lennox Wu <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Salter <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Neuling <[email protected]> Cc: Michal Simek <[email protected]> Cc: Mikael Starvik <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul E. 
McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Qais Yousef <[email protected]> Cc: Qiaowei Ren <[email protected]> Cc: Rafael Wysocki <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Russell King <[email protected]> Cc: Steven Miao <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Stratos Karafotis <[email protected]> Cc: Tim Chen <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vasily Kulikov <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: Wolfram Sang <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-17 | locking/lockdep: Only ask for /proc/lock_stat output when available | Andreas Gruenbacher | 1 file, -0/+2
When lockdep turns itself off, the following message is logged: Please attach the output of /proc/lock_stat to the bug report Omit this message when CONFIG_LOCK_STAT is off, and /proc/lock_stat doesn't exist. Signed-off-by: Andreas Gruenbacher <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
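A minimal sketch of the guard this adds (assuming the message is printed from the lockdep turn-off path):

    #ifdef CONFIG_LOCK_STAT
    printk(KERN_DEBUG "Please attach the output of /proc/lock_stat to the bug report\n");
    #endif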
2014-07-17 | Merge branch 'locking/urgent' into locking/core, before applying larger changes and to refresh the branch with fixes | Ingo Molnar | 26 files, -147/+397
Signed-off-by: Ingo Molnar <[email protected]>
2014-07-17 | Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu | Ingo Molnar | 10 files, -155/+450
Pull RCU updates from Paul E. McKenney: * Update RCU documentation. * Miscellaneous fixes. * Maintainership changes. * Torture-test updates. * Callback-offloading changes. Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 file, -1/+1
Pull perf fixes from Ingo Molnar: "Tooling fixes and an Intel PMU driver fixlet" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Do not allow optimized switch for non-cloned events perf/x86/intel: ignore CondChgd bit to avoid false NMI handling perf symbols: Get kernel start address by symbol name perf tools: Fix segfault in cumulative.callchain report
2014-07-16tracing: Kill "filter_string" arg of replace_preds()Oleg Nesterov1-6/+3
Cosmetic, but replace_preds() doesn't need/use "char *filter_string". Remove it to microsimplify the code. Link: http://lkml.kernel.org/p/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-16 | tracing: Change apply_subsystem_event_filter() paths to check file->system == dir | Oleg Nesterov | 1 file, -23/+16
filter_free_subsystem_preds(), filter_free_subsystem_filters() and replace_system_preds() can simply check file->system->subsystem and avoid strcmp(call->class->system). Better yet, we can pass "struct ftrace_subsystem_dir *dir" instead of event_subsystem and just check file->system == dir. Thanks to Namhyung Kim who pointed out that replace_system_preds() can be changed too. Link: http://lkml.kernel.org/p/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-16 | tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic | Oleg Nesterov | 1 file, -2/+1
alloc_trace_uprobe() sets TRACE_EVENT_FL_USE_CALL_FILTER for unknown reason and this is simply wrong. Fortunately this has no effect because register_uprobe_event() clears call->flags after that. Kill both. This trace_uprobe was kzalloc'ed and we rely on this fact anyway. Link: http://lkml.kernel.org/p/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Srikar Dronamraju <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-16 | tracing: Kill call_filter_disable() | Oleg Nesterov | 1 file, -6/+1
It seems that the only purpose of call_filter_disable() is to make filter_disable() less clear and symmetrical, remove it. Link: http://lkml.kernel.org/p/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-16 | tracing: Kill destroy_call_preds() | Oleg Nesterov | 2 files, -7/+2
Remove destroy_call_preds(). Its only caller, __trace_remove_event_call(), can use free_event_filter() and nullify ->filter by hand. Perhaps we could keep this trivial helper although imo it is pointless, but then it should be static in trace_events.c. Link: http://lkml.kernel.org/p/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-16 | tracing: Kill destroy_preds() and destroy_file_preds() | Oleg Nesterov | 2 files, -21/+0
destroy_preds() makes no sense. The only caller, event_remove(), actually wants destroy_file_preds(). __trace_remove_event_call() does destroy_call_preds() which takes care of call->filter. And after the previous change we can simply remove destroy_preds() from event_remove(), we are going to call remove_event_from_tracers() which in turn calls remove_event_file_dir()->free_event_filter(). Link: http://lkml.kernel.org/p/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-16 | rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing | Paul E. McKenney | 1 file, -3/+4
If there isn't a nohz_full= kernel parameter specified, then tick_nohz_full_mask can legitimately be NULL. This can cause problems when RCU's boot code tries to cpumask_or() this value into rcu_nocb_mask. In addition, if NO_HZ_FULL_ALL=y, there is no point in doing the cpumask_or() in the first place because this will cause RCU_NOCB_CPU_ALL=y, which in turn will have all bits already set in rcu_nocb_mask. This commit therefore avoids the cpumask_or() if NO_HZ_FULL_ALL=y and otherwise checks for !tick_nohz_full_running; this latter check catches cases where no nohz_full= kernel parameter was specified. Signed-off-by: Paul E. McKenney <[email protected]>
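A minimal sketch of the resulting guard in RCU's boot code, under the config assumptions stated above:

    #if defined(CONFIG_NO_HZ_FULL) && !defined(CONFIG_NO_HZ_FULL_ALL)
        if (tick_nohz_full_running)
            cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
    #endif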
2014-07-16 | ftrace: Allow archs to specify if they need a separate function graph trampoline | Steven Rostedt (Red Hat) | 1 file, -2/+4
Currently if an arch supports function graph tracing, the core code will just assign the function graph trampoline to the function graph addr that gets called. But as the old method for function graph tracing always calls the function trampoline first and that calls the function graph trampoline, some archs may have the function graph trampoline dependent on operations that were done in the function trampoline. This causes the function graph tracer to break on those archs. Instead of having the default be to set the function graph ftrace_ops to the function graph trampoline, have it instead just set it to zero, which will keep it from jumping to a trampoline that is not set up to be jumped to directly. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Tuomas Tynkkynen <[email protected]> Tested-by: Tuomas Tynkkynen <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2014-07-16 | sched: Allow wait_on_bit_action() functions to support a timeout | NeilBrown | 1 file, -8/+8
It is currently not possible for various wait_on_bit functions to implement a timeout. While the "action" function that is called to do the waiting could certainly use schedule_timeout(), there is no way to carry forward the remaining timeout after a false wake-up. As false-wakeups a clearly possible at least due to possible hash collisions in bit_waitqueue(), this is a real problem. The 'action' function is currently passed a pointer to the word containing the bit being waited on. No current action functions use this pointer. So changing it to something else will be a little noisy but will have no immediate effect. This patch changes the 'action' function to take a pointer to the "struct wait_bit_key", which contains a pointer to the word containing the bit so nothing is really lost. It also adds a 'private' field to "struct wait_bit_key", which is initialized to zero. An action function can now implement a timeout with something like static int timed_out_waiter(struct wait_bit_key *key) { unsigned long waited; if (key->private == 0) { key->private = jiffies; if (key->private == 0) key->private -= 1; } waited = jiffies - key->private; if (waited > 10 * HZ) return -EAGAIN; schedule_timeout(waited - 10 * HZ); return 0; } If any other need for context in a waiter were found it would be easy to use ->private for some other purpose, or even extend "struct wait_bit_key". My particular need is to support timeouts in nfs_release_page() to avoid deadlocks with loopback mounted NFS. While wait_on_bit_timeout() would be a cleaner interface, it will not meet my need. I need the timeout to be sensitive to the state of the connection with the server, which could change. So I need to use an 'action' interface. Signed-off-by: NeilBrown <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Steve French <[email protected]> Cc: David Howells <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | sched: Remove proliferation of wait_on_bit() action functions | NeilBrown | 2 files, -7/+19
The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <[email protected]> Acked-by: David Howells <[email protected]> (fscache, keys) Acked-by: Steven Whitehouse <[email protected]> (gfs2) Acked-by: Peter Zijlstra <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Steve French <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | Merge tag 'v3.16-rc5' into sched/core, to refresh the branch before applying bigger tree-wide changes | Ingo Molnar | 22 files, -137/+530
Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER | Davidlohr Bueso | 3 files, -3/+7
Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER), encapsulate the dependencies for rwsem optimistic spinning. No logical changes here as it continues to depend on both SMP and the XADD algorithm variant. Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Jason Low <[email protected]> [ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ] Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Chris Mason <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Waiman Long <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | locking/mutex: Disable optimistic spinning on some architectures | Peter Zijlstra | 1 file, -1/+4
The optimistic spin code assumes regular stores and cmpxchg() play nice; this is found to not be true for at least: parisc, sparc32, tile32, metag-lock1, arc-!llsc and hexagon. There is further wreckage, but this in particular seemed easy to trigger, so blacklist this. Opt in for known good archs. Signed-off-by: Peter Zijlstra <[email protected]> Reported-by: Mikulas Patocka <[email protected]> Cc: David Miller <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: James Bottomley <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Jason Low <[email protected]> Cc: Waiman Long <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Paul McKenney <[email protected]> Cc: John David Anglin <[email protected]> Cc: James Hogan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: [email protected] Cc: Benjamin Herrenschmidt <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Russell King <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | locking/rwsem: Rename 'activity' to 'count' | Peter Zijlstra | 1 file, -14/+14
There are two definitions of struct rw_semaphore, one in linux/rwsem.h and one in linux/rwsem-spinlock.h. For some reason they have different names for the initial field. This makes it impossible to use C99 named initialization for __RWSEM_INITIALIZER() -- or we have to duplicate that entire thing along with the structure definitions. The simpler patch is renaming the rwsem-spinlock variant to match the regular rwsem. This allows us to switch to C99 named initialization. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | sched/numa: Revert "Use effective_load() to balance NUMA loads" | Peter Zijlstra | 1 file, -14/+6
Due to divergent trees, Rik found that this patch is no longer required. Requested-by: Rik van Riel <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | sched: Fix static_key race with sched_feat() | Jason Baron | 1 file, -0/+5
As pointed out by Andi Kleen, the usage of static keys can be racy in sched_feat_disable() vs. sched_feat_enable(). Currently, we first check the value of keys->enabled, and subsequently update the branch direction. This can be racy and can potentially leave the keys in an inconsistent state. Take the i_mutex around these calls to resolve the race. Reported-by: Andi Kleen <[email protected]> Signed-off-by: Jason Baron <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/9d7780c83db26683955cd01e6bc654ee2586e67f.1404315388.git.jbaron@akamai.com Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | sched: Remove extra static_key*() function indirection | Jason Baron | 1 file, -11/+1
I think it's a bit simpler without having to follow an extra layer of static inline functions. No functional change, just cosmetic. Signed-off-by: Jason Baron <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/2ce52233ce200faad93b6029d90f1411cd926667.1404315388.git.jbaron@akamai.com Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | sched/rt: Fix replenish_dl_entity() comments to match the current upstream code | xiaofeng.yan | 1 file, -1/+1
Signed-off-by: xiaofeng.yan <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | sched: Transform resched_task() into resched_curr() | Kirill Tkhai | 6 files, -45/+47
We always use resched_task() with rq->curr argument. It's not possible to reschedule any task but rq's current. The patch introduces resched_curr(struct rq *) to replace all of the repeating patterns. The main aim is cleanup, but there is a little size profit too: (before) $ size kernel/sched/built-in.o text data bss dec hex filename 155274 16445 7042 178761 2ba49 kernel/sched/built-in.o $ size vmlinux text data bss dec hex filename 7411490 1178376 991232 9581098 92322a vmlinux (after) $ size kernel/sched/built-in.o text data bss dec hex filename 155130 16445 7042 178617 2b9b9 kernel/sched/built-in.o $ size vmlinux text data bss dec hex filename 7411362 1178376 991232 9580970 9231aa vmlinux I was choosing between resched_curr() and resched_rq(), and the first name looks better for me. A little lie in Documentation/trace/ftrace.txt. I have not actually collected the tracing again. With a hope the patch won't make execution times much worse :) Signed-off-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/20140628200219.1778.18735.stgit@localhost Signed-off-by: Ingo Molnar <[email protected]>
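A minimal sketch of the call-site pattern being replaced:

    /* before */
    resched_task(rq->curr);

    /* after */
    resched_curr(rq);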
2014-07-16 | sched/deadline: Kill task_struct->pi_top_task | Oleg Nesterov | 2 files, -4/+3
Remove task_struct->pi_top_task. The only user, rt_mutex_setprio(), can use a local. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Alex Thorlton <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Daeseok Youn <[email protected]> Cc: Dario Faggioli <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: David Rientjes <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Dempsky <[email protected]> Cc: Michal Simek <[email protected]> Cc: Oleg Nesterov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | sched: Fix possible divide by zero in avg_atom() calculation | Mateusz Guzik | 1 file, -1/+1
proc_sched_show_task() does: if (nr_switches) do_div(avg_atom, nr_switches); nr_switches is unsigned long and do_div truncates it to 32 bits, which means it can test non-zero on e.g. x86-64 and be truncated to zero for division. Fix the problem by using div64_ul() instead. As a side effect calculations of avg_atom for big nr_switches are now correct. Signed-off-by: Mateusz Guzik <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
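A minimal sketch of the failure mode and the fix (values are illustrative):

    u64 avg_atom = 123456789ULL;
    unsigned long nr_switches = 1UL << 32;  /* non-zero on 64-bit, but its low 32 bits are zero */

    if (nr_switches)
        do_div(avg_atom, nr_switches);          /* old: divisor truncated to 32 bits, i.e. to 0 */

    avg_atom = div64_ul(123456789ULL, nr_switches); /* new: full 64-bit divisor, correct result */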
2014-07-16 | perf: Add vm_ops->name call for mmap event name retrieval | Jiri Olsa | 1 file, -0/+6
The following patch added another way to get mmap name: 78d683e838a6 ("mm, fs: Add vm_ops->name as an alternative to arch_vma_name") The vdso vma mapping already switch to this and we no longer get vdso name via arch_vma_name function. Adding this way to the perf mmap event name retrieval code. Caught this via perf test: $ sudo ./perf test -v 7 7: Validate PERF_RECORD_* events & perf_sample fields : --- start --- SNIP PERF_RECORD_MMAP for [vdso] missing! test child finished with 255 ---- end ---- Validate PERF_RECORD_* events & perf_sample fields: FAILED! Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Corey Ashford <[email protected]> Cc: David Ahern <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | locking/spinlocks/mcs: Micro-optimize osq_unlock() | Jason Low | 1 file, -2/+2
In the unlock function of the cancellable MCS spinlock, the first thing we do is to retrieve the current CPU's osq node. However, due to the changes made in the previous patch, in the common case where the lock is not contended, we wouldn't need to access the current CPU's osq node anymore. This patch optimizes this by only retrieving this CPU's osq node after we attempt the initial cmpxchg to unlock the osq and find that it is contended. Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andrew Morton <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Tim Chen <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
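A minimal sketch of the reordered unlock path (encode_cpu() and OSQ_UNLOCKED_VAL are illustrative helpers for the CPU-number encoding used by the osq code):

    void osq_unlock(struct optimistic_spin_queue *lock)
    {
        struct optimistic_spin_node *node;
        int curr = encode_cpu(smp_processor_id());

        /* Fast path: uncontended, release without touching this CPU's node. */
        if (likely(atomic_cmpxchg(&lock->tail, curr, OSQ_UNLOCKED_VAL) == curr))
            return;

        /* Slow path: only now retrieve the per-CPU node and hand off to the next waiter. */
        node = this_cpu_ptr(&osq_node);
        /* ... find and wake the next node ... */
    }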
2014-07-16 | locking/spinlocks/mcs: Introduce and use init macro and function for osq locks | Jason Low | 2 files, -2/+2
Currently, we initialize the osq lock by directly setting the lock's values. It would be preferable if we use an init macro to do the initialization like we do with other locks. This patch introduces and uses a macro and function for initializing the osq lock. Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andrew Morton <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Tim Chen <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead | Jason Low | 4 files, -11/+44
The cancellable MCS spinlock is currently used to queue threads that are doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining the lock would access and queue the local node corresponding to the CPU that it's running on. Currently, the cancellable MCS lock is implemented by using pointers to these nodes. In this patch, instead of operating on pointers to the per-cpu nodes, we store the CPU numbers in which the per-cpu nodes correspond to in atomic_t. A similar concept is used with the qspinlock. By operating on the CPU # of the nodes using atomic_t instead of pointers to those nodes, this can reduce the overhead of the cancellable MCS spinlock by 32 bits (on 64 bit systems). Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andrew Morton <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Tim Chen <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Chris Mason <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Josef Bacik <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
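A minimal sketch of the new handle described above (field and constant names are illustrative; the exact encoding is an assumption):

    struct optimistic_spin_queue {
        /*
         * 0 means the queue is unlocked; otherwise this holds an encoded
         * CPU number (e.g. CPU# + 1) of the tail node, instead of a pointer.
         */
        atomic_t tail;
    };

    #define OSQ_UNLOCKED_VAL (0)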
2014-07-16 | locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node() | Jason Low | 2 files, -16/+16
Currently, the per-cpu nodes structure for the cancellable MCS spinlock is named "optimistic_spin_queue". However, in a follow up patch in the series we will be introducing a new structure that serves as the new "handle" for the lock. It would make more sense if that structure is named "optimistic_spin_queue". Additionally, since the current use of the "optimistic_spin_queue" structure are "nodes", it might be better if we rename them to "node" anyway. This preparatory patch renames all current "optimistic_spin_queue" to "optimistic_spin_node". Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andrew Morton <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Tim Chen <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Chris Mason <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Josef Bacik <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | locking/rwsem: Allow conservative optimistic spinning when readers have lock | Jason Low | 1 file, -5/+5
Commit 4fc828e24cd9 ("locking/rwsem: Support optimistic spinning") introduced a major performance regression for workloads such as xfs_repair which mix read and write locking of the mmap_sem across many threads. The result was xfs_repair ran 5x slower on 3.16-rc2 than on 3.15 and using 20x more system CPU time. Perf profiles indicate in some workloads that significant time can be spent spinning on !owner. This is because we don't set the lock owner when readers(s) obtain the rwsem. In this patch, we'll modify rwsem_can_spin_on_owner() such that we'll return false if there is no lock owner. The rationale is that if we just entered the slowpath, yet there is no lock owner, then there is a possibility that a reader has the lock. To be conservative, we'll avoid spinning in these situations. This patch reduced the total run time of the xfs_repair workload from about 4 minutes 24 seconds down to approximately 1 minute 26 seconds, back to close to the same performance as on 3.15. Retesting of AIM7, which were some of the workloads used to test the original optimistic spinning code, confirmed that we still get big performance gains with optimistic spinning, even with this additional regression fix. Davidlohr found that while the 'custom' workload took a performance hit of ~-14% to throughput for >300 users with this additional patch, the overall gain with optimistic spinning is still ~+45%. The 'disk' workload even improved by ~+15% at >1000 users. Tested-by: Dave Chinner <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Jason Low <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Tim Chen <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/1404532172.2572.30.camel@j-VirtualBox Signed-off-by: Ingo Molnar <[email protected]>
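A simplified sketch of the conservative check described above (details of the real function are omitted):

    static bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
    {
        struct task_struct *owner;
        bool on_cpu = false;

        rcu_read_lock();
        owner = ACCESS_ONCE(sem->owner);
        if (owner)
            on_cpu = owner->on_cpu;
        rcu_read_unlock();

        /*
         * If there is no recorded owner, a reader may hold the rwsem;
         * be conservative and do not spin.
         */
        return on_cpu;
    }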
2014-07-16 | perf: Fix lockdep warning on process exit | Peter Zijlstra | 1 file, -1/+17
Sasha Levin reported: > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel I've stumbled on the following spew: > > ====================================================== > [ INFO: possible circular locking dependency detected ] > 3.15.0-next-20140613-sasha-00026-g6dd125d-dirty #654 Not tainted > ------------------------------------------------------- > trinity-c578/9725 is trying to acquire lock: > (&(&pool->lock)->rlock){-.-...}, at: __queue_work (kernel/workqueue.c:1346) > > but task is already holding lock: > (&ctx->lock){-.....}, at: perf_event_exit_task (kernel/events/core.c:7471 kernel/events/core.c:7533) > > which lock already depends on the new lock. > 1 lock held by trinity-c578/9725: > #0: (&ctx->lock){-.....}, at: perf_event_exit_task (kernel/events/core.c:7471 kernel/events/core.c:7533) > > Call Trace: > dump_stack (lib/dump_stack.c:52) > print_circular_bug (kernel/locking/lockdep.c:1216) > __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) > lock_acquire (./arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) > _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) > __queue_work (kernel/workqueue.c:1346) > queue_work_on (kernel/workqueue.c:1424) > free_object (lib/debugobjects.c:209) > __debug_check_no_obj_freed (lib/debugobjects.c:715) > debug_check_no_obj_freed (lib/debugobjects.c:727) > kmem_cache_free (mm/slub.c:2683 mm/slub.c:2711) > free_task (kernel/fork.c:221) > __put_task_struct (kernel/fork.c:250) > put_ctx (include/linux/sched.h:1855 kernel/events/core.c:898) > perf_event_exit_task (kernel/events/core.c:907 kernel/events/core.c:7478 kernel/events/core.c:7533) > do_exit (kernel/exit.c:766) > do_group_exit (kernel/exit.c:884) > get_signal_to_deliver (kernel/signal.c:2347) > do_signal (arch/x86/kernel/signal.c:698) > do_notify_resume (arch/x86/kernel/signal.c:751) > int_signal (arch/x86/kernel/entry_64.S:600) Urgh.. so the only way I can make that happen is through: perf_event_exit_task_context() raw_spin_lock(&child_ctx->lock); unclone_ctx(child_ctx) put_ctx(ctx->parent_ctx); raw_spin_unlock_irqrestore(&child_ctx->lock); And we can avoid this by doing the change below. I can't immediately see how this changed recently, but given that you say it's easy to reproduce, lets fix this. Reported-by: Sasha Levin <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Dave Jones <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-16 | perf: Revert ("perf: Always destroy groups on exit") | Peter Zijlstra | 1 file, -1/+13
Vince reported that commit 15a2d4de0eab5 ("perf: Always destroy groups on exit") causes a regression with grouped events. In particular his read_group_attached.c test fails. https://github.com/deater/perf_event_tests/blob/master/tests/bugs/read_group_attached.c Because of the context switch optimization in perf_event_context_sched_out() the 'original' event may end up in the child process and when that exits the change in the patch in question destroys the actual grouping. Therefore revert that change and only destroy inherited groups. Reported-by: Vince Weaver <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2014-07-15 | ring-buffer: Fix polling on trace_pipe | Martin Lau | 1 file, -4/+0
ring_buffer_poll_wait() should always put the poll_table to its wait_queue even if there is immediate data available. Otherwise, the following epoll and read sequence will eventually hang forever: 1. Put some data to make the trace_pipe ring_buffer read ready first 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee) 3. epoll_wait() 4. read(trace_pipe_fd) till EAGAIN 5. Add some more data to the trace_pipe ring_buffer 6. epoll_wait() -> this epoll_wait() will block forever ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2, ring_buffer_poll_wait() returns immediately without adding poll_table, which has poll_table->_qproc pointing to ep_poll_callback(), to its wait_queue. ~ During the epoll_wait() call in step 3 and step 6, ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue because the poll_table->_qproc is NULL, and that is how epoll works. ~ When there is new data available in step 6, ring_buffer does not know it has to call ep_poll_callback() because it is not in its wait queue. Hence, it blocks forever. Other poll implementations seem to call poll_wait() unconditionally as the very first thing to do. For example, tcp_poll() in tcp.c. Link: http://lkml.kernel.org/p/[email protected] Cc: [email protected] # 2.6.27 Fixes: 2a2cc8f7c4d0 "ftrace: allow the event pipe to be polled" Reviewed-by: Chris Mason <[email protected]> Signed-off-by: Martin Lau <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
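A minimal sketch of the resulting pattern, mirroring how tcp_poll() registers first (the wait queue and readiness helper are illustrative):

    unsigned int my_poll(struct file *filp, poll_table *pt)
    {
        /* Register with the poll_table unconditionally, before any readiness check. */
        poll_wait(filp, &my_waitqueue, pt);

        if (my_data_available())
            return POLLIN | POLLRDNORM;
        return 0;
    }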
2014-07-15 | tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs | zhangwei(Jovi) | 1 file, -0/+6
The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing, so add it, to be consistent with __trace_printk/__trace_bprintk. Those functions are all called by the same function: trace_printk(). Link: http://lkml.kernel.org/p/[email protected] Cc: [email protected] # 3.11+ Signed-off-by: zhangwei(Jovi) <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
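A minimal sketch of the added guard at the top of __trace_puts()/__trace_bputs(), mirroring __trace_printk():

    if (!(trace_flags & TRACE_ITER_PRINTK))
        return 0;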
2014-07-15 | workqueue: alloc struct worker on its local node | Lai Jiangshan | 1 file, -4/+4
When create_worker() is called from a non-manager, the struct worker is allocated from the node of the caller, which may be different from pool->node. So we add a node ID argument to alloc_worker() to ensure the struct worker is allocated from the preferred node. tj: @nid renamed to @node for consistency. Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
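A minimal sketch of the signature change (GFP flags and surrounding code are illustrative):

    static struct worker *alloc_worker(int node)
    {
        struct worker *worker;

        worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node);
        /* ... initialize list heads, flags, etc. ... */
        return worker;
    }

    /* create_worker() now allocates on the pool's node: */
    worker = alloc_worker(pool->node);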
2014-07-15 | tracing: Fix graph tracer with stack tracer on other archs | Steven Rostedt (Red Hat) | 1 file, -2/+2
Running my ftrace tests on PowerPC, it failed the test that checks if the function_graph tracer is affected by the stack tracer. It was. Looking into this, I found that update_function_graph_func() must be called even if the trampoline function is not changed. This is because archs like PowerPC do not support ftrace_ops being passed by assembly and instead use a helper function (what the trampoline function points to). Since this function is not changed even when multiple ftrace_ops are added to the code, the test that falls out before calling update_function_graph_func() will miss that the update must still be done. Call update_function_graph_func() for all calls to update_ftrace_function(). Cc: [email protected] # 3.3+ Signed-off-by: Steven Rostedt <[email protected]>
2014-07-15 | cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test | Tejun Heo | 1 file, -5/+4
cgrp_dfl_root_inhibit_ss_mask determines which subsystems are not supported on the default hierarchy and is currently initialized statically and just includes the debug subsystem. Now that there's cgroup_subsys->dfl_files, we can easily tell which subsystems support the default hierarchy or not. Let's initialize cgrp_dfl_root_inhibit_ss_mask by testing whether cgroup_subsys->dfl_files is NULL. After all, subsystems with NULL ->dfl_files aren't usable on the default hierarchy anyway. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]>
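A minimal sketch of the described initialization (the loop macro usage follows cgroup core conventions; treat the exact form as an assumption):

    for_each_subsys(ss, ssid)
        if (!ss->dfl_files)
            cgrp_dfl_root_inhibit_ss_mask |= 1 << ssid;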
2014-07-15 | cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core | Tejun Heo | 1 file, -5/+5
cgroup now distinguishes cftypes for the default and legacy hierarchies more explicitly by using separate arrays and CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE should be and are used only inside cgroup core proper. Let's make it clear that the flags are internal by prefixing them with double underscores. CFTYPE_INSANE is renamed to __CFTYPE_NOT_ON_DFL for consistency. The two flags are also collected and assigned bits >= 16 so that they aren't mixed with the published flags. v2: Convert the extra ones in cgroup_exit_cftypes() which are added by revision to the previous patch. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]>
2014-07-15 | cgroup: distinguish the default and legacy hierarchies when handling cftypes | Tejun Heo | 1 file, -3/+59
Until now, cftype arrays carried files for both the default and legacy hierarchies and the files which needed to be used on only one of them were flagged with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE. This gets confusing very quickly and we may end up exposing interface files to the default hierarchy without thinking it through. This patch makes cgroup core provide separate sets of interfaces for cftype handling so that the cftypes for the default and legacy hierarchies are clearly distinguished. The previous two patches renamed the existing ones so that they clearly indicate that they're for the legacy hierarchies. This patch adds the interface for the default hierarchy and apply them selectively depending on the hierarchy type. * cftypes added through cgroup_subsys->dfl_cftypes and cgroup_add_dfl_cftypes() only show up on the default hierarchy. * cftypes added through cgroup_subsys->legacy_cftypes and cgroup_add_legacy_cftypes() only show up on the legacy hierarchies. * cgroup_subsys->dfl_cftypes and ->legacy_cftypes can point to the same array for the cases where the interface files are identical on both types of hierarchies. * This makes all the existing subsystem interface files legacy-only by default and all subsystems will have no interface file created when enabled on the default hierarchy. Each subsystem should explicitly review and compose the interface for the default hierarchy. * A boot param "cgroup__DEVEL__legacy_files_on_dfl" is added which makes subsystems which haven't decided the interface files for the default hierarchy to present the legacy files on the default hierarchy so that its behavior on the default hierarchy can be tested. As the awkward name suggests, this is for development only. * memcg's CFTYPE_INSANE on "use_hierarchy" is noop now as the whole array isn't used on the default hierarchy. The flag is removed. v2: Updated documentation for cgroup__DEVEL__legacy_files_on_dfl. v3: Clear CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE when cfts are removed as suggested by Li. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: Li Zefan <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Aneesh Kumar K.V <[email protected]>
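A minimal sketch of a subsystem wiring up the two sets of interface files (the subsystem and array names are illustrative):

    struct cgroup_subsys my_cgrp_subsys = {
        .css_alloc      = my_css_alloc,
        .css_free       = my_css_free,
        .dfl_cftypes    = my_dfl_files,     /* only on the default hierarchy */
        .legacy_cftypes = my_legacy_files,  /* only on legacy hierarchies */
    };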
2014-07-15 | cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() | Tejun Heo | 1 file, -1/+6
Currently, cftypes added by cgroup_add_cftypes() are used for both the unified default hierarchy and legacy ones and subsystems can mark each file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear only on one of them. This is quite hairy and error-prone. Also, we may end up exposing interface files to the default hierarchy without thinking it through. cgroup_subsys will grow two separate cftype addition functions and apply each only on the hierarchies of the matching type. This will allow organizing cftypes in a lot clearer way and encourage subsystems to scrutinize the interface which is being exposed in the new default hierarchy. In preparation, this patch adds cgroup_add_legacy_cftypes() which currently is a simple wrapper around cgroup_add_cftypes() and replaces all cgroup_add_cftypes() usages with it. While at it, this patch drops a completely spurious return from __hugetlb_cgroup_file_init(). This patch doesn't introduce any functional differences. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: Li Zefan <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Aneesh Kumar K.V <[email protected]>
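A minimal sketch of the wrapper as described (currently just forwarding):

    int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
    {
        return cgroup_add_cftypes(ss, cfts);
    }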
2014-07-15 | cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes | Tejun Heo | 5 files, -6/+6
Currently, cgroup_subsys->base_cftypes is used for both the unified default hierarchy and legacy ones and subsystems can mark each file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear only on one of them. This is quite hairy and error-prone. Also, we may end up exposing interface files to the default hierarchy without thinking it through. cgroup_subsys will grow two separate cftype arrays and apply each only on the hierarchies of the matching type. This will allow organizing cftypes in a lot clearer way and encourage subsystems to scrutinize the interface which is being exposed in the new default hierarchy. In preparation, this patch renames cgroup_subsys->base_cftypes to cgroup_subsys->legacy_cftypes. This patch is pure rename. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: Li Zefan <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Aneesh Kumar K.V <[email protected]>