aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2011-07-21Merge branch 'tip/perf/core' of ↵Ingo Molnar9-164/+442
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2011-07-21Merge branch 'perf/urgent' into perf/coreIngo Molnar6-35/+144
Merge reason: pick up the latest fixes - they won't make v3.0. Signed-off-by: Ingo Molnar <[email protected]>
2011-07-20rw_semaphore: remove up/down_read_non_ownerChristoph Hellwig1-16/+0
Now that the last users is gone these can be removed. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-07-20Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds4-28/+100
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: signal: align __lock_task_sighand() irq disabling and RCU softirq,rcu: Inform RCU of irq_exit() activity sched: Add irq_{enter,exit}() to scheduler_ipi() rcu: protect __rcu_read_unlock() against scheduler-using irq handlers rcu: Streamline code produced by __rcu_read_unlock() rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_special rcu: decrease rcu_report_exp_rnp coupling with scheduler
2011-07-20time: Fix stupid KERN_WARN compile issueJohn Stultz1-1/+1
Terribly embarassing. Don't know how I committed this, but its KERN_WARNING not KERN_WARN. This fixes the following compile error: kernel/time/timekeeping.c: In function ‘__timekeeping_inject_sleeptime’: kernel/time/timekeeping.c:608: error: ‘KERN_WARN’ undeclared (first use in this function) kernel/time/timekeeping.c:608: error: (Each undeclared identifier is reported only once kernel/time/timekeeping.c:608: error: for each function it appears in.) kernel/time/timekeeping.c:608: error: expected ‘)’ before string constant make[2]: *** [kernel/time/timekeeping.o] Error 1 Reported-by: Ingo Molnar <[email protected]> Signed-off-by: John Stultz <[email protected]>
2011-07-20sysctl,rcu: Convert call_rcu(free_head) to kfreePaul E. McKenney1-8/+3
The RCU callback free_head just calls kfree(), so we can use kfree_rcu() instead of call_rcu(). Signed-off-by: Paul E. McKenney <[email protected]> Cc: Andrew Morton <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2011-07-20audit_tree,rcu: Convert call_rcu(__put_tree) to kfree_rcu()Lai Jiangshan1-7/+1
The rcu callback __put_tree() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(__put_tree). Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Cc: Al Viro <[email protected]> Cc: Eric Paris <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2011-07-20Merge branch 'rcu/urgent' of ↵Ingo Molnar4-28/+100
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/urgent
2011-07-20signal: align __lock_task_sighand() irq disabling and RCUPaul E. McKenney1-6/+13
The __lock_task_sighand() function calls rcu_read_lock() with interrupts and preemption enabled, but later calls rcu_read_unlock() with interrupts disabled. It is therefore possible that this RCU read-side critical section will be preempted and later RCU priority boosted, which means that rcu_read_unlock() will call rt_mutex_unlock() in order to deboost itself, but with interrupts disabled. This results in lockdep splats, so this commit nests the RCU read-side critical section within the interrupt-disabled region of code. This prevents the RCU read-side critical section from being preempted, and thus prevents the attempt to deboost with interrupts disabled. It is quite possible that a better long-term fix is to make rt_mutex_unlock() disable irqs when acquiring the rt_mutex structure's ->wait_lock. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2011-07-20softirq,rcu: Inform RCU of irq_exit() activityPeter Zijlstra2-3/+11
The rcu_read_unlock_special() function relies on in_irq() to exclude scheduler activity from interrupt level. This fails because exit_irq() can invoke the scheduler after clearing the preempt_count() bits that in_irq() uses to determine that it is at interrupt level. This situation can result in failures as follows: $task IRQ SoftIRQ rcu_read_lock() /* do stuff */ <preempt> |= UNLOCK_BLOCKED rcu_read_unlock() --t->rcu_read_lock_nesting irq_enter(); /* do stuff, don't use RCU */ irq_exit(); sub_preempt_count(IRQ_EXIT_OFFSET); invoke_softirq() ttwu(); spin_lock_irq(&pi->lock) rcu_read_lock(); /* do stuff */ rcu_read_unlock(); rcu_read_unlock_special() rcu_report_exp_rnp() ttwu() spin_lock_irq(&pi->lock) /* deadlock */ rcu_read_unlock_special(t); Ed can simply trigger this 'easy' because invoke_softirq() immediately does a ttwu() of ksoftirqd/# instead of doing the in-place softirq stuff first, but even without that the above happens. Cure this by also excluding softirqs from the rcu_read_unlock_special() handler and ensuring the force_irqthreads ksoftirqd/# wakeup is done from full softirq context. [ Alternatively, delaying the ->rcu_read_lock_nesting decrement until after the special handling would make the thing more robust in the face of interrupts as well. And there is a separate patch for that. ] Cc: Thomas Gleixner <[email protected]> Reported-and-tested-by: Ed Tomlinson <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2011-07-20sched: Add irq_{enter,exit}() to scheduler_ipi()Peter Zijlstra1-6/+38
Ensure scheduler_ipi() calls irq_{enter,exit} when it does some actual work. Traditionally we never did any actual work from the resched IPI and all magic happened in the return from interrupt path. Now that we do do some work, we need to ensure irq_{enter,exit} are called so that we don't confuse things. This affects things like timekeeping, NO_HZ and RCU, basically everything with a hook in irq_enter/exit. Explicit examples of things going wrong are: sched_clock_cpu() -- has a callback when leaving NO_HZ state to take a new reading from GTOD and TSC. Without this callback, time is stuck in the past. RCU -- needs in_irq() to work in order to avoid some nasty deadlocks Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2011-07-20rcu: protect __rcu_read_unlock() against scheduler-using irq handlersPaul E. McKenney1-5/+24
The addition of RCU read-side critical sections within runqueue and priority-inheritance lock critical sections introduced some deadlock cycles, for example, involving interrupts from __rcu_read_unlock() where the interrupt handlers call wake_up(). This situation can cause the instance of __rcu_read_unlock() invoked from interrupt to do some of the processing that would otherwise have been carried out by the task-level instance of __rcu_read_unlock(). When the interrupt-level instance of __rcu_read_unlock() is called with a scheduler lock held from interrupt-entry/exit situations where in_irq() returns false, deadlock can result. This commit resolves these deadlocks by using negative values of the per-task ->rcu_read_lock_nesting counter to indicate that an instance of __rcu_read_unlock() is in flight, which in turn prevents instances from interrupt handlers from doing any special processing. This patch is inspired by Steven Rostedt's earlier patch that similarly made __rcu_read_unlock() guard against interrupt-mediated recursion (see https://lkml.org/lkml/2011/7/15/326), but this commit refines Steven's approach to avoid the need for preemption disabling on the __rcu_read_unlock() fastpath and to also avoid the need for manipulating a separate per-CPU variable. This patch avoids need for preempt_disable() by instead using negative values of the per-task ->rcu_read_lock_nesting counter. Note that nested rcu_read_lock()/rcu_read_unlock() pairs are still permitted, but they will never see ->rcu_read_lock_nesting go to zero, and will therefore never invoke rcu_read_unlock_special(), thus preventing them from seeing the RCU_READ_UNLOCK_BLOCKED bit should it be set in ->rcu_read_unlock_special. This patch also adds a check for ->rcu_read_unlock_special being negative in rcu_check_callbacks(), thus preventing the RCU_READ_UNLOCK_NEED_QS bit from being set should a scheduling-clock interrupt occur while __rcu_read_unlock() is exiting from an outermost RCU read-side critical section. Of course, __rcu_read_unlock() can be preempted during the time that ->rcu_read_lock_nesting is negative. This could result in the setting of the RCU_READ_UNLOCK_BLOCKED bit after __rcu_read_unlock() checks it, and would also result it this task being queued on the corresponding rcu_node structure's blkd_tasks list. Therefore, some later RCU read-side critical section would enter rcu_read_unlock_special() to clean up -- which could result in deadlock if that critical section happened to be in the scheduler where the runqueue or priority-inheritance locks were held. This situation is dealt with by making rcu_preempt_note_context_switch() check for negative ->rcu_read_lock_nesting, thus refraining from queuing the task (and from setting RCU_READ_UNLOCK_BLOCKED) if we are already exiting from the outermost RCU read-side critical section (in other words, we really are no longer actually in that RCU read-side critical section). In addition, rcu_preempt_note_context_switch() invokes rcu_read_unlock_special() to carry out the cleanup in this case, which clears out the ->rcu_read_unlock_special bits and dequeues the task (if necessary), in turn avoiding needless delay of the current RCU grace period and needless RCU priority boosting. It is still illegal to call rcu_read_unlock() while holding a scheduler lock if the prior RCU read-side critical section has ever had either preemption or irqs enabled. However, the common use case is legal, namely where then entire RCU read-side critical section executes with irqs disabled, for example, when the scheduler lock is held across the entire lifetime of the RCU read-side critical section. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2011-07-20sched: Avoid creating superfluous NUMA domains on non-NUMA systemsPeter Zijlstra1-0/+2
When creating sched_domains, stop when we've covered the entire target span instead of continuing to create domains, only to later find they're redundant and throw them away again. This avoids single node systems from touching funny NUMA sched_domain creation code and reduces the risks of the new SD_OVERLAP code. Requested-by: Linus Torvalds <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1311180177.29152.57.camel@twins Signed-off-by: Ingo Molnar <[email protected]>
2011-07-20sched: Allow for overlapping sched_domain spansPeter Zijlstra2-29/+130
Allow for sched_domain spans that overlap by giving such domains their own sched_group list instead of sharing the sched_groups amongst each-other. This is needed for machines with more than 16 nodes, because sched_domain_node_span() will generate a node mask from the 16 nearest nodes without regard if these masks have any overlap. Currently sched_domains have a sched_group that maps to their child sched_domain span, and since there is no overlap we share the sched_group between the sched_domains of the various CPUs. If however there is overlap, we would need to link the sched_group list in different ways for each cpu, and hence sharing isn't possible. In order to solve this, allocate private sched_groups for each CPU's sched_domain but have the sched_groups share a sched_group_power structure such that we can uniquely track the power. Reported-and-tested-by: Anton Blanchard <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-07-20sched: Break out cpu_power from the sched_group structurePeter Zijlstra2-29/+49
In order to prepare for non-unique sched_groups per domain, we need to carry the cpu_power elsewhere, so put a level of indirection in. Reported-and-tested-by: Anton Blanchard <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-07-20make sure that nsproxy_cache is initialized early enoughAl Viro2-3/+2
Signed-off-by: Al Viro <[email protected]>
2011-07-20kill file_permission() completelyAl Viro1-1/+2
convert the last remaining caller to inode_permission() Signed-off-by: Al Viro <[email protected]>
2011-07-19rcu: Streamline code produced by __rcu_read_unlock()Paul E. McKenney1-6/+6
Given some common flag combinations, particularly -Os, gcc will inline rcu_read_unlock_special() despite its being in an unlikely() clause. Use noinline to prohibit this misoptimization. In addition, move the second barrier() in __rcu_read_unlock() so that it is not on the common-case code path. This will allow the compiler to generate better code for the common-case path through __rcu_read_unlock(). Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Acked-by: Mathieu Desnoyers <[email protected]>
2011-07-19rcu: Fix RCU_BOOST race handling current->rcu_read_unlock_specialPaul E. McKenney1-2/+6
The RCU_BOOST commits for TREE_PREEMPT_RCU introduced an other-task write to a new RCU_READ_UNLOCK_BOOSTED bit in the task_struct structure's ->rcu_read_unlock_special field, but, as noted by Steven Rostedt, without correctly synchronizing all accesses to ->rcu_read_unlock_special. This could result in bits in ->rcu_read_unlock_special being spuriously set and cleared due to conflicting accesses, which in turn could result in deadlocks between the rcu_node structure's ->lock and the scheduler's rq and pi locks. These deadlocks would result from RCU incorrectly believing that the just-ended RCU read-side critical section had been preempted and/or boosted. If that RCU read-side critical section was executed with either rq or pi locks held, RCU's ensuing (incorrect) calls to the scheduler would cause the scheduler to attempt to once again acquire the rq and pi locks, resulting in deadlock. More complex deadlock cycles are also possible, involving multiple rq and pi locks as well as locks from multiple rcu_node structures. This commit fixes synchronization by creating ->rcu_boosted field in task_struct that is accessed and modified only when holding the ->lock in the rcu_node structure on which the task is queued (on that rcu_node structure's ->blkd_tasks list). This results in tasks accessing only their own current->rcu_read_unlock_special fields, making unsynchronized access once again legal, and keeping the rcu_read_unlock() fastpath free of atomic instructions and memory barriers. The reason that the rcu_read_unlock() fastpath does not need to access the new current->rcu_boosted field is that this new field cannot be non-zero unless the RCU_READ_UNLOCK_BLOCKED bit is set in the current->rcu_read_unlock_special field. Therefore, rcu_read_unlock() need only test current->rcu_read_unlock_special: if that is zero, then current->rcu_boosted must also be zero. This bug does not affect TINY_PREEMPT_RCU because this implementation of RCU accesses current->rcu_read_unlock_special with irqs disabled, thus preventing races on the !SMP systems that TINY_PREEMPT_RCU runs on. Maybe-reported-by: Dave Jones <[email protected]> Maybe-reported-by: Sergey Senozhatsky <[email protected]> Reported-by: Steven Rostedt <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Steven Rostedt <[email protected]>
2011-07-19rcu: decrease rcu_report_exp_rnp coupling with schedulerPaul E. McKenney1-2/+4
PREEMPT_RCU read-side critical sections blocking an expedited grace period invoke rcu_report_exp_rnp(). When the last such critical section has completed, rcu_report_exp_rnp() invokes the scheduler to wake up the task that invoked synchronize_rcu_expedited() -- needlessly holding the root rcu_node structure's lock while doing so, thus needlessly providing a way for RCU and the scheduler to deadlock. This commit therefore releases the root rcu_node structure's lock before calling wake_up(). Reported-by: Ed Tomlinson <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2011-07-20kernel: prevent unnecessary rebuilding due to config_data.gzPeter Foley1-3/+2
When IKCONFIG is built-in make oldconfig will cause the kernel to be relinked even if .config didn't change. This happens because of a config_data.gz dependency on .config. This patch changes the if_changed to a filechk so that config_data.h is only rebuilt when the contents have actually changed. Signed-off-by: Peter Foley <[email protected]> Signed-off-by: Michal Marek <[email protected]>
2011-07-18connector: add an event for monitoring process tracersVladimir Zapolskiy1-1/+6
This change adds a procfs connector event, which is emitted on every successful process tracer attach or detach. If some process connects to other one, kernelspace connector reports process id and thread group id of both these involved processes. On disconnection null process id is returned. Such an event allows to create a simple automated userspace mechanism to be aware about processes connecting to others, therefore predefined process policies can be applied to them if needed. Note, a detach signal is emitted only in case, if a tracer process explicitly executes PTRACE_DETACH request. In other cases like tracee or tracer exit detach event from proc connector is not reported. Signed-off-by: Vladimir Zapolskiy <[email protected]> Acked-by: Evgeniy Polyakov <[email protected]> Cc: David S. Miller <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]>
2011-07-17ptrace: mv send-SIGSTOP from do_fork() to ptrace_init_task()Oleg Nesterov1-12/+0
If the new child is traced, do_fork() adds the pending SIGSTOP. It assumes that either it is traced because of auto-attach or the tracer attached later, in both cases sigaddset/set_thread_flag is correct even if SIGSTOP is already pending. Now that we have PTRACE_SEIZE this is no longer right in the latter case. If the tracer does PTRACE_SEIZE after copy_process() makes the child visible the queued SIGSTOP is wrong. We could check PT_SEIZED bit and change ptrace_attach() to set both PT_PTRACED and PT_SEIZED bits simultaneously but see the next patch, we need to know whether this child was auto-attached or not anyway. So this patch simply moves this code to ptrace_init_task(), this way we can never race with ptrace_attach(). Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Tejun Heo <[email protected]>
2011-07-17has_stopped_jobs: s/task_is_stopped/SIGNAL_STOP_STOPPED/Oleg Nesterov1-7/+5
has_stopped_jobs() naively checks task_is_stopped(group_leader). This was always wrong even without ptrace, group_leader can be dead. And given that ptrace can change the state to TRACED this is wrong even in the single-threaded case. Change the code to check SIGNAL_STOP_STOPPED and simplify the code, retval + break/continue doesn't make this trivial code more readable. We could probably add the usual "|| signal->group_stop_count" check but I don't think this makes sense, the task can start the group-stop right after the check anyway. Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Tejun Heo <[email protected]>
2011-07-15Merge branch 'pm-domains' into for-linusRafael J. Wysocki1-2/+6
* pm-domains: (33 commits) ARM / shmobile: Return -EBUSY from A4LC power off if A3RV is active PM / Domains: Take .power_off() error code into account ARM / shmobile: Use genpd_queue_power_off_work() ARM / shmobile: Use pm_genpd_poweroff_unused() PM / Domains: Introduce function to power off all unused PM domains PM / Domains: Queue up power off work only if it is not pending PM / Domains: Improve handling of wakeup devices during system suspend PM / Domains: Do not restore all devices on power off error PM / Domains: Allow callbacks to execute all runtime PM helpers PM / Domains: Do not execute device callbacks under locks PM / Domains: Make failing pm_genpd_prepare() clean up properly PM / Domains: Set device state to "active" during system resume ARM: mach-shmobile: sh7372 A3RV requires A4LC PM / Domains: Export pm_genpd_poweron() in header ARM: mach-shmobile: sh7372 late pm domain off ARM: mach-shmobile: Runtime PM late init callback ARM: mach-shmobile: sh7372 D4 support ARM: mach-shmobile: sh7372 A4MP support ARM: mach-shmobile: sh7372: make sure that fsi is peripheral of spu2 ARM: mach-shmobile: sh7372 A3SG support ...
2011-07-15PM: Improve error code of pm_notifier_call_chain()Akinobu Mita1-2/+3
This enables pm_notifier_call_chain() to get the actual error code in the callback rather than always assume -EINVAL by converting all PM notifier calls to return encapsulate error code with notifier_from_errno(). Signed-off-by: Akinobu Mita <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2011-07-15PM / Suspend: Export suspend_set_ops, suspend_valid_only_memKevin Hilman1-0/+2
Some platforms wish to implement their PM core suspend code as modules. To do so, these functions need to be exported to modules. [rjw: Replaced EXPORT_SYMBOL with EXPORT_SYMBOL_GPL] Reported-by: Jean Pihet <[email protected]> Signed-off-by: Kevin Hilman <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2011-07-15PM / Suspend: Add .suspend_again() callback to suspend_opsMyungJoo Ham1-6/+12
A system or a device may need to control suspend/wakeup events. It may want to wakeup the system after a predefined amount of time or at a predefined event decided while entering suspend for polling or delayed work. Then, it may want to enter suspend again if its predefined wakeup condition is the only wakeup reason and there is no outstanding events; thus, it does not wakeup the userspace unnecessary or unnecessary devices and keeps suspended as long as possible (saving the power). Enabling a system to wakeup after a specified time can be easily achieved by using RTC. However, to enter suspend again immediately without invoking userland and unrelated devices, we need additional features in the suspend framework. Such need comes from: 1. Monitoring a critical device status without interrupts that can wakeup the system. (in-suspend polling) An example is ambient temperature monitoring that needs to shut down the system or a specific device function if it is too hot or cold. The temperature of a specific device may be needed to be monitored as well; e.g., a charger monitors battery temperature in order to stop charging if overheated. 2. Execute critical "delayed work" at suspend. A driver or a system/board may have a delayed work (or any similar things) that it wants to execute at the requested time. For example, some chargers want to check the battery voltage some time (e.g., 30 seconds) after the battery is fully charged and the charger has stopped. Then, the charger restarts charging if the voltage has dropped more than a threshold, which is smaller than "restart-charger" voltage, which is a threshold to restart charging regardless of the time passed. This patch allows to add "suspend_again" callback at struct platform_suspend_ops and let the "suspend_again" callback return true if the system is required to enter suspend again after the current instance of wakeup. Device-wise suspend_again implemented at dev_pm_ops or syscore is not done because: a) suspend_again feature is usually under platform-wise decision and controls the behavior of the whole platform and b) There are very limited devices related to the usage cases of suspend_again; chargers and temperature sensors are mentioned so far. With suspend_again callback registered at struct platform_suspend_ops suspend_ops in kernel/power/suspend.c with suspend_set_ops by the platform, the suspend framework tries to enter suspend again by looping suspend_enter() if suspend_again has returned true and there has been no errors in the suspending sequence or pending wakeups (by pm_wakeup_pending). Tested at Exynos4-NURI. [rjw: Fixed up kerneldoc comment for suspend_enter().] Signed-off-by: MyungJoo Ham <[email protected]> Signed-off-by: Kyungmin Park <[email protected]> Acked-by: Pavel Machek <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2011-07-15tracing/kprobe: Update symbol reference when loading moduleMasami Hiramatsu1-1/+36
Since the address of a module-local variable can only be solved after the target module is loaded, the symbol fetch-argument should be updated when loading target module. Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Link: http://lkml.kernel.org/r/20110627072703.6528.75042.stgit@fedora15 Signed-off-by: Steven Rostedt <[email protected]>
2011-07-15tracing/kprobes: Support module init function probingMasami Hiramatsu1-26/+138
To support probing module init functions, kprobe-tracer allows user to define a probe on non-existed function when it is given with a module name. This also enables user to set a probe on a function on a specific module, even if a same name (but different) function is locally defined in another module. The module name must be in the front of function name and separated by a ':'. e.g. btrfs:btrfs_init_sysfs Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Link: http://lkml.kernel.org/r/20110627072656.6528.89970.stgit@fedora15 Signed-off-by: Steven Rostedt <[email protected]>
2011-07-15kprobes: Return -ENOENT if probe point doesn't existMasami Hiramatsu1-10/+23
Return -ENOENT if probe point doesn't exist, but still returns -EINVAL if both of kprobe->addr and kprobe->symbol_name are specified or both are not specified. Acked-by: Ananth N Mavinakayanahalli <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: "David S. Miller" <[email protected]> Link: http://lkml.kernel.org/r/20110627072650.6528.67329.stgit@fedora15 Signed-off-by: Steven Rostedt <[email protected]>
2011-07-15tracing/kprobes: Merge trace probe enable/disable functionsMasami Hiramatsu1-56/+36
Merge redundant enable/disable functions into enable_trace_probe() and disable_trace_probe(). Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Link: http://lkml.kernel.org/r/20110627072644.6528.26910.stgit@fedora15 [ converted kprobe selftest to use enable_trace_probe ] Signed-off-by: Steven Rostedt <[email protected]>
2011-07-15Merge branch 'rcu/urgent' of ↵Linus Torvalds2-5/+36
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu * 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu: rcu: Prevent RCU callbacks from executing before scheduler initialized
2011-07-15sched: Fix 32bit racePeter Zijlstra1-0/+3
Commit 3fe1698b7fe0 ("sched: Deal with non-atomic min_vruntime reads on 32bit") forgot to initialize min_vruntime_copy which could lead to an infinite while loop in task_waking_fair() under some circumstances (early boot, lucky timing). [ This bug was also reported by others that blamed it on the RCU initialization problems ] Reported-and-tested-by: Bruno Wolff III <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-07-14ftrace: Fix regression where ftrace breaks when modules are loadedSteven Rostedt1-2/+28
Enabling function tracer to trace all functions, then load a module and then disable function tracing will cause ftrace to fail. This can also happen by enabling function tracing on the command line: ftrace=function and during boot up, modules are loaded, then you disable function tracing with 'echo nop > current_tracer' you will trigger a bug in ftrace that will shut itself down. The reason is, the new ftrace code keeps ref counts of all ftrace_ops that are registered for tracing. When one or more ftrace_ops are registered, all the records that represent the functions that the ftrace_ops will trace have a ref count incremented. If this ref count is not zero, when the code modification runs, that function will be enabled for tracing. If the ref count is zero, that function will be disabled from tracing. To make sure the accounting was working, FTRACE_WARN_ON()s were added to updating of the ref counts. If the ref count hits its max (> 2^30 ftrace_ops added), or if the ref count goes below zero, a FTRACE_WARN_ON() is triggered which disables all modification of code. Since it is common for ftrace_ops to trace all functions in the kernel, instead of creating > 20,000 hash items for the ftrace_ops, the hash count is just set to zero, and it represents that the ftrace_ops is to trace all functions. This is where the issues arrise. If you enable function tracing to trace all functions, and then add a module, the modules function records do not get the ref count updated. When the function tracer is disabled, all function records ref counts are subtracted. Since the modules never had their ref counts incremented, they go below zero and the FTRACE_WARN_ON() is triggered. The solution to this is rather simple. When modules are loaded, and their functions are added to the the ftrace pool, look to see if any ftrace_ops are registered that trace all functions. And for those, update the ref count for the module function records. Reported-by: Thomas Gleixner <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-07-14tracing/kprobes: Rename probe_* to trace_probe_*Masami Hiramatsu1-21/+22
Rename probe_* to trace_probe_* for avoiding namespace confliction. This also fixes improper names of find_probe_event() and cleanup_all_probes() to find_trace_probe() and release_all_trace_probes() respectively. Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> Link: http://lkml.kernel.org/r/20110627072636.6528.60374.stgit@fedora15 Signed-off-by: Steven Rostedt <[email protected]>
2011-07-14perf, x86: P4 PMU - Introduce event alias featureCyrill Gorcunov1-2/+0
Instead of hw_nmi_watchdog_set_attr() weak function and appropriate x86_pmu::hw_watchdog_set_attr() call we introduce even alias mechanism which allow us to drop this routines completely and isolate quirks of Netburst architecture inside P4 PMU code only. The main idea remains the same though -- to allow nmi-watchdog and perf top run simultaneously. Note the aliasing mechanism applies to generic PERF_COUNT_HW_CPU_CYCLES event only because arbitrary event (say passed as RAW initially) might have some additional bits set inside ESCR register changing the behaviour of event and we can't guarantee anymore that alias event will give the same result. P.S. Thanks a huge to Don and Steven for for testing and early review. Acked-by: Don Zickus <[email protected]> Tested-by: Steven Rostedt <[email protected]> Signed-off-by: Cyrill Gorcunov <[email protected]> CC: Ingo Molnar <[email protected]> CC: Peter Zijlstra <[email protected]> CC: Stephane Eranian <[email protected]> CC: Lin Ming <[email protected]> CC: Arnaldo Carvalho de Melo <[email protected]> CC: Frederic Weisbecker <[email protected]> Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun Signed-off-by: Steven Rostedt <[email protected]>
2011-07-14tracing: Have dynamic size event stack tracesSteven Rostedt3-19/+87
Currently the stack trace per event in ftace is only 8 frames. This can be quite limiting and sometimes useless. Especially when the "ignore frames" is wrong and we also use up stack frames for the event processing itself. Change this to be dynamic by adding a percpu buffer that we can write a large stack frame into and then copy into the ring buffer. For interrupts and NMIs that come in while another event is being process, will only get to use the 8 frame stack. That should be enough as the task that it interrupted will have the full stack frame anyway. Requested-by: Thomas Gleixner <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-07-14sched: adjust scheduler cpu power for stolen timeGlauber Costa2-12/+39
This patch makes update_rq_clock() aware of steal time. The mechanism of operation is not different from irq_time, and follows the same principles. This lives in a CONFIG option itself, and can be compiled out independently of the rest of steal time reporting. The effect of disabling it is that the scheduler will still report steal time (that cannot be disabled), but won't use this information for cpu power adjustments. Everytime update_rq_clock_task() is invoked, we query information about how much time was stolen since last call, and feed it into sched_rt_avg_update(). Although steal time reporting in account_process_tick() keeps track of the last time we read the steal clock, in prev_steal_time, this patch do it independently using another field, prev_steal_time_rq. This is because otherwise, information about time accounted in update_process_tick() would never reach us in update_rq_clock(). Signed-off-by: Glauber Costa <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Tested-by: Eric B Munson <[email protected]> CC: Jeremy Fitzhardinge <[email protected]> CC: Anthony Liguori <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
2011-07-14KVM guest: Steal time accountingGlauber Costa1-0/+43
This patch accounts steal time time in account_process_tick. If one or more tick is considered stolen in the current accounting cycle, user/system accounting is skipped. Idle is fine, since the hypervisor does not report steal time if the guest is halted. Accounting steal time from the core scheduler give us the advantage of direct acess to the runqueue data. In a later opportunity, it can be used to tweak cpu power and make the scheduler aware of the time it lost. [avi: <asm/paravirt.h> doesn't exist on many archs] Signed-off-by: Glauber Costa <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Tested-by: Eric B Munson <[email protected]> CC: Jeremy Fitzhardinge <[email protected]> CC: Anthony Liguori <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
2011-07-14KVM: Steal time implementationGlauber Costa1-0/+2
To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM, while the vcpu had meaningful work to do - halt time does not count. This information is acquired through the run_delay field of delayacct/schedstats infrastructure, that counts time spent in a runqueue but not running. Steal time is a per-cpu information, so the traditional MSR-based infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the memory area address containing information about steal time This patch contains the hypervisor part of the steal time infrasructure, and can be backported independently of the guest portion. [avi, yongjie: export delayacct_on, to avoid build failures in some configs] Signed-off-by: Glauber Costa <[email protected]> Tested-by: Eric B Munson <[email protected]> CC: Rik van Riel <[email protected]> CC: Jeremy Fitzhardinge <[email protected]> CC: Peter Zijlstra <[email protected]> CC: Anthony Liguori <[email protected]> Signed-off-by: Yongjie Ren <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
2011-07-13ftrace: Fix dynamic selftest failure on some archsSteven Rostedt1-0/+26
Archs that do not implement CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, will fail the dynamic ftrace selftest. The function tracer has a quick 'off' variable that will prevent the call back functions from being called. This variable is called function_trace_stop. In x86, this is implemented directly in the mcount assembly, but for other archs, an intermediate function is used called ftrace_test_stop_func(). In dynamic ftrace, the function pointer variable ftrace_trace_function is used to update the caller code in the mcount caller. But for archs that do not have CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST set, it only calls ftrace_test_stop_func() instead, which in turn calls __ftrace_trace_function. When more than one ftrace_ops is registered, the function it calls is ftrace_ops_list_func(), which will iterate over all registered ftrace_ops and call the callbacks that have their hash matching. The issue happens when two ftrace_ops are registered for different functions and one is then unregistered. The __ftrace_trace_function is then pointed to the remaining ftrace_ops callback function directly. This mean it will be called for all functions that were registered to trace by both ftrace_ops that were registered. This is not an issue for archs with CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, because the update of ftrace_trace_function doesn't happen until after all functions have been updated, and then the mcount caller is updated. But for those archs that do use the ftrace_test_stop_func(), the update is immediate. The dynamic selftest fails because it hits this situation, and the ftrace_ops that it registers fails to only trace what it was suppose to and instead traces all other functions. The solution is to delay the setting of __ftrace_trace_function until after all the functions have been updated according to the registered ftrace_ops. Also, function_trace_stop is set during the update to prevent function tracing from calling code that is caused by the function tracer itself. Signed-off-by: Steven Rostedt <[email protected]>
2011-07-13ftrace: Update filter when tracing enabled in set_ftrace_filter()Steven Rostedt1-0/+4
Currently, if set_ftrace_filter() is called when the ftrace_ops is active, the function filters will not be updated. They will only be updated when tracing is disabled and re-enabled. Update the functions immediately during set_ftrace_filter(). Signed-off-by: Steven Rostedt <[email protected]>
2011-07-13ftrace: Balance records when updating the hashSteven Rostedt1-16/+33
Whenever the hash of the ftrace_ops is updated, the record counts must be balance. This requires disabling the records that are set in the original hash, and then enabling the records that are set in the updated hash. Moving the update into ftrace_hash_move() removes the bug where the hash was updated but the records were not, which results in ftrace triggering a warning and disabling itself because the ftrace_ops filter is updated while the ftrace_ops was registered, and then the failure happens when the ftrace_ops is unregistered. The current code will not trigger this bug, but new code will. Signed-off-by: Steven Rostedt <[email protected]>
2011-07-13rcu: Prevent RCU callbacks from executing before scheduler initializedPaul E. McKenney2-5/+36
Under some rare but real combinations of configuration parameters, RCU callbacks are posted during early boot that use kernel facilities that are not yet initialized. Therefore, when these callbacks are invoked, hard hangs and crashes ensue. This commit therefore prevents RCU callbacks from being invoked until after the scheduler is fully up and running, as in after multiple tasks have been spawned. It might well turn out that a better approach is to identify the specific RCU callbacks that are causing this problem, but that discussion will wait until such time as someone really needs an RCU callback to be invoked (as opposed to merely registered) during early boot. Reported-by: julie Sullivan <[email protected]> Reported-by: RKK <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Tested-by: Konrad Rzeszutek Wilk <[email protected]> Tested-by: julie Sullivan <[email protected]> Tested-by: RKK <[email protected]>
2011-07-12Merge branch 'fixes' of ↵Linus Torvalds1-2/+16
git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc: pcmcia: pxa2xx/vpac270: free gpios on exist rather than requesting ARM: pxa/raumfeld: fix device name for codec ak4104 ARM: pxa/raumfeld: display initialisation fixes ARM: pxa/raumfeld: adapt to upcoming hardware change ARM: pxa: fix gpio_to_chip() clash with gpiolib namespace genirq: replace irq_gc_ack() with {set,clr}_bit variants (fwd) arm: mach-vt8500: add forgotten irq_data conversion ARM: pxa168: correct nand pmu setting ARM: pxa910: correct nand pmu setting ARM: pxa: fix PGSR register address calculation
2011-07-12KVM: Add compat ioctl for KVM_SET_SIGNAL_MASKAlexander Graf1-0/+1
KVM has an ioctl to define which signal mask should be used while running inside VCPU_RUN. At least for big endian systems, this mask is different on 32-bit and 64-bit systems (though the size is identical). Add a compat wrapper that converts the mask to whatever the kernel accepts, allowing 32-bit kvm user space to set signal masks. This patch fixes qemu with --enable-io-thread on ppc64 hosts when running 32-bit user land. Signed-off-by: Alexander Graf <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
2011-07-12fixlet: Remove fs_excl from struct task.Justin TerAvest2-2/+0
fs_excl is a poor man's priority inheritance for filesystems to hint to the block layer that an operation is important. It was never clearly specified, not widely adopted, and will not prevent starvation in many cases (like across cgroups). fs_excl was introduced with the time sliced CFQ IO scheduler, to indicate when a process held FS exclusive resources and thus needed a boost. It doesn't cover all file systems, and it was never fully complete. Lets kill it. Signed-off-by: Justin TerAvest <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2011-07-11doc: Konfig: Documentation/power/{pm => apm-acpi}.txtMichael Witten1-2/+2
Signed-off-by: Michael Witten <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2011-07-11Merge branch 'master' into for-nextJiri Kosina23-554/+743
Sync with Linus' tree to be able to apply pending patches that are based on newer code already present upstream.