path: root/kernel
Age | Commit message | Author | Files | Lines
2010-11-30 | tracing: Fix panic when lseek() called on "trace" opened for writing | Slava Pestov | 1 | -1/+9
The file_ops struct for the "trace" special file defined llseek as seq_lseek(). However, if the file was opened for writing only, seq_open() was not called, and the seek would dereference a null pointer, file->private_data. This patch introduces a new wrapper for seq_lseek() which checks if the file descriptor is opened for reading first. If not, it does nothing. Cc: <[email protected]> Signed-off-by: Slava Pestov <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
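A minimal sketch of such a wrapper, assuming the usual llseek signature (the real patch may differ in naming):

	/*
	 * Only seek when the file was opened for reading, so that
	 * file->private_data (the seq_file) is guaranteed to have been
	 * set up by seq_open(); otherwise do nothing.
	 */
	static loff_t tracing_seek(struct file *file, loff_t offset, int origin)
	{
		if (file->f_mode & FMODE_READ)
			return seq_lseek(file, offset, origin);
		else
			return 0;
	}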
2010-11-30 | sched: Add 'autogroup' scheduling feature: automated per session task groups | Mike Galbraith | 7 | -49/+292
A recurring complaint from CFS users is that parallel kbuild has a negative impact on desktop interactivity. This patch implements an idea from Linus, to automatically create task groups. Currently, only per-session autogroups are implemented, but the patch leaves the way open for enhancement.

Implementation: each task's signal struct contains an inherited pointer to a refcounted autogroup struct containing a task group pointer, the default for all tasks pointing to the init_task_group. When a task calls setsid(), a new task group is created, the process is moved into the new task group, and a reference to the previous task group is dropped. Child processes inherit this task group thereafter, and increase its refcount. When the last thread of a process exits, the process's reference is dropped, such that when the last process referencing an autogroup exits, the autogroup is destroyed. At runqueue selection time, IFF a task has no cgroup assignment, its current autogroup is used.

Autogroup bandwidth is controllable by setting its nice level through the proc filesystem:

  cat /proc/<pid>/autogroup

displays the task's group and the group's nice level, while

  echo <nice level> > /proc/<pid>/autogroup

sets the task group's shares to the weight of a nice-<level> task. Setting the nice level is rate limited for !admin users due to the abuse risk of task group locking.

The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via the boot option noautogroup, and can also be turned on/off on the fly via:

  echo [01] > /proc/sys/kernel/sched_autogroup_enabled

... which will automatically move tasks to/from the root task group.

Signed-off-by: Mike Galbraith <[email protected]> Acked-by: Linus Torvalds <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Markus Trippelsdorf <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Paul Turner <[email protected]> Cc: Oleg Nesterov <[email protected]> [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ] Signed-off-by: Ingo Molnar <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
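A rough sketch of the bookkeeping described above (field names are illustrative, not necessarily the patch's):

	/* Refcounted per-session group, reached via each task's signal struct. */
	struct autogroup {
		struct kref		kref;	/* last process out destroys the group */
		struct task_group	*tg;	/* used at runqueue selection time when
						 * the task has no explicit cgroup */
	};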
2010-11-30 | sched: Fix unregister_fair_sched_group() | Paul Turner | 1 | -2/+1
In the flipping and flopping between calling unregister_fair_sched_group() on a per-cpu versus per-group basis we ended up in a bad state. Remove from the list for the passed cpu as opposed to some arbitrary index. (This fixes explosions with autogroup as well as a group creation/destruction stress test.) Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-29 | rcu: Make synchronize_srcu_expedited() fast if running readers | Paul E. McKenney | 1 | -1/+7
The synchronize_srcu_expedited() function is currently quick if there are no active readers, but will delay a full jiffy if there are any. If these readers leave their SRCU read-side critical sections quickly, this is way too long to wait. So this commit first waits ten microseconds, and only then falls back to jiffy-at-a-time waiting. Reported-by: Avi Kivity <[email protected]> Reported-by: Marcelo Tosatti <[email protected]> Tested-by: Takuya Yoshikawa <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
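A hedged sketch of the described back-off (constants and helper names are illustrative):

	/*
	 * Poll briefly first -- readers are expected to exit quickly --
	 * and only then fall back to sleeping a jiffy at a time.
	 */
	while (srcu_readers_active_idx(sp, idx)) {
		if (trycount++ < 10)
			udelay(1);		/* ~10 microseconds of polling total */
		else
			schedule_timeout_interruptible(1);	/* jiffy at a time */
	}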
2010-11-29 | rcu: fix race condition in synchronize_sched_expedited() | Paul E. McKenney | 1 | -2/+16
The new (early 2010) implementation of synchronize_sched_expedited() uses try_stop_cpus() to force a context switch on every CPU. It also permits concurrent calls to synchronize_sched_expedited() to share a single call to try_stop_cpus() through use of an atomically incremented synchronize_sched_expedited_count variable. Unfortunately, this is subject to failure as follows:

o Task A invokes synchronize_sched_expedited(), try_stop_cpus() succeeds, but Task A is preempted before getting to the atomic increment of synchronize_sched_expedited_count.

o Task B also invokes synchronize_sched_expedited(), with exactly the same outcome as Task A.

o Task C also invokes synchronize_sched_expedited(), again with exactly the same outcome as Tasks A and B.

o Task D also invokes synchronize_sched_expedited(), but only gets as far as acquiring the mutex within try_stop_cpus() before being preempted, interrupted, or otherwise delayed.

o Task E also invokes synchronize_sched_expedited(), but only gets to the snapshotting of synchronize_sched_expedited_count.

o Tasks A, B, and C all increment synchronize_sched_expedited_count.

o Task E fails to get the mutex, so checks the new value of synchronize_sched_expedited_count. It finds that the value has increased, so (wrongly) assumes that its work has been done, returning despite there having been no expedited grace period since it began.

The solution is to have the lowest-numbered CPU atomically increment the synchronize_sched_expedited_count variable within the synchronize_sched_expedited_cpu_stop() function, which is under the protection of the mutex acquired by try_stop_cpus(). However, this also requires that piggybacking tasks wait for three rather than two instances of try_stop_cpus(), because we cannot control the order in which the per-CPU callback functions occur.

Cc: Tejun Heo <[email protected]> Cc: Lai Jiangshan <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
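A sketch of the fix as described, assuming the callback runs on every CPU under the mutex taken by try_stop_cpus():

	/*
	 * Runs on every CPU; only the lowest-numbered online CPU advances
	 * the count, and it does so under the stop-cpus mutex, so a task
	 * that observes the count change knows a full expedited grace
	 * period has elapsed.
	 */
	static int synchronize_sched_expedited_cpu_stop(void *data)
	{
		smp_mb();	/* order against the caller's count snapshot */
		if (smp_processor_id() == cpumask_first(cpu_online_mask))
			atomic_inc(&synchronize_sched_expedited_count);
		return 0;
	}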
2010-11-29 | rcu: update documentation/comments for Lai's adoption patch | Paul E. McKenney | 2 | -5/+7
Lai's RCU-callback immediate-adoption patch changes the RCU tracing output, so update tracing.txt. Also update a few comments to clarify the synchronization design. Signed-off-by: Paul E. McKenney <[email protected]>
2010-11-29 | rcu,cleanup: simplify the code when cpu is dying | Lai Jiangshan | 4 | -78/+31
When we handle the CPU_DYING notifier, the whole system is stopped except for the current CPU. We therefore need no synchronization with the other CPUs. This allows us to move any orphaned RCU callbacks directly to the list of any online CPU without needing to run them through the global orphan lists. These global orphan lists can therefore be dispensed with. This commit makes these changes, though it currently victimizes CPU 0 @@@. Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2010-11-29 | rcu,cleanup: move synchronize_sched_expedited() out of sched.c | Lai Jiangshan | 2 | -69/+71
The first version of synchronize_sched_expedited() used the migration code in the scheduler, and was therefore implemented in kernel/sched.c. However, the more recent version of this code no longer uses the migration code, so this commit moves it to the main RCU source files. Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2010-11-29 | rcu: get rid of obsolete "classic" names in TREE_RCU tracing | Paul E. McKenney | 1 | -4/+4
The TREE_RCU tracing had obsolete rcuclassic_trace_init() and rcuclassic_trace_cleanup() function names. This commit brings them up to date: rcutree_trace_init() and rcutree_trace_cleanup(), respectively. Signed-off-by: Paul E. McKenney <[email protected]>
2010-11-29 | rcu: Distinguish between boosting and boosted | Paul E. McKenney | 1 | -0/+4
RCU priority boosting's tracing did not distinguish between ongoing boosting and completion of boosting. This commit therefore adds this capability. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2010-11-29 | rcu: add tracing for TINY_RCU and TINY_PREEMPT_RCU | Paul E. McKenney | 2 | -10/+226
Add tracing for the tiny RCU implementations, including statistics on boosting in the case of TINY_PREEMPT_RCU and RCU_BOOST. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2010-11-29 | rcu: priority boosting for TINY_PREEMPT_RCU | Paul E. McKenney | 2 | -50/+224
Add priority boosting, but only for TINY_PREEMPT_RCU. This is enabled by the default-off RCU_BOOST kernel parameter. The priority to which to boost preempted RCU readers is controlled by the RCU_BOOST_PRIO kernel parameter (defaulting to real-time priority 1) and the time to wait before boosting the readers blocking a given grace period is controlled by the RCU_BOOST_DELAY kernel parameter (defaulting to 500 milliseconds). Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2010-11-28 | Kill off a bunch of warning: ‘inline’ is not at beginning of declaration | Jesper Juhl | 2 | -3/+3
These warnings are spewed during a build of an 'allnoconfig' kernel (especially the ones from u64_stats_sync.h show up a lot) when building with -Wextra (which I often do). They are a) annoying, b) easy to get rid of. This patch kills them off.

  include/linux/u64_stats_sync.h:70:1: warning: ‘inline’ is not at beginning of declaration
  include/linux/u64_stats_sync.h:77:1: warning: ‘inline’ is not at beginning of declaration
  include/linux/u64_stats_sync.h:84:1: warning: ‘inline’ is not at beginning of declaration
  include/linux/u64_stats_sync.h:96:1: warning: ‘inline’ is not at beginning of declaration
  include/linux/u64_stats_sync.h:115:1: warning: ‘inline’ is not at beginning of declaration
  include/linux/u64_stats_sync.h:127:1: warning: ‘inline’ is not at beginning of declaration
  kernel/time.c:241:1: warning: ‘inline’ is not at beginning of declaration
  kernel/time.c:257:1: warning: ‘inline’ is not at beginning of declaration
  kernel/perf_event.c:4513:1: warning: ‘inline’ is not at beginning of declaration
  mm/page_alloc.c:4012:1: warning: ‘inline’ is not at beginning of declaration

Signed-off-by: Jesper Juhl <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2010-11-29 | security: Define CAP_SYSLOG | Serge E. Hallyn | 1 | -1/+7
Privileged syslog operations currently require CAP_SYS_ADMIN. Split this off into a new CAP_SYSLOG privilege which we can sanely take away from a container through the capability bounding set. With this patch, an lxc container can be prevented from messing with the host's syslog (i.e. dmesg -c).

Changelog:
  Mar 12 2010: add selinux capability2:cap_syslog perm
  Nov 22 2010: port to new kernel; add a WARN_ONCE if userspace isn't using CAP_SYSLOG

Signed-off-by: Serge Hallyn <[email protected]> Acked-by: Andrew G. Morgan <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: James Morris <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Stephen Smalley <[email protected]> Cc: "Christopher J. PeBenito" <[email protected]> Cc: Eric Paris <[email protected]> Signed-off-by: James Morris <[email protected]>
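The compatibility shim described in the changelog presumably looks something like this sketch (not the exact hunk):

	/*
	 * Prefer CAP_SYSLOG; tolerate legacy CAP_SYS_ADMIN callers for
	 * now, but warn once so userspace gets fixed.
	 */
	if (!capable(CAP_SYSLOG)) {
		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;
		WARN_ONCE(1, "Attempt to access syslog with CAP_SYS_ADMIN "
			     "but no CAP_SYSLOG (deprecated)\n");
	}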
2010-11-28 | Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -4/+20
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix the software context switch counter
  perf, x86: Fixup Kconfig deps
  x86, perf, nmi: Disable perf if counters are not accessible
  perf: Fix inherit vs. context rotation bug
2010-11-27 | Merge branch 'cleanup-bd_claim' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into for-2.6.38/core | Jens Axboe | 1 | -2/+3
2010-11-27 | Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -6/+6
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  posix-cpu-timers: Rcu_read_lock/unlock protect find_task_by_vpid call
2010-11-27 | Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 5 | -15/+92
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf symbols: Remove incorrect open-coded container_of()
  perf record: Handle restrictive permissions in /proc/{kallsyms,modules}
  x86/kprobes: Prevent kprobes to probe on save_args()
  irq_work: Drop cmpxchg() result
  perf: Fix owner-list vs exit
  x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG
  tracing: Fix recursive user stack trace
  perf,hw_breakpoint: Initialize hardware api earlier
  x86: Ignore trap bits on single step exceptions
  tracing: Force arch_local_irq_* notrace for paravirt
  tracing: Fix module use of trace_bprintk()
2010-11-27 | Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -5/+3
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix idle balancing
  sched: Fix volanomark performance regression
2010-11-26 | perf, arch: Cleanup perf-pmu init vs lockup-detector | Peter Zijlstra | 1 | -4/+3
The perf hardware pmu got initialized at various points in the boot, some before early_initcall(), some after (notably arch_initcall). The problem is that the NMI lockup detector is run from early_initcall() and expects the hardware pmu to be present. Sanitize this by moving all architecture hardware pmu implementations to initialize at early_initcall() and move the lockup detector to an explicit initcall right after that. Cc: paulus <[email protected]> Cc: davem <[email protected]> Cc: Michael Cree <[email protected]> Cc: Deng-Cheng Zhu <[email protected]> Acked-by: Paul Mundt <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <1290707759.2145.119.camel@laptop> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-26 | perf: Ignore non-sampling overflows | Peter Zijlstra | 1 | -0/+7
Some arch implementations call perf_event_overflow() by 'accident'; ignore such calls. Reported-by: Francis Moreau <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
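The guard is presumably a one-liner early in the overflow path, along these lines:

	/* Non-sampling events should never reach the overflow handler. */
	if (unlikely(!is_sampling_event(event)))
		return 0;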
2010-11-26 | perf: Don't bother to init the hrtimer for no SW sampling counters | Franck Bui-Huu | 1 | -11/+13
Signed-off-by: Franck Bui-Huu <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-26 | perf: Limit event refresh to sampling event | Franck Bui-Huu | 1 | -1/+1
Signed-off-by: Franck Bui-Huu <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-26 | perf: Introduce is_sampling_event() | Franck Bui-Huu | 1 | -5/+5
and use it when appropriate. Signed-off-by: Franck Bui-Huu <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
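The helper is almost certainly a thin predicate over the sample period, roughly:

	static inline bool is_sampling_event(struct perf_event *event)
	{
		return event->attr.sample_period != 0;
	}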
2010-11-26 | Merge branch 'perf/urgent' into perf/core | Ingo Molnar | 3 | -19/+81
Conflicts:
  arch/x86/kernel/apic/hw_nmi.c

Merge reason: Resolve conflict, queue up dependent patch.

Signed-off-by: Ingo Molnar <[email protected]>
2010-11-26 | sched: Remove unused argument dest_cpu to migrate_task() | Nikanth Karthikesan | 1 | -5/+3
Remove the unused argument 'dest_cpu' of migrate_task(), and pass the runqueue instead, as it is always known at the call site. Signed-off-by: Nikanth Karthikesan <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-26 | mutexes, sched: Introduce arch_mutex_cpu_relax() | Gerald Schaefer | 2 | -2/+3
The spinning mutex implementation uses cpu_relax() in busy loops as a compiler barrier. Depending on the architecture, cpu_relax() may do more than needed in these specific mutex spin loops. On System z we also give up the time slice of the virtual cpu in cpu_relax(), which prevents effective spinning on the mutex.

This patch replaces cpu_relax() in the spinning mutex code with arch_mutex_cpu_relax(), which can be defined by each architecture that selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so this patch should not affect architectures other than System z for now. Signed-off-by: Gerald Schaefer <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <1290437256.7455.4.camel@thinkpad> Signed-off-by: Ingo Molnar <[email protected]>
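The fallback definition sketched from the description above (the exact location of the hunk is an assumption):

	/* Architectures without HAVE_ARCH_MUTEX_CPU_RELAX keep plain cpu_relax(). */
	#ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
	#define arch_mutex_cpu_relax()	cpu_relax()
	#endif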
2010-11-26 | Merge commit 'v2.6.37-rc3' into sched/core | Ingo Molnar | 11 | -39/+84
Merge reason: Pick up latest fixes. Signed-off-by: Ingo Molnar <[email protected]>
2010-11-26 | Merge commit 'v2.6.37-rc3' into perf/core | Ingo Molnar | 11 | -39/+84
Merge reason: Pick up latest fixes. Signed-off-by: Ingo Molnar <[email protected]>
2010-11-26 | nohz: Fix printk_needs_cpu() return value on offline cpus | Heiko Carstens | 1 | -0/+2
This patch fixes a hang observed with 2.6.32 kernels where timers got enqueued on offline cpus.

printk_needs_cpu() may return 1 if called on offline cpus. When a cpu gets offlined it schedules the idle process which, before killing its own cpu, will call tick_nohz_stop_sched_tick(). That function in turn will call printk_needs_cpu() in order to check if the local tick can be disabled. On offline cpus this function should naturally return 0, since regardless of whether the tick gets disabled or not the cpu will be dead shortly after. That is besides the fact that __cpu_disable() should already have made sure that no interrupts on the offlined cpu will be delivered anyway.

In this case it prevents tick_nohz_stop_sched_tick() from calling select_nohz_load_balancer(). No idea if that really is a problem. However what made me debug this is that on 2.6.32 the function get_nohz_load_balancer() is used within __mod_timer() to select a cpu on which a timer gets enqueued. If printk_needs_cpu() returns 1 then the nohz_load_balancer cpu doesn't get updated when a cpu gets offlined. It may contain the cpu number of an offline cpu. In turn timers get enqueued on an offline cpu and, not very surprisingly, they never expire and cause system hangs. This has been observed with 2.6.32 kernels. On current kernels __mod_timer() uses get_nohz_timer_target() which doesn't have that problem. However there might be other problems because of the too-early exit from tick_nohz_stop_sched_tick() in case a cpu goes offline.

The easiest way to fix this is just to test if the current cpu is offline and call printk_tick() directly, which clears the condition. Alternatively I tried a cpu hotplug notifier which would clear the condition; however, between calling the notifier function and printk_needs_cpu() something could have called printk() again and the problem is back again. This seems to be the safest fix.

Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
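A sketch of the described fix (names follow the commit text; the exact body is an assumption):

	/*
	 * An offline cpu clears its own pending state via printk_tick()
	 * and thus never claims to be needed.
	 */
	int printk_needs_cpu(int cpu)
	{
		if (cpu_is_offline(cpu))
			printk_tick();
		return __this_cpu_read(printk_pending);
	}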
2010-11-26 | printk: Fix wake_up_klogd() vs cpu hotplug | Heiko Carstens | 1 | -1/+1
wake_up_klogd() may get called from preemptible context but uses __raw_get_cpu_var() to write to a per cpu variable. If it gets preempted between getting the address and writing to it, the cpu in question could go offline before the process gets scheduled back, which would then write to the per cpu data of an offline cpu. This buggy behaviour was introduced with fa33507a "printk: robustify printk, fix #2", which was supposed to fix a "using smp_processor_id() in preemptible" warning. Let's use this_cpu_write() instead, which disables preemption and makes sure that the outlined scenario cannot happen. Signed-off-by: Heiko Carstens <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
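After the fix, the function presumably reduces to something like:

	void wake_up_klogd(void)
	{
		if (waitqueue_active(&log_wait))
			this_cpu_write(printk_pending, 1);	/* preemption-safe */
	}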
2010-11-26 | perf: Fix the software context switch counter | Peter Zijlstra | 1 | -2/+0
Stephane noticed that because the perf_sw_event() call is inside the perf_event_task_sched_out() call, it won't get called unless we have a per-task counter. Reported-by: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-26 | perf: Fix inherit vs. context rotation bug | Thomas Gleixner | 1 | -2/+20
It was found that sometimes children of tasks with inherited events had one extra event. Eventually it turned out to be due to the list rotation not being exclusive with the list iteration in the inheritance code. Cure this by temporarily disabling the rotation while we inherit the events. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Cc: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-26 | workqueue: check the allocation of system_unbound_wq | Hitoshi Mitake | 1 | -1/+2
I found a trivial bug in the initialization of workqueues: the current init_workqueues() doesn't check the result of the allocation of system_unbound_wq; this should be checked like the other queues. Signed-off-by: Hitoshi Mitake <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: David Howells <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
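The check is presumably a one-line extension of the existing sanity test, roughly:

	/* Fail loudly at boot if any system workqueue failed to allocate. */
	BUG_ON(!system_wq || !system_long_wq || !system_nrt_wq ||
	       !system_unbound_wq);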
2010-11-23 | sched: Add some clock info to sched_debug | Peter Zijlstra | 2 | -5/+28
Add more clock information to /proc/sched_debug; Thomas wanted to see the sched_clock_stable state. Requested-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-23 | cpu: Remove incorrect BUG_ON | Peter Zijlstra | 1 | -1/+4
Oleg mentioned that there is no actual guarantee the dying cpu's migration thread is actually finished running when we get there, so replace the BUG_ON() with a spinloop waiting for it. Reported-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-23 | cpu: Remove unused variable | Dhaval Giani | 1 | -1/+0
GCC warns us about:

  kernel/cpu.c: In function ‘take_cpu_down’:
  kernel/cpu.c:200:15: warning: unused variable ‘cpu’

This variable is unused since param->hcpu is directly used later on in cpu_notify. Signed-off-by: Dhaval Giani <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-23 | sched: Fix UP build breakage | Peter Zijlstra | 1 | -1/+1
The recent cgroup-scheduling rework caused a UP build problem. Cc: Paul Turner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-23 | sched: Make task dump print all 15 chars of proc comm | Erik Gilling | 1 | -1/+1
Signed-off-by: Erik Gilling <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-19 | Revert "kernel: make /proc/kallsyms mode 400 to reduce ease of attacking" | Linus Torvalds | 1 | -1/+1
This reverts commit 59365d136d205cc20fe666ca7f89b1c5001b0d5a. It turns out that this can break certain existing user land setups. Quoth Sarah Sharp:

"On Wednesday, I updated my branch to commit 460781b from linus' tree, and my box would not boot. klogd segfaulted, which stalled the whole system. At first I thought it actually hung the box, but it continued booting after 5 minutes, and I was able to log in. It dropped back to the text console instead of the graphical bootup display for that period of time. dmesg surprisingly still works.

I've bisected the problem down to this commit (commit 59365d136d205cc20fe666ca7f89b1c5001b0d5a). The box is running klogd 1.5.5ubuntu3 (from Jaunty). Yes, I know that's old. I read the bit in the commit about changing the permissions of kallsyms after boot, but if I can't boot that doesn't help."

So let's just keep the old default, and encourage distributions to do the "chmod -r /proc/kallsyms" in their bootup scripts. This is not worth a kernel option to change default behavior, since it's so easily done in user space.

Reported-and-bisected-by: Sarah Sharp <[email protected]> Cc: Marcus Meissner <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Eugene Teo <[email protected]> Cc: Jesper Juhl <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-11-19 | tracing/events: Show real number in array fields | Steven Rostedt | 2 | -4/+16
Currently we have, in something like the sched_switch event:

  field:char prev_comm[TASK_COMM_LEN]; offset:12; size:16; signed:1;

When a userspace tool such as perf tries to parse this, the TASK_COMM_LEN is meaningless. This happens because the TRACE_EVENT() macro simply stringifies the length with #len, so the emitted format shows the token rather than its value; when the length is an enum, we get a string that means nothing to tools. By adding a static buffer and a mutex to protect it, we can store the string into that buffer with snprintf and show the actual number. Now we get:

  field:char prev_comm[16]; offset:12; size:16; signed:1;

Something much more useful. Signed-off-by: Steven Rostedt <[email protected]>
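A sketch of the scheme, with illustrative names:

	static char len_buf[16];
	static DEFINE_MUTEX(len_buf_mutex);

	/*
	 * Caller must hold len_buf_mutex while it uses the returned string,
	 * so concurrent format reads don't clobber each other.
	 */
	static const char *stringify_array_len(int len)
	{
		snprintf(len_buf, sizeof(len_buf), "%d", len);
		return len_buf;
	}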
2010-11-18 | Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core | Ingo Molnar | 2 | -10/+30
2010-11-18 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb | Linus Torvalds | 1 | -10/+11
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kgdb,ppc: Fix regression in evr register handling
  kgdb,x86: fix regression in detach handling
  kdb: fix crash when KDB_BASE_CMD_MAX is exceeded
  kdb: fix memory leak in kdb_main.c
2010-11-18 | tracing: New flag to allow non privileged users to use a trace event | Frederic Weisbecker | 2 | -10/+30
This adds a new internal flag for trace events that allows them to be used in perf by non-privileged users in the case of task-bound tracing. This is desired for syscall tracepoints, because, unlike some other tracepoints, they don't leak global system information. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Li Zefan <[email protected]> Cc: Jason Baron <[email protected]>
2010-11-18 | x86: Add RO/NX protection for loadable kernel modules | matthieu castet | 1 | -2/+169
This patch is a logical extension of the protection provided by CONFIG_DEBUG_RODATA to LKMs. The protection is provided by splitting module_core and module_init into three logical parts each and setting appropriate page access permissions for each individual section:

  1. Code: RO+X
  2. RO data: RO+NX
  3. RW data: RW+NX

In order to achieve proper protection, layout_sections() has been modified to align each of the three parts mentioned above onto page boundaries. Next, the corresponding page access permissions are set right before successful exit from load_module(). Further, free_module() and sys_init_module have been modified to set module_core and module_init as RW+NX right before calling module_free().

By default, the original section layout and access flags are preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y, the patch will page-align each group of sections to ensure that each page contains only one type of content and will enforce RO/NX for each group of pages.

  -v1: Initial proof-of-concept patch.
  -v2: The patch has been re-written to reduce the number of #ifdefs and to make it architecture-agnostic. Code formatting has also been corrected.
  -v3: Opportunistic RO/NX protection is now unconditional. Section page-alignment is enabled when CONFIG_DEBUG_RODATA=y.
  -v4: Removed most macros and improved coding style.
  -v5: Changed page-alignment and RO/NX section size calculation
  -v6: Fixed comments. Restricted RO/NX enforcement to x86 only
  -v7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added calls to set_all_modules_text_rw() and set_all_modules_text_ro() in ftrace
  -v8: updated for compatibility with linux 2.6.33-rc5
  -v9: coding style fixes
  -v10: more coding style fixes
  -v11: minor adjustments for -tip
  -v12: minor adjustments for v2.6.35-rc2-tip
  -v13: minor adjustments for v2.6.37-rc1-tip

Signed-off-by: Siarhei Liakh <[email protected]> Signed-off-by: Xuxian Jiang <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Reviewed-by: James Morris <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Dave Jones <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> LKML-Reference: <[email protected]> [ minor cleanliness edits, -v14: build failure fix ] Signed-off-by: Ingo Molnar <[email protected]>
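A sketch of the post-load protection step, assuming the three page-aligned groups are laid out as [code | RO data | RW data] from base (helper name and arguments are illustrative):

	static void set_section_ro_nx(void *base, unsigned long text_size,
				      unsigned long ro_size, unsigned long total_size)
	{
		/* code and RO data (the first ro_size bytes): read-only */
		set_memory_ro((unsigned long)base, PFN_UP(ro_size));
		/* everything after the code: non-executable */
		set_memory_nx((unsigned long)base + text_size,
			      PFN_UP(total_size - text_size));
	}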
2010-11-18 | sched: Update tg->shares after cpu.shares write | Paul Turner | 1 | -31/+11
Formerly sched_group_set_shares() would force a rebalance by overflowing domain share sums. Now that per-cpu averages are maintained, we can set the true value by issuing an update_cfs_shares() following a tg->shares update. Also initialize tg se->load to 0 for consistency since we'll now set correct weights on enqueue. Signed-off-by: Paul Turner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-18 | sched: Allow update_cfs_load() to update global load | Paul Turner | 1 | -15/+29
Refactor the global load updates from update_shares_cpu() so that update_cfs_load() can update global load when it is more than ~10% out of sync. The new global_load parameter allows us to force an update, regardless of the error factor, so that we can synchronize with update_shares(). Signed-off-by: Paul Turner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-18 | sched: Implement demand based update_cfs_load() | Paul Turner | 2 | -1/+20
When the system is busy, dilation of rq->next_balance makes lb->update_shares() insufficiently frequent for threads which don't sleep (no dequeue/enqueue updates). Adjust for this by making demand-based updates, triggered once the accumulated execution time is sufficient to wrap our averaging window. Signed-off-by: Paul Turner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-18 | sched: Update shares on idle_balance | Paul Turner | 1 | -0/+1
Since shares updates are no longer expensive and effectively local, update them at idle_balance(). This allows us to more quickly redistribute shares to another cpu when our load becomes idle. Signed-off-by: Paul Turner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-11-18 | sched: Add sysctl_sched_shares_window | Paul Turner | 2 | -1/+15
Introduce a new sysctl for the shares window and disambiguate it from sched_time_avg. A 10ms window appears to be a good compromise between accuracy and performance. Signed-off-by: Paul Turner <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>