aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2010-02-22kernel/sys.c: fix missing rcu protection for sys_getpriority()Tetsuo Handa1-0/+2
find_task_by_vpid() is not safe without rcu_read_lock(). 2.6.33-rc7 got RCU protection for sys_setpriority() but missed it for sys_getpriority(). Signed-off-by: Tetsuo Handa <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Acked-by: Serge Hallyn <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-02-22Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds1-6/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf probe: Init struct probe_point and set counter correctly hw-breakpoint: Keep track of dr7 local enable bits hw-breakpoints: Accept breakpoints on NULL address perf_events: Fix FORK events
2010-02-16kfifo: Don't use integer as NULL pointerAnton Vorontsov1-1/+1
This patch fixes following sparse warnings: include/linux/kfifo.h:127:25: warning: Using plain integer as NULL pointer kernel/kfifo.c:83:21: warning: Using plain integer as NULL pointer Signed-off-by: Anton Vorontsov <[email protected]> Acked-by: Stefani Seibold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2010-02-16kfifo: Make kfifo_initialized work after kfifo_freeAnton Vorontsov1-0/+1
After kfifo rework it's no longer possible to reliably know if kfifo is usable, since after kfifo_free(), kfifo_initialized() would still return true. The correct behaviour is needed for at least FHCI USB driver. This patch fixes the issue by resetting the kfifo to zero values (the same approach is used in kfifo_alloc() if allocation failed). Signed-off-by: Anton Vorontsov <[email protected]> Acked-by: Stefani Seibold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2010-02-15Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds1-10/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer, softirq: Fix hrtimer->softirq trampoline
2010-02-15Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds2-1/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing/kprobes: Fix probe parsing tracing: Fix circular dead lock in stack trace
2010-02-15Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf top: Fix help text alignment perf: Fix hypervisor sample reporting perf: Make bp_len type to u64 generic across the arch
2010-02-14perf_events: Fix FORK eventsPeter Zijlstra1-6/+5
Commit 22e19085 ("Honour event state for aux stream data") introduced a bug where we would drop FORK events. The thing is that we deliver FORK events to the child process' event, which at that time will be PERF_EVENT_STATE_INACTIVE because the child won't be scheduled in (we're in the middle of fork). Solve this twice, change the event state filter to exclude only disabled (STATE_OFF) or worse, and deliver FORK events to the current (parent). Signed-off-by: Peter Zijlstra <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> LKML-Reference: <1266142324.5273.411.camel@laptop> Signed-off-by: Ingo Molnar <[email protected]>
2010-02-14tracing/kprobes: Fix probe parsingHeiko Carstens1-1/+1
Trying to add a probe like: echo p:myprobe 0x10000 > /sys/kernel/debug/tracing/kprobe_events will fail since the wrong pointer is passed to strict_strtoul when trying to convert the address to an unsigned long. Signed-off-by: Heiko Carstens <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-02-09Export the symbol of getboottime and mmonotonic_to_bootbasedJason Wang1-0/+2
Export getboottime and monotonic_to_bootbased in order to let them could be used by following patch. Cc: [email protected] Signed-off-by: Jason Wang <[email protected]> Signed-off-by: Marcelo Tosatti <[email protected]>
2010-02-04Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds3-6/+45
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Handle futex value corruption gracefully futex: Handle user space corruption gracefully futex_lock_pi() key refcnt fix softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume
2010-02-04perf: Make bp_len type to u64 generic across the archMahesh Salgaonkar2-2/+2
Change 'bp_len' type to __u64 to make it work across archs as the s390 architecture watch point length can be upto 2^64. reference: http://lkml.org/lkml/2010/1/25/212 This is an ABI change that is not backward compatible with the previous hardware breakpoint info layout integrated in this development cycle, a rebuilt of perf tools is necessary for versions based on 2.6.33-rc1 - 2.6.33-rc6 to work with a kernel based on this patch. Signed-off-by: Mahesh Salgaonkar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: "K. Prasad" <[email protected]> Cc: Maneesh Soni <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Martin <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2010-02-03hrtimer, softirq: Fix hrtimer->softirq trampolinePeter Zijlstra1-10/+5
hrtimers callbacks are always done from hardirq context, either the jiffy tick interrupt or the hrtimer device interrupt. [ there is currently one exception that can still call a hrtimer callback from softirq, but even in that case this will still work correctly. ] Reported-by: Wei Yongjun <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Yury Polyanskiy <[email protected]> Tested-by: Wei Yongjun <[email protected]> Acked-by: David S. Miller <[email protected]> LKML-Reference: <1265120401.24455.306.camel@laptop> Signed-off-by: Thomas Gleixner <[email protected]>
2010-02-03futex: Handle futex value corruption gracefullyThomas Gleixner1-2/+19
The WARN_ON in lookup_pi_state which complains about a mismatch between pi_state->owner->pid and the pid which we retrieved from the user space futex is completely bogus. The code just emits the warning and then continues despite the fact that it detected an inconsistent state of the futex. A conveniant way for user space to spam the syslog. Replace the WARN_ON by a consistency check. If the values do not match return -EINVAL and let user space deal with the mess it created. This also fixes the missing task_pid_vnr() when we compare the pi_state->owner pid with the futex value. Reported-by: Jermome Marchand <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Darren Hart <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: <[email protected]>
2010-02-03futex: Handle user space corruption gracefullyThomas Gleixner1-0/+7
If the owner of a PI futex dies we fix up the pi_state and set pi_state->owner to NULL. When a malicious or just sloppy programmed user space application sets the futex value to 0 e.g. by calling pthread_mutex_init(), then the futex can be acquired again. A new waiter manages to enqueue itself on the pi_state w/o damage, but on unlock the kernel dereferences pi_state->owner and oopses. Prevent this by checking pi_state->owner in the unlock path. If pi_state->owner is not current we know that user space manipulated the futex value. Ignore the mess and return -EINVAL. This catches the above case and also the case where a task hijacks the futex by setting the tid value and then tries to unlock it. Reported-by: Jermome Marchand <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Darren Hart <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: <[email protected]>
2010-02-03futex_lock_pi() key refcnt fixMikael Pettersson1-1/+1
This fixes a futex key reference count bug in futex_lock_pi(), where a key's reference count is incremented twice but decremented only once, causing the backing object to not be released. If the futex is created in a temporary file in an ext3 file system, this bug causes the file's inode to become an "undead" orphan, which causes an oops from a BUG_ON() in ext3_put_super() when the file system is unmounted. glibc's test suite is known to trigger this, see <http://bugzilla.kernel.org/show_bug.cgi?id=14256>. The bug is a regression from 2.6.28-git3, namely Peter Zijlstra's 38d47c1b7075bd7ec3881141bb3629da58f88dab "[PATCH] futex: rely on get_user_pages() for shared futexes". That commit made get_futex_key() also increment the reference count of the futex key, and updated its callers to decrement the key's reference count before returning. Unfortunately the normal exit path in futex_lock_pi() wasn't corrected: the reference count is incremented by get_futex_key() and queue_lock(), but the normal exit path only decrements once, via unqueue_me_pi(). The fix is to put_futex_key() after unqueue_me_pi(), since 2.6.31 this is easily done by 'goto out_put_key' rather than 'goto out'. Signed-off-by: Mikael Pettersson <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Darren Hart <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: <[email protected]>
2010-02-02Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: kernel/cred.c: use kmem_cache_free
2010-02-02cgroups: fix to return errno in a failure pathLi Zefan1-2/+5
In cgroup_create(), if alloc_css_id() returns failure, the errno is not propagated to userspace, so mkdir will fail silently. To trigger this bug, we mount blkio (or memory subsystem), and create more then 65534 cgroups. (The number of cgroups is limited to 65535 if a subsystem has use_id == 1) # mount -t cgroup -o blkio xxx /mnt # for ((i = 0; i < 65534; i++)); do mkdir /mnt/$i; done # mkdir /mnt/65534 (should return ENOSPC) # Signed-off-by: Li Zefan <[email protected]> Acked-by: Serge Hallyn <[email protected]> Acked-by: Paul Menage <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-02-02kfifo: fix kernel-doc notationRandy Dunlap1-1/+2
Fix kfifo kernel-doc warnings: Warning(kernel/kfifo.c:361): No description found for parameter 'total' Warning(kernel/kfifo.c:402): bad line: @ @lenout: pointer to output variable with copied data Warning(kernel/kfifo.c:412): No description found for parameter 'lenout' Signed-off-by: Randy Dunlap <[email protected]> Cc: Stefani Seibold <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-02-03kernel/cred.c: use kmem_cache_freeJulia Lawall1-1/+1
Free memory allocated using kmem_cache_zalloc using kmem_cache_free rather than kfree. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x,E,c; @@ x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...) ... when != x = E when != &x ?-kfree(x) +kmem_cache_free(c,x) // </smpl> Signed-off-by: Julia Lawall <[email protected]> Acked-by: David Howells <[email protected]> Cc: James Morris <[email protected]> Cc: Steve Dickson <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: James Morris <[email protected]>
2010-02-02tracing: Fix circular dead lock in stack traceLai Jiangshan1-0/+24
When we cat <debugfs>/tracing/stack_trace, we may cause circular lock: sys_read() t_start() arch_spin_lock(&max_stack_lock); t_show() seq_printf(), vsnprintf() .... /* they are all trace-able, when they are traced, max_stack_lock may be required again. */ The following script can trigger this circular dead lock very easy: #!/bin/bash echo 1 > /proc/sys/kernel/stack_tracer_enabled mount -t debugfs xxx /mnt > /dev/null 2>&1 ( # make check_stack() zealous to require max_stack_lock for ((; ;)) { echo 1 > /mnt/tracing/stack_max_size } ) & for ((; ;)) { cat /mnt/tracing/stack_trace > /dev/null } To fix this bug, we increase the percpu trace_active before require the lock. Reported-by: Li Zefan <[email protected]> Signed-off-by: Lai Jiangshan <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2010-02-01Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: lockdep: Fix check_usage_backwards() error message
2010-02-01Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2-10/+49
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, hw_breakpoint, kgdb: Do not take mutex for kernel debugger x86, hw_breakpoints, kgdb: Fix kgdb to use hw_breakpoint API hw_breakpoints: Release the bp slot if arch_validate_hwbkpt_settings() fails. perf: Ignore perf.data.old perf report: Fix segmentation fault when running with '-g none'
2010-02-01Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds3-32/+32
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Correct printk whitespace in warning from cpu down task check sched: Fix incorrect sanity check sched: Fix fork vs hotplug vs cpuset namespaces
2010-02-01Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds1-3/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource: Prevent potential kgdb dead lock
2010-02-01softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resumeJason Wessel2-3/+18
When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, sched_clock() gets the time from hardware such as the TSC on x86. In this configuration kgdb will report a softlock warning message on resuming or detaching from a debug session. Sequence of events in the problem case: 1) "cpu sched clock" and "hardware time" are at 100 sec prior to a call to kgdb_handle_exception() 2) Debugger waits in kgdb_handle_exception() for 80 sec and on exit the following is called ... touch_softlockup_watchdog() --> __raw_get_cpu_var(touch_timestamp) = 0; 3) "cpu sched clock" = 100s (it was not updated, because the interrupt was disabled in kgdb) but the "hardware time" = 180 sec 4) The first timer interrupt after resuming from kgdb_handle_exception updates the watchdog from the "cpu sched clock" update_process_times() { ... run_local_timers() --> softlockup_tick() --> check (touch_timestamp == 0) (it is "YES" here, we have set "touch_timestamp = 0" at kgdb) --> __touch_softlockup_watchdog() ***(A)--> reset "touch_timestamp" to "get_timestamp()" (Here, the "touch_timestamp" will still be set to 100s.) ... scheduler_tick() ***(B)--> sched_clock_tick() (update "cpu sched clock" to "hardware time" = 180s) ... } 5) The Second timer interrupt handler appears to have a large jump and trips the softlockup warning. update_process_times() { ... run_local_timers() --> softlockup_tick() --> "cpu sched clock" - "touch_timestamp" = 180s-100s > 60s --> printk "soft lockup error messages" ... } note: ***(A) reset "touch_timestamp" to "get_timestamp(this_cpu)" Why is "touch_timestamp" 100 sec, instead of 180 sec? When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, the call trace of get_timestamp() is: get_timestamp(this_cpu) -->cpu_clock(this_cpu) -->sched_clock_cpu(this_cpu) -->__update_sched_clock(sched_clock_data, now) The __update_sched_clock() function uses the GTOD tick value to create a window to normalize the "now" values. So if "now" value is too big for sched_clock_data, it will be ignored. The fix is to invoke sched_clock_tick() to update "cpu sched clock" in order to recover from this state. This is done by introducing the function touch_softlockup_watchdog_sync(). This allows kgdb to request that the sched clock is updated when the watchdog thread runs the first time after a resume from kgdb. [[email protected]: Use per cpu instead of an array] Signed-off-by: Jason Wessel <[email protected]> Signed-off-by: Dongdong Deng <[email protected]> Cc: [email protected] Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-30perf, hw_breakpoint, kgdb: Do not take mutex for kernel debuggerJason Wessel1-10/+42
This patch fixes the regression in functionality where the kernel debugger and the perf API do not nicely share hw breakpoint reservations. The kernel debugger cannot use any mutex_lock() calls because it can start the kernel running from an invalid context. A mutex free version of the reservation API needed to get created for the kernel debugger to safely update hw breakpoint reservations. The possibility for a breakpoint reservation to be concurrently processed at the time that kgdb interrupts the system is improbable. Should this corner case occur the end user is warned, and the kernel debugger will prohibit updating the hardware breakpoint reservations. Any time the kernel debugger reserves a hardware breakpoint it will be a system wide reservation. Signed-off-by: Jason Wessel <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: [email protected] Cc: K.Prasad <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Alan Stern <[email protected]> Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-30x86, hw_breakpoints, kgdb: Fix kgdb to use hw_breakpoint APIJason Wessel1-0/+3
In the 2.6.33 kernel, the hw_breakpoint API is now used for the performance event counters. The hw_breakpoint_handler() now consumes the hw breakpoints that were previously set by kgdb arch specific code. In order for kgdb to work in conjunction with this core API change, kgdb must use some of the low level functions of the hw_breakpoint API to install, uninstall, and deal with hw breakpoint reservations. The kgdb core required a change to call kgdb_disable_hw_debug anytime a slave cpu enters kgdb_wait() in order to keep all the hw breakpoints in sync as well as to prevent hitting a hw breakpoint while kgdb is active. During the architecture specific initialization of kgdb, it will pre-allocate 4 disabled (struct perf event **) structures. Kgdb will use these to manage the capabilities for the 4 hw breakpoint registers, per cpu. Right now the hw_breakpoint API does not have a way to ask how many breakpoints are available, on each CPU so it is possible that the install of a breakpoint might fail when kgdb restores the system to the run state. The intent of this patch is to first get the basic functionality of hw breakpoints working and leave it to the person debugging the kernel to understand what hw breakpoints are in use and what restrictions have been imposed as a result. Breakpoint constraints will be dealt with in a future patch. While atomic, the x86 specific kgdb code will call arch_uninstall_hw_breakpoint() and arch_install_hw_breakpoint() to manage the cpu specific hw breakpoints. The net result of these changes allow kgdb to use the same pool of hw_breakpoints that are used by the perf event API, but neither knows about future reservations for the available hw breakpoint slots. Signed-off-by: Jason Wessel <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: [email protected] Cc: K.Prasad <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Alan Stern <[email protected]> Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-28hw_breakpoints: Release the bp slot if arch_validate_hwbkpt_settings() fails.Mahesh Salgaonkar1-0/+4
On a given architecture, when hardware breakpoint registration fails due to un-supported access type (read/write/execute), we lose the bp slot since register_perf_hw_breakpoint() does not release the bp slot on failure. Hence, any subsequent hardware breakpoint registration starts failing with 'no space left on device' error. This patch introduces error handling in register_perf_hw_breakpoint() function and releases bp slot on error. Signed-off-by: Mahesh Salgaonkar <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: K. Prasad <[email protected]> Cc: Maneesh Soni <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2010-01-28sched: Correct printk whitespace in warning from cpu down task checkFrans Pop1-4/+4
Due to an incorrect line break the output currently contains tabs. Also remove trailing space. The actual output that logcheck sent me looked like this: Task events/1 (pid = 10) is on cpu 1^I^I^I^I(state = 1, flags = 84208040) After this patch it becomes: Task events/1 (pid = 10) is on cpu 1 (state = 1, flags = 84208040) Signed-off-by: Frans Pop <elendilplanet.nl> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-28sched: Fix incorrect sanity checkPeter Zijlstra1-1/+1
We moved to migrate on wakeup, which means that sleeping tasks could still be present on offline cpus. Amend the check to only test running tasks. Reported-by: Heiko Carstens <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-27lockdep: Fix check_usage_backwards() error messageOleg Nesterov1-1/+1
Lockdep has found the real bug, but the output doesn't look right to me: > ========================================================= > [ INFO: possible irq lock inversion dependency detected ] > 2.6.33-rc5 #77 > --------------------------------------------------------- > emacs/1609 just changed the state of lock: > (&(&tty->ctrl_lock)->rlock){+.....}, at: [<ffffffff8127c648>] tty_fasync+0xe8/0x190 > but this lock took another, HARDIRQ-unsafe lock in the past: > (&(&sighand->siglock)->rlock){-.....} "HARDIRQ-unsafe" and "this lock took another" looks wrong, afaics. > ... key at: [<ffffffff81c054a4>] __key.46539+0x0/0x8 > ... acquired at: > [<ffffffff81089af6>] __lock_acquire+0x1056/0x15a0 > [<ffffffff8108a0df>] lock_acquire+0x9f/0x120 > [<ffffffff81423012>] _raw_spin_lock_irqsave+0x52/0x90 > [<ffffffff8127c1be>] __proc_set_tty+0x3e/0x150 > [<ffffffff8127e01d>] tty_open+0x51d/0x5e0 The stack-trace shows that this lock (ctrl_lock) was taken under ->siglock (which is hopefully irq-safe). This is a clear typo in check_usage_backwards() where we tell the print a fancy routine we're forwards. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-26tracing/documentation: Cover new frame pointer semanticsMike Frysinger1-3/+1
Update the graph tracer examples to cover the new frame pointer semantics (in terms of passing it along). Move the HAVE_FUNCTION_GRAPH_FP_TEST docs out of the Kconfig, into the right place, and expand on the details. Signed-off-by: Mike Frysinger <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2010-01-26ring-buffer: Check for end of page in iteratorSteven Rostedt1-3/+8
If the iterator comes to an empty page for some reason, or if the page is emptied by a consuming read. The iterator code currently does not check if the iterator is pass the contents, and may return a false entry. This patch adds a check to the ring buffer iterator to test if the current page has been completely read and sets the iterator to the next page if necessary. Signed-off-by: Steven Rostedt <[email protected]>
2010-01-26ring-buffer: Check if ring buffer iterator has stale dataSteven Rostedt1-0/+13
Usually reads of the ring buffer is performed by a single task. There are two types of reads from the ring buffer. One is a consuming read which will consume the entry that was read and the next read will be the entry that follows. The other is an iterator that will let the user read the contents of the ring buffer without modifying it. When an iterator is allocated, writes to the ring buffer are disabled to protect the iterator. The problem exists when consuming reads happen while an iterator is allocated. Specifically, the kind of read that swaps out an entire page (used by splice) and replaces it with a new read. If the iterator is on the page that is swapped out, then the next read may read from this swapped out page and return garbage. This patch adds a check when reading the iterator to make sure that the iterator contents are still valid. If a consuming read has taken place, the iterator is reset. Signed-off-by: Steven Rostedt <[email protected]>
2010-01-26clocksource: Prevent potential kgdb dead lockThomas Gleixner1-3/+15
commit 0f8e8ef7 (clocksource: Simplify clocksource watchdog resume logic) introduced a potential kgdb dead lock. When the kernel is stopped by kgdb inside code which holds watchdog_lock then kgdb dead locks in clocksource_resume_watchdog(). clocksource_resume_watchdog() is called from kbdg via clocksource_touch_watchdog() to avoid that the clock source watchdog marks TSC unstable after the kernel has been stopped. Solve this by replacing spin_lock with a spin_trylock and just return in case the lock is held. Not resetting the watchdog might result in TSC becoming marked unstable, but that's an acceptable penalty for using kgdb. The timekeeping is anyway easily screwed up by kgdb when the system uses either jiffies or a clock source which wraps in short intervals (e.g. pm_timer wraps about every 4.6s), so we really do not have to worry about that occasional TSC marked unstable side effect. The second caller of clocksource_resume_watchdog() is clocksource_resume(). The trylock is safe here as well because the system is UP at this point, interrupts are disabled and nothing else can hold watchdog_lock(). Reported-by: Jason Wessel <[email protected]> LKML-Reference: <[email protected]> Cc: [email protected] Cc: Martin Schwidefsky <[email protected]> Cc: John Stultz <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2010-01-25tracing: Prevent kernel oops with corrupted bufferSteven Rostedt1-0/+5
If the contents of the ftrace ring buffer gets corrupted and the trace file is read, it could create a kernel oops (usualy just killing the user task thread). This is caused by the checking of the pid in the buffer. If the pid is negative, it still references the cmdline cache array, which could point to an invalid address. The simple fix is to test for negative PIDs. Signed-off-by: Steven Rostedt <[email protected]>
2010-01-24Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clockevent: Don't remove broadcast device when cpu is dead
2010-01-24Merge git://git.infradead.org/~dwmw2/mtd-2.6.33Linus Torvalds3-1/+7
* git://git.infradead.org/~dwmw2/mtd-2.6.33: mtd: tests: fix read, speed and stress tests on NOR flash mtd: Really add ARM pismo support kmsg_dump: Dump on crash_kexec as well
2010-01-21sched: Fix fork vs hotplug vs cpuset namespacesPeter Zijlstra2-27/+27
There are a number of issues: 1) TASK_WAKING vs cgroup_clone (cpusets) copy_process(): sched_fork() child->state = TASK_WAKING; /* waiting for wake_up_new_task() */ if (current->nsproxy != p->nsproxy) ns_cgroup_clone() cgroup_clone() mutex_lock(inode->i_mutex) mutex_lock(cgroup_mutex) cgroup_attach_task() ss->can_attach() ss->attach() [ -> cpuset_attach() ] cpuset_attach_task() set_cpus_allowed_ptr(); while (child->state == TASK_WAKING) cpu_relax(); will deadlock the system. 2) cgroup_clone (cpusets) vs copy_process So even if the above would work we still have: copy_process(): if (current->nsproxy != p->nsproxy) ns_cgroup_clone() cgroup_clone() mutex_lock(inode->i_mutex) mutex_lock(cgroup_mutex) cgroup_attach_task() ss->can_attach() ss->attach() [ -> cpuset_attach() ] cpuset_attach_task() set_cpus_allowed_ptr(); ... p->cpus_allowed = current->cpus_allowed over-writing the modified cpus_allowed. 3) fork() vs hotplug if we unplug the child's cpu after the sanity check when the child gets attached to the task_list but before wake_up_new_task() shit will meet with fan. Solve all these issues by moving fork cpu selection into wake_up_new_task(). Reported-by: Serge E. Hallyn <[email protected]> Tested-by: Serge E. Hallyn <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <1264106190.4283.1314.camel@laptop> Signed-off-by: Thomas Gleixner <[email protected]>
2010-01-21Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2-2/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: x86: Add support for the ANY bit perf: Change the is_software_event() definition perf: Honour event state for aux stream data perf: Fix perf_event_do_pending() fallback callsite perf kmem: Print usage help for unknown commands perf kmem: Increase "Hit" column length hw-breakpoints, perf: Fix broken mmiotrace due to dr6 by reference change perf timechart: Use tid not pid for COMM change
2010-01-21perf: Honour event state for aux stream dataPeter Zijlstra1-0/+9
Anton reported that perf record kept receiving events even after calling ioctl(PERF_EVENT_IOC_DISABLE). It turns out that FORK,COMM and MMAP events didn't respect the disabled state and kept flowing in. Reported-by: Anton Blanchard <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Tested-by: Anton Blanchard <[email protected]> LKML-Reference: <1263459187.4244.265.camel@laptop> CC: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2010-01-21perf: Fix perf_event_do_pending() fallback callsitePeter Zijlstra1-2/+1
Paul questioned the context in which we should call perf_event_do_pending(). After looking at that I found that it should be called from IRQ context these days, however the fallback call-site is placed in softirq context. Ammend this by placing the callback in the IRQ timer path. Reported-by: Paul Mackerras <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <1263374859.4244.192.camel@laptop> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-21sched: Reassign prev and switch_count when reacquire_kernel_lock() failYong Zhang1-1/+4
Assume A->B schedule is processing, if B have acquired BKL before and it need reschedule this time. Then on B's context, it will go to need_resched_nonpreemptible for reschedule. But at this time, prev and switch_count are related to A. It's wrong and will lead to incorrect scheduler statistics. Signed-off-by: Yong Zhang <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-21sched: Fix vmark regression on big machinesMike Galbraith1-1/+1
SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't enabled, leading to many cache misses on large machines as we traverse looking for an idle shared cache to wake to. Change the enabler of select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the sibling domain level. Reported-by: Lin Ming <[email protected]> Signed-off-by: Mike Galbraith <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-01-18clockevent: Don't remove broadcast device when cpu is deadXiaotian Feng1-1/+2
Marc reported that the BUG_ON in clockevents_notify() triggers on his system. This happens because the kernel tries to remove an active clock event device (used for broadcasting) from the device list. The handling of devices which can be used as per cpu device and as a global broadcast device is suboptimal. The simplest solution for now (and for stable) is to check whether the device is used as global broadcast device, but this needs to be revisited. [ tglx: restored the cpuweight check and massaged the changelog ] Reported-by: Marc Dionne <[email protected]> Tested-by: Marc Dionne <[email protected]> Signed-off-by: Xiaotian Feng <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected]
2010-01-16Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds1-15/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futexes: Remove rw parameter from get_futex_key()
2010-01-16Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds3-14/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing/filters: Add comment for match callbacks tracing/filters: Fix MATCH_FULL filter matching for PTR_STRING tracing/filters: Fix MATCH_MIDDLE_ONLY filter matching lib: Introduce strnstr() tracing/filters: Fix MATCH_END_ONLY filter matching tracing/filters: Fix MATCH_FRONT_ONLY filter matching ftrace: Fix MATCH_END_ONLY function filter tracing/x86: Derive arch from bits argument in recordmcount.pl ring-buffer: Add rb_list_head() wrapper around new reader page next field ring-buffer: Wrap a list.next reference with rb_list_head()
2010-01-16smp_call_function_any(): pass the node value to cpumask_of_node()David John1-1/+1
The change in acpi_cpufreq to use smp_call_function_any causes a warning when it is called since the function erroneously passes the cpu id to cpumask_of_node rather than the node that the cpu is on. Fix this. cpumask_of_node(3): node > nr_node_ids(1) Pid: 1, comm: swapper Not tainted 2.6.33-rc3-00097-g2c1f189 #223 Call Trace: [<ffffffff81028bb3>] cpumask_of_node+0x23/0x58 [<ffffffff81061f51>] smp_call_function_any+0x65/0xfa [<ffffffff810160d1>] ? do_drv_read+0x0/0x2f [<ffffffff81015fba>] get_cur_val+0xb0/0x102 [<ffffffff81016080>] get_cur_freq_on_cpu+0x74/0xc5 [<ffffffff810168a7>] acpi_cpufreq_cpu_init+0x417/0x515 [<ffffffff81562ce9>] ? __down_write+0xb/0xd [<ffffffff8148055e>] cpufreq_add_dev+0x278/0x922 Signed-off-by: David John <[email protected]> Cc: Suresh Siddha <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-01-16kfifo: document everywhere that size has to be power of twoAndi Kleen1-1/+1
On my first try using them I missed that the fifos need to be power of two, resulting in a runtime bug. Document that requirement everywhere (and fix one grammar bug) Signed-off-by: Andi Kleen <[email protected]> Acked-by: Stefani Seibold <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Dmitry Torokhov <[email protected]> Cc: Andy Walls <[email protected]> Cc: Vikram Dhillon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>