|
> James Bottomley (1):
> module: workaround duplicate section names
-tip testing found that this patch breaks the build on x86 if
CONFIG_KALLSYMS is disabled:
kernel/module.c: In function ‘load_module’:
kernel/module.c:2367: error: ‘struct module’ has no member named ‘sect_attrs’
distcc[8269] ERROR: compile kernel/module.c on ph/32 failed
make[1]: *** [kernel/module.o] Error 1
make: *** [kernel] Error 2
make: *** Waiting for unfinished jobs....
Commit 1b364bf misses the fact that section attributes are only
built and dealt with if kallsyms is enabled. The patch below fixes
this.
( Note: technically speaking this should depend on CONFIG_SYSFS as
well, but this patch is correct too and keeps the #ifdef less
intrusive - in the KALLSYMS && !SYSFS case the code is a NOP. )
Signed-off-by: Ingo Molnar <[email protected]>
[ Replaced patch with a slightly cleaner variation by James Bottomley ]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The root cause is a duplicate section name (.text); is this legal?
[ Amerigo Wang: "AFAIK, yes." ]
However, there's a problem with commit
6d76013381ed28979cd122eb4b249a88b5e384fa: if you fail to allocate
mod->sect_attrs (in this case it's NULL because of the duplication),
it still gets used without checking in add_notes_attrs().
This should fix it.
[ This patch leaves other problems, particularly the sections directory,
but recent parisc toolchains seem to produce these modules and this
prevents a crash and is a minimal change -- RR ]
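The shape of the guard is roughly the following; this is a minimal sketch
assuming the 2.6.31-era add_notes_attrs() in kernel/module.c, not the
literal committed diff:
#include <linux/module.h>

/* Sketch only: bail out early when the earlier sect_attrs allocation
 * failed (here because of the duplicate ".text" section name) instead
 * of dereferencing a NULL mod->sect_attrs further down. */
static void add_notes_attrs(struct module *mod, unsigned int nsect,
			    char *secstrings, Elf_Shdr *sechdrs)
{
	if (unlikely(!mod->sect_attrs))
		return;

	/* ... build the per-section notes attributes as before ... */
}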
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Tested-by: Helge Deller <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The rarely-used symbol_put_addr() needs to use dereference_function_descriptor
on powerpc.
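A minimal sketch of what that looks like, assuming the 2.6.31-era helpers
(core_kernel_text(), __module_text_address()); not the exact committed change:
#include <linux/module.h>
#include <linux/kernel.h>
#include <asm/sections.h>

void symbol_put_addr(void *addr)
{
	struct module *modaddr;
	/* On powerpc64/parisc/ia64 'addr' may be a function descriptor;
	 * translate it to the real text address first. */
	unsigned long a = (unsigned long)dereference_function_descriptor(addr);

	if (core_kernel_text(a))
		return;

	/* symbol_get() gave us a module reference, so the module can't go away. */
	modaddr = __module_text_address(a);
	BUG_ON(!modaddr);
	module_put(modaddr);
}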
Reported-by: Paul Mackerras <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Spotted by Hiroshi Shimamoto who also provided the test-case below.
copy_process() uses signal->count as a reference counter, but it is not.
This test case
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <pthread.h>

void *null_thread(void *p)
{
	for (;;)
		sleep(1);

	return NULL;
}

void *exec_thread(void *p)
{
	execl("/bin/true", "/bin/true", NULL);

	return null_thread(p);
}

int main(int argc, char **argv)
{
	for (;;) {
		pid_t pid;
		int ret, status;

		pid = fork();
		if (pid < 0)
			break;

		if (!pid) {
			pthread_t tid;

			pthread_create(&tid, NULL, exec_thread, NULL);
			for (;;)
				pthread_create(&tid, NULL, null_thread, NULL);
		}

		do {
			ret = waitpid(pid, &status, 0);
		} while (ret == -1 && errno == EINTR);
	}

	return 0;
}
quickly creates an unkillable task.
If copy_process(CLONE_THREAD) races with de_thread(),
copy_signal()->atomic(signal->count) breaks the signal->notify_count
logic, and the execing thread can hang forever in kernel space.
Change copy_process() to increment count/live only when we know for sure
we can't fail. In this case the forked thread will take care of its
reference to signal correctly.
If copy_process() fails, check the CLONE_THREAD flag. If it is set, do
nothing: the counters were not changed and current belongs to the same
thread group. If it is not set, ->signal must be released in any case
(and ->count must be == 1); the forked child is the only thread in the
thread group.
We need more cleanups here, in particular signal->count should not be used
by de_thread/__exit_signal at all. This patch only fixes the bug.
Reported-by: Hiroshi Shimamoto <[email protected]>
Tested-by: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Roland McGrath <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Fix typo in read() output generation
perf tools: Check perf.data owner
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clockevent: Prevent dead lock on clockevents_lock
timers: Drop write permission on /proc/timer_list
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Fix too large stack usage in do_one_initcall()
tracing: handle broken names in ftrace filter
ftrace: Unify effect of writing to trace_options and option/*
|
|
When you iterate a list, using the iterator is useful.
Before:
ID: 5
ID: 5
ID: 5
ID: 5
EVNT: 0x40088b scale: nan ID: 5 CNT: 1006252 ID: 6 CNT: 1011090 ID: 7 CNT: 1011196 ID: 8 CNT: 1011095
EVNT: 0x40088c scale: 1.000000 ID: 5 CNT: 2003065 ID: 6 CNT: 2011671 ID: 7 CNT: 2012620 ID: 8 CNT: 2013479
EVNT: 0x40088c scale: 1.000000 ID: 5 CNT: 3002390 ID: 6 CNT: 3015996 ID: 7 CNT: 3018019 ID: 8 CNT: 3020006
EVNT: 0x40088b scale: 1.000000 ID: 5 CNT: 4002406 ID: 6 CNT: 4021120 ID: 7 CNT: 4024241 ID: 8 CNT: 4027059
After:
ID: 1
ID: 2
ID: 3
ID: 4
EVNT: 0x400889 scale: nan ID: 1 CNT: 1005270 ID: 2 CNT: 1009833 ID: 3 CNT: 1010065 ID: 4 CNT: 1010088
EVNT: 0x400898 scale: nan ID: 1 CNT: 2001531 ID: 2 CNT: 2022309 ID: 3 CNT: 2022470 ID: 4 CNT: 2022627
EVNT: 0x400888 scale: 0.489467 ID: 1 CNT: 3001261 ID: 2 CNT: 3027088 ID: 3 CNT: 3027941 ID: 4 CNT: 3028762
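The underlying bug class is easy to reproduce outside the kernel; a
hypothetical, self-contained illustration (not the perf code itself):
#include <stdio.h>

struct item {
	int id;
	struct item *next;
};

int main(void)
{
	struct item d = { 8, NULL }, c = { 7, &d }, b = { 6, &c }, a = { 5, &b };
	struct item *head = &a, *it;

	/* Buggy: reads from the list head instead of the iterator. */
	for (it = head; it; it = it->next)
		printf("ID: %d\n", head->id);	/* 5 5 5 5 */

	/* Fixed: reads from the iterator. */
	for (it = head; it; it = it->next)
		printf("ID: %d\n", it->id);	/* 5 6 7 8 */

	return 0;
}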
Reported-by: stephane eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Corey J Ashford <[email protected]>
Cc: perfmon2-devel <[email protected]>
LKML-Reference: <1250867976.7538.73.camel@twins>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools: Make 'make html' work
perf annotate: Fix segmentation fault
perf_counter: Fix the PARISC build
perf_counter: Check task on counter read IPI
perf: Rename perf-examples.txt to examples.txt
perf record: Fix typo in pid_synthesize_comm_event
|
|
Currently clockevents_notify() is called with interrupts enabled at
some places and interrupts disabled at some other places.
This results in a deadlock in this scenario.
cpu A holds clockevents_lock in clockevents_notify() with irqs enabled
cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled
cpu C is doing set_mtrr(), which will try to rendezvous all the cpus.
This results in C and A coming to the rendezvous point and waiting
for B. B is stuck forever waiting for the spinlock and thus never
reaches the rendezvous point.
Fix the clockevents code so that clockevents_lock is taken with
interrupts disabled and thus avoid the above deadlock.
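In code, the change amounts to something like this (a sketch of
clockevents_notify() in kernel/time/clockevents.c under that assumption;
not the literal diff):
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(clockevents_lock);

/* Sketch: take clockevents_lock with interrupts disabled, so a CPU that
 * already has irqs off can never spin on the lock while the holder runs
 * with irqs on and gets drawn into an MTRR rendezvous. */
void clockevents_notify(unsigned long reason, void *arg)
{
	unsigned long flags;

	spin_lock_irqsave(&clockevents_lock, flags);
	/* run the notifier chain and reason-specific handling as before */
	spin_unlock_irqrestore(&clockevents_lock, flags);
}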
Also call lapic_timer_propagate_broadcast() on the destination cpu so
that we avoid calling smp_call_function() in the clockevents notifier
chain.
This issue left us wondering if we need to change the MTRR rendezvous
logic to use stop machine logic (instead of smp_call_function) or add
a check in the spinlock debug code to see if there are other spinlocks
which get taken under both interrupts enabled/disabled conditions.
Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Venkatesh Pallipadi <[email protected]>
Cc: "Pallipadi Venkatesh" <[email protected]>
Cc: "Brown Len" <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
If one filter item (for set_ftrace_filter and set_ftrace_notrace) is being
set up by more than one consecutive write (FTRACE_ITER_CONT flag), it won't
be handled correctly.
I used following program to test/verify:
[snip]
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd, i;
	char *file = argv[1];

	if (-1 == (fd = open(file, O_WRONLY))) {
		perror("open failed");
		return -1;
	}

	for (i = 0; i < (argc - 2); i++) {
		int len = strlen(argv[2+i]);
		int cnt, off = 0;

		while (len) {
			cnt = write(fd, argv[2+i] + off, len);
			len -= cnt;
			off += cnt;
		}
	}

	close(fd);
	return 0;
}
[snip]
before change:
sh-4.0# echo > ./set_ftrace_filter
sh-4.0# /test ./set_ftrace_filter "sys" "_open "
sh-4.0# cat ./set_ftrace_filter
#### all functions enabled ####
sh-4.0#
after change:
sh-4.0# echo > ./set_ftrace_notrace
sh-4.0# test ./set_ftrace_notrace "sys" "_open "
sh-4.0# cat ./set_ftrace_notrace
sys_open
sh-4.0#
Signed-off-by: Jiri Olsa <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
The commit 2ff05b2b (oom: move oom_adj value) moved the oom_adj value to
the mm_struct. It was a very good first step towards sanitizing the OOM killer.
However, Paul Menage reported that the commit causes a regression in his
job scheduler: the current OOM logic can kill an OOM_DISABLED process.
Why? His program has code similar to the following.
...
set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */
...
if (vfork() == 0) {
set_oom_adj(0); /* Invoked child can be killed */
execve("foo-bar-cmd");
}
....
A vfork() parent and child share the same mm_struct, so the above
set_oom_adj(0) doesn't only change oom_adj for the vfork() child, it also
changes oom_adj for the vfork() parent. The vfork() parent (the job
scheduler) then lost its OOM immunity and was killed.
Actually, the fork-set-exec idiom is very frequently used in userland
programs; we must not break this assumption.
Therefore, this patch reverts commit 2ff05b2b and the related commits.
Reverted commit list
---------------------
- commit 2ff05b2b4e (oom: move oom_adj value from task_struct to mm_struct)
- commit 4d8b9135c3 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
- commit 8123681022 (oom: only oom kill exiting tasks with attached memory)
- commit 933b787b57 (mm: copy over oom_adj value at fork time)
Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: Wake up irq thread after action has been installed
|
|
The wake_up_process() of the new irq thread in __setup_irq() is too
early, as the irqaction is not yet fully initialized; in particular,
action->irq is not yet set. The interrupt thread might dereference the
wrong irq descriptor.
Move the wakeup after the action is installed and action->irq has been
set.
Reported-by: Michael Buesch <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Michael Buesch <[email protected]>
|
|
PARISC does not build:
/home/mingo/tip/kernel/perf_counter.c: In function 'perf_counter_index':
/home/mingo/tip/kernel/perf_counter.c:2016: error: 'PERF_COUNTER_INDEX_OFFSET' undeclared (first use in this function)
/home/mingo/tip/kernel/perf_counter.c:2016: error: (Each undeclared identifier is reported only once
/home/mingo/tip/kernel/perf_counter.c:2016: error: for each function it appears in.)
The reason is that PERF_COUNTER_INDEX_OFFSET is not defined there.
Now, we could define it in the architecture - but let's also provide
a core default of 0 (which happens to be what all but one
architecture uses at the moment).
Architectures that need a different index offset should set this
value in their asm/perf_counter.h files.
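The core default can then be as small as this (sketch; the placement in
the generic perf_counter code is assumed):
/* Fallback for architectures whose asm/perf_counter.h does not provide
 * its own index offset. */
#ifndef PERF_COUNTER_INDEX_OFFSET
# define PERF_COUNTER_INDEX_OFFSET 0
#endif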
Cc: Kyle McMartin <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
"echo noglobal-clock > trace_options" can be used to change trace
clock but "echo 0 > options/global-clock" can't. The flag toggling
will be silently accepted without actually changing the clock callback.
We can fix it by using set_tracer_flags() in
trace_options_core_write().
Changelog:
v1->v2: Simplified switch() after Li Zefan <[email protected]>'s
suggestion
Signed-off-by: Zhao Lei <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Li Zefan <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
/proc/timer_list and /proc/slabinfo are not supposed to be
written to, so there should be no write permissions on them.
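For /proc/timer_list this boils down to registering the entry without
write bits; a sketch assuming the usual proc_create() registration in
kernel/time/timer_list.c, with timer_list_fops being the file's existing
read-only fops:
#include <linux/proc_fs.h>

static int __init init_timer_list_procfs(void)
{
	struct proc_dir_entry *pe;

	/* 0444 instead of 0644: the file has no ->write handler anyway. */
	pe = proc_create("timer_list", 0444, NULL, &timer_list_fops);
	if (!pe)
		return -ENOMEM;
	return 0;
}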
Signed-off-by: WANG Cong <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Vegard Nossum <[email protected]>
Cc: Eduard - Gabriel Munteanu <[email protected]>
Cc: [email protected]
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Amerigo Wang <[email protected]>
Cc: Matt Mackall <[email protected]>
Cc: Arjan van de Ven <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
In general, code in perf_counter.c that is called through an
IPI checks, for per-task counters, that the counter's task is
still the current task. This is to handle the race condition
where the cpu switches from the task we want to another task in
the interval between sending the IPI and the IPI arriving and
being handled on the target CPU.
For some reason, __perf_counter_read is missing this check, yet
there is no reason why the race condition can't occur. This
adds a check that the current task is the one we want. If it
isn't, we just return. In that case the counter->count value
should be up to date, since it will have been updated when the
counter was scheduled out, which must have happened since the
IPI was sent.
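A minimal sketch of that guard, assuming the 2.6.31-era
__perf_counter_read() IPI handler (names taken from the surrounding text,
not the literal diff):
#include <linux/perf_counter.h>

static void __perf_counter_read(void *info)
{
	struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
	struct perf_counter *counter = info;
	struct perf_counter_context *ctx = counter->ctx;

	/* If the counter is per-task and that task is no longer current on
	 * this CPU, counter->count was already updated at sched-out time. */
	if (ctx->task && cpuctx->task_ctx != ctx)
		return;

	counter->pmu->read(counter);
}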
I don't have an example of an actual failure due to this race,
but it seems obvious that it could occur and we need to guard
against it.
Signed-off-by: Paul Mackerras <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Currently SELinux enforcement of controls on the ability to map low memory
is determined by the mmap_min_addr tunable. This patch causes SELinux to
ignore the tunable and instead use a separate Kconfig option specific to how
much space the LSM should protect.
The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
permissions will always protect the amount of low memory designated by
CONFIG_LSM_MMAP_MIN_ADDR.
This allows users who need to disable the mmap_min_addr controls (usual reason
being they run WINE as a non-root user) to do so and still have SELinux
controls preventing confined domains (like a web server) from being able to
map some area of low memory.
Signed-off-by: Eric Paris <[email protected]>
Signed-off-by: James Morris <[email protected]>
|
|
free_irq() can remove an irqaction while the corresponding interrupt
is in progress, but free_irq() sets action->thread to NULL
unconditionally, which might lead to a NULL pointer dereference in
handle_IRQ_event() when the hard interrupt context tries to wake up
the handler thread.
Prevent this by moving the thread stop after synchronize_irq(). No
need to set action->thread to NULL either as action is going to be
freed anyway.
This fixes a boot crash reported against preempt-rt which uses the
mainline irq threads code to implement full irq threading.
[ tglx: removed local irqthread variable ]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Report the cloning task as parent on perf_counter_fork()
perf_counter: Fix an ipi-deadlock
perf: Rework/fix the whole read vs group stuff
perf_counter: Fix swcounter context invariance
perf report: Don't show unresolved DSOs and symbols when -S/-d is used
perf tools: Add a general option to enable raw sample records
perf tools: Add a per tracepoint counter attribute to get raw sample
perf_counter: Provide hw_perf_counter_setup_online() APIs
perf list: Fix large list output by using the pager
perf_counter, x86: Fix/improve apic fallback
perf record: Add missing -C option support for specifying profile cpu
perf tools: Fix dso__new handle() to handle deleted DSOs
perf tools: Fix fallback to cplus_demangle() when bfd_demangle() is not available
perf report: Show the tid too in -D
perf record: Fix .tid and .pid fill-in when synthesizing events
perf_counter, x86: Fix generic cache events on P6-mobile CPUs
perf_counter, x86: Fix lapic printk message
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: Fix handling of bad requeue syscall pairing
futex: Fix compat_futex to be same as futex for REQUEUE_PI
locking, sched: Give waitqueue spinlocks their own lockdep classes
futex: Update futex_q lock_ptr on requeue proxy lock
|
|
A bug in (9f498cc: perf_counter: Full task tracing) makes
profiling multi-threaded apps go belly up.
[ output as: (PID:TID):(PPID:PTID) ]
# ./perf report -D | grep FORK
0x4b0 [0x18]: PERF_EVENT_FORK: (3237:3237):(3236:3236)
0xa10 [0x18]: PERF_EVENT_FORK: (3237:3238):(3236:3236)
0xa70 [0x18]: PERF_EVENT_FORK: (3237:3239):(3236:3236)
0xad0 [0x18]: PERF_EVENT_FORK: (3237:3240):(3236:3236)
0xb18 [0x18]: PERF_EVENT_FORK: (3237:3241):(3236:3236)
Shows us that the test (27d028d perf report: Update for the new
FORK/EXIT events) in builtin-report.c:
	/*
	 * A thread clone will have the same PID for both
	 * parent and child.
	 */
	if (thread == parent)
		return 0;
will clearly fail.
The problem is that perf_counter_fork() reports the actual
parent, instead of the cloning thread.
Fixing that (with the below patch), yields:
# ./perf report -D | grep FORK
0x4c8 [0x18]: PERF_EVENT_FORK: (1590:1590):(1589:1589)
0xbd8 [0x18]: PERF_EVENT_FORK: (1590:1591):(1590:1590)
0xc80 [0x18]: PERF_EVENT_FORK: (1590:1592):(1590:1590)
0x3338 [0x18]: PERF_EVENT_FORK: (1590:1593):(1590:1590)
0x66b0 [0x18]: PERF_EVENT_FORK: (1590:1594):(1590:1590)
Which both makes more sense and doesn't confuse perf report
anymore.
Reported-by: Pekka Enberg <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: [email protected]
Cc: Anton Blanchard <[email protected]>
Cc: Arjan van de Ven <[email protected]>
LKML-Reference: <1250172882.5241.62.camel@twins>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
perf_pending_counter() is called from IRQ context and will call
perf_counter_disable(), however perf_counter_disable() uses
smp_call_function_single() which doesn't fancy being used with
IRQs disabled due to IPI deadlocks.
Fix this by making it use the local __perf_counter_disable()
call and teaching the counter_sched_out() code about pending
disables as well.
This should cover the case where a counter migrates before the
pending queue gets processed.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Corey J Ashford <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: stephane eranian <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Replace PERF_SAMPLE_GROUP with PERF_SAMPLE_READ and introduce
PERF_FORMAT_GROUP to deal with group reads in a more generic
way.
This allows you to get group reads out of read() as well.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Corey J Ashford <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: stephane eranian <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
perf_swcounter_is_counting() uses a lock, which means we cannot
use swcounters from NMI or when holding that particular lock;
this is unintended.
The patch below removes the lock; this opens up a race window, but it is
no worse than what the swcounters already experience due to the RCU
traversal of the context in perf_swcounter_ctx_event().
This also fixes the hard lockups while opening a lockdep
tracepoint counter.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: stephane eranian <[email protected]>
Cc: Corey J Ashford <[email protected]>
LKML-Reference: <1250149915.10001.66.camel@twins>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Provide weak aliases for hw_perf_counter_setup_online(). This is
used by the BTS patches (for v2.6.32), but it interacts with
fixes so propagate this upstream. (it has no effect as of yet)
Also export perf_counter_output() to architecture code.
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
commit fd51d251e4cdb21f68e9dbc4336514d64a105a79
Author: Stefan Raspl <[email protected]>
Date: Tue May 19 09:59:08 2009 +0200
blktrace: remove debugfs entries on bad path
added an explicit invocation of debugfs_remove for bt->dir, but in
blk_remove_buf_file_callback we are also getting the directory removed. On
occasion I am seeing memory corruption that I have bisected down to
this commit. [The testing involves a (long) series of I/O benchmarks
with blktrace invoked around the actual runs.] I believe that the
committed patch is correct, but the problem actually lies in the code
in blk_remove_buf_file_callback.
With this patch I am able to consistently get complete runs whereas
previously I could not get a single run to complete.
The first part of the patch simply moves the debugfs_remove below the
relay_close: the relay_close call will remove files under bt->dir, and
so we should not remove the directory until all the files we created
have been removed. (Note: This is not sufficient to fix the problem -
the file system code has ref counts on the directory, so our invocation
does not cause the directory to actually be removed. Nonetheless, we
should not rely upon that feature.)
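The resulting teardown order looks roughly like this (a sketch of the
ordering only; the field and helper names are assumed from the blktrace
code, not quoted from the patch):
#include <linux/blktrace_api.h>
#include <linux/debugfs.h>
#include <linux/relay.h>
#include <linux/slab.h>

/* relay_close() removes the per-CPU relay files living under bt->dir,
 * so the directory itself must only be removed afterwards. */
static void blk_trace_free(struct blk_trace *bt)
{
	debugfs_remove(bt->msg_file);
	debugfs_remove(bt->dropped_file);
	relay_close(bt->rchan);
	debugfs_remove(bt->dir);	/* now below relay_close() */
	free_percpu(bt->sequence);
	free_percpu(bt->msg_data);
	kfree(bt);
}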
Signed-off-by: Alan D. Brunelle <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
perf_counter: Zero dead bytes from ftrace raw samples size alignment
perf_counter: Subtract the buffer size field from the event record size
perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data
perf_counter: Correct PERF_SAMPLE_RAW output
perf tools: callchain: Fix bad rounding of minimum rate
perf_counter tools: Fix libbfd detection for systems with libz dependency
perf: "Longum est iter per praecepta, breve et efficax per exempla"
perf_counter: Fix a race on perf_counter_ctx
perf_counter: Fix tracepoint sampling to be part of generic sampling
perf_counter: Work around gcc warning by initializing tracepoint record unconditionally
perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode
perf tools: callchain: Fix 'perf report' display to be callchain by default
perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains
perf record: Fix the -A UI for empty or non-existent perf.data
perf util: Fix do_read() to fail on EOF instead of busy-looping
perf list: Fix the output to not include tracepoints without an id
perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale
perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc)
perf report: Fix per task mult-counter stat reporting
...
|
|
If futex_requeue(requeue_pi=1) finds a futex_q that was created by a call
other than futex_wait_requeue_pi(), the q.rt_waiter may be NULL. If so,
this will result in an oops from the following call graph:
futex_requeue()
rt_mutex_start_proxy_lock()
task_blocks_on_rt_mutex()
waiter->task dereference
OOPS
We currently WARN_ON() if this is detected; clearly this is inadequate.
If we detect a mispairing in futex_requeue(), bail out, sending -EINVAL to
user-space.
V2: Fix parenthesis warnings.
Signed-off-by: Darren Hart <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: John Kacur <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Dinakar Guniguntala <[email protected]>
Cc: John Stultz <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/irq: Fix move_irq_desc() for nodes without ram
|
|
Need to add the REQUEUE_PI checks to the compat_sys_futex API
as well to ensure 32-bit requeues work fine on a 64-bit
system. Patch is against latest tip.
Signed-off-by: Dinakar Guniguntala <[email protected]>
Cc: Darren Hart <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Give waitqueue spinlocks their own lockdep classes when they
are initialised from init_waitqueue_head(). This means that
struct wait_queue::func functions can operate on other waitqueues.
This is used by CacheFiles to catch the page from a backing fs
being unlocked and to wake up another thread to take a copy of
it.
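The per-call-site key is obtained by making init_waitqueue_head() a macro
along these lines (a sketch of the idea in the include/linux/wait.h style;
assumed, not quoted from the patch):
#include <linux/lockdep.h>

/* Each init_waitqueue_head() call site gets its own static
 * lock_class_key, so its spinlock ends up in its own lockdep class. */
#define init_waitqueue_head(q)				\
	do {						\
		static struct lock_class_key __key;	\
							\
		__init_waitqueue_head((q), &__key);	\
	} while (0)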
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: David Howells <[email protected]>
Tested-by: Takashi Iwai <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Raw tracepoint data contains various kernel internals and
data from other users, so restrict this to CAP_SYS_ADMIN.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <1249896452.17467.75.camel@twins>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
PERF_SAMPLE_* output switches should unconditionally output the
correct format, as they are the only way to unambiguously parse
the PERF_EVENT_SAMPLE data.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <1249896447.17467.74.camel@twins>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
futex_requeue() can acquire the lock on behalf of a waiter
early on or during the requeue loop if it is uncontended or in
the event of a lock steal or owner died. On wakeup, the waiter
(in futex_wait_requeue_pi()) cleans up the pi_state owner using
the lock_ptr to protect against concurrent access to the
pi_state. The pi_state is hung off futex_q's on the requeue
target futex hash bucket so the lock_ptr needs to be updated
accordingly.
The problem manifested by triggering the WARN_ON in
lookup_pi_state() about the pid != pi_state->owner->pid. With
this patch, the pi_state is properly guarded against concurrent
access via the requeue target hb lock.
The astute reviewer may notice that there is a window of time
between when futex_requeue() unlocks the hb locks and when
futex_wait_requeue_pi() will acquire hb2->lock. During this
time the pi_state and uval are not in sync with the underlying
rtmutex owner (but the uval does indicate there are waiters, so
no atomic changes will occur in userspace). However, this is
not a problem. Should a contending thread enter
lookup_pi_state() and acquire hb2->lock before the ownership is
fixed up, it will find the pi_state hung off a waiter's
(possibly the pending owner's) futex_q and block on the
rtmutex. Once futex_wait_requeue_pi() fixes up the owner, it
will also move the pi_state from the old owner's
task->pi_state_list to its own.
v3: Fix plist lock name for application to mainline (rather
than -rt). Compile tested against tip/v2.6.31-rc5.
Signed-off-by: Darren Hart <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Dinakar Guniguntala <[email protected]>
Cc: John Stultz <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
posix_cpu_timers_exit_group(): Do not use thread_group_cputimer()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Fix/complete ftrace event records sampling
perf_counter, ftrace: Fix perf_counter integration
tracing/filters: Always free pred on filter_add_subsystem_pred() failure
tracing/filters: Don't use pred on alloc failure
ring-buffer: Fix memleak in ring_buffer_free()
tracing: Fix recordmcount.pl to handle sections with only weak functions
ring-buffer: Fix advance of reader in rb_buffer_peek()
tracing: do not use functions starting with .L in recordmcount.pl
ring-buffer: do not disable ring buffer on oops_in_progress
ring-buffer: fix check of try_to_discard result
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: Fix typos in documentation
lockdep: Fix file mode of lock_stat
rtmutex: Avoid deadlock in rt_mutex_start_proxy_lock()
|
|
While extending perfcounters with BTS hw-tracing, Markus
Metzger managed to trigger this warning:
[ 995.557128] WARNING: at kernel/perf_counter.c:1191 __perf_counter_task_sched_out+0x48/0x6b()
It triggers because commit
9f498cc5be7e013d8d6e4c616980ed0ffc8680d2 (perf_counter: Full
task tracing) removed clearing of tsk->perf_counter_ctxp out
from under ctx->lock which introduced a race (against
perf_lock_task_context).
Move it back and deal with the exit notification by explicitly
passing along the former task context.
Reported-by: Markus T Metzger <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <1249667341.17467.5.camel@twins>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Based on Peter's comments, make tracepoint sampling generic
just like all the other sampling bits are. This is a rename
with no code changes:
- PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW
- struct perf_tracepoint_record to perf_raw_record
We want the mechanism that transports raw tracepoint sample events
into the perf ring buffer to be generalized and usable by any type
of counter.
Reported-by: Peter Zijlstra <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Despite that the tracepoint record is always present when the
PERF_SAMPLE_TP_RECORD flag is set, gcc raises a warning,
thinking it might not be initialized:
kernel/perf_counter.c: In function ‘perf_counter_output’:
kernel/perf_counter.c:2650: warning: ‘tp’ may be used uninitialized in this function
So initialize it to NULL and always check that it's not NULL
before dereferencing it.
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Reimplement the software counters to deal with fast moving
event sources (such as tracepoints). This means being able
to generate multiple overflows from a single 'event' as well
as support throttling.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This patch implements the kernel side support for ftrace event
record sampling.
A new counter sampling attribute is added:
PERF_SAMPLE_TP_RECORD
which requests ftrace events record sampling. In this case
if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
fires, we emit the tracepoint binary record to the
perfcounter event buffer, as a sample.
Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
record:
perf record -f -F 1 -a -e workqueue:workqueue_execution
perf report -D
0x21e18 [0x48]: event: 9
.
. ... raw event: size 72 bytes
. 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........
. 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!......
. 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve
. 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1...........
. 0040: e0 b1 31 81 ff ff ff ff .......
.
0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
The raw ftrace binary record starts at offset 0020.
Translation:
struct trace_entry {
type = 0x2b = 43;
flags = 1;
preempt_count = 2;
pid = 0xa = 10;
tgid = 0xa = 10;
}
thread_comm = "events/1"
thread_pid = 0xa = 10;
func = 0xffffffff8131b1e0 = flush_to_ldisc()
What will come next?
- Userspace support ('perf trace'), 'flight data recorder' mode
for perf trace, etc.
- The unconditional copy from the profiling callback has some
cost, however, if someone wants no such sampling to occur;
this needs to be fixed in the future. For that we need
instant access to the perf counter attribute. This is a
matter of adding a flag to struct ftrace_event.
- Take care of event recursion! Don't ever try to record a
lock event, for example: some locking is used in the
profiling fast path and leads to tracing recursion. That
will be fixed using raw spinlocks or recursion protection.
- [...]
- Profit! :-)
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Gabriel Munteanu <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Adds possible second part to the assign argument of TP_EVENT().
TP_perf_assign(
__perf_count(foo);
__perf_addr(bar);
)
Which, when specified, makes the swcounter increment with @foo instead
of the usual 1, and report @bar for PERF_SAMPLE_ADDR (the data address
associated with the event) when this triggers a counter overflow.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Paul Mackerras <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Merge reason: Merge up to almost-rc6 to pick up latest perfcounters
(on which we'll queue up a dependent fix)
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When the process exits we don't have to run a new cputimer, nor
use the running one (as it does not account when tsk->exit_state != 0),
to get the process CPU times. As there is only one thread we can
just use the CPU time fields from the task and signal structs.
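A sketch of what that simplification can look like, assuming the
2.6.31-era cleanup_timers() helper and cputime_add() (not the literal
diff):
void posix_cpu_timers_exit_group(struct task_struct *tsk)
{
	struct signal_struct *sig = tsk->signal;

	/* The exiting task is the last thread, so its own times plus the
	 * already-accumulated signal times are the whole group's times. */
	cleanup_timers(sig->cpu_timers,
		       cputime_add(tsk->utime, sig->utime),
		       cputime_add(tsk->stime, sig->stime),
		       tsk->se.sum_exec_runtime + sig->sum_sched_runtime);
}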
Signed-off-by: Stanislaw Gruszka <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Vitaly Mayatskikh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
If filter_add_subsystem_pred() fails due to ENOSPC or ENOMEM,
the pred doesn't get freed, whereas as a side effect it does get
freed for other errors. Make it so the caller always frees the pred
for any error.
Signed-off-by: Tom Zanussi <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Li Zefan <[email protected]>
LKML-Reference: <1249746593.6453.32.camel@tropicana>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Dan Carpenter sent me a fix to prevent pred from being used if
it couldn't be allocated. I noticed the same problem also
existed for the create_pred() case and added a fix for that.
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Tom Zanussi <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Li Zefan <[email protected]>
LKML-Reference: <1249746549.6453.29.camel@tropicana>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Don't move it if target node is -1.
Signed-off-by: Yinghai Lu <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|