aboutsummaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)AuthorFilesLines
2011-08-19tracing/filter: Add startup tests for events filterJiri Olsa4-0/+264
Adding automated tests running as late_initcall. Tests are compiled in with CONFIG_FTRACE_STARTUP_TEST option. Adding test event "ftrace_test_filter" used to simulate filter processing during event occurance. String filters are compiled and tested against several test events with different values. Also testing that evaluation of explicit predicates is ommited due to the lazy filter evaluation. Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-08-19tracing/filter: Change filter_match_preds function to use walk_pred_treeJiri Olsa1-66/+58
Changing filter_match_preds function to use unified predicates tree processing. Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-08-19tracing/filter: Change fold_pred function to use walk_pred_treeJiri Olsa1-35/+33
Changing fold_pred_tree function to use unified predicates tree processing. Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-08-19tracing/filter: Change fold_pred_tree function to use walk_pred_treeJiri Olsa1-45/+20
Changing fold_pred_tree function to use unified predicates tree processing. Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-08-19tracing/filter: Change count_leafs function to use walk_pred_treeJiri Olsa1-33/+14
Changing count_leafs function to use unified predicates tree processing. Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-08-19tracing/filter: Unify predicate tree walking, change check_pred_tree ↵Jiri Olsa1-51/+86
function to use it Adding walk_pred_tree function to be used for walking throught the filter predicates. For each predicate the callback function is called, allowing users to add their own functionality or customize their way through the filter predicates. Changing check_pred_tree function to use walk_pred_tree. Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-08-19tracing/filter: Simplify tracepoint event lookupJiri Olsa1-6/+3
We dont need to perform lookup through the ftrace_events list, instead we can use the 'tp_event' field. Each perf_event contains tracepoint event field 'tp_event', which got initialized during the tracepoint event initialization. Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-08-19tracing/filter: Remove field_name from filter_pred structJiri Olsa2-51/+13
The field_name was used just for finding event's fields. This way we don't need to care about field_name allocation/free. Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-08-19tracing/filter: Separate predicate init and filter additionJiri Olsa1-33/+23
Making the code cleaner by having one function to fully prepare the predicate (create_pred), and another to add the predicate to the filter (filter_add_pred). As a benefit, this way the dry_run flag stays only inside the replace_preds function and is not passed deeper. Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-08-19tracing/filter: Use static allocation for filter predicatesJiri Olsa1-41/+16
Don't dynamically allocate filter_pred struct, use static memory. This way we can get rid of the code managing the dynamic filter_pred struct object. The create_pred function integrates create_logical_pred function. This way the static predicate memory is returned only from one place. Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-08-19Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-5/+16
* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits) Revert "cfq: Remove special treatment for metadata rqs." block: fix flush machinery for stacking drivers with differring flush flags block: improve rq_affinity placement blktrace: add FLUSH/FUA support Move some REQ flags to the common bio/request area allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH xen/blkback: Make description more obvious. cfq-iosched: Add documentation about idling block: Make rq_affinity = 1 work as expected block: swim3: fix unterminated of_device_id table block/genhd.c: remove useless cast in diskstats_show() drivers/cdrom/cdrom.c: relax check on dvd manufacturer value drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse bsg-lib: add module.h include cfq-iosched: Reduce linked group count upon group destruction blk-throttle: correctly determine sync bio loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices loop: add management interface for on-demand device allocation loop: replace linked list of allocated devices with an idr index ...
2011-08-11blktrace: add FLUSH/FUA supportNamhyung Kim1-5/+16
Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or FUA follows WRITE, use the same 'F' flag for both cases and distinguish them by their (relative) position. The end results look like (other flags might be shown also): - WRITE: W - WRITE_FLUSH: FW - WRITE_FUA: WF - WRITE_FLUSH_FUA: FWF Note that we reuse TC_BARRIER due to lack of bit space of act_mask so that the older versions of blktrace tools will report flush requests as barriers from now on. Cc: Steven Rostedt <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Reviewed-by: Jeff Moyer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2011-08-10tracing: Clean up tb_fmt to not give faulty compile warningSteven Rostedt1-9/+10
gcc incorrectly states that the variable "fmt" is uninitialized when CC_OPITMIZE_FOR_SIZE is set. Instead of just blindly setting fmt to NULL, the code is cleaned up a little to be a bit easier for humans to follow, as well as gcc to know the variables are initialized. Signed-off-by: Steven Rostedt <[email protected]>
2011-08-05Merge branch 'linus' into perf/urgentIngo Molnar16-488/+860
Merge reason: Include most of the merge window trees, to do fixes on top. Signed-off-by: Ingo Molnar <[email protected]>
2011-07-26atomic: use <linux/atomic.h>Arun Sharma2-2/+2
This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Miller <[email protected]> Cc: Eric Dumazet <[email protected]> Acked-by: Mike Frysinger <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-07-25trace events: Update version number reference to new 3.x scheme for ↵Jesper Juhl1-1/+1
EVENT_POWER_TRACING_DEPRECATED What was scheduled to be 2.6.41 is now going to be 3.1 . Signed-off-by: Jesper Juhl <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-07-21Merge branch 'tip/perf/core' of ↵Ingo Molnar7-152/+419
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2011-07-21Merge branch 'perf/urgent' into perf/coreIngo Molnar5-30/+135
Merge reason: pick up the latest fixes - they won't make v3.0. Signed-off-by: Ingo Molnar <[email protected]>
2011-07-15tracing/kprobe: Update symbol reference when loading moduleMasami Hiramatsu1-1/+36
Since the address of a module-local variable can only be solved after the target module is loaded, the symbol fetch-argument should be updated when loading target module. Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Link: http://lkml.kernel.org/r/20110627072703.6528.75042.stgit@fedora15 Signed-off-by: Steven Rostedt <[email protected]>
2011-07-15tracing/kprobes: Support module init function probingMasami Hiramatsu1-26/+138
To support probing module init functions, kprobe-tracer allows user to define a probe on non-existed function when it is given with a module name. This also enables user to set a probe on a function on a specific module, even if a same name (but different) function is locally defined in another module. The module name must be in the front of function name and separated by a ':'. e.g. btrfs:btrfs_init_sysfs Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Link: http://lkml.kernel.org/r/20110627072656.6528.89970.stgit@fedora15 Signed-off-by: Steven Rostedt <[email protected]>
2011-07-15tracing/kprobes: Merge trace probe enable/disable functionsMasami Hiramatsu1-56/+36
Merge redundant enable/disable functions into enable_trace_probe() and disable_trace_probe(). Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Link: http://lkml.kernel.org/r/20110627072644.6528.26910.stgit@fedora15 [ converted kprobe selftest to use enable_trace_probe ] Signed-off-by: Steven Rostedt <[email protected]>
2011-07-14ftrace: Fix regression where ftrace breaks when modules are loadedSteven Rostedt1-2/+28
Enabling function tracer to trace all functions, then load a module and then disable function tracing will cause ftrace to fail. This can also happen by enabling function tracing on the command line: ftrace=function and during boot up, modules are loaded, then you disable function tracing with 'echo nop > current_tracer' you will trigger a bug in ftrace that will shut itself down. The reason is, the new ftrace code keeps ref counts of all ftrace_ops that are registered for tracing. When one or more ftrace_ops are registered, all the records that represent the functions that the ftrace_ops will trace have a ref count incremented. If this ref count is not zero, when the code modification runs, that function will be enabled for tracing. If the ref count is zero, that function will be disabled from tracing. To make sure the accounting was working, FTRACE_WARN_ON()s were added to updating of the ref counts. If the ref count hits its max (> 2^30 ftrace_ops added), or if the ref count goes below zero, a FTRACE_WARN_ON() is triggered which disables all modification of code. Since it is common for ftrace_ops to trace all functions in the kernel, instead of creating > 20,000 hash items for the ftrace_ops, the hash count is just set to zero, and it represents that the ftrace_ops is to trace all functions. This is where the issues arrise. If you enable function tracing to trace all functions, and then add a module, the modules function records do not get the ref count updated. When the function tracer is disabled, all function records ref counts are subtracted. Since the modules never had their ref counts incremented, they go below zero and the FTRACE_WARN_ON() is triggered. The solution to this is rather simple. When modules are loaded, and their functions are added to the the ftrace pool, look to see if any ftrace_ops are registered that trace all functions. And for those, update the ref count for the module function records. Reported-by: Thomas Gleixner <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-07-14tracing/kprobes: Rename probe_* to trace_probe_*Masami Hiramatsu1-21/+22
Rename probe_* to trace_probe_* for avoiding namespace confliction. This also fixes improper names of find_probe_event() and cleanup_all_probes() to find_trace_probe() and release_all_trace_probes() respectively. Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> Link: http://lkml.kernel.org/r/20110627072636.6528.60374.stgit@fedora15 Signed-off-by: Steven Rostedt <[email protected]>
2011-07-14tracing: Have dynamic size event stack tracesSteven Rostedt3-19/+87
Currently the stack trace per event in ftace is only 8 frames. This can be quite limiting and sometimes useless. Especially when the "ignore frames" is wrong and we also use up stack frames for the event processing itself. Change this to be dynamic by adding a percpu buffer that we can write a large stack frame into and then copy into the ring buffer. For interrupts and NMIs that come in while another event is being process, will only get to use the 8 frame stack. That should be enough as the task that it interrupted will have the full stack frame anyway. Requested-by: Thomas Gleixner <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-07-13ftrace: Fix dynamic selftest failure on some archsSteven Rostedt1-0/+26
Archs that do not implement CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, will fail the dynamic ftrace selftest. The function tracer has a quick 'off' variable that will prevent the call back functions from being called. This variable is called function_trace_stop. In x86, this is implemented directly in the mcount assembly, but for other archs, an intermediate function is used called ftrace_test_stop_func(). In dynamic ftrace, the function pointer variable ftrace_trace_function is used to update the caller code in the mcount caller. But for archs that do not have CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST set, it only calls ftrace_test_stop_func() instead, which in turn calls __ftrace_trace_function. When more than one ftrace_ops is registered, the function it calls is ftrace_ops_list_func(), which will iterate over all registered ftrace_ops and call the callbacks that have their hash matching. The issue happens when two ftrace_ops are registered for different functions and one is then unregistered. The __ftrace_trace_function is then pointed to the remaining ftrace_ops callback function directly. This mean it will be called for all functions that were registered to trace by both ftrace_ops that were registered. This is not an issue for archs with CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, because the update of ftrace_trace_function doesn't happen until after all functions have been updated, and then the mcount caller is updated. But for those archs that do use the ftrace_test_stop_func(), the update is immediate. The dynamic selftest fails because it hits this situation, and the ftrace_ops that it registers fails to only trace what it was suppose to and instead traces all other functions. The solution is to delay the setting of __ftrace_trace_function until after all the functions have been updated according to the registered ftrace_ops. Also, function_trace_stop is set during the update to prevent function tracing from calling code that is caused by the function tracer itself. Signed-off-by: Steven Rostedt <[email protected]>
2011-07-13ftrace: Update filter when tracing enabled in set_ftrace_filter()Steven Rostedt1-0/+4
Currently, if set_ftrace_filter() is called when the ftrace_ops is active, the function filters will not be updated. They will only be updated when tracing is disabled and re-enabled. Update the functions immediately during set_ftrace_filter(). Signed-off-by: Steven Rostedt <[email protected]>
2011-07-13ftrace: Balance records when updating the hashSteven Rostedt1-16/+33
Whenever the hash of the ftrace_ops is updated, the record counts must be balance. This requires disabling the records that are set in the original hash, and then enabling the records that are set in the updated hash. Moving the update into ftrace_hash_move() removes the bug where the hash was updated but the records were not, which results in ftrace triggering a warning and disabling itself because the ftrace_ops filter is updated while the ftrace_ops was registered, and then the failure happens when the ftrace_ops is unregistered. The current code will not trigger this bug, but new code will. Signed-off-by: Steven Rostedt <[email protected]>
2011-07-07ftrace: Do not disable interrupts for modules in mcount updateSteven Rostedt1-5/+11
When I mounted an NFS directory, it caused several modules to be loaded. At the time I was running the preemptirqsoff tracer, and it showed the following output: # tracer: preemptirqsoff # # preemptirqsoff latency trace v1.1.5 on 2.6.33.9-rt30-mrg-test # -------------------------------------------------------------------- # latency: 1177 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: modprobe-19370 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: ftrace_module_notify # => ended at: ftrace_module_notify # # # _------=> CPU# # / _-----=> irqs-off # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # |||| /_--=> lock-depth # |||||/ delay # cmd pid |||||| time | caller # \ / |||||| \ | / modprobe-19370 3d.... 0us!: ftrace_process_locs <-ftrace_module_notify modprobe-19370 3d.... 1176us : ftrace_process_locs <-ftrace_module_notify modprobe-19370 3d.... 1178us : trace_hardirqs_on <-ftrace_module_notify modprobe-19370 3d.... 1178us : <stack trace> => ftrace_process_locs => ftrace_module_notify => notifier_call_chain => __blocking_notifier_call_chain => blocking_notifier_call_chain => sys_init_module => system_call_fastpath That's over 1ms that interrupts are disabled on a Real-Time kernel! Looking at the cause (being the ftrace author helped), I found that the interrupts are disabled before the code modification of mcounts into nops. The interrupts only need to be disabled on start up around this code, not when modules are being loaded. Signed-off-by: Steven Rostedt <[email protected]>
2011-07-07tracing: Still trace filtered irq functions when irq trace is disabledSteven Rostedt2-17/+35
If a function is set to be traced by the set_graph_function, but the option funcgraph-irqs is zero, and the traced function happens to be called from a interrupt, it will not be traced. The point of funcgraph-irqs is to not trace interrupts when we are preempted by an irq, not to not trace functions we want to trace that happen to be *in* a irq. Luckily the current->trace_recursion element is perfect to add a flag to help us be able to trace functions within an interrupt even when we are not tracing interrupts that preempt the trace. Reported-by: Heiko Carstens <[email protected]> Tested-by: Heiko Carstens <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-07-07ftrace: Fix regression of :mod:module function enablingSteven Rostedt2-10/+5
The new code that allows different utilities to pick and choose what functions they trace broke the :mod: hook that allows users to trace only functions of a particular module. The reason is that the :mod: hook bypasses the hash that is setup to allow individual users to trace their own functions and uses the global hash directly. But if the global hash has not been set up, it will cause a bug: echo '*:mod:radeon' > /sys/kernel/debug/set_ftrace_filter produces: [drm:drm_mode_getfb] *ERROR* invalid framebuffer id [drm:radeon_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip BUG: unable to handle kernel paging request at ffffffff8160ec90 IP: [<ffffffff810d9136>] add_hash_entry+0x66/0xd0 PGD 1a05067 PUD 1a09063 PMD 80000000016001e1 Oops: 0003 [#1] SMP Jul 7 04:02:28 phyllis kernel: [55303.858604] CPU 1 Modules linked in: cryptd aes_x86_64 aes_generic binfmt_misc rfcomm bnep ip6table_filter hid radeon r8169 ahci libahci mii ttm drm_kms_helper drm video i2c_algo_bit intel_agp intel_gtt Pid: 10344, comm: bash Tainted: G WC 3.0.0-rc5 #1 Dell Inc. Inspiron N5010/0YXXJJ RIP: 0010:[<ffffffff810d9136>] [<ffffffff810d9136>] add_hash_entry+0x66/0xd0 RSP: 0018:ffff88003a96bda8 EFLAGS: 00010246 RAX: ffff8801301735c0 RBX: ffffffff8160ec80 RCX: 0000000000306ee0 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880137c92940 RBP: ffff88003a96bdb8 R08: ffff880137c95680 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff81c9df78 R13: ffff8801153d1000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f329c18a700(0000) GS:ffff880137c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff8160ec90 CR3: 000000003002b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process bash (pid: 10344, threadinfo ffff88003a96a000, task ffff88012fcfc470) Stack: 0000000000000fd0 00000000000000fc ffff88003a96be38 ffffffff810d92f5 ffff88011c4c4e00 ffff880000000000 000000000b69f4d0 ffffffff8160ec80 ffff8800300e6f06 0000000081130295 0000000000000282 ffff8800300e6f00 Call Trace: [<ffffffff810d92f5>] match_records+0x155/0x1b0 [<ffffffff810d940c>] ftrace_mod_callback+0xbc/0x100 [<ffffffff810dafdf>] ftrace_regex_write+0x16f/0x210 [<ffffffff810db09f>] ftrace_filter_write+0xf/0x20 [<ffffffff81166e48>] vfs_write+0xc8/0x190 [<ffffffff81167001>] sys_write+0x51/0x90 [<ffffffff815c7e02>] system_call_fastpath+0x16/0x1b Code: 48 8b 33 31 d2 48 85 f6 75 33 49 89 d4 4c 03 63 08 49 8b 14 24 48 85 d2 48 89 10 74 04 48 89 42 08 49 89 04 24 4c 89 60 08 31 d2 RIP [<ffffffff810d9136>] add_hash_entry+0x66/0xd0 RSP <ffff88003a96bda8> CR2: ffffffff8160ec90 ---[ end trace a5d031828efdd88e ]--- Reported-by: Brian Marete <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-07-07tracing: Have "enable" file use refcounts like the "filter" fileSteven Rostedt1-9/+22
The "enable" file for the event system can be removed when a module is unloaded and the event system only has events from that module. As the event system nr_events count goes to zero, it may be freed if its ref_count is also set to zero. Like the "filter" file, the "enable" file may be opened by a task and referenced later, after a module has been unloaded and the events for that event system have been removed. Although the "filter" file referenced the event system structure, the "enable" file only references a pointer to the event system name. Since the name is freed when the event system is removed, it is possible that an access to the "enable" file may reference a freed pointer. Update the "enable" file to use the subsystem_open() routine that the "filter" file uses, to keep a reference to the event system structure while the "enable" file is opened. Cc: <[email protected]> Reported-by: Johannes Berg <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-07-07tracing: Fix bug when reading system filters on module removalSteven Rostedt3-11/+82
The event system is freed when its nr_events is set to zero. This happens when a module created an event system and then later the module is removed. Modules may share systems, so the system is allocated when it is created and freed when the modules are unloaded and all the events under the system are removed (nr_events set to zero). The problem arises when a task opened the "filter" file for the system. If the module is unloaded and it removed the last event for that system, the system structure is freed. If the task that opened the filter file accesses the "filter" file after the system has been freed, the system will access an invalid pointer. By adding a ref_count, and using it to keep track of what is using the event system, we can free it after all users are finished with the event system. Cc: <[email protected]> Reported-by: Johannes Berg <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-07-05Merge branch 'tip/perf/core-2' of ↵Ingo Molnar11-308/+308
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
2011-06-14tracing/kprobes: Fix kprobe-tracer to support stack traceMasami Hiramatsu3-7/+42
Fix to support kernel stack trace correctly on kprobe-tracer. Since the execution path of kprobe-based dynamic events is different from other tracepoint-based events, normal ftrace_trace_stack() doesn't work correctly. To fix that, this introduces ftrace_trace_stack_regs() which traces stack via pt_regs instead of current stack register. e.g. # echo p schedule+4 > /sys/kernel/debug/tracing/kprobe_events # echo 1 > /sys/kernel/debug/tracing/options/stacktrace # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable # head -n 20 /sys/kernel/debug/tracing/trace bash-2968 [000] 10297.050245: p_schedule_4: (schedule+0x4/0x4ca) bash-2968 [000] 10297.050247: <stack trace> => schedule_timeout => n_tty_read => tty_read => vfs_read => sys_read => system_call_fastpath kworker/0:1-2940 [000] 10297.050265: p_schedule_4: (schedule+0x4/0x4ca) kworker/0:1-2940 [000] 10297.050266: <stack trace> => worker_thread => kthread => kernel_thread_helper sshd-1132 [000] 10297.050365: p_schedule_4: (schedule+0x4/0x4ca) sshd-1132 [000] 10297.050365: <stack trace> => sysret_careful Note: Even with this fix, the first entry will be skipped if the probe is put on the function entry area before the frame pointer is set up (usually, that is 4 bytes (push %bp; mov %sp %bp) on x86), because stack unwinder depends on the frame pointer. Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/20110608070934.17777.17116.stgit@fedora15 Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14ring-buffer: Set __GFP_NORETRY flag for ring buffer allocating processVaibhav Nagarnaik1-6/+19
The tracing ring buffer is allocated from kernel memory. While allocating a large chunk of memory, OOM might happen which destabilizes the system. Thus random processes might get killed during the allocation. This patch adds __GFP_NORETRY flag to the ring buffer allocation calls to make it fail more gracefully if the system will not be able to complete the allocation request. Acked-by: David Rientjes <[email protected]> Signed-off-by: Vaibhav Nagarnaik <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Michael Rubin <[email protected]> Cc: David Sharp <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14tracing: Convert to kstrtoul_from_userPeter Huewe5-110/+20
This patch replaces the code for getting an unsigned long from a userspace buffer by a simple call to kstroul_from_user. This makes it easier to read and less error prone. Signed-off-by: Peter Huewe <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14tracing, function_graph: Add context-info support for function_graph tracerJiri Olsa1-22/+31
The function_graph tracer does not follow global context-info option. Adding TRACE_ITER_CONTEXT_INFO trace_flags check to enable it. With following commands: # echo function_graph > ./current_tracer # echo 0 > options/context-info # cat trace This is what it looked like before: # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 1) 0.079 us | } /* __vma_link_rb */ 1) 0.056 us | copy_page_range(); 1) | security_vm_enough_memory() { ... This is what it looks like now: # tracer: function_graph # } /* update_ts_time_stats */ timekeeping_max_deferment(); ... Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14tracing, function_graph: Remove lock-depth from latency traceJiri Olsa1-4/+3
The lock_depth was removed in commit e6e1e25 tracing: Remove lock_depth from event entry Removing the lock_depth info from function_graph latency header. With following commands: # echo function_graph > ./current_tracer # echo 1 > options/latency-format # cat trace This is what it looked like before: # tracer: function_graph # # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+ # -------------------------------------------------------------------- # latency: 0 us, #59756/311298, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) # ----------------- # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> lock-depth # |||| / # CPU||||| DURATION FUNCTION CALLS # | ||||| | | | | | | 0) .... 0.068 us | } /* __rcu_read_unlock */ ... This is what it looks like now: # tracer: function_graph # # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+ # -------------------------------------------------------------------- # latency: 0 us, #59747/1744610, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) # ----------------- # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # CPU|||| DURATION FUNCTION CALLS # | |||| | | | | | | 0) ..s. 1.641 us | } /* __rcu_process_callbacks */ ... Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14tracing, function: Fix trace header to follow context-info optionJiri Olsa1-0/+3
The header display of function tracer does not follow the context-info option, so field names are displayed even if this option is off. Added check for TRACE_ITER_CONTEXT_INFO trace_flags. With following commands: # echo function > ./current_tracer # echo 0 > options/context-info # cat trace This is what it looked like before: # tracer: function # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | add_preempt_count <-schedule rcu_note_context_switch <-schedule ... This is what it looks like now: # tracer: function # _raw_spin_unlock_irqrestore <-hrtimer_try_to_cancel ... Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14tracing, function_graph: Merge overhead and duration display functionsJiri Olsa1-74/+74
Functions print_graph_overhead() and print_graph_duration() displays data for one field - DURATION. I merged them into single function print_graph_duration(), and added a way to display the empty parts of the field. This way the print_graph_irq() function can use this column to display the IRQ signs if needed and the DURATION field details stays inside the print_graph_duration() function. Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14tracing, function_graph: Remove dependency of abstime and duration fields on ↵Jiri Olsa3-18/+9
latency The display of absolute time and duration fields is based on the latency field. This was added during the irqsoff/wakeup tracers graph support changes. It's causing confusion in what fields will be displayed for the function_graph tracer itself. So I'm removing this depency, and adding absolute time and duration fields to the preemptirqsoff preemptoff irqsoff wakeup tracers. With following commands: # echo function_graph > ./current_tracer # cat trace This is what it looked like before: # tracer: function_graph # # TIME CPU DURATION FUNCTION CALLS # | | | | | | | | 0) 0.068 us | } /* page_add_file_rmap */ 0) | _raw_spin_unlock() { ... This is what it looks like now: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) 0.068 us | } /* add_preempt_count */ 0) 0.993 us | } /* vfsmount_lock_local_lock */ ... For preemptirqsoff preemptoff irqsoff wakeup tracers, this is what it looked like before: SNIP # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> lock-depth # |||| / # CPU TASK/PID ||||| DURATION FUNCTION CALLS # | | | ||||| | | | | | | 1) <idle>-0 | d..1 0.000 us | acpi_idle_enter_simple(); ... This is what it looks like now: SNIP # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / # TIME CPU TASK/PID |||| DURATION FUNCTION CALLS # | | | | |||| | | | | | | 19.847735 | 1) <idle>-0 | d..1 0.000 us | acpi_idle_enter_simple(); ... Signed-off-by: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14ftrace: Fixed an include coding style issuePaul McQuade1-5/+2
Removed <asm/ftrace.h> because <linux/ftrace.h> was already declared. Braces of struct's coding style fixed. Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Paul McQuade <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14tracing: Add disable_on_free optionSteven Rostedt2-2/+5
Add a trace option to disable tracing on free. When this option is set, a write into the free_buffer file will not only shrink the ring buffer down to zero, but it will also disable tracing. Cc: Vaibhav Nagarnaik <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14tracing: Add a proc file to stop tracing and free bufferVaibhav Nagarnaik1-35/+73
The proc file entry buffer_size_kb is used to set the size of tracing buffer. The memory to expand the buffer size is kernel memory. Consider a use case where tracing is handled by a user space utility, which acts as a gate keeper for tracing requests. In an OOM condition, tracing is considered a low priority task and if the utility gets killed the ring buffer memory cannot be released back to the kernel. This patch adds a proc file called "free_buffer" whose purpose is to stop tracing and free up the ring buffer when it is closed. The user space process can then set the desired size in buffer_size_kb file and open the fd to the "free_buffer" file. Under OOM condition, if the process gets killed, the kernel closes the file descriptor. The release handler stops the tracing and releases the kernel memory automatically. Cc: Ingo Molnar <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Michael Rubin <[email protected]> Cc: David Sharp <[email protected]> Signed-off-by: Vaibhav Nagarnaik <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14tracing: Use NUMA allocation for per-cpu ring buffer pagesVaibhav Nagarnaik3-22/+23
The tracing ring buffer is a group of per-cpu ring buffers where allocation and logging is done on a per-cpu basis. The events that are generated on a particular CPU are logged in the corresponding buffer. This is to provide wait-free writes between CPUs and good NUMA node locality while accessing the ring buffer. However, the allocation routines consider NUMA locality only for buffer page metadata and not for the actual buffer page. This causes the pages to be allocated on the NUMA node local to the CPU where the allocation routine is running at the time. This patch fixes the problem by using a NUMA node specific allocation routine so that the pages are allocated from a NUMA node local to the logging CPU. I tested with the getuid_microbench from autotest. It is a simple binary that calls getuid() in a loop and measures the average time for the syscall to complete. The following command was used to test: $ getuid_microbench 1000000 Compared the numbers found on kernel with and without this patch and found that logging latency decreases by 30-50 ns/call. tracing with non-NUMA allocation - 569 ns/call tracing with NUMA allocation - 512 ns/call Signed-off-by: Vaibhav Nagarnaik <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Rubin <[email protected]> Cc: David Sharp <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-06-14tracing: Schedule a delayed work to call wakeup()Vaibhav Nagarnaik1-11/+12
In using syscall tracing by concurrent processes, the wakeup() that is called in the event commit function causes contention on the spin lock of the waitqueue. I enabled sys_enter_getuid and sys_exit_getuid tracepoints, and by running getuid_microbench from autotest in parallel I found that the contention causes exponential latency increase in the tracing path. The autotest binary getuid_microbench calls getuid() in a tight loop for the given number of iterations and measures the average time required to complete a single invocation of syscall. The patch schedules a delayed work after 2 ms once an event commit calls to wake up the trace wait_queue. This removes the delay caused by contention on spin lock in wakeup() and amortizes the wakeup() calls scheduled over the 2 ms period. In the following example, the script enables the sys_enter_getuid and sys_exit_getuid tracepoints and runs the getuid_microbench in parallel with the given number of processes. The output clearly shows the latency increase caused by contentions. $ ~/getuid.sh 1 1000000 calls in 0.720974253 s (720.974253 ns/call) $ ~/getuid.sh 2 1000000 calls in 1.166457554 s (1166.457554 ns/call) 1000000 calls in 1.168933765 s (1168.933765 ns/call) $ ~/getuid.sh 3 1000000 calls in 1.783827516 s (1783.827516 ns/call) 1000000 calls in 1.795553270 s (1795.553270 ns/call) 1000000 calls in 1.796493376 s (1796.493376 ns/call) $ ~/getuid.sh 4 1000000 calls in 4.483041796 s (4483.041796 ns/call) 1000000 calls in 4.484165388 s (4484.165388 ns/call) 1000000 calls in 4.484850762 s (4484.850762 ns/call) 1000000 calls in 4.485643576 s (4485.643576 ns/call) $ ~/getuid.sh 5 1000000 calls in 6.497521653 s (6497.521653 ns/call) 1000000 calls in 6.502000236 s (6502.000236 ns/call) 1000000 calls in 6.501709115 s (6501.709115 ns/call) 1000000 calls in 6.502124100 s (6502.124100 ns/call) 1000000 calls in 6.502936358 s (6502.936358 ns/call) After the patch, the latencies scale better. 1000000 calls in 0.728720455 s (728.720455 ns/call) 1000000 calls in 0.842782857 s (842.782857 ns/call) 1000000 calls in 0.883803135 s (883.803135 ns/call) 1000000 calls in 0.902077764 s (902.077764 ns/call) 1000000 calls in 0.902838202 s (902.838202 ns/call) 1000000 calls in 0.908896885 s (908.896885 ns/call) 1000000 calls in 0.932523515 s (932.523515 ns/call) 1000000 calls in 0.958009672 s (958.009672 ns/call) 1000000 calls in 0.986188020 s (986.188020 ns/call) 1000000 calls in 0.989771102 s (989.771102 ns/call) 1000000 calls in 0.933518391 s (933.518391 ns/call) 1000000 calls in 0.958897947 s (958.897947 ns/call) 1000000 calls in 1.031038897 s (1031.038897 ns/call) 1000000 calls in 1.089516025 s (1089.516025 ns/call) 1000000 calls in 1.141998347 s (1141.998347 ns/call) Signed-off-by: Vaibhav Nagarnaik <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Rubin <[email protected]> Cc: David Sharp <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>
2011-06-09tracing: Fix regression in printk_formats fileSteven Rostedt1-4/+1
The fix to fix the printk_formats of modules broke the printk_formats of trace_printks in the kernel. The update of what to show via the seq_file was only updated if the passed in fmt was NULL, which happens only on the first iteration. The result was showing the first format every time instead of iterating through the available formats. Signed-off-by: Steven Rostedt <[email protected]>
2011-06-07ftrace: Revert 8ab2b7efd ftrace: Remove unnecessary disabling of irqsSteven Rostedt1-0/+7
Revert the commit that removed the disabling of interrupts around the initial modifying of mcount callers to nops, and update the comment. The original comment was outdated and stated that the interrupts were being disabled to prevent kstop machine, which was required with the old ftrace daemon, but was no longer the case. What the comment failed to mention was that interrupts needed to be disabled to keep interrupts from preempting the modifying of the code and then executing the code that was partially modified. Revert the commit and update the comment. Reported-by: Richard W.M. Jones <[email protected]> Tested-by: Richard W.M. Jones <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-06-07kprobes/trace: Fix kprobe selftest for gcc 4.6Steven Rostedt1-2/+6
With gcc 4.6, the self test kprobe function: kprobe_trace_selftest_target() is optimized such that kallsyms does not list it. The kprobes test uses this function to insert a probe and test it. But it will fail the test if the function is not listed in kallsyms. Adding a __used annotation keeps the symbol in the kallsyms table. Suggested-by: David Daney <[email protected]> Cc: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-06-06ftrace: Fix possible undefined return codeGuoWen Li1-1/+1
kernel/trace/ftrace.c: In function 'ftrace_regex_write.clone.15': kernel/trace/ftrace.c:2743:6: warning: 'ret' may be used uninitialized in this function Signed-off-by: GuoWen Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt <[email protected]>