2023-04-27ring-buffer: Sync IRQ works before buffer destructionJohannes Berg1-0/+4
If something was written to the buffer just before destruction, it may be possible (maybe not in a real system, but it did happen in ARCH=um with time-travel) to destroy the ring buffer before the IRQ work ran, leading to this KASAN report (or a crash without KASAN):

BUG: KASAN: slab-use-after-free in irq_work_run_list+0x11a/0x13a
Read of size 8 at addr 000000006d640a48 by task swapper/0

CPU: 0 PID: 0 Comm: swapper Tainted: G W O 6.3.0-rc1 #7
Stack: 60c4f20f 0c203d48 41b58ab3 60f224fc 600477fa 60f35687 60c4f20f 601273dd 00000008 6101eb00 6101eab0 615be548
Call Trace:
 [<60047a58>] show_stack+0x25e/0x282
 [<60c609e0>] dump_stack_lvl+0x96/0xfd
 [<60c50d4c>] print_report+0x1a7/0x5a8
 [<603078d3>] kasan_report+0xc1/0xe9
 [<60308950>] __asan_report_load8_noabort+0x1b/0x1d
 [<60232844>] irq_work_run_list+0x11a/0x13a
 [<602328b4>] irq_work_tick+0x24/0x34
 [<6017f9dc>] update_process_times+0x162/0x196
 [<6019f335>] tick_sched_handle+0x1a4/0x1c3
 [<6019fd9e>] tick_sched_timer+0x79/0x10c
 [<601812b9>] __hrtimer_run_queues.constprop.0+0x425/0x695
 [<60182913>] hrtimer_interrupt+0x16c/0x2c4
 [<600486a3>] um_timer+0x164/0x183
 [...]

Allocated by task 411:
 save_stack_trace+0x99/0xb5
 stack_trace_save+0x81/0x9b
 kasan_save_stack+0x2d/0x54
 kasan_set_track+0x34/0x3e
 kasan_save_alloc_info+0x25/0x28
 ____kasan_kmalloc+0x8b/0x97
 __kasan_kmalloc+0x10/0x12
 __kmalloc+0xb2/0xe8
 load_elf_phdrs+0xee/0x182
 [...]

The buggy address belongs to the object at 000000006d640800, which belongs to the cache kmalloc-1k of size 1024. The buggy address is located 584 bytes inside of the freed 1024-byte region [000000006d640800, 000000006d640c00).

Add the appropriate irq_work_sync() so the work finishes before the buffers are destroyed. Prior to the commit in the Fixes tag below, there was only a single global IRQ work, so this issue didn't exist. Link: https://lore.kernel.org/linux-trace-kernel/20230427175920.a76159263122.I8295e405c44362a86c995e9c2c37e3e03810aa56@changeid Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Fixes: 15693458c4bc ("tracing/ring-buffer: Move poll wake ups into ring buffer code") Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
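For illustration only: the general pattern behind such a fix is to synchronize against any pending irq_work before freeing the structure that embeds it. A minimal sketch, assuming a hypothetical per-CPU buffer structure that embeds a struct irq_work (names are illustrative, not the actual patch):

    /* An object embedding a struct irq_work must not be freed while that
     * work may still be queued or running on some CPU. */
    struct cpu_buffer {
            struct irq_work work;
            /* ... other per-CPU ring buffer state ... */
    };

    static void free_cpu_buffer(struct cpu_buffer *cpu_buffer)
    {
            irq_work_sync(&cpu_buffer->work);       /* wait for pending IRQ work */
            kfree(cpu_buffer);
    }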
2023-04-26tracing: Add missing spaces in trace_print_hex_seq()Ken Lin1-1/+4
If the buffer length is larger than 16 and concatenate is set to false, there would be missing spaces every 16 bytes. Example: Before: c5 11 10 50 05 4d 31 40 00 40 00 40 00 4d 31 4000 40 00 After: c5 11 10 50 05 4d 31 40 00 40 00 40 00 4d 31 40 00 40 00 Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Signed-off-by: Ken Lin <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
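A userspace sketch of the intended spacing (illustrative only; the kernel helper in question is trace_print_hex_seq(), this just demonstrates the expected output):

    #include <stdbool.h>
    #include <stdio.h>

    /* Print 'len' bytes as hex, inserting a space between bytes whenever
     * 'concatenate' is false, regardless of how long the buffer is. */
    static void print_hex(const unsigned char *buf, int len, bool concatenate)
    {
            for (int i = 0; i < len; i++)
                    printf("%s%02x", (concatenate || i == 0) ? "" : " ", buf[i]);
            putchar('\n');
    }

    int main(void)
    {
            unsigned char buf[] = { 0xc5, 0x11, 0x10, 0x50, 0x05, 0x4d, 0x31, 0x40,
                                    0x00, 0x40, 0x00, 0x40, 0x00, 0x4d, 0x31, 0x40,
                                    0x00, 0x40, 0x00 };

            print_hex(buf, sizeof(buf), false);     /* matches the "After" line above */
            return 0;
    }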
2023-04-26ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpusTze-nan Wu1-3/+13
In ring_buffer_reset_online_cpus, the buffer_size_kb write operation may permanently fail if the cpu_online_mask changes between the two for_each_online_buffer_cpu loops. The number of increments and decrements of cpu_buffer->resize_disabled and cpu_buffer->record_disabled may become inconsistent, leaving some CPUs with non-zero values for these atomic variables after the function returns. This issue can be reproduced by running "echo 0 > trace" while hotplugging a CPU; once it is hit, buffer_size_kb is no longer functional. To prevent 'resize_disabled' and 'record_disabled' from being left non-zero after ring_buffer_reset_online_cpus returns, ensure that atomic_sub() is only applied to the per-CPU buffers that were actually disabled in the first loop. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: <[email protected]> Cc: [email protected] Fixes: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU") Reviewed-by: Cheng-Jui Wang <[email protected]> Signed-off-by: Tze-nan Wu <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
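A sketch of the idea (the RESET_BIT constant, the second-pass filtering, and the reset_disabled_cpu_buffer() helper are assumptions based on the commit description, not the literal diff): mark each buffer the first loop disables, and only undo the ones that carry the mark, so a CPU that comes online between the two loops never sees an unmatched decrement.

    #define RESET_BIT       (1 << 30)       /* assumed marker in resize_disabled */

    /* First pass: disable only the currently-online CPU buffers and mark them. */
    for_each_online_buffer_cpu(buffer, cpu) {
            cpu_buffer = buffer->buffers[cpu];
            atomic_add(RESET_BIT, &cpu_buffer->resize_disabled);
            atomic_inc(&cpu_buffer->record_disabled);
    }

    /* Second pass: walk all possible buffers, but only undo the marked ones. */
    for_each_buffer_cpu(buffer, cpu) {
            cpu_buffer = buffer->buffers[cpu];
            if (!(atomic_read(&cpu_buffer->resize_disabled) & RESET_BIT))
                    continue;
            reset_disabled_cpu_buffer(cpu_buffer);
            atomic_dec(&cpu_buffer->record_disabled);
            atomic_sub(RESET_BIT, &cpu_buffer->resize_disabled);
    }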
2023-04-25recordmcount: Fix memory leaks in the uwrite functionHao Zeng1-1/+5
Common realloc mistake: 'file_append' nulled but not freed upon failure Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hao Zeng <[email protected]> Suggested-by: Steven Rostedt <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
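The underlying bug is the classic realloc() anti-pattern of assigning the result back to the only pointer holding the allocation. A generic illustration of the safe pattern (not the recordmcount code itself):

    #include <stdlib.h>

    /* Grow 'buf' to 'new_size' bytes without leaking it when realloc fails. */
    static char *grow_buffer(char *buf, size_t new_size)
    {
            char *tmp = realloc(buf, new_size);

            if (!tmp) {
                    free(buf);      /* the old block is still valid and still ours to free */
                    return NULL;
            }
            return tmp;
    }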
2023-04-25tracing/user_events: Limit max fault-in attemptsBeau Belgrave1-14/+35
When event enablement changes, user_events attempts to update a bit in the user process. If a fault is hit, an attempt is made to fault in the page, and the write is retried if the page comes in. While this normally requires only a couple of attempts, a bad user process could deliberately cause infinite loops. Ensure fault-in attempts, whether sync or async, are limited to a maximum of 10 attempts for each update. When the max is hit, return -EFAULT so that no further attempt is made. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
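An illustrative sketch of the bounded-retry idea (the constant and the two helpers are hypothetical names, not the functions in kernel/trace/trace_events_user.c):

    #define MAX_FAULT_ATTEMPTS 10           /* hypothetical name for the cap */

    static int update_enable_bit(struct user_event_enabler *enabler)
    {
            int attempt;

            for (attempt = 0; attempt < MAX_FAULT_ATTEMPTS; attempt++) {
                    if (try_write_bit(enabler) == 0)        /* hypothetical helper */
                            return 0;
                    if (fault_in_enable_page(enabler))      /* hypothetical helper */
                            break;                          /* page cannot be faulted in */
            }

            return -EFAULT;                 /* give up rather than loop forever */
    }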
2023-04-25tracing/user_events: Prevent same address and bit per processBeau Belgrave2-1/+49
User processes register an address and bit pair for events. If the same address and bit pair are registered multiple times in the same process, it can cause undefined behavior when events are enabled/disabled. When more than one are used, the bit could be turned off by another event being disabled, while the original event is still enabled. Prevent undefined behavior by checking the current mm to see if any event has already been registered for the address and bit pair. Return EADDRINUSE back to the user process if it's already being used. Update ftrace self-test to ensure this occurs properly. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Doug Cook <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-04-25tracing/user_events: Ensure bit is cleared on unregisterBeau Belgrave2-3/+40
If an event is enabled and a user process unregisters user_events, the bit is left set. Fix this by always clearing the bit in the user process if unregister is successful. Update abi self-test to ensure this occurs properly. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Doug Cook <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-04-25tracing/user_events: Ensure write index cannot be negativeBeau Belgrave2-0/+8
The write index indicates which event the data is for and accesses a per-file array. The index is passed by user processes during write() calls as the first 4 bytes. Ensure that it cannot be negative by returning -EINVAL to prevent out of bounds accesses. Update ftrace self-test to ensure this occurs properly. Link: https://lkml.kernel.org/r/[email protected] Fixes: 7f5a08c79df3 ("user_events: Add minimal support for trace_event into ftrace") Reported-by: Doug Cook <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
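The check itself is small; a simplified sketch of the idea (not the exact diff):

    /* 'i' is the struct iov_iter for this write(); the first 4 bytes carry
     * the event's write index into the per-file array. */
    int idx;

    if (copy_from_iter(&idx, sizeof(idx), i) != sizeof(idx))
            return -EFAULT;

    if (idx < 0)            /* reject negative indexes before any array access */
            return -EINVAL;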
2023-04-25seq_buf: Add seq_buf_do_printk() helperSergey Senozhatsky2-0/+34
Sometimes we use seq_buf to format a string buffer, which we then pass to printk(). However, in certain situations the seq_buf string buffer can get too big, exceeding the PRINTKRB_RECORD_MAX bytes limit, and causing printk() to truncate the string. Add a new seq_buf helper. This helper prints the seq_buf string buffer line by line, using \n as a delimiter, rather than passing the whole string buffer to printk() at once. Link: https://lkml.kernel.org/r/[email protected] Cc: Andrew Morton <[email protected]> Signed-off-by: Sergey Senozhatsky <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Tested-by: Yosry Ahmed <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
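A usage sketch of the new helper, assuming it takes the seq_buf and a printk level string as the commit describes (check include/linux/seq_buf.h for the exact prototype):

    char buf[4096];
    struct seq_buf s;

    seq_buf_init(&s, buf, sizeof(buf));
    seq_buf_printf(&s, "first line\n");
    seq_buf_printf(&s, "second line\n");

    /* Emit the buffer one line at a time instead of one huge printk(). */
    seq_buf_do_printk(&s, KERN_INFO);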
2023-04-25tracing: Fix print_fields() for __dyn_loc/__rel_locBeau Belgrave1-4/+6
Both print_fields() and print_array() fail to handle the case where dynamic data ends at the last byte of the payload, for both the __dyn_loc and __rel_loc field types. For __rel_loc, the offset was off by 4 bytes, leading to incorrect strings and data being printed out. In print_array(), the buffer position was not advanced, which resulted in the first payload byte being used as the offset base instead of the field offset. Advance the __rel_loc offset by 4 to ensure the correct offset, and advance pos to the field offset to ensure the correct data is displayed when printing arrays. Change >= to > when checking whether data is in-bounds, since it is valid for dynamic data to include the last byte of the payload. Example output for the event format:
 field:unsigned short common_type; offset:0; size:2; signed:0;
 field:unsigned char common_flags; offset:2; size:1; signed:0;
 field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
 field:int common_pid; offset:4; size:4; signed:1;
 field:__rel_loc char text[]; offset:8; size:4; signed:1;
Output before: tp_rel_loc: text=<OVERFLOW>
Output after: tp_rel_loc: text=Test
Link: https://lkml.kernel.org/r/[email protected] Fixes: 80a76994b2d8 ("tracing: Add "fields" option to show raw trace event fields") Reported-by: Doug Cook <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
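For context, a sketch of how a __rel_loc word is decoded; the 4-byte adjustment mentioned above exists because the offset is counted from the byte just after the 32-bit location word itself (u16/u32 as in the kernel; the helper name is illustrative):

    /* A __dyn_loc/__rel_loc field is a 32-bit word: offset in the low 16 bits,
     * length in the high 16 bits.  For __rel_loc the offset is relative to the
     * byte just after the word, hence the extra sizeof(u32). */
    static const char *rel_loc_data(const char *payload, int field_offset, u16 *len)
    {
            u32 loc = *(const u32 *)(payload + field_offset);

            *len = loc >> 16;
            return payload + field_offset + sizeof(u32) + (loc & 0xffff);
    }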
2023-04-25tracing/user_events: Set event filter_type from typeBeau Belgrave1-0/+3
Users expect that events can be filtered by the kernel. User events currently sets all event fields as FILTER_OTHER which limits to binary filters only. When strings are being used, functionality is reduced. Use filter_assign_type() to find the most appropriate filter type for each field in user events to ensure full kernel capabilities. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-04-25ring-buffer: Clearly check null ptr returned by rb_set_head_page()Zheng Yejian1-2/+3
In the error case, the 'buffer_page' returned by rb_set_head_page() is NULL. Currently, checking '&buffer_page->list' happens to be equivalent to checking 'buffer_page', because 'list' is the first member of 'buffer_page'; but if that ever stops being the case, 'head_page' would point to wild memory while the check would be bypassed. Check the returned pointer itself instead. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: <[email protected]> Signed-off-by: Zheng Yejian <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing: Unbreak user eventsSteven Rostedt (Google)1-1/+0
user_events was added a bit prematurely, and a few kernel developers had issues with it. The API also needed a bit of work to make sure it would be stable, so it was decided to mark user_events as "broken" until this was settled. It now has a new API that appears to be as stable as it will be without the use of a crystal ball. It is being used within Microsoft as is, which means the API has had some testing in real-world use cases. It went through many discussions in the bi-weekly tracing meetings, and there have been no more comments about updates. I feel this is good to go. Cc: Mathieu Desnoyers <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Use print_format_fields() for trace outputSteven Rostedt (Google)1-6/+3
Currently, user events are shown using the "hex" output for "safety" reasons as one cannot trust user events behaving nicely. But the hex output is not the only utility for safe outputting of trace events. The print_event_fields() is just as safe and gives user readable output. Before: example-839 [001] ..... 43.222244: 00000000: b1 06 00 00 47 03 00 00 00 00 00 00 ....G....... example-839 [001] ..... 43.564433: 00000000: b1 06 00 00 47 03 00 00 01 00 00 00 ....G....... example-839 [001] ..... 43.763917: 00000000: b1 06 00 00 47 03 00 00 02 00 00 00 ....G....... example-839 [001] ..... 43.967929: 00000000: b1 06 00 00 47 03 00 00 03 00 00 00 ....G....... After: example-837 [006] ..... 55.739249: test: count=0x0 (0) example-837 [006] ..... 111.104784: test: count=0x1 (1) example-837 [006] ..... 111.268444: test: count=0x2 (2) example-837 [006] ..... 111.416533: test: count=0x3 (3) example-837 [006] ..... 111.542859: test: count=0x4 (4) Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Align structs with tabs for readabilityBeau Belgrave3-60/+60
Add tabs to make struct members easier to read and unify the style of the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Limit global user_event countBeau Belgrave1-0/+47
Operators want to be able to ensure enough tracepoints exist on the system for kernel components as well as for user components. Since there are only up to 64K events, by default allow up to half to be used by user events. Add a kernel sysctl parameter (kernel.user_events_max) to set a global limit that is honored among all groups on the system. This ensures hard limits can be set up to prevent user processes from consuming all event IDs on the system. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Charge event allocs to cgroupsBeau Belgrave1-10/+10
Operators need a way to limit how much memory cgroups use. User events need to be included into that accounting. Fix this by using GFP_KERNEL_ACCOUNT for allocations generated by user programs for user_event tracing. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Update documentation for ABIBeau Belgrave1-70/+97
The ABI for user_events has changed from mmap() based to remote writes. Update the documentation to reflect these changes, add new section for unregistering events since lifetime is now tied to tasks instead of files. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Use write ABI in exampleBeau Belgrave1-37/+8
The ABI has changed to use a remote write approach. Update the example to show the expected use of this new ABI. Also remove debugfs path and use tracefs to ensure example works in more environments. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Add ABI self-testBeau Belgrave2-1/+227
Add ABI specific self-test to ensure enablements work in various scenarios such as fork, VM_CLONE, and basic event enable/disable. Ensure ABI contracts/limits are also being upheld, such as bit limits and data size limits. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Update self-tests to write ABIBeau Belgrave2-89/+96
ABI has been changed to remote writes, update existing test cases to use this new ABI to ensure existing functionality continues to work. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Add ioctl for disabling addressesBeau Belgrave2-2/+119
Enablements are now tracked by the lifetime of the task/mm. User processes need to be able to disable their addresses if tracing is requested to be turned off. Before, unmapping the page would suffice; now a stronger contract is needed, so add an ioctl for this. A new flag bit, 'freeing', is added to user_event_enabler to ensure that if removal of the event is attempted while a fault is being handled, the removal is delayed until after the fault is retried. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Fixup enable faults asynclyBeau Belgrave1-6/+114
When events are enabled within the various tracing facilities, such as ftrace/perf, the event_mutex is held. As events are enabled, pages are accessed, and we do not want page faults to occur under this lock. Instead, queue the fault to a workqueue so it can be handled in a process-context-safe way without the lock. The enable address is marked as faulting while the async fault-in occurs, which ensures we don't attempt to fault in more than is necessary. Once the page has been faulted in, the address write is re-attempted. If the page couldn't be faulted in, wait until the next time the event is enabled, to prevent any potential infinite loops. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Use remote writes for event enablementBeau Belgrave4-142/+517
As part of the discussions for user_events aligned with user space tracers, it was determined that user programs should register an aligned value to set or clear a bit when an event becomes enabled. Currently a shared page is used, which requires mmap(). Remove the shared page implementation and move to a user-registered address implementation. In this new model, user programs specify 3 new values during event registration. The first is the address to update when the event is either enabled or disabled. The second is the bit to set/clear to reflect the event being enabled. The third is the size of the value at the specified address. This allows a local 32/64-bit value in user programs to support both kernel and user tracers. As an example, setting bit 31 for kernel tracers when the event becomes enabled allows user tracers to use the other bits for ref counts or other flags. The kernel side updates the bit atomically; user programs need to update these values atomically as well. User-provided addresses must be aligned on a natural boundary; this allows for single-page checking and prevents odd behaviors such as an enable value straddling 2 pages instead of a single page. Currently page faults are only logged; future patches will handle them. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Mathieu Desnoyers <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
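A userspace registration sketch under this ABI. The struct user_reg field names, the DIAG_IOCSREG ioctl, and the tracefs path are given from memory and should be checked against include/uapi/linux/user_events.h and Documentation/trace/user_events.rst:

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/user_events.h>

    static uint32_t enabled;        /* kernel sets/clears bit 31 of this word */

    int register_test_event(void)
    {
            struct user_reg reg = {0};
            int fd = open("/sys/kernel/tracing/user_events_data", O_RDWR);

            if (fd < 0)
                    return -1;

            reg.size        = sizeof(reg);
            reg.enable_addr = (uint64_t)(uintptr_t)&enabled;   /* address to update */
            reg.enable_bit  = 31;                              /* bit to set/clear */
            reg.enable_size = sizeof(enabled);                 /* 32-bit value */
            reg.name_args   = (uint64_t)(uintptr_t)"test u32 count";

            if (ioctl(fd, DIAG_IOCSREG, &reg) < 0)
                    return -1;

            return fd;      /* reg.write_index then identifies the event on write() */
    }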
2023-03-29tracing/user_events: Track fork/exec/exit for mm lifetimeBeau Belgrave5-0/+29
During tracefs discussions it was decided that, instead of requiring a mapping within a user process to track the lifetime of memory descriptors, we should hook the appropriate calls. Do this by adding the minimal stubs required for task fork, exec, and exit. Currently these are just NOPs; future patches will implement these calls fully. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Mathieu Desnoyers <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing/user_events: Split header into uapi and kernelBeau Belgrave3-51/+54
The UAPI parts need to be split out from the kernel parts of user_events now that other parts of the kernel will reference it. Do so by moving the existing include/linux/user_events.h into include/uapi/linux/user_events.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tracing: Add "fields" option to show raw trace event fieldsSteven Rostedt (Google)5-2/+183
The hex, raw and bin formats come from the old PREEMPT_RT patch set latency tracer. That actually gave real alternatives to reading the ascii buffer. But they have started to bit rot and they do not give a good representation of the tracing data. Add "fields" option that will read the trace event fields and parse the data from how the fields are defined: With "fields" = 0 (default) echo 1 > events/sched/sched_switch/enable cat trace <idle>-0 [003] d..2. 540.078653: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/3:1 next_pid=83 next_prio=120 kworker/3:1-83 [003] d..2. 540.078860: sched_switch: prev_comm=kworker/3:1 prev_pid=83 prev_prio=120 prev_state=I ==> next_comm=swapper/3 next_pid=0 next_prio=120 <idle>-0 [003] d..2. 540.206423: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=807 next_prio=120 sshd-807 [003] d..2. 540.206531: sched_switch: prev_comm=sshd prev_pid=807 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120 <idle>-0 [001] d..2. 540.206597: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120 kworker/u16:4-58 [001] d..2. 540.206617: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120 bash-830 [001] d..2. 540.206678: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120 kworker/u16:4-58 [001] d..2. 540.206696: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120 bash-830 [001] d..2. 540.206713: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120 echo 1 > options/fields <...>-998 [002] d..2. 538.643732: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/2 prev_state=0x20 (32) prev_prio=0x78 (120) prev_pid=0x3e6 (998) prev_comm=trace-cmd <idle>-0 [001] d..2. 538.643806: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/1 bash-830 [001] d..2. 538.644106: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash kworker/u16:4-58 [001] d..2. 538.644130: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4 bash-830 [001] d..2. 538.644180: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash kworker/u16:4-58 [001] d..2. 538.644185: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4 bash-830 [001] d..2. 538.644204: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/1 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash <idle>-0 [003] d..2. 538.644211: sched_switch: next_prio=0x78 (120) next_pid=0x327 (807) next_comm=sshd prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/3 sshd-807 [003] d..2. 
538.644340: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/3 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x327 (807) prev_comm=sshd It traces the data safely without using the trace print formatting. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29tools/kvm_stat: use canonical ftrace pathRoss Zwisler1-1/+1
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing A comment in kvm_stat still refers to this older debugfs path, so let's update it to avoid confusion. Link: https://lkml.kernel.org/r/[email protected] Cc: "Tobin C. Harding" <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Tycho Andersen <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Reviewed-by: Mukesh Ojha <[email protected]> Signed-off-by: Ross Zwisler <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29leaking_addresses: also skip canonical ftrace pathRoss Zwisler1-0/+1
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing scripts/leaking_addresses.pl only skipped this older debugfs path, so let's add the canonical path as well. Link: https://lkml.kernel.org/r/[email protected] Cc: "Tobin C. Harding" <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Shuah Khan <[email protected]> Acked-by: Tycho Andersen <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Ross Zwisler <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-29selftests: use canonical ftrace pathRoss Zwisler4-12/+12
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing A few spots in tools/testing/selftests still refer to this older debugfs path, so let's update them to avoid confusion. Link: https://lkml.kernel.org/r/[email protected] Cc: "Tobin C. Harding" <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Tycho Andersen <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Reviewed-by: Mukesh Ojha <[email protected]> Signed-off-by: Ross Zwisler <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28docs: tracing: Update fprobe documentationMasami Hiramatsu (Google)1-4/+12
Update fprobe.rst for - the private entry_data argument - the return value of the entry handler - the nr_rethook_node field. Link: https://lkml.kernel.org/r/167526701579.433354.3057889264263546659.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28lib/test_fprobe: Add a testcase for skipping exit_handlerMasami Hiramatsu (Google)1-1/+25
Add a testcase for skipping exit_handler if entry_handler returns !0. Link: https://lkml.kernel.org/r/167526700658.433354.12922388040490848613.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28fprobe: Skip exit_handler if entry_handler returns !0Masami Hiramatsu (Google)5-13/+32
Skip hooking function return and calling exit_handler if the entry_handler() returns !0. Link: https://lkml.kernel.org/r/167526699798.433354.10998365726830117303.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28lib/test_fprobe: Add a test case for nr_maxactiveMasami Hiramatsu (Google)1-0/+42
Add a test case for nr_maxactive. If the number of active functions is more than nr_maxactive, it must be skipped. Link: https://lkml.kernel.org/r/167526698856.433354.4430007340787176666.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28fprobe: Add nr_maxactive to specify rethook_node pool sizeMasami Hiramatsu (Google)2-1/+6
Add nr_maxactive to specify the rethook_node pool size. This is the maximum number of target functions that can be actively probed concurrently by the exit_handler. Note that if the running function is preempted or sleeps, it is still counted as 'active'. Link: https://lkml.kernel.org/r/167526697917.433354.17779774988245113106.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28lib/test_fprobe: Add private entry_data testcasesMasami Hiramatsu (Google)1-0/+30
Add test cases for checking whether private entry_data is correctly passed or not. Link: https://lkml.kernel.org/r/167526697074.433354.17790288501657876219.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-28fprobe: Pass entry_data to handlersMasami Hiramatsu (Google)5-14/+29
Pass the private entry_data to the entry and exit handlers so that they can share context data, such as saved function arguments. The user must specify the private entry_data size via the @entry_data_size field before registering the fprobe. Link: https://lkml.kernel.org/r/167526696173.433354.17408372048319432574.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
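A usage sketch: share a timestamp between the entry and exit handlers through entry_data. The handler prototypes follow the fprobe API as documented around this series; treat the exact signatures as an assumption and check include/linux/fprobe.h:

    struct latency_data {
            u64 entry_time;
    };

    static int my_entry(struct fprobe *fp, unsigned long ip,
                        struct pt_regs *regs, void *entry_data)
    {
            struct latency_data *d = entry_data;

            d->entry_time = ktime_get_ns();
            return 0;               /* non-zero would skip the exit handler */
    }

    static void my_exit(struct fprobe *fp, unsigned long ip,
                        struct pt_regs *regs, void *entry_data)
    {
            struct latency_data *d = entry_data;

            pr_info("took %llu ns\n",
                    (unsigned long long)(ktime_get_ns() - d->entry_time));
    }

    static struct fprobe fp = {
            .entry_handler   = my_entry,
            .exit_handler    = my_exit,
            .entry_data_size = sizeof(struct latency_data),
    };

    /* register_fprobe(&fp, "vfs_read", NULL); */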
2023-03-21ftrace: Show a list of all functions that have ever been enabledSteven Rostedt (Google)2-6/+50
When debugging a crash that appears to be related to ftrace, but is not known for sure, it is useful to know whether a function was ever enabled by ftrace. It could be that a BPF program was attached to it, or possibly a live patch. We are having crashes in the field where this information is not always known. Having ftrace set a flag when a function has ever been attached since boot helps tremendously in determining whether a crash had anything to do with something using ftrace. When analyzing crashes, a kdump image can be used to access the flags. When looking at issues where the kernel did not panic, the touched_functions file can simply be used. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Catalin Marinas <[email protected]> Tested-by: Mark Rutland <[email protected]> Tested-by: Chris Li <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-21ring_buffer: Use try_cmpxchg instead of cmpxchgUros Bizjak1-4/+4
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. No functional change intended. Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Signed-off-by: Uros Bizjak <[email protected]> Acked-by: Mukesh Ojha <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
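In the abstract, the transformation looks like this (a generic fragment assuming an atomic_t counter 'ref', not a specific hunk from ring_buffer.c):

    /* Before: cmpxchg() returns the old value, so the loop re-reads and
     * re-compares on every failed attempt. */
    do {
            old = atomic_read(&ref);
            new = old + 1;
    } while (atomic_cmpxchg(&ref, old, new) != old);

    /* After: try_cmpxchg() returns a bool and, on failure, writes the current
     * value back into 'old', so no explicit re-read is needed. */
    old = atomic_read(&ref);
    do {
            new = old + 1;
    } while (!atomic_try_cmpxchg(&ref, &old, new));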
2023-03-21ring_buffer: Change some static functions to boolUros Bizjak1-24/+23
The return values of some functions are of boolean type. Change the type of these functions to bool and adjust their return values. Also change the type of some internal variables to bool. No functional change intended. Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Signed-off-by: Uros Bizjak <[email protected]> Reviewed-by: Mukesh Ojha <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-21ring_buffer: Change some static functions to voidUros Bizjak1-15/+7
The results of some static functions are not used. Change the type of these functions to void and remove the unnecessary returns. No functional change intended. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uros Bizjak <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Reviewed-by: Mukesh Ojha <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-21ftrace: selftest: remove broken trace_direct_trampMark Rutland5-10/+18
The ftrace selftest code has a trace_direct_tramp() function which it uses as a direct call trampoline. This happens to work on x86, since the direct call's return address is in the usual place, and can be returned to via a RET, but in general the calling convention for direct calls is different from regular function calls, and requires a trampoline written in assembly. On s390, regular function calls place the return address in %r14, and an ftrace patch-site in an instrumented function places the trampoline's return address (which is within the instrumented function) in %r0, preserving the original %r14 value in-place. As a regular C function will return to the address in %r14, using a C function as the trampoline results in the trampoline returning to the caller of the instrumented function, skipping the body of the instrumented function. Note that the s390 issue is not detected by the ftrace selftest code, as the instrumented function is trivial, and returning back into the caller happens to be equivalent. On arm64, regular function calls place the return address in x30, and an ftrace patch-site in an instrumented function saves this into x9 and places the trampoline's return address (within the instrumented function) in x30. A regular C function will return to the address in x30, but will not restore x9 into x30. Consequently, using a C function as the trampoline results in returning to the trampoline's return address having corrupted x30, such that when the instrumented function returns, it will return back into itself. To avoid future issues in this area, remove the trace_direct_tramp() function, and require that each architecture with direct calls provide a stub trampoline, named ftrace_stub_direct_tramp. This can be written to handle the architecture's trampoline calling convention, and in future could be used elsewhere (e.g. in the ftrace ops sample, to measure the overhead of direct calls), so we may as well always build it in. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mark Rutland <[email protected]> Cc: Li Huafei <[email protected]> Cc: Xu Kuohai <[email protected]> Signed-off-by: Florent Revest <[email protected]> Acked-by: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-21ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGSFlorent Revest3-2/+8
Direct called trampolines can be called in two ways:
- either from the ftrace callsite. In this case, they do not access any struct ftrace_regs nor pt_regs
- or, if an ftrace ops is also attached, from the end of an ftrace trampoline. In this case, the call_direct_funcs ops is in charge of setting the direct call trampoline's address in a struct ftrace_regs
Since commit 9705bc709604 ("ftrace: pass fregs to arch_ftrace_set_direct_caller()"), the latter case no longer requires a full pt_regs. It only needs a struct ftrace_regs, so DIRECT_CALLS can work with both WITH_ARGS or WITH_REGS. With architectures like arm64 already abandoning WITH_REGS in favor of WITH_ARGS, it's important to have DIRECT_CALLS work WITH_ARGS only. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Florent Revest <[email protected]> Co-developed-by: Mark Rutland <[email protected]> Signed-off-by: Mark Rutland <[email protected]> Acked-by: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-21ftrace: Store direct called addresses in their opsFlorent Revest2-2/+8
All direct calls are now registered using the register_ftrace_direct API so each ops can jump to only one direct-called trampoline. By storing the direct called trampoline address directly in the ops we can save one hashmap lookup in the direct call ops and implement arm64 direct calls on top of call ops. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Florent Revest <[email protected]> Acked-by: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-21ftrace: Rename _ftrace_direct_multi APIs to _ftrace_direct APIsFlorent Revest10-54/+55
Now that the original _ftrace_direct APIs are gone, the "_multi" suffixes only add confusion. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Florent Revest <[email protected]> Acked-by: Mark Rutland <[email protected]> Tested-by: Mark Rutland <[email protected]> Acked-by: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-21ftrace: Remove the legacy _ftrace_direct APIFlorent Revest2-425/+0
This API relies on a single global ops, used for all direct calls registered with it. However, to implement arm64 direct calls, we need each ops to point to a single direct call trampoline. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Florent Revest <[email protected]> Acked-by: Mark Rutland <[email protected]> Tested-by: Mark Rutland <[email protected]> Acked-by: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-21ftrace: Replace uses of _ftrace_direct APIs with _ftrace_direct_multiFlorent Revest4-18/+28
The _multi API requires that users keep their own ops but can enforce that an op is only associated to one direct call. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Florent Revest <[email protected]> Acked-by: Mark Rutland <[email protected]> Tested-by: Mark Rutland <[email protected]> Acked-by: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-03-21ftrace: Let unregister_ftrace_direct_multi() call ftrace_free_filter()Florent Revest5-8/+12
A common pattern when using the ftrace_direct_multi API is to unregister the ops and also immediately free its filter. We've noticed it's very easy for users to miss calling ftrace_free_filter(). This adds a "free_filters" argument to unregister_ftrace_direct_multi() to both remind the user they should free filters and also to make their life easier. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Steven Rostedt <[email protected]> Signed-off-by: Florent Revest <[email protected]> Acked-by: Mark Rutland <[email protected]> Acked-by: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
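Putting the series together, a registration/unregistration sketch using the post-rename names from the commits above (illustrative; check include/linux/ftrace.h of the corresponding tree for the exact prototypes, and note that the trampoline itself must be arch-specific assembly):

    extern void my_tramp(void);             /* arch-specific asm trampoline */

    static struct ftrace_ops direct_ops;

    static int attach_direct(unsigned long ip)
    {
            int ret;

            ret = ftrace_set_filter_ip(&direct_ops, ip, 0, 0);
            if (ret)
                    return ret;

            /* One ops maps to exactly one direct-call trampoline. */
            return register_ftrace_direct(&direct_ops, (unsigned long)my_tramp);
    }

    static void detach_direct(void)
    {
            /* 'true' also frees the ops' filter hash, per the commit above. */
            unregister_ftrace_direct(&direct_ops, (unsigned long)my_tramp, true);
    }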
2023-03-19Linux 6.3-rc3Linus Torvalds1-1/+1
2023-03-19Merge tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-traceLinus Torvalds6-17/+16
Pull tracing fixes from Steven Rostedt:
- Fix setting affinity of hwlat threads in containers. Using sched_set_affinity() has unwanted side effects when being called within a container. Use set_cpus_allowed_ptr() instead.
- Fix per cpu thread management of the hwlat tracer:
  - Do not start per_cpu threads if one is already running for the CPU
  - When starting per_cpu threads, do not clear the kthread variable as it may already be set to running per cpu threads
- Fix return value for test_gen_kprobe_cmd(). On error the return value was overwritten by being set to the result of the call from kprobe_event_delete(), which would likely succeed, and thus have the function return success.
- Fix splice() reads from the trace file that were broken by commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
- Remove an obsolete and confusing comment in ring_buffer.c. The original design of the ring buffer used struct page flags for tricks to optimize, which were shortly removed due to them being tricks. But a comment for those tricks remained.
- Set local functions and variables to static

* tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr
  ring-buffer: remove obsolete comment for free_buffer_page()
  tracing: Make splice_read available again
  ftrace: Set direct_ops storage-class-specifier to static
  trace/hwlat: Do not start per-cpu thread if it is already running
  trace/hwlat: Do not wipe the contents of per-cpu thread data
  tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static
  tracing: Fix wrong return in kprobe_event_gen_test.c