aboutsummaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)AuthorFilesLines
2020-11-30ring-buffer: Always check to put back before stamp when crossing pagesSteven Rostedt (VMware)1-8/+6
The current ring buffer logic checks to see if the updating of the event buffer was interrupted, and if it is, it will try to fix up the before stamp with the write stamp to make them equal again. This logic is flawed, because if it is not interrupted, the two are guaranteed to be different, as the current event just updated the before stamp before allocation. This guarantees that the next event (this one or another interrupting one) will think it interrupted the time updates of a previous event and inject an absolute time stamp to compensate. The correct logic is to always update the timestamps when traversing to a new sub buffer. Cc: [email protected] Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30ftrace: Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependencyNaveen N. Rao1-1/+1
DYNAMIC_FTRACE_WITH_DIRECT_CALLS should depend on DYNAMIC_FTRACE_WITH_REGS since we need ftrace_regs_caller(). Link: https://lkml.kernel.org/r/fc4b257ea8689a36f086d2389a9ed989496ca63a.1606412433.git.naveen.n.rao@linux.vnet.ibm.com Cc: [email protected] Fixes: 763e34e74bb7d5c ("ftrace: Add register_ftrace_direct()") Signed-off-by: Naveen N. Rao <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30ftrace: Fix updating FTRACE_FL_TRAMPNaveen N. Rao1-1/+21
On powerpc, kprobe-direct.tc triggered FTRACE_WARN_ON() in ftrace_get_addr_new() followed by the below message: Bad trampoline accounting at: 000000004222522f (wake_up_process+0xc/0x20) (f0000001) The set of steps leading to this involved: - modprobe ftrace-direct-too - enable_probe - modprobe ftrace-direct - rmmod ftrace-direct <-- trigger The problem turned out to be that we were not updating flags in the ftrace record properly. From the above message about the trampoline accounting being bad, it can be seen that the ftrace record still has FTRACE_FL_TRAMP set though ftrace-direct module is going away. This happens because we are checking if any ftrace_ops has the FTRACE_FL_TRAMP flag set _before_ updating the filter hash. The fix for this is to look for any _other_ ftrace_ops that also needs FTRACE_FL_TRAMP. Link: https://lkml.kernel.org/r/56c113aa9c3e10c19144a36d9684c7882bf09af5.1606412433.git.naveen.n.rao@linux.vnet.ibm.com Cc: [email protected] Fixes: a124692b698b0 ("ftrace: Enable trampoline when rec count returns back to one") Signed-off-by: Naveen N. Rao <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30tracing: Fix alignment of static bufferMinchan Kim1-1/+1
With 5.9 kernel on ARM64, I found ftrace_dump output was broken but it had no problem with normal output "cat /sys/kernel/debug/tracing/trace". With investigation, it seems coping the data into temporal buffer seems to break the align binary printf expects if the static buffer is not aligned with 4-byte. IIUC, get_arg in bstr_printf expects that args has already right align to be decoded and seq_buf_bprintf says ``the arguments are saved in a 32bit word array that is defined by the format string constraints``. So if we don't keep the align under copy to temporal buffer, the output will be broken by shifting some bytes. This patch fixes it. Link: https://lkml.kernel.org/r/[email protected] Cc: <[email protected]> Fixes: 8e99cf91b99bb ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic") Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Minchan Kim <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30tracing: Remove WARN_ON in start_thread()Vasily Averin1-1/+1
This patch reverts commit 978defee11a5 ("tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists") .start hook can be legally called several times if according tracer is stopped screen window 1 [root@localhost ~]# echo 1 > /sys/kernel/tracing/events/kmem/kfree/enable [root@localhost ~]# echo 1 > /sys/kernel/tracing/options/pause-on-trace [root@localhost ~]# less -F /sys/kernel/tracing/trace screen window 2 [root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on 0 [root@localhost ~]# echo hwlat > /sys/kernel/debug/tracing/current_tracer [root@localhost ~]# echo 1 > /sys/kernel/debug/tracing/tracing_on [root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on 0 [root@localhost ~]# echo 2 > /sys/kernel/debug/tracing/tracing_on triggers warning in dmesg: WARNING: CPU: 3 PID: 1403 at kernel/trace/trace_hwlat.c:371 hwlat_tracer_start+0xc9/0xd0 Link: https://lkml.kernel.org/r/[email protected] Cc: Ingo Molnar <[email protected]> Cc: [email protected] Fixes: 978defee11a5 ("tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists") Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30ring-buffer: Set the right timestamp in the slow path of __rb_reserve_next()Andrea Righi1-3/+3
In the slow path of __rb_reserve_next() a nested event(s) can happen between evaluating the timestamp delta of the current event and updating write_stamp via local_cmpxchg(); in this case the delta is not valid anymore and it should be set to 0 (same timestamp as the interrupting event), since the event that we are currently processing is not the last event in the buffer. Link: https://lkml.kernel.org/r/X8IVJcp1gRE+FJCJ@xps-13-7390 Cc: Ingo Molnar <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: [email protected] Link: https://lwn.net/Articles/831207 Fixes: a389d86f7fd0 ("ring-buffer: Have nested events still record running time stamp") Signed-off-by: Andrea Righi <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-30ring-buffer: Update write stamp with the correct tsSteven Rostedt (VMware)1-1/+1
The write stamp, used to calculate deltas between events, was updated with the stale "ts" value in the "info" structure, and not with the updated "ts" variable. This caused the deltas between events to be inaccurate, and when crossing into a new sub buffer, had time go backwards. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp") Reported-by: "J. Avila" <[email protected]> Tested-by: Daniel Mentz <[email protected]> Tested-by: Will McVicker <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-27Merge branch 'linus' into sched/core, to resolve semantic conflictIngo Molnar6-31/+97
Signed-off-by: Ingo Molnar <[email protected]>
2020-11-24irq_work: CleanupPeter Zijlstra1-1/+1
Get rid of the __call_single_node union and clean up the API a little to avoid external code relying on the structure layout as much. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]>
2020-11-19Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+11
Signed-off-by: Jakub Kicinski <[email protected]>
2020-11-19lib/strncpy_from_user.c: Mask out bytes after NUL terminator.Daniel Xu1-0/+10
do_strncpy_from_user() may copy some extra bytes after the NUL terminator into the destination buffer. This usually does not matter for normal string operations. However, when BPF programs key BPF maps with strings, this matters a lot. A BPF program may read strings from user memory by calling the bpf_probe_read_user_str() helper which eventually calls do_strncpy_from_user(). The program can then key a map with the destination buffer. BPF map keys are fixed-width and string-agnostic, meaning that map keys are treated as a set of bytes. The issue is when do_strncpy_from_user() overcopies bytes after the NUL terminator, it can result in seemingly identical strings occupying multiple slots in a BPF map. This behavior is subtle and totally unexpected by the user. This commit masks out the bytes following the NUL while preserving long-sized stride in the fast path. Fixes: 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers") Signed-off-by: Daniel Xu <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/21efc982b3e9f2f7b0379eed642294caaa0c27a7.1605642949.git.dxu@dxuuu.xyz
2020-11-18bpf: Add bpf_ktime_get_coarse_ns helperDmitrii Banshchikov1-0/+2
The helper uses CLOCK_MONOTONIC_COARSE source of time that is less accurate but more performant. We have a BPF CGROUP_SKB firewall that supports event logging through bpf_perf_event_output(). Each event has a timestamp and currently we use bpf_ktime_get_ns() for it. Use of bpf_ktime_get_coarse_ns() saves ~15-20 ns in time required for event logging. bpf_ktime_get_ns(): EgressLogByRemoteEndpoint 113.82ns 8.79M bpf_ktime_get_coarse_ns(): EgressLogByRemoteEndpoint 95.40ns 10.48M Signed-off-by: Dmitrii Banshchikov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-11-16tracepoints: Migrate to use SYSCALL_WORK flagGabriel Krisman Bertazi1-4/+4
On architectures using the generic syscall entry code the architecture independent syscall work is moved to flags in thread_info::syscall_work. This removes architecture dependencies and frees up TIF bits. Define SYSCALL_WORK_SYSCALL_TRACEPOINT, use it in the generic entry code and convert the code which uses the TIF specific helper functions to use the new *_syscall_work() helpers which either resolve to the new mode for users of the generic entry code or to the TIF based functions for the other architectures. Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-11-16tracing: Clean up after filter logic rewritingLukas Bulwahn1-21/+0
The functions event_{set,clear,}_no_set_filter_flag were only used in replace_system_preds() [now, renamed to process_system_preds()]. Commit 80765597bc58 ("tracing: Rewrite filter logic to be simpler and faster") removed the use of those functions in replace_system_preds(). Since then, the functions event_{set,clear,}_no_set_filter_flag were unused. Fortunately, make CC=clang W=1 indicates this with -Wunused-function warnings on those three functions. So, clean up these obsolete unused functions. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Lukas Bulwahn <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski1-1/+28
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-11-14 1) Add BTF generation for kernel modules and extend BTF infra in kernel e.g. support for split BTF loading and validation, from Andrii Nakryiko. 2) Support for pointers beyond pkt_end to recognize LLVM generated patterns on inlined branch conditions, from Alexei Starovoitov. 3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh. 4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage infra, from Martin KaFai Lau. 5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the XDP_REDIRECT path, from Lorenzo Bianconi. 6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI context, from Song Liu. 7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build for different target archs on same source tree, from Jean-Philippe Brucker. 8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet. 9) Move functionality from test_tcpbpf_user into the test_progs framework so it can run in BPF CI, from Alexander Duyck. 10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner. Note that for the fix from Song we have seen a sparse report on context imbalance which requires changes in sparse itself for proper annotation detection where this is currently being discussed on linux-sparse among developers [0]. Once we have more clarification/guidance after their fix, Song will follow-up. [0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/ https://lore.kernel.org/linux-sparse/[email protected]/T/ * git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits) net: mlx5: Add xdp tx return bulking support net: mvpp2: Add xdp tx return bulking support net: mvneta: Add xdp tx return bulking support net: page_pool: Add bulk support for ptr_ring net: xdp: Introduce bulking for xdp tx return path bpf: Expose bpf_d_path helper to sleepable LSM hooks bpf: Augment the set of sleepable LSM hooks bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP bpf: Rename some functions in bpf_sk_storage bpf: Folding omem_charge() into sk_storage_charge() selftests/bpf: Add asm tests for pkt vs pkt_end comparison. selftests/bpf: Add skb_pkt_end test bpf: Support for pointers beyond pkt_end. tools/bpf: Always run the *-clean recipes tools/bpf: Add bootstrap/ to .gitignore bpf: Fix NULL dereference in bpf_task_storage tools/bpftool: Fix build slowdown tools/runqslower: Build bpftool using HOSTCC tools/runqslower: Enable out-of-tree build ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-11-13tracing: Remove the useless value assignment in test_create_synth_event()Kaixu Xia1-1/+1
The value of variable ret is overwritten on the delete branch in the test_create_synth_event() and we care more about the above error than this delete portion. Remove it. Link: https://lkml.kernel.org/r/[email protected] Reported-by: Tosk Robot <[email protected]> Signed-off-by: Kaixu Xia <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-13ftrace/x86: Allow for arguments to be passed in to ftrace_regs by defaultSteven Rostedt (VMware)1-0/+9
Currently, the only way to get access to the registers of a function via a ftrace callback is to set the "FL_SAVE_REGS" bit in the ftrace_ops. But as this saves all regs as if a breakpoint were to trigger (for use with kprobes), it is expensive. The regs are already saved on the stack for the default ftrace callbacks, as that is required otherwise a function being traced will get the wrong arguments and possibly crash. And on x86, the arguments are already stored where they would be on a pt_regs structure to use that code for both the regs version of a callback, it makes sense to pass that information always to all functions. If an architecture does this (as x86_64 now does), it is to set HAVE_DYNAMIC_FTRACE_WITH_ARGS, and this will let the generic code that it could have access to arguments without having to set the flags. This also includes having the stack pointer being saved, which could be used for accessing arguments on the stack, as well as having the function graph tracer not require its own trampoline! Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-13ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regsSteven Rostedt (VMware)8-31/+35
In preparation to have arguments of a function passed to callbacks attached to functions as default, change the default callback prototype to receive a struct ftrace_regs as the forth parameter instead of a pt_regs. For callbacks that set the FL_SAVE_REGS flag in their ftrace_ops flags, they will now need to get the pt_regs via a ftrace_get_regs() helper call. If this is called by a callback that their ftrace_ops did not have a FL_SAVE_REGS flag set, it that helper function will return NULL. This will allow the ftrace_regs to hold enough just to get the parameters and stack pointer, but without the worry that callbacks may have a pt_regs that is not completely filled. Acked-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-13bpf: Expose bpf_d_path helper to sleepable LSM hooksKP Singh1-1/+7
Sleepable hooks are never called from an NMI/interrupt context, so it is safe to use the bpf_d_path helper in LSM programs attaching to these hooks. The helper is not restricted to sleepable programs and merely uses the list of sleepable hooks as the initial subset of LSM hooks where it can be used. Signed-off-by: KP Singh <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-11-12bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TPMartin KaFai Lau1-0/+5
This patch enables the FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage_(get|delete) helper, so those tracing programs can access the sk's bpf_local_storage and the later selftest will show some examples. The bpf_sk_storage is currently used in bpf-tcp-cc, tc, cg sockops...etc which is running either in softirq or task context. This patch adds bpf_sk_storage_get_tracing_proto and bpf_sk_storage_delete_tracing_proto. They will check in runtime that the helpers can only be called when serving softirq or running in a task context. That should enable most common tracing use cases on sk. During the load time, the new tracing_allowed() function will ensure the tracing prog using the bpf_sk_storage_(get|delete) helper is not tracing any bpf_sk_storage*() function itself. The sk is passed as "void *" when calling into bpf_local_storage. This patch only allows tracing a kernel function. Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-11-10tracing: Fix some typos in commentsQiujun Huang15-25/+25
s/detetector/detector/ s/enfoced/enforced/ s/writen/written/ s/actualy/actually/ s/bascially/basically/ s/Regarldess/Regardless/ s/zeroes/zeros/ s/followd/followed/ s/incrememented/incremented/ s/separatelly/separately/ s/accesible/accessible/ s/sythetic/synthetic/ s/enabed/enabled/ s/heurisitc/heuristic/ s/assocated/associated/ s/otherwides/otherwise/ s/specfied/specified/ s/seaching/searching/ s/hierachry/hierarchy/ s/internel/internal/ s/Thise/This/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qiujun Huang <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-10ftrace: Remove unused varible 'ret'Alex Shi1-4/+2
'ret' in 2 functions are not used. and one of them is a void function. So remove them to avoid gcc warning: kernel/trace/ftrace.c:4166:6: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable] kernel/trace/ftrace.c:5571:6: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alex Shi <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-10ring-buffer: Add recording of ring buffer recursion into recursed_functionsSteven Rostedt (VMware)2-1/+25
Add a new config RING_BUFFER_RECORD_RECURSION that will place functions that recurse from the ring buffer into the ftrace recused_functions file. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-10fgraph: Make overruns 4 bytes in graph stack structureSteven Rostedt (VMware)2-3/+3
Inspecting the data structures of the function graph tracer, I found that the overrun value is unsigned long, which is 8 bytes on a 64 bit machine, and not only that, the depth is an int (4 bytes). The overrun can be simply an unsigned int (4 bytes) and pack the ftrace_graph_ret structure better. The depth is moved up next to the func, as it is used more often with func, and improves cache locality. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-09bpf: Fix passing zero to PTR_ERR() in bpf_btf_printf_prepareWang Qing1-1/+1
There is a bug when passing zero to PTR_ERR() and return. Fix the smatch error. Fixes: c4d0bfb45068 ("bpf: Add bpf_snprintf_btf helper") Signed-off-by: Wang Qing <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Yonghong Song <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-11-07Merge branch 'linus' into perf/kprobesIngo Molnar19-355/+1057
Conflicts: include/asm-generic/atomic-instrumented.h kernel/kprobes.c Use the upstream atomic-instrumented.h checksum, and pick the kprobes version of kernel/kprobes.c, which effectively reverts this upstream workaround: 645f224e7ba2: ("kprobes: Tell lockdep about kprobe nesting") Since the new code *should* be fine without nesting. Knock on wood ... Signed-off-by: Ingo Molnar <[email protected]>
2020-11-07Merge branch 'linus' into perf/kprobesIngo Molnar19-114/+146
Merge recent kprobes updates into perf/kprobes that came from -mm. Signed-off-by: Ingo Molnar <[email protected]>
2020-11-06bpf: Implement get_current_task_btf and RET_PTR_TO_BTF_IDKP Singh1-0/+16
The currently available bpf_get_current_task returns an unsigned integer which can be used along with BPF_CORE_READ to read data from the task_struct but still cannot be used as an input argument to a helper that accepts an ARG_PTR_TO_BTF_ID of type task_struct. In order to implement this helper a new return type, RET_PTR_TO_BTF_ID, is added. This is similar to RET_PTR_TO_BTF_ID_OR_NULL but does not require checking the nullness of returned pointer. Signed-off-by: KP Singh <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-11-06ftrace: Add recording of functions that caused recursionSteven Rostedt (VMware)8-7/+270
This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file "recursed_functions" all the functions that caused recursion while a callback to the function tracer was running. Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Guo Ren <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: [email protected] Cc: "H. Peter Anvin" <[email protected]> Cc: Kees Cook <[email protected]> Cc: Anton Vorontsov <[email protected]> Cc: Colin Cross <[email protected]> Cc: Tony Luck <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Joe Lawrence <[email protected]> Cc: Kamalesh Babulal <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-06ftrace: Reverse what the RECURSION flag means in the ftrace_opsSteven Rostedt (VMware)6-21/+13
Now that all callbacks are recursion safe, reverse the meaning of the RECURSION flag and rename it from RECURSION_SAFE to simply RECURSION. Now only callbacks that request to have recursion protecting it will have the added trampoline to do so. Also remove the outdated comment about "PER_CPU" when determining to use the ftrace_ops_assist_func. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: Kamalesh Babulal <[email protected]> Cc: Petr Mladek <[email protected]> Cc: [email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-06perf/ftrace: Check for rcu_is_watching() in callback functionSteven Rostedt (VMware)1-1/+3
If a ftrace callback requires "rcu_is_watching", then it adds the FTRACE_OPS_FL_RCU flag and it will not be called if RCU is not "watching". But this means that it will use a trampoline when called, and this slows down the function tracing a tad. By checking rcu_is_watching() from within the callback, it no longer needs the RCU flag set in the ftrace_ops and it can be safely called directly. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-06perf/ftrace: Add recursion protection to the ftrace callbackSteven Rostedt (VMware)1-1/+8
If a ftrace callback does not supply its own recursion protection and does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will make a helper trampoline to do so before calling the callback instead of just calling the callback directly. The default for ftrace_ops is going to change. It will expect that handlers provide their own recursion protection, unless its ftrace_ops states otherwise. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-06ftrace: Add ftrace_test_recursion_trylock() helper functionSteven Rostedt (VMware)1-7/+5
To make it easier for ftrace callbacks to have recursion protection, provide a ftrace_test_recursion_trylock() and ftrace_test_recursion_unlock() helper that tests for recursion. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-06ftrace: Move the recursion testing into global headersSteven Rostedt (VMware)1-177/+0
Currently, if a callback is registered to a ftrace function and its ftrace_ops does not have the RECURSION flag set, it is encapsulated in a helper function that does the recursion for it. Really, all the callbacks should have their own recursion protection for performance reasons. But they should not all implement their own. Move the recursion helpers to global headers, so that all callbacks can use them. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-02tracing: Make -ENOMEM the default error for parse_synth_field()Steven Rostedt (VMware)1-10/+7
parse_synth_field() returns a pointer and requires that errors get surrounded by ERR_PTR(). The ret variable is initialized to zero, but should never be used as zero, and if it is, it could cause a false return code and produce a NULL pointer dereference. It makes no sense to set ret to zero. Set ret to -ENOMEM (the most common error case), and have any other errors set it to something else. This removes the need to initialize ret on *every* error branch. Fixes: 761a8c58db6b ("tracing, synthetic events: Replace buggy strcat() with seq_buf operations") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-02ring-buffer: Fix recursion protection transitions between interrupt contextSteven Rostedt (VMware)1-12/+46
The recursion protection of the ring buffer depends on preempt_count() to be correct. But it is possible that the ring buffer gets called after an interrupt comes in but before it updates the preempt_count(). This will trigger a false positive in the recursion code. Use the same trick from the ftrace function callback recursion code which uses a "transition" bit that gets set, to allow for a single recursion for to handle transitions between contexts. Cc: [email protected] Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-02tracing: Fix the checking of stackidx in __ftrace_trace_stackQiujun Huang1-2/+2
The array size is FTRACE_KSTACK_NESTING, so the index FTRACE_KSTACK_NESTING is illegal too. And fix two typos by the way. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qiujun Huang <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-02ftrace: Handle tracing when switching between contextSteven Rostedt (VMware)2-4/+28
When an interrupt or NMI comes in and switches the context, there's a delay from when the preempt_count() shows the update. As the preempt_count() is used to detect recursion having each context have its own bit get set when tracing starts, and if that bit is already set, it is considered a recursion and the function exits. But if this happens in that section where context has changed but preempt_count() has not been updated, this will be incorrectly flagged as a recursion. To handle this case, create another bit call TRANSITION and test it if the current context bit is already set. Flag the call as a recursion if the TRANSITION bit is already set, and if not, set it and continue. The TRANSITION bit will be cleared normally on the return of the function that set it, or if the current context bit is clear, set it and clear the TRANSITION bit to allow for another transition between the current context and an even higher one. Cc: [email protected] Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-02ftrace: Fix recursion check for NMI testSteven Rostedt (VMware)1-1/+2
The code that checks recursion will work to only do the recursion check once if there's nested checks. The top one will do the check, the other nested checks will see recursion was already checked and return zero for its "bit". On the return side, nothing will be done if the "bit" is zero. The problem is that zero is returned for the "good" bit when in NMI context. This will set the bit for NMIs making it look like *all* NMI tracing is recursing, and prevent tracing of anything in NMI context! The simple fix is to return "bit + 1" and subtract that bit on the end to get the real bit. Cc: [email protected] Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-02tracing: Fix out of bounds write in get_trace_bufQiujun Huang1-1/+1
The nesting count of trace_printk allows for 4 levels of nesting. The nesting counter starts at zero and is incremented before being used to retrieve the current context's buffer. But the index to the buffer uses the nesting counter after it was incremented, and not its original number, which in needs to do. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: 3d9622c12c887 ("tracing: Add barrier to trace_printk() buffer nesting modification") Signed-off-by: Qiujun Huang <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-10-30timekeeping: remove arch_gettimeoffsetArnd Bergmann1-2/+0
With Arm EBSA110 gone, nothing uses it any more, so the corresponding code and the Kconfig option can be removed. Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2020-10-28Merge tag 'trace-v5.10-rc1' of ↵Linus Torvalds1-14/+22
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix synthetic event "strcat" overrun New synthetic event code used strcat() and miscalculated the ending, causing the concatenation to write beyond the allocated memory. Instead of using strncat(), the code is switched over to seq_buf which has all the mechanisms in place to protect against writing more than what is allocated, and cleans up the code a bit" * tag 'trace-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing, synthetic events: Replace buggy strcat() with seq_buf operations
2020-10-27tracing, synthetic events: Replace buggy strcat() with seq_buf operationsSteven Rostedt (VMware)1-14/+22
There was a memory corruption bug happening while running the synthetic event selftests: kmemleak: Cannot insert 0xffff8c196fa2afe5 into the object search tree (overlaps existing) CPU: 5 PID: 6866 Comm: ftracetest Tainted: G W 5.9.0-rc5-test+ #577 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 Call Trace: dump_stack+0x8d/0xc0 create_object.cold+0x3b/0x60 slab_post_alloc_hook+0x57/0x510 ? tracing_map_init+0x178/0x340 __kmalloc+0x1b1/0x390 tracing_map_init+0x178/0x340 event_hist_trigger_func+0x523/0xa40 trigger_process_regex+0xc5/0x110 event_trigger_write+0x71/0xd0 vfs_write+0xca/0x210 ksys_write+0x70/0xf0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fef0a63a487 Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007fff76f18398 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000039 RCX: 00007fef0a63a487 RDX: 0000000000000039 RSI: 000055eb3b26d690 RDI: 0000000000000001 RBP: 000055eb3b26d690 R08: 000000000000000a R09: 0000000000000038 R10: 000055eb3b2cdb80 R11: 0000000000000246 R12: 0000000000000039 R13: 00007fef0a70b500 R14: 0000000000000039 R15: 00007fef0a70b700 kmemleak: Kernel memory leak detector disabled kmemleak: Object 0xffff8c196fa2afe0 (size 8): kmemleak: comm "ftracetest", pid 6866, jiffies 4295082531 kmemleak: min_count = 1 kmemleak: count = 0 kmemleak: flags = 0x1 kmemleak: checksum = 0 kmemleak: backtrace: __kmalloc+0x1b1/0x390 tracing_map_init+0x1be/0x340 event_hist_trigger_func+0x523/0xa40 trigger_process_regex+0xc5/0x110 event_trigger_write+0x71/0xd0 vfs_write+0xca/0x210 ksys_write+0x70/0xf0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The cause came down to a use of strcat() that was adding an string that was shorten, but the strcat() did not take that into account. strcat() is extremely dangerous as it does not care how big the buffer is. Replace it with seq_buf operations that prevent the buffer from being overwritten if what is being written is bigger than the buffer. Fixes: 10819e25799a ("tracing: Handle synthetic event array field type checking correctly") Reviewed-by: Tom Zanussi <[email protected]> Tested-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-10-25treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches2-2/+2
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/[email protected]/2-convert_section.pl Signed-off-by: Joe Perches <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Reviewed-by: Miguel Ojeda <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-10-23Merge tag 'trace-v5.10-3' of ↵Linus Torvalds1-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing ring-buffer fix from Steven Rostedt: "The success return value of ring_buffer_resize() is stated to be zero and checked that way. But it was incorrectly returning the size allocated. Also, a fix to a comment" * tag 'trace-v5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Update the description for ring_buffer_wait ring-buffer: Return 0 on success from ring_buffer_resize()
2020-10-22ring-buffer: Update the description for ring_buffer_waitQiujun Huang1-1/+1
The function changed at some point, but the description was not updated. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qiujun Huang <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-10-22ring-buffer: Return 0 on success from ring_buffer_resize()Qiujun Huang1-4/+4
We don't need to check the new buffer size, and the return value had confused resize_buffer_duplicate_size(). ... ret = ring_buffer_resize(trace_buf->buffer, per_cpu_ptr(size_buf->data,cpu_id)->entries, cpu_id); if (ret == 0) per_cpu_ptr(trace_buf->data, cpu_id)->entries = per_cpu_ptr(size_buf->data, cpu_id)->entries; ... Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: d60da506cbeb3 ("tracing: Add a resize function to make one buffer equivalent to another buffer") Signed-off-by: Qiujun Huang <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-10-16Merge tag 'trace-v5.10-2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix mismatch section of adding early trace events. Fixes the issue of a mismatch section that was missed due to gcc inlining the offending function, while clang did not (and reported the issue)" * tag 'trace-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Remove __init from __trace_early_add_new_event()
2020-10-16tracing: Remove __init from __trace_early_add_new_event()Masami Hiramatsu1-1/+1
The commit 720dee53ad8d ("tracing/boot: Initialize per-instance event list in early boot") removes __init from __trace_early_add_events() but __trace_early_add_new_event() still has __init and will cause a section mismatch. Remove __init from __trace_early_add_new_event() as same as __trace_early_add_events(). Link: https://lore.kernel.org/lkml/CAHk-=wjU86UhovK4XuwvCqTOfc+nvtpAuaN2PJBz15z=w=u0Xg@mail.gmail.com/ Reported-by: Linus Torvalds <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-10-15Merge tag 'net-next-5.10' of ↵Linus Torvalds1-7/+165
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: - Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. - Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). - Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. - Allow more than 255 IPv4 multicast interfaces. - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. - In Multi-patch TCP (MPTCP) support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. - Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. - Allow more calls to same peer in RxRPC. - Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. - Add TC actions for implementing MPLS L2 VPNs. - Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. - Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. - Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF. - Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. - Support sleepable BPF programs, initially targeting LSM and tracing. - Add bpf_d_path() helper for returning full path for given 'struct path'. - Make bpf_tail_call compatible with bpf-to-bpf calls. - Allow BPF programs to call map_update_elem on sockmaps. - Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). - Support listing and getting information about bpf_links via the bpf syscall. - Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). - Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. - Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). - In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. - Add XDP support for Intel's igb driver. - Support offloading TC flower classification and filtering rules to mscc_ocelot switches. - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. - Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. - Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. - Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. - Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. - Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits) Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" net, sockmap: Don't call bpf_prog_put() on NULL pointer bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo bpf, sockmap: Add locking annotations to iterator netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements net: fix pos incrementment in ipv6_route_seq_next net/smc: fix invalid return code in smcd_new_buf_create() net/smc: fix valid DMBE buffer sizes net/smc: fix use-after-free of delayed events bpfilter: Fix build error with CONFIG_BPFILTER_UMH cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info bpf: Fix register equivalence tracking. rxrpc: Fix loss of final ack on shutdown rxrpc: Fix bundle counting for exclusive connections netfilter: restore NF_INET_NUMHOOKS ibmveth: Identify ingress large send packets. ibmveth: Switch order of ibmveth_helper calls. cxgb4: handle 4-tuple PEDIT to NAT mode translation selftests: Add VRF route leaking tests ...