Age | Commit message (Collapse) | Author | Files | Lines |
|
As instruction tracking updates the type state for each register, check
the final type info for the target register at the given instruction.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If it failed to find a variable for the location directly, it might be
due to a missing variable in the source code. For example, accessing
pointer variables in a chain can result in the case like below:
struct foo *foo = ...;
int i = foo->bar->baz;
The DWARF debug information is created for each variable so it'd have
one for 'foo'. But there's no variable for 'foo->bar' and then it
cannot know the type of 'bar' and 'baz'.
The above source code can be compiled to the follow x86 instructions:
mov 0x8(%rax), %rcx
mov 0x4(%rcx), %rdx <=== PMU sample
mov %rdx, -4(%rbp)
Let's say 'foo' is located in the %rax and it has a pointer to struct
foo. But perf sample is captured in the second instruction and there
is no variable or type info for the %rcx.
It'd be great if compiler could generate debug info for %rcx, but we
should handle it on our side. So this patch implements the logic to
iterate instructions and update the type table for each location.
As it already collected a list of scopes including the target
instruction, we can use it to construct the type table smartly.
+---------------- scope[0] subprogram
|
| +-------------- scope[1] lexical_block
| |
| | +------------ scope[2] inlined_subroutine
| | |
| | | +---------- scope[3] inlined_subroutine
| | | |
| | | | +-------- scope[4] lexical_block
| | | | |
| | | | | *** target instruction
...
Image the target instruction has 5 scopes, each scope will have its own
variables and parameters. Then it can start with the innermost scope
(4). So it'd search the shortest path from the start of scope[4] to
the target address and build a list of basic blocks. Then it iterates
the basic blocks with the variables in the scope and update the table.
If it finds a type at the target instruction, then returns it.
Otherwise, it moves to the upper scope[3]. Now it'd search the shortest
path from the start of scope[3] to the start of scope[4]. Then connect
it to the existing basic block list. Then it'd iterate the blocks with
variables for both scopes. It can repeat this until it finds a type at
the target instruction or reaches to the top scope[0].
As the basic blocks contain the shortest path, it won't worry about
branches and can update the table simply.
The final check will be done by find_matching_type() in the next patch.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When updating instruction states, the call instruction should play a
role since it changes the register states. For simplicity, mark some
registers as caller-saved registers (should be arch-dependent), and
invalidate them all after a function call.
If the function returns something, the designated register (ret_reg)
will have the type info.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When updating the instruction states, it also needs to handle global
variable accesses. Same as it does for PC-relative addressing, it can
look up the type by address (if it's defined in the same file), or by
name after finding the symbol by address (for declarations).
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Accessing global variable is common when it tracks execution later.
Factor out the common code into a function for later use.
It adds thread and cpumode to struct data_loc_info to find (global)
symbols if needed. Also remove var_name as it's retrieved in the
helper function.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The update_insn_state() function is to update the type state table after
processing each instruction. For now, it handles MOV (on x86) insn
to transfer type info from the source location to the target.
The location can be a register or a stack slot. Check carefully when
memory reference happens and fetch the type correctly. It basically
ignores write to a memory since it doesn't change the type info. One
exception is writes to (new) stack slots for register spilling.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
As it collected basic block and variable information in each scope, it
now can build a state table to find matching variable at the location.
The struct type_state is to keep the type info saved in each register
and stack slot. The update_var_state() updates the table when it finds
variables in the current address. It expects die_collect_vars() filled
a list of variables with type info and starting address.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a new debug option "type-profile" to enable the detailed info during
the type analysis especially for instruction tracking. You can use this
before the command name like 'report' or 'annotate'.
$ perf --debug type-profile annotate --data-type
Committer testing:
First get some memory events:
$ perf mem record ls
Then, without data-type profiling debug:
$ perf annotate --data-type | head
Annotate type: 'struct rtld_global' in /usr/lib64/ld-linux-x86-64.so.2 (1 samples):
============================================================================
samples offset size field
1 0 4336 struct rtld_global {
0 0 0 struct link_namespaces* _dl_ns;
0 2560 8 size_t _dl_nns;
0 2568 40 __rtld_lock_recursive_t _dl_load_lock {
0 2568 40 pthread_mutex_t mutex {
0 2568 40 struct __pthread_mutex_s __data {
0 2568 4 int __lock;
$
And with only data-type profiling:
$ perf --debug type-profile annotate --data-type | head
-----------------------------------------------------------
find_data_type_die [1e67] for reg13873052 (PC) offset=0x150e2 in dl_main
CU die offset: 0x29cd3
found PC-rel by addr=0x34020 offset=0x20
-----------------------------------------------------------
find_data_type_die [2e] for reg12 offset=0 in __GI___readdir64
CU die offset: 0x137a45
frame base: cfa=1 fbreg=-1
found "__futex" in scope=2/2 (die: 0x137ad5) 0(reg12) type=int (die:2a)
-----------------------------------------------------------
find_data_type_die [52] for reg5 offset=0 in __memmove_avx_unaligned_erms
CU die offset: 0x1124ed
no variable found
Annotate type: 'struct rtld_global' in /usr/lib64/ld-linux-x86-64.so.2 (1 samples):
============================================================================
samples offset size field
1 0 4336 struct rtld_global {
0 0 0 struct link_namespaces* _dl_ns;
0 2560 8 size_t _dl_nns;
0 2568 40 __rtld_lock_recursive_t _dl_load_lock {
0 2568 40 pthread_mutex_t mutex {
0 2568 40 struct __pthread_mutex_s __data {
0 2568 4 int __lock;
$
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The annotate_get_basic_blocks() is to find a list of basic blocks from
the source instruction to the destination instruction in a function.
It'll be used to find variables in a scope. Use BFS (Breadth First
Search) to find a shortest path to carry the variable/register state
minimally.
Also change find_disasm_line() to be used in annotate_get_basic_blocks()
and add 'allow_update' argument to control if it can update the IP.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The find_data_type() needs many information to describe the location of
the data. Add the new 'struct data_loc_info' to pass those information at
once.
No functional changes intended.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Sometimes we want to convert an address in objdump output to
map-relative address to match with a sample data. Let's add
map__objdump_2rip() for that.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The die_find_func_rettype() is to find a debug entry for the given
function name and sets the type information of the return value. By
convention, it'd return the pointer to the type die (should be the
same as the given mem_die argument) if found, or NULL otherwise.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
We want to track type states as instructions are executed. Each
instruction can access compound types like struct or union and load/
store its members to a different location.
The die_deref_ptr_type() is to find a type of memory access with a
pointer variable. If it points to a compound type like struct, the
target memory is a member in the struct. The access will happen with an
offset indicating which member it refers. Let's follow the DWARF info
to figure out the type of the pointer target.
For example, say we have the following code.
struct foo {
int a;
int b;
};
struct foo *p = malloc(sizeof(*p));
p->b = 0;
The last pointer access should produce x86 asm like below:
mov 0x0, 4(%rbx)
And we know %rbx register has a pointer to struct foo. Then offset 4
should return the debug info of member 'b'.
Also variables of compound types can be accessed directly without a
pointer. The die_get_member_type() is to handle a such case.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Check if die_get_real_type() returned NULL ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The die_collect_vars() is to find all variable information in the scope
including function parameters. The struct die_var_type is to save the
type of the variable with the location (reg and offset) as well as where
it's defined in the code (addr).
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It's not used, let's get rid of it.
Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Rather than manually iterating the CPU map, use
perf_cpu_map__for_each_cpu(). When possible tidy local variables.
Reviewed-by: James Clark <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andrew Jones <[email protected]>
Cc: André Almeida <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steinar H. Gunderson <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yang Li <[email protected]>
Cc: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use libperf's perf_cpu_map__equal() that performs the same function.
Reviewed-by: James Clark <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andrew Jones <[email protected]>
Cc: André Almeida <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steinar H. Gunderson <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yang Li <[email protected]>
Cc: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In both cases the CPU map is known owned by either the caller or a
PMU.
Reviewed-by: James Clark <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andrew Jones <[email protected]>
Cc: André Almeida <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steinar H. Gunderson <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yang Li <[email protected]>
Cc: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Most uses of what was perf_cpu_map__empty but is now
perf_cpu_map__has_any_cpu_or_is_empty want to do something with the
CPU map if it contains CPUs. Replace uses of
perf_cpu_map__has_any_cpu_or_is_empty with other helpers so that CPUs
within the map can be handled.
Reviewed-by: James Clark <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andrew Jones <[email protected]>
Cc: André Almeida <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steinar H. Gunderson <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yang Li <[email protected]>
Cc: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Switch perf_cpu_map__has_any_cpu_or_is_empty() to
perf_cpu_map__is_any_cpu_or_is_empty() as a CPU map may contain CPUs as
well as the dummy event and perf_cpu_map__is_any_cpu_or_is_empty() is a
more correct alternative.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andrew Jones <[email protected]>
Cc: André Almeida <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steinar H. Gunderson <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yang Li <[email protected]>
Cc: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Rather than iterate all CPUs and see if they are in CPU maps, directly
iterate the CPU map. Similarly make use of the intersect function
taking care for when "any" CPU is specified. Switch
perf_cpu_map__has_any_cpu_or_is_empty() to more appropriate
alternatives.
Reviewed-by: James Clark <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andrew Jones <[email protected]>
Cc: André Almeida <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steinar H. Gunderson <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yang Li <[email protected]>
Cc: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Potential corner cases could cause a cpumap to be allocated with size
0, but an empty cpumap should be represented as NULL. Add a path in
perf_cpu_map__alloc() to ensure this.
Suggested-by: James Clark <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andrew Jones <[email protected]>
Cc: André Almeida <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steinar H. Gunderson <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yang Li <[email protected]>
Cc: Yanteng Si <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Additional helpers to better replace perf_cpu_map__has_any_cpu_or_is_empty().
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Andrew Jones <[email protected]>
Cc: André Almeida <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Steinar H. Gunderson <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Yang Li <[email protected]>
Cc: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It seems that a previous modification to sysreg-defs, which corrected
emitting the header to the specified output directory, exposed missing
subdir, prefix variables.
This breaks out of tree builds of perf as the file is now built into the
output directory, but still tries to descend into output directory as a
subdir.
Fixes: a29ee6aea7030786 ("perf build: Ensure sysreg-defs Makefile respects output dir")
Reviewed-by: Oliver Upton <[email protected]>
Reviewed-by: Tycho Andersen <[email protected]>
Signed-off-by: Ethan Adams <[email protected]>
Tested-by: Tycho Andersen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If the --itrace option is used more than once, the options are
combined, but "i" and "y" (sub-)options can be corrupted because
itrace_do_parse_synth_opts() incorrectly overwrites the period type and
period with default values.
For example, with:
--itrace=i0ns --itrace=e
The processing of "--itrace=e", resets the "i" period from 0 nanoseconds
to the default 100 microseconds.
Fix by performing the default setting of period type and period only if
"i" or "y" are present in the currently processed --itrace value.
Fixes: f6986c95af84ff2a ("perf session: Add instruction tracing options")
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The trace could be misleading if trace errors are not taken into
account, so display them also by adding the itrace "e" option.
Note --call-trace and --call-ret-trace already add the itrace "e"
option.
Fixes: b585ebdb5912cf14 ("perf script: Add --insn-trace for instruction decoding")
Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The question of exactly when KPTI needs to be disabled comes up a lot
because it doesn't always need to be done. Add the relevant kernel
function and some examples that describe the behavior.
Also describe the interrupt requirement and that no error message will
be printed if this isn't met.
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: James Clark <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
defines
These were used to build perf to provide defines not available in older
distros, but this was back in 2017, nowadays most the distros that are
supported and I have build containers for work using just the system
headers, so ditch them.
For the few that don't have STATX_MNT_ID{_UNIQUE}, or STATX_MNT_DIOALIGN
add them conditionally.
Some of these older distros may not have things that are used in 'perf
trace', but then they also don't have libtraceevent packages, so don't
build 'perf trace'.
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
asm/fcntl.h
These were used to build perf to provide defines not available in older
distros, but this was back in 2017, nowadays all the distros that are
supported and I have build containers for work using just the system
headers, so ditch them.
Some of these older distros may not have things that are used in 'perf
trace', but then they also don't have libtraceevent packages, so don't
build 'perf trace'.
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Builds ok all the way back to these older distros:
1 almalinux:8 : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20) , clang version 16.0.6 (Red Hat 16.0.6-2.module_el8.9.0+3621+df7f7146) flex 2.6.1
3 alpine:3.15 : Ok gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027 , Alpine clang version 12.0.1 flex 2.6.4
15 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0 , Debian clang version 11.0.1-2~deb10u1 flex 2.6.4
32 opensuse:15.4 : Ok gcc (SUSE Linux) 7.5.0 , clang version 15.0.7 flex 2.6.4
23 fedora:35 : Ok gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-3) , clang version 13.0.1 (Fedora 13.0.1-1.fc35) flex 2.6.4
38 ubuntu:18.04 : Ok gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 flex 2.6.4
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
directory used to build perf
It is used only to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/{include,arch}/ hierarchies, that is used
just for scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming
from either 'make install_headers' or from the system /usr/include/
directory.
Suggested-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use the system one, nothing used in that file isn't available in the
supported, active distros.
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
To: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
directory used to build perf
It is used only to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/include/ hierarchy, that is used just for
scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
No other tools/ living code uses it.
Suggested-by: Ian Rogers <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
build perf
It is used only to generate string tables, not to build perf, so move it
to the tools/perf/trace/beauty/include/ hierarchy, that is used just for
scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
Suggested-by: Ian Rogers <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
to build perf
It is mostly used only to generate string tables, not to build perf, so
move it to the tools/perf/trace/beauty/include/ hierarchy, that is used
just for scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
No other tools/ living code uses it, just <linux/usbdevice_fs.h> coming
from either 'make install_headers' or from the system /usr/include/
directory.
Suggested-by: Ian Rogers <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
perf
It is mostly used only to generate string tables, not to build perf, so
move it to the tools/perf/trace/beauty/include/ hierarchy, that is used
just for scraping.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
No other tools/ living code uses it, just <linux/mount.h> coming from
either 'make install_headers' or from the system /usr/include/
directory.
Suggested-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The tools/include/uapi/linux/mount.h file is mostly used for scrapping
defines into id->string tables, this is the only place were it is being
directly used, stop doing so.
Define MOUNT_ATTR_RELATIME and MOUNT_ATTR__ATIME if not available in the
system's headers.
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It is mostly used only to generate string tables, not to build perf, so
move it to the tools/perf/trace/beauty/include/ hierarchy, that is used
just for scraping.
The only case where it was being used to build was in
tools/perf/trace/beauty/sync_file_range.c, because some older systems
doesn't have the SYNC_FILE_RANGE_WRITE_AND_WAIT define, just use the
system's linux/fs.h header instead, defining it if not available.
This is a something that should've have happened, as happened with the
linux/socket.h scrapper, do it now as Ian suggested while doing an
audit/refactor session in the headers used by perf.
No other tools/ living code uses it, just <linux/fs.h> coming from
either 'make install_headers' or from the system /usr/include/
directory.
Suggested-by: Ian Rogers <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Several such tables were depending on uapi/linux/fs.h, cut and paste
error when they were introduced, fix it.
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/Ze9vjxv42PN_QGZv@x1
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
s/dont/don\'t/
Signed-off-by: Bhaskar Chowdhury <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
That is a 'struct timespec' passed from userspace to the kernel as we
can see with a system wide syscall tracing:
root@number:~# perf trace -e nanosleep
0.000 (10.102 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
38.924 (10.077 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
100.177 (10.107 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
139.171 (10.063 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
200.603 (10.105 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
239.399 (10.064 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
300.994 (10.096 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
339.584 (10.067 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
401.335 (10.057 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
439.758 (10.166 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
501.814 (10.110 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
539.983 (10.227 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
602.284 (10.199 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
640.208 (10.105 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
702.662 (10.163 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
740.440 (10.107 ms): podman/2195174 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
802.993 (10.159 ms): podman/9150 nanosleep(rqtp: { .tv_sec: 0, .tv_nsec: 10000000 }) = 0
^Croot@number:~# strace -p 9150 -e nanosleep
If we then use the ptrace method to look at that podman process:
root@number:~# strace -p 9150 -e nanosleep
strace: Process 9150 attached
nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
^Cstrace: Process 9150 detached
root@number:~#
With some changes we can get something closer to the strace output,
still in system wide mode:
root@number:~# perf config trace.show_arg_names=false
root@number:~# perf config trace.show_duration=false
root@number:~# perf config trace.show_timestamp=false
root@number:~# perf config trace.show_zeros=true
root@number:~# perf config trace.args_alignment=0
root@number:~# perf trace -e nanosleep --max-events=10
podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
podman/2195174 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
podman/9150 nanosleep({ .tv_sec: 0, .tv_nsec: 10000000 }, NULL) = 0
root@number:~#
root@number:~# perf config
trace.show_arg_names=false
trace.show_duration=false
trace.show_timestamp=false
trace.show_zeros=true
trace.args_alignment=0
root@number:~# cat ~/.perfconfig
# this file is auto-generated.
[trace]
show_arg_names = false
show_duration = false
show_timestamp = false
show_zeros = true
args_alignment = 0
root@number:~#
This will not get reused by any other syscall as nanosleep is the only
one to have as its first argument a 'struct timespec" pointer argument
passed from userspace to the kernel:
root@number:~# grep timespec /sys/kernel/tracing/events/syscalls/sys_enter_*/format | grep offset:16
/sys/kernel/tracing/events/syscalls/sys_enter_nanosleep/format: field:struct __kernel_timespec * rqtp; offset:16; size:8; signed:0;
root@number:~#
BTF based pretty printing will simplify all this, but then lets just get
the low hanging fruits first.
Reviewed-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Pull smb server updates from Steve French:
- add support for durable file handles (an important data integrity
feature)
- fixes for potential out of bounds issues
- fix possible null dereference in close
- getattr fixes
- trivial typo fix and minor cleanup
* tag 'v6.9-rc-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: remove module version
ksmbd: fix potencial out-of-bounds when buffer offset is invalid
ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()
ksmbd: Fix spelling mistake "connction" -> "connection"
ksmbd: fix possible null-deref in smb_lazy_parent_lease_break_close
ksmbd: add support for durable handles v1/v2
ksmbd: mark SMB2_SESSION_EXPIRED to session when destroying previous session
ksmbd: retrieve number of blocks using vfs_getattr in set_file_allocation_info
ksmbd: replace generic_fillattr with vfs_getattr
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull trace tool updates from Steven Rostedt:
"Tracing:
- Update makefiles for latency-collector and RTLA, using tools/build/
makefiles like perf does, inheriting its benefits. For example,
having a proper way to handle library dependencies.
- The timerlat tracer has an interface for any tool to use. rtla
timerlat tool uses this interface dispatching its own threads as
workload. But, rtla timerlat could also be used for any other
process. So, add 'rtla timerlat -U' option, allowing the timerlat
tool to measure the latency of any task using the timerlat tracer
interface.
Verification:
- Update makefiles for verification/rv, using tools/build/ makefiles
like perf does, inheriting its benefits. For example, having a
proper way to handle dependencies"
* tag 'trace-tools-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tools/rtla: Add -U/--user-load option to timerlat
tools/verification: Use tools/build makefiles on rv
tools/rtla: Use tools/build makefiles to build rtla
tools/tracing: Use tools/build makefiles on latency-collector
|
|
Pull more documentation updates from Jonathan Corbet:
"A handful of late-arriving documentation fixes and enhancements"
* tag 'docs-6.9-2' of git://git.lwn.net/linux:
docs: verify/bisect: remove a level of indenting
docs: verify/bisect: drop 'v' prefix, EOL aspect, and assorted fixes
docs: verify/bisect: check taint flag
docs: verify/bisect: improve install instructions
docs: handling-regressions.rst: Update regzbot command fixed-by to fix
docs: *-regressions.rst: Add colon to regzbot commands
doc: Fix typo in admin-guide/cifs/introduction.rst
README: Fix spelling
|
|
The timerlat tracer provides an interface for any application to wait
for the timerlat's periodic wakeup. Currently, rtla timerlat uses it
to dispatch its user-space workload (-u option).
But as the tracer interface is generic, rtla timerlat can also be used
to monitor any workload that uses it. For example, a user might
place their own workload to wait on the tracer interface, and
monitor the results with rtla timerlat.
Add the -U option to rtla timerlat top and hist. With this option, rtla
timerlat will not dispatch its workload but only setting up the
system, waiting for a user to dispatch its workload.
The sample code in this patch is an example of python application
that loops in the timerlat tracer fd.
To use it, dispatch:
# rtla timerlat -U
In a terminal, then run the python program on another terminal,
specifying the CPU to run it. For example, setting on CPU 1:
#./timerlat_load.py 1
Then rtla timerlat will start printing the statistics of the
./timerlat_load.py app.
An interesting point is that the "Ret user Timer Latency" value
is the overall response time of the load. The sample load does
a memory copy to exemplify that.
The stop tracing options on rtla timerlat works in this setup
as well, including auto analysis.
Link: https://lkml.kernel.org/r/36e6bcf18fe15c7601048fd4c65aeb193c502cc8.1707229706.git.bristot@kernel.org
Cc: Jonathan Corbet <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
|
|
Use tools/build/ makefiles to build rv, inheriting the benefits of
it. For example, having a proper way to handle dependencies.
Link: https://lkml.kernel.org/r/2a38a8f7b8dc65fa790381ec9ab42fb62beb2e25.1710519524.git.bristot@kernel.org
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: John Kacur <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
|
|
Use tools/build/ makefiles to build rtla, inheriting the benefits of
it. For example, having a proper way to handle dependencies.
rtla is built using perf infra-structure when building inside the
kernel tree.
At this point, rtla diverges from perf in two points: Documentation
and tarball generation/build.
At the documentation level, rtla is one step ahead, placing the
documentation at Documentation/tools/rtla/, using the same build
tools as kernel documentation. The idea is to move perf
documentation to the same scheme and then share the same makefiles.
rtla has a tarball target that the (old) RHEL8 uses. The tarball was
kept using a simple standalone makefile for compatibility. The
standalone makefile shares most of the code, e.g., flags, with
regular buildings.
The tarball method was set as deprecated. If necessary, we can make
a rtla tarball like perf, which includes the entire tools/build.
But this would also require changes in the user side (the directory
structure changes, and probably the deps to build the package).
Inspired on perf and objtool.
Link: https://lkml.kernel.org/r/57563abf2715d22515c0c54a87cff3849eca5d52.1710519524.git.bristot@kernel.org
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: John Kacur <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
|
|
Use tools/build/ makefiles to build latency-collector, inheriting
the benefits of it. For example: Before this patch, a missing
tracefs/traceevents headers will result in fail like this:
~/linux/tools/tracing/latency $ make
cc -Wall -Wextra -g -O2 -o latency-collector latency-collector.c -lpthread
latency-collector.c:26:10: fatal error: tracefs.h: No such file or directory
26 | #include <tracefs.h>
| ^~~~~~~~~~~
compilation terminated.
make: *** [Makefile:14: latency-collector] Error 1
Which is not that helpful. After this change it reports:
~/linux/tools/tracing/latency# make
Auto-detecting system features:
... libtraceevent: [ OFF ]
... libtracefs: [ OFF ]
libtraceevent is missing. Please install libtraceevent-dev/libtraceevent-devel
libtracefs is missing. Please install libtracefs-dev/libtracefs-devel
Makefile.config:29: *** Please, check the errors above.. Stop.
This type of output is common across other tools in tools/ like perf
and objtool.
Link: https://lkml.kernel.org/r/872420b0880b11304e4ba144a0086c6478c5b469.1710519524.git.bristot@kernel.org
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: John Kacur <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
|
|
Pull bcachefs fixes from Kent Overstreet:
"Assorted bugfixes.
Most are fixes for simple assertion pops; the most significant fix is
for a deadlock in recovery when we have to rewrite large numbers of
btree nodes to fix errors. This was incorrectly running out of the
same workqueue as the core interior btree update path - we now give it
its own single threaded workqueue.
This was visible to users as "bch2_btree_update_start(): error:
BCH_ERR_journal_reclaim_would_deadlock" - and then recovery hanging"
* tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefs:
bcachefs: Fix lost wakeup on journal shutdown
bcachefs; Fix deadlock in bch2_btree_update_start()
bcachefs: ratelimit errors from async_btree_node_rewrite
bcachefs: Run check_topology() first
bcachefs: Improve bch2_fatal_error()
bcachefs: Fix lost transaction restart error
bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info
bcachefs: fix for building in userspace
bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recovery
bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init()
bcachefs: Improve sysfs internal/btree_updates
bcachefs: Split out btree_node_rewrite_worker
bcachefs: Fix locking in bch2_alloc_write_key()
bcachefs: Avoid extent entry type assertions in .invalid()
bcachefs: Fix spurious -BCH_ERR_transaction_restart_nested
bcachefs: Fix check_key_has_snapshot() call
bcachefs: Change "accounting overran journal reservation" to a warning
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull more ARM SoC updates from Arnd Bergmann:
"These are changes that for some reason ended up not making it into the
first four branches but that should still make it into 6.9:
- A rework of the omap clock support that touches both drivers and
device tree files
- The reset controller branch changes that had a dependency on late
bugfixes. Merging them here avoids a backmerge of 6.8-rc5 into the
drivers branch
- The RISC-V/starfive, RISC-V/microchip and ARM/Broadcom devicetree
changes that got delayed and needed some extra time in linux-next
for wider testing"
* tag 'soc-late-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (31 commits)
soc: fsl: dpio: fix kcalloc() argument order
bus: ts-nbus: Improve error reporting
bus: ts-nbus: Convert to atomic pwm API
riscv: dts: starfive: jh7110: Add camera subsystem nodes
ARM: bcm: stop selecing CONFIG_TICK_ONESHOT
ARM: dts: omap3: Update clksel clocks to use reg instead of ti,bit-shift
ARM: dts: am3: Update clksel clocks to use reg instead of ti,bit-shift
clk: ti: Improve clksel clock bit parsing for reg property
clk: ti: Handle possible address in the node name
dt-bindings: pwm: opencores: Add compatible for StarFive JH8100
dt-bindings: riscv: cpus: reg matches hart ID
reset: Instantiate reset GPIO controller for shared reset-gpios
reset: gpio: Add GPIO-based reset controller
cpufreq: do not open-code of_phandle_args_equal()
of: Add of_phandle_args_equal() helper
reset: simple: add support for Sophgo SG2042
dt-bindings: reset: sophgo: support SG2042
riscv: dts: microchip: add specific compatible for mpfs pdma
riscv: dts: microchip: add missing CAN bus clocks
ARM: brcmstb: Add debug UART entry for 74165
...
|