Age | Commit message (Collapse) | Author | Files | Lines |
|
The 'Session topology' test currently fails with this message when
evlist__new_default() opens more than one event:
32: Session topology :
--- start ---
templ file: /tmp/perf-test-vv5YzZ
Using CPUID 0x00000000410fd070
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xb00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 4
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 5
non matching sample_type
FAILED tests/topology.c:73 can't get session
---- end ----
Session topology: FAILED!
This is because when re-opening the file and parsing the header, Perf
expects that any file that has more than one event has the sample ID
flag set. Perf record already sets the flag in a similar way when there
is more than one event, so add the same logic to evlist__new_default().
evlist__new_default() is only currently used in tests, so I don't
expect this change to have any other side effects. The other tests that
use it don't save and re-open the file so don't hit this issue.
The session topology test has been failing on Arm big.LITTLE platforms
since commit 251aa040244a ("perf parse-events: Wildcard most
"numeric" events") when evlist__new_default() started opening multiple
events for 'cycles'.
Fixes: 251aa040244a ("perf parse-events: Wildcard most "numeric" events")
Closes: https://lore.kernel.org/lkml/CAP-5=fWVQ-7ijjK3-w1q+k2WYVNHbAcejb-xY0ptbjRw476VKA@mail.gmail.com/
Tested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20240124094358.489372-1-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The number of mem PMUs can be calculated by searching the
perf_pmus__scan_mem().
Remove the ARCH specific perf_pmus__num_mem_pmus()
Tested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: ravi.bangoria@amd.com
Cc: james.clark@arm.com
Cc: will@kernel.org
Cc: mike.leach@linaro.org
Cc: renyu.zj@linux.alibaba.com
Cc: yuhaixin.yhx@linux.alibaba.com
Cc: tmricht@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: john.g.garry@oracle.com
Link: https://lore.kernel.org/r/20240123185036.3461837-8-kan.liang@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The current code iterates all memory PMUs. It doesn't matter if the
system has only one memory PMU or multiple PMUs. The check of
perf_pmus__num_mem_pmus() is not required anymore.
The rec_tmp is not used in c2c and mem. Removing them as well.
Suggested-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: ravi.bangoria@amd.com
Cc: james.clark@arm.com
Cc: will@kernel.org
Cc: mike.leach@linaro.org
Cc: renyu.zj@linux.alibaba.com
Cc: yuhaixin.yhx@linux.alibaba.com
Cc: tmricht@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: john.g.garry@oracle.com
Link: https://lore.kernel.org/r/20240123185036.3461837-7-kan.liang@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The aux_event can be retrieved from the perf_pmu now. Implement a
generic support.
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: james.clark@arm.com
Cc: will@kernel.org
Cc: mike.leach@linaro.org
Cc: renyu.zj@linux.alibaba.com
Cc: yuhaixin.yhx@linux.alibaba.com
Cc: tmricht@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: john.g.garry@oracle.com
Link: https://lore.kernel.org/r/20240123185036.3461837-6-kan.liang@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
For some ARCHs, e.g., ARM and AMD, to get the availability of the
mem-events, perf checks the existence of a specific PMU. For the other
ARCHs, e.g., Intel and Power, perf has to check the existence of some
specific events.
The current perf only iterates the mem-events-supported PMUs. It's not
required to check the existence of a specific PMU anymore.
Rename sysfs_name to event_name, which stores the specific mem-events.
Perf only needs to check those events for the availability of the
mem-events.
Rename perf_mem_event__supported to perf_pmu__mem_events_supported.
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: james.clark@arm.com
Cc: will@kernel.org
Cc: mike.leach@linaro.org
Cc: renyu.zj@linux.alibaba.com
Cc: yuhaixin.yhx@linux.alibaba.com
Cc: tmricht@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: john.g.garry@oracle.com
Link: https://lore.kernel.org/r/20240123185036.3461837-5-kan.liang@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Introduce a generic perf_mem_events__name(). Remove the ARCH-specific
one.
The mem_load events may have a different format. Add ldlat and aux_event
in the struct perf_mem_event to indicate the format and the extra aux
event.
Add perf_mem_events_intel_aux[] to support the extra mem_load_aux event.
Rename perf_mem_events__name to perf_pmu__mem_events_name.
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: james.clark@arm.com
Cc: will@kernel.org
Cc: mike.leach@linaro.org
Cc: renyu.zj@linux.alibaba.com
Cc: yuhaixin.yhx@linux.alibaba.com
Cc: tmricht@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: john.g.garry@oracle.com
Link: https://lore.kernel.org/r/20240123185036.3461837-4-kan.liang@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The mem_events can be retrieved from the struct perf_pmu now. An ARCH
specific perf_mem_events__ptr() is not required anymore. Remove all of
them.
The Intel hybrid has multiple mem-events-supported PMUs. But they share
the same mem_events. Other ARCHs only support one mem-events-supported
PMU. In the configuration, it's good enough to only configure the
mem_events for one PMU. Add perf_mem_events_find_pmu() which returns the
first mem-events-supported PMU.
In the perf_mem_events__init(), the perf_pmus__scan() is not required
anymore. It avoids checking the sysfs for every PMU on the system.
Make the perf_mem_events__record_args() more generic. Remove the
perf_mem_events__print_unsupport_hybrid().
Since pmu is added as a new parameter, rename perf_mem_events__ptr() to
perf_pmu__mem_events_ptr(). Several other functions also do a similar
rename.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Kajol jain <kjain@linux.ibm.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: james.clark@arm.com
Cc: will@kernel.org
Cc: leo.yan@linaro.org
Cc: mike.leach@linaro.org
Cc: renyu.zj@linux.alibaba.com
Cc: yuhaixin.yhx@linux.alibaba.com
Cc: tmricht@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: john.g.garry@oracle.com
Link: https://lore.kernel.org/r/20240123185036.3461837-3-kan.liang@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
With the mem_events, perf doesn't need to read sysfs for each PMU to
find the mem-events-supported PMU. The patch also makes it possible to
clean up the related __weak functions later.
The patch is only to add the mem_events into the perf_pmu for all ARCHs.
It will be used in the later cleanup patches.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Suggested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: will@kernel.org
Cc: mike.leach@linaro.org
Cc: renyu.zj@linux.alibaba.com
Cc: yuhaixin.yhx@linux.alibaba.com
Cc: tmricht@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: john.g.garry@oracle.com
Link: https://lore.kernel.org/r/20240123185036.3461837-2-kan.liang@linux.intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Now that we have evsel__taskstate() which no longer relies on the
hardcoded task state string and has good backward compatibility,
we have a good reason to use it.
Note TASK_STATE_TO_CHAR_STR and task bitmasks are useless now so
we remove them for good. And now we pass the state info back and
forth in a symbolic char which explains itself well instead.
Signed-off-by: Ze Gao <zegao@tencent.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240123022425.1611483-1-zegao@tencent.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Now that we have the __prinf_flags() parsing routines, we add a new
helper evsel__taskstate() to extract the task state info from the
recorded data.
Signed-off-by: Ze Gao <zegao@tencent.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240122070859.1394479-5-zegao@tencent.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Perf uses a hard coded string "RSDTtXZPI" to index the sched_switch
prev_state field raw bitmask value. This works well except for when
the kernel changes this string, in which case this will break again.
Instead we add a new way to parse task state string from tracepoint
print format already recorded by perf, which eliminates the further
dependencies with this hardcode and unmaintainable macro, and this
is exactly what libtraceevent[1] does for now.
So we borrow the print flags parsing logic from libtraceevent[1].
And in get_states(), we walk the print arguments until the
__print_flags() for the target state field is found, and use that to
build the states string for future parsing.
[1]: https://lore.kernel.org/linux-trace-devel/20231224140732.7d41698d@rorschach.local.home/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Ze Gao <zegao@tencent.com>
Link: https://lore.kernel.org/r/20240122070859.1394479-4-zegao@tencent.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Update state char array to match the latest kernel definitions and
remove unused state mapping macros.
Note this is the preparing patch for get rid of the way to parse
process state from raw bitmask value. Instead we are going to
parse it from the recorded tracepoint print format, and this change
marks why we're doing it.
Signed-off-by: Ze Gao <zegao@tencent.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240122070859.1394479-3-zegao@tencent.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Minor code style alignment cleanup for perf_data__switch() and
perf_data__write().
No functional change.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240119040304.3708522-4-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
mode before recording
In pipe mode, no need to switch perf data output, therefore,
'--timestamp-filename' option should not take effect.
Check the conflict before recording and output WARNING.
In this case, the check pipe mode in perf_data__switch() can be removed.
Before:
# perf record --timestamp-filename -o- perf test -w noploop | perf report -i- --percent-limit=1
# To display the perf.data header info, please use --header/--header-only options.
#
[ perf record: Woken up 1 times to write data ]
[ perf record: Dump -.2024011812110182 ]
#
# Total Lost Samples: 0
#
# Samples: 4K of event 'cycles:P'
# Event count (approx.): 2176784359
#
# Overhead Command Shared Object Symbol
# ........ ....... .................... ......................................
#
97.83% perf perf [.] noploop
#
# (Tip: Print event counts in CSV format with: perf stat -x,)
#
After:
# perf record --timestamp-filename -o- perf test -w noploop | perf report -i- --percent-limit=1
WARNING: --timestamp-filename option is not available in pipe mode.
# To display the perf.data header info, please use --header/--header-only options.
#
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
#
# Total Lost Samples: 0
#
# Samples: 4K of event 'cycles:P'
# Event count (approx.): 2185575421
#
# Overhead Command Shared Object Symbol
# ........ ....... ..................... .............................................
#
97.75% perf perf [.] noploop
#
# (Tip: Profiling branch (mis)predictions with: perf record -b / perf report)
#
Fixes: ecfd7a9c044e ("perf record: Add '--timestamp-filename' option to append timestamp to output file name")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240119040304.3708522-3-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
perf_data__switch() may not assign a legal value to 'new_filename'.
In this case, 'new_filename' uses the on-stack value, which may cause a
incorrect free and unexpected result.
Fixes: 03724b2e9c45 ("perf record: Allow to limit number of reported perf.data files")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240119040304.3708522-2-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The DWARF location expression can be fairly complex and it'd be hard
to match it with the condition correctly. So let's be conservative
and only allow simple expressions. For now it just checks the first
operation in the list. The following operations looks ok:
* DW_OP_stack_value
* DW_OP_deref_size
* DW_OP_deref
* DW_OP_piece
To refuse complex (and unsupported) location expressions, add
check_allowed_ops() to compare the rest of the list. It seems earlier
result contained those unsupported expressions. For example, I found
some local struct variable is placed like below.
<2><43d1517>: Abbrev Number: 62 (DW_TAG_variable)
<43d1518> DW_AT_location : 15 byte block: 91 50 93 8 91 78 93 4 93 84 8 91 68 93 4
(DW_OP_fbreg: -48; DW_OP_piece: 8;
DW_OP_fbreg: -8; DW_OP_piece: 4;
DW_OP_piece: 1028;
DW_OP_fbreg: -24; DW_OP_piece: 4)
Another example is something like this.
0057c8be ffffffffffffffff ffffffff812109f0 (base address)
0057c8ce ffffffff812112b5 ffffffff812112c8 (DW_OP_breg3 (rbx): 0;
DW_OP_constu: 18446744073709551612;
DW_OP_and;
DW_OP_stack_value)
It should refuse them. After the change, the stat shows:
Annotate data type stats:
total 294, ok 158 (53.7%), bad 136 (46.3%)
-----------------------------------------------------------
30 : no_sym
32 : no_mem_ops
53 : no_var
14 : no_typeinfo
7 : bad_offset
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240117062657.985479-10-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Local variables are allocated in the stack and the location list
should look like base register(s) and an offset. Extend the
die_find_variable_by_reg() to handle the following expressions
* DW_OP_breg{0..31}
* DW_OP_bregx
* DW_OP_fbreg
Ususally DWARF subprogram entries have frame base information and
use it to locate stack variable like below:
<2><43d1575>: Abbrev Number: 62 (DW_TAG_variable)
<43d1576> DW_AT_location : 2 byte block: 91 7c (DW_OP_fbreg: -4) <--- here
<43d1579> DW_AT_name : (indirect string, offset: 0x2c00c9): i
<43d157d> DW_AT_decl_file : 1
<43d157e> DW_AT_decl_line : 78
<43d157f> DW_AT_type : <0x43d19d7>
I found some differences on saving the frame base between gcc and clang.
The gcc uses the CFA to get the base so it needs to check the current
frame's CFI info. In this case, stack offset needs to be adjusted from
the start of the CFA.
<1><1bb8d>: Abbrev Number: 102 (DW_TAG_subprogram)
<1bb8e> DW_AT_name : (indirect string, offset: 0x74d41): kernel_init
<1bb92> DW_AT_decl_file : 2
<1bb92> DW_AT_decl_line : 1440
<1bb94> DW_AT_decl_column : 18
<1bb95> DW_AT_prototyped : 1
<1bb95> DW_AT_type : <0xcc>
<1bb99> DW_AT_low_pc : 0xffffffff81bab9e0
<1bba1> DW_AT_high_pc : 0x1b2
<1bba9> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa) <------ here
<1bbab> DW_AT_call_all_calls: 1
<1bbab> DW_AT_sibling : <0x1bf5a>
While clang sets it to a register directly and it can check the register
and offset in the instruction directly.
<1><43d1542>: Abbrev Number: 60 (DW_TAG_subprogram)
<43d1543> DW_AT_low_pc : 0xffffffff816a7c60
<43d154b> DW_AT_high_pc : 0x98
<43d154f> DW_AT_frame_base : 1 byte block: 56 (DW_OP_reg6 (rbp)) <---------- here
<43d1551> DW_AT_GNU_all_call_sites: 1
<43d1551> DW_AT_name : (indirect string, offset: 0x3bce91): foo
<43d1555> DW_AT_decl_file : 1
<43d1556> DW_AT_decl_line : 75
<43d1557> DW_AT_prototyped : 1
<43d1557> DW_AT_type : <0x43c7332>
<43d155b> DW_AT_external : 1
Also it needs to update the offset after finding the type like global
variables since the offset was from the frame base. Factor out
match_var_offset() to check global and local variables in the same way.
The type stats are improved too:
Annotate data type stats:
total 294, ok 160 (54.4%), bad 134 (45.6%)
-----------------------------------------------------------
30 : no_sym
32 : no_mem_ops
51 : no_var
14 : no_typeinfo
7 : bad_offset
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-9-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The die_get_cfa() is to get frame base register and offset at the given
instruction address (pc). This info will be used to locate stack
variables which have location expression using DW_OP_fbreg.
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-8-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Global variables are accessed using PC-relative address so it needs to
be handled separately. The PC-rel addressing is detected by using
DWARF_REG_PC. On x86, %rip register would be used.
The address can be calculated using the ip and offset in the
instruction. But it should start from the next instruction so add
calculate_pcrel_addr() to do it properly.
But global variables defined in a different file would only have a
declaration which doesn't include a location list. So it first tries
to get the type info using the address, and then looks up the variable
declarations using name. The name of global variables should be get
from the symbol table. The declaration would have the type info.
So extend find_var_type() to take both address and name for global
variables.
The stat is now looks like:
Annotate data type stats:
total 294, ok 153 (52.0%), bad 141 (48.0%)
-----------------------------------------------------------
30 : no_sym
32 : no_mem_ops
61 : no_var
10 : no_typeinfo
8 : bad_offset
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-7-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Extend find_data_type_die() to find data type from PC-relative address
using die_find_variable_by_addr(). Users need to pass the address for
the (global) variable.
The offset for the variable should be updated after finding the type
because the offset in the instruction is just to calcuate the address
for the variable. So it changed to pass a pointer to offset and renamed
it to 'poffset'.
First it searches variables in the CU DIE as it's likely that the global
variables are defined in the file level. And then it iterates the scope
DIEs to find a local (static) variable.
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-6-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
A typical function prologue and epilogue include multiple stack
operations to save and restore the current value of registers.
On x86, it looks like below:
push r15
push r14
push r13
push r12
...
pop r12
pop r13
pop r14
pop r15
ret
As these all touches the stack memory region, chances are high that they
appear in a memory profile data. But these are not used for any real
purpose yet so it'd return no types.
One of my profile type shows that non neglible portion of data came from
the stack operations. It also seems GCC generates more stack operations
than clang.
Annotate Instruction stats
total 264, ok 169 (64.0%), bad 95 (36.0%)
Name : Good Bad
-----------------------------------------------------------
movq : 49 27
movl : 24 9
popq : 0 19 <-- here
cmpl : 17 2
addq : 14 1
cmpq : 12 2
cmpxchgl : 3 7
Instead of dealing them as unknown, let's create a seperate pseudo type
to represent those stack operations separately.
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
On x86, instructions for array access often looks like below.
mov 0x1234(%rax,%rbx,8), %rcx
Usually the first register holds the type information and the second one
has the index. And the current code only looks up a variable for the
first register. But it's possible to be in the other way around so it
needs to check the second register if the first one failed.
The stat changed like this.
Annotate data type stats:
total 294, ok 148 (50.3%), bad 146 (49.7%)
-----------------------------------------------------------
30 : no_sym
32 : no_mem_ops
66 : no_var
10 : no_typeinfo
8 : bad_offset
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When a sample was come from a conditional branch without a memory
operand, it could be due to a macro fusion with a previous instruction.
So it needs to check the memory operand in the previous one.
This improves the stat like below:
Annotate data type stats:
total 294, ok 147 (50.0%), bad 147 (50.0%)
-----------------------------------------------------------
30 : no_sym
32 : no_mem_ops
71 : no_var
6 : no_typeinfo
8 : bad_offset
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
For the performance reason, I prefer llvm-objdump over GNU's. But I
found that llvm-objdump puts x86 lock prefix in a separate line like
below.
ffffffff81000695: f0 lock
ffffffff81000696: ff 83 54 0b 00 00 incl 2900(%rbx)
This should be parsed properly, but I just changed to find the insn
with next offset for now.
This improves the statistics as it can process more instructions.
Annotate data type stats:
total 294, ok 144 (49.0%), bad 150 (51.0%)
-----------------------------------------------------------
30 : no_sym
35 : no_mem_ops
71 : no_var
6 : no_typeinfo
8 : bad_offset
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
If pkg-config is not installed when libtraceevent is linked, the build fails.
The error information is as follows:
$ make
<SNIP>
In file included from /home/yjh/projects_linux/perf-tool-next/linux/tools/perf/util/evsel.c:43:
/home/yjh/projects_linux/perf-tool-next/linux/tools/perf/util/trace-event.h:149:62: error: operator '&&' has no right operand
149 | #if defined(LIBTRACEEVENT_VERSION) && LIBTRACEEVENT_VERSION >= MAKE_LIBTRACEEVENT_VERSION(1, 5, 0)
| ^~
error: command '/usr/bin/gcc' failed with exit code 1
cp: cannot stat 'python_ext_build/lib/perf*.so': No such file or directory
make[2]: *** [Makefile.perf:668: python/perf.cpython-310-x86_64-linux-gnu.so] Error 1
make[2]: *** Waiting for unfinished jobs....
Because pkg-config is not installed, fail to get libtraceevent version in
Makefile.config file. As a result, LIBTRACEEVENT_VERSION is empty.
However, the preceding error information is not user-friendly.
Identify errors in advance by checking that pkg-config is installed at
compile time.
The build results of various scenarios are as follows:
1. build successful when libtraceevent is not linked and pkg-config is not installed
$ pkg-config --version
-bash: /usr/bin/pkg-config: No such file or directory
$ make clean >/dev/null
$ make NO_LIBTRACEEVENT=1 >/dev/null
Makefile.config:1133: No alternatives command found, you need to set JDIR= to point to the root of your Java directory
PERF_VERSION = 6.7.rc6.gd988c9f511af
$ echo $?
0
2. dummy pkg-config is missing when libtraceevent is linked
$ pkg-config --version
-bash: /usr/bin/pkg-config: No such file or directory
$ make clean >/dev/null
$ make >/dev/null
Makefile.config:221: *** Error: pkg-config needed by libtraceevent is missing on this system, please install it. Stop.
make[1]: *** [Makefile.perf:251: sub-make] Error 2
make: *** [Makefile:70: all] Error 2
$ echo $?
2
3. build successful when libtraceevent is linked and pkg-config is installed
$ pkg-config --version
0.29.2
$ make clean >/dev/null
$ make >/dev/null
Makefile.config:1133: No alternatives command found, you need to set JDIR= to point to the root of your Java directory
PERF_VERSION = 6.7.rc6.gd988c9f511af
$ echo $?
0
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240112034019.3558584-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This test case often fails on s390 (about 2 out of 10) because the
10% percent limit on the difference between --bpf-counters event counting
and s390 hardware counting is more than 10% in all failure cases.
Raise the limit to 20% on s390 and the test case succeeds.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: sumanthk@linux.ibm.com
Cc: svens@linux.ibm.com
Link: https://lore.kernel.org/r/20240108084009.3959211-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
"Add Namhyung Kim as tools/perf/ co-maintainer, we're taking turns
processing patches, switching roles from perf-tools to perf-tools-next
at each Linux release.
Data profiling:
- Associate samples that identify loads and stores with data
structures. This uses events available on Intel, AMD and others and
DWARF info:
# To get memory access samples in kernel for 1 second (on Intel)
$ perf mem record -a -K --ldlat=4 -- sleep 1
# Similar for the AMD (but it requires 6.3+ kernel for BPF filters)
$ perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000' -- sleep 1
Then, amongst several modes of post processing, one can do things like:
$ perf report -s type,typeoff --hierarchy --group --stdio
...
#
# Samples: 10K of events 'cpu/mem-loads,ldlat=4/P, cpu/mem-stores/P, dummy:u'
# Event count (approx.): 602758064
#
# Overhead Data Type / Data Type Offset
# ........................... ............................
#
26.09% 3.28% 0.00% long unsigned int
26.09% 3.28% 0.00% long unsigned int +0 (no field)
18.48% 0.73% 0.00% struct page
10.83% 0.02% 0.00% struct page +8 (lru.next)
3.90% 0.28% 0.00% struct page +0 (flags)
3.45% 0.06% 0.00% struct page +24 (mapping)
0.25% 0.28% 0.00% struct page +48 (_mapcount.counter)
0.02% 0.06% 0.00% struct page +32 (index)
0.02% 0.00% 0.00% struct page +52 (_refcount.counter)
0.02% 0.01% 0.00% struct page +56 (memcg_data)
0.00% 0.01% 0.00% struct page +16 (lru.prev)
15.37% 17.54% 0.00% (stack operation)
15.37% 17.54% 0.00% (stack operation) +0 (no field)
11.71% 50.27% 0.00% (unknown)
11.71% 50.27% 0.00% (unknown) +0 (no field)
$ perf annotate --data-type
...
Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples):
============================================================================
samples offset size field
13 0 640 struct cfs_rq {
2 0 16 struct load_weight load {
2 0 8 unsigned long weight;
0 8 4 u32 inv_weight;
};
0 16 8 unsigned long runnable_weight;
0 24 4 unsigned int nr_running;
1 28 4 unsigned int h_nr_running;
...
$ perf annotate --data-type=page --group
Annotate type: 'struct page' in [kernel.kallsyms] (480 samples):
event[0] = cpu/mem-loads,ldlat=4/P
event[1] = cpu/mem-stores/P
event[2] = dummy:u
===================================================================================
samples offset size field
447 33 0 0 64 struct page {
108 8 0 0 8 long unsigned int flags;
319 13 0 8 40 union {
319 13 0 8 40 struct {
236 2 0 8 16 union {
236 2 0 8 16 struct list_head lru {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
236 2 0 8 16 struct {
236 1 0 8 8 void* __filler;
0 1 0 16 4 unsigned int mlock_count;
};
236 2 0 8 16 struct list_head buddy_list {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
236 2 0 8 16 struct list_head pcp_list {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
};
82 4 0 24 8 struct address_space* mapping;
1 7 0 32 8 union {
1 7 0 32 8 long unsigned int index;
1 7 0 32 8 long unsigned int share;
};
0 0 0 40 8 long unsigned int private;
};
This uses the existing annotate code, calling objdump to do the
disassembly, with improvements to avoid having this take too long,
but longer term a switch to a disassembler library, possibly
reusing code in the kernel will be pursued.
This is the initial implementation, please use it and report
impressions and bugs. Make sure the kernel-debuginfo packages match
the running kernel. The 'perf report' phase for non short perf.data
files may take a while.
There is a great article about it on LWN:
https://lwn.net/Articles/955709/ - "Data-type profiling for perf"
One last test I did while writing this text, on a AMD Ryzen 5950X,
using a distro kernel, while doing a simple 'find /' on an
otherwise idle system resulted in:
# uname -r
6.6.9-100.fc38.x86_64
# perf -vv | grep BPF_
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
bpf_skeletons: [ on ] # HAVE_BPF_SKEL
# rpm -qa | grep kernel-debuginfo
kernel-debuginfo-common-x86_64-6.6.9-100.fc38.x86_64
kernel-debuginfo-6.6.9-100.fc38.x86_64
#
# perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000'
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.199 MB perf.data (2913 samples) ]
#
# ls -la perf.data
-rw-------. 1 root root 2346486 Jan 9 18:36 perf.data
# perf evlist
ibs_op//
dummy:u
# perf evlist -v
ibs_op//: type: 11, size: 136, config: 0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1
dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
# perf report -s type,typeoff --hierarchy --group --stdio
# Total Lost Samples: 0
#
# Samples: 2K of events 'ibs_op//, dummy:u'
# Event count (approx.): 1904553038
#
# Overhead Data Type / Data Type Offset
# ................... ............................
#
73.70% 0.00% (unknown)
73.70% 0.00% (unknown) +0 (no field)
3.01% 0.00% long unsigned int
3.00% 0.00% long unsigned int +0 (no field)
0.01% 0.00% long unsigned int +2 (no field)
2.73% 0.00% struct task_struct
1.71% 0.00% struct task_struct +52 (on_cpu)
0.38% 0.00% struct task_struct +2104 (rcu_read_unlock_special.b.blocked)
0.23% 0.00% struct task_struct +2100 (rcu_read_lock_nesting)
0.14% 0.00% struct task_struct +2384 ()
0.06% 0.00% struct task_struct +3096 (signal)
0.05% 0.00% struct task_struct +3616 (cgroups)
0.05% 0.00% struct task_struct +2344 (active_mm)
0.02% 0.00% struct task_struct +46 (flags)
0.02% 0.00% struct task_struct +2096 (migration_disabled)
0.01% 0.00% struct task_struct +24 (__state)
0.01% 0.00% struct task_struct +3956 (mm_cid_active)
0.01% 0.00% struct task_struct +1048 (cpus_ptr)
0.01% 0.00% struct task_struct +184 (se.group_node.next)
0.01% 0.00% struct task_struct +20 (thread_info.cpu)
0.00% 0.00% struct task_struct +104 (on_rq)
0.00% 0.00% struct task_struct +2456 (pid)
1.36% 0.00% struct module
0.59% 0.00% struct module +952 (kallsyms)
0.42% 0.00% struct module +0 (state)
0.23% 0.00% struct module +8 (list.next)
0.12% 0.00% struct module +216 (syms)
0.95% 0.00% struct inode
0.41% 0.00% struct inode +40 (i_sb)
0.22% 0.00% struct inode +0 (i_mode)
0.06% 0.00% struct inode +76 (i_rdev)
0.06% 0.00% struct inode +56 (i_security)
<SNIP>
perf top/report:
- Don't ignore job control, allowing control+Z + bg to work.
- Add s390 raw data interpretation for PAI (Processor Activity
Instrumentation) counters.
perf archive:
- Add new option '--all' to pack perf.data with DSOs.
- Add new option '--unpack' to expand tarballs.
Initialization speedups:
- Lazily initialize zstd streams to save memory when not using it.
- Lazily allocate/size mmap event copy.
- Lazy load kernel symbols in 'perf record'.
- Be lazier in allocating lost samples buffer in 'perf record'.
- Don't synthesize BPF events when disabled via the command line
(perf record --no-bpf-event).
Assorted improvements:
- Show note on AMD systems that the :p, :pp, :ppp and :P are all the
same, as IBS (Instruction Based Sampling) is used and it is
inherentely precise, not having levels of precision like in Intel
systems.
- When 'cycles' isn't available, fall back to the "task-clock" event
when not system wide, not to 'cpu-clock'.
- Add --debug-file option to redirect debug output, e.g.:
$ perf --debug-file /tmp/perf.log record -v true
- Shrink 'struct map' to under one cacheline by avoiding function
pointers for selecting if addresses are identity or DSO relative,
and using just a byte for some boolean struct members.
- Resolve the arch specific strerrno just once to use in
perf_env__arch_strerrno().
- Reduce memory for recording PERF_RECORD_LOST_SAMPLES event.
Assorted fixes:
- Fix the default 'perf top' usage on Intel hybrid systems, now it
starts with a browser showing the number of samples for Efficiency
(cpu_atom/cycles/P) and Performance (cpu_core/cycles/P). This
behaviour is similar on ARM64, with its respective set of
big.LITTLE processors.
- Fix segfault on build_mem_topology() error path.
- Fix 'perf mem' error on hybrid related to availability of mem event
in a PMU.
- Fix missing reference count gets (map, maps) in the db-export code.
- Avoid recursively taking env->bpf_progs.lock in the 'perf_env'
code.
- Use the newly introduced maps__for_each_map() to add missing
locking around iteration of 'struct map' entries.
- Parse NOTE segments until the build id is found, don't stop on the
first one, ELF files may have several such NOTE segments.
- Remove 'egrep' usage, its deprecated, use 'grep -E' instead.
- Warn first about missing libelf, not libbpf, that depends on
libelf.
- Use alternative to 'find ... -printf' as this isn't supported in
busybox.
- Address python 3.6 DeprecationWarning for string scapes.
- Fix memory leak in uniq() in libsubcmd.
- Fix man page formatting for 'perf lock'
- Fix some spelling mistakes.
perf tests:
- Fail shell tests that needs some symbol in perf itself if it is
stripped. These tests check if a symbol is resolved, if some hot
function is indeed detected by profiling, etc.
- The 'perf test sigtrap' test is currently failing on PREEMPT_RT,
skip it if sleeping spinlocks are detected (using BTF) and point to
the mailing list discussion about it. This test is also being
skipped on several architectures (powerpc, s390x, arm and aarch64)
due to other pending issues with intruction breakpoints.
- Adjust test case perf record offcpu profiling tests for s390.
- Fix 'Setup struct perf_event_attr' fails on s390 on z/VM guest,
addressing issues caused by the fallback from cycles to task-clock
done in this release.
- Fix mask for VG register in the user-regs test.
- Use shellcheck on 'perf test' shell scripts automatically to make
sure changes don't introduce things it flags as problematic.
- Add option to change objdump binary and allow it to be set via
'perf config'.
- Add basic 'perf script', 'perf list --json" and 'perf diff' tests.
- Basic branch counter support.
- Make DSO tests a suite rather than individual.
- Remove atomics from test_loop to avoid test failures.
- Fix call chain match on powerpc for the record+probe_libc_inet_pton
test.
- Improve Intel hybrid tests.
Vendor event files (JSON):
powerpc:
- Update datasource event name to fix duplicate events on IBM's
Power10.
- Add PVN for HX-C2000 CPU with Power8 Architecture.
Intel:
- Alderlake/rocketlake metric fixes.
- Update emeraldrapids events to v1.02.
- Update icelakex events to v1.23.
- Update sapphirerapids events to v1.17.
- Add skx, clx, icx and spr upi bandwidth metric.
AMD:
- Add Zen 4 memory controller events.
RISC-V:
- Add StarFive Dubhe-80 and Dubhe-90 JSON files.
https://www.starfivetech.com/en/site/cpu-u
- Add T-HEAD C9xx JSON file.
https://github.com/riscv-software-src/opensbi/blob/master/docs/platform/thead-c9xx.md
ARM64:
- Remove UTF-8 characters from cmn.json, that were causing build
failure in some distros.
- Add core PMU events and metrics for Ampere One X.
- Rename Ampere One's BPU_FLUSH_MEM_FAULT to GPC_FLUSH_MEM_FAULT
libperf:
- Rename several perf_cpu_map constructor names to clarify what they
really do.
- Ditto for some other methods, coping with some issues in their
semantics, like perf_cpu_map__empty() ->
perf_cpu_map__has_any_cpu_or_is_empty().
- Document perf_cpu_map__nr()'s behavior
perf stat:
- Exit if parse groups fails.
- Combine the -A/--no-aggr and --no-merge options.
- Fix help message for --metric-no-threshold option.
Hardware tracing:
ARM64 CoreSight:
- Bump minimum OpenCSD version to ensure a bugfix is present.
- Add 'T' itrace option for timestamp trace
- Set start vm addr of exectable file to 0 and don't ignore first
sample on the arm-cs-trace-disasm.py 'perf script'"
* tag 'perf-tools-for-v6.8-1-2024-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (179 commits)
MAINTAINERS: Add Namhyung as tools/perf/ co-maintainer
perf test: test case 'Setup struct perf_event_attr' fails on s390 on z/vm
perf db-export: Fix missing reference count get in call_path_from_sample()
perf tests: Add perf script test
libsubcmd: Fix memory leak in uniq()
perf TUI: Don't ignore job control
perf vendor events intel: Update sapphirerapids events to v1.17
perf vendor events intel: Update icelakex events to v1.23
perf vendor events intel: Update emeraldrapids events to v1.02
perf vendor events intel: Alderlake/rocketlake metric fixes
perf x86 test: Add hybrid test for conflicting legacy/sysfs event
perf x86 test: Update hybrid expectations
perf vendor events amd: Add Zen 4 memory controller events
perf stat: Fix hard coded LL miss units
perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event
perf env: Avoid recursively taking env->bpf_progs.lock
perf annotate: Add --insn-stat option for debugging
perf annotate: Add --type-stat option for debugging
perf annotate: Support event group display
perf annotate: Add --data-type option
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull security module updates from Paul Moore:
- Add three new syscalls: lsm_list_modules(), lsm_get_self_attr(), and
lsm_set_self_attr().
The first syscall simply lists the LSMs enabled, while the second and
third get and set the current process' LSM attributes. Yes, these
syscalls may provide similar functionality to what can be found under
/proc or /sys, but they were designed to support multiple,
simultaneaous (stacked) LSMs from the start as opposed to the current
/proc based solutions which were created at a time when only one LSM
was allowed to be active at a given time.
We have spent considerable time discussing ways to extend the
existing /proc interfaces to support multiple, simultaneaous LSMs and
even our best ideas have been far too ugly to support as a kernel
API; after +20 years in the kernel, I felt the LSM layer had
established itself enough to justify a handful of syscalls.
Support amongst the individual LSM developers has been nearly
unanimous, with a single objection coming from Tetsuo (TOMOYO) as he
is worried that the LSM_ID_XXX token concept will make it more
difficult for out-of-tree LSMs to survive. Several members of the LSM
community have demonstrated the ability for out-of-tree LSMs to
continue to exist by picking high/unused LSM_ID values as well as
pointing out that many kernel APIs rely on integer identifiers, e.g.
syscalls (!), but unfortunately Tetsuo's objections remain.
My personal opinion is that while I have no interest in penalizing
out-of-tree LSMs, I'm not going to penalize in-tree development to
support out-of-tree development, and I view this as a necessary step
forward to support the push for expanded LSM stacking and reduce our
reliance on /proc and /sys which has occassionally been problematic
for some container users. Finally, we have included the linux-api
folks on (all?) recent revisions of the patchset and addressed all of
their concerns.
- Add a new security_file_ioctl_compat() LSM hook to handle the 32-bit
ioctls on 64-bit systems problem.
This patch includes support for all of the existing LSMs which
provide ioctl hooks, although it turns out only SELinux actually
cares about the individual ioctls. It is worth noting that while
Casey (Smack) and Tetsuo (TOMOYO) did not give explicit ACKs to this
patch, they did both indicate they are okay with the changes.
- Fix a potential memory leak in the CALIPSO code when IPv6 is disabled
at boot.
While it's good that we are fixing this, I doubt this is something
users are seeing in the wild as you need to both disable IPv6 and
then attempt to configure IPv6 labeled networking via
NetLabel/CALIPSO; that just doesn't make much sense.
Normally this would go through netdev, but Jakub asked me to take
this patch and of all the trees I maintain, the LSM tree seemed like
the best fit.
- Update the LSM MAINTAINERS entry with additional information about
our process docs, patchwork, bug reporting, etc.
I also noticed that the Lockdown LSM is missing a dedicated
MAINTAINERS entry so I've added that to the pull request. I've been
working with one of the major Lockdown authors/contributors to see if
they are willing to step up and assume a Lockdown maintainer role;
hopefully that will happen soon, but in the meantime I'll continue to
look after it.
- Add a handful of mailmap entries for Serge Hallyn and myself.
* tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (27 commits)
lsm: new security_file_ioctl_compat() hook
lsm: Add a __counted_by() annotation to lsm_ctx.ctx
calipso: fix memory leak in netlbl_calipso_add_pass()
selftests: remove the LSM_ID_IMA check in lsm/lsm_list_modules_test
MAINTAINERS: add an entry for the lockdown LSM
MAINTAINERS: update the LSM entry
mailmap: add entries for Serge Hallyn's dead accounts
mailmap: update/replace my old email addresses
lsm: mark the lsm_id variables are marked as static
lsm: convert security_setselfattr() to use memdup_user()
lsm: align based on pointer length in lsm_fill_user_ctx()
lsm: consolidate buffer size handling into lsm_fill_user_ctx()
lsm: correct error codes in security_getselfattr()
lsm: cleanup the size counters in security_getselfattr()
lsm: don't yet account for IMA in LSM_CONFIG_COUNT calculation
lsm: drop LSM_ID_IMA
LSM: selftests for Linux Security Module syscalls
SELinux: Add selfattr hooks
AppArmor: Add selfattr hooks
Smack: implement setselfattr and getselfattr hooks
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the series
'maple_tree: add mt_free_one() and mt_attr() helpers'
'Some cleanups of maple tree'
- In the series 'mm: use memmap_on_memory semantics for dax/kmem'
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few fixes)
in the patch series
'Add folio_zero_tail() and folio_fill_tail()'
'Make folio_start_writeback return void'
'Fix fault handler's handling of poisoned tail pages'
'Convert aops->error_remove_page to ->error_remove_folio'
'Finish two folio conversions'
'More swap folio conversions'
- Kefeng Wang has also contributed folio-related work in the series
'mm: cleanup and use more folio in page fault'
- Jim Cromie has improved the kmemleak reporting output in the series
'tweak kmemleak report format'.
- In the series 'stackdepot: allow evicting stack traces' Andrey
Konovalov to permits clients (in this case KASAN) to cause eviction
of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series 'mm:
page_alloc: fixes for high atomic reserve caluculations'.
- Dmitry Rokosov has added to the samples/ dorectory some sample code
for a userspace memcg event listener application. See the series
'samples: introduce cgroup events listeners'.
- Some mapletree maintanance work from Liam Howlett in the series
'maple_tree: iterator state changes'.
- Nhat Pham has improved zswap's approach to writeback in the series
'workload-specific and memory pressure-driven zswap writeback'.
- DAMON/DAMOS feature and maintenance work from SeongJae Park in the
series
'mm/damon: let users feed and tame/auto-tune DAMOS'
'selftests/damon: add Python-written DAMON functionality tests'
'mm/damon: misc updates for 6.8'
- Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
memcg: subtree stats flushing and thresholds'.
- In the series 'Multi-size THP for anonymous memory' Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series 'More buffer_head
cleanups'.
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
Add ksm advisor'. This is a governor which tunes KSM's scanning
aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory use
in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.
- Matthew Wilcox has performed some maintenance work on the writeback
code, both code and within filesystems. The series is 'Clean up the
writeback paths'.
- Andrey Konovalov has optimized KASAN's handling of alloc and free
stack traces for secondary-level allocators, in the series 'kasan:
save mempool stack traces'.
- Andrey also performed some KASAN maintenance work in the series
'kasan: assorted clean-ups'.
- David Hildenbrand has gone to town on the rmap code. Cleanups, more
pte batching, folio conversions and more. See the series 'mm/rmap:
interface overhaul'.
- Kinsey Ho has contributed some maintenance work on the MGLRU code
in the series 'mm/mglru: Kconfig cleanup'.
- Matthew Wilcox has contributed lruvec page accounting code cleanups
in the series 'Remove some lruvec page accounting functions'"
* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
mm, treewide: introduce NR_PAGE_ORDERS
selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
selftests/mm: skip test if application doesn't has root privileges
selftests/mm: conform test to TAP format output
selftests: mm: hugepage-mmap: conform to TAP format output
selftests/mm: gup_test: conform test to TAP format output
mm/selftests: hugepage-mremap: conform test to TAP format output
mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
mm/memcontrol: remove __mod_lruvec_page_state()
mm/khugepaged: use a folio more in collapse_file()
slub: use a folio in __kmalloc_large_node
slub: use folio APIs in free_large_kmalloc()
slub: use alloc_pages_node() in alloc_slab_page()
mm: remove inc/dec lruvec page state functions
mm: ratelimit stat flush from workingset shrinker
kasan: stop leaking stack trace handles
mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
mm/mglru: add dummy pmd_dirty()
...
|
|
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.
To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.
Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
perf test 17 'Setup struct perf_event_attr' fails on s390 z/VM guest,
using linux-next kernel.
Root cause is the fall-back from hardware counter cycles
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|ADDR|PERIOD|DATA_SRC
read_format ID|LOST
which returns -ENOENT on s390 z/VM guest. This causes the code to fall
back to software counter task-clock, as can be seen in the debug output:
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0x1 (PERF_COUNT_SW_TASK_CLOCK) <-here
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|ADDR|PERIOD|DATA_SRC
read_format ID|LOST
This succeeds on s390 z/VM guest.
This successful installation of the counter task-clock is not listed in
the expected results and the test case fails.
This is caused by commit eb2eac0c7b618033 ("perf evsel: Fallback to
"task-clock" when not system wide") which introduced fall back from
event 'cycles' to event 'task-clock'.
To fix this on s390 allow event number 0 (cycles) and event number 1
(task-clock) as expected result.
Output before:
# ./perf test -Fv 17
17: Setup struct perf_event_attr :
--- start ---
running './tests/attr/test-stat-group1'
unsupp './tests/attr/test-stat-group1'
running './tests/attr/test-record-graph-default'
test limitation '!aarch64'
excluded architecture list ['aarch64']
expected config=0, got 1
FAILED './tests/attr/test-record-graph-default' - match failure
---- end ----
Setup struct perf_event_attr: FAILED!
#
Output after:
# ./perf test -F 17
17: Setup struct perf_event_attr : Ok
#
Fixes: eb2eac0c7b618033 ("perf evsel: Fallback to "task-clock" when not system wide")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20231219143235.1075522-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The addr_location map and maps fields in the inner loop were missing
calls to map__get()/maps__get(). The subsequent addr_location__exit()
call in each loop puts the map/maps fields causing use-after-free
aborts.
This issue reproduces on at least arm64 and x86_64 with something
simple like `perf record -g ls` followed by `perf script -s script.py`
with the following script:
perf_db_export_mode = True
perf_db_export_calls = False
perf_db_export_callchains = True
def sample_table(*args):
print(f'sample_table({args})')
def call_path_table(*args):
print(f'call_path_table({args}')
Committer testing:
This test, just introduced by Ian Rogers, now passes, not segfaulting
anymore:
# perf test "perf script tests"
95: perf script tests : Ok
#
Fixes: 0dd5041c9a0eaf8c ("perf addr_location: Add init/exit/copy functions")
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231207140911.3240408-1-ben.gainey@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Start a new set of shell tests for testing perf script. The initial
contribution is checking that some perf db-export functionality works
as reported in this regression by Ben Gainey <ben.gainey@arm.com>:
https://lore.kernel.org/lkml/20231207140911.3240408-1-ben.gainey@arm.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Ben Gainey <ben.gainey@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231207174057.1482161-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In its infinite wisdom, by default, SLang sets susp undef, and this can
only be un-done by calling SLtty_set_suspend_state(true). After every
SLang_init_tty().
Additionally, no provisions are made for maintaining the teletype
attributes across suspend/continue (outside of curses emulation
mode(?!), which provides full support, naturally), so we need to save
and restore the flags ourselves, as well as reset the text colours when
going under. We need to also re-draw the screen, and raising SIGWINCH,
shockingly, Just Works.
The correct solution would be to Not Use SLang, but as a stop-gap,
this makes TUI 'perf report' usable.
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yaowenbin <yaowenbin1@huawei.com>
Link: https://lore.kernel.org/r/0354dcae23a8713f75f4fed609e0caec3c6e3cd5.1672174189.git.nabijaczleweli@nabijaczleweli.xyz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Update to v1.17 released in:
https://github.com/intel/perfmon/pull/123
Add events FP_ARITH_DISPATCHED.V0, FP_ARITH_DISPATCHED.V1,
FP_ARITH_DISPATCHED.V2, UNC_IIO_IOMMU0.1G_HITS, UNC_IIO_IOMMU0.2M_HITS
and UNC_IIO_IOMMU0.4K_HITS. Description updates.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240104074259.653219-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Update to v1.23 released in:
https://github.com/intel/perfmon/pull/123
Updates to event descriptions.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240104074259.653219-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Update to v1.02 released in:
https://github.com/intel/perfmon/pull/123
Removes events AMX_OPS_RETIRED.BF16 and AMX_OPS_RETIRED.INT8. Add
events FP_ARITH_DISPATCHED.V0, FP_ARITH_DISPATCHED.V1,
FP_ARITH_DISPATCHED.V2, UNC_IIO_IOMMU0.1G_HITS, UNC_IIO_IOMMU0.2M_HITS
and UNC_IIO_IOMMU0.4K_HITS. Description updates.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240104074259.653219-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Fix that the core PMU is being specified for 2 uncore events. Specify
a PMU for the alderlake UNCORE_FREQ metric.
Conversion script updated in:
https://github.com/intel/perfmon/pull/126
Committer testing:
Before this patch the "perf all metricgroups test" was failing, now:
root@number:~# perf test metric
10: PMU events :
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
10.5: Parsing of metric thresholds with fake PMUs : Ok
61: Parse and process metrics : Ok
98: perf stat metrics (shadow stat) test : Skip
101: perf all metricgroups test : Ok
102: perf all metrics test : FAILED!
107: perf metrics value validation : Ok
root@number:~#
Test 102 is failing for another reason, not being able to get as many
counters as needed, Ian Rogers suggested disabling the NMI watchdog to
have more counters available:
root@number:/home/acme# cat /proc/sys/kernel/nmi_watchdog
1
root@number:/home/acme# echo 0 > /proc/sys/kernel/nmi_watchdog
root@number:/home/acme# perf test 102
102: perf all metrics test : Ok
root@number:/home/acme#
Closes: https://lore.kernel.org/lkml/ZZWOdHXJJ_oecWwm@kernel.org/
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240104074259.653219-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The cpu-cycles event is both a legacy event and declared in
/sys/devices/cpu_core/events/cpu-cycles. The cycles event is a legacy
event but with no sysfs version.
Add a test that the sysfs version is preferred to the legacy for
cpu-cycles, while for cycles we use the legacy version.
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240103170159.1435753-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The legacy events cpu-cycles and instructions have sysfs event
equivalents on x86 (see /sys/devices/cpu_core/events).
As sysfs/JSON events are now higher in priority than legacy events this
causes the hybrid test expectations not to be met.
To fix this switch to legacy events that don't have sysfs versions,
namely cpu-cycles becomes cycles and instructions becomes branches.
Fixes: a24d9d9dc096fc0d ("perf parse-events: Make legacy events lower priority than sysfs/JSON")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Closes: https://lore.kernel.org/lkml/ZYbm5L7tw7bdpDpE@kernel.org/
Link: https://lore.kernel.org/r/20240103170159.1435753-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Make the jevents parser aware of the Unified Memory Controller (UMC) PMU
and add events taken from Section 8.2.1 "UMC Performance Monitor Events"
of the Processor Programming Reference (PPR) for AMD Family 19h Model 11h
processors. The events capture UMC command activity such as CAS, ACTIVATE,
PRECHARGE etc. while the metrics derive data bus utilization and memory
bandwidth out of these events.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/e0d8a7e8ca8ee3e378d8029e80b456ac327d6419.1701238314.git.sandipan.das@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Copy-paste error where LL cache misses are reported as l1i.
Fixes: 0a57b910807ad163 ("perf stat: Use counts rather than saved_value")
Suggested-by: Guillaume Endignoux <guillaumee@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231211181242.1721059-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Reduce from PERF_SAMPLE_MAX_SIZE to "sizeof(*lost) +
session->machines.host.id_hdr_size".
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231207021627.1322884-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add variants of perf_env__insert_bpf_prog_info(), perf_env__insert_btf()
and perf_env__find_btf prefixed with __ to indicate the
env->bpf_progs.lock is assumed held.
Call these variants when the lock is held to avoid recursively taking it
and potentially having a thread deadlock with itself.
Fixes: f8dfeae009effc0b ("perf bpf: Show more BPF program info in print_bpf_prog_info()")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20231207014655.1252484-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This is for a debugging purpose. It'd be useful to see per-instrucion
level success/failure stats.
$ perf annotate --data-type --insn-stat
Annotate Instruction stats
total 264, ok 143 (54.2%), bad 121 (45.8%)
Name : Good Bad
-----------------------------------------------------------
movq : 45 31
movl : 22 11
popq : 0 19
cmpl : 16 3
addq : 8 7
cmpq : 11 3
cmpxchgl : 3 7
cmpxchgq : 8 0
incl : 3 3
movzbl : 4 2
incq : 4 2
decl : 6 0
...
Committer notes:
So these are about being able to find the type for accesses from these
instructions, we should improve the naming, but it is for debugging, we
can improve this later:
@@ -3726,6 +3759,10 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he)
continue;
mem_type = find_data_type(ms, ip, op_loc->reg, op_loc->offset);
+ if (mem_type)
+ istat->good++;
+ else
+ istat->bad++;
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-18-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The --type-stat option is to be used with --data-type and to print
detailed failure reasons for the data type annotation.
$ perf annotate --data-type --type-stat
Annotate data type stats:
total 294, ok 116 (39.5%), bad 178 (60.5%)
-----------------------------------------------------------
30 : no_sym
40 : no_insn_ops
33 : no_mem_ops
63 : no_var
4 : no_typeinfo
8 : bad_offset
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-17-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When events are grouped together, it'd be natural to show them at once
like in other mode. Handle group leaders with members to collect the
number of samples together and display like below:
$ perf annotate --data-type --group
...
Annotate type: 'struct page' in vmlinux (1 samples):
event[0] = cpu/mem-loads,ldlat=30/P
event[1] = cpu/mem-stores/P
event[2] = dummy:u
============================================================================
samples offset size field
1 0 0 0 64 struct page {
0 0 0 0 8 long unsigned int flags;
0 0 0 8 40 union {
0 0 0 8 40 struct {
0 0 0 8 16 union {
0 0 0 8 16 struct list_head lru {
0 0 0 8 8 struct list_head* next;
0 0 0 16 8 struct list_head* prev;
};
0 0 0 8 16 struct {
0 0 0 8 8 void* __filler;
0 0 0 16 4 unsigned int mlock_count;
};
0 0 0 8 16 struct list_head buddy_list {
0 0 0 8 8 struct list_head* next;
0 0 0 16 8 struct list_head* prev;
};
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-16-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Support data type annotation with new --data-type option. It internally
uses type sort key to collect sample histogram for the type and display
every members like below.
$ perf annotate --data-type
...
Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples):
============================================================================
samples offset size field
13 0 640 struct cfs_rq {
2 0 16 struct load_weight load {
2 0 8 unsigned long weight;
0 8 4 u32 inv_weight;
};
0 16 8 unsigned long runnable_weight;
0 24 4 unsigned int nr_running;
1 28 4 unsigned int h_nr_running;
...
For simplicity it prints the number of samples per field for now.
But it should be easy to show the overhead percentage instead.
The number at the outer struct is a sum of the numbers of the inner
members. For example, struct cfs_rq got total 13 samples, and 2 came
from the load (struct load_weight) and 1 from h_nr_running. Similarly,
the struct load_weight got total 2 samples and they all came from the
weight field.
I've added two new flags in the symbol_conf for this. The
annotate_data_member is to get the members of the type. This is also
needed for perf report with typeoff sort key. The annotate_data_sample
is to update sample stats for each offset and used only in annotate.
Currently it only support stdio output mode, TUI support can be added
later.
Committer testing:
With the perf.data from the previous csets, a very simple, short
duration one:
# perf annotate --data-type
Annotate type: 'struct list_head' in [kernel.kallsyms] (1 samples):
============================================================================
samples offset size field
1 0 16 struct list_head {
0 0 8 struct list_head* next;
1 8 8 struct list_head* prev;
};
Annotate type: 'char' in [kernel.kallsyms] (1 samples):
============================================================================
samples offset size field
1 0 1 char ;
#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-15-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The symoff sort key is to print symbol and offset of sample. This is
useful for data type profiling to show exact instruction in the function
which refers the data.
$ perf report -s type,sym,typeoff,symoff --hierarchy
...
# Overhead Data Type / Symbol / Data Type Offset / Symbol Offset
# .............. .....................................................
#
1.23% struct cfs_rq
0.84% update_blocked_averages
0.19% struct cfs_rq +336 (leaf_cfs_rq_list.next)
0.19% [k] update_blocked_averages+0x96
0.19% struct cfs_rq +0 (load.weight)
0.14% [k] update_blocked_averages+0x104
0.04% [k] update_blocked_averages+0x31c
0.17% struct cfs_rq +404 (throttle_count)
0.12% [k] update_blocked_averages+0x9d
0.05% [k] update_blocked_averages+0x1f9
0.08% struct cfs_rq +272 (propagate)
0.07% [k] update_blocked_averages+0x3d3
0.02% [k] update_blocked_averages+0x45b
...
Committer testing:
# perf report --stdio -s type,typeoff,symoff
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 4 of event 'cpu_atom/mem-loads,ldlat=30/P'
# Event count (approx.): 7
#
# Overhead Data Type Data Type Offset Symbol Offset
# ........ ......... ................ .............
#
42.86% struct list_head struct list_head +8 (prev) [k] __list_del_entry_valid_or_report+0x7
28.57% (unknown) (unknown) +0 (no field) [.] _nl_intern_locale_data+0x25
14.29% char char +0 (no field) [k] strncpy_from_user+0xa5
14.29% (unknown) (unknown) +0 (no field) [.] _dl_lookup_symbol_x+0x50
#
# (Tip: To change sampling frequency to 100 Hz: perf record -F 100)
#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-14-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The typeoff sort key shows the data type name, offset and the name of
the field. This is useful to see which field in the struct is accessed
most frequently.
$ perf report -s type,typeoff --hierarchy --stdio
...
# Overhead Data Type / Data Type Offset
# ............ ............................
#
...
1.23% struct cfs_rq
0.19% struct cfs_rq +404 (throttle_count)
0.19% struct cfs_rq +0 (load.weight)
0.19% struct cfs_rq +336 (leaf_cfs_rq_list.next)
0.09% struct cfs_rq +272 (propagate)
0.09% struct cfs_rq +196 (removed.nr)
0.09% struct cfs_rq +80 (curr)
0.09% struct cfs_rq +544 (lt_b_children_throttled)
0.06% struct cfs_rq +320 (rq)
Committer testing:
Again with the perf.data from the previous csets:
# perf report --stdio -s type,typeoff
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 4 of event 'cpu_atom/mem-loads,ldlat=30/P'
# Event count (approx.): 7
#
# Overhead Data Type Data Type Offset
# ........ ......... ................
#
42.86% struct list_head struct list_head +8 (prev)
42.86% (unknown) (unknown) +0 (no field)
14.29% char char +0 (no field)
#
# (Tip: To see callchains in a more compact form: perf report -g folded)
#
# perf report --stdio -s dso,type,typeoff
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 4 of event 'cpu_atom/mem-loads,ldlat=30/P'
# Event count (approx.): 7
#
# Overhead Shared Object Data Type Data Type Offset
# ........ .................... ......... ................
#
42.86% [kernel.kallsyms] struct list_head struct list_head +8 (prev)
28.57% libc.so.6 (unknown) (unknown) +0 (no field)
14.29% [kernel.kallsyms] char char +0 (no field)
14.29% ld-linux-x86-64.so.2 (unknown) (unknown) +0 (no field)
#
# (Tip: If you have debuginfo enabled, try: perf report -s sym,srcline)
#
#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-13-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|