aboutsummaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)AuthorFilesLines
2023-04-06perf maps: Modify maps_by_name to hold a reference to a mapIan Rogers2-18/+33
To make it clearer about the ownership of a reference count split the by-name case from the regular start-address sorted tree. Put the reference count when maps_by_name is freed, which requires moving a decrement to nr_maps in maps__remove. Add two missing map puts in maps__fixup_overlappings in the event maps__insert fails. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf test: Add extra diagnostics to maps testIan Rogers1-15/+36
Dump the resultant and comparison maps on failure. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf map: Add accessors for ->pgoff and ->relocIan Rogers9-23/+33
Later changes will add reference count checking for 'struct map'. Add accessors so that the reference count check is only necessary in one place. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf map: Add accessors for ->prot, ->priv and ->flagsIan Rogers6-12/+28
Later changes will add reference count checking for 'struct map'. Add an accessor so that the reference count check is only necessary in one place. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf map: Add helper for ->map_ip() and ->unmap_ip()Ian Rogers26-69/+81
Later changes will add reference count checking for struct map, add a helper function to invoke the map_ip and unmap_ip function pointers. The helper allows the reference count check to be in fewer places. Committer notes: Add missing conversions to: tools/perf/util/map.c tools/perf/util/cs-etm.c tools/perf/util/annotate.c tools/perf/arch/powerpc/util/sym-handling.c tools/perf/arch/s390/annotate/instructions.c Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf map: Rename map_ip() and unmap_ip()Ian Rogers6-13/+13
Add dso to match comment. This avoids a naming conflict with later added accessor functions for variables in struct map. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf vendor events intel: Update free running tigerlake eventsIan Rogers2-36/+50
Fix the topic, PMU name, event code and umask. These updates were generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py with this PR: https://github.com/intel/perfmon/pull/66 Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf vendor events intel: Update free running snowridgex eventsIan Rogers2-66/+30
Fix the PMU names, event code and umask. Remove UNC_IIO_BANDWIDTH_OUT events that aren't supported. These updates were generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py with this PR: https://github.com/intel/perfmon/pull/66 Signed-off-by: Ian Rogers <[email protected]> : Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf vendor events intel: Correct knightslanding memory topicIan Rogers2-36/+38
Correct the memory topic of events for the imc related PMUs. These updates were generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py with this PR: https://github.com/intel/perfmon/pull/66 Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf vendor events intel: Update free running icelakex eventsIan Rogers2-58/+30
Fix the PMU names, event code and umask. Remove UNC_IIO_BANDWIDTH_OUT events that aren't supported. These updates were generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py with this PR: https://github.com/intel/perfmon/pull/66 Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf vendor events intel: Update free running alderlake eventsIan Rogers2-8/+24
Fix the PMU name, event code and umask. These updates were generated by: https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py with this PR: https://github.com/intel/perfmon/pull/66 Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Xing Zhengjun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf pmu: Sort and remove duplicates using JSON PMU nameIan Rogers1-16/+31
We may have a lot of copies of a particular uncore PMU, such as uncore_cha_0 to uncore_cha_59 on Intel sapphirerapids. The JSON events may match each of PMUs and so the events are copied to it. In 'perf list' this means we see the same JSON event 60 times as events on different PMUs don't have duplicates removed. There are 284 uncore_cha events on sapphirerapids. Rather than use the PMU's name to sort and remove duplicates, use the JSON PMU name. This reduces the 60 copies back down to 1 and has the side effect of speeding things like the "perf all PMU test" shell test. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf pmu: Improve name/comments, avoid a memory allocationIan Rogers2-10/+25
Improve documentation around perf_pmu_alias pmu_name and on functions. Reduce the scope of pmu_uncore_alias_match to just file. Rename perf_pmu__valid_suffix to the more revealing perf_pmu__match_ignoring_suffix. Add a short-cut to perf_pmu__match_ignoring_suffix for PMU names that don't also have a socket value, and can therefore avoid a memory allocation. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf pmu: Fewer const castsIan Rogers1-6/+6
struct pmu_event has const char*s, only unit needs to be non-const for the sake of passing as an out argument to strtod(). Reduce the const casts from 4 down to 1. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf lock contention: Do not try to update if hash map is fullNamhyung Kim1-3/+19
It doesn't delete data in the task_data and lock_stat maps. The data is kept there until it's consumed by userspace at the end. But it calls bpf_map_update_elem() again and again, and the data will be discarded if the map is full. This is not good. Worse, in the bpf_map_update_elem(), it keeps trying to get a new node even if the map was full. I guess it makes sense if it deletes some node like in the tstamp map (that's why I didn't make the change there). In a pre-allocated hash map, that means it'd iterate all CPU to check the freelist. And it has a bad performance impact on large machines. I've checked it on my 64 CPU machine with this. $ perf bench sched messaging -g 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 1000 groups == 40000 processes run Total time: 2.825 [sec] And I used the task mode, so that it can guarantee the map is full. The default map entry size is 16K and this workload has 40K tasks. Before: $ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 1000 groups == 40000 processes run Total time: 11.299 [sec] contended total wait max wait avg wait pid comm 19284 3.51 s 3.70 ms 181.91 us 1305863 sched-messaging 243 84.09 ms 466.67 us 346.04 us 1336608 sched-messaging 177 66.35 ms 12.08 ms 374.88 us 1220416 node For some reason, it didn't report the data failures. But you can see the total time in the workload is increased a lot (2.8 -> 11.3). If it fails early when the map is full, it goes back to normal. After: $ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 1000 groups == 40000 processes run Total time: 3.044 [sec] contended total wait max wait avg wait pid comm 18743 591.92 ms 442.96 us 31.58 us 1431454 sched-messaging 51 210.64 ms 207.45 ms 4.13 ms 1468724 sched-messaging 81 68.61 ms 65.79 ms 847.07 us 1463183 sched-messaging === output for debug === bad: 1164137, total: 2253341 bad rate: 51.66 % histogram of failure reasons task: 0 stack: 0 time: 0 data: 1164137 Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf lock contention: Revise needs_callstack() conditionNamhyung Kim2-2/+2
It needs callstacks for two reasons: * for stack aggregation mode, the map key is the stack id and it can also show the full stack traces when -v is used * for other aggregation modes, the stack filter can be used to limit lock contentions from known call paths The -v option is meaningful (in terms of stack trace) only for stack aggregation mode, so it should not set the save_callstack for other mode like with -t or -l options. I've noticed this with the following command line: $ sudo ./perf lock con -ablv -E 3 -M 16 -- ./perf bench sched messaging ... contended total wait max wait avg wait address symbol 88 4.59 ms 108.07 us 52.13 us ffff935757f46ec0 (spinlock) 33 905.22 us 73.67 us 27.43 us ffff935757f41700 (spinlock) 28 703.69 us 79.28 us 25.13 us ffff938a3d9b0c80 rq_lock (spinlock) === output for debug === bad: 12272, total: 12421 bad rate: 98.80 % histogram of failure reasons task: 8285 stack: 3987 <---------- here time: 0 data: 0 It should not have any failure on stacks since it doesn't use it. No functional change intended. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf lock contention: Update total/bad stats for hidden entriesNamhyung Kim3-1/+15
When -E option is used, it only prints the given number of entries but the event stat at the end should have the numbers for entire entries. Likewise, -S option will hide entries that don't have the named function in the callstack. Also update event stat for them. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf lock contention: Add data failure statNamhyung Kim4-2/+8
It's possible to fail to update the data when the lock_stat map is full. We should check that case and show the number at the end. $ sudo ./perf lock con -ablv -E3 -- ./perf bench sched messaging ... contended total wait max wait avg wait address symbol 6157 208.48 ms 69.29 us 33.86 us ffff934c001c1f00 (spinlock) 4030 72.04 ms 61.84 us 17.88 us ffff934c000415c0 (spinlock) 3201 50.30 ms 47.73 us 15.71 us ffff934c2eead850 (spinlock) === output for debug === bad: 0, total: 13388 bad rate: 0.00 % histogram of failure reasons task: 0 stack: 0 time: 0 data: 0 <----- added Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf lock contention: Update default map size to 16384Namhyung Kim4-6/+7
The BPF hash map will align the map size to a power of 2. So 10k would be 16k anyway. Let's have the actual size to avoid confusions. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf lock contention: Use -M for --map-nr-entriesNamhyung Kim2-1/+2
Users often want to change the map size, let's add a short option (-M) for that. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf lock contention: Simplify parse_lock_type()Namhyung Kim1-35/+8
The get_type_flag() should check both str and name fields in the lock_type_table so that it can find the appropriate flag without retrying with ':R' or ':W' suffix from the caller. Also fix a typo in the rt-mutex. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06tools: Rename __fallthrough to fallthroughLiam Howlett13-17/+17
Rename the fallthrough attribute to better align with the kernel version. Copy the definition from include/linux/compiler_attributes.h including the #else clause. Adding the #else clause allows the tools compiler.h header to drop the check for a definition entirely and keeps both definitions together. Change any __fallthrough statements to fallthrough anywhere it was used within perf. This allows other tools to use the same key word as the kernel. Committer notes: Did some missing conversions to: builtin-list.c Also included gtk.h before the 'fallthrough' definition in: tools/perf/ui/gtk/hists.c tools/perf/ui/gtk/helpline.c tools/perf/ui/gtk/browser.c As it is the arg name for a macro in glib.h: /var/home/acme/git/perf-tools-next/tools/include/linux/compiler-gcc.h:16:55: error: missing binary operator before token "(" 16 | # define fallthrough __attribute__((__fallthrough__)) | ^ /usr/include/glib-2.0/glib/gmacros.h:637:28: note: in expansion of macro ‘fallthrough’ 637 | #if g_macro__has_attribute(fallthrough) Reviewed-by: Miguel Ojeda <[email protected]> Signed-off-by: Liam Howlett <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] <[email protected]> Cc: [email protected] <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf pmu: Fix a few potential fd leaksIan Rogers1-2/+10
Ensure fd is closed on error paths. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Gaosheng Cui <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-06perf pmu: Make parser reentrantIan Rogers4-13/+37
By default bison uses global state for compatibility with yacc. Make the parser reentrant so that it may be used in asynchronous and multithreaded situations. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Gaosheng Cui <[email protected]> Cc: German Gomez <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Rob Herring <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf map: Add accessor for start and endIan Rogers29-114/+125
Later changes will add reference count checking for struct map, start and end are frequently accessed variables. Add an accessor so that the reference count check is only necessary in one place. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf map: Add accessor for dsoIan Rogers54-319/+447
Later changes will add reference count checking for struct map, with dso being the most frequently accessed variable. Add an accessor so that the reference count check is only necessary in one place. Additional changes: - add a dso variable to avoid repeated map__dso calls. - in builtin-mem.c dump_raw_samples, code only partially tested for dso == NULL. Make the possibility of NULL consistent. - in thread.c thread__memcpy fix use of spaces and use tabs. Committer notes: Did missing conversions on these files: tools/perf/arch/powerpc/util/skip-callchain-idx.c tools/perf/arch/powerpc/util/sym-handling.c tools/perf/ui/browsers/hists.c tools/perf/ui/gtk/annotate.c tools/perf/util/cs-etm.c tools/perf/util/thread.c tools/perf/util/unwind-libunwind-local.c tools/perf/util/unwind-libunwind.c Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf maps: Add functions to access mapsIan Rogers20-111/+175
Introduce functions to access struct maps. These functions reduce the number of places reference counting is necessary. While tidying APIs do some small const-ification, in particlar to unwind_libunwind_ops. Committer notes: Fixed up tools/perf/util/unwind-libunwind.c: - return ops->get_entries(cb, arg, thread, data, max_stack); + return ops->get_entries(cb, arg, thread, data, max_stack, best_effort); Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf maps: Remove rb_node from struct mapIan Rogers17-186/+295
struct map is reference counted, having it also be a node in an red-black tree complicates the reference counting. Switch to having a map_rb_node which is a red-block tree node but points at the reference counted struct map. This reference is responsible for a single reference count. Committer notes: Fixed up tools/perf/util/unwind-libunwind-local.c to use map_rb_node as well. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf map: Move map list node into symbolIan Rogers2-24/+54
Using a perf map as a list node is only done in symbol. Move the list_node struct into symbol as a single pointer to the map. This makes reference count behavior more obvious and easy to check. Committer notes: Some changes to reduce the number of lines touched by keeping, for instance, the 'new_map' variable and setting it to new_node->map, so that we keep more of the project history in place and keep as much as possible the value of the 'git blame' tool. Also use map__zput() when putting a struct members, so that when we free the container struct we can get use-after-free errors as NULL pointer derefs sometimes. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: German Gomez <[email protected]> Cc: Hao Luo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Miaoqian Lin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Shunsuke Nakamura <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Richter <[email protected]> Cc: Yury Norov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf jit: Fix a few memory leaksIan Rogers2-18/+34
As reported by leak sanitizer. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Brian Robbins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Yuan Can <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf build: Allow C++ demangle without libelfIan Rogers1-1/+0
The cxa demangle support isn't dependent on libelf and so we no longer need to disable demangling if libelf isn't present. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf srcline: Avoid addr2line SIGPIPEsIan Rogers1-0/+2
Ignore SIGPIPEs when addr2line is configured. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf srcline: Support for llvm-addr2lineIan Rogers1-7/+67
The sentinel value differs for llvm-addr2line. Configure this once and then detect when reading records. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf srcline: Simplify addr2line subprocessIan Rogers1-65/+38
Don't wrap stdin and stdout of subprocess with streams, use the api/io library for buffering. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04tools api: Add io__getlineIan Rogers1-0/+36
Reads a line to allocated memory up to a newline following the getline API. Committer notes: It also adds this new function to the 'api io' 'perf test' entry: $ perf test "api io" 64: Test api io : Ok $ Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf intel-pt: Use perf_pmu__scan_file_at() if possibleNamhyung Kim1-20/+32
Intel-PT calls perf_pmu__scan_file() a lot, let's use relative address when it accesses multiple files at one place. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Adrian Hunter <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf pmu: Add perf_pmu__{open,scan}_file_at()Namhyung Kim2-13/+50
These two helpers will also use openat() to reduce the overhead with relative pathnames. Convert other functions in pmu_lookup() to use the new helpers. Committer testing: Before: ⬢[acme@toolbox perf-tools-next]$ perf bench internals pmu-scan # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 100 times Average PMU scanning took: 2729.040 usec (+- 7.117 usec) ⬢[acme@toolbox perf-tools-next]$ After: ⬢[acme@toolbox perf-tools-next]$ perf bench internals pmu-scan # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 100 times Average PMU scanning took: 2419.870 usec (+- 9.057 usec) ⬢[acme@toolbox perf-tools-next]$ Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf pmu: Use relative path in setup_pmu_alias_list()Namhyung Kim1-6/+7
Likewise, x86 needs to traverse the PMU list to build alias. Let's use the new helpers to use relative paths. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf pmu: Use relative path in perf_pmu__caps_parse()Namhyung Kim1-4/+6
Likewise, it needs to traverse the pmu/caps directory, let's use openat() with the dirfd instead of open() using the absolute path. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: Peter Zijlstra <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Leo Yan <[email protected]> Cc: Kan Liang <[email protected]> Cc: LKML <[email protected]> Cc: [email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf pmu: Use relative path for sysfs scanNamhyung Kim3-63/+111
The PMU information is in the kernel sysfs so it needs to scan the directory to get the whole information like event aliases, formats and so on. During the traversal, it opens a lot of files and directories like below: dir = opendir("/sys/bus/event_source/devices"); while (dentry = readdir(dir)) { char buf[PATH_MAX]; snprintf(buf, sizeof(buf), "%s/%s", "/sys/bus/event_source/devices", dentry->d_name); fd = open(buf, O_RDONLY); ... } But this is not good since it needs to copy the string to build the absolute pathname, and it makes redundant pathname walk (from the /sys) unnecessarily. We can use openat(2) to open the file in the given directory. While it's not a problem ususally, it can be a problem when the kernel has contentions on the sysfs. Add a couple of new helper to return the file descriptor of PMU directory so that it can use it with relative paths. * perf_pmu__event_source_devices_fd() - returns a fd for the PMU root ("/sys/bus/event_source/devices") * perf_pmu__pathname_fd() - returns a fd for "<pmu>/<file>" under the PMU root Now the above code can be converted something like below: dirfd = perf_pmu__event_source_devices_fd(); dir = fdopendir(dirfd); while (dentry = readdir(dir)) { fd = openat(dirfd, dentry->d_name, O_RDONLY); ... } Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf bench: Add pmu-scan benchmarkNamhyung Kim4-0/+187
The pmu-scan benchmark will repeatedly scan the sysfs to get the available PMU information. $ ./perf bench internals pmu-scan # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 100 times Average PMU scanning took: 6850.990 usec (+- 48.445 usec) Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf pmu: Add perf_pmu__destroy() functionNamhyung Kim2-0/+52
It seems there's no function to delete the perf pmu struct. Add the perf_pmu__destroy() to do the job. While at it, add some more helper functions to delete pmu aliases and caps. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf tools: Fix a asan issue in parse_events_multi_pmu_add()Namhyung Kim1-1/+1
In the parse_events_multi_pmu_add() it passes the 'config' variable twice to parse_events_term__num() - one for config and another for loc_term. I'm not sure about the second one as it's converted to YYLTYPE variable. Asan reports it like below: In function ‘parse_events_term__num’, inlined from ‘parse_events_multi_pmu_add’ at util/parse-events.c:1602:6: util/parse-events.c:2653:64: error: array subscript ‘YYLTYPE[0]’ is partly outside array bounds of ‘char[8]’ [-Werror=array-bounds] 2653 | .err_term = loc_term ? loc_term->first_column : 0, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~ util/parse-events.c: In function ‘parse_events_multi_pmu_add’: util/parse-events.c:1587:15: note: object ‘config’ of size 8 1587 | char *config; | ^~~~~~ cc1: all warnings being treated as errors Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf list: Use relative path for tracepoint scanNamhyung Kim1-7/+31
Committer notes: Added missing #include <unistd.h> for the close() prototype to fix this on Alma Linux 8: 1 21.54 almalinux:8 : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-16) (GCC) util/print-events.c: In function 'print_tracepoint_events': util/print-events.c:103:4: error: implicit declaration of function 'close'; did you mean 'clone'? [-Werror=implicit-function-declaration] close(evt_fd); ^~~~~ clone Also use the newly added scandirat feature test to check if that function is available, providing a HAVE_SCANDIRAT_SUPPORT conditional warning to the user if it isn't available. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04tools build: Add a feature test for scandirat(), that is not implemented so ↵Arnaldo Carvalho de Melo1-0/+4
far in musl and uclibc We use it just when listing tracepoint events, and for root, so just emit a warning about it to get users to ask the library maintainers to implement it, as suggested in this systemd ticket: https://github.com/systemd/casync/issues/129 Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/ZCwv4z5Dh%[email protected]/ Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf intel-pt: Fix CYC timestamps after standalone CBRAdrian Hunter1-0/+2
After a standalone CBR (not associated with TSC), update the cycles reference timestamp and reset the cycle count, so that CYC timestamps are calculated relative to that point with the new frequency. Fixes: cc33618619cefc6d ("perf tools: Add Intel PT support for decoding CYC packets") Signed-off-by: Adrian Hunter <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf auxtrace: Fix address filter entire kernel sizeAdrian Hunter1-1/+4
kallsyms is not completely in address order. In find_entire_kern_cb(), calculate the kernel end from the maximum address not the last symbol. Example: Before: $ sudo cat /proc/kallsyms | grep ' [twTw] ' | tail -1 ffffffffc00b8bd0 t bpf_prog_6deef7357e7b4530 [bpf] $ sudo cat /proc/kallsyms | grep ' [twTw] ' | sort | tail -1 ffffffffc15e0cc0 t iwl_mvm_exit [iwlmvm] $ perf.d093603a05aa record -v --kcore -e intel_pt// --filter 'filter *' -- uname |& grep filter Address filter: filter 0xffffffff93200000/0x2ceba000 After: $ perf.8fb0f7a01f8e record -v --kcore -e intel_pt// --filter 'filter *' -- uname |& grep filter Address filter: filter 0xffffffff93200000/0x2e3e2000 Fixes: 1b36c03e356936d6 ("perf record: Add support for using symbols in address filters") Signed-off-by: Adrian Hunter <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/storeRob Herring2-0/+12
Arm SPEv1.3 adds new load/store operation subclasses for Memory Tagging Extension (MTE) and memory operations (MOPS). The memory operations are memcpy and memset. Add support for decoding these new subclasses in the raw decoding. Reviewed-by: Leo Yan <[email protected] Signed-off-by: Rob Herring <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packetMike Leach2-18/+235
When using dynamically assigned CoreSight trace IDs the drivers can output the ID / CPU association as a PERF_RECORD_AUX_OUTPUT_HW_ID packet. Update cs-etm decoder to handle this packet by setting the CPU/Trace ID mapping. Reviewed-by: James Clark <[email protected]> Signed-off-by: Mike Leach <[email protected]> Acked-by: Suzuki Poulouse <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Darren Hart <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-04perf cs-etm: Update record event to use new Trace ID protocolMike Leach1-10/+17
Trace IDs are now dynamically allocated. Previously used the static association algorithm that is no longer used. The 'cpu * 2 + seed' was outdated and broken for systems with high core counts (>46). as it did not scale and was broken for larger core counts. Trace ID will now be sent in PERF_RECORD_AUX_OUTPUT_HW_ID record. Legacy ID algorithm renamed and retained for limited backward compatibility use. Reviewed-by: James Clark <[email protected]> Signed-off-by: Mike Leach <[email protected]> Acked-by: Suzuki Poulouse <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Darren Hart <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>