|
This patch enhances the current metric infrastructure to handle "?" in a
metric expression. The "?" can be used for parameters whose value is not
known when the metric events are created and which can be replaced with
the proper value later, at runtime. It also adds the flexibility to create
multiple events out of a single metric event added in the JSON file.
The patch adds the function 'arch_get_runtimeparam', an arch-specific
function that returns the number of metric events that need to be
created. By default it returns 1.
This infrastructure is needed for hv_24x7 socket/chip level events:
"hv_24x7" chip level events need the specific chip-id for which the data
is requested. 'arch_get_runtimeparam' is implemented in header.c, where
it extracts the number of sockets from the sysfs file "sockets" under
"/sys/devices/hv_24x7/interface/".
With this patch we basically create as many metric events as defined by
runtime_param.
For that, a loop is added in 'metricgroup__add_metric' which creates
multiple events at runtime, depending on the return value of
'arch_get_runtimeparam', and merges those events into 'group_list'.
To achieve that, the parameter value is passed down to
'expr__find_other', which replaces the "?" in the metric expression with
this value.
The JSON file contains a single metric event, out of which we create
multiple events.
To make clear which data count belongs to which parameter value, the
param value is also printed by the generic_metric function.
For example,
command:# ./perf stat -M PowerBUS_Frequency -C 0 -I 1000
1.000101867 9,356,933 hv_24x7/pm_pb_cyc,chip=0/ # 2.3 GHz PowerBUS_Frequency_0
1.000101867 9,366,134 hv_24x7/pm_pb_cyc,chip=1/ # 2.3 GHz PowerBUS_Frequency_1
2.000314878 9,365,868 hv_24x7/pm_pb_cyc,chip=0/ # 2.3 GHz PowerBUS_Frequency_0
2.000314878 9,366,092 hv_24x7/pm_pb_cyc,chip=1/ # 2.3 GHz PowerBUS_Frequency_1
Here, the _0 and _1 suffixes after PowerBUS_Frequency indicate the parameter value.
Signed-off-by: Kajol Jain <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Anju T Sudhakar <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Joe Mario <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Mamatha Inamdar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
get_cpuid_str() is used in tools/perf/arch/xxx/util/header.c; fix the
name in the comment.
Signed-off-by: Shaokun Zhang <[email protected]>
Cc: Andi Kleen <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fixes coccicheck warning:
tools/perf/builtin-report.c:1403:2-34: WARNING: Assignment of 0/1 to bool variable
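The warning flags assignments of integer literals to bool variables; the
fix is to use the bool literals from <stdbool.h> instead, e.g.
(illustrative, not the exact line from builtin-report.c):
	bool flag;
	flag = 1;	/* triggers "Assignment of 0/1 to bool variable" */
	flag = true;	/* the fix: assign the bool literal instead */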
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zou Wei <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fixes coccicheck warnings:
tools/perf/builtin-diff.c:1565:2-3: Unneeded semicolon
tools/perf/builtin-lock.c:778:2-3: Unneeded semicolon
tools/perf/builtin-mem.c:126:2-3: Unneeded semicolon
tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c:555:2-3: Unneeded semicolon
tools/perf/util/ordered-events.c:317:2-3: Unneeded semicolon
tools/perf/util/synthetic-events.c:1131:2-3: Unneeded semicolon
tools/perf/util/trace-event-read.c:78:2-3: Unneeded semicolon
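In each case the pattern is a stray semicolon after a closing brace,
e.g. (illustrative):
	switch (type) {
	case SOME_CASE:
		do_something();
		break;
	default:
		break;
	};	/* <- this trailing ';' is the unneeded semicolon; drop it */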
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zou Wei <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fixes coccicheck warnings:
tools/perf/builtin-c2c.c:1712:2-3: Unneeded semicolon
tools/perf/builtin-c2c.c:1928:2-3: Unneeded semicolon
tools/perf/builtin-c2c.c:2962:2-3: Unneeded semicolon
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zou Wei <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Fixes coccicheck warning:
tools/lib/traceevent/kbuffer-parse.c:441:2-3: Unneeded semicolon
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zou Wei <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When printing iregs, a double newline was printed because
perf_sample__fprintf_regs() was printing its own newline and then, at
the end of all fields, perf script was adding another one. This was
causing blank lines in the output:
Before:
$ perf script -Fip,iregs
401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340
401b8d ABI:2 DX:0x100 SI:0x4a9340 DI:0x4a8340
401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340
401b8d ABI:2 DX:0x100 SI:0x4a9340 DI:0x4a8340
After:
$ perf script -Fip,iregs
401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340
401b8d ABI:2 DX:0x100 SI:0x4a9340 DI:0x4a8340
401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340
Committer testing:
First we need to figure out how to request that registers be recorded,
so we use:
# perf record -h reg
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-I, --intr-regs[=<any register>]
sample selected machine registers on interrupt, use '-I?' to list register names
--buildid-all Record build-id of all DSOs regardless of hits
--user-regs[=<any register>]
sample selected machine registers on interrupt, use '--user-regs=?' to list register names
#
Ok, now let's ask for them all:
# perf record -a --intr-regs --user-regs sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.105 MB perf.data (2760 samples) ]
#
Let's look at the first 6 output lines:
# perf script -Fip,iregs | head -6
ffffffff8a06f2f4 ABI:2 AX:0xffffd168fee0a980 BX:0xffff8a23b087f000 CX:0xfffeb69aaeb25d73 DX:0xffff8a253e8310f0 SI:0xfffffff9bafe7359 DI:0xffffb1690204fb10 BP:0xffffd168fee0a950 SP:0xffffb1690204fb88 IP:0xffffffff8a06f2f4 FLAGS:0x4e CS:0x10 SS:0x18 R8:0x1495f0a91129a R9:0xffff8a23b087f000 R10:0x1 R11:0xffffffff R12:0x0 R13:0xffff8a253e827e00 R14:0xffffd168fee0aa5c R15:0xffffd168fee0a980
ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0xffffd168fee0a950 CX:0x5684cc1118491900 DX:0x0 SI:0xffffd168fee0a9d0 DI:0x202 BP:0xffffb1690204fd70 SP:0xffffb1690204fd20 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x18 R8:0x0 R9:0xffffd168fee0a9d0 R10:0x1 R11:0xffffffff R12:0xffffffff8a23e480 R13:0xffff8a23b087f240 R14:0xffff8a23b087f000 R15:0xffffd168fee0a950
ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0x0 CX:0x7f25f334335b DX:0x0 SI:0x2400 DI:0x4 BP:0x7fff5f264570 SP:0x7fff5f264538 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x2b R8:0x0 R9:0x2312d20 R10:0x0 R11:0x246 R12:0x22cc0e0 R13:0x0 R14:0x0 R15:0x22d0780
#
Reproduced, apply the patch and:
[root@five ~]# perf script -Fip,iregs | head -6
ffffffff8a06f2f4 ABI:2 AX:0xffffd168fee0a980 BX:0xffff8a23b087f000 CX:0xfffeb69aaeb25d73 DX:0xffff8a253e8310f0 SI:0xfffffff9bafe7359 DI:0xffffb1690204fb10 BP:0xffffd168fee0a950 SP:0xffffb1690204fb88 IP:0xffffffff8a06f2f4 FLAGS:0x4e CS:0x10 SS:0x18 R8:0x1495f0a91129a R9:0xffff8a23b087f000 R10:0x1 R11:0xffffffff R12:0x0 R13:0xffff8a253e827e00 R14:0xffffd168fee0aa5c R15:0xffffd168fee0a980
ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0xffffd168fee0a950 CX:0x5684cc1118491900 DX:0x0 SI:0xffffd168fee0a9d0 DI:0x202 BP:0xffffb1690204fd70 SP:0xffffb1690204fd20 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x18 R8:0x0 R9:0xffffd168fee0a9d0 R10:0x1 R11:0xffffffff R12:0xffffffff8a23e480 R13:0xffff8a23b087f240 R14:0xffff8a23b087f000 R15:0xffffd168fee0a950
ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0x0 CX:0x7f25f334335b DX:0x0 SI:0x2400 DI:0x4 BP:0x7fff5f264570 SP:0x7fff5f264538 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x2b R8:0x0 R9:0x2312d20 R10:0x0 R11:0x246 R12:0x22cc0e0 R13:0x0 R14:0x0 R15:0x22d0780
ffffffff8a24074b ABI:2 AX:0xcb BX:0xcb CX:0x0 DX:0x0 SI:0xffffb1690204ff58 DI:0xcb BP:0xffffb1690204ff58 SP:0xffffb1690204ff40 IP:0xffffffff8a24074b FLAGS:0x24e CS:0x10 SS:0x18 R8:0x0 R9:0x0 R10:0x0 R11:0x0 R12:0x0 R13:0x0 R14:0x0 R15:0x0
ffffffff8a310600 ABI:2 AX:0x0 BX:0xffffffff8b8c39a0 CX:0x0 DX:0xffff8a2503890300 SI:0xffffb1690204ff20 DI:0xffff8a23e4080000 BP:0xffff8a23e4080000 SP:0xffffb1690204fec0 IP:0xffffffff8a310600 FLAGS:0x28e CS:0x10 SS:0x18 R8:0x0 R9:0x0 R10:0x0 R11:0x0 R12:0xffffffffffffffea R13:0xffff8a23e4080020 R14:0x0 R15:0x0
ffffffff8a11b688 ABI:2 AX:0x0 BX:0xffff8a237b7c8800 CX:0xffffb1690204fae0 DX:0x78 SI:0xffff8a237b7c8800 DI:0xffffb1690204fa10 BP:0xffffb1690204fb00 SP:0xffffb1690204fa00 IP:0xffffffff8a11b688 FLAGS:0x8a CS:0x10 SS:0x18 R8:0x1495f0a917eba R9:0xffffd168fde19a48 R10:0xffffb1690204fd98 R11:0xffff8a253e82afb0 R12:0xffff8a237b7c8800 R13:0xffffb1690204fb00 R14:0x0 R15:0xffff8a237b7c8800
[root@five ~]#
To see it more clearly, let's get just two of those registers per sample:
# perf record -a --intr-regs=ax,bx --user-regs=cx,dx sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.502 MB perf.data (1653 samples) ]
#
Extra info: let's see what gets set up in that 'struct perf_event_attr':
# perf evlist -v
cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|REGS_USER|REGS_INTR, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, sample_regs_user: 0xc, sample_regs_intr: 0x3
#
Cool, some PERF_SAMPLE_REGS_USER|PERF_SAMPLE_REGS_INTR +
attr.sample_regs_user and attr.sample_regs_intr register masks; now let's
see if those newlines are gone, in a more compact fashion:
# perf script -Fip,iregs,uregs
ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a29b78d ABI:2 AX:0x2a20ffcd6000 BX:0x2ec7d9000 ABI:2 CX:0x7f204460e49b DX:0xf42920
#
And where was that?
# perf script -Fip,iregs,uregs,sym,dso
ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920
ffffffff8a29b78d __vma_link_rb (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0x2a20ffcd6000 BX:0x2ec7d9000 ABI:2 CX:0x7f204460e49b DX:0xf42920
#
Signed-off-by: Stephane Eranian <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The synthesize benchmark, run on a single process and thread, shows
perf_event__synthesize_mmap_events as the hottest function with fgets
and sscanf taking the majority of execution time.
fscanf performs similarly well. Replace the scanf call with manual
reading of each field of the /proc/pid/maps line, and remove some
unnecessary buffering.
This change also addresses potential, but unlikely, buffer overruns for
the string values read by scanf.
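A minimal sketch of the manual-parsing approach for one /proc/<pid>/maps
line (illustrative only; the real perf_event__synthesize_mmap_events()
handles more corner cases and avoids extra copying):
#include <stdlib.h>
#include <string.h>
/* Parse "start-end perms offset ..." from the beginning of a maps line. */
static void parse_maps_line(char *line, unsigned long long *start,
			    unsigned long long *end, char prot[5],
			    unsigned long long *offset)
{
	char *p = line;
	*start = strtoull(p, &p, 16);		/* "start" */
	*end   = strtoull(p + 1, &p, 16);	/* skip '-', parse "end" */
	memcpy(prot, p + 1, 4);			/* skip ' ', copy "rwxp" */
	prot[4] = '\0';
	*offset = strtoull(p + 6, &p, 16);	/* skip perms and ' ' */
	/* device, inode and (optionally) the path follow after p */
}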
Performance before is:
$ sudo perf bench internals synthesize -m 16 -M 16 -s -t
\# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 102.810 usec (+- 0.027 usec)
Average num. events: 17.000 (+- 0.000)
Average time per event 6.048 usec
Average data synthesis took: 106.325 usec (+- 0.018 usec)
Average num. events: 89.000 (+- 0.000)
Average time per event 1.195 usec
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 16
Average synthesis took: 68103.100 usec (+- 441.234 usec)
Average num. events: 30703.000 (+- 0.730)
Average time per event 2.218 usec
And after is:
$ sudo perf bench internals synthesize -m 16 -M 16 -s -t
\# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 50.388 usec (+- 0.031 usec)
Average num. events: 17.000 (+- 0.000)
Average time per event 2.964 usec
Average data synthesis took: 52.693 usec (+- 0.020 usec)
Average num. events: 89.000 (+- 0.000)
Average time per event 0.592 usec
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 16
Average synthesis took: 45022.400 usec (+- 552.740 usec)
Average num. events: 30624.200 (+- 10.037)
Average time per event 1.470 usec
On an Intel Xeon 6154, compiling with Debian gcc 9.2.1.
Committer testing:
On an AMD Ryzen 5 3600X 6-Core Processor:
Before:
# perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 267.491 usec (+- 0.176 usec)
Average num. events: 56.000 (+- 0.000)
Average time per event 4.777 usec
Average data synthesis took: 277.257 usec (+- 0.169 usec)
Average num. events: 287.000 (+- 0.000)
Average time per event 0.966 usec
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 12
Average synthesis took: 81599.500 usec (+- 346.315 usec)
Average num. events: 36096.100 (+- 2.523)
Average time per event 2.261 usec
#
After:
# perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 110.125 usec (+- 0.080 usec)
Average num. events: 56.000 (+- 0.000)
Average time per event 1.967 usec
Average data synthesis took: 118.518 usec (+- 0.057 usec)
Average num. events: 287.000 (+- 0.000)
Average time per event 0.413 usec
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 12
Average synthesis took: 43490.700 usec (+- 284.527 usec)
Average num. events: 37028.500 (+- 0.563)
Average time per event 1.175 usec
#
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrey Zhizhikin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The synthesize benchmark shows the majority of execution time going to
fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
reading library that will be used to replace these calls in a follow-up
CL. Add tests for the library to perf test.
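The core of such a reader is a small struct holding a file descriptor
plus a caller-supplied buffer, with a cheap per-character accessor; a
minimal sketch of the idea, with illustrative names rather than the
actual api/io.h interface:
#include <unistd.h>
struct io {
	int fd;			/* file descriptor being read */
	char *buf;		/* caller-supplied buffer */
	size_t size;		/* size of buf */
	char *pos, *end;	/* current position / end of valid data */
};
/* Return the next character, refilling the buffer from fd as needed. */
static int io_get_char(struct io *io)
{
	if (io->pos == io->end) {
		ssize_t n = read(io->fd, io->buf, io->size);
		if (n <= 0)
			return -1;	/* EOF or read error */
		io->pos = io->buf;
		io->end = io->buf + n;
	}
	return (unsigned char)*io->pos++;
}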
Committer tests:
$ perf test api
63: Test api io : Ok
$
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrey Zhizhikin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
By default this isn't run as it reads /proc and may not have access.
For consistency, modify the single threaded benchmark to compute an
average time per event.
Committer testing:
$ grep -m1 "model name" /proc/cpuinfo
model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
$ grep "model name" /proc/cpuinfo | wc -l
8
$
$ perf bench internals synthesize -h
# Running 'internals/synthesize' benchmark:
Usage: perf bench internals synthesize <options>
-I, --multi-iterations <n>
Number of iterations used to compute multi-threaded average
-i, --single-iterations <n>
Number of iterations used to compute single-threaded average
-M, --max-threads <n>
Maximum number of threads in multithreaded bench
-m, --min-threads <n>
Minimum number of threads in multithreaded bench
-s, --st Run single threaded benchmark
-t, --mt Run multi-threaded benchmark
$
$ perf bench internals synthesize -t
# Running 'internals/synthesize' benchmark:
Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:
Number of synthesis threads: 1
Average synthesis took: 65449.000 usec (+- 586.442 usec)
Average num. events: 9405.400 (+- 0.306)
Average time per event 6.959 usec
Number of synthesis threads: 2
Average synthesis took: 37838.300 usec (+- 130.259 usec)
Average num. events: 9501.800 (+- 20.469)
Average time per event 3.982 usec
Number of synthesis threads: 3
Average synthesis took: 48551.400 usec (+- 225.686 usec)
Average num. events: 9544.000 (+- 0.000)
Average time per event 5.087 usec
Number of synthesis threads: 4
Average synthesis took: 29632.500 usec (+- 50.808 usec)
Average num. events: 9544.000 (+- 0.000)
Average time per event 3.105 usec
Number of synthesis threads: 5
Average synthesis took: 33920.400 usec (+- 284.509 usec)
Average num. events: 9544.000 (+- 0.000)
Average time per event 3.554 usec
Number of synthesis threads: 6
Average synthesis took: 27604.100 usec (+- 72.344 usec)
Average num. events: 9548.000 (+- 0.000)
Average time per event 2.891 usec
Number of synthesis threads: 7
Average synthesis took: 25406.300 usec (+- 933.371 usec)
Average num. events: 9545.500 (+- 0.167)
Average time per event 2.662 usec
Number of synthesis threads: 8
Average synthesis took: 24110.400 usec (+- 73.229 usec)
Average num. events: 9551.000 (+- 0.000)
Average time per event 2.524 usec
$
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrey Zhizhikin <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Update the bpf_sk_assign test to fetch the server socket from SOCKMAP, now that
map lookup from BPF in SOCKMAP is enabled. This way the test TC BPF program
doesn't need to know what address the server socket is bound to.
Signed-off-by: Jakub Sitnicki <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Now that bpf_map_lookup_elem() is white-listed for SOCKMAP/SOCKHASH,
replace the tests which check that the verifier prevents lookup on these map
types with ones that ensure that the lookup operation is permitted, but only
together with a release of the acquired socket reference.
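From the BPF program side, the now-permitted pattern looks roughly like
this (sketch; map and key names are illustrative):
	struct bpf_sock *sk;
	sk = bpf_map_lookup_elem(&sock_map, &key);
	if (!sk)
		return TC_ACT_SHOT;
	/* ... inspect the socket ... */
	bpf_sk_release(sk);	/* without this release the verifier rejects the program */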
Signed-off-by: Jakub Sitnicki <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The new libcap dependency is not used for an essential feature of
bpftool, and we could imagine building the tool without the checks on
CAP_SYS_ADMIN by disabling feature probing for unprivileged users.
Make it so, in order to avoid a hard dependency on libcap, and to ease
packaging/embedding of bpftool.
Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
There is demand for a way to identify what BPF helper functions are
available to unprivileged users. To do so, allow unprivileged users to
run "bpftool feature probe" to list BPF-related features. This will only
show features accessible to those users, and may not reflect the full
list of features available (to administrators) on the system.
To avoid the case where bpftool is inadvertently run as non-root and
would list only a subset of the features supported by the system when it
would be expected to list all of them, running as unprivileged is gated
behind the "unprivileged" keyword passed on the command line. When used
by a privileged user, this keyword allows dropping CAP_SYS_ADMIN and
listing the features available to unprivileged users. Note that this
adds a dependency on libcap for compiling bpftool.
Note that there is no particular reason why the probes were restricted
to root, other than the fact I did not need them for unprivileged and
did not bother with the additional checks at the time probes were added.
Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The "full_mode" variable used for switching between full or partial
feature probing (i.e. with or without probing helpers that will log
warnings in kernel logs) was piped from the main do_probe() function
down to probe_helpers_for_progtype(), where it is needed.
Define it as a global variable: the calls will be more readable, and if
other similar flags were to be used in the future, we could use global
variables as well instead of extending again the list of arguments with
new flags.
Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
With recent changes, runqslower is being copied into selftests/bpf root
directory. So add it into .gitignore.
Fixes: b26d1e2b6028 ("selftests/bpf: Copy runqslower to OUTPUT directory")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Cc: Veronika Kabatova <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The if condition is inverted, but it's also just not necessary.
Fixes: 1c1052e0140a ("tools/testing/selftests/bpf: Add self-tests for new helper bpf_get_ns_current_pid_tgid.")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Cc: Carlos Neira <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
AddressSanitizer assumes that all memory dereferences are done against memory
allocated by the sanitizer's malloc()/free() code and not touched by anyone else.
Seems like this doesn't hold for perf buffer memory. Disable instrumentation
on the perf buffer callback function.
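One common way to do that is the no_sanitize function attribute supported
by GCC and Clang; a minimal sketch under that assumption (the function
name is illustrative and this is not necessarily the exact annotation
used by the patch):
__attribute__((no_sanitize("address")))
static void perfbuf_sample_cb(void *ctx, int cpu, void *data, __u32 size)
{
	/* data points into the mmap'ed perf buffer, which ASan does not
	 * track, so keep this callback uninstrumented */
}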
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
BTF object wasn't freed.
Fixes: a6ed02cac690 ("libbpf: Load btf_vmlinux only once per object.")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Cc: KP Singh <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Another one found by AddressSanitizer. input_len is bigger than actually
initialized data size.
Fixes: c7566a69695c ("selftests/bpf: Add field existence CO-RE relocs tests")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
getline() allocates a string, which has to be freed.
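The usual pattern is (sketch):
	char *line = NULL;
	size_t len = 0;
	while (getline(&line, &len, fp) != -1) {
		/* ... parse line ... */
	}
	free(line);	/* getline() allocated/grew this buffer for us */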
Fixes: 81f77fd0deeb ("bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Cc: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Free test selector substrings, which were strdup()'ed.
Fixes: b65053cd94f4 ("selftests/bpf: Add whitelist/blacklist of test names to test_progs")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Fix a memory leak in hashmap__clear(), which was not freeing the hashmap_entry
structs for the remaining entries. Also NULL out the bucket list to prevent a
possible double-free between hashmap__clear() and hashmap__free().
Running test_progs-asan flavor clearly showed this problem.
Reported-by: Alston Tang <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Fold stand-alone test_hashmap test into test_progs.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add the ability to specify extra compiler flags with SAN_CFLAGS for compilation
of all user-space C files. This allows building all of the selftest programs
with, e.g., custom sanitizer flags, without requiring support for such
sanitizers from everyone compiling selftests/bpf.
As an example, to compile everything with AddressSanitizer, one would do:
$ make clean && make SAN_CFLAGS="-fsanitize=address"
For AddressSanitizer to work, one needs appropriate libasan shared library
installed in the system, with version of libasan matching what GCC links
against. E.g., GCC8 needs libasan5, while GCC7 uses libasan4.
For CentOS 7, to build everything successfully one would need to:
$ sudo yum install devtoolset-8-gcc devtoolset-libasan-devel
$ scl enable devtoolset-8 bash # set up environment
For Arch Linux to run selftests, one would need to install gcc-libs package to
get libasan.so.5:
$ sudo pacman -S gcc-libs
N.B. The EXTRA_CFLAGS name wasn't used because it's also used by libbpf's
Makefile, and that causes a few issues:
1. the default "-g -Wall" flags are overridden;
2. compiling the shared library with AddressSanitizer generates a bunch of
symbols like "_GLOBAL__sub_D_00099_0_btf_dump.c", "_GLOBAL__sub_D_00099_0_bpf.c",
etc., which breaks the versioned symbols check.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Cc: Julia Kartseva <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Ensure that test runner flavors include their own skeletons from the <flavor>/
directory. Previously, skeletons generated for the no-flavor test_progs were used.
Apart from fixing correctness, this also makes it possible to compile individual
flavors on their own:
$ make clean && make test_progs-no_alu32
... now succeeds ...
Fixes: 74b5a5968fe8 ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
As discussed at LPC 2019 ([0]), this patch brings (quite belated) support
for declarative BTF-defined map-in-map definitions in libbpf. It allows defining
ARRAY_OF_MAPS and HASH_OF_MAPS BPF maps without any user-space initialization
code involved.
Additionally, it allows initializing the outer map's slots with references to
the respective inner maps at load time, also completely declaratively.
Despite C's weak type system, the way BTF-defined map-in-map definitions work
makes it actually quite hard to accidentally initialize an outer map with
incompatible inner maps. This being C, it's of course still possible, but even
that would be caught at load time and an error returned, with a helpful debug
log pointing exactly to the slot that failed to be initialized.
As an illustration, here's a rather advanced HASH_OF_MAPS declaration and
initialization, filling slots #0 and #4 with two inner maps:
#include <bpf/bpf_helpers.h>
struct inner_map {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(max_entries, 1);
__type(key, int);
__type(value, int);
} inner_map1 SEC(".maps"),
inner_map2 SEC(".maps");
struct outer_hash {
__uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
__uint(max_entries, 5);
__uint(key_size, sizeof(int));
__array(values, struct inner_map);
} outer_hash SEC(".maps") = {
.values = {
[0] = &inner_map2,
[4] = &inner_map1,
},
};
Here's the relevant part of the libbpf debug log, showing pretty clearly
what's going on with map-in-map initialization:
libbpf: .maps relo #0: for 6 value 0 rel.r_offset 96 name 260 ('inner_map1')
libbpf: .maps relo #0: map 'outer_arr' slot [0] points to map 'inner_map1'
libbpf: .maps relo #1: for 7 value 32 rel.r_offset 112 name 249 ('inner_map2')
libbpf: .maps relo #1: map 'outer_arr' slot [2] points to map 'inner_map2'
libbpf: .maps relo #2: for 7 value 32 rel.r_offset 144 name 249 ('inner_map2')
libbpf: .maps relo #2: map 'outer_hash' slot [0] points to map 'inner_map2'
libbpf: .maps relo #3: for 6 value 0 rel.r_offset 176 name 260 ('inner_map1')
libbpf: .maps relo #3: map 'outer_hash' slot [4] points to map 'inner_map1'
libbpf: map 'inner_map1': created successfully, fd=4
libbpf: map 'inner_map2': created successfully, fd=5
libbpf: map 'outer_hash': created successfully, fd=7
libbpf: map 'outer_hash': slot [0] set to map 'inner_map2' fd=5
libbpf: map 'outer_hash': slot [4] set to map 'inner_map1' fd=4
Notice from the log above that fd=6 (not logged explicitly) is used for the
inner "prototype" map, necessary for the creation of the outer map. It is
destroyed immediately after the outer map is created.
See also the included selftest, with some extra comments explaining further
details of usage. Additionally, similar initialization syntax and libbpf
functionality can be used to initialize a BPF_PROG_ARRAY with references to
BPF sub-programs. This can be done in follow-up patches, if there is demand
for it.
[0] https://linuxplumbersconf.org/event/4/contributions/448/
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Factor out map creation and destruction logic to simplify code and especially
error handling. Also fix map FD leak in case of partially successful map
creation during bpf_object load operation.
Fixes: 57a00f41644f ("libbpf: Add auto-pinning of maps when loading BPF objects")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Factor out BTF map definition logic into stand-alone routine for easier reuse
for map-in-map case.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Extend bpftool's bash-completion script to handle new link command and its
sub-commands.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add bpftool-link manpage with information and examples of link-related
commands.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add `bpftool link show` and `bpftool link pin` commands.
Example plain output for `link show` (also showing pinned paths):
[vmuser@archvm bpf]$ sudo ~/local/linux/tools/bpf/bpftool/bpftool -f link
1: tracing prog 12
prog_type tracing attach_type fentry
pinned /sys/fs/bpf/my_test_link
pinned /sys/fs/bpf/my_test_link2
2: tracing prog 13
prog_type tracing attach_type fentry
3: tracing prog 14
prog_type tracing attach_type fentry
4: tracing prog 15
prog_type tracing attach_type fentry
5: tracing prog 16
prog_type tracing attach_type fentry
6: tracing prog 17
prog_type tracing attach_type fentry
7: raw_tracepoint prog 21
tp 'sys_enter'
8: cgroup prog 25
cgroup_id 584 attach_type egress
9: cgroup prog 25
cgroup_id 599 attach_type egress
10: cgroup prog 25
cgroup_id 614 attach_type egress
11: cgroup prog 25
cgroup_id 629 attach_type egress
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Move attach_type_strings into main.h for access in non-cgroup code.
bpf_attach_type is used for non-cgroup attach types quite widely now. So also
complete missing string translations for non-cgroup attach types.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Extend bpf_obj_id selftest to verify bpf_link's observability APIs.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add low-level API calls for bpf_link_get_next_id() and
bpf_link_get_fd_by_id().
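These mirror the existing prog/map ID iterators, so walking all links looks
roughly like this (sketch):
	__u32 id = 0;
	int fd;
	while (!bpf_link_get_next_id(id, &id)) {
		fd = bpf_link_get_fd_by_id(id);
		if (fd < 0)
			continue;
		/* inspect the link via its fd, then drop the reference */
		close(fd);
	}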
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add the ability to fetch bpf_link details through the BPF_OBJ_GET_INFO_BY_FD
command. Also enhance show_fdinfo to potentially include bpf_link type-specific
information (similar to obj_info).
Also introduce enum bpf_link_type, stored in bpf_link itself and exposed in
the UAPI. bpf_link_tracing now also stores and returns bpf_attach_type.
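From user space, fetching those details then looks roughly like this (sketch;
the field names follow the prog/map info convention):
	struct bpf_link_info info = {};
	__u32 info_len = sizeof(info);
	if (!bpf_obj_get_info_by_fd(link_fd, &info, &info_len))
		printf("link %u: type %u, prog %u\n",
		       info.id, info.type, info.prog_id);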
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Similar to commit b7a0d65d80a0 ("bpf, testing: Workaround a verifier failure for test_progs")
fix test_sysctl_prog.c as well.
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
- add SPDX header;
- adjust title markup;
- mark code blocks and literals as such;
- use footnote markup;
- mark tables as such;
- adjust indentation, whitespace and blank lines;
- add to networking/index.rst.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
TLS 1.2 and TLS 1.3 differ in the implementation.
Use fixture parameters to run all tests for both
versions, and remove the one-off TLS 1.2 test.
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Allow users to build parameterized variants of fixtures.
If a fixture wants variants, it calls FIXTURE_VARIANT() to declare
the structure to fill in for each variant. Each fixture will be re-run
for each of the variants defined by calling FIXTURE_VARIANT_ADD()
with the differing parameters that initialize the structure.
Since tests are re-run, additional initialization (steps,
no_print) is also added.
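Usage then looks roughly like this, with the TLS tests mentioned above as the
motivating case (sketch; the real selftest declares more fixture state):
FIXTURE(tls)
{
	int fd;
};
FIXTURE_VARIANT(tls)
{
	uint16_t tls_version;
};
FIXTURE_VARIANT_ADD(tls, 12)
{
	.tls_version = TLS_1_2_VERSION,
};
FIXTURE_VARIANT_ADD(tls, 13)
{
	.tls_version = TLS_1_3_VERSION,
};
/* every TEST_F(tls, ...) now runs once per variant and can read the
 * current parameters through the "variant" pointer */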
Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Now that all tests have a fixture object, move from a global
list of tests to a list of tests per fixture.
The order of tests may change, as we will now group and run tests
fixture by fixture rather than in declaration order.
Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Grouping tests by fixture will allow us to parametrize
test runs. Create full objects for fixtures.
Add a "global" fixture for tests without a fixture.
Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Kees suggested factoring out the list append code into a macro, since
following commits need it as well, which would otherwise lead to code duplication.
Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
New tests to check route dump and notifications with
net.ipv4.nexthop_compat_mode on and off.
Signed-off-by: Roopa Prabhu <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Fixes the following coccicheck warning:
tools/lib/bpf/btf_dump.c:661:4-5: Unneeded semicolon
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zou Wei <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The $(OUTPUT)/runqslower makefile target doesn't actually create the runqslower
binary in the $(OUTPUT) directory. As lib.mk expects all
TEST_GEN_PROGS_EXTENDED (which runqslower is a part of) to be present in
the OUTPUT directory, this results in an error when running e.g. `make
install`:
rsync: link_stat "tools/testing/selftests/bpf/runqslower" failed: No
such file or directory (2)
Copy the binary into the OUTPUT directory after building it to fix the
error.
Fixes: 3a0d3092a4ed ("selftests/bpf: Build runqslower from selftests")
Signed-off-by: Veronika Kabatova <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add test for matchall classifier with mirred egress mirror action.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds a new port attribute, IFLA_BRPORT_MRP_RING_OPEN, which allows
notifying userspace when the port loses the continuity of MRP frames.
This attribute is set by the kernel whenever the SW or HW detects that the ring
is being opened or closed.
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
System hangs or killed rcutorture guest OSes can result in truncated
"Reader Pipe:" lines, which can in turn result in false-positive
reader-batch near-miss warnings. This commit therefore adjusts the
reader-batch checks to account for possible line truncation.
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
This commit adds a TRACE02 scenario which enables preemption and RCU
Tasks Trace IPIs, more specifically, disabling heavyweight readers.
Signed-off-by: Paul E. McKenney <[email protected]>
|