Age | Commit message (Collapse) | Author | Files | Lines |
|
At the moment test_sockmap runs all 800+ tests ungrouped which is not
ideal because it makes it hard to see what is failing but also more
importantly its hard to confirm all cases are tested. Additionally,
after inspecting we noticed the runtime is bloated because we run
many duplicate tests. Worse some of these tests are known error cases
that wait for the recvmsg handler to timeout which creats long delays.
Also we noted some tests were not clearing their options and as a
result the following tests would run with extra and incorrect options.
Fix this by reorganizing test code so its clear what tests are running
and when. Then it becomes easy to remove duplication and run tests with
only the set of send/recv patterns that are relavent.
To accomplish this break test_sockmap into subtests and remove
unnecessary duplication. The output is more readable now and
the runtime reduced.
Now default output prints subtests like this,
$ ./test_sockmap
# 1/ 6 sockmap:txmsg test passthrough:OK
...
#22/ 1 sockhash:txmsg test push/pop data:OK
Pass: 22 Fail: 0
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Jakub Sitnicki <[email protected]>
Link: https://lore.kernel.org/bpf/158939728384.15176.13601520183665880762.stgit@john-Precision-5820-Tower
|
|
The recv thread in test_sockmap waits to receive all bytes from sender but
in the case we use pop data it may wait for more bytes then actually being
sent. This stalls the test harness for multiple seconds. Because this
happens in multiple tests it slows time to run the selftest.
Fix by doing a better job of accounting for total bytes when pop helpers
are used.
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Jakub Sitnicki <[email protected]>
Link: https://lore.kernel.org/bpf/158939726542.15176.5964532245173539540.stgit@john-Precision-5820-Tower
|
|
Its helpful to know the error value if an error occurs.
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Jakub Sitnicki <[email protected]>
Link: https://lore.kernel.org/bpf/158939724566.15176.12079885932643225626.stgit@john-Precision-5820-Tower
|
|
Running test_sockmap with arguments to specify a test pattern requires
including a cgroup argument. Instead of requiring this if the option is
not provided create one
This is not used by selftest runs but I use it when I want to test a
specific test. Most useful when developing new code and/or tests.
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Jakub Sitnicki <[email protected]>
Link: https://lore.kernel.org/bpf/158939722675.15176.6294210959489131688.stgit@john-Precision-5820-Tower
|
|
The prints in the test_sockmap programs were only useful when we
didn't have enough control over test infrastructure to know from
user program what was being pushed into kernel side.
Now that we have or will shortly have better test controls lets
remove the printers. This means we can remove half the programs
and cleanup bpf side.
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Jakub Sitnicki <[email protected]>
Link: https://lore.kernel.org/bpf/158939720756.15176.9806965887313279429.stgit@john-Precision-5820-Tower
|
|
Moves test_sockmap_kern.h into progs directory but does not change
code at all.
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Jakub Sitnicki <[email protected]>
Link: https://lore.kernel.org/bpf/158939718921.15176.5766299102332077086.stgit@john-Precision-5820-Tower
|
|
There is a much higher chance we can see the regressions if the
test is part of test_progs.
Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Commit 294f2fc6da27 ("bpf: Verifer, adjust_scalar_min_max_vals to always
call update_reg_bounds()") changed the way verifier logs some of its state,
adjust the test_align accordingly. Where possible, I tried to not copy-paste
the entire log line and resorted to dropping the last closing brace instead.
Fixes: 294f2fc6da27 ("bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds()")
Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Fixes the following warnings:
hashmap.c: In function ‘hashmap__clear’:
hashmap.h:150:20: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
150 | for (bkt = 0; bkt < map->cap; bkt++) \
hashmap.c: In function ‘hashmap_grow’:
hashmap.h:150:20: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
150 | for (bkt = 0; bkt < map->cap; bkt++) \
Signed-off-by: Ian Rogers <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Remove #include of libbpf_internal.h that is unused.
Discussed in this thread:
https://lore.kernel.org/lkml/CAEf4BzZRmiEds_8R8g4vaAeWvJzPb4xYLnpF0X2VNY8oTzkphQ@mail.gmail.com/
Signed-off-by: Ian Rogers <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Move the bpf verifier trace check into the new switch statement in
HEAD.
Resolve the overlapping changes in hinic, where bug fixes overlap
the addition of VF support.
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull networking fixes from David Miller:
1) Fix sk_psock reference count leak on receive, from Xiyu Yang.
2) CONFIG_HNS should be invisible, from Geert Uytterhoeven.
3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this,
from Maciej Żenczykowski.
4) ipv4 route redirect backoff wasn't actually enforced, from Paolo
Abeni.
5) Fix netprio cgroup v2 leak, from Zefan Li.
6) Fix infinite loop on rmmod in conntrack, from Florian Westphal.
7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet.
8) Various bpf probe handling fixes, from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
selftests: mptcp: pm: rm the right tmp file
dpaa2-eth: properly handle buffer size restrictions
bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier
bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range
bpf: Restrict bpf_probe_read{, str}() only to archs where they work
MAINTAINERS: Mark networking drivers as Maintained.
ipmr: Add lockdep expression to ipmr_for_each_table macro
ipmr: Fix RCU list debugging warning
drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c
net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810
tcp: fix error recovery in tcp_zerocopy_receive()
MAINTAINERS: Add Jakub to networking drivers.
MAINTAINERS: another add of Karsten Graul for S390 networking
drivers: ipa: fix typos for ipa_smp2p structure doc
pppoe: only process PADT targeted at local interfaces
selftests/bpf: Enforce returning 0 for fentry/fexit programs
bpf: Enforce returning 0 for fentry/fexit progs
net: stmmac: fix num_por initialization
security: Fix the default value of secid_to_secctx hook
libbpf: Fix register naming in PT_REGS s390 macros
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
- lkdtm runner fixes to prevent dmesg clearing and shellcheck errors
- ftrace test handling when test module doesn't exist
- nsfs test fix to replace zero-length array with flexible-array
- dmabuf-heaps test fix to return clear error value
* tag 'linux-kselftest-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/lkdtm: Use grep -E instead of egrep
selftests/lkdtm: Don't clear dmesg when running tests
selftests/ftrace: mark irqsoff_tracer.tc test as unresolved if the test module does not exist
tools/testing: Replace zero-length array with flexible-array
kselftests: dmabuf-heaps: Fix confused return value on expected error testing
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2020-05-15
The following pull-request contains BPF updates for your *net* tree.
We've added 9 non-merge commits during the last 2 day(s) which contain
a total of 14 files changed, 137 insertions(+), 43 deletions(-).
The main changes are:
1) Fix secid_to_secctx LSM hook default value, from Anders.
2) Fix bug in mmap of bpf array, from Andrii.
3) Restrict bpf_probe_read to archs where they work, from Daniel.
4) Enforce returning 0 for fentry/fexit progs, from Yonghong.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-05-15
The following pull-request contains BPF updates for your *net-next* tree.
We've added 37 non-merge commits during the last 1 day(s) which contain
a total of 67 files changed, 741 insertions(+), 252 deletions(-).
The main changes are:
1) bpf_xdp_adjust_tail() now allows to grow the tail as well, from Jesper.
2) bpftool can probe CONFIG_HZ, from Daniel.
3) CAP_BPF is introduced to isolate user processes that use BPF infra and
to secure BPF networking services by dropping CAP_SYS_ADMIN requirement
in certain cases, from Alexei.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
"$err" is a variable pointing to a temp file. "$out" is not: only used
as a local variable in "check()" and representing the output of a
command line.
Fixes: eedbc685321b (selftests: add PM netlink functional tests)
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Implement two basic tests to verify terse dump functionality of flower
classifier:
- Test that verifies that terse dump works.
- Test that verifies that terse dump doesn't print filter key.
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Make all test_verifier test exercise CAP_BPF and CAP_PERFMON
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
In Cilium we've recently switched to make use of bpf_jiffies64() for
parts of our tc and XDP datapath since bpf_ktime_get_ns() is more
expensive and high-precision is not needed for our timeouts we have
anyway. Our agent has a probe manager which picks up the json of
bpftool's feature probe and we also use the macro output in our C
programs e.g. to have workarounds when helpers are not available on
older kernels.
Extend the kernel config info dump to also include the kernel's
CONFIG_HZ, and rework the probe_kernel_image_config() for allowing a
macro dump such that CONFIG_HZ can be propagated to BPF C code as a
simple define if available via config. Latter allows to have _compile-
time_ resolution of jiffies <-> sec conversion in our code since all
are propagated as known constants.
Given we cannot generally assume availability of kconfig everywhere,
we also have a kernel hz probe [0] as a fallback. Potentially, bpftool
could have an integrated probe fallback as well, although to derive it,
we might need to place it under 'bpftool feature probe full' or similar
given it would slow down the probing process overall. Yet 'full' doesn't
fit either for us since we don't want to pollute the kernel log with
warning messages from bpf_probe_write_user() and bpf_trace_printk() on
agent startup; I've left it out for the time being.
[0] https://github.com/cilium/cilium/blob/master/bpf/cilium-probe-kernel-hz.c
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Instead of iterating through all instructions to find the last
instruction each time .rela.discard.(un)reachable points beyond the
section, use find_insn to locate the last instruction by looking at
the last bytes of the section instead.
Suggested-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Currently, objtool fails to load the correct section for symbols when
the index is greater than SHN_LORESERVE. Use gelf_getsymshndx instead
of gelf_getsym to handle >64k sections.
Signed-off-by: Sami Tolvanen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Randy reported a false-positive:
arch/x86/hyperv/hv_apic.o: warning: objtool: hv_apic_write()+0x25: alternative modifies stack
What happens is that:
alternative_io("movl %0, %P1", "xchgl %0, %P1", X86_BUG_11AP,
13d: 89 9d 00 d0 7f ff mov %ebx,-0x803000(%rbp)
decodes to an instruction with CFI-ops because it modifies RBP.
However, due to this being a !frame-pointer build, that should not in
fact change the CFI state.
So instead of dis-allowing any CFI-op, verify the op would've actually
changed the CFI state.
Fixes: 7117f16bf460 ("objtool: Fix ORC vs alternatives")
Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
|
|
Extend BPF selftest xdp_adjust_tail with grow tail tests, which is added
as subtest's. The first grow test stays in same form as original shrink
test. The second grow test use the newer bpf_prog_test_run_xattr() calls,
and does extra checking of data contents.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/158945350567.97035.9632611946765811876.stgit@firesoul
|
|
Current selftest for BPF-helper xdp_adjust_tail only shrink tail.
Make it more clear that this is a shrink test case.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/158945350058.97035.17280775016196207372.stgit@firesoul
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-05-14
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON.
2) support for narrow loads in bpf_sock_addr progs and additional
helpers in cg-skb progs, from Andrey.
3) bpf benchmark runner, from Andrii.
4) arm and riscv JIT optimizations, from Luke.
5) bpf iterator infrastructure, from Yonghong.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Test bpf_sk_lookup_tcp, bpf_sk_release, bpf_sk_cgroup_id and
bpf_sk_ancestor_cgroup_id helpers from cgroup skb program.
The test creates a testing cgroup, starts a TCPv6 server inside the
cgroup and creates two client sockets: one inside testing cgroup and one
outside.
Then it attaches cgroup skb program to the cgroup that checks all TCP
segments coming to the server and allows only those coming from the
cgroup of the server. If a segment comes from a peer outside of the
cgroup, it'll be dropped.
Finally the test checks that client from inside testing cgroup can
successfully connect to the server, but client outside the cgroup fails
to connect by timeout.
The main goal of the test is to check newly introduced
bpf_sk_{,ancestor_}cgroup_id helpers.
It also checks a couple of socket lookup helpers (tcp & release), but
lookup helpers were introduced much earlier and covered by other tests.
Here it's mostly checked that they can be called from cgroup skb.
Signed-off-by: Andrey Ignatov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/171f4c5d75e8ff4fe1c4e8c1c12288b5240a4549.1589486450.git.rdna@fb.com
|
|
Add two new network helpers.
connect_fd_to_fd connects an already created client socket fd to address
of server fd. Sometimes it's useful to separate client socket creation
and connecting this socket to a server, e.g. if client socket has to be
created in a cgroup different from that of server cgroup.
Additionally connect_to_fd is now implemented using connect_fd_to_fd,
both helpers don't treat EINPROGRESS as an error and let caller decide
how to proceed with it.
connect_wait is a helper to work with non-blocking client sockets so
that if connect_to_fd or connect_fd_to_fd returned -1 with errno ==
EINPROGRESS, caller can wait for connect to finish or for connection
timeout. The helper returns -1 on error, 0 on timeout (1sec,
hard-coded), and positive number on success.
Signed-off-by: Andrey Ignatov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/1403fab72300f379ca97ead4820ae43eac4414ef.1589486450.git.rdna@fb.com
|
|
With having ability to lookup sockets in cgroup skb programs it becomes
useful to access cgroup id of retrieved sockets so that policies can be
implemented based on origin cgroup of such socket.
For example, a container running in a cgroup can have cgroup skb ingress
program that can lookup peer socket that is sending packets to a process
inside the container and decide whether those packets should be allowed
or denied based on cgroup id of the peer.
More specifically such ingress program can implement intra-host policy
"allow incoming packets only from this same container and not from any
other container on same host" w/o relying on source IP addresses since
quite often it can be the case that containers share same IP address on
the host.
Introduce two new helpers for this use-case: bpf_sk_cgroup_id() and
bpf_sk_ancestor_cgroup_id().
These helpers are similar to existing bpf_skb_{,ancestor_}cgroup_id
helpers with the only difference that sk is used to get cgroup id
instead of skb, and share code with them.
See documentation in UAPI for more details.
Signed-off-by: Andrey Ignatov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/f5884981249ce911f63e9b57ecd5d7d19154ff39.1589486450.git.rdna@fb.com
|
|
There is a spelling mistake in an error message, fix it.
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Test 1,2,4-byte loads from bpf_sock_addr.user_port in sock_addr
programs.
Signed-off-by: Andrey Ignatov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/e5c734a58cca4041ab30cb5471e644246f8cdb5a.1589420814.git.rdna@fb.com
|
|
bpf_sock_addr.user_port supports only 4-byte load and it leads to ugly
code in BPF programs, like:
volatile __u32 user_port = ctx->user_port;
__u16 port = bpf_ntohs(user_port);
Since otherwise clang may optimize the load to be 2-byte and it's
rejected by verifier.
Add support for 1- and 2-byte loads same way as it's supported for other
fields in bpf_sock_addr like user_ip4, msg_src_ip4, etc.
Signed-off-by: Andrey Ignatov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/c1e983f4c17573032601d0b2b1f9d1274f24bc16.1589420814.git.rdna@fb.com
|
|
Clean up after recent fixes, move address calculations
around and change the variable init, so that we can have
just one start_offset == end_offset check.
Make the check a little stricter to preserve the -EINVAL
error if requested start offset is larger than the region
itself.
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There are a few fentry/fexit programs returning non-0.
The tests with these programs will break with the previous
patch which enfoced return-0 rules. Fix them properly.
Fixes: ac065870d928 ("selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros")
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Flower tests used to create ingress filter with specified parent qdisc
"parent ffff:" but dump them on "ingress". With recent commit that fixed
tcm_parent handling in dump those are not considered same parent anymore,
which causes iproute2 tc to emit additional "parent ffff:" in first line of
filter dump output. The change in output causes filter match in tests to
fail.
Prevent parent qdisc output when dumping filters in flower tests by always
correctly specifying "ingress" parent both when creating and dumping
filters.
Fixes: a7df4870d79b ("net_sched: fix tcm_parent in tc filter dump")
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Fix register naming in PT_REGS s390 macros
Fixes: b8ebce86ffe6 ("libbpf: Provide CO-RE variants of PT_REGS macros")
Signed-off-by: Sumanth Korikkar <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Julian Wiedmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
mmap() subsystem allows user-space application to memory-map region with
initial page offset. This wasn't taken into account in initial implementation
of BPF array memory-mapping. This would result in wrong pages, not taking into
account requested page shift, being memory-mmaped into user-space. This patch
fixes this gap and adds a test for such scenario.
Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Constants of the *_ALL type can be actively harmful due to the fact that
developers will usually fail to consider the possible effects of future
changes to the definition.
Deprecate STATX_ALL in the uapi, while no damage has been done yet.
We could keep something like this around in the kernel, but there's
actually no point, since all filesystems should be explicitly checking
flags that they support and not rely on the VFS masking unknown ones out: a
flag could be known to the VFS, yet not known to the filesystem.
Cc: David Howells <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Miklos Szeredi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
This is to be consistent with tracing and lsm programs
which have prefix "bpf_trace_" and "bpf_lsm_" respectively.
Suggested-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Commit 6879c042e105 ("tools/bpf: selftests: Add bpf_iter selftests")
added self tests for bpf_iter feature. But two subtests
ipv6_route and netlink needs llvm latest 10.x release branch
or trunk due to a bug in llvm BPF backend. This patch added
the file README.rst to document these two failures
so people using llvm 10.0.0 can be aware of them.
Suggested-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
It is sometimes desirable to be able to trigger BPF program from user-space
with minimal overhead. sys_enter would seem to be a good candidate, yet in
a lot of cases there will be a lot of noise from syscalls triggered by other
processes on the system. So while searching for low-overhead alternative, I've
stumbled upon getpgid() syscall, which seems to be specific enough to not
suffer from accidental syscall by other apps.
This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe,
fentry and fmod_ret with returning error (so that syscall would not be
executed), to determine the lowest-overhead way. Here are results on my
machine (using benchs/run_bench_trigger.sh script):
base : 9.200 ± 0.319M/s
tp : 6.690 ± 0.125M/s
rawtp : 8.571 ± 0.214M/s
kprobe : 6.431 ± 0.048M/s
fentry : 8.955 ± 0.241M/s
fmodret : 8.903 ± 0.135M/s
So it seems like fmodret doesn't give much benefit for such lightweight
syscall. Raw tracepoint is pretty decent despite additional filtering logic,
but it will be called for any other syscall in the system, which rules it out.
Fentry, though, seems to be adding the least amoung of overhead and achieves
97.3% of performance of baseline no-BPF-attached syscall.
Using getpgid() seems to be preferable to set_task_comm() approach from
test_overhead, as it's about 2.35x faster in a baseline performance.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: John Fastabend <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add fmod_ret BPF program to existing test_overhead selftest. Also re-implement
user-space benchmarking part into benchmark runner to compare results. Results
with ./bench are consistently somewhat lower than test_overhead's, but relative
performance of various types of BPF programs stay consisten (e.g., kretprobe is
noticeably slower). This slowdown seems to be coming from the fact that
test_overhead is single-threaded, while benchmark always spins off at least
one thread for producer. This has been confirmed by hacking multi-threaded
test_overhead variant and also single-threaded bench variant. Resutls are
below. run_bench_rename.sh script from benchs/ subdirectory was used to
produce results for ./bench.
Single-threaded implementations
===============================
/* bench: single-threaded, atomics */
base : 4.622 ± 0.049M/s
kprobe : 3.673 ± 0.052M/s
kretprobe : 2.625 ± 0.052M/s
rawtp : 4.369 ± 0.089M/s
fentry : 4.201 ± 0.558M/s
fexit : 4.309 ± 0.148M/s
fmodret : 4.314 ± 0.203M/s
/* selftest: single-threaded, no atomics */
task_rename base 4555K events per sec
task_rename kprobe 3643K events per sec
task_rename kretprobe 2506K events per sec
task_rename raw_tp 4303K events per sec
task_rename fentry 4307K events per sec
task_rename fexit 4010K events per sec
task_rename fmod_ret 3984K events per sec
Multi-threaded implementations
==============================
/* bench: multi-threaded w/ atomics */
base : 3.910 ± 0.023M/s
kprobe : 3.048 ± 0.037M/s
kretprobe : 2.300 ± 0.015M/s
rawtp : 3.687 ± 0.034M/s
fentry : 3.740 ± 0.087M/s
fexit : 3.510 ± 0.009M/s
fmodret : 3.485 ± 0.050M/s
/* selftest: multi-threaded w/ atomics */
task_rename base 3872K events per sec
task_rename kprobe 3068K events per sec
task_rename kretprobe 2350K events per sec
task_rename raw_tp 3731K events per sec
task_rename fentry 3639K events per sec
task_rename fexit 3558K events per sec
task_rename fmod_ret 3511K events per sec
/* selftest: multi-threaded, no atomics */
task_rename base 3945K events per sec
task_rename kprobe 3298K events per sec
task_rename kretprobe 2451K events per sec
task_rename raw_tp 3718K events per sec
task_rename fentry 3782K events per sec
task_rename fexit 3543K events per sec
task_rename fmod_ret 3526K events per sec
Note that the fact that ./bench benchmark always uses atomic increments for
counting, while test_overhead doesn't, doesn't influence test results all that
much.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: John Fastabend <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
While working on BPF ringbuf implementation, testing, and benchmarking, I've
developed a pretty generic and modular benchmark runner, which seems to be
generically useful, as I've already used it for one more purpose (testing
fastest way to trigger BPF program, to minimize overhead of in-kernel code).
This patch adds generic part of benchmark runner and sets up Makefile for
extending it with more sets of benchmarks.
Benchmarker itself operates by spinning up specified number of producer and
consumer threads, setting up interval timer sending SIGALARM signal to
application once a second. Every second, current snapshot with hits/drops
counters are collected and stored in an array. Drops are useful for
producer/consumer benchmarks in which producer might overwhelm consumers.
Once test finishes after given amount of warm-up and testing seconds, mean and
stddev are calculated (ignoring warm-up results) and is printed out to stdout.
This setup seems to give consistent and accurate results.
To validate behavior, I added two atomic counting tests: global and local.
For global one, all the producer threads are atomically incrementing same
counter as fast as possible. This, of course, leads to huge drop of
performance once there is more than one producer thread due to CPUs fighting
for the same memory location.
Local counting, on the other hand, maintains one counter per each producer
thread, incremented independently. Once per second, all counters are read and
added together to form final "counting throughput" measurement. As expected,
such setup demonstrates linear scalability with number of producers (as long
as there are enough physical CPU cores, of course). See example output below.
Also, this setup can nicely demonstrate disastrous effects of false sharing,
if care is not taken to take those per-producer counters apart into
independent cache lines.
Demo output shows global counter first with 1 producer, then with 4. Both
total and per-producer performance significantly drop. The last run is local
counter with 4 producers, demonstrating near-perfect scalability.
$ ./bench -a -w1 -d2 -p1 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter 0 ( 24.822us): hits 148.179M/s (148.179M/prod), drops 0.000M/s
Iter 1 ( 37.939us): hits 149.308M/s (149.308M/prod), drops 0.000M/s
Iter 2 (-10.774us): hits 150.717M/s (150.717M/prod), drops 0.000M/s
Iter 3 ( 3.807us): hits 151.435M/s (151.435M/prod), drops 0.000M/s
Summary: hits 150.488 ± 1.079M/s (150.488M/prod), drops 0.000 ± 0.000M/s
$ ./bench -a -w1 -d2 -p4 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter 0 ( 60.659us): hits 53.910M/s ( 13.477M/prod), drops 0.000M/s
Iter 1 (-17.658us): hits 53.722M/s ( 13.431M/prod), drops 0.000M/s
Iter 2 ( 5.865us): hits 53.495M/s ( 13.374M/prod), drops 0.000M/s
Iter 3 ( 0.104us): hits 53.606M/s ( 13.402M/prod), drops 0.000M/s
Summary: hits 53.608 ± 0.113M/s ( 13.402M/prod), drops 0.000 ± 0.000M/s
$ ./bench -a -w1 -d2 -p4 count-local
Setting up benchmark 'count-local'...
Benchmark 'count-local' started.
Iter 0 ( 23.388us): hits 640.450M/s (160.113M/prod), drops 0.000M/s
Iter 1 ( 2.291us): hits 605.661M/s (151.415M/prod), drops 0.000M/s
Iter 2 ( -6.415us): hits 607.092M/s (151.773M/prod), drops 0.000M/s
Iter 3 ( -1.361us): hits 601.796M/s (150.449M/prod), drops 0.000M/s
Summary: hits 604.849 ± 2.739M/s (151.212M/prod), drops 0.000 ± 0.000M/s
Benchmark runner supports setting thread affinity for producer and consumer
threads. You can use -a flag for default CPU selection scheme, where first
consumer gets CPU #0, next one gets CPU #1, and so on. Then producer threads
pick up next CPU and increment one-by-one as well. But user can also specify
a set of CPUs independently for producers and consumers with --prod-affinity
1,2-10,15 and --cons-affinity <set-of-cpus>. The latter allows to force
producers and consumers to share same set of CPUs, if necessary.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add testing_helpers.c, which will contain generic helpers for test runners and
tests needing some common generic functionality, like parsing a set of
numbers.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
|
|
This is basically a test-suite for setns() and as of now contains:
- test that we can't pass garbage flags
- test that we can't attach to the namespaces of task that has already exited
- test that we can incrementally setns into all namespaces of a target task
using a pidfd
- test that we can setns atomically into all namespaces of a target task
- test that we can't cross setns into a user namespace outside of our user
namespace hierarchy
- test that we can't setns into namespaces owned by user namespaces over which
we are not privileged
Signed-off-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When the probe code was failing for any reason ENOTSUP was returned, even
if this was due to not having enough lock space. This patch fixes this by
returning EPERM to the user application, so it can respond and increase
the RLIMIT_MEMLOCK size.
Signed-off-by: Eelco Chaudron <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/158927424896.2342.10402475603585742943.stgit@ebuild
|
|
Before commit 74b5a5968fe8 ("selftests/bpf: Replace test_progs and
test_maps w/ general rule") selftests/bpf used generic install
target from selftests/lib.mk to install generated bpf test progs
by mentioning them in TEST_GEN_FILES variable.
Take that functionality back.
Fixes: 74b5a5968fe8 ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Yauheni Kaliuta <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Fixes to previous fixes.
Unfortunately, the last set of fixes introduced some minor bugs:
- The bootconfig apply_xbc() leak fix caused the application to
return a positive number on success, when it should have returned
zero.
- The preempt_irq_delay_thread fix to make the creation code wait for
the kthread to finish to prevent it from executing after module
unload, can now cause the kthread to exit before it even executes
(preventing it to run its tests).
- The fix to the bootconfig that fixed the initrd to remove the
bootconfig from causing the kernel to panic, now prints a warning
that the bootconfig is not found, even when bootconfig is not on
the command line"
* tag 'trace-v5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
bootconfig: Fix to prevent warning message if no bootconfig option
tracing: Wait for preempt irq delay thread to execute
tools/bootconfig: Fix apply_xbc() to return zero on success
|
|
The return of apply_xbc() returns the result of the last write() call, which
is not what is expected. It should only return zero on success.
Link: https://lore.kernel.org/r/20200508093059.GF9365@kadam
Fixes: 8842604446d1 ("tools/bootconfig: Fix resource leak in apply_xbc()")
Reported-by: Dan Carpenter <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Tested-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Synchronise the bpf.h header under tools, to report the fixes recently
brought to the documentation for the BPF helpers.
Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|