|
Dump a summary after running whatever tests were specified. Useful for human
runners to spot any kind of failure (beyond the exit code).
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Axel Rasmussen <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Rapoport (IBM) <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Zach O'Keefe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "selftests/mm: Split / Refactor userfault test", v2.
This patchset splits userfaultfd.c into two tests:
- uffd-stress: the "vanilla", old and powerful stress test
- uffd-unit-tests: all the unit tests will be moved here
This has been on my todo list for a long time, but I never actually did it.
The uffd test is growing into a small and cute monster. I've started to
notice it's getting harder to maintain such a test and keep it useful.
A few issues I found when looking at userfaultfd test:
- We have a bunch of unit tests in userfaultfd.c, but they can only be run
after a stress test of some memory type. There is no way to skip that.
- We can only run a unit test for one memory type at a time. If we want to
do a quick smoke test to check for regressions, there's no good way. The
best option currently is "bash ./run_vmtests.sh -t userfaultfd", thanks
to the most recent changes to run_vmtests.sh on tagging. Still, that
always runs the stress tests, and it is hard to see what went wrong.
- It's hard to add a new unit test to userfaultfd.c: we don't really know
what's happening until we have read most of the file.
- We ran a bunch of redundant tests, e.g. we ran the whole stress test
suite twice just to verify both the syscall and /dev/userfaultfd. They
all use userfaultfd_new() to create the handle, so everything should
really be the same underneath. One simple unit test should cover that!
- We have tens of global variables in one file, shared by all the
tests. Some of them are not suitable as global vars from a
maintenance pov. It forces every unit test to consider how these
vars affect the stress test and vice versa, which is logically not
necessary.
- The userfaultfd test is not friendly to old kernels. Mostly it only works
on the latest kernel tree. It would be preferable for it to run on all
kernels and properly report what's missing.
I'll stop here, though I feel like I could still list some more.
This patchset should resolve all the issues above, and actually we can do
even more on top. I only stopped when I found I already had 29 patches
and 2000+ LOC of changes. That's already a terrible enough patchset, so we
should move in small steps.
With the whole set applied, "./run_vmtests.sh -t userfaultfd" looks like
this:
===8<===
vm.nr_hugepages = 1024
-------------------------
running ./uffd-unit-tests
-------------------------
Testing UFFDIO_API (with syscall)... done
Testing UFFDIO_API (with /dev/userfaultfd)... done
Testing register-ioctls on anon... done
Testing register-ioctls on shmem... done
Testing register-ioctls on shmem-private... done
Testing register-ioctls on hugetlb... done
Testing register-ioctls on hugetlb-private... done
Testing zeropage on anon... done
Testing zeropage on shmem... done
Testing zeropage on shmem-private... done
Testing zeropage on hugetlb... done
Testing zeropage on hugetlb-private... done
Testing pagemap on anon... done
Testing wp-unpopulated on anon... done
Testing minor on shmem... done
Testing minor on hugetlb... done
Testing minor-wp on shmem... done
Testing minor-wp on hugetlb... done
Testing minor-collapse on shmem... done
Testing sigbus on anon... done
Testing sigbus on shmem... done
Testing sigbus on shmem-private... done
Testing sigbus on hugetlb... done
Testing sigbus on hugetlb-private... done
Testing sigbus-wp on anon... done
Testing sigbus-wp on shmem... done
Testing sigbus-wp on shmem-private... done
Testing sigbus-wp on hugetlb... done
Testing sigbus-wp on hugetlb-private... done
Testing events on anon... done
Testing events on shmem... done
Testing events on shmem-private... done
Testing events on hugetlb... done
Testing events on hugetlb-private... done
Testing events-wp on anon... done
Testing events-wp on shmem... done
Testing events-wp on shmem-private... done
Testing events-wp on hugetlb... done
Testing events-wp on hugetlb-private... done
Userfaults unit tests: pass=39, skip=0, fail=0 (total=39)
[PASS]
--------------------------------
running ./uffd-stress anon 20 16
--------------------------------
nr_pages: 5120, nr_pages_per_cpu: 640
bounces: 15, mode: rnd racing ver poll, userfaults: 345 missing (26+48+61+102+30+12+59+7) 1596 wp (120+139+317+346+215+67+306+86)
[...]
[PASS]
------------------------------------
running ./uffd-stress hugetlb 128 32
------------------------------------
nr_pages: 64, nr_pages_per_cpu: 8
bounces: 31, mode: rnd racing ver poll, userfaults: 29 missing (6+6+6+5+4+2+0+0) 104 wp (20+19+22+18+7+12+5+1)
[...]
[PASS]
--------------------------------------------
running ./uffd-stress hugetlb-private 128 32
--------------------------------------------
nr_pages: 64, nr_pages_per_cpu: 8
bounces: 31, mode: rnd racing ver poll, userfaults: 33 missing (12+9+7+0+5+0+0+0) 111 wp (24+25+14+14+11+17+5+1)
[...]
[PASS]
---------------------------------
running ./uffd-stress shmem 20 16
---------------------------------
nr_pages: 5120, nr_pages_per_cpu: 640
bounces: 15, mode: rnd racing ver poll, userfaults: 247 missing (15+17+34+60+81+37+3+0) 2038 wp (180+114+276+400+381+318+165+204)
[...]
[PASS]
-----------------------------------------
running ./uffd-stress shmem-private 20 16
-----------------------------------------
nr_pages: 5120, nr_pages_per_cpu: 640
bounces: 15, mode: rnd racing ver poll, userfaults: 235 missing (52+29+55+56+13+9+16+5) 2849 wp (218+406+461+531+328+284+430+191)
[...]
[PASS]
SUMMARY: PASS=6 SKIP=0 FAIL=0
===8<===
The output may differ if some features are missing (e.g., hugetlb not
allocated, an old kernel, an unprivileged uffd handle), but the affected
tests should be reported with good reasons. E.g., when I ran the unit
tests on my Fedora kernel, I got:
===8<===
UFFDIO_API (with syscall)... failed [reason: UFFDIO_API should fail with wrong api but didn't]
UFFDIO_API (with /dev/userfaultfd)... skipped [reason: cannot open userfaultfd handle]
zeropage on anon... done
zeropage on shmem... done
zeropage on shmem-private... done
zeropage-hugetlb on hugetlb... done
zeropage-hugetlb on hugetlb-private... done
pagemap on anon... pagemap on anon... pagemap on anon... done
wp-unpopulated on anon... skipped [reason: feature missing]
minor on shmem... done
minor on hugetlb... done
minor-wp on shmem... skipped [reason: feature missing]
minor-wp on hugetlb... skipped [reason: feature missing]
minor-collapse on shmem... done
sigbus on anon... skipped [reason: possible lack of priviledge]
sigbus on shmem... skipped [reason: possible lack of priviledge]
sigbus on shmem-private... skipped [reason: possible lack of priviledge]
sigbus on hugetlb... skipped [reason: possible lack of priviledge]
sigbus on hugetlb-private... skipped [reason: possible lack of priviledge]
sigbus-wp on anon... skipped [reason: possible lack of priviledge]
sigbus-wp on shmem... skipped [reason: possible lack of priviledge]
sigbus-wp on shmem-private... skipped [reason: possible lack of priviledge]
sigbus-wp on hugetlb... skipped [reason: possible lack of priviledge]
sigbus-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
events on anon... skipped [reason: possible lack of priviledge]
events on shmem... skipped [reason: possible lack of priviledge]
events on shmem-private... skipped [reason: possible lack of priviledge]
events on hugetlb... skipped [reason: possible lack of priviledge]
events on hugetlb-private... skipped [reason: possible lack of priviledge]
events-wp on anon... skipped [reason: possible lack of priviledge]
events-wp on shmem... skipped [reason: possible lack of priviledge]
events-wp on shmem-private... skipped [reason: possible lack of priviledge]
events-wp on hugetlb... skipped [reason: possible lack of priviledge]
events-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
Userfaults unit tests: pass=9, skip=24, fail=1 (total=34)
===8<===
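The per-test lines and the final tally above can be modelled with a few lines of C. This is only a minimal sketch of the reporting idea, not the code in uffd-unit-tests.c: the names test_stats, report() and format_summary() are hypothetical, and only the output format is taken from the sample above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result codes; the real test may encode these differently. */
enum test_result { TEST_PASS, TEST_SKIP, TEST_FAIL };

struct test_stats {
	int pass, skip, fail;
};

/* Record one result and print it in the style of the sample output. */
void report(struct test_stats *st, const char *name,
	    enum test_result res, const char *reason)
{
	assert(st && name);
	printf("Testing %s... ", name);
	switch (res) {
	case TEST_PASS:
		st->pass++;
		printf("done\n");
		break;
	case TEST_SKIP:
		st->skip++;
		printf("skipped [reason: %s]\n", reason ? reason : "unknown");
		break;
	case TEST_FAIL:
		st->fail++;
		printf("failed [reason: %s]\n", reason ? reason : "unknown");
		break;
	}
}

/* Write the final tally into buf; returns what snprintf() returns. */
int format_summary(const struct test_stats *st, char *buf, size_t len)
{
	int total = st->pass + st->skip + st->fail;

	return snprintf(buf, len,
			"Userfaults unit tests: pass=%d, skip=%d, fail=%d (total=%d)",
			st->pass, st->skip, st->fail, total);
}
```

A skipped or failed test always carries a reason string, which is the property the Fedora output above relies on.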
Patch layout:
- Revert "userfaultfd: don't fail on unrecognized features"
Something I found when I got the UFFDIO_API test below. Axel, I still
propose to revert it as a whole, but feel free to continue the discussion
from the original patch thread.
- selftests/mm: Update .gitignore with two missing tests
- selftests/mm: Dump a summary in run_vmtests.sh
- selftests/mm: Merge util.h into vm_util.h
- selftests/mm: Use TEST_GEN_PROGS where proper
- selftests/mm: Link vm_util.c always
- selftests/mm: Merge default_huge_page_size() into one
- selftests/mm: Use PM_* macros in vm_utils.h
- selftests/mm: Reuse pagemap_get_entry() in vm_util.h
- selftests/mm: Test UFFDIO_ZEROPAGE only when !hugetlb
- selftests/mm: Drop test_uffdio_zeropage_eexist
Up to here, it's all cleanups here and there. I wanted to keep going, but I
found that splitting the test might take a few more days. Hence I did the
split starting from the next patch, so we have something working first.
- selftests/mm: Create uffd-common.[ch]
- selftests/mm: Split uffd tests into uffd-stress and uffd-unit-tests
These do the major brute-force split of common code into
uffd-common.[ch], which is the common base for the stress and unit tests
so far. Then a new unit test is created.
- selftests/mm: uffd_[un]register()
- selftests/mm: uffd_open_{dev|sys}()
- selftests/mm: UFFDIO_API test
This patch sits here to start writing the first unit test, for
UFFDIO_API, and also adds detection of userfaultfd privileges.
- selftests/mm: Drop global mem_fd in uffd tests
- selftests/mm: Drop global hpage_size in uffd tests
- selftests/mm: Rename uffd_stats to uffd_args
- selftests/mm: Let uffd_handle_page_fault() takes wp parameter
- selftests/mm: Allow allocate_area() to fail properly
Some further cleanups; without them I found it hard to move the tests.
- selftests/mm: Add framework for uffd-unit-test
The major patch, which provides the framework for most of the remaining unit tests.
- selftests/mm: Move uffd pagemap test to unit test
- selftests/mm: Move uffd minor test to unit test
- selftests/mm: Move uffd sig/events tests into uffd unit tests
- selftests/mm: Move zeropage test into uffd unit tests
Move the unit tests and fit them into the new file.
- selftests/mm: Workaround no way to detect uffd-minor + wp
- selftests/mm: Allow uffd test to skip properly with no privilege
- selftests/mm: Drop sys/dev test in uffd-stress test
- selftests/mm: Add shmem-private test to uffd-stress
A bunch of changes to improve error reporting, plus adding
shmem-private to the stress test, which was long missing.
- selftests/mm: Add uffdio register ioctls test
One more patch to test uffdio_register.ioctls.
This patch (of 30):
Update .gitignore with two missing tests.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Rapoport (IBM) <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Zach O'Keefe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
write permissions
Let's add some tests that trigger (pte|pmd)_mkdirty on VMAs without write
permissions. If an architecture implementation is wrong, we might
accidentally set the PTE/PMD writable and allow for write access in a VMA
without write permissions.
The tests include reproducers for the two issues recently discovered
and worked around in core-MM for now:
(1) commit 624a2c94f5b7 ("Partly revert "mm/thp: carry over dirty
bit when thp splits on pmd"")
(2) commit 96a9c287e25d ("mm/migrate: fix wrongly apply write bit
after mkdirty on sparc64")
In addition, there are some other tests that reveal further issues.
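The pass criterion every test reports ("SIGSEGV generated, page not modified") can be sketched in plain userspace C. This is only an illustration of the final check, assuming a private anonymous mapping; it does not reach (pte|pmd)_mkdirty through ptrace, migration or UFFDIO_COPY the way the mkdirty selftest does, and the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf segv_env;

static void segv_handler(int sig)
{
	(void)sig;
	siglongjmp(segv_env, 1);
}

/*
 * Returns 1 if a write to a read-only page raised SIGSEGV and left the
 * page unmodified, 0 if the write went through, -1 on setup failure.
 */
int write_to_readonly_page_faults(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	struct sigaction sa, old_sa;
	volatile int got_segv = 0;
	int result;
	char *mem;

	assert(pagesize > 0);
	mem = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	/* Populate the page with a known pattern, then drop write access. */
	memset(mem, 0xff, pagesize);
	if (mprotect(mem, pagesize, PROT_READ))
		goto err_unmap;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = segv_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old_sa))
		goto err_unmap;

	if (sigsetjmp(segv_env, 1) == 0)
		*(volatile char *)mem = 0;	/* expected to fault */
	else
		got_segv = 1;			/* came back via the handler */

	sigaction(SIGSEGV, &old_sa, NULL);
	result = got_segv && (unsigned char)mem[0] == 0xff;
	munmap(mem, pagesize);
	return result;

err_unmap:
	munmap(mem, pagesize);
	return -1;
}
```

On a correct architecture implementation the write must never succeed, so a return value of 0 would correspond to a "not ok" line in the output below.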
All tests pass under x86_64:
./mkdirty
# [INFO] detected THP size: 2048 KiB
TAP version 13
1..6
# [INFO] PTRACE write access
ok 1 SIGSEGV generated, page not modified
# [INFO] PTRACE write access to THP
ok 2 SIGSEGV generated, page not modified
# [INFO] Page migration
ok 3 SIGSEGV generated, page not modified
# [INFO] Page migration of THP
ok 4 SIGSEGV generated, page not modified
# [INFO] PTE-mapping a THP
ok 5 SIGSEGV generated, page not modified
# [INFO] UFFDIO_COPY
ok 6 SIGSEGV generated, page not modified
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
But some fail on sparc64:
./mkdirty
# [INFO] detected THP size: 8192 KiB
TAP version 13
1..6
# [INFO] PTRACE write access
not ok 1 SIGSEGV generated, page not modified
# [INFO] PTRACE write access to THP
not ok 2 SIGSEGV generated, page not modified
# [INFO] Page migration
ok 3 SIGSEGV generated, page not modified
# [INFO] Page migration of THP
ok 4 SIGSEGV generated, page not modified
# [INFO] PTE-mapping a THP
ok 5 SIGSEGV generated, page not modified
# [INFO] UFFDIO_COPY
not ok 6 SIGSEGV generated, page not modified
Bail out! 3 out of 6 tests failed
# Totals: pass:3 fail:3 xfail:0 xpass:0 skip:0 error:0
Reverting both above commits makes all tests fail on sparc64:
./mkdirty
# [INFO] detected THP size: 8192 KiB
TAP version 13
1..6
# [INFO] PTRACE write access
not ok 1 SIGSEGV generated, page not modified
# [INFO] PTRACE write access to THP
not ok 2 SIGSEGV generated, page not modified
# [INFO] Page migration
not ok 3 SIGSEGV generated, page not modified
# [INFO] Page migration of THP
not ok 4 SIGSEGV generated, page not modified
# [INFO] PTE-mapping a THP
not ok 5 SIGSEGV generated, page not modified
# [INFO] UFFDIO_COPY
not ok 6 SIGSEGV generated, page not modified
Bail out! 6 out of 6 tests failed
# Totals: pass:0 fail:6 xfail:0 xpass:0 skip:0 error:0
The tests are useful to detect other problematic archs, to verify new
arch fixes, and to stop such issues from reappearing in the future.
For now, we don't add any hugetlb tests.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm: (pte|pmd)_mkdirty() should not unconditionally allow for
write access".
This is the follow-up on [1], adding selftests (testing for known issues
we added workarounds for and other issues that haven't been fixed yet),
fixing sparc64, reverting the workarounds, and performing one cleanup.
The patch from [1] was modified slightly (updated/extended patch
description, dropped one unnecessary NOP instruction from the ASM in
__pte_mkhwwrite()).
Retested on x86_64 and sparc64 (sun4u in QEMU).
I scanned most architectures to make sure their (pte|pmd)_mkdirty()
handling is correct. To be sure, we can run the selftests and find out if
other architectures are still affected (loongarch was fixed recently as
well).
Based on master for now. I don't expect surprises regarding mm-tress, but
I can rebase if there are any problems.
This patch (of 6):
The COW selftest can deal with THP not being configured. So move error
handling of read_pmd_pagesize() into the callers such that we can reuse it
in the COW selftest.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected] [1]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add a test case to check whether the number of maple_alloc structures is
actually equal to mas->alloc->total.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Zhang <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This adds a test for the recently added RISC-V interface for probing
hardware capabilities. It happens to be the first selftest we have for
RISC-V, so I've added some infrastructure for those as well.
Co-developed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Evan Green <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
To make it easier for bleeding-edge BPF applications, such as sched_ext,
to utilize open-coded iterators, move bpf_for(), bpf_for_each(), and
bpf_repeat() macros from selftests/bpf-internal bpf_misc.h helper, to
libbpf-provided bpf_helpers.h header.
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add test validating that libbpf correctly poisons and reports __weak
unresolved kfuncs in post-processed verifier log.
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
monotonicity
The first field of /proc/uptime relies on the CLOCK_BOOTTIME clock, which
can also be fetched through the clock_gettime() API.
Improve the test coverage by verifying the monotonicity of
CLOCK_BOOTTIME across both interfaces.
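A cross-interface check of this kind can be sketched as follows. This is a minimal sketch, not the selftest itself: all function names are hypothetical, and comparing at centisecond resolution is an assumption made here because /proc/uptime prints two fractional digits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Convert a timespec to whole centiseconds (truncating). */
unsigned long long timespec_to_cs(const struct timespec *ts)
{
	assert(ts);
	return (unsigned long long)ts->tv_sec * 100 + ts->tv_nsec / 10000000;
}

/* Parse the first field of a /proc/uptime line; 0 on success, -1 on error. */
int parse_uptime_cs(const char *buf, unsigned long long *cs)
{
	unsigned long long sec, frac;

	if (sscanf(buf, "%llu.%2llu", &sec, &frac) != 2)
		return -1;
	*cs = sec * 100 + frac;
	return 0;
}

/*
 * Sample clock_gettime(), then /proc/uptime, then clock_gettime() again.
 * Returns 1 if the three samples never go backwards, 0 if they do,
 * -1 if either interface could not be read.
 */
int boottime_monotonic_across_interfaces(void)
{
	unsigned long long before, proc, after;
	struct timespec ts;
	char buf[64];
	FILE *f;

	if (clock_gettime(CLOCK_BOOTTIME, &ts))
		return -1;
	before = timespec_to_cs(&ts);

	f = fopen("/proc/uptime", "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	if (parse_uptime_cs(buf, &proc))
		return -1;

	if (clock_gettime(CLOCK_BOOTTIME, &ts))
		return -1;
	after = timespec_to_cs(&ts);

	return before <= proc && proc <= after;
}
```

Interleaving the two interfaces is what makes the check meaningful: each sample has to be at least as large as the one taken before it, whichever interface produced it.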
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Due to the broken iowait task counting design (cf. the comments above
get_cpu_idle_time_us() and nr_iowait()), it is not possible to
guarantee that /proc/stat or /proc/uptime display monotonic idle
time values.
Remove the assertions that verify the related wrong assumption so that
testers and maintainers don't spend more time on that.
Reported-by: Yu Liao <[email protected]>
Reported-by: Thomas Gleixner <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add a selftest to ensure subreg equality if the upper 32 bits of the
source register are 0. Without the previous patch, the test fails
verification.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
ShellCheck reported an issue for most of the code.
That's mainly because it incorrectly believes most of the code is
unreachable, as the functions are invoked by variable name: see how the
"tests" array is used.
Once SC2317 has been ignored, three small warnings were still visible:
- SC2155: Declare and assign separately to avoid masking return values.
- SC2046: Quote this to prevent word splitting: can be ignored because
"ip netns pids" can display more than one pid.
- SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined.
This probably doesn't fix any actual issue, but it might help spot
new interesting warnings reported by ShellCheck: before this change,
ShellCheck was reporting issues for most lines, making it a bit useless.
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The mptcp_connect tool was printing some duplicated entries when showing
how to use it: -j -l -r
While at it, I also:
- moved the very few entries that were not sorted,
- added -R that was missing since
commit 8a4b910d005d ("mptcp: selftests: add rcvbuf set option"),
- removed the -u parameter that has been removed in
commit f730b65c9d85 ("selftests: mptcp: try to set mptcp ulp mode in different sk states").
No need to backport this, it is just an internal tool used by our
selftests. The help menu is mainly useful for MPTCP kernel devs.
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The upcall socket interface can be exercised now to make sure that
future feature adjustments to the field can maintain backwards
compatibility.
Signed-off-by: Aaron Conole <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add a basic set of fields to print in a 'dpflow' format. This will be
used by future commits to check for flow fields after parsing, as
well as verifying the flow fields pushed into the kernel from
userspace.
Signed-off-by: Aaron Conole <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Includes an associated test to generate netns and connect
interfaces, with the option to include packet tracing.
This will be used in the future when flow support is added
for additional test cases.
Signed-off-by: Aaron Conole <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
|
|
We've managed to improve the UX for kptrs significantly over the last 9
months. The prior main use cases, struct bpf_cpumask *, struct
task_struct *, and struct cgroup *, have all been updated to be
synchronized mainly using RCU. In other words, their KF_ACQUIRE kfunc
calls are all KF_RCU, and the pointers themselves are MEM_RCU and can be
accessed in an RCU read region in BPF.
In a follow-on change, we'll be removing the KF_KPTR_GET kfunc flag.
This patch prepares for that by removing the
bpf_kfunc_call_test_kptr_get() kfunc, and all associated selftests.
Signed-off-by: David Vernet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Validate that the following new ptrace requests work as expected:
* PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG
  returns the contents of task->syscall_dispatch
* PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG
  sets the contents of task->syscall_dispatch
Signed-off-by: Gregory Price <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Test that POSIX timers using CLOCK_PROCESS_CPUTIME_ID eventually deliver
a signal to all running threads. This effectively tests that the kernel
doesn't prefer any one thread (or subset of threads) for signal delivery.
Signed-off-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Test refcounted local kptr functionality added in previous patches in
the series.
Usecases which pass verification:
* Add refcounted local kptr to both tree and list. Then, read and -
possibly, depending on test variant - delete from tree, then list.
* Also test doing read-and-maybe-delete in opposite order
* Stash a refcounted local kptr in a map_value, then add it to a
rbtree. Read from both, possibly deleting after tree read.
* Add refcounted local kptr to both tree and list. Then, try reading and
deleting twice from one of the collections.
* bpf_refcount_acquire of just-added non-owning ref should work, as
should bpf_refcount_acquire of owning ref just out of bpf_obj_new
Usecases which fail verification:
* The simple successful bpf_refcount_acquire cases from above should
both fail to verify if the newly-acquired owning ref is not dropped
Signed-off-by: Dave Marchevsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
This patch modifies bpf_rbtree_remove to account for possible failure
due to the input rb_node already not being in any collection.
The function can now return NULL, and does when the aforementioned
scenario occurs. As before, on successful removal an owning reference to
the removed node is returned.
Adding KF_RET_NULL to bpf_rbtree_remove's kfunc flags - now KF_RET_NULL |
KF_ACQUIRE - provides the desired verifier semantics:
* retval must be checked for NULL before use
* if NULL, retval's ref_obj_id is released
* retval is a "maybe acquired" owning ref, not a non-owning ref,
so it will live past end of critical section (bpf_spin_unlock), and
thus can be checked for NULL after the end of the CS
BPF programs must add checks
============================
This does change bpf_rbtree_remove's verifier behavior. BPF program
writers will need to add NULL checks to their programs, but the
resulting UX looks natural:
  bpf_spin_lock(&glock);
  n = bpf_rbtree_first(&ghead);
  if (!n) { /* ... */}
  res = bpf_rbtree_remove(&ghead, &n->node);
  bpf_spin_unlock(&glock);
  if (!res)  /* Newly-added check after this patch */
    return 1;
  n = container_of(res, /* ... */);
  /* Do something else with n */
  bpf_obj_drop(n);
  return 0;
The "if (!res)" check above is the only addition necessary for the above
program to pass verification after this patch.
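The "removal may fail, so check for NULL" contract can be illustrated with a plain userspace model. This is a sketch under stated assumptions, not the kernel's bpf_rbtree_remove: a doubly linked list stands in for the rbtree, NULL links are assumed to mean "not in any collection", and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a collection node; NULL links mean "unlinked". */
struct list_node {
	struct list_node *prev, *next;
};

void list_head_init(struct list_node *head)
{
	head->prev = head->next = head;
}

void list_node_init(struct list_node *n)
{
	n->prev = n->next = NULL;
}

void list_push_back(struct list_node *head, struct list_node *n)
{
	assert(!n->next);	/* caller must pass an unlinked node */
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Remove n from whatever list it is on. Returns n on success, or NULL if
 * n is not in any collection, so callers must check the return value.
 */
struct list_node *list_remove(struct list_node *n)
{
	if (!n->next)
		return NULL;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
	return n;
}
```

Removing the same node twice is the interesting case: the first call returns the node, the second returns NULL instead of corrupting the collection.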
bpf_rbtree_remove no longer clobbers non-owning refs
====================================================
An issue arises when bpf_rbtree_remove fails, though. Consider this
example:
  struct node_data {
    long key;
    struct bpf_list_node l;
    struct bpf_rb_node r;
    struct bpf_refcount ref;
  };

  long failed_sum;

  void bpf_prog()
  {
    struct node_data *n = bpf_obj_new(/* ... */);
    struct bpf_rb_node *res;

    n->key = 10;
    bpf_spin_lock(&glock);
    bpf_list_push_back(&some_list, &n->l); /* n is now a non-owning ref */
    res = bpf_rbtree_remove(&some_tree, &n->r, /* ... */);
    if (!res)
      failed_sum += n->key;  /* not possible */
    bpf_spin_unlock(&glock);
    /* if (res) { do something useful and drop } ... */
  }
The bpf_rbtree_remove in this example will always fail. Similarly to
bpf_spin_unlock, bpf_rbtree_remove is a non-owning reference
invalidation point. The verifier clobbers all non-owning refs after a
bpf_rbtree_remove call, so the "failed_sum += n->key" line will fail
verification, and in fact there's no good way to get information about
the node which failed to add after the invalidation. This patch removes
non-owning reference invalidation from bpf_rbtree_remove to allow the
above usecase to pass verification. The logic for why this is now
possible is as follows:
Before this series, bpf_rbtree_remove couldn't fail and thus assumed that
its input, a non-owning reference, was in the tree. But it's easy to
construct an example where two non-owning references pointing to the same
underlying memory are acquired and passed to rbtree_remove one after
another (see rbtree_api_release_aliasing in
selftests/bpf/progs/rbtree_fail.c).
So it was necessary to clobber non-owning refs to prevent this
case and, more generally, to enforce "non-owning ref is definitely
in some collection" invariant. This series removes that invariant, and
the failure / runtime checking added in this patch provides a clean way
to deal with the aliasing issue - just fail to remove.
Because the aliasing issue prevented by clobbering non-owning refs is no
longer an issue, this patch removes the invalidate_non_owning_refs
call from verifier handling of bpf_rbtree_remove. Note that
bpf_spin_unlock - the other caller of invalidate_non_owning_refs -
clobbers non-owning refs for a different reason, so its clobbering
behavior remains unchanged.
No BPF program changes are necessary for programs to remain valid as a
result of this clobbering change. A valid program before this patch
passed verification with its non-owning refs having shorter (or equal)
lifetimes due to more aggressive clobbering.
Also, update existing tests to check bpf_rbtree_remove retval for NULL
where necessary, and move rbtree_api_release_aliasing from
progs/rbtree_fail.c to progs/rbtree.c since it's now expected to pass
verification.
Signed-off-by: Dave Marchevsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The linked_list tests use macros and function pointers to reduce code
duplication. Earlier in the series, bpf_list_push_{front,back} were
modified to be macros, expanding to invoke actual kfuncs
bpf_list_push_{front,back}_impl. Due to this change, a code snippet
like:
  void (*p)(void *, void *) = (void *)&bpf_list_##op;
  p(hexpr, nexpr);
meant to do bpf_list_push_{front,back}(hexpr, nexpr), will no longer
work as it's no longer valid to do &bpf_list_push_{front,back} since
they're no longer functions.
This patch fixes issues of this type, along with two other minor changes
- one improvement and one fix - both related to the node argument to
list_push_{front,back}.
* The fix: migration of list_push tests away from (void *, void *)
func ptr uncovered that some tests were incorrectly passing pointer
to node, not pointer to struct bpf_list_node within the node. This
patch fixes such issues (CHECK(..., f) -> CHECK(..., &f->node))
* The improvement: In linked_list tests, the struct foo type has two
list_node fields: node and node2, at byte offsets 0 and 40 within
the struct, respectively. Currently node is used in ~all tests
involving struct foo and lists. The verifier needs to do some work
to account for the offset of bpf_list_node within the node type, so
using node2 instead of node exercises that logic more in the tests.
This patch migrates linked_list tests to use node2 instead of node.
Signed-off-by: Dave Marchevsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Consider this code snippet:
  struct node {
    long key;
    struct bpf_list_node l;
    struct bpf_rb_node r;
    struct bpf_refcount ref;
  };

  int some_bpf_prog(void *ctx)
  {
    struct node *n = bpf_obj_new(/*...*/), *m;

    bpf_spin_lock(&glock);
    bpf_rbtree_add(&some_tree, &n->r, /* ... */);
    m = bpf_refcount_acquire(n);
    bpf_rbtree_add(&other_tree, &m->r, /* ... */);
    bpf_spin_unlock(&glock);

    /* ... */
  }
After bpf_refcount_acquire, n and m point to the same underlying memory,
and that node's bpf_rb_node field is being used by the some_tree insert,
so overwriting it as a result of the second insert is an error. In order
to properly support refcounted nodes, the rbtree and list insert
functions must be allowed to fail. This patch adds such support.
The kfuncs bpf_rbtree_add, bpf_list_push_{front,back} are modified to
return an int indicating success/failure, with 0 -> success, nonzero ->
failure.
bpf_obj_drop on failure
=======================
Currently the only reason an insert can fail is the example above: the
bpf_{list,rb}_node is already in use. When such a failure occurs, the
insert kfuncs will bpf_obj_drop the input node. This allows the insert
operations to logically fail without changing their verifier owning ref
behavior, namely the unconditional release_reference of the input
owning ref.
With insert that always succeeds, ownership of the node is always passed
to the collection, since the node always ends up in the collection.
With a possibly-failed insert w/ bpf_obj_drop, ownership of the node
is always passed either to the collection (success), or to bpf_obj_drop
(failure). Regardless, it's correct to continue unconditionally
releasing the input owning ref, as something is always taking ownership
from the calling program on insert.
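The ownership rule described here (a successful insert hands the reference to the collection, a failed insert drops it) can be modelled in plain userspace C. This is only a sketch of the semantics, not the kfunc implementation: struct node, node_new(), node_acquire(), node_drop() and tree_add() are hypothetical, and a boolean flag stands in for "the bpf_rb_node is already in use".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted node; in_tree models "rb node already in use". */
struct node {
	int refcount;
	bool in_tree;
	long key;
};

struct node *node_new(long key)
{
	struct node *n = calloc(1, sizeof(*n));

	if (n) {
		n->refcount = 1;
		n->key = key;
	}
	return n;
}

/* Take an additional owning reference to the same underlying memory. */
struct node *node_acquire(struct node *n)
{
	n->refcount++;
	return n;
}

/* Drop one owning reference; returns true if the node was freed. */
bool node_drop(struct node *n)
{
	assert(n->refcount > 0);
	if (--n->refcount == 0) {
		free(n);
		return true;
	}
	return false;
}

/*
 * Insert that can fail. Either way the caller's reference is consumed:
 * on success the collection owns it, on failure it is dropped here.
 * Returns 0 on success, nonzero on failure.
 */
int tree_add(struct node *n)
{
	if (n->in_tree) {
		node_drop(n);
		return -1;
	}
	n->in_tree = true;
	return 0;
}
```

Because the reference is consumed on both paths, a caller that does not care about failure can simply ignore the return value, which is the default UX the following paragraphs describe.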
Keeping owning ref behavior unchanged results in a nice default UX for
insert functions that can fail. If the program's reaction to a failed
insert is "fine, just get rid of this owning ref for me and let me go
on with my business", then there's no reason to check for failure since
that's default behavior. e.g.:
  long important_failures = 0;

  int some_bpf_prog(void *ctx)
  {
    struct node *n, *m, *o; /* all bpf_obj_new'd */

    bpf_spin_lock(&glock);
    bpf_rbtree_add(&some_tree, &n->node, /* ... */);
    bpf_rbtree_add(&some_tree, &m->node, /* ... */);
    if (bpf_rbtree_add(&some_tree, &o->node, /* ... */)) {
      important_failures++;
    }
    bpf_spin_unlock(&glock);
  }
If we instead chose to pass ownership back to the program on failed
insert - by returning NULL on success or an owning ref on failure -
programs would always have to do something with the returned ref on
failure. The most likely action is probably "I'll just get rid of this
owning ref and go about my business", which ideally would look like:
if (n = bpf_rbtree_add(&some_tree, &n->node, /* ... */))
bpf_obj_drop(n);
But bpf_obj_drop isn't allowed in a critical section and inserts must
occur within one, so in reality error handling would become a
hard-to-parse mess.
For refcounted nodes, we can replicate the "pass ownership back to
program on failure" logic with this patch's semantics, albeit in an ugly
way:
struct node *n = bpf_obj_new(/* ... */), *m;
bpf_spin_lock(&glock);
m = bpf_refcount_acquire(n);
if (bpf_rbtree_add(&some_tree, &n->node, /* ... */)) {
/* Do something with m */
}
bpf_spin_unlock(&glock);
bpf_obj_drop(m);
bpf_refcount_acquire is used to simulate "return owning ref on failure".
This should be an uncommon occurrence, though.
Addition of two verifier-fixup'd args to collection inserts
===========================================================
The actual bpf_obj_drop kfunc is
bpf_obj_drop_impl(void *, struct btf_struct_meta *), with bpf_obj_drop
macro populating the second arg with 0 and the verifier later filling in
the arg during insn fixup.
Because bpf_rbtree_add and bpf_list_push_{front,back} now might do
bpf_obj_drop, these kfuncs need a btf_struct_meta parameter that can be
passed to bpf_obj_drop_impl.
Similarly, because the 'node' param to those insert functions is the
bpf_{list,rb}_node within the node type, and bpf_obj_drop expects a
pointer to the beginning of the node, the insert functions need to be
able to find the beginning of the node struct. A second
verifier-populated param is necessary: the offset of {list,rb}_node within the
node type.
These two new params allow the insert kfuncs to correctly call
__bpf_obj_drop_impl:
beginning_of_node = bpf_rb_node_ptr - offset
if (already_inserted)
__bpf_obj_drop_impl(beginning_of_node, btf_struct_meta->record);
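The pointer arithmetic above is the familiar "container of" computation. A
minimal userspace sketch, assuming a hypothetical struct node_data and a
stand-in struct rb_node_stub (neither is the kernel type, and the offset is
supplied by the caller here rather than by the verifier):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct bpf_rb_node; hypothetical, for illustration only. */
struct rb_node_stub {
	void *opaque[3];
};

/* Hypothetical program-defined node type with the rb node embedded mid-struct. */
struct node_data {
	long key;
	struct rb_node_stub node;
};

/*
 * Recover the start of the enclosing object from a pointer to the embedded
 * field, given the field's offset within the object.
 */
static inline void *beginning_of_node(void *embedded, size_t off)
{
	return (char *)embedded - off;
}

/* Round-trip check: the embedded pointer minus the offset is the object. */
static inline void check_roundtrip(void)
{
	struct node_data d = { .key = 7 };
	struct node_data *p;

	p = beginning_of_node(&d.node, offsetof(struct node_data, node));
	assert(p == &d);
	assert(p->key == 7);
}
```

The offset has to be passed in because the insert function only sees a
pointer to the embedded field and cannot know which enclosing type it was
given.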
Similarly to other kfuncs with "hidden" verifier-populated params, the
insert functions are renamed with an _impl suffix, and a macro is provided
for common usage. For example, bpf_rbtree_add kfunc is now
bpf_rbtree_add_impl and bpf_rbtree_add is now a macro which sets
"hidden" args to 0.
Due to the two new args BPF progs will need to be recompiled to work
with the new _impl kfuncs.
This patch also rewrites the "hidden argument" explanation to more
directly say why the BPF program writer doesn't need to populate the
arguments with anything meaningful.
How does this new logic affect non-owning references?
=====================================================
Currently, non-owning refs are valid until the end of the critical
section in which they're created. We can make this guarantee because, if
a non-owning ref exists, the referent was added to some collection. The
collection will drop() its nodes when it goes away, but it can't go away
while our program is accessing it, so that's not a problem. If the
referent is removed from the collection in the same CS that it was added
in, it can't be bpf_obj_drop'd until after CS end. Those are the only
two ways to free the referent's memory and neither can happen until
after the non-owning ref's lifetime ends.
On first glance, having these collection insert functions potentially
bpf_obj_drop their input seems like it breaks the "can't be
bpf_obj_drop'd until after CS end" line of reasoning. But we care about
the memory not being _freed_ until CS end, and a previous patch
in the series modified bpf_obj_drop such that it doesn't free refcounted
nodes until refcount == 0. So the statement can be more accurately
rewritten as "can't be free'd until after CS end".
We can prove that this rewritten statement holds for any non-owning
reference produced by collection insert functions:
* If the input to the insert function is _not_ refcounted
* We have an owning reference to the input, and can conclude it isn't
in any collection
* Inserting a node in a collection turns owning refs into
non-owning, and since our input type isn't refcounted, there's no
way to obtain additional owning refs to the same underlying
memory
* Because our node isn't in any collection, the insert operation
cannot fail, so bpf_obj_drop will not execute
* If bpf_obj_drop is guaranteed not to execute, there's no risk of
memory being free'd
* Otherwise, the input to the insert function is refcounted
* If the insert operation fails due to the node's bpf_list_node or bpf_rb_node
already being in some collection, there was some previous successful
insert which passed refcount to the collection
* We have an owning reference to the input; it must have been
acquired via bpf_refcount_acquire, which bumped the refcount
* refcount must be >= 2 since there's a valid owning reference and the
node is already in a collection
* Insert triggering bpf_obj_drop will decr refcount to >= 1, never
resulting in a free
So although we may do bpf_obj_drop during the critical section, this
will never result in memory being free'd, and no changes to non-owning
ref logic are needed in this patch.
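The refcounted case of the argument above can be checked with a small
userspace model. This is only a sketch: struct model_node and the model_*
helpers are hypothetical stand-ins for the kfuncs, and "free" is modelled as
the refcount reaching 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a refcounted node; not kernel code. */
struct model_node {
	int refcount;	/* 0 means the memory would have been freed */
	bool in_tree;	/* models "bpf_rb_node already in use" */
};

/* Models bpf_obj_new: fresh object, one owning ref. */
static inline void model_new(struct model_node *n)
{
	n->refcount = 1;
	n->in_tree = false;
}

/* Models bpf_refcount_acquire: a new owning ref to the same memory. */
static inline void model_acquire(struct model_node *n)
{
	n->refcount++;
}

/* Models bpf_obj_drop: release one owning ref. */
static inline void model_drop(struct model_node *n)
{
	assert(n->refcount > 0);
	n->refcount--;
}

/*
 * Models the fallible insert. On success the collection takes over the
 * caller's owning ref; on failure that ref is dropped. Either way the
 * caller no longer owns it. Returns 0 on success, nonzero on failure.
 */
static inline int model_insert(struct model_node *n)
{
	if (n->in_tree) {
		model_drop(n);
		return -1;
	}
	n->in_tree = true;
	return 0;
}
```

In this model a failing insert requires in_tree to be set, so the collection
already holds one ref and the caller holds another. The drop therefore leaves
the refcount at 1 or more.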
Signed-off-by: Dave Marchevsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Currently, BPF programs can interact with the lifetime of refcounted
local kptrs in the following ways:
bpf_obj_new - Initialize refcount to 1 as part of new object creation
bpf_obj_drop - Decrement refcount and free object if it's 0
collection add - Pass ownership to the collection. No change to
refcount but collection is responsible for
bpf_obj_dropping it
In order to be able to add a refcounted local kptr to multiple
collections we need to be able to increment the refcount and acquire a
new owning reference. This patch adds a kfunc, bpf_refcount_acquire,
implementing such an operation.
bpf_refcount_acquire takes a refcounted local kptr and returns a new
owning reference to the same underlying memory as the input. The input
can be either owning or non-owning. To reinforce why this is safe,
consider the following code snippets:
struct node *n = bpf_obj_new(typeof(*n)); // A
struct node *m = bpf_refcount_acquire(n); // B
In the above snippet, n will be alive with refcount=1 after (A), and
since nothing changes that state before (B), it's obviously safe. If
n is instead added to some rbtree, we can still safely refcount_acquire
it:
struct node *n = bpf_obj_new(typeof(*n));
struct node *m;
bpf_spin_lock(&glock);
bpf_rbtree_add(&groot, &n->node, less); // A
m = bpf_refcount_acquire(n); // B
bpf_spin_unlock(&glock);
In the above snippet, after (A) n is a non-owning reference, and after
(B) m is an owning reference pointing to the same memory as n. Although
n has no ownership of that memory's lifetime, it's guaranteed to be
alive until the end of the critical section, and n would be clobbered if
we were past the end of the critical section, so it's safe to bump
refcount.
Implementation details:
* From verifier's perspective, bpf_refcount_acquire handling is similar
to bpf_obj_new and bpf_obj_drop. Like the former, it returns a new
owning reference matching input type, although like the latter, type
can be inferred from concrete kptr input. Verifier changes in
{check,fixup}_kfunc_call and check_kfunc_args are largely copied from
aforementioned functions' verifier changes.
* An exception to the above is the new KF_ARG_PTR_TO_REFCOUNTED_KPTR
arg, indicated by new "__refcounted_kptr" kfunc arg suffix. This is
necessary in order to handle both owning and non-owning input without
adding special-casing to "__alloc" arg handling. Also a convenient
place to confirm that input type has bpf_refcount field.
* The implemented kfunc is actually bpf_refcount_acquire_impl, with
'hidden' second arg that the verifier sets to the type's struct_meta
in fixup_kfunc_call.
Signed-off-by: Dave Marchevsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add testing for the event "Instructions retired" (0xc0) in the PMU
event filter on both Intel and AMD to ensure that the event doesn't
count when it is disallowed. Unlike most of the other events, the
event "Instructions retired" will be incremented by KVM when an
instruction is emulated. Test that this case is being properly handled
and that KVM doesn't increment the counter when that event is
disallowed.
Signed-off-by: Aaron Lewis <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Use a single struct to track all PMC event counts in the PMU filter test,
and copy the full struct to/from the guest when running and measuring each
guest workload. Using a common struct avoids naming conflicts, e.g. the
loads/stores testcase has claimed "perf_counter", and eliminates the
unnecessary truncation of the counter values when they are propagated from
the guest MSRs to the host structs.
Zero the struct before running the guest workload to ensure that the test
doesn't get a false pass due to consuming data from a previous run.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Aaron Lewis <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Use '0' to signal success and '-errno' to signal failure in the PMU event
filter test so that the values are slightly less magical/arbitrary. Using
'0' in the error paths is especially confusing as understanding it's an
error value requires following the breadcrumbs to the host code that
ultimately consumes the value.
Arguably there should also be a #define for "success", but 0/-errno is a
common enough pattern that defining another macro on top would likely do
more harm than good.
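As a sketch of the 0/-errno convention described above, assuming a
hypothetical check helper (the function name and the choice of EIO are
illustrative, not taken from the test):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical sanity check: a counter is expected to advance across a
 * workload. Returns 0 on success and a negative errno on failure, so the
 * consumer never has to guess whether a value means "error".
 */
static inline int check_counter_advanced(uint64_t before, uint64_t after)
{
	if (after <= before)
		return -EIO;	/* failure: negative errno */
	return 0;		/* success */
}
```

A caller can then treat any nonzero return as a failure without consulting
the producer's code.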
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Aaron Lewis <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Provide the actual vs. expected count in the PMU event filter test's
asserts instead of relying on pr_info() to provide the context, e.g. so
that all information needed to triage a failure is readily available even
if the environment in which the test is run captures only the assert
itself.
Signed-off-by: Aaron Lewis <[email protected]>
[sean: rewrite changelog]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Add helper macros to consolidate the asserts that a PMC is/isn't counting
(branch) instructions retired. This will make it easier to add additional
asserts related to counting instructions later on.
No functional changes intended.
Signed-off-by: Aaron Lewis <[email protected]>
[sean: add "INSTRUCTIONS", massage changelog]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Split out the common parts of the Intel and AMD guest code in the PMU
event filter test into a helper function. This is in preparation for
adding additional counters to the test.
No functional changes intended.
Signed-off-by: Aaron Lewis <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
An error snuck in between two recent conflicting changes:
Until recently ->setup() used negative values to indicate
normal test termination. This was changed in
commit fa10366cc6f4 ("selftests/resctrl: Allow ->setup() to return
errors") that transitioned ->setup() to use negative values
to indicate errors and a new END_OF_TESTS to indicate normal
termination.
commit 42e3b093eb7c ("selftests/resctrl: Fix set up schemata with 100%
allocation on first run in MBM test") continued to use
negative return to indicate normal test termination.
Fix mbm_setup() to use the new END_OF_TESTS to indicate
error-free test termination.
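The distinction between the two return conventions can be sketched as
follows. The demo_* helpers are hypothetical and the value given to
END_OF_TESTS here is an assumption for illustration; only the idea (negative
means error, a dedicated sentinel means normal termination) comes from the
description above.

```c
#include <assert.h>
#include <errno.h>

/* Assumed positive sentinel; the value is illustrative. */
#define END_OF_TESTS	1

/* Hypothetical setup callback: <0 on error, END_OF_TESTS when done, else 0. */
static inline int demo_setup(int run, int max_runs)
{
	if (run < 0)
		return -EINVAL;
	if (run >= max_runs)
		return END_OF_TESTS;
	return 0;
}

/* Hypothetical driver loop: returns the number of runs or a negative errno. */
static inline int demo_run_all(int max_runs)
{
	int run = 0;

	for (;;) {
		int ret = demo_setup(run, max_runs);

		if (ret == END_OF_TESTS)
			return run;	/* normal termination */
		if (ret < 0)
			return ret;	/* real error, propagate */
		run++;
	}
}
```

A setup callback that still returned a negative value for normal termination
would be reported by such a loop as an error, which is the bug being fixed.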
Fixes: 42e3b093eb7c ("selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test")
Reported-by: Ilpo Järvinen <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
There is a spelling mistake in a test report message. Fix it.
Signed-off-by: Colin Ian King <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Some distros ship with older vm_sockets.h that doesn't have VMADDR_CID_LOCAL
which causes selftests build to fail:
/tmp/work/bpf/bpf/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c:261:18: error: ‘VMADDR_CID_LOCAL’ undeclared (first use in this function); did you mean ‘VMADDR_CID_HOST’?
261 | addr->svm_cid = VMADDR_CID_LOCAL;
| ^~~~~~~~~~~~~~~~
| VMADDR_CID_HOST
Work around this issue by defining it on demand.
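A define-on-demand fallback of this kind typically looks like the sketch
below. The value 1 is the one used by current uapi headers; the local_cid()
helper is hypothetical, and the header is included conditionally only so the
sketch also builds where it is absent.

```c
#include <assert.h>
#include <sys/socket.h>

/* Pull in the real header when it exists; older copies lack the constant. */
#if defined(__has_include)
# if __has_include(<linux/vm_sockets.h>)
#  include <linux/vm_sockets.h>
# endif
#endif

/* Fallback for older headers that do not define it. */
#ifndef VMADDR_CID_LOCAL
#define VMADDR_CID_LOCAL 1U
#endif

/* Hypothetical helper so the constant can be checked at run time. */
static inline unsigned int local_cid(void)
{
	return VMADDR_CID_LOCAL;
}
```

Because the definition is guarded by #ifndef, newer headers that already
provide the constant are left untouched.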
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Fix merge conflict between bpf/bpf-next trees due to change of arguments in SYS() macro.
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
====================
pull-request: bpf-next 2023-04-13
We've added 260 non-merge commits during the last 36 day(s) which contain
a total of 356 files changed, 21786 insertions(+), 11275 deletions(-).
The main changes are:
1) Rework BPF verifier log behavior and implement it as a rotating log
by default with the option to retain old-style fixed log behavior,
from Andrii Nakryiko.
2) Adds support for using {FOU,GUE} encap with an ipip device operating
in collect_md mode and add a set of BPF kfuncs for controlling encap
params, from Christian Ehrig.
3) Allow BPF programs to detect at load time whether a particular kfunc
exists or not, and also add support for this in light skeleton,
from Alexei Starovoitov.
4) Optimize hashmap lookups when key size is multiple of 4,
from Anton Protopopov.
5) Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps, from David Vernet.
6) Add support for stashing local BPF kptr into a map value via
bpf_kptr_xchg(). This is useful e.g. for rbtree node creation
for new cgroups, from Dave Marchevsky.
7) Fix BTF handling of is_int_ptr to skip modifiers to work around
tracing issues where a program cannot be attached, from Feng Zhou.
8) Migrate a big portion of test_verifier unit tests over to
test_progs -a verifier_* via inline asm to ease {read,debug}ability,
from Eduard Zingerman.
9) Several updates to the instruction-set.rst documentation
which is subject to future IETF standardization
(https://lwn.net/Articles/926882/), from Dave Thaler.
10) Fix BPF verifier in the __reg_bound_offset's 64->32 tnum sub-register
known bits information propagation, from Daniel Borkmann.
11) Add skb bitfield compaction work related to BPF with the overall goal
to make more of the sk_buff bits optional, from Jakub Kicinski.
12) BPF selftest cleanups for build id extraction which stand on their own
from the upcoming integration work of build id into struct file object,
from Jiri Olsa.
13) Add fixes and optimizations for xsk descriptor validation and several
selftest improvements for xsk sockets, from Kal Conley.
14) Add BPF links for struct_ops and enable switching implementations
of BPF TCP cong-ctls under a given name by replacing backing
struct_ops map, from Kui-Feng Lee.
15) Remove a misleading BPF verifier env->bypass_spec_v1 check on variable
offset stack read as earlier Spectre checks cover this,
from Luis Gerhorst.
16) Fix issues in copy_from_user_nofault() for BPF and other tracers
to resemble copy_from_user_nmi() from safety PoV, from Florian Lehner
and Alexei Starovoitov.
17) Add --json-summary option to test_progs in order for CI tooling to
ease parsing of test results, from Manu Bretelle.
18) Batch of improvements and refactoring to prep for upcoming
bpf_local_storage conversion to bpf_mem_cache_{alloc,free} allocator,
from Martin KaFai Lau.
19) Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations,
from Quentin Monnet.
20) Fix attaching fentry/fexit/fmod_ret/lsm to modules by extracting
the module name from BTF of the target and searching kallsyms of
the correct module, from Viktor Malik.
21) Improve BPF verifier handling of '<const> <cond> <non_const>'
to better detect whether in particular jmp32 branches are taken,
from Yonghong Song.
22) Allow BPF TCP cong-ctls to write app_limited of struct tcp_sock.
A built-in cc or one from a kernel module is already able to write
to app_limited, from Yixin Shen.
Conflicts:
Documentation/bpf/bpf_devel_QA.rst
b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
0f10f647f455 ("bpf, docs: Use internal linking for link to netdev subsystem doc")
https://lore.kernel.org/all/[email protected]/
include/net/ip_tunnels.h
bc9d003dc48c3 ("ip_tunnel: Preserve pointer const in ip_tunnel_info_opts")
ac931d4cdec3d ("ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices")
https://lore.kernel.org/all/[email protected]/
net/bpf/test_run.c
e5995bc7e2ba ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption")
294635a8165a ("bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES")
https://lore.kernel.org/all/[email protected]/
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Conflicts:
tools/testing/selftests/net/config
62199e3f1658 ("selftests: net: Add VXLAN MDB test")
3a0385be133e ("selftests: add the missing CONFIG_IP_SCTP in net config")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, and bluetooth.
Not all that quiet given spring celebrations, but "current" fixes are
thinning out, which is encouraging. One outstanding regression in the
mlx5 driver when using old FW, not blocking but we're pushing for a
fix.
Current release - new code bugs:
- eth: enetc: workaround for unresponsive pMAC after receiving
express traffic
Previous releases - regressions:
- rtnetlink: restore RTM_NEW/DELLINK notification behavior, keep the
pid/seq fields 0 for backward compatibility
Previous releases - always broken:
- sctp: fix a potential overflow in sctp_ifwdtsn_skip
- mptcp:
- use mptcp_schedule_work instead of open-coding it and make the
worker check stricter, to avoid scheduling work on closed
sockets
- fix NULL pointer dereference on fastopen early fallback
- skbuff: fix memory corruption due to a race between skb coalescing
and releasing clones confusing page_pool reference counting
- bonding: fix neighbor solicitation validation on backup slaves
- bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp
- bpf: arm64: fixed a BTI error on returning to patched function
- openvswitch: fix race on port output leading to inf loop
- sfp: initialize sfp->i2c_block_size at sfp allocation to avoid
returning a different errno than expected
- phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove
- Bluetooth: fix printing errors if LE Connection times out
- Bluetooth: assorted UaF, deadlock and data race fixes
- eth: macb: fix memory corruption in extended buffer descriptor mode
Misc:
- adjust the XDP Rx flow hash API to also include the protocol layers
over which the hash was computed"
* tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
xdp: rss hash types representation
selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
skbuff: Fix a race between coalescing and releasing SKBs
net: macb: fix a memory corruption in extended buffer descriptor mode
selftests: add the missing CONFIG_IP_SCTP in net config
udp6: fix potential access to stale information
selftests: openvswitch: adjust datapath NL message declaration
selftests: mptcp: userspace pm: uniform verify events
mptcp: fix NULL pointer dereference on fastopen early fallback
mptcp: stricter state check in mptcp_worker
mptcp: use mptcp_schedule_work instead of open-coding it
net: enetc: workaround for unresponsive pMAC after receiving express traffic
sctp: fix a potential overflow in sctp_ifwdtsn_skip
net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
rtnetlink: Restore RTM_NEW/DELLINK notification behavior
net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes
...
|
|
Update BPF selftests to use the new RSS type argument for kfunc
bpf_xdp_metadata_rx_hash.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/r/168132894068.340624.8914711185697163690.stgit@firesoul
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The tool xdp_hw_metadata can be used by driver developers
implementing XDP-hints metadata kfuncs.
Remove all bpf_printk calls, as the tool already transfers all the
XDP-hints related information via metadata area to AF_XDP
userspace process.
Add counters for providing remaining information about failure and
skipped packet events.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/r/168132891533.340624.7313781245316405141.stgit@firesoul
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Before exiting each test function (run_cmt/cat/mbm/mba_test()), the
test results ("ok", "not ok") are printed by ksft_test_result() and then
the temporary result files are cleaned up by
cmt/cat/mbm/mba_test_cleanup().
However, before ksft_test_result() runs,
cmt/cat/mbm/mba_test_cleanup() has already been run
in each of the following functions:
cmt_resctrl_val()
cat_perf_miss_val()
mba_schemata_change()
mbm_bw_change()
Remove the duplicate code that clears each test result file, while
ensuring that cleanup still happens properly even when a test hits an error.
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: Shaopeng Tan <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
After creating a child process with fork() in CAT test, if a signal such
as SIGINT is received, the parent process will be terminated immediately,
and therefore the child process will not be killed and also resctrlfs is
not unmounted.
There is a signal handler registered in the CMT/MBM/MBA tests, which kills
the child process, unmounts resctrlfs, cleans up result files, etc., if a
signal such as SIGINT is received.
Commonize the signal handler registered for CMT/MBM/MBA tests and
reuse it in CAT.
To let the shared signal handler kill the child process, use a global
bm_pid instead of a local bm_pid.
Also, since the MBM/MBA/CMT/CAT tests are run in order, unregister the signal
handler at the end of each test so that the signal handler cannot be
inherited by other tests.
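The register/unregister pairing can be sketched with sigaction() as below.
The demo_* names are hypothetical and the handler only sets a flag; the
handler described above does its cleanup work instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t got_signal;

static void demo_handler(int signum)
{
	(void)signum;
	got_signal = 1;
}

/* Install the shared handler for one signal. Returns 0 on success. */
static inline int demo_signal_register(int signum)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = demo_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(signum, &sa, NULL);
}

/* Restore the default disposition so later tests do not inherit the handler. */
static inline int demo_signal_unregister(int signum)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = SIG_DFL;
	sigemptyset(&sa.sa_mask);
	return sigaction(signum, &sa, NULL);
}
```

Calling the unregister helper at the end of each test keeps a handler that
refers to one test's state from firing during the next test.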
Reviewed-by: Ilpo Jarvinen <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: Shaopeng Tan <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
After creating a child process with fork() in CAT test, if an error
occurs when parent process runs cat_val() or check_results(), the child
process will not be killed and also resctrlfs is not unmounted. Also if
an error occurs when child process runs cat_val() or check_results(),
the parent process will wait for the pipe message from the child process
which will never be sent by the child process and the parent process
cannot proceed to unmount resctrlfs.
Synchronize the exits between the parent and child. An error can
occur in either the parent process or the child process. The parent process
always kills the child process and runs umount_resctrlfs(). The
child process always waits to be killed by the parent process.
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: Shaopeng Tan <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
When a process has buffered output, a child process created by fork()
inherits a copy of that buffered output. When using the kselftest framework,
the output (resctrl test result message) is then printed multiple times.
Add fflush() to flush out the buffered output before executing fork().
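A minimal sketch of the flush-before-fork pattern, assuming a hypothetical
helper that forks a child which exits immediately with a given status:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Flush all stdio output streams, then fork, so the child starts with empty
 * buffers and cannot re-emit text the parent had queued. Returns the child's
 * exit status, or -1 on error.
 */
static inline int run_child_after_flush(int child_status)
{
	int status;
	pid_t pid;

	fflush(NULL);		/* NULL flushes every open output stream */
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(child_status);	/* child: exit without touching stdio */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the fflush() call, text still sitting in the parent's stdout buffer
at fork() time would be written once by each process when it exits normally.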
Reviewed-by: Shuah Khan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: Shaopeng Tan <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
Since MBA check result is not returned, the MBA test result message
is always output as "ok" regardless of whether the MBA check result is
true or false.
Make the output message "not ok" if the MBA check fails.
Reviewed-by: Shuah Khan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: Shaopeng Tan <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
There is a comment "Set up shemata with 100% allocation on the first run"
in function mbm_setup(), but there is an increment bug and the condition
"num_of_runs == 0" will never be met and write_schemata() will never be
called to set schemata to 100%. Even if write_schemata() is called in MBM
test, since it is not supported for MBM test it does not set the schemata.
This is currently fine because resctrl_val_param->mum_resctrlfs is always 1
and umount/mount will be run in each test to set the schemata to 100%.
To support the usage when MBM test does not unmount/remount resctrl
filesystem before the test starts, fix to call write_schemata() and
set schemata properly when the function is called for the first time.
Also, remove static local variable 'num_of_runs' because this is not
needed as there is resctrl_val_param->num_of_runs which should be used
instead like in cat_setup().
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: Shaopeng Tan <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
Use ksft_finished() after running tests so that resctrl_tests doesn't
return exit code 0 when tests fail.
Consequently, report the MBA and MBM tests as skipped when running on
non-Intel hardware, otherwise resctrl_tests will exit with a failure
code.
Signed-off-by: Peter Newman <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>
|
|
The selftest sctp_vrf needs CONFIG_IP_SCTP set in config
when building the kernel, so add it.
Fixes: a61bd7b9fef3 ("selftests: add a selftest for sctp vrf")
Reported-by: Naresh Kamboju <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Sridhar Samudrala <[email protected]>
Link: https://lore.kernel.org/r/61dddebc4d2dd98fe7fb145e24d4b2430e42b572.1681312386.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The netlink message for creating a new datapath takes an array
of ports for the PID creation. This shouldn't cause much issue
but correct it for future cases where we need to decode
datapath information that could include the per-cpu PID map.
Fixes: 25f16c873fb1 ("selftests: add openvswitch selftest suite")
Signed-off-by: Aaron Conole <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Simply adding a "sleep" before checking something is usually not a good
idea because the time that has been picked may be too short or too
long. It is best to wait for events with a timeout.
In this selftest, 'sleep 0.5' is used more than 40 times. It is always
used before calling a 'verify_*' function except for this
verify_listener_events which has been added later.
In the end, using all these 'sleep 0.5' seems to work: the slow CIs
don't complain so far. Also, because it doesn't take too much time, we
can just add two more 'sleep 0.5' to make uniform what is done before calling
a 'verify_*' function. For the same reasons, we can also delay a bigger
refactoring to replace all these 'sleep 0.5' by functions waiting for
events instead of waiting for a fixed time and hoping for the best.
Fixes: 6c73008aa301 ("selftests: mptcp: listener test for userspace PM")
Cc: [email protected]
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|