blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message	Author	Files	Lines
2023-04-18	selftests/mm: move uffd pagemap test to unit test	Peter Xu	2	-166/+145
	Move it over and make it split into two tests, one for pagemap and one for the new WP_UNPOPULATED (to be a separate one). The thp pagemap test wasn't really working (with MADV_HUGEPAGE). Let's just drop it (since it never really worked anyway..) and leave that for later. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: add framework for uffd-unit-test	Peter Xu	3	-0/+163
	Add a framework in preparation for moving unit tests from uffd-stress.c into uffd-unit-tests.c. The goal is to allow detection of uffd features for each test, and also to loop over the specified types of memory that a test supports. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
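The framework idea described in this commit (a per-test feature check plus a loop over the memory types each test supports) can be illustrated with a small table-driven runner. This is only a sketch under stated assumptions: every `sk_`/`SK_` name, the feature bit values, and the example tests are hypothetical and are not taken from uffd-unit-tests.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical memory-type bits; the real test defines its own set. */
enum {
	SK_MEM_ANON    = 1u << 0,
	SK_MEM_SHMEM   = 1u << 1,
	SK_MEM_HUGETLB = 1u << 2,
};

struct sk_test {
	const char *name;
	unsigned int mem_types;      /* memory types this test supports */
	uint64_t required_features;  /* feature bits this test depends on */
	bool (*fn)(unsigned int mem_type);
};

struct sk_summary {
	int pass, skip, fail;
};

static const struct {
	unsigned int bit;
	const char *name;
} sk_mem_names[] = {
	{ SK_MEM_ANON, "anon" },
	{ SK_MEM_SHMEM, "shmem" },
	{ SK_MEM_HUGETLB, "hugetlb" },
};

/* Run every test once per supported memory type, skipping on missing features. */
static void sk_run_all(const struct sk_test *tests, size_t n,
		       uint64_t available, struct sk_summary *sum)
{
	for (size_t i = 0; i < n; i++) {
		for (size_t m = 0; m < sizeof(sk_mem_names) / sizeof(sk_mem_names[0]); m++) {
			if (!(tests[i].mem_types & sk_mem_names[m].bit))
				continue;
			printf("Testing %s on %s... ", tests[i].name, sk_mem_names[m].name);
			if ((tests[i].required_features & available) != tests[i].required_features) {
				printf("skipped [reason: feature missing]\n");
				sum->skip++;
			} else if (tests[i].fn(sk_mem_names[m].bit)) {
				printf("done\n");
				sum->pass++;
			} else {
				printf("failed\n");
				sum->fail++;
			}
		}
	}
}

/* Two placeholder test bodies, for illustration only. */
static bool sk_always_pass(unsigned int mem_type)
{
	(void)mem_type;
	return true;
}

static bool sk_fail_on_hugetlb(unsigned int mem_type)
{
	return mem_type != SK_MEM_HUGETLB;
}

static const struct sk_test sk_example_tests[] = {
	{ "always-pass", SK_MEM_ANON | SK_MEM_SHMEM | SK_MEM_HUGETLB, 0, sk_always_pass },
	{ "needs-feature", SK_MEM_ANON, 0x2, sk_always_pass },
	{ "no-hugetlb", SK_MEM_SHMEM | SK_MEM_HUGETLB, 0x1, sk_fail_on_hugetlb },
};
```

A caller would pass the feature mask reported by the running kernel as `available`, for example `sk_run_all(sk_example_tests, 3, 0x1, &sum)`, and print the counters from `sum` afterwards.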
2023-04-18	selftests/mm: allow allocate_area() to fail properly	Peter Xu	2	-15/+36
	Mostly to detect hugetlb allocation errors and skip hugetlb tests when pages are not allocated. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: let uffd_handle_page_fault() take wp parameter	Peter Xu	3	-16/+23
	Make the handler optionally apply WP bit when resolving page faults for either missing or minor page faults. This moves towards removing global test_uffdio_wp outside of the common code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: rename uffd_stats to uffd_args	Peter Xu	3	-43/+42
	Prepare for adding more fields into the struct. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Suggested-by: Mike Rapoport (IBM) <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: drop global hpage_size in uffd tests	Peter Xu	3	-7/+8
	hpage_size was wrongly used. Sometimes it means hugetlb default size, sometimes it was used as thp size. Remove the global variable and use the right one at each place. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: drop global mem_fd in uffd tests	Peter Xu	3	-17/+28
	Drop it by creating the memfd dynamically in the tests. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: UFFDIO_API test	Peter Xu	1	-1/+108
	Add one simple test for UFFDIO_API. With that, I also added a bunch of small but handy helpers along the way. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: uffd_open_{dev\|sys}()	Peter Xu	3	-23/+31
	Provide two helpers to open an uffd handle. Drop the error checks around SKIPs because it's inside an errexit() anyway, which IMHO doesn't really help much if the test will not continue. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: uffd_[un]register()	Peter Xu	7	-104/+64
	Add two helpers to register/unregister to an uffd. Use them to drop duplicated code. This patch also drops assert_expected_ioctls_present() and get_expected_ioctls(). Reasons:
	  - It would take a lot of effort to pass test_type==HUGETLB into it from the upper layers, so dropping it is the simplest way to get rid of another global var.
	  - The ioctls returned in UFFDIO_REGISTER are hardly useful at all, because any app can already detect kernel support for any ioctl via its corresponding UFFD_FEATURE_*. The check here is mostly for sanity, but it's probably destined that no user app will ever use it.
	  - It's not friendly to one future goal of uffd, which is to run on old kernels. The problem is that get_expected_ioctls() compiles against UFFD_API_RANGE_IOCTLS, which is a value that can change depending on where the test is compiled, rather than reflecting what the kernel underneath has. It means it'll report false negatives on old kernels, which is against our will.
	So let's make our lives easier. [[email protected]; tools/testing/selftests/mm/hugepage-mremap.c: add headers] Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: split uffd tests into uffd-stress and uffd-unit-tests	Peter Xu	5	-8/+40
	In many ways it's weird and unwanted to keep all the tests in the same userfaultfd.c, at least in its current form. For example, it doesn't make much sense to run the stress test for each method by which we can create a userfaultfd handle (either via syscall or /dev/ node). It's a waste of time running the whole stress test twice, as the stress paths are the same and only the open path is different. It's also just weird to need to manually specify different types of memory to run all unit tests for the userfaultfd interface. We should be able to just run a single program that goes through all functional uffd tests without running the stress test at all. The stress test was more for torturing and finding race conditions. We don't want to wait for the stress test to finish just to regression-test functional behavior. When we start to pile up more things on top of the same file and the same functions, things start to get a bit chaotic, and the code is also harder to maintain with tons of global variables. This patch creates a new test, uffd-unit-tests, to hold userfaultfd unit tests in the future; it is currently empty. Meanwhile, rename the old userfaultfd.c test to uffd-stress.c. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: create uffd-common.[ch]	Peter Xu	4	-693/+731
	Move common utility functions into uffd-common.[ch] files from the original userfaultfd.c. This prepares for a split of userfaultfd.c into two tests: one to only cover the old but powerful stress test, the other one covers all the functional tests. This movement is kind of a brute-force effort for now, with light touch-ups but nothing should really change. There's chances to optimize more, but let's leave that for later. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: drop test_uffdio_zeropage_eexist	Peter Xu	1	-9/+11
	The idea was to flip this var in the alarm handler from time to time to test -EEXIST of UFFDIO_ZEROPAGE, but firstly it's only used in the zeropage test, so it was probably only used once; meanwhile we passed "retry==false", so it never got tested anyway. Drop both sides so we always test UFFDIO_ZEROPAGE retries if has_zeropage is set (!hugetlb). One more thing to do is UFFDIO_REGISTER for the alias buffer too, because otherwise the test won't even pass! We were just lucky that this test never really got run at all. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: test UFFDIO_ZEROPAGE only when !hugetlb	Peter Xu	1	-1/+1
	Make the check as simple as "test_type == TEST_HUGETLB" because that's the only mem that doesn't support ZEROPAGE. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: reuse pagemap_get_entry() in vm_util.h	Peter Xu	1	-22/+9
	Meanwhile drop pagemap_read_vaddr(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: use PM_* macros in vm_utils.h	Peter Xu	3	-20/+12
	We've got the macros in uffd-stress.c, move it over and use it in vm_util.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
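As a rough illustration of what PM_* style macros describe, the sketch below defines pagemap flag bits and reads one 64-bit entry from a pagemap file descriptor such as one opened on /proc/self/pagemap. The `SK_`/`sk_` names are hypothetical stand-ins, not the vm_util.h definitions; the bit positions follow the kernel's pagemap documentation and are stated here as an assumption to check against the tree in use.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Flag bits of a 64-bit pagemap entry (names are illustrative). */
#define SK_BIT_ULL(nr)        (1ULL << (nr))
#define SK_PM_SOFT_DIRTY      SK_BIT_ULL(55)
#define SK_PM_MMAP_EXCLUSIVE  SK_BIT_ULL(56)
#define SK_PM_UFFD_WP         SK_BIT_ULL(57)
#define SK_PM_FILE            SK_BIT_ULL(61)
#define SK_PM_SWAP            SK_BIT_ULL(62)
#define SK_PM_PRESENT         SK_BIT_ULL(63)

/* Byte offset of the pagemap entry that describes the page containing vaddr. */
static uint64_t sk_pagemap_offset(uintptr_t vaddr, uint64_t page_size)
{
	return (vaddr / page_size) * sizeof(uint64_t);
}

/* Read the entry for addr from an open pagemap fd; 0 on success, -1 on error. */
static int sk_pagemap_read(int fd, const void *addr, uint64_t *entry)
{
	uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
	off_t off = (off_t)sk_pagemap_offset((uintptr_t)addr, page_size);
	ssize_t n = pread(fd, entry, sizeof(*entry), off);

	return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

A test would typically check a single flag after the read, for example `entry & SK_PM_UFFD_WP` to see whether a page is write-protected through userfaultfd.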
2023-04-18	selftests/mm: merge default_huge_page_size() into one	Peter Xu	5	-66/+24
	There are already three identical definitions of this function. Move it into vm_util.[ch]. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
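For readers unfamiliar with how a default huge page size is usually discovered from user space, here is a minimal sketch that scans /proc/meminfo for the "Hugepagesize:" line. It is an illustration only: the `sk_` function names are hypothetical, and the merged helper in vm_util.c may be implemented differently.

```c
#include <stdio.h>

/*
 * Parse one /proc/meminfo line. Returns 1 and stores the size in bytes
 * if the line is the "Hugepagesize:" entry, otherwise returns 0.
 */
static int sk_parse_hugepage_line(const char *line, unsigned long *bytes)
{
	unsigned long kb;

	if (sscanf(line, "Hugepagesize: %lu kB", &kb) != 1)
		return 0;
	*bytes = kb * 1024;
	return 1;
}

/* Returns the default huge page size in bytes, or 0 if it cannot be found. */
static unsigned long sk_default_huge_page_size(void)
{
	char line[256];
	unsigned long bytes = 0;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return 0;
	while (fgets(line, sizeof(line), f)) {
		if (sk_parse_hugepage_line(line, &bytes))
			break;
	}
	fclose(f);
	return bytes;
}
```

Returning 0 when the line is absent lets a caller skip hugetlb tests instead of failing them.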
2023-04-18	selftests/mm: link vm_util.c always	Peter Xu	1	-12/+1
	We do have plenty of files that want to link against vm_util.c. Just make it simple by linking it always. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: use TEST_GEN_PROGS where proper	Peter Xu	1	-32/+33
	TEST_GEN_PROGS and TEST_GEN_FILES are used randomly in the mm/Makefile to specify programs that need to build. Logically all these binaries should all fall into TEST_GEN_PROGS. Replace those TEST_GEN_FILES with TEST_GEN_PROGS, so that we can reference all the tests easily later. [[email protected]: tools/testing/selftests/mm/Makefile: don't wipe out TEST_GEN_PROGS] Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: merge util.h into vm_util.h	Peter Xu	8	-85/+80
	There're two util headers under mm/ kselftest. Merge one with another. It turns out util.h is the easy one to move. When merging, drop PAGE_SIZE / PAGE_SHIFT because they're unnecessary wrappers to page_size() / page_shift(), meanwhile rename them to psize() and pshift() so as to not conflict with some existing definitions in some test files that includes vm_util.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
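A small sketch of what psize()/pshift() style helpers could look like follows. It assumes the helpers simply cache the runtime page size and derive the shift from it; the `sk_` names are hypothetical and this is not the vm_util.h code.

```c
#include <unistd.h>

/* Runtime page size in bytes, queried once and cached. */
static unsigned int sk_psize(void)
{
	static unsigned int cached;

	if (!cached)
		cached = (unsigned int)sysconf(_SC_PAGESIZE);
	return cached;
}

/* log2 of the page size, derived from sk_psize() and cached. */
static unsigned int sk_pshift(void)
{
	static unsigned int cached;

	if (!cached) {
		unsigned int size = sk_psize();

		while ((1u << cached) < size)
			cached++;
	}
	return cached;
}
```

Querying the value at run time instead of using a compile-time PAGE_SIZE constant keeps a test binary correct on kernels configured with a different page size.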
2023-04-18	selftests/mm: dump a summary in run_vmtests.sh	Peter Xu	1	-0/+8
	Dump a summary after running whatever test specified. Useful for human runners to identify any kind of failures (besides exit code). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: update .gitignore with two missing tests	Peter Xu	1	-0/+2
	Patch series "selftests/mm: Split / Refactor userfault test", v2.

	This patchset splits userfaultfd.c into two tests:

	  - uffd-stress: the "vanilla", old and powerful stress test
	  - uffd-unit-tests: all the unit tests will be moved here

	This has been on my todo list for a long time but I never did it for real. The uffd test is growing into a small and cute monster. I started to notice that it's getting harder to maintain such a test and keep it useful.

	A few issues I found when looking at the userfaultfd test:

	  - We have a bunch of unit tests in userfaultfd.c, but they can only be run after a stress type. No way to not do it.
	  - We can only run a unit test for one memory type. If we want to do a quick smoke test to check regressions, there's no good way. The best we have currently is "bash ./run_vmtests.sh -t userfaultfd", thanks to the most recent changes to run_vmtests.sh on tagging. Still, that always needs to run the stress tests, and it's hard to see what's wrong.
	  - It's hard to add a new unit test to userfaultfd.c; we don't really know what's happening until we have read most of the file.
	  - We did a bunch of useless tests, e.g. we run the whole suite of stress tests twice just to verify both syscall and /dev/userfaultfd. They're all using userfaultfd_new() to create the handle, so everything should really be the same underneath. One simple unit test should cover that!
	  - We have tens of global variables in one file that are shared with all the tests. Some of them are not suitable to be a global var from a maintenance pov. It forces every unit test to consider how these vars affect the stress test and vice versa, but that's logically not necessary.
	  - The userfaultfd test is not friendly to old kernels. Mostly it only works on the latest kernel tree. It's preferable for it to run on all kernels and properly report what's missing.

	I'll stop here, though I feel like I could still list some more.

	This patchset should resolve all the issues above, and actually we can do even more on top. I kept going until I found I already had 29 patches and 2000+ LOC changes. That's already a terrible enough patchset, so we should move in small steps.

	After the whole set is applied, "./run_vmtests.sh -t userfaultfd" looks like this:

	===8<===
	vm.nr_hugepages = 1024
	-------------------------
	running ./uffd-unit-tests
	-------------------------
	Testing UFFDIO_API (with syscall)... done
	Testing UFFDIO_API (with /dev/userfaultfd)... done
	Testing register-ioctls on anon... done
	Testing register-ioctls on shmem... done
	Testing register-ioctls on shmem-private... done
	Testing register-ioctls on hugetlb... done
	Testing register-ioctls on hugetlb-private... done
	Testing zeropage on anon... done
	Testing zeropage on shmem... done
	Testing zeropage on shmem-private... done
	Testing zeropage on hugetlb... done
	Testing zeropage on hugetlb-private... done
	Testing pagemap on anon... done
	Testing wp-unpopulated on anon... done
	Testing minor on shmem... done
	Testing minor on hugetlb... done
	Testing minor-wp on shmem... done
	Testing minor-wp on hugetlb... done
	Testing minor-collapse on shmem... done
	Testing sigbus on anon... done
	Testing sigbus on shmem... done
	Testing sigbus on shmem-private... done
	Testing sigbus on hugetlb... done
	Testing sigbus on hugetlb-private... done
	Testing sigbus-wp on anon... done
	Testing sigbus-wp on shmem... done
	Testing sigbus-wp on shmem-private... done
	Testing sigbus-wp on hugetlb... done
	Testing sigbus-wp on hugetlb-private... done
	Testing events on anon... done
	Testing events on shmem... done
	Testing events on shmem-private... done
	Testing events on hugetlb... done
	Testing events on hugetlb-private... done
	Testing events-wp on anon... done
	Testing events-wp on shmem... done
	Testing events-wp on shmem-private... done
	Testing events-wp on hugetlb... done
	Testing events-wp on hugetlb-private... done
	Userfaults unit tests: pass=39, skip=0, fail=0 (total=39)
	[PASS]
	--------------------------------
	running ./uffd-stress anon 20 16
	--------------------------------
	nr_pages: 5120, nr_pages_per_cpu: 640
	bounces: 15, mode: rnd racing ver poll, userfaults: 345 missing (26+48+61+102+30+12+59+7) 1596 wp (120+139+317+346+215+67+306+86)
	[...]
	[PASS]
	------------------------------------
	running ./uffd-stress hugetlb 128 32
	------------------------------------
	nr_pages: 64, nr_pages_per_cpu: 8
	bounces: 31, mode: rnd racing ver poll, userfaults: 29 missing (6+6+6+5+4+2+0+0) 104 wp (20+19+22+18+7+12+5+1)
	[...]
	[PASS]
	--------------------------------------------
	running ./uffd-stress hugetlb-private 128 32
	--------------------------------------------
	nr_pages: 64, nr_pages_per_cpu: 8
	bounces: 31, mode: rnd racing ver poll, userfaults: 33 missing (12+9+7+0+5+0+0+0) 111 wp (24+25+14+14+11+17+5+1)
	[...]
	[PASS]
	---------------------------------
	running ./uffd-stress shmem 20 16
	---------------------------------
	nr_pages: 5120, nr_pages_per_cpu: 640
	bounces: 15, mode: rnd racing ver poll, userfaults: 247 missing (15+17+34+60+81+37+3+0) 2038 wp (180+114+276+400+381+318+165+204)
	[...]
	[PASS]
	-----------------------------------------
	running ./uffd-stress shmem-private 20 16
	-----------------------------------------
	nr_pages: 5120, nr_pages_per_cpu: 640
	bounces: 15, mode: rnd racing ver poll, userfaults: 235 missing (52+29+55+56+13+9+16+5) 2849 wp (218+406+461+531+328+284+430+191)
	[...]
	[PASS]
	SUMMARY: PASS=6 SKIP=0 FAIL=0
	===8<===

	The output may be different if we miss some features (e.g., hugetlb not allocated, old kernel, less privilege of uffd handle), but they should show up with good reasons. E.g., I tried to run the unit test on my Fedora kernel and it gives me:

	===8<===
	UFFDIO_API (with syscall)... failed [reason: UFFDIO_API should fail with wrong api but didn't]
	UFFDIO_API (with /dev/userfaultfd)... skipped [reason: cannot open userfaultfd handle]
	zeropage on anon... done
	zeropage on shmem... done
	zeropage on shmem-private... done
	zeropage-hugetlb on hugetlb... done
	zeropage-hugetlb on hugetlb-private... done
	pagemap on anon... pagemap on anon... pagemap on anon... done
	wp-unpopulated on anon... skipped [reason: feature missing]
	minor on shmem... done
	minor on hugetlb... done
	minor-wp on shmem... skipped [reason: feature missing]
	minor-wp on hugetlb... skipped [reason: feature missing]
	minor-collapse on shmem... done
	sigbus on anon... skipped [reason: possible lack of priviledge]
	sigbus on shmem... skipped [reason: possible lack of priviledge]
	sigbus on shmem-private... skipped [reason: possible lack of priviledge]
	sigbus on hugetlb... skipped [reason: possible lack of priviledge]
	sigbus on hugetlb-private... skipped [reason: possible lack of priviledge]
	sigbus-wp on anon... skipped [reason: possible lack of priviledge]
	sigbus-wp on shmem... skipped [reason: possible lack of priviledge]
	sigbus-wp on shmem-private... skipped [reason: possible lack of priviledge]
	sigbus-wp on hugetlb... skipped [reason: possible lack of priviledge]
	sigbus-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
	events on anon... skipped [reason: possible lack of priviledge]
	events on shmem... skipped [reason: possible lack of priviledge]
	events on shmem-private... skipped [reason: possible lack of priviledge]
	events on hugetlb... skipped [reason: possible lack of priviledge]
	events on hugetlb-private... skipped [reason: possible lack of priviledge]
	events-wp on anon... skipped [reason: possible lack of priviledge]
	events-wp on shmem... skipped [reason: possible lack of priviledge]
	events-wp on shmem-private... skipped [reason: possible lack of priviledge]
	events-wp on hugetlb... skipped [reason: possible lack of priviledge]
	events-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
	Userfaults unit tests: pass=9, skip=24, fail=1 (total=34)
	===8<===

	Patch layout:

	  - Revert "userfaultfd: don't fail on unrecognized features"

	Something I found when I got to the UFFDIO_API test below. Axel, I still propose to revert it as a whole, but feel free to continue the discussion from the original patch thread.

	  - selftests/mm: Update .gitignore with two missing tests
	  - selftests/mm: Dump a summary in run_vmtests.sh
	  - selftests/mm: Merge util.h into vm_util.h
	  - selftests/mm: Use TEST_GEN_PROGS where proper
	  - selftests/mm: Link vm_util.c always
	  - selftests/mm: Merge default_huge_page_size() into one
	  - selftests/mm: Use PM_* macros in vm_utils.h
	  - selftests/mm: Reuse pagemap_get_entry() in vm_util.h
	  - selftests/mm: Test UFFDIO_ZEROPAGE only when !hugetlb
	  - selftests/mm: Drop test_uffdio_zeropage_eexist

	Up to here, it's all cleanups here and there. I wanted to keep going, but I found that it may take a few more days to split the test. Hence I did the split starting from the next one, so we have a working thing first.

	  - selftests/mm: Create uffd-common.[ch]
	  - selftests/mm: Split uffd tests into uffd-stress and uffd-unit-tests

	This did the major brute-force split of common code into uffd-common.[ch]. That'll be the common base for the stress and unit tests so far. Then a new unit test is created.

	  - selftests/mm: uffd_[un]register()
	  - selftests/mm: uffd_open_{dev\|sys}()
	  - selftests/mm: UFFDIO_API test

	This patch hides here to start writing the 1st unit test with UFFDIO_API, and also the detection of userfaultfd privileges.

	  - selftests/mm: Drop global mem_fd in uffd tests
	  - selftests/mm: Drop global hpage_size in uffd tests
	  - selftests/mm: Rename uffd_stats to uffd_args
	  - selftests/mm: Let uffd_handle_page_fault() takes wp parameter
	  - selftests/mm: Allow allocate_area() to fail properly

	Some further cleanups, without which I noticed it would be hard to move the tests.

	  - selftests/mm: Add framework for uffd-unit-test

	The major patch, which provides the framework for most of the remaining unit tests.

	  - selftests/mm: Move uffd pagemap test to unit test
	  - selftests/mm: Move uffd minor test to unit test
	  - selftests/mm: Move uffd sig/events tests into uffd unit tests
	  - selftests/mm: Move zeropage test into uffd unit tests

	Move the unit tests and fit them into the new file.

	  - selftests/mm: Workaround no way to detect uffd-minor + wp
	  - selftests/mm: Allow uffd test to skip properly with no privilege
	  - selftests/mm: Drop sys/dev test in uffd-stress test
	  - selftests/mm: Add shmem-private test to uffd-stress

	A bunch of changes to do better on error reporting, and to add shmem-private to the stress test, which was long missing.

	  - selftests/mm: Add uffdio register ioctls test

	One more patch to test uffdio_register.ioctls.

	This patch (of 30):

	Update .gitignore with two missing tests.

	Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: mkdirty: test behavior of (pte\|pmd)_mkdirty on VMAs without ↵	David Hildenbrand	2	-0/+381
	write permissions

	Let's add some tests that trigger (pte\|pmd)_mkdirty on VMAs without write permissions. If an architecture implementation is wrong, we might accidentally set the PTE/PMD writable and allow for write access in a VMA without write permissions.

	The tests include reproducers for the two issues recently discovered and worked-around in core-MM for now:

	  (1) commit 624a2c94f5b7 ("Partly revert "mm/thp: carry over dirty bit when thp splits on pmd"")
	  (2) commit 96a9c287e25d ("mm/migrate: fix wrongly apply write bit after mkdirty on sparc64")

	In addition, some other tests that reveal further issues.

	All tests pass under x86_64:

	  ./mkdirty
	  # [INFO] detected THP size: 2048 KiB
	  TAP version 13
	  1..6
	  # [INFO] PTRACE write access
	  ok 1 SIGSEGV generated, page not modified
	  # [INFO] PTRACE write access to THP
	  ok 2 SIGSEGV generated, page not modified
	  # [INFO] Page migration
	  ok 3 SIGSEGV generated, page not modified
	  # [INFO] Page migration of THP
	  ok 4 SIGSEGV generated, page not modified
	  # [INFO] PTE-mapping a THP
	  ok 5 SIGSEGV generated, page not modified
	  # [INFO] UFFDIO_COPY
	  ok 6 SIGSEGV generated, page not modified
	  # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0

	But some fail on sparc64:

	  ./mkdirty
	  # [INFO] detected THP size: 8192 KiB
	  TAP version 13
	  1..6
	  # [INFO] PTRACE write access
	  not ok 1 SIGSEGV generated, page not modified
	  # [INFO] PTRACE write access to THP
	  not ok 2 SIGSEGV generated, page not modified
	  # [INFO] Page migration
	  ok 3 SIGSEGV generated, page not modified
	  # [INFO] Page migration of THP
	  ok 4 SIGSEGV generated, page not modified
	  # [INFO] PTE-mapping a THP
	  ok 5 SIGSEGV generated, page not modified
	  # [INFO] UFFDIO_COPY
	  not ok 6 SIGSEGV generated, page not modified
	  Bail out! 3 out of 6 tests failed
	  # Totals: pass:3 fail:3 xfail:0 xpass:0 skip:0 error:0

	Reverting both above commits makes all tests fail on sparc64:

	  ./mkdirty
	  # [INFO] detected THP size: 8192 KiB
	  TAP version 13
	  1..6
	  # [INFO] PTRACE write access
	  not ok 1 SIGSEGV generated, page not modified
	  # [INFO] PTRACE write access to THP
	  not ok 2 SIGSEGV generated, page not modified
	  # [INFO] Page migration
	  not ok 3 SIGSEGV generated, page not modified
	  # [INFO] Page migration of THP
	  not ok 4 SIGSEGV generated, page not modified
	  # [INFO] PTE-mapping a THP
	  not ok 5 SIGSEGV generated, page not modified
	  # [INFO] UFFDIO_COPY
	  not ok 6 SIGSEGV generated, page not modified
	  Bail out! 6 out of 6 tests failed
	  # Totals: pass:0 fail:6 xfail:0 xpass:0 skip:0 error:0

	The tests are useful to detect other problematic archs, to verify new arch fixes, and to stop such issues from reappearing in the future.

	For now, we don't add any hugetlb tests.

	Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: David S. Miller <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Peter Xu <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests/mm: reuse read_pmd_pagesize() in COW selftest	David Hildenbrand	5	-31/+17
	Patch series "mm: (pte\|pmd)_mkdirty() should not unconditionally allow for write access". This is the follow-up on [1], adding selftests (testing for known issues we added workarounds for and other issues that haven't been fixed yet), fixing sparc64, reverting the workarounds, and performing one cleanup. The patch from [1] was modified slightly (updated/extended patch description, dropped one unnecessary NOP instruction from the ASM in __pte_mkhwwrite()). Retested on x86_64 and sparc64 (sun4u in QEMU). I scanned most architectures to make sure their (pte\|pmd)_mkdirty() handling is correct. To be sure, we can run the selftests and find out if other architectures are still affected (loongarch was fixed recently as well). Based on master for now. I don't expect surprises regarding mm-tress, but I can rebase if there are any problems. This patch (of 6): The COW selftest can deal with THP not being configured. So move error handling of read_pmd_pagesize() into the callers such that we can reuse it in the COW selftest. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] [1] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: David S. Miller <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Peter Xu <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	maple_tree: add a test case to check maple_alloc	Peng Zhang	1	-0/+24
	Add a test case to check whether the number of maple_alloc structures is actually equal to mas->alloc->total. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peng Zhang <[email protected]> Cc: Liam R. Howlett <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18	selftests: Test the new RISC-V hwprobe interface	Evan Green	5	-0/+171
	This adds a test for the recently added RISC-V interface for probing hardware capabilities. It happens to be the first selftest we have for RISC-V, so I've added some infrastructure for those as well. Co-developed-by: Palmer Dabbelt <[email protected]> Signed-off-by: Evan Green <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2023-04-18	libbpf: move bpf_for(), bpf_for_each(), and bpf_repeat() into bpf_helpers.h	Andrii Nakryiko	1	-103/+0
	To make it easier for bleeding-edge BPF applications, such as sched_ext, to utilize open-coded iterators, move bpf_for(), bpf_for_each(), and bpf_repeat() macros from selftests/bpf-internal bpf_misc.h helper, to libbpf-provided bpf_helpers.h header. Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-18	selftests/bpf: add missing __weak kfunc log fixup test	Andrii Nakryiko	2	-0/+41
	Add test validating that libbpf correctly poisons and reports __weak unresolved kfuncs in post-processed verifier log. Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-18	selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity	Frederic Weisbecker	3	-8/+47
	The first field of /proc/uptime relies on the CLOCK_BOOTTIME clock, which can also be fetched from the clock_gettime() API. Improve the test coverage by verifying the monotonicity of CLOCK_BOOTTIME across both interfaces. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
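A cross-interface monotonicity check of this kind can be sketched as follows. This is an illustrative userspace sketch, not the selftest itself: all function names are hypothetical, and it assumes /proc/uptime prints its first field as seconds with exactly two fractional digits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Parse the first field of a /proc/uptime line ("SEC.CC ...") into
 * centiseconds. Returns -1 on malformed input. */
static int64_t parse_uptime_cs(const char *line)
{
	unsigned long long sec, frac;

	if (sscanf(line, "%llu.%2llu", &sec, &frac) != 2)
		return -1;
	return (int64_t)(sec * 100 + frac);
}

/* Read CLOCK_BOOTTIME truncated to centiseconds, or -1 on error. */
static int64_t boottime_cs(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_BOOTTIME, &ts) != 0)
		return -1;
	return (int64_t)ts.tv_sec * 100 + ts.tv_nsec / 10000000;
}

/* Read /proc/uptime from the running system, or -1 on error. */
static int64_t proc_uptime_cs(void)
{
	char buf[64];
	int64_t cs = -1;
	FILE *f = fopen("/proc/uptime", "r");

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		cs = parse_uptime_cs(buf);
	fclose(f);
	return cs;
}

/* Alternate between both interfaces. Returns 1 if no sample went
 * backwards, 0 on a violation, -1 if an interface is unavailable. */
static int interleaved_monotonic(int rounds)
{
	int64_t prev = 0, cur;

	for (int i = 0; i < rounds; i++) {
		cur = proc_uptime_cs();
		if (cur < 0)
			return -1;
		if (cur < prev)
			return 0;
		prev = cur;

		cur = boottime_cs();
		if (cur < 0)
			return -1;
		if (cur < prev)
			return 0;
		prev = cur;
	}
	return 1;
}
```

Truncating both readings to centiseconds keeps the comparison fair, since the procfs value carries no finer resolution.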
2023-04-18	selftests/proc: Remove idle time monotonicity assertions	Frederic Weisbecker	3	-27/+14
	Due to broken iowait task counting design (cf: comments above get_cpu_idle_time_us() and nr_iowait()), it is not possible to provide the guarantee that /proc/stat or /proc/uptime display monotonic idle time values. Remove the assertions that verify the related wrong assumption so that testers and maintainers don't spend more time on that. Reported-by: Yu Liao <[email protected]> Reported-by: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-17	selftests/bpf: Add a selftest for checking subreg equality	Yonghong Song	2	-0/+60
	Add a selftest to ensure subreg equality if source register upper 32bit is 0. Without previous patch, the test will fail verification. Acked-by: Eduard Zingerman <[email protected]> Signed-off-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-17	selftests: mptcp: join: fix ShellCheck warnings	Matthieu Baerts	1	-2/+8
	Most of the code had an issue according to ShellCheck. That's mainly due to the fact it incorrectly believes most of the code was unreachable because it's invoked by variable name, see how the "tests" array is used. Once SC2317 has been ignored, three small warnings were still visible: - SC2155: Declare and assign separately to avoid masking return values. - SC2046: Quote this to prevent word splitting: can be ignored because "ip netns pids" can display more than one pid. - SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined. This probably didn't fix any actual issues but it might help spotting new interesting warnings reported by ShellCheck as just before, ShellCheck was reporting issues for most lines making it a bit useless. Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-17	selftests: mptcp: remove duplicated entries in usage	Matthieu Baerts	1	-4/+4
	mptcp_connect tool was printing some duplicated entries when showing how to use it: -j -l -r While at it, I also: - moved the very few entries that were not sorted, - added -R that was missing since commit 8a4b910d005d ("mptcp: selftests: add rcvbuf set option"), - removed the -u parameter that has been removed in commit f730b65c9d85 ("selftests: mptcp: try to set mptcp ulp mode in different sk states"). No need to backport this, it is just an internal tool used by our selftests. The help menu is mainly useful for MPTCP kernel devs. Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-17	selftests: openvswitch: add support for upcall testing	Aaron Conole	2	-11/+165
	The upcall socket interface can be exercised now to make sure that future feature adjustments to the field can maintain backwards compatibility. Signed-off-by: Aaron Conole <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-17	selftests: openvswitch: add flow dump support	Aaron Conole	1	-0/+1026
	Add a basic set of fields to print in a 'dpflow' format. This will be used by future commits to check for flow fields after parsing, as well as verifying the flow fields pushed into the kernel from userspace. Signed-off-by: Aaron Conole <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-17	selftests: openvswitch: add interface support	Aaron Conole	2	-10/+163
	Includes an associated test to generate netns and connect interfaces, with the option to include packet tracing. This will be used in the future when flow support is added for additional test cases. Signed-off-by: Aaron Conole <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-16	sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes	Andrew Morton	1	-0/+16

2023-04-16	bpf: Remove bpf_kfunc_call_test_kptr_get() test kfunc	David Vernet	3	-140/+5
	We've managed to improve the UX for kptrs significantly over the last 9 months. All of the prior main use cases, struct bpf_cpumask *, struct task_struct *, and struct cgroup *, have all been updated to be synchronized mainly using RCU. In other words, their KF_ACQUIRE kfunc calls are all KF_RCU, and the pointers themselves are MEM_RCU and can be accessed in an RCU read region in BPF. In a follow-on change, we'll be removing the KF_KPTR_GET kfunc flag. This patch prepares for that by removing the bpf_kfunc_call_test_kptr_get() kfunc, and all associated selftests. Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-16	selftest, ptrace: Add selftest for syscall user dispatch config api	Gregory Price	3	-1/+74
	Validate that the following new ptrace requests work as expected * PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG returns the contents of task->syscall_dispatch * PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG sets the contents of task->syscall_dispatch Signed-off-by: Gregory Price <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-16	selftests/timers/posix_timers: Test delivery of signals across threads	Dmitry Vyukov	1	-0/+77
	Test that POSIX timers using CLOCK_PROCESS_CPUTIME_ID eventually deliver a signal to all running threads. This effectively tests that the kernel doesn't prefer any one thread (or subset of threads) for signal delivery. Signed-off-by: Dmitry Vyukov <[email protected]> Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-15	selftests/bpf: Add refcounted_kptr tests	Dave Marchevsky	3	-0/+496
	Test refcounted local kptr functionality added in previous patches in the series. Usecases which pass verification: * Add refcounted local kptr to both tree and list. Then, read and - possibly, depending on test variant - delete from tree, then list. * Also test doing read-and-maybe-delete in opposite order * Stash a refcounted local kptr in a map_value, then add it to a rbtree. Read from both, possibly deleting after tree read. * Add refcounted local kptr to both tree and list. Then, try reading and deleting twice from one of the collections. * bpf_refcount_acquire of just-added non-owning ref should work, as should bpf_refcount_acquire of owning ref just out of bpf_obj_new Usecases which fail verification: * The simple successful bpf_refcount_acquire cases from above should both fail to verify if the newly-acquired owning ref is not dropped Signed-off-by: Dave Marchevsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-15	bpf: Migrate bpf_rbtree_remove to possibly fail	Dave Marchevsky	4	-84/+182
	This patch modifies bpf_rbtree_remove to account for possible failure due to the input rb_node already not being in any collection. The function can now return NULL, and does when the aforementioned scenario occurs. As before, on successful removal an owning reference to the removed node is returned. Adding KF_RET_NULL to bpf_rbtree_remove's kfunc flags - now KF_RET_NULL | KF_ACQUIRE - provides the desired verifier semantics: * retval must be checked for NULL before use * if NULL, retval's ref_obj_id is released * retval is a "maybe acquired" owning ref, not a non-owning ref, so it will live past end of critical section (bpf_spin_unlock), and thus can be checked for NULL after the end of the CS BPF programs must add checks ============================ This does change bpf_rbtree_remove's verifier behavior. BPF program writers will need to add NULL checks to their programs, but the resulting UX looks natural: bpf_spin_lock(&glock); n = bpf_rbtree_first(&ghead); if (!n) { /* ... */ } res = bpf_rbtree_remove(&ghead, &n->node); bpf_spin_unlock(&glock); if (!res) /* Newly-added check after this patch */ return 1; n = container_of(res, /* ... */); /* Do something else with n */ bpf_obj_drop(n); return 0; The "if (!res)" check above is the only addition necessary for the above program to pass verification after this patch. bpf_rbtree_remove no longer clobbers non-owning refs ==================================================== An issue arises when bpf_rbtree_remove fails, though. Consider this example: struct node_data { long key; struct bpf_list_node l; struct bpf_rb_node r; struct bpf_refcount ref; }; long failed_sum; void bpf_prog() { struct node_data *n = bpf_obj_new(/* ... */); struct bpf_rb_node *res; n->key = 10; bpf_spin_lock(&glock); bpf_list_push_back(&some_list, &n->l); /* n is now a non-owning ref */ res = bpf_rbtree_remove(&some_tree, &n->r, /* ... */);
if (!res) failed_sum += n->key; /* not possible */ bpf_spin_unlock(&glock); /* if (res) { do something useful and drop } ... */ } The bpf_rbtree_remove in this example will always fail. Similarly to bpf_spin_unlock, bpf_rbtree_remove is a non-owning reference invalidation point. The verifier clobbers all non-owning refs after a bpf_rbtree_remove call, so the "failed_sum += n->key" line will fail verification, and in fact there's no good way to get information about the node which failed to add after the invalidation. This patch removes non-owning reference invalidation from bpf_rbtree_remove to allow the above usecase to pass verification. The logic for why this is now possible is as follows: Before this series, bpf_rbtree_add couldn't fail and thus assumed that its input, a non-owning reference, was in the tree. But it's easy to construct an example where two non-owning references pointing to the same underlying memory are acquired and passed to rbtree_remove one after another (see rbtree_api_release_aliasing in selftests/bpf/progs/rbtree_fail.c). So it was necessary to clobber non-owning refs to prevent this case and, more generally, to enforce "non-owning ref is definitely in some collection" invariant. This series removes that invariant and the failure / runtime checking added in this patch provide a clean way to deal with the aliasing issue - just fail to remove. Because the aliasing issue prevented by clobbering non-owning refs is no longer an issue, this patch removes the invalidate_non_owning_refs call from verifier handling of bpf_rbtree_remove. Note that bpf_spin_unlock - the other caller of invalidate_non_owning_refs - clobbers non-owning refs for a different reason, so its clobbering behavior remains unchanged. No BPF program changes are necessary for programs to remain valid as a result of this clobbering change. 
A valid program before this patch passed verification with its non-owning refs having shorter (or equal) lifetimes due to more aggressive clobbering. Also, update existing tests to check bpf_rbtree_remove retval for NULL where necessary, and move rbtree_api_release_aliasing from progs/rbtree_fail.c to progs/rbtree.c since it's now expected to pass verification. Signed-off-by: Dave Marchevsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
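The "removal may fail, so check for NULL" contract can be modelled in plain userspace C. The sketch below is an analogy only, not BPF or kernel code: struct toy_node, struct toy_list and every function name are hypothetical, and a simple flag stands in for the runtime check that the node is in a collection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a collection node; not a kernel type. */
struct toy_node {
	struct toy_node *prev, *next;
	int linked;	/* nonzero while the node sits in a list */
	long key;
};

struct toy_list {
	struct toy_node head;	/* sentinel */
};

static void toy_list_init(struct toy_list *l)
{
	l->head.prev = l->head.next = &l->head;
	l->head.linked = 1;
}

static void toy_list_add(struct toy_list *l, struct toy_node *n)
{
	n->next = l->head.next;
	n->prev = &l->head;
	l->head.next->prev = n;
	l->head.next = n;
	n->linked = 1;
}

/* Returns the removed node, or NULL when the node is not in any list. */
static struct toy_node *toy_list_remove(struct toy_node *n)
{
	if (!n->linked)
		return NULL;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
	n->linked = 0;
	return n;
}

/* Removes the same node twice through two aliases; only the first
 * removal succeeds. Returns the number of successful removals. */
static int remove_twice(void)
{
	struct toy_list l;
	struct toy_node n = { .key = 10 };
	struct toy_node *alias = &n;
	int ok = 0;

	toy_list_init(&l);
	toy_list_add(&l, &n);
	if (toy_list_remove(&n))
		ok++;
	if (toy_list_remove(alias))	/* fails: already removed */
		ok++;
	return ok;
}
```

The second removal failing cleanly is the userspace counterpart of resolving the aliasing case at runtime instead of invalidating every alias up front.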
2023-04-15	selftests/bpf: Modify linked_list tests to work with macro-ified inserts	Dave Marchevsky	4	-67/+73
	The linked_list tests use macros and function pointers to reduce code duplication. Earlier in the series, bpf_list_push_{front,back} were modified to be macros, expanding to invoke actual kfuncs bpf_list_push_{front,back}_impl. Due to this change, a code snippet like: void (*p)(void *, void *) = (void *)&bpf_list_##op; p(hexpr, nexpr); meant to do bpf_list_push_{front,back}(hexpr, nexpr), will no longer work as it's no longer valid to do &bpf_list_push_{front,back} since they're no longer functions. This patch fixes issues of this type, along with two other minor changes - one improvement and one fix - both related to the node argument to list_push_{front,back}. * The fix: migration of list_push tests away from (void *, void *) func ptr uncovered that some tests were incorrectly passing pointer to node, not pointer to struct bpf_list_node within the node. This patch fixes such issues (CHECK(..., f) -> CHECK(..., &f->node)) * The improvement: In linked_list tests, the struct foo type has two list_node fields: node and node2, at byte offsets 0 and 40 within the struct, respectively. Currently node is used in ~all tests involving struct foo and lists. The verifier needs to do some work to account for the offset of bpf_list_node within the node type, so using node2 instead of node exercises that logic more in the tests. This patch migrates linked_list tests to use node2 instead of node. Signed-off-by: Dave Marchevsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
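Why a macro-ified insert breaks function-pointer dispatch, and why the embedded node rather than the outer struct must be passed, can be shown with a small userspace sketch. Everything here is hypothetical (push_front, push_front_impl, push_front_fn and this struct foo layout are illustrations, not the selftest's definitions), and the offsets are whatever this toy struct yields rather than 0 and 40.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next;
};

/* Two embedded nodes, mirroring the "node"/"node2" layout idea. */
struct foo {
	struct list_node node;
	long payload[4];
	struct list_node node2;
};

/* The real function takes extra "hidden" arguments. */
static int push_front_impl(struct list_node **head, struct list_node *n,
			   void *meta, size_t off)
{
	(void)meta;
	(void)off;
	n->next = *head;
	*head = n;
	return 0;
}

/* The public name is a macro, so `&push_front` is not valid C. */
#define push_front(head, n) push_front_impl((head), (n), NULL, 0)

/* A small wrapper function gives a pointer something to target. */
static int push_front_fn(struct list_node **head, struct list_node *n)
{
	return push_front(head, n);
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Returns 1 when the outer struct is recovered from the embedded node
 * and that node sits at a nonzero offset. */
static int demo(void)
{
	int (*p)(struct list_node **, struct list_node *) = push_front_fn;
	struct list_node *head = NULL;
	struct foo f = { .payload = { 7 } };

	p(&head, &f.node2);	/* pass the embedded node, not &f */
	return container_of(head, struct foo, node2) == &f &&
	       offsetof(struct foo, node2) != 0;
}
```

Using the member at a nonzero offset makes any code that silently assumes "node pointer equals struct pointer" fail visibly, which is the motivation given for preferring node2.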
2023-04-15	bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail	Dave Marchevsky	1	-10/+39
	Consider this code snippet: struct node { long key; bpf_list_node l; bpf_rb_node r; bpf_refcount ref; } int some_bpf_prog(void *ctx) { struct node *n = bpf_obj_new(/* ... */), *m; bpf_spin_lock(&glock); bpf_rbtree_add(&some_tree, &n->r, /* ... */); m = bpf_refcount_acquire(n); bpf_rbtree_add(&other_tree, &m->r, /* ... */); bpf_spin_unlock(&glock); /* ... */ } After bpf_refcount_acquire, n and m point to the same underlying memory, and that node's bpf_rb_node field is being used by the some_tree insert, so overwriting it as a result of the second insert is an error. In order to properly support refcounted nodes, the rbtree and list insert functions must be allowed to fail. This patch adds such support. The kfuncs bpf_rbtree_add, bpf_list_push_{front,back} are modified to return an int indicating success/failure, with 0 -> success, nonzero -> failure. bpf_obj_drop on failure ======================= Currently the only reason an insert can fail is the example above: the bpf_{list,rb}_node is already in use. When such a failure occurs, the insert kfuncs will bpf_obj_drop the input node. This allows the insert operations to logically fail without changing their verifier owning ref behavior, namely the unconditional release_reference of the input owning ref. With insert that always succeeds, ownership of the node is always passed to the collection, since the node always ends up in the collection. With a possibly-failed insert w/ bpf_obj_drop, ownership of the node is always passed either to the collection (success), or to bpf_obj_drop (failure). Regardless, it's correct to continue unconditionally releasing the input owning ref, as something is always taking ownership from the calling program on insert. Keeping owning ref behavior unchanged results in a nice default UX for insert functions that can fail. 
If the program's reaction to a failed insert is "fine, just get rid of this owning ref for me and let me go on with my business", then there's no reason to check for failure since that's default behavior. e.g.: long important_failures = 0; int some_bpf_prog(void *ctx) { struct node *n, *m, *o; /* all bpf_obj_new'd */ bpf_spin_lock(&glock); bpf_rbtree_add(&some_tree, &n->node, /* ... */); bpf_rbtree_add(&some_tree, &m->node, /* ... */); if (bpf_rbtree_add(&some_tree, &o->node, /* ... */)) { important_failures++; } bpf_spin_unlock(&glock); } If we instead chose to pass ownership back to the program on failed insert - by returning NULL on success or an owning ref on failure - programs would always have to do something with the returned ref on failure. The most likely action is probably "I'll just get rid of this owning ref and go about my business", which ideally would look like: if (n = bpf_rbtree_add(&some_tree, &n->node, /* ... */)) bpf_obj_drop(n); But bpf_obj_drop isn't allowed in a critical section and inserts must occur within one, so in reality error handling would become a hard-to-parse mess. For refcounted nodes, we can replicate the "pass ownership back to program on failure" logic with this patch's semantics, albeit in an ugly way: struct node *n = bpf_obj_new(/* ... */), *m; bpf_spin_lock(&glock); m = bpf_refcount_acquire(n); if (bpf_rbtree_add(&some_tree, &n->node, /* ... */)) { /* Do something with m */ } bpf_spin_unlock(&glock); bpf_obj_drop(m); bpf_refcount_acquire is used to simulate "return owning ref on failure". This should be an uncommon occurrence, though. Addition of two verifier-fixup'd args to collection inserts =========================================================== The actual bpf_obj_drop kfunc is bpf_obj_drop_impl(void *, struct btf_struct_meta *), with bpf_obj_drop macro populating the second arg with 0 and the verifier later filling in the arg during insn fixup. 
Because bpf_rbtree_add and bpf_list_push_{front,back} now might do bpf_obj_drop, these kfuncs need a btf_struct_meta parameter that can be passed to bpf_obj_drop_impl. Similarly, because the 'node' param to those insert functions is the bpf_{list,rb}_node within the node type, and bpf_obj_drop expects a pointer to the beginning of the node, the insert functions need to be able to find the beginning of the node struct. A second verifier-populated param is necessary: the offset of {list,rb}_node within the node type. These two new params allow the insert kfuncs to correctly call __bpf_obj_drop_impl: beginning_of_node = bpf_rb_node_ptr - offset; if (already_inserted) __bpf_obj_drop_impl(beginning_of_node, btf_struct_meta->record); Similarly to other kfuncs with "hidden" verifier-populated params, the insert functions are renamed with an _impl suffix and a macro is provided for common usage. For example, bpf_rbtree_add kfunc is now bpf_rbtree_add_impl and bpf_rbtree_add is now a macro which sets "hidden" args to 0. Due to the two new args BPF progs will need to be recompiled to work with the new _impl kfuncs. This patch also rewrites the "hidden argument" explanation to more directly say why the BPF program writer doesn't need to populate the arguments with anything meaningful. How does this new logic affect non-owning references? ===================================================== Currently, non-owning refs are valid until the end of the critical section in which they're created. We can make this guarantee because, if a non-owning ref exists, the referent was added to some collection. The collection will drop() its nodes when it goes away, but it can't go away while our program is accessing it, so that's not a problem. If the referent is removed from the collection in the same CS that it was added in, it can't be bpf_obj_drop'd until after CS end. 
Those are the only two ways to free the referent's memory and neither can happen until after the non-owning ref's lifetime ends. At first glance, having these collection insert functions potentially bpf_obj_drop their input seems like it breaks the "can't be bpf_obj_drop'd until after CS end" line of reasoning. But we care about the memory not being _freed_ until CS end, and a previous patch in the series modified bpf_obj_drop such that it doesn't free refcounted nodes until refcount == 0. So the statement can be more accurately rewritten as "can't be free'd until after CS end". We can prove that this rewritten statement holds for any non-owning reference produced by collection insert functions: If the input to the insert function is _not_ refcounted * We have an owning reference to the input, and can conclude it isn't in any collection * Inserting a node in a collection turns owning refs into non-owning, and since our input type isn't refcounted, there's no way to obtain additional owning refs to the same underlying memory * Because our node isn't in any collection, the insert operation cannot fail, so bpf_obj_drop will not execute * If bpf_obj_drop is guaranteed not to execute, there's no risk of memory being free'd * Otherwise, the input to the insert function is refcounted * If the insert operation fails due to the node's list_head or rb_root already being in some collection, there was some previous successful insert which passed refcount to the collection * We have an owning reference to the input, it must have been acquired via bpf_refcount_acquire, which bumped the refcount * refcount must be >= 2 since there's a valid owning reference and the node is already in a collection * Insert triggering bpf_obj_drop will decr refcount to >= 1, never resulting in a free So although we may do bpf_obj_drop during the critical section, this will never result in memory being free'd, and no changes to non-owning ref logic are needed in this patch. 
Signed-off-by: Dave Marchevsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
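The refcount argument above (a failed insert drops one reference but can never free the node) can be checked with a userspace model. This is a sketch under stated assumptions, not kernel code: struct rc_node, node_new(), node_acquire(), node_drop() and collection_insert() are hypothetical stand-ins for the behaviour the commit message describes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of a refcounted node; not kernel code. */
struct rc_node {
	int refcount;
	int in_collection;	/* set while a collection owns a reference */
	long key;
};

static int live_nodes;	/* allocations that have not been freed yet */

static struct rc_node *node_new(long key)
{
	struct rc_node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->refcount = 1;
	n->key = key;
	live_nodes++;
	return n;
}

static struct rc_node *node_acquire(struct rc_node *n)
{
	n->refcount++;
	return n;
}

static void node_drop(struct rc_node *n)
{
	if (--n->refcount == 0) {
		live_nodes--;
		free(n);
	}
}

/* Always consumes the caller's reference: the collection keeps it on
 * success, node_drop() takes it on failure. Returns 0 on success. */
static int collection_insert(struct rc_node *n)
{
	if (n->in_collection) {
		node_drop(n);
		return -1;
	}
	n->in_collection = 1;
	return 0;
}

/* Returns the refcount seen after a failed second insert, or -1 if the
 * inserts did not behave as expected. */
static int double_insert_refcount(void)
{
	struct rc_node *n = node_new(10);
	struct rc_node *m;
	int first, second, observed;

	if (!n)
		return -1;
	first = collection_insert(n);	/* collection now owns one ref */
	m = node_acquire(n);		/* refcount == 2 */
	second = collection_insert(m);	/* fails and drops m's ref */
	observed = n->refcount;		/* still 1, so nothing was freed */

	n->in_collection = 0;		/* collection releases its ref */
	node_drop(n);
	return (first == 0 && second != 0) ? observed : -1;
}
```

The failed insert is only reachable with a count of at least 2, so the drop it performs leaves the count at 1 or more, matching the last bullet of the proof.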
2023-04-15	bpf: Add bpf_refcount_acquire kfunc	Dave Marchevsky	1	-0/+13
	Currently, BPF programs can interact with the lifetime of refcounted local kptrs in the following ways: bpf_obj_new - Initialize refcount to 1 as part of new object creation bpf_obj_drop - Decrement refcount and free object if it's 0 collection add - Pass ownership to the collection. No change to refcount but collection is responsible for bpf_obj_dropping it In order to be able to add a refcounted local kptr to multiple collections we need to be able to increment the refcount and acquire a new owning reference. This patch adds a kfunc, bpf_refcount_acquire, implementing such an operation. bpf_refcount_acquire takes a refcounted local kptr and returns a new owning reference to the same underlying memory as the input. The input can be either owning or non-owning. To reinforce why this is safe, consider the following code snippets: struct node *n = bpf_obj_new(typeof(*n)); /* A */ struct node *m = bpf_refcount_acquire(n); /* B */ In the above snippet, n will be alive with refcount=1 after (A), and since nothing changes that state before (B), it's obviously safe. If n is instead added to some rbtree, we can still safely refcount_acquire it: struct node *n = bpf_obj_new(typeof(*n)); struct node *m; bpf_spin_lock(&glock); bpf_rbtree_add(&groot, &n->node, less); /* A */ m = bpf_refcount_acquire(n); /* B */ bpf_spin_unlock(&glock); In the above snippet, after (A) n is a non-owning reference, and after (B) m is an owning reference pointing to the same memory as n. Although n has no ownership of that memory's lifetime, it's guaranteed to be alive until the end of the critical section, and n would be clobbered if we were past the end of the critical section, so it's safe to bump refcount. Implementation details: * From verifier's perspective, bpf_refcount_acquire handling is similar to bpf_obj_new and bpf_obj_drop. Like the former, it returns a new owning reference matching input type, although like the latter, type can be inferred from concrete kptr input. 
Verifier changes in {check,fixup}_kfunc_call and check_kfunc_args are largely copied from aforementioned functions' verifier changes. * An exception to the above is the new KF_ARG_PTR_TO_REFCOUNTED_KPTR arg, indicated by new "__refcounted_kptr" kfunc arg suffix. This is necessary in order to handle both owning and non-owning input without adding special-casing to "__alloc" arg handling. Also a convenient place to confirm that input type has bpf_refcount field. * The implemented kfunc is actually bpf_refcount_acquire_impl, with 'hidden' second arg that the verifier sets to the type's struct_meta in fixup_kfunc_call. Signed-off-by: Dave Marchevsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-14	KVM: selftests: Test the PMU event "Instructions retired"	Aaron Lewis	1	-2/+32
	Add testing for the event "Instructions retired" (0xc0) in the PMU event filter on both Intel and AMD to ensure that the event doesn't count when it is disallowed. Unlike most of the other events, the event "Instructions retired" will be incremented by KVM when an instruction is emulated. Test that this case is being properly handled and that KVM doesn't increment the counter when that event is disallowed. Signed-off-by: Aaron Lewis <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-04-14	KVM: selftests: Copy full counter values from guest in PMU event filter test	Sean Christopherson	1	-90/+80
	Use a single struct to track all PMC event counts in the PMU filter test, and copy the full struct to/from the guest when running and measuring each guest workload. Using a common struct avoids naming conflicts, e.g. the loads/stores testcase has claimed "perf_counter", and eliminates the unnecessary truncation of the counter values when they are propagated from the guest MSRs to the host structs. Zero the struct before running the guest workload to ensure that the test doesn't get a false pass due to consuming data from a previous run. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Aaron Lewis <[email protected]> Signed-off-by: Sean Christopherson <[email protected]>
2023-04-14	KVM: selftests: Use error codes to signal errors in PMU event filter test	Sean Christopherson	1	-8/+8
	Use '0' to signal success and '-errno' to signal failure in the PMU event filter test so that the values are slightly less magical/arbitrary. Using '0' in the error paths is especially confusing as understanding it's an error value requires following the breadcrumbs to the host code that ultimately consumes the value. Arguably there should also be a #define for "success", but 0/-errno is a common enough pattern that defining another macro on top would likely do more harm than good. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Aaron Lewis <[email protected]> Signed-off-by: Sean Christopherson <[email protected]>
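The 0/-errno convention can be illustrated with a short sketch. check_counter() and the specific errno values chosen here are hypothetical and are not taken from the PMU event filter test.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical check: returns 0 on success and a negative errno value
 * on failure, so a nonzero result is never mistaken for a valid count. */
static int check_counter(uint64_t count, int expect_counting)
{
	if (expect_counting && count == 0)
		return -EIO;	/* counter should have advanced */
	if (!expect_counting && count != 0)
		return -EINVAL;	/* counter advanced while filtered out */
	return 0;
}
```

A caller can then write `if (check_counter(...))` for the failure path and still recover the specific reason from the negative value.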
2023-04-14	KVM: selftests: Print detailed info in PMU event filter asserts	Aaron Lewis	1	-6/+5
	Provide the actual vs. expected count in the PMU event filter test's asserts instead of relying on pr_info() to provide the context, e.g. so that all information needed to triage a failure is readily available even if the environment in which the test is run captures only the assert itself. Signed-off-by: Aaron Lewis <[email protected]> [sean: rewrite changelog] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-04-14	KVM: selftests: Add helpers for PMC asserts in PMU event filter test	Aaron Lewis	1	-25/+27
	Add helper macros to consolidate the asserts that a PMC is/isn't counting (branch) instructions retired. This will make it easier to add additional asserts related to counting instructions later on. No functional changes intended. Signed-off-by: Aaron Lewis <[email protected]> [sean: add "INSTRUCTIONS", massage changelog] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
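Helper macros that consolidate "is counting" and "is not counting" checks, and that carry the observed value in the failure message, might look like the sketch below. The macro names and the failure counter are hypothetical and differ from the selftest's real helpers, which assert rather than count.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static int failures;	/* the sketch counts failures instead of aborting */

/* Report the observed value inside the failure message itself. */
#define CHECK_MSG(cond, fmt, ...)					\
	do {								\
		if (!(cond)) {						\
			failures++;					\
			fprintf(stderr, "%s:%d: " fmt "\n",		\
				__FILE__, __LINE__, __VA_ARGS__);	\
		}							\
	} while (0)

#define EXPECT_PMC_COUNTING(count)					\
	CHECK_MSG((count) != 0,						\
		  "expected a nonzero count, got %" PRIu64,		\
		  (uint64_t)(count))

#define EXPECT_PMC_NOT_COUNTING(count)					\
	CHECK_MSG((count) == 0,						\
		  "expected 0, got %" PRIu64, (uint64_t)(count))

/* Runs three checks, one of which fails on purpose, and returns the
 * number of failures recorded. */
static int run_checks(void)
{
	failures = 0;
	EXPECT_PMC_COUNTING(42);
	EXPECT_PMC_NOT_COUNTING(0);
	EXPECT_PMC_NOT_COUNTING(7);	/* deliberately fails once */
	return failures;
}
```

Keeping the comparison and the message in one macro means a new assertion about counted instructions needs only one extra line at the call site.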