aboutsummaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2023-04-20selftests: forwarding: generalize bail_on_lldpad from mlxswPetr Machata11-46/+39
mlxsw selftests often invoke a bail_on_lldpad() helper to make sure LLDPAD is not running, to prevent conflicts between the QoS configuration applied through TC or DCB command line tool, and the DCB configuration that LLDPAD might apply. This helper might be useful to others. Move the function to lib.sh, and parameterize to make reusable in other contexts. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-20selftests: forwarding: sch_tbf_*: Add a pre-run hookPetr Machata5-3/+23
The driver-specific wrappers of these selftests invoke bail_on_lldpad to make sure that LLDPAD doesn't trample the configuration. The function bail_on_lldpad is going to move to lib.sh in the next patch. With that, it won't be visible for the wrappers before sourcing the framework script. And after sourcing it, it is too late: the selftest will have run by then. One option might be to source NUM_NETIFS=0 lib.sh from the wrapper, but even if that worked (it might, it might not), that seems cumbersome. lib.sh is doing fair amount of stuff, and even if it works today, it does not look particularly solid as a solution. Instead, introduce a hook, sch_tbf_pre_hook(), that when available, gets invoked. Move the bail to the hook. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Danielle Ratson <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-20selftests/bpf: populate map_array_ro map for verifier_array_access testEduard Zingerman2-5/+41
Two test cases: - "valid read map access into a read-only array 1" and - "valid read map access into a read-only array 2" Expect that map_array_ro map is filled with mock data. This logic was not taken into acount during initial test conversion. This commit modifies prog_tests/verifier.c entry point for this test to fill the map. Fixes: a3c830ae0209 ("selftests/bpf: verifier/array_access.c converted to inline assembly") Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-20selftests/bpf: add pre bpf_prog_test_run_opts() callback for test_loaderEduard Zingerman2-0/+17
When a test case is annotated with __retval tag the test_loader engine would use libbpf's bpf_prog_test_run_opts() to do a test run of the program and compare retvals. This commit allows to perform arbitrary actions on bpf object right before test loader invokes bpf_prog_test_run_opts(). This could be used to setup some state for program execution, e.g. fill some maps. Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-20selftests/bpf: fix __retval() being always ignoredEduard Zingerman2-3/+3
Florian Westphal found a bug in and suggested a fix for test_loader.c processing of __retval tag. Because of this bug the function test_loader.c:do_prog_test_run() never executed and all __retval test tags were ignored. If this bug is fixed a number of test cases from progs/verifier_array_access.c fail with retval not matching the expected value. This test was recently converted to use test_loader.c and inline assembly in [1]. When doing the conversion I missed the important detail of test_verifier.c operation: when it creates fixup_map_array_ro, fixup_map_array_wo and fixup_map_array_small it populates these maps with a dummy record. Disabling the __retval checks for the affected verifier_array_access in this commit to avoid false-postivies in any potential bisects. The issue is addressed in the next patch. I verified that the __retval tags are now respected by changing expected return values for all tests annotated with __retval, and checking that these tests started to fail. [1] https://lore.kernel.org/bpf/[email protected]/ Fixes: 19a8e06f5f91 ("selftests/bpf: Tests execution support for test_loader.c") Reported-by: Florian Westphal <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]/T/ Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-20selftests/bpf: disable program test run for progs/refcounted_kptr.cEduard Zingerman1-4/+4
Florian Westphal found a bug in test_loader.c processing of __retval tag. Because of this bug the function test_loader.c:do_prog_test_run() never executed and all __retval test tags were ignored. This hid an issue with progs/refcounted_kptr.c tests. When __retval tag bug is fixed and refcounted_kptr.c tests are run kernel reports various issues and eventually hangs. Shortest reproducer is the following command run a few times: $ for i in $(seq 1 4); do (./test_progs --allow=refcounted_kptr &); done Commenting out __retval tags for these tests until this issue is resolved. Reported-by: Florian Westphal <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]/T/ Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-20bpftool: Replace "__fallthrough" by a comment to address merge conflictQuentin Monnet1-1/+1
The recent support for inline annotations in control flow graphs generated by bpftool introduced the usage of the "__fallthrough" macro in a switch/case block in btf_dumper.c. This change went through the bpf-next tree, but resulted in a merge conflict in linux-next, because this macro has been renamed "fallthrough" (no underscores) in the meantime. To address the conflict, we temporarily switch to a simple comment instead of a macro. Related: commit f7a858bffcdd ("tools: Rename __fallthrough to fallthrough") Fixes: 9fd496848b1c ("bpftool: Support inline annotations when dumping the CFG of a program") Reported-by: Sven Schnelle <[email protected]> Reported-by: Thomas Richter <[email protected]> Suggested-by: Daniel Borkmann <[email protected]> Signed-off-by: Quentin Monnet <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/bpf/[email protected]/ Link: https://lore.kernel.org/bpf/[email protected]
2023-04-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-9/+9
Adjacent changes: net/mptcp/protocol.h 63740448a32e ("mptcp: fix accept vs worker race") 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") ddb1a072f858 ("mptcp: move first subflow allocation at mpc access time") Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-20libperf rc_check: Enable implicitly with sanitizersIan Rogers1-0/+8
If using leak sanitizer then implicitly enable reference count checking. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-20perf test: Fix maps use after putIan Rogers1-4/+4
Fix a use after put reference count issue. maps is copied from leader, but the leader is put on line 79 and then maps is used to read the reference count below - so a use after put, with the put of maps happening within thread__put. Fix by reversing the order of puts so that the leader is put last. To explain the reference count checker, I wrote this up as a little example here: https://perf.wiki.kernel.org/index.php/Reference_Count_Checking Note, the bug was introduced by the committer and wasn't present in the original reference count patch set. Committer notes: Yes, the bug predated your patch and is detected by the reference count checking you contributed. This was just part of splitting up your series into smaller chunks, in this case either we fix the problem detected while developing this reference counting infrastructure before the patch introducing REFCNT_CHECKING or fix it later after the merged infrastructure, when built with EXTRA_CFLAGS="-DREFCNT_CHECKING=1" detects it when running 'perf test', which is what this patch does. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-19selftests/bpf: Add test to access integer type of variable arrayFeng Zhou5-0/+70
Add prog test for accessing integer type of variable array in tracing program. In addition, hook load_balance function to access sd->span[0], only to confirm whether the load is successful. Because there is no direct way to trigger load_balance call. Co-developed-by: Chengming Zhou <[email protected]> Signed-off-by: Chengming Zhou <[email protected]> Signed-off-by: Feng Zhou <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-20selftests/powerpc/dscr: Restore timeout to DSCR selftestsBenjamin Gray2-3/+0
Reducing the time taken by dscr_sysfs_test.c allows restoring the default timeout, which was removed in commit 850507f30c38 ("selftests/powerpc: Turn off timeout setting for benchmarks, dscr, signal, tm") because that test took too long. Signed-off-by: Benjamin Gray <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-04-20selftests/powerpc/dscr: Speed up DSCR sysfs testsBenjamin Gray1-7/+4
This test case is extremely slow, taking around a minute compared to most of the other DSCR tests taking a second at most. Perf shows most time is spent by the kernel switching to each CPU it reads in /sys/devices/system/cpu. This switching is an unavoidable consequnce of reading all the .../cpuN/dscr values. Remove the outer iteration loop from this test case, reducing the reads from 1600 to 16. This still updates the DSCR 16 times and verifies on every CPU each time, so I do not expect the lower coverage to be meaningful. The speedup is significant: back down to ~1 second like the other tests. Signed-off-by: Benjamin Gray <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-04-20selftests/powerpc/dscr: Improve DSCR explicit random test caseBenjamin Gray3-115/+113
The tests currently have a single writer thread updating the system DSCR with a 1/1000 chance looped only 100 times. So only around one in 10 runs actually do anything. * Add multiple threads to the dscr_explicit_random_test case. * Use a barrier to make all the threads start work as simultaneously as possible. * Use a rwlock and make all threads have a reasonable chance to write to the DSCR on each iteration. PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP is used to prevent writers from starving while all the other threads keep reading. Logging the reads/writes shows a decent mix across the whole test. * Allow all threads a chance to write. * Make the chance of writing more likely. Signed-off-by: Benjamin Gray <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-04-20selftests/powerpc/dscr: Add lockstep test cases to DSCR explicit testsBenjamin Gray3-14/+159
Add new cases to the relevant tests that use explicitly synchronized threads to test the behaviour across context switches with less randomness. By locking the participants to the same CPU we guarantee a context switch occurs each time they make progress, which is a likely failure point if the kernel is not tracking the thread local DSCR correctly. The random case is left in to keep exercising potential edge cases. Signed-off-by: Benjamin Gray <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-04-20selftests/powerpc: Allow bind_to_cpu() to automatically pick CPUBenjamin Gray7-15/+21
All current users of bind_to_cpu() don't care _which_ CPU they get, just that they are bound to a single free one. So alter the interface to 1. Accept a BIND_CPU_ANY value that tells it to automatically pick a CPU 2. Return the picked CPU And convert all these users to bind_to_cpu(BIND_CPU_ANY). Signed-off-by: Benjamin Gray <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-04-20selftests/powerpc: Move bind_to_cpu() to utils.hBenjamin Gray4-14/+13
This function will be useful in the DSCR test patches later in this series, so promote it to be shared by all powerpc selftests. Signed-off-by: Benjamin Gray <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-04-20selftests/powerpc/dscr: Correct typosBenjamin Gray3-6/+6
Correct a couple of typos while working on other improvements to the DSCR tests. Signed-off-by: Benjamin Gray <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-04-20powerpc: add CFUNC assembly label annotationNicholas Piggin2-0/+2
This macro is to be used in assembly where C functions are called. pcrel addressing mode requires branches to functions with a localentry value of 1 to have either a trailing nop or @notoc. This macro permits the latter without changing callers. Signed-off-by: Nicholas Piggin <[email protected]> [mpe: Add dummy definitions to fix selftests build] Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-04-19Merge tag 'mm-hotfixes-stable-2023-04-19-16-36' of ↵Linus Torvalds2-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "22 hotfixes. 19 are cc:stable and the remainder address issues which were introduced during this merge cycle, or aren't considered suitable for -stable backporting. 19 are for MM and the remainder are for other subsystems" * tag 'mm-hotfixes-stable-2023-04-19-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) nilfs2: initialize unused bytes in segment summary blocks mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages mm/mmap: regression fix for unmapped_area{_topdown} maple_tree: fix mas_empty_area() search maple_tree: make maple state reusable after mas_empty_area_rev() mm: kmsan: handle alloc failures in kmsan_ioremap_page_range() mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush() tools/Makefile: do missed s/vm/mm/ mm: fix memory leak on mm_init error handling mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() Revert "userfaultfd: don't fail on unrecognized features" writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used mailmap: update jtoppins' entry to reference correct email mm/mempolicy: fix use-after-free of VMA iterator mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO mm/mprotect: fix do_mprotect_pkey() return on error mm/khugepaged: check again on anon uffd-wp during isolation ...
2023-04-19perf probe: Add missing 0x prefix for addresses printed in hexadecimalArnaldo Carvalho de Melo1-1/+1
To fix this confusing warning: # perf probe -l Failed to find debug information for address 798240 probe_main:prometheus_new_counter__return (on github.com/prometheus/client_golang/prometheus.NewCounter%return in /home/acme/git/prometheus-uprobes/main with counter) # As that 798240 is printed with PRIx64 but has no letters, better print the 0x prefix to disambiguate. Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-19perf build: Test the refcnt check buildArnaldo Carvalho de Melo1-0/+2
Make sure we test build the currently added REFCNT_CHECKING infrastructure. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-19perf map: Add reference count checkingIan Rogers6-51/+62
There's no strict get/put policy with map that leads to leaks or use after free. Reference count checking identifies correct pairing of gets and puts. Committer notes: Extracted from a larger patch removing bits that were covered by the use of pre-existing map__ accessors (e.g. maps__nr_maps()) and new ones added (map__refcnt() and the maps__set_ ones) to reduce RC_CHK_ACCESS(maps)-> source code pollution. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-19perf map: Add set_ methods for ↵Arnaldo Carvalho de Melo9-65/+132
map->{start,end,pgoff,pgoff,reloc,erange_warned,dso,map_ip,unmap_ip,priv} To have a way to intercept usage of the reference counted struct map. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-19perf map: Add missing conversions to map__refcnt()Arnaldo Carvalho de Melo2-4/+4
Some conversions weren't performed in 4e8db2d7520f780f ("perf map: Add map__refcnt() accessor to use in the maps test"), fix it. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-19selftests/xsk: Fix munmap for hugepage allocated umemMagnus Karlsson2-4/+16
Fix the unmapping of hugepage allocated umems so that they are properly unmapped. The new test referred to in the fixes label, introduced a test that allocated a umem that is not a multiple of a 2M hugepage size. This is fine for mmap() that rounds the size up the nearest multiple of 2M. But munmap() requires the size to be a multiple of the hugepage size in order for it to unmap the region. The current behaviour of not properly unmapping the umem, was discovered when further additions of tests that require hugepages (unaligned mode tests only) started failing as the system was running out of hugepages. Fixes: c0801598e543 ("selftests: xsk: Add test UNALIGNED_INV_DESC_4K1_FRAME_SIZE") Signed-off-by: Magnus Karlsson <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-04-19perf maps: Add reference count checkingIan Rogers7-43/+50
Add reference count checking to make sure of good use of get and put. Add and use accessors to reduce RC_CHK clutter. The only significant issue was in tests/thread-maps-share.c where reference counts were released in the reverse order to acquisition, leading to a use after put. This was fixed by reversing the put order. Committer notes: Extracted from a larger patch removing bits that were covered by the use of pre-existing maps__ accessors (e.g. maps__nr_maps()) and new ones added (maps__refcnt()) to reduce RC_CHK_ACCESS(maps)-> source code pollution. Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stephen Brennan <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-19perf maps: Use maps__nr_maps() instead of open coded maps->nr_mapsArnaldo Carvalho de Melo1-1/+1
To use the existing accessor and be consistent. Signef-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-19perf maps: Add maps__refcnt() accessor to allow checking maps pointerArnaldo Carvalho de Melo3-9/+14
To remove one more direct access to 'struct maps' so that we can intercept accesses to its instantiations and refcount check it to catch use after free, etc. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-19perf dso: Fix use before NULL check introduced by map__dso() introductionArnaldo Carvalho de Melo3-7/+6
James Clark noticed that the recent 63df0e4bc368adbd ("perf map: Add accessor for dso") patch accessed map->dso before the 'map' variable was NULL checked, which is a change in logic that leads to segmentation faults, so comb thru that patch to fix similar cases. Fixes: 63df0e4bc368adbd ("perf map: Add accessor for dso") Acked-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected] Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-04-19tools/loongarch: Use __SIZEOF_LONG__ to define __BITS_PER_LONGTiezhu Yang1-1/+1
Although __SIZEOF_POINTER__ is equal to _SIZEOF_LONG__ on LoongArch, it is better to use __SIZEOF_LONG__ to define __BITS_PER_LONG to keep consistent between arch/loongarch/include/uapi/asm/bitsperlong.h and tools/arch/loongarch/include/uapi/asm/bitsperlong.h. Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18x86: improve on the non-rep 'copy_user' functionLinus Torvalds1-1/+1
The old 'copy_user_generic_unrolled' function was oddly implemented for largely historical reasons: it had been largely based on the uncached copy case, which has some other concerns. For example, the __copy_user_nocache() function uses 'movnti' for the destination stores, and those want the destination to be aligned. In contrast, the regular copy function doesn't really care, and trying to align things only complicates matters. Also, like the clear_user function, the copy function had some odd handling of the repeat counts, complicating the exception handling for no really good reason. So as with clear_user, just write it to keep all the byte counts in the %rcx register, exactly like the 'rep movs' functionality that this replaces. Unlike a real 'rep movs', we do allow for this to trash a few temporary registers to not have to unnecessarily save/restore registers on the stack. And like the clearing case, rename this to what it now clearly is: 'rep_movs_alternative', and make it one coherent function, so that it shows up as such in profiles (instead of the odd split between "copy_user_generic_unrolled" and "copy_user_short_string", the latter of which was not about strings at all, and which was shared with the uncached case). Signed-off-by: Linus Torvalds <[email protected]>
2023-04-18x86: improve on the non-rep 'clear_user' functionLinus Torvalds1-1/+1
The old version was oddly written to have the repeat count in multiple registers. So instead of taking advantage of %rax being zero, it had some sub-counts in it. All just for a "single word clearing" loop, which isn't even efficient to begin with. So get rid of those games, and just keep all the state in the same registers we got it in (and that we should return things in). That not only makes this act much more like 'rep stos' (which this function is replacing), but makes it much easier to actually do the obvious loop unrolling. Also rename the function from the now nonsensical 'clear_user_original' to what it now clearly is: 'rep_stos_alternative'. End result: if we don't have a fast 'rep stosb', at least we can have a fast fallback for it. Signed-off-by: Linus Torvalds <[email protected]>
2023-04-18x86: inline the 'rep movs' in user copies for the FSRM caseLinus Torvalds1-1/+0
This does the same thing for the user copies as commit 0db7058e8e23 ("x86/clear_user: Make it faster") did for clear_user(). In other words, it inlines the "rep movs" case when X86_FEATURE_FSRM is set, avoiding the function call entirely. In order to do that, it makes the calling convention for the out-of-line case ("copy_user_generic_unrolled") match the 'rep movs' calling convention, although it does also end up clobbering a number of additional registers. Also, to simplify code sharing in the low-level assembly with the __copy_user_nocache() function (that uses the normal C calling convention), we end up with a kind of mixed return value for the low-level asm code: it will return the result in both %rcx (to work as an alternative for the 'rep movs' case), _and_ in %rax (for the nocache case). We could avoid this by wrapping __copy_user_nocache() callers in an inline asm, but since the cost is just an extra register copy, it's probably not worth it. Signed-off-by: Linus Torvalds <[email protected]>
2023-04-18x86: move stac/clac from user copy routines into callersLinus Torvalds1-0/+3
This is preparatory work for inlining the 'rep movs' case, but also a cleanup. The __copy_user_nocache() function was mis-used by the rdma code to do uncached kernel copies that don't actually want user copies at all, and as a result doesn't want the stac/clac either. Signed-off-by: Linus Torvalds <[email protected]>
2023-04-18x86: don't use REP_GOOD or ERMS for user memory clearingLinus Torvalds1-2/+0
The modern target to use is FSRS (Fast Short REP STOS), and the other cases should only be used for bigger areas (ie mainly things like page clearing). Note! This changes the conditional for the inlining from FSRM ("fast short rep movs") to FSRS ("fast short rep stos"). We'll have a separate fixup for AMD microarchitectures that have a good 'rep stosb' yet do not set the new Intel-specific FSRS bit (because FSRM was there first). Signed-off-by: Linus Torvalds <[email protected]>
2023-04-18selftests/memfd: fix test_sysctlJeff Xu1-6/+8
sysctl memfd_noexec is pid-namespaced, non-reservable, and inherent to the child process. Move the inherence test from init ns to child ns, so init ns can keep the default value. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jeff Xu <[email protected]> Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/oe-lkp/[email protected] Tested-by: Yujie Liu <[email protected]> Cc: Daniel Verkamp <[email protected]> Cc: Dmitry Torokhov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jorge Lucangeli Obes <[email protected]> Cc: Kees Cook <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: run hugetlb testcases of va switchChaitanya S Prakash1-0/+4
The va_high_addr_switch selftest is used to test mmap across 128TB boundary. It divides the selftest cases into two main categories on the basis of size. One set is used to create mappings that are multiples of PAGE_SIZE while the other creates mappings that are multiples of HUGETLB_SIZE. In order to run the hugetlb testcases the binary must be appended with "--run-hugetlb" but the file that used to run the test only invokes the binary, thereby completely skipping the hugetlb testcases. Hence, the required statement has been added. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chaitanya S Prakash <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: configure nr_hugepages for arm64Chaitanya S Prakash1-1/+9
Arm64 has a default hugepage size of 512MB when CONFIG_ARM64_64K_PAGES=y is enabled. While testing on arm64 platforms having up to 4PB of virtual address space, a minimum of 6 hugepages were required for all test cases to pass. Support for this requirement has been added. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chaitanya S Prakash <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: add platform independent in code commentsChaitanya S Prakash1-7/+8
The in code comments for the selftest were made on the basis of 128TB switch, an architecture feature specific to PowerPc and x86 platforms. Keeping in mind the support added for arm64 platforms which implements a 256TB switch, a more generic explanation has been provided. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chaitanya S Prakash <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: rename va_128TBswitch to va_high_addr_switchChaitanya S Prakash4-4/+4
As the initial selftest only took into consideration PowperPC and x86 architectures, on adding support for arm64, a platform independent naming convention is chosen. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chaitanya S Prakash <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: add support for arm64 platform on va switchChaitanya S Prakash1-2/+24
Patch series "selftests/mm: Implement support for arm64 on va". The va_128TBswitch selftest is designed and implemented for PowerPC and x86 architectures which support a 128TB switch, up to 256TB of virtual address space and hugepage sizes of 16MB and 2MB respectively. Arm64 platforms on the other hand support a 256Tb switch, up to 4PB of virtual address space and a default hugepage size of 512MB when 64k pagesize is enabled. These architectural differences require introducing support for arm64 platforms, after which a more generic naming convention is suggested. The in code comments are amended to provide a more platform independent explanation of the working of the code and nr_hugepages are configured as required. Finally, the file running the testcase is modified in order to prevent skipping of hugetlb testcases of va_high_addr_switch. This patch (of 5): Arm64 platforms have the ability to support 64kb pagesize, 512MB default hugepage size and up to 4PB of virtual address space. The address switch occurs at 256TB as opposed to 128TB. Hence, the necessary support has been added. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chaitanya S Prakash <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18delayacct: track delays from IRQ/SOFTIRQYang Yang1-12/+18
Delay accounting does not track the delay of IRQ/SOFTIRQ. While IRQ/SOFTIRQ could have obvious impact on some workloads productivity, such as when workloads are running on system which is busy handling network IRQ/SOFTIRQ. Get the delay of IRQ/SOFTIRQ could help users to reduce such delay. Such as setting interrupt affinity or task affinity, using kernel thread for NAPI etc. This is inspired by "sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure"[1]. Also fix some code indent problems of older code. And update tools/accounting/getdelays.c: / # ./getdelays -p 156 -di print delayacct stats ON printing IO accounting PID 156 CPU count real total virtual total delay total delay average 15 15836008 16218149 275700790 18.380ms IO count delay total delay average 0 0 0.000ms SWAP count delay total delay average 0 0 0.000ms RECLAIM count delay total delay average 0 0 0.000ms THRASHING count delay total delay average 0 0 0.000ms COMPACT count delay total delay average 0 0 0.000ms WPCOPY count delay total delay average 36 7586118 0.211ms IRQ count delay total delay average 42 929161 0.022ms [1] commit 52b1364ba0b1("sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Yang <[email protected]> Cc: Jiang Xuexin <[email protected]> Cc: wangyong <[email protected]> Cc: junhua huang <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: add uffdio register ioctls testPeter Xu1-15/+97
This new test tests against the returned ioctls from UFFDIO_REGISTER, where put into uffdio_register.ioctls. This also tests the expected failure cases of UFFDIO_REGISTER, aka: - Register with empty mode should fail with -EINVAL - Register minor without page cache (anon) should fail with -EINVAL Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: add shmem-private test to uffd-stressPeter Xu2-5/+9
The userfaultfd stress test never tested private shmem, which I think was overlooked long due. Add it so it matches with uffd unit test and it'll cover all memory supported with the three memory types. Meanwhile, rename the memory types a bit. Considering shared mem is the major use case for both shmem / hugetlbfs, changing from: anon, hugetlb, hugetlb_shared, shmem To (with shmem-private added): anon, hugetlb, hugetlb-private, shmem, shmem-private Add the shmem-private to run_vmtests.sh too. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: drop sys/dev test in uffd-stress testPeter Xu4-40/+11
With the new uffd unit test covering the /dev/userfaultfd path and syscall path of uffd initializations, we can safely drop the devnode test in the old stress test. One thing is to avoid duplication of running the stress test twice which is an overkill to only test the /dev/ interface in run_vmtests.sh. The other benefit is now all uffd tests (that uses userfaultfd_open) can run automatically as long as any type of interface is enabled (either syscall or dev), so it's more likely to succeed rather than fail due to unprivilege. With this patch lands, we can drop all the "mem_type:XXX" handlings too. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: allow uffd test to skip properly with no privilegePeter Xu4-18/+29
Allow skip a unit test properly due to no privilege (e.g. sigbus and events tests). [[email protected]: fix spelling mistake "priviledge" -> "privilege"] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Colin Ian King <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: workaround no way to detect uffd-minor + wpPeter Xu1-1/+7
Userfaultfd minor+wp mode was very recently added. The test will fail on the old kernels at ioctl(UFFDIO_CONTINUE) which is misterious. Unfortunately there's no feature bit to detect for this support. Add a hack to leverage WP_UNPOPULATED to detect whether that feature existed, since WP_UNPOPULATED was merged right after minor+wp. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: move zeropage test into uffd unit testsPeter Xu4-95/+108
Simplifies it a bit along the way, e.g., drop the never used offset field (which was always the 1st page so offset=0). Introduce uffd_register_with_ioctls() out of uffd_register() to detect uffdio_register.ioctls got returned. Check that automatically when testing UFFDIO_ZEROPAGE on different types of memory (and kernel). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18selftests/mm: move uffd sig/events tests into uffd unit testsPeter Xu2-226/+265
Move the two tests into the unit test, and convert it into 20 standalone tests: - events test on all 5 mem types, with wp on/off - signal test on all 5 mem types, with wp on/off Testing sigbus on anon... done Testing sigbus on shmem... done Testing sigbus on shmem-private... done Testing sigbus on hugetlb... done Testing sigbus on hugetlb-private... done Testing sigbus-wp on anon... done Testing sigbus-wp on shmem... done Testing sigbus-wp on shmem-private... done Testing sigbus-wp on hugetlb... done Testing sigbus-wp on hugetlb-private... done Testing events on anon... done Testing events on shmem... done Testing events on shmem-private... done Testing events on hugetlb... done Testing events on hugetlb-private... done Testing events-wp on anon... done Testing events-wp on shmem... done Testing events-wp on shmem-private... done Testing events-wp on hugetlb... done Testing events-wp on hugetlb-private... done It'll also remove a lot of global references along the way, e.g. test_uffdio_wp will be replaced with the wp value passed over. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>