aboutsummaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2022-09-07kselftest/arm64: Validate contents of EXTRA_CONTEXT blocksMark Brown1-4/+21
Currently in validate_reserved() we check the basic form and contents of an EXTRA_CONTEXT block but do not actually validate anything inside the data block it provides. Extend the validation to do so, when we get to the terminator for the main data block reset and start walking the extra data block instead. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-07kselftest/arm64: Only validate each signal context onceMark Brown1-7/+12
Currently for the more complex signal context types we validate the context specific details the end of the parsing loop validate_reserved() if we've ever seen a context of that type. This is currently merely a bit inefficient but will get a bit awkward when we start parsing extra_context, at which point we will need to reset the head to advance into the extra space that extra_context provides. Instead only do the more detailed checks on each context type the first time we see that context type. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-07kselftest/arm64: Remove unneeded protype for validate_extra_context()Mark Brown1-2/+0
Nothing outside testcases.c should need to use validate_extra_context(), remove the prototype to ensure nothing does. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-07kselftest/arm64: Fix validation of EXTRA_CONTEXT signal context locationMark Brown1-1/+1
Currently in validate_extra_context() we assert both that the extra data pointed to by the EXTRA_CONTEXT is 16 byte aligned and that it immediately follows the struct _aarch64_ctx providing the terminator for the linked list of contexts in the signal frame. Since struct _aarch64_ctx is an 8 byte structure which must be 16 byte aligned these cannot both be true. As documented in sigcontext.h and implemented by the kernel the extra data should be at the next 16 byte aligned address after the terminator so fix the validation to match. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-07kselftest/arm64: Fix validatation termination record after EXTRA_CONTEXTMark Brown1-1/+1
When arm64 signal context data overflows the base struct sigcontext it gets placed in an extra buffer pointed to by a record of type EXTRA_CONTEXT in the base struct sigcontext which is required to be the last record in the base struct sigframe. The current validation code attempts to check this by using GET_RESV_NEXT_HEAD() to step forward from the current record to the next but that is a macro which assumes it is being provided with a struct _aarch64_ctx and uses the size there to skip forward to the next record. Instead validate_extra_context() passes it a struct extra_context which has a separate size field. This compiles but results in us trying to validate a termination record in completely the wrong place, at best failing validation and at worst just segfaulting. Fix this by passing the struct _aarch64_ctx we meant to into the macro. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-07kselftest/arm64: Validate signal ucontext in placeMark Brown1-3/+6
In handle_input_signal_copyctx() we use ASSERT_GOOD_CONTEXT() to validate that the context we are saving meets expectations however we do this on the saved copy rather than on the actual signal context passed in. This breaks validation of EXTRA_CONTEXT since we attempt to validate the ABI requirement that the additional space supplied is immediately after the termination record in the standard context which will not be the case after it has been copied to another location. Fix this by doing the validation before we copy. Note that nothing actually looks inside the EXTRA_CONTEXT at present. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-07kselftest/arm64: Enumerate SME rather than SVE vector lengths for za_regsMark Brown1-2/+2
The za_regs signal test was enumerating the SVE vector lengths rather than the SME vector lengths through cut'n'paste error when determining what to test. Enumerate the SME vector lengths instead. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-07kselftest/arm64: Add a test for signal frames with ZA disabledMark Brown1-0/+119
When ZA is disabled there should be no register data in the ZA signal frame, add a test case which confirms that this is the case. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-07kselftest/arm64: Tighten up validation of ZA signal contextMark Brown1-1/+15
Currently we accept any size for the ZA signal context that the shared code will accept which means we don't verify that any data is present. Since we have enabled ZA we know that there must be data so strengthen the check to only accept a signal frame with data, and while we're at it since we enabled ZA but did not set any data we know that ZA must contain zeros, confirm that. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-07kselftest/arm64: kselftest harness for FP stress testsMark Brown3-1/+540
Currently the stress test programs for floating point context switching are run by hand, there are extremely simplistic harnesses which run some copies of each test individually but they are not integrated into kselftest and with SVE and SME they only run with whatever vector length the process has by default. This is hassle when running the tests and means that they're not being run at all by CI systems picking up kselftest. In order to improve our coverage and provide a more convenient interface provide a harness program which starts enough stress test programs up to cause context switching and runs them for a set period. If only FPSIMD is available in the system we start two copies of the FPSIMD stress test per CPU, otherwise we start one copy of the FPSIMD and then start the SVE, streaming SVE and ZA tests once per CPU for each available VL they have to run on. We then run for a set period monitoring for any errors reported by the test programs before cleanly terminating them. In order to provide additional coverage of signal handling and some extra noise in the scheduling we send a SIGUSR2 to the stress tests once a second, the tests will count the number of signals they get. Since kselftest is generally expected to run quickly we by default only run for ten seconds. This is enough to show if there is anything cripplingly wrong but not exactly a thorough soak test, for interactive and more focused use a command line option -t N is provided which overrides the length of time to run for (specified in seconds) and if 0 is specified then there is no timeout and the test must be manually terminated. The timeout is counted in seconds with no output, this is done to account for the potentially slow startup time for the test programs on virtual platforms which tend to struggle during startup as they are both slow and tend to support a wide range of vector lengths. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-07kselftest/arm64: Install signal handlers before output in FP stress testsMark Brown3-72/+72
To interface more robustly with other processes install the signal handers in the floating point stress tests before we produce any output, this means that a parent process can know that if it has seen any output from the test then the test is ready to handle incoming signals. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-07selftests: nft_concat_range: add socat supportFlorian Westphal1-12/+53
There are different flavors of 'nc' around, this script fails on my test vm because 'nc' is 'nmap-ncat' which isn't 100% compatible. Add socat support and use it if available. Signed-off-by: Florian Westphal <[email protected]>
2022-09-06selftests/bpf: Add tracing_struct test in DENYLIST.s390xYonghong Song1-0/+1
Add tracing_struct test in DENYLIST.s390x since s390x does not support trampoline now. Signed-off-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-06selftests/bpf: Use BPF_PROG2 for some fentry programs without struct argumentsYonghong Song1-2/+2
Use BPF_PROG2 instead of BPF_PROG for programs in progs/timer.c to test BPF_PROG2 for cases without struct arguments. Signed-off-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-06selftests/bpf: Add struct argument tests with fentry/fexit programs.Yonghong Song3-0/+231
Add various struct argument tests with fentry/fexit programs. Also add one test with a kernel func which does not have any argument to test BPF_PROG2 macro in such situation. Signed-off-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-06libbpf: Add new BPF_PROG2 macroYonghong Song1-0/+79
To support struct arguments in trampoline based programs, existing BPF_PROG doesn't work any more since the type size is needed to find whether a parameter takes one or two registers. So this patch added a new BPF_PROG2 macro to support such trampoline programs. The idea is suggested by Andrii. For example, if the to-be-traced function has signature like typedef struct { void *x; int t; } sockptr; int blah(sockptr x, char y); In the new BPF_PROG2 macro, the argument can be represented as __bpf_prog_call( ({ union { struct { __u64 x, y; } ___z; sockptr x; } ___tmp = { .___z = { ctx[0], ctx[1] }}; ___tmp.x; }), ({ union { struct { __u8 x; } ___z; char y; } ___tmp = { .___z = { ctx[2] }}; ___tmp.y; })); In the above, the values stored on the stack are properly assigned to the actual argument type value by using 'union' magic. Note that the macro also works even if no arguments are with struct types. Note that new BPF_PROG2 works for both llvm16 and pre-llvm16 compilers where llvm16 supports bpf target passing value with struct up to 16 byte size and pre-llvm16 will pass by reference by storing values on the stack. With static functions with struct argument as always inline, the compiler is able to optimize and remove additional stack saving of struct values. Signed-off-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-06bpf: Update descriptions for helpers bpf_get_func_arg[_cnt]()Yonghong Song1-4/+5
Now instead of the number of arguments, the number of registers holding argument values are stored in trampoline. Update the description of bpf_get_func_arg[_cnt]() helpers. Previous programs without struct arguments should continue to work as usual. Signed-off-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-06Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextPaolo Abeni112-507/+3020
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-09-05 The following pull-request contains BPF updates for your *net-next* tree. We've added 106 non-merge commits during the last 18 day(s) which contain a total of 159 files changed, 5225 insertions(+), 1358 deletions(-). There are two small merge conflicts, resolve them as follows: 1) tools/testing/selftests/bpf/DENYLIST.s390x Commit 27e23836ce22 ("selftests/bpf: Add lru_bug to s390x deny list") in bpf tree was needed to get BPF CI green on s390x, but it conflicted with newly added tests on bpf-next. Resolve by adding both hunks, result: [...] lru_bug # prog 'printk': failed to auto-attach: -524 setget_sockopt # attach unexpected error: -524 (trampoline) cb_refs # expected error message unexpected error: -524 (trampoline) cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc) htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline) [...] 2) net/core/filter.c Commit 1227c1771dd2 ("net: Fix data-races around sysctl_[rw]mem_(max|default).") from net tree conflicts with commit 29003875bd5b ("bpf: Change bpf_setsockopt(SOL_SOCKET) to reuse sk_setsockopt()") from bpf-next tree. Take the code as it is from bpf-next tree, result: [...] if (getopt) { if (optname == SO_BINDTODEVICE) return -EINVAL; return sk_getsockopt(sk, SOL_SOCKET, optname, KERNEL_SOCKPTR(optval), KERNEL_SOCKPTR(optlen)); } return sk_setsockopt(sk, SOL_SOCKET, optname, KERNEL_SOCKPTR(optval), *optlen); [...] The main changes are: 1) Add any-context BPF specific memory allocator which is useful in particular for BPF tracing with bonus of performance equal to full prealloc, from Alexei Starovoitov. 2) Big batch to remove duplicated code from bpf_{get,set}sockopt() helpers as an effort to reuse the existing core socket code as much as possible, from Martin KaFai Lau. 3) Extend BPF flow dissector for BPF programs to just augment the in-kernel dissector with custom logic. In other words, allow for partial replacement, from Shmulik Ladkani. 4) Add a new cgroup iterator to BPF with different traversal options, from Hao Luo. 5) Support for BPF to collect hierarchical cgroup statistics efficiently through BPF integration with the rstat framework, from Yosry Ahmed. 6) Support bpf_{g,s}et_retval() under more BPF cgroup hooks, from Stanislav Fomichev. 7) BPF hash table and local storages fixes under fully preemptible kernel, from Hou Tao. 8) Add various improvements to BPF selftests and libbpf for compilation with gcc BPF backend, from James Hilliard. 9) Fix verifier helper permissions and reference state management for synchronous callbacks, from Kumar Kartikeya Dwivedi. 10) Add support for BPF selftest's xskxceiver to also be used against real devices that support MAC loopback, from Maciej Fijalkowski. 11) Various fixes to the bpf-helpers(7) man page generation script, from Quentin Monnet. 12) Document BPF verifier's tnum_in(tnum_range(), ...) gotchas, from Shung-Hsi Yu. 13) Various minor misc improvements all over the place. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (106 commits) bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc. bpf: Remove usage of kmem_cache from bpf_mem_cache. bpf: Remove prealloc-only restriction for sleepable bpf programs. bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs. bpf: Remove tracing program restriction on map types bpf: Convert percpu hash map to per-cpu bpf_mem_alloc. bpf: Add percpu allocation support to bpf_mem_alloc. bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU. bpf: Adjust low/high watermarks in bpf_mem_cache bpf: Optimize call_rcu in non-preallocated hash map. bpf: Optimize element count in non-preallocated hash map. bpf: Relax the requirement to use preallocated hash maps in tracing progs. samples/bpf: Reduce syscall overhead in map_perf_test. selftests/bpf: Improve test coverage of test_maps bpf: Convert hash map to bpf_mem_alloc. bpf: Introduce any context BPF specific memory allocator. selftest/bpf: Add test for bpf_getsockopt() bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt() bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt() bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt() ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-09-06kselftest/arm64: Count SIGUSR2 deliveries in FP stress testsMark Brown4-0/+46
Currently the floating point stress tests mostly support testing that the data they are checking can be disrupted from a signal handler triggered by SIGUSR1. This is not properly implemented for all the tests and in testing is frequently modified to just handle the signal without corrupting data in order to ensure that signal handling does not corrupt data. Directly support this usage by installing a SIGUSR2 handler which simply counts the signal delivery. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-06kselftest/arm64: Always encourage preemption for za-testMark Brown1-6/+1
Since we now have an explicit test for the syscall ABI there is no need for za-test to cover getpid() so just unconditionally do sched_yield() like we do in fpsimd-test. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-06kselftest/arm64: Add simple hwcap validationMark Brown3-1/+190
Add some trivial hwcap validation which checks that /proc/cpuinfo and AT_HWCAP agree with each other and can verify that for extensions that can generate a SIGILL due to adding new instructions one appears or doesn't appear as expected. I've added SVE and SME, other capabilities can be added later if this gets merged. This isn't super exciting but on the other hand took very little time to write and should be handy when verifying that you wired up AT_HWCAP properly. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-09-06perf c2c: Prevent potential memory leak in c2c_he_zalloc()Shang XiaoJing1-3/+9
Free allocated resources when zalloc() fails for members in c2c_he, to prevent potential memory leak in c2c_he_zalloc(). Signed-off-by: Shang XiaoJing <[email protected]> Reviewed-by: Leo Yan <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-09-06perf genelf: Switch deprecated openssl MD5_* functions to new EVP APIZixuan Tan1-9/+11
Switch to the flavored EVP API like in test-libcrypto.c, and remove the bad gcc #pragma. Inspired-by: 5b245985a6de5ac1 ("tools build: Switch to new openssl API for test-libcrypto") Signed-off-by: Zixuan Tan <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/CABwm_eTnARC1GwMD-JF176k8WXU1Z0+H190mvXn61yr369qt6g@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-09-06tools/perf: Fix out of bound access to cpu mask arrayAthira Rajeev1-6/+20
The cpu mask init code in "record__mmap_cpu_mask_init" function access "bits" array part of "struct mmap_cpu_mask". The size of this array is the value from cpu__max_cpu().cpu. This array is used to contain the cpumask value for each cpu. While setting bit for each cpu, it calls "set_bit" function which access index in "bits" array. If we provide a command line option to -C which is greater than the number of CPU's present in the system, the set_bit could access an array member which is out-of the array size. This is because currently, there is no boundary check for the CPU. This will result in seg fault: <<>> ./perf record -C 12341234 ls Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS Segmentation fault (core dumped) <<>> Debugging with gdb, points to function flow as below: <<>> set_bit record__mmap_cpu_mask_init record__init_thread_default_masks record__init_thread_masks cmd_record <<>> Fix this by adding boundary check for the array. After the patch: <<>> ./perf record -C 12341234 ls Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS Failed to initialize parallel data streaming masks <<>> With this fix, if -C is given a non-exsiting CPU, perf record will fail with: <<>> ./perf record -C 50 ls Failed to initialize parallel data streaming masks <<>> Reported-by: Nageswara R Sastry <[email protected]> Signed-off-by: Athira Jajeev <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Nageswara R Sastry <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-09-06perf affinity: Fix out of bound access to "sched_cpus" maskAthira Rajeev1-1/+7
The affinity code in "affinity_set" function access array named "sched_cpus". The size for this array is allocated in affinity_setup function which is nothing but value from get_cpu_set_size. This is used to contain the cpumask value for each cpu. While setting bit for each cpu, it calls "set_bit" function which access index in sched_cpus array. If we provide a command-line option to -C which is more than the number of CPU's present in the system, the set_bit could access an array member which is out-of the array size. This is because currently, there is no boundary check for the CPU. This will result in seg fault: <<>> ./perf stat -C 12323431 ls Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS Segmentation fault (core dumped) <<>> Fix this by adding boundary check for the array. After the fix from powerpc system: <<>> ./perf stat -C 12323431 ls 1>out Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS Performance counter stats for 'CPU(s) 12323431': <not supported> msec cpu-clock <not supported> context-switches <not supported> cpu-migrations <not supported> page-faults <not supported> cycles <not supported> instructions <not supported> branches <not supported> branch-misses 0.001192373 seconds time elapsed <<>> Reported-by: Nageswara R Sastry <[email protected]> Signed-off-by: Athira Jajeev <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Nageswara R Sastry <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-09-05iio: Add new event type gesture and use direction for single and double tapJagath Jog J1-1/+7
Add new event type for tap called gesture and the direction can be used to differentiate single and double tap. This may be used by accelerometer sensors to express single and double tap events. For directional tap, modifiers like IIO_MOD_(X/Y/Z) can be used along with singletap and doubletap direction. Signed-off-by: Jagath Jog J <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Cameron <[email protected]>
2022-09-05tools: hv: kvp: remove unnecessary (void*) conversionsZhou jie1-2/+2
Remove unnecessary void* type casting. Signed-off-by: Zhou jie <[email protected]> Reviewed-by: Michael Kelley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]>
2022-09-05bpf: Optimize call_rcu in non-preallocated hash map.Alexei Starovoitov1-11/+0
Doing call_rcu() million times a second becomes a bottle neck. Convert non-preallocated hash map from call_rcu to SLAB_TYPESAFE_BY_RCU. The rcu critical section is no longer observed for one htab element which makes non-preallocated hash map behave just like preallocated hash map. The map elements are released back to kernel memory after observing rcu critical section. This improves 'map_perf_test 4' performance from 100k events per second to 250k events per second. bpf_mem_alloc + percpu_counter + typesafe_by_rcu provide 10x performance boost to non-preallocated hash map and make it within few % of preallocated map while consuming fraction of memory. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-05selftests/bpf: Improve test coverage of test_mapsAlexei Starovoitov1-14/+24
Make test_maps more stressful with more parallelism in update/delete/lookup/walk including different value sizes. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-04kselftest/cgroup: Add cpuset v2 partition root state testWaiman Long4-2/+765
Add a test script test_cpuset_prs.sh with a helper program wait_inotify for exercising the cpuset v2 partition root state code. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2022-09-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-2/+2
Pull kvm fixes from Paolo Bonzini: "s390: - PCI interpretation compile fixes RISC-V: - fix unused variable warnings in vcpu_timer.c - move extern sbi_ext declarations to a header x86: - check validity of argument to KVM_SET_MP_STATE - use guest's global_ctrl to completely disable guest PEBS - fix a memory leak on memory allocation failure - mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIES - fix build failure with Clang integrated assembler - fix MSR interception - always flush TLBs when enabling dirty logging" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: check validity of argument to KVM_SET_MP_STATE perf/x86/core: Completely disable guest PEBS via guest's global_ctrl KVM: x86: fix memoryleak in kvm_arch_vcpu_create() KVM: x86: Mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIES KVM: s390: pci: Hook to access KVM lowlevel from VFIO riscv: kvm: move extern sbi_ext declarations to a header riscv: kvm: vcpu_timer: fix unused variable warnings KVM: selftests: Fix ambiguous mov in KVM_ASM_SAFE() KVM: selftests: Fix KVM_EXCEPTION_MAGIC build with Clang KVM: VMX: Heed the 'msr' argument in msr_write_intercepted() kvm: x86: mmu: Always flush TLBs when enabling dirty logging kvm: x86: mmu: Drop the need_remote_flush() function
2022-09-04selftests/powerpc: Skip 4PB test on 4K PAGE_SIZE systemsMichael Ellerman1-0/+2
Systems using the hash MMU with a 4K page size don't support 4PB address space, so skip the test because the bug it tests for can't be triggered. Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-04memblock_tests: move variable declarations to single blockRebecca Mckeever3-127/+42
Move variable declarations to a single block at the beginning of each testing function. Signed-off-by: Rebecca Mckeever <[email protected]> Signed-off-by: Mike Rapoport <[email protected]> Link: https://lore.kernel.org/r/e61431e73977f305fdd027bca99d1dc119e96d84.1662264355.git.remckee0@gmail.com
2022-09-04memblock tests: remove 'cleared' from comment blocksRebecca Mckeever1-11/+11
The tests in alloc_nid_api can now run either memblock_alloc_try_nid() or memblock_alloc_try_nid_raw(). The comment blocks for these tests should not refer to a 'cleared' region since that only applies to memblock_alloc_try_nid(). Remove 'cleared' from the comment blocks so that the comments are accurate for either memblock function. Signed-off-by: Rebecca Mckeever <[email protected]> Signed-off-by: Mike Rapoport <[email protected]> Link: https://lore.kernel.org/r/e8be24137e54e9f81a06af969ded82b319114d7a.1662264347.git.remckee0@gmail.com
2022-09-03ACPI: tools: pfrut: Do not initialize ret in main()Shi junming1-1/+1
The initialization is unnecessary, because ret is always assigned a new value before reading it. Signed-off-by: Shi junming <[email protected]> [ rjw: Subject edits, new changelog ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-09-02selftest/bpf: Add test for bpf_getsockopt()Martin KaFai Lau2-106/+43
This patch removes the __bpf_getsockopt() which directly reads the sk by using PTR_TO_BTF_ID. Instead, the test now directly uses the kernel bpf helper bpf_getsockopt() which supports all the required optname now. TCP_SAVE[D]_SYN and TCP_MAXSEG are not tested in a loop for all the hooks and sock_ops's cb. TCP_SAVE[D]_SYN only works in passive connection. TCP_MAXSEG only works when it is setsockopt before the connection is established and the getsockopt return value can only be tested after the connection is established. Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-09-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski1-10/+26
Florian Westphal says: ==================== netfilter: bug fixes for net 1. Fix IP address check in irc DCC conntrack helper, this should check the opposite direction rather than the destination address of the packets' direction, from David Leadbeater. 2. bridge netfilter needs to drop dst references, from Harsh Modi. This was fine back in the day the code was originally written, but nowadays various tunnels can pre-set metadata dsts on packets. 3. Remove nf_conntrack_helper sysctl and the modparam toggle, users need to explicitily assign the helpers to use via nftables or iptables. Conntrack helpers, by design, may be used to add dynamic port redirections to internal machines, so its necessary to restrict which hosts/peers are allowed to use them. It was discovered that improper checking in the irc DCC helper makes it possible to trigger the 'please do dynamic port forward' from outside by embedding a 'DCC' in a PING request; if the client echos that back a expectation/port forward gets added. The auto-assign-for-everything mechanism has been in "please don't do this" territory since 2012. From Pablo. 4. Fix a memory leak in the netdev hook error unwind path, also from Pablo. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_conntrack_irc: Fix forged IP logic netfilter: nf_tables: clean up hook list when offload flags check fails netfilter: br_netfilter: Drop dst references before setting. netfilter: remove nf_conntrack_helper sysctl and modparam toggles ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-02Merge tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-blockLinus Torvalds2-70/+41
Pull io_uring fixes from Jens Axboe: - A single fix for over-eager retries for networking (Pavel) - Revert the notification slot support for zerocopy sends. It turns out that even after more than a year or development and testing, there's not full agreement on whether just using plain ordered notifications is Good Enough to avoid the complexity of using the notifications slots. Because of that, we decided that it's best left to a future final decision. We can always bring back this feature, but we can't really change it or remove it once we've released 6.0 with it enabled. The reverts leave the usual CQE notifications as the primary interface for knowing when data was sent, and when it was acked. (Pavel) * tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-block: selftests/net: return back io_uring zc send tests io_uring/net: simplify zerocopy send user API io_uring/notif: remove notif registration Revert "io_uring: rename IORING_OP_FILES_UPDATE" Revert "io_uring: add zc notification flush requests" selftests/net: temporarily disable io_uring zc test io_uring/net: fix overexcessive retries
2022-09-02Merge tag 'landlock-6.0-rc4' of ↵Linus Torvalds1-10/+145
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock fix from Mickaël Salaün: "This fixes a mis-handling of the LANDLOCK_ACCESS_FS_REFER right when multiple rulesets/domains are stacked. The expected behaviour was that an additional ruleset can only restrict the set of permitted operations, but in this particular case, it was potentially possible to re-gain the LANDLOCK_ACCESS_FS_REFER right" * tag 'landlock-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: landlock: Fix file reparenting without explicit LANDLOCK_ACCESS_FS_REFER
2022-09-02perf stat: Fix L2 Topdown metrics disappear for raw eventsZhengjun Xing1-2/+3
In perf/Documentation/perf-stat.txt, for "--td-level" the default "0" means the max level that the current hardware support. So we need initialize the stat_config.topdown_level to TOPDOWN_MAX_LEVEL when “--td-level=0” or no “--td-level” option. Otherwise, for the hardware with a max level is 2, the 2nd level metrics disappear for raw events in this case. The issue cannot be observed for the perf stat default or "--topdown" options. This commit fixes the raw events issue and removes the duplicated code for the perf stat default. Before: # ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1 Performance counter stats for 'sleep 1': 1.03 msec cpu-clock # 0.001 CPUs utilized 1 context-switches # 966.216 /sec 0 cpu-migrations # 0.000 /sec 60 page-faults # 57.973 K/sec 1,132,112 instructions # 1.41 insn per cycle 803,872 cycles # 0.777 GHz 1,909,120 ref-cycles # 1.845 G/sec 236,634 branches # 228.640 M/sec 6,367 branch-misses # 2.69% of all branches 4,823,232 slots # 4.660 G/sec 1,210,536 topdown-retiring # 25.1% Retiring 699,841 topdown-bad-spec # 14.5% Bad Speculation 1,777,975 topdown-fe-bound # 36.9% Frontend Bound 1,134,878 topdown-be-bound # 23.5% Backend Bound 189,146 topdown-heavy-ops # 182.756 M/sec 662,012 topdown-br-mispredict # 639.647 M/sec 1,097,048 topdown-fetch-lat # 1.060 G/sec 416,121 topdown-mem-bound # 402.063 M/sec 1.002423690 seconds time elapsed 0.002494000 seconds user 0.000000000 seconds sys After: # ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1 Performance counter stats for 'sleep 1': 1.13 msec cpu-clock # 0.001 CPUs utilized 1 context-switches # 882.128 /sec 0 cpu-migrations # 0.000 /sec 61 page-faults # 53.810 K/sec 1,137,612 instructions # 1.29 insn per cycle 881,477 cycles # 0.778 GHz 2,093,496 ref-cycles # 1.847 G/sec 236,356 branches # 208.496 M/sec 7,090 branch-misses # 3.00% of all branches 5,288,862 slots # 4.665 G/sec 1,223,697 topdown-retiring # 23.1% Retiring 767,403 topdown-bad-spec # 14.5% Bad Speculation 2,053,322 topdown-fe-bound # 38.8% Frontend Bound 1,244,438 topdown-be-bound # 23.5% Backend Bound 186,665 topdown-heavy-ops # 3.5% Heavy Operations # 19.6% Light Operations 725,922 topdown-br-mispredict # 13.7% Branch Mispredict # 0.8% Machine Clears 1,327,400 topdown-fetch-lat # 25.1% Fetch Latency # 13.7% Fetch Bandwidth 497,775 topdown-mem-bound # 9.4% Memory Bound # 14.1% Core Bound 1.002701530 seconds time elapsed 0.002744000 seconds user 0.000000000 seconds sys Fixes: 63e39aa6ae103451 ("perf stat: Support L2 Topdown events") Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Xing Zhengjun <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-09-02selftests/xsk: Avoid use-after-free on ctxIan Rogers1-3/+3
The put lowers the reference count to 0 and frees ctx, reading it afterwards is invalid. Move the put after the uses and determine the last use by the reference count being 1. Fixes: 39e940d4abfa ("selftests/xsk: Destroy BPF resources only when ctx refcount drops to 0") Signed-off-by: Ian Rogers <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02selftests/bpf: Store BPF object files with .bpf.o extensionDaniel Müller68-250/+250
BPF object files are, in a way, the final artifact produced as part of the ahead-of-time compilation process. That makes them somewhat special compared to "regular" object files, which are a intermediate build artifacts that can typically be removed safely. As such, it can make sense to name them differently to make it easier to spot this difference at a glance. Among others, libbpf-bootstrap [0] has established the extension .bpf.o for BPF object files. It seems reasonable to follow this example and establish the same denomination for selftest build artifacts. To that end, this change adjusts the corresponding part of the build system and the test programs loading BPF object files to work with .bpf.o files. [0] https://github.com/libbpf/libbpf-bootstrap Suggested-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Müller <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02selftests/xsk: Add support for zero copy testingMaciej Fijalkowski2-4/+74
Introduce new mode to xdpxceiver responsible for testing AF_XDP zero copy support of driver that serves underlying physical device. When setting up test suite, determine whether driver has ZC support or not by trying to bind XSK ZC socket to the interface. If it succeeded, interpret it as ZC support being in place and do softirq and busy poll tests for zero copy mode. Note that Rx dropped tests are skipped since ZC path is not touching rx_dropped stat at all. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02selftests/xsk: Make sure single threaded test terminatesMaciej Fijalkowski1-0/+7
For single threaded poll tests call pthread_kill() from main thread so that we are sure worker thread has finished its job and it is possible to proceed with next test types from test suite. It was observed that on some platforms it takes a bit longer for worker thread to exit and next test case sees device as busy in this case. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02selftests/xsk: Add support for executing tests on physical deviceMaciej Fijalkowski3-87/+170
Currently, architecture of xdpxceiver is designed strictly for conducting veth based tests. Veth pair is created together with a network namespace and one of the veth interfaces is moved to the mentioned netns. Then, separate threads for Tx and Rx are spawned which will utilize described setup. Infrastructure described in the paragraph above can not be used for testing AF_XDP support on physical devices. That testing will be conducted on a single network interface and same queue. Xskxceiver needs to be extended to distinguish between veth tests and physical interface tests. Since same iface/queue id pair will be used by both Tx/Rx threads for physical device testing, Tx thread, which happen to run after the Rx thread, is going to create XSK socket with shared umem flag. In order to track this setting throughout the lifetime of spawned threads, introduce 'shared_umem' boolean variable to struct ifobject and set it to true when xdpxceiver is run against physical device. In such case, UMEM size needs to be doubled, so half of it will be used by Rx thread and other half by Tx thread. For two step based test types, value of XSKMAP element under key 0 has to be updated as there is now another socket for the second step. Also, to avoid race conditions when destroying XSK resources, move this activity to the main thread after spawned Rx and Tx threads have finished its job. This way it is possible to gracefully remove shared umem without introducing synchronization mechanisms. To run xsk selftests suite on physical device, append "-i $IFACE" when invoking test_xsk.sh. For veth based tests, simply skip it. When "-i $IFACE" is in place, under the hood test_xsk.sh will use $IFACE for both interfaces supplied to xdpxceiver, which in turn will interpret that this execution of test suite is for a physical device. Note that currently this makes it possible only to test SKB and DRV mode (in case underlying device has native XDP support). ZC testing support is added in a later patch. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02selftests/xsk: Increase chars for interface name to 16Maciej Fijalkowski1-2/+2
So that "enp240s0f0" or such name can be used against xskxceiver. While at it, also extend character count for netns name. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02selftests/xsk: Introduce default Rx pkt streamMaciej Fijalkowski2-27/+51
In order to prepare xdpxceiver for physical device testing, let us introduce default Rx pkt stream. Reason for doing it is that physical device testing will use a UMEM with a doubled size where half of it will be used by Tx and other half by Rx. This means that pkt addresses will differ for Tx and Rx streams. Rx thread will initialize the xsk_umem_info::base_addr that is added here so that pkt_set(), when working on Rx UMEM will add this offset and second half of UMEM space will be used. Note that currently base_addr is 0 on both sides. Future commit will do the mentioned initialization. Previously, veth based testing worked on separate UMEMs, so single default stream was fine. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02selftests/xsk: Query for native XDP supportMaciej Fijalkowski1-2/+37
Currently, xdpxceiver assumes that underlying device supports XDP in native mode - it is fine by now since tests can run only on a veth pair. Future commit is going to allow running test suite against physical devices, so let us query the device if it is capable of running XDP programs in native mode. This way xdpxceiver will not try to run TEST_MODE_DRV if device being tested is not supporting it. Signed-off-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-02landlock: Fix file reparenting without explicit LANDLOCK_ACCESS_FS_REFERMickaël Salaün1-10/+145
This change fixes a mis-handling of the LANDLOCK_ACCESS_FS_REFER right when multiple rulesets/domains are stacked. The expected behaviour was that an additional ruleset can only restrict the set of permitted operations, but in this particular case, it was potentially possible to re-gain the LANDLOCK_ACCESS_FS_REFER right. With the introduction of LANDLOCK_ACCESS_FS_REFER, we added the first globally denied-by-default access right. Indeed, this lifted an initial Landlock limitation to rename and link files, which was initially always denied when the source or the destination were different directories. This led to an inconsistent backward compatibility behavior which was only taken into account if no domain layer were using the new LANDLOCK_ACCESS_FS_REFER right. However, when restricting a thread with a new ruleset handling LANDLOCK_ACCESS_FS_REFER, all inherited parent rulesets/layers not explicitly handling LANDLOCK_ACCESS_FS_REFER would behave as if they were handling this access right and with all their rules allowing it. This means that renaming and linking files could became allowed by these parent layers, but all the other required accesses must also be granted: all layers must allow file removal or creation, and renaming and linking operations cannot lead to privilege escalation according to the Landlock policy. See detailed explanation in commit b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER"). To say it another way, this bug may lift the renaming and linking limitations of the initial Landlock version, and a same ruleset can enforce different restrictions depending on previous or next enforced ruleset (i.e. inconsistent behavior). The LANDLOCK_ACCESS_FS_REFER right cannot give access to data not already allowed, but this doesn't follow the contract of the first Landlock ABI. This fix puts back the limitation for sandboxes that didn't opt-in for this additional right. For instance, if a first ruleset allows LANDLOCK_ACCESS_FS_MAKE_REG on /dst and LANDLOCK_ACCESS_FS_REMOVE_FILE on /src, renaming /src/file to /dst/file is denied. However, without this fix, stacking a new ruleset which allows LANDLOCK_ACCESS_FS_REFER on / would now permit the sandboxed thread to rename /src/file to /dst/file . This change fixes the (absolute) rule access rights, which now always forbid LANDLOCK_ACCESS_FS_REFER except when it is explicitly allowed when creating a rule. Making all domain handle LANDLOCK_ACCESS_FS_REFER was an initial approach but there is two downsides: * it makes the code more complex because we still want to check that a rule allowing LANDLOCK_ACCESS_FS_REFER is legitimate according to the ruleset's handled access rights (i.e. ABI v1 != ABI v2); * it would not allow to identify if the user created a ruleset explicitly handling LANDLOCK_ACCESS_FS_REFER or not, which will be an issue to audit Landlock. Instead, this change adds an ACCESS_INITIALLY_DENIED list of denied-by-default rights, which (only) contains LANDLOCK_ACCESS_FS_REFER. All domains are treated as if they are also handling this list, but without modifying their fs_access_masks field. A side effect is that the errno code returned by rename(2) or link(2) *may* be changed from EXDEV to EACCES according to the enforced restrictions. Indeed, we now have the mechanic to identify if an access is denied because of a required right (e.g. LANDLOCK_ACCESS_FS_MAKE_REG, LANDLOCK_ACCESS_FS_REMOVE_FILE) or if it is denied because of missing LANDLOCK_ACCESS_FS_REFER rights. This may result in different errno codes than for the initial Landlock version, but this approach is more consistent and better for rename/link compatibility reasons, and it wasn't possible before (hence no backport to ABI v1). The layout1.rename_file test reflects this change. Add 4 layout1.refer_denied_by_default* test suites to check that the behavior of a ruleset not handling LANDLOCK_ACCESS_FS_REFER (ABI v1) is unchanged even if another layer handles LANDLOCK_ACCESS_FS_REFER (i.e. ABI v1 precedence). Make sure rule's absolute access rights are correct by testing with and without a matching path. Add test_rename() and test_exchange() helpers. Extend layout1.inval tests to check that a denied-by-default access right is not necessarily part of a domain's handled access rights. Test coverage for security/landlock is 95.3% of 599 lines according to gcc/gcov-11. Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER") Reviewed-by: Paul Moore <[email protected]> Reviewed-by: Günther Noack <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] [mic: Constify and slightly simplify test helpers] Signed-off-by: Mickaël Salaün <[email protected]>
2022-09-02selftests/bpf: Amend test_tunnel to exercise BPF_F_TUNINFO_FLAGSShmulik Ladkani1-8/+16
Get the tunnel flags in {ipv6}vxlan_get_tunnel_src and ensure they are aligned with tunnel params set at {ipv6}vxlan_set_tunnel_dst. Signed-off-by: Shmulik Ladkani <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]