aboutsummaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2021-07-01perf script: Add option to list dlfiltersAdrian Hunter6-1/+134
Add option --list-dlfilters to list dlfilters in the current directory or the exec-path e.g. ~/libexec/perf-core/dlfilters. Use with option -v (must come before option --list-dlfilters) to show long descriptions. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf script: Add dlfilter__filter_event_early()Adrian Hunter5-16/+59
filter_event_early() can be more than 30% faster than filter_event() because it is called before internal filtering. In other respects it is the same as filter_event(), except that it will be passed events that have yet to be filtered out. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf script: Add API for filtering via dynamically loaded shared objectAdrian Hunter7-2/+780
In some cases, users want to filter very large amounts of data (e.g. from AUX area tracing like Intel PT) looking for something specific. While scripting such as Python can be used, Python is 10 to 20 times slower than C. So define a C API so that custom filters can be written and loaded. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf llvm: Return -ENOMEM when asprintf() failsArnaldo Carvalho de Melo1-0/+2
Zhihao sent a patch but it made llvm__compile_bpf() return what asprintf() returns on error, which is just -1, but since this function returns -errno, fix it by returning -ENOMEM for this case instead. Fixes: cb76371441d098 ("perf llvm: Allow passing options to llc ...") Fixes: 5eab5a7ee032ac ("perf llvm: Display eBPF compiling command ...") Reported-by: Hulk Robot <[email protected]> Reported-by: Zhihao Cheng <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Yu Kuai <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()James Clark1-1/+5
Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and non-timeless mode starts decoding on the fist PERF_RECORD_AUX record. This can cause the "data has no samples!" error if the first PERF_RECORD_AUX record comes before the first (or any relevant) PERF_RECORD_MMAP2 record because the mmaps are required by the decoder to access the binary data. This change pushes the start of non-timeless decoding to the very end of parsing the file. The PERF_RECORD_EXIT event can't be used because it might not exist in system-wide or snapshot modes. I have not been able to find the exact cause for the events to be intermittently in the wrong order in the basic scenario: perf record -e cs_etm/@tmc_etr0/u top But it can be made to happen every time with the --delay option. This is because "enable_on_exec" is disabled, which causes tracing to start before the process to be launched is exec'd. For example: perf record -e cs_etm/@tmc_etr0/u --delay=1 top perf report -D | grep 'AUX\|MAP' 0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 [] 0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 [] 0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 [] 4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top 4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so Another scenario in which decoding from the first aux record fails is a workload that forks. Although the aux record comes after 'bash', it comes before 'top', which is what we are interested in. For example: perf record -e cs_etm/@tmc_etr0/u -- bash -c top perf report -D | grep 'AUX\|MAP' 4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash 4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so 4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso] 0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 [] ... 0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 [] 4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top 4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so 4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso] A third scenario is when the majority of time is spent in a shared library that is not loaded at startup. For example a dynamically loaded plugin. Testing ======= Testing was done by checking if any samples that are present in the old output are missing from the new output. Timestamps must be stripped out with awk because now they are set to the last AUX sample, rather than the first: ./perf script $4 | awk '!($4="")' > new.script ./perf-default script $4 | awk '!($4="")' > default.script comm -13 <(sort -u new.script) <(sort -u default.script) Testing showed that the new output is a superset of the old. When lines appear in the comm output, it is not because they are missing but because [unknown] is now resolved to sensible locations. For example last putp branch here now resolves to libtinfo, so it's not missing from the output, but is actually improved: Old: top 305 [001] 1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps) top 305 [001] 1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps) top 305 [001] 1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown]) New: top 305 [001] 1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps) top 305 [001] 1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps) top 305 [001] 1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9) In the following two modes, decoding now works and the "data has no samples!" error is not displayed any more: perf record -e cs_etm/@tmc_etr0/u -- bash -c top perf record -e cs_etm/@tmc_etr0/u --delay=1 top In snapshot mode, there is also an improvement to decoding. Previously samples for the 'kill' process that was used to send SIGUSR2 were completely missing, because the process hadn't started yet. But now there are additional samples present: perf record -e cs_etm/@tmc_etr0/u --snapshot -a perf script stress 19380 [003] 161627.938153: 1000000 instructions:uH: aaaabb612fb4 [unknown] (/usr/bin/stress) kill 19644 [000] 161627.938153: 1000000 instructions:uH: ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so) stress 19380 [003] 161627.938153: 1000000 instructions:uH: ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so) Also tested was the round trip of 'perf inject' followed by 'perf report' which has the same differences and improvements. Signed-off-by: James Clark <[email protected]> Reviewed-by: Leo Yan <[email protected]> Tested-by: Leo Yan <[email protected]> Acked-by: Mathieu Poirier <[email protected]> Cc: Al Grant <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Denis Nikitin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01tools headers UAPI: Synch KVM's svm.h header with the kernelArnaldo Carvalho de Melo1-0/+3
To pick up the changes from: 59d21d67f37481cf ("KVM: SVM: Software reserved fields") Picking the new SVM_EXIT_SW exit reasons. Addressing this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h' diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h Cc: Paolo Bonzini <[email protected]> Cc: Vineeth Pillai <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01tools kvm headers arm64: Update KVM headers from the kernel sourcesArnaldo Carvalho de Melo1-0/+11
To pick the changes from: f0376edb1ddcab19 ("KVM: arm64: Add ioctl to fetch/store tags in a guest") That don't causes any changes in tooling (when built on x86), only addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h' diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h Cc: Marc Zyngier <[email protected]> Cc: Steven Price <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01tools headers UAPI: Sync linux/kvm.h with the kernel sourcesArnaldo Carvalho de Melo2-0/+118
To pick the changes in: 19238e75bd8ed8ff ("kvm: x86: Allow userspace to handle emulation errors") cb082bfab59a224a ("KVM: stats: Add fd-based API to read binary stats data") b87cc116c7e1bc62 ("KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability") f0376edb1ddcab19 ("KVM: arm64: Add ioctl to fetch/store tags in a guest") 0dbb11230437895f ("KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall") 6dba940352038b56 ("KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2") 644f706719f0297b ("KVM: x86: hyper-v: Introduce KVM_CAP_HYPERV_ENFORCE_CPUID") That automatically adds support for these new ioctls: $ tools/perf/trace/beauty/kvm_ioctl.sh > before $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h $ tools/perf/trace/beauty/kvm_ioctl.sh > after $ diff -u before after --- before 2021-07-01 13:42:07.006387354 -0300 +++ after 2021-07-01 13:45:16.051649301 -0300 @@ -95,6 +95,9 @@ [0xc9] = "XEN_HVM_SET_ATTR", [0xca] = "XEN_VCPU_GET_ATTR", [0xcb] = "XEN_VCPU_SET_ATTR", + [0xcc] = "GET_SREGS2", + [0xcd] = "SET_SREGS2", + [0xce] = "GET_STATS_FD", [0xe0] = "CREATE_DEVICE", [0xe1] = "SET_DEVICE_ATTR", [0xe2] = "GET_DEVICE_ATTR", $ This silences these perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h' diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h' diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Cc: Aaron Lewis <[email protected]> Cc: Ashish Kalra <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Jing Zhang <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Maxim Levitsky <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Steven Price <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01tools headers cpufeatures: Sync with the kernel sourcesArnaldo Carvalho de Melo1-1/+2
To pick the changes from: 1348924ba8169f35 ("x86/msr: Define new bits in TSX_FORCE_ABORT MSR") cbcddaa33d7e11a0 ("perf/x86/rapl: Use CPUID bit on AMD and Hygon parts") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Andrew Cooper <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Pawan Gupta <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01tools include UAPI: Update linux/mount.h copyArnaldo Carvalho de Melo1-0/+1
To pick the changes from: dd8b477f9a3d8edb ("mount: Support "nosymfollow" in new mount api") That ends up adding support for the new MOUNT_ATTR_NOSYMFOLLOW mount attribute: $ tools/perf/trace/beauty/fsmount.sh > before $ cp include/uapi/linux/mount.h tools/include/uapi/linux/mount.h $ tools/perf/trace/beauty/fsmount.sh > after $ diff -u before after --- before 2021-07-01 13:34:04.542517355 -0300 +++ after 2021-07-01 13:34:12.423694537 -0300 @@ -7,4 +7,5 @@ [ilog2(0x00000020) + 1] = "STRICTATIME", [ilog2(0x00000080) + 1] = "NODIRATIME", [ilog2(0x00100000) + 1] = "IDMAP", + [ilog2(0x00200000) + 1] = "NOSYMFOLLOW", }; $ So now one can use it in --filter expressions for tracepoints. This silences this perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/linux/mount.h' differs from latest version at 'include/uapi/linux/mount.h' diff -u tools/include/uapi/linux/mount.h include/uapi/linux/mount.h Cc: Christian Brauner <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01tools arch x86: Sync the msr-index.h copy with the kernel sourcesArnaldo Carvalho de Melo1-0/+4
To pick up the changes from these csets: 1348924ba8169f35 ("x86/msr: Define new bits in TSX_FORCE_ABORT MSR") That cause no changes to tooling: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after $ Just silences this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Cc: Borislav Petkov <[email protected]> Cc: Pawan Gupta <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf arm-spe: Don't wait for PERF_RECORD_EXIT eventLeo Yan1-5/+1
When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last perf event) for processing trace data, which is needless and even might cause logic error, e.g. it might fail to correlate perf events with Arm SPE events correctly. So this patch removes the condition checking for PERF_RECORD_EXIT event. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Dave Martin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: [email protected] Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf arm-spe: Bail out if the trace is later than perf eventLeo Yan1-3/+34
It's possible that record in Arm SPE trace is later than perf event and vice versa. This asks to correlate the perf events and Arm SPE synthesized events to be processed in the manner of correct timing. To achieve the time ordering, this patch reverses the flow, it firstly calls arm_spe_sample() and then calls arm_spe_decode(). By comparing the timestamp value and detect the perf event is coming earlier than Arm SPE trace data, it bails out from the decoding loop, the last record is pushed into auxtrace stack and is deferred to generate sample. To track the timestamp, everytime it updates timestamp for the latest record. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Dave Martin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf arm-spe: Assign kernel time to synthesized eventLeo Yan1-1/+1
In current code, it assigns the arch timer counter to the synthesized samples Arm SPE trace, thus the samples don't contain the kernel time but only contain the raw counter value. To fix the issue, this patch converts the timer counter to kernel time and assigns it to sample timestamp. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Dave Martin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf arm-spe: Convert event kernel time to counter valueLeo Yan1-1/+1
When handle a perf event, Arm SPE decoder needs to decide if this perf event is earlier or later than the samples from Arm SPE trace data; to do comparision, it needs to use the same unit for the time. This patch converts the event kernel time to arch timer's counter value, thus it can be used to compare with counter value contained in Arm SPE Timestamp packet. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Dave Martin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf arm-spe: Save clock parameters from TIME_CONV eventLeo Yan1-0/+26
During the recording phase, "perf record" tool synthesizes event PERF_RECORD_TIME_CONV for the hardware clock parameters and saves the event into the data file. Afterwards, when processing the data file, the event TIME_CONV will be processed at the very early time and is stored into session context. This patch extracts these parameters from the session context and saves into the structure "spe->tc" with the type perf_tsc_conversion, so that the parameters are ready for conversion between clock counter and time stamp. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Al Grant <[email protected]> Cc: Dave Martin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf cs-etm: Remove callback cs_etm_find_snapshot()Leo Yan1-133/+0
The callback cs_etm_find_snapshot() is invoked for snapshot mode, its main purpose is to find the correct AUX trace data and returns "head" and "old" (we can call "old" as "old head") to the caller, the caller __auxtrace_mmap__read() uses these two pointers to decide the AUX trace data size. This patch removes cs_etm_find_snapshot() with below reasons: - The first thing in cs_etm_find_snapshot() is to check if the head has wrapped around, if it is not, directly bails out. The checking is pointless, this is because the "head" and "old" pointers both are monotonical increasing so they never wrap around. - cs_etm_find_snapshot() adjusts the "head" and "old" pointers and assumes the AUX ring buffer is fully filled with the hardware trace data, so it always subtracts the difference "mm->len" from "head" to get "old". Let's imagine the snapshot is taken in very short interval, the tracers only fill a small chunk of the trace data into the AUX ring buffer, in this case, it's wrongly to copy the whole the AUX ring buffer to perf file. - As the "head" and "old" pointers are monotonically increased, the function __auxtrace_mmap__read() handles these two pointers properly. It calculates the reminders for these two pointers, and the size is clamped to be never more than "snapshot_size". We can simply reply on the function __auxtrace_mmap__read() to calculate the correct result for data copying, it's not necessary to add Arm CoreSight specific callback. Signed-off-by: Leo Yan <[email protected]> Reviewed-by: James Clark <[email protected]> Tested-by: James Clark <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Daniel Kiss <[email protected]> Cc: Denis Nikitin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf bpf_counter: Move common functions to bpf_counter.hNamhyung Kim2-52/+52
Some helper functions will be used for cgroup counting too. Move them to a header file for sharing. Committer notes: Fix the build on older systems with: - struct bpf_map_info map_info = {0}; + struct bpf_map_info map_info = { .id = 0, }; This wasn't breaking the build in such systems as bpf_counter.c isn't built due to: tools/perf/util/Build: perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o The bpf_counter.h file on the other hand is included from places that are built everywhere. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01selftests/vm/pkeys: exercise x86 XSAVE init stateDave Hansen3-2/+76
On x86, there is a set of instructions used to save and restore register state collectively known as the XSAVE architecture. There are about a dozen different features managed with XSAVE. The protection keys register, PKRU, is one of those features. The hardware optimizes XSAVE by tracking when the state has not changed from its initial (init) state. In this case, it can avoid the cost of writing state to memory (it would usually just be a bunch of 0's). When the pkey register is 0x0 the hardware optionally choose to track the register as being in the init state (optimize away the writes). AMD CPUs do this more aggressively compared to Intel. On x86, PKRU is rarely in its (very permissive) init state. Instead, the value defaults to something very restrictive. It is not surprising that bugs have popped up in the rare cases when PKRU reaches its init state. Add a protection key selftest which gets the protection keys register into its init state in a way that should work on Intel and AMD. Then, do a bunch of pkey register reads to watch for inadvertent changes. This adds "-mxsave" to CFLAGS for all the x86 vm selftests in order to allow use of the XSAVE instruction __builtin functions. This will make the builtins available on all of the vm selftests, but is expected to be harmless. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Aneesh Kumar K.V <[email protected]> Cc: Ram Pai <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Florian Weimer <[email protected]> Cc: "Desnes A. Nunes do Rosario" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Suchanek <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01selftests/vm/pkeys: refill shadow register after implicit kernel writeDave Hansen1-0/+7
The pkey test code keeps a "shadow" of the pkey register around. This ensures that any bugs which might write to the register can be caught more quickly. Generally, userspace has a good idea when the kernel is going to write to the register. For instance, alloc_pkey() is passed a permission mask. The caller of alloc_pkey() can update the shadow based on the return value and the mask. But, the kernel can also modify the pkey register in a more sneaky way. For mprotect(PROT_EXEC) mappings, the kernel will allocate a pkey and write the pkey register to create an execute-only mapping. The kernel never tells userspace what key it uses for this. This can cause the test to fail with messages like: protection_keys_64.2: pkey-helpers.h:132: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed. because the shadow was not updated with the new kernel-set value. Forcibly update the shadow value immediately after an mprotect(). Link: https://lkml.kernel.org/r/[email protected] Fixes: 6af17cf89e99 ("x86/pkeys/selftests: Add PROT_EXEC test") Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Aneesh Kumar K.V <[email protected]> Cc: Ram Pai <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Florian Weimer <[email protected]> Cc: "Desnes A. Nunes do Rosario" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Suchanek <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01selftests/vm/pkeys: handle negative sys_pkey_alloc() return codeDave Hansen1-1/+1
The alloc_pkey() sefltest function wraps the sys_pkey_alloc() system call. On success, it updates its "shadow" register value because sys_pkey_alloc() updates the real register. But, the success check is wrong. pkey_alloc() considers any non-zero return code to indicate success where the pkey register will be modified. This fails to take negative return codes into account. Consider only a positive return value as a successful call. Link: https://lkml.kernel.org/r/[email protected] Fixes: 5f23f6d082a9 ("x86/pkeys: Add self-tests") Reported-by: Thomas Gleixner <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Tested-by: Aneesh Kumar K.V <[email protected]> Cc: Ram Pai <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Florian Weimer <[email protected]> Cc: "Desnes A. Nunes do Rosario" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Suchanek <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really randomDave Hansen1-1/+2
Patch series "selftests/vm/pkeys: Bug fixes and a new test". There has been a lot of activity on the x86 front around the XSAVE architecture which is used to context-switch processor state (among other things). In addition, AMD has recently joined the protection keys club by adding processor support for PKU. The AMD implementation helped uncover a kernel bug around the PKRU "init state", which actually applied to Intel's implementation but was just harder to hit. This series adds a test which is expected to help find this class of bug both on AMD and Intel. All the work around pkeys on x86 also uncovered a few bugs in the selftest. This patch (of 4): The "random" pkey allocation code currently does the good old: srand((unsigned int)time(NULL)); *But*, it unfortunately does this on every random pkey allocation. There may be thousands of these a second. time() has a one second resolution. So, each time alloc_random_pkey() is called, the PRNG is *RESET* to time(). This is nasty. Normally, if you do: srand(<ANYTHING>); foo = rand(); bar = rand(); You'll be quite guaranteed that 'foo' and 'bar' are different. But, if you do: srand(1); foo = rand(); srand(1); bar = rand(); You are quite guaranteed that 'foo' and 'bar' are the *SAME*. The recent "fix" effectively forced the test case to use the same "random" pkey for the whole test, unless the test run crossed a second boundary. Only run srand() once at program startup. This explains some very odd and persistent test failures I've been seeing. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 6e373263ce07 ("selftests/vm/pkeys: fix alloc_random_pkey() to make it really random") Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Aneesh Kumar K.V <[email protected]> Cc: Ram Pai <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Florian Weimer <[email protected]> Cc: "Desnes A. Nunes do Rosario" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Suchanek <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01mm: selftests for exclusive device memoryAlistair Popple1-0/+158
Adds some selftests for exclusive device memory. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alistair Popple <[email protected]> Acked-by: Jason Gunthorpe <[email protected]> Tested-by: Ralph Campbell <[email protected]> Reviewed-by: Ralph Campbell <[email protected]> Cc: Ben Skeggs <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Peter Xu <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01perf tools: Add cgroup_is_v2() helperNamhyung Kim2-0/+21
The cgroup_is_v2() is to check if the given subsystem is mounted on cgroup v2 or not. It'll be used by BPF cgroup code later. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-07-01perf tools: Add read_cgroup_id() functionNamhyung Kim2-0/+35
The read_cgroup_id() is to read a cgroup id from a file handle using name_to_handle_at(2) for the given cgroup. It'll be used by bperf cgroup stat later. Committer notes: -int read_cgroup_id(struct cgroup *cgrp) +static inline int read_cgroup_id(struct cgroup *cgrp __maybe_unused) To fix the build when HAVE_FILE_HANDLE is not defined. Signed-off-by: Namhyung Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-06-30selftests/vm: add test for MADV_POPULATE_(READ|WRITE)David Hildenbrand4-0/+360
Let's add a simple test for MADV_POPULATE_READ and MADV_POPULATE_WRITE, verifying some error handling, that population works, and that softdirty tracking works as expected. For now, limit the test to private anonymous memory. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Max Filippov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rolf Eike Beer <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Ram Pai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30selftests/vm: add protection_keys_32 / protection_keys_64 to gitignoreDavid Hildenbrand1-0/+2
We missed adding two binaries to gitignore. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Peter Xu <[email protected]> Cc: Ram Pai <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Rolf Eike Beer <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30userfaultfd/selftests: exercise minor fault handling shmem supportAxel Rasmussen1-4/+25
Enable test_uffdio_minor for test_type == TEST_SHMEM, and modify the test slightly to pass in / check for the right feature flags. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Reviewed-by: Peter Xu <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Brian Geffon <[email protected]> Cc: "Dr . David Alan Gilbert" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Wang Qing <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30userfaultfd/selftests: reinitialize test context in each testAxel Rasmussen1-105/+117
Currently, the context (fds, mmap-ed areas, etc.) are global. Each test mutates this state in some way, in some cases really "clobbering it" (e.g., the events test mremap-ing area_dst over the top of area_src, or the minor faults tests overwriting the count_verify values in the test areas). We run the tests in a particular order, each test is careful to make the right assumptions about its starting state, etc. But, this is fragile. It's better for a test's success or failure to not depend on what some other prior test case did to the global state. To that end, clear and reinitialize the test context at the start of each test case, so whatever prior test cases did doesn't affect future tests. This is particularly relevant to this series because the events test's mremap of area_dst screws up assumptions the minor fault test was relying on. This wasn't a problem for hugetlb, as we don't mremap in that case. [[email protected]: fix conflict between this patch and the uffd pagemap series] Link: https://lkml.kernel.org/r/YKQqKrl+/cQ1utrb@t490s Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Peter Xu <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Brian Geffon <[email protected]> Cc: "Dr . David Alan Gilbert" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Wang Qing <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30userfaultfd/selftests: create alias mappings in the shmem testAxel Rasmussen1-3/+19
Previously, we just allocated two shm areas: area_src and area_dst. With this commit, change this so we also allocate area_src_alias, and area_dst_alias. area_*_alias and area_* (respectively) point to the same underlying physical pages, but are different VMAs. In a future commit in this series, we'll leverage this setup to exercise minor fault handling support for shmem, just like we do in the hugetlb_shared test. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Reviewed-by: Peter Xu <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Brian Geffon <[email protected]> Cc: "Dr . David Alan Gilbert" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Wang Qing <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30userfaultfd/selftests: use memfd_create for shmem test typeAxel Rasmussen1-1/+15
This is a preparatory commit. In the future, we want to be able to setup alias mappings for area_src and area_dst in the shmem test, like we do in the hugetlb_shared test. With a VMA obtained via mmap(MAP_ANONYMOUS | MAP_SHARED), it isn't clear how to do this. So, mmap() with an fd, so we can create alias mappings. Use memfd_create instead of actually passing in a tmpfs path like hugetlb does, since it's more convenient / simpler to run, and works just as well. Future commits will: 1. Setup the alias mappings. 2. Extend our tests to actually take advantage of this, to test new userfaultfd behavior being introduced in this series. Also, a small fix in the area we're changing: when the hugetlb setup fails in main(), pass in the right argv[] so we actually print out the hugetlb file path. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Reviewed-by: Peter Xu <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Brian Geffon <[email protected]> Cc: "Dr . David Alan Gilbert" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Wang Qing <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30userfaultfd/selftests: add pagemap uffd-wp testPeter Xu1-0/+154
Add one anonymous specific test to start using pagemap. With pagemap support, we can directly read the uffd-wp bit from pgtable without triggering any fault, so it's easier to do sanity checks in unit tests. Meanwhile this test also leverages the newly introduced MADV_PAGEOUT madvise function to test swap ptes with uffd-wp bit set, and across fork()s. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Brian Geffon <[email protected]> Cc: "Dr . David Alan Gilbert" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Wang Qing <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30userfaultfd/selftests: unify error handlingPeter Xu1-369/+187
Introduce err()/_err() and replace all the different ways to fail the program, mostly "fprintf" and "perror" with tons of exit() calls. Always stop the test program at any failure. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Brian Geffon <[email protected]> Cc: "Dr . David Alan Gilbert" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Wang Qing <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30userfaultfd/selftests: only dump counts if mode enabledPeter Xu1-10/+20
WP and MINOR modes are conditionally enabled on specific memory types. This patch avoids dumping tons of zeros for those cases when the modes are not supported at all. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Brian Geffon <[email protected]> Cc: "Dr . David Alan Gilbert" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Wang Qing <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30userfaultfd/selftests: dropping VERIFY check in locking_threadPeter Xu1-54/+1
It tries to check against all zeros and looped for quite a few times. However after that we'll verify the same page with count_verify, while count_verify can never be zero. So it means if it's a zero page we'll detect it anyways with below code. There's yet another place we conditionally check the fault flag - just do it unconditionally. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Brian Geffon <[email protected]> Cc: "Dr . David Alan Gilbert" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Wang Qing <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30userfaultfd/selftests: remove the time() check on delayed uffdPeter Xu1-8/+0
There seems to have no guarantee that time() will return the same for the two calls even if there's no delay, e.g. when a fault is accidentally crossing the changing of a second. Meanwhile, this message is also not helping that much since delay could happen with a lot of reasons, e.g., schedule latency of resolving thread. It may not mean an issue with uffd. Neither do I saw this error triggered either in the past runs. Even if it triggers, it'll be drown in all the rest of test logs. Remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Brian Geffon <[email protected]> Cc: "Dr . David Alan Gilbert" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Wang Qing <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30userfaultfd/selftests: use user mode onlyPeter Xu1-1/+1
Patch series "userfaultfd/selftests: A few cleanups", v2. I wanted to cleanup userfaultfd.c fault handling for a long time. If it's not cleaned, when the new code grows the file it'll also grow the size that needs to be cleaned... This is my attempt to cleanup the userfaultfd selftest on fault handling, to use an err() macro instead of either fprintf() or perror() then another exit() call. The huge cleanup is done in the last patch. The first 4 patches are some other standalone cleanups for the same file, so I put them together. This patch (of 5): Userfaultfd selftest does not need to handle kernel initiated fault. Set user mode so it can be run even if unprivileged_userfaultfd=0 (which is the default). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Brian Geffon <[email protected]> Cc: "Dr . David Alan Gilbert" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Wang Qing <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30khugepaged: selftests: remove debug_cowNanyong Sun1-4/+0
The debug_cow attribute had been removed since commit 4958e4d86ecb01 ("mm: thp: remove debug_cow switch"), so remove it in selftest code too, otherwise the khugepaged test will fail. Link: https://lkml.kernel.org/r/[email protected] Fixes: 4958e4d86ecb01 ("mm: thp: remove debug_cow switch") Signed-off-by: Nanyong Sun <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zi Yan <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Kefeng Wang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30Merge tag 'net-next-5.14' of ↵Linus Torvalds192-1669/+8874
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - BPF: - add syscall program type and libbpf support for generating instructions and bindings for in-kernel BPF loaders (BPF loaders for BPF), this is a stepping stone for signed BPF programs - infrastructure to migrate TCP child sockets from one listener to another in the same reuseport group/map to improve flexibility of service hand-off/restart - add broadcast support to XDP redirect - allow bypass of the lockless qdisc to improving performance (for pktgen: +23% with one thread, +44% with 2 threads) - add a simpler version of "DO_ONCE()" which does not require jump labels, intended for slow-path usage - virtio/vsock: introduce SOCK_SEQPACKET support - add getsocketopt to retrieve netns cookie - ip: treat lowest address of a IPv4 subnet as ordinary unicast address allowing reclaiming of precious IPv4 addresses - ipv6: use prandom_u32() for ID generation - ip: add support for more flexible field selection for hashing across multi-path routes (w/ offload to mlxsw) - icmp: add support for extended RFC 8335 PROBE (ping) - seg6: add support for SRv6 End.DT46 behavior - mptcp: - DSS checksum support (RFC 8684) to detect middlebox meddling - support Connection-time 'C' flag - time stamping support - sctp: packetization Layer Path MTU Discovery (RFC 8899) - xfrm: speed up state addition with seq set - WiFi: - hidden AP discovery on 6 GHz and other HE 6 GHz improvements - aggregation handling improvements for some drivers - minstrel improvements for no-ack frames - deferred rate control for TXQs to improve reaction times - switch from round robin to virtual time-based airtime scheduler - add trace points: - tcp checksum errors - openvswitch - action execution, upcalls - socket errors via sk_error_report Device APIs: - devlink: add rate API for hierarchical control of max egress rate of virtual devices (VFs, SFs etc.) - don't require RCU read lock to be held around BPF hooks in NAPI context - page_pool: generic buffer recycling New hardware/drivers: - mobile: - iosm: PCIe Driver for Intel M.2 Modem - support for Qualcomm MSM8998 (ipa) - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU) - NXP SJA1110 Automotive Ethernet 10-port switch - Qualcomm QCA8327 switch support (qca8k) - Mikrotik 10/25G NIC (atl1c) Driver changes: - ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP (our first foray into MAC/PHY description via ACPI) - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx - Mellanox/Nvidia NIC (mlx5) - NIC VF offload of L2 bridging - support IRQ distribution to Sub-functions - Marvell (prestera): - add flower and match all - devlink trap - link aggregation - Netronome (nfp): connection tracking offload - Intel 1GE (igc): add AF_XDP support - Marvell DPU (octeontx2): ingress ratelimit offload - Google vNIC (gve): new ring/descriptor format support - Qualcomm mobile (rmnet & ipa): inline checksum offload support - MediaTek WiFi (mt76) - mt7915 MSI support - mt7915 Tx status reporting - mt7915 thermal sensors support - mt7921 decapsulation offload - mt7921 enable runtime pm and deep sleep - Realtek WiFi (rtw88) - beacon filter support - Tx antenna path diversity support - firmware crash information via devcoredump - Qualcomm WiFi (wcn36xx) - Wake-on-WLAN support with magic packets and GTK rekeying - Micrel PHY (ksz886x/ksz8081): add cable test support" * tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits) tcp: change ICSK_CA_PRIV_SIZE definition tcp_yeah: check struct yeah size at compile time gve: DQO: Fix off by one in gve_rx_dqo() stmmac: intel: set PCI_D3hot in suspend stmmac: intel: Enable PHY WOL option in EHL net: stmmac: option to enable PHY WOL with PMT enabled net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} net: use netdev_info in ndo_dflt_fdb_{add,del} ptp: Set lookup cookie when creating a PTP PPS source. net: sock: add trace for socket errors net: sock: introduce sk_error_report net: dsa: replay the local bridge FDB entries pointing to the bridge dev too net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev net: dsa: include fdb entries pointing to bridge in the host fdb list net: dsa: include bridge addresses which are local in the host fdb list net: dsa: sync static FDB entries on foreign interfaces to hardware net: dsa: install the host MDB and FDB entries in the master's RX filter net: dsa: reference count the FDB addresses at the cross-chip notifier level net: dsa: introduce a separate cross-chip notifier type for host FDBs net: dsa: reference count the MDB entries at the cross-chip notifier level ...
2021-06-30tools lib: Adopt bitmap_intersects() operation from the kernel sourcesAlexey Bayduraev2-0/+25
Adopt bitmap_intersects() routine that tests whether bitmaps bitmap1 and bitmap2 intersects. This routine will be used during thread masks initialization. Signed-off-by: Alexey Bayduraev <[email protected]> Acked-by: Andi Kleen <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Antonov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Riccardo Mancini <[email protected]> Link: http://lore.kernel.org/lkml/f75aa738d8ff8f9cffd7532d671f3ef3deb97a7c.1625065643.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-06-30Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo88-757/+5137
To pick up fixes. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-06-30Merge tag 'platform-drivers-x86-v5.14-1' of ↵Linus Torvalds4-2/+35
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: "Highlights: - New think-lmi driver adding support for changing Lenovo Thinkpad BIOS settings from within Linux using the standard firmware- attributes class sysfs API - MS Surface aggregator-cdev now also supports forwarding events to user-space (for debugging / new driver development purposes only) - New intel_skl_int3472 driver this provides the necessary glue to translate ACPI table information to GPIOs, regulators, etc. for camera sensors on Intel devices with IPU3 attached MIPI cameras - A whole bunch of other fixes + device-specific quirk additions - New devm_work_autocancel() devm-helpers.h function" * tag 'platform-drivers-x86-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (83 commits) platform/x86: dell-wmi-sysman: Change user experience when Admin/System Password is modified platform/x86: intel_skl_int3472: Uninitialized variable in skl_int3472_handle_gpio_resources() platform/x86: think-lmi: Move kfree(setting->possible_values) to tlmi_attr_setting_release() platform/x86: think-lmi: Split current_value to reflect only the value platform/x86: think-lmi: Fix issues with duplicate attributes platform/x86: think-lmi: Return EINVAL when kbdlang gets set to a 0 length string platform/x86: intel_cht_int33fe: Move to its own subfolder platform/x86: intel_skl_int3472: Move to intel/ subfolder platform/x86: intel_skl_int3472: Provide skl_int3472_unregister_clock() platform/x86: intel_skl_int3472: Provide skl_int3472_unregister_regulator() platform/x86: intel_skl_int3472: Use ACPI GPIO resource directly platform/x86: intel_skl_int3472: Fix dependencies (drop CLKDEV_LOOKUP) platform/x86: intel_skl_int3472: Free ACPI device resources after use platform/x86: Remove "default n" entries platform/x86: ISST: Use numa node id for cpu pci dev mapping platform/x86: ISST: Optimize CPU to PCI device mapping tools/power/x86/intel-speed-select: v1.10 release tools/power/x86/intel-speed-select: Fix uncore memory frequency display extcon: extcon-max8997: Simplify driver using devm extcon: extcon-max8997: Fix IRQ freeing at error path ...
2021-06-29Merge tag 'fs.openat2.unknown_flags.v5.14' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull openat2 fixes from Christian Brauner: - Remove the unused VALID_UPGRADE_FLAGS define we carried from an extension to openat2() that we haven't merged. Aleksa might be getting back to it at some point but just not right now. - openat2() used to accidently ignore unknown flag values in the upper 32 bits. The new openat2() syscall verifies that no unknown O-flag values are set and returns an error to userspace if they are while the older open syscalls like open() and openat() simply ignore unknown flag values: #define O_FLAG_CURRENTLY_INVALID (1 << 31) struct open_how how = { .flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID, .resolve = 0, }; /* fails */ fd = openat2(-EBADF, "/dev/null", &how, sizeof(how)); /* succeeds */ fd = openat(-EBADF, "/dev/null", O_RDONLY | O_FLAG_CURRENTLY_INVALID); However, openat2() silently truncates the upper 32 bits meaning: #define O_FLAG_CURRENTLY_INVALID_LOWER32 (1 << 31) #define O_FLAG_CURRENTLY_INVALID_UPPER32 (1 << 40) struct open_how how_lowe32 = { .flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID_LOWER32, }; struct open_how how_upper32 = { .flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID_UPPER32, }; /* fails */ fd = openat2(-EBADF, "/dev/null", &how_lower32, sizeof(how_lower32)); /* succeeds */ fd = openat2(-EBADF, "/dev/null", &how_upper32, sizeof(how_upper32)); Fix this by preventing the immediate truncation in build_open_flags() and add a compile-time check to catch when we add flags in the upper 32 bit range. * tag 'fs.openat2.unknown_flags.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: test: add openat2() test for invalid upper 32 bit flag value open: don't silently ignore unknown O-flags in openat2() fcntl: remove unused VALID_UPGRADE_FLAGS
2021-06-29Merge tag 'fs.mount_setattr.nosymfollow.v5.14' of ↵Linus Torvalds1-3/+85
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull mount_setattr updates from Christian Brauner: "A few releases ago the old mount API gained support for a mount options which prevents following symlinks on a given mount. This adds support for it in the new mount api through the MOUNT_ATTR_NOSYMFOLLOW flag via mount_setattr() and fsmount(). With mount_setattr() that flag can even be applied recursively. There's an additional ack from Ross Zwisler who originally authored the nosymfollow patch. As I've already had the patches in my for-next I didn't add his ack explicitly" * tag 'fs.mount_setattr.nosymfollow.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: test MOUNT_ATTR_NOSYMFOLLOW with mount_setattr() mount: Support "nosymfollow" in new mount api
2021-06-29Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-31/+69
Merge misc updates from Andrew Morton: "191 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab, slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap, mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <[email protected]>: (191 commits) mm,hwpoison: make get_hwpoison_page() call get_any_page() mm,hwpoison: send SIGBUS with error virutal address mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes mm/page_alloc: allow high-order pages to be stored on the per-cpu lists mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA docs: remove description of DISCONTIGMEM arch, mm: remove stale mentions of DISCONIGMEM mm: remove CONFIG_DISCONTIGMEM m68k: remove support for DISCONTIGMEM arc: remove support for DISCONTIGMEM arc: update comment about HIGHMEM implementation alpha: remove DISCONTIGMEM and NUMA mm/page_alloc: move free_the_page mm/page_alloc: fix counting of managed_pages mm/page_alloc: improve memmap_pages dbg msg mm: drop SECTION_SHIFT in code comments mm/page_alloc: introduce vm.percpu_pagelist_high_fraction mm/page_alloc: limit the number of pages on PCP lists when reclaim is active mm/page_alloc: scale the number of pages that are batch freed ...
2021-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-1/+9
Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <[email protected]>
2021-06-29Merge tag 'x86-entry-2021-06-29' of ↵Linus Torvalds1-49/+442
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry code related updates from Thomas Gleixner: - Consolidate the macros for .byte ... opcode sequences - Deduplicate register offset defines in include files - Simplify the ia32,x32 compat handling of the related syscall tables to get rid of #ifdeffery. - Clear all EFLAGS which are not required for syscall handling - Consolidate the syscall tables and switch the generation over to the generic shell script and remove the CFLAGS tweaks which are not longer required. - Use 'int' type for system call numbers to match the generic code. - Add more selftests for syscalls * tag 'x86-entry-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/syscalls: Don't adjust CFLAGS for syscall tables x86/syscalls: Remove -Wno-override-init for syscall tables x86/uml/syscalls: Remove array index from syscall initializers x86/syscalls: Clear 'offset' and 'prefix' in case they are set in env x86/entry: Use int everywhere for system call numbers x86/entry: Treat out of range and gap system calls the same x86/entry/64: Sign-extend system calls on entry to int selftests/x86/syscall: Add tests under ptrace to syscall_numbering_64 selftests/x86/syscall: Simplify message reporting in syscall_numbering selftests/x86/syscall: Update and extend syscall_numbering_64 x86/syscalls: Switch to generic syscallhdr.sh x86/syscalls: Use __NR_syscalls instead of __NR_syscall_max x86/unistd: Define X32_NR_syscalls only for 64-bit kernel x86/syscalls: Stop filling syscall arrays with *_sys_ni_syscall x86/syscalls: Switch to generic syscalltbl.sh x86/entry/x32: Rename __x32_compat_sys_* to __x64_compat_sys_*
2021-06-29Merge tag 'x86-irq-2021-06-29' of ↵Linus Torvalds1-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 interrupt related updates from Thomas Gleixner: - Consolidate the VECTOR defines and the usage sites. - Cleanup GDT/IDT related code and replace open coded ASM with proper native helper functions. * tag 'x86-irq-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kexec: Set_[gi]dt() -> native_[gi]dt_invalidate() in machine_kexec_*.c x86: Add native_[ig]dt_invalidate() x86/idt: Remove address argument from idt_invalidate() x86/irq: Add and use NR_EXTERNAL_VECTORS and NR_SYSTEM_VECTORS x86/irq: Remove unused vectors defines
2021-06-29Merge tag 'printk-for-5.14' of ↵Linus Torvalds3-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add %pt[RT]s modifier to vsprintf(). It overrides ISO 8601 separator by using ' ' (space). It produces "YYYY-mm-dd HH:MM:SS" instead of "YYYY-mm-ddTHH:MM:SS". - Correctly parse long row of numbers by sscanf() when using the field width. Add extensive sscanf() selftest. - Generalize re-entrant CPU lock that has already been used to serialize dump_stack() output. It is part of the ongoing printk rework. It will allow to remove the obsoleted printk_safe buffers and introduce atomic consoles. - Some code clean up and sparse warning fixes. * tag 'printk-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: fix cpu lock ordering lib/dump_stack: move cpu lock to printk.c printk: Remove trailing semicolon in macros random32: Fix implicit truncation warning in prandom_seed_state() lib: test_scanf: Remove pointless use of type_min() with unsigned types selftests: lib: Add wrapper script for test_scanf lib: test_scanf: Add tests for sscanf number conversion lib: vsprintf: Fix handling of number field widths in vsscanf lib: vsprintf: scanf: Negative number must have field width > 1 usb: host: xhci-tegra: Switch to use %ptTs nilfs2: Switch to use %ptTs kdb: Switch to use %ptTs lib/vsprintf: Allow to override ISO 8601 date and time separator
2021-06-29mm/gup_benchmark: support threadingPeter Xu1-31/+65
Patch series "mm/gup: Fix pin page write cache bouncing on has_pinned", v2. This series contains 3 patches, the 1st one enables threading for gup_benchmark in the kselftest. The latter two patches are collected from Andrea's local branch which can fix write cache bouncing issue with pinning fast-gup. To be explicit on the latter two patches: - the 2nd patch fixes the perf degrade when introducing has_pinned, then - the last patch tries to remove the has_pinned with a bit in mm->flags For patch 3: originally I think we had a plan to reuse has_pinned into a counter very soon, however that's not happening at least until today, so maybe it proves that we can remove it until we really want such a counter for whatever reason. As the commit message stated, it saves 4 bytes for each mm without observable regressions. Regarding testing: we can reference to the commit message of patch 2 for some detailed testing with will-is-scale. Meanwhile I did patch 1 just because then we can even easily verify the patchset using the existing kselftest facilities or even regress test it in the future with the repo if we want. Below numbers are extra verification tests that I did besides commit message of patch 2 using the new gup_benchmark and 256 cpus. Below test is done on 40 cpus host with Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz, and I can get similar result (of course the write cache bouncing get severe with even more cores). After patch 1 applied (only test patch, so using old kernel): $ sudo chrt -f 1 ./gup_test -a -m 512 -j 40 PIN_FAST_BENCHMARK: Time: get:459632 put:5990 us PIN_FAST_BENCHMARK: Time: get:461967 put:5840 us PIN_FAST_BENCHMARK: Time: get:464521 put:6140 us PIN_FAST_BENCHMARK: Time: get:465176 put:7100 us PIN_FAST_BENCHMARK: Time: get:465960 put:6733 us PIN_FAST_BENCHMARK: Time: get:465324 put:6781 us PIN_FAST_BENCHMARK: Time: get:466018 put:7130 us PIN_FAST_BENCHMARK: Time: get:466362 put:7118 us PIN_FAST_BENCHMARK: Time: get:465118 put:6975 us PIN_FAST_BENCHMARK: Time: get:466422 put:6602 us PIN_FAST_BENCHMARK: Time: get:465791 put:6818 us PIN_FAST_BENCHMARK: Time: get:467091 put:6298 us PIN_FAST_BENCHMARK: Time: get:467694 put:5432 us PIN_FAST_BENCHMARK: Time: get:469575 put:5581 us PIN_FAST_BENCHMARK: Time: get:468124 put:6055 us PIN_FAST_BENCHMARK: Time: get:468877 put:6720 us PIN_FAST_BENCHMARK: Time: get:467212 put:4961 us PIN_FAST_BENCHMARK: Time: get:467834 put:6697 us PIN_FAST_BENCHMARK: Time: get:470778 put:6398 us PIN_FAST_BENCHMARK: Time: get:469788 put:6310 us PIN_FAST_BENCHMARK: Time: get:488277 put:7113 us PIN_FAST_BENCHMARK: Time: get:486613 put:7085 us PIN_FAST_BENCHMARK: Time: get:486940 put:7202 us PIN_FAST_BENCHMARK: Time: get:488728 put:7101 us PIN_FAST_BENCHMARK: Time: get:487570 put:7327 us PIN_FAST_BENCHMARK: Time: get:489260 put:7027 us PIN_FAST_BENCHMARK: Time: get:488846 put:6866 us PIN_FAST_BENCHMARK: Time: get:488521 put:6745 us PIN_FAST_BENCHMARK: Time: get:489950 put:6459 us PIN_FAST_BENCHMARK: Time: get:489777 put:6617 us PIN_FAST_BENCHMARK: Time: get:488224 put:6591 us PIN_FAST_BENCHMARK: Time: get:488644 put:6477 us PIN_FAST_BENCHMARK: Time: get:488754 put:6711 us PIN_FAST_BENCHMARK: Time: get:488875 put:6743 us PIN_FAST_BENCHMARK: Time: get:489290 put:6657 us PIN_FAST_BENCHMARK: Time: get:490264 put:6684 us PIN_FAST_BENCHMARK: Time: get:489631 put:6737 us PIN_FAST_BENCHMARK: Time: get:488434 put:6655 us PIN_FAST_BENCHMARK: Time: get:492213 put:6297 us PIN_FAST_BENCHMARK: Time: get:491124 put:6173 us After the whole series applied (new fixed kernel): $ sudo chrt -f 1 ./gup_test -a -m 512 -j 40 PIN_FAST_BENCHMARK: Time: get:82038 put:7041 us PIN_FAST_BENCHMARK: Time: get:82144 put:6817 us PIN_FAST_BENCHMARK: Time: get:83417 put:6674 us PIN_FAST_BENCHMARK: Time: get:82540 put:6594 us PIN_FAST_BENCHMARK: Time: get:83214 put:6681 us PIN_FAST_BENCHMARK: Time: get:83444 put:6889 us PIN_FAST_BENCHMARK: Time: get:83194 put:7499 us PIN_FAST_BENCHMARK: Time: get:84876 put:7369 us PIN_FAST_BENCHMARK: Time: get:86092 put:10289 us PIN_FAST_BENCHMARK: Time: get:86153 put:10415 us PIN_FAST_BENCHMARK: Time: get:85026 put:7751 us PIN_FAST_BENCHMARK: Time: get:85458 put:7944 us PIN_FAST_BENCHMARK: Time: get:85735 put:8154 us PIN_FAST_BENCHMARK: Time: get:85851 put:8299 us PIN_FAST_BENCHMARK: Time: get:86323 put:9617 us PIN_FAST_BENCHMARK: Time: get:86288 put:10496 us PIN_FAST_BENCHMARK: Time: get:87697 put:9346 us PIN_FAST_BENCHMARK: Time: get:87980 put:8382 us PIN_FAST_BENCHMARK: Time: get:88719 put:8400 us PIN_FAST_BENCHMARK: Time: get:87616 put:8588 us PIN_FAST_BENCHMARK: Time: get:86730 put:9563 us PIN_FAST_BENCHMARK: Time: get:88167 put:8673 us PIN_FAST_BENCHMARK: Time: get:86844 put:9777 us PIN_FAST_BENCHMARK: Time: get:88068 put:11774 us PIN_FAST_BENCHMARK: Time: get:86170 put:15676 us PIN_FAST_BENCHMARK: Time: get:87967 put:12827 us PIN_FAST_BENCHMARK: Time: get:95773 put:7652 us PIN_FAST_BENCHMARK: Time: get:87734 put:13650 us PIN_FAST_BENCHMARK: Time: get:89833 put:14237 us PIN_FAST_BENCHMARK: Time: get:96186 put:8029 us PIN_FAST_BENCHMARK: Time: get:95532 put:8886 us PIN_FAST_BENCHMARK: Time: get:95351 put:5826 us PIN_FAST_BENCHMARK: Time: get:96401 put:8407 us PIN_FAST_BENCHMARK: Time: get:96473 put:8287 us PIN_FAST_BENCHMARK: Time: get:97177 put:8430 us PIN_FAST_BENCHMARK: Time: get:98120 put:5263 us PIN_FAST_BENCHMARK: Time: get:96271 put:7757 us PIN_FAST_BENCHMARK: Time: get:99628 put:10467 us PIN_FAST_BENCHMARK: Time: get:99344 put:10045 us PIN_FAST_BENCHMARK: Time: get:94212 put:15485 us Summary: Old kernel: 477729.97 (+-3.79%) New kernel: 89144.65 (+-11.76%) This patch (of 3): Add a new parameter "-j N" to support concurrent gup test. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: John Hubbard <[email protected]> Cc: Jan Kara <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Kirill Shutemov <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Jann Horn <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>