aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-11-03tools x86 headers: Update cpufeatures.h headers copiesArnaldo Carvalho de Melo2-2/+13
To pick the changes from: 5866e9205b47a983 ("x86/cpu: Add hardware-enforced cache coherency as a CPUID feature") ff4f82816dff28ff ("x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions") 360e7c5c4ca4fd8e ("x86/cpufeatures: Add SEV-ES CPU feature") 18ec63faefb3fd31 ("x86/cpufeatures: Enumerate TSX suspend load address tracking instructions") e48cb1a3fb916500 ("x86/resctrl: Enumerate per-thread MBA controls") Which don't cause any changes in tooling, just addresses these build warnings: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h' diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Krish Sadhukhan <[email protected]> Cc: Kyung Min Park <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Tom Lendacky <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03tools headers UAPI: Update fscrypt.h copyArnaldo Carvalho de Melo1-3/+3
To get the changes from: c7f0207b613033c5 ("fscrypt: make "#define fscrypt_policy" user-only") That don't cause any changes in tools/perf, only addresses this perf tools build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h' diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h Cc: Adrian Hunter <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03tools headers UAPI: Sync drm/i915_drm.h with the kernel sourcesArnaldo Carvalho de Melo1-3/+56
To pick the changes in: 13149e8bafc46572 ("drm/i915: add syncobj timeline support") cda9edd02425d790 ("drm/i915: introduce a mechanism to extend execbuf2") That don't result in any changes in tooling, just silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h' diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Lionel Landwerlin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Rodrigo Vivi <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03tools headers UAPI: Sync prctl.h with the kernel sourcesArnaldo Carvalho de Melo1-0/+9
To get the changes in: 1c101da8b971a366 ("arm64: mte: Allow user control of the tag check mode via prctl()") af5ce95282dc99d0 ("arm64: mte: Allow user control of the generated random tags via prctl()") Which don't cause any change in tooling, only addresses this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h' diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h Cc: Adrian Hunter <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03perf scripting python: Avoid declaring function pointers with a visibility ↵Arnaldo Carvalho de Melo1-5/+2
attribute To avoid this: util/scripting-engines/trace-event-python.c: In function 'python_start_script': util/scripting-engines/trace-event-python.c:1595:2: error: 'visibility' attribute ignored [-Werror=attributes] 1595 | PyMODINIT_FUNC (*initfunc)(void); | ^~~~~~~~~~~~~~ That started breaking when building with PYTHON=python3 and these gcc versions (I haven't checked with the clang ones, maybe it breaks there as well): # export PERF_TARBALL=http://192.168.86.5/perf/perf-5.9.0.tar.xz # dm fedora:33 fedora:rawhide 1 107.80 fedora:33 : Ok gcc (GCC) 10.2.1 20201005 (Red Hat 10.2.1-5), clang version 11.0.0 (Fedora 11.0.0-1.fc33) 2 92.47 fedora:rawhide : Ok gcc (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6), clang version 11.0.0 (Fedora 11.0.0-1.fc34) # Avoid that by ditching that 'initfunc' function pointer with its: #define Py_EXPORTED_SYMBOL _attribute_ ((visibility ("default"))) #define PyMODINIT_FUNC Py_EXPORTED_SYMBOL PyObject* And just call PyImport_AppendInittab() at the end of the ifdef python3 block with the functions that were being attributed to that initfunc. Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03perf tools: Remove broken __no_tail_call attributePeter Zijlstra3-20/+5
The GCC specific __attribute__((optimize)) attribute does not what is commonly expected and is explicitly recommended against using in production code by the GCC people. Unlike what is often expected, it doesn't add to the optimization flags, but it fully replaces them, loosing any and all optimization flags provided by the compiler commandline. The only guaranteed upon means of inhibiting tail-calls is by placing a volatile asm with side-effects after the call such that the tail-call simply cannot be done. Given the original commit wasn't specific on which calls were the problem, this removal might re-introduce the problem, which can then be re-analyzed and cured properly. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Acked-by: Miguel Ojeda <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Arvind Sankar <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kees Kook <[email protected]> Cc: Martin Liška <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03perf vendor events: Fix DRAM_BW_Use 0 issue for CLX/SKXJin Yao2-2/+2
Ian reports an issue that the metric DRAM_BW_Use often remains 0. The metric expression for DRAM_BW_Use on CLX/SKX: "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time" The counts of uncore_imc/cas_count_read/ and uncore_imc/cas_count_write/ are scaled up by 64, that is to turn a count of cache lines into bytes, the count is then divided by 1000000000 to give GB. However, the counts of uncore_imc/cas_count_read/ and uncore_imc/cas_count_write/ have been scaled yet. The scale values are from sysfs, such as /sys/devices/uncore_imc_0/events/cas_count_read.scale. It's 6.103515625e-5 (64 / 1024.0 / 1024.0). So if we use original metric expression, the result is not correct. But the difficulty is, for SKL client, the counts are not scaled. The metric expression for DRAM_BW_Use on SKL: "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000" root@kbl-ppc:~# perf stat -M DRAM_BW_Use -a -- sleep 1 Performance counter stats for 'system wide': 190 arb/event=0x84,umask=0x1/ # 1.86 DRAM_BW_Use 29,093,178 arb/event=0x81,umask=0x1/ 1,000,703,287 ns duration_time 1.000703287 seconds time elapsed The result is expected. So the easy way is just change the metric expression for CLX/SKX. This patch changes the metric expression to: "( ( ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) * 1048576 ) / 1000000000 ) / duration_time" 1048576 = 1024 * 1024. Before (tested on CLX): root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1 Performance counter stats for 'system wide': 765.35 MiB uncore_imc/cas_count_read/ # 0.00 DRAM_BW_Use 5.42 MiB uncore_imc/cas_count_write/ 1001515088 ns duration_time 1.001515088 seconds time elapsed After: root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1 Performance counter stats for 'system wide': 767.95 MiB uncore_imc/cas_count_read/ # 0.80 DRAM_BW_Use 5.02 MiB uncore_imc/cas_count_write/ 1001900010 ns duration_time 1.001900010 seconds time elapsed Fixes: 038d3b53c284 ("perf vendor events intel: Update CascadelakeX events to v1.08") Fixes: b5ff7f2799a4 ("perf vendor events: Update SkylakeX events to v1.21") Signed-off-by: Jin Yao <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03perf trace: Fix segfault when trying to trace events by cgroupStanislav Ivanichkin1-6/+9
# ./perf trace -e sched:sched_switch -G test -a sleep 1 perf: Segmentation fault Obtained 11 stack frames. ./perf(sighandler_dump_stack+0x43) [0x55cfdc636db3] /lib/x86_64-linux-gnu/libc.so.6(+0x3efcf) [0x7fd23eecafcf] ./perf(parse_cgroups+0x36) [0x55cfdc673f36] ./perf(+0x3186ed) [0x55cfdc70d6ed] ./perf(parse_options_subcommand+0x629) [0x55cfdc70e999] ./perf(cmd_trace+0x9c2) [0x55cfdc5ad6d2] ./perf(+0x1e8ae0) [0x55cfdc5ddae0] ./perf(+0x1e8ded) [0x55cfdc5ddded] ./perf(main+0x370) [0x55cfdc556f00] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fd23eeadb96] ./perf(_start+0x29) [0x55cfdc557389] Segmentation fault # It happens because "struct trace" in option->value is passed to the parse_cgroups function instead of "struct evlist". Fixes: 9ea42ba4411ac ("perf trace: Support setting cgroups as targets") Signed-off-by: Stanislav Ivanichkin <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Dmitry Monakhov <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03perf tools: Fix crash with non-jited bpf progsTommi Rantala3-1/+19
The addr in PERF_RECORD_KSYMBOL events for non-jited bpf progs points to the bpf interpreter, ie. within kernel text section. When processing the unregister event, this causes unexpected removal of vmlinux_map, crashing perf later in cleanup: # perf record -- timeout --signal=INT 2s /usr/share/bcc/tools/execsnoop PCOMM PID PPID RET ARGS [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.208 MB perf.data (5155 samples) ] perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed. Aborted (core dumped) # perf script -D|grep KSYM 0 0xa40 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6 0 0xab0 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_8c42dee26e8cd4c2 0 0xb20 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6 108563691893 0x33d98 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x0 name bpf_prog_bc5697a410556fc2_syscall__execve 108568518458 0x34098 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x0 name bpf_prog_45e2203c2928704d_do_ret_sys_execve 109301967895 0x34830 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x1 name bpf_prog_bc5697a410556fc2_syscall__execve 109302007356 0x348b0 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x1 name bpf_prog_45e2203c2928704d_do_ret_sys_execve perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed. Here the addresses match the bpf interpreter: # grep -e ffffffffa9b6b530 -e ffffffffa9b6b3b0 -e ffffffffa9b6b3f0 /proc/kallsyms ffffffffa9b6b3b0 t __bpf_prog_run224 ffffffffa9b6b3f0 t __bpf_prog_run192 ffffffffa9b6b530 t __bpf_prog_run32 Fix by not allowing vmlinux_map to be removed by PERF_RECORD_KSYMBOL unregister event. Signed-off-by: Tommi Rantala <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03tools headers UAPI: Update process_madvise affected filesArnaldo Carvalho de Melo2-5/+10
To pick the changes from: ecb8ac8b1f146915 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API") That addresses these perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h' diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03perf tools: Update copy of libbpf's hashmap.cArnaldo Carvalho de Melo2-0/+15
To pick the changes in: 85367030a6c7ef33 ("libbpf: Centralize poisoning and poison reallocarray()") 7d9c71e10baa3496 ("libbpf: Extract generic string hashing function for reuse") That don't entail any changes in tools/perf. This addresses this perf build warning: Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h' diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h Not a kernel ABI, its just that this uses the mechanism in place for checking kernel ABI files drift. Cc: Adrian Hunter <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03perf tools: Remove LTO compiler options when building perl supportJustin M. Forbes1-0/+1
To avoid breaking the build by mixing files compiled with things coming from distro specific compiler options for perl with the rest of perf, i.e. to avoid this: `.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of /tmp/build/perf/util/scripting-engines/perf-in.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.stdcpredef.h.19.8dc41bed5d9037ff9622e015fb5f0ce3]' of /tmp/build/perf/util/scripting-engines/perf-in.o Noticed on Fedora 33. Signed-off-by: Justin M. Forbes <[email protected]> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1593431 Cc: Jiri Olsa <[email protected]> Link: https://src.fedoraproject.org/rpms/kernel-tools/c/589a32b62f0c12516ab7b34e3dd30d450145bfa4?branch=master Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-11-03Fonts: Replace discarded const qualifierLee Jones13-13/+13
Commit 6735b4632def ("Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts") introduced the following error when building rpc_defconfig (only this build appears to be affected): `acorndata_8x8' referenced in section `.text' of arch/arm/boot/compressed/ll_char_wr.o: defined in discarded section `.data' of arch/arm/boot/compressed/font.o `acorndata_8x8' referenced in section `.data.rel.ro' of arch/arm/boot/compressed/font.o: defined in discarded section `.data' of arch/arm/boot/compressed/font.o make[3]: *** [/scratch/linux/arch/arm/boot/compressed/Makefile:191: arch/arm/boot/compressed/vmlinux] Error 1 make[2]: *** [/scratch/linux/arch/arm/boot/Makefile:61: arch/arm/boot/compressed/vmlinux] Error 2 make[1]: *** [/scratch/linux/arch/arm/Makefile:317: zImage] Error 2 The .data section is discarded at link time. Reinstating acorndata_8x8 as const ensures it is still available after linking. Do the same for the other 12 built-in fonts as well, for consistency purposes. Cc: <[email protected]> Cc: Russell King <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Fixes: 6735b4632def ("Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts") Signed-off-by: Lee Jones <[email protected]> Co-developed-by: Peilin Ye <[email protected]> Signed-off-by: Peilin Ye <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-11-03arm64: NUMA: Kconfig: Increase NODES_SHIFT to 4Vanshidhar Konda1-1/+1
The current arm64 default config limits max NUMA nodes available on system to 4 (NODES_SHIFT = 2). Today's arm64 systems can reach or exceed 16 NUMA nodes. To accomodate current hardware and to fit NODES_SHIFT within page flags on arm64, increase NODES_SHIFT to 4. Signed-off-by: Vanshidhar Konda <[email protected]> Acked-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected]/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2020-11-03nvme-tcp: avoid repeated request completionSagi Grimberg1-1/+1
The request may be executed asynchronously, and rq->state may be changed to IDLE. To avoid repeated request completion, only MQ_RQ_COMPLETE of rq->state is checked in nvme_tcp_complete_timed_out. It is not safe, so need adding check IDLE for rq->state. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Chao Leng <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-11-03nvme-rdma: avoid repeated request completionSagi Grimberg1-1/+1
The request may be executed asynchronously, and rq->state may be changed to IDLE. To avoid repeated request completion, only MQ_RQ_COMPLETE of rq->state is checked in nvme_rdma_complete_timed_out. It is not safe, so need adding check IDLE for rq->state. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Chao Leng <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-11-03nvme-tcp: avoid race between time out and tear downChao Leng1-11/+3
Now use teardown_lock to serialize for time out and tear down. This may cause abnormal: first cancel all request in tear down, then time out may complete the request again, but the request may already be freed or restarted. To avoid race between time out and tear down, in tear down process, first we quiesce the queue, and then delete the timer and cancel the time out work for the queue. At the same time we need to delete teardown_lock. Signed-off-by: Chao Leng <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-11-03nvme-rdma: avoid race between time out and tear downChao Leng1-10/+2
Now use teardown_lock to serialize for time out and tear down. This may cause abnormal: first cancel all request in tear down, then time out may complete the request again, but the request may already be freed or restarted. To avoid race between time out and tear down, in tear down process, first we quiesce the queue, and then delete the timer and cancel the time out work for the queue. At the same time we need to delete teardown_lock. Signed-off-by: Chao Leng <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-11-03nvme: introduce nvme_sync_io_queuesChao Leng2-2/+7
Introduce sync io queues for some scenarios which just only need sync io queues not sync all queues. Signed-off-by: Chao Leng <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-11-03drm/vc4: drv: Remove unused variableMaxime Ripard1-1/+0
The commit dcda7c28bff2 ("drm/vc4: kms: Add functions to create the state objects") removed the last users of the vc4 variable, but didn't remove that variable resulting in a warning. Fixes: dcda7c28bff2 ("drm/vc4: kms: Add functions to create the state objects") Reported-by: Daniel Vetter <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-11-03drm/panfrost: Fix module unloadSteven Price1-2/+3
When unloading the call to pm_runtime_put_sync_suspend() will attempt to turn the GPU cores off, however panfrost_device_fini() will have turned the clocks off. This leads to the hardware locking up. Instead don't call pm_runtime_put_sync_suspend() and instead simply mark the device as suspended using pm_runtime_set_suspended(). And also include this on the error path in panfrost_probe(). Fixes: aebe8c22a912 ("drm/panfrost: Fix possible suspend in panfrost_remove") Signed-off-by: Steven Price <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-11-03drm/panfrost: Fix a deadlock between the shrinker and madvise pathBoris Brezillon3-7/+13
panfrost_ioctl_madvise() and panfrost_gem_purge() acquire the mappings and shmem locks in different orders, thus leading to a potential the mappings lock first. Fixes: bdefca2d8dc0 ("drm/panfrost: Add the panfrost_gem_mapping concept") Cc: <[email protected]> Cc: Christian Hewitt <[email protected]> Reported-by: Christian Hewitt <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-11-02sfp: Fix error handing in sfp_probe()YueHaibing1-1/+2
gpiod_to_irq() never return 0, but returns negative in case of error, check it and set gpio_irq to 0. Fixes: 73970055450e ("sfp: add SFP module support") Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-11-02powerpc/vnic: Extend "failover pending" windowSukadev Bhattiprolu1-4/+32
Commit 5a18e1e0c193b introduced the 'failover_pending' state to track the "failover pending window" - where we wait for the partner to become ready (after a transport event) before actually attempting to failover. i.e window is between following two events: a. we get a transport event due to a FAILOVER b. later, we get CRQ_INITIALIZED indicating the partner is ready at which point we schedule a FAILOVER reset. and ->failover_pending is true during this window. If during this window, we attempt to open (or close) a device, we pretend that the operation succeded and let the FAILOVER reset path complete the operation. This is fine, except if the transport event ("a" above) occurs during the open and after open has already checked whether a failover is pending. If that happens, we fail the open, which can cause the boot scripts to leave the interface down requiring administrator to manually bring up the device. This fix "extends" the failover pending window till we are _actually_ ready to perform the failover reset (i.e until after we get the RTNL lock). Since open() holds the RTNL lock, we can be sure that we either finish the open or if the open() fails due to the failover pending window, we can again pretend that open is done and let the failover complete it. We could try and block the open until failover is completed but a) that could still timeout the application and b) Existing code "pretends" that failover occurred "just after" open succeeded, so marks the open successful and lets the failover complete the open. So, mark the open successful even if the transport event occurs before we actually start the open. Fixes: 5a18e1e0c193 ("ibmvnic: Fix failover case for non-redundant configuration") Signed-off-by: Sukadev Bhattiprolu <[email protected]> Acked-by: Dany Madden <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-11-02RDMA/vmw_pvrdma: Fix the active_speed and phys_state valueAdit Ranadive1-1/+1
The pvrdma_port_attr structure is ABI toward the hypervisor, changing it breaks the ability to report the speed properly. Revert the change to u16. Fixes: 376ceb31ff87 ("RDMA: Fix link active_speed size") Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Vishnu Dasa <[email protected]> Signed-off-by: Adit Ranadive <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-11-02net: dsa: qca8k: Fix port MTU settingJonathan McDowell1-2/+2
The qca8k only supports a switch-wide MTU setting, and the code to take the max of all ports was only looking at the port currently being set. Fix to examine all ports. Reported-by: DENG Qingfang <[email protected]> Fixes: f58d2598cf70 ("net: dsa: qca8k: implement the port MTU callbacks") Signed-off-by: Jonathan McDowell <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-11-02scsi: mpt3sas: Fix timeouts observed while reenabling IRQSreekanth Reddy1-0/+7
While reenabling the IRQ after irq poll there may be small time window where HBA firmware has posted some replies and raise the interrupts but driver has not received the interrupts. So we may observe I/O timeouts as the driver has not processed the replies as interrupts got missed while reenabling the IRQ. To fix this issue the driver has to go for one more round of processing the reply descriptors from reply descriptor post queue after enabling the IRQ. Link: https://lore.kernel.org/r/[email protected] Reported-by: Tomas Henzl <[email protected]> Reviewed-by: Tomas Henzl <[email protected]> Signed-off-by: Sreekanth Reddy <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2020-11-02scsi: scsi_dh_alua: Avoid crash during alua_bus_detach()Hannes Reinecke1-4/+5
alua_bus_detach() might be running concurrently with alua_rtpg_work(), so we might trip over h->sdev == NULL and call BUG_ON(). The correct way of handling it is to not set h->sdev to NULL in alua_bus_detach(), and call rcu_synchronize() before the final delete to ensure that all concurrent threads have left the critical section. Then we can get rid of the BUG_ON() and replace it with a simple if condition. Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Cc: Brian Bunker <[email protected]> Acked-by: Brian Bunker <[email protected]> Tested-by: Jitendra Khasdev <[email protected]> Reviewed-by: Jitendra Khasdev <[email protected]> Signed-off-by: Hannes Reinecke <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2020-11-02sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian platformsPetr Malat1-2/+2
Commit 978aa0474115 ("sctp: fix some type cast warnings introduced since very beginning")' broke err reading from sctp_arg, because it reads the value as 32-bit integer, although the value is stored as 16-bit integer. Later this value is passed to the userspace in 16-bit variable, thus the user always gets 0 on big-endian platforms. Fix it by reading the __u16 field of sctp_arg union, as reading err field would produce a sparse warning. Fixes: 978aa0474115 ("sctp: fix some type cast warnings introduced since very beginning") Signed-off-by: Petr Malat <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-11-02tty: make FONTX ioctl use the tty pointer they were actually passedLinus Torvalds1-17/+19
Some of the font tty ioctl's always used the current foreground VC for their operations. Don't do that then. This fixes a data race on fg_console. Side note: both Michael Ellerman and Jiri Slaby point out that all these ioctls are deprecated, and should probably have been removed long ago, and everything seems to be using the KDFONTOP ioctl instead. In fact, Michael points out that it looks like busybox's loadfont program seems to have switched over to using KDFONTOP exactly _because_ of this bug (ahem.. 12 years ago ;-). Reported-by: Minh Yuan <[email protected]> Acked-by: Michael Ellerman <[email protected]> Acked-by: Jiri Slaby <[email protected]> Cc: Greg KH <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds14-105/+275
Merge misc fixes from Andrew Morton: "Subsystems affected by this patch series: mm (memremap, memcg, slab-generic, kasan, mempolicy, pagecache, oom-kill, pagemap), kthread, signals, lib, epoll, and core-kernel" * emailed patches from Andrew Morton <[email protected]>: kernel/hung_task.c: make type annotations consistent epoll: add a selftest for epoll timeout race mm: always have io_remap_pfn_range() set pgprot_decrypted() mm, oom: keep oom_adj under or at upper limit when printing kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled mm/truncate.c: make __invalidate_mapping_pages() static lib/crc32test: remove extra local_irq_disable/enable ptrace: fix task_join_group_stop() for the case when current is traced mm: mempolicy: fix potential pte_unmap_unlock pte error kasan: adopt KUNIT tests to SW_TAGS mode mm: memcg: link page counters to root if use_hierarchy is false mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg hugetlb_cgroup: fix reservation accounting mm/mremap_pages: fix static key devmap_managed_key updates
2020-11-02net: ethernet: ti: cpsw: disable PTPv1 hw timestamping advertisementGrygorii Strashko2-5/+1
The TI CPTS does not natively support PTPv1, only PTPv2. But, as it happens, the CPTS can provide HW timestamp for PTPv1 Sync messages, because CPTS HW parser looks for PTP messageType id in PTP message octet 0 which value is 0 for PTPv1. As result, CPTS HW can detect Sync messages for PTPv1 and PTPv2 (Sync messageType = 0 for both), but it fails for any other PTPv1 messages (Delay_req/resp) and will return PTP messageType id 0 for them. The commit e9523a5a32a1 ("net: ethernet: ti: cpsw: enable HWTSTAMP_FILTER_PTP_V1_L4_EVENT filter") added PTPv1 hw timestamping advertisement by mistake, only to make Linux Kernel "timestamping" utility work, and this causes issues with only PTPv1 compatible HW/SW - Sync HW timestamped, but Delay_req/resp are not. Hence, fix it disabling PTPv1 hw timestamping advertisement, so only PTPv1 compatible HW/SW can properly roll back to SW timestamping. Fixes: e9523a5a32a1 ("net: ethernet: ti: cpsw: enable HWTSTAMP_FILTER_PTP_V1_L4_EVENT filter") Signed-off-by: Grygorii Strashko <[email protected]> Acked-by: Richard Cochran <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-11-02vfio/fsl-mc: return -EFAULT if copy_to_user() failsDan Carpenter1-2/+6
The copy_to_user() function returns the number of bytes remaining to be copied, but this code should return -EFAULT. Fixes: df747bcd5b21 ("vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Diana Craciun <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-11-02vfio/type1: Use the new helper to find vfio_groupZenghui Yu1-12/+5
When attaching a new group to the container, let's use the new helper vfio_iommu_find_iommu_group() to check if it's already attached. There is no functional change. Also take this chance to add a missing blank line. Signed-off-by: Zenghui Yu <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-11-02tracing: Make -ENOMEM the default error for parse_synth_field()Steven Rostedt (VMware)1-10/+7
parse_synth_field() returns a pointer and requires that errors get surrounded by ERR_PTR(). The ret variable is initialized to zero, but should never be used as zero, and if it is, it could cause a false return code and produce a NULL pointer dereference. It makes no sense to set ret to zero. Set ret to -ENOMEM (the most common error case), and have any other errors set it to something else. This removes the need to initialize ret on *every* error branch. Fixes: 761a8c58db6b ("tracing, synthetic events: Replace buggy strcat() with seq_buf operations") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-02ring-buffer: Fix recursion protection transitions between interrupt contextSteven Rostedt (VMware)1-12/+46
The recursion protection of the ring buffer depends on preempt_count() to be correct. But it is possible that the ring buffer gets called after an interrupt comes in but before it updates the preempt_count(). This will trigger a false positive in the recursion code. Use the same trick from the ftrace function callback recursion code which uses a "transition" bit that gets set, to allow for a single recursion for to handle transitions between contexts. Cc: [email protected] Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-11-02gfs2: Don't call cancel_delayed_work_sync from within delete work functionAndreas Gruenbacher1-1/+2
Right now, we can end up calling cancel_delayed_work_sync from within delete_work_func via gfs2_lookup_by_inum -> gfs2_inode_lookup -> gfs2_cancel_delete_work. When that happens, it will result in a deadlock. Instead, gfs2_inode_lookup should skip the call to gfs2_cancel_delete_work when called from delete_work_func (blktype == GFS2_BLKST_UNLINKED). Reported-by: Alexander Ahring Oder Aring <[email protected]> Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work") Cc: [email protected] # v5.8+ Signed-off-by: Andreas Gruenbacher <[email protected]>
2020-11-02kernel/hung_task.c: make type annotations consistentLukas Bulwahn1-2/+1
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") removed various __user annotations from function signatures as part of its refactoring. It also removed the __user annotation for proc_dohung_task_timeout_secs() at its declaration in sched/sysctl.h, but not at its definition in kernel/hung_task.c. Hence, sparse complains: kernel/hung_task.c:271:5: error: symbol 'proc_dohung_task_timeout_secs' redeclared with different type (incompatible argument 3 (different address spaces)) Adjust the annotation at the definition fitting to that refactoring to make sparse happy again, which also resolves this warning from sparse: kernel/hung_task.c:277:52: warning: incorrect type in argument 3 (different address spaces) kernel/hung_task.c:277:52: expected void * kernel/hung_task.c:277:52: got void [noderef] __user *buffer No functional change. No change in object code. Signed-off-by: Lukas Bulwahn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Al Viro <[email protected]> Cc: Andrey Ignatov <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02epoll: add a selftest for epoll timeout raceSoheil Hassas Yeganeh1-0/+95
Add a test case to ensure an event is observed by at least one poller when an epoll timeout is used. Signed-off-by: Guantao Liu <[email protected]> Signed-off-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Khazhismel Kumykov <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Cc: Al Viro <[email protected]> Cc: Davidlohr Bueso <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02mm: always have io_remap_pfn_range() set pgprot_decrypted()Jason Gunthorpe2-4/+9
The purpose of io_remap_pfn_range() is to map IO memory, such as a memory mapped IO exposed through a PCI BAR. IO devices do not understand encryption, so this memory must always be decrypted. Automatically call pgprot_decrypted() as part of the generic implementation. This fixes a bug where enabling AMD SME causes subsystems, such as RDMA, using io_remap_pfn_range() to expose BAR pages to user space to fail. The CPU will encrypt access to those BAR pages instead of passing unencrypted IO directly to the device. Places not mapping IO should use remap_pfn_range(). Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption") Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brijesh Singh <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: "Dave Young" <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Larry Woodman <[email protected]> Cc: Matt Fleming <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Toshimitsu Kani <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02mm, oom: keep oom_adj under or at upper limit when printingCharles Haithcock1-0/+2
For oom_score_adj values in the range [942,999], the current calculations will print 16 for oom_adj. This patch simply limits the output so output is inline with docs. Signed-off-by: Charles Haithcock <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02kthread_worker: prevent queuing delayed work from timer_fn when it is being ↵Zqiang1-1/+2
canceled There is a small race window when a delayed work is being canceled and the work still might be queued from the timer_fn: CPU0 CPU1 kthread_cancel_delayed_work_sync() __kthread_cancel_work_sync() __kthread_cancel_work() work->canceling++; kthread_delayed_work_timer_fn() kthread_insert_work(); BUG: kthread_insert_work() should not get called when work->canceling is set. Signed-off-by: Zqiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02mm/truncate.c: make __invalidate_mapping_pages() staticJason Yan1-1/+1
Fix the following sparse warning: mm/truncate.c:531:15: warning: symbol '__invalidate_mapping_pages' was not declared. Should it be static? Fixes: eb1d7a65f08a ("mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED") Signed-off-by: Jason Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Yafang Shao <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02lib/crc32test: remove extra local_irq_disable/enableVasily Gorbik1-4/+0
Commit 4d004099a668 ("lockdep: Fix lockdep recursion") uncovered the following issue in lib/crc32test reported on s390: BUG: using __this_cpu_read() in preemptible [00000000] code: swapper/0/1 caller is lockdep_hardirqs_on_prepare+0x48/0x270 CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.9.0-next-20201015-15164-g03d992bd2de6 #19 Hardware name: IBM 3906 M04 704 (LPAR) Call Trace: lockdep_hardirqs_on_prepare+0x48/0x270 trace_hardirqs_on+0x9c/0x1b8 crc32_test.isra.0+0x170/0x1c0 crc32test_init+0x1c/0x40 do_one_initcall+0x40/0x130 do_initcalls+0x126/0x150 kernel_init_freeable+0x1f6/0x230 kernel_init+0x22/0x150 ret_from_fork+0x24/0x2c no locks held by swapper/0/1. Remove extra local_irq_disable/local_irq_enable helpers calls. Fixes: 5fb7f87408f1 ("lib: add module support to crc32 tests") Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Link: https://lkml.kernel.org/r/patch.git-4369da00c06e.your-ad-here.call-01602859837-ext-1679@work.hours Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02ptrace: fix task_join_group_stop() for the case when current is tracedOleg Nesterov1-9/+10
This testcase #include <stdio.h> #include <unistd.h> #include <signal.h> #include <sys/ptrace.h> #include <sys/wait.h> #include <pthread.h> #include <assert.h> void *tf(void *arg) { return NULL; } int main(void) { int pid = fork(); if (!pid) { kill(getpid(), SIGSTOP); pthread_t th; pthread_create(&th, NULL, tf, NULL); return 0; } waitpid(pid, NULL, WSTOPPED); ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACECLONE); waitpid(pid, NULL, 0); ptrace(PTRACE_CONT, pid, 0,0); waitpid(pid, NULL, 0); int status; int thread = waitpid(-1, &status, 0); assert(thread > 0 && thread != pid); assert(status == 0x80137f); return 0; } fails and triggers WARN_ON_ONCE(!signr) in do_jobctl_trap(). This is because task_join_group_stop() has 2 problems when current is traced: 1. We can't rely on the "JOBCTL_STOP_PENDING" check, a stopped tracee can be woken up by debugger and it can clone another thread which should join the group-stop. We need to check group_stop_count || SIGNAL_STOP_STOPPED. 2. If SIGNAL_STOP_STOPPED is already set, we should not increment sig->group_stop_count and add JOBCTL_STOP_CONSUME. The new thread should stop without another do_notify_parent_cldstop() report. To clarify, the problem is very old and we should blame ptrace_init_task(). But now that we have task_join_group_stop() it makes more sense to fix this helper to avoid the code duplication. Reported-by: [email protected] Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Christian Brauner <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Zhiqiang Liu <[email protected]> Cc: Tejun Heo <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02mm: mempolicy: fix potential pte_unmap_unlock pte errorShijie Luo1-3/+3
When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to pte_unmap_unlock seems like not a good idea. queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't migrate misplaced pages but returns with EIO when encountering such a page. Since commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified") and early break on the first pte in the range results in pte_unmap_unlock on an underflow pte. This can lead to lockups later on when somebody tries to lock the pte resp. page_table_lock again.. Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified") Signed-off-by: Shijie Luo <[email protected]> Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Feilong Lin <[email protected]> Cc: Shijie Luo <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02kasan: adopt KUNIT tests to SW_TAGS modeAndrey Konovalov1-42/+107
Now that we have KASAN-KUNIT tests integration, it's easy to see that some KASAN tests are not adopted to the SW_TAGS mode and are failing. Adjust the allocation size for kasan_memchr() and kasan_memcmp() by roung it up to OOB_TAG_OFF so the bad access ends up in a separate memory granule. Add a new kmalloc_uaf_16() tests that relies on UAF, and a new kasan_bitops_tags() test that is tailored to tag-based mode, as it's hard to adopt the existing kmalloc_oob_16() and kasan_bitops_generic() (renamed from kasan_bitops()) without losing the precision. Add new kmalloc_uaf_16() and kasan_bitops_uaf() tests that rely on UAFs, as it's hard to adopt the existing kmalloc_oob_16() and kasan_bitops_oob() (rename from kasan_bitops()) without losing the precision. Disable kasan_global_oob() and kasan_alloca_oob_left/right() as SW_TAGS mode doesn't instrument globals nor dynamic allocas. Signed-off-by: Andrey Konovalov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: David Gow <[email protected]> Link: https://lkml.kernel.org/r/76eee17b6531ca8b3ca92b240cb2fd23204aaff7.1603129942.git.andreyknvl@google.com Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02mm: memcg: link page counters to root if use_hierarchy is falseRoman Gushchin1-5/+10
Richard reported a warning which can be reproduced by running the LTP madvise6 test (cgroup v1 in the non-hierarchical mode should be used): WARNING: CPU: 0 PID: 12 at mm/page_counter.c:57 page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156) Modules linked in: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.9.0-rc7-22-default #77 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812d-rebuilt.opensuse.org 04/01/2014 Workqueue: events drain_local_stock RIP: 0010:page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156) Call Trace: __memcg_kmem_uncharge (mm/memcontrol.c:3022) drain_obj_stock (./include/linux/rcupdate.h:689 mm/memcontrol.c:3114) drain_local_stock (mm/memcontrol.c:2255) process_one_work (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274) worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416) kthread (kernel/kthread.c:292) ret_from_fork (arch/x86/entry/entry_64.S:300) The problem occurs because in the non-hierarchical mode non-root page counters are not linked to root page counters, so the charge is not propagated to the root memory cgroup. After the removal of the original memory cgroup and reparenting of the object cgroup, the root cgroup might be uncharged by draining a objcg stock, for example. It leads to an eventual underflow of the charge and triggers a warning. Fix it by linking all page counters to corresponding root page counters in the non-hierarchical mode. Please note, that in the non-hierarchical mode all objcgs are always reparented to the root memory cgroup, even if the hierarchy has more than 1 level. This patch doesn't change it. The patch also doesn't affect how the hierarchical mode is working, which is the only sane and truly supported mode now. Thanks to Richard for reporting, debugging and providing an alternative version of the fix! Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API") Reported-by: <[email protected]> Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Reviewed-by: Michal Koutný <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Debugged-by: Richard Palethorpe <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcgzhongjiang-ali1-2/+8
memcg_page_state will get the specified number in hierarchical memcg, It should multiply by HPAGE_PMD_NR rather than an page if the item is NR_ANON_THPS. [[email protected]: fix printk warning] [[email protected]: use u64 cast, per Michal] Fixes: 468c398233da ("mm: memcontrol: switch to native NR_ANON_THPS counter") Signed-off-by: zhongjiang-ali <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Link: https://lkml.kernel.org/r/1603722395-72443-1-git-send-email-zhongjiang-ali@linux.alibaba.com Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02hugetlb_cgroup: fix reservation accountingMike Kravetz1-9/+11
Michal Privoznik was using "free page reporting" in QEMU/virtio-balloon with hugetlbfs and hit the warning below. QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE) to discard pages that are reported as free by a VM. The reporting granularity is in pageblock granularity. So when the guest reports 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU. WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50 Modules linked in: ... CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137 Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020 RIP: 0010:page_counter_uncharge+0x4b/0x50 ... Call Trace: hugetlb_cgroup_uncharge_file_region+0x4b/0x80 region_del+0x1d3/0x300 hugetlb_unreserve_pages+0x39/0xb0 remove_inode_hugepages+0x1a8/0x3d0 hugetlbfs_fallocate+0x3c4/0x5c0 vfs_fallocate+0x146/0x290 __x64_sys_fallocate+0x3e/0x70 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Investigation of the issue uncovered bugs in hugetlb cgroup reservation accounting. This patch addresses the found issues. Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings") Reported-by: Michal Privoznik <[email protected]> Co-developed-by: David Hildenbrand <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Michal Privoznik <[email protected]> Reviewed-by: Mina Almasry <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Cc: <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Tejun Heo <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>