aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-04-16drivers/perf: Open access for CAP_PERFMON privileged processAlexey Budankov1-2/+2
Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <[email protected]> Reviewed-by: James Morris <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16parisc/perf: open access for CAP_PERFMON privileged processAlexey Budankov1-1/+1
Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <[email protected]> Reviewed-by: James Morris <[email protected]> Acked-by: Helge Deller <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16powerpc/perf: open access for CAP_PERFMON privileged processAlexey Budankov1-2/+2
Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <[email protected]> Reviewed-by: James Morris <[email protected]> Acked-by: Anju T Sudhakar <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16trace/bpf_trace: Open access for CAP_PERFMON privileged processAlexey Budankov1-1/+1
Open access to bpf_trace monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to bpf_trace monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure bpf_trace monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <[email protected]> Reviewed-by: James Morris <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16drm/i915/perf: Open access for CAP_PERFMON privileged processAlexey Budankov1-7/+6
Open access to i915_perf monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to i915_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure i915_events monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <[email protected]> Reviewed-by: James Morris <[email protected]> Acked-by: Lionel Landwerlin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf tools: Support CAP_PERFMON capabilityAlexey Budankov5-8/+15
Extend error messages to mention CAP_PERFMON capability as an option to substitute CAP_SYS_ADMIN capability for secure system performance monitoring and observability operations. Make perf_event_paranoid_check() and __cmd_ftrace() to be aware of CAP_PERFMON capability. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to perf_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged with respect to CAP_PERFMON capability. Committer testing: Using a libcap with this patch: diff --git a/libcap/include/uapi/linux/capability.h b/libcap/include/uapi/linux/capability.h index 78b2fd4c8a95..89b5b0279b60 100644 --- a/libcap/include/uapi/linux/capability.h +++ b/libcap/include/uapi/linux/capability.h @@ -366,8 +366,9 @@ struct vfs_ns_cap_data { #define CAP_AUDIT_READ 37 +#define CAP_PERFMON 38 -#define CAP_LAST_CAP CAP_AUDIT_READ +#define CAP_LAST_CAP CAP_PERFMON #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP) Note that using '38' in place of 'cap_perfmon' works to some degree with an old libcap, its only when cap_get_flag() is called that libcap performs an error check based on the maximum value known for capabilities that it will fail. This makes determining the default of perf_event_attr.exclude_kernel to fail, as it can't determine if CAP_PERFMON is in place. Using 'perf top -e cycles' avoids the default check and sets perf_event_attr.exclude_kernel to 1. As root, with a libcap supporting CAP_PERFMON: # groupadd perf_users # adduser perf -g perf_users # mkdir ~perf/bin # cp ~acme/bin/perf ~perf/bin/ # chgrp perf_users ~perf/bin/perf # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" ~perf/bin/perf # getcap ~perf/bin/perf /home/perf/bin/perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep # ls -la ~perf/bin/perf -rwxr-xr-x. 1 root perf_users 16968552 Apr 9 13:10 /home/perf/bin/perf As the 'perf' user in the 'perf_users' group: $ perf top -a --stdio Error: Failed to mmap with 1 (Operation not permitted) $ Either add the cap_ipc_lock capability to the perf binary or reduce the ring buffer size to some smaller value: $ perf top -m10 -a --stdio rounding mmap pages size to 64K (16 pages) Error: Failed to mmap with 1 (Operation not permitted) $ perf top -m4 -a --stdio Error: Failed to mmap with 1 (Operation not permitted) $ perf top -m2 -a --stdio PerfTop: 762 irqs/sec kernel:49.7% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 4 CPUs) ------------------------------------------------------------------------------------------------------ 9.83% perf [.] __symbols__insert 8.58% perf [.] rb_next 5.91% [kernel] [k] module_get_kallsym 5.66% [kernel] [k] kallsyms_expand_symbol.constprop.0 3.98% libc-2.29.so [.] __GI_____strtoull_l_internal 3.66% perf [.] rb_insert_color 2.34% [kernel] [k] vsnprintf 2.30% [kernel] [k] string_nocheck 2.16% libc-2.29.so [.] _IO_getdelim 2.15% [kernel] [k] number 2.13% [kernel] [k] format_decode 1.58% libc-2.29.so [.] _IO_feof 1.52% libc-2.29.so [.] __strcmp_avx2 1.50% perf [.] rb_set_parent_color 1.47% libc-2.29.so [.] __libc_calloc 1.24% [kernel] [k] do_syscall_64 1.17% [kernel] [k] __x86_indirect_thunk_rax $ perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.552 MB perf.data (74 samples) ] $ perf evlist cycles $ perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 $ perf report | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 74 of event 'cycles' # Event count (approx.): 15694834 # # Overhead Command Shared Object Symbol # ........ ............... .......................... ...................................... # 19.62% perf [kernel.vmlinux] [k] strnlen_user 13.88% swapper [kernel.vmlinux] [k] intel_idle 13.83% ksoftirqd/0 [kernel.vmlinux] [k] pfifo_fast_dequeue 13.51% swapper [kernel.vmlinux] [k] kmem_cache_free 6.31% gnome-shell [kernel.vmlinux] [k] kmem_cache_free 5.66% kworker/u8:3+ix [kernel.vmlinux] [k] delay_tsc 4.42% perf [kernel.vmlinux] [k] __set_cpus_allowed_ptr 3.45% kworker/2:1-eve [kernel.vmlinux] [k] shmem_truncate_range 2.29% gnome-shell libgobject-2.0.so.0.6000.7 [.] g_closure_ref $ Signed-off-by: Alexey Budankov <[email protected]> Reviewed-by: James Morris <[email protected]> Acked-by: Jiri Olsa <[email protected]> Acked-by: Namhyung Kim <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf/core: open access to probes for CAP_PERFMON privileged processAlexey Budankov1-2/+2
Open access to monitoring via kprobes and uprobes and eBPF tracing for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. perf kprobes and uprobes are used by ftrace and eBPF. perf probe uses ftrace to define new kprobe events, and those events are treated as tracepoint events. eBPF defines new probes via perf_event_open interface and then the probes are used in eBPF tracing. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to perf_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <[email protected]> Reviewed-by: James Morris <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf/core: Open access to the core for CAP_PERFMON privileged processAlexey Budankov2-4/+4
Open access to monitoring of kernel code, CPUs, tracepoints and namespaces data for a CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons the access to perf_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <[email protected]> Reviewed-by: James Morris <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: [email protected] Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16capabilities: Introduce CAP_PERFMON to kernel and user spaceAlexey Budankov3-3/+13
Introduce the CAP_PERFMON capability designed to secure system performance monitoring and observability operations so that CAP_PERFMON can assist CAP_SYS_ADMIN capability in its governing role for performance monitoring and observability subsystems. CAP_PERFMON hardens system security and integrity during performance monitoring and observability operations by decreasing attack surface that is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access to system performance monitoring and observability operations under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes the operation more secure. Thus, CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e: 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) CAP_PERFMON meets the demand to secure system performance monitoring and observability operations for adoption in security sensitive, restricted, multiuser production environments (e.g. HPC clusters, cloud and virtual compute environments), where root or CAP_SYS_ADMIN credentials are not available to mass users of a system, and securely unblocks applicability and scalability of system performance monitoring and observability operations beyond root and CAP_SYS_ADMIN use cases. CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system performance monitoring and observability operations and balances amount of CAP_SYS_ADMIN credentials following the recommendations in the capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel developers, below." For backward compatibility reasons access to system performance monitoring and observability subsystems of the kernel remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability usage for secure system performance monitoring and observability operations is discouraged with respect to the designed CAP_PERFMON capability. Although the software running under CAP_PERFMON can not ensure avoidance of related hardware issues, the software can still mitigate these issues following the official hardware issues mitigation procedure [2]. The bugs in the software itself can be fixed following the standard kernel development process [3] to maintain and harden security of system performance monitoring and observability operations. [1] http://man7.org/linux/man-pages/man7/capabilities.7.html [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html Signed-off-by: Alexey Budankov <[email protected]> Acked-by: James Morris <[email protected]> Acked-by: Serge E. Hallyn <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Stephen Smalley <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf annotate: Add basic support for bpf_imageJiri Olsa5-0/+34
Add the DSO_BINARY_TYPE__BPF_IMAGE dso binary type to recognize BPF images that carry trampoline or dispatcher. Upcoming patches will add support to read the image data, store it within the BPF feature in perf.data and display it for annotation purposes. Currently we only display following message: # ./perf annotate bpf_trampoline_24456 --stdio Percent | Source code & Disassembly of . for cycles (504 ... --------------------------------------------------------------- ... : to be implemented Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: David S. Miller <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: John Fastabend <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf machine: Set ksymbol dso as loaded on arrivalJiri Olsa1-0/+1
There's no special load action for ksymbol data on map__load/dso__load action, where the kernel is getting loaded. It only gets confused with kernel kallsyms/vmlinux load for bpf object, which fails and could mess up with the map. Disabling any further load of the map for ksymbol related dso/map. Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: David S. Miller <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: John Fastabend <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf tools: Synthesize bpf_trampoline/dispatcher ksymbol eventJiri Olsa1-0/+93
Synthesize bpf images (trampolines/dispatchers) on start, as ksymbol events from /proc/kallsyms. Having this perf can recognize samples from those images and perf report and top shows them correctly. The rest of the ksymbol handling is already in place from for the bpf programs monitoring, so only the initial state was needed. perf report output: # Overhead Command Shared Object Symbol 12.37% test_progs [kernel.vmlinux] [k] entry_SYSCALL_64 11.80% test_progs [kernel.vmlinux] [k] syscall_return_via_sysret 9.63% test_progs bpf_prog_bcf7977d3b93787c_prog2 [k] bpf_prog_bcf7977d3b93787c_prog2 6.90% test_progs bpf_trampoline_24456 [k] bpf_trampoline_24456 6.36% test_progs [kernel.vmlinux] [k] memcpy_erms Committer notes: Use scnprintf() instead of strncpy() to overcome this on fedora:32, rawhide and OpenMandriva Cooker: CC /tmp/build/perf/util/bpf-event.o In file included from /usr/include/string.h:495, from /git/linux/tools/lib/bpf/libbpf_common.h:12, from /git/linux/tools/lib/bpf/bpf.h:31, from util/bpf-event.c:4: In function 'strncpy', inlined from 'process_bpf_image' at util/bpf-event.c:323:2, inlined from 'kallsyms_process_symbol' at util/bpf-event.c:358:9: /usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation] 106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Song Liu <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: David S. Miller <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: John Fastabend <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]/ Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf stat: Honour --timeout for forked workloadsArnaldo Carvalho de Melo1-1/+4
When --timeout is used and a workload is specified to be started by 'perf stat', i.e. $ perf stat --timeout 1000 sleep 1h The --timeout wasn't being honoured, i.e. the workload, 'sleep 1h' in the above example, should be terminated after 1000ms, but it wasn't, 'perf stat' was waiting for it to finish. Fix it by sending a SIGTERM when the timeout expires. Now it works: # perf stat -e cycles --timeout 1234 sleep 1h sleep: Terminated Performance counter stats for 'sleep 1h': 1,066,692 cycles 1.234314838 seconds time elapsed 0.000750000 seconds user 0.000000000 seconds sys # Fixes: f1f8ad52f8bf ("perf stat: Add support to print counts after a period of time") Reported-by: Konstantin Kharlamov <[email protected]> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207243 Tested-by: Konstantin Kharlamov <[email protected]> Cc: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: yuzhoujian <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsetsAndrei Vagin2-2/+27
Michael Kerrisk suggested to replace numeric clock IDs with symbolic names. Now the content of these files looks like this: $ cat /proc/774/timens_offsets monotonic 864000 0 boottime 1728000 0 For setting offsets, both representations of clocks (numeric and symbolic) can be used. As for compatibility, it is acceptable to change things as long as userspace doesn't care. The format of timens_offsets files is very new and there are no userspace tools yet which rely on this format. But three projects crun, util-linux and criu rely on the interface of setting time offsets and this is why it's required to continue supporting the numeric clock IDs on write. Fixes: 04a8682a71be ("fs/proc: Introduce /proc/pid/timens_offsets") Suggested-by: Michael Kerrisk <[email protected]> Signed-off-by: Andrei Vagin <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kerrisk <[email protected]> Acked-by: Michael Kerrisk <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2020-04-16irqchip/gic-v4.1: Update effective affinity of virtual SGIsMarc Zyngier1-0/+1
Although the vSGIs are not directly visible to the host, they still get moved around by the CPU hotplug, for example. This results in the kernel moaning on the console, such as: genirq: irq_chip GICv4.1-sgi did not update eff. affinity mask of irq 38 Updating the effective affinity on set_affinity() fixes it. Reviewed-by: Zenghui Yu <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2020-04-16irqchip/gic-v4.1: Add support for VPENDBASER's Dirty+Valid signalingMarc Zyngier3-4/+28
When a vPE is made resident, the GIC starts parsing the virtual pending table to deliver pending interrupts. This takes place asynchronously, and can at times take a long while. Long enough that the vcpu enters the guest and hits WFI before any interrupt has been signaled yet. The vcpu then exits, blocks, and now gets a doorbell. Rince, repeat. In order to avoid the above, a (optional on GICv4, mandatory on v4.1) feature allows the GIC to feedback to the hypervisor whether it is done parsing the VPT by clearing the GICR_VPENDBASER.Dirty bit. The hypervisor can then wait until the GIC is ready before actually running the vPE. Plug the detection code as well as polling on vPE schedule. While at it, tidy-up the kernel message that displays the GICv4 optional features. Reviewed-by: Zenghui Yu <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2020-04-16Merge tag 'perf-urgent-for-mingo-5.7-20200414' of ↵Ingo Molnar22-387/+646
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: perf stat: Jin Yao: - Fix no metric header if --per-socket and --metric-only set build system: - Fix python building when built with clang, that was failing if the clang version doesn't support -fno-semantic-interposition. tools UAPI headers: Arnaldo Carvalho de Melo: - Update various copies of kernel headers, some ended up automatically updating build-time generated tables to enable tools such as 'perf trace' to decode syscalls and tracepoints arguments. Now the tools/perf build is free of UAPI drift warnings. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2020-04-16Merge branch 'linux-5.7' of git://github.com/skeggsb/linux into drm-fixesDave Airlie2-0/+19
Add missing module firmware for turings. Signed-off-by: Dave Airlie <[email protected]> From: Ben Skeggs <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/ <CACAvsv4njTRpiNqOC54iRjpd=nu3pBG8i_fp8o_dp7AZE6hFWA@mail.gmail.com
2020-04-16drm/nouveau/sec2/gv100-: add missing MODULE_FIRMWARE()Ben Skeggs2-0/+19
ASB was failing to load on Turing GPUs when firmware is being loaded from initramfs, leaving the GPU in an odd state and causing suspend/ resume to fail. Add missing MODULE_FIRMWARE() lines for initramfs generators. Signed-off-by: Ben Skeggs <[email protected]> Cc: <[email protected]> # 5.6
2020-04-15proc: Handle umounts cleanlyEric W. Biederman1-0/+7
syzbot writes: > KASAN: use-after-free Read in dput (2) > > proc_fill_super: allocate dentry failed > ================================================================== > BUG: KASAN: use-after-free in fast_dput fs/dcache.c:727 [inline] > BUG: KASAN: use-after-free in dput+0x53e/0xdf0 fs/dcache.c:846 > Read of size 4 at addr ffff88808a618cf0 by task syz-executor.0/8426 > > CPU: 0 PID: 8426 Comm: syz-executor.0 Not tainted 5.6.0-next-20200412-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0x188/0x20d lib/dump_stack.c:118 > print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:382 > __kasan_report.cold+0x35/0x4d mm/kasan/report.c:511 > kasan_report+0x33/0x50 mm/kasan/common.c:625 > fast_dput fs/dcache.c:727 [inline] > dput+0x53e/0xdf0 fs/dcache.c:846 > proc_kill_sb+0x73/0xf0 fs/proc/root.c:195 > deactivate_locked_super+0x8c/0xf0 fs/super.c:335 > vfs_get_super+0x258/0x2d0 fs/super.c:1212 > vfs_get_tree+0x89/0x2f0 fs/super.c:1547 > do_new_mount fs/namespace.c:2813 [inline] > do_mount+0x1306/0x1b30 fs/namespace.c:3138 > __do_sys_mount fs/namespace.c:3347 [inline] > __se_sys_mount fs/namespace.c:3324 [inline] > __x64_sys_mount+0x18f/0x230 fs/namespace.c:3324 > do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 > entry_SYSCALL_64_after_hwframe+0x49/0xb3 > RIP: 0033:0x45c889 > Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:00007ffc1930ec48 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 > RAX: ffffffffffffffda RBX: 0000000001324914 RCX: 000000000045c889 > RDX: 0000000020000140 RSI: 0000000020000040 RDI: 0000000000000000 > RBP: 000000000076bf00 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 > R13: 0000000000000749 R14: 00000000004ca15a R15: 0000000000000013 Looking at the code now that it the internal mount of proc is no longer used it is possible to unmount proc. If proc is unmounted the fields of the pid namespace that were used for filesystem specific state are not reinitialized. Which means that proc_self and proc_thread_self can be pointers to already freed dentries. The reported user after free appears to be from mounting and unmounting proc followed by mounting proc again and using error injection to cause the new root dentry allocation to fail. This in turn results in proc_kill_sb running with proc_self and proc_thread_self still retaining their values from the previous mount of proc. Then calling dput on either proc_self of proc_thread_self will result in double put. Which KASAN sees as a use after free. Solve this by always reinitializing the filesystem state stored in the struct pid_namespace, when proc is unmounted. Reported-by: [email protected] Acked-by: Christian Brauner <[email protected]> Fixes: 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of proc") Signed-off-by: "Eric W. Biederman" <[email protected]>
2020-04-15ext4: convert BUG_ON's to WARN_ON's in mballoc.cTheodore Ts'o1-2/+4
If the in-core buddy bitmap gets corrupted (or out of sync with the block bitmap), issue a WARN_ON and try to recover. In most cases this involves skipping trying to allocate out of a particular block group. We can end up declaring the file system corrupted, which is fair, since the file system probably should be checked before we proceed any further. Link: https://lore.kernel.org/r/[email protected] Google-Bug-Id: 34811296 Google-Bug-Id: 34639169 Signed-off-by: Theodore Ts'o <[email protected]>
2020-04-15ext4: increase wait time needed before reuse of deleted inode numbersTheodore Ts'o1-1/+1
Current wait times have proven to be too short to protect against inode reuses that lead to metadata inconsistencies. Now that we will retry the inode allocation if we can't find any recently deleted inodes, it's a lot safer to increase the recently deleted time from 5 seconds to a minute. Link: https://lore.kernel.org/r/[email protected] Google-Bug-Id: 36602237 Signed-off-by: Theodore Ts'o <[email protected]>
2020-04-15ext4: remove set but not used variable 'es' in ext4_jbd2.cJason Yan1-3/+0
Fix the following gcc warning: fs/ext4/ext4_jbd2.c:341:30: warning: variable 'es' set but not used [-Wunused-but-set-variable] struct ext4_super_block *es; ^~ Fixes: 2ea2fc775321 ("ext4: save all error info in save_error_info() and drop ext4_set_errno()") Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-04-15ext4: remove set but not used variable 'es'Jason Yan1-2/+0
Fix the following gcc warning: fs/ext4/super.c:599:27: warning: variable 'es' set but not used [-Wunused-but-set-variable] struct ext4_super_block *es; ^~ Fixes: 2ea2fc775321 ("ext4: save all error info in save_error_info() and drop ext4_set_errno()") Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-04-15ext4: do not zeroout extents beyond i_disksizeJan Kara1-4/+4
We do not want to create initialized extents beyond end of file because for e2fsck it is impossible to distinguish them from a case of corrupted file size / extent tree and so it complains like: Inode 12, i_size is 147456, should be 163840. Fix? no Code in ext4_ext_convert_to_initialized() and ext4_split_convert_extents() try to make sure it does not create initialized extents beyond inode size however they check against inode->i_size which is wrong. They should instead check against EXT4_I(inode)->i_disksize which is the current inode size on disk. That's what e2fsck is going to see in case of crash before all dirty data is written. This bug manifests as generic/456 test failure (with recent enough fstests where fsx got fixed to properly pass FALLOC_KEEP_SIZE_FL flags to the kernel) when run with dioread_lock mount option. CC: [email protected] Fixes: 21ca087a3891 ("ext4: Do not zero out uninitialized extents beyond i_size") Reviewed-by: Lukas Czerner <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-04-15ext4: fix return-value types in several function commentsJosh Triplett2-3/+3
The documentation comments for ext4_read_block_bitmap_nowait and ext4_read_inode_bitmap describe them as returning NULL on error, but they return an ERR_PTR on error; update the documentation to match. The documentation comment for ext4_wait_block_bitmap describes it as returning 1 on error, but it returns -errno on error; update the documentation to match. Signed-off-by: Josh Triplett <[email protected]> Reviewed-by: Ritesh Harani <[email protected]> Link: https://lore.kernel.org/r/60a3f4996f4932c45515aaa6b75ca42f2a78ec9b.1585512514.git.josh@joshtriplett.org Signed-off-by: Theodore Ts'o <[email protected]>
2020-04-15ext4: use non-movable memory for superblock readaheadRoman Gushchin4-2/+21
Since commit a8ac900b8163 ("ext4: use non-movable memory for the superblock") buffers for ext4 superblock were allocated using the sb_bread_unmovable() helper which allocated buffer heads out of non-movable memory blocks. It was necessarily to not block page migrations and do not cause cma allocation failures. However commit 85c8f176a611 ("ext4: preload block group descriptors") broke this by introducing pre-reading of the ext4 superblock. The problem is that __breadahead() is using __getblk() underneath, which allocates buffer heads out of movable memory. It resulted in page migration failures I've seen on a machine with an ext4 partition and a preallocated cma area. Fix this by introducing sb_breadahead_unmovable() and __breadahead_gfp() helpers which use non-movable memory for buffer head allocations and use them for the ext4 superblock readahead. Reviewed-by: Andreas Dilger <[email protected]> Fixes: 85c8f176a611 ("ext4: preload block group descriptors") Signed-off-by: Roman Gushchin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-04-15ext4: use matching invalidatepage in ext4_writepageyangerkun1-1/+1
Run generic/388 with journal data mode sometimes may trigger the warning in ext4_invalidatepage. Actually, we should use the matching invalidatepage in ext4_writepage. Signed-off-by: yangerkun <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Ritesh Harjani <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2020-04-15cifs: improve read performance for page size 64KB & cache=strict & vers=2.1+Jones Syue2-1/+5
Found a read performance issue when linux kernel page size is 64KB. If linux kernel page size is 64KB and mount options cache=strict & vers=2.1+, it does not support cifs_readpages(). Instead, it is using cifs_readpage() and cifs_read() with maximum read IO size 16KB, which is much slower than read IO size 1MB when negotiated SMB 2.1+. Since modern SMB server supported SMB 2.1+ and Max Read Size can reach more than 64KB (for example 1MB ~ 8MB), this patch check max_read instead of maxBuf to determine whether server support readpages() and improve read performance for page size 64KB & cache=strict & vers=2.1+, and for SMB1 it is more cleaner to initialize server->max_read to server->maxBuf. The client is a linux box with linux kernel 4.2.8, page size 64KB (CONFIG_ARM64_64K_PAGES=y), cpu arm 1.7GHz, and use mount.cifs as smb client. The server is another linux box with linux kernel 4.2.8, share a file '10G.img' with size 10GB, and use samba-4.7.12 as smb server. The client mount a share from the server with different cache options: cache=strict and cache=none, mount -tcifs //<server_ip>/Public /cache_strict -overs=3.0,cache=strict,username=<xxx>,password=<yyy> mount -tcifs //<server_ip>/Public /cache_none -overs=3.0,cache=none,username=<xxx>,password=<yyy> The client download a 10GbE file from the server across 1GbE network, dd if=/cache_strict/10G.img of=/dev/null bs=1M count=10240 dd if=/cache_none/10G.img of=/dev/null bs=1M count=10240 Found that cache=strict (without patch) is slower read throughput and smaller read IO size than cache=none. cache=strict (without patch): read throughput 40MB/s, read IO size is 16KB cache=strict (with patch): read throughput 113MB/s, read IO size is 1MB cache=none: read throughput 109MB/s, read IO size is 1MB Looks like if page size is 64KB, cifs_set_ops() would use cifs_addr_ops_smallbuf instead of cifs_addr_ops, /* check if server can support readpages */ if (cifs_sb_master_tcon(cifs_sb)->ses->server->maxBuf < PAGE_SIZE + MAX_CIFS_HDR_SIZE) inode->i_data.a_ops = &cifs_addr_ops_smallbuf; else inode->i_data.a_ops = &cifs_addr_ops; maxBuf is came from 2 places, SMB2_negotiate() and CIFSSMBNegotiate(), (SMB2_MAX_BUFFER_SIZE is 64KB) SMB2_negotiate(): /* set it to the maximum buffer size value we can send with 1 credit */ server->maxBuf = min_t(unsigned int, le32_to_cpu(rsp->MaxTransactSize),       SMB2_MAX_BUFFER_SIZE); CIFSSMBNegotiate(): server->maxBuf = le32_to_cpu(pSMBr->MaxBufferSize); Page size 64KB and cache=strict lead to read_pages() use cifs_readpage() instead of cifs_readpages(), and then cifs_read() using maximum read IO size 16KB, which is much slower than maximum read IO size 1MB. (CIFSMaxBufSize is 16KB by default) /* FIXME: set up handlers for larger reads and/or convert to async */ rsize = min_t(unsigned int, cifs_sb->rsize, CIFSMaxBufSize); Reviewed-by: Pavel Shilovsky <[email protected]> Signed-off-by: Jones Syue <[email protected]> Signed-off-by: Steve French <[email protected]>
2020-04-15cifs: dump the session id and keys also for SMB2 sessionsRonnie Sahlberg1-0/+15
We already dump these keys for SMB3, lets also dump it for SMB2 sessions so that we can use the session key in wireshark to check and validate that the signatures are correct. Signed-off-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]> Reviewed-by: Aurelien Aptel <[email protected]>
2020-04-16Merge tag 'amd-drm-fixes-5.7-2020-04-15' of ↵Dave Airlie9-16/+33
git://people.freedesktop.org/~agd5f/linux into drm-fixes amd-drm-fixes-5.7-2020-04-15: amdgpu: - gfx10 fix - SMU7 overclocking fix - RAS fix - GPU reset fix - Fix a regression in a previous s/r fix - Add a gfxoff quirk Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-04-16Merge tag 'drm-intel-fixes-2020-04-15' of ↵Dave Airlie2-78/+33
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix guest page access by using the brand new VFIO dma r/w interface (Yan) - Fix for i915 perf read buffers (Ashutosh) Signed-off-by: Dave Airlie <[email protected]> From: Rodrigo Vivi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-04-15Merge tag 'efi-urgent-2020-04-15' of ↵Linus Torvalds8-30/+61
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "Misc EFI fixes, including the boot failure regression caused by the BSS section not being cleared by the loaders" * tag 'efi-urgent-2020-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Revert struct layout change to fix kexec boot regression efi/x86: Don't remap text<->rodata gap read-only for mixed mode efi/x86: Fix the deletion of variables in mixed mode efi/libstub/file: Merge file name buffers to reduce stack usage Documentation/x86, efi/x86: Clarify EFI handover protocol and its requirements efi/arm: Deal with ADR going out of range in efi_enter_kernel() efi/x86: Always relocate the kernel for EFI handover entry efi/x86: Move efi stub globals from .bss to .data efi/libstub/x86: Remove redundant assignment to pointer hdr efi/cper: Use scnprintf() for avoiding potential buffer overflow
2020-04-15tipc: fix incorrect increasing of link windowTuong Lien1-1/+1
In commit 16ad3f4022bb ("tipc: introduce variable window congestion control"), we allow link window to change with the congestion avoidance algorithm. However, there is a bug that during the slow-start if packet retransmission occurs, the link will enter the fast-recovery phase, set its window to the 'ssthresh' which is never less than 300, so the link window suddenly increases to that limit instead of decreasing. Consequently, two issues have been observed: - For broadcast-link: it can leave a gap between the link queues that a new packet will be inserted and sent before the previous ones, i.e. not in-order. - For unicast: the algorithm does not work as expected, the link window jumps to the slow-start threshold whereas packet retransmission occurs. This commit fixes the issues by avoiding such the link window increase, but still decreasing if the 'ssthresh' is lowered. Fixes: 16ad3f4022bb ("tipc: introduce variable window congestion control") Acked-by: Jon Maloy <[email protected]> Signed-off-by: Tuong Lien <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-15Documentation: Fix tcp_challenge_ack_limit default valueCambda Zhu1-1/+1
The default value of tcp_challenge_ack_limit has been changed from 100 to 1000 and this patch fixes its documentation. Signed-off-by: Cambda Zhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-15net: tulip: make early_486_chipsets staticJason Yan1-1/+1
Fix the following sparse warning: drivers/net/ethernet/dec/tulip/tulip_core.c:1280:28: warning: symbol 'early_486_chipsets' was not declared. Should it be static? Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-15dt-bindings: net: ethernet-phy: add desciption for ethernet-phy-id1234.d400Johan Jonker1-0/+3
The description below is already in use in 'rk3228-evb.dts', 'rk3229-xms6.dts' and 'rk3328.dtsi' but somehow never added to a document, so add "ethernet-phy-id1234.d400", "ethernet-phy-ieee802.3-c22" for ethernet-phy nodes on Rockchip platforms to 'ethernet-phy.yaml'. Signed-off-by: Johan Jonker <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-15ipv6: remove redundant assignment to variable errColin Ian King1-1/+1
The variable err is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-15dt-bindings: pwm: Fix cros-ec-pwm example dtc 'reg' warningRob Herring1-5/+12
The example for the CrOS EC PWM is incomplete and now generates a dtc warning: Documentation/devicetree/bindings/pwm/google,cros-ec-pwm.example.dts:17.11-23.11: Warning (unit_address_vs_reg): /example-0/cros-ec@0: node has a unit name, but no reg or ranges property Fixing this results in more warnings as a parent spi node is needed as well. Cc: Thierry Reding <[email protected]> Cc: Benson Leung <[email protected]> Cc: Enric Balletbo i Serra <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: [email protected] Acked-by: Uwe Kleine-König <[email protected]> Signed-off-by: Rob Herring <[email protected]>
2020-04-15selinux: free str on error in str_read()Ondrej Mosnacek1-4/+4
In [see "Fixes:"] I missed the fact that str_read() may give back an allocated pointer even if it returns an error, causing a potential memory leak in filename_trans_read_one(). Fix this by making the function free the allocated string whenever it returns a non-zero value, which also makes its behavior more obvious and prevents repeating the same mistake in the future. Reported-by: coverity-bot <[email protected]> Addresses-Coverity-ID: 1461665 ("Resource leaks") Fixes: c3a276111ea2 ("selinux: optimize storage of filename transitions") Signed-off-by: Ondrej Mosnacek <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Paul Moore <[email protected]>
2020-04-15scripts: documentation-file-ref-check: Add line break before exitTiezhu Yang1-1/+1
If execute ./scripts/documentation-file-ref-check in a directory which is not a git tree, it will exit without a line break, fix it. Without this patch: [loongson@localhost linux-5.7-rc1]$ ./scripts/documentation-file-ref-check Warning: can't check if file exists, as this is not a git tree[loongson@localhost linux-5.7-rc1]$ With this patch: [loongson@localhost linux-5.7-rc1]$ ./scripts/documentation-file-ref-check Warning: can't check if file exists, as this is not a git tree [loongson@localhost linux-5.7-rc1]$ Signed-off-by: Tiezhu Yang <[email protected]> Reviewed-by: Mauro Carvalho Chehab <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2020-04-15scripts/kernel-doc: Add missing close-paren in c:function directivesPeter Maydell1-1/+1
When kernel-doc generates a 'c:function' directive for a function one of whose arguments is a function pointer, it fails to print the close-paren after the argument list of the function pointer argument. For instance: long work_on_cpu(int cpu, long (*fn) (void *, void * arg) in driver-api/basics.html is missing a ')' separating the "void *" of the 'fn' arguments from the ", void * arg" which is an argument to work_on_cpu(). Add the missing close-paren, so that we render the prototype correctly: long work_on_cpu(int cpu, long (*fn)(void *), void * arg) (Note that Sphinx stops rendering a space between the '(fn*)' and the '(void *)' once it gets something that's syntactically valid.) Signed-off-by: Peter Maydell <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2020-04-15docs: admin-guide: merge sections for the kernel.modprobe sysctlEric Biggers1-28/+19
Documentation for the kernel.modprobe sysctl was added both by commit 0317c5371e6a ("docs: merge debugging-modules.txt into sysctl/kernel.rst") and by commit 6e7158250625 ("docs: admin-guide: document the kernel.modprobe sysctl"), resulting in the same sysctl being documented in two places. Merge these into one place. Signed-off-by: Eric Biggers <[email protected]> Reviewed-by: Stephen Kitt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2020-04-15docs: timekeeping: Use correct prototype for deprecated functionsChris Packham1-3/+3
Use the correct prototypes for do_gettimeofday(), getnstimeofday() and getnstimeofday64(). All of these returned void and passed the return value by reference. This should make the documentation of their deprecation and replacements easier to search for. Signed-off-by: Chris Packham <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2020-04-15net/rds: Use ERR_PTR for rds_message_alloc_sgs()Jason Gunthorpe4-21/+19
Returning the error code via a 'int *ret' when the function returns a pointer is very un-kernely and causes gcc 10's static analysis to choke: net/rds/message.c: In function ‘rds_message_map_pages’: net/rds/message.c:358:10: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized] 358 | return ERR_PTR(ret); Use a typical ERR_PTR return instead. Signed-off-by: Jason Gunthorpe <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-15net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridgeVladimir Oltean4-48/+47
To rehash a previous explanation given in commit 1c44ce560b4d ("net: mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is up"), the switch driver operates the in a mode where a single VLAN can be transmitted as untagged on a particular egress port. That is the "native VLAN on trunk port" use case. The configuration for this native VLAN is driven in 2 ways: - Set the egress port rewriter to strip the VLAN tag for the native VID (as it is egress-untagged, after all). - Configure the ingress port to drop untagged and priority-tagged traffic, if there is no native VLAN. The intention of this setting is that a trunk port with no native VLAN should not accept untagged traffic. Since both of the above configurations for the native VLAN should only be done if VLAN awareness is requested, they are actually done from the ocelot_port_vlan_filtering function, after the basic procedure of toggling the VLAN awareness flag of the port. But there's a problem with that simplistic approach: we are trying to juggle with 2 independent variables from a single function: - Native VLAN of the port - its value is held in port->vid. - VLAN awareness state of the port - currently there are some issues here, more on that later*. The actual problem can be seen when enslaving the switch ports to a VLAN filtering bridge: 0. The driver configures a pvid of zero for each port, when in standalone mode. While the bridge configures a default_pvid of 1 for each port that gets added as a slave to it. 1. The bridge calls ocelot_port_vlan_filtering with vlan_aware=true. The VLAN-filtering-dependent portion of the native VLAN configuration is done, considering that the native VLAN is 0. 2. The bridge calls ocelot_vlan_add with vid=1, pvid=true, untagged=true. The native VLAN changes to 1 (change which gets propagated to hardware). 3. ??? - nobody calls ocelot_port_vlan_filtering again, to reapply the VLAN-filtering-dependent portion of the native VLAN configuration, for the new native VLAN of 1. One can notice that after toggling "ip link set dev br0 type bridge vlan_filtering 0 && ip link set dev br0 type bridge vlan_filtering 1", the new native VLAN finally makes it through and untagged traffic finally starts flowing again. But obviously that shouldn't be needed. So it is clear that 2 independent variables need to both re-trigger the native VLAN configuration. So we introduce the second variable as ocelot_port->vlan_aware. *Actually both the DSA Felix driver and the Ocelot driver already had each its own variable: - Ocelot: ocelot_port_private->vlan_aware - Felix: dsa_port->vlan_filtering but the common Ocelot library needs to work with a single, common, variable, so there is some refactoring done to move the vlan_aware property from the private structure into the common ocelot_port structure. Fixes: 97bb69e1e36e ("net: mscc: ocelot: break apart ocelot_vlan_port_apply") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Horatiu Vultur <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller11-95/+238
Daniel Borkmann says: ==================== pull-request: bpf 2020-04-15 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 3 day(s) which contain a total of 11 files changed, 238 insertions(+), 95 deletions(-). The main changes are: 1) Fix offset overflow for BPF_MEM BPF_DW insn mapping on arm32 JIT, from Luke Nelson and Xi Wang. 2) Prevent mprotect() to make frozen & mmap()'ed BPF map writeable again, from Andrii Nakryiko and Jann Horn. 3) Fix type of old_fd in bpf_xdp_set_link_opts to int in libbpf and add selftests, from Toke Høiland-Jørgensen. 4) Fix AF_XDP to check that headroom cannot be larger than the available space in the chunk, from Magnus Karlsson. 5) Fix reset of XDP prog when expected_fd is set, from David Ahern. 6) Fix a segfault in bpftool's struct_ops command when BTF is not available, from Daniel T. Lee. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-04-15Merge tag 'mac80211-for-net-2020-04-15' of ↵David S. Miller5-25/+38
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A couple of fixes: * FTM responder policy netlink validation fix (but the only user validates again later) * kernel-doc fixes * a fix for a race in mac80211 radio registration vs. userspace * a mesh channel switch fix * a fix for a syzbot reported kasprintf() issue ==================== Signed-off-by: David S. Miller <[email protected]>
2020-04-15i2c: tegra: Synchronize DMA before terminationDmitry Osipenko1-0/+9
DMA transfer could be completed, but CPU (which handles DMA interrupt) may get too busy and can't handle the interrupt in a timely manner, despite of DMA IRQ being raised. In this case the DMA state needs to synchronized before terminating DMA transfer in order not to miss the DMA transfer completion. Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2020-04-15i2c: tegra: Better handle case where CPU0 is busy for a long timeDmitry Osipenko1-12/+15
Boot CPU0 always handle I2C interrupt and under some rare circumstances (like running KASAN + NFS root) it may stuck in uninterruptible state for a significant time. In this case we will get timeout if I2C transfer is running on a sibling CPU, despite of IRQ being raised. In order to handle this rare condition, the IRQ status needs to be checked after completion timeout. Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>