aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2018-11-01irq/matrix: Fix memory overallocationMichael Kelley1-1/+1
IRQ_MATRIX_SIZE is the number of longs needed for a bitmap, multiplied by the size of a long, yielding a byte count. But it is used to size an array of longs, which is way more memory than is needed. Change IRQ_MATRIX_SIZE so it is just the number of longs needed and the arrays come out the correct size. Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator") Signed-off-by: Michael Kelley <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: KY Srinivasan <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-10-31bpf: don't set id on after map lookup with ptr_to_map_val returnDaniel Borkmann1-5/+6
In the verifier there is no such semantics where registers with PTR_TO_MAP_VALUE type have an id assigned to them. This is only used in PTR_TO_MAP_VALUE_OR_NULL and later on nullified once the test against NULL has been pattern matched and type transformed into PTR_TO_MAP_VALUE. Fixes: 3e6a4b3e0289 ("bpf/verifier: introduce BPF_PTR_TO_MAP_VALUE") Signed-off-by: Daniel Borkmann <[email protected]> Cc: Roman Gushchin <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-10-31bpf: fix partial copy of map_ptr when dst is scalarDaniel Borkmann1-4/+6
ALU operations on pointers such as scalar_reg += map_value_ptr are handled in adjust_ptr_min_max_vals(). Problem is however that map_ptr and range in the register state share a union, so transferring state through dst_reg->range = ptr_reg->range is just buggy as any new map_ptr in the dst_reg is then truncated (or null) for subsequent checks. Fix this by adding a raw member and use it for copying state over to dst_reg. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Daniel Borkmann <[email protected]> Cc: Edward Cree <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-10-31memblock: stop using implicit alignment to SMP_CACHE_BYTESMike Rapoport1-1/+2
When a memblock allocation APIs are called with align = 0, the alignment is implicitly set to SMP_CACHE_BYTES. Implicit alignment is done deep in the memblock allocator and it can come as a surprise. Not that such an alignment would be wrong even when used incorrectly but it is better to be explicit for the sake of clarity and the prinicple of the least surprise. Replace all such uses of memblock APIs with the 'align' parameter explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment in the memblock internal allocation functions. For the case when memblock APIs are used via helper functions, e.g. like iommu_arena_new_node() in Alpha, the helper functions were detected with Coccinelle's help and then manually examined and updated where appropriate. The direct memblock APIs users were updated using the semantic patch below: @@ expression size, min_addr, max_addr, nid; @@ ( | - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid) + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr, nid) | - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid) + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr, nid) | - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid) + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid) | - memblock_alloc(size, 0) + memblock_alloc(size, SMP_CACHE_BYTES) | - memblock_alloc_raw(size, 0) + memblock_alloc_raw(size, SMP_CACHE_BYTES) | - memblock_alloc_from(size, 0, min_addr) + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr) | - memblock_alloc_nopanic(size, 0) + memblock_alloc_nopanic(size, SMP_CACHE_BYTES) | - memblock_alloc_low(size, 0) + memblock_alloc_low(size, SMP_CACHE_BYTES) | - memblock_alloc_low_nopanic(size, 0) + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES) | - memblock_alloc_from_nopanic(size, 0, min_addr) + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr) | - memblock_alloc_node(size, 0, nid) + memblock_alloc_node(size, SMP_CACHE_BYTES, nid) ) [[email protected]: changelog update] [[email protected]: coding-style fixes] [[email protected]: fix missed uses of implicit alignment] Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Paul Burton <[email protected]> [MIPS] Acked-by: Michael Ellerman <[email protected]> [powerpc] Acked-by: Michal Hocko <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michal Simek <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31mm: remove include/linux/bootmem.hMike Rapoport8-8/+7
Move remaining definitions and declarations from include/linux/bootmem.h into include/linux/memblock.h and remove the redundant header. The includes were replaced with the semantic patch below and then semi-automated removal of duplicated '#include <linux/memblock.h> @@ @@ - #include <linux/bootmem.h> + #include <linux/memblock.h> [[email protected]: dma-direct: fix up for the removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: powerpc: fix up for removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Serge Semin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31memblock: remove _virt from APIs returning virtual addressMike Rapoport3-6/+6
The conversion is done using sed -i 's@memblock_virt_alloc@memblock_alloc@g' \ $(git grep -l memblock_virt_alloc) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Simek <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Serge Semin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31kbuild: fix kernel/bounds.c 'W=1' warningArnd Bergmann1-1/+3
Building any configuration with 'make W=1' produces a warning: kernel/bounds.c:16:6: warning: no previous prototype for 'foo' [-Wmissing-prototypes] When also passing -Werror, this prevents us from building any other files. Nobody ever calls the function, but we can't make it 'static' either since we want the compiler output. Calling it 'main' instead however avoids the warning, because gcc does not insist on having a declaration for main. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Reported-by: Kieran Bingham <[email protected]> Reviewed-by: Kieran Bingham <[email protected]> Cc: David Laight <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31kernel/panic.c: filter out a potential trailing newlineBorislav Petkov1-2/+6
If a call to panic() terminates the string with a \n , the result puts the closing brace ']---' on a newline because panic() itself adds \n too. Now, if one goes and removes the newline chars from all panic() invocations - and the stats right now look like this: ~300 calls with a \n ~500 calls without a \n one is destined to a neverending game of whack-a-mole because the usual thing to do is add a newline at the end of a string a function is supposed to print. Therefore, simply zap any \n at the end of the panic string to avoid touching so many places in the kernel. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Kees Cook <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31kernel/panic.c: do not append newline to the stack protector panic stringBorislav Petkov1-1/+1
... because panic() itself already does this. Otherwise you have line-broken trailer: [ 1.836965] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pgd_alloc+0x29e/0x2a0 [ 1.836965] ]--- Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: "Steven Rostedt (VMware)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31kernel/signal.c: fix a comment errorWeikang Shi1-1/+1
Because get_signal_to_deliver() was renamed to get_signal() the comment should be fixed. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Weikang Shi <[email protected]> Reported-by: Christian Brauner <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Anna-Maria Gleixner <[email protected]> Cc: "Luck, Tony" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31kernel/fail_function.c: remove meaningless null pointer check before ↵zhong jiang1-2/+1
debugfs_remove_recursive debugfs_remove_recursive() has taken the null pointer into account. just remove the null check before debugfs_remove_recursive(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: zhong jiang <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-30Merge tag 'trace-v4.20' of ↵Linus Torvalds10-931/+1261
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The biggest change here is the updates to kprobes Back in January I posted patches to create function based events. These were the events that you suggested I make to allow developers to easily create events in code where no trace event exists. After posting those changes for review, it was suggested that we implement this instead with kprobes. The problem with kprobes is that the interface is too complex and needs to be simplified. Masami Hiramatsu posted patches in March and I've been playing with them a bit. There's been a bit of clean up in the kprobe code that was inspired by the function based event patches, and a couple of enhancements to the kprobe event interface. - If the arch supports it (we added support for x86), you can place a kprobe event at the start of a function and use $arg1, $arg2, etc to reference the arguments of a function. (Before you needed to know what register or where on the stack the argument was). - The second is a way to see array of events. For example, if you reference a mac address, you can add: echo 'p:mac ip_rcv perm_addr=+574($arg2):x8[6]' > kprobe_events And this will produce: mac: (ip_rcv+0x0/0x140) perm_addr={0x52,0x54,0x0,0xc0,0x76,0xec} Other changes include - Exporting trace_dump_stack to modules - Have the stack tracer trace the entire stack (stop trying to remove tracing itself, as we keep removing too much). - Added support for SDT in uprobes" [ SDT - "Statically Defined Tracing" are userspace markers for tracing. Let's not use random TLA's in explanations unless they are fairly well-established as generic (at least for kernel people) - Linus ] * tag 'trace-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (24 commits) tracing: Have stack tracer trace full stack tracing: Export trace_dump_stack to modules tracing: probeevent: Fix uninitialized used of offset in parse args tracing/kprobes: Allow kprobe-events to record module symbol tracing/kprobes: Check the probe on unloaded module correctly tracing/uprobes: Fix to return -EFAULT if copy_from_user failed tracing: probeevent: Add $argN for accessing function args x86: ptrace: Add function argument access API tracing: probeevent: Add array type support tracing: probeevent: Add symbol type tracing: probeevent: Unify fetch_insn processing common part tracing: probeevent: Append traceprobe_ for exported function tracing: probeevent: Return consumed bytes of dynamic area tracing: probeevent: Unify fetch type tables tracing: probeevent: Introduce new argument fetching code tracing: probeevent: Remove NOKPROBE_SYMBOL from print functions tracing: probeevent: Cleanup argument field definition tracing: probeevent: Cleanup print argument functions trace_uprobe: support reference counter in fd-based uprobe perf probe: Support SDT markers having reference counter (semaphore) ...
2018-10-30Merge tag 'trace-v4.19-rc8-3' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Masami had a couple more fixes to the synthetic events. One was a proper error return value, and the other is for the self tests" * tag 'trace-v4.19-rc8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests/ftrace: Fix synthetic event test to delete event correctly tracing: Return -ENOENT if there is no target synthetic event
2018-10-30Merge tag 'pm-4.20-rc1-2' of ↵Linus Torvalds2-22/+42
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These remove a questionable heuristic from the menu cpuidle governor, fix a recent build regression in the intel_pstate driver, clean up ARM big-Little support in cpufreq and fix up hung task watchdog's interaction with system-wide power management transitions. Specifics: - Fix build regression in the intel_pstate driver that doesn't build without CONFIG_ACPI after recent changes (Dominik Brodowski). - One of the heuristics in the menu cpuidle governor is based on a function returning 0 most of the time, so drop it and clean up the scheduler code related to it (Daniel Lezcano). - Prevent the arm_big_little cpufreq driver from being used on ARM64 which is not suitable for it and drop the arm_big_little_dt driver that is not used any more (Sudeep Holla). - Prevent the hung task watchdog from triggering during resume from system-wide sleep states by disabling it before freezing tasks and enabling it again after they have been thawed (Vitaly Kuznetsov)" * tag 'pm-4.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: kernel: hung_task.c: disable on suspend cpufreq: remove unused arm_big_little_dt driver cpufreq: drop ARM_BIG_LITTLE_CPUFREQ support for ARM64 cpufreq: intel_pstate: Fix compilation for !CONFIG_ACPI cpuidle: menu: Remove get_loadavg() from the performance multiplier sched: Factor out nr_iowait and nr_iowait_cpu
2018-10-30perf/core: Clean up inconsisent indentationColin Ian King1-1/+1
Replace a bunch of spaces with tab, cleans up indentation Signed-off-by: Colin Ian King <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-30Merge branches 'pm-cpuidle' and 'pm-cpufreq'Rafael J. Wysocki1-21/+13
* pm-cpuidle: cpuidle: menu: Remove get_loadavg() from the performance multiplier sched: Factor out nr_iowait and nr_iowait_cpu * pm-cpufreq: cpufreq: remove unused arm_big_little_dt driver cpufreq: drop ARM_BIG_LITTLE_CPUFREQ support for ARM64 cpufreq: intel_pstate: Fix compilation for !CONFIG_ACPI
2018-10-29sched/rt: Update comment in pick_next_task_rt()Muchun Song1-1/+1
Commit: f4ebcbc0d7e0 ("sched/rt: Substract number of tasks of throttled queues from rq->nr_running") added a new rt_rq->rt_queued field, which is used to indicate the status of rq->rt enqueue or dequeue. So, the ->rt_nr_running check was removed and we now check ->rt_queued instead. Fix the comment in pick_next_task_rt() as well, which was still referencing the old logic. Signed-off-by: Muchun Song <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-10-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds6-43/+84
Pull networking fixes from David Miller: 1) GRO overflow entries are not unlinked properly, resulting in list poison pointers being dereferenced. 2) Fix bridge build with ipv6 disabled, from Nikolay Aleksandrov. 3) Direct packet access and other fixes in BPF from Daniel Borkmann. 4) gred_change_table_def() gets passed the wrong pointer, a pointer to a set of unparsed attributes instead of the attribute itself. From Jakub Kicinski. 5) Allow macsec device to be brought up even if it's lowerdev is down, from Sabrina Dubroca. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: diag: document swapped src/dst in udp_dump_one. macsec: let the administrator set UP state even if lowerdev is down macsec: update operstate when lower device changes net: sched: gred: pass the right attribute to gred_change_table_def() ptp: drop redundant kasprintf() to create worker name net: bridge: remove ipv6 zero address check in mcast queries net: Properly unlink GRO packets on overflow. bpf: fix wrong helper enablement in cgroup local storage bpf: add bpf_jit_limit knob to restrict unpriv allocations bpf: make direct packet write unclone more robust bpf: fix leaking uninitialized memory on pop/peek helpers bpf: fix direct packet write into pop/peek helpers bpf: fix cg_skb types to hint access type in may_access_direct_pkt_data bpf: fix direct packet access for flow dissector progs bpf: disallow direct packet access for unpriv in cg_skb bpf: fix test suite to enable all unpriv program types bpf, btf: fix a missing check bug in btf_parse selftests/bpf: add config fragments BPF_STREAM_PARSER and XDP_SOCKETS bpf: devmap: fix wrong interface selection in notifier_call
2018-10-28Merge tag 'kbuild-v4.20' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - optimize kallsyms slightly - remove check for old CFLAGS usage - add some compiler flags unconditionally instead of evaluating $(call cc-option,...) - fix variable shadowing in host tools - refactor scripts/mkmakefile - refactor various makefiles * tag 'kbuild-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: modpost: Create macro to avoid variable shadowing ASN.1: Remove unnecessary shadowed local variable kbuild: use 'else ifeq' for checksrc to improve readability kbuild: remove unneeded link_multi_deps kbuild: add -Wno-unused-but-set-variable flag unconditionally kbuild: add -Wdeclaration-after-statement flag unconditionally kbuild: add -Wno-pointer-sign flag unconditionally modpost: remove leftover symbol prefix handling for module device table kbuild: simplify command line creation in scripts/mkmakefile kbuild: do not pass $(objtree) to scripts/mkmakefile kbuild: remove user ID check in scripts/mkmakefile kbuild: remove VERSION and PATCHLEVEL from $(objtree)/Makefile kbuild: add --include-dir flag only for out-of-tree build kbuild: remove dead code in cmd_files calculation in top Makefile kbuild: hide most of targets when running config or mixed targets kbuild: remove old check for CFLAGS use kbuild: prefix Makefile.dtbinst path with $(srctree) unconditionally kallsyms: remove left-over Blackfin code kallsyms: reduce size a little on 64-bit
2018-10-28Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds1-60/+15
Pull XArray conversion from Matthew Wilcox: "The XArray provides an improved interface to the radix tree data structure, providing locking as part of the API, specifying GFP flags at allocation time, eliminating preloading, less re-walking the tree, more efficient iterations and not exposing RCU-protected pointers to its users. This patch set 1. Introduces the XArray implementation 2. Converts the pagecache to use it 3. Converts memremap to use it The page cache is the most complex and important user of the radix tree, so converting it was most important. Converting the memremap code removes the only other user of the multiorder code, which allows us to remove the radix tree code that supported it. I have 40+ followup patches to convert many other users of the radix tree over to the XArray, but I'd like to get this part in first. The other conversions haven't been in linux-next and aren't suitable for applying yet, but you can see them in the xarray-conv branch if you're interested" * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits) radix tree: Remove multiorder support radix tree test: Convert multiorder tests to XArray radix tree tests: Convert item_delete_rcu to XArray radix tree tests: Convert item_kill_tree to XArray radix tree tests: Move item_insert_order radix tree test suite: Remove multiorder benchmarking radix tree test suite: Remove __item_insert memremap: Convert to XArray xarray: Add range store functionality xarray: Move multiorder_check to in-kernel tests xarray: Move multiorder_shrink to kernel tests xarray: Move multiorder account test in-kernel radix tree test suite: Convert iteration test to XArray radix tree test suite: Convert tag_tagged_items to XArray radix tree: Remove radix_tree_clear_tags radix tree: Remove radix_tree_maybe_preload_order radix tree: Remove split/join code radix tree: Remove radix_tree_update_node_t page cache: Finish XArray conversion dax: Convert page fault handlers to XArray ...
2018-10-28tracing: Return -ENOENT if there is no target synthetic eventMasami Hiramatsu1-1/+3
Return -ENOENT error if there is no target synthetic event. This notices an operation failure to user as below; # echo 'wakeup_latency u64 lat; pid_t pid;' > synthetic_events # echo '!wakeup' >> synthetic_events sh: write error: No such file or directory Link: http://lkml.kernel.org/r/154013449986.25576.9487131386597290172.stgit@devbox Acked-by: Tom Zanussi <[email protected]> Tested-by: Tom Zanussi <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Rajvi Jingar <[email protected]> Cc: [email protected] Fixes: 4b147936fa50 ('tracing: Add support for 'synthetic' events') Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-10-27tracing: Have stack tracer trace full stackSteven Rostedt (VMware)1-1/+1
The stack tracer traces every function call checking the current stack (in non interrupt context), looking for the deepest stack, and saving it when it finds a new max depth. The problem is that it calls save_stack_trace(), and with the new ORC unwinder, it can skip too much. As it looks at the ip of the function call in the backtrace to find where it should start, it doesn't need to skip anything. The stack trace selftest would fail when the kernel was complied with the ORC UNDWINDER enabled. Without skipping functions when doing the stack trace, it now passes again. Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-10-27tracing: Export trace_dump_stack to modulesNikolay Borisov1-0/+1
There is no reason for this function to be unexprted and it's a useful debugging aid. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller6-43/+84
Daniel Borkmann says: ==================== pull-request: bpf 2018-10-27 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix toctou race in BTF header validation, from Martin and Wenwen. 2) Fix devmap interface comparison in notifier call which was neglecting netns, from Taehee. 3) Several fixes in various places, for example, correcting direct packet access and helper function availability, from Daniel. 4) Fix BPF kselftest config fragment to include af_xdp and sockmap, from Naresh. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-10-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds11-193/+1137
Merge updates from Andrew Morton: - a few misc things - ocfs2 updates - most of MM * emailed patches from Andrew Morton <[email protected]>: (132 commits) hugetlbfs: dirty pages as they are added to pagecache mm: export add_swap_extent() mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page() mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page() mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t mm/gup: cache dev_pagemap while pinning pages Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved" mm: return zero_resv_unavail optimization mm: zero remaining unavailable struct pages tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option tools/testing/selftests/vm/gup_benchmark.c: allow user specified file tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage mm/gup_benchmark.c: add additional pinning methods mm/gup_benchmark.c: time put_page() mm: don't raise MEMCG_OOM event due to failed high-order allocation mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock ...
2018-10-26mm: defer ZONE_DEVICE page initialization to the point where we init pgmapAlexander Duyck1-15/+10
The ZONE_DEVICE pages were being initialized in two locations. One was with the memory_hotplug lock held and another was outside of that lock. The problem with this is that it was nearly doubling the memory initialization time. Instead of doing this twice, once while holding a global lock and once without, I am opting to defer the initialization to the one outside of the lock. This allows us to avoid serializing the overhead for memory init and we can instead focus on per-node init times. One issue I encountered is that devm_memremap_pages and hmm_devmmem_pages_create were initializing only the pgmap field the same way. One wasn't initializing hmm_data, and the other was initializing it to a poison value. Since this is something that is exposed to the driver in the case of hmm I am opting for a third option and just initializing hmm_data to 0 since this is going to be exposed to unknown third party drivers. [[email protected]: fix reference count for pgmap in devm_memremap_pages] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Duyck <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Tested-by: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-26psi: cgroup supportJohannes Weiner2-10/+153
On a system that executes multiple cgrouped jobs and independent workloads, we don't just care about the health of the overall system, but also that of individual jobs, so that we can ensure individual job health, fairness between jobs, or prioritize some jobs over others. This patch implements pressure stall tracking for cgroups. In kernels with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure, and io.pressure files that track aggregate pressure stall times for only the tasks inside the cgroup. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Daniel Drake <[email protected]> Tested-by: Suren Baghdasaryan <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Enderborg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Vinayak Menon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-26psi: pressure stall information for CPU, memory, and IOJohannes Weiner6-2/+760
When systems are overcommitted and resources become contended, it's hard to tell exactly the impact this has on workload productivity, or how close the system is to lockups and OOM kills. In particular, when machines work multiple jobs concurrently, the impact of overcommit in terms of latency and throughput on the individual job can be enormous. In order to maximize hardware utilization without sacrificing individual job health or risk complete machine lockups, this patch implements a way to quantify resource pressure in the system. A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that expose the percentage of time the system is stalled on CPU, memory, or IO, respectively. Stall states are aggregate versions of the per-task delay accounting delays: cpu: some tasks are runnable but not executing on a CPU memory: tasks are reclaiming, or waiting for swapin or thrashing cache io: tasks are waiting for io completions These percentages of walltime can be thought of as pressure percentages, and they give a general sense of system health and productivity loss incurred by resource overcommit. They can also indicate when the system is approaching lockup scenarios and OOMs. To do this, psi keeps track of the task states associated with each CPU and samples the time they spend in stall states. Every 2 seconds, the samples are averaged across CPUs - weighted by the CPUs' non-idle time to eliminate artifacts from unused CPUs - and translated into percentages of walltime. A running average of those percentages is maintained over 10s, 1m, and 5m periods (similar to the loadaverage). [[email protected]: doc fixlet, per Randy] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: code optimization] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: rename psi_clock() to psi_update_work(), per Peter] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix build] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Daniel Drake <[email protected]> Tested-by: Suren Baghdasaryan <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Enderborg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vinayak Menon <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-26sched: introduce this_rq_lock_irq()Johannes Weiner2-3/+13
do_sched_yield() disables IRQs, looks up this_rq() and locks it. The next patch is adding another site with the same pattern, so provide a convenience function for it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Suren Baghdasaryan <[email protected]> Tested-by: Daniel Drake <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Enderborg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vinayak Menon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-26sched: sched.h: make rq locking and clock functions available in stats.hJohannes Weiner1-82/+82
kernel/sched/sched.h includes "stats.h" half-way through the file. The next patch introduces users of sched.h's rq locking functions and update_rq_clock() in kernel/sched/stats.h. Move those definitions up in the file so they are available in stats.h. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Suren Baghdasaryan <[email protected]> Tested-by: Daniel Drake <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Enderborg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vinayak Menon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-26sched: loadavg: make calc_load_n() publicJohannes Weiner1-69/+69
It's going to be used in a later patch. Keep the churn separate. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Suren Baghdasaryan <[email protected]> Tested-by: Daniel Drake <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Enderborg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vinayak Menon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-26sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOADJohannes Weiner2-21/+1
There are several definitions of those functions/macros in places that mess with fixed-point load averages. Provide an official version. [[email protected]: fix missed conversion in block/blk-iolatency.c] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Suren Baghdasaryan <[email protected]> Tested-by: Daniel Drake <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Enderborg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vinayak Menon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-26delayacct: track delays from thrashing cache pagesJohannes Weiner1-0/+15
Delay accounting already measures the time a task spends in direct reclaim and waiting for swapin, but in low memory situations tasks spend can spend a significant amount of their time waiting on thrashing page cache. This isn't tracked right now. To know the full impact of memory contention on an individual task, measure the delay when waiting for a recently evicted active cache page to read back into memory. Also update tools/accounting/getdelays.c: [hannes@computer accounting]$ sudo ./getdelays -d -p 1 print delayacct stats ON PID 1 CPU count real total virtual total delay total delay average 50318 745000000 847346785 400533713 0.008ms IO count delay total delay average 435 122601218 0ms SWAP count delay total delay average 0 0 0ms RECLAIM count delay total delay average 0 0 0ms THRASHING count delay total delay average 19 12621439 0ms Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Daniel Drake <[email protected]> Tested-by: Suren Baghdasaryan <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Enderborg <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vinayak Menon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-26mm: rework memcg kernel stack accountingRoman Gushchin1-6/+49
If CONFIG_VMAP_STACK is set, kernel stacks are allocated using __vmalloc_node_range() with __GFP_ACCOUNT. So kernel stack pages are charged against corresponding memory cgroups on allocation and uncharged on releasing them. The problem is that we do cache kernel stacks in small per-cpu caches and do reuse them for new tasks, which can belong to different memory cgroups. Each stack page still holds a reference to the original cgroup, so the cgroup can't be released until the vmap area is released. To make this happen we need more than two subsequent exits without forks in between on the current cpu, which makes it very unlikely to happen. As a result, I saw a significant number of dying cgroups (in theory, up to 2 * number_of_cpu + number_of_tasks), which can't be released even by significant memory pressure. As a cgroup structure can take a significant amount of memory (first of all, per-cpu data like memcg statistics), it leads to a noticeable waste of memory. Link: http://lkml.kernel.org/r/[email protected] Fixes: ac496bf48d97 ("fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y") Signed-off-by: Roman Gushchin <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-26Merge tag 'dma-mapping-4.20-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds2-266/+62
Pull more dma-mapping updates from Christoph Hellwig: - various swiotlb cleanups - do not dip into the ѕwiotlb pool for dma coherent allocations - add support for not cache coherent DMA to swiotlb - switch ARM64 to use the generic swiotlb_dma_ops * tag 'dma-mapping-4.20-1' of git://git.infradead.org/users/hch/dma-mapping: arm64: use the generic swiotlb_dma_ops swiotlb: add support for non-coherent DMA swiotlb: don't dip into swiotlb pool for coherent allocations swiotlb: refactor swiotlb_map_page swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs swiotlb: merge swiotlb_unmap_page and unmap_single swiotlb: remove the overflow buffer swiotlb: do not panic on mapping failures swiotlb: mark is_swiotlb_buffer static swiotlb: remove a pointless comment
2018-10-25Merge tag 'printk-for-4.20' of ↵Linus Torvalds1-38/+48
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Fix two more locations where printf formatting leaked pointers - Better log_buf_len parameter handling - Add prefix to messages from printk code - Do not miss messages on other consoles when the log is replayed on a new one - Reduce race between console registration and panic() when the log might get replayed on all consoles - Some cont buffer code clean up - Call console only when there is something to do (log vs cont buffer) * tag 'printk-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: lib/vsprintf: Hash printed address for netdev bits fallback lib/vsprintf: Hash legacy clock addresses lib/vsprintf: Prepare for more general use of ptr_to_id() lib/vsprintf: Make ptr argument conts in ptr_to_id() printk: fix integer overflow in setup_log_buf() printk: do not preliminary split up cont buffer printk: lock/unlock console only for new logbuf entries printk: keep kernel cont support always enabled printk: Give error on attempt to set log buffer length to over 2G printk: Add KBUILD_MODNAME and remove a redundant print prefix printk: Correct wrong casting printk: Fix panic caused by passing log_buf_len to command line printk: CON_PRINTBUFFER console registration is a bit racy printk: Do not miss new messages when replaying the log
2018-10-25bpf: add bpf_jit_limit knob to restrict unpriv allocationsDaniel Borkmann1-3/+46
Rick reported that the BPF JIT could potentially fill the entire module space with BPF programs from unprivileged users which would prevent later attempts to load normal kernel modules or privileged BPF programs, for example. If JIT was enabled but unsuccessful to generate the image, then before commit 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") we would always fall back to the BPF interpreter. Nowadays in the case where the CONFIG_BPF_JIT_ALWAYS_ON could be set, then the load will abort with a failure since the BPF interpreter was compiled out. Add a global limit and enforce it for unprivileged users such that in case of BPF interpreter compiled out we fail once the limit has been reached or we fall back to BPF interpreter earlier w/o using module mem if latter was compiled in. In a next step, fair share among unprivileged users can be resolved in particular for the case where we would fail hard once limit is reached. Fixes: 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64") Co-Developed-by: Rick Edgecombe <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kees Cook <[email protected]> Cc: LKML <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-10-25bpf: make direct packet write unclone more robustDaniel Borkmann1-1/+5
Given this seems to be quite fragile and can easily slip through the cracks, lets make direct packet write more robust by requiring that future program types which allow for such write must provide a prologue callback. In case of XDP and sk_msg it's noop, thus add a generic noop handler there. The latter starts out with NULL data/data_end unconditionally when sg pages are shared. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-10-25bpf: fix leaking uninitialized memory on pop/peek helpersDaniel Borkmann1-0/+2
Commit f1a2e44a3aec ("bpf: add queue and stack maps") added helpers with ARG_PTR_TO_UNINIT_MAP_VALUE. Meaning, the helper is supposed to fill the map value buffer with data instead of reading from it like in other helpers such as map update. However, given the buffer is allowed to be uninitialized (since we fill it in the helper anyway), it also means that the helper is obliged to wipe the memory in case of an error in order to not allow for leaking uninitialized memory. Given pop/peek is both handled inside __{stack,queue}_map_get(), lets wipe it there on error case, that is, empty stack/queue. Fixes: f1a2e44a3aec ("bpf: add queue and stack maps") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Cc: Mauricio Vasquez B <[email protected]> Acked-by: Mauricio Vasquez B<[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-10-25bpf: fix direct packet write into pop/peek helpersDaniel Borkmann1-2/+0
Commit f1a2e44a3aec ("bpf: add queue and stack maps") probably just copy-pasted .pkt_access for bpf_map_{pop,peek}_elem() helpers, but this is buggy in this context since it would allow writes into cloned skbs which is invalid. Therefore, disable .pkt_access for the two. Fixes: f1a2e44a3aec ("bpf: add queue and stack maps") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Cc: Mauricio Vasquez B <[email protected]> Acked-by: Mauricio Vasquez B<[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-10-25bpf: fix cg_skb types to hint access type in may_access_direct_pkt_dataDaniel Borkmann1-0/+1
Commit b39b5f411dcf ("bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB") added direct packet access for skbs in cg_skb program types, however allowed access type was not added to the may_access_direct_pkt_data() helper. Therefore the latter always returns false. This is not directly an issue, it just means writes are unconditionally disabled (which is correct) but also reads. Latter is relevant in this function when BPF helpers may read direct packet data which is unconditionally disabled then. Fix it by properly adding BPF_PROG_TYPE_CGROUP_SKB to may_access_direct_pkt_data(). Fixes: b39b5f411dcf ("bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Cc: Song Liu <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-10-25bpf: fix direct packet access for flow dissector progsDaniel Borkmann1-2/+4
Commit d58e468b1112 ("flow_dissector: implements flow dissector BPF hook") added direct packet access for skbs in may_access_direct_pkt_data() function where this enables read and write access to the skb->data. This is buggy because without a prologue generator such as bpf_unclone_prologue() we would allow for writing into cloned skbs. Original intention might have been to only allow read access where this is not needed (similar as the flow_dissector_func_proto() indicates which enables only bpf_skb_load_bytes() as well), therefore this patch fixes it to restrict to read-only. Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Cc: Petar Penkov <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-10-26bpf, btf: fix a missing check bug in btf_parseMartin Lau1-33/+25
Wenwen Wang reported: In btf_parse(), the header of the user-space btf data 'btf_data' is firstly parsed and verified through btf_parse_hdr(). In btf_parse_hdr(), the header is copied from user-space 'btf_data' to kernel-space 'btf->hdr' and then verified. If no error happens during the verification process, the whole data of 'btf_data', including the header, is then copied to 'data' in btf_parse(). It is obvious that the header is copied twice here. More importantly, no check is enforced after the second copy to make sure the headers obtained in these two copies are same. Given that 'btf_data' resides in the user space, a malicious user can race to modify the header between these two copies. By doing so, the user can inject inconsistent data, which can cause undefined behavior of the kernel and introduce potential security risk. This issue is similar to the one fixed in commit 8af03d1ae2e1 ("bpf: btf: Fix a missing check bug"). To fix it, this patch copies the user 'btf_data' *before* parsing / verifying the BTF header. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: Martin KaFai Lau <[email protected]> Co-developed-by: Wenwen Wang <[email protected]> Acked-by: Song Liu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-10-26bpf: devmap: fix wrong interface selection in notifier_callTaehee Yoo1-2/+1
The dev_map_notification() removes interface in devmap if unregistering interface's ifindex is same. But only checking ifindex is not enough because other netns can have same ifindex. so that wrong interface selection could occurred. Hence netdev pointer comparison code is added. v2: compare netdev pointer instead of using net_eq() (Daniel Borkmann) v1: Initial patch Fixes: 2ddf71e23cc2 ("net: add notifier hooks for devmap bpf map") Signed-off-by: Taehee Yoo <[email protected]> Acked-by: Song Liu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-10-25Merge branch 'irq-core-for-linus' of ↵Linus Torvalds3-7/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The interrupt brigade came up with the following updates: - Driver for the Marvell System Error Interrupt machinery - Overhaul of the GIC-V3 ITS driver - Small updates and fixes all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) genirq: Fix race on spurious interrupt detection softirq: Fix typo in __do_softirq() comments genirq: Fix grammar s/an /a / irqchip/gic: Unify GIC priority definitions irqchip/gic-v3: Remove acknowledge loop dt-bindings/interrupt-controller: Add documentation for Marvell SEI controller dt-bindings/interrupt-controller: Update Marvell ICU bindings irqchip/irq-mvebu-icu: Add support for System Error Interrupts (SEI) arm64: marvell: Enable SEI driver irqchip/irq-mvebu-sei: Add new driver for Marvell SEI irqchip/irq-mvebu-icu: Support ICU subnodes irqchip/irq-mvebu-icu: Disociate ICU and NSR irqchip/irq-mvebu-icu: Clarify the reset operation of configured interrupts irqchip/irq-mvebu-icu: Fix wrong private data retrieval dt-bindings/interrupt-controller: Fix Marvell ICU length in the example genirq/msi: Allow creation of a tree-based irqdomain for platform-msi dt-bindings: irqchip: renesas-irqc: Document r8a7744 support dt-bindings: irqchip: renesas-irqc: Document R-Car E3 support irqchip/pdc: Setup all edge interrupts as rising edge at GIC irqchip/gic-v3-its: Allow use of LPI tables in reserved memory ...
2018-10-25Merge branch 'timers-core-for-linus' of ↵Linus Torvalds11-132/+86
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping updates from Thomas Gleixner: "The timers and timekeeping departement provides: - Another large y2038 update with further preparations for providing the y2038 safe timespecs closer to the syscalls. - An overhaul of the SHCMT clocksource driver - SPDX license identifier updates - Small cleanups and fixes all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) tick/sched : Remove redundant cpu_online() check clocksource/drivers/dw_apb: Add reset control clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE clocksource/drivers: Unify the names to timer-* format clocksource/drivers/sh_cmt: Add R-Car gen3 support dt-bindings: timer: renesas: cmt: document R-Car gen3 support clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines clocksource/drivers/sh_cmt: Fixup for 64-bit machines clocksource/drivers/sh_tmu: Convert to SPDX identifiers clocksource/drivers/sh_mtu2: Convert to SPDX identifiers clocksource/drivers/sh_cmt: Convert to SPDX identifiers clocksource/drivers/renesas-ostm: Convert to SPDX identifiers clocksource: Convert to using %pOFn instead of device_node.name tick/broadcast: Remove redundant check RISC-V: Request newstat syscalls y2038: signal: Change rt_sigtimedwait to use __kernel_timespec y2038: socket: Change recvmmsg to use __kernel_timespec y2038: sched: Change sched_rr_get_interval to use __kernel_timespec y2038: utimes: Rework #ifdef guards for compat syscalls ...
2018-10-25kernel: hung_task.c: disable on suspendVitaly Kuznetsov1-1/+29
It is possible to observe hung_task complaints when system goes to suspend-to-idle state: # echo freeze > /sys/power/state PM: Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.001 seconds) done. OOM killer disabled. Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. sd 0:0:0:0: [sda] Synchronizing SCSI cache INFO: task bash:1569 blocked for more than 120 seconds. Not tainted 4.19.0-rc3_+ #687 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. bash D 0 1569 604 0x00000000 Call Trace: ? __schedule+0x1fe/0x7e0 schedule+0x28/0x80 suspend_devices_and_enter+0x4ac/0x750 pm_suspend+0x2c0/0x310 Register a PM notifier to disable the detector on suspend and re-enable back on wakeup. Signed-off-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2018-10-25cpuidle: menu: Remove get_loadavg() from the performance multiplierDaniel Lezcano1-7/+0
The function get_loadavg() returns almost always zero. To be more precise, statistically speaking for a total of 1023379 times passing in the function, the load is equal to zero 1020728 times, greater than 100, 610 times, the remaining is between 0 and 5. In 2011, the get_loadavg() was removed from the Android tree because of the above [1]. At this time, the load was: unsigned long this_cpu_load(void) { struct rq *this = this_rq(); return this->cpu_load[0]; } In 2014, the code was changed by commit 372ba8cb46b2 (cpuidle: menu: Lookup CPU runqueues less) and the load is: void get_iowait_load(unsigned long *nr_waiters, unsigned long *load) { struct rq *rq = this_rq(); *nr_waiters = atomic_read(&rq->nr_iowait); *load = rq->load.weight; } with the same result. Both measurements show using the load in this code path does no matter anymore. Removing it. [1] https://android.googlesource.com/kernel/common/+/4dedd9f124703207895777ac6e91dacde0f7cc17 Signed-off-by: Daniel Lezcano <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2018-10-25sched: Factor out nr_iowait and nr_iowait_cpuDaniel Lezcano1-21/+20
The function nr_iowait_cpu() can be used directly by nr_iowait() instead of duplicating code. Call nr_iowait_cpu() from nr_iowait() Signed-off-by: Daniel Lezcano <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2018-10-24kvm_config: add CONFIG_VIRTIO_MENULénaïc Huard1-0/+1
Make sure that make kvmconfig enables all the virtio drivers even if it is preceded by a make allnoconfig. Signed-off-by: Lénaïc Huard <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>