path: root/kernel
Age    Commit message    Author    Files    Lines
2019-03-04printk: Remove no longer used LOG_PREFIX.Tetsuo Handa1-5/+1
When commit 5becfb1df5ac8e49 ("kmsg: merge continuation records while printing") introduced LOG_PREFIX, we used KERN_DEFAULT etc. as a flag for setting LOG_PREFIX in order to tell whether to call cont_add() (i.e. whether to append the message to "struct cont"). But since commit 4bcc595ccd80decb ("printk: reinstate KERN_CONT for printing continuation lines") inverted the behavior (i.e. don't append the message to "struct cont" unless KERN_CONT is specified) and commit 5aa068ea4082b39e ("printk: remove games with previous record flags") removed the last LOG_PREFIX check, setting LOG_PREFIX via KERN_DEFAULT etc. is no longer meaningful. Therefore, we can remove LOG_PREFIX and make KERN_DEFAULT empty string. Link: http://lkml.kernel.org/r/1550829580-9189-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp To: Steven Rostedt <[email protected]> To: Linus Torvalds <[email protected]> Cc: [email protected] Cc: Tetsuo Handa <[email protected]> Signed-off-by: Tetsuo Handa <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Petr Mladek <[email protected]>
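For readers skimming the log, a simplified sketch (not the literal patch) of what the change amounts to in the log-level definitions:

    /* Simplified sketch of the outcome described above; the surrounding
     * definitions are paraphrased from the kernel's log-level headers. */
    #define KERN_SOH     "\001"        /* ASCII start-of-header marker */
    #define KERN_INFO    KERN_SOH "6"  /* ordinary log levels keep their prefix */
    /* before: #define KERN_DEFAULT KERN_SOH "d"  (used only to set LOG_PREFIX) */
    #define KERN_DEFAULT ""            /* after: no prefix, LOG_PREFIX is gone */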
2019-03-04Merge branches 'pm-core', 'pm-sleep', 'pm-qos', 'pm-domains' and 'pm-em'Rafael J. Wysocki2-6/+59
* pm-core: PM / core: Add support to skip power management in device/driver model PM / suspend: Print debug messages for device using direct-complete PM-runtime: update time accounting only when enabled PM-runtime: Switch accounting over to ktime_get_mono_fast_ns() PM-runtime: Optimize pm_runtime_autosuspend_expiration() PM-runtime: Replace jiffies-based accounting with ktime-based accounting PM-runtime: update accounting_timestamp on enable PM: clock_ops: fix missing clk_prepare() return value check drm/i915: Move on the new pm runtime interface PM-runtime: Add new interface to get accounted time * pm-sleep: PM / wakeup: fix kerneldoc comment for pm_wakeup_dev_event() * pm-qos: PM: QoS: no need to check return value of debugfs_create functions * pm-domains: PM / Domains: Mark "name" const in dev_pm_domain_attach_by_name() PM / Domains: Mark "name" const in genpd_dev_pm_attach_by_name() PM: domains: no need to check return value of debugfs_create functions * pm-em: PM / EM: Expose the Energy Model in debugfs
2019-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-4/+5
2019-03-01bpf: fix sanitation rewrite in case of non-pointersDaniel Borkmann1-1/+2
Marek reported that he saw an issue with the below snippet in that timing measurements were off when loaded as unpriv, while results were reasonable when loaded as privileged: [...] uint64_t a = bpf_ktime_get_ns(); uint64_t b = bpf_ktime_get_ns(); uint64_t delta = b - a; if ((int64_t)delta > 0) { [...] Turns out there is a bug where a corner case is missing in the fix d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths"), namely fixup_bpf_calls() only checks whether aux has a non-zero alu_state, but it also needs to test for the case of BPF_ALU_NON_POINTER since in both cases we need to skip the masking rewrite (as there is nothing to mask). Fixes: d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths") Reported-by: Marek Majkowski <[email protected]> Reported-by: Arthur Fabre <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/ Acked-by: Song Liu <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
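A hedged sketch of the kind of check described above for fixup_bpf_calls(); the field and constant names come from the text, not from the literal patch:

    /* Skip the masking rewrite both when no alu_state was recorded and
     * when the recorded state is BPF_ALU_NON_POINTER - in either case
     * there is nothing to mask for this instruction. */
    if (!aux->alu_state ||
        aux->alu_state == BPF_ALU_NON_POINTER)
            continue;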
2019-03-02bpf: fix u64_stats_init() usage in bpf_prog_alloc()Eric Dumazet1-1/+7
We need to iterate through all possible cpus. Fixes: 492ecee892c2 ("bpf: enable program stats") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Guenter Roeck <[email protected]> Tested-by: Guenter Roeck <[email protected]> Acked-by: Song Liu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
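A minimal sketch of the pattern described above, assuming a hypothetical per-CPU stats pointer 'stats' whose element carries a 'syncp' member:

    /* Initialise the stats seqcount of every possible CPU, not just the
     * online ones, so later hotplugged CPUs start from valid state. */
    int cpu;

    for_each_possible_cpu(cpu)
            u64_stats_init(&per_cpu_ptr(stats, cpu)->syncp);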
2019-03-01tracing/kprobes: Use probe_kernel_read instead of probe_mem_readMasami Hiramatsu1-1/+1
Use probe_kernel_read() instead of probe_mem_read() because probe_mem_read() is a kind of wrapper for switching memory read function between uprobes and kprobes. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-03-01tracing: Fix event filters and triggers to handle negative numbersPavel Tikhomirov1-1/+4
When tracing syscall exit events it is extremely useful to filter exit codes equal to some negative value, to react only to required errors. But negative numbers do not work: [root@snorch sys_exit_read]# echo "ret == -1" > filter bash: echo: write error: Invalid argument [root@snorch sys_exit_read]# cat filter ret == -1 ^ parse_error: Invalid value (did you forget quotes)? Similar thing happens when setting triggers. This is a regression in v4.17 introduced by the commit mentioned below; testing without this commit shows no problem with negative numbers. Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: 80765597bc58 ("tracing: Rewrite filter logic to be simpler and faster") Signed-off-by: Pavel Tikhomirov <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-03-01bpf: drop refcount if bpf_map_new_fd() fails in map_create()Peng Sun1-2/+2
In bpf/syscall.c, map_create() first sets map->usercnt to 1, since a file descriptor is supposed to be returned to userspace. When bpf_map_new_fd() fails, drop the refcount. Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID") Signed-off-by: Peng Sun <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
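A hedged sketch of the error path described above; the exact put helper used by the patch is assumed here for illustration:

    err = bpf_map_new_fd(map, f_flags);
    if (err < 0)
            /* drop the usercnt/refcount taken for the would-be fd
             * (helper name assumed for illustration) */
            bpf_map_put_with_uref(map);

    return err;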
2019-03-01perf: Mark expected switch fall-throughGustavo A. R. Silva1-0/+1
In preparation for enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warning: kernel/events/core.c: In function ‘perf_event_parse_addr_filter’: kernel/events/core.c:9154:11: warning: this statement may fall through [-Wimplicit-fallthrough=] kernel = 1; ~~~~~~~^~~ kernel/events/core.c:9156:3: note: here case IF_SRC_FILEADDR: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kees Cook <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/20190212205430.GA8446@embeddedor Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
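A standalone illustration (generic names, not the perf code itself) of the annotation style that keeps -Wimplicit-fallthrough quiet:

    #include <stdio.h>

    /* Values stand in for the IF_SRC_* tokens mentioned in the warning. */
    enum { SRC_KERNELADDR = 1, SRC_FILEADDR = 2 };

    static int parse(int token)
    {
            int kernel = 0;

            switch (token) {
            case SRC_KERNELADDR:
                    kernel = 1;
                    /* fall through */      /* intentional: shares the code below */
            case SRC_FILEADDR:
                    return kernel;
            default:
                    return -1;
            }
    }

    int main(void)
    {
            printf("%d %d\n", parse(SRC_KERNELADDR), parse(SRC_FILEADDR));
            return 0;
    }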
2019-03-01bpf: fix build without bpf_syscallAlexei Starovoitov1-2/+5
wrap bpf_stats_enabled sysctl with #ifdef Reported-by: Stephen Rothwell <[email protected]> Fixes: 492ecee892c2 ("bpf: enable program stats") Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-02-28mm/resource: Let walk_system_ram_range() search child resourcesDave Hansen1-1/+4
In the process of onlining memory, we use walk_system_ram_range() to find the actual RAM areas inside of the area being onlined. However, it currently only finds memory resources which are "top-level" iomem_resources. Children are not currently searched which causes it to skip System RAM in areas like this (in the format of /proc/iomem): a0000000-bfffffff : Persistent Memory (legacy) a0000000-afffffff : System RAM Changing the true->false here allows children to be searched as well. We need this because we add a new "System RAM" resource underneath the "persistent memory" resource when we use persistent memory in a volatile mode. Signed-off-by: Dave Hansen <[email protected]> Cc: Keith Busch <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Huang Ying <[email protected]> Cc: Fengguang Wu <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Yaowei Bai <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Jerome Glisse <[email protected]> Signed-off-by: Dan Williams <[email protected]>
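For reference, a hedged sketch of how a caller such as the onlining path uses the walker; only with this change does the walk descend into child resources like the nested "System RAM" range shown above:

    /* Callback invoked for each System RAM chunk found in the range. */
    static int online_ram_chunk(unsigned long start_pfn,
                                unsigned long nr_pages, void *arg)
    {
            /* online pages in [start_pfn, start_pfn + nr_pages) */
            return 0;
    }

    static int online_region(unsigned long start_pfn, unsigned long nr_pages)
    {
            /* Walks "System RAM", now including child resources. */
            return walk_system_ram_range(start_pfn, nr_pages, NULL,
                                         online_ram_chunk);
    }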
2019-02-28mm/resource: Move HMM pr_debug() deeper into resource codeDave Hansen1-0/+9
HMM consumes physical address space for its own use, even though nothing is mapped or accessible there. It uses a special resource description (IORES_DESC_DEVICE_PRIVATE_MEMORY) to uniquely identify these areas. When HMM consumes address space, it makes a best guess about what to consume. However, it is possible that a future memory or device hotplug can collide with the reserved area. In the case of these conflicts, there is an error message in register_memory_resource(). Later patches in this series move register_memory_resource() from using request_resource_conflict() to __request_region(). Unfortunately, __request_region() does not return the conflict like the previous function did, which makes it impossible to check for IORES_DESC_DEVICE_PRIVATE_MEMORY in a conflicting resource. Instead of warning in register_memory_resource(), move the check into the core resource code itself (__request_region()) where the conflicting resource _is_ available. This has the added bonus of producing a warning in case of HMM conflicts with devices *or* RAM address space, as opposed to the RAM- only warnings that were there previously. Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jerome Glisse <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Huang Ying <[email protected]> Cc: Fengguang Wu <[email protected]> Cc: Keith Busch <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2019-02-28mm/resource: Return real error codes from walk failuresDave Hansen1-2/+2
walk_system_ram_range() can return an error code either because *it* failed, or because the 'func' that it calls returned an error. The memory hotplug code does the following: ret = walk_system_ram_range(..., func); if (ret) return ret; and 'ret' makes it out to userspace, eventually. The problem is, walk_system_ram_range() failures that result from *it* failing (as opposed to 'func') return -1. That leads to a very odd -EPERM (-1) return code out to userspace. Make walk_system_ram_range() return -EINVAL for internal failures to keep userspace less confused. This return code is compatible with all the callers that I audited. Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Bjorn Helgaas <[email protected]> Acked-by: Michael Ellerman <[email protected]> (powerpc) Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Huang Ying <[email protected]> Cc: Fengguang Wu <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Yaowei Bai <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: [email protected] Cc: Keith Busch <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2019-02-28perf, bpf: Consider events with attr.bpf_event as side-band eventsSong Liu1-1/+2
Events with attr.bpf_event set should be considered as side-band events, as they carry information about BPF programs. Signed-off-by: Song Liu <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: 6ee52e2a3fe4 ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-02-28io_uring: add support for pre-mapped user IO buffersJens Axboe1-0/+1
If we have fixed user buffers, we can map them into the kernel when we setup the io_uring. That avoids the need to do get_user_pages() for each and every IO. To utilize this feature, the application must call io_uring_register() after having setup an io_uring instance, passing in IORING_REGISTER_BUFFERS as the opcode. The argument must be a pointer to an iovec array, and the nr_args should contain how many iovecs the application wishes to map. If successful, these buffers are now mapped into the kernel, eligible for IO. To use these fixed buffers, the application must use the IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED opcodes, and then set sqe->index to the desired buffer index. sqe->addr..sqe->addr+seq->len must point to somewhere inside the indexed buffer. The application may register buffers throughout the lifetime of the io_uring instance. It can call io_uring_register() with IORING_UNREGISTER_BUFFERS as the opcode to unregister the current set of buffers, and then register a new set. The application need not unregister buffers explicitly before shutting down the io_uring instance. It's perfectly valid to setup a larger buffer, and then sometimes only use parts of it for an IO. As long as the range is within the originally mapped region, it will work just fine. For now, buffers must not be file backed. If file backed buffers are passed in, the registration will fail with -1/EOPNOTSUPP. This restriction may be relaxed in the future. RLIMIT_MEMLOCK is used to check how much memory we can pin. A somewhat arbitrary 1G per buffer size is also imposed. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
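A hedged userspace sketch of the registration flow described above, issuing the new syscall by raw number (427 on x86-64 at the time); the locally defined opcode value is an assumption taken from the UAPI of that era:

    #include <stddef.h>
    #include <sys/uio.h>
    #include <unistd.h>

    #define IORING_REGISTER_BUFFERS 0   /* opcode described above (assumed value) */

    static long io_uring_register(int fd, unsigned opcode,
                                  const void *arg, unsigned nr_args)
    {
            return syscall(427, fd, opcode, arg, nr_args);
    }

    /* Pin 'buf' so later IORING_OP_READ_FIXED/WRITE_FIXED submissions can
     * reference it by buffer index 0 instead of paying get_user_pages()
     * on every IO. */
    static long register_fixed_buffer(int ring_fd, void *buf, size_t len)
    {
            struct iovec iov = { .iov_base = buf, .iov_len = len };

            return io_uring_register(ring_fd, IORING_REGISTER_BUFFERS, &iov, 1);
    }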
2019-02-28Add io_uring IO interfaceJens Axboe1-0/+2
The submission queue (SQ) and completion queue (CQ) rings are shared between the application and the kernel. This eliminates the need to copy data back and forth to submit and complete IO. IO submissions use the io_uring_sqe data structure, and completions are generated in the form of io_uring_cqe data structures. The SQ ring is an index into the io_uring_sqe array, which makes it possible to submit a batch of IOs without them being contiguous in the ring. The CQ ring is always contiguous, as completion events are inherently unordered, and hence any io_uring_cqe entry can point back to an arbitrary submission. Two new system calls are added for this: io_uring_setup(entries, params) Sets up an io_uring instance for doing async IO. On success, returns a file descriptor that the application can mmap to gain access to the SQ ring, CQ ring, and io_uring_sqes. io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize) Initiates IO against the rings mapped to this fd, or waits for them to complete, or both. The behavior is controlled by the parameters passed in. If 'to_submit' is non-zero, then we'll try and submit new IO. If IORING_ENTER_GETEVENTS is set, the kernel will wait for 'min_complete' events, if they aren't already available. It's valid to set IORING_ENTER_GETEVENTS and 'min_complete' == 0 at the same time, this allows the kernel to return already completed events without waiting for them. This is useful only for polling, as for IRQ driven IO, the application can just check the CQ ring without entering the kernel. With this setup, it's possible to do async IO with a single system call. Future developments will enable polled IO with this interface, and polled submission as well. The latter will enable an application to do IO without doing ANY system calls at all. For IRQ driven IO, an application only needs to enter the kernel for completions if it wants to wait for them to occur. Each io_uring is backed by a workqueue, to support buffered async IO as well. We will only punt to an async context if the command would need to wait for IO on the device side. Any data that can be accessed directly in the page cache is done inline. This avoids the slowness issue of usual threadpools, since cached data is accessed as quickly as a sync interface. Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
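A hedged userspace sketch of the two system calls described above, issued by raw syscall number (425 and 426 on x86-64 at the time); struct io_uring_params and IORING_ENTER_GETEVENTS are taken from the new <linux/io_uring.h> UAPI header:

    #include <linux/io_uring.h>
    #include <stddef.h>
    #include <string.h>
    #include <unistd.h>

    static int setup_ring(unsigned entries, struct io_uring_params *p)
    {
            memset(p, 0, sizeof(*p));
            /* Returns an fd whose SQ/CQ rings can then be mmap()ed. */
            return syscall(425, entries, p);
    }

    static int submit_and_wait(int ring_fd, unsigned to_submit,
                               unsigned min_complete)
    {
            /* Submit pending sqes and/or wait for min_complete cqes. */
            return syscall(426, ring_fd, to_submit, min_complete,
                           IORING_ENTER_GETEVENTS, NULL, 0);
    }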
2019-02-28kthread: Do not use TIMER_IRQSAFESebastian Andrzej Siewior1-2/+3
The TIMER_IRQSAFE usage was introduced in commit 22597dc3d97b1 ("kthread: initial support for delayed kthread work") which modelled the delayed kthread code after workqueue's code. The workqueue code requires the flag TIMER_IRQSAFE for synchronisation purpose. This is not true for kthread's delay timer since all operations occur under a lock. Remove TIMER_IRQSAFE from the timer initialisation and use timer_setup() for initialisation purpose which is the official function. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-02-28kthread: Convert worker lock to raw spinlockJulia Cartwright1-21/+21
In order to enable the queuing of kthread work items from hardirq context even when PREEMPT_RT_FULL is enabled, convert the worker spin_lock to a raw_spin_lock. This is only acceptable to do because the work performed under the lock is well-bounded and minimal. Reported-by: Steffen Trumtrar <[email protected]> Reported-by: Tim Sander <[email protected]> Signed-off-by: Julia Cartwright <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Steffen Trumtrar <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Cc: Guenter Roeck <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-02-28vfs: Add some logging to the core users of the fs_context logDavid Howells1-1/+1
Add some logging to the core users of the fs_context log so that information can be extracted from them as to the reason for failure. Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2019-02-28cpuset: Use fs_contextDavid Howells1-14/+42
Make the cpuset filesystem use the filesystem context. This is potentially tricky as the cpuset fs is almost an alias for the cgroup filesystem, but with some special parameters. This can, however, be handled by setting up an appropriate cgroup filesystem and returning the root directory of that as the root dir of this one. Signed-off-by: David Howells <[email protected]> cc: Tejun Heo <[email protected]> Signed-off-by: Al Viro <[email protected]>
2019-02-28kernfs, sysfs, cgroup, intel_rdt: Support fs_contextDavid Howells2-18/+18
Make kernfs support superblock creation/mount/remount with fs_context. This requires that sysfs, cgroup and intel_rdt, which are built on kernfs, be made to support fs_context also. Notes: (1) A kernfs_fs_context struct is created to wrap fs_context and the kernfs mount parameters are moved in here (or are in fs_context). (2) kernfs_mount{,_ns}() are made into kernfs_get_tree(). The extra namespace tag parameter is passed in the context if desired (3) kernfs_free_fs_context() is provided as a destructor for the kernfs_fs_context struct, but for the moment it does nothing except get called in the right places. (4) sysfs doesn't wrap kernfs_fs_context since it has no parameters to pass, but possibly this should be done anyway in case someone wants to add a parameter in future. (5) A cgroup_fs_context struct is created to wrap kernfs_fs_context and the cgroup v1 and v2 mount parameters are all moved there. (6) cgroup1 parameter parsing error messages are now handled by invalf(), which allows userspace to collect them directly. (7) cgroup1 parameter cleanup is now done in the context destructor rather than in the mount/get_tree and remount functions. Weirdies: (*) cgroup_do_get_tree() calls cset_cgroup_from_root() with locks held, but then uses the resulting pointer after dropping the locks. I'm told this is okay and needs commenting. (*) The cgroup refcount web. This really needs documenting. (*) cgroup2 only has one root? Add a suggestion from Thomas Gleixner in which the RDT enablement code is placed into its own function. [folded a leak fix from Andrey Vagin] Signed-off-by: David Howells <[email protected]> cc: Greg Kroah-Hartman <[email protected]> cc: Tejun Heo <[email protected]> cc: Li Zefan <[email protected]> cc: Johannes Weiner <[email protected]> cc: [email protected] cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2019-02-28cgroup: store a reference to cgroup_ns into cgroup_fs_contextAl Viro3-12/+17
... and trim cgroup_do_mount() arguments (renaming it to cgroup_do_get_tree()) Signed-off-by: Al Viro <[email protected]>
2019-02-28cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helperAl Viro1-41/+46
Signed-off-by: Al Viro <[email protected]>
2019-02-28cgroup_do_mount(): massage calling conventionsAl Viro3-36/+29
pass it fs_context instead of fs_type/flags/root triple, have it return int instead of dentry and make it deal with setting fc->root. Signed-off-by: Al Viro <[email protected]>
2019-02-28cgroup: stash cgroup_root reference into cgroup_fs_contextAl Viro3-4/+10
Note that this reference is *NOT* contributing to refcount of cgroup_root in question and is valid only until cgroup_do_mount() returns. Signed-off-by: Al Viro <[email protected]>
2019-02-28cgroup2: switch to option-by-option parsingAl Viro1-29/+33
[again, carved out of patch by dhowells] [NB: we probably want to handle "source" in parse_param here] Signed-off-by: Al Viro <[email protected]>
2019-02-28cgroup1: switch to option-by-option parsingAl Viro3-98/+117
[dhowells should be the author - it's carved out of his patch] Signed-off-by: Al Viro <[email protected]>
2019-02-28cgroup: take options parsing into ->parse_monolithic()Al Viro3-116/+104
Store the results in cgroup_fs_context. There's a nasty twist caused by the enabling/disabling subsystems - we can't do the checks sensitive to that until cgroup_mutex gets grabbed. Frankly, these checks are complete bullshit (e.g. all,none combination is accepted if all subsystems are disabled; so's cpusets,none and all,cpusets when cpusets is disabled, etc.), but touching that would be a userland-visible behaviour change ;-/ So we do parsing in ->parse_monolithic() and have the consistency checks done in check_cgroupfs_options(), with the latter called (on already parsed options) from cgroup1_get_tree() and cgroup1_reconfigure(). Freeing the strdup'ed strings is done from fs_context destructor, which somewhat simplifies the life for cgroup1_{get_tree,reconfigure}(). Signed-off-by: Al Viro <[email protected]>
2019-02-28cgroup: fold cgroup1_mount() into cgroup1_get_tree()Al Viro3-31/+18
Signed-off-by: Al Viro <[email protected]>
2019-02-28cgroup: start switching to fs_contextAl Viro3-41/+116
Unfortunately, cgroup is tangled into kernfs infrastructure. To avoid converting all kernfs-based filesystems at once, we need to untangle the remount part of things, instead of having it go through kernfs_sop_remount_fs(). Fortunately, it's not hard to do. This commit just gets cgroup/cgroup1 to use fs_context to deliver options on mount and remount paths. Parsing those is going to be done in the next commits; for now we do pretty much what legacy case does. Signed-off-by: Al Viro <[email protected]>
2019-02-28Merge tag 'perf-core-for-mingo-5.1-20190225' of ↵Ingo Molnar1-31/+59
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf annotate: Wei Li: - Fix getting source line failure perf script: Andi Kleen: - Handle missing fields with -F +... perf data: Jiri Olsa: - Prep work to support per-cpu files in a directory. Intel PT: Adrian Hunter: - Improve thread_stack__no_call_return() - Hide x86 retpolines in thread stacks. - exported SQL viewer refactorings, new 'top calls' report.. Alexander Shishkin: - Copy parent's address filter offsets on clone - Fix address filters for vmas with non-zero offset. Applies to ARM's CoreSight as well. python scripts: Tony Jones: - Python3 support for several 'perf script' python scripts. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar21-71/+215
Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Add module_param to enable consistency checksPeter Zijlstra1-13/+33
And move the whole lot under CONFIG_DEBUG_LOCKDEP. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28kernel/workqueue: Use dynamic lockdep keys for workqueuesBart Van Assche1-9/+50
The following commit: 87915adc3f0a ("workqueue: re-add lockdep dependencies for flushing") improved deadlock checking in the workqueue implementation. Unfortunately that patch also introduced a few false positive lockdep complaints. This patch suppresses these false positives by allocating the workqueue mutex lockdep key dynamically. An example of a false positive lockdep complaint suppressed by this patch can be found below. The root cause of the lockdep complaint shown below is that the direct I/O code can call alloc_workqueue() from inside a work item created by another alloc_workqueue() call and that both workqueues share the same lockdep key. This patch avoids that that lockdep complaint is triggered by allocating the work queue lockdep keys dynamically. In other words, this patch guarantees that a unique lockdep key is associated with each work queue mutex. ====================================================== WARNING: possible circular locking dependency detected 4.19.0-dbg+ #1 Not tainted fio/4129 is trying to acquire lock: 00000000a01cfe1a ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: flush_workqueue+0xd0/0x970 but task is already holding lock: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sb->s_type->i_mutex_key#14){+.+.}: down_write+0x3d/0x80 __generic_file_fsync+0x77/0xf0 ext4_sync_file+0x3c9/0x780 vfs_fsync_range+0x66/0x100 dio_complete+0x2f5/0x360 dio_aio_complete_work+0x1c/0x20 process_one_work+0x481/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #1 ((work_completion)(&dio->complete_work)){+.+.}: process_one_work+0x447/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #0 ((wq_completion)"dio/%s"sb->s_id){+.+.}: lock_acquire+0xc5/0x200 flush_workqueue+0xf3/0x970 drain_workqueue+0xec/0x220 destroy_workqueue+0x23/0x350 sb_init_dio_done_wq+0x6a/0x80 do_blockdev_direct_IO+0x1f33/0x4be0 __blockdev_direct_IO+0x79/0x86 ext4_direct_IO+0x5df/0xbb0 generic_file_direct_write+0x119/0x220 __generic_file_write_iter+0x131/0x2d0 ext4_file_write_iter+0x3fa/0x710 aio_write+0x235/0x330 io_submit_one+0x510/0xeb0 __x64_sys_io_submit+0x122/0x340 do_syscall_64+0x71/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: (wq_completion)"dio/%s"sb->s_id --> (work_completion)(&dio->complete_work) --> &sb->s_type->i_mutex_key#14 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#14); lock((work_completion)(&dio->complete_work)); lock(&sb->s_type->i_mutex_key#14); lock((wq_completion)"dio/%s"sb->s_id); *** DEADLOCK *** 1 lock held by fio/4129: #0: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710 stack backtrace: CPU: 3 PID: 4129 Comm: fio Not tainted 4.19.0-dbg+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x86/0xc5 print_circular_bug.isra.32+0x20a/0x218 __lock_acquire+0x1c68/0x1cf0 lock_acquire+0xc5/0x200 flush_workqueue+0xf3/0x970 drain_workqueue+0xec/0x220 destroy_workqueue+0x23/0x350 sb_init_dio_done_wq+0x6a/0x80 do_blockdev_direct_IO+0x1f33/0x4be0 __blockdev_direct_IO+0x79/0x86 ext4_direct_IO+0x5df/0xbb0 generic_file_direct_write+0x119/0x220 __generic_file_write_iter+0x131/0x2d0 ext4_file_write_iter+0x3fa/0x710 aio_write+0x235/0x330 io_submit_one+0x510/0xeb0 __x64_sys_io_submit+0x122/0x340 
do_syscall_64+0x71/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lkml.kernel.org/r/[email protected] [ Reworked the changelog a bit. ] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Add support for dynamic keysBart Van Assche1-8/+113
A shortcoming of the current lockdep implementation is that it requires lock keys to be allocated statically. That forces all instances of lock objects that occur in a given data structure to share a lock key. Since lock dependency analysis groups lock objects per key sharing lock keys can cause false positive lockdep reports. Make it possible to avoid such false positive reports by allowing lock keys to be allocated dynamically. Require that dynamically allocated lock keys are registered before use by calling lockdep_register_key(). Complain about attempts to register the same lock key pointer twice without calling lockdep_unregister_key() between successive registration calls. The purpose of the new lock_keys_hash[] data structure that keeps track of all dynamic keys is twofold: - Verify whether the lockdep_register_key() and lockdep_unregister_key() functions are used correctly. - Avoid that lockdep_init_map() complains when encountering a dynamically allocated key. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
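A hedged usage sketch of the API named above: each dynamically allocated object carries its own key, registered before first use and unregistered before the memory is freed (the object layout is illustrative only):

    #include <linux/lockdep.h>
    #include <linux/spinlock.h>

    struct my_dev {                       /* hypothetical structure */
            struct lock_class_key key;    /* one lockdep key per instance */
            spinlock_t lock;
    };

    static void my_dev_init(struct my_dev *d)
    {
            lockdep_register_key(&d->key);        /* must precede first use */
            spin_lock_init(&d->lock);
            lockdep_set_class(&d->lock, &d->key);
    }

    static void my_dev_destroy(struct my_dev *d)
    {
            lockdep_unregister_key(&d->key);      /* pairs with registration */
    }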
2019-02-28locking/lockdep: Verify whether lock objects are small enough to be used as ↵Bart Van Assche1-0/+11
class keys Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Check data structure consistencyBart Van Assche1-0/+167
Debugging lockdep data structure inconsistencies is challenging. Add code that verifies data structure consistency at runtime. That code is disabled by default because it is very CPU intensive. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Reuse lock chains that have been freedBart Van Assche1-20/+37
A previous patch introduced a lock chain leak. Fix that leak by reusing lock chains that have been freed. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Fix a comment in add_chain_cache()Bart Van Assche1-1/+1
Reflect that add_chain_cache() is always called with the graph lock held. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Introduce lockdep_next_lockchain() and lock_chain_count()Bart Van Assche3-8/+23
This patch does not change any functionality but makes the next patch in this series easier to read. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Reuse list entries that are no longer in useBart Van Assche1-8/+16
Instead of abandoning elements of list_entries[] that are no longer in use, make alloc_list_entry() reuse array elements that have been freed. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Free lock classes that are no longer in useBart Van Assche1-48/+348
Instead of leaving lock classes that are no longer in use in the lock_classes array, reuse entries from that array that are no longer in use. Maintain a linked list of free lock classes with list head 'free_lock_class'. Only add freed lock classes to the free_lock_classes list after a grace period to avoid that a lock_classes[] element would be reused while an RCU reader is accessing it. Since the lockdep selftests run in a context where sleeping is not allowed and since the selftests require that lock resetting/zapping works with debug_locks off, make the behavior of lockdep_free_key_range() and lockdep_reset_lock() depend on whether or not these are called from the context of the lockdep selftests. Thanks to Peter for having shown how to modify get_pending_free() such that that function does not have to sleep. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Update two outdated commentsBart Van Assche1-5/+3
synchronize_sched() has been removed recently. Update the comments that refer to synchronize_sched(). Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: 51959d85f32d ("lockdep: Replace synchronize_sched() with synchronize_rcu()") # v5.0-rc1 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Make it easy to detect whether or not inside a selftestBart Van Assche1-0/+6
The patch that frees unused lock classes will modify the behavior of lockdep_free_key_range() and lockdep_reset_lock() depending on whether or not these functions are called from the context of the lockdep selftests. Hence make it easy to detect whether or not lockdep code is called from the context of a lockdep selftest. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Split lockdep_free_key_range() and lockdep_reset_lock()Bart Van Assche1-36/+36
This patch does not change the behavior of these functions but makes the patch that frees unused lock classes easier to read. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Initialize the locks_before and locks_after lists earlierBart Van Assche1-2/+27
This patch does not change any functionality. A later patch will reuse lock classes that have been freed. In combination with that patch, this patch will have the effect of initializing lock class order lists once instead of every time a lock class structure is reinitialized. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Make zap_class() remove all matching lock order entriesBart Van Assche1-6/+13
Make sure that all lock order entries that refer to a class are removed from the list_entries[] array when a kernel module is unloaded. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Avoid that add_chain_cache() adds an invalid chain to the cacheBart Van Assche1-10/+1
Make sure that add_chain_cache() returns 0 and does not modify the chain hash if nr_chain_hlocks == MAX_LOCKDEP_CHAIN_HLOCKS before this function is called. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Fix reported required memory size (2/2)Bart Van Assche1-1/+2
Lock chains are only tracked with CONFIG_PROVE_LOCKING=y. Do not report the memory required for the lock chain array if CONFIG_PROVE_LOCKING=n. See also commit: ca58abcb4a6d ("lockdep: sanitise CONFIG_PROVE_LOCKING") Include the size of the chain_hlocks[] array. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-28locking/lockdep: Fix reported required memory size (1/2)Bart Van Assche1-7/+7
Change the sizeof(array element type) * (array size) expressions into sizeof(array). This fixes the size computations of the classhash_table[] and chainhash_table[] arrays. The reason is that commit: a63f38cc4ccf ("locking/lockdep: Convert hash tables to hlists") changed the type of the elements of those arrays from 'struct list_head' into 'struct hlist_head'. Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
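A standalone illustration of the sizing pattern fixed above; the struct definitions are stand-ins for the kernel's, kept only to show why sizeof(array) stays correct when the element type changes:

    #include <stdio.h>

    struct list_head  { void *next, *prev; };   /* two pointers per element */
    struct hlist_head { void *first; };         /* one pointer per element  */

    static struct hlist_head classhash_table[16];

    int main(void)
    {
            /* Stale after the list_head -> hlist_head conversion: */
            printf("%zu\n", sizeof(struct list_head) * 16);
            /* Always tracks the real element type: */
            printf("%zu\n", sizeof(classhash_table));
            return 0;
    }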