path: root/kernel
Age  Commit message  (Author; Files changed, Lines -/+)
2019-06-05  kernel/module.c: Only return -EEXIST for modules that have finished loading  (Prarit Bhargava; 1 file, -4/+2)
Microsoft HyperV disables the X86_FEATURE_SMCA bit on AMD systems, and Linux guests boot with repeated errors:

    amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
    amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
    amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
    amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
    amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
    amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)

The warnings occur because the module code erroneously returns -EEXIST for modules that have failed to load and are in the process of being removed from the module list.

The module amd64_edac_mod has a dependency on the module edac_mce_amd. Using modules.dep, systemd will load edac_mce_amd for every request of amd64_edac_mod. When the edac_mce_amd module loads, it has state MODULE_STATE_UNFORMED, and once the module load fails the state becomes MODULE_STATE_GOING. Another request for the edac_mce_amd module executes, and add_unformed_module() erroneously returns -EEXIST even though the previous instance of edac_mce_amd has MODULE_STATE_GOING. Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which fails because of the unknown symbols from edac_mce_amd.

add_unformed_module() must wait to return for any case other than MODULE_STATE_LIVE to prevent a race between multiple loads of dependent modules.

Signed-off-by: Prarit Bhargava <[email protected]> Signed-off-by: Barret Rhoden <[email protected]> Cc: David Arcari <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Heiko Carstens <[email protected]> Signed-off-by: Jessica Yu <[email protected]>
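A condensed sketch of the waiting logic the fix implies (names match kernel/module.c, but the locking and error paths are simplified; not the verbatim hunk):

    static int add_unformed_module(struct module *mod)
    {
        struct module *old;
        int err = 0;

        mod->state = MODULE_STATE_UNFORMED;
    again:
        mutex_lock(&module_mutex);
        old = find_module_all(mod->name, strlen(mod->name), true);
        if (old != NULL) {
            if (old->state != MODULE_STATE_LIVE) {
                /* Wait in case the old load fails and goes away;
                 * only a LIVE duplicate may yield -EEXIST. */
                mutex_unlock(&module_mutex);
                err = wait_event_interruptible(module_wq,
                                finished_loading(mod->name));
                if (err)
                    goto out_unlocked;
                goto again;
            }
            err = -EEXIST;
            goto out;
        }
        list_add_rcu(&mod->list, &modules);
    out:
        mutex_unlock(&module_mutex);
    out_unlocked:
        return err;
    }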
2019-06-04  bpf: remove redundant assignment to err  (Colin Ian King; 2 files, -2/+2)
The variable err is assigned the value -EINVAL, which is never read because it is re-assigned a new value later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
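The dead-store pattern Coverity flags has this shape (illustrative only; do_setup() is a made-up stand-in, not the actual bpf code):

    /* Before: the initializer is a dead store. */
    int err = -EINVAL;

    err = do_setup(attr);   /* unconditionally overwrites err */

    /* After: declare without the unused initial value. */
    int err;

    err = do_setup(attr);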
2019-06-03  gcov: no need to check return value of debugfs_create functions  (Greg Kroah-Hartman; 1 file, -22/+2)
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Also delete the dentry variable as it is never needed. Acked-by: Peter Oberparleiter <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
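For illustration, the resulting call pattern looks like this (hypothetical names; gcov's actual files differ):

    struct dentry *dir;

    /* No error checking: debugfs_create_dir() hands back a token that
     * the other debugfs_create_*() calls accept even on failure, and
     * the code must behave the same whether debugfs is usable or not. */
    dir = debugfs_create_dir("gcov", NULL);
    debugfs_create_file("reset", 0600, dir, NULL, &gcov_reset_fops);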
2019-06-03  dma-direct: provide generic support for uncached kernel segments  (Christoph Hellwig; 1 file, -2/+15)
A few architectures support uncached kernel segments. In that case we get an uncached mapping for a given physical address by using an offset in the uncached segment. Implement support for this scheme in the generic dma-direct code instead of duplicating it in arch hooks. Signed-off-by: Christoph Hellwig <[email protected]>
2019-06-03  dma-contiguous: use fallback alloc_pages for single pages  (Nicolin Chen; 1 file, -1/+10)
The addresses within a single page are always contiguous, so it's not really necessary to always allocate one single page from the CMA area. Since the CMA area has a limited predefined size, it may run out of space in heavy use cases, where there might be quite a lot of CMA pages being allocated for single pages.

However, there is also a concern that a device might care where a page comes from -- it might expect the page to come from the CMA area and act differently if it doesn't.

This patch tries to use the fallback alloc_pages path, instead of one-page size allocations from the global CMA area, in case a device does not have its own CMA area. This'd save resources from the global CMA area for more CMA allocations, and also reduce CMA fragmentation resulting from trivial allocations.

Signed-off-by: Nicolin Chen <[email protected]> Tested-by: dann frazier <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-06-03  dma-contiguous: add dma_{alloc,free}_contiguous() helpers  (Nicolin Chen; 2 files, -20/+51)
Both dma_alloc_from_contiguous() and dma_release_from_contiguous() are very simply implemented, but require callers to pass certain parameters like count and align, and take a boolean parameter to check __GFP_NOWARN in the allocation flags. So every call site duplicates similar work:

    unsigned long order = get_order(size);
    size_t count = size >> PAGE_SHIFT;

    page = dma_alloc_from_contiguous(dev, count, order, gfp & __GFP_NOWARN);
    [...]
    dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);

Additionally, as CMA can be used only in a context which permits sleeping, most callers do a gfpflags_allow_blocking() check and a corresponding fallback allocation of normal pages upon any false result:

    if (gfpflags_allow_blocking(flag))
        page = dma_alloc_from_contiguous();
    if (!page)
        page = alloc_pages();
    [...]
    if (!dma_release_from_contiguous(dev, page, count))
        __free_pages(page, get_order(size));

So this patch simplifies those call sites by abstracting these operations into the two new functions: dma_{alloc,free}_contiguous(). As some callers of dma_{alloc,release}_from_contiguous() might be complicated, this patch implements the two new functions in kernel/dma/direct.c only, as an initial step.

Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: Nicolin Chen <[email protected]> Tested-by: dann frazier <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
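Combining the two patterns quoted above, the new helpers plausibly boil down to this (a sketch assembled from the commit message, not the verbatim kernel/dma code):

    struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
    {
        size_t count = PAGE_ALIGN(size) >> PAGE_SHIFT;
        struct page *page = NULL;

        /* CMA may only be used from sleepable context. */
        if (gfpflags_allow_blocking(gfp))
            page = dma_alloc_from_contiguous(dev, count, get_order(size),
                                             gfp & __GFP_NOWARN);
        if (!page)
            page = alloc_pages(gfp, get_order(size));
        return page;
    }

    void dma_free_contiguous(struct device *dev, struct page *page, size_t size)
    {
        if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT))
            __free_pages(page, get_order(size));
    }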
2019-06-03  kprobes: no need to check return value of debugfs_create functions  (Greg Kroah-Hartman; 1 file, -19/+6)
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: "Naveen N. Rao" <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: "David S. Miller" <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-03  fail_function: no need to check return value of debugfs_create functions  (Greg Kroah-Hartman; 1 file, -18/+5)
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Kees Cook <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: zhong jiang <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-03  blktrace: no need to check return value of debugfs_create functions  (Greg Kroah-Hartman; 1 file, -6/+0)
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Jens Axboe <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-03  trace: no need to check return value of debugfs_create functions  (Greg Kroah-Hartman; 1 file, -4/+0)
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-03  locking/lock_events: Use raw_cpu_{add,inc}() for stats  (Peter Zijlstra; 1 file, -41/+4)
Instead of playing silly games with CONFIG_DEBUG_PREEMPT toggling between this_cpu_*() and __this_cpu_*() use raw_cpu_*(), which is exactly what we want here. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tim Chen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: huang ying <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
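The effect, roughly (a sketch modeled on kernel/locking/lock_events.h; details may differ):

    DECLARE_PER_CPU(unsigned long, lockevents[lockevent_num]);

    /* raw_cpu_*() updates without preemption checks; a rare hit on
     * the "wrong" CPU merely lands in that CPU's counter, which is
     * acceptable for statistics. */
    #define lockevent_inc(ev)     raw_cpu_inc(lockevents[LOCKEVENT_ ## ev])
    #define lockevent_add(ev, c)  raw_cpu_add(lockevents[LOCKEVENT_ ## ev], c)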
2019-06-03  locking/lockdep: Fix merging of hlocks with non-zero references  (Imre Deak; 1 file, -9/+9)
The sequence

    static DEFINE_WW_CLASS(test_ww_class);

    struct ww_acquire_ctx ww_ctx;
    struct ww_mutex ww_lock_a;
    struct ww_mutex ww_lock_b;
    struct ww_mutex ww_lock_c;
    struct mutex lock_c;

    ww_acquire_init(&ww_ctx, &test_ww_class);

    ww_mutex_init(&ww_lock_a, &test_ww_class);
    ww_mutex_init(&ww_lock_b, &test_ww_class);
    ww_mutex_init(&ww_lock_c, &test_ww_class);
    mutex_init(&lock_c);

    ww_mutex_lock(&ww_lock_a, &ww_ctx);
    mutex_lock(&lock_c);
    ww_mutex_lock(&ww_lock_b, &ww_ctx);
    ww_mutex_lock(&ww_lock_c, &ww_ctx);
    mutex_unlock(&lock_c);        (*)
    ww_mutex_unlock(&ww_lock_c);
    ww_mutex_unlock(&ww_lock_b);
    ww_mutex_unlock(&ww_lock_a);

    ww_acquire_fini(&ww_ctx);     (**)

will trigger the following error in __lock_release() when calling mutex_release() at **:

    DEBUG_LOCKS_WARN_ON(depth <= 0)

The problem is that the hlock merging happening at * updates the references for test_ww_class incorrectly to 3, whereas it should've updated it to 4 (representing all the instances for ww_ctx and ww_lock_[abc]).

Fix this by updating the references during merging correctly, taking into account that we can have non-zero references (both for the hlock that we merge into another hlock and for the hlock we are merging into).

Signed-off-by: Imre Deak <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Ville Syrjälä <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Fix OOO unlock when hlocks need merging  (Imre Deak; 1 file, -12/+29)
The sequence

    static DEFINE_WW_CLASS(test_ww_class);

    struct ww_acquire_ctx ww_ctx;
    struct ww_mutex ww_lock_a;
    struct ww_mutex ww_lock_b;
    struct mutex lock_c;
    struct mutex lock_d;

    ww_acquire_init(&ww_ctx, &test_ww_class);

    ww_mutex_init(&ww_lock_a, &test_ww_class);
    ww_mutex_init(&ww_lock_b, &test_ww_class);
    mutex_init(&lock_c);

    ww_mutex_lock(&ww_lock_a, &ww_ctx);
    mutex_lock(&lock_c);
    ww_mutex_lock(&ww_lock_b, &ww_ctx);
    mutex_unlock(&lock_c);        (*)
    ww_mutex_unlock(&ww_lock_b);
    ww_mutex_unlock(&ww_lock_a);

    ww_acquire_fini(&ww_ctx);

triggers the following WARN in __lock_release() when doing the unlock at *:

    DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth - 1);

The problem is that the WARN check doesn't take into account the merging of ww_lock_a and ww_lock_b, which results in decreasing curr->lockdep_depth by 2, not only 1.

Note that the following sequence doesn't trigger the WARN, since there won't be any hlock merging:

    ww_acquire_init(&ww_ctx, &test_ww_class);

    ww_mutex_init(&ww_lock_a, &test_ww_class);
    ww_mutex_init(&ww_lock_b, &test_ww_class);
    mutex_init(&lock_c);
    mutex_init(&lock_d);

    ww_mutex_lock(&ww_lock_a, &ww_ctx);
    mutex_lock(&lock_c);
    mutex_lock(&lock_d);
    ww_mutex_lock(&ww_lock_b, &ww_ctx);
    mutex_unlock(&lock_d);
    ww_mutex_unlock(&ww_lock_b);
    ww_mutex_unlock(&ww_lock_a);
    mutex_unlock(&lock_c);

    ww_acquire_fini(&ww_ctx);

In general both of the above sequences are valid and shouldn't trigger any lockdep warning.

Fix this by taking the decrement due to the hlock merging into account during lock release and hlock class re-setting. Merging can't happen during lock downgrading, since there won't be a new possibility to merge hlocks in that case, so add a WARN if merging still happens then.

Signed-off-by: Imre Deak <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  x86/power: Fix 'nosmt' vs hibernation triple fault during resume  (Jiri Kosina; 2 files, -2/+11)
As explained in 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once"), we always, no matter what, have to bring up x86 HT siblings during boot at least once in order to avoid the first MCE bringing the system to its knees. That means that whenever 'nosmt' is supplied on the kernel command line, all the HT siblings end up sitting in mwait or cpuidle after going through the online-offline cycle at least once.

This causes a serious issue though when a kernel which saw 'nosmt' on its command line is going to perform resume from hibernation: if the resume from the hibernated image is successful, cr3 is flipped in order to point to the address space of the kernel that is being resumed, which in turn means that all the HT siblings are all of a sudden mwaiting on an address which is no longer valid. That results in a triple fault shortly after cr3 is switched, and the machine reboots.

Fix this by always waking up all the SMT siblings before initiating the 'restore from hibernation' process; this guarantees that all the HT siblings will be properly carried over to the resumed kernel waiting in resume_play_dead(), and acted upon accordingly afterwards, based on the target kernel configuration. Symmetrically, the resumed kernel has to push the SMT siblings to mwait again in case it has SMT disabled; this means it has to online all the siblings when resuming (so that they come out of hlt) and offline them again to let them reach mwait.

Cc: 4.19+ <[email protected]> # v4.19+ Debugged-by: Thomas Gleixner <[email protected]> Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") Signed-off-by: Jiri Kosina <[email protected]> Acked-by: Pavel Machek <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
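For the resumed kernel's side, the x86 hook plausibly looks like this (a sketch modeled on the online/offline cycle described above; the actual patch differs in detail):

    int arch_resume_nosmt(void)
    {
        int ret = 0;

        /* The siblings were woken for the image restore and now sit
         * in hlt. If the resumed kernel runs with SMT off, cycle
         * them online/offline so they park in mwait proper. */
        cpu_hotplug_enable();
        if (cpu_smt_control == CPU_SMT_DISABLED ||
            cpu_smt_control == CPU_SMT_FORCE_DISABLED) {
            ret = enable_nonboot_cpus();
            if (!ret)
                ret = disable_nonboot_cpus();
        }
        cpu_hotplug_disable();
        return ret;
    }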
2019-06-03  perf/core: Add attr_groups_update into struct pmu  (Jiri Olsa; 1 file, -0/+6)
Add an attr_update attribute group into struct pmu, to allow having multiple attribute groups for the same group name. This will allow us to update the "events" or "format" directories with attributes that depend on various HW conditions.

For example, having a group_format_extra group that updates the "format" directory only if the pmu version is 2 or higher:

    static umode_t
    exra_is_visible(struct kobject *kobj, struct attribute *attr, int i)
    {
        return x86_pmu.version >= 2 ? attr->mode : 0;
    }

    static struct attribute_group group_format_extra = {
        .name       = "format",
        .is_visible = exra_is_visible,
    };

Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  perf/core: Allow non-privileged uprobe for user processes  (Song Liu; 2 files, -3/+3)
Currently, a non-privileged user can only use uprobes with kernel.perf_event_paranoid = -1. However, setting perf_event_paranoid to -1 leaks other users' processes to non-privileged uprobes.

To introduce proper permission control of uprobes, we are building the following system: a daemon with CAP_SYS_ADMIN is in charge of creating uprobes via tracefs; users ask the daemon to create uprobes; then a user can attach a uprobe only to processes owned by the user.

This patch allows a non-privileged user to attach a uprobe to processes owned by the user. The following example shows how to use a uprobe as a non-privileged user. This is based on Brendan's blog post [1].

1. Create the uprobe with root:

    sudo perf probe -x 'readline%return +0($retval):string'

2. Then a non-root user can use the uprobe as:

    perf record -vvv -e probe_bash:readline__return -p <pid> sleep 20
    perf script

[1] http://www.brendangregg.com/blog/2015-06-28/linux-ftrace-uprobe.html

Signed-off-by: Song Liu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Remove !dir in lock irq usage check  (Yuyang Du; 1 file, -1/+1)
In mark_lock_irq(), the following checks are performed:

     ----------------------------------
    |   ->      | unsafe | read unsafe |
    |----------------------------------|
    | safe      |  F  B  |   F*  B*    |
    |----------------------------------|
    | read safe |  F? B* |      -      |
     ----------------------------------

Where:
  F: check_usage_forwards
  B: check_usage_backwards
  *: check enabled by STRICT_READ_CHECKS
  ?: check enabled by the !dir condition

From the checking point of view, the special F? case does not make sense, though it was perhaps made out of performance concerns. As a later patch will address this issue, remove this exception, which makes the checks consistent later on.

With STRICT_READ_CHECKS = 1, which is the default, there is no functional change.

Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Adjust new bit cases in mark_lock  (Yuyang Du; 1 file, -14/+7)
The new bit can be any possible lock usage except garbage, so the cases in the switch can be made simpler. Warn early on if a wrong usage bit is passed without taking locks. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Consolidate lock usage bit initialization  (Yuyang Du; 1 file, -8/+14)
Lock usage bit initialization is consolidated into one function mark_usage(). Trivial readability improvement. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Check redundant dependency only when CONFIG_LOCKDEP_SMALL  (Yuyang Du; 1 file, -0/+4)
As Peter has put it all sound and complete for the cause, I simply quote:

"It (check_redundant) was added for cross-release (which has since been reverted) which would generate a lot of redundant links (IIRC) but having it makes the reports more convoluted -- basically, if we had an A-B-C relation, then A-C will not be added to the graph because it is already covered. This then means any report will include B, even though a shorter cycle might have been possible."

This would increase the number of direct dependencies. For a simple workload (make clean; reboot; make vmlinux -j8), the data looks like this:

    CONFIG_LOCKDEP_SMALL:  direct dependencies: 6926
    !CONFIG_LOCKDEP_SMALL: direct dependencies: 9052 (+30.7%)

Suggested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Refactorize check_noncircular and check_redundant  (Yuyang Du; 1 file, -44/+74)
These two functions now handle different check results themselves. A new check_path function is added to check whether there is a path in the dependency graph. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Remove unused argument in __lock_release  (Yuyang Du; 1 file, -2/+2)
The @nested argument is not used in __lock_release(), so remove it; note that it is not used in lock_release() in the first place either. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Remove redundant argument in check_deadlock  (Yuyang Du; 1 file, -3/+3)
In check_deadlock(), the third argument, read, can be derived from the second argument, hlock, so the read argument can be removed. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Update comments on dependency search  (Yuyang Du; 1 file, -11/+10)
The breadth-first search is now implemented as flat-out non-recursive, but the comments still describe it as recursive; update the comments accordingly. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Avoid constant checks in __bfs by using offset reference  (Yuyang Du; 1 file, -12/+21)
When searching for a dependency in the lock graph, there are constant checks for forward or backward search. Directly reference the field offset of the struct that differentiates the type of search to avoid those checks. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
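The resulting shape, roughly (a sketch closely following the patch; BFS error handling elided):

    static inline struct list_head *get_dep_list(struct lock_list *lock,
                                                 int offset)
    {
        void *lock_class = lock->class;

        return lock_class + offset;
    }

    /* A forward search walks class->locks_after, a backward search
     * class->locks_before; __bfs() receives the offset just once: */
    __bfs(src_entry, data, match, target_entry,
          offsetof(struct lock_class, locks_after));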
2019-06-03  locking/lockdep: Change the return type of __cq_dequeue()  (Yuyang Du; 1 file, -8/+13)
With the change, we can slightly adjust the code to iterate the queue in BFS search, which simplifies the code. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
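Roughly, the new signature lets the BFS loop drive the queue directly (a sketch that also assumes the element-type change of the next patch in this series):

    static inline struct lock_list *__cq_dequeue(struct circular_queue *cq)
    {
        struct lock_list *lock;

        if (__cq_empty(cq))
            return NULL;

        lock = cq->element[cq->front];
        cq->front = (cq->front + 1) & CQ_MASK;
        return lock;
    }

    /* The search then becomes a plain loop: */
    while ((lock = __cq_dequeue(cq))) {
        /* visit lock's dependencies ... */
    }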
2019-06-03  locking/lockdep: Change type of the element field in circular_queue  (Yuyang Du; 1 file, -10/+14)
The element field is an array in struct circular_queue to keep track of locks in the search. Making it the same type as the locks avoids type casts. Also fix a typo and elaborate the comment above struct circular_queue. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Update comment  (Yuyang Du; 1 file, -3/+9)
A leftover comment is removed. While at it, add more explanatory comments. Such a trivial patch! Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Remove unused argument in validate_chain() and check_deadlock()  (Yuyang Du; 1 file, -8/+8)
The lockdep_map argument in them is not used, remove it. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Change the range of class_idx in held_lock struct  (Yuyang Du; 1 file, -20/+39)
held_lock->class_idx is used to point to the class of the held lock. The index is shifted by 1 to make index 0 mean no class, which results in the class index shifting back and forth, but this is not worth doing. The reasons are: (1) there will be no "no-class" held_lock to begin with, and (2) index 0 seems to be used for error checking, but if something wrong indeed happened, the index can't be counted on to distinguish it, as that something won't set the class_idx to 0 on purpose to tell us it is wrong.

Therefore, change the index to start from 0. This saves a lot of back-and-forth shifts and a class slot back to lock_classes.

Since index 0 is now used for a lock class, we change the initial chain key to -1 to avoid a key collision, which is due to the fact that __jhash_mix(0, 0, 0) = 0. Actually, the initial chain key can be any arbitrary value other than 0.

In addition, a bitmap is maintained to keep track of the used lock classes, and we check the validity of the held lock against that bitmap.

Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Define INITIAL_CHAIN_KEY for chain keys to start with  (Yuyang Du; 1 file, -9/+9)
Chain keys are computed using Jenkins hash function, which needs an initial hash to start with. Dedicate a macro to make this clear and configurable. A later patch changes this initial chain key. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
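The macro simply names the seed of the iterated Jenkins hash so it can be changed in one place (sketch; the follow-up patch in this series changes the value to -1):

    #define INITIAL_CHAIN_KEY 0

    /* Rebuilding a task's chain key now starts from the named seed
     * instead of a bare literal: */
    chain_key = INITIAL_CHAIN_KEY;
    for (i = 0; i < curr->lockdep_depth; i++) {
        hlock = curr->held_locks + i;
        chain_key = iterate_chain_key(chain_key, hlock->class_idx);
    }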
2019-06-03  locking/lockdep: Use lockdep_init_task for task initiation consistently  (Yuyang Du; 2 files, -6/+8)
Despite the fact that there is a lockdep_init_task() which does nothing, lockdep initializes tasks by assigning lockdep fields, and it does so inconsistently. Fix this by using lockdep_init_task(). Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Print the right depth for chain key collision  (Yuyang Du; 1 file, -3/+4)
Since chains are separated by IRQ context, when printing a chain the depth should be consistent with it. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Remove useless conditional macro  (Yuyang Du; 1 file, -3/+3)
Since defined(CONFIG_PROVE_LOCKING) is used in the scope of #ifdef CONFIG_PROVE_LOCKING, it can be removed. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Adjust lock usage bit character checks  (Yuyang Du; 1 file, -5/+16)
The lock usage bit characters are defined and determined with tricks. Add some explanation to make it a bit clearer, then adjust the logic to check the usage, which optimizes the code a bit. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  locking/lockdep: Change all print_*() return type to void  (Yuyang Du; 1 file, -101/+108)
Since none of the print_*() functions' return values is necessary, change their return type to void. No functional change.

In cases where an invariable return value is used, this change slightly improves readability, i.e.:

    print_x();
    return 0;

is definitely better than:

    return print_x(); /* where print_x() always returns 0 */

Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  Merge tag 'v5.2-rc3' into locking/core, to pick up fixes  (Ingo Molnar; 44 files, -329/+213)
Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  sched/fair: Remove sgs->sum_weighted_load  (Dietmar Eggemann; 1 file, -3/+1)
Since sg_lb_stats::sum_weighted_load is now identical with sg_lb_stats::group_load remove it and replace its use case (calculating load per task) with the latter. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Vincent Guittot <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Patrick Bellasi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  sched/core: Remove sd->*_idx  (Dietmar Eggemann; 2 files, -25/+10)
The sched domain per rq load index files also disappear from the /proc/sys/kernel/sched_domain/cpuX/domainY directories. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Patrick Bellasi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  sched/core: Remove rq->cpu_load[]  (Dietmar Eggemann; 3 files, -12/+1)
The per rq load array values also disappear from the cpu#X sections in /proc/sched_debug. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Patrick Bellasi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  sched/debug: Remove sd->*_idx range on sysctl  (Dietmar Eggemann; 1 file, -23/+14)
This reverts:

    commit 201c373e8e48 ("sched/debug: Limit sd->*_idx range on sysctl")

Load indexes (sd->*_idx) are no longer needed without rq->cpu_load[]. The range check for load indexes can be removed as well. Get rid of it before the rq->cpu_load[] removal, since it uses CPU_LOAD_IDX_MAX.

At the same time, fix the following coding style issues detected by scripts/checkpatch.pl:

    ERROR: space prohibited before that ','
    ERROR: space prohibited before that close parenthesis ')'

Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Patrick Bellasi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  sched/fair: Replace source_load() & target_load() with weighted_cpuload()  (Dietmar Eggemann; 2 files, -87/+4)
With LB_BIAS disabled, source_load() & target_load() return weighted_cpuload(). Replace both with calls to weighted_cpuload(). The function to obtain the load index (sd->*_idx) for an sd, get_sd_load_idx(), can be removed as well. Finally, get rid of the sched feature LB_BIAS. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Patrick Bellasi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
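Conceptually, the two helpers had collapsed into the same thing once LB_BIAS was disabled (sketch, not the exact hunk):

    /* With the cpu_load[] min/max games compiled out, both biased
     * helpers return the plain weighted load, so callers can use
     * weighted_cpuload(cpu_rq(cpu)) directly. */
    static unsigned long source_load(int cpu, int type)
    {
        return weighted_cpuload(cpu_rq(cpu));
    }

    static unsigned long target_load(int cpu, int type)
    {
        return weighted_cpuload(cpu_rq(cpu));
    }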
2019-06-03  sched/fair: Remove the rq->cpu_load[] update code  (Dietmar Eggemann; 4 files, -264/+0)
With LB_BIAS disabled, there is no need to update the rq->cpu_load[idx] any more. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Patrick Bellasi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  sched/fair: Remove rq->load  (Dietmar Eggemann; 3 files, -9/+2)
The CFS class is the only one maintaining and using the CPU wide load (rq->load(.weight)). The last use case of the CPU wide load in CFS's set_next_entity() can be replaced by using the load of the CFS class (rq->cfs.load(.weight)) instead. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03  sched/core: Provide a pointer to the valid CPU mask  (Sebastian Andrzej Siewior; 9 files, -48/+50)
In commit:

    4b53a3412d66 ("sched/core: Remove the tsk_nr_cpus_allowed() wrapper")

the tsk_nr_cpus_allowed() wrapper was removed. There was not much difference in !RT, but in RT we used this to implement migrate_disable(). Within a migrate_disable() section the CPU mask is restricted to a single CPU, while the "normal" CPU mask remains untouched.

As an alternative implementation, Ingo suggested to use:

    struct task_struct {
        const cpumask_t *cpus_ptr;
        cpumask_t       cpus_mask;
    };

with

    t->cpus_ptr = &t->cpus_mask;

In -RT we then can switch the cpus_ptr to:

    t->cpus_ptr = &cpumask_of(task_cpu(p));

in a migration disabled region. The rules are simple (see the sketch after this entry):

  - Code that 'uses' ->cpus_allowed would use the pointer.
  - Code that 'modifies' ->cpus_allowed would use the direct mask.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
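As an illustration of the 'uses' vs 'modifies' rules above (hypothetical helpers, not part of the patch):

    /* 'uses' the mask: always via the pointer, so a migrate-disabled
     * RT task transparently sees the single-CPU mask. */
    static bool task_allowed_on(struct task_struct *p, int cpu)
    {
        return cpumask_test_cpu(cpu, p->cpus_ptr);
    }

    /* 'modifies' the mask: always the underlying storage. */
    static void task_set_allowed(struct task_struct *p,
                                 const struct cpumask *new_mask)
    {
        cpumask_copy(&p->cpus_mask, new_mask);
    }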
2019-06-03  PM: sleep: Add kerneldoc comments to some functions  (Rafael J. Wysocki; 1 file, -0/+6)
Add kerneldoc comments to pm_suspend_via_firmware(), pm_resume_via_firmware() and pm_suspend_via_s2idle() to explain what they do. Signed-off-by: Rafael J. Wysocki <[email protected]>
2019-06-02  Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 2 files, -16/+52)
Pull perf fixes from Ingo Molnar: "On the kernel side there's a bunch of ring-buffer ordering fixes for a reproducible bug, plus a PEBS constraints regression fix. Plus tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    tools headers UAPI: Sync kvm.h headers with the kernel sources
    perf record: Fix s390 missing module symbol and warning for non-root users
    perf machine: Read also the end of the kernel
    perf test vmlinux-kallsyms: Ignore aliases to _etext when searching on kallsyms
    perf session: Add missing swap ops for namespace events
    perf namespace: Protect reading thread's namespace
    tools headers UAPI: Sync drm/drm.h with the kernel
    tools headers UAPI: Sync drm/i915_drm.h with the kernel
    tools headers UAPI: Sync linux/fs.h with the kernel
    tools headers UAPI: Sync linux/sched.h with the kernel
    tools arch x86: Sync asm/cpufeatures.h with the with the kernel
    tools include UAPI: Update copy of files related to new fspick, fsmount, fsconfig, fsopen, move_mount and open_tree syscalls
    perf arm64: Fix mksyscalltbl when system kernel headers are ahead of the kernel
    perf data: Fix 'strncat may truncate' build failure with recent gcc
    perf/ring-buffer: Use regular variables for nesting
    perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data
    perf/ring_buffer: Add ordering to rb->nest increment
    perf/ring_buffer: Fix exposing a temporarily decreased data_head
    perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints
2019-06-02  Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -1/+1)
Pull stacktrace fix from Ingo Molnar: "Fix a stack_trace_save_tsk_reliable() regression"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    stacktrace: Unbreak stack_trace_save_tsk_reliable()
2019-06-02  Merge branch 'akpm' (patches from Andrew)  (Linus Torvalds; 4 files, -31/+51)
Merge misc fixes from Andrew Morton: "Various fixes and followups"

* emailed patches from Andrew Morton <[email protected]>:
    mm, compaction: make sure we isolate a valid PFN
    include/linux/generic-radix-tree.h: fix kerneldoc comment
    kernel/signal.c: trace_signal_deliver when signal_group_exit
    drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used
    spdxcheck.py: fix directory structures
    kasan: initialize tag to 0xff in __kasan_kmalloc
    z3fold: fix sheduling while atomic
    scripts/gdb: fix invocation when CONFIG_COMMON_CLK is not set
    mm/gup: continue VM_FAULT_RETRY processing even for pre-faults
    ocfs2: fix error path kobject memory leak
    memcg: make it work on sparse non-0-node systems
    mm, memcg: consider subtrees in memory.events
    prctl_set_mm: downgrade mmap_sem to read lock
    prctl_set_mm: refactor checks from validate_prctl_map
    kernel/fork.c: make max_threads symbol static
    arch/arm/boot/compressed/decompress.c: fix build error due to lz4 changes
    arch/parisc/configs/c8000_defconfig: remove obsoleted CONFIG_DEBUG_SLAB_LEAK
    mm/vmalloc.c: fix typo in comment
    lib/sort.c: fix kernel-doc notation warnings
    mm: fix Documentation/vm/hmm.rst Sphinx warnings
2019-06-01  kernel/signal.c: trace_signal_deliver when signal_group_exit  (Zhenliang Wei; 1 file, -0/+2)
In the commit named in the Fixes tag, removing SIGKILL from each thread's signal mask and executing "goto fatal" directly will skip the call to "trace_signal_deliver". At this point, the delivery tracking of the SIGKILL signal will be inaccurate. Therefore, we need to add trace_signal_deliver before "goto fatal", after executing sigdelset.

Note: SEND_SIG_NOINFO matches the fact that SIGKILL doesn't have any info.

Link: http://lkml.kernel.org/r/[email protected] Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT") Signed-off-by: Zhenliang Wei <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Ivan Delalande <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Deepa Dinamani <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
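The fix amounts to adding the tracepoint on the fatal path in get_signal(); condensed, the patched hunk reads roughly:

    if (signal_group_exit(signal)) {
        ksig->info.si_signo = signr = SIGKILL;
        sigdelset(&current->pending.signal, SIGKILL);
        /* Record delivery before the early exit; SEND_SIG_NOINFO
         * because SIGKILL carries no siginfo. */
        trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
                             &sighand->action[SIGKILL - 1]);
        recalc_sigpending();
        goto fatal;
    }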