2014-04-03  printk: fix one circular lockdep warning about console_lock  (Jane Li; 1 file, -0/+2)
Fix a warning about a possible circular locking dependency. If one performs the following sequence: enter suspend -> resume -> plug out CPUx (echo 0 > cpux/online), lockdep shows a warning like this:

======================================================
[ INFO: possible circular locking dependency detected ]
3.10.0 #2 Tainted: G O
-------------------------------------------------------
sh/1271 is trying to acquire lock:
 (console_lock){+.+.+.}, at: console_cpu_notify+0x20/0x2c
but task is already holding lock:
 (cpu_hotplug.lock){+.+.+.}, at: cpu_hotplug_begin+0x2c/0x58
which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (cpu_hotplug.lock){+.+.+.}:
     lock_acquire+0x98/0x12c
     mutex_lock_nested+0x50/0x3d8
     cpu_hotplug_begin+0x2c/0x58
     _cpu_up+0x24/0x154
     cpu_up+0x64/0x84
     smp_init+0x9c/0xd4
     kernel_init_freeable+0x78/0x1c8
     kernel_init+0x8/0xe4
     ret_from_fork+0x14/0x2c

-> #1 (cpu_add_remove_lock){+.+.+.}:
     lock_acquire+0x98/0x12c
     mutex_lock_nested+0x50/0x3d8
     disable_nonboot_cpus+0x8/0xe8
     suspend_devices_and_enter+0x214/0x448
     pm_suspend+0x1e4/0x284
     try_to_suspend+0xa4/0xbc
     process_one_work+0x1c4/0x4fc
     worker_thread+0x138/0x37c
     kthread+0xa4/0xb0
     ret_from_fork+0x14/0x2c

-> #0 (console_lock){+.+.+.}:
     __lock_acquire+0x1b38/0x1b80
     lock_acquire+0x98/0x12c
     console_lock+0x54/0x68
     console_cpu_notify+0x20/0x2c
     notifier_call_chain+0x44/0x84
     __cpu_notify+0x2c/0x48
     cpu_notify_nofail+0x8/0x14
     _cpu_down+0xf4/0x258
     cpu_down+0x24/0x40
     store_online+0x30/0x74
     dev_attr_store+0x18/0x24
     sysfs_write_file+0x16c/0x19c
     vfs_write+0xb4/0x190
     SyS_write+0x3c/0x70
     ret_fast_syscall+0x0/0x48

Chain exists of:
  console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(cpu_hotplug.lock);
                               lock(cpu_add_remove_lock);
                               lock(cpu_hotplug.lock);
  lock(console_lock);

 *** DEADLOCK ***

Three locks are involved, in two sequences:

a) pm suspend:
   console_lock (@suspend_console())
   cpu_add_remove_lock (@disable_nonboot_cpus())
   cpu_hotplug.lock (@_cpu_down())

b) plug out CPUx:
   cpu_add_remove_lock (@cpu_down())
   cpu_hotplug.lock (@_cpu_down())
   console_lock (@console_cpu_notify())

=> lockdep prints the warning above.

There should be no real deadlock, as the console_suspended flag protects against it. Although console_suspend() releases console_sem, it doesn't tell lockdep about it. That results in the lockdep warning about circular locking when doing the following: enter suspend -> resume -> plug out CPUx (echo 0 > cpux/online). Fix the problem by telling lockdep we actually released the semaphore in console_suspend() and acquired it again in console_resume(). Signed-off-by: Jane Li <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
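The two added lines are lockdep annotations in the console suspend/resume path; a minimal sketch of the idea (function and variable names as found in kernel/printk/printk.c of that era; exact placement may differ from the actual patch):

  /* Sketch: tell lockdep the console semaphore is handed over across
   * suspend/resume even though console_unlock() is never called in between. */
  void suspend_console(void)
  {
      if (!console_suspend_enabled)
          return;
      console_lock();
      console_suspended = 1;
      up(&console_sem);
      /* added: we really did release the lock, lockdep just can't see it */
      mutex_release(&console_lock_dep_map, 1, _RET_IP_);
  }

  void resume_console(void)
  {
      if (!console_suspend_enabled)
          return;
      /* added: mirror annotation for taking it back */
      mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_);
      down(&console_sem);
      console_suspended = 0;
      console_unlock();
  }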
2014-04-03  include/linux/printk.h: remove double asmlinkage in printk_emit  (Simon Kågström; 1 file, -3/+3)
The double asmlinkage was introduced in commit 7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer"). Signed-off-by: Simon Kagstrom <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  printk: do not compute the size of the message twice  (Petr Mladek; 1 file, -1/+1)
This is just a tiny optimization. It removes duplicate computation of the message size. Signed-off-by: Petr Mladek <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jan Kara <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Kay Sievers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  printk: use also the last bytes in the ring buffer  (Petr Mladek; 1 file, -2/+2)
It seems that we have never used the last byte in the ring buffer. In fact, we have never used the last 4 bytes because of padding. The first problem is in the check for free space: the exact number of free bytes is enough to store the length of data. The second problem is in the check that decides when the ring buffer is rotated: the left side counts the first unused index, and since it is unused, it might be the same as the size of the buffer. Note that the first problem has to be fixed together with the second one. Otherwise, the buffer is rotated even when there is enough space at the end of the buffer. Then the beginning of the buffer is rewritten and valid entries get corrupted. Signed-off-by: Petr Mladek <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jan Kara <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Kay Sievers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
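A self-contained illustration of the two off-by-one comparisons described above (identifier names here are made up for the example, not the kernel's):

  #include <stdbool.h>
  #include <stddef.h>

  /* First check: exactly msg_size free bytes is enough to store the record. */
  static bool msg_fits(size_t free_bytes, size_t msg_size)
  {
      return free_bytes >= msg_size;        /* previously required free_bytes > msg_size */
  }

  /*
   * Second check: next_idx is the first *unused* index, so it may legitimately
   * equal buf_len.  Rotate back to the start of the buffer only when the
   * record would actually run past the end.
   */
  static bool must_rotate(size_t next_idx, size_t msg_size, size_t buf_len)
  {
      return next_idx + msg_size > buf_len; /* previously also rotated on == */
  }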
2014-04-03  printk: add comment about tricky check for text buffer size  (Petr Mladek; 1 file, -0/+5)
There is no check for potential "text_len" overflow. It is not needed because only a valid log level is detected. It took me some time to understand why. It would deserve a comment ;-) Signed-off-by: Petr Mladek <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jan Kara <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Kay Sievers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  printk: remove obsolete check for log level "c"  (Petr Mladek; 1 file, -2/+0)
The kernel log level "c" was removed in commit 61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT"). It is no longer detected in printk_get_level(). Hence we do not need to check it in vprintk_emit. Signed-off-by: Petr Mladek <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jan Kara <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Kay Sievers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  printk: remove duplicated check for log level  (Petr Mladek; 1 file, -7/+3)
The check for the exact log level is already done in printk_get_level. We do not need to duplicate it in printk_skip_level. Signed-off-by: Petr Mladek <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jan Kara <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Kay Sievers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
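The result is roughly the following pair of helpers in include/linux/printk.h (sketched from memory, so details may differ slightly):

  static inline int printk_get_level(const char *buffer)
  {
      if (buffer[0] == KERN_SOH_ASCII && buffer[1]) {
          switch (buffer[1]) {
          case '0' ... '7':
          case 'd':   /* KERN_DEFAULT */
              return buffer[1];
          }
      }
      return 0;
  }

  static inline const char *printk_skip_level(const char *buffer)
  {
      /* printk_get_level() already validated the level character */
      if (printk_get_level(buffer))
          return buffer + 2;

      return buffer;
  }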
2014-04-03  vsprintf: remove %n handling  (Ryan Mallon; 1 file, -36/+9)
All in-kernel users of %n in format strings have now been removed and the %n directive is ignored. Remove the handling of %n so that it is treated the same as any other invalid format string directive. Keep a warning in place to deter new instances of %n in format strings. Signed-off-by: Ryan Mallon <[email protected]> Acked-by: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  kernel/resource.c: make reallocate_resource() static  (Daeseok Youn; 1 file, -1/+1)
sparse says: kernel/resource.c:518:5: warning: symbol 'reallocate_resource' was not declared. Should it be static? Signed-off-by: Daeseok Youn <[email protected]> Reviewed-by: Yasuaki Ishimatsu <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  kernel: audit/fix non-modular users of module_init in core code  (Paul Gortmaker; 6 files, -9/+7)
Code that is obj-y (always built-in) or dependent on a bool Kconfig (built-in or absent) can never be modular. So using module_init as an alias for __initcall can be somewhat misleading. Fix these up now, so that we can relocate module_init from init.h into module.h in the future. If we don't do this, we'd have to add module.h to obviously non-modular code, and that would be a worse thing. The audit targets the following module_init users for change: kernel/user.c obj-y kernel/kexec.c bool KEXEC (one instance per arch) kernel/profile.c bool PROFILING kernel/hung_task.c bool DETECT_HUNG_TASK kernel/sched/stats.c bool SCHEDSTATS kernel/user_namespace.c bool USER_NS Note that direct use of __initcall is discouraged, vs. one of the priority categorized subgroups. As __initcall gets mapped onto device_initcall, our use of subsys_initcall (which makes sense for these files) will thus change this registration from level 6-device to level 4-subsys (i.e. slightly earlier). However no observable impact of that difference has been observed during testing. Also, two instances of missing ";" at EOL are fixed in kexec. Signed-off-by: Paul Gortmaker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Eric Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
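The mechanical change, sketched for one of the audited files (kernel/user.c's uid_cache_init() is used as the example; its body is elided):

  /* kernel/user.c is always built in, so module_init() was misleading. */
  static int __init uid_cache_init(void)
  {
      /* ... allocate the uid cache and hash, as before ... */
      return 0;
  }

  /* before: module_init(uid_cache_init);  -- maps to device_initcall, level 6 */
  subsys_initcall(uid_cache_init);          /* now explicitly level 4 */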
2014-04-03  lib/syscall.c: unexport task_current_syscall()  (Andrew Morton; 1 file, -1/+0)
It is only used by procfs and procfs cannot be a module. Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  xattr: guard against simultaneous glibc header inclusion  (Serge Hallyn; 2 files, -0/+16)
If the glibc xattr.h header is included after the uapi header, compilation fails due to an enum re-using a #define from the uapi header. Protect against this by guarding the define and enum inclusions against each other. (See https://lists.debian.org/debian-glibc/2014/03/msg00029.html and https://sourceware.org/glibc/wiki/Synchronizing_Headers for more information.) Signed-off-by: Serge Hallyn <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Allan McRae <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
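A sketch of the mutual-guard pattern; the macro plumbing lives in include/uapi/linux/libc-compat.h, but the names below are reproduced from memory and may not match exactly:

  /* libc-compat layer: if glibc's <sys/xattr.h> got there first, back off. */
  #if defined(__GLIBC__) && defined(_SYS_XATTR_H)
  #define __UAPI_DEF_XATTR 0
  #else
  #define __UAPI_DEF_XATTR 1
  #endif

  /* include/uapi/linux/xattr.h: only emit the flags when we own them. */
  #if __UAPI_DEF_XATTR
  #define XATTR_CREATE  0x1   /* set value, fail if attr already exists */
  #define XATTR_REPLACE 0x2   /* set value, fail if attr does not exist */
  #endif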
2014-04-03  samples/seccomp/Makefile: do not build tests if cross-compiling for MIPS  (Markos Chandras; 1 file, -4/+10)
The Makefile is designed to use the host toolchain so it may be unsafe to build the tests if the kernel has been configured and built for another architecture. This fixes a build problem when the kernel has been configured and built for the MIPS architecture but the host is not MIPS (cross-compiled). The MIPS syscalls are only defined if one of the following is true:

1) _MIPS_SIM == _MIPS_SIM_ABI64
2) _MIPS_SIM == _MIPS_SIM_ABI32
3) _MIPS_SIM == _MIPS_SIM_NABI32

Of course, none of these make sense on a non-MIPS toolchain and the following build problem occurs when building on a non-MIPS host:

linux/usr/include/linux/kexec.h:50: userspace cannot reference function or variable defined in the kernel
samples/seccomp/bpf-direct.c: In function `emulator':
samples/seccomp/bpf-direct.c:76:17: error: `__NR_write' undeclared (first use in this function)

Signed-off-by: Markos Chandras <[email protected]> Reported-by: Paul Gortmaker <[email protected]> Cc: Ralf Baechle <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  err.h: use bool for IS_ERR and IS_ERR_OR_NULL  (Joe Perches; 1 file, -3/+4)
Use the more natural return of bool for these tests. No difference observed in .o files produced by gcc for x86. Remove the dentry description of kernel pointers left over from the 90's and 2002's cleanup move of parts of fs.h to err.h. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
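The resulting helpers look roughly like this (sketch of include/linux/err.h after the change):

  #define MAX_ERRNO       4095
  #define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)

  static inline bool __must_check IS_ERR(__force const void *ptr)
  {
      return IS_ERR_VALUE((unsigned long)ptr);
  }

  static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
  {
      return !ptr || IS_ERR_VALUE((unsigned long)ptr);
  }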
2014-04-03  SubmittingPatches: document the use of git  (Josh Triplett; 1 file, -15/+16)
Most of the mechanical portions of SubmittingPatches exist to help patch submitters replicate the output of git. Mention this explicitly, both as a reminder that git will help with this process, and as signposting to let git users know what they can safely skip. Signed-off-by: Josh Triplett <[email protected]> Acked-by: Borislav Petkov <[email protected]> Cc: Rob Landley <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  SubmittingPatches: add recommendation for mailing list references  (Josh Triplett; 1 file, -1/+9)
SubmittingPatches already mentions referencing bugs fixed by a commit, but doesn't mention citing relevant mailing list discussions. Add a note to that effect, along with a recommendation to use the https://lkml.kernel.org/ redirector. Portions based on text from git's SubmittingPatches. Signed-off-by: Josh Triplett <[email protected]> Acked-by: Borislav Petkov <[email protected]> Cc: Rob Landley <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  SubmittingPatches: add style recommendation to use imperative descriptions  (Josh Triplett; 1 file, -0/+5)
Most commit messages use this style, and the recommendation frequently comes up in discussions (especially in response to patches that don't use it), but that recommendation doesn't actually appear anywhere in Documentation. Add this style guideline to SubmittingPatches, using the description from git's SubmittingPatches. Signed-off-by: Josh Triplett <[email protected]> Acked-by: Borislav Petkov <[email protected]> Cc: Rob Landley <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  fs, kernel: permit disabling the uselib syscall  (Josh Triplett; 4 files, -1/+21)
uselib hasn't been used since libc5; glibc does not use it. Support turning it off. When disabled, also omit the load_elf_library implementation from binfmt_elf.c, which only uselib invokes.

bloat-o-meter:
add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785)
function              old     new   delta
padzero                39      36      -3
uselib_flags           20       -     -20
sys_uselib            168       -    -168
SyS_uselib            168       -    -168
load_elf_library      426       -    -426

The new CONFIG_USELIB defaults to `y'. Signed-off-by: Josh Triplett <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  kernel/groups.c: remove return value of set_groups  (Wang YanQing; 3 files, -17/+4)
After commit 6307f8fee295 ("security: remove dead hook task_setgroups"), set_groups will always return zero, so we can just remove the return value of set_groups. This patch reduces code size and simplifies code that uses set_groups, because we no longer need to check its return value. [[email protected]: remove obsolete claims from set_groups() comment] Signed-off-by: Wang YanQing <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Eric Paris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  sys_sysfs: Add CONFIG_SYSFS_SYSCALL  (Fabian Frederick; 3 files, -0/+13)
sys_sysfs is an obsolete system call no longer supported by libc.

- This patch adds a default CONFIG_SYSFS_SYSCALL=y
- The option can be turned off in expert mode.
- A cond_syscall entry is added to kernel/sys_ni.c

[[email protected]: tweak Kconfig help text] Signed-off-by: Fabian Frederick <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  include/linux/syscalls.h: add sys32_quotactl() prototype  (Rashika Kheria; 1 file, -0/+2)
This eliminates the following warning in quota/compat.c: fs/quota/compat.c:43:17: warning: no previous prototype for `sys32_quotactl' [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages  (Raghavendra K T; 1 file, -2/+2)
Currently max_sane_readahead() returns zero on a cpu whose NUMA node has no local memory, which leads to readahead failure. Fix this readahead failure by returning the minimum of (requested pages, 512). Users running applications that need readahead, such as streaming applications, on a memoryless cpu see a considerable boost in performance. Result: an fadvise experiment with FADV_WILLNEED on a PPC machine having a memoryless CPU with a 1GB testfile (12 iterations) yielded around 46.66% improvement. An fadvise experiment with FADV_WILLNEED on an x240 machine (32GB * 4G RAM NUMA machine) with a 1GB testfile (12 iterations) showed no impact on the normal NUMA cases with the patch:

Kernel    Avg     Stddev
base      7.4975  3.92%
patched   7.4174  3.26%

[Andrew: making return value PAGE_SIZE independent] Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Jan Kara <[email protected]> Cc: Wu Fengguang <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
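What the fix roughly boils down to in mm/readahead.c (sketch; the 512-page cap is written PAGE_CACHE_SIZE-independently, as the note above says):

  #define MAX_READAHEAD   ((512 * 4096) / PAGE_CACHE_SIZE)

  /*
   * Previously this was derived from the local node's free + inactive file
   * pages, which is zero on a memoryless node.  Now it is a plain cap.
   */
  unsigned long max_sane_readahead(unsigned long nr)
  {
      return min_t(unsigned long, nr, MAX_READAHEAD);
  }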
2014-04-03  slub: do not drop slab_mutex for sysfs_slab_add  (Vladimir Davydov; 1 file, -5/+3)
We release the slab_mutex while calling sysfs_slab_add from __kmem_cache_create since commit 66c4c35c6bc5 ("slub: Do not hold slub_lock when calling sysfs_slab_add()"), because kobject_uevent called by sysfs_slab_add might block waiting for the usermode helper to exec, which would result in a deadlock if we took the slab_mutex while executing it. However, apart from complicating synchronization rules, releasing the slab_mutex on kmem cache creation can result in a kmemcg-related race. The point is that we check if the memcg cache exists before going to __kmem_cache_create, but register the new cache in memcg subsys after it. Since we can drop the mutex there, several threads can see that the memcg cache does not exist and proceed to creating it, which is wrong. Fortunately, recently kobject_uevent was patched to call the usermode helper with the UMH_NO_WAIT flag, making the deadlock impossible. Therefore there is no point in releasing the slab_mutex while calling sysfs_slab_add, so let's simplify kmem_cache_create synchronization and fix the kmemcg-race mentioned above by holding the slab_mutex during the whole cache creation path. Signed-off-by: Vladimir Davydov <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Greg KH <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  kobject: don't block for each kobject_uevent  (Vladimir Davydov; 2 files, -6/+37)
Currently kobject_uevent has somewhat unpredictable semantics. The point is, since it may call a usermode helper and wait for it to execute (UMH_WAIT_EXEC), it is impossible to say for sure what lock dependencies it will introduce for the caller - strictly speaking it depends on what fs the binary is located on and the set of locks fork may take. There are quite a few kobject_uevent's users that do not take this into account and call it with various mutexes taken, e.g. rtnl_mutex, net_mutex, which might potentially lead to a deadlock. Since there is actually no reason to wait for the usermode helper to execute there, let's make kobject_uevent start the helper asynchronously with the aid of the UMH_NO_WAIT flag. Personally, I'm interested in this, because I really want kobject_uevent to be called under the slab_mutex in the slub implementation as it used to be some time ago, because it greatly simplifies synchronization and automatically fixes a kmemcg-related race. However, there was a deadlock detected on an attempt to call kobject_uevent under the slab_mutex (see https://lkml.org/lkml/2012/1/14/45), which was reported to be fixed by releasing the slab_mutex for kobject_uevent. Unfortunately, there was no information about who exactly blocked on the slab_mutex causing the usermode helper to stall, neither have I managed to find this out or reproduce the issue. BTW, this is not the first attempt to make kobject_uevent use UMH_NO_WAIT. Previous one was made by commit f520360d93cd ("kobject: don't block for each kobject_uevent"), but it was wrong (it passed arguments allocated on stack to async thread) so it was reverted in 05f54c13cd0c ("Revert "kobject: don't block for each kobject_uevent"."). It targeted on speeding up the boot process though. Signed-off-by: Vladimir Davydov <[email protected]> Cc: Greg KH <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  drop_caches: add some documentation and info message  (Dave Hansen; 5 files, -10/+47)
There is plenty of anecdotal evidence and a load of blog posts suggesting that using "drop_caches" periodically keeps your system running in "tip top shape". Perhaps adding some kernel documentation will increase the amount of accurate data on its use. If we are not shrinking caches effectively, then we have real bugs. Using drop_caches will simply mask the bugs and make them harder to find, but certainly does not fix them, nor is it an appropriate "workaround" to limit the size of the caches. On the contrary, there have been bug reports on issues that turned out to be misguided use of cache dropping. Dropping caches is a very drastic and disruptive operation that is good for debugging and running tests, but if it creates bug reports from production use, kernel developers should be aware of its use. Add a bit more documentation about it, a syslog message to track down abusers, and vmstat drop counters to help analyze problem reports. [[email protected]: checkpatch fixes] [[email protected]: add runtime suppression control] Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Acked-by: KOSAKI Motohiro <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm: remove read_cache_page_async()  (Sasha Levin; 4 files, -54/+25)
This patch removes read_cache_page_async(), which wasn't really needed anywhere, and simplifies the code around it a bit. read_cache_page_async() is useful when we want to read a page into the cache without waiting for it to complete. This happens when the appropriate callback 'filler' doesn't complete its read operation and releases the page lock immediately, and instead queues a different completion routine to do that. This never actually happened anywhere in the code. read_cache_page_async() had 3 different callers:

- read_cache_page(), which is the sync version; it would just wait for the requested read to complete using wait_on_page_read().
- JFFS2 would call it from jffs2_gc_fetch_page(), but the filler function it supplied doesn't do any async reads and would complete before the filler function returns - making it actually a sync read.
- CRAMFS would call it using the read_mapping_page_async() wrapper, with a similar story to JFFS2 - the filler function doesn't do anything resembling an async read and would always complete before the filler function returns.

To sum it up, the code in mm/filemap.c never took advantage of having read_cache_page_async(). While there are filler callbacks that do async reads (such as the block one), we always called them through read_cache_page(). This patch adds a mandatory wait for the read to complete when adding a new page to the cache, and removes read_cache_page_async() and its wrappers. Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm, thp: drop do_huge_pmd_wp_zero_page_fallback()  (Kirill A. Shutemov; 1 file, -77/+2)
I've realized that there's no need for do_huge_pmd_wp_zero_page_fallback(). We can just split zero page with split_huge_page_pmd() and return VM_FAULT_FALLBACK. handle_pte_fault() will handle write-protection fault for us. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm: consolidate code to setup pte  (Kirill A. Shutemov; 1 file, -36/+30)
Extract and consolidate code to setup pte from do_read_fault(), do_cow_fault() and do_shared_fault(). Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm: consolidate code to call vm_ops->page_mkwrite()  (Kirill A. Shutemov; 1 file, -60/+45)
There are two functions which need to call vm_ops->page_mkwrite(): do_shared_fault() and do_wp_page(). We can consolidate preparation code. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm: introduce do_shared_fault() and drop do_fault()  (Kirill A. Shutemov; 1 file, -164/+62)
Introduce do_shared_fault(). The function does what do_fault() does for write faults to shared mappings. Unlike do_fault(), do_shared_fault() is relatively clean and straightforward. The old do_fault() is not needed anymore. Let it die. [[email protected]: fix NULL pointer dereference] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Bob Liu <[email protected]> Cc: Sasha Levin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm: introduce do_cow_fault()  (Kirill A. Shutemov; 1 file, -0/+62)
Introduce do_cow_fault(). The function does what do_fault() does for write page faults to private mappings. Unlike do_fault(), do_cow_fault() is relatively clean and straightforward. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm: introduce do_read_fault()  (Kirill A. Shutemov; 1 file, -0/+43)
Introduce do_read_fault(). The function does what do_fault() does for read page faults. Unlike do_fault(), do_read_fault() is pretty clean and straightforward. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm: do_fault(): extract to call vm_ops->fault() to separate function  (Kirill A. Shutemov; 1 file, -31/+45)
Extract the code that calls vm_ops->fault() and the basic error handling into a separate function. The code will be reused. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm: rename __do_fault() -> do_fault()  (Kirill A. Shutemov; 1 file, -5/+5)
Current __do_fault() is awful and unmaintainable. These patches try to sort it out by splitting __do_fault() into three distinct codepaths: - to handle read page faults; - to handle write page faults to private mappings; - to handle write page faults to shared mappings. I also found a page refcount leak in the PageHWPoison() path of __do_fault(). This patch (of 7): do_fault() is unused: no reason for underscores. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  include/linux/mm.h: remove ifdef condition  (Rashika Kheria; 1 file, -2/+0)
The ifdef conditions in include/linux/mm.h presents three cases: - !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) There is no actual definition of function but include/linux/mm.h has a static inline stub defined. - defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) linux/mm.h does not define a prototype, but mm/page_alloc.c defines the function. Hence, compiler reports the following warning: mm/page_alloc.c:4300:15: warning: no previous prototype for `__early_pfn_to_nid' [-Wmissing-prototypes] - defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) The architecture defines the function, and linux/mm.h has a prototype. Thus, join the conditions of Case 2 and 3 ie eliminate the ifdef condition of CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID to eliminate the missing prototype warning from file mm/page_alloc.c. Signed-off-by: Rashika Kheria <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm/nobootmem.c: mark function as static  (Rashika Kheria; 1 file, -1/+1)
Mark function as static in nobootmem.c because it is not used outside this file. This eliminates the following warning in mm/nobootmem.c: mm/nobootmem.c:324:15: warning: no previous prototype for `___alloc_bootmem_node' [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm/page_cgroup.c: mark functions as static  (Rashika Kheria; 1 file, -6/+6)
Mark functions as static in page_cgroup.c because they are not used outside this file. This eliminates the following warning in mm/page_cgroup.c: mm/page_cgroup.c:177:6: warning: no previous prototype for `__free_page_cgroup' [-Wmissing-prototypes] mm/page_cgroup.c:190:15: warning: no previous prototype for `online_page_cgroup' [-Wmissing-prototypes] mm/page_cgroup.c:225:15: warning: no previous prototype for `offline_page_cgroup' [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm/process_vm_access.c: mark function as static  (Rashika Kheria; 1 file, -1/+1)
Mark function as static in process_vm_access.c because it is not used outside this file. This eliminates the following warning in mm/process_vm_access.c: mm/process_vm_access.c:416:1: warning: no previous prototype for `compat_process_vm_rw' [-Wmissing-prototypes] [[email protected]: remove unneeded asmlinkage - compat_process_vm_rw isn't referenced from asm] Signed-off-by: Rashika Kheria <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm/mmap.c: mark function as static  (Rashika Kheria; 1 file, -1/+1)
Mark function as static in mmap.c because they are not used outside this file. This eliminates the following warning in mm/mmap.c: mm/mmap.c:407:6: warning: no previous prototype for `validate_mm' [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm/memory.c: mark functions as static  (Rashika Kheria; 1 file, -2/+2)
Mark functions as static in memory.c because they are not used outside this file. This eliminates the following warnings in mm/memory.c: mm/memory.c:3530:5: warning: no previous prototype for `numa_migrate_prep' [-Wmissing-prototypes] mm/memory.c:3545:5: warning: no previous prototype for `do_numa_page' [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm/compaction.c: mark function as static  (Rashika Kheria; 1 file, -1/+1)
Mark function as static in compaction.c because it is not used outside this file. This eliminates the following warning from mm/compaction.c: mm/compaction.c:1190:9: warning: no previous prototype for `sysfs_compact_node' [-Wmissing-prototypes Signed-off-by: Rashika Kheria <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm, compaction: avoid isolating pinned pages  (David Rientjes; 1 file, -0/+9)
Page migration will fail for memory that is pinned in memory with, for example, get_user_pages(). In this case, it is unnecessary to take zone->lru_lock or isolate the page and pass it to page migration, which will ultimately fail. This is a racy check; the page can still change from under us, but in that case we'll just fail later when attempting to move the page. This avoids very expensive memory compaction when faulting transparent hugepages after pinning a lot of memory with a Mellanox driver. On a 128GB machine, pinning ~120GB of memory, before this patch we see an enormous disparity in the number of page migration failures because of the pinning (from /proc/vmstat):

compact_pages_moved 8450
compact_pagemigrate_failed 15614415

0.05% of pages isolated are successfully migrated and explicitly triggering memory compaction takes 102 seconds. After the patch:

compact_pages_moved 9197
compact_pagemigrate_failed 7

99.9% of pages isolated are now successfully migrated in this configuration and memory compaction takes less than one second. Signed-off-by: David Rientjes <[email protected]> Acked-by: Hugh Dickins <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Greg Thelen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
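The added check in the migration scanner is approximately the following (racy by design, as explained above):

  /*
   * Migration will fail if an anonymous page is pinned in memory
   * (e.g. by get_user_pages), so avoid taking zone->lru_lock and
   * isolating it only to fail later.
   */
  if (!page_mapping(page) &&
      page_count(page) > page_mapcount(page))
      continue;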
2014-04-03  mm, hugetlb: mark some bootstrap functions as __init  (David Rientjes; 1 file, -2/+3)
Both prep_compound_huge_page() and prep_compound_gigantic_page() are only called at bootstrap and can be marked as __init. The __SetPageTail(page) in prep_compound_gigantic_page() happening before page->first_page is initialized is not concerning since this is bootstrap. Signed-off-by: David Rientjes <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: Joonsoo Kim <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm: keep page cache radix tree nodes in check  (Johannes Weiner; 10 files, -43/+359)
Previously, page cache radix tree nodes were freed after reclaim emptied out their page pointers. But now reclaim stores shadow entries in their place, which are only reclaimed when the inodes themselves are reclaimed. This is problematic for bigger files that are still in use after they have a significant amount of their cache reclaimed, without any of those pages actually refaulting. The shadow entries will just sit there and waste memory. In the worst case, the shadow entries will accumulate until the machine runs out of memory. To get this under control, the VM will track radix tree nodes exclusively containing shadow entries on a per-NUMA node list. Per-NUMA rather than global because we expect the radix tree nodes themselves to be allocated node-locally and we want to reduce cross-node references of otherwise independent cache workloads. A simple shrinker will then reclaim these nodes on memory pressure. A few things need to be stored in the radix tree node to implement the shadow node LRU and allow tree deletions coming from the list: 1. There is no index available that would describe the reverse path from the node up to the tree root, which is needed to perform a deletion. To solve this, encode in each node its offset inside the parent. This can be stored in the unused upper bits of the same member that stores the node's height at no extra space cost. 2. The number of shadow entries needs to be counted in addition to the regular entries, to quickly detect when the node is ready to go to the shadow node LRU list. The current entry count is an unsigned int but the maximum number of entries is 64, so a shadow counter can easily be stored in the unused upper bits. 3. Tree modification needs tree lock and tree root, which are located in the address space, so store an address_space backpointer in the node. The parent pointer of the node is in a union with the 2-word rcu_head, so the backpointer comes at no extra cost as well. 4. The node needs to be linked to an LRU list, which requires a list head inside the node. This does increase the size of the node, but it does not change the number of objects that fit into a slab page. [[email protected]: export the right function] Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Bob Liu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Luigi Semenzato <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Metin Doslu <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Ozgun Erdogan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Ryan Mallon <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
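Roughly the node layout that results from points 1-4 above (sketch of struct radix_tree_node from include/linux/radix-tree.h as of this series; comments map to the numbered points):

  struct radix_tree_node {
      unsigned int    path;   /* (1) offset in parent packed into the
                               *     unused upper bits of the height */
      unsigned int    count;  /* (2) entries, shadow count in upper bits */
      union {
          struct {
              struct radix_tree_node *parent;
              void *private_data;       /* (3) address_space backpointer */
          };
          struct rcu_head rcu_head;     /* shares space with the above */
      };
      struct list_head private_list;    /* (4) shadow-node LRU linkage */
      void __rcu      *slots[RADIX_TREE_MAP_SIZE];
      unsigned long   tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
  };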
2014-04-03  lib: radix_tree: tree node interface  (Johannes Weiner; 2 files, -115/+182)
Make struct radix_tree_node part of the public interface and provide API functions to create, look up, and delete whole nodes. Refactor the existing insert, look up, delete functions on top of these new node primitives. This will allow the VM to track and garbage collect page cache radix tree nodes. [[email protected]: return correct error code on insertion failure] Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Bob Liu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Luigi Semenzato <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Metin Doslu <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Ozgun Erdogan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Ryan Mallon <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm: thrash detection-based file cache sizing  (Johannes Weiner; 8 files, -23/+331)
The VM maintains cached filesystem pages on two types of lists. One list holds the pages recently faulted into the cache, the other list holds pages that have been referenced repeatedly on that first list. The idea is to prefer reclaiming young pages over those that have been shown to benefit from caching in the past. We call the recently used list the "inactive list" and the frequently used list the "active list". The previous approach to sizing these lists was ultimately not significantly better than a FIFO policy and still thrashed cache based on eviction speed, rather than actual demand for cache. This patch solves one half of the problem by decoupling the ability to detect working set changes from the inactive list size. By maintaining a history of recently evicted file pages it can detect frequently used pages with an arbitrarily small inactive list size, and subsequently apply pressure on the active list based on actual demand for cache, not just overall eviction speed. Every zone maintains a counter that tracks inactive list aging speed. When a page is evicted, a snapshot of this counter is stored in the now-empty page cache radix tree slot. On refault, the minimum access distance of the page can be assessed, to evaluate whether the page should be part of the active list or not. This fixes the VM's blindness towards working set changes in excess of the inactive list. And it's the foundation to further improve the protection ability and reduce the minimum inactive list size of 50%. Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: Bob Liu <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Luigi Semenzato <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Metin Doslu <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Ozgun Erdogan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Ryan Mallon <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
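A heavily condensed sketch of the mechanism; the real code lives in mm/workingset.c with different helper names and extra zone/node packing in the shadow entry, and the inactive_age counter is the per-zone field this series introduces:

  /* Eviction: snapshot the zone's aging counter into the emptied slot. */
  static void *pack_shadow(struct zone *zone)
  {
      unsigned long eviction = atomic_long_read(&zone->inactive_age);

      return (void *)((eviction << RADIX_TREE_EXCEPTIONAL_SHIFT) |
                      RADIX_TREE_EXCEPTIONAL_ENTRY);
  }

  /* Refault: the counter delta approximates how much bigger the inactive
   * list would have had to be for this page to survive in cache. */
  static bool refault_should_activate(struct zone *zone, void *shadow)
  {
      unsigned long eviction = (unsigned long)shadow >>
                                      RADIX_TREE_EXCEPTIONAL_SHIFT;
      unsigned long distance = atomic_long_read(&zone->inactive_age) - eviction;

      return distance <= zone_page_state(zone, NR_ACTIVE_FILE);
  }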
2014-04-03  mm + fs: store shadow entries in page cache  (Johannes Weiner; 50 files, -65/+147)
Reclaim will be leaving shadow entries in the page cache radix tree upon evicting the real page. As those pages are found from the LRU, an iput() can lead to the inode being freed concurrently. At this point, reclaim must no longer install shadow pages because the inode freeing code needs to ensure the page tree is really empty. Add an address_space flag, AS_EXITING, that the inode freeing code sets under the tree lock before doing the final truncate. Reclaim will check for this flag before installing shadow pages. Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Bob Liu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Luigi Semenzato <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Metin Doslu <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Ozgun Erdogan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Ryan Mallon <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
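The flag is wrapped in small helpers along these lines (sketch of include/linux/pagemap.h): the inode freeing path sets it under the tree lock before the final truncate, and reclaim checks it before planting a shadow entry.

  static inline void mapping_set_exiting(struct address_space *mapping)
  {
      set_bit(AS_EXITING, &mapping->flags);
  }

  static inline int mapping_exiting(struct address_space *mapping)
  {
      return test_bit(AS_EXITING, &mapping->flags);
  }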
2014-04-03  mm + fs: prepare for non-page entries in page cache radix trees  (Johannes Weiner; 11 files, -130/+349)
shmem mappings already contain exceptional entries where swap slot information is remembered. To be able to store eviction information for regular page cache, prepare every site dealing with the radix trees directly to handle entries other than pages. The common lookup functions will filter out non-page entries and return NULL for page cache holes, just as before. But provide a raw version of the API which returns non-page entries as well, and switch shmem over to use it. Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Bob Liu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Luigi Semenzato <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Metin Doslu <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Ozgun Erdogan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Ryan Mallon <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
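The filtering in the common lookups boils down to something like this (sketch; find_get_entry() is the raw variant this series introduces):

  struct page *find_get_page(struct address_space *mapping, pgoff_t offset)
  {
      struct page *page = find_get_entry(mapping, offset);

      /* Shadow and shmem swap entries are "exceptional" radix tree entries;
       * ordinary callers keep seeing a plain page cache hole. */
      if (radix_tree_exceptional_entry(page))
          page = NULL;
      return page;
  }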
2014-04-03  mm: filemap: move radix tree hole searching here  (Johannes Weiner; 6 files, -82/+84)
The radix tree hole searching code is only used for page cache, for example by the readahead code trying to get a picture of the area surrounding a fault. It sufficed to rely on the radix tree definition of holes, which is "empty tree slot". This is about to change, though, as shadow page descriptors will be stored in the page cache after the actual pages get evicted from memory. Move the functions over to mm/filemap.c and make them native page cache operations, where they can later be adapted to handle the new definition of "page cache hole". Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Bob Liu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Luigi Semenzato <[email protected]> Cc: Metin Doslu <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Ozgun Erdogan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Ryan Mallon <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-03  mm: shmem: save one radix tree lookup when truncating swapped pages  (Johannes Weiner; 1 file, -13/+12)
Page cache radix tree slots are usually stabilized by the page lock, but shmem's swap cookies have no such thing. Because the overall truncation loop is lockless, the swap entry is currently confirmed by a tree lookup and then deleted by another tree lookup under the same tree lock region. Use radix_tree_delete_item() instead, which does the verification and deletion with only one lookup. This also allows removing the delete-only special case from shmem_radix_tree_replace(). Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Bob Liu <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Luigi Semenzato <[email protected]> Cc: Metin Doslu <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Ozgun Erdogan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Ryan Mallon <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
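radix_tree_delete_item() only removes the slot if it still holds the expected entry, so verification and deletion collapse into one lookup; a sketch of the shmem swap-cookie case:

  /* Free a swap cookie at @index, verifying and deleting in one tree lookup. */
  static int shmem_free_swap(struct address_space *mapping,
                             pgoff_t index, void *radswap)
  {
      void *old;

      spin_lock_irq(&mapping->tree_lock);
      old = radix_tree_delete_item(&mapping->page_tree, index, radswap);
      spin_unlock_irq(&mapping->tree_lock);
      if (old != radswap)
          return -ENOENT;   /* somebody else changed the slot under us */
      free_swap_and_cache(radix_to_swp_entry(radswap));
      return 0;
  }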