blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2016-01-16	lib/vsprintf.c: help gcc make number() smaller	Rasmus Villemoes	1	-12/+14
	One consequence of the reorganization of struct printf_spec to make field_width 24 bits was that number() gained about 180 bytes. Since spec is never passed to other functions, we can help gcc make number() lose most of that extra weight by using local variables for the field width and precision. Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Al Viro <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kees Cook <[email protected]> Cc: Maurizio Lombardi <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	lib/vsprintf.c: expand field_width to 24 bits	Rasmus Villemoes	1	-20/+22
	Maurizio Lombardi reported a problem [1] with the %pb extension: It doesn't work for sufficiently large bitmaps, since the size is stashed in the field_width field of the struct printf_spec, which is currently an s16. Concretely, this manifested itself in /sys/bus/pseudo/drivers/scsi_debug/map being empty, since the bitmap printer got a size of 0, which is the 16 bit truncation of the actual bitmap size. We do want to keep struct printf_spec at 8 bytes so that it can cheaply be passed by value. The qualifier field is only used for internal bookkeeping in format_decode, so we might as well use a local variable for that. This gives us an additional 8 bits, which we can then use for the field width. To stay in 8 bytes, we need to do a little rearranging and make the type member a bitfield as well. For consistency, change all the members to bit fields. gcc doesn't generate much worse code with these changes (in fact, bloat-o-meter says we save 300 bytes - which I think is a little surprising). I didn't find a BUILD_BUG/compiletime_assertion/... which would work outside function context, so for now I just open-coded it. [1] http://thread.gmane.org/gmane.linux.kernel/2034835 [[email protected]: avoid open-coded BUILD_BUG_ON] Signed-off-by: Rasmus Villemoes <[email protected]> Reported-by: Maurizio Lombardi <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Al Viro <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	lib/vsprintf.c: eliminate potential race in string()	Rasmus Villemoes	1	-19/+9
	If the string corresponding to a %s specifier can change under us, we might end up copying a \0 byte to the output buffer. There might be callers who expect the output buffer to contain a genuine C string whose length is exactly the snprintf return value (assuming truncation hasn't happened or has been checked for). We can avoid this by only passing over the source string once, stopping the first time we meet a nul byte (or when we reach the given precision), and then letting widen_string() handle left/right space padding. As a small bonus, this code reuse also makes the generated code slightly smaller. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Al Viro <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kees Cook <[email protected]> Cc: Maurizio Lombardi <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	lib/vsprintf.c: move string() below widen_string()	Rasmus Villemoes	1	-31/+31
	This is pure code movement, making sure the widen_string() helper is defined before the string() function. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Al Viro <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kees Cook <[email protected]> Cc: Maurizio Lombardi <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	lib/vsprintf.c: pull out padding code from dentry_name()	Rasmus Villemoes	1	-15/+31
	Pull out the logic in dentry_name() which handles field width space padding, in preparation for reusing it from string(). Rename the widen() helper to move_right(), since it is used for handling the !(flags & LEFT) case. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Al Viro <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kees Cook <[email protected]> Cc: Maurizio Lombardi <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	printk: do cond_resched() between lines while outputting to consoles	Tejun Heo	3	-3/+36
	@console_may_schedule tracks whether console_sem was acquired through lock or trylock. If the former, we're inside a sleepable context and console_conditional_schedule() performs cond_resched(). This allows console drivers which use console_lock for synchronization to yield while performing time-consuming operations such as scrolling. However, the actual console outputting is performed while holding irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule before starting outputting lines. Also, only a few drivers call console_conditional_schedule() to begin with. This means that when a lot of lines need to be output by console_unlock(), for example on a console registration, the task doing console_unlock() may not yield for a long time on a non-preemptible kernel. If this happens with a slow console devices, for example a serial console, the outputting task may occupy the cpu for a very long time. Long enough to trigger softlockup and/or RCU stall warnings, which in turn pile more messages, sometimes enough to trigger the next cycle of warnings incapacitating the system. Fix it by making console_unlock() insert cond_resched() between lines if @console_may_schedule. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Calvin Owens <[email protected]> Acked-by: Jan Kara <[email protected]> Cc: Dave Jones <[email protected]> Cc: Kyle McMartin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	printk: only unregister boot consoles when necessary	Thierry Reding	1	-1/+25
	Boot consoles are typically replaced by proper consoles during the boot process. This can be problematic if the boot console data is part of the init section that is reclaimed late during boot. If the proper console does not register before this point in time, the boot console will need to be removed (so that the freed memory is not accessed), leaving the system without output for some time. There are various reasons why the proper console may not register early enough, such as deferred probe or the driver being a loadable module. If that happens, there is some amount of time where no console messages are visible to the user, which in turn can mean that they won't see crashes or other potentially useful information. To avoid this situation, only remove the boot console when it resides in the init section. Code exists to replace the boot console by the proper console when it is registered, keeping a seamless transition between the boot and proper consoles. Signed-off-by: Thierry Reding <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	asm/sections: add helpers to check for section data	Thierry Reding	1	-0/+65
	Add a helper to check if an object (given an address and a size) is part of a section (given beginning and end addresses). For convenience, also provide a helper that performs this check for __init data using the __init_begin and __init_end limits. Signed-off-by: Thierry Reding <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	kernel/stop_machine.c: remove CONFIG_SMP dependencies	Andrew Morton	1	-4/+0
	stop_machine.o is only built if CONFIG_SMP=y, so this ifdef always evaluates to true. [[email protected]: remove now-unneeded ifdef] Reported-by: Valentin Rothberg <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	uselib: default depending if libc5 was used	Riku Voipio	1	-1/+1
	uselib hasn't been used since libc5; glibc does not use it. Deprecate uselib a bit more, by making the default y only if libc5 was widely used on the plaform. This makes arm64 kernel built with defconfig slightly smaller bloat-o-meter: add/remove: 0/3 grow/shrink: 0/2 up/down: 0/-1390 (-1390) function old new delta kernel_config_data 18164 18162 -2 uselib_flags 20 - -20 padzero 216 192 -24 sys_uselib 380 - -380 load_elf_library 964 - -964 Signed-off-by: Riku Voipio <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	err.h: add (missing) unlikely() to IS_ERR_OR_NULL()	Viresh Kumar	1	-1/+1
	IS_ERR_VALUE() already contains it and so we need to add this only to the !ptr check. That will allow users of IS_ERR_OR_NULL(), to not add this compiler flag. Signed-off-by: Viresh Kumar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	Kconfig: remove HAVE_LATENCYTOP_SUPPORT	Will Deacon	12	-37/+0
	As illustrated by commit a3afe70b83fd ("[S390] latencytop s390 support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to advertise an implementation of save_stack_trace_tsk. However, as of 9212ddb5eada ("stacktrace: provide save_stack_trace_tsk() weak alias") a dummy implementation is provided if STACKTRACE=y. Given that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether. Signed-off-by: Will Deacon <[email protected]> Acked-by: Heiko Carstens <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Russell King <[email protected]> Cc: James Hogan <[email protected]> Cc: Michal Simek <[email protected]> Cc: Helge Deller <[email protected]> Acked-by: Michael Ellerman <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	include/linux/kdev_t.h: remove new_valid_dev()	Yaowei Bai	1	-5/+0
	As all new_valid_dev() checks have been removed it's time to drop new_valid_dev() itself. No functional change. Signed-off-by: Yaowei Bai <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	fs/stat.c: drop the last new_valid_dev check	Yaowei Bai	1	-1/+1
	New_valid_dev() always returns true, so that's unnecessary to perform new_valid_dev() checks in some filesystems. Most checks of new_valid_dev() have been removed so let's drop this last one and then we can remove new_valid_dev() from the source code. No functional change. Signed-off-by: Yaowei Bai <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	include/linux/kernel.h: change abs() macro so it uses consistent return type	Michal Nazarewicz	3	-24/+23
	Rewrite abs() so that its return type does not depend on the architecture and no unexpected type conversion happen inside of it. The only conversion is from unsigned to signed type. char is left as a return type but treated as a signed type regradless of it's actual signedness. With the old version, int arguments were promoted to long and depending on architecture a long argument might result in s64 or long return type (which may or may not be the same). This came after some back and forth with Nicolas. The current macro has different return type (for the same input type) depending on architecture which might be midly iritating. An alternative version would promote to int like so: #define abs(x) __abs_choose_expr(x, long long, \ __abs_choose_expr(x, long, \ __builtin_choose_expr( \ sizeof(x) <= sizeof(int), \ ({ int __x = (x); __x<0?-__x:__x; }), \ ((void)0)))) I have no preference but imagine Linus might. :] Nicolas argument against is that promoting to int causes iconsistent behaviour: int main(void) { unsigned short a = 0, b = 1, c = a - b; unsigned short d = abs(a - b); unsigned short e = abs(c); printf("%u %u\n", d, e); // prints: 1 65535 } Then again, no sane person expects consistent behaviour from C integer arithmetic. ;) Note: __builtin_types_compatible_p(unsigned char, char) is always false, and __builtin_types_compatible_p(signed char, char) is also always false. Signed-off-by: Michal Nazarewicz <[email protected]> Reviewed-by: Nicolas Pitre <[email protected]> Cc: Srinivas Pandruvada <[email protected]> Cc: Wey-Yi Guy <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16	include/linux/poison.h: use POISON_POINTER_DELTA for poison pointers	Vasily Kulikov	1	-2/+2
	TIMER_ENTRY_STATIC and TAIL_MAPPING are defined as poison pointers which should point to nowhere. Redefine them using POISON_POINTER_DELTA arithmetics to make sure they really point to non-mappable area declared by the target architecture. Signed-off-by: Vasily Kulikov <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Solar Designer <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	memcg: only free spare array when readers are done	Martijn Coenen	1	-5/+6
	A spare array holding mem cgroup threshold events is kept around to make sure we can always safely deregister an event and have an array to store the new set of events in. In the scenario where we're going from 1 to 0 registered events, the pointer to the primary array containing 1 event is copied to the spare slot, and then the spare slot is freed because no events are left. However, it is freed before calling synchronize_rcu(), which means readers may still be accessing threshold->primary after it is freed. Fixed by only freeing after synchronize_rcu(). Signed-off-by: Martijn Coenen <[email protected]> Cc: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm: soft-offline: exit with failure for non anonymous thp	Naoya Horiguchi	1	-8/+8
	Currently memory_failure() doesn't handle non anonymous thp case, because we can hardly expect the error handling to be successful, and it can just hit some corner case which results in BUG_ON or something severe like that. This is also the case for soft offline code, so let's make it in the same way. Orignal code has a MF_COUNT_INCREASED check before put_hwpoison_page(), but it's unnecessary because get_any_page() is already called when running on this code, which takes a refcount of the target page regardress of the flag. So this patch also removes it. [[email protected]: fix build] Signed-off-by: Naoya Horiguchi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm: soft-offline: clean up soft_offline_page()	Naoya Horiguchi	1	-31/+47
	soft_offline_page() has some deeply indented code, that's the sign of demand for cleanup. So let's do this. No functionality change. Signed-off-by: Naoya Horiguchi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm/hugetlbfs: unmap pages if page fault raced with hole punch	Mike Kravetz	1	-69/+75
	Page faults can race with fallocate hole punch. If a page fault happens between the unmap and remove operations, the page is not removed and remains within the hole. This is not the desired behavior. The race is difficult to detect in user level code as even in the non-race case, a page within the hole could be faulted back in before fallocate returns. If userfaultfd is expanded to support hugetlbfs in the future, this race will be easier to observe. If this race is detected and a page is mapped, the remove operation (remove_inode_hugepages) will unmap the page before removing. The unmap within remove_inode_hugepages occurs with the hugetlb_fault_mutex held so that no other faults will be processed until the page is removed. The (unmodified) routine hugetlb_vmdelete_list was moved ahead of remove_inode_hugepages to satisfy the new reference. [[email protected]: move hugetlb_vmdelete_list()] Signed-off-by: Mike Kravetz <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	fs/hugetlbfs/inode.c: fix bugs in hugetlb_vmtruncate_list()	Mike Kravetz	1	-8/+11
	Hillf Danton noticed bugs in the hugetlb_vmtruncate_list routine. The argument end is of type pgoff_t. It was being converted to a vaddr offset and passed to unmap_hugepage_range. However, end was also being used as an argument to the vma_interval_tree_foreach controlling loop. In addition, the conversion of end to vaddr offset was incorrect. hugetlb_vmtruncate_list is called as part of a file truncate or fallocate hole punch operation. When truncating a hugetlbfs file, this bug could prevent some pages from being unmapped. This is possible if there are multiple vmas mapping the file, and there is a sufficiently sized hole between the mappings. The size of the hole between two vmas (A,B) must be such that the starting virtual address of B is greater than (ending virtual address of A << PAGE_SHIFT). In this case, the pages in B would not be unmapped. If pages are not properly unmapped during truncate, the following BUG is hit: kernel BUG at fs/hugetlbfs/inode.c:428! In the fallocate hole punch case, this bug could prevent pages from being unmapped as in the truncate case. However, for hole punch the result is that unmapped pages will not be removed during the operation. For hole punch, it is also possible that more pages than desired will be unmapped. This unnecessary unmapping will cause page faults to reestablish the mappings on subsequent page access. Fixes: 1bfad99ab (" hugetlbfs: hugetlb_vmtruncate_list() needs to take a range")Reported-by: Hillf Danton <[email protected]> Signed-off-by: Mike Kravetz <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dave Hansen <[email protected]> Cc: <[email protected]> [4.3] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm: make swapoff more robust against soft dirty	Hugh Dickins	1	-14/+4
	Both s390 and powerpc have hit the issue of swapoff hanging, when CONFIG_HAVE_ARCH_SOFT_DIRTY and CONFIG_MEM_SOFT_DIRTY ifdefs were not quite as x86_64 had them. I think it would be much clearer if HAVE_ARCH_SOFT_DIRTY was just a Kconfig option set by architectures to determine whether the MEM_SOFT_DIRTY option should be offered, and the actual code depend upon CONFIG_MEM_SOFT_DIRTY alone. But won't embark on that change myself: instead make swapoff more robust, by using pte_swp_clear_soft_dirty() on each pte it encounters, without an explicit #ifdef CONFIG_MEM_SOFT_DIRTY. That being a no-op, whether the bit in question is defined as 0 or the asm-generic fallback is used, unless soft dirty is fully turned on. Why "maybe" in maybe_same_pte()? Rename it pte_same_as_swp(). Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Acked-by: Cyrill Gorcunov <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Martin Schwidefsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm: fix locking order in mm_take_all_locks()	Kirill A. Shutemov	3	-21/+37
	Dmitry Vyukov has reported[1] possible deadlock (triggered by his syzkaller fuzzer): Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&hugetlbfs_i_mmap_rwsem_key); lock(&mapping->i_mmap_rwsem); lock(&hugetlbfs_i_mmap_rwsem_key); lock(&mapping->i_mmap_rwsem); Both traces points to mm_take_all_locks() as a source of the problem. It doesn't take care about ordering or hugetlbfs_i_mmap_rwsem_key (aka mapping->i_mmap_rwsem for hugetlb mapping) vs. i_mmap_rwsem. huge_pmd_share() does memory allocation under hugetlbfs_i_mmap_rwsem_key and allocator can take i_mmap_rwsem if it hit reclaim. So we need to take i_mmap_rwsem from all hugetlb VMAs before taking i_mmap_rwsem from rest of VMAs. The patch also documents locking order for hugetlbfs_i_mmap_rwsem_key. [1] http://lkml.kernel.org/r/CACT4Y+Zu95tBs-0EvdiAKzUOsb4tczRRfCRTpLr4bg_OP9HuVg@mail.gmail.com Signed-off-by: Kirill A. Shutemov <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm: mempolicy: skip non-migratable VMAs when setting MPOL_MF_LAZY	Liang Chen	1	-1/+2
	MPOL_MF_LAZY is not visible from userspace since a720094ded8c ("mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now"), but it should still skip non-migratable VMAs such as VM_IO, VM_PFNMAP, and VM_HUGETLB VMAs, and avoid useless overhead of minor faults. Signed-off-by: Liang Chen <[email protected]> Signed-off-by: Gavin Guo <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm/page_alloc.c: remove unused struct zone *z variable	Alexander Kuleshov	1	-2/+0
	Remove unused struct zone *z variable which appeared in 86051ca5eaf5 ("mm: fix usemap initialization"). Signed-off-by: Alexander Kuleshov <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm/mlock.c: change can_do_mlock return value type to boolean	Wang Xiaoqiang	2	-5/+5
	Since can_do_mlock only return 1 or 0, so make it boolean. No functional change. [[email protected]: update declaration in mm.h] Signed-off-by: Wang Xiaoqiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm/vmalloc.c: use macro IS_ALIGNED to judge the aligment	Wang Xiaoqiang	1	-2/+2
	Just cleanup, no functional change. Signed-off-by: Wang Xiaoqiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	cgroup, memcg, writeback: drop spurious rcu locking around ↵	Tejun Heo	2	-5/+0
	mem_cgroup_css_from_page() In earlier versions, mem_cgroup_css_from_page() could return non-root css on a legacy hierarchy which can go away and required rcu locking; however, the eventual version simply returns the root cgroup if memcg is on a legacy hierarchy and thus doesn't need rcu locking around or in it. Remove spurious rcu lockings. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm/page_isolation: do some cleanup in "undo_isolate_page_range"	Wang Xiaoqiang	1	-2/+4
	Use "IS_ALIGNED" to judge the alignment, rather than directly judging. Signed-off-by: Wang Xiaoqiang <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	memblock: fix section mismatch	Kirill A. Shutemov	1	-9/+9
	allmodconfig produces following warning for me: WARNING: vmlinux.o(.text.unlikely+0x10314): Section mismatch in reference from the function movable_node_is_enabled() to the variable .meminit.data:movable_node_enabled The function movable_node_is_enabled() references the variable __meminitdata movable_node_enabled. This is often because movable_node_is_enabled lacks a __meminitdata annotation or the annotation of movable_node_enabled is wrong. Let's mark the function with __meminit. It fixes the warning. Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	s390/mm: enable fixup_user_fault retrying	Dominik Dingel	1	-3/+26
	By passing a non-null flag we allow fixup_user_fault to retry, which enables userfaultfd. As during these retries we might drop the mmap_sem we need to check if that happened and redo the complete chain of actions. Signed-off-by: Dominik Dingel <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: "Jason J. Herne" <[email protected]> Cc: David Rientjes <[email protected]> Cc: Eric B Munson <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Dominik Dingel <[email protected]> Cc: Paolo Bonzini <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm: bring in additional flag for fixup_user_fault to signal unlock	Dominik Dingel	4	-11/+34
	During Jason's work with postcopy migration support for s390 a problem regarding gmap faults was discovered. The gmap code will call fixup_user_fault which will end up always in handle_mm_fault. Till now we never cared about retries, but as the userfaultfd code kind of relies on it. this needs some fix. This patchset does not take care of the futex code. I will now look closer at this. This patch (of 2): With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the faulting we ever unlocked mmap_sem. This patch brings in the logic to handle retries as well as it cleans up the current documentation. fixup_user_fault was not having the same semantics as filemap_fault. It never indicated if a retry happened and so a caller wasn't able to handle that case. So we now changed the behaviour to always retry a locked mmap_sem. Signed-off-by: Dominik Dingel <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: "Jason J. Herne" <[email protected]> Cc: David Rientjes <[email protected]> Cc: Eric B Munson <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Dominik Dingel <[email protected]> Cc: Paolo Bonzini <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	dax: re-enable dax pmd mappings	Dan Williams	2	-7/+4
	Now that the get_user_pages() path knows how to handle dax-pmd mappings, remove the protections that disabled dax-pmd support. Tests available from github.com/pmem/ndctl: make TESTS="lib/test-dax.sh lib/test-mmap.sh" check Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	dax: provide diagnostics for pmd mapping failures	Dan Williams	1	-8/+57
	There is a wide gamut of conditions that can trigger the dax pmd path to fallback to pte mappings. Ideally we'd have a syscall interface to determine mapping characteristics after the fact. In the meantime provide debug messages. Signed-off-by: Dan Williams <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm, x86: get_user_pages() for dax mappings	Dan Williams	8	-39/+212
	A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver has established a devm_memremap_pages() mapping, i.e. when the pfn_t return from ->direct_access() has PFN_DEV and PFN_MAP set. Later, when encountering _PAGE_DEVMAP during a page table walk we lookup and pin a struct dev_pagemap instance to keep the result of pfn_to_page() valid until put_page(). Signed-off-by: Dan Williams <[email protected]> Tested-by: Logan Gunthorpe <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd	Dan Williams	7	-27/+47
	A dax-huge-page mapping while it uses some thp helpers is ultimately not a transparent huge page. The distinction is especially important in the get_user_pages() path. pmd_devmap() is used to distinguish dax-pmds from pmd_huge() and pmd_trans_huge() which have slightly different semantics. Explicitly mark the pmd_trans_huge() helpers that dax needs by adding pmd_devmap() checks. [[email protected]: fix regression in handling mlocked pages in __split_huge_pmd()] Signed-off-by: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm, dax, pmem: introduce {get\|put}_dev_pagemap() for dax-gup	Dan Williams	6	-8/+125
	get_dev_page() enables paths like get_user_pages() to pin a dynamically mapped pfn-range (devm_memremap_pages()) while the resulting struct page objects are in use. Unlike get_page() it may fail if the device is, or is in the process of being, disabled. While the initial lookup of the range may be an expensive list walk, the result is cached to speed up subsequent lookups which are likely to be in the same mapped range. devm_memremap_pages() now requires a reference counter to be specified at init time. For pmem this means moving request_queue allocation into pmem_alloc() so the existing queue usage counter can track "device pages". ZONE_DEVICE pages always have an elevated count and will never be on an lru reclaim list. That space in 'struct page' can be redirected for other uses, but for safety introduce a poison value that will always trip __list_add() to assert. This allows half of the struct list_head storage to be reclaimed with some assurance to back up the assumption that the page count never goes to zero and a list_add() is never attempted. Signed-off-by: Dan Williams <[email protected]> Tested-by: Logan Gunthorpe <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Alexander Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	libnvdimm, pmem: move request_queue allocation earlier in probe	Dan Williams	1	-13/+20
	Before the dynamically allocated struct pages from devm_memremap_pages() can be put to use outside the driver, we need a mechanism to track whether they are still in use at teardown. Towards that goal reorder the initialization sequence to allow the 'q_usage_counter' from the request_queue to be used by the devm_memremap_pages() implementation (in subsequent patches). Signed-off-by: Dan Williams <[email protected]> Cc: Ross Zwisler <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm, dax: convert vmf_insert_pfn_pmd() to pfn_t	Dan Williams	8	-11/+30
	Similar to the conversion of vm_insert_mixed() use pfn_t in the vmf_insert_pfn_pmd() to tag the resulting pte with _PAGE_DEVICE when the pfn is backed by a devm_memremap_pages() mapping. Signed-off-by: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Alexander Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm, dax, gpu: convert vm_insert_mixed to pfn_t	Dan Williams	10	-14/+61
	Convert the raw unsigned long 'pfn' argument to pfn_t for the purpose of evaluating the PFN_MAP and PFN_DEV flags. When both are set it triggers _PAGE_DEVMAP to be set in the resulting pte. There are no functional changes to the gpu drivers as a result of this conversion. Signed-off-by: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Airlie <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	x86, mm: introduce _PAGE_DEVMAP	Dan Williams	1	-1/+6
	_PAGE_DEVMAP is a hardware-unused pte bit that will later be used in the get_user_pages() path to identify pfns backed by the dynamic allocation established by devm_memremap_pages. Upon seeing that bit the gup path will lookup and pin the allocation while the pages are in use. Since the _PAGE_DEVMAP bit is > 32 it must be cast to u64 instead of a pteval_t to allow pmd_flags() usage in the realmode boot code to build. Signed-off-by: Dan Williams <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	frv: fix compiler warning from definition of __pmd()	Dan Williams	1	-1/+1
	Take into account that the pmd_t type is a array inside a struct, so it needs two levels of brackets to initialize. Otherwise, a usage of __pmd generates a warning: include/linux/mm.h:986:2: warning: missing braces around initializer [-Wmissing-braces] Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	hugetlb: fix compile error on tile	Dan Williams	1	-0/+1
	Inlude asm/pgtable.h to get the definition for pud_t to fix: include/linux/hugetlb.h:203:29: error: unknown type name 'pud_t' Signed-off-by: Dan Williams <[email protected]> Cc: Liviu Dudau <[email protected]> Cc: Sudeep Holla <[email protected]> Cc: Lorenzo Pieralisi <[email protected]> Cc: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	avr32: convert to asm-generic/memory_model.h	Dan Williams	1	-4/+4
	Switch avr32/include/asm/page.h to use the common defintions for pfn_to_page(), page_to_pfn(), and ARCH_PFN_OFFSET. Signed-off-by: Dan Williams <[email protected]> Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	libnvdimm, pfn, pmem: allocate memmap array in persistent memory	Dan Williams	4	-4/+20
	Use the new vmem_altmap capability to enable the pmem driver to arrange for a struct page memmap to be established in persistent memory. [[email protected]: mn10300: declare __pfn_to_phys() to fix build error] Signed-off-by: Dan Williams <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Ross Zwisler <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	x86, mm: introduce vmem_altmap to augment vmemmap_populate()	Dan Williams	10	-42/+282
	In support of providing struct page for large persistent memory capacities, use struct vmem_altmap to change the default policy for allocating memory for the memmap array. The default vmemmap_populate() allocates page table storage area from the page allocator. Given persistent memory capacities relative to DRAM it may not be feasible to store the memmap in 'System Memory'. Instead vmem_altmap represents pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf() requests. Signed-off-by: Dan Williams <[email protected]> Reported-by: kbuild test robot <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm: introduce find_dev_pagemap()	Dan Williams	4	-24/+116
	There are several scenarios where we need to retrieve and update metadata associated with a given devm_memremap_pages() mapping, and the only lookup key available is a pfn in the range: 1/ We want to augment vmemmap_populate() (called via arch_add_memory()) to allocate memmap storage from pre-allocated pages reserved by the device driver. At vmemmap_alloc_block_buf() time it grabs device pages rather than page allocator pages. This is in support of devm_memremap_pages() mappings where the memmap is too large to fit in main memory (i.e. large persistent memory devices). 2/ Taking a reference against the mapping when inserting device pages into the address_space radix of a given inode. This facilitates unmap_mapping_range() and truncate_inode_pages() operations when the driver is tearing down the mapping. 3/ get_user_pages() operations on ZONE_DEVICE memory require taking a reference against the mapping so that the driver teardown path can revoke and drain usage of device pages. Signed-off-by: Dan Williams <[email protected]> Tested-by: Logan Gunthorpe <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Ross Zwisler <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm: skip memory block registration for ZONE_DEVICE	Dan Williams	2	-0/+25
	Prevent userspace from trying and failing to online ZONE_DEVICE pages which are meant to never be onlined. For example on platforms with a udev rule like the following: SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online" ...will generate futile attempts to online the ZONE_DEVICE sections. Example kernel messages: Built 1 zonelists in Node order, mobility grouping on. Total pages: 1004747 Policy zone: Normal online_pages [mem 0x248000000-0x24fffffff] failed Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	mm, dax, pmem: introduce pfn_t	Dan Williams	9	-23/+116
	For the purpose of communicating the optional presence of a 'struct page' for the pfn returned from ->direct_access(), introduce a type that encapsulates a page-frame-number plus flags. These flags contain the historical "page_link" encoding for a scatterlist entry, but can also denote "device memory". Where "device memory" is a set of pfns that are not part of the kernel's linear mapping by default, but are accessed via the same memory controller as ram. The motivation for this new type is large capacity persistent memory that needs struct page entries in the 'memmap' to support 3rd party DMA (i.e. O_DIRECT I/O with a persistent memory source/target). However, we also need it in support of maintaining a list of mapped inodes which need to be unmapped at driver teardown or freeze_bdev() time. Signed-off-by: Dan Williams <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ross Zwisler <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15	kvm: rename pfn_t to kvm_pfn_t	Dan Williams	23	-104/+110
	To date, we have implemented two I/O usage models for persistent memory, PMEM (a persistent "ram disk") and DAX (mmap persistent memory into userspace). This series adds a third, DAX-GUP, that allows DAX mappings to be the target of direct-i/o. It allows userspace to coordinate DMA/RDMA from/to persistent memory. The implementation leverages the ZONE_DEVICE mm-zone that went into 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned and dynamically mapped by a device driver. The pmem driver, after mapping a persistent memory range into the system memmap via devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus page-backed pmem-pfns via flags in the new pfn_t type. The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the resulting pte(s) inserted into the process page tables with a new _PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys off _PAGE_DEVMAP to pin the device hosting the page range active. Finally, get_page() and put_page() are modified to take references against the device driver established page mapping. Finally, this need for "struct page" for persistent memory requires memory capacity to store the memmap array. Given the memmap array for a large pool of persistent may exhaust available DRAM introduce a mechanism to allocate the memmap from persistent memory. The new "struct vmem_altmap *" parameter to devm_memremap_pages() enables arch_add_memory() to use reserved pmem capacity rather than the page allocator. This patch (of 18): The core has developed a need for a "pfn_t" type [1]. Move the existing pfn_t in KVM to kvm_pfn_t [2]. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html Signed-off-by: Dan Williams <[email protected]> Acked-by: Christoffer Dall <[email protected]> Cc: Paolo Bonzini <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>