aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-06-24mailmap: add Antoine Tenart's emailAntoine Tenart1-0/+1
I used "Antoine Ténart" at first but then moved to a name without accent as this cause some issues from time to time... Add my email in the mailmap file to have a consistent shortlog output. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Antoine Tenart <[email protected]> Cc: Antoine Tenart <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim maskMel Gorman1-1/+2
Commit d0164adc89f6 ("mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd") modified __GFP_WAIT to explicitly identify the difference between atomic callers and those that were unwilling to sleep. Later the definition was removed entirely. The GFP_RECLAIM_MASK is the set of flags that affect watermark checking and reclaim behaviour but __GFP_ATOMIC was never added. Without it, atomic users of the slab allocator strip the __GFP_ATOMIC flag and cannot access the page allocator atomic reserves. This patch addresses the problem. The user-visible impact depends on the workload but potentially atomic allocations unnecessarily fail without this path. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Reported-by: Marcin Wojtas <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: <[email protected]> [4.4+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24mm: mempool: kasan: don't poot mempool objects in quarantineAndrey Ryabinin3-15/+14
Currently we may put reserved by mempool elements into quarantine via kasan_kfree(). This is totally wrong since quarantine may really free these objects. So when mempool will try to use such element, use-after-free will happen. Or mempool may decide that it no longer need that element and double-free it. So don't put object into quarantine in kasan_kfree(), just poison it. Rename kasan_kfree() to kasan_poison_kfree() to respect that. Also, we shouldn't use kasan_slab_alloc()/kasan_krealloc() in kasan_unpoison_element() because those functions may update allocation stacktrace. This would be wrong for the most of the remove_element call sites. (The only call site where we may want to update alloc stacktrace is in mempool_alloc(). Kmemleak solves this by calling kmemleak_update_trace(), so we could make something like that too. But this is out of scope of this patch). Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andrey Ryabinin <[email protected]> Reported-by: Kuthonuzo Luruo <[email protected]> Acked-by: Alexander Potapenko <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Kostya Serebryany <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24MAINTAINERS: update Calgary IOMMUJon Mason1-3/+3
Update the contact info for Muli, clean-up my name, and update the mailing list to the IOMMU mailing list. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jon Mason <[email protected]> Cc: Muli Ben-Yehuda <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Bartlomiej Zolnierkiewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24jbd2: get rid of superfluous __GFP_REPEATMichal Hocko1-25/+7
jbd2_alloc is explicit about its allocation preferences wrt. the allocation size. Sub page allocations go to the slab allocator and larger are using either the page allocator or vmalloc. This is all good but the logic is unnecessarily complex. 1) as per Ted, the vmalloc fallback is a left-over: : jbd2_alloc is only passed in the bh->b_size, which can't be PAGE_SIZE, so : the code path that calls vmalloc() should never get called. When we : conveted jbd2_alloc() to suppor sub-page size allocations in commit : d2eecb039368, there was an assumption that it could be called with a size : greater than PAGE_SIZE, but that's certaily not true today. Moreover vmalloc allocation might even lead to a deadlock because the callers expect GFP_NOFS context while vmalloc is GFP_KERNEL. 2) __GFP_REPEAT for requests <= PAGE_ALLOC_COSTLY_ORDER is ignored since the flag was introduced. Let's simplify the code flow and use the slab allocator for sub-page requests and the page allocator for others. Even though order > 0 is not currently used as per above leave that option open. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24unicore32: get rid of superfluous __GFP_REPEATMichal Hocko1-1/+1
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. PGALLOC_GFP uses __GFP_REPEAT but it is only used in pte_alloc_one, pte_alloc_one_kernel which does order-0 request. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Cc: Guan Xuetao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24tile: get rid of superfluous __GFP_REPEATMichal Hocko1-1/+1
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. pgtable_alloc_one uses __GFP_REPEAT flag for L2_USER_PGTABLE_ORDER but the order is either 0 or 3 if L2_KERNEL_PGTABLE_SHIFT for HPAGE_SHIFT. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Chris Metcalf <[email protected]> [for tile] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24sh: get rid of superfluous __GFP_REPEATMichal Hocko1-1/+1
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. PGALLOC_GFP uses __GFP_REPEAT but {pgd,pmd}_alloc allocate from {pgd,pmd}_cache but both caches are allocating up to PAGE_SIZE objects. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24s390: get rid of superfluous __GFP_REPEATMichal Hocko1-1/+1
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. page_table_alloc then uses the flag for a single page allocation. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Heiko Carstens <[email protected]> Cc: Martin Schwidefsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24sparc: get rid of superfluous __GFP_REPEATMichal Hocko1-4/+2
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. {pud,pmd}_alloc_one is using __GFP_REPEAT but it always allocates from pgtable_cache which is initialzed to PAGE_SIZE objects. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24powerpc: get rid of superfluous __GFP_REPEATMichal Hocko3-11/+7
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. {pud,pmd}_alloc_one are allocating from {PGT,PUD}_CACHE initialized in pgtable_cache_init which doesn't have larger than sizeof(void *) << 12 size and that fits into !costly allocation request size. PGALLOC_GFP is used only in radix__pgd_alloc which uses either order-0 or order-4 requests. The first one doesn't need the flag while the second does. Drop __GFP_REPEAT from PGALLOC_GFP and add it for the order-4 one. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24score: get rid of superfluous __GFP_REPEATMichal Hocko1-3/+2
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. pte_alloc_one{_kernel} allocate PTE_ORDER which is 0. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Cc: Chen Liqin <[email protected]> Cc: Lennox Wu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24parisc: get rid of superfluous __GFP_REPEATMichal Hocko1-2/+1
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. pmd_alloc_one allocate PMD_ORDER which is 1. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24nios2: get rid of superfluous __GFP_REPEATMichal Hocko1-3/+2
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. pte_alloc_one{_kernel} allocate PTE_ORDER which is 0. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Cc: Ley Foon Tan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24mips: get rid of superfluous __GFP_REPEATMichal Hocko1-3/+3
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. pte_alloc_one{_kernel}, pmd_alloc_one allocate PTE_ORDER resp. PMD_ORDER but both are not larger than 1. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Cc: John Crispin <[email protected]> Cc: Ralf Baechle <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24arc: get rid of superfluous __GFP_REPEATMichal Hocko1-2/+2
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. pte_alloc_one_kernel uses __get_order_pte but this is obviously always zero because BITS_FOR_PTE is not larger than 9 yet the page size is always larger than 4K. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Vineet Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24arm64: get rid of superfluous __GFP_REPEATMichal Hocko1-1/+1
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. {pte,pmd,pud}_alloc_one{_kernel}, late_pgtable_alloc use PGALLOC_GFP for __get_free_page (aka order-0). pgd_alloc is slightly more complex because it allocates from pgd_cache if PGD_SIZE != PAGE_SIZE and PGD_SIZE depends on the configuration (CONFIG_ARM64_VA_BITS, PAGE_SHIFT and CONFIG_PGTABLE_LEVELS). As per config PGTABLE_LEVELS int default 2 if ARM64_16K_PAGES && ARM64_VA_BITS_36 default 2 if ARM64_64K_PAGES && ARM64_VA_BITS_42 default 3 if ARM64_64K_PAGES && ARM64_VA_BITS_48 default 3 if ARM64_4K_PAGES && ARM64_VA_BITS_39 default 3 if ARM64_16K_PAGES && ARM64_VA_BITS_47 default 4 if !ARM64_64K_PAGES && ARM64_VA_BITS_48 we should have the following options CONFIG_ARM64_VA_BITS:48 CONFIG_PGTABLE_LEVELS:4 PAGE_SIZE:4k size:4096 pages:1 CONFIG_ARM64_VA_BITS:48 CONFIG_PGTABLE_LEVELS:4 PAGE_SIZE:16k size:16 pages:1 CONFIG_ARM64_VA_BITS:48 CONFIG_PGTABLE_LEVELS:3 PAGE_SIZE:64k size:512 pages:1 CONFIG_ARM64_VA_BITS:47 CONFIG_PGTABLE_LEVELS:3 PAGE_SIZE:16k size:16384 pages:1 CONFIG_ARM64_VA_BITS:42 CONFIG_PGTABLE_LEVELS:2 PAGE_SIZE:64k size:65536 pages:1 CONFIG_ARM64_VA_BITS:39 CONFIG_PGTABLE_LEVELS:3 PAGE_SIZE:4k size:4096 pages:1 CONFIG_ARM64_VA_BITS:36 CONFIG_PGTABLE_LEVELS:2 PAGE_SIZE:16k size:16384 pages:1 All of them fit into a single page (aka order-0). This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Catalin Marinas <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24x86/efi: get rid of superfluous __GFP_REPEATMichal Hocko1-1/+1
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. efi_alloc_page_tables uses __GFP_REPEAT but it allocates an order-0 page. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Matt Fleming <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24x86: get rid of superfluous __GFP_REPEATMichal Hocko2-2/+2
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. PGALLOC_GFP uses __GFP_REPEAT but none of the allocation which uses this flag is for more than order-0. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Andy Lutomirski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24tree wide: get rid of __GFP_REPEAT for order-0 allocations part IMichal Hocko27-52/+47
This is the third version of the patchset previously sent [1]. I have basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get rid of superfluous gfp flags" which went through dm tree. I am sending it now because it is tree wide and chances for conflicts are reduced considerably when we want to target rc2. I plan to send the next step and rename the flag and move to a better semantic later during this release cycle so we will have a new semantic ready for 4.8 merge window hopefully. Motivation: While working on something unrelated I've checked the current usage of __GFP_REPEAT in the tree. It seems that a majority of the usage is and always has been bogus because __GFP_REPEAT has always been about costly high order allocations while we are using it for order-0 or very small orders very often. It seems that a big pile of them is just a copy&paste when a code has been adopted from one arch to another. I think it makes some sense to get rid of them because they are just making the semantic more unclear. Please note that GFP_REPEAT is documented as * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt * _might_ fail. This depends upon the particular VM implementation. while !costly requests have basically nofail semantic. So one could reasonably expect that order-0 request with __GFP_REPEAT will not loop for ever. This is not implemented right now though. I would like to move on with __GFP_REPEAT and define a better semantic for it. $ git grep __GFP_REPEAT origin/master | wc -l 111 $ git grep __GFP_REPEAT | wc -l 36 So we are down to the third after this patch series. The remaining places really seem to be relying on __GFP_REPEAT due to large allocation requests. This still needs some double checking which I will do later after all the simple ones are sorted out. I am touching a lot of arch specific code here and I hope I got it right but as a matter of fact I even didn't compile test for some archs as I do not have cross compiler for them. Patches should be quite trivial to review for stupid compile mistakes though. The tricky parts are usually hidden by macro definitions and thats where I would appreciate help from arch maintainers. [1] http://lkml.kernel.org/r/[email protected] This patch (of 19): __GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. Yet we have the full kernel tree with its usage for apparently order-0 allocations. This is really confusing because __GFP_REPEAT is explicitly documented to allow allocation failures which is a weaker semantic than the current order-0 has (basically nofail). Let's simply drop __GFP_REPEAT from those places. This would allow to identify place which really need allocator to retry harder and formulate a more specific semantic for what the flag is supposed to do actually. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Liqin <[email protected]> Cc: Chris Metcalf <[email protected]> [for tile] Cc: Guan Xuetao <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jan Kara <[email protected]> Cc: John Crispin <[email protected]> Cc: Lennox Wu <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matt Fleming <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24tmpfs: don't undo fallocate past its last pageAnthony Romano1-1/+1
When fallocate is interrupted it will undo a range that extends one byte past its range of allocated pages. This can corrupt an in-use page by zeroing out its first byte. Instead, undo using the inclusive byte range. Fixes: 1635f6a74152f1d ("tmpfs: undo fallocation on failure") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anthony Romano <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Brandon Philips <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24selftests/vm/compaction_test: fix write to restore nr_hugepagesMike Kravetz1-1/+1
The write at the end of the test to restore nr_hugepages to its previous value is failing. This is because it is trying to write the number of bytes in the char array as opposed to the number of bytes in the string. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Sri Jayaramappa <[email protected]> Cc: Eric B Munson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24oom_reaper: avoid pointless atomic_inc_not_zero usage.Tetsuo Handa1-7/+1
Since commit 36324a990cf5 ("oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space") changed to use find_lock_task_mm() for finding a mm_struct to reap, it is guaranteed that mm->mm_users > 0 because find_lock_task_mm() returns a task_struct with ->mm != NULL. Therefore, we can safely use atomic_inc(). Link: http://lkml.kernel.org/r/1465024759-8074-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero()Tetsuo Handa1-0/+1
Commit e2fe14564d33 ("oom_reaper: close race with exiting task") reduced frequency of needlessly selecting next OOM victim, but was calling mmput_async() when atomic_inc_not_zero() failed. Link: http://lkml.kernel.org/r/1464423365-5555-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24Merge tag 'nfsd-4.7-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds4-46/+48
Pull nfsd bugfixes from Bruce Fields: "Fix missing server-side permission checks on setting NFS ACLs" * tag 'nfsd-4.7-2' of git://linux-nfs.org/~bfields/linux: nfsd: check permissions when setting ACLs posix_acl: Add set_posix_acl
2016-06-24fix up initial thread stack pointer vs thread_info confusionLinus Torvalds2-1/+2
The INIT_TASK() initializer was similarly confused about the stack vs thread_info allocation that the allocators had, and that were fixed in commit b235beea9e99 ("Clarify naming of thread info/stack allocators"). The task ->stack pointer only incidentally ends up having the same value as the thread_info, and in fact that will change. So fix the initial task struct initializer to point to 'init_stack' instead of 'init_thread_info', and make sure the ia64 definition for that exists. This actually makes the ia64 tsk->stack pointer be sensible for the initial task, but not for any other task. As mentioned in commit b235beea9e99, that whole pointer isn't actually used on ia64, since task_stack_page() there just points to the (single) allocation. All the other architectures seem to have copied the 'init_stack' definition, even if it tended to be generally unusued. Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24USB: EHCI: declare hostpc register as zero-length arrayAlan Stern1-2/+2
The HOSTPC extension registers found in some EHCI implementations form a variable-length array, with one element for each port. Therefore the hostpc field in struct ehci_regs should be declared as a zero-length array, not a single-element array. This fixes a problem reported by UBSAN. Signed-off-by: Alan Stern <[email protected]> Reported-by: Wilfried Klaebe <[email protected]> Tested-by: Wilfried Klaebe <[email protected]> CC: <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-06-24Merge tag 'phy-for-4.7-rc5' of ↵Greg Kroah-Hartman4-23/+11
git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy into usb-linus Kishon writes: phy: for 4.7-rc5 *) Fix in sun4i-usb phy driver to properly handle the return value of gpiod_to_irq *) Fix a sparse warning in sun4i-usb phy driver *) Fix bcm-ns-usb2 phy driver to check the correct variable *) Fix spurious interrupts during VBUS change in rcar-gen3-usb2 phy driver Signed-off-by: Kishon Vijay Abraham I <[email protected]>
2016-06-24Merge tag 'usb-ci-v4.7-rc5' of ↵Greg Kroah-Hartman1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb into usb-linus Peter writes: One fix for module support in OTG FSM
2016-06-24x86: fix up a few misc stack pointer vs thread_info confusionsLinus Torvalds3-9/+6
As the actual pointer value is the same for the thread stack allocation and the thread_info, code that confused the two worked fine, but will break when the thread info is moved away from the stack allocation. It also looks very confusing. For example, the kprobe code wanted to know the current top of stack. To do that, it used this: (unsigned long)current_thread_info() + THREAD_SIZE which did indeed give the correct value. But it's not only a fairly nonsensical expression, it's also rather complex, especially since we actually have this: static inline unsigned long current_top_of_stack(void) which not only gives us the value we are interested in, but happens to be how "current_thread_info()" is currently defined as: (struct thread_info *)(current_top_of_stack() - THREAD_SIZE); so using current_thread_info() to figure out the top of the stack really is a very round-about thing to do. The other cases are just simpler confusion about task_thread_info() vs task_stack_page(), which currently return the same pointer - but if you want the stack page, you really should be using the latter one. And there was one entirely unused assignment of the current stack to a thread_info pointer. All cleaned up to make more sense today, and make it easier to move the thread_info away from the stack in the future. No semantic changes. Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24Clarify naming of thread info/stack allocatorsLinus Torvalds10-39/+41
We've had the thread info allocated together with the thread stack for most architectures for a long time (since the thread_info was split off from the task struct), but that is about to change. But the patches that move the thread info to be off-stack (and a part of the task struct instead) made it clear how confused the allocator and freeing functions are. Because the common case was that we share an allocation with the thread stack and the thread_info, the two pointers were identical. That identity then meant that we would have things like ti = alloc_thread_info_node(tsk, node); ... tsk->stack = ti; which certainly _worked_ (since stack and thread_info have the same value), but is rather confusing: why are we assigning a thread_info to the stack? And if we move the thread_info away, the "confusing" code just gets to be entirely bogus. So remove all this confusion, and make it clear that we are doing the stack allocation by renaming and clarifying the function names to be about the stack. The fact that the thread_info then shares the allocation is an implementation detail, and not really about the allocation itself. This is a pure renaming and type fix: we pass in the same pointer, it's just that we clarify what the pointer means. The ia64 code that actually only has one single allocation (for all of task_struct, thread_info and kernel thread stack) now looks a bit odd, but since "tsk->stack" is actually not even used there, that oddity doesn't matter. It would be a separate thing to clean that up, I intentionally left the ia64 changes as a pure brute-force renaming and type change. Acked-by: Andy Lutomirski <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24Merge branches 'pm-devfreq-fixes' and 'pm-cpufreq-fixes'Rafael J. Wysocki1-1/+1
* pm-devfreq-fixes: PM / devfreq: Send the DEVFREQ_POSTCHANGE notification when target() is failed PM / devfreq: fix initialization of current frequency in last status PM / devfreq: exynos-nocp: Remove incorrect IS_ERR() check PM / devfreq: remove double put_device PM / devfreq: fix double call put_device PM / devfreq: fix duplicated kfree on devfreq pointer PM / devfreq: devm_kzalloc to have dev pointer more precisely * pm-cpufreq-fixes: cpufreq: pcc-cpufreq: Fix doorbell.access_width
2016-06-24Merge branch 'acpica-fixes'Rafael J. Wysocki2-2/+9
* acpica-fixes: ACPICA: Namespace: Fix deadlock triggered by MLC support in dynamic table loading
2016-06-24File names with trailing period or space need special case conversionSteve French2-4/+31
POSIX allows files with trailing spaces or a trailing period but SMB3 does not, so convert these using the normal Services For Mac mapping as we do for other reserved characters such as : < > | ? * This is similar to what Macs do for the same problem over SMB3. CC: Stable <[email protected]> Signed-off-by: Steve French <[email protected]> Acked-by: Pavel Shilovsky <[email protected]>
2016-06-24Fix reconnect to not defer smb3 session reconnect long after socket reconnectSteve French2-1/+30
Azure server blocks clients that open a socket and don't do anything on it. In our reconnect scenarios, we can reconnect the tcp session and detect the socket is available but we defer the negprot and SMB3 session setup and tree connect reconnection until the next i/o is requested, but this looks suspicous to some servers who expect SMB3 negprog and session setup soon after a socket is created. In the echo thread, reconnect SMB3 sessions and tree connections that are disconnected. A later patch will replay persistent (and resilient) handle opens. CC: Stable <[email protected]> Signed-off-by: Steve French <[email protected]> Acked-by: Pavel Shilovsky <[email protected]>
2016-06-24nfsd: check permissions when setting ACLsBen Hutchings3-27/+25
Use set_posix_acl, which includes proper permission checks, instead of calling ->set_acl directly. Without this anyone may be able to grant themselves permissions to a file by setting the ACL. Lock the inode to make the new checks atomic with respect to set_acl. (Also, nfsd was the only caller of set_acl not locking the inode, so I suspect this may fix other races.) This also simplifies the code, and ensures our ACLs are checked by posix_acl_valid. The permission checks and the inode locking were lost with commit 4ac7249e, which changed nfsd to use the set_acl inode operation directly instead of going through xattr handlers. Reported-by: David Sinquin <[email protected]> [[email protected]: use set_posix_acl] Fixes: 4ac7249e Cc: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Cc: [email protected] Signed-off-by: J. Bruce Fields <[email protected]>
2016-06-24posix_acl: Add set_posix_aclAndreas Gruenbacher1-19/+23
Factor out part of posix_acl_xattr_set into a common function that takes a posix_acl, which nfsd can also call. The prototype already exists in include/linux/posix_acl.h. Signed-off-by: Andreas Gruenbacher <[email protected]> Cc: [email protected] Cc: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-06-24acpi, nfit: fix acpi_check_dsm() vs zero functions implementedDan Williams2-6/+6
QEMU 2.6 implements nascent support for nvdimm DSMs. Depending on configuration it may only implement the function0 dsm to indicate that no other DSMs are available. Commit 31eca76ba2fc "nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism" breaks QEMU, but QEMU is spec compliant. Per the spec the way to indicate that no functions are supported is: If Function Index is zero, the return is a buffer containing one bit for each function index, starting with zero. Bit 0 indicates whether there is support for any functions other than function 0 for the specified UUID and Revision ID. If set to zero, no functions are supported (other than function zero) for the specified UUID and Revision ID. Update the nfit driver to determine the family (interface UUID) without requiring the implementation to define any other functions, i.e. short-circuit acpi_check_dsm() to succeed per the spec. The nfit driver appears to be the only user passing funcs==0 to acpi_check_dsm(), so this behavior change of the common routine should be limited to the probing done by the nfit driver. Cc: Len Brown <[email protected]> Cc: Jerry Hoemann <[email protected]> Acked-by: "Rafael J. Wysocki" <[email protected]> Fixes: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism") Reported-by: Xiao Guangrong <[email protected]> Tested-by: Xiao Guangrong <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2016-06-24NFS: Fix an unused variable warningTrond Myklebust1-2/+0
Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-06-24NFS: Fix potential race in nfs_fhget()Trond Myklebust1-0/+1
If we don't set the mode correctly in nfs_init_locked(), then there is potential for a race with a second call to nfs_fhget that will cause inode aliasing. Signed-off-by: Trond Myklebust <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-06-24NFS: Don't let readdirplus revalidate an inode that was marked as staleTrond Myklebust1-1/+6
Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-06-24NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removedTrond Myklebust1-1/+3
According to RFC5661, section 12.5.3. the layout stateid is no longer valid once the client no longer holds any layout segments. Ensure that we mark it invalid. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-06-24NFS: Fix a double page unlockTrond Myklebust1-2/+2
Since commit 0bcbf039f6b2, nfs_readpage_release() has been used to unlock the page in the read code. Fixes: 0bcbf039f6b2 ("nfs: handle request add failure properly") Cc: [email protected] # v4.5+ Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-06-24pnfs_nfs: fix _cancel_empty_pagelistWeston Andros Adamson1-2/+10
pnfs_generic_commit_cancel_empty_pagelist calls nfs_commitdata_release, but that is wrong: nfs_commitdata_release puts the open context, something that isn't valid until nfs_init_commit is called, which is never the case when pnfs_generic_commit_cancel_empty_pagelist is called. This was introduced in "nfs: avoid race that crashes nfs_init_commit". Signed-off-by: Weston Andros Adamson <[email protected]> Cc: [email protected] Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-06-24nfs4: Fix potential use after free of state in nfs4_do_reclaim.Oleg Drokin1-1/+1
Commit e8d975e73e5f ("fixing infinite OPEN loop in 4.0 stateid recovery") introduced access to state after it was just potentially freed by nfs4_put_open_state leading to a random data corruption somewhere. BUG: unable to handle kernel paging request at ffff88004941ee40 IP: [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 PGD 3501067 PUD 3504067 PMD 6ff37067 PTE 800000004941e060 Oops: 0002 [#1] SMP DEBUG_PAGEALLOC Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq tpm_tis joydev i2c_piix4 pcspkr tpm virtio_console nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops floppy serio_raw virtio_blk drm CPU: 6 PID: 2161 Comm: 192.168.10.253- Not tainted 4.7.0-rc1-vm-nfs+ #112 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800463dcd00 ti: ffff88003ff48000 task.ti: ffff88003ff48000 RIP: 0010:[<ffffffff813baf01>] [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 RSP: 0018:ffff88003ff4bd68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffffff81a49900 RCX: 00000000000000e8 RDX: 00000000000000e8 RSI: ffff8800418b9930 RDI: ffff880040c96c88 RBP: ffff88003ff4bdf8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880040c96c98 R13: ffff88004941ee20 R14: ffff88004941ee40 R15: ffff88004941ee00 FS: 0000000000000000(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88004941ee40 CR3: 0000000060b0b000 CR4: 00000000000006e0 Stack: ffffffff813baad5 ffff8800463dcd00 ffff880000000001 ffffffff810e6b68 ffff880043ddbc88 ffff8800418b9800 ffff8800418b98c8 ffff88004941ee48 ffff880040c96c90 ffff880040c96c00 ffff880040c96c20 ffff880040c96c40 Call Trace: [<ffffffff813baad5>] ? nfs4_do_reclaim+0x35/0x740 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0 [<ffffffff813bb7cd>] nfs4_run_state_manager+0x5ed/0xa40 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740 [<ffffffff810af0d1>] kthread+0x101/0x120 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0 [<ffffffff818843af>] ret_from_fork+0x1f/0x40 [<ffffffff810aefd0>] ? kthread_create_on_node+0x250/0x250 Code: 65 80 4c 8b b5 78 ff ff ff e8 fc 88 4c 00 48 8b 7d 88 e8 13 67 d2 ff 49 8b 47 40 a8 02 0f 84 d3 01 00 00 4c 89 ff e8 7f f9 ff ff <f0> 41 80 26 7f 48 8b 7d c8 e8 b1 84 4c 00 e9 39 fd ff ff 3d e6 RIP [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740 RSP <ffff88003ff4bd68> CR2: ffff88004941ee40 Signed-off-by: Oleg Drokin <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-06-24NFS: Fix up O_DIRECT resultsTrond Myklebust1-3/+7
if we read or wrote something, we must report it Signed-off-by: Trond Myklebust <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-06-24NFS/pnfs: handle bad delegation stateids in nfs4_layoutget_handle_exceptionTrond Myklebust1-4/+4
We must call nfs4_handle_exception() on BAD_STATEID errors. The only exception is if the stateid argument turns out to be a layout stateid that is declared invalid. Signed-off-by: Trond Myklebust <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-06-24NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layoutTrond Myklebust1-0/+2
Signed-off-by: Trond Myklebust <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-06-24NFSv4.1/pnfs: Layout stateids start out as being invalidTrond Myklebust1-2/+2
Signed-off-by: Trond Myklebust <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-06-24NFSv4.1/pnfs: Ensure we handle delegation errors in nfs4_proc_layoutget()Trond Myklebust1-1/+4
nfs4_handle_exception() relies on the caller setting the 'inode' field in the struct nfs4_exception argument when the error applies to a delegation. Signed-off-by: Trond Myklebust <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>