aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-07-12mm/vmalloc.c: spelling> s/informaion/information/Geert Uytterhoeven1-2/+2
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Souptick Joarder <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/vmalloc.c: switch to WARN_ON() and move it under unlink_va()Uladzislau Rezki (Sony)1-15/+10
Trigger a warning if an object that is about to be freed is detached. We used to have a BUG_ON(), but even though it is considered as faulty behaviour that is not a good reason to break a system. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/vmalloc.c: get rid of one single unlink_va() when mergeUladzislau Rezki (Sony)1-6/+2
It does not make sense to try to "unlink" the node that is definitely not linked with a list nor tree. On the first merge step VA just points to the previously disconnected busy area. On the second step, check if the node has been merged and do "unlink" if so, because now it points to an object that must be linked. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Acked-by: Hillf Danton <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/vmalloc.c: preload a CPU with one object for split purposeUladzislau Rezki (Sony)1-4/+51
Refactor the NE_FIT_TYPE split case when it comes to an allocation of one extra object. We need it in order to build a remaining space. The preload is done per CPU in non-atomic context with GFP_KERNEL flags. More permissive parameters can be beneficial for systems which are suffer from high memory pressure or low memory condition. For example on my KVM system(4xCPUs, no swap, 256MB RAM) i can simulate the failure of page allocation with GFP_NOWAIT flags. Using "stress-ng" tool and starting N workers spinning on fork() and exit(), i can trigger below trace: <snip> [ 179.815161] stress-ng-fork: page allocation failure: order:0, mode:0x40800(GFP_NOWAIT|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0 [ 179.815168] CPU: 0 PID: 12612 Comm: stress-ng-fork Not tainted 5.2.0-rc3+ #1003 [ 179.815170] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 179.815171] Call Trace: [ 179.815178] dump_stack+0x5c/0x7b [ 179.815182] warn_alloc+0x108/0x190 [ 179.815187] __alloc_pages_slowpath+0xdc7/0xdf0 [ 179.815191] __alloc_pages_nodemask+0x2de/0x330 [ 179.815194] cache_grow_begin+0x77/0x420 [ 179.815197] fallback_alloc+0x161/0x200 [ 179.815200] kmem_cache_alloc+0x1c9/0x570 [ 179.815202] alloc_vmap_area+0x32c/0x990 [ 179.815206] __get_vm_area_node+0xb0/0x170 [ 179.815208] __vmalloc_node_range+0x6d/0x230 [ 179.815211] ? _do_fork+0xce/0x3d0 [ 179.815213] copy_process.part.46+0x850/0x1b90 [ 179.815215] ? _do_fork+0xce/0x3d0 [ 179.815219] _do_fork+0xce/0x3d0 [ 179.815226] ? __do_page_fault+0x2bf/0x4e0 [ 179.815229] do_syscall_64+0x55/0x130 [ 179.815231] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 179.815234] RIP: 0033:0x7fedec4c738b ... [ 179.815237] RSP: 002b:00007ffda469d730 EFLAGS: 00000246 ORIG_RAX: 0000000000000038 [ 179.815239] RAX: ffffffffffffffda RBX: 00007ffda469d730 RCX: 00007fedec4c738b [ 179.815240] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011 [ 179.815241] RBP: 00007ffda469d780 R08: 00007fededd6e300 R09: 00007ffda47f50a0 [ 179.815242] R10: 00007fededd6e5d0 R11: 0000000000000246 R12: 0000000000000000 [ 179.815243] R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000 [ 179.815245] Mem-Info: [ 179.815249] active_anon:12686 inactive_anon:14760 isolated_anon:0 active_file:502 inactive_file:61 isolated_file:70 unevictable:2 dirty:0 writeback:0 unstable:0 slab_reclaimable:2380 slab_unreclaimable:7520 mapped:15069 shmem:14813 pagetables:10833 bounce:0 free:1922 free_pcp:229 free_cma:0 <snip> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/vmalloc.c: remove "node" argumentUladzislau Rezki (Sony)1-2/+2
Patch series "Some cleanups for the KVA/vmalloc", v5. This patch (of 4): Remove unused argument from the __alloc_vmap_area() function. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/mmu_notifier: use hlist_add_head_rcu()Jean-Philippe Brucker1-1/+1
Make mmu_notifier_register() safer by issuing a memory barrier before registering a new notifier. This fixes a theoretical bug on weakly ordered CPUs. For example, take this simplified use of notifiers by a driver: my_struct->mn.ops = &my_ops; /* (1) */ mmu_notifier_register(&my_struct->mn, mm) ... hlist_add_head(&mn->hlist, &mm->mmu_notifiers); /* (2) */ ... Once mmu_notifier_register() releases the mm locks, another thread can invalidate a range: mmu_notifier_invalidate_range() ... hlist_for_each_entry_rcu(mn, &mm->mmu_notifiers, hlist) { if (mn->ops->invalidate_range) The read side relies on the data dependency between mn and ops to ensure that the pointer is properly initialized. But the write side doesn't have any dependency between (1) and (2), so they could be reordered and the readers could dereference an invalid mn->ops. mmu_notifier_register() does take all the mm locks before adding to the hlist, but those have acquire semantics which isn't sufficient. By calling hlist_add_head_rcu() instead of hlist_add_head() we update the hlist using a store-release, ensuring that readers see prior initialization of my_struct. This situation is better illustated by litmus test MP+onceassign+derefonce. Link: http://lkml.kernel.org/r/[email protected] Fixes: cddb8a5c14aa ("mmu-notifiers: core") Signed-off-by: Jean-Philippe Brucker <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/memory.c: fail when offset == num in first check of __vm_map_pages()Miguel Ojeda1-1/+1
If the caller asks us for offset == num, we should already fail in the first check, i.e. the one testing for offsets beyond the object. At the moment, we are failing on the second test anyway, since count cannot be 0. Still, to agree with the comment of the first test, we should first test it there. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Souptick Joarder <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Huang Ying <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/pgtable: drop pgtable_t variable from pte_fn_t functionsAnshuman Khandual13-31/+15
Drop the pgtable_t variable from all implementation for pte_fn_t as none of them use it. apply_to_pte_range() should stop computing it as well. Should help us save some cycles. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Matthew Wilcox <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Russell King <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Dan Williams <[email protected]> Cc: <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12unicore32: switch to generic version of pte allocationMike Rapoport1-28/+8
Replace __get_free_page() and alloc_pages() calls with the generic __pte_alloc_one_kernel() and __pte_alloc_one(). There is no functional change for the kernel PTE allocation. The difference for the user PTEs, is that the clear_pte_table() is now called after pgtable_page_ctor() and the addition of __GFP_ACCOUNT to the GFP flags. The pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Guo Ren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12um: switch to generic version of pte allocationMike Rapoport2-36/+2
um allocates PTE pages with __get_free_page() and uses GFP_KERNEL | __GFP_ZERO for the allocations. Switch it to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Anton Ivanov <[email protected]> Acked-by: Anton Ivanov <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12riscv: switch to generic version of pte allocationMike Rapoport1-27/+2
The only difference between the generic and RISC-V implementation of PTE allocation is the usage of __GFP_RETRY_MAYFAIL for both kernel and user PTEs and the absence of __GFP_ACCOUNT for the user PTEs. The conversion to the generic version removes the __GFP_RETRY_MAYFAIL and ensures that GFP_ACCOUNT is used for the user PTE allocations. The pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12parisc: switch to generic version of pte allocationMike Rapoport1-31/+2
parisc allocates PTE pages with __get_free_page() and uses GFP_KERNEL | __GFP_ZERO for the allocations. Switch it to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The pte_free_kernel() and pte_free() versions on are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12nios2: switch to generic version of pte allocationMike Rapoport1-35/+2
nios2 allocates kernel PTE pages with __get_free_pages(GFP_KERNEL | __GFP_ZERO, PTE_ORDER); and user page tables with pte = alloc_pages(GFP_KERNEL, PTE_ORDER); if (pte) clear_highpage(); The PTE_ORDER is hardwired to zero, which makes nios2 implementation almost identical to the generic one. Switch nios2 to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The pte_free_kernel() and pte_free() versions on nios2 are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12nds32: switch to generic version of pte allocationMike Rapoport1-27/+4
The nds32 implementation of pte_alloc_one_kernel() differs from the generic in the use of __GFP_RETRY_MAYFAIL flag, which is removed after the conversion. The nds32 version of pte_alloc_one() missed the call to pgtable_page_ctor() and also used __GFP_RETRY_MAYFAIL. Switching it to use generic __pte_alloc_one() for the PTE page allocation ensures that page table constructor is run and the user page tables are allocated with __GFP_ACCOUNT. The conversion to the generic version of pte_free_kernel() removes the NULL check for pte. The pte_free() version on nds32 is identical to the generic one and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mips: switch to generic version of pte allocationMike Rapoport1-31/+2
MIPS allocates kernel PTE pages with __get_free_pages(GFP_KERNEL | __GFP_ZERO, PTE_ORDER) and user PTE pages with pte = alloc_pages(GFP_KERNEL, PTE_ORDER) and then uses clear_highpage(pte) to zero out the allocated page for the user page tables. The PTE_ORDER is hardwired to zero, which makes MIPS implementation almost identical to the generic one. Switch MIPS to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The pte_free_kernel() and pte_free() versions on mips are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Paul Burton <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12m68k: sun3: switch to generic version of pte allocationMike Rapoport1-39/+2
The sun3 MMU variant of m68k uses GFP_KERNEL to allocate a PTE page and then memset(0) or clear_highpage() to clear it. This is equivalent to allocating the page with GFP_KERNEL | __GFP_ZERO, which allows replacing sun3 implementation of pte_alloc_one() and pte_alloc_one_kernel() with the generic ones. The pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12csky: switch to generic version of pte allocationMike Rapoport1-27/+3
The csky implementation pte_alloc_one(), pte_free_kernel() and pte_free() is identical to the generic except of lack of __GFP_ACCOUNT for the user PTEs allocation. Switch csky to use generic version of these functions. The csky implementation of pte_alloc_one_kernel() is not replaced because it does not clear the allocated page but rather sets each PTE in it to a non-zero value. The pte_free_kernel() and pte_free() versions on csky are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Guo Ren <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12arm64: switch to generic version of pte allocationMike Rapoport4-43/+14
The PTE allocations in arm64 are identical to the generic ones modulo the GFP flags. Using the generic pte_alloc_one() functions ensures that the user page tables are allocated with __GFP_ACCOUNT set. The arm64 definition of PGALLOC_GFP is removed and replaced with GFP_PGTABLE_USER for p[gum]d_alloc_one() for the user page tables and GFP_PGTABLE_KERNEL for the kernel page tables. The KVM memory cache is now using GFP_PGTABLE_USER. The mappings created with create_pgd_mapping() are now using GFP_PGTABLE_KERNEL. The conversion to the generic version of pte_free_kernel() removes the NULL check for pte. The pte_free() version on arm64 is identical to the generic one and can be simply dropped. [[email protected]: fix a bogus GFP flag in pgd_alloc()] Link: https://lore.kernel.org/r/[email protected]/ [and fix it more] Link: https://lore.kernel.org/linux-mm/20190617151252.GF16810@rapoport-lnx/ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12arm: switch to generic version of pte allocationMike Rapoport2-29/+14
Replace __get_free_page() and alloc_pages() calls with the generic __pte_alloc_one_kernel() and __pte_alloc_one(). There is no functional change for the kernel PTE allocation. The difference for the user PTEs, is that the clear_pte_table() is now called after pgtable_page_ctor() and the addition of __GFP_ACCOUNT to the GFP flags. The conversion to the generic version of pte_free_kernel() removes the NULL check for pte. The pte_free() version on arm is identical to the generic one and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12alpha: switch to generic version of pte allocationMike Rapoport1-37/+3
alpha allocates PTE pages with __get_free_page() and uses GFP_KERNEL | __GFP_ZERO for the allocations. Switch it to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The alpha pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12asm-generic, x86: introduce generic pte_{alloc,free}_one[_kernel]Mike Rapoport3-44/+115
Most architectures have identical or very similar implementation of pte_alloc_one_kernel(), pte_alloc_one(), pte_free_kernel() and pte_free(). Add a generic implementation that can be reused across architectures and enable its use on x86. The generic implementation uses GFP_KERNEL | __GFP_ZERO for the kernel page tables and GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT for the user page tables. The "base" functions for PTE allocation, namely __pte_alloc_one_kernel() and __pte_alloc_one() are intended for the architectures that require additional actions after actual memory allocation or must use non-default GFP flags. x86 is switched to use generic pte_alloc_one_kernel(), pte_free_kernel() and pte_free(). x86 still implements pte_alloc_one() to allow run-time control of GFP flags required for "userpte" command line option. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/gup.c: mark undo_dev_pagemap as __maybe_unusedGuenter Roeck1-1/+2
Several mips builds generate the following build warning. mm/gup.c:1788:13: warning: 'undo_dev_pagemap' defined but not used The function is declared unconditionally but only called from behind various ifdefs. Mark it __maybe_unused. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/gup.c: remove some BUG_ONs from get_gate_page()Andy Lutomirski1-3/+6
If we end up without a PGD or PUD entry backing the gate area, don't BUG -- just fail gracefully. It's not entirely implausible that this could happen some day on x86. It doesn't right now even with an execute-only emulated vsyscall page because the fixmap shares the PUD, but the core mm code shouldn't rely on that particular detail to avoid OOPSing. Link: http://lkml.kernel.org/r/a1d9f4efb75b9d464e59fd6af00104b21c58f6f7.1561610798.git.luto@kernel.org Signed-off-by: Andy Lutomirski <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Jann Horn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/gup: speed up check_and_migrate_cma_pages() on huge pagePingfan Liu1-8/+16
Both hugetlb and thp locate on the same migration type of pageblock, since they are allocated from a free_list[]. Based on this fact, it is enough to check on a single subpage to decide the migration type of the whole huge page. By this way, it saves (2M/4K - 1) times loop for pmd_huge on x86, similar on other archs. Furthermore, when executing isolate_huge_page(), it avoid taking global hugetlb_lock many times, and meanless remove/add to the local link list cma_page_list. [[email protected]: make `i' and `step' unsigned] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pingfan Liu <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Keith Busch <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: mark the page referenced in gup_hugepteChristoph Hellwig1-0/+1
All other get_user_page_fast cases mark the page referenced, so do this here as well. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: switch gup_hugepte to use try_get_compound_headChristoph Hellwig1-1/+2
This applies the overflow fixes from 8fde12ca79aff ("mm: prevent get_user_pages() from overflowing page refcount") to the powerpc hugepd code and brings it back in sync with the other GUP cases. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: move the powerpc hugepd code to mm/gup.cChristoph Hellwig5-90/+93
While only powerpc supports the hugepd case, the code is pretty generic and I'd like to keep all GUP internals in one place. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: validate get_user_pages_fast flagsChristoph Hellwig1-0/+3
We can only deal with FOLL_WRITE and/or FOLL_LONGTERM in get_user_pages_fast, so reject all other flags. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: consolidate the get_user_pages* implementationsChristoph Hellwig5-142/+65
Always build mm/gup.c so that we don't have to provide separate nommu stubs. Also merge the get_user_pages_fast and __get_user_pages_fast stubs when HAVE_FAST_GUP into the main implementations, which will never call the fast path if HAVE_FAST_GUP is not set. This also ensures the new put_user_pages* helpers are available for nommu, as those are currently missing, which would create a problem as soon as we actually grew users for it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: reorder code blocks in gup.cChristoph Hellwig1-205/+205
This moves the actually exported functions towards the end of the file, and reorders some functions to be in more logical blocks as a preparation for moving various stubs inline into the main functionality using IS_ENABLED(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: rename CONFIG_HAVE_GENERIC_GUP to CONFIG_HAVE_FAST_GUPChristoph Hellwig10-18/+11
We only support the generic GUP now, so rename the config option to be more clear, and always use the mm/Kconfig definition of the symbol and select it from the arch Kconfigs. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12sparc64: use the generic get_user_pages_fast codeChristoph Hellwig4-341/+20
The sparc64 code is mostly equivalent to the generic one, minus various bugfixes and two arch overrides that this patch adds to pgtable.h. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Cc: David Miller <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12sparc64: define untagged_addr()Christoph Hellwig1-0/+22
Add a helper to untag a user pointer. This is needed for ADI support in get_user_pages_fast. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Cc: David Miller <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12sparc64: add the missing pgd_page definitionChristoph Hellwig1-0/+3
sparc64 only had pgd_page_vaddr, but not pgd_page. [[email protected]: fix sparc64 build] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: David Miller <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12sh: use the generic get_user_pages_fast codeChristoph Hellwig4-278/+40
The sh code is mostly equivalent to the generic one, minus various bugfixes and two arch overrides that this patch adds to pgtable.h. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12sh: add the missing pud_page definitionChristoph Hellwig1-0/+3
sh only had pud_page_vaddr, but not pud_page. [[email protected]: sh: stub out pud_page] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Guenter Roeck <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12MIPS: use the generic get_user_pages_fast codeChristoph Hellwig4-304/+5
The mips code is mostly equivalent to the generic one, minus various bugfixes and an arch override for gup_fast_permitted. Note that this defines ARCH_HAS_PTE_SPECIAL for mips as mips has pte_special and pte_mkspecial implemented and used in the existing gup code. They are no-op stubs, though which makes me a little unsure if this is really right thing to do. Note that this also adds back a missing cpu_has_dc_aliases check for __get_user_pages_fast, which the old code was only doing for get_user_pages_fast. This clearly looks like an oversight, as any condition that makes get_user_pages_fast unsafe also applies to __get_user_pages_fast. [[email protected]: MIPS: don't select ARCH_HAS_PTE_SPECIAL] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Tested-by: Guenter Roeck <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: lift the x86_32 PAE version of gup_get_pte to common codeChristoph Hellwig5-52/+52
The split low/high access is the only non-READ_ONCE version of gup_get_pte that did show up in the various arch implemenations. Lift it to common code and drop the ifdef based arch override. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: simplify gup_fast_permittedChristoph Hellwig3-24/+9
Pass in the already calculated end value instead of recomputing it, and leave the end > start check in the callers instead of duplicating them in the arch code. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: use untagged_addr() for get_user_pages_fast addressesChristoph Hellwig1-2/+2
Patch series "switch the remaining architectures to use generic GUP", v4. A series to switch mips, sh and sparc64 to use the generic GUP code so that we only have one codebase to touch for further improvements to this code. This patch (of 16): This will allow sparc64, or any future architecture with memory tagging to override its tags for get_user_pages and get_user_pages_fast. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Paul Burton <[email protected]> Cc: James Hogan <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: David Miller <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Ralf Baechle <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm, memcg: add a memcg_slabinfo debugfs fileWaiman Long2-0/+64
There are concerns about memory leaks from extensive use of memory cgroups as each memory cgroup creates its own set of kmem caches. There is a possiblity that the memcg kmem caches may remain even after the memory cgroups have been offlined. Therefore, it will be useful to show the status of each of memcg kmem caches. This patch introduces a new <debugfs>/memcg_slabinfo file which is somewhat similar to /proc/slabinfo in format, but lists only information about kmem caches that have child memcg kmem caches. Information available in /proc/slabinfo are not repeated in memcg_slabinfo. A portion of a sample output of the file was: # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs> rpc_inode_cache root 13 51 1 1 rpc_inode_cache 48 0 0 0 0 fat_inode_cache root 1 45 1 1 fat_inode_cache 41 2 45 1 1 xfs_inode root 770 816 24 24 xfs_inode 92 22 34 1 1 xfs_inode 88:dead 1 34 1 1 xfs_inode 89:dead 23 34 1 1 xfs_inode 85 4 34 1 1 xfs_inode 84 9 34 1 1 The css id of the memcg is also listed. If a memcg is not online, the tag ":dead" will be attached as shown above. [[email protected]: memcg: add ":deact" tag for reparented kmem caches in memcg_slabinfo] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: set the flag in the common code as suggested by Roman] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Waiman Long <[email protected]> Suggested-by: Shakeel Butt <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: memcg/slab: reparent memcg kmem_caches on cgroup removalRoman Gushchin4-18/+58
Let's reparent non-root kmem_caches on memcg offlining. This allows us to release the memory cgroup without waiting for the last outstanding kernel object (e.g. dentry used by another application). Since the parent cgroup is already charged, everything we need to do is to splice the list of kmem_caches to the parent's kmem_caches list, swap the memcg pointer, drop the css refcounter for each kmem_cache and adjust the parent's css refcounter. Please, note that kmem_cache->memcg_params.memcg isn't a stable pointer anymore. It's safe to read it under rcu_read_lock(), cgroup_mutex held, or any other way that protects the memory cgroup from being released. We can race with the slab allocation and deallocation paths. It's not a big problem: parent's charge and slab global stats are always correct, and we don't care anymore about the child usage and global stats. The child cgroup is already offline, so we don't use or show it anywhere. Local slab stats (NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE) aren't used anywhere except count_shadow_nodes(). But even there it won't break anything: after reparenting "nodes" will be 0 on child level (because we're already reparenting shrinker lists), and on parent level page stats always were 0, and this patch won't change anything. [[email protected]: properly handle kmem_caches reparented to root_mem_cgroup] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pagesRoman Gushchin3-19/+70
Every slab page charged to a non-root memory cgroup has a pointer to the memory cgroup and holds a reference to it, which protects a non-empty memory cgroup from being released. At the same time the page has a pointer to the corresponding kmem_cache, and also hold a reference to the kmem_cache. And kmem_cache by itself holds a reference to the cgroup. So there is clearly some redundancy, which allows to stop setting the page->mem_cgroup pointer and rely on getting memcg pointer indirectly via kmem_cache. Further it will allow to change this pointer easier, without a need to go over all charged pages. So let's stop setting page->mem_cgroup pointer for slab pages, and stop using the css refcounter directly for protecting the memory cgroup from going away. Instead rely on kmem_cache as an intermediate object. Make sure that vmstats and shrinker lists are working as previously, as well as /proc/kpagecgroup interface. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: memcg/slab: rework non-root kmem_cache lifecycle managementRoman Gushchin4-79/+96
Currently each charged slab page holds a reference to the cgroup to which it's charged. Kmem_caches are held by the memcg and are released all together with the memory cgroup. It means that none of kmem_caches are released unless at least one reference to the memcg exists, which is very far from optimal. Let's rework it in a way that allows releasing individual kmem_caches as soon as the cgroup is offline, the kmem_cache is empty and there are no pending allocations. To make it possible, let's introduce a new percpu refcounter for non-root kmem caches. The counter is initialized to the percpu mode, and is switched to the atomic mode during kmem_cache deactivation. The counter is bumped for every charged page and also for every running allocation. So the kmem_cache can't be released unless all allocations complete. To shutdown non-active empty kmem_caches, let's reuse the work queue, previously used for the kmem_cache deactivation. Once the reference counter reaches 0, let's schedule an asynchronous kmem_cache release. * I used the following simple approach to test the performance (stolen from another patchset by T. Harding): time find / -name fname-no-exist echo 2 > /proc/sys/vm/drop_caches repeat 10 times Results: orig patched real 0m1.455s real 0m1.355s user 0m0.206s user 0m0.219s sys 0m0.855s sys 0m0.807s real 0m1.487s real 0m1.699s user 0m0.221s user 0m0.256s sys 0m0.806s sys 0m0.948s real 0m1.515s real 0m1.505s user 0m0.183s user 0m0.215s sys 0m0.876s sys 0m0.858s real 0m1.291s real 0m1.380s user 0m0.193s user 0m0.198s sys 0m0.843s sys 0m0.786s real 0m1.364s real 0m1.374s user 0m0.180s user 0m0.182s sys 0m0.868s sys 0m0.806s real 0m1.352s real 0m1.312s user 0m0.201s user 0m0.212s sys 0m0.820s sys 0m0.761s real 0m1.302s real 0m1.349s user 0m0.205s user 0m0.203s sys 0m0.803s sys 0m0.792s real 0m1.334s real 0m1.301s user 0m0.194s user 0m0.201s sys 0m0.806s sys 0m0.779s real 0m1.426s real 0m1.434s user 0m0.216s user 0m0.181s sys 0m0.824s sys 0m0.864s real 0m1.350s real 0m1.295s user 0m0.200s user 0m0.190s sys 0m0.842s sys 0m0.811s So it looks like the difference is not noticeable in this test. [[email protected]: fix an use-after-free in kmemcg_workfn()] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Qian Cai <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: memcg/slab: synchronize access to kmem_cache dying flag using a spinlockRoman Gushchin1-3/+12
Currently the memcg_params.dying flag and the corresponding workqueue used for the asynchronous deactivation of kmem_caches is synchronized using the slab_mutex. It makes impossible to check this flag from the irq context, which will be required in order to implement asynchronous release of kmem_caches. So let's switch over to the irq-save flavor of the spinlock-based synchronization. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: memcg/slab: don't check the dying flag on kmem_cache creationRoman Gushchin1-1/+1
There is no point in checking the root_cache->memcg_params.dying flag on kmem_cache creation path. New allocations shouldn't be performed using a dead root kmem_cache, so no new memcg kmem_cache creation can be scheduled after the flag is set. And if it was scheduled before, flush_memcg_workqueue() will wait for it anyway. So let's drop this check to simplify the code. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: memcg/slab: unify SLAB and SLUB page accountingRoman Gushchin3-28/+30
Currently the page accounting code is duplicated in SLAB and SLUB internals. Let's move it into new (un)charge_slab_page helpers in the slab_common.c file. These helpers will be responsible for statistics (global and memcg-aware) and memcg charging. So they are replacing direct memcg_(un)charge_slab() calls. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: memcg/slab: introduce __memcg_kmem_uncharge_memcg()Roman Gushchin2-8/+27
Let's separate the page counter modification code out of __memcg_kmem_uncharge() in a way similar to what __memcg_kmem_charge() and __memcg_kmem_charge_memcg() work. This will allow to reuse this code later using a new memcg_kmem_uncharge_memcg() wrapper, which calls __memcg_kmem_uncharge_memcg() if memcg_kmem_enabled() check is passed. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: memcg/slab: generalize postponed non-root kmem_cache deactivationRoman Gushchin4-28/+14
Currently SLUB uses a work scheduled after an RCU grace period to deactivate a non-root kmem_cache. This mechanism can be reused for kmem_caches release, but requires generalization for SLAB case. Introduce kmemcg_cache_deactivate() function, which calls allocator-specific __kmem_cache_deactivate() and schedules execution of __kmem_cache_deactivate_after_rcu() with all necessary locks in a worker context after an rcu grace period. Here is the new calling scheme: kmemcg_cache_deactivate() __kmemcg_cache_deactivate() SLAB/SLUB-specific kmemcg_rcufn() rcu kmemcg_workfn() work __kmemcg_cache_deactivate_after_rcu() SLAB/SLUB-specific instead of: __kmemcg_cache_deactivate() SLAB/SLUB-specific slab_deactivate_memcg_cache_rcu_sched() SLUB-only kmemcg_rcufn() rcu kmemcg_workfn() work kmemcg_cache_deact_after_rcu() SLUB-only For consistency, all allocator-specific functions start with "__". Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: memcg/slab: rename slab delayed deactivation functions and fieldsRoman Gushchin3-19/+19
The delayed work/rcu deactivation infrastructure of non-root kmem_caches can be also used for asynchronous release of these objects. Let's get rid of the word "deactivation" in corresponding names to make the code look better after generalization. It's easier to make the renaming first, so that the generalized code will look consistent from scratch. Let's rename struct memcg_cache_params fields: deact_fn -> work_fn deact_rcu_head -> rcu_head deact_work -> work And RCU/delayed work callbacks in slab common code: kmemcg_deactivate_rcufn -> kmemcg_rcufn kmemcg_deactivate_workfn -> kmemcg_workfn This patch contains no functional changes, only renamings. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>