2019-07-12  mm: init: report memory auto-initialization features at boot time  [Alexander Potapenko; 1 file changed, -0/+24]
Print the currently enabled stack and heap initialization modes. Stack initialization is enabled by a config flag, while heap initialization is configured at boot time with defaults being set in the config. It's more convenient for the user to have all information about these hardening measures in one place at boot, so the user can reason about the expected behavior of the running system. The possible options for stack are:
 - "all" for CONFIG_INIT_STACK_ALL;
 - "byref_all" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL;
 - "byref" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF;
 - "__user" for CONFIG_GCC_PLUGIN_STRUCTLEAK_USER;
 - "off" otherwise.
Depending on the values of init_on_alloc and init_on_free boottime options we also report "heap alloc" and "heap free" as "on"/"off". In the init_on_free mode initializing pages at boot time may take a while, so print a notice about that as well. This depends on how much memory is installed, the memory bandwidth, etc. On a relatively modern x86 system, it takes about 0.75s/GB to wipe all memory:

  [ 0.418722] mem auto-init: stack:byref_all, heap alloc:off, heap free:on
  [ 0.419765] mem auto-init: clearing system memory may take some time...
  [ 12.376605] Memory: 16408564K/16776672K available (14339K kernel code, 1397K rwdata, 3756K rodata, 1636K init, 11460K bss, 368108K reserved, 0K cma-reserved)

Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Suggested-by: Kees Cook <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: James Morris <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kostya Serebryany <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Sandeep Patil <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: Souptick Joarder <[email protected]> Cc: Marco Elver <[email protected]> Cc: Kaiwan N Billimoria <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
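For reference, a minimal sketch of what such a boot-time reporting helper could look like; the helper name report_meminit() and the exact order of the checks are assumptions based on the description above, not a verbatim copy of the patch:

        /* Illustrative sketch, e.g. in mm/page_alloc.c */
        static void __init report_meminit(void)
        {
                const char *stack;

                /* Map the stack-init Kconfig choices to the strings listed above. */
                if (IS_ENABLED(CONFIG_INIT_STACK_ALL))
                        stack = "all";
                else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL))
                        stack = "byref_all";
                else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF))
                        stack = "byref";
                else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER))
                        stack = "__user";
                else
                        stack = "off";

                pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n",
                        stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off",
                        want_init_on_free() ? "on" : "off");
                if (want_init_on_free())
                        pr_info("mem auto-init: clearing system memory may take some time...\n");
        }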
2019-07-12  mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options  [Alexander Potapenko; 10 files changed, -18/+199]
Patch series "add init_on_alloc/init_on_free boot options", v10. Provide init_on_alloc and init_on_free boot options. These are aimed at preventing possible information leaks and making the control-flow bugs that depend on uninitialized values more deterministic. Enabling either of the options guarantees that the memory returned by the page allocator and SL[AU]B is initialized with zeroes. The SLOB allocator isn't supported at the moment, as its emulation of kmem caches complicates handling of SLAB_TYPESAFE_BY_RCU caches correctly. Enabling init_on_free also guarantees that pages and heap objects are initialized right after they're freed, so it won't be possible to access stale data by using a dangling pointer. As suggested by Michal Hocko, right now we don't let the heap users disable initialization for certain allocations. There's not enough evidence that doing so can speed up real-life cases, and introducing ways to opt out may result in things going out of control. This patch (of 2): The new options are needed to prevent possible information leaks and make control-flow bugs that depend on uninitialized values more deterministic. This is expected to be on-by-default on Android and Chrome OS. And it gives the opportunity for anyone else to use it under distros too via the boot args. (The init_on_free feature is regularly requested by folks whose threat models include memory forensics.) init_on_alloc=1 makes the kernel initialize newly allocated pages and heap objects with zeroes. Initialization is done at allocation time at the places where checks for __GFP_ZERO are performed. init_on_free=1 makes the kernel initialize freed pages and heap objects with zeroes upon their deletion. This helps to ensure sensitive data doesn't leak via use-after-free accesses. Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator returns zeroed memory. The two exceptions are slab caches with constructors and the SLAB_TYPESAFE_BY_RCU flag. Those are never zero-initialized to preserve their semantics. Both init_on_alloc and init_on_free default to zero, but those defaults can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and CONFIG_INIT_ON_FREE_DEFAULT_ON. If either SLUB poisoning or page poisoning is enabled, those options take precedence over init_on_alloc and init_on_free: initialization is only applied to unpoisoned allocations. Slowdown for the new features compared to init_on_free=0, init_on_alloc=0:

  hackbench, init_on_free=1:              +7.62% sys time (st.err 0.74%)
  hackbench, init_on_alloc=1:             +7.75% sys time (st.err 2.14%)
  Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
  Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
  Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
  Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline is within the standard error. The new features are also going to pave the way for hardware memory tagging (e.g. arm64's MTE), which will require both on_alloc and on_free hooks to set the tags for heap objects. With MTE, tagging will have the same cost as memory initialization. Although init_on_free is rather costly, there are paranoid use-cases where in-memory data lifetime is desired to be minimized. There are various arguments for/against the realism of the associated threat models, but given that we'll need the infrastructure for MTE anyway, and there are people who want wipe-on-free behavior no matter what the performance cost, it seems reasonable to include it in this series. [[email protected]: v8] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: v9] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: v10] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Michal Hocko <[email protected]> [page and dmapool parts Acked-by: James Morris <[email protected]>] Cc: Christoph Lameter <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Kostya Serebryany <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Sandeep Patil <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Jann Horn <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Marco Elver <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
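A sketch of how such boot options can be wired up with static keys; the shape follows the description above, but treat the exact names and error handling as illustrative rather than the patch itself:

        #include <linux/init.h>
        #include <linux/jump_label.h>
        #include <linux/kernel.h>       /* kstrtobool() */

        DEFINE_STATIC_KEY_FALSE(init_on_alloc);

        static int __init early_init_on_alloc(char *buf)
        {
                bool enable;
                int ret = kstrtobool(buf, &enable);

                if (ret)
                        return ret;
                if (enable)
                        static_branch_enable(&init_on_alloc);
                else
                        static_branch_disable(&init_on_alloc);
                return 0;
        }
        early_param("init_on_alloc", early_init_on_alloc);

        /* Allocation paths test the key where __GFP_ZERO is already checked: */
        static inline bool want_init_on_alloc(gfp_t flags)
        {
                if (static_branch_unlikely(&init_on_alloc))
                        return true;
                return flags & __GFP_ZERO;
        }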
2019-07-12  arm64: move jump_label_init() before parse_early_param()  [Kees Cook; 2 files changed, -5/+5]
While jump_label_init() was moved earlier in the boot process in efd9e03facd0 ("arm64: Use static keys for CPU features"), it wasn't early enough for early params to use it. The old state of things was as described here... init/main.c calls out to arch-specific things before general jump label and early param handling:

  asmlinkage __visible void __init start_kernel(void)
  {
        ...
        setup_arch(&command_line);
        ...
        smp_prepare_boot_cpu();
        ...
        /* parameters may set static keys */
        jump_label_init();
        parse_early_param();
        ...
  }

x86 setup_arch() wants those earlier, so it handles jump label and early param:

  void __init setup_arch(char **cmdline_p)
  {
        ...
        jump_label_init();
        ...
        parse_early_param();
        ...
  }

arm64 setup_arch() only had early param:

  void __init setup_arch(char **cmdline_p)
  {
        ...
        parse_early_param();
        ...
  }

with jump label later in smp_prepare_boot_cpu():

  void __init smp_prepare_boot_cpu(void)
  {
        ...
        jump_label_init();
        ...
  }

This moves arm64 jump_label_init() from smp_prepare_boot_cpu() to setup_arch(), as done already on x86, in preparation for early param usage in the init_on_alloc/free() series: https://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/201906271003.005303B52@keescook Signed-off-by: Kees Cook <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Acked-by: Catalin Marinas <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Qian Cai <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
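To see why the ordering matters, consider an early param that flips a static key, which is exactly what the init_on_alloc/init_on_free series does. A hedged illustration (the names here are made up):

        DEFINE_STATIC_KEY_FALSE(my_feature);

        static int __init early_my_feature(char *buf)
        {
                /*
                 * static_branch_enable() rewrites branch sites via the jump
                 * label machinery, so it is only safe once jump_label_init()
                 * has run; hence jump_label_init() must precede
                 * parse_early_param() on every architecture.
                 */
                static_branch_enable(&my_feature);
                return 0;
        }
        early_param("my_feature", early_my_feature);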
2019-07-12  mm/large system hash: clear hashdist when only one node with memory is booted  [Nicholas Piggin; 1 file changed, -13/+18]
CONFIG_NUMA on 64-bit CPUs currently enables hashdist unconditionally even when booting on single node machines. This causes the large system hashes to be allocated with vmalloc, and mapped with small pages. This change clears hashdist if only one node has come up with memory. This results in the important large inode and dentry hashes using memblock allocations. All others are within 4MB size up to about 128GB of RAM, which allows them to be allocated from the linear map on most non-NUMA images. Other big hashes like futex and TCP should eventually be moved over to the same style of allocation as those vfs caches that use HASH_EARLY if !hashdist, so they don't exceed MAX_ORDER on very large non-NUMA images. This brings dTLB misses for linux kernel tree `git diff` from ~45,000 to ~8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off (performance is in the noise, under 1% difference, page tables are likely to be well cached for this workload). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
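The check itself is tiny; a hedged sketch of the idea (the helper name and hook point are assumptions, the test matches the description above):

        #include <linux/nodemask.h>

        extern int hashdist;    /* set via the "hashdist" boot param */

        static void __init fixup_hashdist(void)
        {
                /*
                 * With a single memory node the linear map serves large
                 * hashes just as well, so avoid vmalloc and its small-page
                 * mappings.
                 */
                if (num_node_state(N_MEMORY) <= 1)
                        hashdist = 0;
        }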
2019-07-12  mm/large system hash: use vmalloc for size > MAX_ORDER when !hashdist  [Nicholas Piggin; 1 file changed, -7/+9]
The kernel currently clamps large system hashes to MAX_ORDER when hashdist is not set, which is rather arbitrary. vmalloc space is limited on 32-bit machines, but this shouldn't result in much more of it being used, because the small physical memory on such machines already limits the system hash sizes. Include "vmalloc" or "linear" in the kernel log message. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm/vmalloc.c: spelling: s/informaion/information/  [Geert Uytterhoeven; 1 file changed, -2/+2]
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Souptick Joarder <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm/vmalloc.c: switch to WARN_ON() and move it under unlink_va()  [Uladzislau Rezki (Sony); 1 file changed, -15/+10]
Trigger a warning if an object that is about to be freed is already detached. We used to have a BUG_ON() here, but even though such a case is considered faulty behaviour, it is not a good reason to break the whole system. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm/vmalloc.c: get rid of one single unlink_va() when merge  [Uladzislau Rezki (Sony); 1 file changed, -6/+2]
It does not make sense to try to "unlink" a node that is definitely not linked into either the list or the tree. On the first merge step the VA just points to the previously disconnected busy area. On the second step, check whether the node has been merged, and do the "unlink" only if so, because by then it points to an object that must be linked. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Acked-by: Hillf Danton <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm/vmalloc.c: preload a CPU with one object for split purpose  [Uladzislau Rezki (Sony); 1 file changed, -4/+51]
Refactor the NE_FIT_TYPE split case when it comes to an allocation of one extra object. We need it in order to build a remaining space. The preload is done per CPU in non-atomic context with GFP_KERNEL flags. More permissive parameters can be beneficial for systems which suffer from high memory pressure or low memory conditions. For example, on my KVM system (4xCPUs, no swap, 256MB RAM) I can simulate the failure of page allocation with GFP_NOWAIT flags. Using the "stress-ng" tool and starting N workers spinning on fork() and exit(), I can trigger the trace below:

<snip>
[ 179.815161] stress-ng-fork: page allocation failure: order:0, mode:0x40800(GFP_NOWAIT|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[ 179.815168] CPU: 0 PID: 12612 Comm: stress-ng-fork Not tainted 5.2.0-rc3+ #1003
[ 179.815170] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 179.815171] Call Trace:
[ 179.815178] dump_stack+0x5c/0x7b
[ 179.815182] warn_alloc+0x108/0x190
[ 179.815187] __alloc_pages_slowpath+0xdc7/0xdf0
[ 179.815191] __alloc_pages_nodemask+0x2de/0x330
[ 179.815194] cache_grow_begin+0x77/0x420
[ 179.815197] fallback_alloc+0x161/0x200
[ 179.815200] kmem_cache_alloc+0x1c9/0x570
[ 179.815202] alloc_vmap_area+0x32c/0x990
[ 179.815206] __get_vm_area_node+0xb0/0x170
[ 179.815208] __vmalloc_node_range+0x6d/0x230
[ 179.815211] ? _do_fork+0xce/0x3d0
[ 179.815213] copy_process.part.46+0x850/0x1b90
[ 179.815215] ? _do_fork+0xce/0x3d0
[ 179.815219] _do_fork+0xce/0x3d0
[ 179.815226] ? __do_page_fault+0x2bf/0x4e0
[ 179.815229] do_syscall_64+0x55/0x130
[ 179.815231] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 179.815234] RIP: 0033:0x7fedec4c738b
...
[ 179.815237] RSP: 002b:00007ffda469d730 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[ 179.815239] RAX: ffffffffffffffda RBX: 00007ffda469d730 RCX: 00007fedec4c738b
[ 179.815240] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[ 179.815241] RBP: 00007ffda469d780 R08: 00007fededd6e300 R09: 00007ffda47f50a0
[ 179.815242] R10: 00007fededd6e5d0 R11: 0000000000000246 R12: 0000000000000000
[ 179.815243] R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000
[ 179.815245] Mem-Info:
[ 179.815249] active_anon:12686 inactive_anon:14760 isolated_anon:0 active_file:502 inactive_file:61 isolated_file:70 unevictable:2 dirty:0 writeback:0 unstable:0 slab_reclaimable:2380 slab_unreclaimable:7520 mapped:15069 shmem:14813 pagetables:10833 bounce:0 free:1922 free_pcp:229 free_cma:0
<snip>

Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
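A simplified sketch of the per-CPU preload pattern described above; the structure and names are illustrative, and the locking details of the real alloc_vmap_area() path are omitted:

        static DEFINE_PER_CPU(struct vmap_area *, ne_fit_preload_node);

        /*
         * Make sure this CPU has a spare vmap_area cached before the
         * allocation path takes its spinlock, so the NE_FIT_TYPE split
         * never has to allocate in atomic context with GFP_NOWAIT.
         */
        static void preload_this_cpu(int node)
        {
                struct vmap_area *va;

                if (this_cpu_read(ne_fit_preload_node))
                        return;

                /* Non-atomic context: GFP_KERNEL may sleep and reclaim. */
                va = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);

                /* Someone else may have preloaded meanwhile; drop the spare. */
                if (va && this_cpu_cmpxchg(ne_fit_preload_node, NULL, va) != NULL)
                        kmem_cache_free(vmap_area_cachep, va);
        }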
2019-07-12  mm/vmalloc.c: remove "node" argument  [Uladzislau Rezki (Sony); 1 file changed, -2/+2]
Patch series "Some cleanups for the KVA/vmalloc", v5. This patch (of 4): Remove unused argument from the __alloc_vmap_area() function. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm/mmu_notifier: use hlist_add_head_rcu()  [Jean-Philippe Brucker; 1 file changed, -1/+1]
Make mmu_notifier_register() safer by issuing a memory barrier before registering a new notifier. This fixes a theoretical bug on weakly ordered CPUs. For example, take this simplified use of notifiers by a driver:

        my_struct->mn.ops = &my_ops;                            /* (1) */
        mmu_notifier_register(&my_struct->mn, mm)
                ...
                hlist_add_head(&mn->hlist, &mm->mmu_notifiers); /* (2) */
                ...

Once mmu_notifier_register() releases the mm locks, another thread can invalidate a range:

        mmu_notifier_invalidate_range()
                ...
                hlist_for_each_entry_rcu(mn, &mm->mmu_notifiers, hlist) {
                        if (mn->ops->invalidate_range)

The read side relies on the data dependency between mn and ops to ensure that the pointer is properly initialized. But the write side doesn't have any dependency between (1) and (2), so they could be reordered and the readers could dereference an invalid mn->ops. mmu_notifier_register() does take all the mm locks before adding to the hlist, but those have acquire semantics which isn't sufficient. By calling hlist_add_head_rcu() instead of hlist_add_head() we update the hlist using a store-release, ensuring that readers see prior initialization of my_struct. This situation is better illustrated by the litmus test MP+onceassign+derefonce. Link: http://lkml.kernel.org/r/[email protected] Fixes: cddb8a5c14aa ("mmu-notifiers: core") Signed-off-by: Jean-Philippe Brucker <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm/memory.c: fail when offset == num in first check of __vm_map_pages()  [Miguel Ojeda; 1 file changed, -1/+1]
If the caller asks us for offset == num, we should already fail in the first check, i.e. the one testing for offsets beyond the object. At the moment, we are failing on the second test anyway, since count cannot be 0. Still, to agree with the comment of the first test, we should first test it there. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Souptick Joarder <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Huang Ying <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
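Concretely, the intended ordering of the two checks in __vm_map_pages() looks like this (a hedged sketch of the logic, not a verbatim diff):

        /* Fail if the user requested offset is beyond the end of the object. */
        if (offset >= num)
                return -ENXIO;

        /* Fail if the user requested size exceeds the available object size. */
        if (count > num - offset)
                return -ENXIO;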
2019-07-12  mm/pgtable: drop pgtable_t variable from pte_fn_t functions  [Anshuman Khandual; 13 files changed, -31/+15]
Drop the pgtable_t variable from all implementations of pte_fn_t callbacks, as none of them use it. apply_to_pte_range() should stop computing it as well. Should help us save some cycles. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Matthew Wilcox <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Russell King <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Dan Williams <[email protected]> Cc: <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
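After the change a pte_fn_t callback receives just the PTE pointer, the address and the opaque data cookie. A hedged example of the new shape (the callback body is made up for illustration):

        typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);

        /* Example callback: install the page passed in via 'data'. */
        static int set_one_pte(pte_t *pte, unsigned long addr, void *data)
        {
                struct page *page = data;

                set_pte_at(&init_mm, addr, pte, mk_pte(page, PAGE_KERNEL));
                return 0;
        }

        /* Typical use: apply_to_page_range(&init_mm, addr, size, set_one_pte, page); */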
2019-07-12  unicore32: switch to generic version of pte allocation  [Mike Rapoport; 1 file changed, -28/+8]
Replace __get_free_page() and alloc_pages() calls with the generic __pte_alloc_one_kernel() and __pte_alloc_one(). There is no functional change for the kernel PTE allocation. The differences for the user PTEs are that clear_pte_table() is now called after pgtable_page_ctor(), and that __GFP_ACCOUNT is added to the GFP flags. The pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Guo Ren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  um: switch to generic version of pte allocation  [Mike Rapoport; 2 files changed, -36/+2]
um allocates PTE pages with __get_free_page() and uses GFP_KERNEL | __GFP_ZERO for the allocations. Switch it to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Anton Ivanov <[email protected]> Acked-by: Anton Ivanov <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  riscv: switch to generic version of pte allocation  [Mike Rapoport; 1 file changed, -27/+2]
The only difference between the generic and RISC-V implementations of PTE allocation is the usage of __GFP_RETRY_MAYFAIL for both kernel and user PTEs and the absence of __GFP_ACCOUNT for the user PTEs. The conversion to the generic version removes the __GFP_RETRY_MAYFAIL and ensures that __GFP_ACCOUNT is used for the user PTE allocations. The pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  parisc: switch to generic version of pte allocation  [Mike Rapoport; 1 file changed, -31/+2]
parisc allocates PTE pages with __get_free_page() and uses GFP_KERNEL | __GFP_ZERO for the allocations. Switch it to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The pte_free_kernel() and pte_free() versions on parisc are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  nios2: switch to generic version of pte allocation  [Mike Rapoport; 1 file changed, -35/+2]
nios2 allocates kernel PTE pages with

        __get_free_pages(GFP_KERNEL | __GFP_ZERO, PTE_ORDER);

and user page tables with

        pte = alloc_pages(GFP_KERNEL, PTE_ORDER);
        if (pte)
                clear_highpage();

The PTE_ORDER is hardwired to zero, which makes nios2 implementation almost identical to the generic one. Switch nios2 to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The pte_free_kernel() and pte_free() versions on nios2 are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  nds32: switch to generic version of pte allocation  [Mike Rapoport; 1 file changed, -27/+4]
The nds32 implementation of pte_alloc_one_kernel() differs from the generic in the use of __GFP_RETRY_MAYFAIL flag, which is removed after the conversion. The nds32 version of pte_alloc_one() missed the call to pgtable_page_ctor() and also used __GFP_RETRY_MAYFAIL. Switching it to use generic __pte_alloc_one() for the PTE page allocation ensures that page table constructor is run and the user page tables are allocated with __GFP_ACCOUNT. The conversion to the generic version of pte_free_kernel() removes the NULL check for pte. The pte_free() version on nds32 is identical to the generic one and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mips: switch to generic version of pte allocation  [Mike Rapoport; 1 file changed, -31/+2]
MIPS allocates kernel PTE pages with __get_free_pages(GFP_KERNEL | __GFP_ZERO, PTE_ORDER) and user PTE pages with pte = alloc_pages(GFP_KERNEL, PTE_ORDER) and then uses clear_highpage(pte) to zero out the allocated page for the user page tables. The PTE_ORDER is hardwired to zero, which makes MIPS implementation almost identical to the generic one. Switch MIPS to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The pte_free_kernel() and pte_free() versions on mips are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Paul Burton <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  m68k: sun3: switch to generic version of pte allocation  [Mike Rapoport; 1 file changed, -39/+2]
The sun3 MMU variant of m68k uses GFP_KERNEL to allocate a PTE page and then memset(0) or clear_highpage() to clear it. This is equivalent to allocating the page with GFP_KERNEL | __GFP_ZERO, which allows replacing sun3 implementation of pte_alloc_one() and pte_alloc_one_kernel() with the generic ones. The pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  csky: switch to generic version of pte allocation  [Mike Rapoport; 1 file changed, -27/+3]
The csky implementations of pte_alloc_one(), pte_free_kernel() and pte_free() are identical to the generic ones except for the lack of __GFP_ACCOUNT for the user PTE allocations. Switch csky to use the generic versions of these functions. The csky implementation of pte_alloc_one_kernel() is not replaced because it does not clear the allocated page but rather sets each PTE in it to a non-zero value. The pte_free_kernel() and pte_free() versions on csky are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Guo Ren <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  arm64: switch to generic version of pte allocation  [Mike Rapoport; 4 files changed, -43/+14]
The PTE allocations in arm64 are identical to the generic ones modulo the GFP flags. Using the generic pte_alloc_one() functions ensures that the user page tables are allocated with __GFP_ACCOUNT set. The arm64 definition of PGALLOC_GFP is removed and replaced with GFP_PGTABLE_USER for p[gum]d_alloc_one() for the user page tables and GFP_PGTABLE_KERNEL for the kernel page tables. The KVM memory cache is now using GFP_PGTABLE_USER. The mappings created with create_pgd_mapping() are now using GFP_PGTABLE_KERNEL. The conversion to the generic version of pte_free_kernel() removes the NULL check for pte. The pte_free() version on arm64 is identical to the generic one and can be simply dropped. [[email protected]: fix a bogus GFP flag in pgd_alloc()] Link: https://lore.kernel.org/r/[email protected]/ [and fix it more] Link: https://lore.kernel.org/linux-mm/20190617151252.GF16810@rapoport-lnx/ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  arm: switch to generic version of pte allocation  [Mike Rapoport; 2 files changed, -29/+14]
Replace __get_free_page() and alloc_pages() calls with the generic __pte_alloc_one_kernel() and __pte_alloc_one(). There is no functional change for the kernel PTE allocation. The differences for the user PTEs are that clear_pte_table() is now called after pgtable_page_ctor(), and that __GFP_ACCOUNT is added to the GFP flags. The conversion to the generic version of pte_free_kernel() removes the NULL check for pte. The pte_free() version on arm is identical to the generic one and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  alpha: switch to generic version of pte allocation  [Mike Rapoport; 1 file changed, -37/+3]
alpha allocates PTE pages with __get_free_page() and uses GFP_KERNEL | __GFP_ZERO for the allocations. Switch it to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The alpha pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  asm-generic, x86: introduce generic pte_{alloc,free}_one[_kernel]  [Mike Rapoport; 3 files changed, -44/+115]
Most architectures have identical or very similar implementation of pte_alloc_one_kernel(), pte_alloc_one(), pte_free_kernel() and pte_free(). Add a generic implementation that can be reused across architectures and enable its use on x86. The generic implementation uses GFP_KERNEL | __GFP_ZERO for the kernel page tables and GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT for the user page tables. The "base" functions for PTE allocation, namely __pte_alloc_one_kernel() and __pte_alloc_one() are intended for the architectures that require additional actions after actual memory allocation or must use non-default GFP flags. x86 is switched to use generic pte_alloc_one_kernel(), pte_free_kernel() and pte_free(). x86 still implements pte_alloc_one() to allow run-time control of GFP flags required for "userpte" command line option. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
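The generic helpers have roughly the following shape; this is a sketch consistent with the description above, and the GFP_PGTABLE_* constant names are borrowed from the arm64 patch later in this series as stand-ins for the flag combinations named in the text (pgtable_t is a struct page pointer on most architectures):

        #define GFP_PGTABLE_KERNEL      (GFP_KERNEL | __GFP_ZERO)
        #define GFP_PGTABLE_USER        (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)

        static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
        {
                return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
        }

        static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, gfp_t gfp)
        {
                struct page *pte;

                pte = alloc_page(gfp);
                if (!pte)
                        return NULL;
                /* Run the page-table constructor (e.g. split ptlock init). */
                if (!pgtable_page_ctor(pte)) {
                        __free_page(pte);
                        return NULL;
                }
                return pte;
        }

        static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
        {
                return __pte_alloc_one(mm, GFP_PGTABLE_USER);
        }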
2019-07-12  mm/gup.c: mark undo_dev_pagemap as __maybe_unused  [Guenter Roeck; 1 file changed, -1/+2]
Several mips builds generate the following build warning:

  mm/gup.c:1788:13: warning: 'undo_dev_pagemap' defined but not used

The function is declared unconditionally but only called from behind various ifdefs. Mark it __maybe_unused. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm/gup.c: remove some BUG_ONs from get_gate_page()  [Andy Lutomirski; 1 file changed, -3/+6]
If we end up without a PGD or PUD entry backing the gate area, don't BUG -- just fail gracefully. It's not entirely implausible that this could happen some day on x86. It doesn't right now even with an execute-only emulated vsyscall page because the fixmap shares the PUD, but the core mm code shouldn't rely on that particular detail to avoid OOPSing. Link: http://lkml.kernel.org/r/a1d9f4efb75b9d464e59fd6af00104b21c58f6f7.1561610798.git.luto@kernel.org Signed-off-by: Andy Lutomirski <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Jann Horn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm/gup: speed up check_and_migrate_cma_pages() on huge page  [Pingfan Liu; 1 file changed, -8/+16]
Both hugetlb and THP pages are located in a pageblock of a single migration type, since they are allocated from a free_list[]. Based on this fact, it is enough to check a single subpage to decide the migration type of the whole huge page. This saves (2M/4K - 1) loop iterations for pmd_huge on x86, and a similar amount on other archs. Furthermore, when executing isolate_huge_page(), it avoids taking the global hugetlb_lock many times, and avoids meaninglessly removing/adding pages to the local cma_page_list. [[email protected]: make `i' and `step' unsigned] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pingfan Liu <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Keith Busch <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
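A hedged sketch of the subpage-stepping idea; isolate_and_collect() is a made-up placeholder for the hugetlb/movable isolation the real function performs:

        unsigned long i, step;

        for (i = 0; i < nr_pages;) {
                struct page *head = compound_head(pages[i]);

                /*
                 * One check on the head page decides the migration type for
                 * every subpage, so the rest of the compound page can be
                 * skipped.
                 */
                step = 1;
                if (PageCompound(head))
                        step = (1 << compound_order(head)) - (pages[i] - head);

                if (is_migrate_cma_page(head))
                        isolate_and_collect(head);      /* hypothetical helper */

                i += step;
        }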
2019-07-12  mm: mark the page referenced in gup_hugepte  [Christoph Hellwig; 1 file changed, -0/+1]
All other get_user_page_fast cases mark the page referenced, so do this here as well. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm: switch gup_hugepte to use try_get_compound_head  [Christoph Hellwig; 1 file changed, -1/+2]
This applies the overflow fixes from 8fde12ca79aff ("mm: prevent get_user_pages() from overflowing page refcount") to the powerpc hugepd code and brings it back in sync with the other GUP cases. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
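try_get_compound_head(), from the commit referenced above, takes a reference on the head page only if the refcount cannot overflow; roughly:

        static struct page *try_get_compound_head(struct page *page, int refs)
        {
                struct page *head = compound_head(page);

                /* Refuse to take references if the count could overflow. */
                if (WARN_ON_ONCE(page_ref_count(head) < 0))
                        return NULL;
                if (unlikely(!page_cache_add_speculative(head, refs)))
                        return NULL;
                return head;
        }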
2019-07-12  mm: move the powerpc hugepd code to mm/gup.c  [Christoph Hellwig; 5 files changed, -90/+93]
While only powerpc supports the hugepd case, the code is pretty generic and I'd like to keep all GUP internals in one place. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm: validate get_user_pages_fast flags  [Christoph Hellwig; 1 file changed, -0/+3]
We can only deal with FOLL_WRITE and/or FOLL_LONGTERM in get_user_pages_fast, so reject all other flags. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
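The check amounts to a whitelist test at the top of get_user_pages_fast(); roughly (a hedged sketch):

        if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM)))
                return -EINVAL;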
2019-07-12  mm: consolidate the get_user_pages* implementations  [Christoph Hellwig; 5 files changed, -142/+65]
Always build mm/gup.c so that we don't have to provide separate nommu stubs. Also merge the get_user_pages_fast and __get_user_pages_fast stubs for the !HAVE_FAST_GUP case into the main implementations, which will then never call the fast path if HAVE_FAST_GUP is not set. This also ensures the new put_user_pages* helpers are available for nommu, as those are currently missing, which would create a problem as soon as we actually grew users for it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm: reorder code blocks in gup.c  [Christoph Hellwig; 1 file changed, -205/+205]
This moves the actually exported functions towards the end of the file, and reorders some functions to be in more logical blocks as a preparation for moving various stubs inline into the main functionality using IS_ENABLED(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  mm: rename CONFIG_HAVE_GENERIC_GUP to CONFIG_HAVE_FAST_GUP  [Christoph Hellwig; 10 files changed, -18/+11]
We only support the generic GUP now, so rename the config option to be more clear, and always use the mm/Kconfig definition of the symbol and select it from the arch Kconfigs. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  sparc64: use the generic get_user_pages_fast code  [Christoph Hellwig; 4 files changed, -341/+20]
The sparc64 code is mostly equivalent to the generic one, minus various bugfixes and two arch overrides that this patch adds to pgtable.h. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Cc: David Miller <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  sparc64: define untagged_addr()  [Christoph Hellwig; 1 file changed, -0/+22]
Add a helper to untag a user pointer. This is needed for ADI support in get_user_pages_fast. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Cc: David Miller <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
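Generically, such a helper masks the metadata bits out of a user pointer before it is used for page-table walks. The bit layout below is purely illustrative, not the real sparc64 ADI encoding:

        /*
         * Strip the tag from a user pointer. Which bits hold the tag is
         * architecture-specific; a 4-bit top-nibble tag is assumed here
         * only for illustration.
         */
        #define untagged_addr(addr) \
                ((__typeof__(addr))((unsigned long)(addr) & ~(0xfUL << 60)))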
2019-07-12  sparc64: add the missing pgd_page definition  [Christoph Hellwig; 1 file changed, -0/+3]
sparc64 only had pgd_page_vaddr, but not pgd_page. [[email protected]: fix sparc64 build] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: David Miller <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  sh: use the generic get_user_pages_fast code  [Christoph Hellwig; 4 files changed, -278/+40]
The sh code is mostly equivalent to the generic one, minus various bugfixes and two arch overrides that this patch adds to pgtable.h. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  sh: add the missing pud_page definition  [Christoph Hellwig; 1 file changed, -0/+3]
sh only had pud_page_vaddr, but not pud_page. [[email protected]: sh: stub out pud_page] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Guenter Roeck <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12  MIPS: use the generic get_user_pages_fast code  [Christoph Hellwig; 4 files changed, -304/+5]
The mips code is mostly equivalent to the generic one, minus various bugfixes and an arch override for gup_fast_permitted. Note that this defines ARCH_HAS_PTE_SPECIAL for mips, as mips has pte_special and pte_mkspecial implemented and used in the existing gup code. They are no-op stubs, though, which makes me a little unsure whether this is really the right thing to do. Note that this also adds back a missing cpu_has_dc_aliases check for __get_user_pages_fast, which the old code was only doing for get_user_pages_fast. This clearly looks like an oversight, as any condition that makes get_user_pages_fast unsafe also applies to __get_user_pages_fast. [[email protected]: MIPS: don't select ARCH_HAS_PTE_SPECIAL] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Tested-by: Guenter Roeck <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: lift the x86_32 PAE version of gup_get_pte to common codeChristoph Hellwig5-52/+52
The split low/high access is the only non-READ_ONCE version of gup_get_pte that showed up in the various arch implementations. Lift it to common code and drop the ifdef-based arch override. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
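On x86-32 with PAE a pte spans two 32-bit words and cannot be loaded atomically with READ_ONCE(), so the lifted helper reads the halves separately and retries until it observes a consistent pair. A sketch of the pattern, using the PAE pte_low/pte_high field names:

	static inline pte_t gup_get_pte(pte_t *ptep)
	{
		pte_t pte;

		do {
			/* read the halves with ordering between them ... */
			pte.pte_low = ptep->pte_low;
			smp_rmb();
			pte.pte_high = ptep->pte_high;
			smp_rmb();
			/* ... and retry if the low half changed under us */
		} while (unlikely(pte.pte_low != ptep->pte_low));

		return pte;
	}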
2019-07-12mm: simplify gup_fast_permittedChristoph Hellwig3-24/+9
Pass in the already calculated end value instead of recomputing it, and leave the end > start check in the callers instead of duplicating it in the arch code. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: James Hogan <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
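After the change an arch override only has to validate the precomputed range; a sketch of what the x86 hook reduces to (assuming TASK_SIZE_MAX bounds the user address space, with the end > start check already done by the caller):

	static inline bool gup_fast_permitted(unsigned long start,
					      unsigned long end)
	{
		/* the caller has checked end > start; only bound the range */
		return end <= TASK_SIZE_MAX;
	}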
2019-07-12mm: use untagged_addr() for get_user_pages_fast addressesChristoph Hellwig1-2/+2
Patch series "switch the remaining architectures to use generic GUP", v4. A series to switch mips, sh and sparc64 to use the generic GUP code so that we only have one codebase to touch for further improvements to this code. This patch (of 16): This will allow sparc64, or any future architecture with memory tagging to override its tags for get_user_pages and get_user_pages_fast. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Paul Burton <[email protected]> Cc: James Hogan <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: David Miller <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Ralf Baechle <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm, memcg: add a memcg_slabinfo debugfs fileWaiman Long2-0/+64
There are concerns about memory leaks from extensive use of memory cgroups as each memory cgroup creates its own set of kmem caches. There is a possibility that the memcg kmem caches may remain even after the memory cgroups have been offlined. Therefore, it will be useful to show the status of each memcg kmem cache. This patch introduces a new <debugfs>/memcg_slabinfo file which is somewhat similar to /proc/slabinfo in format, but lists only information about kmem caches that have child memcg kmem caches. Information available in /proc/slabinfo is not repeated in memcg_slabinfo. A portion of a sample output of the file was:

  # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
  rpc_inode_cache     root          13      51       1       1
  rpc_inode_cache       48           0       0       0       0
  fat_inode_cache     root           1      45       1       1
  fat_inode_cache       41           2      45       1       1
  xfs_inode           root         770     816      24      24
  xfs_inode             92          22      34       1       1
  xfs_inode             88:dead      1      34       1       1
  xfs_inode             89:dead     23      34       1       1
  xfs_inode             85           4      34       1       1
  xfs_inode             84           9      34       1       1

The css id of the memcg is also listed. If a memcg is not online, the tag ":dead" will be attached as shown above. [[email protected]: memcg: add ":deact" tag for reparented kmem caches in memcg_slabinfo] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: set the flag in the common code as suggested by Roman] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Waiman Long <[email protected]> Suggested-by: Shakeel Butt <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
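Wiring up such a file follows the usual debugfs seq_file pattern; a minimal sketch, with the show callback body and its iteration over the caches left as a stub (the names here are assumptions, not the patch's exact ones):

	static int memcg_slabinfo_show(struct seq_file *m, void *unused)
	{
		/* walk the root caches; print only those with memcg children */
		return 0;
	}
	DEFINE_SHOW_ATTRIBUTE(memcg_slabinfo);

	static int __init memcg_slabinfo_init(void)
	{
		/* read-only file at <debugfs>/memcg_slabinfo */
		debugfs_create_file("memcg_slabinfo", 0444, NULL, NULL,
				    &memcg_slabinfo_fops);
		return 0;
	}
	late_initcall(memcg_slabinfo_init);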
2019-07-12mm: memcg/slab: reparent memcg kmem_caches on cgroup removalRoman Gushchin4-18/+58
Let's reparent non-root kmem_caches on memcg offlining. This allows us to release the memory cgroup without waiting for the last outstanding kernel object (e.g. a dentry used by another application). Since the parent cgroup is already charged, all we need to do is splice the list of kmem_caches to the parent's kmem_caches list, swap the memcg pointer, drop the css refcounter for each kmem_cache and adjust the parent's css refcounter. Please note that kmem_cache->memcg_params.memcg isn't a stable pointer anymore. It's safe to read it under rcu_read_lock(), with cgroup_mutex held, or any other way that protects the memory cgroup from being released. We can race with the slab allocation and deallocation paths. It's not a big problem: the parent's charge and slab global stats are always correct, and we don't care anymore about the child usage and global stats. The child cgroup is already offline, so we don't use or show it anywhere. Local slab stats (NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE) aren't used anywhere except count_shadow_nodes(). But even there it won't break anything: after reparenting "nodes" will be 0 on the child level (because we're already reparenting shrinker lists), and on the parent level page stats always were 0, so this patch won't change anything. [[email protected]: properly handle kmem_caches reparented to root_mem_cgroup] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
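The reparenting step itself is small; a sketch of its core, assuming a per-memcg kmem_caches list hooked through memcg_params (locking and the root-memcg special case elided):

	static void memcg_reparent_kmem_caches(struct mem_cgroup *memcg,
					       struct mem_cgroup *parent)
	{
		struct kmem_cache *s;

		list_for_each_entry(s, &memcg->kmem_caches,
				    memcg_params.kmem_caches_node) {
			/* swap the (now unstable) memcg pointer ... */
			WRITE_ONCE(s->memcg_params.memcg, parent);
			/* ... and rebalance the css references */
			css_get(&parent->css);
			css_put(&memcg->css);
		}
		/* splice the whole list onto the parent in one go */
		list_splice(&memcg->kmem_caches, &parent->kmem_caches);
	}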
2019-07-12mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pagesRoman Gushchin3-19/+70
Every slab page charged to a non-root memory cgroup has a pointer to the memory cgroup and holds a reference to it, which protects a non-empty memory cgroup from being released. At the same time the page has a pointer to the corresponding kmem_cache, and also holds a reference to the kmem_cache. And the kmem_cache by itself holds a reference to the cgroup. So there is clearly some redundancy, which allows us to stop setting the page->mem_cgroup pointer and rely on getting the memcg pointer indirectly via the kmem_cache. Further, it will allow changing this pointer more easily, without a need to go over all charged pages. So let's stop setting the page->mem_cgroup pointer for slab pages, and stop using the css refcounter directly for protecting the memory cgroup from going away. Instead rely on kmem_cache as an intermediate object. Make sure that vmstats and shrinker lists keep working as before, as well as the /proc/kpagecgroup interface. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
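With page->mem_cgroup no longer set for slab pages, the memcg is resolved through the cache instead; a sketch of the indirect lookup (helper and field names assumed from the description above):

	static inline struct mem_cgroup *memcg_from_slab_page(struct page *page)
	{
		struct kmem_cache *s = READ_ONCE(page->slab_cache);

		/* root caches carry no memcg; non-root ones point at theirs */
		if (s && !is_root_cache(s))
			return READ_ONCE(s->memcg_params.memcg);
		return NULL;
	}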
2019-07-12mm: memcg/slab: rework non-root kmem_cache lifecycle managementRoman Gushchin4-79/+96
Currently each charged slab page holds a reference to the cgroup to which it's charged. Kmem_caches are held by the memcg and are released all together with the memory cgroup. It means that none of the kmem_caches are released unless at least one reference to the memcg exists, which is very far from optimal. Let's rework it in a way that allows releasing individual kmem_caches as soon as the cgroup is offline, the kmem_cache is empty and there are no pending allocations. To make it possible, let's introduce a new percpu refcounter for non-root kmem caches. The counter is initialized in percpu mode, and is switched to atomic mode during kmem_cache deactivation. The counter is bumped for every charged page and also for every running allocation, so the kmem_cache can't be released until all allocations complete. To shut down inactive empty kmem_caches, let's reuse the workqueue previously used for kmem_cache deactivation. Once the reference counter reaches 0, let's schedule an asynchronous kmem_cache release. I used the following simple approach to test the performance (stolen from another patchset by T. Harding):

  time find / -name fname-no-exist
  echo 2 > /proc/sys/vm/drop_caches
  repeat 10 times

Results:

        orig                    patched
  real  0m1.455s        real    0m1.355s
  user  0m0.206s        user    0m0.219s
  sys   0m0.855s        sys     0m0.807s

  real  0m1.487s        real    0m1.699s
  user  0m0.221s        user    0m0.256s
  sys   0m0.806s        sys     0m0.948s

  real  0m1.515s        real    0m1.505s
  user  0m0.183s        user    0m0.215s
  sys   0m0.876s        sys     0m0.858s

  real  0m1.291s        real    0m1.380s
  user  0m0.193s        user    0m0.198s
  sys   0m0.843s        sys     0m0.786s

  real  0m1.364s        real    0m1.374s
  user  0m0.180s        user    0m0.182s
  sys   0m0.868s        sys     0m0.806s

  real  0m1.352s        real    0m1.312s
  user  0m0.201s        user    0m0.212s
  sys   0m0.820s        sys     0m0.761s

  real  0m1.302s        real    0m1.349s
  user  0m0.205s        user    0m0.203s
  sys   0m0.803s        sys     0m0.792s

  real  0m1.334s        real    0m1.301s
  user  0m0.194s        user    0m0.201s
  sys   0m0.806s        sys     0m0.779s

  real  0m1.426s        real    0m1.434s
  user  0m0.216s        user    0m0.181s
  sys   0m0.824s        sys     0m0.864s

  real  0m1.350s        real    0m1.295s
  user  0m0.200s        user    0m0.190s
  sys   0m0.842s        sys     0m0.811s

So it looks like the difference is not noticeable in this test. [[email protected]: fix an use-after-free in kmemcg_workfn()] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Qian Cai <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
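The lifecycle described above maps naturally onto the percpu_ref API; a sketch of the key calls (the callback, work item and field names are assumptions, and the wrapper helpers are purely illustrative):

	/* release callback: runs once the last reference is gone */
	static void kmemcg_queue_cache_shutdown(struct percpu_ref *ref)
	{
		struct kmem_cache *s = container_of(ref, struct kmem_cache,
						    memcg_params.refcnt);

		/* defer the actual kmem_cache release to process context */
		schedule_work(&s->memcg_params.work);
	}

	static int kmemcg_cache_init_refcnt(struct kmem_cache *s)
	{
		/* starts in cheap percpu mode; bumped per page/allocation */
		return percpu_ref_init(&s->memcg_params.refcnt,
				       kmemcg_queue_cache_shutdown,
				       0, GFP_KERNEL);
	}

	static void kmemcg_cache_deactivate_refcnt(struct kmem_cache *s)
	{
		/* switch to atomic mode and drop the initial reference */
		percpu_ref_kill(&s->memcg_params.refcnt);
	}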
2019-07-12mm: memcg/slab: synchronize access to kmem_cache dying flag using a spinlockRoman Gushchin1-3/+12
Currently the memcg_params.dying flag and the corresponding workqueue used for the asynchronous deactivation of kmem_caches are synchronized using the slab_mutex. This makes it impossible to check the flag from irq context, which will be required in order to implement asynchronous release of kmem_caches. So let's switch over to the irq-save flavor of spinlock-based synchronization. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Waiman Long <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
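A sketch of the switch, assuming the lock is called memcg_kmem_wq_lock (taken irq-save so the flag can later also be tested from irq context):

	static DEFINE_SPINLOCK(memcg_kmem_wq_lock);

	static void memcg_set_cache_dying(struct kmem_cache *s)
	{
		unsigned long flags;

		/* previously this required holding slab_mutex */
		spin_lock_irqsave(&memcg_kmem_wq_lock, flags);
		s->memcg_params.dying = true;
		spin_unlock_irqrestore(&memcg_kmem_wq_lock, flags);
	}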