path: root/mm
2018-10-21  shmem: Convert shmem_add_to_page_cache to XArray  (Matthew Wilcox, 1 file, -47/+34)
We can use xas_find_conflict() instead of radix_tree_gang_lookup_slot() to find any conflicting entry and combine the three paths through this function into one. Signed-off-by: Matthew Wilcox <[email protected]>
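The shape of that pattern, as a minimal sketch (the function name add_unless_present() and the reduced error handling are illustrative, not the actual shmem code):

    #include <linux/xarray.h>
    #include <linux/pagemap.h>

    /* Sketch only: xas_find_conflict() reports any existing entry at the
     * target index, so a single insert loop replaces the old three-path
     * radix-tree version.  The real shmem_add_to_page_cache() also handles
     * multi-order entries and accounting. */
    static int add_unless_present(struct address_space *mapping,
                                  struct page *page, pgoff_t index)
    {
            XA_STATE(xas, &mapping->i_pages, index);

            do {
                    xas_lock_irq(&xas);
                    if (xas_find_conflict(&xas))
                            xas_set_err(&xas, -EEXIST);
                    else
                            xas_store(&xas, page);
                    xas_unlock_irq(&xas);
            } while (xas_nomem(&xas, GFP_KERNEL));

            return xas_error(&xas);
    }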
2018-10-21  shmem: Convert find_swap_entry to XArray  (Matthew Wilcox, 1 file, -17/+10)
This is a 1:1 conversion. The major part of this patch is converting the test framework from userspace to kernel space and mirroring the algorithm now used in find_swap_entry(). Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  shmem: Convert shmem_confirm_swap to XArray  (Matthew Wilcox, 1 file, -6/+1)
xa_load has its own RCU locking, so we can eliminate it here. Signed-off-by: Matthew Wilcox <[email protected]>
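For illustration, a sketch of what such a lookup reduces to (the helper name entry_is_swap() is illustrative; it assumes the usual swp_to_radix_entry() encoding of swap entries):

    #include <linux/xarray.h>
    #include <linux/swapops.h>

    /* Sketch: xa_load() takes and drops the RCU read lock internally, so
     * the caller no longer wraps the lookup in rcu_read_lock(). */
    static bool entry_is_swap(struct address_space *mapping, pgoff_t index,
                              swp_entry_t swap)
    {
            return xa_load(&mapping->i_pages, index) == swp_to_radix_entry(swap);
    }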
2018-10-21  shmem: Convert shmem_radix_tree_replace to XArray  (Matthew Wilcox, 1 file, -14/+8)
Rename shmem_radix_tree_replace() to shmem_replace_entry() and convert it to use the XArray API. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  pagevec: Use xa_mark_t  (Matthew Wilcox, 1 file, -2/+2)
Removes sparse warnings. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  mm: Convert is_page_cache_freeable to XArray  (Matthew Wilcox, 1 file, -4/+4)
This is just a variable rename and comment change. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  mm: Convert khugepaged_scan_shmem to XArray  (Matthew Wilcox, 1 file, -12/+5)
Slightly shorter and easier to read code. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  mm: Convert collapse_shmem to XArray  (Matthew Wilcox, 1 file, -93/+66)
I found another victim of the radix tree being hard to use. Because there was no call to radix_tree_preload(), khugepaged was allocating radix_tree_nodes using GFP_ATOMIC. I also converted a local_irq_save()/restore() pair to disable()/enable(). Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  mm: Convert huge_memory to XArray  (Matthew Wilcox, 1 file, -10/+7)
Quite a straightforward conversion. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  mm: Convert page migration to XArray  (Matthew Wilcox, 1 file, -30/+18)
Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  mm: Convert __do_page_cache_readahead to XArray  (Matthew Wilcox, 1 file, -3/+1)
This one is trivial. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  mm: Convert delete_from_swap_cache to XArray  (Matthew Wilcox, 2 files, -15/+11)
Both callers of __delete_from_swap_cache have the swp_entry_t already, so pass that in to make constructing the XA_STATE easier. Signed-off-by: Matthew Wilcox <[email protected]>
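A rough sketch of why passing the entry helps (the function name delete_swap_entry() and the bare xas_store() are illustrative; the real __delete_from_swap_cache() also updates counters and handles compound pages):

    #include <linux/swap.h>
    #include <linux/xarray.h>

    /* Sketch: with the swp_entry_t handed in by the caller, the XA_STATE
     * for the swap address space can be built directly from the entry.
     * Assumes the caller already holds the xa_lock on i_pages. */
    static void delete_swap_entry(struct address_space *address_space,
                                  swp_entry_t entry)
    {
            XA_STATE(xas, &address_space->i_pages, swp_offset(entry));

            xas_store(&xas, NULL);          /* drop the swap-cache entry */
    }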
2018-10-21  mm: Convert add_to_swap_cache to XArray  (Matthew Wilcox, 1 file, -64/+29)
Combine __add_to_swap_cache and add_to_swap_cache into one function since there is no more need to preload. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  mm: Convert truncate to XArray  (Matthew Wilcox, 1 file, -9/+6)
This is essentially xa_cmpxchg() with the locking handled above us, and it doesn't have to handle replacing a NULL entry. Signed-off-by: Matthew Wilcox <[email protected]>
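A sketch of that shape (the helper name clear_shadow() is illustrative, not the exact code in mm/truncate.c):

    #include <linux/xarray.h>
    #include <linux/pagemap.h>

    /* Sketch: cmpxchg-like replacement of a shadow entry with NULL, with
     * the i_pages lock already held by the caller; nothing to do if the
     * entry changed under us. */
    static void clear_shadow(struct address_space *mapping, pgoff_t index,
                             void *entry)
    {
            XA_STATE(xas, &mapping->i_pages, index);

            if (xas_load(&xas) != entry)
                    return;
            xas_store(&xas, NULL);
    }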
2018-10-21  mm: Convert workingset to XArray  (Matthew Wilcox, 1 file, -30/+21)
We construct an XA_STATE and use it to delete the node with xas_store() rather than adding a special function for this unique use case. Includes a test that simulates this usage for the test suite. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  mm: Convert page-writeback to XArray  (Matthew Wilcox, 1 file, -46/+26)
Includes moving mapping_tagged() to fs.h as a static inline, and changing it to return bool. Signed-off-by: Matthew Wilcox <[email protected]>
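The static-inline form presumably looks like this (a sketch; the exact body in fs.h may differ):

    /* Sketch: bool-returning mapping_tagged(), built on xa_marked(). */
    static inline bool mapping_tagged(struct address_space *mapping,
                                      xa_mark_t tag)
    {
            return xa_marked(&mapping->i_pages, tag);
    }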
2018-10-21  page cache: Convert filemap_range_has_page to XArray  (Matthew Wilcox, 1 file, -8/+19)
Instead of calling find_get_pages_range() and putting any reference, use xas_find() to iterate over any entries in the range, skipping the shadow/swap entries. Signed-off-by: Matthew Wilcox <[email protected]>
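A simplified sketch of that walk (range_has_page() is an illustrative name, not the exported function):

    #include <linux/pagemap.h>
    #include <linux/xarray.h>

    /* Sketch: skip retry and value (shadow/swap) entries, stop at the
     * first real page in the requested range. */
    static bool range_has_page(struct address_space *mapping,
                               pgoff_t start, pgoff_t end)
    {
            XA_STATE(xas, &mapping->i_pages, start);
            struct page *page;
            bool found = false;

            rcu_read_lock();
            while ((page = xas_find(&xas, end)) != NULL) {
                    if (xas_retry(&xas, page))
                            continue;
                    if (xa_is_value(page))  /* shadow/swap entry, not a page */
                            continue;
                    found = true;
                    break;
            }
            rcu_read_unlock();
            return found;
    }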
2018-10-21  page cache: Remove stray radix comment  (Matthew Wilcox, 1 file, -1/+1)
Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  page cache: Convert delete_batch to XArray  (Matthew Wilcox, 1 file, -15/+13)
Rename the function from page_cache_tree_delete_batch to just page_cache_delete_batch. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  page cache: Convert filemap_map_pages to XArray  (Matthew Wilcox, 1 file, -29/+13)
Slight change of strategy here; if we have trouble getting hold of a page for whatever reason (eg a compound page is split underneath us), don't spin to stabilise the page, just continue the iteration, like we would if we failed to trylock the page. Since this is a speculative optimisation, it feels like we should allow the process to take an extra fault if it turns out to need this page instead of spending time to pin down a page it may not need. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  page cache: Convert find_get_entries_tag to XArray  (Matthew Wilcox, 1 file, -30/+24)
Slightly shorter and simpler code. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  page cache: Convert find_get_pages_range_tag to XArray  (Matthew Wilcox, 1 file, -42/+26)
The 'end' parameter of the xas_for_each iterator avoids a useless iteration at the end of the range. Signed-off-by: Matthew Wilcox <[email protected]>
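A sketch of the bounded, tagged walk (the helper name count_tagged() is illustrative; reference counting and pagevec handling are omitted):

    /* Sketch: 'end' bounds the iteration itself, so there is no extra
     * pass over an out-of-range entry followed by an index check. */
    static unsigned int count_tagged(struct address_space *mapping,
                                     pgoff_t index, pgoff_t end, xa_mark_t tag)
    {
            XA_STATE(xas, &mapping->i_pages, index);
            struct page *page;
            unsigned int nr = 0;

            rcu_read_lock();
            xas_for_each_marked(&xas, page, end, tag) {
                    if (xas_retry(&xas, page))
                            continue;
                    if (xa_is_value(page))
                            continue;       /* skip shadow/swap entries */
                    nr++;                   /* real code would take a reference here */
            }
            rcu_read_unlock();
            return nr;
    }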
2018-10-21  page cache: Convert find_get_pages_contig to XArray  (Matthew Wilcox, 1 file, -31/+22)
There's no direct replacement for radix_tree_for_each_contig() in the XArray API as it's an unusual thing to do. Instead, open-code a loop using xas_next(). This removes the only user of radix_tree_for_each_contig() so delete the iterator from the API and the test suite code for it. Signed-off-by: Matthew Wilcox <[email protected]>
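The open-coded loop looks roughly like this (count_contig() is an illustrative name; reference counting and the compound-page handling of the real function are omitted):

    /* Sketch: xas_next() steps to the next index; the contiguous run ends
     * at the first hole or at a shadow/swap entry. */
    static unsigned int count_contig(struct address_space *mapping,
                                     pgoff_t index, unsigned int nr_pages)
    {
            XA_STATE(xas, &mapping->i_pages, index);
            struct page *page;
            unsigned int found = 0;

            rcu_read_lock();
            for (page = xas_load(&xas); page; page = xas_next(&xas)) {
                    if (xas_retry(&xas, page))
                            continue;
                    if (xa_is_value(page))
                            break;
                    if (++found == nr_pages)
                            break;
            }
            rcu_read_unlock();
            return found;
    }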
2018-10-21  page cache: Convert find_get_pages_range to XArray  (Matthew Wilcox, 1 file, -33/+19)
The 'end' parameter of the xas_for_each iterator avoids a useless iteration at the end of the range. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  page cache: Convert find_get_entries to XArray  (Matthew Wilcox, 1 file, -28/+23)
Slightly shorter and simpler code. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  page cache: Convert find_get_entry to XArray  (Matthew Wilcox, 1 file, -35/+28)
Slightly shorter and simpler code. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  page cache: Convert page deletion to XArray  (Matthew Wilcox, 1 file, -18/+13)
The code is slightly shorter and simpler. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  page cache: Add and replace pages using the XArray  (Matthew Wilcox, 1 file, -82/+57)
Use the XArray APIs to add and replace pages in the page cache. This removes two uses of the radix tree preload API and is significantly shorter code. It also removes the last user of __radix_tree_create() outside radix-tree.c itself, so make it static. Signed-off-by: Matthew Wilcox <[email protected]>
2018-10-21  page cache: Convert hole search to XArray  (Matthew Wilcox, 2 files, -62/+52)
The page cache offers the ability to search for a miss in the previous or next N locations. Rather than teach the XArray about the page cache's definition of a miss, use xas_prev() and xas_next() to search the page array. This should be more efficient as it does not have to start the lookup from the top for each index. Signed-off-by: Matthew Wilcox <[email protected]>
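For instance, a forward hole search might look roughly like this (a sketch; next_hole() is an illustrative name and the real helpers in mm/filemap.c handle the corner cases more carefully):

    /* Sketch: walk forward with xas_next(); because the xa_state keeps its
     * position, each step continues from the current node rather than
     * restarting the lookup from the top of the tree. */
    static pgoff_t next_hole(struct address_space *mapping, pgoff_t index,
                             unsigned long max_scan)
    {
            XA_STATE(xas, &mapping->i_pages, index);

            while (max_scan--) {
                    void *entry = xas_next(&xas);

                    if (!entry || xa_is_value(entry))
                            break;          /* a miss: hole or shadow entry */
                    if (xas.xa_index == 0)
                            break;          /* wrapped around */
            }
            return xas.xa_index;
    }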
2018-10-21  xarray: Define struct xa_node  (Matthew Wilcox, 1 file, -8/+8)
This is a direct replacement for struct radix_tree_node. A couple of struct members have changed name, so convert those. Use a #define so that radix tree users continue to work without change. Signed-off-by: Matthew Wilcox <[email protected]> Reviewed-by: Josef Bacik <[email protected]>
2018-10-18  mremap: properly flush TLB before releasing the page  (Linus Torvalds, 2 files, -23/+17)
Jann Horn points out that our TLB flushing was subtly wrong for the mremap() case. What makes mremap() special is that we don't follow the usual "add page to list of pages to be freed, then flush tlb, and then free pages". No, mremap() obviously just _moves_ the page from one page table location to another.

That matters, because mremap() thus doesn't directly control the lifetime of the moved page with a freelist: instead, the lifetime of the page is controlled by the page table locking, that serializes access to the entry.

As a result, we need to flush the TLB not just before releasing the lock for the source location (to avoid any concurrent accesses to the entry), but also before we release the destination page table lock (to avoid the TLB being flushed after somebody else has already done something to that page).

This also makes the whole "need_flush" logic unnecessary, since we now always end up flushing the TLB for every valid entry.

Reported-and-tested-by: Jann Horn <[email protected]>
Acked-by: Will Deacon <[email protected]>
Tested-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-10-16  mm: move is_kernel_rodata() to asm-generic/sections.h  (Bartosz Golaszewski, 1 file, -7/+0)
Export this routine so that we can use it later in devm_kstrdup_const() and devm_kfree().

Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-10-13  mm/thp: fix call to mmu_notifier in set_pmd_migration_entry() v2  (Jérôme Glisse, 1 file, -6/+0)
Inside set_pmd_migration_entry() we are holding page table locks and thus cannot sleep, so we cannot call invalidate_range_start/end(). Remove the calls to mmu_notifier_invalidate_range_start/end(): they are already made by the function that calls set_pmd_migration_entry() (see try_to_unmap_one()).

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reported-by: Andrea Arcangeli <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Nellans <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-10-13  mm/mmap.c: don't clobber partially overlapping VMA with MAP_FIXED_NOREPLACE  (Jann Horn, 1 file, -1/+1)
Daniel Micay reports that attempting to use MAP_FIXED_NOREPLACE in an application causes that application to randomly crash.

The existing check for handling MAP_FIXED_NOREPLACE looks up the first VMA that either overlaps or follows the requested region, and then bails out if that VMA overlaps *the start* of the requested region. It does not bail out if the VMA only overlaps another part of the requested region.

Fix it by checking that the found VMA only starts at or after the end of the requested region, in which case there is no overlap.

Test case:

    user@debian:~$ cat mmap_fixed_simple.c
    #include <sys/mman.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #ifndef MAP_FIXED_NOREPLACE
    #define MAP_FIXED_NOREPLACE 0x100000
    #endif

    int main(void)
    {
            char *p;

            errno = 0;
            p = mmap((void*)0x10001000, 0x4000, PROT_NONE,
                     MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED_NOREPLACE, -1, 0);
            printf("p1=%p err=%m\n", p);

            errno = 0;
            p = mmap((void*)0x10000000, 0x2000, PROT_READ,
                     MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED_NOREPLACE, -1, 0);
            printf("p2=%p err=%m\n", p);

            char cmd[100];
            sprintf(cmd, "cat /proc/%d/maps", getpid());
            system(cmd);
            return 0;
    }
    user@debian:~$ gcc -o mmap_fixed_simple mmap_fixed_simple.c
    user@debian:~$ ./mmap_fixed_simple
    p1=0x10001000 err=Success
    p2=0x10000000 err=Success
    10000000-10002000 r--p 00000000 00:00 0
    10002000-10005000 ---p 00000000 00:00 0
    564a9a06f000-564a9a070000 r-xp 00000000 fe:01 264004  /home/user/mmap_fixed_simple
    564a9a26f000-564a9a270000 r--p 00000000 fe:01 264004  /home/user/mmap_fixed_simple
    564a9a270000-564a9a271000 rw-p 00001000 fe:01 264004  /home/user/mmap_fixed_simple
    564a9a54a000-564a9a56b000 rw-p 00000000 00:00 0       [heap]
    7f8eba447000-7f8eba5dc000 r-xp 00000000 fe:01 405885  /lib/x86_64-linux-gnu/libc-2.24.so
    7f8eba5dc000-7f8eba7dc000 ---p 00195000 fe:01 405885  /lib/x86_64-linux-gnu/libc-2.24.so
    7f8eba7dc000-7f8eba7e0000 r--p 00195000 fe:01 405885  /lib/x86_64-linux-gnu/libc-2.24.so
    7f8eba7e0000-7f8eba7e2000 rw-p 00199000 fe:01 405885  /lib/x86_64-linux-gnu/libc-2.24.so
    7f8eba7e2000-7f8eba7e6000 rw-p 00000000 00:00 0
    7f8eba7e6000-7f8eba809000 r-xp 00000000 fe:01 405876  /lib/x86_64-linux-gnu/ld-2.24.so
    7f8eba9e9000-7f8eba9eb000 rw-p 00000000 00:00 0
    7f8ebaa06000-7f8ebaa09000 rw-p 00000000 00:00 0
    7f8ebaa09000-7f8ebaa0a000 r--p 00023000 fe:01 405876  /lib/x86_64-linux-gnu/ld-2.24.so
    7f8ebaa0a000-7f8ebaa0b000 rw-p 00024000 fe:01 405876  /lib/x86_64-linux-gnu/ld-2.24.so
    7f8ebaa0b000-7f8ebaa0c000 rw-p 00000000 00:00 0
    7ffcc99fa000-7ffcc9a1b000 rw-p 00000000 00:00 0       [stack]
    7ffcc9b44000-7ffcc9b47000 r--p 00000000 00:00 0       [vvar]
    7ffcc9b47000-7ffcc9b49000 r-xp 00000000 00:00 0       [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
    user@debian:~$ uname -a
    Linux debian 4.19.0-rc6+ #181 SMP Wed Oct 3 23:43:42 CEST 2018 x86_64 GNU/Linux
    user@debian:~$

As you can see, the first page of the mapping at 0x10001000 was clobbered.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: a4ff8e8620d3 ("mm: introduce MAP_FIXED_NOREPLACE")
Signed-off-by: Jann Horn <[email protected]>
Reported-by: Daniel Micay <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: John Hubbard <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
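The corrected check presumably has this shape (a sketch, not a quote of the patch): find_vma() returns the first VMA that ends after addr, and any such VMA starting before addr + len overlaps the requested region, not only one that covers addr itself.

    if (flags & MAP_FIXED_NOREPLACE) {
            struct vm_area_struct *vma = find_vma(mm, addr);

            /* Sketch: reject any overlap with [addr, addr + len), not just
             * a VMA that covers the first byte of the request. */
            if (vma && vma->vm_start < addr + len)
                    return -EEXIST;
    }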
2018-10-11  Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Greg Kroah-Hartman, 1 file, -10/+0)
Ingo writes:
  "scheduler fix: Cleanup of dead code left over from the recent sched/numa fixes."

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mm, sched/numa: Remove remaining traces of NUMA rate-limiting
2018-10-09  x86/mm: Page size aware flush_tlb_mm_range()  (Peter Zijlstra, 1 file, -0/+1)
Use the new tlb_get_unmap_shift() to determine the stride of the INVLPG loop.

Cc: Nick Piggin <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
2018-10-09  Merge branch 'tlb/asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into x86/mm  (Peter Zijlstra, 3 files, -250/+264)
Pull in the generic mmu_gather changes from the ARM64 tree such that we can put x86 specific things on top as well.
2018-10-09  mm, sched/numa: Remove remaining traces of NUMA rate-limiting  (Srikar Dronamraju, 1 file, -10/+0)
Remove the leftover pglist_data::numabalancing_migrate_lock and its initialization; we stopped using this lock with:

  efaffc5e40ae ("mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration")

[ mingo: Rewrote the changelog. ]

Signed-off-by: Srikar Dronamraju <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Linux-MM <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2018-10-07  percpu: stop leaking bitmap metadata blocks  (Mike Rapoport, 1 file, -0/+1)
The commit ca460b3c9627 ("percpu: introduce bitmap metadata blocks") introduced bitmap metadata blocks. These metadata blocks are allocated whenever a new chunk is created, but they are never freed. Fix it. Fixes: ca460b3c9627 ("percpu: introduce bitmap metadata blocks") Signed-off-by: Mike Rapoport <[email protected]> Cc: [email protected] Signed-off-by: Dennis Zhou <[email protected]>
2018-10-05  Merge branch 'akpm'  (Greg Kroah-Hartman, 8 files, -23/+132)
* akpm:
  mm: madvise(MADV_DODUMP): allow hugetlbfs pages
  ocfs2: fix locking for res->tracking and dlm->tracking_list
  mm/vmscan.c: fix int overflow in callers of do_shrink_slab()
  mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly
  mm/vmstat.c: fix outdated vmstat_text
  proc: restrict kernel stack dumps to root
  mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizes
  mm/migrate.c: split only transparent huge pages when allocation fails
  ipc/shm.c: use ERR_CAST() for shm_lock() error return
  mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl
  mm, thp: fix mlocking THP page with migration enabled
  ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()
  hugetlb: take PMD sharing into account when flushing tlb/caches
  mm: migration: fix migration of huge PMD shared pages
2018-10-05  mm: madvise(MADV_DODUMP): allow hugetlbfs pages  (Daniel Black, 1 file, -1/+1)
Reproducer, assuming 2M of hugetlbfs available:

Hugetlbfs mounted, size=2M and option user=testuser

    # mount | grep ^hugetlbfs
    hugetlbfs on /dev/hugepages type hugetlbfs (rw,pagesize=2M,user=dan)
    # sysctl vm.nr_hugepages=1
    vm.nr_hugepages = 1
    # grep Huge /proc/meminfo
    AnonHugePages: 0 kB
    ShmemHugePages: 0 kB
    HugePages_Total: 1
    HugePages_Free: 1
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 2048 kB
    Hugetlb: 2048 kB

Code:

    #include <sys/mman.h>
    #include <stddef.h>

    #define SIZE 2*1024*1024

    int main()
    {
            void *ptr;

            ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0);

            madvise(ptr, SIZE, MADV_DONTDUMP);
            madvise(ptr, SIZE, MADV_DODUMP);
    }

Compile and strace:

    mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7ff7c9200000
    madvise(0x7ff7c9200000, 2097152, MADV_DONTDUMP) = 0
    madvise(0x7ff7c9200000, 2097152, MADV_DODUMP) = -1 EINVAL (Invalid argument)

hugetlbfs pages have VM_DONTEXPAND set in their VmFlags, as driver pages do, based on author testing with analysis from Florian Weimer [1].

The inclusion of VM_DONTEXPAND in the VM_SPECIAL definition was a consequence of the large usage of VM_DONTEXPAND in device drivers. A consequence of [2] is that VM_DONTEXPAND-marked pages cannot be marked DODUMP.

A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs memory for a while and later request madvise(MADV_DODUMP) on the same memory. We correct this omission by allowing madvise(MADV_DODUMP) on hugetlbfs pages.

[1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit
[2] commit 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")

Link: http://lkml.kernel.org/r/[email protected]
Link: https://lists.launchpad.net/maria-discuss/msg05245.html
Fixes: 0103bd16fb90 ("mm: prepare VM_DONTDUMP for using in drivers")
Reported-by: Kenneth Penza <[email protected]>
Signed-off-by: Daniel Black <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-10-05  mm/vmscan.c: fix int overflow in callers of do_shrink_slab()  (Kirill Tkhai, 1 file, -4/+3)
do_shrink_slab() returns an unsigned long value, and placing it into an int variable cuts the high bytes off. Then we compare ret against 0xfffffffe (since SHRINK_EMPTY is converted to ret's type). Thus a large number of objects returned by do_shrink_slab() may be interpreted as SHRINK_EMPTY if the low bytes of the value are equal to 0xfffffffe. Fix that by declaring ret as unsigned long in these functions.

Link: http://lkml.kernel.org/r/153813407177.17544.14888305435570723973.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <[email protected]>
Reported-by: Cyrill Gorcunov <[email protected]>
Acked-by: Cyrill Gorcunov <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
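An illustration of the truncation (a sketch of the caller pattern, not the exact shrink_slab() code; sc, shrinker, priority and freed stand in for the caller's local context):

    /* Sketch: do_shrink_slab() returns unsigned long; storing the result
     * in an int truncates large counts, and a count whose low 32 bits are
     * 0xfffffffe then compares equal to SHRINK_EMPTY. */
    unsigned long ret;              /* was: int ret; */

    ret = do_shrink_slab(&sc, shrinker, priority);
    if (ret == SHRINK_EMPTY)
            ret = 0;
    freed += ret;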
2018-10-05  mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly  (Jann Horn, 1 file, -0/+3)
5dd0b16cdaff ("mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP") made the availability of the NR_TLB_REMOTE_FLUSH* counters inside the kernel unconditional to reduce #ifdef soup, but (either to avoid showing dummy zero counters to userspace, or because that code was missed) didn't update the vmstat_array, meaning that all following counters would be shown with incorrect values.

This only affects kernel builds with CONFIG_VM_EVENT_COUNTERS=y && CONFIG_DEBUG_TLBFLUSH=y && CONFIG_SMP=n.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 5dd0b16cdaff ("mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP")
Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Kemi Wang <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-10-05  mm/vmstat.c: fix outdated vmstat_text  (Jann Horn, 1 file, -1/+0)
7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely") removed the VMACACHE_FULL_FLUSHES statistics, but didn't remove the corresponding entry in vmstat_text. This causes an out-of-bounds access in vmstat_show().

Luckily this only affects kernels with CONFIG_DEBUG_VM_VMACACHE=y, which is probably very rare.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 7a9cdebdcc17 ("mm: get rid of vmacache_flush_all() entirely")
Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Kemi Wang <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-10-05  mm/migrate.c: split only transparent huge pages when allocation fails  (Anshuman Khandual, 1 file, -1/+1)
split_huge_page_to_list() fails on HugeTLB pages. I was experimenting with moving 32MB contig HugeTLB pages on arm64 (with a debug patch applied) and hit the following stack trace when the kernel crashed.

    [ 3732.462797] Call trace:
    [ 3732.462835]  split_huge_page_to_list+0x3b0/0x858
    [ 3732.462913]  migrate_pages+0x728/0xc20
    [ 3732.462999]  soft_offline_page+0x448/0x8b0
    [ 3732.463097]  __arm64_sys_madvise+0x724/0x850
    [ 3732.463197]  el0_svc_handler+0x74/0x110
    [ 3732.463297]  el0_svc+0x8/0xc
    [ 3732.463347] Code: d1000400 f90b0e60 f2fbd5a2 a94982a1 (f9000420)

When unmap_and_move[_huge_page]() fails due to lack of memory, the splitting should happen only for transparent huge pages, not for HugeTLB pages. PageTransHuge() returns true for both THP and HugeTLB pages. Hence the conditional check should test the PageHuge() flag to make sure that the given page is not a HugeTLB one.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 94723aafb9 ("mm: unclutter THP migration")
Signed-off-by: Anshuman Khandual <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
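The fix presumably takes this shape (a sketch; the retry bookkeeping of migrate_pages() is omitted, and page, from, rc and the retry label stand in for the caller's context):

    /* Sketch: PageTransHuge() is also true for hugetlb pages, so splitting
     * on allocation failure must be limited to real THP pages. */
    if (PageTransHuge(page) && !PageHuge(page)) {
            lock_page(page);
            rc = split_huge_page_to_list(page, from);
            unlock_page(page);
            if (!rc)
                    goto retry;     /* migrate the base pages instead */
    }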
2018-10-05  mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl  (YueHaibing, 1 file, -1/+2)
get_user_pages_fast() will return a negative value if no pages were pinned; that value is then converted to an unsigned type, which is compared to zero, giving the wrong result.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 09e35a4a1ca8 ("mm/gup_benchmark: handle gup failures")
Signed-off-by: YueHaibing <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-10-05  mm, thp: fix mlocking THP page with migration enabled  (Kirill A. Shutemov, 2 files, -1/+4)
A transparent huge page is represented by a single entry on an LRU list. Therefore, we can only make unevictable an entire compound page, not individual subpages.

If a user tries to mlock() part of a huge page, we want the rest of the page to be reclaimable. We handle this by keeping PTE-mapped huge pages on normal LRU lists: the PMD on the border of a VM_LOCKED VMA will be split into a PTE table.

Introduction of THP migration breaks[1] the rules around mlocking THP pages. If we had a single PMD mapping of the page in an mlocked VMA, the page will get mlocked, regardless of PTE mappings of the page.

For tmpfs/shmem it's easy to fix by checking PageDoubleMap() in remove_migration_pmd().

Anon THP pages can only be shared between processes via fork(). A mlocked page can only be shared if the parent mlocked it before forking, otherwise CoW will be triggered on mlock(). For Anon-THP, we can fix the issue by munlocking the page on removing the PTE migration entry for the page. PTEs for the page will always come after the mlocked PMD: rmap walks VMAs from oldest to newest.

Test-case:

    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <linux/mempolicy.h>
    #include <numaif.h>

    int main(void)
    {
            unsigned long nodemask = 4;
            void *addr;

            addr = mmap((void *)0x20000000UL, 2UL << 20,
                        PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);

            if (fork()) {
                    wait(NULL);
                    return 0;
            }

            mlock(addr, 4UL << 10);
            mbind(addr, 2UL << 20, MPOL_PREFERRED | MPOL_F_RELATIVE_NODES,
                  &nodemask, 4, MPOL_MF_MOVE);
            return 0;
    }

[1] https://lkml.kernel.org/r/CAOMGZ=G52R-30rZvhGxEbkTw7rLLwBGadVYeo--iizcD3upL3A@mail.gmail.com

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reported-by: Vegard Nossum <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: <[email protected]> [4.14+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-10-05  hugetlb: take PMD sharing into account when flushing tlb/caches  (Mike Kravetz, 1 file, -9/+44)
When fixing an issue with PMD sharing and migration, it was discovered via code inspection that other callers of huge_pmd_unshare potentially have an issue with cache and tlb flushing.

Use the routine adjust_range_if_pmd_sharing_possible() to calculate worst case ranges for mmu notifiers. Ensure that this range is flushed if huge_pmd_unshare succeeds and unmaps a PUD_SIZE area.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Kravetz <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-10-05  mm: migration: fix migration of huge PMD shared pages  (Mike Kravetz, 2 files, -5/+74)
The page migration code employs try_to_unmap() to try and unmap the source page. This is accomplished by using rmap_walk to find all vmas where the page is mapped. This search stops when page mapcount is zero.

For shared PMD huge pages, the page map count is always 1 no matter the number of mappings. Shared mappings are tracked via the reference count of the PMD page. Therefore, try_to_unmap stops prematurely and does not completely unmap all mappings of the source page.

This problem can result in data corruption as writes to the original source page can happen after contents of the page are copied to the target page. Hence, data is lost.

This problem was originally seen as DB corruption of shared global areas after a huge page was soft offlined due to ECC memory errors. DB developers noticed they could reproduce the issue by (hotplug) offlining memory used to back huge pages. A simple testcase can reproduce the problem by creating a shared PMD mapping (note that this must be at least PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using migrate_pages() to migrate process pages between nodes while continually writing to the huge pages being migrated.

To fix, have the try_to_unmap_one routine check for huge PMD sharing by calling huge_pmd_unshare for hugetlbfs huge pages. If it is a shared mapping it will be 'unshared' which removes the page table entry and drops the reference on the PMD page. After this, flush caches and TLB.

mmu notifiers are called before locking page tables, but we can not be sure of PMD sharing until page tables are locked. Therefore, check for the possibility of PMD sharing before locking so that notifiers can prepare for the worst possible case.

Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: make _range_in_vma() a static inline]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-10-04  Merge branch 'linus' into x86/core, to pick up fixes  (Ingo Molnar, 16 files, -119/+118)
Signed-off-by: Ingo Molnar <[email protected]>