Age  Commit message  (Author, files changed, lines -/+)
2017-02-24  mm/memory_hotplug.c: fix overflow in test_pages_in_a_zone()  (zhong jiang, 1 file, -2/+2)
When mainline introduced commit a96dfddbcc04 ("base/memory, hotplug: fix a kernel oops in show_valid_zones()"), it obtained the valid start and end pfn from the given pfn range. The valid start pfn fixes the actual issue, but it introduced another one: the valid end pfn may exceed the given end_pfn. Although the overflow does not cause an actual problem at present, it should be fixed. [[email protected]: remove assumption that end_pfn is aligned by MAX_ORDER_NR_PAGES] Fixes: a96dfddbcc04 ("base/memory, hotplug: fix a kernel oops in show_valid_zones()") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: zhong jiang <[email protected]> Signed-off-by: Toshi Kani <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
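The overflow described above is avoided by clamping the reported valid end to the caller's range; a minimal sketch of that clamp (variable names are illustrative, not the exact upstream hunk):

  if (zone) {
          *valid_start = start;
          /* never report a valid end beyond the pfn range we were asked about */
          *valid_end = min(end, end_pfn);
  }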
2017-02-24  zram: extend zero pages to same element pages  (zhouxianrong, 3 files, -33/+69)
The idea is that, without doing more calculations, we extend zero pages to same-element pages for zram. A zero page is the special case of a same-element page whose element is zero.

The test procedure:
1. the test is done under Android 7.0
2. start up many applications repeatedly
3. sample the zero pages, same pages (non-zero element) and total pages in the function page_zero_filled

The result is listed below:

            ZERO         SAME         TOTAL
            36214        17842        598196

            ZERO/TOTAL   SAME/TOTAL   (ZERO+SAME)/TOTAL   ZERO/SAME
  AVERAGE   0.060631909  0.024990816  0.085622726         2.663825038
  STDEV     0.00674612   0.005887625  0.009707034         2.115881328
  MAX       0.069698422  0.030046087  0.094975336         7.56043956
  MIN       0.03959586   0.007332205  0.056055193         1.928985507

From the above data, the benefit is about 2.5% and up to 3% of total swapout pages. The drawback of the patch is that when we recover a page from a non-zero element, the operations are inefficient for partial reads. This patch extends zero_page to same_page, so any user who has been monitoring zero_pages may be surprised that the number increases, but it is not harmful, I believe. [[email protected]: do not free same element pages in zram_meta_free] Link: http://lkml.kernel.org/r/20170207065741.GA2567@bbox Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: zhouxianrong <[email protected]> Signed-off-by: Minchan Kim <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm/page-writeback.c: place "not" inside of unlikely() statement in wb_domain_writeout_inc()  (Steven Rostedt (VMware), 1 file, -1/+1)
The likely/unlikely profiler noticed that the unlikely statement in wb_domain_writeout_inc() is constantly wrong. This is due to the "not" (!) being outside the unlikely statement. It is likely that dom->period_time will be set, but unlikely that it won't be. Move the "not" inside the unlikely statement. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
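Schematically, the fix moves the negation inside the annotation so the branch hint matches the common case; a sketch, not the verbatim function body:

  /* before: the '!' sits outside the hint, so the hint covers the wrong condition */
  if (!unlikely(dom->period_time)) {
          /* first event after period switching was turned off: start the timer ... */
  }

  /* after: dom->period_time being unset really is the unlikely case */
  if (unlikely(!dom->period_time)) {
          /* first event after period switching was turned off: start the timer ... */
  }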
2017-02-24  powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write  (Aneesh Kumar K.V, 1 file, -7/+45)
With this, our protnone becomes a present pte with the READ/WRITE/EXEC bits cleared. By default we also set _PAGE_PRIVILEGED on such a pte. This is now used to help us identify a protnone pte that has the saved write bit. For such a pte, we will clear the _PAGE_PRIVILEGED bit. The pte still remains non-accessible from both user and kernel. [[email protected]: v3] Link: http://lkml.kernel.org/r/1487498625-10891-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Acked-by: Michael Neuling <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm/ksm: handle protnone saved writes when making page write protect  (Aneesh Kumar K.V, 2 files, -2/+15)
Without this KSM will consider the page write protected, but a numa fault can later mark the page writable. This can result in memory corruption. Link: http://lkml.kernel.org/r/1487498625-10891-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm/autonuma: let architecture override how the write bit should be stashed in a protnone pte  (Aneesh Kumar K.V, 4 files, -5/+21)
Patch series "Numabalancing preserve write fix", v2. This patch series addresses an issue w.r.t. THP migration and the autonuma preserve-write feature. migrate_misplaced_transhuge_page() cannot deal with concurrent modification of the page. It does a page copy without following the migration pte sequence. IIUC, this was done to keep the migration simpler, and at the time of implementation we didn't have THP page cache, which would have required a more elaborate migration scheme. That means THP autonuma migration expects the protnone with saved write to be done such that neither kernel nor user can update the page content. This patch series enables archs like ppc64 to do that. We are good with the hash translation mode with the current code, because we never create a hardware page table entry for a protnone pte. This patch (of 2): Autonuma preserves the write permission across a numa fault to avoid taking a write fault after a numa fault (commit b191f9b106ea "mm: numa: preserve PTE write permissions across a NUMA hinting fault"). Architectures can implement protnone in different ways, and some may choose to implement it by clearing the Read/Write/Exec bits of the pte. Setting the write bit on such a pte can result in wrong behaviour. Fix this up by allowing the architecture to override how to save the write bit on a protnone pte. [[email protected]: don't mark pte saved write in case of dirty_accountable] Link: http://lkml.kernel.org/r/1487942884-16517-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com [[email protected]: v3] Link: http://lkml.kernel.org/r/1487498625-10891-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Acked-by: Michael Neuling <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm/autonuma: don't use set_pte_at when updating protnone ptes  (Aneesh Kumar K.V, 1 file, -9/+9)
Architectures like ppc64 use the privilege access bit to mark a pte non-accessible. This implies that the kernel can do a copy_to_user to an address marked for numa fault. This also implies that there can be a parallel hardware update for the pte. set_pte_at cannot be used in such scenarios. Hence switch the pte update to use a ptep_get_and_clear and set_pte_at combination. [[email protected]: remove unwanted ppc change, per Aneesh] Link: http://lkml.kernel.org/r/1486400776-28114-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
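The clear-then-set sequence described above looks roughly as follows; a sketch of the described combination, not the exact upstream hunk:

  /* old: update the entry in place */
  set_pte_at(mm, addr, ptep, newpte);

  /* new: clear first so a racing hardware update of the pte cannot be lost,
   * then install the modified value */
  oldpte = ptep_get_and_clear(mm, addr, ptep);
  newpte = pte_modify(oldpte, newprot);
  set_pte_at(mm, addr, ptep, newpte);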
2017-02-24  mm/shmem.c: fix unlikely() test of info->seals to test only for WRITE and GROW  (Steven Rostedt (VMware), 1 file, -1/+1)
Running my likely/unlikely profiler, I discovered that the test in shmem_write_begin() that marks info->seals as unlikely is always incorrect. This is because shmem_get_inode() sets info->seals to have F_SEAL_SEAL set by default, and it is unlikely to be cleared when shmem_write_begin() is called. Thus, the if statement is very likely. But as the if statement block only cares about F_SEAL_WRITE and F_SEAL_GROW, change the test to only test those two bits. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: David Herrmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
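The narrowed check only looks at the two seal bits the block cares about; the shape of the change (illustrative, not the verbatim line):

  /* before: F_SEAL_SEAL is set by default, so this branch is almost always taken */
  if (unlikely(info->seals)) {
          /* reject sealed writes / grows ... */
  }

  /* after: only the seals that actually forbid this write path matter */
  if (unlikely(info->seals & (F_SEAL_WRITE | F_SEAL_GROW))) {
          /* reject sealed writes / grows ... */
  }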
2017-02-24  mm, vmscan: clear PGDAT_WRITEBACK when zone is balanced  (Mel Gorman, 1 file, -0/+1)
Hillf Danton pointed out that since commit 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes"), PGDAT_WRITEBACK is no longer cleared. It was not noticed because triggering it requires pages under writeback to cycle twice through the LRU before kswapd gets stalled. Historically, such issues tended to occur on small machines writing heavily to slow storage such as a USB stick. Once kswapd stalls, direct reclaim stalls may be higher, but because memory pressure is required, it would not be very noticeable. Michal Hocko suggested removing the flag entirely, but the conservative fix is to restore the intended PGDAT_WRITEBACK behaviour and clear the flag when a suitable zone is balanced. Fixes: 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  drm: remove unnecessary fault wrappers  (Ross Zwisler, 1 file, -25/+5)
The fault wrappers drm_vm_fault(), drm_vm_shm_fault(), drm_vm_dma_fault() and drm_vm_sg_fault() used to provide extra logic beyond what was in the "drm_do_*" versions of these functions, but as of commit ca0b07d9a969 ("drm: convert drm from nopage to fault") they are just unnecessary wrappers that do nothing. Remove them, and rename the drm_do_* fault handlers to drop the "do_" since they no longer have corresponding wrappers. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ross Zwisler <[email protected]> Reviewed-by: Sean Paul <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: Dave Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: coding-style fixes  (Tobin C Harding, 1 file, -31/+29)
Fix whitespace issues, extraneous braces. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tobin C Harding <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm/memory.c: use NULL instead of literal 0  (Tobin C Harding, 1 file, -3/+3)
Patch fixes sparse warning: Using plain integer as NULL pointer. Replaces assignment of 0 to pointer with NULL assignment. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tobin C Harding <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm/page_alloc.c: remove duplicate inclusion of page_ext.h  (Masanari Iida, 1 file, -1/+0)
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masanari Iida <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  vmalloc: back off when the current task is killed  (Michal Hocko, 1 file, -0/+5)
__vmalloc_area_node() allocates pages to cover the requested vmalloc size. This can be a lot of memory. If the current task is killed by the OOM killer, and thus has unlimited access to memory reserves, it can theoretically consume all the memory. Fix this by checking for fatal_signal_pending and backing off early. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
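A hedged sketch of the early bail-out in the page-allocation loop of __vmalloc_area_node() (structure simplified, error path abbreviated):

  for (i = 0; i < area->nr_pages; i++) {
          struct page *page;

          /* an OOM-killed task has unlimited access to reserves;
           * stop feeding it pages it will never get to use */
          if (fatal_signal_pending(current))
                  goto fail;

          page = alloc_page(alloc_mask);
          if (!page)
                  goto fail;
          area->pages[i] = page;
  }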
2017-02-24  mm: cma: print allocation failure reason and bitmap status  (Jaewon Kim, 1 file, -1/+33)
There are many reasons for CMA allocation failure, such as EBUSY, ENOMEM and EINTR, but so far we did not know which one actually occurred. This patch prints the error value. Additionally, if CONFIG_CMA_DEBUG is enabled, this patch shows the bitmap status so the available pages are visible. CMA internally tries all available regions because some regions can fail with EBUSY. The bitmap status is useful to understand both ENOMEM and EBUSY in detail:

  ENOMEM: not tried at all because there is no available region;
          the total region may be too small, or it could be a fragmentation issue
  EBUSY:  some regions were tried but all of them failed

This is an ENOMEM example with this patch:

  [2: Binder:714_1: 744] cma: cma_alloc: alloc failed, req-size: 256 pages, ret: -12

If CONFIG_CMA_DEBUG is enabled, available pages are also shown in a concatenated size@position format. So 4@572 means that there are 4 available pages at position 572, counting from position 0:

  [2: Binder:714_1: 744] cma: number of available pages: 4@572+7@585+7@601+8@632+38@730+166@1114+127@1921=> 357 free of 2048 total pages

Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jaewon Kim <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
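The reporting described above amounts to one pr_err() on failure plus a CONFIG_CMA_DEBUG-only bitmap dump; an illustrative sketch (the debug helper name is an assumption, details approximate):

  if (ret && !page) {
          pr_err("%s: alloc failed, req-size: %zu pages, ret: %d\n",
                 __func__, count, ret);
          /* assumed debug helper that walks the CMA bitmap and prints free runs */
          cma_debug_show_areas(cma);
  }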
2017-02-24  mm, madvise: fail with ENOMEM when splitting vma will hit max_map_count  (David Rientjes, 5 files, -17/+56)
If madvise(2) advice will result in the underlying vma being split and the number of areas mapped by the process will exceed /proc/sys/vm/max_map_count as a result, return ENOMEM instead of EAGAIN. EAGAIN is returned by madvise(2) when a kernel resource, such as slab, is temporarily unavailable. It indicates that userspace should retry the advice in the near future. This is important for advice such as MADV_DONTNEED, which is often used by malloc implementations to free memory back to the system: we really do want to retry freeing memory back when madvise(2) returns EAGAIN, because slab objects (vmas, anon_vmas, or mempolicies) could not be allocated at that moment. Encountering /proc/sys/vm/max_map_count is not a temporary failure, however, so return ENOMEM to indicate this is a more serious issue. A followup patch to the man page will specify this behavior. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Rientjes <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jerome Marchand <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
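From userspace the distinction matters roughly as follows; a hypothetical caller sketch, not code from the patch:

  #include <sys/mman.h>
  #include <errno.h>

  /* hypothetical malloc-style caller: retry transient failures,
   * treat ENOMEM (max_map_count would be exceeded) as final */
  static void release_range(void *addr, size_t len)
  {
          while (madvise(addr, len, MADV_DONTNEED) != 0) {
                  if (errno == EAGAIN)
                          continue;       /* temporary kernel resource shortage */
                  break;                  /* ENOMEM or other error: give up */
          }
  }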
2017-02-24  mm: wire up GFP flag passing in dma_alloc_from_contiguous  (Lucas Stach, 9 files, -19/+24)
The callers of the DMA alloc functions already provide the proper context GFP flags. Make sure to pass them through to the CMA allocator, to make the CMA compaction context aware. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Lucas Stach <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: cma_alloc: allow to specify GFP mask  (Lucas Stach, 5 files, -6/+9)
Most users of this interface just want to use it with the default GFP_KERNEL flags, but for cases where DMA memory is allocated it may be called from a different context. No functional change yet, just passing through the flag to the underlying alloc_contig_range function. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Lucas Stach <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: alloc_contig_range: allow to specify GFP mask  (Lucas Stach, 4 files, -5/+8)
Currently alloc_contig_range assumes that the compaction should be done with the default GFP_KERNEL flags. This is probably right for all current uses of this interface, but may change as CMA is used in more use-cases (including being the default DMA memory allocator on some platforms). Change the function prototype, to allow for passing through the GFP mask set by upper layers. Also respect global restrictions by applying memalloc_noio_flags to the passed in flags. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Lucas Stach <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
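The interface change amounts to an extra gfp_t parameter that is filtered through the NOIO restriction before it reaches compaction; a sketch assuming the prototype takes this shape:

  /* assumed new prototype: caller supplies the compaction GFP context */
  int alloc_contig_range(unsigned long start, unsigned long end,
                         unsigned int migratetype, gfp_t gfp_mask);

  /* inside, the passed-in mask is clamped by the global NOIO state */
  gfp_t mask = memalloc_noio_flags(gfp_mask);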
2017-02-24  userfaultfd: documentation update  (Mike Rapoport, 1 file, -0/+89)
Add documentation about new userfaultfd features and events Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  userfaultfd_copy: return -ENOSPC in case mm has gone  (Mike Rapoport, 1 file, -0/+2)
In the non-cooperative userfaultfd case, the process exit may race with an outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC instead of -EINVAL when the mm is already gone will allow the uffd monitor to distinguish this case from other error conditions. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  userfaultfd: mcopy_atomic: return -ENOENT when no compatible VMA found  (Mike Rapoport, 1 file, -30/+28)
The memory mapping of a process may change between the #PF event and the call to mcopy_atomic that comes to resolve the page fault. In such a case, there will be no VMA covering the range passed to mcopy_atomic, or the VMA will not have userfaultfd context. To allow the uffd monitor to distinguish those cases from other errors, let's return -ENOENT instead of -EINVAL. Note that, despite the availability of UFFD_EVENT_UNMAP, there might still be a race between the processing of UFFD_EVENT_UNMAP and an outstanding mcopy_atomic in the case of non-cooperative uffd usage. [[email protected]: update cases returning -ENOENT] Link: http://lkml.kernel.org/r/20170207150249.GA6709@rapoport-lnx [[email protected]: merge fix] [[email protected]: fix the merge fix] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  userfaultfd: non-cooperative: add event for exit() notification  (Mike Rapoport, 4 files, -1/+41)
Allow the userfaultfd monitor to track termination of the processes that have memory backed by the uffd. [[email protected]: add comment] Link: http://lkml.kernel.org/r/20170202135448.GB19804@rapoport-lnx Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  userfaultfd: non-cooperative: add event for memory unmaps  (Mike Rapoport, 15 files, -45/+160)
When a non-cooperative userfaultfd monitor copies pages in the background, it may encounter regions that were already unmapped. The addition of UFFD_EVENT_UNMAP allows the uffd monitor to track changes in the virtual memory layout precisely. Since there might be different uffd contexts for the affected VMAs, we should first create a temporary representation of the unmap event for each uffd context, and then deliver the notifications one by one to the appropriate userfault file descriptors. The event notification occurs after the mmap_sem has been released. [[email protected]: fix nommu build] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix nommu build] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: call vm_munmap in munmap syscall instead of using open coded version  (Mike Rapoport, 1 file, -8/+1)
Patch series "userfaultfd: non-cooperative: better tracking for mapping changes", v2. These patches try to address issues I've encountered during integration of userfaultfd with CRIU. The previously added userfaultfd events for fork(), madvise() and mremap() unfortunately do not cover all possible changes to a process virtual memory layout required for a uffd monitor. When one or more VMAs are removed from the process mm, the external uffd monitor has no way to detect those changes and will attempt to fill the removed regions with userfaultfd_copy. Another problematic event is the exit() of the process. Here again, the external uffd monitor will try to use userfaultfd_copy, although the mm owning the memory is already gone. The first patch in the series is a minor cleanup and is not strictly related to the rest of the series. Patches 2 and 3 add UFFD_EVENT_UNMAP and UFFD_EVENT_EXIT to allow the uffd monitor to track changes in the memory layout of a process. Patches 4 and 5 amend the error codes returned by userfaultfd_copy to let the uffd monitor cope with races that might occur between the delivery of unmap and exit events and outstanding userfaultfd_copy's. This patch (of 5): Commit dc0ef0df7b6a ("mm: make mmap_sem for write waits killable for mm syscalls") replaced the call to vm_munmap in the munmap syscall with an open-coded version, to allow different waits on mmap_sem in the munmap syscall and in vm_munmap. Now both functions use down_write_killable, so we can restore the call to vm_munmap from the munmap system call. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: convert remove_migration_pte() to use page_vma_mapped_walk()  (Kirill A. Shutemov, 1 file, -61/+41)
remove_migration_pte() also can easily be converted to page_vma_mapped_walk(). [[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: drop page_check_address{,_transhuge}  (Kirill A. Shutemov, 2 files, -174/+0)
All users are gone. Let's drop them. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: convert page_mapped_in_vma() to use page_vma_mapped_walk()  (Kirill A. Shutemov, 2 files, -26/+30)
For consistency, it is worth converting all page_check_address() users to page_vma_mapped_walk(), so we can drop the former. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm, uprobes: convert __replace_page() to use page_vma_mapped_walk()  (Kirill A. Shutemov, 1 file, -8/+14)
For consistency, it is worth converting all page_check_address() users to page_vma_mapped_walk(), so we can drop the former. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Srikar Dronamraju <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm, ksm: convert write_protect_page() to use page_vma_mapped_walk()  (Kirill A. Shutemov, 1 file, -16/+18)
For consistency, it is worth converting all page_check_address() users to page_vma_mapped_walk(), so we can drop the former. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: convert try_to_unmap_one() to use page_vma_mapped_walk()  (Kirill A. Shutemov, 2 files, -139/+137)
For consistency, it is worth converting all page_check_address() users to page_vma_mapped_walk(), so we can drop the former. It also benefits freeze_page(), as we now walk through the rmap only once. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: convert page_mkclean_one() to use page_vma_mapped_walk()  (Kirill A. Shutemov, 1 file, -23/+45)
For consistency, it is worth converting all page_check_address() users to page_vma_mapped_walk(), so we can drop the former. The PMD handling here is future-proofing; we don't have users yet. ext4 with huge pages will be the first. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm, rmap: check all VMAs that PTE-mapped THP can be part of  (Kirill A. Shutemov, 2 files, -9/+16)
The current rmap code can miss a VMA that maps a PTE-mapped THP if the first subpage of the THP was unmapped from the VMA. We need to walk the rmap for the whole range of offsets that the THP covers, not only the first one. vma_address() also needs to be corrected to check the range instead of the first subpage. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: fix handling PTE-mapped THPs in page_idle_clear_pte_refs()  (Kirill A. Shutemov, 1 file, -17/+17)
For PTE-mapped THPs, page_check_address_transhuge() is not adequate: it cannot find all relevant PTEs, only the first one. Let's switch it to page_vma_mapped_walk(). I don't think it's a subject for stable@: it's not fatal. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Srikar Dronamraju <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: fix handling PTE-mapped THPs in page_referenced()  (Kirill A. Shutemov, 1 file, -32/+34)
For PTE-mapped THPs, page_check_address_transhuge() is not adequate: it cannot find all relevant PTEs, only the first one. This means we can miss some references to the page, which can result in suboptimal decisions by vmscan. Let's switch it to page_vma_mapped_walk(). I don't think it's a subject for stable@: it's not fatal. The only side effect is that a THP can be swapped out when it shouldn't be. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: introduce page_vma_mapped_walk()  (Kirill A. Shutemov, 4 files, -5/+224)
Introduce a new interface to check if a page is mapped into a vma. It aims to address the shortcomings of page_check_address{,_transhuge}. The existing interface is not able to handle PTE-mapped THPs: it only finds the first PTE; the rest are left unnoticed. page_vma_mapped_walk() iterates over all possible mappings of the page in the vma. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
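Callers iterate over every mapping of the page within the vma with a loop of roughly this shape (a usage sketch; treat the exact field set as an assumption):

  struct page_vma_mapped_walk pvmw = {
          .page = page,
          .vma = vma,
          .address = address,
  };

  while (page_vma_mapped_walk(&pvmw)) {
          /* pvmw.pte (or pvmw.pmd for a PMD mapping) now points at the
           * entry mapping the current subpage at pvmw.address */
  }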
2017-02-24  uprobes: split THPs before trying to replace them  (Kirill A. Shutemov, 1 file, -2/+2)
Patch series "Fix few rmap-related THP bugs", v3. The patchset fixes handling of PTE-mapped THPs in page_referenced() and page_idle_clear_pte_refs(). To achieve that I've introduced a new helper -- page_vma_mapped_walk() -- which replaces all page_check_address{,_transhuge}() users and covers all THP cases.

Patchset overview:
 - The first patch fixes one uprobe bug (unrelated to the rest of the patchset, just spotted at the same time);
 - Patches 2-5 fix handling of PTE-mapped THPs in page_referenced(), page_idle_clear_pte_refs() and the rmap core;
 - Patches 6-12 convert all page_check_address{,_transhuge}() users (plus remove_migration_pte()) to page_vma_mapped_walk() and drop the unused helpers.

I think the fixes are not critical enough for stable@ as they don't lead to crashes or hangs, only suboptimal behaviour. This patch (of 12): For THPs, page_check_address() always fails. It leads to an endless loop in uprobe_write_opcode(). Testcase with huge-tmpfs (uprobes cannot probe anonymous memory):

  mount -t debugfs none /sys/kernel/debug
  mount -t tmpfs -o huge=always none /mnt
  gcc -Wall -O2 -o /mnt/test -x c - <<EOF
  int main(void) { return 0; }
  /* Padding to map the code segment with huge pmd */
  asm (".zero 2097152");
  EOF
  echo 'p /mnt/test:0' > /sys/kernel/debug/tracing/uprobe_events
  echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable
  /mnt/test

Let's split THPs before trying to replace them. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm/hotplug: enable memory hotplug for non-lru movable pages  (Yisheng Xie, 2 files, -13/+23)
We had considered all of the non-lru pages as unmovable before commit bda807d44454 ("mm: migrate: support non-lru movable page migration"). But now some of non-lru pages like zsmalloc, virtio-balloon pages also become movable. So we can offline such blocks by using non-lru page migration. This patch straightforwardly adds non-lru migration code, which means adding non-lru related code to the functions which scan over pfn and collect pages to be migrated and isolate them before migration. Signed-off-by: Yisheng Xie <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Hanjun Guo <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Reza Arbab <[email protected]> Cc: Taku Izumi <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Yisheng Xie <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  HWPOISON: soft offlining for non-lru movable page  (Yisheng Xie, 1 file, -10/+16)
Extend the soft offlining framework to support non-lru pages, which already support migration since commit bda807d44454 ("mm: migrate: support non-lru movable page migration"). When corrected memory errors occur on a non-lru movable page, we can choose to stop using it by migrating the data onto another page and disabling the original (maybe half-broken) one. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yisheng Xie <[email protected]> Suggested-by: Michal Hocko <[email protected]> Suggested-by: Minchan Kim <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Hanjun Guo <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Reza Arbab <[email protected]> Cc: Taku Izumi <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Xishi Qiu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm/migration: make isolate_movable_page always defined  (Yisheng Xie, 1 file, -0/+2)
Define isolate_movable_page as a static inline function when CONFIG_MIGRATION is not enabled. It should return -EBUSY here, which means it failed to isolate the movable page. This patch does not have any functional change but prepares for a later patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yisheng Xie <[email protected]> Acked-by: Minchan Kim <[email protected]> Suggested-by: Michal Hocko <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Hanjun Guo <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Reza Arbab <[email protected]> Cc: Taku Izumi <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Xishi Qiu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
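The stub described above is a static inline that simply reports failure when migration is compiled out; approximately:

  #ifdef CONFIG_MIGRATION
  extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
  #else
  static inline int isolate_movable_page(struct page *page,
                                         isolate_mode_t mode)
  {
          /* without CONFIG_MIGRATION we can never isolate a movable page */
          return -EBUSY;
  }
  #endif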
2017-02-24  mm/migration: make isolate_movable_page() return int type  (Yisheng Xie, 3 files, -5/+5)
Patch series "HWPOISON: soft offlining for non-lru movable page", v6. After Minchan's commit bda807d44454 ("mm: migrate: support non-lru movable page migration"), some types of non-lru pages, such as zsmalloc and virtio-balloon pages, also support migration. Therefore, we can: 1) soft offline non-lru movable pages, which means that when corrected memory errors occur on a non-lru movable page, we can stop using it by migrating the data onto another page and disabling the original (maybe half-broken) one; 2) enable memory hotplug for non-lru movable pages, i.e. we may offline blocks which include such pages, by using non-lru page migration. This patchset is heavily dependent on non-lru movable page migration. This patch (of 4): Change the return type of isolate_movable_page() from bool to int. It will return 0 when it isolates a movable page successfully, and return -EBUSY when isolation fails. There is no functional change within this patch, but it prepares for a later patch. [[email protected]: v6] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yisheng Xie <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Minchan Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Hanjun Guo <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Reza Arbab <[email protected]> Cc: Taku Izumi <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xishi Qiu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  z3fold: add kref refcounting  (Vitaly Wool, 1 file, -86/+69)
With the locking optimizations, both those already present and those still to come, introducing kref to reference-count z3fold objects is the right thing to do. Moreover, it makes the buddied list no longer necessary and allows for simpler handling of headless pages. [[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vitaly Wool <[email protected]> Reviewed-by: Dan Streetman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  z3fold: use per-page spinlock  (Vitaly Wool, 1 file, -42/+106)
Most of z3fold operations are in-page, such as modifying z3fold page header or moving z3fold objects within a page. Taking per-pool spinlock to protect per-page objects is therefore suboptimal, and the idea of having a per-page spinlock (or rwlock) has been around for some time. This patch implements spinlock-based per-page locking mechanism which is lightweight enough to normally fit ok into the z3fold header. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vitaly Wool <[email protected]> Reviewed-by: Dan Streetman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  z3fold: extend compaction function  (Vitaly Wool, 1 file, -1/+25)
z3fold_compact_page() currently only handles the situation when there's a single middle chunk within the z3fold page. However it may be worth it to move middle chunk closer to either first or last chunk, whichever is there, if the gap between them is big enough. This patch adds the relevant code, using BIG_CHUNK_GAP define as a threshold for middle chunk to be worth moving. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vitaly Wool <[email protected]> Reviewed-by: Dan Streetman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  z3fold: fix header size related issues  (Vitaly Wool, 1 file, -50/+64)
Currently the whole kernel build will be stopped if the size of struct z3fold_header is greater than the size of one chunk, which is 64 bytes by default. This patch instead defines the offset for z3fold objects as the size of the z3fold header in chunks. Fixed also are the calculation of num_free_chunks() and the address to move the middle chunk to in case of in-page compaction in z3fold_compact_page(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vitaly Wool <[email protected]> Reviewed-by: Dan Streetman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  z3fold: make pages_nr atomic  (Vitaly Wool, 1 file, -11/+9)
Convert pages_nr per-pool counter to atomic64_t. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vitaly Wool <[email protected]> Reviewed-by: Dan Streetman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: fix get_user_pages() vs device-dax pud mappings  (Dan Williams, 1 file, -4/+24)
A new unit test for the device-dax 1GB enabling currently fails with this warning before hanging the test thread:

  WARNING: CPU: 0 PID: 21 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1e3/0x1f0
  percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic
  [..]
  CPU: 0 PID: 21 Comm: rcuos/1 Tainted: G O 4.10.0-rc7-next-20170207+ #944
  [..]
  Call Trace:
   dump_stack+0x86/0xc3
   __warn+0xcb/0xf0
   warn_slowpath_fmt+0x5f/0x80
   ? rcu_nocb_kthread+0x27a/0x510
   ? dax_pmem_percpu_exit+0x50/0x50 [dax_pmem]
   percpu_ref_switch_to_atomic_rcu+0x1e3/0x1f0
   ? percpu_ref_exit+0x60/0x60
   rcu_nocb_kthread+0x339/0x510
   ? rcu_nocb_kthread+0x27a/0x510
   kthread+0x101/0x140

The get_user_pages() path needs to arrange for references to be taken against the dev_pagemap instance backing the pud mapping. Refactor the existing __gup_device_huge_pmd() to also account for the pud case. Link: http://lkml.kernel.org/r/148653181153.38226.9605457830505509385.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Nilesh Choudhury <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm: replace FAULT_FLAG_SIZE with parameter to huge_fault  (Dave Jiang, 8 files, -38/+46)
Since the introduction of FAULT_FLAG_SIZE to the vm_fault flags, it has been somewhat painful to get the flags set and removed at the correct locations. More than one kernel oops was introduced due to the difficulty of getting the placement correct. Remove the flag values and introduce an input parameter to huge_fault that indicates the size of the page entry. This makes the code easier to trace and should avoid the issues we see with the fault flags, where removal of the flag was necessary in the fallback paths. Link: http://lkml.kernel.org/r/148615748258.43180.1690152053774975329.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <[email protected]> Tested-by: Dan Williams <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Nilesh Choudhury <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
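The size is passed as an explicit enum rather than a vm_fault flag; assuming the enum and handler take the familiar page_entry_size shape, roughly:

  enum page_entry_size {
          PE_SIZE_PTE = 0,
          PE_SIZE_PMD,
          PE_SIZE_PUD,
  };

  /* relevant member of struct vm_operations_struct */
  int (*huge_fault)(struct vm_fault *vmf, enum page_entry_size pe_size);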
2017-02-24  dax: support for transparent PUD pages for device DAX  (Dave Jiang, 1 file, -0/+48)
Add transparent huge PUD pages support for device DAX by adding a pud_fault handler. Link: http://lkml.kernel.org/r/148545060002.17912.6765687780007547551.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Jan Kara <[email protected]> Cc: Dan Williams <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Nilesh Choudhury <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24  mm, x86: add support for PUD-sized transparent hugepages  (Matthew Wilcox, 21 files, -18/+844)
The current transparent hugepage code only supports PMDs. This patch adds support for transparent use of PUDs with DAX. It does not include support for anonymous pages. x86 support code also added. Most of this patch simply parallels the work that was done for huge PMDs. The only major difference is how the new ->pud_entry method in mm_walk works. The ->pmd_entry method replaces the ->pte_entry method, whereas the ->pud_entry method works along with either ->pmd_entry or ->pte_entry. The pagewalk code takes care of locking the PUD before calling ->pud_walk, so handlers do not need to worry whether the PUD is stable. [[email protected]: fix SMP x86 32bit build for native_pud_clear()] Link: http://lkml.kernel.org/r/148719066814.31111.3239231168815337012.stgit@djiang5-desk3.ch.intel.com [[email protected]: native_pud_clear missing on i386 build] Link: http://lkml.kernel.org/r/148640375195.69754.3315433724330910314.stgit@djiang5-desk3.ch.intel.com Link: http://lkml.kernel.org/r/148545059381.17912.8602162635537598445.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Matthew Wilcox <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Tested-by: Alexander Kapshuk <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Jan Kara <[email protected]> Cc: Dan Williams <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Nilesh Choudhury <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>