aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-01-31mm, thp: fix defrag setting if newline is not usedDavid Rientjes1-16/+8
If thp defrag setting "defer" is used and a newline is *not* used when writing to the sysfs file, this is interpreted as the "defer+madvise" option. This is because we do prefix matching and if five characters are written without a newline, the current code ends up comparing to the first five bytes of the "defer+madvise" option and using that instead. Use the more appropriate sysfs_streq() that handles the trailing newline for us. Since this doubles as a nice cleanup, do it in enabled_store() as well. The current implementation relies on prefix matching: the number of bytes compared is either the number of bytes written or the length of the option being compared. With a newline, "defer\n" does not match "defer+"madvise"; without a newline, however, "defer" is considered to match "defer+madvise" (prefix matching is only comparing the first five bytes). End result is that writing "defer" is broken unless it has an additional trailing character. This means that writing "madv" in the past would match and set "madvise". With strict checking, that no longer is the case but it is unlikely anybody is currently doing this. Link: http://lkml.kernel.org/r/[email protected] Fixes: 21440d7eb904 ("mm, thp: add new defer+madvise defrag option") Signed-off-by: David Rientjes <[email protected]> Suggested-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/migrate: add stable check in migrate_vma_insert_page()Ralph Campbell1-0/+12
migrate_vma_insert_page() closely follows the code in: __handle_mm_fault() handle_pte_fault() do_anonymous_page() Add a call to check_stable_address_space() after locking the page table entry before inserting a ZONE_DEVICE private zero page mapping similar to page faulting a new anonymous page. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ralph Campbell <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Chris Down <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/migrate: clean up some minor coding styleRalph Campbell1-21/+13
Fix some comment typos and coding style clean up in preparation for the next patch. No functional changes. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ralph Campbell <[email protected]> Acked-by: Chris Down <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/migrate: remove useless mask of start addressRalph Campbell1-2/+2
Addresses passed to walk_page_range() callback functions are already page aligned and don't need to be masked with PAGE_MASK. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ralph Campbell <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Chris Down <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/huge_memory.c: reduce critical section protected by split_queue_lockWei Yang1-1/+1
split_queue_lock protects data in struct deferred_split. We can release the lock after delete the page from deferred_split_queue. This patch moves the THP accounting out of the lock protection, which is introduced in commit 65c453778aea ("mm, rmap: account shmem thp pages"). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/huge_memory.c: use head to emphasize the purpose of pageWei Yang1-8/+8
During split huge page, it checks the property of the page. Currently we do the check on page and head without emphasizing the check is on the compound page. In case the page passed to split_huge_page_to_list is a tail page, audience would take some time to think about whether the check is on compound page or tail page itself. To make it explicit, use head instead of page for those checks. After this, audience would be more clear about the checks are on compound page and the page is used to do the split and dump error message if failed. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/huge_memory.c: use head to check huge zero pageWei Yang1-1/+1
The page could be a tail page, if this is the case, this BUG_ON will never be triggered. Link: http://lkml.kernel.org/r/[email protected] Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()") Signed-off-by: Wei Yang <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm, oom: dump stack of victim when reaping failedDavid Rientjes1-0/+2
When a process cannot be oom reaped, for whatever reason, currently the list of locks that are held is currently dumped to the kernel log. Much more interesting is the stack trace of the victim that cannot be reaped. If the stack trace is dumped, we have the ability to find related occurrences in the same kernel code and hopefully solve the issue that is making it wedged. Dump the stack trace when a process fails to be oom reaped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Rientjes <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31memblock: Use __func__ in remaining memblock_dbg() call sitesAnshuman Khandual1-4/+4
Replace open function name strings with %s (__func__) in all remaining memblock_dbg() call sites. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/memblock: define memblock_physmem_add()Anshuman Khandual3-14/+19
On the s390 platform memblock.physmem array is being built by directly calling into memblock_add_range() which is a low level function not intended to be used outside of memblock. Hence lets conditionally add helper functions for physmem array when HAVE_MEMBLOCK_PHYS_MAP is enabled. Also use MAX_NUMNODES instead of 0 as node ID similar to memblock_add() and memblock_reserve(). Make memblock_add_range() a static function as it is no longer getting used outside of memblock. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Acked-by: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Collin Walling <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Philipp Rudo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31tools/vm/slabinfo: fix sanity checks enablingDaniel Wagner1-2/+2
The sysfs file name for enabling sanity checking is called 'sanity_checks' and not 'sanity'. The name of the file has never changed since the introduction of the slub allocator. Obviously, most people turn the checks on via the command line option and not during runtime using slabinfo. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Daniel Wagner <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: "Tobin C. Harding" <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONEAlex Shi1-4/+2
Commit 1b2ffb7896ad ("[PATCH] Zone reclaim: Allow modification of zone reclaim behavior")' defined RECLAIM_OFF/RECLAIM_ZONE, but never use them, so better to remove them. [[email protected]: fix sanity checks enabling] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: renumber the bits for neatness] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alex Shi <[email protected]> Signed-off-by: Daniel Wagner <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: "Tobin C. Harding" <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/vmscan: remove prefetch_prev_lru_pageAlex Shi1-14/+0
This macro was never used in git history. So better to remove. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alex Shi <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/vmscan.c: remove unused return value of shrink_nodeLiu Song1-3/+1
The return value of shrink_node is not used, so remove unnecessary operations. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Liu Song <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm: remove "count" parameter from has_unmovable_pages()David Hildenbrand4-18/+11
Now that the memory isolate notifier is gone, the parameter is always 0. Drop it and cleanup has_unmovable_pages(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Qian Cai <[email protected]> Cc: Pingfan Liu <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Dan Williams <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Wei Yang <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Arun KS <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm: remove the memory isolate notifierDavid Hildenbrand3-80/+4
Luckily, we have no users left, so we can get rid of it. Cleanup set_migratetype_isolate() a little bit. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Dan Williams <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Pingfan Liu <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/page_alloc: skip non present sections on zone initializationKirill A. Shutemov1-1/+27
memmap_init_zone() can be called on the ranges with holes during the boot. It will skip any non-valid PFNs one-by-one. It works fine as long as holes are not too big. But huge holes in the memory map causes a problem. It takes over 20 seconds to walk 32TiB hole. x86-64 with 5-level paging allows for much larger holes in the memory map which would practically hang the system. Deferred struct page init doesn't help here. It only works on the present ranges. Skipping non-present sections would fix the issue. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Baoquan He <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Dan Williams <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: "Jin, Zhi" <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Tatashin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/early_ioremap.c: use %pa to print resource_size_t variablesAndy Shevchenko1-4/+4
%pa takes into consideration the special types such as resource_size_t. Use this specifier %instead of explicit casting. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more()Gustavo A. R. Silva1-0/+1
In case memory resources for _ptr2_ were allocated, release them before return. Notice that in case _ptr1_ happens to be NULL, krealloc() behaves exactly like kmalloc(). Addresses-Coverity-ID: 1490594 ("Resource leak") Link: http://lkml.kernel.org/r/20200123160115.GA4202@embeddedor Fixes: 3f15801cdc23 ("lib: add kasan test module") Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Dmitry Vyukov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm, tracing: print symbol name for kmem_alloc_node call_site eventsJunyong Sun1-2/+2
Print the call_site ip of kmem_alloc_node using '%pS' to improve the readability of raw slab trace points. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Junyong Sun <[email protected]> Acked-by: Steven Rostedt (VMware) <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Changbin Du <[email protected]> Cc: Tim Murray <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/page_vma_mapped.c: explicitly compare pfn for normal, hugetlbfs and THP pageLi Xinhai1-4/+8
When check_pte, pfn of normal, hugetlbfs and THP page need be compared. The current implementation apply comparison as - normal 4K page: page_pfn <= pfn < page_pfn + 1 - hugetlbfs page: page_pfn <= pfn < page_pfn + HPAGE_PMD_NR - THP page: page_pfn <= pfn < page_pfn + HPAGE_PMD_NR in pfn_in_hpage. For hugetlbfs page, it should be page_pfn == pfn Now, change pfn_in_hpage to pfn_is_match to highlight that comparison is not only for THP and explicitly compare for these cases. No impact upon current behavior, just make the code clear. I think it is important to make the code clear - comparing hugetlbfs page in range page_pfn <= pfn < page_pfn + HPAGE_PMD_NR is confusing. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Li Xinhai <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Mike Kravetz <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/memcontrol.c: cleanup some useless codeKaitao Cheng1-4/+3
Compound pages handling in mem_cgroup_migrate is more convoluted than necessary. The state is duplicated in compound variable and the same could be achieved by PageTransHuge check which is trivial and hpage_nr_pages is already PageTransHuge aware. It is much simpler to just use hpage_nr_pages for nr_pages and replace the local variable by PageTransHuge check directly Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kaitao Cheng <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/swapfile.c: swap_next should increase position indexVasily Averin1-1/+1
If seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. In Aug 2018 NeilBrown noticed commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") "Some ->next functions do not increment *pos when they return NULL... Note that such ->next functions are buggy and should be fixed. A simple demonstration is dd if=/proc/swaps bs=1000 skip=1 Choose any block size larger than the size of /proc/swaps. This will always show the whole last line of /proc/swaps" Described problem is still actual. If you make lseek into middle of last output line following read will output end of last line and whole last line once again. $ dd if=/proc/swaps bs=1 # usual output Filename Type Size Used Priority /dev/dm-0 partition 4194812 97536 -2 104+0 records in 104+0 records out 104 bytes copied $ dd if=/proc/swaps bs=40 skip=1 # last line was generated twice dd: /proc/swaps: cannot skip to specified offset v/dm-0 partition 4194812 97536 -2 /dev/dm-0 partition 4194812 97536 -2 3+1 records in 3+1 records out 131 bytes copied https://bugzilla.kernel.org/show_bug.cgi?id=206283 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vasily Averin <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Jann Horn <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Kees Cook <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm, tree-wide: rename put_user_page*() to unpin_user_page*()John Hubbard18-55/+55
In order to provide a clearer, more symmetric API for pinning and unpinning DMA pages. This way, pin_user_pages*() calls match up with unpin_user_pages*() calls, and the API is a lot closer to being self-explanatory. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1"John Hubbard2-4/+11
Fix the gup benchmark flags to use the symbolic FOLL_WRITE, instead of a hard-coded "1" value. Also, clean up the filtering of gup flags a little, by just doing it once before issuing any of the get_user_pages*() calls. This makes it harder to overlook, instead of having little "gup_flags & 1" phrases in the function calls. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31powerpc: book3s64: convert to pin_user_pages() and put_user_page()John Hubbard1-5/+5
1. Convert from get_user_pages() to pin_user_pages(). 2. As required by pin_user_pages(), release these pages via put_user_page(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversionJohn Hubbard1-4/+3
1. Change vfio from get_user_pages_remote(), to pin_user_pages_remote(). 2. Because all FOLL_PIN-acquired pages must be released via put_user_page(), also convert the put_page() call over to put_user_pages_dirty_lock(). Note that this effectively changes the code's behavior in vfio_iommu_type1.c: put_pfn(): it now ultimately calls set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate. As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1] [1] https://lore.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Tested-by: Alex Williamson <[email protected]> Acked-by: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversionJohn Hubbard1-7/+4
1. Change v4l2 from get_user_pages() to pin_user_pages(). 2. Because all FOLL_PIN-acquired pages must be released via put_user_page(), also convert the put_page() call over to put_user_pages_dirty_lock(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Acked-by: Hans Verkuil <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31net/xdp: set FOLL_PIN via pin_user_pages()John Hubbard1-1/+1
Convert net/xdp to use the new pin_longterm_pages() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages. In partial anticipation of this work, the net/xdp code was already calling put_user_page() instead of put_page(). Therefore, in order to convert from the get_user_pages()/put_page() model, to the pin_user_pages()/put_user_page() model, the only change required here is to change get_user_pages() to pin_user_pages(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Acked-by: Björn Töpel <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31fs/io_uring: set FOLL_PIN via pin_user_pages()John Hubbard1-1/+1
Convert fs/io_uring to use the new pin_user_pages() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages, and therefore for any code that calls put_user_page(). In partial anticipation of this work, the io_uring code was already calling put_user_page() instead of put_page(). Therefore, in order to convert from the get_user_pages()/put_page() model, to the pin_user_pages()/put_user_page() model, the only change required here is to change get_user_pages() to pin_user_pages(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31drm/via: set FOLL_PIN via pin_user_pages_fast()John Hubbard1-1/+1
Convert drm/via to use the new pin_user_pages_fast() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages, and therefore for any code that calls put_user_page(). In partial anticipation of this work, the drm/via driver was already calling put_user_page() instead of put_page(). Therefore, in order to convert from the get_user_pages()/put_page() model, to the pin_user_pages()/put_user_page() model, the only change required is to change get_user_pages() to pin_user_pages(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Acked-by: Daniel Vetter <[email protected]> Reviewed-by: Jérôme Glisse <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()John Hubbard1-13/+15
Convert process_vm_access to use the new pin_user_pages_remote() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages. Also, release the pages via put_user_page*(). Also, rename "pages" to "pinned_pages", as this makes for easier reading of process_vm_rw_single_vec(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jérôme Glisse <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODPJohn Hubbard8-14/+13
Convert infiniband to use the new pin_user_pages*() calls. Also, revert earlier changes to Infiniband ODP that had it using put_user_page(). ODP is "Case 3" in Documentation/core-api/pin_user_pages.rst, which is to say, normal get_user_pages() and put_page() is the API to use there. The new pin_user_pages*() calls replace corresponding get_user_pages*() calls, and set the FOLL_PIN flag. The FOLL_PIN flag requires that the caller must return the pages via put_user_page*() calls, but infiniband was already doing that as part of an earlier commit. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31goldish_pipe: convert to pin_user_pages() and put_user_page()John Hubbard1-14/+3
1. Call the new global pin_user_pages_fast(), from pin_goldfish_pages(). 2. As required by pin_user_pages(), release these pages via put_user_page(). In this case, do so via put_user_pages_dirty_lock(). That has the side effect of calling set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate. As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1] Another side effect is that the release code is simplified because the page[] loop is now in gup.c instead of here, so just delete the local release_user_pages() entirely, and call put_user_pages_dirty_lock() directly, instead. [1] https://lore.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/gup: introduce pin_user_pages*() and FOLL_PINJohn Hubbard4-34/+426
Introduce pin_user_pages*() variations of get_user_pages*() calls, and also pin_longterm_pages*() variations. For now, these are placeholder calls, until the various call sites are converted to use the correct get_user_pages*() or pin_user_pages*() API. These variants will eventually all set FOLL_PIN, which is also introduced, and thoroughly documented. pin_user_pages() pin_user_pages_remote() pin_user_pages_fast() All pages that are pinned via the above calls, must be unpinned via put_user_page(). The underlying rules are: * FOLL_PIN is a gup-internal flag, so the call sites should not directly set it. That behavior is enforced with assertions. * Call sites that want to indicate that they are going to do DirectIO ("DIO") or something with similar characteristics, should call a get_user_pages()-like wrapper call that sets FOLL_PIN. These wrappers will: * Start with "pin_user_pages" instead of "get_user_pages". That makes it easy to find and audit the call sites. * Set FOLL_PIN * For pages that are received via FOLL_PIN, those pages must be returned via put_user_page(). Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases in this documentation. (I've reworded it and expanded upon it.) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> [Documentation] Reviewed-by: Jérôme Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31media/v4l2-core: set pages dirty upon releasing DMA buffersJohn Hubbard1-1/+4
After DMA is complete, and the device and CPU caches are synchronized, it's still required to mark the CPU pages as dirty, if the data was coming from the device. However, this driver was just issuing a bare put_page() call, without any set_page_dirty*() call. Fix the problem, by calling set_page_dirty_lock() if the CPU pages were potentially receiving data from the device. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Hans Verkuil <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31IB/umem: use get_user_pages_fast() to pin DMA pagesJohn Hubbard1-11/+6
And get rid of the mmap_sem calls, as part of that. Note that get_user_pages_fast() will, if necessary, fall back to __gup_longterm_unlocked(), which takes the mmap_sem as needed. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/gup: allow FOLL_FORCE for get_user_pages_fast()John Hubbard1-1/+2
Commit 817be129e6f2 ("mm: validate get_user_pages_fast flags") allowed only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast(). This, combined with the fact that get_user_pages_fast() falls back to "slow gup", which *does* accept FOLL_FORCE, leads to an odd situation: if you need FOLL_FORCE, you cannot call get_user_pages_fast(). There does not appear to be any reason for filtering out FOLL_FORCE. There is nothing in the _fast() implementation that requires that we avoid writing to the pages. So it appears to have been an oversight. Fix by allowing FOLL_FORCE to be set for get_user_pages_fast(). Link: http://lkml.kernel.org/r/[email protected] Fixes: 817be129e6f2 ("mm: validate get_user_pages_fast flags") Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() callJohn Hubbard1-25/+5
Update VFIO to take advantage of the recently loosened restriction on FOLL_LONGTERM with get_user_pages_remote(). Also, now it is possible to fix a bug: the VFIO caller is logically a FOLL_LONGTERM user, but it wasn't setting FOLL_LONGTERM. Also, remove an unnessary pair of calls that were releasing and reacquiring the mmap_sem. There is no need to avoid holding mmap_sem just in order to call page_to_pfn(). Also, now that the the DAX check ("if a VMA is DAX, don't allow long term pinning") is in the internals of get_user_pages_remote() and __gup_longterm_locked(), there's no need for it at the VFIO call site. So remove it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Tested-by: Alex Williamson <[email protected]> Acked-by: Alex Williamson <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Suggested-by: Jason Gunthorpe <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERMJohn Hubbard1-82/+92
As it says in the updated comment in gup.c: current FOLL_LONGTERM behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on vmas. However, the corresponding restriction in get_user_pages_remote() was slightly stricter than is actually required: it forbade all FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers that do not set the "locked" arg. Update the code and comments to loosen the restriction, allowing FOLL_LONGTERM in some cases. Also, copy the DAX check ("if a VMA is DAX, don't allow long term pinning") from the VFIO call site, all the way into the internals of get_user_pages_remote() and __gup_longterm_locked(). That is: get_user_pages_remote() calls __gup_longterm_locked(), which in turn calls check_dax_vmas(). This check will then be removed from the VFIO call site in a subsequent patch. Thanks to Jason Gunthorpe for pointing out a clean way to fix this, and to Dan Williams for helping clarify the DAX refactoring. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Tested-by: Alex Williamson <[email protected]> Acked-by: Alex Williamson <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Suggested-by: Jason Gunthorpe <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31goldish_pipe: rename local pin_user_pages() routineJohn Hubbard1-9/+9
Avoid naming conflicts: rename local static function from "pin_user_pages()" to "goldfish_pin_pages()". An upcoming patch will introduce a global pin_user_pages() function. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jérôme Glisse <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pagesJohn Hubbard3-20/+40
An upcoming patch changes and complicates the refcounting and especially the "put page" aspects of it. In order to keep everything clean, refactor the devmap page release routines: * Rename put_devmap_managed_page() to page_is_devmap_managed(), and limit the functionality to "read only": return a bool, with no side effects. * Add a new routine, put_devmap_managed_page(), to handle decrementing the refcount for ZONE_DEVICE pages. * Change callers (just release_pages() and put_page()) to check page_is_devmap_managed() before calling the new put_devmap_managed_page() routine. This is a performance point: put_page() is a hot path, so we need to avoid non- inline function calls where possible. * Rename __put_devmap_managed_page() to free_devmap_managed_page(), and limit the functionality to unconditionally freeing a devmap page. This is originally based on a separate patch by Ira Weiny, which applied to an early version of the put_user_page() experiments. Since then, Jérôme Glisse suggested the refactoring described above. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ira Weiny <[email protected]> Signed-off-by: John Hubbard <[email protected]> Suggested-by: Jérôme Glisse <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm: Cleanup __put_devmap_managed_page() vs ->page_free()Dan Williams2-42/+44
After the removal of the device-public infrastructure there are only 2 ->page_free() call backs in the kernel. One of those is a device-private callback in the nouveau driver, the other is a generic wakeup needed in the DAX case. In the hopes that all ->page_free() callbacks can be migrated to common core kernel functionality, move the device-private specific actions in __put_devmap_managed_page() under the is_device_private_page() conditional, including the ->page_free() callback. For the other page types just open-code the generic wakeup. Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA case. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]> Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jérôme Glisse <[email protected]> Cc: Jan Kara <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/gup: move try_get_compound_head() to top, fix minor issuesJohn Hubbard1-14/+15
An upcoming patch uses try_get_compound_head() more widely, so move it to the top of gup.c. Also fix a tiny spelling error and a checkpatch.pl warning. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/gup: factor out duplicate code from four routinesJohn Hubbard1-55/+40
Patch series "mm/gup: prereqs to track dma-pinned pages: FOLL_PIN", v12. Overview: This is a prerequisite to solving the problem of proper interactions between file-backed pages, and [R]DMA activities, as discussed in [1], [2], [3], and in a remarkable number of email threads since about 2017. :) A new internal gup flag, FOLL_PIN is introduced, and thoroughly documented in the last patch's Documentation/vm/pin_user_pages.rst. I believe that this will provide a good starting point for doing the layout lease work that Ira Weiny has been working on. That's because these new wrapper functions provide a clean, constrained, systematically named set of functionality that, again, is required in order to even know if a page is "dma-pinned". In contrast to earlier approaches, the page tracking can be incrementally applied to the kernel call sites that, until now, have been simply calling get_user_pages() ("gup"). In other words, opt-in by changing from this: get_user_pages() (sets FOLL_GET) put_page() to this: pin_user_pages() (sets FOLL_PIN) unpin_user_page() Testing: * I've done some overall kernel testing (LTP, and a few other goodies), and some directed testing to exercise some of the changes. And as you can see, gup_benchmark is enhanced to exercise this. Basically, I've been able to runtime test the core get_user_pages() and pin_user_pages() and related routines, but not so much on several of the call sites--but those are generally just a couple of lines changed, each. Not much of the kernel is actually using this, which on one hand reduces risk quite a lot. But on the other hand, testing coverage is low. So I'd love it if, in particular, the Infiniband and PowerPC folks could do a smoke test of this series for me. Runtime testing for the call sites so far is pretty light: * io_uring: Some directed tests from liburing exercise this, and they pass. * process_vm_access.c: A small directed test passes. * gup_benchmark: the enhanced version hits the new gup.c code, and passes. * infiniband: Ran rdma-core tests: rdma-core/build/bin/run_tests.py * VFIO: compiles (I'm vowing to set up a run time test soon, but it's not ready just yet) * powerpc: it compiles... * drm/via: compiles... * goldfish: compiles... * net/xdp: compiles... * media/v4l2: compiles... [1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/ [2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/ [3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/ This patch (of 22): There are four locations in gup.c that have a fair amount of code duplication. This means that changing one requires making the same changes in four places, not to mention reading the same code four times, and wondering if there are subtle differences. Factor out the common code into static functions, thus reducing the overall line count and the code's complexity. Also, take the opportunity to slightly improve the efficiency of the error cases, by doing a mass subtraction of the refcount, surrounded by get_page()/put_page(). Also, further simplify (slightly), by waiting until the the successful end of each routine, to increment *nr. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jérôme Glisse <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Alex Williamson <[email protected]> Cc: Björn Töpel <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Hans Verkuil <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/gup.c: use is_vm_hugetlb_page() to check whether to follow hugeWei Yang1-2/+2
No functional change, just leverage the helper function to improve readability as others. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Ralph Campbell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm: fix gup_pud_rangeQiujun Huang1-1/+1
sorry for not processing for a long time. I met it again. patch v1 https://lkml.org/lkml/2019/9/20/656 do_machine_check() do_memory_failure() memory_failure() hw_poison_user_mappings() try_to_unmap() pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); ...and now we have a swap entry that indicates that the page entry refers to a bad (and poisoned) page of memory, but gup_fast() at this level of the page table was ignoring swap entries, and incorrectly assuming that "!pxd_none() == valid and present". And this was not just a poisoned page problem, but a generaly swap entry problem. So, any swap entry type (device memory migration, numa migration, or just regular swapping) could lead to the same problem. Fix this by checking for pxd_present(), instead of pxd_none(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Qiujun Huang <[email protected]> Cc: John Hubbard <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/filemap.c: clean up filemap_write_and_wait()Ira Weiny2-29/+11
At some point filemap_write_and_wait() and filemap_write_and_wait_range() got the exact same implementation with the exception of the range being specified in *_range() Similar to other functions in fs.h which call *_range(..., 0, LLONG_MAX), change filemap_write_and_wait() to be a static inline which calls filemap_write_and_wait_range() Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ira Weiny <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/debug.c: always print flags in dump_page()Vlastimil Babka1-3/+5
Commit 76a1850e4572 ("mm/debug.c: __dump_page() prints an extra line") inadvertently removed printing of page flags for pages that are neither anon nor ksm nor have a mapping. Fix that. Using pr_cont() again would be a solution, but the commit explicitly removed its use. Avoiding the danger of mixing up split lines from multiple CPUs might be beneficial for near-panic dumps like this, so fix this without reintroducing pr_cont(). Link: http://lkml.kernel.org/r/[email protected] Fixes: 76a1850e4572 ("mm/debug.c: __dump_page() prints an extra line") Signed-off-by: Vlastimil Babka <[email protected]> Reported-by: Anshuman Khandual <[email protected]> Reported-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Qian Cai <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Dan Williams <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Ralph Campbell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_tHe Zhe1-56/+56
kmemleak_lock as a rwlock on RT can possibly be acquired in atomic context which does work. Since the kmemleak operation is performed in atomic context make it a raw_spinlock_t so it can also be acquired on RT. This is used for debugging and is not enabled by default in a production like environment (where performance/latency matters) so it makes sense to make it a raw_spinlock_t instead trying to get rid of the atomic context. Turn also the kmemleak_object->lock into raw_spinlock_t which is acquired (nested) while the kmemleak_lock is held. The time spent in "echo scan > kmemleak" slightly improved on 64core box with this patch applied after boot. [[email protected]: redo the description, update comments. Merge the individual bits: He Zhe did the kmemleak_lock, Liu Haitao the ->lock and Yongxin Liu forwarded Liu's patch.] Link: http://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: He Zhe <[email protected]> Signed-off-by: Liu Haitao <[email protected]> Signed-off-by: Yongxin Liu <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>