Age | Commit message | Author | Files | Lines
2021-02-26 | mm: page-flags.h: Typo fix (It -> If) | Guo Ren | 1 | -2/+2
The "If" was wrongly spelled as "It". Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Guo Ren <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Steven Price <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/zsmalloc.c: use page_private() to access page->private | Miaohe Lin | 1 | -1/+1
It's recommended to use the helper macro page_private() to access the private field of a page. Use this helper to eliminate direct access. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | zsmalloc: account the number of compacted pages correctly | Rokudo Yan | 3 | -8/+13
Multiple paths may do zram compaction concurrently: 1. auto-compaction triggered during memory reclaim 2. userspace utils writing the zram<id>/compaction node So multiple threads may call zs_shrinker_scan/zs_compact concurrently. But pages_compacted is a per zsmalloc pool variable and modification of the variable is not serialized (though done under class->lock). There are two issues here: 1. pages_compacted may not equal the total number of pages freed (due to concurrent adds). 2. zs_shrinker_scan may not return the correct number of pages freed (by the current shrinker). The fix is simple: 1. account the number of pages freed in zs_compact locally. 2. use the atomic variable pages_compacted to accumulate the total number. Link: https://lkml.kernel.org/r/[email protected] Fixes: 860c707dca155a56 ("zsmalloc: account the number of compacted pages") Signed-off-by: Rokudo Yan <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
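A minimal userspace sketch of the accounting pattern described above (illustrative only, not the zsmalloc code): each compaction run counts what it freed in a local variable, folds that into a shared atomic total once, and reports only its own work.

#include <stdatomic.h>
#include <stdio.h>

/* Shared, pool-wide total of compacted (freed) pages. */
static atomic_long pages_compacted;

/* Stand-in for compacting one size class; returns pages freed there. */
static long compact_one_class(void) { return 4; }

/* One compaction run, safe to call from several threads concurrently. */
static long do_compact(void)
{
        long pages_freed = 0;   /* local to this run */

        for (int i = 0; i < 8; i++)
                pages_freed += compact_one_class();

        /* Fold the local count into the shared total atomically. */
        atomic_fetch_add(&pages_compacted, pages_freed);

        /* Return only what this run freed, as a shrinker should. */
        return pages_freed;
}

int main(void)
{
        long freed = do_compact();

        printf("this run freed %ld, total %ld\n", freed,
               atomic_load(&pages_compacted));
        return 0;
}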
2021-02-26 | mm/zsmalloc.c: convert to use kmem_cache_zalloc in cache_alloc_zspage() | Miaohe Lin | 1 | -2/+1
We always memset the zspage allocated via cache_alloc_zspage. So it's more convenient to use kmem_cache_zalloc in cache_alloc_zspage than to have the caller do it manually. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
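For context, kmem_cache_zalloc() is kmem_cache_alloc() with __GFP_ZERO folded in, so the caller-side memset() goes away. A hedged sketch of the post-patch shape (field names as used in zsmalloc; treat the snippet as illustrative rather than the exact diff):

#include <linux/slab.h>

static struct zspage *cache_alloc_zspage(struct zs_pool *pool, gfp_t flags)
{
        /* The slab layer zeroes the object for us; no separate memset(). */
        return kmem_cache_zalloc(pool->zspage_cachep, flags);
}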
2021-02-26 | mm: set the sleep_mapped to true for zbud and z3fold | Tian Tao | 2 | -0/+2
The zpool driver now has a flag to indicate whether it can sleep after mapping. This patch sets it to true for z3fold and zbud. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tian Tao <[email protected]> Reviewed-by: Vitaly Wool <[email protected]> Acked-by: Sebastian Andrzej Siewior <[email protected]> Reported-by: Mike Galbraith <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Barry Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/zswap: add the flag can_sleep_mapped | Tian Tao | 3 | -5/+62
Patch series "Fix the compatibility of zsmalloc and zswap". Patch #1 adds a flag to zpool, then zswap used to determine if zpool drivers such as zbud/z3fold/zsmalloc will enter an atomic context after mapping. The difference between zbud/z3fold and zsmalloc is that zsmalloc requires an atomic context that since its map function holds a preempt-disabled, but zbud/z3fold don't require an atomic context. So patch #2 sets flag sleep_mapped to true indicating that zbud/z3fold can sleep after mapping. zsmalloc didn't support sleep after mapping, so don't set that flag to true. This patch (of 2): Add a flag to zpool, named is "can_sleep_mapped", and have it set true for zbud/z3fold, not set this flag for zsmalloc, so its default value is false. Then zswap could go the current path if the flag is true; and if it's false, copy data from src to a temporary buffer, then unmap the handle, take the mutex, process the buffer instead of src to avoid sleeping function called from atomic context. [[email protected]: add return value in zswap_frontswap_load] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix potential memory leak] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix potential uninitialized pointer read on tmp] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix variable 'entry' is uninitialized when used] Link: https://lkml.kernel.org/r/[email protected]: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tian Tao <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Vitaly Wool <[email protected]> Acked-by: Sebastian Andrzej Siewior <[email protected]> Reported-by: Mike Galbraith <[email protected]> Cc: Barry Song <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: zswap: clean up confusing comment | Randy Dunlap | 1 | -3/+3
Correct wording and change one duplicated word (it) to "it is". Link: https://lkml.kernel.org/r/[email protected] Fixes: 0ab0abcf5115 ("mm/zswap: refactor the get/put routines") Signed-off-by: Randy Dunlap <[email protected]> Cc: Weijie Yang <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Vitaly Wool <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/rmap: fix potential pte_unmap on a not mapped pte | Miaohe Lin | 1 | -1/+2
For a PMD-mapped page (usually THP), pvmw->pte is NULL. For a PTE-mapped THP, pvmw->pte is mapped. But for HugeTLB pages, pvmw->pte is not mapped but is set to the relevant page table entry. So in page_vma_mapped_walk_done(), we may do pte_unmap() for a HugeTLB pte which is not mapped. Fix this by checking pvmw->page against PageHuge before trying to do pte_unmap(). Link: https://lkml.kernel.org/r/[email protected] Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hongxiang Lou <[email protected]> Signed-off-by: Miaohe Lin <[email protected]> Tested-by: Sedat Dilek <[email protected]> Cc: Kees Cook <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Wei Yang <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Brian Geffon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
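Roughly the shape of the fix in page_vma_mapped_walk_done() (include/linux/rmap.h), shown as a sketch rather than the exact diff: the pte is only unmapped when the page is not a HugeTLB page.

static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
{
        /* HugeTLB ptes are never kmapped: pvmw->pte points straight at the
         * page table entry, so there is nothing to pte_unmap(). */
        if (pvmw->pte && !PageHuge(pvmw->page))
                pte_unmap(pvmw->pte);
        if (pvmw->ptl)
                spin_unlock(pvmw->ptl);
}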
2021-02-26 | mm/rmap: correct obsolete comment of page_get_anon_vma() | Miaohe Lin | 1 | -2/+2
Since commit 746b18d421da ("mm: use refcounts for page_lock_anon_vma()"), page_lock_anon_vma() has been renamed to page_get_anon_vma() and converted to return a refcount-increased anon_vma, but the relevant comment was not updated. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/rmap: use page_not_mapped in try_to_unmap() | Miaohe Lin | 1 | -8/+3
page_mapcount_is_zero() accurately calculates how many mappings a hugepage has, only to check the result against 0. This is a waste of cpu time. We can do this via page_not_mapped() to save some possible atomic_read cycles. Remove the function page_mapcount_is_zero() as it's not used anymore, and move page_not_mapped() above try_to_unmap() to avoid an 'identifier undeclared' compilation error. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
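For reference, the cheaper check is the existing page_not_mapped() helper in mm/rmap.c, which just negates page_mapped() instead of summing every subpage mapcount to compare against zero:

static int page_not_mapped(struct page *page)
{
        return !page_mapped(page);
}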
2021-02-26 | mm/rmap: fix obsolete comment in __page_check_anon_rmap() | Miaohe Lin | 1 | -2/+1
Commit 21333b2b66b8 ("ksm: no debug in page_dup_rmap()") has reverted page_dup_rmap() to an inline atomic_inc of mapcount. So page_dup_rmap() does not call __page_check_anon_rmap() anymore. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/rmap: remove unneeded semicolon in page_not_mapped() | Miaohe Lin | 1 | -1/+1
Remove extra semicolon without any functional change intended. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/rmap: correct some obsolete comments of anon_vma | Miaohe Lin | 1 | -2/+2
Commit 2b575eb64f7a ("mm: convert anon_vma->lock to a mutex") changed the spinlock used to serialize access to the vma list to a mutex. Further, commit 5a505085f043 ("mm/rmap: Convert the struct anon_vma::mutex to an rwsem") converted the mutex to an rwsem to solve a scalability problem. So replace spinlock with rwsem in the comments to make them up to date. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/mlock: stop counting mlocked pages when no vma is found | Miaohe Lin | 1 | -1/+1
There is no vma satisfying addr < vm_end when find_vma() returns NULL. Thus it's meaningless to traverse the vma list below, because we can't find any vma to count mlocked pages for. Stop counting mlocked pages in this case to save some vma list traversal cycles. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
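A simplified sketch of the idea (not the exact count_mm_mlocked_page_nr() code, and the counting below is deliberately coarse): when find_vma() returns NULL there is no vma with addr < vm_end, so bail out instead of walking the list.

static unsigned long count_mlocked(struct mm_struct *mm,
                                   unsigned long start, unsigned long len)
{
        struct vm_area_struct *vma = find_vma(mm, start);
        unsigned long count = 0;

        if (!vma)               /* nothing can overlap [start, start + len) */
                return 0;

        for (; vma && vma->vm_start < start + len; vma = vma->vm_next)
                if (vma->vm_flags & VM_LOCKED)
                        count += vma->vm_end - vma->vm_start;   /* simplified */

        return count;
}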
2021-02-26 | virtio-mem: check against mhp_get_pluggable_range() which memory we can hotplug | David Hildenbrand | 1 | -14/+27
Right now, we only check against MAX_PHYSMEM_BITS - but it turns out there are more restrictions on which memory we can actually hotplug, especially on arm64 or s390x once we support them: we might receive something like -E2BIG or -ERANGE from add_memory_driver_managed(), stopping device operation. So, check right when initializing the device which memory we can add, warning the user. Try adding only actually pluggable ranges: in the worst case, no memory provided by our device is pluggable. In the usual case, we expect all device memory to be pluggable, and in corner cases only some memory at the end of the device-managed memory region to not be pluggable. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Anshuman Khandual <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Wei Yang <[email protected]> Cc: teawater <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Catalin Marinas <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | s390/mm: define arch_get_mappable_range() | Anshuman Khandual | 2 | -1/+14
This overrides arch_get_mappable_range() on the s390 platform, which will be used with the recently added generic framework. It modifies the existing range check in vmem_add_mapping() using arch_get_mappable_range(). It also adds a VM_BUG_ON() check that ensures that mhp_range_allowed() has already been called on the hotplug path. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Heiko Carstens <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Mark Rutland <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: teawater <[email protected]> Cc: Wei Yang <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | arm64/mm: define arch_get_mappable_range() | Anshuman Khandual | 1 | -8/+7
This overrides arch_get_mappable_range() on the arm64 platform, which will be used with the recently added generic framework. It drops inside_linear_region() and the subsequent check in arch_add_memory(), which are no longer required. It also adds a VM_BUG_ON() check that ensures that mhp_range_allowed() has already been called. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: teawater <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/memory_hotplug: prevalidate the address range being added with platform | Anshuman Khandual | 3 | -20/+76
Patch series "mm/memory_hotplug: Pre-validate the address range with platform", v5. This series adds a mechanism allowing platforms to weigh in and prevalidate incoming address range before proceeding further with the memory hotplug. This helps prevent potential platform errors for the given address range, down the hotplug call chain, which inevitably fails the hotplug itself. This mechanism was suggested by David Hildenbrand during another discussion with respect to a memory hotplug fix on arm64 platform. https://lore.kernel.org/linux-arm-kernel/[email protected]/ This mechanism focuses on the addressibility aspect and not [sub] section alignment aspect. Hence check_hotplug_memory_range() and check_pfn_span() have been left unchanged. This patch (of 4): This introduces mhp_range_allowed() which can be called in various memory hotplug paths to prevalidate the address range which is being added, with the platform. Then mhp_range_allowed() calls mhp_get_pluggable_range() which provides applicable address range depending on whether linear mapping is required or not. For ranges that require linear mapping, it calls a new arch callback arch_get_mappable_range() which the platform can override. So the new callback, in turn provides the platform an opportunity to configure acceptable memory hotplug address ranges in case there are constraints. This mechanism will help prevent platform specific errors deep down during hotplug calls. This drops now redundant check_hotplug_memory_addressable() check in __add_pages() but instead adds a VM_BUG_ON() check which would ensure that the range has been validated with mhp_range_allowed() earlier in the call chain. Besides mhp_get_pluggable_range() also can be used by potential memory hotplug callers to avail the allowed physical range which would go through on a given platform. This does not really add any new range check in generic memory hotplug but instead compensates for lost checks in arch_add_memory() where applicable and check_hotplug_memory_addressable(), with unified mhp_range_allowed(). [[email protected]: make pagemap_range() return -EINVAL when mhp_range_allowed() fails] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Vasily Gorbik <[email protected]> # s390 Cc: Will Deacon <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: teawater <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | Documentation: sysfs/memory: clarify some memory block device properties | David Hildenbrand | 2 | -28/+41
In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory blocks as removable") we changed the output of the "removable" property of memory devices to return "1" if and only if the kernel supports memory offlining. Let's update documentation, stating that the interface is legacy. Also update documentation of the "state" property and "valid_zones" properties. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Ilya Dryomov <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | drivers/base/memory: don't store phys_device in memory blocks | David Hildenbrand | 4 | -22/+15
No need to store the value for each and every memory block, as we can easily query the value at runtime. Reshuffle the members to optimize the memory layout. Also, let's clarify what the interface was once used for and why it's legacy nowadays. "phys_device" was used on s390x in older versions of lsmem[2]/chmem[3], back when they were still part of s390x-tools. They were later replaced by the variants in util-linux. For example, RHEL6 and RHEL7 contain lsmem/chmem from s390-utils. RHEL8 switched to versions from util-linux on s390x [4]. "phys_device" was added with sysfs support for memory hotplug in commit 3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions") in 2005. It always returned 0. s390x started returning something != 0 on some setups (if sclp.rzm is set by HW) in 2010 via commit 57b552ba0b2f ("memory hotplug/s390: set phys_device"). For s390x, it allowed for identifying which memory block devices belong to the same storage increment (RZM). Only if all memory block devices comprising a single storage increment were offline could the memory actually be removed in the hypervisor. Since commit e5d709bb5fb7 ("s390/memory hotplug: provide memory_block_size_bytes() function") in 2013 a memory block device spans at least one storage increment - which is why the interface isn't really helpful/used anymore (except by old lsmem/chmem tools). There were once RFC patches to make use of "phys_device" in ACPI context; however, the underlying problem could be solved using different interfaces [1]. [1] https://patchwork.kernel.org/patch/2163871/ [2] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/lsmem [3] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/chmem [4] https://bugzilla.redhat.com/show_bug.cgi?id=1504134 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Ilya Dryomov <[email protected]> Cc: Vaibhav Jain <[email protected]> Cc: Tom Rix <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/memory_hotplug: use helper function zone_end_pfn() to get end_pfn | Miaohe Lin | 1 | -5/+4
Commit 108bcc96ef70 ("mm: add & use zone_end_pfn() and zone_spans_pfn()") introduced the helper zone_end_pfn() to calculate the zone end pfn. But update_pgdat_span() forgot to use it. Use this helper and rename local variable zone_end_pfn to end_pfn to avoid a naming conflict with the existing zone_end_pfn(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
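The helper in question is a one-liner in include/linux/mmzone.h; essentially:

static inline unsigned long zone_end_pfn(const struct zone *zone)
{
        return zone->zone_start_pfn + zone->spanned_pages;
}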
2021-02-26 | mm/memory_hotplug: MEMHP_MERGE_RESOURCE -> MHP_MERGE_RESOURCE | David Hildenbrand | 5 | -5/+5
Let's make "MEMHP_MERGE_RESOURCE" consistent with "MHP_NONE", "mhp_t" and "mhp_flags". As discussed recently [1], "mhp" is our internal acronym for memory hotplug now. [1] https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Wei Liu <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Cc: "K. Y. Srinivasan" <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Stephen Hemminger <[email protected]> Cc: Jason Wang <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/memory_hotplug: rename all existing 'memhp' into 'mhp' | Anshuman Khandual | 3 | -13/+13
This renames all 'memhp' instances to 'mhp', except for memhp_default_state, which is a kernel command line option. This is just a cleanup and should not cause a functional change. Let's make it consistent rather than mixing the two prefixes, in preparation for more users of the 'mhp' terminology. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: fix memory_failure() handling of dax-namespace metadata | Dan Williams | 3 | -0/+27
Given that 'struct dev_pagemap' spans both data pages and metadata pages, be careful to consult the altmap, if present, to delineate metadata. In fact, the pfn_first() helper already identifies the first valid data pfn, so export that helper for other code paths via pgmap_pfn_valid(). Other usages of get_dev_pagemap() are not a concern because those operate on known data pfns that have been looked up by get_user_pages(). I.e. metadata pfns are never user mapped. Link: https://lkml.kernel.org/r/161058501758.1840162.4239831989762604527.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 6100e34b2526 ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages") Signed-off-by: Dan Williams <[email protected]> Reported-by: David Hildenbrand <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions | Dan Williams | 2 | -7/+65
While pfn_to_online_page() is able to determine pfn_valid() at subsection granularity, it is not able to reliably determine if a given pfn is also online if the section mixes ZONE_{NORMAL,MOVABLE} with ZONE_DEVICE. This means that pfn_to_online_page() may return invalid @page objects. For example with a memory map like:
  100000000-1fbffffff : System RAM
    142000000-143002e16 : Kernel code
    143200000-143713fff : Kernel rodata
    143800000-143b15b7f : Kernel data
    144227000-144ffffff : Kernel bss
  1fc000000-2fbffffff : Persistent Memory (legacy)
    1fc000000-2fbffffff : namespace0.0
This command:
  echo 0x1fc000000 > /sys/devices/system/memory/soft_offline_page
...succeeds when it should fail. When it succeeds it touches an uninitialized page and may crash or cause other damage (see dissolve_free_huge_page()). While the memory map above is contrived via the memmap=ss!nn kernel command line option, the collision happens in practice on shipping platforms. The memory controller resources that decode spans of physical address space are a limited resource. One technique platform-firmware uses to conserve those resources is to share a decoder across 2 devices to keep the address range contiguous. Unfortunately the unit of operation of a decoder is 64MiB while the Linux section size is 128MiB. This results in situations where, without subsection hotplug, memory mappings with different lifetimes collide into one object that can only express one lifetime. Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a section that mixes ZONE_DEVICE pfns with other online pfns. With SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall back to a slow-path check for ZONE_DEVICE pfns in an online section. In the fast path online_section() for a full ZONE_DEVICE section returns false. Because the collision case is rare, and for simplicity, the SECTION_TAINT_ZONE_DEVICE flag is never cleared once set. [[email protected]: fix CONFIG_ZONE_DEVICE=n build] Link: https://lkml.kernel.org/r/CAPcyv4iX+7LAgAeSqx7Zw-Zd=ZV9gBv8Bo7oTbwCOOqJoZ3+Yg@mail.gmail.com Link: https://lkml.kernel.org/r/161058500675.1840162.7887862152161279354.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Dan Williams <[email protected]> Reported-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Reported-by: David Hildenbrand <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: teach pfn_to_online_page() to consider subsection validity | Dan Williams | 1 | -4/+19
pfn_to_online_page() is primarily used to filter out offline or fully uninitialized pages. pfn_valid() resp. online_section_nr() have a coarse, per-memory-section granularity. If a section is shared with partially offline memory (e.g. part of ZONE_DEVICE), then pfn_to_online_page() would lead to a false positive on some pfns. Fix this by adding a pfn_section_valid() check, which is subsection aware. [[email protected]: changelog rewrite] Link: https://lkml.kernel.org/r/161058500148.1840162.4365921007820501696.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: b13bc35193d9 ("mm/hotplug: invalid PFNs from pfn_to_online_page()") Signed-off-by: Dan Williams <[email protected]> Reported-by: David Hildenbrand <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Qian Cai <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
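A simplified sketch of the resulting check (not the exact post-patch code): being inside an online section is no longer enough, the pfn's subsection must be valid as well.

struct page *pfn_to_online_page(unsigned long pfn)
{
        unsigned long nr = pfn_to_section_nr(pfn);
        struct mem_section *ms;

        if (nr >= NR_MEM_SECTIONS)
                return NULL;

        ms = __nr_to_section(nr);
        if (!online_section(ms))
                return NULL;

        /* The section may be shared with (offline) ZONE_DEVICE subsections:
         * validate at subsection granularity before touching the memmap. */
        if (!pfn_section_valid(ms, pfn))
                return NULL;

        return pfn_to_page(pfn);
}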
2021-02-26 | mm: move pfn_to_online_page() out of line | Dan Williams | 2 | -16/+17
Patch series "mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE", v4. A pfn-walker that uses pfn_to_online_page() may inadvertently translate a pfn as online and in the page allocator, when it is offline managed by a ZONE_DEVICE mapping (details in Patch 3: ("mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions")). The 2 proposals under consideration are teach pfn_to_online_page() to be precise in the presence of mixed-zone sections, or teach the memory-add code to drop the System RAM associated with ZONE_DEVICE collisions. In order to not regress memory capacity by a few 10s to 100s of MiB the approach taken in this set is to add precision to pfn_to_online_page(). In the course of validating pfn_to_online_page() a couple other fixes fell out: 1/ soft_offline_page() fails to drop the reference taken in the madvise(..., MADV_SOFT_OFFLINE) case. 2/ memory_failure() uses get_dev_pagemap() to lookup ZONE_DEVICE pages, however that mapping may contain data pages and metadata raw pfns. Introduce pgmap_pfn_valid() to delineate the 2 types and fail the handling of raw metadata pfns. This patch (of 4); pfn_to_online_page() is already too large to be a macro or an inline function. In anticipation of further logic changes / growth, move it out of line. No functional change, just code movement. Link: https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/161058499608.1840162.10165648147615238793.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Reported-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/vmstat.c: erase latency in vmstat_shepherd | Jiang Biao | 1 | -0/+2
Many 100us+ latencies have been detected in vmstat_shepherd() on a CPX platform which has 208 logical cpus. And vmstat_shepherd is queued every second, which could make the case worse. Add a schedule point in vmstat_shepherd() to erase the latency. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiang Biao <[email protected]> Reported-by: Bin Lai <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
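A fragment-level sketch of the change (placement within vmstat_shepherd()'s CPU loop is approximate): a cond_resched() per iteration keeps the walk over a couple hundred CPUs from running too long without rescheduling.

        for_each_online_cpu(cpu) {
                struct delayed_work *dw = &per_cpu(vmstat_work, cpu);

                if (!delayed_work_pending(dw) && need_update(cpu))
                        queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);

                cond_resched();         /* the added schedule point */
        }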
2021-02-26 | mm: vmstat: add some comments on internal storage of byte items | Johannes Weiner | 2 | -0/+18
Byte-accounted items are used for slab object accounting at the cgroup level, because the objects in a slab page can belong to different cgroups. At the global level these items always change in multiples of whole slab pages. The vmstat code exploits this and stores these items as pages internally, which allows for more compact per-cpu data. This optimization isn't self-evident from the asserts and the division in the stat update functions. Provide the reader with some context. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: vmstat: fix NOHZ wakeups for node stat changes | Johannes Weiner | 1 | -6/+9
On NOHZ, the periodic vmstat flushers on each CPU can go to sleep and won't wake up until stat changes are detected in the per-cpu deltas of the zone vmstat counters. In commit 75ef71840539 ("mm, vmstat: add infrastructure for per-node vmstats") per-node counters were introduced, and subsequently most stats were moved from the zone to the node level. However, the node counters weren't added to the NOHZ wakeup detection. In theory this can cause per-cpu errors to remain in the user-reported stats indefinitely. In practice this only affects a handful of sub counters (file_mapped, dirty and writeback e.g.) because other page state changes at the node level likely involve a change at the zone level as well (alloc and free, lru ops). Also, nobody has complained. Fix it up for completeness: wake up vmstat refreshing on node changes. Also remove the BUILD_BUG_ONs that assert counter size; we haven't relied on it since we added sizeof() to the range calculation in commit 13c9aaf7fa01 ("mm/vmstat.c: fix NUMA statistics updates"). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: cma: print region name on failure | Patrick Daly | 1 | -2/+2
Print the name of the CMA region for convenience. This is useful information to have when cma_alloc() fails. [[email protected]: print the "count" variable] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Patrick Daly <[email protected]> Signed-off-by: Georgi Djakov <[email protected]> Acked-by: Minchan Kim <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo | David Hildenbrand | 3 | -2/+20
Let's count the number of CMA pages per zone and print them in /proc/zoneinfo. Having access to the total number of CMA pages per zone is helpful for debugging purposes to know where exactly the CMA pages ended up, and to figure out how many pages of a zone might behave differently, even after some of these pages might already have been allocated. As one example, CMA pages that are part of a kernel zone cannot be used for ordinary kernel allocations but instead behave more like ZONE_MOVABLE. For now, we are only able to get the global nr+free cma pages from /proc/meminfo and the free cma pages per zone from /proc/zoneinfo. Example after this patch when booting a 6 GiB QEMU VM with "hugetlb_cma=2G":
  # cat /proc/zoneinfo | grep cma
        cma          0
        nr_free_cma  0
        cma          0
        nr_free_cma  0
        cma          524288
        nr_free_cma  493016
        cma          0
        cma          0
  # cat /proc/meminfo | grep Cma
  CmaTotal:       2097152 kB
  CmaFree:        1972064 kB
Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way, one can be sure when spotting "cma 0", that there are definitely no CMA pages located in a zone. [[email protected]: v2] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: v3] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Wei Yang <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/cma: expose all pages to the buddy if activation of an area fails | David Hildenbrand | 1 | -22/+21
Right now, if activation fails, we might already have exposed some pages to the buddy for CMA use (although they will never get actually used by CMA), and some pages won't be exposed to the buddy at all. Let's check for "single zone" early and on error, don't expose any pages for CMA use - instead, expose them to the buddy, available for any use. Simply call free_reserved_page() on every single page - easier than going via free_reserved_area(), converting back and forth between pfns and virt addresses. In addition, make sure to fix up totalcma_pages properly. Example: 6 GiB QEMU VM with "... hugetlb_cma=2G movablecore=20% ...":
  [ 0.006891] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
  [ 0.006893] cma: Reserved 2048 MiB at 0x0000000100000000
  [ 0.006893] hugetlb_cma: reserved 2048 MiB on node 0
  ...
  [ 0.175433] cma: CMA area hugetlb0 could not be activated
Before this patch:
  # cat /proc/meminfo
  MemTotal:        5867348 kB
  MemFree:         5692808 kB
  MemAvailable:    5542516 kB
  ...
  CmaTotal:        2097152 kB
  CmaFree:         1884160 kB
After this patch:
  # cat /proc/meminfo
  MemTotal:        6077308 kB
  MemFree:         5904208 kB
  MemAvailable:    5747968 kB
  ...
  CmaTotal:              0 kB
  CmaFree:               0 kB
Note: cma_init_reserved_mem() makes sure that we always cover full pageblocks / MAX_ORDER - 1 pages. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Zi Yan <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: cma: allocate cma areas bottom-up | Roman Gushchin | 1 | -0/+17
Currently cma areas without a fixed base are allocated close to the end of the node. This placement is sub-optimal because of compaction: it brings pages into the cma area. In particular, it can bring in hot executable pages, even if there is plenty of free memory on the machine. This results in cma allocation failures. Instead let's place cma areas close to the beginning of a node. In this case the compaction will help to free cma areas, resulting in better cma allocation success rates. If there is enough memory, let's try to allocate bottom-up starting with 4GB to exclude any possible interference with DMA32. On smaller machines or in case of a failure, stick with the old behavior. 16GB vm, 2GB cma area: With this patch:
  [ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
  [ 0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
  [ 0.002930] cma: Reserved 2048 MiB at 0x0000000100000000
  [ 0.002931] hugetlb_cma: reserved 2048 MiB on node 0
Without this patch:
  [ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
  [ 0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
  [ 0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000
  [ 0.002934] hugetlb_cma: reserved 2048 MiB on node 0
v2:
  - switched to memblock_set_bottom_up(true), by Mike
  - start with 4GB, by Mike
[[email protected]: whitespace fix, per Mike] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix 32-bit warnings] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix 32-bit systems] [[email protected]: build fix] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Cc: Wonhyuk Yang <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm,shmem,thp: limit shmem THP allocations to requested zones | Rik van Riel | 1 | -1/+5
Hugh pointed out that the gma500 driver uses shmem pages, but needs to limit them to the DMA32 zone. Ensure the allocations resulting from the gfp_mask returned by limit_gfp_mask use the zone flags that were originally passed to shmem_getpage_gfp. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rik van Riel <[email protected]> Suggested-by: Hugh Dickins <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xu Yu <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm,thp,shmem: make khugepaged obey tmpfs mount flags | Rik van Riel | 2 | -6/+18
Currently if thp enabled=[madvise], mounting a tmpfs filesystem with huge=always and mmapping files from that tmpfs does not result in khugepaged collapsing those mappings, despite the mount flag indicating that it should. Fix that by breaking up the blocks of tests in hugepage_vma_check a little bit, and testing things in the correct order. Link: https://lkml.kernel.org/r/[email protected] Fixes: c2231020ea7b ("mm: thp: register mm for khugepaged when merging vma for shmem") Signed-off-by: Rik van Riel <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xu Yu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm,thp,shm: limit gfp mask to no more than specified | Rik van Riel | 1 | -0/+21
Matthew Wilcox pointed out that the i915 driver opportunistically allocates tmpfs memory, but will happily reclaim some of its pool if no memory is available. Make sure the gfp mask used to opportunistically allocate a THP is always at least as restrictive as the original gfp mask. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rik van Riel <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xu Yu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
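A hedged sketch of the masking idea (hypothetical helper name, not shmem's actual limit_gfp_mask()): a reclaim/IO/FS allowance survives in the THP mask only if the caller's original mask also grants it, so the opportunistic allocation can never be less restrictive than what the caller asked for.

#include <linux/gfp.h>

static gfp_t clamp_huge_gfp(gfp_t huge_gfp, gfp_t limit_gfp)
{
        /* Bits that make an allocation "try harder". */
        const gfp_t allow = __GFP_RECLAIM | __GFP_IO | __GFP_FS;

        /* Keep everything else from huge_gfp, but intersect the allowances. */
        return (huge_gfp & ~allow) | (huge_gfp & limit_gfp & allow);
}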
2021-02-26 | mm,thp,shmem: limit shmem THP alloc gfp_mask | Rik van Riel | 3 | -6/+10
Patch series "mm,thp,shm: limit shmem THP alloc gfp_mask", v6. The allocation flags of anonymous transparent huge pages can be controlled through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can help the system from getting bogged down in the page reclaim and compaction code when many THPs are getting allocated simultaneously. However, the gfp_mask for shmem THP allocations were not limited by those configuration settings, and some workloads ended up with all CPUs stuck on the LRU lock in the page reclaim code, trying to allocate dozens of THPs simultaneously. This patch applies the same configurated limitation of THPs to shmem hugepage allocations, to prevent that from happening. This way a THP defrag setting of "never" or "defer+madvise" will result in quick allocation failures without direct reclaim when no 2MB free pages are available. With this patch applied, THP allocations for tmpfs will be a little more aggressive than today for files mmapped with MADV_HUGEPAGE, and a little less aggressive for files that are not mmapped or mapped without that flag. This patch (of 4): The allocation flags of anonymous transparent huge pages can be controlled through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can help the system from getting bogged down in the page reclaim and compaction code when many THPs are getting allocated simultaneously. However, the gfp_mask for shmem THP allocations were not limited by those configuration settings, and some workloads ended up with all CPUs stuck on the LRU lock in the page reclaim code, trying to allocate dozens of THPs simultaneously. This patch applies the same configurated limitation of THPs to shmem hugepage allocations, to prevent that from happening. Controlling the gfp_mask of THP allocations through the knobs in sysfs allows users to determine the balance between how aggressively the system tries to allocate THPs at fault time, and how much the application may end up stalling attempting those allocations. This way a THP defrag setting of "never" or "defer+madvise" will result in quick allocation failures without direct reclaim when no 2MB free pages are available. With this patch applied, THP allocations for tmpfs will be a little more aggressive than today for files mmapped with MADV_HUGEPAGE, and a little less aggressive for files that are not mmapped or mapped without that flag. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rik van Riel <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Xu Yu <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: remove pagevec_lookup_entries | Matthew Wilcox (Oracle) | 3 | -39/+4
pagevec_lookup_entries() is now just a wrapper around find_get_entries() so remove it and convert all its callers. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: pass pvec directly to find_get_entries | Matthew Wilcox (Oracle) | 4 | -20/+13
All callers of find_get_entries() use a pvec, so pass it directly instead of manipulating it in the caller. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: remove nr_entries parameter from pagevec_lookup_entries | Matthew Wilcox (Oracle) | 3 | -6/+5
All callers want to fetch the full size of the pvec. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: add an 'end' parameter to pagevec_lookup_entries | Matthew Wilcox (Oracle) | 3 | -38/+16
Simplifies the callers and uses the existing functionality in find_get_entries(). We can also drop the final argument of truncate_exceptional_pvec_entries() and simplify the logic in that function. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: add an 'end' parameter to find_get_entries | Matthew Wilcox (Oracle) | 4 | -15/+10
This simplifies the callers and leads to a more efficient implementation since the XArray has this functionality already. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: add and use find_lock_entries | Matthew Wilcox (Oracle) | 4 | -97/+78
We have three functions (shmem_undo_range(), truncate_inode_pages_range() and invalidate_mapping_pages()) which want exactly this function, so add it to filemap.c. Before this patch, shmem_undo_range() would split any compound page which overlaps either end of the range being punched in both the first and second loops through the address space. After this patch, that functionality is left for the second loop, which is arguably more appropriate since the first loop is supposed to run through all the pages quickly, and splitting a page can sleep. [[email protected]: add assertion] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | iomap: use mapping_seek_hole_data | Matthew Wilcox (Oracle) | 2 | -119/+43
Enhance mapping_seek_hole_data() to handle partially uptodate pages and convert the iomap seek code to call it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: William Kucharski <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/filemap: add mapping_seek_hole_data | Matthew Wilcox (Oracle) | 3 | -70/+82
Rewrite shmem_seek_hole_data() and move it to filemap.c. [[email protected]: don't put an xa_is_value() page] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/filemap: add helper for finding pages | Matthew Wilcox (Oracle) | 1 | -55/+42
There is a lot of common code in find_get_entries(), find_get_pages_range() and find_get_pages_range_tag(). Factor out find_get_entry() which simplifies all three functions. [[email protected]: remove VM_BUG_ON_PAGE()] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: William Kucharski <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/filemap: rename find_get_entry to mapping_get_entry | Matthew Wilcox (Oracle) | 1 | -3/+4
find_get_entry doesn't "find" anything. It returns the entry at a particular index. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: William Kucharski <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm: add FGP_ENTRY | Matthew Wilcox (Oracle) | 5 | -41/+13
The functionality of find_lock_entry() and find_get_entry() can be provided by pagecache_get_page(), which lets us delete find_lock_entry() and make find_get_entry() static. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: William Kucharski <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-26 | mm/swap: optimise get_shadow_from_swap_cache | Matthew Wilcox (Oracle) | 1 | -3/+1
There's no need to get a reference to the page, just load the entry and see if it's a shadow entry. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: William Kucharski <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
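Roughly the post-patch shape of get_shadow_from_swap_cache() (mm/swap_state.c), shown as a sketch: load the XArray entry and test whether it is a value (shadow) entry, rather than taking and dropping a page reference just to inspect it.

void *get_shadow_from_swap_cache(swp_entry_t entry)
{
        struct address_space *address_space = swap_address_space(entry);
        pgoff_t idx = swp_offset(entry);
        struct page *page;

        page = xa_load(&address_space->i_pages, idx);
        if (xa_is_value(page))
                return page;

        return NULL;
}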