2022-04-28  selftests: vm: fix shellcheck warnings in run_vmtests.sh  [Axel Rasmussen, 1 file, -28/+27]

These might not be issues yet, but they make the script more fragile.  Also, by fixing them we give a better example to future readers, who might copy/paste or otherwise re-use snippets from our script.

- Use "read -r", since we don't ever want read to interpret '\' characters as escape sequences.
- Quote variables, to deal with spaces properly.
- Use $() instead of the older and harder-to-nest ``.
- Get rid of superfluous "$" prefixes inside arithmetic $(()).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  selftests: vm: refactor run_vmtests.sh to reduce boilerplate  [Axel Rasmussen, 1 file, -415/+64]

Previously, each test printed out its own header, dealt with its own return code, etc.  By just putting this standard stuff in a function, we can delete more than 300 lines from the script.  This also makes adding future tests easier.  And, it gets rid of various inconsistencies that already exist:

- Some tests correctly deal with ksft_skip, but others don't.
- Some tests just print the executable name, others print arguments, and yet others print some comment in the header.
- Most tests print out a header with two separator lines, but not the HMM smoke test or the memfd_secret test, which only print one.
- We had a redundant "exit" at the end; with all the boilerplate, it was an easy oversight.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  selftests: vm: add test for Soft-Dirty PTE bit  [Gabriel Krisman Bertazi, 4 files, -0/+150]

This introduces three tests:

1) Sanity check of soft-dirty basic semantics: allocate an area, clean it, dirty it, and check whether the soft-dirty bit is flipped.
2) Check VMA reuse: validate the VM_SOFTDIRTY usage.
3) Check soft-dirty on huge pages.

This was motivated by Will Deacon's fix commit 912efa17e512 ("mm: proc: Invalidate TLB after clearing soft-dirty page state").  I was tracking the same issue that he fixed, and this test would have caught it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

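The kernel interfaces such a test exercises are /proc/pid/clear_refs and /proc/pid/pagemap: writing "4" to clear_refs clears the soft-dirty bits, and bit 55 of each 64-bit pagemap entry reports soft-dirty.  A minimal stand-alone check along those lines (an illustrative sketch, not the selftest itself; error handling trimmed) might look like:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Bit 55 of a 64-bit pagemap entry is the soft-dirty bit. */
    static int soft_dirty(void *addr)
    {
        uint64_t ent = 0;
        int fd = open("/proc/self/pagemap", O_RDONLY);

        pread(fd, &ent, sizeof(ent),
              (uintptr_t)addr / getpagesize() * sizeof(ent));
        close(fd);
        return (int)(ent >> 55) & 1;
    }

    int main(void)
    {
        char *p = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        int fd = open("/proc/self/clear_refs", O_WRONLY);

        p[0] = 1;                  /* dirty the page */
        write(fd, "4", 1);         /* "4" clears the soft-dirty bits */
        printf("after clear: %d\n", soft_dirty(p));   /* expect 0 */
        p[0] = 2;                  /* dirty it again */
        printf("after write: %d\n", soft_dirty(p));   /* expect 1 */
        close(fd);
        return 0;
    }
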
2022-04-28  selftests: vm: bring common functions to a new file  [Muhammad Usama Anjum, 5 files, -113/+124]

Move the common functions to a new file while keeping the code as similar as possible.  These functions can be used in the new tests, which helps avoid code duplication.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Gabriel Krisman Bertazi <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  tools/testing/selftests/vm/gup_test.c: clarify error statement  [Sidhartha Kumar, 2 files, -11/+44]

Print three possible reasons /sys/kernel/debug/gup_test cannot be opened to help users of this test diagnose failures.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm: simplify follow_invalidate_pte()  [Muchun Song, 2 files, -61/+23]

The only user (DAX) of the range and pmdpp parameters of follow_invalidate_pte() is gone, so it is safe to remove them and make the function static to simplify the code.  This effectively reverts the following commits:

  097963959594 ("mm: add follow_pte_pmd()")
  a4d1a8852513 ("dax: update to new mmu_notifier semantic")

There is now only one caller of follow_invalidate_pte(), so just fold it into follow_pte() and remove it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Cc: Xiyu Yang <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  dax: fix missing writeprotect of the pte entry  [Muchun Song, 1 file, -87/+12]

Currently dax_mapping_entry_mkclean() fails to clean and write-protect the pte entry within a DAX PMD entry during an *sync operation.  This can result in data loss in the following sequence:

1) Process A mmap-writes to a DAX PMD, dirtying the PMD radix tree entry and making the pmd entry dirty and writeable.
2) Process B mmaps with @offset (e.g. 4K) and @length (e.g. 4K) and writes to the same file, dirtying the PMD radix tree entry (already done in 1)) and making the pte entry dirty and writeable.
3) fsync, flushing out the PMD data and cleaning the radix tree entry.  We currently fail to mark the pte entry as clean and write-protected, since the vma of process B is not covered in dax_entry_mkclean().
4) Process B writes to the pte.  This causes no page fault, since the pte entry is dirty and writeable.  The radix tree entry remains clean.
5) fsync, which fails to flush the dirty PMD data because the radix tree entry was clean.
6) Crash - dirty data that should have been fsync'd as part of 5) could still have been in the processor cache, and is lost.

Use pfn_mkclean_range() to clean the pfns, which fixes this issue.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4b4bb46d00b3 ("dax: clear dirty entry tags on cache flush")
Signed-off-by: Muchun Song <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Cc: Xiyu Yang <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm: pvmw: add support for walking devmap pages  [Muchun Song, 1 file, -8/+9]

page_vma_mapped_walk() currently cannot be used to check whether a huge devmap page is mapped into a vma.  Add support for walking huge devmap pages so that DAX can use it in the next patch.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Cc: Xiyu Yang <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm: rmap: introduce pfn_mkclean_range() to clean PTEs  [Muchun Song, 3 files, -20/+74]

page_mkclean_one() is supposed to be used with a pfn that has an associated struct page, but not all pfns (e.g. DAX) have one.  Introduce a new function, pfn_mkclean_range(), to clean the PTEs (including PMDs) mapped within a range of pfns that have no struct page associated with them.  This helper will be used by the DAX device in the next patch to make pfns clean.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Cc: Xiyu Yang <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

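The new helper's shape, roughly (a sketch; see the patch itself for the authoritative signature):

    /*
     * Clean and write-protect the page table entries mapping the
     * nr_pages pfns starting at @pfn within @vma; @pgoff is the
     * corresponding offset in the mapping.  Returns the number of
     * entries cleaned.
     */
    int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages,
                          pgoff_t pgoff, struct vm_area_struct *vma);

DAX can then walk the file's i_mmap tree and call this once per vma covering the dirty range, which is exactly what the "dax: fix missing writeprotect" patch above does.
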
2022-04-28  dax: fix cache flush on PMD-mapped pages  [Muchun Song, 1 file, -1/+2]

flush_cache_page() only removes a PAGE_SIZE-sized range from the cache, so it covers only the head page of a THP, not the full range.  Replace it with flush_cache_range() to fix this.  This is just a documentation issue with respect to properly documenting the expected usage of cache flushing before modifying the pmd.  In practice it is not a problem, because DAX is not available on architectures with virtually indexed caches, per commit d92576f1167c ("dax: does not work correctly with virtual aliasing caches").

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f729c8c9b24f ("dax: wrprotect pmd_t in dax_mapping_entry_mkclean")
Signed-off-by: Muchun Song <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Cc: Xiyu Yang <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm: rmap: fix cache flush on THP pages  [Muchun Song, 1 file, -1/+2]

Patch series "Fix some bugs related to rmap and dax", v7.

Patches 1-2 fix a cache flush bug; because subsequent patches depend on those changes, they are placed in this series.  Patches 3-4 are preparation for fixing a dax bug in patch 5.  Patch 6 is code cleanup, since the previous patch removes the usage of follow_invalidate_pte().

This patch (of 6):

flush_cache_page() only removes a PAGE_SIZE-sized range from the cache, so it covers only the head page of a THP, not the full range.  Replace it with flush_cache_range() to fix this.  No problems have been found due to this so far, perhaps because few architectures have virtually indexed caches.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
Signed-off-by: Muchun Song <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Xiyu Yang <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/madvise: fix potential pte_unmap_unlock pte error  [Miaohe Lin, 1 file, -4/+4]

We can't assume pte_offset_map_lock() will return the same orig_pte value each time.  So it's necessary to reacquire orig_pte, or pte_unmap_unlock() will unmap the stale pte.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD")
Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)")
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

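The pattern at issue, sketched (the real call sites are madvise_cold_or_pageout_pte_range() and madvise_free_pte_range()):

    /* after splitting a THP, the PTE walk restarts ... */
    orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
    /*
     * ... and orig_pte must be refreshed together with pte: the
     * final pte_unmap_unlock(orig_pte, ptl) has to unmap the table
     * that was actually mapped, not a stale pointer captured before
     * the lock was dropped.
     */
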
2022-04-28  mm: untangle config dependencies for demote-on-reclaim  [Oscar Salvador, 3 files, -26/+21]

When demote-on-reclaim was introduced, it was tied to CONFIG_HOTPLUG_CPU + CONFIG_MIGRATION, but that is not really accurate.  The only two things we need to depend on are CONFIG_NUMA + CONFIG_MIGRATION, so clean this up.  Furthermore, only register the hotplug memory notifier when the system has CONFIG_MEMORY_HOTPLUG.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oscar Salvador <[email protected]>
Suggested-by: "Huang, Ying" <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Abhishek Goel <[email protected]>
Cc: Baolin Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm: migrate: simplify the refcount validation when migrating hugetlb mapping  [Baolin Wang, 1 file, -5/+0]

There is no need to validate the hugetlb page's refcount before trying to freeze it to the expected refcount; we can simply rely on page_ref_freeze() for the validation.  Moreover, we are always under the page lock when migrating the hugetlb page mapping, which means nowhere else can remove it from the page cache, so we can also remove the xas_load() validation under the i_pages lock.

Link: https://lkml.kernel.org/r/eb2fbbeaef2b1714097b9dec457426d682ee0635.1649676424.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Acked-by: Mike Kravetz <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

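A condensed sketch of the idea in migrate_huge_page_move_mapping() (see the patch for the real code):

    xas_lock_irq(&xas);
    /*
     * page_ref_freeze() atomically succeeds only if the refcount
     * equals expected_count, so a separate page_count() pre-check
     * and the xas_load() revalidation are redundant.
     */
    if (!page_ref_freeze(page, expected_count)) {
        xas_unlock_irq(&xas);
        return -EAGAIN;
    }
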
2022-04-28  mm/migration: fix possible do_pages_stat_array racing with memory offline  [Miaohe Lin, 1 file, -2/+7]

When follow_page() peeks at a page, the page could be migrated and then offlined while it's still being used by do_pages_stat_array().  Use FOLL_GET to hold a reference on the page to fix this potential race.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: "Huang, Ying" <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

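A condensed sketch of the change in do_pages_stat_array():

    /* FOLL_GET pins the page so it cannot be offlined while in use */
    page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);

    if (!IS_ERR_OR_NULL(page)) {
        err = page_to_nid(page);
        put_page(page);        /* drop the FOLL_GET reference */
    }
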
2022-04-28  mm/migration: fix potential invalid node access for reclaim-based migration  [Miaohe Lin, 1 file, -3/+3]

If we fail to set up the hotplug state callbacks for mm/demotion:online in some corner cases, node_demotion will be left uninitialized, and an invalid node might be returned from next_demotion_node() when doing reclaim-based migration.  Use kcalloc to allocate node_demotion to fix the issue.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes demotion")
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

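The allocation, roughly as changed (struct demotion_nodes holds a node's demotion targets):

    /*
     * kcalloc() zero-initializes the array, so even if the hotplug
     * callback setup fails, every node reports zero demotion targets
     * instead of uninitialized garbage.
     */
    node_demotion = kcalloc(nr_node_ids, sizeof(struct demotion_nodes),
                            GFP_KERNEL);
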
2022-04-28  mm/migration: fix potential page refcounts leak in migrate_pages  [Miaohe Lin, 1 file, -0/+8]

In the -ENOMEM case, there might be some subpages of failed-to-migrate THPs left on the thp_split_pages list.  We should move them back to the migration list so that they can be put back onto the right list by the caller; otherwise their page refcounts will be leaked here.  Also adjust nr_failed and nr_thp_failed accordingly to make the vm events accounting more accurate.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Alistair Popple <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

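A condensed sketch of the -ENOMEM path after the fix (not the verbatim diff):

    case -ENOMEM:
        /*
         * Move the remaining split subpages back to the migration
         * list so the caller can put them back on the right list,
         * and fold the un-attempted pages into the failure counts.
         */
        list_splice_init(&thp_split_pages, from);
        nr_thp_failed += thp_retry;
        nr_failed += retry;
        goto out;
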
2022-04-28  mm/migration: remove some duplicated code in migrate_pages  [Miaohe Lin, 1 file, -24/+12]

Remove the duplicated code in migrate_pages() to simplify it.  Minor readability improvement.  No functional change intended.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/migration: avoid unneeded nodemask_t initialization  [Miaohe Lin, 1 file, -2/+2]

Avoid unneeded next_pass and this_pass initialization, as they're always set before use.  This saves some possible cpu cycles when there are plenty of nodes in the system.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/migration: use helper macro min in do_pages_stat  [Miaohe Lin, 1 file, -6/+2]

Use the helper macro min() to set chunk_nr, simplifying the code.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

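The change amounts to roughly the following (in do_pages_stat(); DO_PAGES_STAT_CHUNK_NR is the existing chunk-size constant):

    /* before */
    chunk_nr = DO_PAGES_STAT_CHUNK_NR;
    if (chunk_nr > nr_pages)
        chunk_nr = nr_pages;

    /* after: min() is type-checked, hence the cast */
    chunk_nr = min(nr_pages, (unsigned long)DO_PAGES_STAT_CHUNK_NR);
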
2022-04-28  mm/migration: use helper function vma_lookup() in add_page_for_migration  [Miaohe Lin, 1 file, -2/+2]

Use the helper function vma_lookup() to look up the needed vma, simplifying the code.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

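A sketch of the substitution in add_page_for_migration():

    /* before: find_vma() can return a vma that starts above addr */
    vma = find_vma(mm, addr);
    if (!vma || addr < vma->vm_start || !vma_migratable(vma))
        goto out;

    /* after: vma_lookup() only returns a vma that contains addr */
    vma = vma_lookup(mm, addr);
    if (!vma || !vma_migratable(vma))
        goto out;
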
2022-04-28  mm/migration: remove unneeded local variable page_lru  [Miaohe Lin, 1 file, -3/+1]

We can use page_is_file_lru() directly to account the isolated pages, simplifying the code a bit.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/migration: remove unneeded local variable mapping_locked  [Miaohe Lin, 1 file, -4/+2]

Patch series "A few cleanup and fixup patches for migration", v2.

This series contains a few patches to remove unneeded variables and a jump label, and to use helpers to simplify the code.  It also fixes some bugs, such as a page refcount leak and an invalid node access.  More details can be found in the respective changelogs.

This patch (of 11):

When mapping_locked is true, TTU_RMAP_LOCKED is always set in ttu.  We can check ttu instead, so mapping_locked can be removed.  And ttu is now either 0 or TTU_RMAP_LOCKED; change '|=' to '=' to reflect this.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Alistair Popple <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm: add selftests for migration entries  [Alistair Popple, 2 files, -0/+196]

Add some basic migration tests and in particular tests that will stress both the pte and pmd migration entry wait paths.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alistair Popple <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

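The shape of such a stress test, sketched with the move_pages(2) syscall (illustrative only, not the selftest itself, which lives in tools/testing/selftests/vm/migration.c; assumes a machine with NUMA nodes 0 and 1, compile with -pthread -lnuma):

    #include <numaif.h>
    #include <pthread.h>
    #include <string.h>
    #include <sys/mman.h>

    #define SIZE (2UL << 20)

    static void *ptr;

    /* Hammer the memory so accesses hit migration entries mid-move */
    static void *access_mem(void *arg)
    {
        for (;;)
            memset(ptr, 1, SIZE);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        int status, node = 0, i;

        ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        pthread_create(&t, NULL, access_mem, NULL);

        /* Bounce the first page between nodes 0 and 1 while busy */
        for (i = 0; i < 1000; i++) {
            node = 1 - node;
            if (move_pages(0, 1, &ptr, &node, &status, MPOL_MF_MOVE))
                return 1;
        }
        return 0;
    }
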
2022-04-28  mm/mempolicy: clean up the code logic in queue_pages_pte_range  [Miaohe Lin, 1 file, -9/+3]

Since commit e5947d23edd8 ("mm: mempolicy: don't have to split pmd for huge zero page"), a THP is never split in queue_pages_pmd(), so 2 is never returned now.  We can remove the unnecessary ret != 2 check and clean up the relevant comment.  Minor improvements in readability.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  drivers/base/node.c: fix compaction sysfs file leak  [Miaohe Lin, 1 file, -0/+1]

The compaction sysfs file is created via compaction_register_node() in register_node(), but we forgot to remove it in unregister_node(), so the compaction sysfs file is leaked.  Use compaction_unregister_node() to fix this issue.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: ed4a6d7f0676 ("mm: compaction: add /sys trigger for per-node memory compaction")
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

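The fix is the symmetric call on the teardown path; schematically (the surrounding teardown is elided):

    void unregister_node(struct node *node)
    {
        /* undo compaction_register_node() from register_node() */
        compaction_unregister_node(node);
        /* ... existing teardown ... */
        device_unregister(&node->dev);
    }
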
2022-04-28  mm: compaction: use helper isolation_suitable()  [Miaohe Lin, 1 file, -1/+1]

Use the helper isolation_suitable() to check whether the page is suitable to isolate, simplifying the code.  Minor readability improvement.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/z3fold: remove unneeded PAGE_HEADLESS check in free_handle()  [Miaohe Lin, 1 file, -3/+0]

The only caller, z3fold_free(), never calls free_handle() in the PAGE_HEADLESS case.  Remove this unneeded check.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/z3fold: remove redundant list_del_init of zhdr->buddy in z3fold_free  [Miaohe Lin, 1 file, -3/+0]

do_compact_page() already does list_del_init(&zhdr->buddy) for us.  Remove this extra one to save some possible cpu cycles.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/z3fold: move decrement of pool->pages_nr into __release_z3fold_page()  [Miaohe Lin, 1 file, -29/+12]

z3fold always does atomic64_dec(&pool->pages_nr) when __release_z3fold_page() is called, so we can move the decrement of pool->pages_nr into __release_z3fold_page() to simplify the code.  This also reduces the size of z3fold.o by about 1KB.

Without this patch:

   text    data     bss     dec     hex  filename
  15444    1376       8   16828    41bc  mm/z3fold.o

With this patch:

   text    data     bss     dec     hex  filename
  15044    1248       8   16300    3fac  mm/z3fold.o

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

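The shape of the change, condensed (the real patch also adjusts every caller):

    static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
    {
        struct z3fold_pool *pool = zhdr_to_pool(zhdr);

        /* ... existing release work ... */

        /*
         * Every caller used to follow this function with
         * atomic64_dec(&pool->pages_nr); doing it once here removes
         * that duplicated bookkeeping from all call sites.
         */
        atomic64_dec(&pool->pages_nr);
    }
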
2022-04-28  mm/z3fold: remove confusing local variable l reassignment  [Miaohe Lin, 1 file, -1/+0]

The local variable l holds the address of unbuddied[i], which won't change after we take the pool lock.  Remove it to avoid confusion.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/z3fold: remove unneeded page_mapcount_reset and ClearPagePrivate  [Miaohe Lin, 1 file, -3/+0]

page->page_type and PagePrivate are not used in z3fold.  Remove these confusing, unneeded operations.  z3fold only did them because it was modeled on zsmalloc's migration code, which does need them.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/z3fold: minor clean up for z3fold_free  [Miaohe Lin, 1 file, -4/+4]

Use put_z3fold_header() to pair with get_z3fold_header().  Also fix the incorrect comments.  Minor readability improvement.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/z3fold: remove obsolete comment in z3fold_alloc  [Miaohe Lin, 1 file, -3/+0]

Highmem pages have been supported since commit f1549cb5ab2b ("mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc").  Remove the residual comment.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/z3fold: declare z3fold_mount with __init  [Miaohe Lin, 1 file, -1/+1]

Patch series "A few cleanup patches for z3fold", v2.

This series contains a few patches to simplify the code, remove unneeded code, fix an obsolete comment, and so on.  More details can be found in the respective changelogs.

This patch (of 8):

z3fold_mount() is only called during init, so declare it with __init.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  fs/proc/task_mmu.c: remove redundant page validation of pte_page  [Xianting Tian, 1 file, -2/+0]

pte_page() always returns a valid page, so remove the redundant page validation, as we did in many other places.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xianting Tian <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/vmscan: fix comment for isolate_lru_pages  [Miaohe Lin, 1 file, -2/+2]

Since commit 791b48b64232 ("mm: vmscan: scan until it finds eligible pages"), splicing any skipped pages to the tail of the LRU list won't put the system at risk of premature OOM but will waste lots of cpu cycles.  Correct the comment accordingly.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/vmscan: fix comment for current_may_throttle  [Miaohe Lin, 1 file, -4/+3]

Since commit 6d6435811c19 ("remove bdi_congested() and wb_congested() and related functions"), there is no congested backing device check anymore.  Correct the comment accordingly.

[[email protected]: tweak grammar]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/vmscan: remove obsolete comment in get_scan_count  [Miaohe Lin, 1 file, -3/+1]

Since commit 1431d4d11abb ("mm: base LRU balancing on an explicit cost model"), the relative value of each set of LRU lists is based on a cost model instead of the rotated/scanned ratio.  Clean up the relevant comment.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/vmscan: sc->reclaim_idx must be a valid zone index  [Wei Yang, 1 file, -2/+2]

lruvec_lru_size() is only used in get_scan_count(), so the only possible zone_idx is sc->reclaim_idx.  Since sc->reclaim_idx is ensured to be a valid zone index, we can remove the extra check in the zone iteration.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Yang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/vmscan: make sure wakeup_kswapd with managed zone  [Wei Yang, 2 files, -3/+5]

wakeup_kswapd() only wakes kswapd when the zone is managed.  Its two callers, wake_all_kswapds() and numamigrate_isolate_page(), operate from a node perspective; if they pick a !managed zone, this is not what we expect.  This patch makes sure we pick a managed zone for wakeup_kswapd(), and it also uses managed_zone in migrate_balanced_pgdat() to get the proper zone.

[[email protected]: adjust the usage in migrate_balanced_pgdat()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

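A condensed sketch of the wake_all_kswapds() side (not the verbatim diff):

    for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
                                    highest_zoneidx, ac->nodemask) {
        /* skip zones with no pages managed by the buddy allocator */
        if (!managed_zone(zone))
            continue;
        if (last_pgdat != zone->zone_pgdat) {
            wakeup_kswapd(zone, gfp_mask, order, highest_zoneidx);
            last_pgdat = zone->zone_pgdat;
        }
    }
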
2022-04-28  mm/vmscan: reclaim only affects managed_zones  [Wei Yang, 1 file, -2/+2]

As mentioned in commit 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator"), reclaim only affects managed_zones.  Adjust the code and comments accordingly.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  arm64: mm: hugetlb: enable HUGETLB_PAGE_FREE_VMEMMAP for arm64  [Muchun Song, 2 files, -0/+15]

The feature that minimizes the overhead of struct page associated with each HugeTLB page frees the vmemmap pages (used as struct page) to save memory, roughly 14GB (2MB pages) to 16GB (1GB pages) per 1TB of HugeTLB pages.  In short, when a HugeTLB page is allocated or freed, the vmemmap array representing the range associated with the page needs to be remapped.  When a page is allocated, vmemmap pages are freed after remapping.  When a page is freed, previously discarded vmemmap pages must be allocated before remapping.  More implementation details can be found in [1].

The infrastructure for freeing the vmemmap pages associated with each HugeTLB page is already there, so we can easily enable HUGETLB_PAGE_FREE_VMEMMAP for arm64.  The only thing to be fixed is flush_dcache_page(), which needs to be adapted to operate on the head page's flags, since the tail vmemmap pages are mapped read-only once the feature is enabled (clearing flags through them is not permitted).

There was some discussion about this in thread [2], but no conclusion was reached.  I have copied the concerns raised by Anshuman here and explain why they do not apply.  It is safe to enable this for arm64 as it is for x86_64.

1st concern:

'''
But what happens when a hot remove section's vmemmap area (which is being teared down) is nearby another vmemmap area which is either created or being destroyed for HugeTLB alloc/free purpose.  As you mentioned HugeTLB pages inside the hot remove section might be safe.  But what about other HugeTLB areas whose vmemmap area shares page table entries with vmemmap entries for a section being hot removed?  Massive HugeTLB alloc/use/free test cycle using memory just adjacent to a memory hotplug area, which is always added and removed periodically, should be able to expose this problem.
'''

Answer: At the time memory is removed, all HugeTLB pages have either been migrated away or dissolved, so there is no race between memory hot remove and free_huge_page_vmemmap(); HugeTLB pages inside the hot-removed section are safe.  As for the question of other HugeTLB areas whose vmemmap shares page table entries with vmemmap entries for a section being hot removed, the situation cannot arise.  The minimal granularity of memory hotplug is 128MB (on arm64 with a 4K base page), so any HugeTLB page smaller than 128MB lies within a single section; there are then no shared PTE page tables between HugeTLB pages in this section and those in other sections, and a HugeTLB page cannot cross two sections (in which case the section cannot be freed anyway).  Any HugeTLB page bigger than 128MB (the section size) has vmemmap pages that are an integer multiple of 2MB (PMD-mapped).  As long as:

1) HugeTLBs are naturally aligned, power-of-two sizes,
2) the HugeTLB size >= the section size, and
3) the HugeTLB size >= the vmemmap leaf mapping size,

then a HugeTLB will not share any leaf page table entries with *anything else*, though it will share intermediate entries.  In this case too, at the time memory is removed, all HugeTLB pages have either been migrated away or dissolved, so again there is no race between memory hot remove and free_huge_page_vmemmap().

2nd concern:

'''
differently, not sure if ptdump would require any synchronization.  Dumping an wrong value is probably okay but crashing because a page table entry is being freed after ptdump acquired the pointer is bad.  On arm64, ptdump() is protected against hotremove via [get|put]_online_mems().
'''

Answer: ptdump should be fine, since vmemmap_remap_free() only exchanges PTEs or splits a PMD entry (which means allocating a PTE page table).  Neither operation frees any page tables (PTE), so ptdump cannot run into a use-after-free on any page tables.  The worst case is just dumping a wrong value.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/

[[email protected]: restructure the code comment inside flush_dcache_page()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Reviewed-by: Barry Song <[email protected]>
Tested-by: Barry Song <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Bodeddula Balasubramaniam <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Cc: Fam Zheng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

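The arm64 adaptation is small; a sketch of the idea (condensed, not the verbatim arch/arm64/mm/flush.c change):

    void flush_dcache_page(struct page *page)
    {
        /*
         * Tail pages' struct page may live in read-only vmemmap once
         * the feature is enabled, so only the head page's
         * PG_dcache_clean flag can safely be cleared.
         */
        if (hugetlb_free_vmemmap_enabled() && PageHuge(page))
            page = compound_head(page);

        if (test_bit(PG_dcache_clean, &page->flags))
            clear_bit(PG_dcache_clean, &page->flags);
    }
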
2022-04-28  mm: hugetlb_vmemmap: introduce ARCH_WANT_HUGETLB_PAGE_FREE_VMEMMAP  [Muchun Song, 2 files, -1/+10]

The feature of minimizing the overhead of struct page associated with each HugeTLB page is implemented on x86_64; however, the infrastructure is generic, so we can easily enable it for other architectures.  Introduce ARCH_WANT_HUGETLB_PAGE_FREE_VMEMMAP so that other architectures can opt in easily: an architecture just selects this config to enable the feature.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Suggested-by: Andrew Morton <[email protected]>
Reviewed-by: Barry Song <[email protected]>
Tested-by: Barry Song <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Cc: Bodeddula Balasubramaniam <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Fam Zheng <[email protected]>
Cc: James Morse <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  hugetlb: remove use of list iterator variable after loop  [Jakob Koschel, 1 file, -14/+19]

In preparation for limiting the scope of the list iterator to the list traversal loop, use a dedicated pointer to iterate through the list [1].

Before, hugetlb_resv_map_add() expected a file_region struct, but if the list iterator in add_reservation_in_range() did not exit early, the variable passed in was not actually a valid structure.  In such a case, 'rg' is computed from the head element of the list and represents an out-of-bounds pointer.  This remains safe *iff* only the link member is used (as hugetlb_resv_map_add() does).  To avoid the type confusion altogether and limit the list iterator to the loop, keep only a list_head pointer to pass to hugetlb_resv_map_add().

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jakob Koschel <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "Brian Johannesmeyer" <[email protected]>
Cc: Cristiano Giuffrida <[email protected]>
Cc: "Bos, H.J." <[email protected]>
Cc: Jakob Koschel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

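The general pattern being eliminated across the tree, sketched on a hypothetical list (match(), use(), and struct item are placeholders, not the hugetlb code):

    struct item *found = NULL, *iter;

    list_for_each_entry(iter, &head, link) {
        if (match(iter)) {
            found = iter;    /* record the hit inside the loop */
            break;
        }
    }
    /*
     * If the loop ran to completion, 'iter' points at a bogus struct
     * computed from &head itself; only 'found' (NULL or a real item)
     * is safe to dereference after the loop.
     */
    if (found)
        use(found);
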
2022-04-28  mm, hugetlb, hwpoison: separate branch for free and in-use hugepage  [Naoya Horiguchi, 2 files, -2/+6]

We know that HPageFreed pages have page refcount 0, so get_page_unless_zero() always fails and returns 0.  Explicitly separate the branches based on page state, for a minor optimization and better readability.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Suggested-by: Mike Kravetz <[email protected]>
Suggested-by: Miaohe Lin <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

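The separated branches, condensed (not the verbatim diff):

    if (HPageFreed(head)) {
        /*
         * Free hugepages have refcount 0, so calling
         * get_page_unless_zero() would be pointless; report
         * "no reference taken" directly.
         */
        ret = 0;
    } else if (HPageMigratable(head)) {
        ret = get_page_unless_zero(head);
    }
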
2022-04-28  mm/memory-failure.c: dissolve truncated hugetlb page  [Miaohe Lin, 1 file, -5/+4]

If me_huge_page() meets a truncated but not yet freed hugepage, the hugepage won't be dissolved even if we hold the last refcount, because it has a NULL page_mapping() while not being an anonymous hugepage either.  Thus we lose the last chance to dissolve it into the buddy allocator and save its healthy subpages.  Remove the PageAnon check to handle these hugepages too.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/memory-failure.c: minor cleanup for HWPoisonHandlable  [Miaohe Lin, 1 file, -5/+3]

Patch series "A few fixup and cleanup patches for memory failure", v2.

This series contains a patch to clean up HWPoisonHandlable() and another to dissolve truncated hugetlb pages.  More details can be found in the respective changelogs.

This patch (of 2):

The local variable movable can be removed by returning true directly.  Also fix the typo 'mirgate'.  No functional change intended.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  Revert "mm/memory-failure.c: fix race with changing page compound again"  [Naoya Horiguchi, 3 files, -13/+0]

Revert commit 888af2701db7 ("mm/memory-failure.c: fix race with changing page compound again"): we now fetch the page refcount under hugetlb_lock in try_memory_failure_hugetlb(), so the race check is no longer necessary.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Suggested-by: Miaohe Lin <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Dan Carpenter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

2022-04-28  mm/hwpoison: put page in already hwpoisoned case with MF_COUNT_INCREASED  [Naoya Horiguchi, 1 file, -0/+2]

In the already-hwpoisoned case, memory_failure() is supposed to release the page refcount it took for error handling before returning.  But currently the refcount is not released when it is called with MF_COUNT_INCREASED, which leaves the page refcount inconsistent.  This should be rare and non-critical, but it might be inconvenient in testing (unpoison doesn't work).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Suggested-by: Miaohe Lin <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

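The fix amounts to a put on that early-return path; condensed (not the verbatim diff):

    if (TestSetPageHWPoison(p)) {
        /* page was already poisoned by an earlier failure */
        res = -EHWPOISON;
        if (flags & MF_COUNT_INCREASED)
            put_page(p);    /* release the caller-provided reference */
        goto unlock_mutex;
    }
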