linux-IllusionX/mm
yangge 33dfe9204f mm/gup: clear the LRU flag of a page before adding to LRU batch
If a large number of CMA memory are configured in system (for example,
the CMA memory accounts for 50% of the system memory), starting a
virtual virtual machine with device passthrough, it will call
pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin memory.  Normally
if a page is present and in CMA area, pin_user_pages_remote() will
migrate the page from CMA area to non-CMA area because of FOLL_LONGTERM
flag.  But the current code will cause the migration failure due to
unexpected page refcounts, and eventually cause the virtual machine
fail to start.

If a page is added in LRU batch, its refcount increases one, remove the
page from LRU batch decreases one.  Page migration requires the page is
not referenced by others except page mapping.  Before migrating a page,
we should try to drain the page from LRU batch in case the page is in
it, however, folio_test_lru() is not sufficient to tell whether the
page is in LRU batch or not, if the page is in LRU batch, the migration
will fail.

To solve the problem above, we modify the logic of adding to LRU batch.
Before adding a page to LRU batch, we clear the LRU flag of the page
so that we can check whether the page is in LRU batch by
folio_test_lru(page).  It's quite valuable, because likely we don't
want to blindly drain the LRU batch simply because there is some
unexpected reference on a page, as described above.

This change makes the LRU flag of a page invisible for longer, which
may impact some programs.  For example, as long as a page is on a LRU
batch, we cannot isolate it, and we cannot check if it's an LRU page. 
Further, a page can now only be on exactly one LRU batch.  This doesn't
seem to matter much, because a new page is allocated from buddy and
added to the lru batch, or be isolated, it's LRU flag may also be
invisible for a long time.

Link: https://lkml.kernel.org/r/1720075944-27201-1-git-send-email-yangge1116@126.com
Link: https://lkml.kernel.org/r/1720008153-16035-1-git-send-email-yangge1116@126.com
Fixes: 9a4e9f3b2d ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: yangge <yangge1116@126.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-17 21:08:54 -07:00
..
damon mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
kasan kasan: fix bad call to unpoison_slab_object 2024-06-24 20:52:09 -07:00
kfence mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
kmsan kmsan: do not pass NULL pointers as 0 2024-07-03 19:30:26 -07:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
bootmem_info.c bootmem: use kmemleak_free_part_phys in put_page_bootmem 2023-10-25 16:47:13 -07:00
cma.c mm/cma: drop incorrect alignment check in cma_init_reserved_mem 2024-04-25 20:56:42 -07:00
cma.h mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma_debug.c
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c mm: handle profiling for fake memory allocations during compaction 2024-06-24 20:52:09 -07:00
debug.c mm/debug: print only page mapcount (excluding folio entire mapcount) in __dump_folio() 2024-05-05 17:53:31 -07:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick 2024-06-15 10:43:08 -07:00
dmapool.c mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs 2023-12-05 11:17:58 +01:00
dmapool_test.c mm/dmapool: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
early_ioremap.c
execmem.c mm/execmem, arch: convert remaining overrides of module_alloc to execmem 2024-05-14 00:31:43 -07:00
fadvise.c
fail_page_alloc.c mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC 2024-07-17 21:05:18 -07:00
failslab.c mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB 2024-07-17 21:05:18 -07:00
filemap.c Merge branch 'mm-hotfixes-stable' into mm-stable to pick up "mm: fix 2024-07-06 11:44:41 -07:00
folio-compat.c mm: remove page_mapping() 2024-07-03 19:29:59 -07:00
gup.c mm: remove CONFIG_ARCH_HAS_HUGEPD 2024-07-12 15:52:19 -07:00
gup_test.c
gup_test.h
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
huge_memory.c mm: shmem: rename mTHP shmem counters 2024-07-12 15:52:23 -07:00
hugetlb.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
hugetlb_cgroup.c mm/hugetlb_cgroup: switch to the new cftypes 2024-07-03 19:30:10 -07:00
hugetlb_vmemmap.c Merge branch 'mm-hotfixes-stable' into mm-stable to pick up "mm: fix 2024-07-06 11:44:41 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: fix reference to nonexistent file 2023-10-25 16:47:14 -07:00
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h Merge branch 'mm-hotfixes-stable' into mm-stable to pick up "mm: fix 2024-07-06 11:44:41 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig mm: remove CONFIG_ARCH_HAS_HUGEPD 2024-07-12 15:52:19 -07:00
Kconfig.debug mm/slub: unify all sl[au]b parameters with "slab_$param" 2024-01-22 10:31:08 +01:00
khugepaged.c mm: fix khugepaged activation policy 2024-07-12 15:52:20 -07:00
kmemleak.c mm/kmemleak: replace strncpy() with strscpy() 2024-07-17 21:05:18 -07:00
ksm.c mm: move memory_failure_queue() into copy_mc_[user]_highpage() 2024-07-06 11:53:19 -07:00
list_lru.c mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
maccess.c
madvise.c mm/madvise: add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON) 2024-07-03 19:29:57 -07:00
Makefile mm: memcg: put cgroup v1-specific code under a config option 2024-07-04 18:05:54 -07:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c memblock: use numa_valid_node() helper to check for invalid node ID 2024-06-16 10:17:57 +03:00
memcontrol-v1.c mm: memcg1: convert charge move flags to unsigned long long 2024-07-17 21:05:19 -07:00
memcontrol-v1.h mm: memcg: gather memcg1-specific fields initialization in memcg1_memcg_init() 2024-07-04 18:05:56 -07:00
memcontrol.c mm/page_counter: move calculating protection values to page_counter 2024-07-12 15:52:20 -07:00
memfd.c mm/gup: introduce memfd_pin_folios() for pinning memfd folios 2024-07-12 15:52:09 -07:00
memory-failure.c mm/memory-failure: remove obsolete MF_MSG_DIFFERENT_COMPOUND 2024-07-12 15:52:22 -07:00
memory-tiers.c memory tier: consolidate the initialization of memory tiers 2024-07-12 15:52:20 -07:00
memory.c mm: unexport vmf_insert_mixed_mkwrite 2024-07-12 15:52:19 -07:00
memory_hotplug.c mm/page_alloc: put __free_pages_core() in __meminit section 2024-07-12 15:52:21 -07:00
mempolicy.c mm/numa_balancing: teach mpol_to_str about the balancing mode 2024-07-17 21:08:54 -07:00
mempool.c mm: fix xyz_noprof functions calling profiled functions 2024-06-05 19:19:26 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-03-13 12:12:21 -07:00
migrate.c mm/migrate: putback split folios when numa hint migration fails 2024-07-12 15:52:22 -07:00
migrate_device.c mm: extend rmap flags arguments for folio_add_new_anon_rmap 2024-07-03 19:30:18 -07:00
mincore.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
mlock.c mm/mlock: implement folio_mlock_step() using folio_pte_batch() 2024-07-03 19:30:09 -07:00
mm_init.c mm/mm_init.c: move build check on MAX_ZONELISTS out of ifdef 2024-07-03 19:30:19 -07:00
mm_slot.h
mmap.c mm: batch unlink_file_vma calls in free_pgd_range 2024-07-03 19:29:58 -07:00
mmap_lock.c mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer 2024-07-03 19:30:26 -07:00
mmu_gather.c mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing 2024-02-22 15:27:17 -08:00
mmu_notifier.c mmu_notifier: remove the .change_pte() callback 2024-04-11 13:18:36 -04:00
mmzone.c zswap: shrink zswap pool based on memory pressure 2023-12-12 10:57:02 -08:00
mprotect.c mm: introduce pmd|pte_needs_soft_dirty_wp helpers for softdirty write-protect 2024-07-03 19:30:07 -07:00
mremap.c mm: remove page_mkclean() 2024-07-03 19:30:17 -07:00
mseal.c mseal: add mseal syscall 2024-05-23 19:40:26 -07:00
msync.c
nommu.c The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
oom_kill.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
page-writeback.c mm: avoid overflows in dirty throttling logic 2024-07-03 19:30:15 -07:00
page_alloc.c mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC 2024-07-17 21:05:18 -07:00
page_counter.c mm/page_counter: move calculating protection values to page_counter 2024-07-12 15:52:20 -07:00
page_ext.c mm: report per-page metadata information 2024-07-03 19:30:09 -07:00
page_idle.c
page_io.c mm: ignore data-race in __swap_writepage 2024-07-17 21:05:18 -07:00
page_isolation.c mm: page_isolation: prepare for hygienic freelists 2024-04-25 20:56:04 -07:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-15 10:43:04 -07:00
page_vma_mapped.c mm: make page_mapped_in_vma conditional on CONFIG_MEMORY_FAILURE 2024-05-05 17:53:45 -07:00
pagewalk.c mm: remove CONFIG_ARCH_HAS_HUGEPD 2024-07-12 15:52:19 -07:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-05-07 10:37:00 -07:00
process_vm_access.c mm: fix process_vm_rw page counts 2023-12-10 16:51:39 -08:00
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c Merge branch 'mm-hotfixes-stable' into mm-stable to pick up "mm: fix 2024-07-06 11:44:41 -07:00
rmap.c mm: remove folio_test_anon(folio)==false path in __folio_add_anon_rmap() 2024-07-03 19:30:18 -07:00
rodata_test.c
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem.c mm: shmem: rename mTHP shmem counters 2024-07-12 15:52:23 -07:00
shmem_quota.c tmpfs: fix race on handling dquot rbtree 2024-03-26 11:07:23 -07:00
show_mem.c lib: add memory allocations report in show_mem() 2024-04-25 20:55:57 -07:00
shrinker.c mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() 2024-01-05 09:58:32 -08:00
shrinker_debug.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
shuffle.c
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
slab_common.c mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
slub.c mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB 2024-07-17 21:05:18 -07:00
sparse-vmemmap.c mm: report per-page metadata information 2024-07-03 19:30:09 -07:00
sparse.c mm/sparse: nr_pages won't be 0 2024-07-03 19:30:19 -07:00
swap.c mm/gup: clear the LRU flag of a page before adding to LRU batch 2024-07-17 21:08:54 -07:00
swap.h mm: swap: remove 'synchronous' argument to swap_read_folio() 2024-07-03 19:30:06 -07:00
swap_cgroup.c
swap_slots.c mm: swap: update get_swap_pages() to take folio order 2024-04-25 20:56:37 -07:00
swap_state.c mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() 2024-07-12 15:52:22 -07:00
swapfile.c mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==false 2024-07-03 19:30:18 -07:00
truncate.c mm/truncate: batch-clear shadow entries 2024-07-12 15:52:22 -07:00
usercopy.c
userfaultfd.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
util.c mm: add folio_mc_copy() 2024-07-06 11:53:19 -07:00
vmalloc.c Merge branch 'mm-hotfixes-stable' into mm-stable to pick up "mm: fix 2024-07-06 11:44:41 -07:00
vmpressure.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vmscan.c mm/mglru: fix overshooting shrinker memory 2024-07-17 21:05:18 -07:00
vmstat.c mm: report per-page metadata information 2024-07-03 19:30:09 -07:00
workingset.c cachestat: do not flush stats in recency check 2024-07-03 22:40:37 -07:00
z3fold.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c zsmalloc: rename class stat mutators 2024-07-12 15:52:13 -07:00
zswap.c mm/zswap: use only one pool in zswap 2024-07-12 15:52:09 -07:00