path: root/mm/page_isolation.c
2022-02-04  Revert "mm/page_isolation: unset migratetype directly for non Buddy page"  (Chen Wandun; 1 file, -1/+1)
This reverts commit 721fb891ad0b3956d5c168b2931e3e5e4fb7ca40. Commit 721fb891ad0b ("mm/page_isolation: unset migratetype directly for non Buddy page") can make memory that should be in the buddy allocator disappear by mistake: move_freepages_block() moves all pages in the pageblock, not just the pages indicated by its input parameter, so if the input page is no longer in the buddy allocator but other pages in the pageblock still are, those pages end up out of the allocator's control.

Link: https://lkml.kernel.org/r/[email protected] Fixes: 721fb891ad0b ("mm/page_isolation: unset migratetype directly for non Buddy page") Signed-off-by: Chen Wandun <[email protected]> Reported-by: "kernelci.org bot" <[email protected]> Acked-by: David Hildenbrand <[email protected]> Tested-by: Dong Aisheng <[email protected]> Tested-by: Francesco Dolcini <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-15  mm/page_isolation: unset migratetype directly for non Buddy page  (Chen Wandun; 1 file, -1/+1)
In unset_migratetype_isolate(), we can bypass the call to move_freepages_block() for non-buddy pages. This saves a few CPU cycles in situations such as CMA and hugetlb allocating contiguous pages, where alloc_contig_pages() is called:

  alloc_contig_pages
    __alloc_contig_migrate_range
      isolate_freepages_range     ==> pages have been removed from buddy
    undo_isolate_page_range
      unset_migratetype_isolate   ==> can directly set migratetype

[[email protected]: changelog tweak] Link: https://lkml.kernel.org/r/[email protected] Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock") Signed-off-by: Chen Wandun <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Wang Kefeng <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
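For illustration, the shortcut amounts to something like the following simplified sketch of unset_migratetype_isolate() (not the exact mm/page_isolation.c code; the real function also handles freepage accounting). As the revert above notes, skipping the move is unsafe when other pages of the pageblock are still in the buddy allocator:

  /* Simplified sketch, not the exact mainline code. */
  static void unset_migratetype_isolate_sketch(struct page *page, int migratetype)
  {
          struct zone *zone = page_zone(page);
          unsigned long flags;

          spin_lock_irqsave(&zone->lock, flags);
          if (PageBuddy(page)) {
                  /* Page still sits in the buddy allocator: move the free
                   * pages of this pageblock back to the target free list. */
                  move_freepages_block(zone, page, migratetype, NULL);
          }
          /* If the page was already taken out of the buddy allocator (e.g. by
           * isolate_freepages_range() during alloc_contig_pages()), only the
           * pageblock's migratetype needs to be updated. */
          set_pageblock_migratetype(page, migratetype);
          zone->nr_isolate_pageblock--;
          spin_unlock_irqrestore(&zone->lock, flags);
  }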
2021-11-06  mm/page_isolation: guard against possible putback unisolated page  (Miaohe Lin; 1 file, -2/+7)
Isolating a free page in an isolated pageblock is expected to always work, as watermarks don't apply here. But if __isolate_free_page() fails anyway, e.g. because conditions changed, the page is left on the free list, and it will later be put back onto a free list again via __putback_isolated_page(). This may trigger the VM_BUG_ON_PAGE() on the page->flags check in __free_one_page() if PageReported is set, or it will corrupt the free list because list_add() is called for a page that is already on another list. Add a VM_WARN_ON() to complain should this unexpected case ever happen.

Link: https://lkml.kernel.org/r/[email protected] Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock") Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
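The guarded path then looks roughly like this (illustrative sketch of the relevant fragment of unset_migratetype_isolate(); locking and the buddy-merge check are omitted):

  /*
   * Isolating a free page in an isolated pageblock is expected to always
   * work because watermark checks do not apply here.  If it ever fails,
   * the page stays on a free list, and putting it "back" later via
   * __putback_isolated_page() would corrupt that list, so complain.
   */
  isolated_page = !!__isolate_free_page(page, order);
  VM_WARN_ON(!isolated_page);

  /* ... later, only a truly isolated page may be put back: */
  if (isolated_page)
          __putback_isolated_page(page, order, migratetype);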
2021-11-06  mm/page_isolation: fix potential missing call to unset_migratetype_isolate()  (Miaohe Lin; 1 file, -17/+3)
In start_isolate_page_range() undo path, pfn_to_online_page() just checks the first pfn in a pageblock while __first_valid_page() will traverse the pageblock until the first online pfn is found. So we may miss the call to unset_migratetype_isolate() in undo path and pages will remain isolated unexpectedly. Fix this by calling undo_isolate_page_range() and this will also help to simplify the code further. Note we shouldn't ever trigger it because MAX_ORDER-1 aligned pfn ranges shouldn't contain memory holes now. Link: https://lkml.kernel.org/r/[email protected] Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08  Merge branch 'akpm' (patches from Andrew)  (Linus Torvalds; 1 file, -6/+1)
Merge more updates from Andrew Morton:
 "147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f.

  Subsystems affected by this patch series: mm (memory-hotplug, rmap,
  ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
  alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib, checkpatch,
  epoll, init, nilfs2, coredump, fork, pids, criu, kconfig, selftests,
  ipc, and scripts"

* emailed patches from Andrew Morton <[email protected]>: (94 commits)
  scripts: check_extable: fix typo in user error message
  mm/workingset: correct kernel-doc notations
  ipc: replace costly bailout check in sysvipc_find_ipc()
  selftests/memfd: remove unused variable
  Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
  configs: remove the obsolete CONFIG_INPUT_POLLDEV
  prctl: allow to setup brk for et_dyn executables
  pid: cleanup the stale comment mentioning pidmap_init().
  kernel/fork.c: unexport get_{mm,task}_exe_file
  coredump: fix memleak in dump_vma_snapshot()
  fs/coredump.c: log if a core dump is aborted due to changed file permissions
  nilfs2: use refcount_dec_and_lock() to fix potential UAF
  nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
  nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
  nilfs2: fix NULL pointer in nilfs_##name##_attr_release
  nilfs2: fix memory leak in nilfs_sysfs_create_device_group
  trap: cleanup trap_init()
  init: move usermodehelper_enable() to populate_rootfs()
  ...
2021-09-08  mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE  (Mike Rapoport; 1 file, -6/+1)
Patch series "mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE". After recent updates to freeing unused parts of the memory map, no architecture can have holes in the memory map within a pageblock. This makes pfn_valid_within() check and CONFIG_HOLES_IN_ZONE configuration option redundant. The first patch removes them both in a mechanical way and the second patch simplifies memory_hotplug::test_pages_in_a_zone() that had pfn_valid_within() surrounded by more logic than simple if. This patch (of 2): After recent changes in freeing of the unused parts of the memory map and rework of pfn_valid() in arm and arm64 there are no architectures that can have holes in the memory map within a pageblock and so nothing can enable CONFIG_HOLES_IN_ZONE which guards non trivial implementation of pfn_valid_within(). With that, pfn_valid_within() is always hardwired to 1 and can be completely removed. Remove calls to pfn_valid_within() and CONFIG_HOLES_IN_ZONE. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-03  mm/page_isolation: tracing: trace all test_pages_isolated failures  (George G. Davis; 1 file, -3/+10)
Some test_pages_isolated failure conditions don't include trace points. For debugging issues caused by "pinned" pages, make sure to trace all calls whether they succeed or fail. In this case, a failure case did not result in a trace point. So add the missing failure case in test_pages_isolated traces. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: George G. Davis <[email protected]> Cc: Eugeniu Rosca <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15  mm/page_isolation: do not isolate the max order page  (Muchun Song; 1 file, -1/+1)
A max order page has no buddy page and never merges to another order. So isolating and then freeing it is pointless. Link: https://lkml.kernel.org/r/[email protected] Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock") Signed-off-by: Muchun Song <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15  mm, page_alloc: disable pcplists during memory offline  (Vlastimil Babka; 1 file, -3/+3)
Memory offlining relies on page isolation to guarantee a forward progress because pages cannot be reused while they are isolated. But the page isolation itself doesn't prevent from races while freed pages are stored on pcp lists and thus can be reused. This can be worked around by repeated draining of pcplists, as done by commit 968318261221 ("mm/memory_hotplug: drain per-cpu pages again during memory offline"). David and Michal would prefer that this race was closed in a way that callers of page isolation who need stronger guarantees don't need to repeatedly drain. David suggested disabling pcplists usage completely during page isolation, instead of repeatedly draining them. To achieve this without adding special cases in alloc/free fastpath, we can use the same approach as boot pagesets - when pcp->high is 0, any pcplist addition will be immediately flushed. The race can thus be closed by setting pcp->high to 0 and draining pcplists once, before calling start_isolate_page_range(). The draining will serialize after processes that already disabled interrupts and read the old value of pcp->high in free_unref_page_commit(), and processes that have not yet disabled interrupts, will observe pcp->high == 0 when they are rescheduled, and skip pcplists. This guarantees no stray pages on pcplists in zones where isolation happens. This patch thus adds zone_pcp_disable() and zone_pcp_enable() functions that page isolation users can call before start_isolate_page_range() and after unisolating (or offlining) the isolated pages. Also, drain_all_pages() is optimized to only execute on cpus where pcplists are not empty. The check can however race with a free to pcplist that has not yet increased the pcp->count from 0 to 1. Thus make the drain optionally skip the racy check and drain on all cpus, and use this option in zone_pcp_disable(). As we have to avoid external updates to high and batch while pcplists are disabled, we take pcp_batch_high_lock in zone_pcp_disable() and release it in zone_pcp_enable(). This also synchronizes multiple users of zone_pcp_disable()/enable(). Currently the only user of this functionality is offline_pages(). [[email protected]: add comment, per David] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Suggested-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
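In practice, the calling convention for an isolation user that wants this guarantee looks roughly as follows (sketch modeled on the offline_pages() flow described above; error handling is trimmed and the flag values are those used by memory offlining):

  zone_pcp_disable(zone);         /* pcp->high = 0 plus one drain: pcplists stay empty */

  ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
                                 MEMORY_OFFLINE | REPORT_FAILURE);
  if (ret) {
          zone_pcp_enable(zone);  /* restore pcp->high and pcp->batch */
          return ret;
  }

  /* ... migrate and offline the range; freed pages now bypass the pcplists ... */

  undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
  zone_pcp_enable(zone);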
2020-12-15  mm, page_alloc: move draining pcplists to page isolation users  (Vlastimil Babka; 1 file, -5/+5)
Currently, pcplists are drained during set_migratetype_isolate() which means once per pageblock processed start_isolate_page_range(). This is somewhat wasteful. Moreover, the callers might need different guarantees, and the draining is currently prone to races and does not guarantee that no page from isolated pageblock will end up on the pcplist after the drain. Better guarantees are added by later patches and require explicit actions by page isolation users that need them. Thus it makes sense to move the current imperfect draining to the callers also as a preparation step. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: David Hildenbrand <[email protected]> Suggested-by: Pavel Tatashin <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm: rename page_order() to buddy_order()  (Matthew Wilcox (Oracle); 1 file, -2/+2)
The current page_order() can only be called on pages in the buddy allocator. For compound pages, you have to use compound_order(). This is confusing and led to a bug, so rename page_order() to buddy_order(). Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/page_alloc: move pages to tail in move_to_free_list()  (David Hildenbrand; 1 file, -0/+5)
Whenever we move pages between freelists via move_to_free_list()/move_freepages_block(), we don't actually touch the pages:

1. Page isolation doesn't actually touch the pages, it simply isolates pageblocks and moves all free pages to the MIGRATE_ISOLATE freelist. When undoing isolation, we move the pages back to the target list.

2. Page stealing (steal_suitable_fallback()) moves free pages directly between lists without touching them.

3. reserve_highatomic_pageblock()/unreserve_highatomic_pageblock() moves free pages directly between freelists without touching them.

We already place pages to the tail of the freelists when undoing isolation via __putback_isolated_page(), let's do it in any case (e.g., if order <= pageblock_order) and document the behavior. To simplify, let's move the pages to the tail for all move_to_free_list()/move_freepages_block() users.

In 2., the target list is empty, so there should be no change. In 3., we might observe a change, however, highatomic is more concerned about allocations succeeding than cache hotness - if we ever realize this change degrades a workload, we can special-case this instance and add a proper comment.

This change results in all pages getting onlined via online_pages() to be placed to the tail of the freelist.

Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Wei Yang <[email protected]> Acked-by: Pankaj Gupta <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Scott Cheloha <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: "K. Y. Srinivasan" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Stephen Hemminger <[email protected]> Cc: Wei Liu <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
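After the change, the helper is essentially the following (sketch matching the behavior described above):

  /* Move a free page to the tail of the target free list, so pages shuffled
   * between freelists (isolation, stealing, highatomic reserves) are handed
   * out last. */
  static inline void move_to_free_list(struct page *page, struct zone *zone,
                                       unsigned int order, int migratetype)
  {
          struct free_area *area = &zone->free_area[order];

          list_move_tail(&page->lru, &area->free_list[migratetype]);  /* was list_move() */
  }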
2020-10-16  mm/page_isolation: simplify return value of start_isolate_page_range()  (David Hildenbrand; 1 file, -5/+2)
Callers no longer need the number of isolated pageblocks. Let's simplify. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Wei Yang <[email protected]> Cc: Baoquan He <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Charan Teja Reddy <[email protected]> Cc: Dan Williams <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-13  mm/page_isolation: cleanup set_migratetype_isolate()  (David Hildenbrand; 1 file, -10/+7)
Let's clean it up a bit, simplifying the exit paths. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Baoquan He <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Jason Wang <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Qian Cai <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-13  mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()  (David Hildenbrand; 1 file, -9/+6)
Inside has_unmovable_pages(), we have a comment describing how unmovable data could end up in ZONE_MOVABLE - via "movablecore". Also, besides checking if the first page in the pageblock is reserved, we don't perform any further checks in case of ZONE_MOVABLE. In case of memory offlining, we set REPORT_FAILURE, properly dump_page() the page and handle the error gracefully. alloc_contig_pages() users currently never allocate from ZONE_MOVABLE. E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic pages only, which will never end up on the MOVABLE zone (see htlb_alloc_mask()). Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Baoquan He <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Jason Wang <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Qian Cai <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-13  mm/page_isolation: exit early when pageblock is isolated in set_migratetype_isolate()  (David Hildenbrand; 1 file, -4/+5)
Right now, if we have two isolations racing on a pageblock that's in the MOVABLE zone, we would trigger the WARN_ON_ONCE(). Let's just return directly, simplifying error handling. The change was introduced in commit 3d680bdf60a5 ("mm/page_isolation: fix potential warning from user"). As far as I can see, we currently don't have alloc_contig_range() users that use ZONE_MOVABLE (anymore), so it's currently more a cleanup and a preparation for the future than a fix. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Baoquan He <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Acked-by: Mike Kravetz <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Qian Cai <[email protected]> Cc: Jason Wang <[email protected]> Cc: Mike Rapoport <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-09-19  mm/memory_hotplug: drain per-cpu pages again during memory offline  (Pavel Tatashin; 1 file, -0/+8)
There is a race during page offline that can lead to an infinite loop: a page never ends up on a buddy list and __offline_pages() keeps retrying infinitely or until a termination signal is received.

Thread#1 - a new process:
  load_elf_binary
    begin_new_exec
      exec_mmap
        mmput
          exit_mmap
            tlb_finish_mmu
              tlb_flush_mmu
                release_pages
                  free_unref_page_list
                    free_unref_page_prepare
                      set_pcppage_migratetype(page, migratetype);
                      // Set page->index migration type below MIGRATE_PCPTYPES

Thread#2 - hot-removes memory:
  __offline_pages
    start_isolate_page_range
      set_migratetype_isolate
        set_pageblock_migratetype(page, MIGRATE_ISOLATE);
        // Set migration type to MIGRATE_ISOLATE
    drain_all_pages(zone);
    // drain per-cpu page lists to buddy allocator.

Thread#1 - continues:
  free_unref_page_commit
    migratetype = get_pcppage_migratetype(page);
    // get old migration type
    list_add(&page->lru, &pcp->lists[migratetype]);
    // add new page to already drained pcp list

Thread#2:
  Never drains pcp again, and therefore gets stuck in the loop.

The fix is to try to drain per-cpu lists again after check_pages_isolated_cb() fails.

Fixes: c52e75935f8d ("mm: remove extra drain pages on pcp list") Signed-off-by: Pavel Tatashin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Wei Yang <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
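The gist of the fix in the offline retry loop, as a rough sketch (check_pages_isolated() stands in here for the real check_pages_isolated_cb() walk; the name and loop shape are illustrative, not the exact __offline_pages() code):

  do {
          /* ... isolate and migrate whatever is still movable in the range ... */

          ret = check_pages_isolated(start_pfn, end_pfn);  /* illustrative name */
          if (ret) {
                  /*
                   * A racing free_unref_page_commit() may have queued a page
                   * on an already-drained pcplist.  Drain again so the page
                   * reaches the buddy allocator and the next iteration can
                   * observe it as isolated/free.
                   */
                  drain_all_pages(zone);
          }
  } while (ret);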
2020-08-12  mm/page_alloc: remove a wrapper for alloc_migration_target()  (Joonsoo Kim; 1 file, -10/+0)
There is a well-defined standard migration target callback. Use it directly. Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Roman Gushchin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12  mm/migrate: introduce a standard migration target allocation function  (Joonsoo Kim; 1 file, -2/+5)
There are some similar functions for migration target allocation. Since there is no fundamental difference, it's better to keep just one rather than keeping all variants. This patch implements the base migration target allocation function. In the following patches, the variants will be converted to use this function. Changes should be mechanical, but, unfortunately, there are some differences. First, some callers' nodemask is assigned to NULL since a NULL nodemask will be considered as all available nodes, that is, &node_states[N_MEMORY]. Second, for hugetlb page allocation, gfp_mask is redefined as the regular hugetlb allocation gfp_mask plus __GFP_THISNODE if the user-provided gfp_mask has it. This is because a future caller of this function requires this node constraint to be set. Lastly, if the provided nodeid is NUMA_NO_NODE, the nodeid is set up to the node where the migration source lives. It helps to remove simple wrappers for setting up the nodeid. Note that the PageHighmem() call in the previous function is changed to open-coded "is_highmem_idx()" since it provides more readability.

[[email protected]: tweak patch title, per Vlastimil] [[email protected]: fix typo in comment] Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Roman Gushchin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12  mm/page_isolation: prefer the node of the source page  (Joonsoo Kim; 1 file, -1/+3)
Patch series "clean-up the migration target allocation functions", v5. This patch (of 9): For locality, it's better to migrate the page to the same node rather than the node of the current caller's cpu. Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-04  mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE  (David Hildenbrand; 1 file, -0/+9)
virtio-mem wants to allow to offline memory blocks of which some parts were unplugged (allocated via alloc_contig_range()), especially, to later offline and remove completely unplugged memory blocks. The important part is that PageOffline() has to remain set until the section is offline, so these pages will never get accessed (e.g., when dumping). The pages should not be handed back to the buddy (which would require clearing PageOffline() and result in issues if offlining fails and the pages are suddenly in the buddy). Let's allow to do that by allowing to isolate any PageOffline() page when offlining. This way, we can reach the memory hotplug notifier MEM_GOING_OFFLINE, where the driver can signal that he is fine with offlining this page by dropping its reference count. PageOffline() pages with a reference count of 0 can then be skipped when offlining the pages (like if they were free, however they are not in the buddy). Anybody who uses PageOffline() pages and does not agree to offline them (e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not decrement the reference count and make offlining fail when trying to migrate such an unmovable page. So there should be no observable change. Same applies to balloon compaction users (movable PageOffline() pages), the pages will simply be migrated. Note 1: If offlining fails, a driver has to increment the reference count again in MEM_CANCEL_OFFLINE. Note 2: A driver that makes use of this has to be aware that re-onlining the memory block has to be handled by hooking into onlining code (online_page_callback_t), resetting the page PageOffline() and not giving them to the buddy. Reviewed-by: Alexander Duyck <[email protected]> Acked-by: Michal Hocko <[email protected]> Tested-by: Pankaj Gupta <[email protected]> Acked-by: Andrew Morton <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Anthony Yznaga <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Dan Williams <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Qian Cai <[email protected]> Cc: Pingfan Liu <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
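A driver using this typically hooks the memory notifier roughly as sketched below (drop_refs()/retake_refs() are hypothetical driver helpers that walk the driver's own record of unplugged, PageOffline() PFNs):

  static int my_mem_notifier(struct notifier_block *nb, unsigned long action,
                             void *arg)
  {
          struct memory_notify *mhp = arg;

          switch (action) {
          case MEM_GOING_OFFLINE:
                  /* Agree to offlining: drop our references so the unplugged
                   * PageOffline() pages have refcount 0 and are skipped. */
                  drop_refs(mhp->start_pfn, mhp->nr_pages);     /* hypothetical */
                  break;
          case MEM_CANCEL_OFFLINE:
                  /* Offlining failed: take the references back (Note 1). */
                  retake_refs(mhp->start_pfn, mhp->nr_pages);   /* hypothetical */
                  break;
          }
          return NOTIFY_OK;
  }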
2020-04-07  mm: add function __putback_isolated_page  (Alexander Duyck; 1 file, -4/+2)
There are cases where we would benefit from avoiding having to go through the allocation and free cycle to return an isolated page. Examples for this might include page poisoning in which we isolate a page and then put it back in the free list without ever having actually allocated it. This will enable us to also avoid notifiers for the future free page reporting which will need to avoid retriggering page reporting when returning pages that have been reported on. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Nitesh Narayan Lal <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Wang <[email protected]> Cc: Yang Zhang <[email protected]> Cc: wei qi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31  mm/page_isolation: fix potential warning from user  (Qian Cai; 1 file, -7/+11)
It makes sense to call the WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE) from start_isolate_page_range(), but we should avoid triggering it from userspace, i.e., from is_mem_section_removable(), because a non-root user could crash the system if panic_on_warn is set. While at it, simplify the code a bit by removing an unnecessary jump label.

Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Qian Cai <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31  mm/hotplug: silence a lockdep splat with printk()  (Qian Cai; 1 file, -1/+10)
It is not that hard to trigger lockdep splats by calling printk from under zone->lock. Most of them are false positives caused by lock chains introduced early in the boot process and they do not cause any real problems (although most of the early boot lock dependencies could happen after boot as well). There are some console drivers which do allocate from the printk context as well and those should be fixed. In any case, false positives are not that trivial to workaround and it is far from optimal to lose lockdep functionality for something that is a non-issue. So change has_unmovable_pages() so that it no longer calls dump_page() itself - instead it returns a "struct page *" of the unmovable page back to the caller so that in the case of a has_unmovable_pages() failure, the caller can call dump_page() after releasing zone->lock. Also, make dump_page() is able to report a CMA page as well, so the reason string from has_unmovable_pages() can be removed. Even though has_unmovable_pages doesn't hold any reference to the returned page this should be reasonably safe for the purpose of reporting the page (dump_page) because it cannot be hotremoved in the context of memory unplug. The state of the page might change but that is the case even with the existing code as zone->lock only plays role for free pages. While at it, remove a similar but unnecessary debug-only printk() as well. A sample of one of those lockdep splats is, WARNING: possible circular locking dependency detected ------------------------------------------------------ test.sh/8653 is trying to acquire lock: ffffffff865a4460 (console_owner){-.-.}, at: console_unlock+0x207/0x750 but task is already holding lock: ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at: __offline_isolated_pages+0x179/0x3e0 which lock already depends on the new lock. 
the existing dependency chain (in reverse order) is: -> #3 (&(&zone->lock)->rlock){-.-.}: __lock_acquire+0x5b3/0xb40 lock_acquire+0x126/0x280 _raw_spin_lock+0x2f/0x40 rmqueue_bulk.constprop.21+0xb6/0x1160 get_page_from_freelist+0x898/0x22c0 __alloc_pages_nodemask+0x2f3/0x1cd0 alloc_pages_current+0x9c/0x110 allocate_slab+0x4c6/0x19c0 new_slab+0x46/0x70 ___slab_alloc+0x58b/0x960 __slab_alloc+0x43/0x70 __kmalloc+0x3ad/0x4b0 __tty_buffer_request_room+0x100/0x250 tty_insert_flip_string_fixed_flag+0x67/0x110 pty_write+0xa2/0xf0 n_tty_write+0x36b/0x7b0 tty_write+0x284/0x4c0 __vfs_write+0x50/0xa0 vfs_write+0x105/0x290 redirected_tty_write+0x6a/0xc0 do_iter_write+0x248/0x2a0 vfs_writev+0x106/0x1e0 do_writev+0xd4/0x180 __x64_sys_writev+0x45/0x50 do_syscall_64+0xcc/0x76c entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #2 (&(&port->lock)->rlock){-.-.}: __lock_acquire+0x5b3/0xb40 lock_acquire+0x126/0x280 _raw_spin_lock_irqsave+0x3a/0x50 tty_port_tty_get+0x20/0x60 tty_port_default_wakeup+0xf/0x30 tty_port_tty_wakeup+0x39/0x40 uart_write_wakeup+0x2a/0x40 serial8250_tx_chars+0x22e/0x440 serial8250_handle_irq.part.8+0x14a/0x170 serial8250_default_handle_irq+0x5c/0x90 serial8250_interrupt+0xa6/0x130 __handle_irq_event_percpu+0x78/0x4f0 handle_irq_event_percpu+0x70/0x100 handle_irq_event+0x5a/0x8b handle_edge_irq+0x117/0x370 do_IRQ+0x9e/0x1e0 ret_from_intr+0x0/0x2a cpuidle_enter_state+0x156/0x8e0 cpuidle_enter+0x41/0x70 call_cpuidle+0x5e/0x90 do_idle+0x333/0x370 cpu_startup_entry+0x1d/0x1f start_secondary+0x290/0x330 secondary_startup_64+0xb6/0xc0 -> #1 (&port_lock_key){-.-.}: __lock_acquire+0x5b3/0xb40 lock_acquire+0x126/0x280 _raw_spin_lock_irqsave+0x3a/0x50 serial8250_console_write+0x3e4/0x450 univ8250_console_write+0x4b/0x60 console_unlock+0x501/0x750 vprintk_emit+0x10d/0x340 vprintk_default+0x1f/0x30 vprintk_func+0x44/0xd4 printk+0x9f/0xc5 -> #0 (console_owner){-.-.}: check_prev_add+0x107/0xea0 validate_chain+0x8fc/0x1200 __lock_acquire+0x5b3/0xb40 lock_acquire+0x126/0x280 console_unlock+0x269/0x750 vprintk_emit+0x10d/0x340 vprintk_default+0x1f/0x30 vprintk_func+0x44/0xd4 printk+0x9f/0xc5 __offline_isolated_pages.cold.52+0x2f/0x30a offline_isolated_pages_cb+0x17/0x30 walk_system_ram_range+0xda/0x160 __offline_pages+0x79c/0xa10 offline_pages+0x11/0x20 memory_subsys_offline+0x7e/0xc0 device_offline+0xd5/0x110 state_store+0xc6/0xe0 dev_attr_store+0x3f/0x60 sysfs_kf_write+0x89/0xb0 kernfs_fop_write+0x188/0x240 __vfs_write+0x50/0xa0 vfs_write+0x105/0x290 ksys_write+0xc6/0x160 __x64_sys_write+0x43/0x50 do_syscall_64+0xcc/0x76c entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: console_owner --> &(&port->lock)->rlock --> &(&zone->lock)->rlock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&zone->lock)->rlock); lock(&(&port->lock)->rlock); lock(&(&zone->lock)->rlock); lock(console_owner); *** DEADLOCK *** 9 locks held by test.sh/8653: #0: ffff88839ba7d408 (sb_writers#4){.+.+}, at: vfs_write+0x25f/0x290 #1: ffff888277618880 (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x240 #2: ffff8898131fc218 (kn->count#115){.+.+}, at: kernfs_fop_write+0x138/0x240 #3: ffffffff86962a80 (device_hotplug_lock){+.+.}, at: lock_device_hotplug_sysfs+0x16/0x50 #4: ffff8884374f4990 (&dev->mutex){....}, at: device_offline+0x70/0x110 #5: ffffffff86515250 (cpu_hotplug_lock.rw_sem){++++}, at: __offline_pages+0xbf/0xa10 #6: ffffffff867405f0 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x87/0x2f0 #7: ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at: 
__offline_isolated_pages+0x179/0x3e0 #8: ffffffff865a4920 (console_lock){+.+.}, at: vprintk_emit+0x100/0x340 stack backtrace: Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10, BIOS U34 05/21/2019 Call Trace: dump_stack+0x86/0xca print_circular_bug.cold.31+0x243/0x26e check_noncircular+0x29e/0x2e0 check_prev_add+0x107/0xea0 validate_chain+0x8fc/0x1200 __lock_acquire+0x5b3/0xb40 lock_acquire+0x126/0x280 console_unlock+0x269/0x750 vprintk_emit+0x10d/0x340 vprintk_default+0x1f/0x30 vprintk_func+0x44/0xd4 printk+0x9f/0xc5 __offline_isolated_pages.cold.52+0x2f/0x30a offline_isolated_pages_cb+0x17/0x30 walk_system_ram_range+0xda/0x160 __offline_pages+0x79c/0xa10 offline_pages+0x11/0x20 memory_subsys_offline+0x7e/0xc0 device_offline+0xd5/0x110 state_store+0xc6/0xe0 dev_attr_store+0x3f/0x60 sysfs_kf_write+0x89/0xb0 kernfs_fop_write+0x188/0x240 __vfs_write+0x50/0xa0 vfs_write+0x105/0x290 ksys_write+0xc6/0x160 __x64_sys_write+0x43/0x50 do_syscall_64+0xcc/0x76c entry_SYSCALL_64_after_hwframe+0x49/0xbe Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Qian Cai <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31mm: remove "count" parameter from has_unmovable_pages()David Hildenbrand1-1/+1
Now that the memory isolate notifier is gone, the parameter is always 0. Drop it and cleanup has_unmovable_pages(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Qian Cai <[email protected]> Cc: Pingfan Liu <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Dan Williams <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Wei Yang <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Arun KS <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-01-31  mm: remove the memory isolate notifier  (David Hildenbrand; 1 file, -34/+4)
Luckily, we have no users left, so we can get rid of it. Cleanup set_migratetype_isolate() a little bit. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Dan Williams <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Pingfan Liu <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-12-01  mm/page_isolation.c: convert SKIP_HWPOISON to MEMORY_OFFLINE  (David Hildenbrand; 1 file, -6/+6)
We have two types of users of page isolation: 1. Memory offlining: Offline memory so it can be unplugged. Memory won't be touched. 2. Memory allocation: Allocate memory (e.g., alloc_contig_range()) to become the owner of the memory and make use of it. For example, in case we want to offline memory, we can ignore (skip over) PageHWPoison() pages, as the memory won't get used. We can allow to offline memory. In contrast, we don't want to allow to allocate such memory. Let's generalize the approach so we can special case other types of pages we want to skip over in case we offline memory. While at it, also pass the same flags to test_pages_isolated(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Pingfan Liu <[email protected]> Cc: Qian Cai <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Dan Williams <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
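The resulting interface distinguishes the two user types with a flags argument, roughly as below (flag names follow the description above; the exact values and the 0 passed by alloc_contig_range() are illustrative):

  /* include/linux/page-isolation.h (sketch) */
  #define MEMORY_OFFLINE  0x1     /* isolating to offline: PageHWPoison() etc. may be skipped */
  #define REPORT_FAILURE  0x2     /* dump_page() when isolation fails */

  /* Memory offlining never touches the memory, so it can skip such pages: */
  ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
                                 MEMORY_OFFLINE | REPORT_FAILURE);

  /* alloc_contig_range() wants to actually use the memory, so it passes 0: */
  ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_CMA, 0);

  /* The same flags are now also handed to test_pages_isolated(): */
  ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);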
2019-07-12  mm/page_isolation.c: change the prototype of undo_isolate_page_range()  (Pingfan Liu; 1 file, -2/+1)
undo_isolate_page_range() never fails, so no need to return value. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pingfan Liu <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14  mm/page_isolation.c: remove redundant pfn_valid_within() in __first_valid_page()  (Anshuman Khandual; 1 file, -2/+0)
pfn_valid_within() calls pfn_valid() when CONFIG_HOLES_IN_ZONE making it redundant for both definitions (w/wo CONFIG_MEMORY_HOTPLUG) of the helper pfn_to_online_page() which either calls pfn_valid() or pfn_valid_within(). pfn_valid_within() being 1 when !CONFIG_HOLES_IN_ZONE is irrelevant either way. This does not change functionality. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reviewed-by: Zi Yan <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-03-29  mm/page_isolation.c: fix a wrong flag in set_migratetype_isolate()  (Qian Cai; 1 file, -1/+2)
Due to has_unmovable_pages() taking an incorrect irqsave flag instead of the isolation flag in set_migratetype_isolate(), there are issues with HWPOISON handling and error reporting, where dump_page() is not called when there is an unmovable page.

Link: http://lkml.kernel.org/r/[email protected] Fixes: d381c54760dc ("mm: only report isolation failures when offlining memory") Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Signed-off-by: Qian Cai <[email protected]> Cc: <[email protected]> [5.0.x] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-03-29  mm/hotplug: fix offline undo_isolate_page_range()  (Qian Cai; 1 file, -18/+30)
Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") introduced move_pfn_range_to_zone(), which calls memmap_init_zone() while onlining a memory block. memmap_init_zone() resets the pagetype flags and makes the migrate type MOVABLE. However, __offline_pages() also calls undo_isolate_page_range() after offline_isolated_pages() to do the same thing. Because commit 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed __first_valid_page() to skip offline pages, undo_isolate_page_range() here just wastes CPU cycles looping around the offlining PFN range while doing nothing, because __first_valid_page() will return NULL as offline_isolated_pages() has already marked all memory sections within the pfn range as offline via offline_mem_sections().

Also, after calling the "useless" undo_isolate_page_range() here, we reach the point of no return by notifying MEM_OFFLINE. Those pages will be marked as MIGRATE_MOVABLE again once they are onlined. The only thing left to do is to decrease the zone's counter of isolated pageblocks, which would otherwise keep the page allocation paths affected by the above commit slower. Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages on ppc64, an "int" should still be enough to represent the number of pageblocks there. Fix an incorrect comment along the way.

[[email protected]: v4] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") Signed-off-by: Qian Cai <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: <[email protected]> [4.13+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-12-28  mm: only report isolation failures when offlining memory  (Michal Hocko; 1 file, -6/+4)
Heiko has complained that his log is swamped by warnings from has_unmovable_pages [ 20.536664] page dumped because: has_unmovable_pages [ 20.536792] page:000003d081ff4080 count:1 mapcount:0 mapping:000000008ff88600 index:0x0 compound_mapcount: 0 [ 20.536794] flags: 0x3fffe0000010200(slab|head) [ 20.536795] raw: 03fffe0000010200 0000000000000100 0000000000000200 000000008ff88600 [ 20.536796] raw: 0000000000000000 0020004100000000 ffffffff00000001 0000000000000000 [ 20.536797] page dumped because: has_unmovable_pages [ 20.536814] page:000003d0823b0000 count:1 mapcount:0 mapping:0000000000000000 index:0x0 [ 20.536815] flags: 0x7fffe0000000000() [ 20.536817] raw: 07fffe0000000000 0000000000000100 0000000000000200 0000000000000000 [ 20.536818] raw: 0000000000000000 0000000000000000 ffffffff00000001 0000000000000000 which are not triggered by the memory hotplug but rather CMA allocator. The original idea behind dumping the page state for all call paths was that these messages will be helpful debugging failures. From the above it seems that this is not the case for the CMA path because we are lacking much more context. E.g the second reported page might be a CMA allocated page. It is still interesting to see a slab page in the CMA area but it is hard to tell whether this is bug from the above output alone. Address this issue by dumping the page state only on request. Both start_isolate_page_range and has_unmovable_pages already have an argument to ignore hwpoison pages so make this argument more generic and turn it into flags and allow callers to combine non-default modes into a mask. While we are at it, has_unmovable_pages call from is_pageblock_removable_nolock (sysfs removable file) is questionable to report the failure so drop it from there as well. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Heiko Carstens <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11  mm, migrate: remove reason argument from new_page_t  (Michal Hocko; 1 file, -2/+1)
No allocation callback is using this argument anymore. new_page_node used to use this parameter to convey node_id resp. migration error up to move_pages code (do_move_page_to_node_array). The error status never made it into the final status field and we have a better way to communicate node id to the status field now. All other allocation callbacks simply ignored the argument so we can drop it finally. [[email protected]: fix migration callback] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix alloc_misplaced_dst_page()] [[email protected]: fix build] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reviewed-by: Zi Yan <[email protected]> Cc: Andrea Reale <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-05  mm/page_isolation.c: make start_isolate_page_range() fail if already isolated  (Mike Kravetz; 1 file, -1/+17)
start_isolate_page_range() is used to set the migrate type of a set of pageblocks to MIGRATE_ISOLATE while attempting to start a migration operation. It assumes that only one thread is calling it for the specified range. This routine is used by CMA, memory hotplug and gigantic huge pages. Each of these users synchronizes access to the range within their subsystem. However, two subsystems (CMA and gigantic huge pages for example) could attempt operations on the same range. If this happens, one thread may 'undo' the work another thread is doing. This can result in pageblocks being incorrectly left marked as MIGRATE_ISOLATE and therefore not available for page allocation.

What is ideally needed is a way to synchronize access to a set of pageblocks that are undergoing isolation and migration. The only thing we know about these pageblocks is that they are all in the same zone. A per-node mutex is too coarse as we want to allow multiple operations on different ranges within the same zone concurrently. Instead, we will use the migration type of the pageblocks themselves as a form of synchronization.

start_isolate_page_range sets the migration type on a set of pageblocks going in order from the one associated with the smallest pfn to the largest pfn. The zone lock is acquired to check and set the migration type. When going through the list of pageblocks, check if MIGRATE_ISOLATE is already set. If so, this indicates another thread is working on this pageblock. We know exactly which pageblocks we set, so clean up by undoing those and return -EBUSY, as shown in the sketch below.

This allows start_isolate_page_range to serve as a synchronization mechanism and will allow for more general use of callers making use of these interfaces. Update comments in alloc_contig_range to reflect this new functionality. Each CPU holds the associated zone lock to modify or examine the migration type of a pageblock. And, it will only examine/update a single pageblock per lock acquire/release cycle.

Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
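Conceptually the synchronization works as in the following simplified sketch (per-pageblock check-and-set under the zone lock; the real code also checks for unmovable pages and moves free pages onto the MIGRATE_ISOLATE freelist):

  int start_isolate_page_range_sketch(unsigned long start_pfn,
                                      unsigned long end_pfn, int migratetype)
  {
          unsigned long pfn;

          for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
                  struct page *page = pfn_to_page(pfn);
                  struct zone *zone = page_zone(page);
                  unsigned long flags;
                  bool busy;

                  spin_lock_irqsave(&zone->lock, flags);
                  busy = is_migrate_isolate_page(page);   /* another thread owns it */
                  if (!busy)
                          set_pageblock_migratetype(page, MIGRATE_ISOLATE);
                  spin_unlock_irqrestore(&zone->lock, flags);

                  if (busy) {
                          /* Undo only the pageblocks we isolated ourselves. */
                          undo_isolate_page_range(start_pfn, pfn, migratetype);
                          return -EBUSY;
                  }
          }
          return 0;
  }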
2017-11-15  mm: distinguish CMA and MOVABLE isolation in has_unmovable_pages()  (Michal Hocko; 1 file, -5/+5)
Joonsoo has noticed that "mm: drop migrate type checks from has_unmovable_pages" would break CMA allocator because it relies on has_unmovable_pages returning false even for CMA pageblocks which in fact don't have to be movable: alloc_contig_range start_isolate_page_range set_migratetype_isolate has_unmovable_pages This is a result of the code sharing between CMA and memory hotplug while each one has a different idea of what has_unmovable_pages should return. This is unfortunate but fixing it properly would require a lot of code duplication. Fix the issue by introducing the requested migrate type argument and special case MIGRATE_CMA case where CMA page blocks are handled properly. This will work for memory hotplug because it requires MIGRATE_MOVABLE. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Joonsoo Kim <[email protected]> Tested-by: Stefan Wahren <[email protected]> Tested-by: Ran Wang <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Igor Mammedov <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Reza Arbab <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-11-02  License cleanup: add SPDX GPL-2.0 license identifier to files with no license  (Greg Kroah-Hartman; 1 file, -0/+1)
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). 
Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. 
Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Philippe Ombredanne <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2017-07-10mm: unify new_node_page and alloc_migrate_targetMichal Hocko1-16/+2
Commit 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest neighbor node when mem-offline") has duplicated a large part of alloc_migrate_target with some hotplug specific special casing. To be more precise it tried to enforce the allocation from a different node than the original page. As a result the two functions diverged in their shared logic, e.g. the hugetlb allocation strategy. Let's unify the two and express different NUMA requirements by the given nodemask. new_node_page will simply exclude the node it doesn't care about and alloc_migrate_target will use all the available nodes. alloc_migrate_target will then learn to migrate hugetlb pages more sanely and use the preallocated pool when possible. Please note that alloc_migrate_target used to call alloc_page resp. alloc_pages_current, so it honored the memory policy of the current context, which is quite strange when we consider that it is used in the context of alloc_contig_range, which just tries to migrate pages that stand in the way. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: zhong jiang <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
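A minimal sketch of the nodemask approach described above, with the migration-callback boilerplate omitted and assuming a new_page_nodemask()-style allocation helper; the names and details are illustrative, not a verbatim copy of the kernel code:

    static struct page *new_node_page(struct page *page, unsigned long private)
    {
        int nid = page_to_nid(page);
        nodemask_t nmask = node_states[N_MEMORY];

        /* Hotplug wants the memory gone, so prefer any node but the source... */
        node_clear(nid, nmask);

        /* ...unless no other node has memory left. */
        if (nodes_empty(nmask))
            node_set(nid, nmask);

        return new_page_nodemask(page, nid, &nmask);
    }

alloc_migrate_target, by contrast, would simply pass a nodemask covering all available nodes, so both callers share the same allocation logic and only differ in the mask they hand over.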
2017-07-06mm: __first_valid_page skip over offline pagesMichal Hocko1-8/+18
__first_valid_page skips over invalid pfns in the range but it might still stumble over offline pages. At least start_isolate_page_range will mark those MIGRATE_ISOLATE via set_migratetype_isolate. This doesn't represent any immediate problem AFAICS because alloc_contig_range will fail to isolate those pages, but it relies on a not fully initialized page, which will become a problem later when we stop associating offline pages to zones. Use pfn_to_online_page to handle this. This is more a preparatory patch than a fix. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Dan Williams <[email protected]> Cc: Daniel Kiper <[email protected]> Cc: David Rientjes <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Igor Mammedov <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Reza Arbab <[email protected]> Cc: Tobias Regnery <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
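A rough sketch of the resulting helper, assuming pfn_to_online_page() returns NULL both for invalid pfns and for pfns whose memory is offline; close to, but not claimed to be identical with, the in-tree code:

    static struct page *__first_valid_page(unsigned long pfn, unsigned long nr_pages)
    {
        unsigned long i;

        for (i = 0; i < nr_pages; i++) {
            struct page *page = pfn_to_online_page(pfn + i);

            if (page)
                return page;    /* first online page in the range */
        }
        return NULL;            /* the whole range is invalid or offline */
    }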
2017-05-08mm, page_alloc: count movable pages when stealing from pageblockVlastimil Babka1-2/+3
When stealing pages from a pageblock of a different migratetype, we count how many free pages were stolen, and change the pageblock's migratetype if more than half of the pageblock was free. This might be too conservative, as there might be other pages that are not free, but were allocated with the same migratetype as our allocation requested. While we cannot determine the migratetype of allocated pages precisely (at least without the page_owner functionality enabled), we can count pages that compaction would try to isolate for migration - those are either on LRU or __PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be done as part of free page stealing with little additional overhead. The page stealing code is changed so that it considers free pages plus pages of the "good" migratetype for the decision whether to change the pageblock's migratetype. The result should be a more accurate migratetype of pageblocks with respect to the actual pages in the pageblocks, when stealing from semi-occupied pageblocks. This should help the efficiency of page grouping by mobility. In testing based on a 4.9 kernel with stress-highalloc from mmtests configured for order-4 GFP_KERNEL allocations, this patch has reduced the number of unmovable allocations falling back to movable pageblocks by 47%. The number of movable allocations falling back to other pageblocks is increased by 55%, but these events don't cause permanent fragmentation, so the tradeoff should be positive. Later patches also offset the movable fallback increase to some extent. [[email protected]: merge fix] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
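A hedged sketch of the counting idea: while walking a pageblock, free pages are counted by their buddy order, and allocated pages that compaction could migrate (on the LRU or __PageMovable()) are counted as "good" pages. The real change threads such a counter through the free-page moving code; this standalone walk is only illustrative:

    static void count_block_pages(struct page *block,
                                  unsigned long *free, unsigned long *movable)
    {
        unsigned long pfn = page_to_pfn(block);
        unsigned long end = pfn + pageblock_nr_pages;

        *free = 0;
        *movable = 0;
        while (pfn < end) {
            struct page *page = pfn_to_page(pfn);

            if (PageBuddy(page)) {
                unsigned int order = page_order(page);

                *free += 1UL << order;      /* whole free buddy at once */
                pfn += 1UL << order;
            } else {
                if (PageLRU(page) || __PageMovable(page))
                    (*movable)++;           /* compaction could migrate it */
                pfn++;
            }
        }
    }

The decision to change the pageblock's migratetype then compares free + movable against half of pageblock_nr_pages, instead of free pages alone.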
2017-05-03mm: use is_migrate_isolate_page() to simplify the codeXishi Qiu1-3/+3
Use is_migrate_isolate_page() to simplify the code, no functional changes. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Xishi Qiu <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
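The helper is roughly a one-line wrapper around the open-coded migratetype check, so call sites read as intent rather than mechanics; an illustrative before/after sketch:

    static inline bool is_migrate_isolate_page(struct page *page)
    {
        return get_pageblock_migratetype(page) == MIGRATE_ISOLATE;
    }

    /* before: open-coded */
    if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
        continue;

    /* after: the intent is explicit */
    if (!is_migrate_isolate_page(page))
        continue;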
2017-02-22mm, page_alloc: avoid page_to_pfn() when merging buddiesVlastimil Babka1-1/+1
On architectures that allow memory holes, page_is_buddy() has to perform page_to_pfn() to check for the memory hole. After the previous patch, we have the pfn already available in __free_one_page(), which is the only caller of page_is_buddy(), so move the check there and avoid page_to_pfn(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm, page_alloc: don't convert pfn to idx when mergingVlastimil Babka1-4/+4
In __free_one_page() we do the buddy merging arithmetics on "page/buddy index", which is just the lower MAX_ORDER bits of pfn. The operations we do that affect the higher bits are bitwise AND and subtraction (in that order), where the final result will be the same with the higher bits left unmasked, as long as these bits are equal for both buddies - which must be true by the definition of a buddy. We can therefore use pfns directly instead of "index" and skip the zeroing of >MAX_ORDER bits. This can help a bit by itself, although the compiler might be smart enough already. It also helps the next patch to avoid page_to_pfn() for memory hole checks. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
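A hedged sketch of the pfn-based buddy arithmetic the two patches above rely on; the naming is illustrative, though the kernel has equivalent helpers:

    /*
     * The buddy of a page at 'pfn' for a given order differs from it only
     * in bit 'order'; the merged pair starts at the lower of the two pfns.
     */
    static inline unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
    {
        return pfn ^ (1UL << order);
    }

    static inline unsigned long combined_pfn(unsigned long pfn, unsigned int order)
    {
        return pfn & buddy_pfn(pfn, order);
    }

Because the two buddies agree in all bits above 'order', the XOR/AND arithmetic gives the same answer whether or not the bits above MAX_ORDER are masked off, which is why the index-to-pfn conversion can be dropped.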
2016-10-07mm/page_isolation: fix typo: "paes" -> "pages"Yisheng Xie1-1/+1
Fix typo in comment. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yisheng Xie <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-07-26mm/page_isolation: clean up confused codeJoonsoo Kim1-3/+3
When there is an isolated_page, post_alloc_hook() is called with page but __free_pages() is called with isolated_page. Since they are the same, there is no problem, but it's very confusing. To reduce the confusion, this patch changes isolated_page to a boolean type and uses the page variable consistently. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-07-26mm/page_alloc: introduce post allocation processing on page allocatorJoonsoo Kim1-3/+1
This patch is motivated by Hugh and Vlastimil's concern [1]. There are two ways to get a free page from the allocator. One is using the normal memory allocation API and the other is __isolate_free_page(), which is internally used for compaction and pageblock isolation. The latter usage is rather tricky since it doesn't do the whole post-allocation processing done by the normal API. One problematic thing I already know is that a poisoned page would not be checked if it is allocated by __isolate_free_page(). Perhaps, there would be more. We could add more debug logic for allocated pages in the future and this separation would cause more problems. I'd like to fix this situation at this time. The solution is simple. This patch commonizes some logic for newly allocated pages and uses it at all sites. This will solve the problem. [1] http://marc.info/?i=alpine.LSU.2.11.1604270029350.7066%40eggly.anvils%3E [[email protected]: mm-page_alloc-introduce-post-allocation-processing-on-page-allocator-v3] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
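A hedged sketch of what such a common post-allocation hook looks like; the exact set of debug calls is illustrative (it tracks the features discussed above: mapping/poisoning checks, KASAN, page_owner) rather than a verbatim copy of mm/page_alloc.c:

    void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags)
    {
        set_page_private(page, 0);
        set_page_refcounted(page);

        arch_alloc_page(page, order);
        kernel_map_pages(page, 1 << order, 1);      /* CONFIG_DEBUG_PAGEALLOC */
        kernel_poison_pages(page, 1 << order, 1);   /* check/clear poison on alloc */
        kasan_alloc_pages(page, order);
        set_page_owner(page, order, gfp_flags);
    }

With this in place, both the normal allocation path and __isolate_free_page() callers run the same checks on a freshly handed-out page.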
2016-07-26mm/page_owner: initialize page owner without holding the zone lockJoonsoo Kim1-3/+6
It's not necessary to initialize page_owner while holding the zone lock. It would cause more contention on the zone lock, although it's not a big problem since it is just a debug feature. But it is better than before, so do it. This is also a preparation step to use stackdepot in the page owner feature. Stackdepot allocates new pages when there is no reserved space, and holding the zone lock in this case will cause a deadlock. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-05-19mm/memory_hotplug: add comment to some functions related to memory hotplugJoonsoo Kim1-0/+1
__offline_isolated_pages() and test_pages_isolated() are used by memory hotplug. These functions require that the range is in a single zone, but there is no code enforcing this because memory hotplug checks it before calling these functions. To avoid confusing future users of these functions, this patch adds comments to them. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-05-19include/linux/nodemask.h: create next_node_in() helperAndrew Morton1-6/+3
Lots of code does

    node = next_node(node, XXX);
    if (node == MAX_NUMNODES)
        node = first_node(XXX);

so create next_node_in() to do this and use it in various places. [[email protected]: use next_node_in() helper] Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: David Rientjes <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Hui Zhu <[email protected]> Cc: Wang Xiaoqiang <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
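The helper is essentially that pattern packaged up; a behaviourally equivalent sketch (the in-tree version is implemented slightly differently, e.g. as a macro wrapping an out-of-line function):

    /* Advance to the next node set in 'mask', wrapping to the first node
     * when the end of the mask is reached. */
    static inline int next_node_in(int node, nodemask_t mask)
    {
        int next = next_node(node, mask);

        if (next == MAX_NUMNODES)
            next = first_node(mask);
        return next;
    }

Callers such as the round-robin target-node selection in page isolation and memory hotplug then become a single call instead of the open-coded wrap-around.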
2016-04-01mm/page_isolation.c: fix the function commentsNeil Zhang1-1/+1
Commit fea85cff11de ("mm/page_isolation.c: return last tested pfn rather than failure indicator") changed the meaning of the return value. Let's change the function comments as well. Signed-off-by: Neil Zhang <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-04-01mm: fix invalid node in alloc_migrate_target()Xishi Qiu1-4/+4
It is incorrect to use next_node to find a target node; it will return MAX_NUMNODES or an invalid node. This will lead to a crash in the buddy system allocation. Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Signed-off-by: Xishi Qiu <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: David Rientjes <[email protected]> Cc: "Laura Abbott" <[email protected]> Cc: Hui Zhu <[email protected]> Cc: Wang Xiaoqiang <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>