2020-10-16  mm/page_isolation: simplify return value of start_isolate_page_range()  (David Hildenbrand, 3 files, -7/+4)
Callers no longer need the number of isolated pageblocks. Let's simplify. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Wei Yang <[email protected]> Cc: Baoquan He <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Charan Teja Reddy <[email protected]> Cc: Dan Williams <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/memory_hotplug: drop nr_isolate_pageblock in offline_pages()  (David Hildenbrand, 1 file, -3/+2)
We make sure that we cannot have any memory holes right at the beginning of offline_pages(), and we only support onlining/offlining full sections. Both sections and pageblocks are powers of two in size, and sections always span full pageblocks. We can directly calculate the number of isolated pageblocks from nr_pages. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Wei Yang <[email protected]> Cc: Baoquan He <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Charan Teja Reddy <[email protected]> Cc: Dan Williams <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
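As a rough sketch (not the literal patch), the calculation the changelog describes reduces to a single division, since both quantities are powers of two:

    /* sections span full pageblocks, so this divides evenly */
    unsigned long nr_isolate_pageblock = nr_pages / pageblock_nr_pages;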
2020-10-16  mm/page_alloc: simplify __offline_isolated_pages()  (David Hildenbrand, 2 files, -25/+6)
offline_pages() is the only user. __offline_isolated_pages() never gets called with ranges that contain memory holes and we no longer care about the return value. Drop the return value handling and all pfn_valid() checks. Update the documentation. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Wei Yang <[email protected]> Cc: Baoquan He <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Charan Teja Reddy <[email protected]> Cc: Dan Williams <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/memory_hotplug: simplify page offlining  (David Hildenbrand, 1 file, -34/+10)
We make sure that we cannot have any memory holes right at the beginning of offline_pages(). We no longer need walk_system_ram_range() and can call test_pages_isolated() and __offline_isolated_pages() directly. offlined_pages always corresponds to nr_pages, so we can simplify that. [[email protected]: patch conflict resolution] Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Wei Yang <[email protected]> Cc: Baoquan He <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Charan Teja Reddy <[email protected]> Cc: Dan Williams <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/memory_hotplug: enforce section granularity when onlining/offlining  (David Hildenbrand, 1 file, -0/+10)
Already two people (including me) tried to offline subsections, because the function looks like it can deal with it. But we really can only online/offline full sections that are properly aligned (e.g., we can only mark full sections online/offline via SECTION_IS_ONLINE). Add a simple safety net to document the restriction now. Current users (core and powernv/memtrace) respect these restrictions. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Wei Yang <[email protected]> Cc: Baoquan He <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Charan Teja Reddy <[email protected]> Cc: Dan Williams <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
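A hedged sketch of such a safety net, using the kernel's IS_ALIGNED() and PAGES_PER_SECTION; the exact placement and message in the real patch may differ:

    /* online_pages()/offline_pages(): allow only full, aligned sections */
    if (WARN_ON_ONCE(!nr_pages ||
                     !IS_ALIGNED(start_pfn | nr_pages, PAGES_PER_SECTION)))
            return -EINVAL;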
2020-10-16  mm/memory_hotplug: inline __offline_pages() into offline_pages()  (David Hildenbrand, 1 file, -11/+5)
Patch series "mm/memory_hotplug: online_pages()/offline_pages() cleanups", v2. These are a bunch of cleanups for online_pages()/offline_pages() and related code, mostly getting rid of memory hole handling that is no longer necessary. There is only a single walk_system_ram_range() call left in offline_pages(), to make sure we don't have any memory holes. I had some of these patches lying around for a longer time but didn't have time to polish them. In addition, the last patch marks all pageblocks of memory to get onlined MIGRATE_ISOLATE, so pages that have just been exposed to the buddy cannot get allocated before onlining is complete. Once heavy lifting is done, the pageblocks are set to MIGRATE_MOVABLE, such that allocations are possible. I played with DIMMs and virtio-mem on x86-64 and didn't spot any surprises. I verified that the numer of isolated pageblocks is correctly handled when onlining/offlining. This patch (of 10): There is only a single user, offline_pages(). Let's inline, to make it look more similar to online_pages(). Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Wei Yang <[email protected]> Cc: Baoquan He <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Charan Teja Reddy <[email protected]> Cc: Dan Williams <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Tony Luck <[email protected]> Cc: Mel Gorman <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/mmu_notifier: fix mmget() assert in __mmu_interval_notifier_insert  (Jann Horn, 1 file, -1/+1)
The comment talks about having to hold mmget() (which means mm_users), but the actual check is on mm_count (which would be mmgrab()). Given that MMU notifiers are torn down in mmput() -> __mmput() -> exit_mmap() -> mmu_notifier_release(), I believe that the comment is correct and the check should be on mm->mm_users. Fix it up accordingly. Fixes: 99cb252f5e68 ("mm/mmu_notifier: add an interval tree notifier") Signed-off-by: Jann Horn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christian König <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/util.c: update the kerneldoc for kstrdup_const()  (Bartosz Golaszewski, 1 file, -1/+2)
Memory allocated with kstrdup_const() must not be passed to regular krealloc() as it is not aware of the possibility of the chunk residing in .rodata. Since there are no potential users of krealloc_const() at the moment, let's just update the doc to make it explicit. Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
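Illustrative usage (not from the patch) of why this matters: kstrdup_const() may return a pointer into .rodata when given a compile-time constant, so the buffer must be released with kfree_const() and never resized with krealloc():

    const char *name = kstrdup_const("const-name", GFP_KERNEL);

    /* krealloc(name, ...) here would be a bug: "name" may point into .rodata */
    kfree_const(name);      /* frees only if the string was actually duplicated */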
2020-10-16  mm/vmstat.c: use helper macro abs()  (Miaohe Lin, 1 file, -4/+4)
Use the helper macro abs() to simplify the "x > t || x < -t" comparison. Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
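The shape of the change, with a hypothetical body standing in for the real vmstat update:

    /* before */
    if (x > t || x < -t)
            refresh_stats();    /* illustrative placeholder */

    /* after */
    if (abs(x) > t)
            refresh_stats();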
2020-10-16  mm/page_poison.c: replace bool variable with static key  (Mateusz Nosek, 1 file, -5/+15)
The variable 'want_page_poisoning' is a switch deciding if page poisoning should be enabled. This patch changes it to a static key. Signed-off-by: Mateusz Nosek <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Oscar Salvador <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
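A hedged sketch of what such a bool-to-static-key conversion looks like (early_param() wiring omitted; the hot-path check shown is illustrative and the actual patch may differ in detail):

    static DEFINE_STATIC_KEY_FALSE(want_page_poisoning);

    static int __init early_page_poison_param(char *buf)
    {
            bool enable;
            int ret = strtobool(buf, &enable);

            if (ret)
                    return ret;
            if (enable)
                    static_branch_enable(&want_page_poisoning);
            return 0;
    }

    /* hot-path checks become a patched jump instead of a memory load: */
    if (static_branch_unlikely(&want_page_poisoning))
            poison_pages(page, n);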
2020-10-16  mm,hwpoison: try to narrow window race for free pages  (Oscar Salvador, 1 file, -1/+6)
Aristeu Rozanski reported that a customer test case started to report -EBUSY after the hwpoison rework patchset. There is a race window between spotting a free page and taking it off its buddy freelist, so it might be that by the time we try to take it off, the page has already been allocated. This patch tries to handle such a race window by retrying and handling the page's new type if it was allocated under us. Reported-by: Aristeu Rozanski <[email protected]> Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Aristeu Rozanski <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
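A hedged sketch of the retry described, reusing helper names this series introduces elsewhere (get_any_page(), soft_offline_free_page(), soft_offline_in_use_page()); treat the structure as illustrative:

    retry:
            ret = get_any_page(page, pfn, flags);
            if (ret > 0) {                  /* page is in use */
                    ret = soft_offline_in_use_page(page);
            } else if (ret == 0) {          /* page looked free */
                    if (soft_offline_free_page(page) && try_again) {
                            /* allocated under us: handle its new type once */
                            try_again = false;
                            goto retry;
                    }
            }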
2020-10-16  mm,hwpoison: double-check page count in __get_any_page()  (Naoya Horiguchi, 1 file, -0/+6)
Soft offlining could fail with EIO due to a race condition with hugepage migration. This issue became visible due to the change in the previous patch that makes the soft offline handler take a page refcount of its own. We have no way to directly pin a zero-refcount page, and a page considered to be a zero-refcount page could be allocated just after the first check. This patch adds a second check to find the race and gives us a chance to handle it more reliably. Reported-by: Qian Cai <[email protected]> Signed-off-by: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm,hwpoison: introduce MF_MSG_UNSPLIT_THP  (Naoya Horiguchi, 3 files, -1/+8)
memory_failure() is supposed to call action_result() when it handles a memory error event, but there's one missing case. So let's add it. include/ras/ras_event.h also has some other MF_MSG_* entries undefined, so this patch adds them as well. Signed-off-by: Naoya Horiguchi <[email protected]> Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm,hwpoison: return 0 if the page is already poisoned in soft-offline  (Oscar Salvador, 2 files, -7/+2)
Currently, there is an inconsistency when calling soft-offline from different paths on a page that is already poisoned. 1) madvise: madvise_inject_error skips any poisoned page and continues the loop. If that was the only page to madvise, it returns 0. 2) /sys/devices/system/memory/: When calling soft_offline_page_store()->soft_offline_page(), we return -EBUSY in case the page is already poisoned. This is inconsistent with a) the above example and b) memory_failure, where we return 0 if the page was poisoned. Fix this by dropping the PageHWPoison() check in madvise_inject_error, and letting soft_offline_page return 0 if it finds the page already poisoned. Please note that this represents a user-api change, since the error returned when calling soft_offline_page_store()->soft_offline_page() will now be different. Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
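A minimal sketch of the soft_offline_page() side of the change (illustrative):

    if (PageHWPoison(page)) {
            pr_info("soft offline: %#lx: page already poisoned\n", pfn);
            return 0;       /* previously -EBUSY; now matches memory_failure() */
    }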
2020-10-16  mm,hwpoison: refactor soft_offline_huge_page and __soft_offline_page  (Oscar Salvador, 1 file, -100/+82)
Merging soft_offline_huge_page and __soft_offline_page lets us get rid of quite some duplicated code, and makes the code much easier to follow. Now, __soft_offline_page will handle both normal and hugetlb pages. Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm,hwpoison: rework soft offline for in-use pages  (Oscar Salvador, 4 files, -70/+28)
This patch changes the way we set and handle in-use poisoned pages. Until now, poisoned pages were released to the buddy allocator, trusting that the checks that take place at allocation time would act as a safety net and would skip that page. This has proved to be wrong, as there are pfn walkers out there, like compaction, that only care about the page being in a buddy freelist. Although compaction might not be the only such user, having poisoned pages in the buddy allocator seems a bad idea, as we should only have free pages that are ready and meant to be used as such. Before explaining the approach taken, let us break down the kinds of pages we can soft offline.

- Anonymous THP (after the split, they end up being 4K pages)
- Hugetlb
- Order-0 pages (that can be either migrated or invalidated)

* Normal pages (order-0 and anon-THP)
  - If they are clean and unmapped page cache pages, we invalidate them by means of invalidate_inode_page().
  - If they are mapped/dirty, we do the isolate-and-migrate dance.

Either way, we do not call put_page directly from those paths. Instead, we keep the page and send it to page_handle_poison to perform the right handling. page_handle_poison sets the HWPoison flag and does the last put_page. Down the chain, we placed a check for HWPoison pages in free_pages_prepare that just skips any poisoned page, so those pages do not end up in any pcplist/freelist. After that, we set the refcount on the page to 1 and we increment the poisoned pages counter. If we see that the check in free_pages_prepare creates trouble, we can always do what we do for free pages:

- wait until the page hits buddy's freelists
- take it off, and flag it

The downside of the above approach is that we could race with an allocation, so by the time we want to take the page off the buddy, the page has already been allocated, and we cannot soft offline it. But the user could always retry it.

* Hugetlb pages
  - We isolate-and-migrate them

After the migration has been successful, we call dissolve_free_huge_page, and we set HWPoison on the page if we succeed. Hugetlb has a slightly different handling though. While for non-hugetlb pages we cared about closing the race with an allocation, doing so for hugetlb pages requires quite some additional and intrusive code (we would need to hook into free_huge_page and some other places). So I decided to not make the code overly complicated and just fail normally if the page was allocated in the meantime. We can always build on top of this. As a bonus, because of the way we now handle in-use pages, we no longer need the put-as-isolation-migratetype dance that was guarding against poisoned pages ending up in pcplists. Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
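A hedged sketch of the page_handle_poison() flow described above (simplified; the real helper also deals with hugetlb and free-page specifics):

    static void page_handle_poison(struct page *page, bool release)
    {
            SetPageHWPoison(page);          /* set before the final put... */
            if (release)
                    put_page(page);         /* ...so free_pages_prepare() skips it */
            page_ref_inc(page);             /* pin the poisoned page at refcount 1 */
            num_poisoned_pages_inc();
    }

    /* and in free_pages_prepare(), poisoned order-0 pages never reach a freelist */
    if (unlikely(PageHWPoison(page)) && !order)
            return false;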
2020-10-16  mm,hwpoison: rework soft offline for free pages  (Oscar Salvador, 3 files, -6/+81)
When trying to soft-offline a free page, we need to first take it off the buddy allocator. Once we know it is out of reach, we can safely flag it as poisoned. take_page_off_buddy will be used to take a page meant to be poisoned off the buddy allocator. take_page_off_buddy calls break_down_buddy_pages, which splits a higher-order page in case our page belongs to one. Once the page is under our control, we call page_handle_poison to set it as poisoned and grab a refcount on it. Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
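A condensed, hedged sketch of take_page_off_buddy(); the freelist removal and the break_down_buddy_pages() splitting are elided to comments:

    bool take_page_off_buddy(struct page *page)
    {
            struct zone *zone = page_zone(page);
            unsigned long pfn = page_to_pfn(page);
            unsigned long flags;
            unsigned int order;
            bool ret = false;

            spin_lock_irqsave(&zone->lock, flags);
            for (order = 0; order < MAX_ORDER; order++) {
                    /* candidate buddy head that would contain our page */
                    struct page *head = page - (pfn & ((1 << order) - 1));

                    if (PageBuddy(head) && page_order(head) >= order) {
                            /* del_page_from_free_list(head, ...), then
                             * break_down_buddy_pages() splits "head" until
                             * "page" stands alone, refiling the remainder */
                            ret = true;
                            break;
                    }
            }
            spin_unlock_irqrestore(&zone->lock, flags);
            return ret;
    }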
2020-10-16  mm,hwpoison: unify THP handling for hard and soft offline  (Oscar Salvador, 1 file, -26/+22)
Place the THP's page handling in a helper and use it from both hard and soft-offline machinery, so we get rid of some duplicated code. Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm,hwpoison: kill put_hwpoison_page  (Oscar Salvador, 2 files, -16/+15)
After commit 4e41a30c6d50 ("mm: hwpoison: adjust for new thp refcounting"), put_hwpoison_page got reduced to a put_page. Let us just use put_page instead. Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm,hwpoison: refactor madvise_inject_error  (Oscar Salvador, 1 file, -17/+13)
Make a proper if-else condition for {hard,soft}-offline. Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Mike Kravetz <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
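Roughly the resulting structure (flag arguments illustrative):

    if (behavior == MADV_SOFT_OFFLINE) {
            pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
                    pfn, start);
            ret = soft_offline_page(pfn, MF_COUNT_INCREASED);
    } else {
            pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
                    pfn, start);
            ret = memory_failure(pfn, 0);
    }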
2020-10-16  mm,hwpoison: unexport get_hwpoison_page and make it static  (Oscar Salvador, 2 files, -3/+1)
Since get_hwpoison_page is only used in memory-failure code now, let us un-export it and make it private to that code. Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm,hwpoison-inject: don't pin for hwpoison_filter  (Naoya Horiguchi, 1 file, -13/+5)
Another memory error injection interface, debugfs:hwpoison/corrupt-pfn, also takes a bogus refcount for hwpoison_filter(). It's justified because this does a coarse filter, expecting that memory_failure() redoes the check for sure. Signed-off-by: Naoya Horiguchi <[email protected]> Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm, hwpoison: remove recalculating hpage  (Naoya Horiguchi, 1 file, -5/+1)
hpage is never used after try_to_split_thp_page() in memory_failure(), so we don't have to update it. Let's not recalculate/use hpage. Suggested-by: "Aneesh Kumar K.V" <[email protected]> Signed-off-by: Naoya Horiguchi <[email protected]> Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm,hwpoison: cleanup unused PageHuge() check  (Naoya Horiguchi, 1 file, -4/+1)
Patch series "HWPOISON: soft offline rework", v7. This patchset fixes a couple of issues that the patchset Naoya sent [1] contained due to rebasing problems and a misunterdansting. Main focus of this series is to stabilize soft offline. Historically soft offlined pages have suffered from racy conditions because PageHWPoison is used to a little too aggressively, which (directly or indirectly) invades other mm code which cares little about hwpoison. This results in unexpected behavior or kernel panic, which is very far from soft offline's "do not disturb userspace or other kernel component" policy. An example of this can be found here [2]. Along with several cleanups, this code refactors and changes the way soft offline work. Main point of this change set is to contain target page "via buddy allocator" or in migrating path. For ther former we first free the target page as we do for normal pages, and once it has reached buddy and it has been taken off the freelists, we flag it as HWpoison. For the latter we never get to release the page in unmap_and_move, so the page is under our control and we can handle it in hwpoison code. [1] https://patchwork.kernel.org/cover/11704083/ [2] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u This patch (of 14): Drop the PageHuge check, which is dead code since memory_failure() forks into memory_failure_hugetlb() for hugetlb pages. memory_failure() and memory_failure_hugetlb() shares some functions like hwpoison_user_mappings() and identify_page_state(), so they should properly handle 4kB page, thp, and hugetlb. Signed-off-by: Naoya Horiguchi <[email protected]> Signed-off-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tony Luck <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Qian Cai <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: Oscar Salvador <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/readahead: pass a file_ra_state into force_page_cache_ra  (David Howells, 2 files, -5/+5)
The file_ra_state being passed into page_cache_sync_readahead() was being ignored in favour of using the one embedded in the struct file. The only caller for which this makes a difference is the fsverity code if the file has been marked as POSIX_FADV_RANDOM, but it's confusing and worth fixing. Signed-off-by: David Howells <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Eric Biggers <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/filemap: fold ra_submit into do_sync_mmap_readahead  (David Howells, 2 files, -15/+5)
Fold ra_submit() into its last remaining user and pass the readahead_control struct to both do_page_cache_ra() and page_cache_sync_ra(). Signed-off-by: David Howells <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Eric Biggers <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/readahead: add page_cache_sync_ra and page_cache_async_ra  (Matthew Wilcox (Oracle), 2 files, -56/+66)
Reimplement page_cache_sync_readahead() and page_cache_async_readahead() as wrappers around versions of the function which take a readahead_control in preparation for making do_sync_mmap_readahead() pass down an RAC struct. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: David Howells <[email protected]> Cc: Eric Biggers <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
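A hedged sketch of the wrapper shape, using the DEFINE_READAHEAD helper this series adds (that patch appears further down in this log); the exact signatures may differ:

    static inline void page_cache_sync_readahead(struct address_space *mapping,
                    struct file_ra_state *ra, struct file *file, pgoff_t index,
                    unsigned long req_count)
    {
            DEFINE_READAHEAD(ractl, file, mapping, index);
            page_cache_sync_ra(&ractl, ra, req_count);
    }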
2020-10-16  mm/readahead: pass readahead_control to force_page_cache_ra  (David Howells, 2 files, -12/+19)
Reimplement force_page_cache_readahead() as a wrapper around force_page_cache_ra(). Pass the existing readahead_control from page_cache_sync_readahead(). Signed-off-by: David Howells <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Eric Biggers <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/readahead: make ondemand_readahead take a readahead_control  (David Howells, 1 file, -12/+17)
Make ondemand_readahead() take a readahead_control struct in preparation for making do_sync_mmap_readahead() pass down an RAC struct. Signed-off-by: David Howells <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Eric Biggers <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/readahead: make do_page_cache_ra take a readahead_control  (Matthew Wilcox (Oracle), 2 files, -19/+20)
Rename __do_page_cache_readahead() to do_page_cache_ra() and call it directly from ondemand_readahead() instead of indirecting via ra_submit(). Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: David Howells <[email protected]> Cc: Eric Biggers <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/readahead: make page_cache_ra_unbounded take a readahead_control  (Matthew Wilcox (Oracle), 4 files, -23/+20)
Define it in the callers instead of in page_cache_ra_unbounded(). Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: David Howells <[email protected]> Cc: Eric Biggers <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/readahead: add DEFINE_READAHEAD  (Matthew Wilcox (Oracle), 2 files, -5/+8)
Patch series "Readahead patches for 5.9/5.10". These are infrastructure for both the THP patchset and for the fscache rewrite, For both pieces of infrastructure being build on top of this patchset, we want the ractl to be available higher in the call-stack. For David's work, he wants to add the 'critical page' to the ractl so that he knows which page NEEDS to be brought in from storage, and which ones are nice-to-have. We might want something similar in block storage too. It used to be simple -- the first page was the critical one, but then mmap added fault-around and so for that usecase, the middle page is the critical one. Anyway, I don't have any code to show that yet, we just know that the lowest point in the callchain where we have that information is do_sync_mmap_readahead() and so the ractl needs to start its life there. For THP, we havew the code that needs it. It's actually the apex patch to the series; the one which finally starts to allocate THPs and present them to consenting filesystems: http://git.infradead.org/users/willy/pagecache.git/commitdiff/798bcf30ab2eff278caad03a9edca74d2f8ae760 This patch (of 8): Allow for a more concise definition of a struct readahead_control. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Eric Biggers <[email protected]> Cc: David Howells <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm: fix a race during THP splitting  (Huang Ying, 1 file, -6/+7)
It is reported that the following bug is triggered if the HDD is used as swap device:

[ 5758.157556] BUG: kernel NULL pointer dereference, address: 0000000000000007
[ 5758.165331] #PF: supervisor write access in kernel mode
[ 5758.171161] #PF: error_code(0x0002) - not-present page
[ 5758.176894] PGD 0 P4D 0
[ 5758.179721] Oops: 0002 [#1] SMP PTI
[ 5758.183614] CPU: 10 PID: 316 Comm: kswapd1 Kdump: loaded Tainted: G S --------- --- 5.9.0-0.rc3.1.tst.el8.x86_64 #1
[ 5758.196717] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013
[ 5758.208176] RIP: 0010:split_swap_cluster+0x47/0x60
[ 5758.213522] Code: c1 e3 06 48 c1 eb 0f 48 8d 1c d8 48 89 df e8 d0 20 6a 00 80 63 07 fb 48 85 db 74 16 48 89 df c6 07 00 66 66 66 90 31 c0 5b c3 <80> 24 25 07 00 00 00 fb 31 c0 5b c3 b8 f0 ff ff ff 5b c3 66 0f 1f
[ 5758.234478] RSP: 0018:ffffb147442d7af0 EFLAGS: 00010246
[ 5758.240309] RAX: 0000000000000000 RBX: 000000000014b217 RCX: ffffb14779fd9000
[ 5758.248281] RDX: 000000000014b217 RSI: ffff9c52f2ab1400 RDI: 000000000014b217
[ 5758.256246] RBP: ffffe00c51168080 R08: ffffe00c5116fe08 R09: ffff9c52fffd3000
[ 5758.264208] R10: ffffe00c511537c8 R11: ffff9c52fffd3c90 R12: 0000000000000000
[ 5758.272172] R13: ffffe00c51170000 R14: ffffe00c51170000 R15: ffffe00c51168040
[ 5758.280134] FS: 0000000000000000(0000) GS:ffff9c52f2a80000(0000) knlGS:0000000000000000
[ 5758.289163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5758.295575] CR2: 0000000000000007 CR3: 0000000022a0e003 CR4: 00000000000606e0
[ 5758.303538] Call Trace:
[ 5758.306273]  split_huge_page_to_list+0x88b/0x950
[ 5758.311433]  deferred_split_scan+0x1ca/0x310
[ 5758.316202]  do_shrink_slab+0x12c/0x2a0
[ 5758.320491]  shrink_slab+0x20f/0x2c0
[ 5758.324482]  shrink_node+0x240/0x6c0
[ 5758.328469]  balance_pgdat+0x2d1/0x550
[ 5758.332652]  kswapd+0x201/0x3c0
[ 5758.336157]  ? finish_wait+0x80/0x80
[ 5758.340147]  ? balance_pgdat+0x550/0x550
[ 5758.344525]  kthread+0x114/0x130
[ 5758.348126]  ? kthread_park+0x80/0x80
[ 5758.352214]  ret_from_fork+0x22/0x30
[ 5758.356203] Modules linked in: fuse zram rfkill sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 iTCO_wdt crct10dif_pclmul iTCO_vendor_support drm_kms_helper crc32_pclmul ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops cec rapl joydev intel_cstate ipmi_si ipmi_devintf drm intel_uncore i2c_i801 ipmi_msghandler pcspkr lpc_ich mei_me i2c_smbus mei ioatdma ip_tables xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg igb ahci libahci i2c_algo_bit crc32c_intel libata dca wmi dm_mirror dm_region_hash dm_log dm_mod
[ 5758.412673] CR2: 0000000000000007
[    0.000000] Linux version 5.9.0-0.rc3.1.tst.el8.x86_64 ([email protected]) (gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), GNU ld version 2.30-79.el8) #1 SMP Wed Sep 9 16:03:34 EDT 2020

After further digging, it's found that the following race condition exists in the original implementation:

CPU1                                             CPU2
----                                             ----
deferred_split_scan()
  split_huge_page(page) /* page isn't compound head */
    split_huge_page_to_list(page, NULL)
      __split_huge_page(page, )
        ClearPageCompound(head)
        /* unlock all subpages except page (not head) */
                                                 add_to_swap(head) /* not THP */
                                                   get_swap_page(head)
                                                   add_to_swap_cache(head, )
                                                     SetPageSwapCache(head)
      if PageSwapCache(head)
        split_swap_cluster(/* swap entry of head */)
          /* Deref sis->cluster_info: NULL accessing! */

So, in split_huge_page_to_list(), PageSwapCache() is called on the already split and unlocked "head", which may be added to the swap cache on another CPU, and split_swap_cluster() may therefore be called wrongly. To fix the race, the call to split_swap_cluster() is moved into __split_huge_page(), before all subpages are unlocked, so that PageSwapCache() is stable. Fixes: 59807685a7e77 ("mm, THP, swap: support splitting THP for THP swap out") Reported-by: Rafael Aquini <[email protected]> Signed-off-by: "Huang, Ying" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Rafael Aquini <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Matthew Wilcox <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
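A hedged sketch of where the call lands after the fix (heavily condensed):

    static void __split_huge_page(struct page *page, struct list_head *list,
                                  pgoff_t end, unsigned long flags)
    {
            struct page *head = compound_head(page);

            /* ... */
            ClearPageCompound(head);
            /* "head" is still locked here, so PageSwapCache(head) is stable */
            if (PageSwapCache(head)) {
                    swp_entry_t entry = { .val = page_private(head) };

                    split_swap_cluster(entry);
            }
            /* only now are the subpages unlocked */
    }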
2020-10-16  fs: do not update nr_thps for mappings which support THPs  (Matthew Wilcox (Oracle), 2 files, -27/+29)
The nr_thps counter is to support THPs in the page cache when the filesystem doesn't understand THPs. Eventually it will be removed, but we should still support filesystems which do not understand THPs yet. Move the nr_thp manipulation functions to filemap.h since they're page-cache specific. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Alexander Viro <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Song Liu <[email protected]> Cc: Rik van Riel <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  fs: add a filesystem flag for THPs  (Matthew Wilcox (Oracle), 4 files, -1/+10)
The page cache needs to know whether the filesystem supports THPs so that it doesn't send THPs to filesystems which can't handle them. Dave Chinner points out that getting from the page mapping to the filesystem type is too many steps (mapping->host->i_sb->s_type->fs_flags) so cache that information in the address space flags. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Alexander Viro <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Song Liu <[email protected]> Cc: Rik van Riel <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/vmscan: allow arbitrary sized pages to be paged out  (Matthew Wilcox (Oracle), 1 file, -2/+1)
Remove the assumption that a compound page has HPAGE_PMD_NR pins from the page cache. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: "Huang, Ying" <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/page-writeback: support tail pages in wait_for_stable_page  (Matthew Wilcox (Oracle), 1 file, -0/+1)
page->mapping is undefined for tail pages, so operate exclusively on the head page. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Huang Ying <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
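The fix is essentially one added line; a hedged sketch against the helpers of that era (thp_head(), bdi_cap_stable_pages_required()):

    void wait_for_stable_page(struct page *page)
    {
            page = thp_head(page);  /* page->mapping is undefined for tail pages */
            if (bdi_cap_stable_pages_required(inode_to_bdi(page->mapping->host)))
                    wait_on_page_writeback(page);
    }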
2020-10-16  mm/truncate: fix truncation for pages of arbitrary size  (Matthew Wilcox (Oracle), 1 file, -3/+3)
Remove the assumption that a compound page is HPAGE_PMD_SIZE, and the assumption that any page is PAGE_SIZE. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Huang Ying <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/rmap: fix assumptions of THP size  (Matthew Wilcox (Oracle), 1 file, -5/+5)
Ask the page what size it is instead of assuming it's PMD size. Do this for anon pages as well as file pages for when someone decides to support that. Leave the assumption alone for pages which are PMD mapped; we don't currently grow THPs beyond PMD size, so we don't need to change this code yet. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Huang Ying <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/huge_memory: fix can_split_huge_page assumption of THP size  (Matthew Wilcox (Oracle), 1 file, -2/+2)
Ask the page how many subpages it has instead of assuming it's PMD size. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: "Huang, Ying" <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/huge_memory: fix page_trans_huge_mapcount assumption of THP size  (Matthew Wilcox (Oracle), 1 file, -2/+2)
Ask the page what size it is instead of assuming it's PMD size. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Huang Ying <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/huge_memory: fix split assumption of page size  (Kirill A. Shutemov, 1 file, -7/+8)
File THPs may now be of arbitrary size, and we can't rely on that size after doing the split, so remember the number of pages before we start the split. Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Cc: Huang Ying <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/huge_memory: fix total_mapcount assumption of page size  (Kirill A. Shutemov, 1 file, -4/+5)
File THPs may now be of arbitrary order. Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Cc: Huang Ying <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/page_owner: change split_page_owner to take a count  (Matthew Wilcox (Oracle), 4 files, -7/+7)
The implementation of split_page_owner() prefers a count rather than the old order of the page. When we support a variable size THP, we won't have the order at this point, but we will have the number of pages. So change the interface to what the caller and callee would prefer. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Huang Ying <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
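The interface change, sketched (call sites illustrative):

    void split_page_owner(struct page *page, unsigned int nr);

    /* split_page(): the order is still in hand, so pass 1 << order */
    split_page_owner(page, 1 << order);

    /* __split_huge_page(): only the page count is known at this point */
    split_page_owner(head, nr);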
2020-10-16  mm/memory: remove page fault assumption of compound page size  (Matthew Wilcox (Oracle), 1 file, -3/+4)
A compound page in the page cache will not necessarily be of PMD size, so check explicitly. [[email protected]: fix remove page fault assumption of compound page size] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/filemap: fix page cache removal for arbitrary sized THPs  (Matthew Wilcox (Oracle), 1 file, -1/+1)
Patch series "Remove assumptions of THP size". There are a number of places in the VM which assume that a THP is a PMD in size. That's true today, and remains true after this patch series, but this is a prerequisite for switching to arbitrary-sized THPs. thp_nr_pages() still returns either HPAGE_PMD_NR or 1, but will be changed later. This patch (of 11): page_cache_free_page() assumes THPs are PMD_SIZE; fix that assumption. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Huang Ying <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/filemap: fix storing to a THP shadow entry  (Matthew Wilcox (Oracle), 1 file, -11/+31)
When a THP is removed from the page cache by reclaim, we replace it with a shadow entry that occupies all slots of the XArray previously occupied by the THP. If the user then accesses that page again, we only allocate a single page, but storing it into the shadow entry replaces all entries with that one page. That leads to bugs like:

page dumped because: VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
------------[ cut here ]------------
kernel BUG at mm/filemap.c:2529!

https://bugzilla.kernel.org/show_bug.cgi?id=206569 This is hard to reproduce with mainline, but happens regularly with the THP patchset (as so many more THPs are created). This solution is taken from the THP patchset. It splits the shadow entry into order-0 pieces at the time that we bring a new page into cache. Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Song Liu <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Qian Cai <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
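A hedged, condensed sketch of the add-to-page-cache side of the fix, using xa_get_order()/xas_split() from the two XArray patches below; the surrounding locking and retry logic is elided:

    old = xas_load(&xas);
    if (xa_is_value(old)) {
            /* a THP-sized shadow may sit here; split it down to order 0 */
            unsigned int order = xa_get_order(xas.xa, xas.xa_index);

            if (order > thp_order(page)) {
                    xas_split(&xas, old, order);
                    xas_reset(&xas);
            }
    }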
2020-10-16  XArray: add xas_split  (Matthew Wilcox (Oracle), 4 files, -16/+225)
In order to use multi-index entries for huge pages in the page cache, we need to be able to split a multi-index entry (eg if a file is truncated in the middle of a huge page entry). This version does not support splitting more than one level of the tree at a time. This is an acceptable limitation for the page cache as we do not expect to support order-12 pages in the near future. [[email protected]: export xas_split_alloc() to modules] [[email protected]: fix xarray split] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix xarray] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Qian Cai <[email protected]> Cc: Song Liu <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  XArray: add xa_get_order  (Matthew Wilcox (Oracle), 3 files, -0/+70)
Patch series "Fix read-only THP for non-tmpfs filesystems". As described more verbosely in the [3/3] changelog, we can inadvertently put an order-0 page in the page cache which occupies 512 consecutive entries. Users are running into this if they enable the READ_ONLY_THP_FOR_FS config option; see https://bugzilla.kernel.org/show_bug.cgi?id=206569 and Qian Cai has also reported it here: https://lore.kernel.org/lkml/[email protected]/ This is a rather intrusive way of fixing the problem, but has the advantage that I've actually been testing it with the THP patches, which means that it sees far more use than it does upstream -- indeed, Song has been entirely unable to reproduce it. It also has the advantage that it removes a few patches from my gargantuan backlog of THP patches. This patch (of 3): This function returns the order of the entry at the index. We need this because there isn't space in the shadow entry to encode its order. [[email protected]: export xa_get_order to modules] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Qian Cai <[email protected]> Cc: Song Liu <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-16  mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped.  (Aneesh Kumar K.V, 1 file, -3/+8)
With highmem, pte_alloc_map() keeps the level-4 page table mapped using kmap_atomic(). Avoid doing new memory allocations while the page table is mapped like this.

[ 9.409233] BUG: sleeping function called from invalid context at mm/page_alloc.c:4822
[ 9.410557] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper
[ 9.411932] no locks held by swapper/1.
[ 9.412595] CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.0-rc3-00323-gc50eb1ed654b5 #2
[ 9.413824] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 9.415207] Call Trace:
[ 9.415651]  ? ___might_sleep.cold+0xa7/0xcc
[ 9.416367]  ? __alloc_pages_nodemask+0x14c/0x5b0
[ 9.417055]  ? swap_migration_tests+0x50/0x293
[ 9.417704]  ? debug_vm_pgtable+0x4bc/0x708
[ 9.418287]  ? swap_migration_tests+0x293/0x293
[ 9.418911]  ? do_one_initcall+0x82/0x3cb
[ 9.419465]  ? parse_args+0x1bd/0x280
[ 9.419983]  ? rcu_read_lock_sched_held+0x36/0x60
[ 9.420673]  ? trace_initcall_level+0x1f/0xf3
[ 9.421279]  ? trace_initcall_level+0xbd/0xf3
[ 9.421881]  ? do_basic_setup+0x9d/0xdd
[ 9.422410]  ? do_basic_setup+0xc3/0xdd
[ 9.422938]  ? kernel_init_freeable+0x72/0xa3
[ 9.423539]  ? rest_init+0x134/0x134
[ 9.424055]  ? kernel_init+0x5/0x12c
[ 9.424574]  ? ret_from_fork+0x19/0x30

Reported-by: kernel test robot <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>