aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-02-24hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fsMiaohe Lin1-1/+1
Since commit e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes"), we can use macro default_hstate to get the struct hstate which we use by default. But init_hugetlbfs_fs() forgot to use it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr()Miaohe Lin1-2/+0
When we reach here with inode = NULL, we should have crashed as inode has already been dereferenced via hstate_inode. So this BUG_ON(!inode) does not take effect and should be removed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24hugetlbfs: remove special hugetlbfs_set_page_dirty()Mike Kravetz1-12/+1
Matthew Wilcox noticed that hugetlbfs_set_page_dirty always returns 0. Instead, it should return 1 or 0 depending on the previous state of the dirty bit. In addition, the call to compound_head is redundant as it is also performed in calling routine set_page_dirty. Replace the hugetlbfs specific routine hugetlbfs_set_page_dirty with __set_page_dirty_no_writeback as it addresses both of these issues. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb: change hugetlb_reserve_pages() to type boolMike Kravetz3-26/+17
While reviewing a bug in hugetlb_reserve_pages, it was noticed that all callers ignore the return value. Any failure is considered an ENOMEM error by the callers. Change the function to be of type bool. The function will return true if the reservation was successful, false otherwise. Callers currently assume a zero return code indicates success. Change the callers to look for true to indicate success. No functional change, only code cleanup. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm, oom: fix a comment in dump_task()Tang Yizhou1-3/+2
If p is a kthread, it will be checked in oom_unkillable_task() so we can delete the corresponding comment. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tang Yizhou <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk()Miaohe Lin1-1/+1
The helper range_in_vma() is introduced via commit 017b1660df89 ("mm: migration: fix migration of huge PMD shared pages"). But we forgot to use it in queue_pages_test_walk(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24numa balancing: migrate on fault among multiple bound nodesHuang Ying2-1/+19
Now, NUMA balancing can only optimize the page placement among the NUMA nodes if the default memory policy is used. Because the memory policy specified explicitly should take precedence. But this seems too strict in some situations. For example, on a system with 4 NUMA nodes, if the memory of an application is bound to the node 0 and 1, NUMA balancing can potentially migrate the pages between the node 0 and 1 to reduce cross-node accessing without breaking the explicit memory binding policy. So in this patch, we add MPOL_F_NUMA_BALANCING mode flag to set_mempolicy() when mode is MPOL_BIND. With the flag specified, NUMA balancing will be enabled within the thread to optimize the page placement within the constrains of the specified memory binding policy. With the newly added flag, the NUMA balancing control mechanism becomes, - sysctl knob numa_balancing can enable/disable the NUMA balancing globally. - even if sysctl numa_balancing is enabled, the NUMA balancing will be disabled for the memory areas or applications with the explicit memory policy by default. - MPOL_F_NUMA_BALANCING can be used to enable the NUMA balancing for the applications when specifying the explicit memory policy (MPOL_BIND). Various page placement optimization based on the NUMA balancing can be done with these flags. As the first step, in this patch, if the memory of the application is bound to multiple nodes (MPOL_BIND), and in the hint page fault handler the accessing node are in the policy nodemask, the page will be tried to be migrated to the accessing node to reduce the cross-node accessing. If the newly added MPOL_F_NUMA_BALANCING flag is specified by an application on an old kernel version without its support, set_mempolicy() will return -1 and errno will be set to EINVAL. The application can use this behavior to run on both old and new kernel versions. And if the MPOL_F_NUMA_BALANCING flag is specified for the mode other than MPOL_BIND, set_mempolicy() will return -1 and errno will be set to EINVAL as before. Because we don't support optimization based on the NUMA balancing for these modes. In the previous version of the patch, we tried to reuse MPOL_MF_LAZY for mbind(). But that flag is tied to MPOL_MF_MOVE.*, so it seems not a good API/ABI for the purpose of the patch. And because it's not clear whether it's necessary to enable NUMA balancing for a specific memory area inside an application, so we only add the flag at the thread level (set_mempolicy()) instead of the memory area level (mbind()). We can do that when it become necessary. To test the patch, we run a test case as follows on a 4-node machine with 192 GB memory (48 GB per node). 1. Change pmbench memory accessing benchmark to call set_mempolicy() to bind its memory to node 1 and 3 and enable NUMA balancing. Some related code snippets are as follows, #include <numaif.h> #include <numa.h> struct bitmask *bmp; int ret; bmp = numa_parse_nodestring("1,3"); ret = set_mempolicy(MPOL_BIND | MPOL_F_NUMA_BALANCING, bmp->maskp, bmp->size + 1); /* If MPOL_F_NUMA_BALANCING isn't supported, fall back to MPOL_BIND */ if (ret < 0 && errno == EINVAL) ret = set_mempolicy(MPOL_BIND, bmp->maskp, bmp->size + 1); if (ret < 0) { perror("Failed to call set_mempolicy"); exit(-1); } 2. Run a memory eater on node 3 to use 40 GB memory before running pmbench. 3. Run pmbench with 64 processes, the working-set size of each process is 640 MB, so the total working-set size is 64 * 640 MB = 40 GB. The CPU and the memory (as in step 1.) of all pmbench processes is bound to node 1 and 3. So, after CPU usage is balanced, some pmbench processes run on the CPUs of the node 3 will access the memory of the node 1. 4. After the pmbench processes run for 100 seconds, kill the memory eater. Now it's possible for some pmbench processes to migrate their pages from node 1 to node 3 to reduce cross-node accessing. Test results show that, with the patch, the pages can be migrated from node 1 to node 3 after killing the memory eater, and the pmbench score can increase about 17.5%. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Michal Hocko <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm, compaction: make fast_isolate_freepages() stay within zoneVlastimil Babka1-5/+11
Compaction always operates on pages from a single given zone when isolating both pages to migrate and freepages. Pageblock boundaries are intersected with zone boundaries to be safe in case zone starts or ends in the middle of pageblock. The use of pageblock_pfn_to_page() protects against non-contiguous pageblocks. The functions fast_isolate_freepages() and fast_isolate_around() don't currently protect the fast freepage isolation thoroughly enough against these corner cases, and can result in freepage isolation operate outside of zone boundaries: - in fast_isolate_freepages() if we get a pfn from the first pageblock of a zone that starts in the middle of that pageblock, 'highest' can be a pfn outside of the zone. If we fail to isolate anything in this function, we may then call fast_isolate_around() on a pfn outside of the zone and there effectively do a set_pageblock_skip(page_to_pfn(highest)) which may currently hit a VM_BUG_ON() in some configurations - fast_isolate_around() checks only the zone end boundary and not beginning, nor that the pageblock is contiguous (with pageblock_pfn_to_page()) so it's possible that we end up calling isolate_freepages_block() on a range of pfn's from two different zones and end up e.g. isolating freepages under the wrong zone's lock. This patch should fix the above issues. Link: https://lkml.kernel.org/r/[email protected] Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a migration target") Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/compaction: fix misbehaviors of fast_find_migrateblock()Wonhyuk Yang1-15/+12
In the fast_find_migrateblock(), it iterates ocer the freelist to find the proper pageblock. But there are some misbehaviors. First, if the page we found is equal to cc->migrate_pfn, it is considered that we didn't find a suitable pageblock. Secondly, if the loop was terminated because order is less than PAGE_ALLOC_COSTLY_ORDER, it could be considered that we found a suitable one. Thirdly, if the skip bit is set on the page block and we goto continue, it doesn't check nr_scanned. Fourthly, if the page block's skip bit is set, it checks that page block is the last of list, which is unnecessary. Link: https://lkml.kernel.org/r/[email protected] Fixes: 70b44595eafe9 ("mm, compaction: use free lists to quickly locate a migration source") Signed-off-by: Wonhyuk Yang <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/compaction: correct deferral logic for proactive compactionCharan Teja Reddy1-6/+14
should_proactive_compact_node() returns true when sum of the weighted fragmentation score of all the zones in the node is greater than the wmark_high of compaction, which then triggers the proactive compaction that operates on the individual zones of the node. But proactive compaction runs on the zone only when its weighted fragmentation score is greater than wmark_low(=wmark_high - 10). This means that the sum of the weighted fragmentation scores of all the zones can exceed the wmark_high but individual weighted fragmentation zone scores can still be less than wmark_low which makes the unnecessary trigger of the proactive compaction only to return doing nothing. Issue with the return of proactive compaction with out even trying is its deferral. It is simply deferred for 1 << COMPACT_MAX_DEFER_SHIFT if the scores across the proactive compaction is same, thinking that compaction didn't make any progress but in reality it didn't even try. With the delay between successive retries for proactive compaction is 500msec, it can result into the deferral for ~30sec with out even trying the proactive compaction. Test scenario is that: compaction_proactiveness=50 thus the wmark_low = 50 and wmark_high = 60. System have 2 zones(Normal and Movable) with sizes 5GB and 6GB respectively. After opening some apps on the android, the weighted fragmentation scores of these zones are 47 and 49 respectively. Since the sum of these fragmentation scores are above the wmark_high which triggers the proactive compaction and there since the individual zones weighted fragmentation scores are below wmark_low, it returns without trying the proactive compaction. As a result the weighted fragmentation scores of the zones are still 47 and 49 which makes the existing logic to defer the compaction thinking that noprogress is made across the compaction. Fix this by checking just zone fragmentation score, not the weighted, in __compact_finished() and use the zones weighted fragmentation score in fragmentation_score_node(). In the test case above, If the weighted average of is above wmark_high, then individual score (not adjusted) of atleast one zone has to be above wmark_high. Thus it avoids the unnecessary trigger and deferrals of the proactive compaction. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Charan Teja Reddy <[email protected]> Suggested-by: Vlastimil Babka <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Vinayak Menon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLockedMiaohe Lin1-1/+0
The VM_BUG_ON_PAGE(!PageLocked(page), page) is also done in PageMovable. Remove this explicitly one. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/compaction: remove rcu_read_lock during page compactionAlex Shi1-4/+1
isolate_migratepages_block() used rcu_read_lock() with the intention of safeguarding against the mem_cgroup being destroyed concurrently; but its TestClearPageLRU already protects against that. Delete the unnecessary rcu_read_lock() and _unlock(). Hugh Dickins helped on commit log polishing, Thanks! Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alex Shi <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24z3fold: simplify the zhdr initialization code in init_z3fold_page()Miaohe Lin1-7/+1
We can simplify the zhdr initialization by memset() the zhdr first instead of set struct member to zero one by one. This would also make code more compact and clear. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Vitaly Wool <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24z3fold: remove unused attribute for release_z3fold_pageMiaohe Lin1-2/+1
Since commit dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim"), release_z3fold_page() is used again. So we can drop the unused attribute safely. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Vitaly Wool <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/vmscan: restore zone_reclaim_mode ABIDave Hansen2-7/+12
I went to go add a new RECLAIM_* mode for the zone_reclaim_mode sysctl. Like a good kernel developer, I also went to go update the documentation. I noticed that the bits in the documentation didn't match the bits in the #defines. The VM never explicitly checks the RECLAIM_ZONE bit. The bit is, however implicitly checked when checking 'node_reclaim_mode==0'. The RECLAIM_ZONE #define was removed in a cleanup. That, by itself is fine. But, when the bit was removed (bit 0) the _other_ bit locations also got changed. That's not OK because the bit values are documented to mean one specific thing. Users surely do not expect the meaning to change from kernel to kernel. The end result is that if someone had a script that did: sysctl vm.zone_reclaim_mode=1 it would have gone from enabling node reclaim for clean unmapped pages to writing out pages during node reclaim after the commit in question. That's not great. Put the bits back the way they were and add a comment so something like this is a bit harder to do again. Update the documentation to make it clear that the first bit is ignored. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Dave Hansen <[email protected]> Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE") Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Alex Shi <[email protected]> Cc: Daniel Wagner <[email protected]> Cc: "Tobin C. Harding" <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Dan Williams <[email protected]> Cc: Qian Cai <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24hugetlb: fix uninitialized subpool pointerMike Kravetz1-0/+1
Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages() with linux-next 5.12.0-20210222. Call trace: hugepage_subpool_put_pages.part.0+0x2c/0x138 __free_huge_page+0xce/0x310 alloc_pool_huge_page+0x102/0x120 set_max_huge_pages+0x13e/0x350 hugetlb_sysctl_handler_common+0xd8/0x110 hugetlb_sysctl_handler+0x48/0x58 proc_sys_call_handler+0x138/0x238 new_sync_write+0x10e/0x198 vfs_write.part.0+0x12c/0x238 ksys_write+0x68/0xf8 do_syscall+0x82/0xd0 __do_syscall+0xb4/0xc8 system_call+0x72/0x98 This is a result of the change which moved the hugetlb page subpool pointer from page->private to page[1]->private. When new pages are allocated from the buddy allocator, the private field of the head page will be cleared, but the private field of subpages is not modified. Therefore, old values may remain. Fix by initializing hugetlb page subpool pointer in prep_new_huge_page(). Link: https://lkml.kernel.org/r/[email protected] Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags") Signed-off-by: Mike Kravetz <[email protected]> Reported-by: Gerald Schaefer <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Muchun Song <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Sven Schnelle <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24include/linux/hugetlb.h: add synchronization information for new hugetlb ↵Mike Kravetz1-0/+10
specific flags Add comments, no functional change. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24hugetlb: convert PageHugeFreed to HPageFreed flagMike Kravetz2-19/+7
Use new hugetlb specific HPageFreed flag to replace the PageHugeFreed interfaces. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Muchun Song <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24hugetlb: convert PageHugeTemporary() to HPageTemporary flagMike Kravetz2-29/+13
Use new hugetlb specific HPageTemporary flag to replace the PageHugeTemporary() interfaces. PageHugeTemporary does contain a PageHuge() check. However, this interface is only used within hugetlb code where we know we are dealing with a hugetlb page. Therefore, the check can be eliminated. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Muchun Song <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24hugetlb: convert page_huge_active() HPageMigratable flagMike Kravetz5-44/+25
Use the new hugetlb page specific flag HPageMigratable to replace the page_huge_active interfaces. By it's name, page_huge_active implied that a huge page was on the active list. However, that is not really what code checking the flag wanted to know. It really wanted to determine if the huge page could be migrated. This happens when the page is actually added to the page cache and/or task page table. This is the reasoning behind the name change. The VM_BUG_ON_PAGE() calls in the *_huge_active() interfaces are not really necessary as we KNOW the page is a hugetlb page. Therefore, they are removed. The routine page_huge_active checked for PageHeadHuge before testing the active bit. This is unnecessary in the case where we hold a reference or lock and know it is a hugetlb head page. page_huge_active is also called without holding a reference or lock (scan_movable_pages), and can race with code freeing the page. The extra check in page_huge_active shortened the race window, but did not prevent the race. Offline code calling scan_movable_pages already deals with these races, so removing the check is acceptable. Add comment to racy code. [[email protected]: remove set_page_huge_active() declaration from include/linux/hugetlb.h] Link: https://lkml.kernel.org/r/CAMZfGtUda+KoAZscU0718TN61cSFwp4zy=y2oZ=+6Z2TAZZwng@mail.gmail.com Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Muchun Song <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24hugetlb: use page.private for hugetlb specific page flagsMike Kravetz3-32/+96
Patch series "create hugetlb flags to consolidate state", v3. While discussing a series of hugetlb fixes in [1], it became evident that the hugetlb specific page state information is stored in a somewhat haphazard manner. Code dealing with state information would be easier to read, understand and maintain if this information was stored in a consistent manner. This series uses page.private of the hugetlb head page for storing a set of hugetlb specific page flags. Routines are priovided for test, set and clear of the flags. [1] https://lore.kernel.org/r/[email protected] This patch (of 4): As hugetlbfs evolved, state information about hugetlb pages was added. One 'convenient' way of doing this was to use available fields in tail pages. Over time, it has become difficult to know the meaning or contents of fields simply by looking at a small bit of code. Sometimes, the naming is just confusing. For example: The PagePrivate flag indicates a huge page reservation was consumed and needs to be restored if an error is encountered and the page is freed before it is instantiated. The page.private field contains the pointer to a subpool if the page is associated with one. In an effort to make the code more readable, use page.private to contain hugetlb specific page flags. These flags will have test, set and clear functions similar to those used for 'normal' page flags. More importantly, an enum of flag values will be created with names that actually reflect their purpose. In this patch, - Create infrastructure for hugetlb specific page flag functions - Move subpool pointer to page[1].private to make way for flags Create routines with meaningful names to modify subpool field - Use new HPageRestoreReserve flag instead of PagePrivate Conversion of other state information will happen in subsequent patches. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Muchun Song <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm: workingset: clarify eviction order and distance calculationOscar Salvador1-1/+1
The premise of the refault distance is that it can be seen as a deficit of the inactive list space, so that if the inactive list would have had (R - E) more slots, the page would not have been evicted but promoted to the active list instead. However, the way the code is ordered right now set us to be off by one, so the real number of slots would be (R - E) + 1. I stumbled upon this when trying to understand the code and it puzzled me that the comments did not match what the code did. This it not an issue at all since evictions and refaults tend to happen in a number large enough that being off-by-one does not have any impact - and since the compiler and CPUs are free to rearrange the execution sequence anyway. But as Johannes says, it is better to re-arrange the code in the proper order since otherwise would be misleading to somebody who is actively reading and trying to understand the logic of the code - like it happened to me. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/vmscan.c: make lruvec_lru_size() staticYu Zhao2-3/+2
All other references to the function were removed after commit b910718a948a ("mm: vmscan: detect file thrashing at the reclaim root"). Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Reviewed-by: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24include/linux/mm_inline.h: fold __update_lru_size() into its sole callerYu Zhao1-8/+1
All other references to the function were removed after commit a892cb6b977f ("mm/vmscan.c: use update_lru_size() in update_lru_sizes()"). Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Reviewed-by: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24include/linux/mm_inline.h: fold page_lru_base_type() into its sole callerYu Zhao1-21/+6
We've removed all other references to this function. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Reviewed-by: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm: VM_BUG_ON lru page flagsYu Zhao3-3/+4
Move scattered VM_BUG_ONs to two essential places that cover all lru list additions and deletions. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Cc: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm: add __clear_page_lru_flags() to replace page_off_lru()Yu Zhao3-24/+13
Similar to page_off_lru(), the new function does non-atomic clearing of PageLRU() in addition to PageActive() and PageUnevictable(), on a page that has no references left. If PageActive() and PageUnevictable() are both set, refuse to clear either and leave them to bad_page(). This is a behavior change that is meant to help debug. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Cc: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/swap.c: don't pass "enum lru_list" to del_page_from_lru_list()Yu Zhao5-23/+17
The parameter is redundant in the sense that it can be potentially extracted from the "struct page" parameter by page_lru(). We need to make sure that existing PageActive() or PageUnevictable() remains until the function returns. A few places don't conform, and simple reordering fixes them. This patch may have left page_off_lru() seemingly odd, and we'll take care of it in the next patch. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Cc: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/swap.c: don't pass "enum lru_list" to trace_mm_lru_insertion()Yu Zhao2-11/+5
The parameter is redundant in the sense that it can be extracted from the "struct page" parameter by page_lru() correctly. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Reviewed-by: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm: don't pass "enum lru_list" to lru list addition functionsYu Zhao3-14/+15
The "enum lru_list" parameter to add_page_to_lru_list() and add_page_to_lru_list_tail() is redundant in the sense that it can be extracted from the "struct page" parameter by page_lru(). A caveat is that we need to make sure PageActive() or PageUnevictable() is correctly set or cleared before calling these two functions. And they are indeed. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Cc: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24include/linux/mm_inline.h: shuffle lru list addition and deletion functionsYu Zhao1-21/+21
These functions will call page_lru() in the following patches. Move them below page_lru() to avoid the forward declaration. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: Alex Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/vmscan.c: use add_page_to_lru_list()Yu Zhao1-5/+1
Patch series "mm: lru related cleanups", v2. The cleanups are intended to reduce the verbosity in lru list operations and make them less error-prone. A typical example would be how the patches change __activate_page(): static void __activate_page(struct page *page, struct lruvec *lruvec) { if (!PageActive(page) && !PageUnevictable(page)) { - int lru = page_lru_base_type(page); int nr_pages = thp_nr_pages(page); - del_page_from_lru_list(page, lruvec, lru); + del_page_from_lru_list(page, lruvec); SetPageActive(page); - lru += LRU_ACTIVE; - add_page_to_lru_list(page, lruvec, lru); + add_page_to_lru_list(page, lruvec); trace_mm_lru_activate(page); There are a few more places like __activate_page() and they are unnecessarily repetitive in terms of figuring out which list a page should be added onto or deleted from. And with the duplicated code removed, they are easier to read, IMO. Patch 1 to 5 basically cover the above. Patch 6 and 7 make code more robust by improving bug reporting. Patch 8, 9 and 10 take care of some dangling helpers left in header files. This patch (of 10): There is add_page_to_lru_list(), and move_pages_to_lru() should reuse it, not duplicate it. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Reviewed-by: Alex Shi <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/workingset.c: avoid unnecessary max_nodes estimation in count_shadow_nodes()Miaohe Lin1-3/+2
If list_lru_shrink_count is 0, we always return SHRINK_EMPTY regardless of the value of max_nodes. So we can return early if nodes == 0 to save some cpu cycles of approximating a reasonable limit for the nodes. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/vmscan: __isolate_lru_page_prepare() cleanupAlex Shi3-39/+33
The function just returns 2 results, so using a 'switch' to deal with its result is unnecessary. Also simplify it to a bool func as Vlastimil suggested. Also remove 'goto' by reusing list_move(), and take Matthew Wilcox's suggestion to update comments in function. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alex Shi <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb: suppress wrong warning info when alloc gigantic pageChen Wandun1-2/+2
If hugetlb_cma is enabled, it will skip boot time allocation when allocating gigantic page, that doesn't means allocation failure, so suppress this warning info. Link: https://lkml.kernel.org/r/[email protected] Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") Signed-off-by: Chen Wandun <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24hugetlb: fix copy_huge_page_from_user contig page struct assumptionMike Kravetz1-4/+6
page structs are not guaranteed to be contiguous for gigantic pages. The routine copy_huge_page_from_user can encounter gigantic pages, yet it assumes page structs are contiguous when copying pages from user space. Since page structs for the target gigantic page are not contiguous, the data copied from user space could overwrite other pages not associated with the gigantic page and cause data corruption. Non-contiguous page structs are generally not an issue. However, they can exist with a specific kernel configuration and hotplug operations. For example: Configure the kernel with CONFIG_SPARSEMEM and !CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where the gigantic page will be allocated. Link: https://lkml.kernel.org/r/[email protected] Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Mike Kravetz <[email protected]> Cc: Zi Yan <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Joao Martins <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24hugetlb: fix update_and_free_page contig page struct assumptionMike Kravetz1-2/+4
page structs are not guaranteed to be contiguous for gigantic pages. The routine update_and_free_page can encounter a gigantic page, yet it assumes page structs are contiguous when setting page flags in subpages. If update_and_free_page encounters non-contiguous page structs, we can see “BUG: Bad page state in process …” errors. Non-contiguous page structs are generally not an issue. However, they can exist with a specific kernel configuration and hotplug operations. For example: Configure the kernel with CONFIG_SPARSEMEM and !CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where the gigantic page will be allocated. Zi Yan outlined steps to reproduce here [1]. [1] https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime") Signed-off-by: Zi Yan <[email protected]> Signed-off-by: Mike Kravetz <[email protected]> Cc: Zi Yan <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Joao Martins <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb: use helper huge_page_size() to get hugepage sizeMiaohe Lin1-8/+6
We can use helper huge_page_size() to get the hugepage size directly to simplify the code slightly. [[email protected]: use helper huge_page_size() to get hugepage size] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb: remove unnecessary VM_BUG_ON_PAGE on putback_active_hugepage()Miaohe Lin1-1/+0
All callers know they are operating on a hugetlb head page. So this VM_BUG_ON_PAGE can not catch anything useful. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb: use helper function range_in_vma() in page_table_shareable()Miaohe Lin1-1/+1
We could use helper function range_in_vma() to check whether the vma is in the desired range to simplify the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24hugetlb_cgroup: use helper pages_per_huge_page() in hugetlb_cgroupMiaohe Lin1-3/+3
We could use helper function pages_per_huge_page() to get the number of pages in a hstate to simplify the code slightly. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/pmem: avoid inserting hugepage PTE entry with fsdax if hugepage support ↵Aneesh Kumar K.V2-7/+14
is disabled Differentiate between hardware not supporting hugepages and user disabling THP via 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' For the devdax namespace, the kernel handles the above via the supported_alignment attribute and failing to initialize the namespace if the namespace align value is not supported on the platform. For the fsdax namespace, the kernel will continue to initialize the namespace. This can result in the kernel creating a huge pte entry even though the hardware don't support the same. We do want hugepage support with pmem even if the end-user disabled THP via sysfs file (/sys/kernel/mm/transparent_hugepage/enabled). Hence differentiate between hardware/firmware lacking support vs user-controlled disable of THP and prevent a huge fault if the hardware lacks hugepage support. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Dan Williams <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Jan Kara <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/huge_memory.c: remove unused return value of set_huge_zero_page()Miaohe Lin1-3/+2
The return value of set_huge_zero_page() is always ignored. So we should drop such return value. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb.c: fix typos in commentsZhiyuan Dai1-1/+1
Fix typo in comment. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhiyuan Dai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb: remove redundant check in preparing and destroying gigantic pageYanfei Xu1-5/+2
Gigantic page is a compound page and its order is more than 1. Thus it must be available for hpage_pincount. Let's remove the redundant check for gigantic page. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yanfei Xu <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb: fix some comment typosMiaohe Lin2-4/+4
Fix typos sasitfy to satisfy, reservtion to reservation, hugegpage to hugepage and uniprocesor to uniprocessor in comments. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Souptick Joarder <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb: refactor subpage recordingJoao Martins1-21/+28
For a given hugepage backing a VA, there's a rather ineficient loop which is solely responsible for storing subpages in GUP @pages/@vmas array. For each subpage we check whether it's within range or size of @pages and keep increment @pfn_offset and a couple other variables per subpage iteration. Simplify this logic and minimize the cost of each iteration to just store the output page/vma. Instead of incrementing number of @refs iteratively, we do it through pre-calculation of @refs and only with a tight loop for storing pinned subpages/vmas. Additionally, retain existing behaviour with using mem_map_offset() when recording the subpages for configurations that don't have a contiguous mem_map. pinning consequently improves bringing us close to {pin,get}_user_pages_fast: - 16G with 1G huge page size gup_test -f /mnt/huge/file -m 16384 -r 30 -L -S -n 512 -w PIN_LONGTERM_BENCHMARK: ~12.8k us -> ~5.8k us PIN_FAST_BENCHMARK: ~3.7k us Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb: grab head page refcount once for group of subpagesJoao Martins3-22/+29
Patch series "mm/hugetlb: follow_hugetlb_page() improvements", v2. While looking at ZONE_DEVICE struct page reuse particularly the last patch[0], I found two possible improvements for follow_hugetlb_page() which is solely used for get_user_pages()/pin_user_pages(). The first patch batches page refcount updates while the second tidies up storing the subpages/vmas. Both together bring the cost of slow variant of gup() cost from ~87.6k usecs to ~5.8k usecs. libhugetlbfs tests seem to pass as well gup_test benchmarks with hugetlbfs vmas. This patch (of 2): follow_hugetlb_page() once it locks the pmd/pud, checks all its N subpages in a huge page and grabs a reference for each one. Similar to gup-fast, have follow_hugetlb_page() grab the head page refcount only after counting all its subpages that are part of the just faulted huge page. Consequently we reduce the number of atomics necessary to pin said huge page, which improves non-fast gup() considerably: - 16G with 1G huge page size gup_test -f /mnt/huge/file -m 16384 -r 10 -L -S -n 512 -w PIN_LONGTERM_BENCHMARK: ~87.6k us -> ~12.8k us Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb: simplify the calculation of variablesJiapeng Zhong1-2/+1
Fix the following coccicheck warnings: mm/hugetlb.c:3372:20-22: WARNING !A || A && B is equivalent to !A || B. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiapeng Zhong <[email protected]> Reported-by: Abaci Robot <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24mm/hugetlb: fix use after free when subpool max_hpages accounting is not enabledMiaohe Lin1-3/+13
If a hugetlbfs filesystem is created with the min_size option and without the size option, used_hpages is always 0 and might lead to release subpool prematurely because it indicates no pages are used now while there might be. In order to fix this issue, we should check used_hpages == 0 iff max_hpages accounting is enabled. As max_hpages accounting should be enabled in most common case, this is not worth a Cc stable. [[email protected]: new changelog] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hongxiang Lou <[email protected]> Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>