path: root/mm/hugetlb.c
2021-02-24  mm/hugetlb: remove unnecessary VM_BUG_ON_PAGE on putback_active_hugepage()  (Miaohe Lin, 1 file, -1/+0)
All callers know they are operating on a hugetlb head page, so this VM_BUG_ON_PAGE cannot catch anything useful. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/hugetlb: use helper function range_in_vma() in page_table_shareable()  (Miaohe Lin, 1 file, -1/+1)
We could use the helper function range_in_vma() to check whether the vma is in the desired range, simplifying the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
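For reference, the helper in include/linux/mm.h is essentially:

    static inline bool range_in_vma(struct vm_area_struct *vma,
                                    unsigned long start, unsigned long end)
    {
            return (vma && vma->vm_start <= start && end <= vma->vm_end);
    }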
2021-02-24  mm/hugetlb.c: fix typos in comments  (Zhiyuan Dai, 1 file, -1/+1)
Fix typo in comment. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhiyuan Dai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/hugetlb: remove redundant check in preparing and destroying gigantic page  (Yanfei Xu, 1 file, -5/+2)
Gigantic page is a compound page and its order is more than 1. Thus it must be available for hpage_pincount. Let's remove the redundant check for gigantic page. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yanfei Xu <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/hugetlb: fix some comment typos  (Miaohe Lin, 1 file, -3/+3)
Fix typos sasitfy to satisfy, reservtion to reservation, hugegpage to hugepage and uniprocesor to uniprocessor in comments. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Souptick Joarder <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/hugetlb: refactor subpage recording  (Joao Martins, 1 file, -21/+28)
For a given hugepage backing a VA, there's a rather inefficient loop which is solely responsible for storing subpages in the GUP @pages/@vmas array. For each subpage we check whether it's within range or size of @pages, and keep incrementing @pfn_offset and a couple of other variables per subpage iteration. Simplify this logic and minimize the cost of each iteration to just storing the output page/vma. Instead of incrementing the number of @refs iteratively, pre-calculate @refs and use only a tight loop for storing pinned subpages/vmas. Additionally, retain the existing behaviour of using mem_map_offset() when recording the subpages, for configurations that don't have a contiguous mem_map. Pinning consequently improves, bringing us close to {pin,get}_user_pages_fast:

  - 16G with 1G huge page size
    gup_test -f /mnt/huge/file -m 16384 -r 30 -L -S -n 512 -w

    PIN_LONGTERM_BENCHMARK: ~12.8k us -> ~5.8k us
    PIN_FAST_BENCHMARK:     ~3.7k us

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
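A sketch of the resulting tight loop (simplified; variable names approximate the patch):

    /*
     * refs was pre-calculated: the number of subpages to record, bounded
     * by the huge page end, the vma end, and the caller's remainder.
     */
    for (i = 0; i < refs; i++) {
            if (pages)
                    /* mem_map_offset() copes with a non-contiguous mem_map */
                    pages[i] = mem_map_offset(page, pfn_offset + i);
            if (vmas)
                    vmas[i] = vma;
    }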
2021-02-24  mm/hugetlb: grab head page refcount once for group of subpages  (Joao Martins, 1 file, -19/+24)
Patch series "mm/hugetlb: follow_hugetlb_page() improvements", v2. While looking at ZONE_DEVICE struct page reuse particularly the last patch[0], I found two possible improvements for follow_hugetlb_page() which is solely used for get_user_pages()/pin_user_pages(). The first patch batches page refcount updates while the second tidies up storing the subpages/vmas. Both together bring the cost of slow variant of gup() cost from ~87.6k usecs to ~5.8k usecs. libhugetlbfs tests seem to pass as well gup_test benchmarks with hugetlbfs vmas. This patch (of 2): follow_hugetlb_page() once it locks the pmd/pud, checks all its N subpages in a huge page and grabs a reference for each one. Similar to gup-fast, have follow_hugetlb_page() grab the head page refcount only after counting all its subpages that are part of the just faulted huge page. Consequently we reduce the number of atomics necessary to pin said huge page, which improves non-fast gup() considerably: - 16G with 1G huge page size gup_test -f /mnt/huge/file -m 16384 -r 10 -L -S -n 512 -w PIN_LONGTERM_BENCHMARK: ~87.6k us -> ~12.8k us Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joao Martins <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/hugetlb: simplify the calculation of variables  (Jiapeng Zhong, 1 file, -2/+1)
Fix the following coccicheck warnings: mm/hugetlb.c:3372:20-22: WARNING !A || A && B is equivalent to !A || B. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiapeng Zhong <[email protected]> Reported-by: Abaci Robot <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
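The shape of the simplification, with hypothetical identifiers (the warning is about the boolean structure, not these names):

    /* before: !A || (A && B) */
    if (!vma_shareable || (vma_shareable && pmd_shared))
            ...

    /* after: logically equivalent to !A || B */
    if (!vma_shareable || pmd_shared)
            ...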
2021-02-24  mm/hugetlb: fix use after free when subpool max_hpages accounting is not enabled  (Miaohe Lin, 1 file, -3/+13)
If a hugetlbfs filesystem is created with the min_size option but without the size option, used_hpages is always 0. This might lead to releasing the subpool prematurely, because a zero count suggests no pages are in use while there might be some. To fix this, only trust used_hpages == 0 when max_hpages accounting is enabled. As max_hpages accounting should be enabled in the most common case, this is not worth a Cc stable. [[email protected]: new changelog] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hongxiang Lou <[email protected]> Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
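The fix amounts to a predicate along these lines (a sketch close to the subpool_is_free() helper that was merged):

    static inline bool subpool_is_free(struct hugepage_subpool *spool)
    {
            if (spool->count)
                    return false;
            if (spool->max_hpages != -1)            /* max accounting enabled */
                    return spool->used_hpages == 0;
            if (spool->min_hpages != -1)            /* min accounting enabled */
                    return spool->rsv_hpages == spool->min_hpages;

            return true;
    }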
2021-02-24  mm/hugetlb: use helper huge_page_order and pages_per_huge_page  (Miaohe Lin, 1 file, -2/+2)
Since commit a5516438959d ("hugetlb: modular state for hugetlb page size"), we can use huge_page_order to access hstate->order and pages_per_huge_page to fetch the pages per huge page. But gather_bootmem_prealloc() forgot to use it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-24  mm/hugetlb: avoid unnecessary hugetlb_acct_memory() call  (Miaohe Lin, 1 file, -0/+3)
When reservation accounting remains unchanged, hugetlb_acct_memory() does nothing except hold and release hugetlb_lock. Avoid this unnecessary hugetlb_lock lock/unlock cycle, which happens on 'most' hugetlb munmap operations, by checking delta against 0 at the beginning of hugetlb_acct_memory(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
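The change is a simple early return; a sketch:

    static int hugetlb_acct_memory(struct hstate *h, long delta)
    {
            int ret = -ENOMEM;

            /* skip the hugetlb_lock round-trip when there is nothing to do */
            if (!delta)
                    return 0;

            spin_lock(&hugetlb_lock);
            /* ... existing accounting unchanged ... */
            spin_unlock(&hugetlb_lock);
            return ret;
    }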
2021-02-24  mm/hugetlb.c: fix unnecessary address expansion of pmd sharing  (Li Xinhai, 1 file, -10/+12)
The current code would unnecessarily expand the address range. Consider one example: (start, end) = (1G-2M, 3G+2M) and (vm_start, vm_end) = (1G-4M, 3G+4M). The expected adjustment is to keep (1G-2M, 3G+2M) without expansion, but the current result is (1G-4M, 3G+4M). Actually, the ranges (1G-4M, 1G) and (3G, 3G+4M) would never be involved in pmd sharing. After this patch, we check that the vma spans at least one PUD-aligned size and that the (start, end) range overlaps the aligned range of the vma. With the above example, the aligned vma range is (1G, 3G), so if the (start, end) range is within (1G-4M, 1G) or within (3G, 3G+4M), then neither start nor end is adjusted. Otherwise, we have the chance to adjust start downwards or end upwards without exceeding (vm_start, vm_end).

Mike:
: The 'adjusted range' is used for calls to mmu notifiers and cache(tlb)
: flushing. Since the current code unnecessarily expands the range in some
: cases, more entries than necessary would be flushed. This would/could
: result in performance degradation. However, this is highly dependent on
: the user runtime. Is there a combination of vma layout and calls to
: actually hit this issue? If the issue is hit, will those entries
: unnecessarily flushed be used again and need to be unnecessarily reloaded?

Link: https://lkml.kernel.org/r/[email protected] Fixes: 75802ca66354 ("mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible") Signed-off-by: Li Xinhai <[email protected]> Suggested-by: Mike Kravetz <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
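In code form, the new check reads roughly as follows (a sketch of adjust_range_if_pmd_sharing_possible()):

    unsigned long v_start = ALIGN(vma->vm_start, PUD_SIZE);
    unsigned long v_end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);

    /*
     * vma needs to span at least one aligned PUD size, and the
     * (start, end) range must overlap the vma's aligned range.
     */
    if (!(vma->vm_flags & VM_MAYSHARE) || !(v_end > v_start) ||
        (*end <= v_start) || (*start >= v_end))
            return;

    /* Extend the range to be PUD aligned, never past vm_start/vm_end */
    if (*start > v_start)
            *start = ALIGN_DOWN(*start, PUD_SIZE);
    if (*end < v_end)
            *end = ALIGN(*end, PUD_SIZE);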
2021-02-24  mm/hugetlb: fix potential double free in hugetlb_register_node() error path  (Miaohe Lin, 1 file, -1/+3)
In hugetlb_sysfs_add_hstate(), we would do kobject_put() on hstate_kobjs when failed to create sysfs group but forget to set hstate_kobjs to NULL. Then in hugetlb_register_node() error path, we may free it again via hugetlb_unregister_node(). Link: https://lkml.kernel.org/r/[email protected] Fixes: a3437870160c ("hugetlb: new sysfs interface") Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
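The fix is to NULL the cached pointer right after dropping the reference, roughly:

    if (retval) {
            kobject_put(hstate_kobjs[hi]);
            /* otherwise hugetlb_unregister_node() would put it again */
            hstate_kobjs[hi] = NULL;
    }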
2021-02-21  Merge tag 'core-mm-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -16/+2)
Pull tlb gather updates from Ingo Molnar: "These fix MM (soft-)dirty bit management in the procfs code & clean up the TLB gather API"

* tag 'core-mm-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ldt: Use tlb_gather_mmu_fullmm() when freeing LDT page-tables
  tlb: arch: Remove empty __tlb_remove_tlb_entry() stubs
  tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu()
  tlb: mmu_gather: Introduce tlb_gather_mmu_fullmm()
  tlb: mmu_gather: Remove unused start/end arguments from tlb_finish_mmu()
  mm: proc: Invalidate TLB after clearing soft-dirty page state
2021-02-05  mm: hugetlb: fix missing put_page in gather_surplus_pages()  (Muchun Song, 1 file, -1/+4)
The VM_BUG_ON_PAGE avoids the generation of any code, even if that expression has side-effects when !CONFIG_DEBUG_VM. Link: https://lkml.kernel.org/r/[email protected] Fixes: e5dfacebe4a4 ("mm/hugetlb.c: just use put_page_testzero() instead of page_count()") Signed-off-by: Muchun Song <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
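The problematic pattern and the fix, in sketch form (the put_page_testzero() call sits inside the macro argument, so it vanishes together with the macro):

    /*
     * Buggy: with !CONFIG_DEBUG_VM the whole expression, including the
     * put_page_testzero() side effect, compiles away and the reference
     * is never dropped.
     */
    VM_BUG_ON_PAGE(!put_page_testzero(page), page);

    /* Fixed: perform the put unconditionally, assert on the result. */
    zeroed = put_page_testzero(page);
    VM_BUG_ON_PAGE(!zeroed, page);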
2021-02-05  mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active  (Muchun Song, 1 file, -2/+1)
page_huge_active() can be called from scan_movable_pages(), which does not hold a reference count on the HugeTLB page. So when we call page_huge_active() from scan_movable_pages(), the HugeTLB page can be freed in parallel. Then we will trigger the BUG_ON inside page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the VM_BUG_ON_PAGE. Link: https://lkml.kernel.org/r/[email protected] Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()") Signed-off-by: Muchun Song <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Yang Shi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-05  mm: hugetlb: fix a race between isolating and freeing page  (Muchun Song, 1 file, -2/+2)
There is a race between isolate_huge_page() and __free_huge_page():

  CPU0:                                    CPU1:

  if (PageHuge(page))
                                           put_page(page)
                                             __free_huge_page(page)
                                               spin_lock(&hugetlb_lock)
                                               update_and_free_page(page)
                                                 set_compound_page_dtor(page,
                                                   NULL_COMPOUND_DTOR)
                                               spin_unlock(&hugetlb_lock)
    isolate_huge_page(page)
      // trigger BUG_ON
      VM_BUG_ON_PAGE(!PageHead(page), page)
      spin_lock(&hugetlb_lock)
      page_huge_active(page)
        // trigger BUG_ON
        VM_BUG_ON_PAGE(!PageHuge(page), page)
      spin_unlock(&hugetlb_lock)

We isolate a HugeTLB page on CPU0 while, meanwhile, CPU1 frees it to the buddy allocator. CPU0 then triggers a BUG_ON, because the page has already been freed to the buddy allocator. Link: https://lkml.kernel.org/r/[email protected] Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Signed-off-by: Muchun Song <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Yang Shi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-05  mm: hugetlb: fix a race between freeing and dissolving the page  (Muchun Song, 1 file, -0/+39)
There is a race condition between __free_huge_page() and dissolve_free_huge_page():

  CPU0:                                    CPU1:

  // page_count(page) == 1
  put_page(page)
                                           dissolve_free_huge_page(page)
                                             spin_lock(&hugetlb_lock)
                                             // PageHuge(page) && !page_count(page)
                                             update_and_free_page(page)
                                             // page is freed to the buddy
                                             spin_unlock(&hugetlb_lock)
    __free_huge_page(page)
      spin_lock(&hugetlb_lock)
      clear_page_huge_active(page)
      enqueue_huge_page(page)
      // It is wrong, the page is already freed
      spin_unlock(&hugetlb_lock)

The race window is between put_page() and dissolve_free_huge_page(); in it, __free_huge_page() would corrupt page(s) already freed to the buddy allocator. We should make sure that the page is already on the free list when it is dissolved. Link: https://lkml.kernel.org/r/[email protected] Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Signed-off-by: Muchun Song <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Yang Shi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-05  mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page  (Muchun Song, 1 file, -1/+1)
If a new hugetlb page is allocated during fallocate, it will not be marked as active (set_page_huge_active), which will result in a later isolate_huge_page failure when the page migration code tries to move that page. Such a failure would be unexpected and wrong. Only export set_page_huge_active; leave clear_page_huge_active static, because it has no external users. Link: https://lkml.kernel.org/r/[email protected] Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate()) Signed-off-by: Muchun Song <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Yang Shi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-01-29  tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu()  (Will Deacon, 1 file, -15/+1)
The 'start' and 'end' arguments to tlb_gather_mmu() are no longer needed now that there is a separate function for 'fullmm' flushing. Remove the unused arguments and update all callers. Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Yu Zhao <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/CAHk-=wjQWa14_4UpfDf=fiineNP+RH74kZeDMo_f1D35xNzq9w@mail.gmail.com
2021-01-29  tlb: mmu_gather: Remove unused start/end arguments from tlb_finish_mmu()  (Will Deacon, 1 file, -1/+1)
Since commit 7a30df49f63a ("mm: mmu_gather: remove __tlb_reset_range() for force flush"), the 'start' and 'end' arguments to tlb_finish_mmu() are no longer used, since we flush the whole mm in case of a nested invalidation. Remove the unused arguments and update all callers. Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Yu Zhao <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Linus Torvalds <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-01-12  mm/hugetlb: fix potential missing huge page size info  (Miaohe Lin, 1 file, -1/+1)
The huge page size is encoded for VM_FAULT_HWPOISON errors only. So if we return VM_FAULT_HWPOISON, huge page size would just be ignored. Link: https://lkml.kernel.org/r/[email protected] Fixes: aa50d3a7aa81 ("Encode huge page size for VM_FAULT_HWPOISON errors") Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-29  mm/hugetlb: fix deadlock in hugetlb_cow error path  (Mike Kravetz, 1 file, -1/+21)
syzbot reported the deadlock here [1]. The issue is in hugetlb cow error handling when there are not enough huge pages for the faulting task which took the original reservation. It is possible that other (child) tasks could have consumed pages associated with the reservation. In this case, we want the task which took the original reservation to succeed. So, we unmap any associated pages in children so that they can be used by the faulting task that owns the reservation. The unmapping code needs to hold i_mmap_rwsem in write mode. However, due to commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") we are already holding i_mmap_rwsem in read mode when hugetlb_cow is called. Technically, i_mmap_rwsem does not need to be held in read mode for COW mappings as they can not share pmd's. Modifying the fault code to not take i_mmap_rwsem in read mode for COW (and other non-sharable) mappings is too involved for a stable fix. Instead, we simply drop the hugetlb_fault_mutex and i_mmap_rwsem before unmapping. This is OK as it is technically not needed. They are reacquired after unmapping as expected by calling code. Since this is done in an uncommon error path, the overhead of dropping and reacquiring mutexes is acceptable. While making changes, remove redundant BUG_ON after unmap_ref_private. [1] https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Signed-off-by: Mike Kravetz <[email protected]> Reported-by: [email protected] Cc: Naoya Horiguchi <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15  mm: use sysfs_emit for struct kobject * uses  (Joe Perches, 1 file, -6/+7)
Patch series "mm: Convert sysfs sprintf family to sysfs_emit", v2. Use the new sysfs_emit family and not the sprintf family. This patch (of 5): Use the sysfs_emit function instead of the sprintf family. Done with cocci script as in commit 3c6bff3cf988 ("RDMA: Convert sysfs kobject * show functions to use sysfs_emit()") Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/9c249215bad6df616ba0410ad980042694970c1b.1605376435.git.joe@perches.com Signed-off-by: Joe Perches <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15  hugetlb: fix an error code in hugetlb_reserve_pages()  (Dan Carpenter, 1 file, -0/+1)
Preserve the error code from region_add() instead of returning success. Link: https://lkml.kernel.org/r/X9NGZWnZl5/Mt99R@mwanda Fixes: 0db9d74ed884 ("hugetlb: disable region_add file_region coalescing") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Mina Almasry <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
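A sketch of the fix (argument list abbreviated; 'add' holds region_add()'s return value):

    add = region_add(resv_map, from, to, /* ... */);
    if (unlikely(add < 0)) {
            hugetlb_acct_memory(h, -gbl_reserve);
            ret = add;      /* was missing: fell through returning 0 */
            goto out_put_pages;
    }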
2020-12-15  mm,hugetlb: remove unneeded initialization  (Oscar Salvador, 1 file, -2/+0)
hugetlb_add_hstate initializes nr_huge_pages and free_huge_pages to 0, but since hstates[] is a global variable, all its fields are defined to 0 already. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-15  mm: hugetlb: fix type of delta parameter and related local variables in gather_surplus_pages()  (Liu Xiang, 1 file, -3/+4)
On a 64-bit machine, the delta variable in hugetlb_acct_memory() may be larger than 0xffffffff, but gather_surplus_pages() can only use the low 32-bit value now. So we need to fix the type of the delta parameter and of the related local variables in gather_surplus_pages(). Link: https://lkml.kernel.org/r/[email protected] Reported-by: Ma Chenggong <[email protected]> Signed-off-by: Liu Xiang <[email protected]> Signed-off-by: Pan Jiagen <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Liu Xiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
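In sketch form:

    /* before: 'delta' (and related locals) were int, truncating a long */
    static int gather_surplus_pages(struct hstate *h, int delta)

    /* after: match hugetlb_acct_memory()'s long delta */
    static int gather_surplus_pages(struct hstate *h, long delta)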
2020-12-15  mm/hugetlb.c: just use put_page_testzero() instead of page_count()  (Hui Su, 1 file, -2/+1)
We test whether the page reference count is zero here; it would be a bug if the reference count were not zero. So we can just use put_page_testzero() instead of page_count(). Link: https://lkml.kernel.org/r/20201007170949.GA6416@rlk Signed-off-by: Hui Su <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
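Sketch of the change (note that the 2021-02-05 entry above, "fix missing put_page in gather_surplus_pages()", later reworks this exact spot, because the side effect disappears when !CONFIG_DEBUG_VM):

    /* before: drop the buddy allocator's reference, then assert separately */
    put_page_testzero(page);
    VM_BUG_ON_PAGE(page_count(page), page);

    /* after: assert directly on the result of the atomic decrement */
    VM_BUG_ON_PAGE(!put_page_testzero(page), page);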
2020-12-15  vm_ops: rename .split() callback to .may_split()  (Dmitry Safonov, 1 file, -1/+1)
Rename the callback to reflect that it's not called *on* or *after* split, but rather some time before the splitting to check if it's possible. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Dmitry Safonov <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Brian Geffon <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Russell King <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-12-11  mm/hugetlb: clear compound_nr before freeing gigantic pages  (Gerald Schaefer, 1 file, -0/+1)
Commit 1378a5ee451a ("mm: store compound_nr as well as compound_order") added a compound_nr counter to the first tail struct page, overlaying with page->mapping. The overlay itself is fine, but while freeing gigantic hugepages via free_contig_range(), a "bad page" check will trigger for non-NULL page->mapping on the first tail page:

  BUG: Bad page state in process bash  pfn:380001
  page:00000000c35f0856 refcount:0 mapcount:0 mapping:00000000126b68aa index:0x0 pfn:0x380001
  aops:0x0
  flags: 0x3ffff00000000000()
  raw: 3ffff00000000000 0000000000000100 0000000000000122 0000000100000000
  raw: 0000000000000000 0000000000000000 ffffffff00000000 0000000000000000
  page dumped because: non-NULL mapping
  Modules linked in:
  CPU: 6 PID: 616 Comm: bash Not tainted 5.10.0-rc7-next-20201208 #1
  Hardware name: IBM 3906 M03 703 (LPAR)
  Call Trace:
    show_stack+0x6e/0xe8
    dump_stack+0x90/0xc8
    bad_page+0xd6/0x130
    free_pcppages_bulk+0x26a/0x800
    free_unref_page+0x6e/0x90
    free_contig_range+0x94/0xe8
    update_and_free_page+0x1c4/0x2c8
    free_pool_huge_page+0x11e/0x138
    set_max_huge_pages+0x228/0x300
    nr_hugepages_store_common+0xb8/0x130
    kernfs_fop_write+0xd2/0x218
    vfs_write+0xb0/0x2b8
    ksys_write+0xac/0xe0
    system_call+0xe6/0x288
  Disabling lock debugging due to kernel taint

This is because only the compound_order is cleared in destroy_compound_gigantic_page(), and compound_nr is set to 1U << order == 1 for order 0 in set_compound_order(page, 0). Fix this by explicitly clearing compound_nr for the first tail page after calling set_compound_order(page, 0). Link: https://lkml.kernel.org/r/[email protected] Fixes: 1378a5ee451a ("mm: store compound_nr as well as compound_order") Signed-off-by: Gerald Schaefer <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: <[email protected]> [5.9+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
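A sketch of the fix in destroy_compound_gigantic_page():

    set_compound_order(page, 0);
    /*
     * set_compound_order(page, 0) stores compound_nr = 1U << 0 == 1
     * in the first tail page, not 0; clear it explicitly so the
     * page->mapping overlay is NULL when the page hits the buddy.
     */
    page[1].compound_nr = 0;
    __ClearPageHead(page);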
2020-11-14  hugetlbfs: fix anon huge page migration race  (Mike Kravetz, 1 file, -85/+5)
Qian Cai reported the following BUG in [1]:

  LTP: starting move_pages12
  BUG: unable to handle page fault for address: ffffffffffffffe0
  ...
  RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170 avc_start_pgoff at mm/interval_tree.c:63
  Call Trace:
    rmap_walk_anon+0x141/0xa30 rmap_walk_anon at mm/rmap.c:1864
    try_to_unmap+0x209/0x2d0 try_to_unmap at mm/rmap.c:1763
    migrate_pages+0x1005/0x1fb0
    move_pages_and_store_status.isra.47+0xd7/0x1a0
    __x64_sys_move_pages+0xa5c/0x1100
    do_syscall_64+0x5f/0x310
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Hugh Dickins diagnosed this as a migration bug caused by code introduced to use i_mmap_rwsem for pmd sharing synchronization. Specifically, the routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED flag to try_to_unmap() while holding i_mmap_rwsem. This is wrong for anon pages, as the anon_vma_lock should be held in this case. Further analysis suggested that i_mmap_rwsem was not required to be held at all when calling try_to_unmap for anon pages, as an anon page could never be part of a shared pmd mapping. Discussion also revealed that the hack in hugetlb_page_mapping_lock_write to drop the page lock and acquire i_mmap_rwsem is wrong. There is no way to keep the mapping valid while dropping the page lock. This patch does the following:

- Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when calling try_to_unmap.
- Remove the hacky code in hugetlb_page_mapping_lock_write. The routine will now simply do a 'trylock' while still holding the page lock. If the trylock fails, it will return NULL. This could impact the callers:
  - migration calling code will receive -EAGAIN and retry up to the hard coded limit (10).
  - memory error code will treat the page as BUSY. This will force killing (SIGKILL) instead of SIGBUS any mapping tasks.

Do note that this change in behavior only happens when there is a race. None of the standard kernel testing suites actually hit this race, but it is possible. [1] https://lore.kernel.org/lkml/[email protected]/ [2] https://lore.kernel.org/linux-mm/[email protected]/ Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Reported-by: Qian Cai <[email protected]> Suggested-by: Hugh Dickins <[email protected]> Signed-off-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-11-02  hugetlb_cgroup: fix reservation accounting  (Mike Kravetz, 1 file, -9/+11)
Michal Privoznik was using "free page reporting" in QEMU/virtio-balloon with hugetlbfs and hit the warning below. QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE) to discard pages that are reported as free by a VM. The reporting granularity is pageblock granularity. So when the guest reports 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.

  WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
  Modules linked in: ...
  CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
  Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
  RIP: 0010:page_counter_uncharge+0x4b/0x50
  ...
  Call Trace:
    hugetlb_cgroup_uncharge_file_region+0x4b/0x80
    region_del+0x1d3/0x300
    hugetlb_unreserve_pages+0x39/0xb0
    remove_inode_hugepages+0x1a8/0x3d0
    hugetlbfs_fallocate+0x3c4/0x5c0
    vfs_fallocate+0x146/0x290
    __x64_sys_fallocate+0x3e/0x70
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Investigation of the issue uncovered bugs in hugetlb cgroup reservation accounting. This patch addresses the found issues. Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings") Reported-by: Michal Privoznik <[email protected]> Co-developed-by: David Hildenbrand <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Michal Privoznik <[email protected]> Reviewed-by: Mina Almasry <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Cc: <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Tejun Heo <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-15  Merge tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping  (Linus Torvalds, 1 file, -2/+2)
Pull dma-mapping updates from Christoph Hellwig:

- rework the non-coherent DMA allocator
- move private definitions out of <linux/dma-mapping.h>
- lower CMA_ALIGNMENT (Paul Cercueil)
- remove the omap1 dma address translation in favor of the common code
- make dma-direct aware of multiple dma offset ranges (Jim Quinlan)
- support per-node DMA CMA areas (Barry Song)
- increase the default seg boundary limit (Nicolin Chen)
- misc fixes (Robin Murphy, Thomas Tai, Xu Wang)
- various cleanups

* tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
  ARM/ixp4xx: add a missing include of dma-map-ops.h
  dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
  dma-direct: factor out a dma_direct_alloc_from_pool helper
  dma-direct check for highmem pages in dma_direct_alloc_pages
  dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h>
  dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dma
  dma-mapping: move dma-debug.h to kernel/dma/
  dma-mapping: remove <asm/dma-contiguous.h>
  dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h>
  dma-contiguous: remove dma_contiguous_set_default
  dma-contiguous: remove dev_set_cma_area
  dma-contiguous: remove dma_declare_contiguous
  dma-mapping: split <linux/dma-mapping.h>
  cma: decrease CMA_ALIGNMENT lower limit to 2
  firewire-ohci: use dma_alloc_pages
  dma-iommu: implement ->alloc_noncoherent
  dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
  dma-mapping: add a new dma_alloc_pages API
  dma-mapping: remove dma_cache_sync
  53c700: convert to dma_alloc_noncoherent
  ...
2020-10-14  Merge tag 'driver-core-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core  (Linus Torvalds, 1 file, -8/+10)
Pull driver core updates from Greg KH: "Here is the "big" set of driver core patches for 5.10-rc1. They include a lot of different things, all related to the driver core and/or some driver logic:

  - sysfs common write functions to make it easier to audit sysfs attributes
  - device connection cleanups and fixes
  - devm helpers for a few functions
  - NOIO allocations for when devices are being removed
  - minor cleanups and fixes

All have been in linux-next for a while with no reported issues"

* tag 'driver-core-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
  regmap: debugfs: use semicolons rather than commas to separate statements
  platform/x86: intel_pmc_core: do not create a static struct device
  drivers core: node: Use a more typical macro definition style for ACCESS_ATTR
  drivers core: Use sysfs_emit for shared_cpu_map_show and shared_cpu_list_show
  mm: and drivers core: Convert hugetlb_report_node_meminfo to sysfs_emit
  drivers core: Miscellaneous changes for sysfs_emit
  drivers core: Reindent a couple uses around sysfs_emit
  drivers core: Remove strcat uses around sysfs_emit and neaten
  drivers core: Use sysfs_emit and sysfs_emit_at for show(device *...) functions
  sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output
  dyndbg: use keyword, arg varnames for query term pairs
  driver core: force NOIO allocations during unplug
  platform_device: switch to simpler IDA interface
  driver core: platform: Document return type of more functions
  Revert "driver core: Annotate dev_err_probe() with __must_check"
  Revert "test_firmware: Test platform fw loading on non-EFI systems"
  iio: adc: xilinx-xadc: use devm_krealloc()
  hwmon: pmbus: use more devres helpers
  devres: provide devm_krealloc()
  syscore: Use pm_pr_dbg() for syscore_{suspend,resume}()
  ...
2020-10-13  hugetlb: add lockdep check for i_mmap_rwsem held in huge_pmd_share  (Mike Kravetz, 1 file, -4/+11)
As a debugging aid, huge_pmd_share should make sure i_mmap_rwsem is held if necessary. To clarify the 'if necessary', expand the comment block at the beginning of huge_pmd_share. No functional change. The added i_mmap_assert_locked() call is only enabled if CONFIG_LOCKDEP. Ideally, this should have been included with commit 34ae204f1851 ("hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem"). Signed-off-by: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Davidlohr Bueso <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-13  mm/hugetlb: take the free hpage during the iteration directly  (Wei Yang, 1 file, -13/+9)
The function dequeue_huge_page_node_exact() iterates the free list and returns the first valid free hpage. Instead of breaking out of the loop and checking the loop variable afterwards, we can return from within the loop directly, removing a redundant check; see the sketch below. [[email protected]: points out a logic error] [[email protected]: v4] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Baoquan He <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Vlastimil Babka <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
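A sketch of the reworked function (simplified; the merged version also carries the CMA skip discussed in the 2020-08-12 entry below):

    static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
    {
            struct page *page;

            list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
                    if (PageHWPoison(page))
                            continue;
                    /* first usable page: take it and return directly */
                    list_move(&page->lru, &h->hugepage_activelist);
                    set_page_refcounted(page);
                    h->free_huge_pages--;
                    h->free_huge_pages_node[nid]--;
                    return page;
            }

            return NULL;
    }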
2020-10-13  mm/hugetlb: narrow the hugetlb_lock protection area during preparing huge page  (Wei Yang, 1 file, -1/+1)
set_hugetlb_cgroup_[rsvd] just manipulates page-local data, which does not need to be protected by hugetlb_lock. Let's move it outside the locked section. Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Baoquan He <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Vlastimil Babka <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-13  mm/hugetlb: a page from buddy is not on any list  (Wei Yang, 1 file, -1/+1)
The page allocated from the buddy allocator is not on any list, so plain list_add() is enough. Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Baoquan He <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Vlastimil Babka <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-13  mm/hugetlb: count file_region to be added when regions_needed != NULL  (Wei Yang, 1 file, -16/+17)
There are only two cases for the function add_reservation_in_range():

  * count the file_regions and return the number in regions_needed
  * do the real list operation without counting

This means it is not necessary to have two parameters to distinguish these two cases. Just use regions_needed to separate them. Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Baoquan He <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Vlastimil Babka <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-13  mm/hugetlb: use list_splice to merge two list at once  (Wei Yang, 1 file, -5/+2)
Instead of adding the allocated file_regions one by one to region_cache, we can use list_splice() to merge the two lists at once. Also, since we know the number of entries in the list, we can increase the counter directly. Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Baoquan He <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Vlastimil Babka <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
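Sketch of the before/after (names approximate the patch):

    /* before: move the freshly allocated entries one by one */
    list_for_each_entry_safe(rg, trg, &allocated_regions, link) {
            list_del(&rg->link);
            list_add(&rg->link, &resv->region_cache);
            resv->region_cache_count++;
    }

    /* after: splice in one go and bump the counter once */
    list_splice(&allocated_regions, &resv->region_cache);
    resv->region_cache_count += to_allocate;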
2020-10-13  mm/hugetlb: remove VM_BUG_ON(!nrg) in get_file_region_entry_from_cache()  (Wei Yang, 1 file, -1/+0)
We are sure to get a valid file_region, otherwise the VM_BUG_ON(resv->region_cache_count <= 0) at the very beginning would be triggered. Let's remove the redundant one. Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vlastimil Babka <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-13  mm/hugetlb: not necessary to coalesce regions recursively  (Wei Yang, 1 file, -5/+1)
Patch series "mm/hugetlb: code refine and simplification", v4. Following are some cleanups for hugetlb. Simple testing with tools/testing/selftests/vm/map_hugetlb passes. This patch (of 7): Per my understanding, we keep the regions ordered and would always coalesce regions properly. So the task to keep this property is just to coalesce its neighbour. Let's simplify this. Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Baoquan He <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Vlastimil Babka <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-13  mm/hugetlb.c: remove the unnecessary non_swap_entry()  (Baoquan He, 1 file, -2/+2)
If a swap entry tests positive for either is_[migration|hwpoison]_entry(), then its swap_type() is among SWP_MIGRATION_READ, SWP_MIGRATION_WRITE and SWP_HWPOISON. All these types >= MAX_SWAPFILES, exactly what is asserted with non_swap_entry(). So the checking non_swap_entry() in is_hugetlb_entry_migration() and is_hugetlb_entry_hwpoisoned() is redundant. Let's remove it to optimize code. Signed-off-by: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
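After the cleanup, the check reads roughly:

    static bool is_hugetlb_entry_migration(pte_t pte)
    {
            swp_entry_t swp;

            if (huge_pte_none(pte) || pte_present(pte))
                    return false;
            swp = pte_to_swp_entry(pte);
            /* non_swap_entry(swp) was also checked here; it is implied */
            return is_migration_entry(swp);
    }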
2020-10-13  mm/hugetlb.c: make is_hugetlb_entry_hwpoisoned return bool  (Baoquan He, 1 file, -4/+4)
Patch series "mm/hugetlb: Small cleanup and improvement", v2. This patch (of 3): Just like its neighbour is_hugetlb_entry_migration() has done. Signed-off-by: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-02  mm: and drivers core: Convert hugetlb_report_node_meminfo to sysfs_emit  (Joe Perches, 1 file, -8/+10)
Convert the unbound sprintf in hugetlb_report_node_meminfo to use sysfs_emit_at so that no possible overrun of a PAGE_SIZE buf can occur. Signed-off-by: Joe Perches <[email protected]> Acked-by: Mike Kravetz <[email protected]> Link: https://lore.kernel.org/r/894b351b82da6013cde7f36ff4b5493cd0ec30d0.1600285923.git.joe@perches.com Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-09-25  Merge branch 'master' of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into dma-mapping-for-next  (Christoph Hellwig, 1 file, -12/+37)
Pull in the latest 5.9 tree for the commit to revert the V4L2_FLAG_MEMORY_NON_CONSISTENT uapi addition.
2020-09-05  mm/hugetlb: fix a race between hugetlb sysctl handlers  (Muchun Song, 1 file, -6/+20)
There is a race between the assignment of `table->data` and the write of a value through that pointer in __do_proc_doulongvec_minmax() on another thread:

  CPU0:                                    CPU1:
                                           proc_sys_write
  hugetlb_sysctl_handler                     proc_sys_call_handler
  hugetlb_sysctl_handler_common                hugetlb_sysctl_handler
    table->data = &tmp;                          hugetlb_sysctl_handler_common
                                                   table->data = &tmp;
    proc_doulongvec_minmax
      do_proc_doulongvec_minmax              sysctl_head_finish
        __do_proc_doulongvec_minmax            unuse_table
          i = table->data;
          *i = val;  // corrupt CPU1's stack

Fix this by duplicating the `table`, and only updating the duplicate. Also introduce a helper, proc_hugetlb_doulongvec_minmax(), to simplify the code. The following oops was seen:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor instruction fetch in kernel mode
  #PF: error_code(0x0010) - not-present page
  Code: Bad RIP value.
  ...
  Call Trace:
    ? set_max_huge_pages+0x3da/0x4f0
    ? alloc_pool_huge_page+0x150/0x150
    ? proc_doulongvec_minmax+0x46/0x60
    ? hugetlb_sysctl_handler_common+0x1c7/0x200
    ? nr_hugepages_store+0x20/0x20
    ? copy_fd_bitmaps+0x170/0x170
    ? hugetlb_sysctl_handler+0x1e/0x20
    ? proc_sys_call_handler+0x2f1/0x300
    ? unregister_sysctl_table+0xb0/0xb0
    ? __fd_install+0x78/0x100
    ? proc_sys_write+0x14/0x20
    ? __vfs_write+0x4d/0x90
    ? vfs_write+0xef/0x240
    ? ksys_write+0xc0/0x160
    ? __ia32_sys_read+0x50/0x50
    ? __close_fd+0x129/0x150
    ? __x64_sys_write+0x43/0x50
    ? do_syscall_64+0x6c/0x200
    ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes") Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
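A sketch of the helper introduced by the fix (close to the merged proc_hugetlb_doulongvec_minmax()):

    static int proc_hugetlb_doulongvec_minmax(struct ctl_table *table, int write,
                    void *buffer, size_t *length, loff_t *ppos,
                    unsigned long *out)
    {
            struct ctl_table dup_table;

            /* never write into the shared table; use an on-stack copy */
            dup_table = *table;
            dup_table.data = out;

            return proc_doulongvec_minmax(&dup_table, write, buffer, length, ppos);
    }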
2020-09-05  mm/hugetlb: try preferred node first when alloc gigantic page from cma  (Li Xinhai, 1 file, -6/+17)
Since commit cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma"), a gigantic page may be allocated from a node that is not the preferred node, even when pages are available on that node. The reason is that the nid parameter has been ignored in alloc_gigantic_page(). Besides, __GFP_THISNODE also needs to be checked if the user required allocation only from the preferred node. After this patch, the preferred node is tried first before other allowed nodes, and we don't try to allocate from other nodes at all if __GFP_THISNODE is specified. If the user doesn't specify a preferred node, the current node is used, which makes the behavior of allocating gigantic and non-gigantic hugetlb pages consistent. Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") Signed-off-by: Li Xinhai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-09-01  mm: cma: use CMA_MAX_NAME to define the length of cma name array  (Barry Song, 1 file, -2/+2)
CMA_MAX_NAME should be visible to CMA's users, as they might need it to set the name of CMA areas and to avoid hardcoding the size locally. So this patch moves CMA_MAX_NAME from a local header file to an include/linux header file and removes the hardcoded lengths in both hugetlb.c and contiguous.c. Signed-off-by: Barry Song <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-08-12  mm/hugetlb: make hugetlb migration callback CMA aware  (Joonsoo Kim, 1 file, -2/+9)
new_non_cma_page() in gup.c needs to allocate a new page that is not in the CMA area. new_non_cma_page() implements this by using the allocation scope APIs. However, there is a work-around for hugetlb. The normal hugetlb page allocation API for migration is alloc_huge_page_nodemask(). It consists of two steps: first, dequeue from the pool; second, if there is no available page on the queue, allocate one using the page allocator. new_non_cma_page() can't use this API, since the first step (dequeue) isn't aware of the scope API excluding the CMA area. So new_non_cma_page() exports the hugetlb-internal function for the second step, alloc_migrate_huge_page(), to global scope and uses it directly. This is suboptimal since hugetlb pages on the queue cannot be utilized. This patch tries to fix this situation by making the dequeue function CMA aware: in the dequeue function, CMA memory is skipped if the PF_MEMALLOC_NOCMA flag is found. Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Mike Kravetz <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Roman Gushchin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
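The dequeue-side check, in sketch form (matching the PF_MEMALLOC_NOCMA approach described above):

    bool nocma = !!(current->flags & PF_MEMALLOC_NOCMA);

    list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
            /* callers in a NOCMA allocation scope must not get CMA pages */
            if (nocma && is_migrate_cma_page(page))
                    continue;
            if (PageHWPoison(page))
                    continue;
            /* otherwise dequeue this page as before */
            ...
    }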