diff options
author | Waiman Long <[email protected]> | 2019-11-30 17:56:49 -0800 |
---|---|---|
committer | Linus Torvalds <[email protected]> | 2019-12-01 12:59:08 -0800 |
commit | 930668c34408ba983049322e04f13f03b6f1fafa (patch) | |
tree | 70b658341e9392125e05254ec17f8c956f2d17bd | |
parent | 1ab5b82f540b31852fbf4a3c975f3c16e0e76b9f (diff) |
hugetlbfs: take read_lock on i_mmap for PMD sharing
A customer with large SMP systems (up to 16 sockets) with application
that uses large amount of static hugepages (~500-1500GB) are
experiencing random multisecond delays. These delays were caused by the
long time it took to scan the VMA interval tree with mmap_sem held.
The sharing of huge PMD does not require changes to the i_mmap at all.
Therefore, we can just take the read lock and let other threads
searching for the right VMA share it in parallel. Once the right VMA is
found, either the PMD lock (2M huge page for x86-64) or the
mm->page_table_lock will be acquired to perform the actual PMD sharing.
Lock contention, if present, will happen in the spinlock. That is much
better than contention in the rwsem where the time needed to scan the
the interval tree is indeterminate.
With this patch applied, the customer is seeing significant performance
improvement over the unpatched kernel.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Waiman Long <[email protected]>
Suggested-by: Mike Kravetz <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
-rw-r--r-- | mm/hugetlb.c | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 39579f98d6f3..18c92cb9bf43 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4769,7 +4769,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud) if (!vma_shareable(vma, addr)) return (pte_t *)pmd_alloc(mm, pud, addr); - i_mmap_lock_write(mapping); + i_mmap_lock_read(mapping); vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; @@ -4799,7 +4799,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud) spin_unlock(ptl); out: pte = (pte_t *)pmd_alloc(mm, pud, addr); - i_mmap_unlock_write(mapping); + i_mmap_unlock_read(mapping); return pte; } |