aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/scripts/python/bin/task-analyzer-record
diff options
context:
space:
mode:
authorDavid Hildenbrand <[email protected]>2024-02-14 21:44:26 +0100
committerAndrew Morton <[email protected]>2024-02-22 15:27:17 -0800
commit789753e17c4d6593932f07e40b740373123296a6 (patch)
tree426c534d7aab6684396de990d2e9a9d175db158f /tools/perf/scripts/python/bin/task-analyzer-record
parentb93c28ff72d42b09d539ac68845307b90cf95e53 (diff)
mm/memory: factor out zapping of present pte into zap_present_pte()
Patch series "mm/memory: optimize unmap/zap with PTE-mapped THP", v3. This series is based on [1]. Similar to what we did with fork(), let's implement PTE batching during unmap/zap when processing PTE-mapped THPs. We collect consecutive PTEs that map consecutive pages of the same large folio, making sure that the other PTE bits are compatible, and (a) adjust the refcount only once per batch, (b) call rmap handling functions only once per batch, (c) perform batch PTE setting/updates and (d) perform TLB entry removal once per batch. Ryan was previously working on this in the context of cont-pte for arm64, int latest iteration [2] with a focus on arm6 with cont-pte only. This series implements the optimization for all architectures, independent of such PTE bits, teaches MMU gather/TLB code to be fully aware of such large-folio-pages batches as well, and amkes use of our new rmap batching function when removing the rmap. To achieve that, we have to enlighten MMU gather / page freeing code (i.e., everything that consumes encoded_page) to process unmapping of consecutive pages that all belong to the same large folio. I'm being very careful to not degrade order-0 performance, and it looks like I managed to achieve that. While this series should -- similar to [1] -- be beneficial for adding cont-pte support on arm64[2], it's one of the requirements for maintaining a total mapcount[3] for large folios with minimal added overhead and further changes[4] that build up on top of the total mapcount. Independent of all that, this series results in a speedup during munmap() and similar unmapping (process teardown, MADV_DONTNEED on larger ranges) with PTE-mapped THP, which is the default with THPs that are smaller than a PMD (for example, 16KiB to 1024KiB mTHPs for anonymous memory[5]). On an Intel Xeon Silver 4210R CPU, munmap'ing a 1GiB VMA backed by PTE-mapped folios of the same size (stddev < 1%) results in the following runtimes for munmap() in seconds (shorter is better): Folio Size | mm-unstable | New | Change --------------------------------------------- 4KiB | 0.058110 | 0.057715 | - 1% 16KiB | 0.044198 | 0.035469 | -20% 32KiB | 0.034216 | 0.023522 | -31% 64KiB | 0.029207 | 0.018434 | -37% 128KiB | 0.026579 | 0.014026 | -47% 256KiB | 0.025130 | 0.011756 | -53% 512KiB | 0.024292 | 0.010703 | -56% 1024KiB | 0.023812 | 0.010294 | -57% 2048KiB | 0.023785 | 0.009910 | -58% [1] https://lkml.kernel.org/r/[email protected] [2] https://lkml.kernel.org/r/[email protected] [3] https://lkml.kernel.org/r/[email protected] [4] https://lkml.kernel.org/r/[email protected] [5] https://lkml.kernel.org/r/[email protected] This patch (of 10): Let's prepare for further changes by factoring out processing of present PTEs. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: [email protected] Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Diffstat (limited to 'tools/perf/scripts/python/bin/task-analyzer-record')
0 files changed, 0 insertions, 0 deletions