author Lucas Stach <[email protected]> 2024-03-18 21:07:36 +0100
committer Andrew Morton <[email protected]> 2024-04-25 20:55:44 -0700
commit 55f77df7d715110299f12c27f4365bd6332d1adb
tree 26a62cc7617fc7e4575c90ad81d14665fe093584
parent 13e860961fd4505d7247ba69e8516f577c2dee5a
mm: page_alloc: control latency caused by zone PCP draining
Patch series "mm/treewide: Remove pXd_huge() API", v2.

In previous work [1], we removed the pXd_large() API, which is arch specific. This patchset further removes the hugetlb pXd_huge() API.

Hugetlb was never special in creating huge mappings when compared with other huge mappings. Having a standalone API just to detect such pgtable entries is more or less redundant, especially after the pXd_leaf() API set is introduced with/without CONFIG_HUGETLB_PAGE.

When looking at this problem, a few issues are also exposed: we don't have a clear definition of the *_huge() variant API. This patchset starts by cleaning up these issues first, then converts all *_huge() users to *_leaf(), then drops all *_huge() code.

On x86/sparc, swap entries are reported as "true" by pXd_huge(), while on all other archs they are reported as "false". This part is done in patches 1-5, in which I suspect patch 1 can be seen as a bug fix, but I'll leave that to hmm experts to decide.

Besides, there are three archs (arm, arm64, powerpc) that have slightly different definitions between the *_huge() and *_leaf() variants. I tackled them separately so that it'll be easier for arch experts to chime in when necessary. This part is done in patches 6-9.

The final patches 10-14 do the rest of the final removal. Since *_leaf() will be the ultimate API in the future, and we seem to have quite some confusion about how the *_huge() APIs can be defined, they provide a rich comment for the *_leaf() API set to define them properly and avoid future misuse; hopefully that'll also help new archs start supporting huge mappings and avoid traps (like either swap entries, or PROT_NONE entry checks).

[1] https://lore.kernel.org/r/[email protected]

This patch (of 14):

When the complete PCP is drained, a much larger number of pages than the usual batch size might be freed at once, causing large IRQ and preemption latency spikes, as they are all freed while holding the pcp and zone spinlocks.

To avoid those latency spikes, limit the number of pages freed in a single bulk operation to common batch limits.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lucas Stach <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Fabio Estevam <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Konrad Dybcio <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Russell King <[email protected]>
Cc: Shawn Guo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
-rw-r--r-- mm/page_alloc.c 11
1 files changed, 7 insertions, 4 deletions
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d336..5083ac034d26 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2216,12 +2216,15 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
  */
 static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 {
-	struct per_cpu_pages *pcp;
+	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
+	int count = READ_ONCE(pcp->count);
+
+	while (count) {
+		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
+		count -= to_drain;
 
-	pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-	if (pcp->count) {
 		spin_lock(&pcp->lock);
-		free_pcppages_bulk(zone, pcp->count, pcp, 0);
+		free_pcppages_bulk(zone, to_drain, pcp, 0);
 		spin_unlock(&pcp->lock);
 	}
 }
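
The effect of the loop in this diff on lock hold times can be illustrated with a small user-space sketch. This is not kernel code: `struct pcp_sim`, `struct drain_stats`, `SIM_BATCH_SCALE_MAX` and both drain functions are hypothetical stand-ins, the lock is only simulated by counting acquisitions, and the scale value of 5 is an assumption chosen for illustration (the kernel takes `CONFIG_PCP_BATCH_SCALE_MAX` from its configuration).

```c
#include <assert.h>

/* Hypothetical stand-in for struct per_cpu_pages; only the fields used here. */
struct pcp_sim {
	int count;	/* pages currently held on the per-CPU list */
	int batch;	/* base batch size */
};

/* Assumed value for illustration only; the kernel reads this from Kconfig. */
#define SIM_BATCH_SCALE_MAX 5

/* What the sketch records instead of real lock hold times. */
struct drain_stats {
	int lock_holds;		/* number of simulated lock acquisitions */
	int max_per_hold;	/* most pages freed under a single hold */
	int total_freed;	/* pages freed overall */
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Old behaviour: everything is freed under one lock hold. */
static struct drain_stats drain_all_at_once(struct pcp_sim *pcp)
{
	struct drain_stats st = { 0, 0, 0 };

	if (pcp->count) {
		/* the lock would be taken here */
		st.lock_holds = 1;
		st.max_per_hold = pcp->count;
		st.total_freed = pcp->count;
		pcp->count = 0;
		/* ... and released here */
	}
	return st;
}

/* New behaviour: at most (batch << scale) pages per lock hold. */
static struct drain_stats drain_in_chunks(struct pcp_sim *pcp)
{
	struct drain_stats st = { 0, 0, 0 };
	int count = pcp->count;

	/* A zero batch would make to_drain 0 and the loop would never end. */
	assert(pcp->batch > 0);

	while (count) {
		int to_drain = min_int(count, pcp->batch << SIM_BATCH_SCALE_MAX);

		count -= to_drain;

		/* the lock would be taken here */
		st.lock_holds++;
		pcp->count -= to_drain;
		st.total_freed += to_drain;
		if (to_drain > st.max_per_hold)
			st.max_per_hold = to_drain;
		/* ... and released here, letting IRQs and preemption in */
	}
	return st;
}
```

With `count = 5000` and `batch = 63`, the cap is `63 << 5 = 2016` pages, so the chunked version takes the simulated lock three times (2016, 2016 and 968 pages), whereas the single-shot version frees all 5000 pages under one hold. The trade-off is more lock acquisitions in exchange for a bounded amount of work per critical section.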