aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorNishanth Aravamudan <[email protected]>2015-06-24 16:56:39 -0700
committerLinus Torvalds <[email protected]>2015-06-24 17:49:42 -0700
commitf012a84aff7a7f1d50b060e8b205ad68ffb86045 (patch)
tree06b0b6e17a8107fc86f4e7cbf5f56d12c6dadca4
parentf4d2897b930cb546d435f64fd4b74ea5d1223dff (diff)
mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages
Based upon 675becce15 ("mm: vmscan: do not throttle based on pfmemalloc reserves if node has no ZONE_NORMAL") from Mel. We have a system with the following topology: # numactl -H available: 3 nodes (0,2-3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 node 0 size: 28273 MB node 0 free: 27323 MB node 2 cpus: node 2 size: 16384 MB node 2 free: 0 MB node 3 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 node 3 size: 30533 MB node 3 free: 13273 MB node distances: node 0 2 3 0: 10 20 20 2: 20 10 20 3: 20 20 10 Node 2 has no free memory, because: # cat /sys/devices/system/node/node2/hugepages/hugepages-16777216kB/nr_hugepages 1 This leads to the following zoneinfo: Node 2, zone DMA pages free 0 min 1840 low 2300 high 2760 scanned 0 spanned 262144 present 262144 managed 262144 ... all_unreclaimable: 1 If one then attempts to allocate some normal 16M hugepages via echo 37 > /proc/sys/vm/nr_hugepages The echo never returns and kswapd2 consumes CPU cycles. This is because throttle_direct_reclaim ends up calling wait_event(pfmemalloc_wait, pfmemalloc_watermark_ok...). pfmemalloc_watermark_ok() in turn checks all zones on the node if there are any reserves, and if so, then indicates the watermarks are ok, by seeing if there are sufficient free pages. 675becce15 added a condition already for memoryless nodes. In this case, though, the node has memory, it is just all consumed (and not reclaimable). Effectively, though, the result is the same on this call to pfmemalloc_watermark_ok() and thus seems like a reasonable additional condition. With this change, the afore-mentioned 16M hugepage allocation attempt succeeds and correctly round-robins between Nodes 1 and 3. Signed-off-by: Nishanth Aravamudan <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Dan Streetman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
-rw-r--r--mm/vmscan.c3
1 files changed, 2 insertions, 1 deletions
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e8eadd71bac..c627fa4c991f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2646,7 +2646,8 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
for (i = 0; i <= ZONE_NORMAL; i++) {
zone = &pgdat->node_zones[i];
- if (!populated_zone(zone))
+ if (!populated_zone(zone) ||
+ zone_reclaimable_pages(zone) == 0)
continue;
pfmemalloc_reserve += min_wmark_pages(zone);