path: root/include/linux/compaction.h
Age | Commit message | Author | Files | Lines
2012-07-31 | mm/compaction: cleanup on compaction_deferred | Gavin Shan | 1 file | -2/+2
When CONFIG_COMPACTION is enabled, compaction_deferred() recalculates the deferred limit on every call, which isn't necessary. When CONFIG_COMPACTION is disabled, compaction_deferred() should return "true" or "false", since its return type is "bool".

Signed-off-by: Gavin Shan <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
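For illustration, a minimal self-contained sketch of the cleaned-up helper described above (a model, not the actual kernel diff; the compact_considered/compact_defer_shift fields follow the deferral scheme introduced in the 2010-05-25 backoff patch further down):

    #include <stdbool.h>

    struct zone_model {
            unsigned int compact_considered;
            unsigned int compact_defer_shift;
    };

    #ifdef CONFIG_COMPACTION
    /* Compute the deferred limit once instead of recalculating it. */
    static inline bool compaction_deferred(struct zone_model *zone)
    {
            unsigned long defer_limit = 1UL << zone->compact_defer_shift;

            if (++zone->compact_considered > defer_limit)
                    zone->compact_considered = defer_limit;
            return zone->compact_considered < defer_limit;
    }
    #else
    /* The stub now returns a proper bool instead of an int. */
    static inline bool compaction_deferred(struct zone_model *zone)
    {
            return true;
    }
    #endif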
2012-06-03 | Revert "mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks" | Linus Torvalds | 1 file | -19/+0
This reverts commit 5ceb9ce6fe9462a298bb2cd5c9f1ca6cb80a0199. That commit seems to be the cause of the mm compaction list corruption issues that Dave Jones reported. The locking (or rather, the absence thereof) is dubious, as is the use of the 'page' variable once it has been found to be outside the pageblock range.

So revert it for now; we can revisit this for 3.6, if we even need to. As Minchan Kim says, "The patch wasn't a bug fix and even test workload was very theoretical".

Reported-and-tested-by: Dave Jones <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Bartlomiej Zolnierkiewicz <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2012-05-29 | mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks | Bartlomiej Zolnierkiewicz | 1 file | -0/+19
When MIGRATE_UNMOVABLE pages are freed from a MIGRATE_UNMOVABLE type pageblock (and some MIGRATE_MOVABLE pages are left in it), waiting until an allocation takes ownership of the block may take too long. The type of the pageblock remains unchanged, so the pageblock cannot be used as a migration target during compaction.

Fix it by:

* Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE] and COMPACT_SYNC) and then converting the sync field in struct compact_control to use it.

* Adding an nr_pageblocks_skipped field to struct compact_control and tracking how many destination pageblocks were of MIGRATE_UNMOVABLE type. If COMPACT_ASYNC_MOVABLE mode compaction ran fully in try_to_compact_pages() (COMPACT_COMPLETE), it implies that there is no suitable page for the allocation. In that case, check whether enough MIGRATE_UNMOVABLE pageblocks were skipped to make a second pass in COMPACT_ASYNC_UNMOVABLE mode worthwhile.

* Scanning the MIGRATE_UNMOVABLE pageblocks (during the COMPACT_SYNC and COMPACT_ASYNC_UNMOVABLE compaction modes) and building a count based on finding PageBuddy pages, pages with page_count(page) == 0, or PageLRU pages. If all pages within a MIGRATE_UNMOVABLE pageblock are in one of those three sets, change the whole pageblock's type to MIGRATE_MOVABLE.

My particular test case (on an ARM EXYNOS4 device with 512 MiB, which means 131072 standard 4 KiB pages in the 'Normal' zone) is to:

- allocate 120000 pages for the kernel's usage
- free every second page (60000 pages) of the memory just allocated
- allocate and use 60000 pages from user space
- free the remaining 60000 pages of kernel memory (now we have fragmented memory occupied mostly by user-space pages)
- try to allocate 100 order-9 (2048 KiB) pages for the kernel's usage

The results:

- with compaction disabled I get 11 successful allocations
- with compaction enabled - 14 successful allocations
- with this patch I'm able to get all 100 successful allocations

NOTE: If we can make kswapd aware of order-0 requests during compaction, we can enhance kswapd by changing the mode to COMPACT_ASYNC_FULL (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE). Please see the following thread: http://marc.info/?l=linux-mm&m=133552069417068&w=2

[[email protected]: minor cleanups]
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
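As a sketch of what this changelog describes (the enum names are taken from the changelog itself; the compact_control plumbing is omitted, and the predicate below is a simplified model rather than the patch's code):

    /* The three compaction modes named above. */
    enum compact_mode {
            COMPACT_ASYNC_MOVABLE,          /* async, movable pageblocks only */
            COMPACT_ASYNC_UNMOVABLE,        /* async, also scan unmovable blocks */
            COMPACT_SYNC,                   /* synchronous compaction */
    };

    /* Model of the retyping test: an unmovable pageblock may become
     * MIGRATE_MOVABLE when every page in it is free (PageBuddy), unused
     * (page_count() == 0) or reclaimable (PageLRU). The three predicates
     * are real kernel helpers; this loop structure is illustrative. */
    static bool pageblock_fully_movable(struct page *page, int pages_per_block)
    {
            int i;

            for (i = 0; i < pages_per_block; i++, page++) {
                    if (!PageBuddy(page) && page_count(page) != 0 &&
                        !PageLRU(page))
                            return false;
            }
            return true;
    }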
2012-03-21 | vmscan: only defer compaction for failed order and higher | Rik van Riel | 1 file | -4/+10
Currently a failed order-9 (transparent hugepage) compaction can lead to memory compaction being temporarily disabled for a memory zone, even if we only need compaction for an order-2 allocation, e.g. for jumbo-frame networking.

The fix is relatively straightforward: keep track of the highest order at which compaction is succeeding, and only defer compaction for orders at which compaction is failing.

Signed-off-by: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Hillf Danton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
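A minimal model of the per-order test (the compact_order_failed field name is an assumption for illustration; the changelog only describes the behaviour):

    #include <stdbool.h>

    struct zone_model {
            int compact_order_failed;       /* lowest order failing recently */
            unsigned int compact_considered;
            unsigned int compact_defer_shift;
    };

    /* Only defer compaction for orders that are actually failing;
     * smaller orders may still be compacted successfully. */
    static bool compaction_deferred(struct zone_model *zone, int order)
    {
            unsigned long defer_limit = 1UL << zone->compact_defer_shift;

            if (order < zone->compact_order_failed)
                    return false;

            return ++zone->compact_considered < defer_limit;
    }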
2012-03-21 | vmscan: kswapd carefully call compaction | Rik van Riel | 1 file | -0/+6
With CONFIG_COMPACTION enabled, kswapd does not try to free contiguous free pages, even when it is woken for a higher-order request. This can be bad for, e.g., jumbo-frame network allocations, which are done from interrupt context and cannot compact memory themselves. Higher-than-before allocation failure rates in the network receive path have been observed in kernels with compaction enabled.

Teach kswapd to defragment the memory zones in a node, but only if required and compaction is not deferred in a zone.

[[email protected]: reduce scope of zones_need_compaction]
Signed-off-by: Rik van Riel <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Hillf Danton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
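A sketch of the check kswapd might apply after balancing a node for a high-order wakeup (an illustrative model of the zones_need_compaction idea mentioned above, not the actual vmscan.c logic):

    #include <stdbool.h>

    struct zone_model;
    bool compaction_deferred(struct zone_model *zone, int order);  /* as above */

    /* Compact the node's zones only if no zone has compaction deferred;
     * otherwise back off and avoid wasted work. */
    static bool zones_need_compaction(struct zone_model **zones, int nr_zones,
                                      int order)
    {
            int i;

            for (i = 0; i < nr_zones; i++)
                    if (compaction_deferred(zones[i], order))
                            return false;
            return true;
    }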
2011-10-31 | mm: compaction: make compact_zone_order() static | Kyungmin Park | 1 file | -8/+0
There's no compact_zone_order() user outside file scope, so make it static.

Signed-off-by: Kyungmin Park <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22 | mm: compaction: prevent kswapd compacting memory to reduce CPU usage | Andrea Arcangeli | 1 file | -7/+2
This patch reverts 5a03b051 ("thp: use compaction in kswapd for GFP_ATOMIC order > 0") due to reports that kswapd CPU usage was higher and IRQs were being disabled more frequently. This was reported at http://www.spinics.net/linux/fedora/alsa-user/msg09885.html.

Without this patch applied, CPU usage by kswapd hovers around the 20% mark according to the tester (Arthur Marsh: http://www.spinics.net/linux/fedora/alsa-user/msg09899.html). With this patch applied, it's around 2%.

The problem is not related to THP, which specifies __GFP_NO_KSWAPD, but is triggered by high-order allocations hitting the low watermark for their order and waking kswapd on kernels with CONFIG_COMPACTION set. The most common trigger for this is network cards configured for jumbo frames, but it's also possible it'll be triggered by fork-heavy workloads (order-1) and some wireless cards which depend on order-1 allocations.

The symptom for the user is high CPU usage by kswapd in low-memory situations, which could be confused with another writeback problem. While a patch like 5a03b051 may be reintroduced in the future, this patch plays it safe for now and reverts it.

[[email protected]: Beefed up the changelog]
Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Reported-by: Arthur Marsh <[email protected]>
Tested-by: Arthur Marsh <[email protected]>
Cc: <[email protected]> [2.6.38.1]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-01-13 | thp: use compaction in kswapd for GFP_ATOMIC order > 0 | Andrea Arcangeli | 1 file | -3/+8
This takes advantage of memory compaction to properly generate pages of order > 0 when regular page reclaim fails, the priority level becomes more severe, and we don't reach the proper watermarks.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-01-13 | mm: migration: allow migration to operate asynchronously and avoid synchronous compaction in the faster path | Mel Gorman | 1 file | -4/+6
Migration synchronously waits for writeback if the initial pass fails. Callers of memory compaction do not necessarily want this behaviour if the caller is latency-sensitive or expects that synchronous migration is not going to have a significantly better success rate.

This patch adds a sync parameter to migrate_pages(), allowing the caller to indicate whether wait_on_page_writeback() is allowed within migration. For reclaim/compaction, try_to_compact_pages() is first called asynchronously, direct reclaim runs, and then try_to_compact_pages() is called synchronously, as there is a greater expectation that it'll succeed.

[[email protected]: build/merge fix]
Signed-off-by: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Rik van Riel <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
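A sketch of the async-then-sync calling pattern described above (signatures and enum values are simplified; the real try_to_compact_pages() takes more arguments):

    #include <stdbool.h>

    enum compact_result { COMPACT_SKIPPED, COMPACT_PARTIAL, COMPACT_COMPLETE };

    /* Declarations only; implementations live elsewhere in this model. */
    enum compact_result try_to_compact_pages(int order, bool sync);
    void direct_reclaim(int order);

    static void alloc_slowpath(int order)
    {
            /* First pass: async, must not wait on page writeback. */
            if (try_to_compact_pages(order, false) == COMPACT_PARTIAL)
                    return;         /* a suitable page was freed */

            direct_reclaim(order);

            /* Second pass: sync, greater expectation of success. */
            try_to_compact_pages(order, true);
    }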
2011-01-13 | mm: vmscan: reclaim order-0 and use compaction instead of lumpy reclaim | Mel Gorman | 1 file | -0/+14
Lumpy reclaim is disruptive. It reclaims a large number of pages and ignores the age of the pages it reclaims. This can incur significant stalls and potentially increase the number of major faults.

Compaction has reached the point where it is considered reasonably stable (meaning it has passed a lot of testing) and is a potential candidate for displacing lumpy reclaim. This patch introduces an alternative to lumpy reclaim, available when compaction is enabled, called reclaim/compaction. The basic operation is very simple: instead of selecting a contiguous range of pages to reclaim, a number of order-0 pages are reclaimed, and compaction is then run later by either kswapd (compact_zone_order()) or direct compaction (__alloc_pages_direct_compact()).

[[email protected]: fix build]
[[email protected]: use conventional task_struct naming]
Signed-off-by: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Rik van Riel <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
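A high-level model of the reclaim/compaction sequence (helper names and the batch size are hypothetical; only the order of operations reflects the changelog):

    /* Hypothetical helpers standing in for the reclaim and compaction
     * entry points named in the changelog. */
    void reclaim_order0_pages(unsigned long nr_pages);
    void compact_for_order(int order);

    static void reclaim_compact(int order)
    {
            /* Free a batch of order-0 pages instead of lumpy-reclaiming
             * a contiguous range... */
            reclaim_order0_pages(2UL << order);

            /* ...then let compaction assemble them into a large block. */
            compact_for_order(order);
    }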
2010-05-25 | mm: compaction: defer compaction using an exponential backoff when compaction fails | Mel Gorman | 1 file | -0/+39
The fragmentation index may indicate that a failure is due to external fragmentation, but after a compaction run completes it is still possible for an allocation to fail. There are two obvious reasons why:

o Page migration cannot move all pages, so fragmentation remains
o A suitable page may exist, but watermarks are not met

In the event of compaction followed by an allocation failure, this patch defers further compaction in the zone for the next (1 << compact_defer_shift) attempts. If the next compaction attempt also fails, compact_defer_shift is increased, up to a maximum of 6. If compaction succeeds, the defer counters are reset again.

The zone that is deferred is the first zone in the zonelist, i.e. the preferred zone. To defer compaction in the other zones, the information would need to be stored in the zonelist or implemented similarly to the zonelist_cache. This would impact the fast paths and is not justified at this time.

Signed-off-by: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
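A self-contained model of the backoff arithmetic (compact_defer_shift and the cap of 6 match the changelog; the compact_considered counter and the reset helper are assumptions consistent with it):

    #include <stdbool.h>

    #define COMPACT_MAX_DEFER_SHIFT 6       /* skip at most 1 << 6 = 64 times */

    struct zone_backoff {
            unsigned int compact_considered;
            unsigned int compact_defer_shift;
    };

    /* Called when compaction was followed by an allocation failure:
     * double the number of attempts that will be skipped, up to 64. */
    static void defer_compaction(struct zone_backoff *zone)
    {
            zone->compact_considered = 0;
            if (++zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
                    zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
    }

    /* True while the zone is still inside its backoff window. */
    static bool compaction_deferred(struct zone_backoff *zone)
    {
            unsigned long defer_limit = 1UL << zone->compact_defer_shift;

            return ++zone->compact_considered < defer_limit;
    }

    /* Called when compaction succeeds: reset the counters. */
    static void compaction_defer_reset(struct zone_backoff *zone)
    {
            zone->compact_considered = 0;
            zone->compact_defer_shift = 0;
    }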
2010-05-25 | mm: compaction: add a tunable that decides when memory should be compacted and when it should be reclaimed | Mel Gorman | 1 file | -0/+3
The kernel applies some heuristics when deciding whether memory should be compacted or reclaimed to satisfy a high-order allocation. One of these is based on the fragmentation index: if the index is below 500, memory will not be compacted. This choice is arbitrary and not based on data. To help optimise the system and set a sensible default for this value, this patch adds a sysctl, extfrag_threshold. The kernel will only compact memory if the fragmentation index is above the extfrag_threshold.

[[email protected]: Fix build errors when proc fs is not configured]
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
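A sketch of how the tunable gates the decision (the sysctl name and default of 500 are from the changelog; the fragmentation-index helper signature is simplified):

    #include <stdbool.h>

    /* Fragmentation index: values toward 0 imply failure is due to lack
     * of memory, values toward 1000 imply external fragmentation, and a
     * negative value means the allocation should succeed outright. */
    int fragmentation_index(int order);     /* simplified signature */

    int sysctl_extfrag_threshold = 500;     /* default per the changelog */

    static bool should_compact(int order)
    {
            int fragindex = fragmentation_index(order);

            /* At or below the threshold: reclaim is more appropriate. */
            if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
                    return false;
            return true;
    }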
2010-05-25 | mm: compaction: direct compact when a high-order allocation fails | Mel Gorman | 1 file | -4/+20
Ordinarily when a high-order allocation fails, direct reclaim is entered to free pages and satisfy the allocation. With this patch, it is first determined whether the allocation failed due to external fragmentation rather than low memory; if so, the calling process compacts memory until a suitable page is freed. Compaction, by moving pages in memory, is considerably cheaper than paging out to disk, and it works where there are locked pages or no swap. If compaction fails to free a page of a suitable size, then reclaim will still occur.

Direct compaction returns as soon as possible. As each block is compacted, it is checked whether a suitable page has been freed; if so, compaction returns.

[[email protected]: Fix build errors]
[[email protected]: fix count_vm_event preempt in memory compaction direct reclaim]
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
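A model of the direct-compaction slow path (names hypothetical apart from get_page_from_freelist, whose real signature is longer):

    struct page;

    /* Stand-ins for the allocator, compactor and reclaim entry points. */
    struct page *get_page_from_freelist(int order);
    void compact_for_order(int order);
    struct page *direct_reclaim_then_alloc(int order);

    static struct page *alloc_high_order(int order)
    {
            struct page *page;

            /* Moving pages in memory is much cheaper than paging out. */
            compact_for_order(order);
            page = get_page_from_freelist(order);
            if (page)
                    return page;

            /* Compaction could not free a suitable page: fall back. */
            return direct_reclaim_then_alloc(order);
    }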
2010-05-25 | mm: compaction: add /sys trigger for per-node memory compaction | Mel Gorman | 1 file | -0/+16
Add a per-node sysfs file called compact. When the file is written to, each zone in that node is compacted. The intention is that this would be used by something like a job scheduler in a batch system before a job starts, so that the job can allocate the maximum number of hugepages without significant start-up cost.

Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Christoph Lameter <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
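A minimal model of the per-node trigger (the sysfs registration and handler plumbing are omitted; compact_zone_in_node() and the zone count are hypothetical):

    #define MAX_NR_ZONES 4                  /* illustrative */

    void compact_zone_in_node(int nid, int zone_idx);   /* hypothetical */

    /* Invoked when the node's compact file is written: compact every
     * zone in that node. */
    static void compact_node(int nid)
    {
            int zone_idx;

            for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++)
                    compact_zone_in_node(nid, zone_idx);
    }

Usage would be along the lines of "echo 1 > /sys/devices/system/node/node0/compact" from a job scheduler's prologue.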
2010-05-25 | mm: compaction: add /proc trigger for memory compaction | Mel Gorman | 1 file | -0/+6
Add a proc file, /proc/sys/vm/compact_memory. When an arbitrary value is written to the file, all zones are compacted. The expected user of such a trigger is a job scheduler that prepares the system before the target application runs.

Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
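The global trigger can be modelled the same way, looping over nodes and reusing the per-node helper sketched above (handler shape assumed; the real code hooks into the sysctl table):

    void compact_node(int nid);             /* from the sketch above */
    int highest_node_id(void);              /* hypothetical helper */

    /* Invoked when /proc/sys/vm/compact_memory is written: compact
     * every node in the system, one at a time. */
    static int sysctl_compact_memory(void)
    {
            int nid;

            for (nid = 0; nid <= highest_node_id(); nid++)
                    compact_node(nid);
            return 0;
    }

e.g. "echo 1 > /proc/sys/vm/compact_memory".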
2010-05-25 | mm: compaction: memory compaction core | Mel Gorman | 1 file | -0/+9
This patch is the core of a mechanism which compacts memory in a zone by relocating movable pages towards the end of the zone.

A single compaction run involves a migration scanner and a free scanner. Both scanners operate on pageblock-sized areas in the zone. The migration scanner starts at the bottom of the zone and searches for all movable pages within each area, isolating them onto a private list called migratelist. The free scanner starts at the top of the zone and searches for suitable areas, consuming the free pages within them to make them available to the migration scanner. The pages isolated for migration are then migrated to the newly isolated free pages.

[[email protected]: Fix unsafe optimisation]
[[email protected]: do not schedule work on other CPUs for compaction]
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
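A high-level model of a single compaction run as described above (all names illustrative; pageblock iteration, isolation and migration details are omitted):

    /* Stand-ins for the isolation and migration steps; each scanner
     * advances by pageblock-sized areas. */
    unsigned long isolate_movable_pages(unsigned long pfn);  /* returns next pfn */
    unsigned long isolate_free_pages(unsigned long pfn);     /* returns next pfn */
    void migrate_isolated_pages(void);

    /* The migration scanner walks up from the bottom of the zone while
     * the free scanner walks down from the top; the run finishes when
     * the two scanners meet. */
    static void compact_zone_model(unsigned long zone_start_pfn,
                                   unsigned long zone_end_pfn)
    {
            unsigned long migrate_pfn = zone_start_pfn;
            unsigned long free_pfn = zone_end_pfn;

            while (migrate_pfn < free_pfn) {
                    migrate_pfn = isolate_movable_pages(migrate_pfn);
                    free_pfn = isolate_free_pages(free_pfn);
                    migrate_isolated_pages();
            }
    }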