aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-01-21mm: compaction: encapsulate defer reset logicVlastimil Babka3-9/+21
Currently there are several functions to manipulate the deferred compaction state variables. The remaining case where the variables are touched directly is when a successful allocation occurs in direct compaction, or is expected to be successful in the future by kswapd. Here, the lowest order that is expected to fail is updated, and in the case of successful allocation, the deferred status and counter is reset completely. Create a new function compaction_defer_reset() to encapsulate this functionality and make it easier to understand the code. No functional change. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm: compaction: trace compaction begin and endMel Gorman2-0/+46
The broad goal of the series is to improve allocation success rates for huge pages through memory compaction, while trying not to increase the compaction overhead. The original objective was to reintroduce capturing of high-order pages freed by the compaction, before they are split by concurrent activity. However, several bugs and opportunities for simple improvements were found in the current implementation, mostly through extra tracepoints (which are however too ugly for now to be considered for sending). The patches mostly deal with two mechanisms that reduce compaction overhead, which is caching the progress of migrate and free scanners, and marking pageblocks where isolation failed to be skipped during further scans. Patch 1 (from mgorman) adds tracepoints that allow calculate time spent in compaction and potentially debug scanner pfn values. Patch 2 encapsulates the some functionality for handling deferred compactions for better maintainability, without a functional change type is not determined without being actually needed. Patch 3 fixes a bug where cached scanner pfn's are sometimes reset only after they have been read to initialize a compaction run. Patch 4 fixes a bug where scanners meeting is sometimes not properly detected and can lead to multiple compaction attempts quitting early without doing any work. Patch 5 improves the chances of sync compaction to process pageblocks that async compaction has skipped due to being !MIGRATE_MOVABLE. Patch 6 improves the chances of sync direct compaction to actually do anything when called after async compaction fails during allocation slowpath. The impact of patches were validated using mmtests's stress-highalloc benchmark with mmtests's stress-highalloc benchmark on a x86_64 machine with 4GB memory. Due to instability of the results (mostly related to the bugs fixed by patches 2 and 3), 10 iterations were performed, taking min,mean,max values for success rates and mean values for time and vmstat-based metrics. First, the default GFP_HIGHUSER_MOVABLE allocations were tested with the patches stacked on top of v3.13-rc2. Patch 2 is OK to serve as baseline due to no functional changes in 1 and 2. Comments below. stress-highalloc 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-nothp 3-nothp 4-nothp 5-nothp 6-nothp Success 1 Min 9.00 ( 0.00%) 10.00 (-11.11%) 43.00 (-377.78%) 43.00 (-377.78%) 33.00 (-266.67%) Success 1 Mean 27.50 ( 0.00%) 25.30 ( 8.00%) 45.50 (-65.45%) 45.90 (-66.91%) 46.30 (-68.36%) Success 1 Max 36.00 ( 0.00%) 36.00 ( 0.00%) 47.00 (-30.56%) 48.00 (-33.33%) 52.00 (-44.44%) Success 2 Min 10.00 ( 0.00%) 8.00 ( 20.00%) 46.00 (-360.00%) 45.00 (-350.00%) 35.00 (-250.00%) Success 2 Mean 26.40 ( 0.00%) 23.50 ( 10.98%) 47.30 (-79.17%) 47.60 (-80.30%) 48.10 (-82.20%) Success 2 Max 34.00 ( 0.00%) 33.00 ( 2.94%) 48.00 (-41.18%) 50.00 (-47.06%) 54.00 (-58.82%) Success 3 Min 65.00 ( 0.00%) 63.00 ( 3.08%) 85.00 (-30.77%) 84.00 (-29.23%) 85.00 (-30.77%) Success 3 Mean 76.70 ( 0.00%) 70.50 ( 8.08%) 86.20 (-12.39%) 85.50 (-11.47%) 86.00 (-12.13%) Success 3 Max 87.00 ( 0.00%) 86.00 ( 1.15%) 88.00 ( -1.15%) 87.00 ( 0.00%) 87.00 ( 0.00%) 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-nothp 3-nothp 4-nothp 5-nothp 6-nothp User 6437.72 6459.76 5960.32 5974.55 6019.67 System 1049.65 1049.09 1029.32 1031.47 1032.31 Elapsed 1856.77 1874.48 1949.97 1994.22 1983.15 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-nothp 3-nothp 4-nothp 5-nothp 6-nothp Minor Faults 253952267 254581900 250030122 250507333 250157829 Major Faults 420 407 506 530 530 Swap Ins 4 9 9 6 6 Swap Outs 398 375 345 346 333 Direct pages scanned 197538 189017 298574 287019 299063 Kswapd pages scanned 1809843 1801308 1846674 1873184 1861089 Kswapd pages reclaimed 1806972 1798684 1844219 1870509 1858622 Direct pages reclaimed 197227 188829 298380 286822 298835 Kswapd efficiency 99% 99% 99% 99% 99% Kswapd velocity 953.382 970.449 952.243 934.569 922.286 Direct efficiency 99% 99% 99% 99% 99% Direct velocity 104.058 101.832 153.961 143.200 148.205 Percentage direct scans 9% 9% 13% 13% 13% Zone normal velocity 347.289 359.676 348.063 339.933 332.983 Zone dma32 velocity 710.151 712.605 758.140 737.835 737.507 Zone dma velocity 0.000 0.000 0.000 0.000 0.000 Page writes by reclaim 557.600 429.000 353.600 426.400 381.800 Page writes file 159 53 7 79 48 Page writes anon 398 375 345 346 333 Page reclaim immediate 825 644 411 575 420 Sector Reads 2781750 2769780 2878547 2939128 2910483 Sector Writes 12080843 12083351 12012892 12002132 12010745 Page rescued immediate 0 0 0 0 0 Slabs scanned 1575654 1545344 1778406 1786700 1794073 Direct inode steals 9657 10037 15795 14104 14645 Kswapd inode steals 46857 46335 50543 50716 51796 Kswapd skipped wait 0 0 0 0 0 THP fault alloc 97 91 81 71 77 THP collapse alloc 456 506 546 544 565 THP splits 6 5 5 4 4 THP fault fallback 0 1 0 0 0 THP collapse fail 14 14 12 13 12 Compaction stalls 1006 980 1537 1536 1548 Compaction success 303 284 562 559 578 Compaction failures 702 696 974 976 969 Page migrate success 1177325 1070077 3927538 3781870 3877057 Page migrate failure 0 0 0 0 0 Compaction pages isolated 2547248 2306457 8301218 8008500 8200674 Compaction migrate scanned 42290478 38832618 153961130 154143900 159141197 Compaction free scanned 89199429 79189151 356529027 351943166 356326727 Compaction cost 1566 1426 5312 5156 5294 NUMA PTE updates 0 0 0 0 0 NUMA hint faults 0 0 0 0 0 NUMA hint local faults 0 0 0 0 0 NUMA hint local percent 100 100 100 100 100 NUMA pages migrated 0 0 0 0 0 AutoNUMA cost 0 0 0 0 0 Observations: - The "Success 3" line is allocation success rate with system idle (phases 1 and 2 are with background interference). I used to get stable values around 85% with vanilla 3.11. The lower min and mean values came with 3.12. This was bisected to commit 81c0a2bb ("mm: page_alloc: fair zone allocator policy") As explained in comment for patch 3, I don't think the commit is wrong, but that it makes the effect of compaction bugs worse. From patch 3 onwards, the results are OK and match the 3.11 results. - Patch 4 also clearly helps phases 1 and 2, and exceeds any results I've seen with 3.11 (I didn't measure it that thoroughly then, but it was never above 40%). - Compaction cost and number of scanned pages is higher, especially due to patch 4. However, keep in mind that patches 3 and 4 fix existing bugs in the current design of compaction overhead mitigation, they do not change it. If overhead is found unacceptable, then it should be decreased differently (and consistently, not due to random conditions) than the current implementation does. In contrast, patches 5 and 6 (which are not strictly bug fixes) do not increase the overhead (but also not success rates). This might be a limitation of the stress-highalloc benchmark as it's quite uniform. Another set of results is when configuring stress-highalloc t allocate with similar flags as THP uses: (GFP_HIGHUSER_MOVABLE|__GFP_NOMEMALLOC|__GFP_NORETRY|__GFP_NO_KSWAPD) stress-highalloc 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-thp 3-thp 4-thp 5-thp 6-thp Success 1 Min 2.00 ( 0.00%) 7.00 (-250.00%) 18.00 (-800.00%) 19.00 (-850.00%) 26.00 (-1200.00%) Success 1 Mean 19.20 ( 0.00%) 17.80 ( 7.29%) 29.20 (-52.08%) 29.90 (-55.73%) 32.80 (-70.83%) Success 1 Max 27.00 ( 0.00%) 29.00 ( -7.41%) 35.00 (-29.63%) 36.00 (-33.33%) 37.00 (-37.04%) Success 2 Min 3.00 ( 0.00%) 8.00 (-166.67%) 21.00 (-600.00%) 21.00 (-600.00%) 32.00 (-966.67%) Success 2 Mean 19.30 ( 0.00%) 17.90 ( 7.25%) 32.20 (-66.84%) 32.60 (-68.91%) 35.70 (-84.97%) Success 2 Max 27.00 ( 0.00%) 30.00 (-11.11%) 36.00 (-33.33%) 37.00 (-37.04%) 39.00 (-44.44%) Success 3 Min 62.00 ( 0.00%) 62.00 ( 0.00%) 85.00 (-37.10%) 75.00 (-20.97%) 64.00 ( -3.23%) Success 3 Mean 66.30 ( 0.00%) 65.50 ( 1.21%) 85.60 (-29.11%) 83.40 (-25.79%) 83.50 (-25.94%) Success 3 Max 70.00 ( 0.00%) 69.00 ( 1.43%) 87.00 (-24.29%) 86.00 (-22.86%) 87.00 (-24.29%) 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-thp 3-thp 4-thp 5-thp 6-thp User 6547.93 6475.85 6265.54 6289.46 6189.96 System 1053.42 1047.28 1043.23 1042.73 1038.73 Elapsed 1835.43 1821.96 1908.67 1912.74 1956.38 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-thp 3-thp 4-thp 5-thp 6-thp Minor Faults 256805673 253106328 253222299 249830289 251184418 Major Faults 395 375 423 434 448 Swap Ins 12 10 10 12 9 Swap Outs 530 537 487 455 415 Direct pages scanned 71859 86046 153244 152764 190713 Kswapd pages scanned 1900994 1870240 1898012 1892864 1880520 Kswapd pages reclaimed 1897814 1867428 1894939 1890125 1877924 Direct pages reclaimed 71766 85908 153167 152643 190600 Kswapd efficiency 99% 99% 99% 99% 99% Kswapd velocity 1029.000 1067.782 1000.091 991.049 951.218 Direct efficiency 99% 99% 99% 99% 99% Direct velocity 38.897 49.127 80.747 79.983 96.468 Percentage direct scans 3% 4% 7% 7% 9% Zone normal velocity 351.377 372.494 348.910 341.689 335.310 Zone dma32 velocity 716.520 744.414 731.928 729.343 712.377 Zone dma velocity 0.000 0.000 0.000 0.000 0.000 Page writes by reclaim 669.300 604.000 545.700 538.900 429.900 Page writes file 138 66 58 83 14 Page writes anon 530 537 487 455 415 Page reclaim immediate 806 655 772 548 517 Sector Reads 2711956 2703239 2811602 2818248 2839459 Sector Writes 12163238 12018662 12038248 11954736 11994892 Page rescued immediate 0 0 0 0 0 Slabs scanned 1385088 1388364 1507968 1513292 1558656 Direct inode steals 1739 2564 4622 5496 6007 Kswapd inode steals 47461 46406 47804 48013 48466 Kswapd skipped wait 0 0 0 0 0 THP fault alloc 110 82 84 69 70 THP collapse alloc 445 482 467 462 539 THP splits 6 5 4 5 3 THP fault fallback 3 0 0 0 0 THP collapse fail 15 14 14 14 13 Compaction stalls 659 685 1033 1073 1111 Compaction success 222 225 410 427 456 Compaction failures 436 460 622 646 655 Page migrate success 446594 439978 1085640 1095062 1131716 Page migrate failure 0 0 0 0 0 Compaction pages isolated 1029475 1013490 2453074 2482698 2565400 Compaction migrate scanned 9955461 11344259 24375202 27978356 30494204 Compaction free scanned 27715272 28544654 80150615 82898631 85756132 Compaction cost 552 555 1344 1379 1436 NUMA PTE updates 0 0 0 0 0 NUMA hint faults 0 0 0 0 0 NUMA hint local faults 0 0 0 0 0 NUMA hint local percent 100 100 100 100 100 NUMA pages migrated 0 0 0 0 0 AutoNUMA cost 0 0 0 0 0 There are some differences from the previous results for THP-like allocations: - Here, the bad result for unpatched kernel in phase 3 is much more consistent to be between 65-70% and not related to the "regression" in 3.12. Still there is the improvement from patch 4 onwards, which brings it on par with simple GFP_HIGHUSER_MOVABLE allocations. - Compaction costs have increased, but nowhere near as much as the non-THP case. Again, the patches should be worth the gained determininsm. - Patches 5 and 6 somewhat increase the number of migrate-scanned pages. This is most likely due to __GFP_NO_KSWAPD flag, which means the cached pfn's and pageblock skip bits are not reset by kswapd that often (at least in phase 3 where no concurrent activity would wake up kswapd) and the patches thus help the sync-after-async compaction. It doesn't however show that the sync compaction would help so much with success rates, which can be again seen as a limitation of the benchmark scenario. This patch (of 6): Add two tracepoints for compaction begin and end of a zone. Using this it is possible to calculate how much time a workload is spending within compaction and potentially debug problems related to cached pfns for scanning. In combination with the direct reclaim and slab trace points it should be possible to estimate most allocation-related overhead for a workload. Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21memcg, oom: lock mem_cgroup_print_oom_infoMichal Hocko1-5/+7
mem_cgroup_print_oom_info uses a static buffer (memcg_name) to store the name of the cgroup. This is not safe as pointed out by David Rientjes because memcg oom is locked only for its hierarchy and nothing prevents another parallel hierarchy to trigger oom as well and overwrite the already in-use buffer. This patch introduces oom_info_lock hidden inside mem_cgroup_print_oom_info which is held throughout the function. It makes access to memcg_name safe and as a bonus it also prevents parallel memcg ooms to interleave their statistics which would make the printed data hard to analyze otherwise. Signed-off-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21sched: add tracepoints related to NUMA task migrationMel Gorman3-1/+94
This patch adds three tracepoints o trace_sched_move_numa when a task is moved to a node o trace_sched_swap_numa when a task is swapped with another task o trace_sched_stick_numa when a numa-related migration fails The tracepoints allow the NUMA scheduler activity to be monitored and the following high-level metrics can be calculated o NUMA migrated stuck nr trace_sched_stick_numa o NUMA migrated idle nr trace_sched_move_numa o NUMA migrated swapped nr trace_sched_swap_numa o NUMA local swapped trace_sched_swap_numa src_nid == dst_nid (should never happen) o NUMA remote swapped trace_sched_swap_numa src_nid != dst_nid (should == NUMA migrated swapped) o NUMA group swapped trace_sched_swap_numa src_ngid == dst_ngid Maybe a small number of these are acceptable but a high number would be a major surprise. It would be even worse if bounces are frequent. o NUMA avg task migs. Average number of migrations for tasks o NUMA stddev task mig Self-explanatory o NUMA max task migs. Maximum number of migrations for a single task In general the intent of the tracepoints is to help diagnose problems where automatic NUMA balancing appears to be doing an excessive amount of useless work. [[email protected]: remove semicolon-after-if, repair coding-style] Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Alex Thorlton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm: numa: do not automatically migrate KSM pagesMel Gorman1-1/+2
KSM pages can be shared between tasks that are not necessarily related to each other from a NUMA perspective. This patch causes those pages to be ignored by automatic NUMA balancing so they do not migrate and do not cause unrelated tasks to be grouped together. Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Alex Thorlton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm: numa: trace tasks that fail migration due to rate limitingMel Gorman2-1/+30
A low local/remote numa hinting fault ratio is potentially explained by failed migrations. This patch adds a tracepoint that fires when migration fails due to migration rate limitation. Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Alex Thorlton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm: numa: limit scope of lock for NUMA migrate rate limitingMel Gorman2-13/+13
NUMA migrate rate limiting protects a migration counter and window using a lock but in some cases this can be a contended lock. It is not critical that the number of pages be perfect, lost updates are acceptable. Reduce the importance of this lock. Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Alex Thorlton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm: numa: make NUMA-migrate related functions staticMel Gorman1-2/+3
numamigrate_update_ratelimit and numamigrate_isolate_page only have callers in mm/migrate.c. This patch makes them static. Signed-off-by: Mel Gorman <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: Alex Thorlton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21lib/show_mem.c: show num_poisoned_pages when oomXishi Qiu1-0/+3
Show num_poisoned_pages when oom, it is a little helpful to find the reason. Also it will be emitted anytime show_mem() is called. Signed-off-by: Xishi Qiu <[email protected]> Suggested-by: Naoya Horiguchi <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Wanpeng Li <[email protected]> Acked-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/hwpoison: add '#' to hwpoison_injectWanpeng Li1-1/+1
Add '#' to hwpoison_inject just as done in madvise_hwpoison. Signed-off-by: Wanpeng Li <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Reviewed-by: Vladimir Murzin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/memblock: use WARN_ONCE when MAX_NUMNODES passed as input parameterGrygorii Strashko1-13/+8
Check nid parameter and produce warning if it has deprecated MAX_NUMNODES value. Also re-assign NUMA_NO_NODE value to the nid parameter in this case. These will help to identify the wrong API usage (the caller) and make code simpler. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21x86/mm: memblock: switch to use NUMA_NO_NODEGrygorii Strashko3-3/+3
Update X86 code to use NUMA_NO_NODE instead of MAX_NUMNODES while calling memblock APIs, because memblock API will be changed to use NUMA_NO_NODE and will produce warning during boot otherwise. See: https://lkml.org/lkml/2013/12/9/898 Signed-off-by: Grygorii Strashko <[email protected]> Cc: Santosh Shilimkar <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21arch/arm/mach-omap2/omap_hwmod.c: use memblock apis for early memory allocationsSantosh Shilimkar1-6/+2
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21arch/arm/mm/init.c: use memblock apis for early memory allocationsSantosh Shilimkar1-1/+1
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21arch/arm/kernel/: use memblock apis for early memory allocationsSantosh Shilimkar2-2/+2
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21drivers/firmware/memmap.c: use memblock apis for early memory allocationsSantosh Shilimkar1-1/+1
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/memory_hotplug.c: use memblock apis for early memory allocationsSantosh Shilimkar1-1/+1
Correct ensure_zone_is_initialized() function description according to the introduced memblock APIs for early memory allocations. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/percpu.c: use memblock apis for early memory allocationsSantosh Shilimkar1-16/+22
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/page_cgroup.c: use memblock apis for early memory allocationsGrygorii Strashko1-2/+3
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/hugetlb.c: use memblock apis for early memory allocationsGrygorii Strashko1-5/+5
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/sparse: use memblock apis for early memory allocationsSantosh Shilimkar2-14/+19
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21lib/cpumask.c: use memblock apis for early memory allocationsSantosh Shilimkar1-2/+2
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21lib/swiotlb.c: use memblock apis for early memory allocationsSantosh Shilimkar1-15/+20
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21kernel/power/snapshot.c: use memblock apis for early memory allocationsSantosh Shilimkar1-1/+1
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Acked-by: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/page_alloc.c: use memblock apis for early memory allocationsSantosh Shilimkar1-12/+15
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Tejun Heo <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21kernel/printk/printk.c: use memblock apis for early memory allocationsSantosh Shilimkar1-7/+3
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fallback to exiting bootmem APIs. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Tejun Heo <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21init/main.c: use memblock apis for early memory allocationsSantosh Shilimkar1-3/+5
Switch to memblock interfaces for early memory allocator instead of bootmem allocator. No functional change in beahvior than what it is in current code from bootmem users points of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers build on top of memblock. And the archs which still uses bootmem, these new apis just fall back to exiting bootmem APIs. Signed-off-by: Santosh Shilimkar <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Tejun Heo <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/memblock: add memblock memory allocation apisSantosh Shilimkar3-4/+361
Introduce memblock memory allocation APIs which allow to support PAE or LPAE extension on 32 bits archs where the physical memory start address can be beyond 4GB. In such cases, existing bootmem APIs which operate on 32 bit addresses won't work and needs memblock layer which operates on 64 bit addresses. So we add equivalent APIs so that we can replace usage of bootmem with memblock interfaces. Architectures already converted to NO_BOOTMEM use these new memblock interfaces. The architectures which are still not converted to NO_BOOTMEM continue to function as is because we still maintain the fal lback option of bootmem back-end supporting these new interfaces. So no functional change as such. In long run, once all the architectures moves to NO_BOOTMEM, we can get rid of bootmem layer completely. This is one step to remove the core code dependency with bootmem and also gives path for architectures to move away from bootmem. The proposed interface will became active if both CONFIG_HAVE_MEMBLOCK and CONFIG_NO_BOOTMEM are specified by arch. In case !CONFIG_NO_BOOTMEM, the memblock() wrappers will fallback to the existing bootmem apis so that arch's not converted to NO_BOOTMEM continue to work as is. The meaning of MEMBLOCK_ALLOC_ACCESSIBLE and MEMBLOCK_ALLOC_ANYWHERE is kept same. [[email protected]: s/depricated/deprecated/] Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Tejun Heo <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/memblock: switch to use NUMA_NO_NODE instead of MAX_NUMNODESGrygorii Strashko3-15/+25
It's recommended to use NUMA_NO_NODE everywhere to select "process any node" behavior or to indicate that "no node id specified". Hence, update __next_free_mem_range*() API's to accept both NUMA_NO_NODE and MAX_NUMNODES, but emit warning once on MAX_NUMNODES, and correct corresponding API's documentation to describe new behavior. Also, update other memblock/nobootmem APIs where MAX_NUMNODES is used dirrectly. The change was suggested by Tejun Heo. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Tejun Heo <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/memblock: reorder parameters of memblock_find_in_range_nodeGrygorii Strashko3-11/+12
Reorder parameters of memblock_find_in_range_node to be consistent with other memblock APIs. The change was suggested by Tejun Heo <[email protected]>. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Tejun Heo <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/memblock: drop WARN and use SMP_CACHE_BYTES as a default alignmentGrygorii Strashko1-2/+2
Don't produce warning and interpret 0 as "default align" equal to SMP_CACHE_BYTES in case if caller of memblock_alloc_base_nid() doesn't specify alignment for the block (align == 0). This is done in preparation of introducing common memblock alloc interface to make code behavior consistent. More details are in below thread : https://lkml.org/lkml/2013/10/13/117. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Tejun Heo <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/memblock: remove unnecessary inclusions of bootmem.hGrygorii Strashko2-2/+0
Clean-up to remove depedency with bootmem headers. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/bootmem: remove duplicated declaration of __free_pages_bootmem()Grygorii Strashko1-1/+0
The __free_pages_bootmem is used internally by MM core and already defined in internal.h. So, remove duplicated declaration. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/memblock: debug: don't free reserved array if !ARCH_DISCARD_MEMBLOCKGrygorii Strashko1-0/+13
Now the Nobootmem allocator will always try to free memory allocated for reserved memory regions (free_low_memory_core_early()) without taking into to account current memblock debugging configuration (CONFIG_ARCH_DISCARD_MEMBLOCK and CONFIG_DEBUG_FS state). As result if: - CONFIG_DEBUG_FS defined - CONFIG_ARCH_DISCARD_MEMBLOCK not defined; - reserved memory regions array have been resized during boot then: - memory allocated for reserved memory regions array will be freed to buddy allocator; - debug_fs entry "sys/kernel/debug/memblock/reserved" will show garbage instead of state of memory reservations. like: 0: 0x98393bc0..0x9a393bbf 1: 0xff120000..0xff11ffff 2: 0x00000000..0xffffffff Hence, do not free memory allocated for reserved memory regions if defined(CONFIG_DEBUG_FS) && !defined(CONFIG_ARCH_DISCARD_MEMBLOCK). Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21x86: memblock: set current limit to max low memory addressSantosh Shilimkar2-3/+3
The memblock current limit value is used to limit early boot memory allocations below max low memory address by default, as the kernel can access only to the low memory. Hence, set memblock current limit value to the max mapped low memory address instead of max mapped memory address. Signed-off-by: Santosh Shilimkar <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Grygorii Strashko <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21oom_kill: add rcu_read_lock() into find_lock_task_mm()Oleg Nesterov1-4/+8
find_lock_task_mm() expects it is called under rcu or tasklist lock, but it seems that at least oom_unkillable_task()->task_in_mem_cgroup() and mem_cgroup_out_of_memory()->oom_badness() can call it lockless. Perhaps we could fix the callers, but this patch simply adds rcu lock into find_lock_task_mm(). This also allows to simplify a bit one of its callers, oom_kill_process(). Signed-off-by: Oleg Nesterov <[email protected]> Cc: Sergey Dyasly <[email protected]> Cc: Sameer Nanda <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Mandeep Singh Baines <[email protected]> Cc: "Ma, Xindong" <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: "Tu, Xiaobing" <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()Oleg Nesterov1-8/+11
At least out_of_memory() calls has_intersects_mems_allowed() without even rcu_read_lock(), this is obviously buggy. Add the necessary rcu_read_lock(). This means that we can not simply return from the loop, we need "bool ret" and "break". While at it, swap the names of task_struct's (the argument and the local). This cleans up the code a little bit and avoids the unnecessary initialization. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Sergey Dyasly <[email protected]> Tested-by: Sergey Dyasly <[email protected]> Reviewed-by: Sameer Nanda <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Mandeep Singh Baines <[email protected]> Cc: "Ma, Xindong" <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: "Tu, Xiaobing" <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21oom_kill: change oom_kill.c to use for_each_thread()Oleg Nesterov1-10/+10
Change oom_kill.c to use for_each_thread() rather than the racy while_each_thread() which can loop forever if we race with exit. Note also that most users were buggy even if while_each_thread() was fine, the task can exit even _before_ rcu_read_lock(). Fortunately the new for_each_thread() only requires the stable task_struct, so this change fixes both problems. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Sergey Dyasly <[email protected]> Tested-by: Sergey Dyasly <[email protected]> Reviewed-by: Sameer Nanda <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Mandeep Singh Baines <[email protected]> Cc: "Ma, Xindong" <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: "Tu, Xiaobing" <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21introduce for_each_thread() to replace the buggy while_each_thread()Oleg Nesterov4-0/+22
while_each_thread() and next_thread() should die, almost every lockless usage is wrong. 1. Unless g == current, the lockless while_each_thread() is not safe. while_each_thread(g, t) can loop forever if g exits, next_thread() can't reach the unhashed thread in this case. Note that this can happen even if g is the group leader, it can exec. 2. Even if while_each_thread() itself was correct, people often use it wrongly. It was never safe to just take rcu_read_lock() and loop unless you verify that pid_alive(g) == T, even the first next_thread() can point to the already freed/reused memory. This patch adds signal_struct->thread_head and task->thread_node to create the normal rcu-safe list with the stable head. The new for_each_thread(g, t) helper is always safe under rcu_read_lock() as long as this task_struct can't go away. Note: of course it is ugly to have both task_struct->thread_node and the old task_struct->thread_group, we will kill it later, after we change the users of while_each_thread() to use for_each_thread(). Perhaps we can kill it even before we convert all users, we can reimplement next_thread(t) using the new thread_head/thread_node. But we can't do this right now because this will lead to subtle behavioural changes. For example, do/while_each_thread() always sees at least one task, while for_each_thread() can do nothing if the whole thread group has died. Or thread_group_empty(), currently its semantics is not clear unless thread_group_leader(p) and we need to audit the callers before we can change it. So this patch adds the new interface which has to coexist with the old one for some time, hopefully the next changes will be more or less straightforward and the old one will go away soon. Signed-off-by: Oleg Nesterov <[email protected]> Reviewed-by: Sergey Dyasly <[email protected]> Tested-by: Sergey Dyasly <[email protected]> Reviewed-by: Sameer Nanda <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Mandeep Singh Baines <[email protected]> Cc: "Ma, Xindong" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Tu, Xiaobing" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/rmap: use rmap_walk() in page_mkclean()Joonsoo Kim1-25/+26
Now, we have an infrastructure in rmap_walk() to handle difference from variants of rmap traversing functions. So, just use it in page_mkclean(). In this patch, I change following things. 1. remove some variants of rmap traversing functions. cf> page_mkclean_file 2. mechanical change to use rmap_walk() in page_mkclean(). Signed-off-by: Joonsoo Kim <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hillf Danton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/rmap: use rmap_walk() in page_referenced()Joonsoo Kim4-194/+80
Now, we have an infrastructure in rmap_walk() to handle difference from variants of rmap traversing functions. So, just use it in page_referenced(). In this patch, I change following things. 1. remove some variants of rmap traversing functions. cf> page_referenced_ksm, page_referenced_anon, page_referenced_file 2. introduce new struct page_referenced_arg and pass it to page_referenced_one(), main function of rmap_walk, in order to count reference, to store vm_flags and to check finish condition. 3. mechanical change to use rmap_walk() in page_referenced(). [[email protected]: fix BUG at rmap_walk] Signed-off-by: Joonsoo Kim <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hillf Danton <[email protected]> Signed-off-by: Wanpeng Li <[email protected]> Cc: Sasha Levin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/rmap: use rmap_walk() in try_to_munlock()Joonsoo Kim3-168/+42
Now, we have an infrastructure in rmap_walk() to handle difference from variants of rmap traversing functions. So, just use it in try_to_munlock(). In this patch, I change following things. 1. remove some variants of rmap traversing functions. cf> try_to_unmap_ksm, try_to_unmap_anon, try_to_unmap_file 2. mechanical change to use rmap_walk() in try_to_munlock(). 3. copy and paste comments. Signed-off-by: Joonsoo Kim <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hillf Danton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/rmap: use rmap_walk() in try_to_unmap()Joonsoo Kim3-18/+39
Now, we have an infrastructure in rmap_walk() to handle difference from variants of rmap traversing functions. So, just use it in try_to_unmap(). In this patch, I change following things. 1. enable rmap_walk() if !CONFIG_MIGRATION. 2. mechanical change to use rmap_walk() in try_to_unmap(). Signed-off-by: Joonsoo Kim <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hillf Danton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/rmap: extend rmap_walk_xxx() to cope with different casesJoonsoo Kim3-8/+51
There are a lot of common parts in traversing functions, but there are also a little of uncommon parts in it. By assigning proper function pointer on each rmap_walker_control, we can handle these difference correctly. Following are differences we should handle. 1. difference of lock function in anon mapping case 2. nonlinear handling in file mapping case 3. prechecked condition: checking memcg in page_referenced(), checking VM_SHARE in page_mkclean() checking temporary vma in try_to_unmap() 4. exit condition: checking page_mapped() in try_to_unmap() So, in this patch, I introduce 4 function pointers to handle above differences. Signed-off-by: Joonsoo Kim <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hillf Danton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/rmap: make rmap_walk to get the rmap_walk_control argumentJoonsoo Kim5-21/+27
In each rmap traverse case, there is some difference so that we need function pointers and arguments to them in order to handle these For this purpose, struct rmap_walk_control is introduced in this patch, and will be extended in following patch. Introducing and extending are separate, because it clarify changes. Signed-off-by: Joonsoo Kim <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hillf Danton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/rmap: factor lock function out of rmap_walk_anon()Joonsoo Kim1-8/+20
When we traverse anon_vma, we need to take a read-side anon_lock. But there is subtle difference in the situation so that we can't use same method to take a lock in each cases. Therefore, we need to make rmap_walk_anon() taking difference lock function. This patch is the first step, factoring lock function for anon_lock out of rmap_walk_anon(). It will be used in case of removing migration entry and in default of rmap_walk_anon(). Signed-off-by: Joonsoo Kim <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hillf Danton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/rmap: factor nonlinear handling out of try_to_unmap_file()Joonsoo Kim1-62/+74
To merge all kinds of rmap traverse functions, try_to_unmap(), try_to_munlock(), page_referenced() and page_mkclean(), we need to extract common parts and separate out non-common parts. Nonlinear handling is handled just in try_to_unmap_file() and other rmap traverse functions doesn't care of it. Therfore it is better to factor nonlinear handling out of try_to_unmap_file() in order to merge all kinds of rmap traverse functions easily. Signed-off-by: Joonsoo Kim <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hillf Danton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21mm/rmap: recompute pgoff for huge pageJoonsoo Kim1-5/+2
Rmap traversing is used in five different cases, try_to_unmap(), try_to_munlock(), page_referenced(), page_mkclean() and remove_migration_ptes(). Each one implements its own traversing functions for the cases, anon, file, ksm, respectively. These cause lots of duplications and cause maintenance overhead. They also make codes being hard to understand and error-prone. One example is hugepage handling. There is a code to compute hugepage offset correctly in try_to_unmap_file(), but, there isn't a code to compute hugepage offset in rmap_walk_file(). These are used pairwise in migration context, but we missed to modify pairwise. To overcome these drawbacks, we should unify these through one unified function. I decide rmap_walk() as main function since it has no unnecessity. And to control behavior of rmap_walk(), I introduce struct rmap_walk_control having some function pointers. These makes rmap_walk() working for their specific needs. This patchset remove a lot of duplicated code as you can see in below short-stat and kernel text size also decrease slightly. text data bss dec hex filename 10640 1 16 10657 29a1 mm/rmap.o 10047 1 16 10064 2750 mm/rmap.o 13823 705 8288 22816 5920 mm/ksm.o 13199 705 8288 22192 56b0 mm/ksm.o This patch (of 9): We have to recompute pgoff if the given page is huge, since result based on HPAGE_SIZE is not approapriate for scanning the vma interval tree, as shown by commit 36e4f20af833 ("hugetlb: do not use vma_hugecache_offset() for vma_prio_tree_foreach") and commit 369a713e ("rmap: recompute pgoff for unmapping huge page"). To handle both the cases, normal page for page cache and hugetlb page, by same way, we can use compound_page(). It returns 0 on non-compound page and it also returns proper value on compound page. Signed-off-by: Joonsoo Kim <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hillf Danton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21memcg: make memcg_update_cache_sizes() staticVladimir Davydov1-1/+1
This function is not used outside of memcontrol.c so make it static. Signed-off-by: Vladimir Davydov <[email protected]> Cc: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Balbir Singh <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21memcg: fix kmem_account_flags check in memcg_can_account_kmem()Vladimir Davydov1-1/+2
We should start kmem accounting for a memory cgroup only after both its kmem limit is set (KMEM_ACCOUNTED_ACTIVE) and related call sites are patched (KMEM_ACCOUNTED_ACTIVATED). Currently memcg_can_account_kmem() allows kmem accounting even if only one of the conditions is true. Fix it. This means that a page might get charged by memcg_kmem_newpage_charge which would see its static key patched already but memcg_kmem_commit_charge would still see it unpatched and so the charge won't be committed. The result would be charge inconsistency (page_cgroup not marked as PageCgroupUsed) and the charge would leak because __memcg_kmem_uncharge_pages would ignore it. [[email protected]: augment changelog] Signed-off-by: Vladimir Davydov <[email protected]> Cc: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Balbir Singh <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Glauber Costa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>