path: root/include/linux/gfp.h
Age | Commit message | Author | Files | Lines
2015-11-06mm, page_alloc: use masks and shifts when converting GFP flags to migrate typesMel Gorman1-5/+7
This patch redefines which GFP bits are used for specifying mobility and the order of the migrate types. Once redefined it's possible to convert GFP flags to a migrate type with a simple mask and shift. The only downside is that readers of OOM kill messages and allocation failures may be accustomed to the existing values, but scripts/gfp-translate will help. Signed-off-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
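As a hedged sketch of what the conversion looks like after this change (the bit layout is assumed from the commit subject, not copied from the diff):

    #define GFP_MOVABLE_MASK  (__GFP_RECLAIMABLE | __GFP_MOVABLE)
    #define GFP_MOVABLE_SHIFT 3     /* assumed position of ___GFP_MOVABLE */

    static inline int gfpflags_to_migratetype(const gfp_t gfp_flags)
    {
            /* setting both mobility bits at once is an invalid combination */
            VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
            return (gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
    }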
2015-09-08mm: use numa_mem_id() in alloc_pages_node()Vlastimil Babka1-2/+3
alloc_pages_node() might fail when called with NUMA_NO_NODE and __GFP_THISNODE on a CPU belonging to a memoryless node. To make the local-node fallback more robust and prevent such situations, use numa_mem_id(), which was introduced for similar scenarios in the slab context. Suggested-by: Christoph Lameter <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08mm: unify checks in alloc_pages_node() and __alloc_pages_node()Vlastimil Babka1-5/+5
Perform the same debug checks in alloc_pages_node() as are done in __alloc_pages_node(), by making the former function a wrapper of the latter one. In addition to better diagnostics in DEBUG_VM builds for situations which were already fatal (e.g. an out-of-bounds node id), there are two visible changes for potential existing buggy callers of alloc_pages_node():

- calling alloc_pages_node() with any negative nid (e.g. due to arithmetic overflow) was treated as passing NUMA_NO_NODE and fallback to the local node was applied. This will now be fatal.
- calling alloc_pages_node() with an offline node will now be checked in DEBUG_VM builds. Since it's not fatal if the node has been previously online, and this patch may expose some existing buggy callers, change the VM_BUG_ON in __alloc_pages_node() to VM_WARN_ON.

Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
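A minimal sketch of the resulting pair after this entry and the numa_mem_id() entry above (the checks are as described; the exact bodies are illustrative):

    static inline struct page *
    __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
    {
            VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES); /* negative nid is now fatal */
            VM_WARN_ON(!node_online(nid));             /* offline node only warns */
            return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
    }

    static inline struct page *
    alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
    {
            if (nid == NUMA_NO_NODE)
                    nid = numa_mem_id();    /* nearest node with memory */
            return __alloc_pages_node(nid, gfp_mask, order);
    }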
2015-09-08mm: rename alloc_pages_exact_node() to __alloc_pages_node()Vlastimil Babka1-8/+15
alloc_pages_exact_node() was introduced in commit 6484eb3e2a81 ("page allocator: do not check NUMA node ID when the caller knows the node is valid") as an optimized variant of alloc_pages_node(), that doesn't fall back to the current node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily suggest that the allocation is restricted to the given node and fails otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is passed among the gfp flags. The misleading name has led to mistakes in the past, see for example commits 5265047ac301 ("mm, thp: really limit transparent hugepage allocation to local node") and b360edb43f8e ("mm, mempolicy: migrate_to_node should only migrate to node"). Another issue with the name is that there's a family of alloc_pages_exact*() functions where 'exact' means exact size (instead of page order), which leads to more confusion. To prevent further mistakes, this patch effectively renames alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's an optimized variant of alloc_pages_node() not intended for general usage. Both functions get described in comments. It has also been considered to really provide a convenience function for allocations restricted to a node, but the major opinion seems to be that __GFP_THISNODE already provides that functionality and we shouldn't duplicate the API needlessly. The number of users would be small anyway. Existing callers of alloc_pages_exact_node() are simply converted to call __alloc_pages_node(), with the exception of sba_alloc_coherent() which open-codes the check for NUMA_NO_NODE, so it is converted to use alloc_pages_node() instead. This means it no longer performs some VM_BUG_ON checks, and since the current check for nid in alloc_pages_node() uses a 'nid < 0' comparison (which includes NUMA_NO_NODE), it may hide wrong values which would be previously exposed. Both differences will be rectified by the next patch. To sum up, this patch makes no functional changes, except temporarily hiding potentially buggy callers. Restricting the checks in alloc_pages_node() is left for the next patch which can in turn expose more existing buggy callers. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Robin Holt <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Michael Ellerman <[email protected]> Cc: Mel Gorman <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Cliff Whickman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08mm: improve __GFP_NORETRY comment based on implementationDavid Rientjes1-1/+4
Explicitly state that __GFP_NORETRY will attempt direct reclaim and memory compaction before returning NULL and that the oom killer is not called in the current implementation of the page allocator. [[email protected]: s/has/have/] Signed-off-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30mm: meminit: finish initialisation of struct pages before basic setupMel Gorman1-0/+8
Waiman Long reported that 24TB machines hit OOM during basic setup when struct page initialisation was deferred. One approach is to initialise memory on demand but it interferes with page allocator paths. This patch creates dedicated threads to initialise memory before basic setup. It then blocks on a rw_semaphore until completion as a wait_queue and counter is overkill. This may be slower to boot but it's simpler overall and also gets rid of a section mangling which existed so kswapd could do the initialisation. [[email protected]: include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast] Signed-off-by: Mel Gorman <[email protected]> Cc: Waiman Long <[email protected]> Cc: Nathan Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Scott Norton <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+2
Conflicts: drivers/net/ethernet/cadence/macb.c drivers/net/phy/phy.c include/linux/skbuff.h net/ipv4/tcp.c net/switchdev/switchdev.c Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD} renaming overlapping with net-next changes of various sorts. phy.c was a case of two changes, one adding a local variable to a function whilst the second was removing one. tcp.c overlapped a deadlock fix with the addition of new tcp_info statistic values. macb.c involved the addition of two zyncq device entries. skbuff.h involved adding back ipv4_daddr to nf_bridge_info whilst net-next changes put two other existing members of that struct into a union. Signed-off-by: David S. Miller <[email protected]>
2015-05-14gfp: add __GFP_NOACCOUNTVladimir Davydov1-0/+2
Not all kmem allocations should be accounted to memcg. The following patch gives an example when accounting of a certain type of allocations to memcg can effectively result in a memory leak. This patch adds the __GFP_NOACCOUNT flag which, if passed to kmalloc and friends, will force the allocation to go through the root cgroup. It will be used by the next patch. Note, since with kmemleak enabled each kmalloc implies yet another allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to gfp_kmemleak_mask. Alternatively, we could introduce a per kmem cache flag disabling accounting for all allocations of a particular kind, but (a) we would not be able to bypass accounting for kmalloc then and (b) a kmem cache with this flag set could not be merged with a kmem cache without this flag, which would increase the number of global caches and therefore fragmentation even if the memory cgroup controller is not used. Despite its generic name, currently __GFP_NOACCOUNT disables accounting only for kmem allocations while user page allocations are always charged. To catch abuse of this flag, a warning is issued on an attempt of passing it to mem_cgroup_try_charge. Signed-off-by: Vladimir Davydov <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: <[email protected]> [4.0.x] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
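Illustrative use of the flag (not taken from the patch):

    /* this allocation bypasses memcg kmem accounting entirely */
    ptr = kmalloc(size, GFP_KERNEL | __GFP_NOACCOUNT);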
2015-05-12mm/net: Rename and move page fragment handling from net/ to mm/Alexander Duyck1-0/+5
This change moves the __alloc_page_frag functionality out of the networking stack and into the page allocation portion of mm. The idea is to help make this maintainable by placing it with other page allocation functions. Since we are moving it from skbuff.c to page_alloc.c I have also renamed the basic defines and structure from netdev_alloc_cache to page_frag_cache to reflect that this is now part of a different kernel subsystem. I have also added a simple __free_page_frag function which can handle freeing the frags based on the skb->head pointer. The model for this is based off of __free_pages since we don't actually need to deal with all of the cases that put_page handles. I incorporated the virt_to_head_page call and compound_order into the function as it actually allows for a significant size reduction by reducing code duplication. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
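A sketch of the moved interface, with signatures assumed from the names in this entry (the struct layout is omitted):

    struct page_frag_cache;             /* renamed from netdev_alloc_cache */

    void *__alloc_page_frag(struct page_frag_cache *nc,
                            unsigned int fragsz, gfp_t gfp_mask);
    void __free_page_frag(void *addr);  /* finds the page via virt_to_head_page() */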
2015-04-14mm: clarify __GFP_NOFAIL deprecation statusMichal Hocko1-2/+4
__GFP_NOFAIL is documented as a deprecated flag since commit 478352e789f5 ("mm: add comment about deprecation of __GFP_NOFAIL"). This has discouraged people from using it but in some cases an opencoded endless loop around allocator has been used instead. So the allocator is not aware of the de facto __GFP_NOFAIL allocation because this information was not communicated properly. Let's make clear that if the allocation context really cannot afford failure because there is no good failure policy then using __GFP_NOFAIL is preferable to opencoding the loop outside of the allocator. [[email protected]: coding-style fixes] Signed-off-by: Michal Hocko <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dave Chinner <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Vipul Pandya <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
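An illustrative contrast (not taken from the patch):

    /* opencoded endless loop: the allocator cannot see the no-fail requirement */
    do {
            ptr = kmalloc(size, GFP_KERNEL);
    } while (!ptr);

    /* preferable: communicate the requirement so the allocator can act on it */
    ptr = kmalloc(size, GFP_KERNEL | __GFP_NOFAIL);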
2015-04-14mm: remove GFP_THISNODEDavid Rientjes1-10/+0
NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE. GFP_THISNODE is a secret combination of gfp bits that have different behavior than expected. It is a combination of __GFP_THISNODE, __GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page allocator slowpath to fail without trying reclaim even though it may be used in combination with __GFP_WAIT. An example of the problem this creates: commit e97ca8e5b864 ("mm: fix GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE that really just wanted __GFP_THISNODE. The problem doesn't end there, however, because even that conversion was a no-op for alloc_misplaced_dst_page(), which also sets __GFP_NORETRY and __GFP_NOWARN, and migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWARN are set in GFP_TRANSHUGE. Converting GFP_THISNODE to __GFP_THISNODE is a no-op in these cases since the page allocator special-cases __GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN. It's time to just remove GFP_THISNODE entirely. We leave __GFP_THISNODE to restrict an allocation to a local node, but remove GFP_THISNODE and its obscurity. Instead, we require that a caller clear __GFP_WAIT if it wants to avoid reclaim. This allows the aforementioned functions to actually reclaim as they should. It also enables any future callers that want to do __GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim. The rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT. Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so it is unchanged. Signed-off-by: David Rientjes <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Pravin Shelar <[email protected]> Cc: Jarno Rajahalme <[email protected]> Cc: Li Zefan <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
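For reference, the removed combination and its explicit replacement (the GFP_THISNODE definition is the historical NUMA one; the replacement call is illustrative):

    /* removed: secret combination special-cased in the slowpath */
    #define GFP_THISNODE    (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)

    /* after this patch: restrict to a node and opt out of reclaim explicitly */
    page = alloc_pages_node(nid,
                            (GFP_KERNEL | __GFP_THISNODE) & ~__GFP_WAIT, order);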
2015-02-11mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vmaVlastimil Babka1-6/+6
The previous commit ("mm/thp: Allocate transparent hugepages on local node") introduced alloc_hugepage_vma() to mm/mempolicy.c to perform a special policy for THP allocations. The function has the same interface as alloc_pages_vma(), shares a lot of boilerplate code and a long comment. This patch merges the hugepage special case into alloc_pages_vma. The extra if condition should be a cheap enough price to pay. We also prevent a (however unlikely) race with parallel mems_allowed update, which could make hugepage allocation restart only within the fallback call to alloc_hugepage_vma() and not reconsider the special rule in alloc_hugepage_vma(). Also by making sure mpol_cond_put(pol) is always called before the actual allocation attempt, we can use a single exit path within the function. Also update the comment for the missing node parameter and the obsolete reference to mmap_sem. Signed-off-by: Vlastimil Babka <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-11mm/thp: allocate transparent hugepages on local nodeAneesh Kumar K.V1-0/+4
This makes sure that we try to allocate hugepages from the local node if allowed by mempolicy. If we can't, we fall back to small page allocation based on mempolicy. This is based on the observation that allocating pages on the local node is more beneficial than allocating hugepages on a remote node. With this patch applied we may find transparent huge page allocation failures if the current node doesn't have enough free hugepages. Before this patch such failures result in us retrying the allocation on other nodes in the numa node mask. [[email protected]: fix comment, add CONFIG_TRANSPARENT_HUGEPAGE dependency] Signed-off-by: Aneesh Kumar K.V <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-12-13mm, gfp: escalatedly define GFP_HIGHUSER and GFP_HIGHUSER_MOVABLEJianyu Zhan1-5/+2
GFP_USER, GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE are defined in an incremental fashion, as implied by their names:

    GFP_USER                                 = GFP_USER
    GFP_USER + __GFP_HIGHMEM                 = GFP_HIGHUSER
    GFP_USER + __GFP_HIGHMEM + __GFP_MOVABLE = GFP_HIGHUSER_MOVABLE

So just make GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE incrementally defined to reflect this fact. It also makes the definitions clearer and textually warns on any future break-up of this incremental relationship. Signed-off-by: Jianyu Zhan <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
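The resulting incremental definitions look like this (GFP_USER's own expansion shown for context, using the flags of that era):

    #define GFP_USER             (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
    #define GFP_HIGHUSER         (GFP_USER | __GFP_HIGHMEM)
    #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)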
2014-12-10mm: introduce single zone pcplists drainVlastimil Babka1-2/+2
The functions for draining per-cpu pages back to buddy allocators currently always operate on all zones. There are however several cases where the drain is only needed in the context of a single zone, and spilling other pcplists is a waste of time both due to the extra spilling and later refilling. This patch introduces a new zone pointer parameter to drain_all_pages() and changes the dummy parameter of drain_local_pages() to also be a zone pointer. When NULL is passed, the functions operate on all zones as usual. Passing a specific zone pointer reduces the work to the single zone. All callers are updated to pass the NULL pointer in this patch. Conversion to single zone (where appropriate) is done in further patches. Signed-off-by: Vlastimil Babka <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Marek Szyprowski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
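A sketch of the changed signatures (NULL keeps the old all-zones behaviour; a non-NULL pointer restricts the drain to that zone's pcplists):

    void drain_all_pages(struct zone *zone);    /* zone == NULL: every zone */
    void drain_local_pages(struct zone *zone);  /* was a dummy parameter */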
2014-10-09mm: rename allocflags_to_migratetype for clarityDavid Rientjes1-1/+1
The page allocator has gfp flags (like __GFP_WAIT) and alloc flags (like ALLOC_CPUSET) that have separate semantics. The function allocflags_to_migratetype() actually takes gfp flags, not alloc flags, and returns a migratetype. Rename it to gfpflags_to_migratetype(). Signed-off-by: David Rientjes <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Reviewed-by: Zhang Yanfei <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06mm/page_alloc.c: add __meminit to alloc_pages_exact_nid()Fabian Frederick1-1/+1
alloc_pages_exact_nid() is only called by __meminit alloc_page_cgroup() Signed-off-by: Fabian Frederick <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-04include/linux/gfp.h: exclude duplicate headerAndy Shevchenko1-1/+0
mmdebug.h is included twice. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-04mm: page_alloc: convert hot/cold parameter and immediate callers to boolMel Gorman1-2/+2
cold is a bool, make it one. Make the likely case the "if" part of the block instead of the else as according to the optimisation manual this is preferred. Signed-off-by: Mel Gorman <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Jan Kara <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-04mm: get rid of __GFP_KMEMCGVladimir Davydov1-4/+6
Currently to allocate a page that should be charged to kmemcg (e.g. threadinfo), we pass the __GFP_KMEMCG flag to the page allocator. The page allocated is then to be freed by free_memcg_kmem_pages. Apart from looking asymmetrical, this also requires intrusion into the general allocation path. So let's introduce separate functions that will alloc/free pages charged to kmemcg. The new functions are called alloc_kmem_pages and free_kmem_pages. They should be used when the caller actually would like to use kmalloc, but has to fall back to the page allocator because the allocation is large. They only differ from alloc_pages and free_pages in that besides allocating or freeing pages they also charge them to the kmem resource counter of the current memory cgroup. [[email protected]: export kmalloc_order() to modules] Signed-off-by: Vladimir Davydov <[email protected]> Acked-by: Greg Thelen <[email protected]> Cc: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
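A sketch of the new pairing (the free-side signature is an assumption based on the description):

    struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order);
    void free_kmem_pages(unsigned long addr, unsigned int order);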
2014-03-10mm: fix GFP_THISNODE callers and clarifyJohannes Weiner1-0/+4
GFP_THISNODE is for callers that implement their own clever fallback to remote nodes. It restricts the allocation to the specified node and does not invoke reclaim, assuming that the caller will take care of it when the fallback fails, e.g. through a subsequent allocation request without GFP_THISNODE set. However, many current GFP_THISNODE users only want the node exclusive aspect of the flag, without actually implementing their own fallback or triggering reclaim if necessary. This results in things like page migration failing prematurely even when there is easily reclaimable memory available, unless kswapd happens to be running already or a concurrent allocation attempt triggers the necessary reclaim. Convert all callsites that don't implement their own fallback strategy to __GFP_THISNODE. This restricts the allocation to a single node too, but at the same time allows the allocator to enter the slowpath, wake kswapd, and invoke direct reclaim if necessary, to make the allocation happen when memory is full. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Jan Stancek <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-23mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGESasha Levin1-0/+1
Most of the VM_BUG_ON assertions are performed on a page. Usually, when one of these assertions fails we'll get a BUG_ON with a call stack and the registers. Based on recent requests to add a small piece of code that dumps the page at various VM_BUG_ON sites, I've noticed that the page dump is quite useful to people debugging issues in mm. This patch adds a VM_BUG_ON_PAGE(cond, page) which, beyond doing what VM_BUG_ON() does, also dumps the page before executing the actual BUG_ON. [[email protected]: fix up includes] Signed-off-by: Sasha Levin <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
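A minimal sketch of the macro for CONFIG_DEBUG_VM builds (dump_page()'s exact signature is assumed):

    #define VM_BUG_ON_PAGE(cond, page)              \
            do {                                    \
                    if (unlikely(cond)) {           \
                            dump_page(page);        \
                            BUG();                  \
                    }                               \
            } while (0)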
2013-07-09include/linux/gfp.h: fix the comment for GFP_ZONE_TABLEZhang Yanfei1-1/+1
0xc just means MOVABLE + DMA32, which results in zone DMA32. Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-18mm: allocate kernel pages to the right memcgGlauber Costa1-0/+3
When a process tries to allocate a page with the __GFP_KMEMCG flag, the page allocator will call the corresponding memcg functions to validate the allocation. Tasks in the root memcg can always proceed. To avoid adding markers to the page - and a kmem flag that would necessarily follow - as well as doing page_cgroup lookups for no reason, whoever marks its allocations with the __GFP_KMEMCG flag is responsible for telling the page allocator that this is such an allocation at free_pages() time. This is done by the invocation of __free_accounted_pages() and free_accounted_pages(). Signed-off-by: Glauber Costa <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Kamezawa Hiroyuki <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Suleiman Souhlal <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Greg Thelen <[email protected]> Cc: JoonSoo Kim <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
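An illustrative pairing of the allocation and free sides (not from the patch):

    page = alloc_pages(GFP_KERNEL | __GFP_KMEMCG, order);
    /* ... use the page as kernel memory charged to current's memcg ... */
    __free_accounted_pages(page, order);    /* tells the allocator to uncharge */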
2012-12-18mm: add a __GFP_KMEMCG flagGlauber Costa1-0/+2
This flag is used to indicate to the callees that this allocation is a kernel allocation in process context, and should be accounted to current's memcg. Signed-off-by: Glauber Costa <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Kamezawa Hiroyuki <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Suleiman Souhlal <[email protected]> Cc: Tejun Heo <[email protected]> Cc: David Rientjes <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Greg Thelen <[email protected]> Cc: JoonSoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-12mm: add a reminder comment for __GFP_BITS_SHIFTAndrew Morton1-0/+1
Cc: Glauber Costa <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-11mm: use IS_ENABLED(CONFIG_NUMA) instead of NUMA_BUILDKirill A. Shutemov1-1/+1
We don't need custom NUMA_BUILD anymore, since we have handy IS_ENABLED(). Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: KOSAKI Motohiro <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-10Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damageLinus Torvalds1-5/+8
This reverts commits a50915394f1fc02c2861d3b7ce7014788aa5066e and d7c3b937bdf45f0b844400b7bf6fd3ed50bac604. This is a revert of a revert of a revert. In addition, it reverts the even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the original commits in linux-next. It turns out that the original patch really was bogus, and that the original revert was the correct thing to do after all. We thought we had fixed the problem, and then reverted the revert, but the problem really is fundamental: waking up kswapd simply isn't the right thing to do, and direct reclaim sometimes simply _is_ the right thing to do. When certain allocations fail, we simply should try some direct reclaim, and if that fails, fail the allocation. That's the right thing to do for THP allocations, which can easily fail, and the GPU allocations want to do that too. So starting kswapd is sometimes simply wrong, and removing the flag that said "don't start kswapd" was a mistake. Let's hope we never revisit this mistake again - and certainly not this many times ;) Acked-by: Mel Gorman <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-11-30revert "Revert "mm: remove __GFP_NO_KSWAPD""Andrew Morton1-8/+5
It appears that this patch was innocent, and we hope that "mm: avoid waking kswapd for THP allocations when compaction is deferred or contended" will fix the final kswapd-spinning cause. Cc: Zdenek Kabelac <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Valdis Kletnieks <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Robert Jennings <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-11-26Revert "mm: remove __GFP_NO_KSWAPD"Mel Gorman1-1/+4
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" reverted, Zdenek Kabelac reported the following Hmm, so it's just took longer to hit the problem and observe kswapd0 spinning on my CPU again - it's not as endless like before - but still it easily eats minutes - it helps to turn off Firefox or TB (memory hungry apps) so kswapd0 stops soon - and restart those apps again. (And I still have like >1GB of cached memory) kswapd0 R running task 0 30 2 0x00000000 Call Trace: preempt_schedule+0x42/0x60 _raw_spin_unlock+0x55/0x60 put_super+0x31/0x40 drop_super+0x22/0x30 prune_super+0x149/0x1b0 shrink_slab+0xba/0x510 The sysrq+m indicates the system has no swap so it'll never reclaim anonymous pages as part of reclaim/compaction. That is one part of the problem but not the root cause as file-backed pages could also be reclaimed. The likely underlying problem is that kswapd is woken up or kept awake for each THP allocation request in the page allocator slow path. If compaction fails for the requesting process then compaction will be deferred for a time and direct reclaim is avoided. However, if there are a storm of THP requests that are simply rejected, it will still be the the case that kswapd is awake for a prolonged period of time as pgdat->kswapd_max_order is updated each time. This is noticed by the main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead it will loopp, shrinking a small number of pages and calling shrink_slab() on each iteration. The temptation is to supply a patch that checks if kswapd was woken for THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not backed up by proper testing. As 3.7 is very close to release and this is not a bug we should release with, a safer path is to revert "mm: remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the balance_pgdat() logic in general. Signed-off-by: Mel Gorman <[email protected]> Cc: Zdenek Kabelac <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Valdis Kletnieks <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Robert Jennings <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-10-09make GFP_NOTRACK definition unconditionalGlauber Costa1-4/+0
There was a general sentiment in a recent discussion (See https://lkml.org/lkml/2012/9/18/258) that the __GFP flags should be defined unconditionally. Currently, the only offender is GFP_NOTRACK, which is conditional to KMEMCHECK. Signed-off-by: Glauber Costa <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Mel Gorman <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-10-09mm: remove __GFP_NO_KSWAPDRik van Riel1-4/+1
When transparent huge pages were introduced, memory compaction and swap storms were an issue, and the kernel had to be careful to not make THP allocations cause pageout or compaction. Now that we have working compaction deferral, kswapd is smart enough to invoke compaction and the quadratic behaviour around isolate_free_pages has been fixed, it should be safe to remove __GFP_NO_KSWAPD. [[email protected]: Comment fix] [[email protected]: Avoid direct reclaim for deferred compaction] Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-07-31netvm: allow skb allocation to use PFMEMALLOC reservesMel Gorman1-0/+3
Change the skb allocation API to indicate RX usage and use this to fall back to the PFMEMALLOC reserve when needed. SKBs allocated from the reserve are tagged in skb->pfmemalloc. If an SKB is allocated from the reserve and the socket is later found to be unrelated to page reclaim, the packet is dropped so that the memory remains available for page reclaim. Network protocols are expected to recover from this packet loss. [[email protected]: Ideas taken from various patches] [[email protected]: Use static branches, coding style corrections] [[email protected]: Avoid unnecessary cast, fix !CONFIG_NET build] Signed-off-by: Mel Gorman <[email protected]> Acked-by: David S. Miller <[email protected]> Cc: Neil Brown <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mike Christie <[email protected]> Cc: Eric B Munson <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-07-31mm: introduce __GFP_MEMALLOC to allow access to emergency reservesMel Gorman1-2/+8
__GFP_MEMALLOC will allow the allocation to disregard the watermarks, much like PF_MEMALLOC. It allows one to pass along the memalloc state in object-related allocation flags as opposed to task-related flags, such as sk->sk_allocation. This removes the need for ALLOC_PFMEMALLOC as callers using __GFP_MEMALLOC can get the ALLOC_NO_WATERMARK flag which is now enough to identify allocations related to page reclaim. Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Cc: David Miller <[email protected]> Cc: Neil Brown <[email protected]> Cc: Mike Christie <[email protected]> Cc: Eric B Munson <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
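Illustrative use (not from the patch): an allocation made on behalf of page reclaim may dip into the reserves:

    page = alloc_page(GFP_ATOMIC | __GFP_MEMALLOC); /* may ignore watermarks */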
2012-05-21mm: page_isolation: MIGRATE_CMA isolation functions addedMichal Nazarewicz1-1/+2
This commit changes various functions that change pages' and pageblocks' migrate type between MIGRATE_ISOLATE and MIGRATE_MOVABLE in such a way as to allow them to work with the MIGRATE_CMA migrate type. Signed-off-by: Michal Nazarewicz <[email protected]> Signed-off-by: Marek Szyprowski <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Tested-by: Rob Clark <[email protected]> Tested-by: Ohad Ben-Cohen <[email protected]> Tested-by: Benjamin Gaignard <[email protected]> Tested-by: Robert Nelson <[email protected]> Tested-by: Barry Song <[email protected]>
2012-05-21mm: mmzone: MIGRATE_CMA migration type addedMichal Nazarewicz1-0/+3
The MIGRATE_CMA migration type has two main characteristics: (i) only movable pages can be allocated from MIGRATE_CMA pageblocks and (ii) the page allocator will never change the migration type of MIGRATE_CMA pageblocks. This guarantees (to some degree) that a page in a MIGRATE_CMA pageblock can always be migrated somewhere else (unless there's no memory left in the system). It is designed to be used for allocating big chunks (eg. 10MiB) of physically contiguous memory. Once a driver requests contiguous memory, pages from MIGRATE_CMA pageblocks may be migrated away to create a contiguous block. To minimise the number of migrations, the MIGRATE_CMA migration type is the last type tried when the page allocator falls back to other migration types when requested. Signed-off-by: Michal Nazarewicz <[email protected]> Signed-off-by: Marek Szyprowski <[email protected]> Signed-off-by: Kyungmin Park <[email protected]> Acked-by: Mel Gorman <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Tested-by: Rob Clark <[email protected]> Tested-by: Ohad Ben-Cohen <[email protected]> Tested-by: Benjamin Gaignard <[email protected]> Tested-by: Robert Nelson <[email protected]> Tested-by: Barry Song <[email protected]>
2012-05-21mm: page_alloc: introduce alloc_contig_range()Michal Nazarewicz1-0/+8
This commit adds the alloc_contig_range() function which tries to allocate a given range of pages. It tries to migrate all already-allocated pages that fall in the range, thus freeing them. Once all pages in the range are freed, they are removed from the buddy system and are thus allocated for the caller to use. Signed-off-by: Michal Nazarewicz <[email protected]> Signed-off-by: Marek Szyprowski <[email protected]> Acked-by: Mel Gorman <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Tested-by: Rob Clark <[email protected]> Tested-by: Ohad Ben-Cohen <[email protected]> Tested-by: Benjamin Gaignard <[email protected]> Tested-by: Robert Nelson <[email protected]> Tested-by: Barry Song <[email protected]>
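Illustrative usage with signatures assumed from this series (MIGRATE_CMA is described in the mmzone entry above):

    int ret = alloc_contig_range(start_pfn, end_pfn, MIGRATE_CMA);
    if (ret == 0) {
            /* pages in [start_pfn, end_pfn) now belong to the caller */
            free_contig_range(start_pfn, end_pfn - start_pfn);
    }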
2012-01-10mm: try to distribute dirty pages fairly across zonesJohannes Weiner1-1/+3
The maximum number of dirty pages that exist in the system at any time is determined by a number of pages considered dirtyable and a user-configured percentage of those, or an absolute number in bytes. This number of dirtyable pages is the sum of memory provided by all the zones in the system minus their lowmem reserves and high watermarks, so that the system can retain a healthy number of free pages without having to reclaim dirty pages. But there is a flaw in that we have a zoned page allocator which does not care about the global state but rather the state of individual memory zones. And right now there is nothing that prevents one zone from filling up with dirty pages while other zones are spared, which frequently leads to situations where kswapd, in order to restore the watermark of free pages, does indeed have to write pages from that zone's LRU list. This can interfere so badly with IO from the flusher threads that major filesystems (btrfs, xfs, ext4) mostly ignore write requests from reclaim already, taking away the VM's only possibility to keep such a zone balanced, aside from hoping the flushers will soon clean pages from that zone. Enter per-zone dirty limits. They are to a zone's dirtyable memory what the global limit is to the global amount of dirtyable memory, and try to make sure that no single zone receives more than its fair share of the globally allowed dirty pages in the first place. As the number of pages considered dirtyable excludes the zones' lowmem reserves and high watermarks, the maximum number of dirty pages in a zone is such that the zone can always be balanced without requiring page cleaning. As this is a placement decision in the page allocator and pages are dirtied only after the allocation, this patch allows allocators to pass __GFP_WRITE when they know in advance that the page will be written to and become dirty soon. The page allocator will then attempt to allocate from the first zone of the zonelist - which on NUMA is determined by the task's NUMA memory policy - that has not exceeded its dirty limit. At first glance, it would appear that the diversion to lower zones can increase pressure on them, but this is not the case. With a full high zone, allocations will be diverted to lower zones eventually, so it is more of a shift in timing of the lower zone allocations. Workloads that previously could fit their dirty pages completely in the higher zone may be forced to allocate from lower zones, but the amount of pages that "spill over" are limited themselves by the lower zones' dirty constraints, and thus unlikely to become a problem. For now, the problem of unfair dirty page distribution remains for NUMA configurations where the zones allowed for allocation are in sum not big enough to trigger the global dirty limits, wake up the flusher threads and remedy the situation. Because of this, an allocation that could not succeed on any of the considered zones is allowed to ignore the dirty limits before going into direct reclaim or even failing the allocation, until a future patch changes the global dirty throttling and flusher thread activation so that they take individual zone states into account. 
Test results:

    15M DMA + 3246M DMA32 + 504 Normal = 3765M memory
    40% dirty ratio
    16G USB thumb drive
    10 runs of dd if=/dev/zero of=disk/zeroes bs=32k count=$((10 << 15))

                   seconds            nr_vmscan_write
                       (stddev)          min|     median|        max
    xfs
    vanilla:  549.747( 3.492)        0.000|      0.000|      0.000
    patched:  550.996( 3.802)        0.000|      0.000|      0.000

    fuse-ntfs
    vanilla: 1183.094(53.178)    54349.000|  59341.000|  65163.000
    patched:  558.049(17.914)        0.000|      0.000|     43.000

    btrfs
    vanilla:  573.679(14.015)   156657.000| 460178.000| 606926.000
    patched:  563.365(11.368)        0.000|      0.000|   1362.000

    ext4
    vanilla:  561.197(15.782)        0.000|2725438.000|4143837.000
    patched:  568.806(17.496)        0.000|      0.000|      0.000

Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Acked-by: Mel Gorman <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Tested-by: Wu Fengguang <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Jan Kara <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Chris Mason <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
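An illustrative caller-side use of __GFP_WRITE (not from the patch):

    /* page will be dirtied shortly; let the allocator pick a zone
     * that has not exhausted its share of the dirty limit */
    page = __page_cache_alloc(mapping_gfp_mask(mapping) | __GFP_WRITE);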
2012-01-10mm, debug: test for online nid when allocating on single nodeDavid Rientjes1-1/+1
Calling alloc_pages_exact_node() means the allocation only passes the zonelist of a single node into the page allocator. If that node isn't online, its zonelist may never have been initialized, causing a strange oops that may not immediately be clear. I recently debugged an issue where node 0 wasn't online and an allocator was passing 0 to alloc_pages_exact_node() and it resulted in a NULL pointer on zonelist->_zonerefs. If CONFIG_DEBUG_VM is enabled, though, it would be nice to catch this a bit earlier. Signed-off-by: David Rientjes <[email protected]> Acked-by: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-01-10mm: avoid livelock on !__GFP_FS allocationsMel Gorman1-0/+16
Colin Cross reported: Under the following conditions, __alloc_pages_slowpath can loop forever:

    gfp_mask & __GFP_WAIT is true
    gfp_mask & __GFP_FS is false
    reclaim and compaction make no progress
    order <= PAGE_ALLOC_COSTLY_ORDER

These conditions happen very often during suspend and resume, when pm_restrict_gfp_mask() effectively converts all GFP_KERNEL allocations into __GFP_WAIT. The oom killer is not run because gfp_mask & __GFP_FS is false, but should_alloc_retry will always return true when order is less than PAGE_ALLOC_COSTLY_ORDER. In his fix, he avoided retrying the allocation if reclaim made no progress and __GFP_FS was not set. The problem is that this would result in GFP_NOIO allocations failing that previously succeeded which would be very unfortunate. The big difference between GFP_NOIO and suspend converting GFP_KERNEL to behave like GFP_NOIO is that normally flushers will be cleaning pages and kswapd reclaims pages allowing GFP_NOIO to succeed after a short delay. The same does not necessarily apply during suspend as the storage device may be suspended. This patch special cases the suspend case to fail the page allocation if reclaim cannot make progress and adds some documentation on how gfp_allowed_mask is currently used. Failing allocations like this may cause suspend to abort but that is better than a livelock. [[email protected]: Rework fix to be suspend specific] [[email protected]: Move suspended device check to should_alloc_retry] Reported-by: Colin Cross <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
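A sketch of the suspend special case (pm_suspended_storage() is the guard this patch adds; the placement in should_alloc_retry is as described in the bracketed notes):

    if (!did_some_progress && !(gfp_mask & __GFP_FS) &&
        pm_suspended_storage())
            goto nopage;    /* fail rather than livelock while storage is down */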
2012-01-10mm: add free_hot_cold_page_list() helperKonstantin Khlebnikov1-0/+1
This patch adds a helper, free_hot_cold_page_list(), to free a list of 0-order pages. It frees pages directly from the list without a temporary page-vector. It also calls trace_mm_pagevec_free() to simulate pagevec_free() behaviour.

bloat-o-meter:

    add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28)
    function                     old     new   delta
    free_hot_cold_page_list        -     264    +264
    get_page_from_freelist      2129    2132      +3
    __pagevec_free               243     239      -4
    split_free_page              380     373      -7
    release_pages                606     510     -96
    free_page_list               188       -    -188

Signed-off-by: Konstantin Khlebnikov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
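The helper's shape, as a sketch (the type of the cold parameter is an assumption):

    /* free a list of 0-order pages straight from the list, no pagevec */
    void free_hot_cold_page_list(struct list_head *list, int cold);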
2011-08-03mm: page_alloc: increase __GFP_BITS_SHIFT to include __GFP_OTHER_NODEJohannes Weiner1-1/+1
__GFP_OTHER_NODE is used for NUMA allocations on behalf of other nodes. It's supposed to be passed through from the page allocator to zone_statistics(), but it never gets there as gfp_allowed_mask is not wide enough and masks out the flag early in the allocation path. The result is an accounting glitch where successful NUMA allocations by-agent are not properly attributed as local. Increase __GFP_BITS_SHIFT so that it includes __GFP_OTHER_NODE. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Andi Kleen <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Acked-by: Mel Gorman <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25include/linux/gfp.h: convert BUG_ON() into VM_BUG_ON()Dave Hansen1-3/+1
VM_BUG_ON() is effectively a BUG_ON() under #ifdef CONFIG_DEBUG_VM. That is exactly what we have here now, and two different folks have suggested doing it this way. Signed-off-by: Dave Hansen <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25include/linux/gfp.h: work around apparent sparse confusionDave Hansen1-6/+1
Running sparse on page_alloc.c today, it errors out:

    include/linux/gfp.h:254:17: error: bad constant expression
    include/linux/gfp.h:254:17: error: cannot size expression

which is a line in gfp_zone():

    BUILD_BUG_ON((GFP_ZONE_BAD >> bit) & 1);

That's really unfortunate, because it ends up hiding all of the other legitimate sparse messages like this:

    mm/page_alloc.c:5315:59: warning: incorrect type in argument 1 (different base types)
    mm/page_alloc.c:5315:59:    expected unsigned long [unsigned] [usertype] size
    mm/page_alloc.c:5315:59:    got restricted gfp_t [usertype] <noident>
    ...

Having sparse be able to catch these very oopsable bugs is a lot more important than keeping a BUILD_BUG_ON(). Kill the BUILD_BUG_ON(). Compiles on x86_64 with and without CONFIG_DEBUG_VM=y. defconfig boots fine for me. Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-11mm: add alloc_pages_exact_nid()Andi Kleen1-0/+2
Add an alloc_pages_exact_nid() that allocates on a specific node. The naming is quite broken, but fixing that would need a larger renaming action. [[email protected]: coding-style fixes] [[email protected]: tweak comment] Signed-off-by: Andi Kleen <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Balbir Singh <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Acked-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
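The added declaration, sketched from the entry (an alloc_pages_exact() analogue for a specific node, where 'exact' means exact size rather than page order):

    void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);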
2011-03-22mm: add __GFP_OTHER_NODE flagAndi Kleen1-0/+2
Add a new __GFP_OTHER_NODE flag to tell the low level numa statistics in zone_statistics() that an allocation is on behalf of another thread. This way the local and remote counters can still be correct, even when background daemons like khugepaged are changing memory mappings. This only affects the accounting, but I think it's worth doing that right to avoid confusing users. I first tried to just pass down the right node, but this required a lot of changes to pass down this parameter and at least one addition of a 10th argument to a 9 argument function. Using the flag is a lot less intrusive. Open: should this also be used for migration? [[email protected]: coding-style fixes] Signed-off-by: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
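Illustrative use (remote_nid and gfp_mask are placeholders): a daemon allocating on behalf of a process on another node tags the allocation so zone_statistics() attributes it correctly:

    page = alloc_pages_node(remote_nid, gfp_mask | __GFP_OTHER_NODE, order);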
2011-03-04mm: add alloc_page_vma_node()Andi Kleen1-0/+2
Add an alloc_page_vma_node() that allows passing the "local" node in. Used in a follow-on patch. Acked-by: Andrea Arcangeli <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-04mm: change alloc_pages_vma to pass down the policy node for local policyAndi Kleen1-4/+5
Currently alloc_pages_vma() always uses the local node as the policy node for the LOCAL policy. Pass this node down as an argument instead. No behaviour change from this patch, but it will be needed for follow-ons. Acked-by: Andrea Arcangeli <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-01-24Remove MAYBE_BUILD_BUG_ONRusty Russell1-1/+1
Now BUILD_BUG_ON() can handle optimizable constants, we don't need MAYBE_BUILD_BUG_ON any more. Signed-off-by: Rusty Russell <[email protected]>
2011-01-13thp: add numa awareness to hugepage allocationsAndrea Arcangeli1-2/+5
It's mostly a matter of replacing alloc_pages with alloc_pages_vma after introducing alloc_pages_vma. khugepaged needs special handling as the allocation has to happen inside collapse_huge_page where the vma is known and an error has to be returned to the outer loop to sleep alloc_sleep_millisecs in case of failure. But it retains the more efficient logic of handling allocation failures in khugepaged in case of CONFIG_NUMA=n. Signed-off-by: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>