aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2011-05-25mm: declare mpol_to_str() when CONFIG_TMPFS=nStephen Wilson1-2/+2
When CONFIG_TMPFS=n mpol_to_str() is not declared in mempolicy.h. However, in the NUMA case, the definition is always compiled. Since it is not strictly true that tmpfs is the only client, and since the symbol was always lurking around anyways, export mpol_to_str() unconditionally. Furthermore, this will allow us to move show_numa_map() out of mempolicy.c and into the procfs subsystem. Signed-off-by: Stephen Wilson <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: David Rientjes <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: remove check_huge_range()Stephen Wilson1-35/+0
This function has been superseded by gather_hugetbl_stats() and is no longer needed. Signed-off-by: Stephen Wilson <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: David Rientjes <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: make gather_stats() type-safe and remove forward declarationStephen Wilson1-4/+4
Improve the prototype of gather_stats() to take a struct numa_maps as argument instead of a generic void *. Update all callers to make the required type explicit. Since gather_stats() is not needed before its definition and is scheduled to be moved out of mempolicy.c the declaration is removed as well. Signed-off-by: Stephen Wilson <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: David Rientjes <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: remove MPOL_MF_STATSStephen Wilson1-4/+1
Mapping statistics in a NUMA environment is now computed using the generic walk_page_range() logic. Remove the old/equivalent functionality. Signed-off-by: Stephen Wilson <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: David Rientjes <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: use walk_page_range() instead of custom page table walking codeStephen Wilson1-7/+68
Converting show_numa_map() to use the generic routine decouples the function from mempolicy.c, allowing it to be moved out of the mm subsystem and into fs/proc. Also, include KSM pages in /proc/pid/numa_maps statistics. The pagewalk logic implemented by check_pte_range() failed to account for such pages as they were not applicable to the page migration case. Signed-off-by: Stephen Wilson <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: David Rientjes <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: export get_vma_policy()Stephen Wilson2-1/+4
In commit 48fce3429d ("mempolicies: unexport get_vma_policy()") get_vma_policy() was marked static as all clients were local to mempolicy.c. However, the decision to generate /proc/pid/numa_maps in the numa memory policy code and outside the procfs subsystem introduces an artificial interdependency between the two systems. Exporting get_vma_policy() once again is the first step to clean up this interdependency. Signed-off-by: Stephen Wilson <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: David Rientjes <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: remove last trace of shmem_get_unmapped_areaHugh Dickins1-8/+0
Remove noMMU declaration of shmem_get_unmapped_area() from mm.h: it fell out of use in 2.6.21 and ceased to exist in 2.6.29. Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25tmpfs: implement generic xattr supportEric Paris3-56/+290
Implement generic xattrs for tmpfs filesystems. The Feodra project, while trying to replace suid apps with file capabilities, realized that tmpfs, which is used on the build systems, does not support file capabilities and thus cannot be used to build packages which use file capabilities. Xattrs are also needed for overlayfs. The xattr interface is a bit odd. If a filesystem does not implement any {get,set,list}xattr functions the VFS will call into some random LSM hooks and the running LSM can then implement some method for handling xattrs. SELinux for example provides a method to support security.selinux but no other security.* xattrs. As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have xattr handler routines specifically to handle acls. Because of this tmpfs would loose the VFS/LSM helpers to support the running LSM. To make up for that tmpfs had stub functions that did nothing but call into the LSM hooks which implement the helpers. This new patch does not use the LSM fallback functions and instead just implements a native get/set/list xattr feature for the full security.* and trusted.* namespace like a normal filesystem. This means that tmpfs can now support both security.selinux and security.capability, which was not previously possible. The basic implementation is that I attach a: struct shmem_xattr { struct list_head list; /* anchored by shmem_inode_info->xattr_list */ char *name; size_t size; char value[0]; }; Into the struct shmem_inode_info for each xattr that is set. This implementation could easily support the user.* namespace as well, except some care needs to be taken to prevent large amounts of unswappable memory being allocated for unprivileged users. [[email protected]: new config option, suport trusted.*, support symlinks] Signed-off-by: Eric Paris <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]> Acked-by: Serge Hallyn <[email protected]> Tested-by: Serge Hallyn <[email protected]> Cc: Kyle McMartin <[email protected]> Acked-by: Hugh Dickins <[email protected]> Tested-by: Jordi Pujol <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25memblock/nobootmem: remove unneeded code from alloc_bootmem_node_high()Yinghai Lu1-23/+0
The bootmem wrapper with memblock supports top-down now, so we no longer need this trick. Signed-off-by: Yinghai LU <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Olaf Hering <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Lucas De Marchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25memblock/nobootmem: allow alloc_bootmem() to take 0 as low limitYinghai Lu1-9/+16
The bootmem wrapper with memblock supports top-down now, so we do not need to set the low limit to __pa(MAX_DMA_ADDRESS). The logic should be: good to allocate above __pa(MAX_DMA_ADDRESS), but it is ok if we can not find memory above 16M on system that has a small amount of RAM. Signed-off-by: Yinghai LU <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Olaf Hering <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Lucas De Marchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: delete non-atomic mm counter implementationMatt Fleming2-43/+10
The problem with having two different types of counters is that developers adding new code need to keep in mind whether it's safe to use both the atomic and non-atomic implementations. For example, when adding new callers of the *_mm_counter() functions a developer needs to ensure that those paths are always executed with page_table_lock held, in case we're using the non-atomic implementation of mm counters. Hugh Dickins introduced the atomic mm counters in commit f412ac08c986 ("[PATCH] mm: fix rss and mmlist locking"). When asked why he left the non-atomic counters around he said, | The only reason was to avoid adding costly atomic operations into a | configuration that had no need for them there: the page_table_lock | sufficed. | | Certainly it would be simpler just to delete the non-atomic variant. | | And I think it's fair to say that any configuration on which we're | measuring performance to that degree (rather than "does it boot fast?" | type measurements), would already be going the split ptlocks route. Removing the non-atomic counters eases the maintenance burden because developers no longer have to mindful of the two implementations when using *_mm_counter(). Note that all architectures provide a means of atomically updating atomic_long_t variables, even if they have to revert to the generic spinlock implementation because they don't support 64-bit atomic instructions (see lib/atomic64.c). Signed-off-by: Matt Fleming <[email protected]> Acked-by: Dave Hansen <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: fail GFP_DMA allocations when ZONE_DMA is not configuredDavid Rientjes1-0/+4
The page allocator will improperly return a page from ZONE_NORMAL even when __GFP_DMA is passed if CONFIG_ZONE_DMA is disabled. The caller expects DMA memory, perhaps for ISA devices with 16-bit address registers, and may get higher memory resulting in undefined behavior. This patch causes the page allocator to return NULL in such circumstances with a warning emitted to the kernel log on the first occurrence. Signed-off-by: David Rientjes <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: do not define PFN_SECTION_SHIFT if !CONFIG_SPARSEMEMDaniel Kiper1-4/+0
Do not define PFN_SECTION_SHIFT if !CONFIG_SPARSEMEM. Signed-off-by: Daniel Kiper <[email protected]> Acked-by: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: pfn_to_section_nr()/section_nr_to_pfn() is valid only in ↵Daniel Kiper1-3/+3
CONFIG_SPARSEMEM context pfn_to_section_nr()/section_nr_to_pfn() is valid only in CONFIG_SPARSEMEM context. Move it to proper place. Signed-off-by: Daniel Kiper <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: enable set_page_section() only if CONFIG_SPARSEMEM and ↵Daniel Kiper1-6/+8
!CONFIG_SPARSEMEM_VMEMMAP set_page_section() is meaningful only in CONFIG_SPARSEMEM and !CONFIG_SPARSEMEM_VMEMMAP context. Move it to proper place and amend accordingly functions which are using it. Signed-off-by: Daniel Kiper <[email protected]> Acked-by: Dave Hansen <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: remove dependency on CONFIG_FLATMEM from online_page()Daniel Kiper1-4/+0
online_pages() is only compiled for CONFIG_MEMORY_HOTPLUG_SPARSE, so there is no need to support CONFIG_FLATMEM code within it. This patch removes code that is never used. Signed-off-by: Daniel Kiper <[email protected]> Acked-by: Dave Hansen <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: filter unevictable page out in deactivate_page()Minchan Kim1-0/+7
It's pointless that deactive_page's operates on unevictable pages. This patch removes unnecessary overhead which might be a bit problem in case that there are many unevictable page in system(ex, mprotect workload) [[email protected]: tidy up comment] Reviewed-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Minchan Kim <[email protected]> Reviewed-by: Rik van Riel<[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25readahead: trigger mmap sequential readahead on PG_readaheadWu Fengguang1-4/+2
Previously the mmap sequential readahead is triggered by updating ra->prev_pos on each page fault and compare it with current page offset. It costs dirtying the cache line on each _minor_ page fault. So remove the ra->prev_pos recording, and instead tag PG_readahead to trigger the possible sequential readahead. It's not only more simple, but also will work more reliably and reduce cache line bouncing on concurrent page faults on shared struct file. In the mosbench exim benchmark which does multi-threaded page faults on shared struct file, the ra->mmap_miss and ra->prev_pos updates are found to cause excessive cache line bouncing on tmpfs, which actually disabled readahead totally (shmem_backing_dev_info.ra_pages == 0). So remove the ra->prev_pos recording, and instead tag PG_readahead to trigger the possible sequential readahead. It's not only more simple, but also will work more reliably on concurrent reads on shared struct file. Signed-off-by: Wu Fengguang <[email protected]> Tested-by: Tim Chen <[email protected]> Reported-by: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25readahead: reduce unnecessary mmap_miss increasesAndi Kleen1-1/+2
The original INT_MAX is too large, reduce it to - avoid unnecessarily dirtying/bouncing the cache line - restore mmap read-around faster on changed access pattern Background: in the mosbench exim benchmark which does multi-threaded page faults on shared struct file, the ra->mmap_miss updates are found to cause excessive cache line bouncing on tmpfs. The ra state updates are needless for tmpfs because it actually disabled readahead totally (shmem_backing_dev_info.ra_pages == 0). Tested-by: Tim Chen <[email protected]> Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25readahead: return early when readahead is disabledWu Fengguang1-6/+6
Reduce readahead overheads by returning early in do_sync_mmap_readahead(). tmpfs has ra_pages=0 and it can page fault really fast (not constraint by IO if not swapping). Signed-off-by: Wu Fengguang <[email protected]> Tested-by: Tim Chen <[email protected]> Reported-by: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25vmscan: change shrinker API by passing shrink_control structYing Han21-61/+95
Change each shrinker's API by consolidating the existing parameters into shrink_control struct. This will simplify any further features added w/o touching each file of shrinker. [[email protected]: fix build] [[email protected]: fix warning] [[email protected]: fix up new shrinker API] [[email protected]: fix xfs warning] [[email protected]: update gfs2] Signed-off-by: Ying Han <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Minchan Kim <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Mel Gorman <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Steven Whitehouse <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25vmscan: change shrink_slab() interfaces by passing shrink_controlYing Han4-17/+55
Consolidate the existing parameters to shrink_slab() into a new shrink_control struct. This is needed later to pass the same struct to shrinkers. Signed-off-by: Ying Han <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Minchan Kim <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Mel Gorman <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25readahead: readahead page allocations are OK to failWu Fengguang2-1/+7
Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations. readahead page allocations are completely optional. They are OK to fail and in particular shall not trigger OOM on themselves. Reported-by: Dave Young <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Wu Fengguang <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: check if any page in a pageblock is reserved before marking it ↵Arve Hjønnevåg1-2/+17
MIGRATE_RESERVE This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even though it only had a few free pages. eg, On current ARM port, The kernel starts at offset 0x8000 to leave room for boot parameters, and the memory is freed later. This in turn caused no contiguous memory to be reserved and frequent kswapd wakeups that emptied the caches to get more contiguous memory. Unfortunatelly, ARM needs order-2 allocation for pgd (see arm/mm/pgd.c#pgd_alloc()). Therefore the issue is not minor nor easy avoidable. [[email protected]: added some explanation] [[email protected]: add !pfn_valid_within() to check] [[email protected]: check end_pfn in pageblock_is_reserved] Signed-off-by: John Stultz <[email protected]> Signed-off-by: Arve Hjønnevåg <[email protected]> Signed-off-by: KOSAKI Motohiro <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25m32r, mm: set all online nodes in N_NORMAL_MEMORYDavid Rientjes1-0/+1
For m32r, N_NORMAL_MEMORY represents all nodes that have present memory since it does not support HIGHMEM. This patch sets the bit at the time the node is initialized. If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it uses this nodemask to setup per-cache kmem_cache_node data structures. Signed-off-by: David Rientjes <[email protected]> Cc: Hirokazu Takata <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25alpha, mm: set all online nodes in N_NORMAL_MEMORYDavid Rientjes1-0/+1
For alpha, N_NORMAL_MEMORY represents all nodes that have present memory since it does not support HIGHMEM. This patch sets the bit at the time the node is initialized. If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it uses this nodemask to setup per-cache kmem_cache_node data structures. Signed-off-by: David Rientjes <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: strictly require elevated page refcount in isolate_lru_page()Konstantin Khlebnikov1-1/+4
isolate_lru_page() must be called only with stable reference to the page, this is what is written in the comment above it, this is reasonable. current isolate_lru_page() users and its page extra reference sources: mm/huge_memory.c: __collapse_huge_page_isolate() - reference from pte mm/memcontrol.c: mem_cgroup_move_parent() - get_page_unless_zero() mem_cgroup_move_charge_pte_range() - reference from pte mm/memory-failure.c: soft_offline_page() - fixed, reference from get_any_page() delete_from_lru_cache() - reference from caller or get_page_unless_zero() [ seems like there bug, because __memory_failure() can call page_action() for hpages tail, but it is ok for isolate_lru_page(), tail getted and not in lru] mm/memory_hotplug.c: do_migrate_range() - fixed, get_page_unless_zero() mm/mempolicy.c: migrate_page_add() - reference from pte mm/migrate.c: do_move_page_to_node_array() - reference from follow_page() mlock.c: - various external references mm/vmscan.c: putback_lru_page() - reference from isolate_lru_page() It seems that all isolate_lru_page() users are ready now for this restriction. So, let's replace redundant get_page_unless_zero() with get_page() and add page initial reference count check with VM_BUG_ON() Signed-off-by: Konstantin Khlebnikov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mem-hwpoison: fix page refcount around isolate_lru_page()Konstantin Khlebnikov1-5/+6
Drop first page reference only after calling isolate_lru_page() to keep page stable reference while isolating. Signed-off-by: Konstantin Khlebnikov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mem-hotplug: call isolate_lru_page with elevated refcountKonstantin Khlebnikov1-1/+3
isolate_lru_page() must be called only with stable reference to page. So, let's grab normal page reference. Signed-off-by: Konstantin Khlebnikov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: print vmalloc() state after allocation failuresDave Hansen1-2/+7
I was tracking down a page allocation failure that ended up in vmalloc(). Since vmalloc() uses 0-order pages, if somebody asks for an insane amount of memory, we'll still get a warning with "order:0" in it. That's not very useful. During recovery, vmalloc() also nicely frees all of the memory that it got up to the point of the failure. That is wonderful, but it also quickly hides any issues. We have a much different sitation if vmalloc() repeatedly fails 10GB in to: vmalloc(100 * 1<<30); versus repeatedly failing 4096 bytes in to a: vmalloc(8192); This patch will print out messages that look like this: [ 68.123503] vmalloc: allocation failure, allocated 6680576 of 13426688 bytes [ 68.124218] bash: page allocation failure: order:0, mode:0xd2 [ 68.124811] Pid: 3770, comm: bash Not tainted 2.6.39-rc3-00082-g85f2e68-dirty #333 [ 68.125579] Call Trace: [ 68.125853] [<ffffffff810f6da6>] warn_alloc_failed+0x146/0x170 [ 68.126464] [<ffffffff8107e05c>] ? printk+0x6c/0x70 [ 68.126791] [<ffffffff8112b5d4>] ? alloc_pages_current+0x94/0xe0 [ 68.127661] [<ffffffff8111ed37>] __vmalloc_node_range+0x237/0x290 ... The 'order' variable is added for clarity when calling warn_alloc_failed() to avoid having an unexplained '0' as an argument. The 'tmp_mask' is because adding an open-coded '| __GFP_NOWARN' would take us over 80 columns for the alloc_pages_node() call. If we are going to add a line, it might as well be one that makes the sucker easier to read. As a side issue, I also noticed that ctl_ioctl() does vmalloc() based solely on an unverified value passed in from userspace. Granted, it's under CAP_SYS_ADMIN, but it still frightens me a bit. Signed-off-by: Dave Hansen <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: David Rientjes <[email protected]> Cc: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: break out page allocation warning codeDave Hansen2-21/+43
This originally started as a simple patch to give vmalloc() some more verbose output on failure on top of the plain page allocator messages. Johannes suggested that it might be nicer to lead with the vmalloc() info _before_ the page allocator messages. But, I do think there's a lot of value in what __alloc_pages_slowpath() does with its filtering and so forth. This patch creates a new function which other allocators can call instead of relying on the internal page allocator warnings. It also gives this function private rate-limiting which separates it from other printk_ratelimit() users. Signed-off-by: Dave Hansen <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: David Rientjes <[email protected]> Cc: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: convert mm->cpu_vm_cpumask into cpumask_var_tKOSAKI Motohiro7-9/+44
cpumask_t is very big struct and cpu_vm_mask is placed wrong position. It might lead to reduce cache hit ratio. This patch has two change. 1) Move the place of cpumask into last of mm_struct. Because usually cpumask is accessed only front bits when the system has cpu-hotplug capability 2) Convert cpu_vm_mask into cpumask_var_t. It may help to reduce memory footprint if cpumask_size() will use nr_cpumask_bits properly in future. In addition, this patch change the name of cpu_vm_mask with cpu_vm_mask_var. It may help to detect out of tree cpu_vm_mask users. This patch has no functional change. [[email protected]: build fix] [[email protected]: coding-style fixes] Signed-off-by: KOSAKI Motohiro <[email protected]> Cc: David Howells <[email protected]> Cc: Koichi Yasutake <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Chris Metcalf <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: thp: optimize memcg charge in khugepagedAndrea Arcangeli1-10/+11
We don't need to hold the mmmap_sem through mem_cgroup_newpage_charge(), the mmap_sem is only hold for keeping the vma stable and we don't need the vma stable anymore after we return from alloc_hugepage_vma(). Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: David Rientjes <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: uninline large generic tlb.h functionsPeter Zijlstra2-124/+135
Some of these functions have grown beyond inline sanity, move them out-of-line. Signed-off-by: Peter Zijlstra <[email protected]> Requested-by: Andrew Morton <[email protected]> Requested-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: optimize page_lock_anon_vma() fast-pathPeter Zijlstra1-4/+82
Optimize the page_lock_anon_vma() fast path to be one atomic op, instead of two. Signed-off-by: Peter Zijlstra <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: convert anon_vma->lock to a mutexPeter Zijlstra6-25/+21
Straightforward conversion of anon_vma->lock to a mutex. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Hugh Dickins <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: use refcounts for page_lock_anon_vma()Peter Zijlstra2-28/+31
Convert page_lock_anon_vma() over to use refcounts. This is done to prepare for the conversion of anon_vma from spinlock to mutex. Sadly this inceases the cost of page_lock_anon_vma() from one to two atomics, a follow up patch addresses this, lets keep that simple for now. Signed-off-by: Peter Zijlstra <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: improve page_lock_anon_vma() commentPeter Zijlstra1-2/+16
A slightly more verbose comment to go along with the trickery in page_lock_anon_vma(). Signed-off-by: Peter Zijlstra <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: revert page_lock_anon_vma() lock annotationPeter Zijlstra2-17/+2
Its beyond ugly and gets in the way. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: Convert i_mmap_lock to a mutexPeter Zijlstra17-58/+58
Straightforward conversion of i_mmap_lock to a mutex. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: Remove i_mmap_lock lockbreakPeter Zijlstra8-188/+28
Hugh says: "The only significant loser, I think, would be page reclaim (when concurrent with truncation): could spin for a long time waiting for the i_mmap_mutex it expects would soon be dropped? " Counter points: - cpu contention makes the spin stop (need_resched()) - zap pages should be freeing pages at a higher rate than reclaim ever can I think the simplification of the truncate code is definitely worth it. Effectively reverts: 2aa15890f3c ("mm: prevent concurrent unmap_mapping_range() on the same inode") and takes out the code that caused its problem. Signed-off-by: Peter Zijlstra <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25lockdep, mutex: provide mutex_lock_nest_lockPeter Zijlstra3-8/+29
In order to convert i_mmap_lock to a mutex we need a mutex equivalent to spin_lock_nest_lock(), thus provide the mutex_lock_nest_lock() annotation. As with spin_lock_nest_lock(), mutex_lock_nest_lock() allows annotation of the locking pattern where an outer lock serializes the acquisition order of nested locks. That is, if every time you lock multiple locks A, say A1 and A2 you first acquire N, the order of acquiring A1 and A2 is irrelevant. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: extended batches for generic mmu_gatherPeter Zijlstra2-47/+84
Instead of using a single batch (the small on-stack, or an allocated page), try and extend the batch every time it runs out and only flush once either the extend fails or we're done. Signed-off-by: Peter Zijlstra <[email protected]> Requested-by: Nick Piggin <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm, powerpc: move the RCU page-table freeing into generic codePeter Zijlstra10-125/+150
In case other architectures require RCU freed page-tables to implement gup_fast() and software filled hashes and similar things, provide the means to do so by moving the logic into generic code. Signed-off-by: Peter Zijlstra <[email protected]> Requested-by: David Miller <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25mm: now that all old mmu_gather code is gone, remove the storagePeter Zijlstra21-41/+0
Fold all the mmu_gather rework patches into one for submission Signed-off-by: Peter Zijlstra <[email protected]> Reported-by: Hugh Dickins <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25um: mmu_gather reworkPeter Zijlstra1-18/+11
Fix up the um mmu_gather code to conform to the new API. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Tony Luck <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25ia64: mmu_gather reworkPeter Zijlstra1-20/+46
Fix up the ia64 mmu_gather code to conform to the new API. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Tony Luck <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25sh: mmu_gather reworkPeter Zijlstra1-11/+17
Fix up the sh mmu_gather code to conform to the new API. Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Paul Mundt <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25arm: mmu_gather reworkPeter Zijlstra1-16/+37
Fix up the arm mmu_gather code to conform to the new API. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-25s390: mmu_gather reworkPeter Zijlstra1-25/+37
Adapt the stand-alone s390 mmu_gather implementation to the new preemptible mmu_gather interface. Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Miller <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Tony Luck <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>