Age | Commit message | Author | Files | Lines
2013-07-09 | memcg: don't need to free memcg via RCU or workqueue | Li Zefan | 1 | -46/+5
Now memcg has the same life cycle as its corresponding cgroup, and a cgroup is freed via RCU and then mem_cgroup_css_free() will be called in a work function, so we can simply call __mem_cgroup_free() in mem_cgroup_css_free(). This actually reverts commit 59927fb984d ("memcg: free mem_cgroup by RCU to fix oops"). Signed-off-by: Li Zefan <[email protected]> Cc: Hugh Dickins <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | memcg: kill memcg refcnt | Li Zefan | 1 | -17/+1
Now memcg has the same life cycle as its corresponding cgroup. Kill the useless refcnt. Signed-off-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | memcg: don't need to get a reference to the parent | Li Zefan | 1 | -16/+3
The cgroup core guarantees it's always safe to access the parent. Signed-off-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | memcg: use css_get/put for swap memcg | Li Zefan | 1 | -10/+16
Use css_get/put instead of mem_cgroup_get/put. A simple replacement will do. The historical reason that memcg has its own refcnt instead of always using css_get/put is that a cgroup couldn't be removed if there were still css refs, so css refs couldn't be used as a long-lived reference. The situation has changed: rmdir'ing a cgroup now succeeds regardless of css refs, but the cgroup won't be freed until css refs go down to 0. Signed-off-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | memcg: use css_get/put when charging/uncharging kmem | Li Zefan | 1 | -26/+54
Use css_get/put instead of mem_cgroup_get/put. We can't do a simple replacement, because here mem_cgroup_put() is called during mem_cgroup_css_free(), while mem_cgroup_css_free() won't be called until the css refcnt goes down to 0. Instead we increment the css refcnt in mem_cgroup_css_offline(), and then check if there are still kmem charges. If not, the css refcnt is decremented immediately; otherwise the refcnt is released after the last kmem allocation is uncharged. [[email protected]: tweak comment] Signed-off-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | memcg: don't use mem_cgroup_get() when creating a kmemcg cache | Li Zefan | 1 | -5/+5
Use css_get()/css_put() instead of mem_cgroup_get()/mem_cgroup_put(). There are two things being done in the current code: First, we acquire a css ref to make sure that the underlying cgroup does not go away. That is a short-lived reference, and it is put as soon as the cache is created. At this point, we acquire a long-lived per-cache memcg reference count to guarantee that the memcg will still be alive. So it is: enqueue: css_get create: memcg_get, css_put destroy: memcg_put So we only need to get rid of the memcg_get, change the memcg_put to css_put, and get rid of the now-extra css_put. (This changelog is mostly written by Glauber) Signed-off-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | memcg: use css_get() in sock_update_memcg() | Li Zefan | 1 | -4/+4
Use css_get/css_put instead of mem_cgroup_get/put. Note, if at the same time someone is moving @current to a different cgroup and removing the old cgroup, css_tryget() may return false, and sock->sk_cgrp won't be initialized, which is fine. Signed-off-by: Li Zefan <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | memcg, kmem: fix reference count handling on the error path | Michal Hocko | 1 | -8/+0
mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem fails. This is not correct, because only memcg_propagate_kmem takes an additional reference, while mem_cgroup_sockets_init is allowed to fail as well (although no current implementation fails) but doesn't take any reference. This all suggests that it should be memcg_propagate_kmem that cleans up after itself, so this patch moves mem_cgroup_put over there. Unfortunately this is not that easy (as pointed out by Li Zefan), because memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it is marked active (KMEM_ACCOUNTED_ACTIVE), which is the case even if memcg_propagate_kmem fails, so the additional reference is dropped in that case in kmem_cgroup_destroy, which means the reference would be dropped two times. The easiest way then is to simply remove mem_cgroup_put from mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right thing. Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Li Zefan <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: <[email protected]> [3.8] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09Revert "memcg: avoid dangling reference count in creation failure"Michal Hocko1-2/+0
This reverts commit e4715f01be697a. mem_cgroup_put is hierarchy aware, so mem_cgroup_put(memcg) already drops an additional reference from all parents, and the additional mem_cgroup_put(parent) therefore potentially causes a use-after-free. Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Li Zefan <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: <[email protected]> [3.9+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mmap: allow MAP_HUGETLB for hugetlbfs files v2 | Jörn Engel | 1 | -2/+4
It is counterintuitive at best that mmap'ing a hugetlbfs file with MAP_HUGETLB fails, while mmap'ing it without the flag will a) succeed and b) return huge pages. v2: use is_file_hugepages(), as suggested by Jianguo Signed-off-by: Joern Engel <[email protected]> Cc: Jianguo Wu <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
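For illustration, a minimal userspace sketch of the case this fixes; the hugetlbfs mount point /dev/hugepages, the file name and the 2 MB huge page size are assumptions about the local setup:

    /* Map a hugetlbfs file with MAP_HUGETLB explicitly set.  Before the
     * fix this combination failed, while mapping the same fd without the
     * flag already returned huge pages. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            size_t len = 2 * 1024 * 1024;   /* one 2 MB huge page */
            int fd = open("/dev/hugepages/demo", O_CREAT | O_RDWR, 0600);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_HUGETLB, fd, 0);
            if (p == MAP_FAILED)
                    perror("mmap with MAP_HUGETLB");
            else
                    munmap(p, len);
            close(fd);
            return 0;
    }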
2013-07-09 | mm: vmscan: do not scale writeback pages when deciding whether to set ZONE_WRITEBACK | Mel Gorman | 1 | -15/+1
After the patch "mm: vmscan: Flatten kswapd priority loop" was merged, the scanning priority of kswapd changed. The priority now rises until it is scanning enough pages to meet the high watermark. shrink_inactive_list sets ZONE_WRITEBACK if a number of pages were encountered under writeback, but this value is scaled based on the priority. As kswapd now frequently scans with a higher priority, it is relatively easy to set ZONE_WRITEBACK. This patch removes the scaling and treats writeback pages similarly to how it treats unqueued dirty pages and congested pages. The user-visible effect should be that kswapd will write back fewer pages from reclaim context. Signed-off-by: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm: vmscan: do not continue scanning if reclaim was aborted for compaction | Mel Gorman | 1 | -3/+5
Direct reclaim is not aborting to allow compaction to go ahead properly. do_try_to_free_pages is told to abort reclaim, which it happily ignores; instead it increases priority until it reaches 0 and starts shrinking file/anon equally. This patch corrects the situation by aborting reclaim when requested instead of raising priority. Signed-off-by: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/memory_hotplug.c: fix a comment typo in register_page_bootmem_info_node() | Tang Chen | 1 | -2/+2
Signed-off-by: Tang Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/memblock.c: fix wrong comment in __next_free_mem_range() | Tang Chen | 1 | -1/+1
Remove one redundant "nid" in the comment. Signed-off-by: Tang Chen <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | page migration: fix wrong comment in address_space_operations.migratepage() | Tang Chen | 1 | -2/+2
There is no parameter "sync" in address_space_operations->migratepage(). It should be migrate_mode. And the comment is for MIGRATE_ASYNC. Signed-off-by: Tang Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/vmalloc.c: fix an overflow bug in alloc_vmap_area() | Zhang Yanfei | 1 | -3/+3
When searching for a vmap area in the vmalloc space, we use (addr + size - 1) to check whether the value wrapped around (overflowed) below addr, but we assign (addr + size) to vmap_area->va_end. So if we come across the below case: (addr + size - 1): no overflow (addr + size): overflow we will assign an overflowed value (e.g. 0) to vmap_area->va_end, and this will trigger the BUG in __insert_vmap_area, causing a system panic. So using (addr + size) to check for the overflow is the correct behaviour, not (addr + size - 1). Signed-off-by: Zhang Yanfei <[email protected]> Reported-by: Ghennadi Procopciuc <[email protected]> Tested-by: Daniel Baluta <[email protected]> Cc: David Rientjes <[email protected]> Cc: Minchan Kim <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
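To see the arithmetic in isolation, a small standalone C sketch (ordinary userspace code, not the kernel function) that reproduces the wrap-around:

    #include <stdio.h>

    int main(void)
    {
            unsigned long addr = ~0UL - 4096 + 1;   /* 4096 bytes below the top of the address space */
            unsigned long size = 4096;

            /* old check: addr + size - 1 does not wrap, so the area is accepted ... */
            printf("old check detects overflow: %d\n", addr + size - 1 < addr);
            /* ... but va_end = addr + size wraps to 0, which later trips the BUG in __insert_vmap_area */
            printf("va_end = %#lx\n", addr + size);
            /* new check: addr + size < addr catches the wrap */
            printf("new check detects overflow: %d\n", addr + size < addr);
            return 0;
    }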
2013-07-09 | mm: remove unused VM_<READfoo> macros and expand other in-place | Joe Perches | 4 | -11/+5
These VM_<READfoo> macros aren't used very often and three of them aren't used at all. Expand the ones that are used in-place, and remove all the now unused #define VM_<foo> macros. VM_READHINTMASK, VM_NormalReadHint and VM_ClearReadHint were added just before 2.4 and appear to have never been used. Signed-off-by: Joe Perches <[email protected]> Acked-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/pgtable: don't accumulate addr during pgd prepopulate pmd | Wanpeng Li | 1 | -3/+1
The old code accumulates addr to get the right pmd; however, pmds are now preallocated and transferred as a parameter, so there is no need to accumulate the addr variable any more. This patch removes it. Signed-off-by: Wanpeng Li <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Reviewed-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/thp: fix doc for transparent huge zero page | Wanpeng Li | 1 | -2/+2
The transparent huge zero page is used during the page fault, not in khugepaged. # ls /sys/kernel/mm/transparent_hugepage/ defrag enabled khugepaged use_zero_page # ls /sys/kernel/mm/transparent_hugepage/khugepaged/ alloc_sleep_millisecs defrag full_scans max_ptes_none pages_collapsed pages_to_scan scan_sleep_millisecs This patch corrects the documentation to match what the code actually does. Signed-off-by: Wanpeng Li <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/page_alloc: fix doc for numa_zonelist_order | Wanpeng Li | 1 | -1/+1
The default zonelist order selector will select "node" order if any node's DMA zone comprises greater than 70% of its local memory, not 60%, according to default_zonelist_order::low_kmem_size > total * 70/100. Signed-off-by: Wanpeng Li <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/writeback: commit reason of WB_REASON_FORKER_THREAD mismatch name | Wanpeng Li | 1 | -0/+6
After commit 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue"), there is no bdi forker thread any more. However, WB_REASON_FORKER_THREAD is still used, because it is a tracepoint value visible to userland, and we won't be exposing exactly the same information under just a different name. Signed-off-by: Wanpeng Li <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/writeback: don't check force_wait to handle bdi->work_list | Wanpeng Li | 1 | -8/+2
After commit 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue"), bdi_writeback_workfn runs off bdi_writeback->dwork; on each execution it processes bdi->work_list and reschedules if there are more things to do, instead of flushing any work that races with us exiting. It is unnecessary to check force_wait in wb_do_writeback since it is always 0 after the mentioned commit. This patch removes force_wait from wb_do_writeback. Signed-off-by: Wanpeng Li <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Reviewed-by: Fengguang Wu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/writeback: remove wb_reason_name | Wanpeng Li | 1 | -1/+0
wb_reason_name is not used any more - remove it. Signed-off-by: Wanpeng Li <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Reviewed-by: Fengguang Wu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | fs/fs-writeback.c: make wb_do_writeback() static | Haicheng Li | 2 | -2/+1
It's not used globally and could be static. Signed-off-by: Haicheng Li <[email protected]> Cc: Jan Kara <[email protected]> Cc: Wu Fengguang <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/sparse.c: put clear_hwpoisoned_pages within CONFIG_MEMORY_HOTREMOVE | Zhang Yanfei | 1 | -1/+1
With CONFIG_MEMORY_HOTREMOVE unset, there is a compile warning: mm/sparse.c:755: warning: `clear_hwpoisoned_pages' defined but not used Bisecting it ended up pointing to 4edd7ceff ("mm, hotplug: avoid compiling memory hotremove functions when disabled"). This is because the commit above put sparse_remove_one_section() within the protection of CONFIG_MEMORY_HOTREMOVE, but the only user of clear_hwpoisoned_pages() is sparse_remove_one_section(), and clear_hwpoisoned_pages() itself is not within that protection. So putting clear_hwpoisoned_pages within CONFIG_MEMORY_HOTREMOVE fixes the warning. Signed-off-by: Zhang Yanfei <[email protected]> Cc: David Rientjes <[email protected]> Acked-by: Toshi Kani <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm: remove unused __put_page() | Zhang Yanfei | 1 | -5/+0
This function is used nowhere, and its name is easily confused with put_page in mm/swap.c, so better to remove it. Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | vfree: don't schedule free_work() if llist_add() returns false | Oleg Nesterov | 1 | -3/+2
vfree() only needs schedule_work(&p->wq) if p->list was empty, otherwise vfree_deferred->wq is already pending or it is running and didn't do llist_del_all() yet. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
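The resulting pattern is roughly the following (a sketch rather than the exact hunk; llist_add() returns true only when the list was empty beforehand):

    /* defer the free; only kick the worker on the empty -> non-empty
     * transition, otherwise the work item is already pending or running */
    if (llist_add((struct llist_node *)addr, &p->list))
            schedule_work(&p->wq);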
2013-07-09 | mm/page_alloc.c: remove unlikely() from the current_order test | Zhang Yanfei | 1 | -1/+1
In __rmqueue_fallback(), current_order loops down from MAX_ORDER - 1 to the order passed. MAX_ORDER is typically 11 and pageblock_order is typically 9 on x86. Integer division truncates, so pageblock_order / 2 is 4. For the first eight iterations, it's guaranteed that current_order >= pageblock_order / 2 if it even gets that far! So just remove the unlikely(), it's completely bogus. Signed-off-by: Zhang Yanfei <[email protected]> Suggested-by: David Rientjes <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm: remove unused functions is_{normal_idx, normal, dma32, dma} | Zhang Yanfei | 1 | -28/+0
These functions are nowhere used, so remove them. Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/page_alloc.c: remove zone_type argument of build_zonelists_node | Zhang Yanfei | 1 | -13/+8
The callers of build_zonelists_node always pass MAX_NR_ZONES - 1 as the zone_type argument, so we can directly use that value in build_zonelists_node and remove the zone_type argument. Signed-off-by: Zhang Yanfei <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | MAINTAINERS: add zswap and zbud maintainer | Seth Jennings | 1 | -0/+13
Add maintainer information for zswap and zbud into the MAINTAINERS file. Signed-off-by: Seth Jennings <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | include/linux/gfp.h: fix the comment for GFP_ZONE_TABLE | Zhang Yanfei | 1 | -1/+1
0xc just means MOVABLE + DMA32, which results in zone DMA32. Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | memcg: do not account memory used for cache creation | Glauber Costa | 1 | -0/+2
The memory used to hold the memcg arrays is currently accounted to the current memcg. But that creates a problem, because that memory can only be freed after the last user is gone. Our only way to know which is the last user is to hook into freeing time, but the fact that we still have some in-flight kmallocs will prevent freeing from happening. I therefore believe it is just easier to account this memory as global overhead. Signed-off-by: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | memcg: also test for skip accounting at the page allocation level | Glauber Costa | 1 | -0/+28
The memory used to hold the memcg arrays is currently accounted to the current memcg. But that creates a problem, because that memory can only be freed after the last user is gone. Our only way to know which is the last user is to hook into freeing time, but the fact that we still have some in-flight kmallocs will prevent freeing from happening. I therefore believe it is just easier to account this memory as global overhead. This patch (of 2): Disabling accounting is only relevant for some specific memcg internal allocations. Therefore we would initially not have such a check at memcg_kmem_newpage_charge, since direct calls to the page allocator that are marked with GFP_KMEMCG only happen outside memcg core. We are mostly concerned with cache allocations, and by having this test at memcg_kmem_get_cache we are already able to relay the allocation to the root cache and bypass the memcg caches altogether. There is one exception, though: the SLUB allocator does not create large order caches, but rather services large kmallocs directly from the page allocator. Therefore, the following sequence, when backed by the SLUB allocator: memcg_stop_kmem_account(); kmalloc(<large_number>) memcg_resume_kmem_account(); would effectively ignore the fact that we should skip accounting, since it will drive us directly to this function without passing through the cache selector memcg_kmem_get_cache. Such large allocations are extremely rare but can happen, for instance, for the cache arrays. This was never a problem in practice, because we weren't skipping accounting for the cache arrays. All the allocations we were skipping were fairly small. However, the fact that we were not skipping those allocations is a problem and can prevent the memcgs from going away. As we fix that, we need to make sure that the fix will also work with the SLUB allocator. Signed-off-by: Glauber Costa <[email protected]> Reported-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/vmalloc.c: check VM_UNINITIALIZED flag in s_show instead of show_numa_info | Zhang Yanfei | 1 | -5/+5
We should check the VM_UNINITIALIZED flag in s_show(). If this flag is set, the vm_struct is not fully initialized, so it is unnecessary to try to show the information contained in it. We checked this flag in show_numa_info(), but I think it's better to check it earlier. Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/vmalloc.c: rename VM_UNLIST to VM_UNINITIALIZED | Zhang Yanfei | 2 | -15/+15
VM_UNLIST was used to indicate that the vm_struct is not listed in vmlist. But after commit 4341fa454796 ("mm, vmalloc: remove list management of vmlist after initializing vmalloc"), the meaning of this flag changed. It now means the vm_struct is not fully initialized. So renaming it to VM_UNINITIALIZED seems more reasonable. Also change clear_vm_unlist to clear_vm_uninitialized_flag. Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/vmalloc.c: emit the failure message before return | Zhang Yanfei | 1 | -1/+1
Use goto to jump to the fail label to give a failure message before returning NULL. This makes the failure handling in this function consistent. Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/vmalloc.c: remove alloc_map from vmap_block | Zhang Yanfei | 1 | -3/+0
As we have removed the dead code in vb_alloc, there is no place left that uses alloc_map, so there is no reason to maintain alloc_map in vmap_block. Signed-off-by: Zhang Yanfei <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/vmalloc.c: remove unused purge_fragmented_blocks_thiscpu | Zhang Yanfei | 1 | -5/+0
This function is nowhere used now, so remove it. Signed-off-by: Zhang Yanfei <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/vmalloc.c: remove dead code in vb_alloc | Zhang Yanfei | 1 | -15/+1
Space in a vmap block that was once allocated is considered dirty and not made available for allocation again before the whole block is recycled. The result is that free space within a vmap block is always contiguous. So if a vmap block has enough free space for allocation, the allocation is impossible to fail. Thus, the fragmented block purging was never invoked from vb_alloc(). So remove this dead code. [ Same patches also sent by: Chanho Min <[email protected]> Johannes Weiner <[email protected]> but git doesn't do "multiple authors" ] Signed-off-by: Zhang Yanfei <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/vmalloc.c: unbreak __vunmap() | Dan Carpenter | 1 | -1/+1
There is an extra semi-colon so the function always returns. Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm: remove duplicated call of get_pfn_range_for_nid | Zhang Yanfei | 1 | -11/+32
When calculating pages in a node, for each zone in that node, we will have zone_spanned_pages_in_node --> get_pfn_range_for_nid zone_absent_pages_in_node --> get_pfn_range_for_nid That is to say, we call get_pfn_range_for_nid to get the start_pfn and end_pfn of the node MAX_NR_ZONES * 2 times. This is totally unnecessary: if we call get_pfn_range_for_nid once before zone_*_pages_in_node and add two extra arguments, node_start_pfn and node_end_pfn, to zone_*_pages_in_node, then we can remove the get_pfn_range_for_nid calls from zone_*_pages_in_node. [[email protected]: make definitions more readable] Signed-off-by: Zhang Yanfei <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm: invoke oom-killer from remaining unconverted page fault handlers | Johannes Weiner | 6 | -19/+24
A few remaining architectures directly kill the page faulting task in an out of memory situation. This is usually not a good idea since that task might not even use a significant amount of memory and so may not be the optimal victim to resolve the situation. Since 2.6.29's 1c0fe6e ("mm: invoke oom-killer from page fault") there is a hook that architecture page fault handlers are supposed to call to invoke the OOM killer and let it pick the right task to kill. Convert the remaining architectures over to this hook. To have the previous behavior of simply taking out the faulting task the vm.oom_kill_allocating_task sysctl can be set to 1. Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Vineet Gupta <[email protected]> [arch/arc bits] Cc: James Hogan <[email protected]> Cc: David Howells <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Chen Liqin <[email protected]> Cc: Lennox Wu <[email protected]> Cc: Chris Metcalf <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
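For reference, the shape each converted handler ends up with is roughly the following (a sketch using the mm API of that era; the arch-specific surroundings are omitted and the details vary per architecture):

    fault = handle_mm_fault(mm, vma, address, flags);
    if (unlikely(fault & VM_FAULT_OOM)) {
            up_read(&mm->mmap_sem);
            /* let the OOM killer pick a suitable victim instead of
             * unconditionally killing the faulting task */
            pagefault_out_of_memory();
            return;
    }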
2013-07-09 | memcg: clean up memcg->nodeinfo | Johannes Weiner | 1 | -15/+5
Remove struct mem_cgroup_lru_info and fold its single member, the variably sized nodeinfo[0], directly into struct mem_cgroup. This should make it more obvious why it has to be the last member there. Also move the comment that's above that special last member below it, so it is more visible to somebody that considers appending to the struct mem_cgroup. Signed-off-by: Johannes Weiner <[email protected]> Cc: David Rientjes <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Glauber Costa <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm: mremap: validate input before taking lock | Rasmus Villemoes | 1 | -8/+10
This patch is very similar to commit 84d96d897671 ("mm: madvise: complete input validation before taking lock"): perform some basic validation of the input to mremap() before taking the &current->mm->mmap_sem lock. This also makes the MREMAP_FIXED => MREMAP_MAYMOVE dependency slightly more explicit. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
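Schematically, the reordered entry of mremap() looks roughly like this (a sketch, not the exact diff; ret starts out as -EINVAL):

    /* reject obviously bad input before touching mmap_sem */
    if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
            return ret;
    if ((flags & MREMAP_FIXED) && !(flags & MREMAP_MAYMOVE))
            return ret;        /* MREMAP_FIXED implies MREMAP_MAYMOVE */
    if (addr & ~PAGE_MASK)
            return ret;

    old_len = PAGE_ALIGN(old_len);
    new_len = PAGE_ALIGN(new_len);
    if (!new_len)
            return ret;

    down_write(&current->mm->mmap_sem);
    /* ... the remap work proper ... */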
2013-07-09 | drivers/iommu/msm_iommu_dev.c: fix leak and clean up error paths | Libo Chen | 1 | -14/+10
Fix two obvious problems: 1. We have registered msm_iommu_driver first, and need to unregister it if registering msm_iommu_ctx_driver fails. 2. We don't need to kfree drvdata before its kzalloc has succeeded. [[email protected]: remove now-unneeded initialization of ctx_drvdata, remove unneeded braces] Signed-off-by: Libo Chen <[email protected]> Acked-by: David Brown <[email protected]> Cc: David Woodhouse <[email protected]> Cc: James Hogan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | fsnotify: update comments concerning locking scheme | Lino Sanfilippo | 1 | -28/+22
There have been changes in the locking scheme of fsnotify but the comments in the source code have not been updated yet. This patch corrects this. Signed-off-by: Lino Sanfilippo <[email protected]> Cc: Eric Paris <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | inotify: fix race when adding a new watch | Lino Sanfilippo | 1 | -9/+4
In inotify_new_watch() the number of watches for a group is compared against the max number of allowed watches and increased afterwards. The check and the increment are not done atomically, so it is possible for multiple concurrent threads to pass the check and increment the number of marks above the allowed max. This patch uses the inotify group's mark_lock to ensure that both check and increment are done atomically. Furthermore, we don't have to worry about the race that allows a concurrent thread to add a watch just after inotify_update_existing_watch() returned with -ENOENT any more, since this is also synchronized by the group's mark mutex now. Signed-off-by: Lino Sanfilippo <[email protected]> Cc: Eric Paris <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
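The underlying pattern, as a self-contained userspace analogue (a pthread mutex stands in for the kernel lock, and max_watches stands in for the per-user inotify limit; names are illustrative only):

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int nwatches;
    static const int max_watches = 8192;

    /* Racy version: another thread may pass the check between the test
     * and the increment, pushing nwatches above the limit. */
    static bool add_watch_racy(void)
    {
            if (nwatches >= max_watches)
                    return false;
            nwatches++;             /* not atomic with the check above */
            return true;
    }

    /* Fixed version: check and increment under one lock, which is what
     * the patch achieves with the group's mark lock. */
    static bool add_watch_locked(void)
    {
            bool ok = false;

            pthread_mutex_lock(&lock);
            if (nwatches < max_watches) {
                    nwatches++;
                    ok = true;
            }
            pthread_mutex_unlock(&lock);
            return ok;
    }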
2013-07-09 | dnotify: replace dnotify_mark_mutex with mark mutex of dnotify_group | Lino Sanfilippo | 1 | -12/+13
There is no need to use a special mutex to protect against the fcntl/close race (see dnotify.c for a description of this race). Instead, the dnotify group's mark mutex can be used. Signed-off-by: Lino Sanfilippo <[email protected]> Cc: Eric Paris <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | fanotify: put duplicate code for adding vfsmount/inode marks into an own function | Lino Sanfilippo | 1 | -36/+35
The code under the group's mark_mutex in fanotify_add_inode_mark() and fanotify_add_vfsmount_mark() is almost identical, so put it into a separate function. Signed-off-by: Lino Sanfilippo <[email protected]> Cc: Eric Paris <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>