path: root/mm
Age  Commit message  Author  Files  Lines
2013-07-09  memcg: don't need to get a reference to the parent  (Li Zefan, 1 file, -16/+3)
The cgroup core guarantees it's always safe to access the parent. Signed-off-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  memcg: use css_get/put for swap memcg  (Li Zefan, 1 file, -10/+16)
Use css_get/put instead of mem_cgroup_get/put. A simple replacement will do. The historical reason that memcg has its own refcnt instead of always using css_get/put is that a cgroup couldn't be removed while there were still css refs, so css refs couldn't be used as long-lived references. The situation has changed: rmdir of a cgroup now succeeds regardless of css refs, but the cgroup won't be freed until its css refs go down to 0. Signed-off-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
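A minimal sketch of the substitution this series applies. The helper names swap_memcg_pin/unpin are hypothetical; the real call sites in mm/memcontrol.c are open-coded and differ in detail:

    /* Pin the memcg via its embedded cgroup subsystem state instead
     * of the private refcount. */
    static inline void swap_memcg_pin(struct mem_cgroup *memcg)
    {
            css_get(&memcg->css);   /* was: mem_cgroup_get(memcg) */
    }

    static inline void swap_memcg_unpin(struct mem_cgroup *memcg)
    {
            css_put(&memcg->css);   /* was: mem_cgroup_put(memcg);
                                     * memcg is freed once css refs hit 0 */
    }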
2013-07-09  memcg: use css_get/put when charging/uncharging kmem  (Li Zefan, 1 file, -26/+54)
Use css_get/put instead of mem_cgroup_get/put. We can't do a simple replacement, because here mem_cgroup_put() is called during mem_cgroup_css_free(), while mem_cgroup_css_free() won't be called until the css refcnt goes down to 0. Instead we increment the css refcnt in mem_cgroup_css_offline(), and then check if there are still kmem charges. If not, the css refcnt is decremented immediately; otherwise it is released after the last kmem allocation is uncharged. [[email protected]: tweak comment] Signed-off-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  memcg: don't use mem_cgroup_get() when creating a kmemcg cache  (Li Zefan, 1 file, -5/+5)
Use css_get()/css_put() instead of mem_cgroup_get()/mem_cgroup_put(). There are two things being done in the current code: First, we acquire a css ref to make sure that the underlying cgroup does not go away. That is a short-lived reference, and it is put as soon as the cache is created. At that point, we acquire a long-lived per-cache memcg reference count to guarantee that the memcg will still be alive. So the sequence is: enqueue: css_get; create: memcg_get, css_put; destroy: memcg_put. So we only need to get rid of the memcg_get, change the memcg_put to css_put, and get rid of the now-extra css_put. (This changelog was mostly written by Glauber.) Signed-off-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  memcg: use css_get() in sock_update_memcg()  (Li Zefan, 1 file, -4/+4)
Use css_get/css_put instead of mem_cgroup_get/put. Note, if at the same time someone is moving @current to a different cgroup and removing the old cgroup, css_tryget() may return false, and sock->sk_cgrp won't be initialized, which is fine. Signed-off-by: Li Zefan <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
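A sketch of the tryget pattern described above, with names simplified (the real sock_update_memcg() stores a cg_proto, not the memcg itself, and does more setup):

    rcu_read_lock();
    memcg = mem_cgroup_from_task(current);
    if (css_tryget(&memcg->css))            /* false if cgroup is being removed */
            sk->sk_cgrp = memcg;            /* ref held for the socket's lifetime;
                                             * on failure sk_cgrp stays NULL, which
                                             * is fine */
    rcu_read_unlock();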
2013-07-09  memcg, kmem: fix reference count handling on the error path  (Michal Hocko, 1 file, -8/+0)
mem_cgroup_css_online calls mem_cgroup_put if memcg_init_kmem fails. This is not correct because only memcg_propagate_kmem takes an additional reference, while mem_cgroup_sockets_init is allowed to fail as well (although no current implementation fails) but doesn't take any reference. This all suggests that memcg_propagate_kmem should clean up after itself, so this patch moves mem_cgroup_put over there. Unfortunately this is not that easy (as pointed out by Li Zefan), because memcg_kmem_mark_dead marks the group dead (KMEM_ACCOUNTED_DEAD) if it is marked active (KMEM_ACCOUNTED_ACTIVE), which is the case even if memcg_propagate_kmem fails, so the additional reference is dropped in that case in kmem_cgroup_destroy, which means the reference would be dropped two times. The easiest way then is to simply remove mem_cgroup_put from mem_cgroup_css_online and rely on kmem_cgroup_destroy doing the right thing. Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Li Zefan <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: <[email protected]> [3.8] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  Revert "memcg: avoid dangling reference count in creation failure"  (Michal Hocko, 1 file, -2/+0)
This reverts commit e4715f01be697a. mem_cgroup_put is hierarchy aware, so mem_cgroup_put(memcg) already drops an additional reference from all parents, and the additional mem_cgroup_put(parent) potentially causes a use-after-free. Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Li Zefan <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: <[email protected]> [3.9+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mmap: allow MAP_HUGETLB for hugetlbfs files v2  (Jörn Engel, 1 file, -2/+4)
It is counterintuitive at best that mmap'ing a hugetlbfs file with MAP_HUGETLB fails, while mmap'ing it without the flag will a) succeed and b) return huge pages. v2: use is_file_hugepages(), as suggested by Jianguo. Signed-off-by: Joern Engel <[email protected]> Cc: Jianguo Wu <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
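A hedged sketch of the v2 check, patterned on the mmap_pgoff()-style path (simplified, not the exact hunk):

    if (!(flags & MAP_ANONYMOUS)) {
            file = fget(fd);
            if (!file)
                    goto out;
            /* MAP_HUGETLB used to be rejected outright here; now only
             * reject it when the file is not a hugetlbfs file. */
            if ((flags & MAP_HUGETLB) && !is_file_hugepages(file))
                    goto out_fput;
    }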
2013-07-09  mm: vmscan: do not scale writeback pages when deciding whether to set ZONE_WRITEBACK  (Mel Gorman, 1 file, -15/+1)
After the patch "mm: vmscan: Flatten kswapd priority loop" was merged, the scanning priority of kswapd changed. The priority now rises until it is scanning enough pages to meet the high watermark. shrink_inactive_list sets ZONE_WRITEBACK if a number of pages were encountered under writeback, but this value is scaled based on the priority. As kswapd now frequently scans with a higher priority, it is relatively easy to set ZONE_WRITEBACK. This patch removes the scaling and treats writeback pages similarly to how it treats unqueued dirty pages and congested pages. The user-visible effect should be that kswapd will write back fewer pages from reclaim context. Signed-off-by: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm: vmscan: do not continue scanning if reclaim was aborted for compaction  (Mel Gorman, 1 file, -3/+5)
Direct reclaim is not aborting to allow compaction to go ahead properly. do_try_to_free_pages is told to abort reclaim, which it happily ignores, instead increasing priority until it reaches 0 and starts shrinking file/anon equally. This patch corrects the situation by aborting reclaim when requested instead of raising priority. Signed-off-by: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm/memory_hotplug.c: fix a comment typo in register_page_bootmem_info_node()  (Tang Chen, 1 file, -2/+2)
Signed-off-by: Tang Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm/memblock.c: fix wrong comment in __next_free_mem_range()  (Tang Chen, 1 file, -1/+1)
Remove one redundant "nid" in the comment. Signed-off-by: Tang Chen <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm/vmalloc.c: fix an overflow bug in alloc_vmap_area()  (Zhang Yanfei, 1 file, -3/+3)
When searching for a vmap area in the vmalloc space, we use (addr + size - 1) and check whether that value is less than addr to detect overflow. But we assign (addr + size) to vmap_area->va_end. So if we hit the case where (addr + size - 1) does not overflow but (addr + size) does, we assign an overflowed value (e.g. 0) to vmap_area->va_end, and this will trigger a BUG in __insert_vmap_area, causing a system panic. So the correct behaviour is to use (addr + size), not (addr + size - 1), for the overflow check. Signed-off-by: Zhang Yanfei <[email protected]> Reported-by: Ghennadi Procopciuc <[email protected]> Tested-by: Daniel Baluta <[email protected]> Cc: David Rientjes <[email protected]> Cc: Minchan Kim <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
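An illustration of the corrected check, with variable names simplified: the value actually stored in va->va_end is addr + size, so that is the sum that must be tested for wraparound:

    if (addr + size < addr)         /* wrapped past ULONG_MAX: overflow */
            goto overflow;
    va->va_start = addr;
    va->va_end = addr + size;       /* now guaranteed not to have wrapped */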
2013-07-09  mm: remove unused VM_<READfoo> macros and expand other in-place  (Joe Perches, 3 files, -5/+5)
These VM_<READfoo> macros aren't used very often, and three of them aren't used at all. Expand the ones that are used in-place, and remove all the now-unused #define VM_<foo> macros. VM_READHINTMASK, VM_NormalReadHint and VM_ClearReadHint were added just before 2.4 and appear to have never been used. Signed-off-by: Joe Perches <[email protected]> Acked-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm/sparse.c: put clear_hwpoisoned_pages within CONFIG_MEMORY_HOTREMOVE  (Zhang Yanfei, 1 file, -1/+1)
With CONFIG_MEMORY_HOTREMOVE unset, there is a compile warning: mm/sparse.c:755: warning: `clear_hwpoisoned_pages' defined but not used. Bisecting ended up pointing to 4edd7ceff ("mm, hotplug: avoid compiling memory hotremove functions when disabled"). This is because that commit put sparse_remove_one_section() within the protection of CONFIG_MEMORY_HOTREMOVE, but the only user of clear_hwpoisoned_pages() is sparse_remove_one_section(), and clear_hwpoisoned_pages() itself was not within that protection. So putting clear_hwpoisoned_pages() within CONFIG_MEMORY_HOTREMOVE fixes the warning. Signed-off-by: Zhang Yanfei <[email protected]> Cc: David Rientjes <[email protected]> Acked-by: Toshi Kani <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm: remove unused __put_page()  (Zhang Yanfei, 1 file, -5/+0)
This function is used nowhere, and its name is easily confused with put_page() in mm/swap.c, so it is better to remove it. Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  vfree: don't schedule free_work() if llist_add() returns false  (Oleg Nesterov, 1 file, -3/+2)
vfree() only needs schedule_work(&p->wq) if p->list was empty, otherwise vfree_deferred->wq is already pending or it is running and didn't do llist_del_all() yet. Signed-off-by: Oleg Nesterov <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
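The change, approximately as it reads in the deferred-free path: llist_add() returns true only when the list was previously empty, which is exactly the case where the work is not already pending or running:

    struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);

    if (llist_add((struct llist_node *)addr, &p->list))
            schedule_work(&p->wq);  /* only the first enqueuer kicks the work */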
2013-07-09  mm/page_alloc.c: remove unlikely() from the current_order test  (Zhang Yanfei, 1 file, -1/+1)
In __rmqueue_fallback(), current_order loops down from MAX_ORDER - 1 to the order passed. MAX_ORDER is typically 11 and pageblock_order is typically 9 on x86. Integer division truncates, so pageblock_order / 2 is 4. For the first eight iterations, it's guaranteed that current_order >= pageblock_order / 2 if it even gets that far! So just remove the unlikely(), it's completely bogus. Signed-off-by: Zhang Yanfei <[email protected]> Suggested-by: David Rientjes <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm/page_alloc.c: remove zone_type argument of build_zonelists_node  (Zhang Yanfei, 1 file, -13/+8)
The callers of build_zonelists_node always pass MAX_NR_ZONES - 1 as the zone_type argument, so we can use the value directly in build_zonelists_node and remove the zone_type argument. Signed-off-by: Zhang Yanfei <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  memcg: do not account memory used for cache creation  (Glauber Costa, 1 file, -0/+2)
The memory we use to hold the memcg arrays is currently accounted to the current memcg. But that creates a problem, because that memory can only be freed after the last user is gone. Our only way to know which is the last user is to hook into freeing time, but the fact that we still have some in-flight kmallocs will prevent freeing from happening. I therefore believe it is just easier to account this memory as global overhead. Signed-off-by: Glauber Costa <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  memcg: also test for skip accounting at the page allocation level  (Glauber Costa, 1 file, -0/+28)
The memory we use to hold the memcg arrays is currently accounted to the current memcg. But that creates a problem, because that memory can only be freed after the last user is gone. Our only way to know which is the last user is to hook into freeing time, but the fact that we still have some in-flight kmallocs will prevent freeing from happening. I therefore believe it is just easier to account this memory as global overhead. This patch (of 2): Disabling accounting is only relevant for some specific memcg-internal allocations. Therefore we would initially not have such a check at memcg_kmem_newpage_charge, since direct calls to the page allocator that are marked with GFP_KMEMCG only happen outside the memcg core. We are mostly concerned with cache allocations, and by having this test at memcg_kmem_get_cache we are already able to relay the allocation to the root cache and bypass the memcg caches altogether. There is one exception, though: the SLUB allocator does not create large-order caches, but rather services large kmallocs directly from the page allocator. Therefore, the following sequence, when backed by the SLUB allocator: memcg_stop_kmem_account(); kmalloc(<large_number>); memcg_resume_kmem_account(); would effectively ignore the fact that we should skip accounting, since it drives us directly to this function without passing through the cache selector memcg_kmem_get_cache. Such large allocations are extremely rare but can happen, for instance, for the cache arrays. This was never a problem in practice, because we weren't skipping accounting for the cache arrays; all the allocations we were skipping were fairly small. However, the fact that we were not skipping those allocations is a problem and can prevent the memcgs from going away. As we fix that, we need to make sure that the fix will also work with the SLUB allocator. Signed-off-by: Glauber Costa <[email protected]> Reported-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
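A hedged sketch of the page-level test this adds; the function and field names follow the memcg code of this era but the body is abridged:

    bool __memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg,
                                     int order)
    {
            *memcg = NULL;
            /* Inside a memcg_stop_kmem_account() section (e.g. while
             * allocating the cache arrays themselves): let the
             * allocation proceed, but charge nothing to the memcg. */
            if (!current->mm || current->memcg_kmem_skip_account)
                    return true;
            /* ... normal charging path ... */
            return true;
    }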
2013-07-09  mm/vmalloc.c: check VM_UNINITIALIZED flag in s_show instead of show_numa_info  (Zhang Yanfei, 1 file, -5/+5)
We should check the VM_UNINITIALIZED flag in s_show(). If this flag is set, the vm_struct is not fully initialized, so it is unnecessary to try to show the information contained in it. We checked this flag in show_numa_info(), but it is better to check it earlier. Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm/vmalloc.c: rename VM_UNLIST to VM_UNINITIALIZED  (Zhang Yanfei, 1 file, -9/+9)
VM_UNLIST was used to indicate that the vm_struct is not listed in vmlist. But after commit 4341fa454796 ("mm, vmalloc: remove list management of vmlist after initializing vmalloc"), the meaning of this flag changed. It now means the vm_struct is not fully initialized. So renaming it to VM_UNINITIALIZED seems more reasonable. Also change clear_vm_unlist to clear_vm_uninitialized_flag. Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm/vmalloc.c: emit the failure message before return  (Zhang Yanfei, 1 file, -1/+1)
Use goto to jump to the fail label to give a failure message before returning NULL. This makes the failure handling in this function consistent. Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
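The shape of the fix, sketched rather than quoted from the actual hunk: every error path funnels through one fail label so the warning is always emitted before returning NULL:

            area = kzalloc_node(sizeof(*area), gfp_mask & GFP_RECLAIM_MASK, node);
            if (unlikely(!area))
                    goto fail;      /* was: return NULL, silently */
            /* ... */
    fail:
            warn_alloc_failed(gfp_mask, 0, "vmalloc: allocation failure\n");
            return NULL;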
2013-07-09  mm/vmalloc.c: remove alloc_map from vmap_block  (Zhang Yanfei, 1 file, -3/+0)
Now that the dead code in vb_alloc() has been removed, nothing uses alloc_map, so there is no reason to maintain it in vmap_block. Signed-off-by: Zhang Yanfei <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm/vmalloc.c: remove unused purge_fragmented_blocks_thiscpu  (Zhang Yanfei, 1 file, -5/+0)
This function is now unused, so remove it. Signed-off-by: Zhang Yanfei <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm/vmalloc.c: remove dead code in vb_alloc  (Zhang Yanfei, 1 file, -15/+1)
Space in a vmap block that was once allocated is considered dirty and not made available for allocation again before the whole block is recycled. The result is that free space within a vmap block is always contiguous. So if a vmap block has enough free space for allocation, the allocation is impossible to fail. Thus, the fragmented block purging was never invoked from vb_alloc(). So remove this dead code. [ Same patches also sent by: Chanho Min <[email protected]> Johannes Weiner <[email protected]> but git doesn't do "multiple authors" ] Signed-off-by: Zhang Yanfei <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm/vmalloc.c: unbreak __vunmap()  (Dan Carpenter, 1 file, -1/+1)
There is an extra semi-colon so the function always returns. Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
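The bug class in miniature (the actual condition in __vunmap() differed, but the pattern is the same): a stray semicolon empties the if-body, so the return below it executes unconditionally:

    if (!addr);         /* BUG: this ';' is the entire if statement */
            return;     /* the indentation lies; this always runs */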
2013-07-09  mm: remove duplicated call of get_pfn_range_for_nid  (Zhang Yanfei, 1 file, -11/+32)
When calculating pages in a node, for each zone in that node we have: zone_spanned_pages_in_node --> get_pfn_range_for_nid; zone_absent_pages_in_node --> get_pfn_range_for_nid. That is to say, we call get_pfn_range_for_nid to get the node's start_pfn and end_pfn MAX_NR_ZONES * 2 times, which is totally unnecessary. If we call get_pfn_range_for_nid once beforehand and add two extra arguments, node_start_pfn and node_end_pfn, to zone_*_pages_in_node, then we can remove the get_pfn_range_for_nid calls inside them. [[email protected]: make definitions more readable] Signed-off-by: Zhang Yanfei <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
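A sketch of the reworked flow, with arguments abridged: the node's pfn range is computed once and threaded through both helpers:

    get_pfn_range_for_nid(nid, &node_start_pfn, &node_end_pfn);
    for (j = 0; j < MAX_NR_ZONES; j++) {
            size = zone_spanned_pages_in_node(nid, j, node_start_pfn,
                                              node_end_pfn, zones_size);
            realsize = size - zone_absent_pages_in_node(nid, j, node_start_pfn,
                                                        node_end_pfn, zholes_size);
            /* ... */
    }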
2013-07-09  memcg: clean up memcg->nodeinfo  (Johannes Weiner, 1 file, -15/+5)
Remove struct mem_cgroup_lru_info and fold its single member, the variably sized nodeinfo[0], directly into struct mem_cgroup. This should make it more obvious why it has to be the last member there. Also move the comment that's above that special last member below it, so it is more visible to somebody that considers appending to the struct mem_cgroup. Signed-off-by: Johannes Weiner <[email protected]> Cc: David Rientjes <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Glauber Costa <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09  mm: mremap: validate input before taking lock  (Rasmus Villemoes, 1 file, -8/+10)
This patch is very similar to commit 84d96d897671 ("mm: madvise: complete input validation before taking lock"): perform some basic validation of the input to mremap() before taking the &current->mm->mmap_sem lock. This also makes the MREMAP_FIXED => MREMAP_MAYMOVE dependency slightly more explicit. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
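A hedged sketch of the reordering: basic sanity checks first, the mmap_sem write lock only afterwards (the check set here is abridged):

    unsigned long ret = -EINVAL;

    if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
            return ret;
    if ((flags & MREMAP_FIXED) && !(flags & MREMAP_MAYMOVE))
            return ret;             /* MREMAP_FIXED implies MREMAP_MAYMOVE */
    if (addr & ~PAGE_MASK)
            return ret;

    down_write(&current->mm->mmap_sem);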
2013-07-04  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial  (Linus Torvalds, 2 files, -3/+2)
Pull trivial tree updates from Jiri Kosina: "The usual stuff from trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits) treewide: relase -> release Documentation/cgroups/memory.txt: fix stat file documentation sysctl/net.txt: delete reference to obsolete 2.4.x kernel spinlock_api_smp.h: fix preprocessor comments treewide: Fix typo in printk doc: device tree: clarify stuff in usage-model.txt. open firmware: "/aliasas" -> "/aliases" md: bcache: Fixed a typo with the word 'arithmetic' irq/generic-chip: fix a few kernel-doc entries frv: Convert use of typedef ctl_table to struct ctl_table sgi: xpc: Convert use of typedef ctl_table to struct ctl_table doc: clk: Fix incorrect wording Documentation/arm/IXP4xx fix a typo Documentation/networking/ieee802154 fix a typo Documentation/DocBook/media/v4l fix a typo Documentation/video4linux/si476x.txt fix a typo Documentation/virtual/kvm/api.txt fix a typo Documentation/early-userspace/README fix a typo Documentation/video4linux/soc-camera.txt fix a typo lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment ...
2013-07-04  Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc  (Linus Torvalds, 2 files, -12/+21)
Pull powerpc updates from Ben Herrenschmidt: "These are the powerpc changes for the 3.11 merge window. In addition to the usual bug fixes and small updates, the main highlights are: - Support for transparent huge pages by Aneesh Kumar for 64-bit server processors. This allows the use of 16M pages as transparent huge pages on kernels compiled with a 64K base page size. - Base VFIO support for KVM on power by Alexey Kardashevskiy - Wiring up of our nvram to the pstore infrastructure, including putting compressed oopses in there by Aruna Balakrishnaiah - Move, rework and improve our "EEH" (basically PCI error handling and recovery) infrastructure. It is no longer specific to pseries but is now usable by the new "powernv" platform as well (no hypervisor) by Gavin Shan. - I fixed some bugs in our math-emu instruction decoding and made it usable to emulate some optional FP instructions on processors with hard FP that lack them (such as fsqrt on Freescale embedded processors). - Support for Power8 "Event Based Branch" facility by Michael Ellerman. This facility allows what is basically "userspace interrupts" for performance monitor events. - A bunch of Transactional Memory vs. Signals bug fixes and HW breakpoint/watchpoint fixes by Michael Neuling. And more ... I apologize in advance if I've failed to highlight something that somebody deemed worth it." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) pstore: Add hsize argument in write_buf call of pstore_ftrace_call powerpc/fsl: add MPIC timer wakeup support powerpc/mpic: create mpic subsystem object powerpc/mpic: add global timer support powerpc/mpic: add irq_set_wake support powerpc/85xx: enable coreint for all the 64bit boards powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig powerpc/mpic: Add get_version API both for internal and external use powerpc: Handle both new style and old style reserve maps powerpc/hw_brk: Fix off by one error when validating DAWR region end powerpc/pseries: Support compression of oops text via pstore powerpc/pseries: Re-organise the oops compression code pstore: Pass header size in the pstore write callback powerpc/powernv: Fix iommu initialization again powerpc/pseries: Inform the hypervisor we are using EBB regs powerpc/perf: Add power8 EBB support powerpc/perf: Core EBB support for 64-bit book3s powerpc/perf: Drop MMCRA from thread_struct powerpc/perf: Don't enable if we have zero events ...
2013-07-03  Merge branch 'akpm' (updates from Andrew Morton)  (Linus Torvalds, 22 files, -516/+1021)
Merge first patch-bomb from Andrew Morton: - various misc bits - I've been patchmonkeying ocfs2 for a while, as Joel and Mark have been distracted. There has been quite a bit of activity. - About half the MM queue - Some backlight bits - Various lib/ updates - checkpatch updates - zillions more little rtc patches - ptrace - signals - exec - procfs - rapidio - nbd - aoe - pps - memstick - tools/testing/selftests updates * emailed patches from Andrew Morton <[email protected]>: (445 commits) tools/testing/selftests: don't assume the x bit is set on scripts selftests: add .gitignore for kcmp selftests: fix clean target in kcmp Makefile selftests: add .gitignore for vm selftests: add hugetlbfstest self-test: fix make clean selftests: exit 1 on failure kernel/resource.c: remove the unneeded assignment in function __find_resource aio: fix wrong comment in aio_complete() drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode drivers/memstick/host/r592.c: convert to module_pci_driver drivers/memstick/host/jmb38x_ms: convert to module_pci_driver pps-gpio: add device-tree binding and support drivers/pps/clients/pps-gpio.c: convert to module_platform_driver drivers/pps/clients/pps-gpio.c: convert to devm_* helpers drivers/parport/share.c: use kzalloc Documentation/accounting/getdelays.c: avoid strncpy in accounting tool aoe: update internal version number to v83 aoe: update copyright date aoe: perform I/O completions in parallel ...
2013-07-03  drivers: avoid format string in dev_set_name  (Kees Cook, 1 file, -3/+2)
Calling dev_set_name with a single parameter causes it to be handled as a format string. Many callers are passing potentially dynamic string content, so use "%s" in those cases to avoid any potential accidents, including in wrappers like device_create*() and bdi_register(). Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm: memcontrol: factor out reclaim iterator loading and updating  (Johannes Weiner, 1 file, -29/+57)
mem_cgroup_iter() is too hard to follow. Factor out the lockless reclaim iterator loading and updating so it's easier to follow the big picture. Also document the iterator invalidation mechanism a bit more extensively. Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Tejun Heo <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Glauber Costa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm, vmalloc: use clamp() to simplify code  (Zhang Yanfei, 1 file, -10/+2)
Signed-off-by: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
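The kind of collapse clamp() enables, as a generic illustration rather than the exact hunk from this commit: two conditional assignments become one expression:

    /* before: */
    if (base < start)
            base = start;
    if (base > end)
            base = end;
    /* after: */
    base = clamp(base, start, end);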
2013-07-03  mm, vmalloc: remove insert_vmalloc_vm()  (Zhang Yanfei, 1 file, -7/+0)
Now that this function is used nowhere, we can remove it directly. Signed-off-by: Zhang Yanfei <[email protected]> Acked-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm, vmalloc: call setup_vmalloc_vm() instead of insert_vmalloc_vm()  (Zhang Yanfei, 1 file, -2/+2)
Here we pass flags with only the VM_ALLOC bit set, so it is unnecessary to call clear_vm_unlist to clear the VM_UNLIST bit. So use setup_vmalloc_vm instead of insert_vmalloc_vm. Signed-off-by: Zhang Yanfei <[email protected]> Acked-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm, vmalloc: call setup_vmalloc_vm() only in __get_vm_area_node()  (Zhang Yanfei, 1 file, -10/+1)
Now insert_vmalloc_vm only calls two functions: setup_vmalloc_vm, which fills the vm_struct and vmap_area instances, and clear_vm_unlist, which clears the VM_UNLIST bit in vm_struct->flags. So in __get_vm_area_node(), if the VM_UNLIST bit is unset in flags (the else branch here), we don't need to clear the VM_UNLIST bit in vm->flags since this bit is obviously not set. That is to say, we can call setup_vmalloc_vm alone instead of insert_vmalloc_vm here, and then we can even remove the if test. Signed-off-by: Zhang Yanfei <[email protected]> Acked-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  sparsemem: add BUILD_BUG_ON when sizeof mem_section is non-power-of-2  (Cody P Schafer, 1 file, -0/+3)
Instead of leaving a hidden trap for the next person who comes along and wants to add something to mem_section, add a big fat warning about it needing to be a power-of-2, and insert a BUILD_BUG_ON() in sparse_init() to catch mistakes. Right now non-power-of-2 mem_sections cause a number of WARNs at boot (which don't clearly point to the size of mem_section as an issue), but the system limps on (temporarily, at least). This is based upon Dave Hansen's earlier RFC where he ran into the same issue: "sparsemem: fix boot when SECTIONS_PER_ROOT is not power-of-2" http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/03077.html Signed-off-by: Cody P Schafer <[email protected]> Acked-by: Dave Hansen <[email protected]> Cc: Jiang Liu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
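The guard described above, roughly as it would be written in sparse_init() (hedged: the exact expression in the commit may differ):

    #include <linux/log2.h>

    void __init sparse_init(void)
    {
            /* mem_section roots are indexed with shift/mask arithmetic,
             * so the struct size must be a power of 2 to divide evenly. */
            BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
            /* ... */
    }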
2013-07-03  mm: kill free_all_bootmem_node()  (Jiang Liu, 1 file, -18/+0)
Now nobody makes use of free_all_bootmem_node(), kill it. Signed-off-by: Jiang Liu <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Yinghai Lu <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm: kill global variable num_physpages  (Jiang Liu, 2 files, -4/+0)
Now all references to num_physpages have been removed, so kill it. Signed-off-by: Jiang Liu <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: David Rientjes <[email protected]> Cc: Al Viro <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm/hotplug: prepare for removing num_physpages  (Jiang Liu, 1 file, -4/+0)
Prepare for removing num_physpages. Signed-off-by: Jiang Liu <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Tang Chen <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm: introduce helper function mem_init_print_info() to simplify mem_init()  (Jiang Liu, 1 file, -0/+52)
Introduce helper function mem_init_print_info() to simplify mem_init() across different architectures, which also unifies the format and information printed. Function mem_init_print_info() calculates memory statistics information without walking each page, so it should be a little faster on some architectures. Also introduce another helper get_num_physpages() to kill the global variable num_physpages. Signed-off-by: Jiang Liu <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Marek Szyprowski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm: fix the TLB range flushed when __tlb_remove_page() runs out of slots  (Vineet Gupta, 1 file, -3/+6)
zap_pte_range loops from @addr to @end. In the middle, if it runs out of batching slots, TLB entries need to be flushed for @start to @interim, NOT @interim to @end. Since the ARC port doesn't use page free batching I can't test it myself, but this seems like the right thing to do. Observed this when working on a fix for the issue at thread: http://www.spinics.net/lists/linux-arch/msg21736.html Signed-off-by: Vineet Gupta <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: David Rientjes <[email protected]> Cc: Peter Zijlstra <[email protected]> Acked-by: Catalin Marinas <[email protected]> Cc: Max Filippov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
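A sketch of the corrected flush window, with names simplified (the real hunk in zap_pte_range() saves and restores tlb->end around the flush): when the batch fills mid-range, flush only what was actually unmapped so far:

    if (force_flush) {
            tlb->start = start;     /* first address actually unmapped */
            tlb->end   = addr;      /* how far the loop got, not @end */
            tlb_flush_mmu(tlb);
            /* ... then resume the unmap loop at addr ... */
    }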
2013-07-03  mm: report available pages as "MemTotal" for each NUMA node  (Jiang Liu, 1 file, -1/+5)
As reported by https://bugzilla.kernel.org/show_bug.cgi?id=53501, "MemTotal" from /proc/meminfo means memory pages managed by the buddy system (managed_pages), but "MemTotal" from /sys/.../node/nodex/meminfo means physical pages present (present_pages) within the NUMA node. There's a difference between managed_pages and present_pages due to bootmem allocator and reserved pages. And Documentation/filesystems/proc.txt says MemTotal: Total usable ram (i.e. physical ram minus a few reserved bits and the kernel binary code) So change /sys/.../node/nodex/meminfo to report available pages within the node as "MemTotal". Signed-off-by: Jiang Liu <[email protected]> Reported-by: <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Jianguo Wu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Tang Chen <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm: concentrate modification of totalram_pages into the mm core  (Jiang Liu, 2 files, -2/+14)
Concentrate code that modifies totalram_pages into the mm core, so the arch memory-init code doesn't need to take care of it. With these changes applied, only the following functions from the mm core modify the global variable totalram_pages: free_bootmem_late(), free_all_bootmem(), free_all_bootmem_node(), adjust_managed_page_count(). With this patch applied, it will be much easier for us to keep totalram_pages and zone->managed_pages consistent. Signed-off-by: Jiang Liu <[email protected]> Acked-by: David Howells <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Jianguo Wu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Tang Chen <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
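A hedged usage sketch of the single entry point named above: a balloon-style driver returning a page to the system goes through adjust_managed_page_count() instead of editing totalram_pages by hand. give_page_back() is a hypothetical caller:

    void give_page_back(struct page *page)
    {
            __free_page(page);
            adjust_managed_page_count(page, 1);     /* updates totalram_pages,
                                                     * zone->managed_pages, and
                                                     * totalhigh_pages for highmem */
    }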
2013-07-03  mm: correctly update zone->managed_pages  (Jiang Liu, 3 files, -19/+10)
Enhance adjust_managed_page_count() to adjust totalhigh_pages for highmem pages. And change code which directly adjusts totalram_pages to use adjust_managed_page_count(), because it adjusts totalram_pages, totalhigh_pages and zone->managed_pages altogether in a safe way. Remove inc_totalhigh_pages() and dec_totalhigh_pages() from the xen/balloon driver because adjust_managed_page_count() has already adjusted totalhigh_pages. This patch also fixes two bugs: 1) enhances the virtio_balloon driver to adjust totalhigh_pages when reserving/unreserving pages; 2) enhances memory_hotplug.c to adjust totalhigh_pages when hot-removing memory. We still need to deal with modifications of totalram_pages in file arch/powerpc/platforms/pseries/cmm.c, but need help from PPC experts. [[email protected]: remove ifdef, per Wanpeng Li, virtio_balloon.c cleanup, per Sergei] [[email protected]: export adjust_managed_page_count() to modules, for drivers/virtio/virtio_balloon.c] Signed-off-by: Jiang Liu <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Rusty Russell <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Tang Chen <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jianguo Wu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Russell King <[email protected]> Cc: Sergei Shtylyov <[email protected]> Cc: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm: make __free_pages_bootmem() only available at boot time  (Jiang Liu, 2 files, -22/+3)
In order to simplify management of totalram_pages and zone->managed_pages, make __free_pages_bootmem() only available at boot time. With this change applied, __free_pages_bootmem() will only be used by bootmem.c and nobootmem.c at boot time, so mark it as __init. Other callers of __free_pages_bootmem() have been converted to use free_reserved_page(), which handles totalram_pages and zone->managed_pages in a safer way. This patch also fixes a bug in free_pagetable() for x86_64, which should increase zone->managed_pages instead of zone->present_pages when freeing reserved pages. And now we have managed_pages_count_lock to protect totalram_pages and zone->managed_pages, so remove the redundant ppb_lock lock in put_page_bootmem(). This greatly simplifies the locking rules. Signed-off-by: Jiang Liu <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Tang Chen <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Jianguo Wu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Will Deacon <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>