path: root/mm/memory_hotplug.c
Age | Commit message | Author | Files | Lines
2014-12-10 | mm, memory_hotplug/failure: drain single zone pcplists | Vlastimil Babka | 1 | -2/+2
Memory hotplug and failure mechanisms have several places where pcplists are drained so that pages are returned to the buddy allocator and can, e.g., be prepared for offlining. Since this is always done in the context of a single zone, we can reduce the pcplists drain to that single zone, which is now possible. The change should make memory offlining due to hotremove or failure faster, and it no longer disturbs unrelated pcplists. Signed-off-by: Vlastimil Babka <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Marek Szyprowski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-12-10 | mm: introduce single zone pcplists drain | Vlastimil Babka | 1 | -2/+2
The functions for draining per-cpu pages back to buddy allocators currently always operate on all zones. There are, however, several cases where the drain is only needed in the context of a single zone, and spilling other pcplists is a waste of time, both because of the extra spilling and because of the later refilling. This patch introduces a new zone pointer parameter to drain_all_pages() and changes the dummy parameter of drain_local_pages() to also be a zone pointer. When NULL is passed, the functions operate on all zones as usual. Passing a specific zone pointer reduces the work to that single zone. All callers are updated to pass NULL in this patch. Conversion to a single zone (where appropriate) is done in further patches. Signed-off-by: Vlastimil Babka <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Marek Szyprowski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
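The NULL-versus-zone semantics of the new drain_all_pages() parameter can be illustrated with a small user-space model. This is a sketch, not kernel code: struct zone_model, drain_all_pages_model(), fill_pcplists(), the fixed CPU and zone counts, and the plain per-CPU page counters are all hypothetical stand-ins for the real pcplist structures.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL  4   /* assumed CPU count for the model */
#define NR_ZONES_MODEL 3   /* assumed zone count for the model */

struct zone_model {
	long free_pages;                 /* pages held by the buddy allocator */
	long pcp_count[NR_CPUS_MODEL];   /* pages cached on each CPU's pcplist */
};

static struct zone_model zones[NR_ZONES_MODEL];

/* Return one CPU's cached pages for one zone to the buddy allocator. */
static void drain_zone_on_cpu(struct zone_model *z, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	z->free_pages += z->pcp_count[cpu];
	z->pcp_count[cpu] = 0;
}

/* zone == NULL drains every zone on every CPU; a non-NULL zone limits the
 * work to that one zone and leaves the other zones' pcplists untouched. */
static void drain_all_pages_model(struct zone_model *zone)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (zone != NULL) {
			drain_zone_on_cpu(zone, cpu);
			continue;
		}
		for (int i = 0; i < NR_ZONES_MODEL; i++)
			drain_zone_on_cpu(&zones[i], cpu);
	}
}

/* Helper for trying the model: put per_cpu pages on every pcplist. */
static void fill_pcplists(long per_cpu)
{
	for (int i = 0; i < NR_ZONES_MODEL; i++)
		for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
			zones[i].pcp_count[cpu] = per_cpu;
}
```

Calling drain_all_pages_model(&zones[1]) empties only zone 1's lists, while drain_all_pages_model(NULL) falls back to draining everything, matching the behaviour the commit describes for NULL.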
2014-11-13 | mem-hotplug: reset node present pages when hot-adding a new pgdat | Tang Chen | 1 | -0/+17
When memory is hot-added, all of it is in the offline state. So clear all zones' present_pages, because they will be updated in online_pages() and offline_pages(). Otherwise, /proc/zoneinfo will be corrupted:

When the memory of node2 is offline:

    # cat /proc/zoneinfo
    ......
    Node 2, zone Movable
    ......
      spanned 8388608
      present 8388608
      managed 0

When we online memory on node2:

    # cat /proc/zoneinfo
    ......
    Node 2, zone Movable
    ......
      spanned 8388608
      present 16777216
      managed 8388608

Signed-off-by: Tang Chen <[email protected]> Reviewed-by: Yasuaki Ishimatsu <[email protected]> Cc: <[email protected]> [3.16+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
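The zoneinfo numbers above (present 8388608 while offline, 16777216 after onlining) are what you would expect if onlining adds to a counter that was never cleared. A minimal model of that arithmetic, assuming that the hot-add path initializes present_pages to the zone's full span; zone_model, hotadd_zone() and online_zone_pages() are invented for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical zone: only the counters that matter here. */
struct zone_model {
	unsigned long spanned_pages;
	unsigned long present_pages;
};

/* Hot-add: assume the zone starts out counting its whole span as present,
 * as the "present 8388608" output suggests. reset_present == true models
 * the fix: the memory is still offline, so nothing is counted yet. */
static void hotadd_zone(struct zone_model *z, unsigned long nr_pages,
			bool reset_present)
{
	z->spanned_pages = nr_pages;
	z->present_pages = reset_present ? 0 : nr_pages;
}

/* Onlining adds the onlined pages to present_pages. */
static void online_zone_pages(struct zone_model *z, unsigned long nr_pages)
{
	assert(nr_pages <= z->spanned_pages);
	z->present_pages += nr_pages;
}
```

Without the reset, onlining 8388608 pages double-counts them; with it, present_pages ends up equal to the onlined amount.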
2014-11-13 | mem-hotplug: reset node managed pages when hot-adding a new pgdat | Tang Chen | 1 | -0/+9
In free_area_init_core(), zone->managed_pages is set to an approximate value for lowmem, and will be adjusted when the bootmem allocator frees pages into the buddy system. But free_area_init_core() is also called by hotadd_new_pgdat() when hot-adding memory. As a result, zone->managed_pages of the newly added node's pgdat is set to an approximate value at the very beginning. Even if the memory on that node has not been onlined, /sys/devices/system/node/nodeXXX/meminfo shows a wrong value:

    hot-add node2 (memory not onlined)
    cat /sys/devices/system/node/node2/meminfo
    Node 2 MemTotal: 33554432 kB
    Node 2 MemFree: 0 kB
    Node 2 MemUsed: 33554432 kB
    Node 2 Active: 0 kB

This patch fixes the problem by resetting the node's managed pages to 0 after hot-adding a new node:
1. Move reset_managed_pages_done from reset_node_managed_pages() to reset_all_zones_managed_pages()
2. Make reset_node_managed_pages() non-static
3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat is initialized

Signed-off-by: Tang Chen <[email protected]> Signed-off-by: Yasuaki Ishimatsu <[email protected]> Cc: <[email protected]> [3.16+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-29 | memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node() | Yasuaki Ishimatsu | 1 | -5/+0
When hot-adding the same memory after hot removal, the following messages are shown:

    WARNING: CPU: 20 PID: 6 at mm/page_alloc.c:4968 free_area_init_node+0x3fe/0x426()
    ...
    Call Trace:
     dump_stack+0x46/0x58
     warn_slowpath_common+0x81/0xa0
     warn_slowpath_null+0x1a/0x20
     free_area_init_node+0x3fe/0x426
     hotadd_new_pgdat+0x90/0x110
     add_memory+0xd4/0x200
     acpi_memory_device_add+0x1aa/0x289
     acpi_bus_attach+0xfd/0x204
     acpi_bus_attach+0x178/0x204
     acpi_bus_scan+0x6a/0x90
     acpi_device_hotplug+0xe8/0x418
     acpi_hotplug_work_fn+0x1f/0x2b
     process_one_work+0x14e/0x3f0
     worker_thread+0x11b/0x510
     kthread+0xe1/0x100
     ret_from_fork+0x7c/0xb0

The detailed explanation is as follows. When hot-removing memory, pgdat is set to 0 in try_offline_node(). But if the pgdat was allocated by the bootmem allocator, the clearing step is skipped. When the same memory is hot-added again, the uncleared pgdat is reused. But free_area_init_node() checks whether pgdat is set to zero. As a result, free_area_init_node() hits WARN_ON(). This patch clears the pgdat allocated by the bootmem allocator in try_offline_node(). Signed-off-by: Yasuaki Ishimatsu <[email protected]> Cc: Zhang Zhen <[email protected]> Cc: Wang Nan <[email protected]> Cc: Tang Chen <[email protected]> Reviewed-by: Toshi Kani <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
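The failure mode above boils down to "re-initialization expects a zeroed pgdat, but one path skipped the clearing". A hypothetical user-space model: pgdat_model keeps only three fields, offline_node_clear_pgdat() is an invented name for the clearing step, and init_would_warn() stands in for the WARN_ON() check without claiming to reproduce its exact condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for pg_data_t with just a few fields. */
struct pgdat_model {
	int nr_zones;
	unsigned long node_start_pfn;
	unsigned long node_spanned_pages;
};

/* Models the sanity check: re-initializing a node expects a cleared
 * pgdat, so any leftover state would trigger the warning. */
static bool init_would_warn(const struct pgdat_model *pgdat)
{
	return pgdat->nr_zones != 0 || pgdat->node_start_pfn != 0 ||
	       pgdat->node_spanned_pages != 0;
}

/* Models the fix: when the node goes offline, clear its pgdat no matter
 * which allocator originally provided it. */
static void offline_node_clear_pgdat(struct pgdat_model *pgdat)
{
	assert(pgdat != NULL);
	memset(pgdat, 0, sizeof(*pgdat));
}
```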
2014-10-09 | memory-hotplug: add sysfs valid_zones attribute | Zhang Zhen | 1 | -1/+1
Currently memory-hotplug has two limits:
1. If the memory block is in ZONE_NORMAL, you can change it to ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
2. If the memory block is in ZONE_MOVABLE, you can change it to ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.
With this patch, it is easy to know which zone a memory block can be onlined to, without needing to know the above two limits. Updated the related Documentation. [[email protected]: use conventional comment layout] [[email protected]: fix build with CONFIG_MEMORY_HOTREMOVE=n] [[email protected]: remove unused local zone_prev] Signed-off-by: Zhang Zhen <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Wang Nan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
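The two adjacency limits can be expressed as a small predicate over pfn ranges. This is a hypothetical sketch of the rule as stated above, not the kernel's valid_zones implementation; the VZ_* flags, pfn_range and valid_zones_model() are invented, and zones are assumed to be contiguous half-open pfn ranges.

```c
#include <assert.h>

/* Hypothetical bit flags for the zones a block could be onlined to. */
enum { VZ_NORMAL = 1u << 0, VZ_MOVABLE = 1u << 1 };

struct pfn_range { unsigned long start, end; };   /* [start, end) */

/* Sketch of the two adjacency limits. cur is the block's current zone
 * (VZ_NORMAL or VZ_MOVABLE); the result is a mask of valid zones. */
static unsigned valid_zones_model(struct pfn_range block, unsigned cur,
				  struct pfn_range normal,
				  struct pfn_range movable)
{
	unsigned valid = cur;

	assert(block.start < block.end);
	/* A Normal block may become Movable only if it sits right below Movable. */
	if (cur == VZ_NORMAL && block.end == movable.start)
		valid |= VZ_MOVABLE;
	/* A Movable block may become Normal only if it sits right above Normal. */
	if (cur == VZ_MOVABLE && block.start == normal.end)
		valid |= VZ_NORMAL;
	return valid;
}
```

Only the block on each side of the Normal/Movable boundary gets a second choice; every other block keeps just its current zone.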
2014-08-06 | memory-hotplug: add zone_for_memory() for selecting zone for new memory | Wang Nan | 1 | -0/+28
This series of patches fixes a problem when adding memory in a bad manner. For example, on an x86_64 machine booted with "mem=400M" and with 2GiB of memory installed, the following commands cause a problem:

    # echo 0x40000000 > /sys/devices/system/memory/probe
    [ 28.613895] init_memory_mapping: [mem 0x40000000-0x47ffffff]
    # echo 0x48000000 > /sys/devices/system/memory/probe
    [ 28.693675] init_memory_mapping: [mem 0x48000000-0x4fffffff]
    # echo online_movable > /sys/devices/system/memory/memory9/state
    # echo 0x50000000 > /sys/devices/system/memory/probe
    [ 29.084090] init_memory_mapping: [mem 0x50000000-0x57ffffff]
    # echo 0x58000000 > /sys/devices/system/memory/probe
    [ 29.151880] init_memory_mapping: [mem 0x58000000-0x5fffffff]
    # echo online_movable > /sys/devices/system/memory/memory11/state
    # echo online> /sys/devices/system/memory/memory8/state
    # echo online> /sys/devices/system/memory/memory10/state
    # echo offline> /sys/devices/system/memory/memory9/state
    [ 30.558819] Offlined Pages 32768
    # free
    total used free shared buffers cached
    Mem: 780588 18014398509432020 830552 0 0 51180
    -/+ buffers/cache: 18014398509380840 881732
    Swap: 0 0 0

This is because the above commands probe higher memory after onlining a section with online_movable, which causes ZONE_HIGHMEM (or ZONE_NORMAL on systems without ZONE_HIGHMEM) to overlap ZONE_MOVABLE. After the second online_movable, the problem can be observed in zoneinfo:

    # cat /proc/zoneinfo
    ...
    Node 0, zone Movable
      pages free 65491
      min 250
      low 312
      high 375
      scanned 0
      spanned 18446744073709518848
      present 65536
      managed 65536
    ...

This series of patches solves the problem by checking ZONE_MOVABLE when choosing the zone for new memory. If the new memory is inside or higher than ZONE_MOVABLE, it is placed there instead.
After applying this series of patches, the free and zoneinfo results (after offlining memory9) are as follows:

    bash-4.2# free
    total used free shared buffers cached
    Mem: 780956 80112 700844 0 0 51180
    -/+ buffers/cache: 28932 752024
    Swap: 0 0 0
    bash-4.2# cat /proc/zoneinfo
    Node 0, zone DMA
      pages free 3389
      min 14
      low 17
      high 21
      scanned 0
      spanned 4095
      present 3998
      managed 3977
      nr_free_pages 3389
      ...
      start_pfn: 1
      inactive_ratio: 1
    Node 0, zone DMA32
      pages free 73724
      min 341
      low 426
      high 511
      scanned 0
      spanned 98304
      present 98304
      managed 92958
      nr_free_pages 73724
      ...
      start_pfn: 4096
      inactive_ratio: 1
    Node 0, zone Normal
      pages free 32630
      min 120
      low 150
      high 180
      scanned 0
      spanned 32768
      present 32768
      managed 32768
      nr_free_pages 32630
      ...
      start_pfn: 262144
      inactive_ratio: 1
    Node 0, zone Movable
      pages free 65476
      min 241
      low 301
      high 361
      scanned 0
      spanned 98304
      present 65536
      managed 65536
      nr_free_pages 65476
      ...
      start_pfn: 294912
      inactive_ratio: 1

This patch (of 7): Introduce zone_for_memory() in arch-independent code for arch_add_memory() to use. Many arch_add_memory() functions simply select ZONE_HIGHMEM or ZONE_NORMAL and add new memory into it. However, with the existence of ZONE_MOVABLE, the selection method should be carefully considered: if new, higher memory is added after ZONE_MOVABLE is set up, the default zone and ZONE_MOVABLE may overlap each other. should_add_memory_movable() checks the status of ZONE_MOVABLE. If it already contains memory, it compares the address of the new memory with that of the movable memory. If the new memory is higher than the movable memory, it should be added into ZONE_MOVABLE instead of the default zone. Signed-off-by: Wang Nan <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: "Mel Gorman" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Chris Metcalf <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-08-06 | mem-hotplug: introduce MMOP_OFFLINE to replace the hard coding -1 | Tang Chen | 1 | -3/+6
In store_mem_state(), we have:

    ...
    else if (!strncmp(buf, "offline", min_t(int, count, 7)))
            online_type = -1;
    ...
    case -1:
            ret = device_offline(&mem->dev);
            break;
    ...

Here, "offline" is hard coded as -1. This patch does the following renaming:

    ONLINE_KEEP    -> MMOP_ONLINE_KEEP
    ONLINE_KERNEL  -> MMOP_ONLINE_KERNEL
    ONLINE_MOVABLE -> MMOP_ONLINE_MOVABLE

and introduces MMOP_OFFLINE = -1 to avoid hard coding. Signed-off-by: Tang Chen <[email protected]> Cc: Hu Tao <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Gu Zheng <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
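With named constants, state parsing no longer needs a magic -1. A hypothetical sketch: the MMOP_* names and MMOP_OFFLINE = -1 come from the commit message, while the values of the other three (0, 1, 2 by enum order) and the word-to-constant mapping are assumptions. parse_mem_state() is a simplified whole-word parser, not the real store_mem_state(), and MEM_STATE_INVALID is a sentinel invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Named online types; MMOP_OFFLINE replaces the bare -1. */
enum {
	MMOP_OFFLINE = -1,
	MMOP_ONLINE_KEEP,
	MMOP_ONLINE_KERNEL,
	MMOP_ONLINE_MOVABLE,
};

#define MEM_STATE_INVALID (-2)   /* sentinel invented for this sketch */

static const struct {
	const char *word;
	int type;
} mem_states[] = {
	{ "online_kernel",  MMOP_ONLINE_KERNEL },
	{ "online_movable", MMOP_ONLINE_MOVABLE },
	{ "online",         MMOP_ONLINE_KEEP },
	{ "offline",        MMOP_OFFLINE },
};

/* Whole-word parser for a write to a memory block's "state" file;
 * a trailing newline (as written by echo) is ignored. */
static int parse_mem_state(const char *buf)
{
	size_t len;

	assert(buf != NULL);
	len = strcspn(buf, "\n");
	for (size_t i = 0; i < sizeof(mem_states) / sizeof(mem_states[0]); i++) {
		if (strlen(mem_states[i].word) == len &&
		    strncmp(buf, mem_states[i].word, len) == 0)
			return mem_states[i].type;
	}
	return MEM_STATE_INVALID;
}
```

A switch on the result can then use `case MMOP_OFFLINE:` where the original code had `case -1:`.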
2014-08-06 | mm/memory_hotplug.c: add __meminit to grow_zone_span/grow_pgdat_span | Fabian Frederick | 1 | -4/+4
grow_zone_span() and grow_pgdat_span() are only called by __add_zone(), which is itself __meminit, so mark them __meminit as well. Signed-off-by: Fabian Frederick <[email protected]> Cc: Toshi Kani <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-04 | mm, migration: add destination page freeing callback | David Rientjes | 1 | -1/+1
Memory migration uses a callback defined by the caller to determine how to allocate destination pages. When migration fails for a source page, however, it frees the destination page back to the system. This patch adds a memory migration callback defined by the caller to determine how to free destination pages. If a caller, such as memory compaction, builds its own freelist for migration targets, this can reuse already freed memory instead of scanning additional memory. If the caller provides a function to handle freeing of destination pages, it is called when page migration fails. If the caller passes NULL then freeing back to the system will be handled as usual. This patch introduces no functional change. Signed-off-by: David Rientjes <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Greg Thelen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-04 | mm/memory_hotplug.c: use PFN_DOWN() | Fabian Frederick | 1 | -2/+2
Replace ((x) >> PAGE_SHIFT) with the PFN_DOWN() macro. Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
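For reference, PFN_DOWN() and its rounding-up counterpart PFN_UP() have the shapes below, mirroring include/linux/pfn.h. PAGE_SHIFT is fixed at 12 (4 KiB pages) here as an assumption so the snippet compiles on its own.

```c
#include <assert.h>

#define PAGE_SHIFT 12                  /* assumes 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Physical address -> page frame number, rounding down or up. */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Compile-time sanity check: one page spans exactly one pfn. */
static_assert(PFN_DOWN(PAGE_SIZE) == 1, "one page is one pfn");
```

PFN_DOWN() gives the frame containing an address (e.g. a range start), while PFN_UP() gives the first frame at or after it, which is what you want for a range end that may not be page aligned.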
2014-06-04 | mem-hotplug: implement get/put_online_mems | Vladimir Davydov | 1 | -41/+101
kmem_cache_{create,destroy,shrink} need to get a stable value of the cpu/node online mask, because they init/destroy/access per-cpu/node kmem_cache parts, which can be allocated or destroyed on cpu/mem hotplug. To protect against cpu hotplug, these functions use {get,put}_online_cpus. However, they do nothing to synchronize with memory hotplug: taking the slab_mutex does not eliminate the possibility of a race, as described in patch 2. What we need there is something like get_online_cpus, but for memory. We already have lock_memory_hotplug, which serves the purpose, but it's a bit of a hammer right now, because it's backed by a mutex. As a result, it imposes some undesirable limitations on locking order and can't be used just like get_online_cpus. That's why in patch 1 I substitute it with get/put_online_mems, which work exactly like get/put_online_cpus except that they block memory hotplug rather than cpu hotplug. [ v1 can be found at https://lkml.org/lkml/2014/4/6/68. I NAK'ed it myself, because it used an rw semaphore for get/put_online_mems, making them deadlock prone. ] This patch (of 2): {un}lock_memory_hotplug, which is used to synchronize against memory hotplug, is currently backed by a mutex, which makes it a bit of a hammer: threads that only want to get a stable value of the online nodes mask won't be able to proceed concurrently. Also, it imposes some strong locking ordering rules on it, which narrows down the set of its usage scenarios. This patch introduces get/put_online_mems, which are the same as get/put_online_cpus, but for memory hotplug, i.e. executing code inside a get/put_online_mems section will guarantee a stable value of online nodes, present pages, etc. lock_memory_hotplug()/unlock_memory_hotplug() are removed altogether. 
Signed-off-by: Vladimir Davydov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Tang Chen <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: David Rientjes <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Lai Jiangshan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-23 | mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug | Nathan Zimmer | 1 | -3/+4
We don't need to do register_memory_resource() under lock_memory_hotplug() since it has its own lock and doesn't make any callbacks. Also, register_memory_resource() returns NULL on failure, so we don't have anything to clean up at this point. The reason for this RFC is that I was doing some experiments with hotplugging memory on some of our larger systems. While it seems to work, it can be quite slow. With some preliminary digging I found that lock_memory_hotplug is clearly ripe for breakup. It could be broken up per nid or something, but it also covers the online_page_callback. The online_page_callback shouldn't be very hard to break out. Also there is the issue of various structures (wmarks come to mind) that are only updated under lock_memory_hotplug and would need to be dealt with. Cc: Tang Chen <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Reviewed-by: Yasuaki Ishimatsu <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Hedi <[email protected]> Cc: Mike Travis <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-23 | mm: print more details for bad_page() | Dave Hansen | 1 | -1/+1
bad_page() is cool in that it prints out a bunch of data about the page. But, I can never remember which page flags are good and which are bad, or whether ->index or ->mapping is required to be NULL. This patch allows bad/dump_page() callers to specify a string about why they are dumping the page and adds explanation strings to a number of places. It also adds a 'bad_flags' argument to bad_page(), which it then dumps out separately from the flags which are actually set. This way, the messages will show specifically why the page was bad, *specifically* which flags it is complaining about, if it was a page flag combination which was the problem. [[email protected]: switch to pr_alert] Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Cc: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21 | mm/memory_hotplug.c: use memblock apis for early memory allocations | Santosh Shilimkar | 1 | -1/+1
Correct ensure_zone_is_initialized() function description according to the introduced memblock APIs for early memory allocations. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Lindgren <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21 | mm/memblock: remove unnecessary inclusions of bootmem.h | Grygorii Strashko | 1 | -1/+0
Clean-up to remove the dependency on bootmem headers. Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Reviewed-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Lindgren <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-01-21 | memblock, mem_hotplug: make memblock skip hotpluggable regions if needed | Tang Chen | 1 | -0/+1
The Linux kernel cannot migrate pages used by the kernel. As a result, hotpluggable memory used by the kernel won't be able to be hot-removed. To solve this problem, the basic idea is to prevent memblock from allocating hotpluggable memory for the kernel early in boot, and to arrange all hotpluggable memory in the ACPI SRAT (System Resource Affinity Table) as ZONE_MOVABLE when initializing zones. In the previous patches, we marked hotpluggable memory regions with the MEMBLOCK_HOTPLUG flag in memblock.memory. In this patch, we make memblock skip these hotpluggable memory regions in the default top-down allocation function if the movable_node boot option is specified. [[email protected]: coding-style fixes] Signed-off-by: Tang Chen <[email protected]> Signed-off-by: Zhang Yanfei <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: "Rafael J . Wysocki" <[email protected]> Cc: Chen Tang <[email protected]> Cc: Gong Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Larry Woodman <[email protected]> Cc: Len Brown <[email protected]> Cc: Liu Jiang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Taku Izumi <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Renninger <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vasilis Liaskovitis <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Yinghai Lu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
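Skipping hotpluggable regions during a top-down search can be sketched with a flat array of regions. This is a hypothetical model: region_model, its hotplug flag (standing in for MEMBLOCK_HOTPLUG) and alloc_top_down() are invented, there is no alignment or reserved-region handling, and 0 is used as the failure value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct region_model {
	uint64_t base;
	uint64_t size;
	bool hotplug;        /* stands in for the MEMBLOCK_HOTPLUG flag */
};

/* Top-down search: return the highest base at which size bytes fit inside
 * one region, skipping hotpluggable regions when movable_node is set.
 * Regions are assumed sorted by base and non-overlapping. Returns 0 on
 * failure (a convention of this sketch). */
static uint64_t alloc_top_down(const struct region_model *r, size_t n,
			       uint64_t size, bool movable_node)
{
	assert(size > 0);
	for (size_t i = n; i-- > 0;) {
		if (movable_node && r[i].hotplug)
			continue;            /* keep hotpluggable memory free */
		if (r[i].size >= size)
			return r[i].base + r[i].size - size;
	}
	return 0;
}
```

With movable_node set, the allocation falls back to the highest non-hotpluggable region, so the hotpluggable range stays untouched and can later become ZONE_MOVABLE.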
2013-11-13 | mem-hotplug: introduce movable_node boot option | Tang Chen | 1 | -0/+31
The Hot Pluggable field in SRAT specifies which memory is hotpluggable. As we mentioned before, if hotpluggable memory is used by the kernel, it cannot be hot-removed. So memory hotplug users may want to put all hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it. Memory hotplug users may also set a node as a movable node, which has only ZONE_MOVABLE, so that the whole node can be hot-removed. But the kernel cannot use memory in ZONE_MOVABLE, so by doing this, the kernel cannot use memory in movable nodes. This will degrade NUMA performance, and other users may be unhappy. So we need a way to allow users to enable and disable this functionality. In this patch, we introduce the movable_node boot option to allow users to choose not to consume hotpluggable memory at early boot time, so that it can later be set as ZONE_MOVABLE. To achieve this, the movable_node boot option will control the memblock allocation direction. That is, after memblock is ready and before SRAT is parsed, we should allocate memory near the kernel image, as we explained in the previous patches. So if the movable_node boot option is set, the kernel does the following:
1. After memblock is ready, make memblock allocate memory bottom up.
2. After SRAT is parsed, make memblock behave as default, allocating memory top down.
Users can specify "movable_node" on the kernel command line to enable this functionality. Those who don't use memory hotplug, or who don't want to lose NUMA performance, can simply leave it unspecified, and the kernel will work as before. Signed-off-by: Tang Chen <[email protected]> Signed-off-by: Zhang Yanfei <[email protected]> Suggested-by: Kamezawa Hiroyuki <[email protected]> Suggested-by: Ingo Molnar <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: Toshi Kani <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Thomas Renninger <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Taku Izumi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
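The two-step policy can be modelled as a direction choice plus an allocator that honours it. A sketch under stated assumptions: memblock_direction(), alloc_in_range() and the single free range are hypothetical, the kernel image is assumed to end at kernel_end, and 0 signals failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum alloc_dir { ALLOC_TOP_DOWN, ALLOC_BOTTOM_UP };

/* With movable_node, allocate bottom up until SRAT has been parsed, then
 * return to the default top-down mode. Without it, always top down. */
static enum alloc_dir memblock_direction(bool movable_node, bool srat_parsed)
{
	return (movable_node && !srat_parsed) ? ALLOC_BOTTOM_UP : ALLOC_TOP_DOWN;
}

/* Allocate size bytes from the free range [lo, hi). Bottom up starts just
 * above the kernel image; top down takes the highest fitting address. */
static uint64_t alloc_in_range(uint64_t lo, uint64_t hi, uint64_t kernel_end,
			       uint64_t size, enum alloc_dir dir)
{
	assert(lo <= hi);
	if (dir == ALLOC_BOTTOM_UP) {
		uint64_t start = lo > kernel_end ? lo : kernel_end;
		return (start <= hi && hi - start >= size) ? start : 0;
	}
	return (hi - lo >= size) ? hi - size : 0;
}
```

Early allocations thus stay close to the kernel image, which is assumed to sit in non-hotpluggable memory, before the SRAT information needed to avoid hotpluggable ranges is available.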
2013-11-13 | mm/sparsemem: use PAGES_PER_SECTION to remove redundant nr_pages parameter | Zhang Yanfei | 1 | -2/+1
The functions below:
- sparse_add_one_section()
- kmalloc_section_memmap()
- __kmalloc_section_memmap()
- __kfree_section_memmap()
are always invoked to operate on one memory section, so it is redundant to always pass an nr_pages parameter, which is the number of pages in one section. So we can directly use the predefined macro PAGES_PER_SECTION instead of passing the parameter. Signed-off-by: Zhang Yanfei <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Tang Chen <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Yasunori Goto <[email protected]> Cc: Andy Whitcroft <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
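PAGES_PER_SECTION is derived from the section size and the page size. A sketch assuming x86_64 values of that era (SECTION_SIZE_BITS 27, i.e. 128 MiB sections, and 4 KiB pages), which gives 32768 pages per section, consistent with the "Offlined Pages 32768" line in the 2014-08-06 entry. section_memmap_bytes() is a hypothetical helper, and the 64-byte struct page used to try it is also an assumption.

```c
#include <assert.h>

/* Assumed x86_64 values: 128 MiB sections, 4 KiB pages. */
#define SECTION_SIZE_BITS 27
#define PAGE_SHIFT        12
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)

/* Bytes of memmap (the struct page array) needed for one section. Like the
 * functions listed above, it no longer takes an nr_pages argument. */
static unsigned long section_memmap_bytes(unsigned long struct_page_size)
{
	assert(struct_page_size > 0);
	return PAGES_PER_SECTION * struct_page_size;
}
```

With a 64-byte struct page, one section's memmap works out to 32768 * 64 = 2 MiB.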
2013-11-13 | cpu/mem hotplug: add try_online_node() for cpu_up() | Toshi Kani | 1 | -2/+14
cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call mem_online_node() to put its node online if offlined and then call build_all_zonelists() to initialize the zone list. These steps are specific to memory hotplug, and should be managed in mm/memory_hotplug.c. lock_memory_hotplug() should also be held across all of these steps. For this reason, this patch replaces mem_online_node() with try_online_node(), which performs all of the steps with lock_memory_hotplug() held. try_online_node() is named after try_offline_node() as they have similar purposes. There is no functional change in this patch. Signed-off-by: Toshi Kani <[email protected]> Reviewed-by: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13 | mm/memory_hotplug.c: use pfn_to_nid() instead of page_to_nid(pfn_to_page()) | Xishi Qiu | 1 | -1/+1
Use "pfn_to_nid(pfn)" instead of "page_to_nid(pfn_to_page(pfn))". Signed-off-by: Xishi Qiu <[email protected]> Acked-by: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-13 | mm/memory_hotplug.c: rename the function is_memblock_offlined_cb() | Xishi Qiu | 1 | -2/+2
is_memblock_offlined() returning 1 means the memory block is offlined, but is_memblock_offlined_cb() returning 1 means the memory block is not offlined. This will confuse readers, so rename the function. Signed-off-by: Xishi Qiu <[email protected]> Acked-by: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
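The confusion comes from the usual walker convention, where a nonzero callback return means "stop here". A hypothetical sketch of that convention: walk_blocks(), check_block_offlined_cb() and all_blocks_offlined() are invented names, and each block's state is reduced to a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*block_cb)(size_t idx, void *arg);

/* Walker: stops at the first nonzero callback return and passes it on. */
static int walk_blocks(size_t nr_blocks, block_cb cb, void *arg)
{
	for (size_t i = 0; i < nr_blocks; i++) {
		int ret = cb(i, arg);
		if (ret)
			return ret;
	}
	return 0;
}

/* Returns 1 (stop the walk) when block idx is still online, i.e. NOT
 * offlined, which is why an is_*_offlined name reads backwards. */
static int check_block_offlined_cb(size_t idx, void *arg)
{
	const bool *online = arg;

	assert(online != NULL);
	return online[idx] ? 1 : 0;
}

/* All blocks are offlined exactly when the walk completes with 0. */
static bool all_blocks_offlined(const bool *online, size_t n)
{
	return walk_blocks(n, check_block_offlined_cb, (void *)online) == 0;
}
```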
2013-11-13 | mm: use pgdat_end_pfn() to simplify the code in others | Xishi Qiu | 1 | -5/+4
Use "pgdat_end_pfn()" instead of "pgdat->node_start_pfn + pgdat->node_spanned_pages". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-09-12 | Merge tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm | Linus Torvalds | 1 | -4/+0
Pull ACPI and power management fixes from Rafael Wysocki: "All of these commits are fixes that have emerged recently and some of them fix bugs introduced during this merge window. Specifics:

1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events
   After the recent ACPIPHP changes we've seen some interesting breakage on a system that triggers device check notifications during boot for non-existing devices. Although those notifications are really spurious, we should be able to deal with them nevertheless and that shouldn't introduce too much overhead. Four commits to make that work properly.

2) Memory hotplug and hibernation mutual exclusion rework
   This was meant to be a cleanup, but it happens to fix a classical ABBA deadlock between system suspend/hibernation and ACPI memory hotplug which is possible if they are started roughly at the same time. Three commits rework memory hotplug so that it doesn't acquire pm_mutex and make hibernation use device_hotplug_lock which prevents it from racing with memory hotplug.

3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix
   The ACPI LPSS driver crashes during boot on Apple Macbook Air with Haswell that has slightly unusual BIOS configuration in which one of the LPSS device's _CRS method doesn't return all of the information expected by the driver. Fix from Mika Westerberg, for stable.

4) ACPICA fix related to Store->ArgX operation
   AML interpreter fix for obscure breakage that causes AML to be executed incorrectly on some machines (observed in practice). From Bob Moore.

5) ACPI core fix for PCI ACPI device objects lookup
   There still are cases in which there is more than one ACPI device object matching a given PCI device and we don't choose the one that the BIOS expects us to choose, so this makes the lookup take more criteria into account in those cases.

6) Fix to prevent cpuidle from crashing in some rare cases
   If the result of cpuidle_get_driver() is NULL, which can happen on some systems, cpuidle_driver_ref() will crash trying to use that pointer, and Daniel Fu's fix prevents that from happening.

7) cpufreq fixes related to CPU hotplug
   Stephen Boyd reported a number of concurrency problems with cpufreq related to CPU hotplug which are addressed by a series of fixes from Srivatsa S Bhat and Viresh Kumar.

8) cpufreq fix for time conversion in time_in_state attribute
   Time conversion carried out by cpufreq when user space attempts to read /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state won't work correctly if cputime_t doesn't map directly to jiffies. Fix from Andreas Schwab.

9) Revert of a troublesome cpufreq commit
   Commit 7c30ed5 (cpufreq: make sure frequency transitions are serialized) was intended to address some known concurrency problems in cpufreq related to the ordering of transitions, but unfortunately it introduced several problems of its own, so I decided to revert it now and address the original problems later in a more robust way.

10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle.

11) cpufreq fixes related to system suspend/resume
   The recent cpufreq changes that made it preserve CPU sysfs attributes over suspend/resume cycles introduced a possible NULL pointer dereference that caused it to crash during the second attempt to suspend. Three commits from Srivatsa S Bhat fix that problem and a couple of related issues.

12) cpufreq locking fix
   cpufreq_policy_restore() should acquire the lock for reading, but it acquires it for writing. Fix from Lan Tianyu"

* tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
  cpufreq: Acquire the lock in cpufreq_policy_restore() for reading
  cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu
  cpufreq: Restructure if/else block to avoid unintended behavior
  cpufreq: Fix crash in cpufreq-stats during suspend/resume
  intel_pstate: Add Haswell CPU models
  Revert "cpufreq: make sure frequency transitions are serialized"
  cpufreq: Use signed type for 'ret' variable, to store negative error values
  cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
  cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
  cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
  cpufreq: Split __cpufreq_remove_dev() into two parts
  cpufreq: Fix wrong time unit conversion
  cpufreq: serialize calls to __cpufreq_governor()
  cpufreq: don't allow governor limits to be changed when it is disabled
  ACPI / bind: Prefer device objects with _STA to those without it
  ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks
  ACPI / hotplug / PCI: Use _OST to notify firmware about notify status
  ACPI / hotplug / PCI: Avoid doing too much for spurious notifies
  ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field.
  ACPI / hotplug / PCI: Don't trim devices before scanning the namespace
  ...
2013-09-11 | mm: memory-hotplug: enable memory hotplug to handle hugepage | Naoya Horiguchi | 1 | -7/+35
Until now we could not offline memory blocks containing hugepages, because a hugepage was considered an unmovable page. With this patch series a hugepage becomes movable, so by using hugepage migration we can offline such memory blocks. What differs from other users of hugepage migration is that we need to decompose all the hugepages inside the target memory block into free buddy pages after hugepage migration, because otherwise free hugepages remaining in the memory block interfere with memory offlining. For this reason we introduce the new functions dissolve_free_huge_page() and dissolve_free_huge_pages(). Beyond that, this patch simply adds hugepage migration code: it adds hugepage handling to the functions which scan over pfns and collect hugepages to be migrated, and adds a hugepage allocation function to alloc_migrate_target(). Larger hugepages (1GB on x86_64) are hard to hot-remove because they are larger than a memory block, so for now we simply let that case fail. [[email protected]: remove duplicated include] Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: Andi Kleen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Rik van Riel <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Signed-off-by: Wei Yongjun <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
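A small userspace model of the dissolve step described above follows. It is a sketch under stated assumptions, not the kernel's dissolve_free_huge_pages(): struct page_model, the page_state values, HPAGE_ORDER (2 MiB hugepages built from 4 KiB pages) and dissolve_free_hugepages_model() are all hypothetical, and only the first page of each hugepage-sized chunk is inspected.

```c
#include <stdbool.h>

/* Hypothetical model: 2 MiB hugepages made of 4 KiB pages (order 9). */
#define HPAGE_ORDER 9
#define HPAGE_NR (1UL << HPAGE_ORDER)

enum page_state { PAGE_BUDDY_FREE, PAGE_IN_USE, PAGE_HUGE_FREE, PAGE_HUGE_IN_USE };

struct page_model {
	enum page_state state; /* only meaningful on the head page of a hugepage */
};

/*
 * Walk [start_pfn, end_pfn) in hugepage-sized steps and turn every free
 * hugepage back into free buddy pages. Returns false if an in-use hugepage
 * is still present, i.e. migration did not empty the block and offlining
 * would have to fail. Assumes the range is hugepage aligned.
 */
static bool dissolve_free_hugepages_model(struct page_model *pages,
					  unsigned long start_pfn,
					  unsigned long end_pfn)
{
	for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn += HPAGE_NR) {
		if (pages[pfn].state == PAGE_HUGE_IN_USE)
			return false;
		if (pages[pfn].state != PAGE_HUGE_FREE)
			continue;
		/* Split the free hugepage into order-0 buddy pages. */
		for (unsigned long i = 0; i < HPAGE_NR; i++)
			pages[pfn + i].state = PAGE_BUDDY_FREE;
	}
	return true;
}
```

In the real flow, migration runs first, so an in-use hugepage left at this point means offlining cannot proceed.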
2013-09-11 | mm/hotplug: remove stop_machine() from try_offline_node() | Toshi Kani | 1 | -9/+22
lock_device_hotplug() serializes hotplug and online/offline operations. The lock is held in the common sysfs online/offline interfaces and in the ACPI hotplug code paths, which are: - CPU & Mem online/offline via sysfs online: store_online()->lock_device_hotplug() - Mem online via sysfs state: store_mem_state()->lock_device_hotplug() - ACPI CPU & Mem hot-add: acpi_scan_bus_device_check()->lock_device_hotplug() - ACPI CPU & Mem hot-delete: acpi_scan_hot_remove()->lock_device_hotplug() try_offline_node() off-lines a node if all memory sections and cpus on the node have been removed. It is called from the acpi_processor_remove() and acpi_memory_remove_memory()->remove_memory() paths, both of which are in the ACPI hotplug code. try_offline_node() calls stop_machine() to stop all cpus while checking cpu status, on the assumption that the caller is not protected from CPU hotplug or CPU online/offline operations. However, the caller is always serialized with lock_device_hotplug(). Also, the code needs to be properly serialized with a lock, not by stopping all cpus at a random place with stop_machine(). This patch removes the use of stop_machine() in try_offline_node() and adds comments to try_offline_node() and remove_memory() stating that lock_device_hotplug() is required. Signed-off-by: Toshi Kani <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Tang Chen <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Wanpeng Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
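Once callers are serialized by a lock, the check try_offline_node() performs can be modelled as a plain scan. In this sketch cpu_node[], device_hotplug_held and node_can_go_offline() are hypothetical stand-ins: the flag only records that the caller claims to hold lock_device_hotplug(), and nothing here stops or synchronizes real CPUs.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 8

/* Hypothetical topology: node of each present CPU, or -1 if not present. */
static int cpu_node[NR_CPUS_MODEL] = { 0, 0, 1, 1, -1, -1, -1, -1 };

/* Stand-in for lock_device_hotplug(): only records that it is held. */
static bool device_hotplug_held;

/*
 * A node may go offline only when it has no memory sections left and no
 * present CPU still maps to it. Because every hotplug path holds the same
 * lock, the scan cannot race with CPU or memory hotplug, so no CPUs need
 * to be stopped while it runs.
 */
static bool node_can_go_offline(int nid, unsigned long present_sections)
{
	if (!device_hotplug_held)
		return false; /* model choice: refuse if the caller skipped the lock */
	if (present_sections)
		return false;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (cpu_node[cpu] == nid)
			return false;
	return true;
}
```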
2013-09-11 | mm/hotplug: verify hotplug memory range | Toshi Kani | 1 | -0/+23
add_memory() and remove_memory() can only handle a memory range aligned to a section. There are problems when an unaligned range is added and then deleted: - add_memory() with an unaligned range succeeds, but __add_pages() called from add_memory() adds a whole section of pages even though the given memory range is smaller than the section size. - remove_memory() on the added unaligned range hits BUG_ON() in __remove_pages(). This patch changes add_memory() and remove_memory() to check at the beginning whether the given memory range is aligned to a section. As a result, add_memory() fails with -EINVAL when the given range is unaligned, and does not add such a memory range. This also prevents remove_memory() from being called with an unaligned range. Note that remove_memory() has to use BUG_ON() since this function cannot fail. [[email protected]: avoid printk warnings] Signed-off-by: Toshi Kani <[email protected]> Acked-by: KOSAKI Motohiro <[email protected]> Reviewed-by: Tang Chen <[email protected]> Reviewed-by: Wanpeng Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
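The alignment rule above can be sketched as a userspace function. The constants assume x86_64-like 4 KiB pages and 128 MiB sections, and check_hotplug_range_model() is a hypothetical name. It returns -EINVAL the way add_memory() is described as doing, whereas remove_memory() would BUG_ON() instead.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values: 4 KiB pages and 128 MiB sections. */
#define PAGE_SHIFT_MODEL 12
#define SECTION_SIZE_BITS_MODEL 27
#define PAGES_PER_SECTION_MODEL (1UL << (SECTION_SIZE_BITS_MODEL - PAGE_SHIFT_MODEL))

/*
 * Return 0 if [start, start + size) is non-empty and both ends fall on a
 * section boundary, -EINVAL otherwise.
 */
static int check_hotplug_range_model(uint64_t start, uint64_t size)
{
	uint64_t block_sz = (uint64_t)PAGES_PER_SECTION_MODEL << PAGE_SHIFT_MODEL;

	/* block_sz is a power of two, so masking tests alignment. */
	if (size == 0 || (start & (block_sz - 1)) || (size & (block_sz - 1)))
		return -EINVAL;
	return 0;
}
```

For example, a 1 MiB range starting at 0 is rejected, while one full 128 MiB section starting at 0 is accepted.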
2013-09-11 | mm: use zone_is_initialized() instead of if(zone->wait_table) | Xishi Qiu | 1 | -1/+1
Use "zone_is_initialized()" instead of "if (zone->wait_table)". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <[email protected]> Cc: Cody P Schafer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-09-11 | mm: use zone_is_empty() instead of if(zone->spanned_pages) | Xishi Qiu | 1 | -3/+3
Use "zone_is_empty()" instead of "if (zone->spanned_pages)". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <[email protected]> Cc: Cody P Schafer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-09-11 | mm: use zone_end_pfn() instead of zone_start_pfn+spanned_pages | Xishi Qiu | 1 | -3/+4
Use "zone_end_pfn()" instead of "zone->zone_start_pfn + zone->spanned_pages". Simplify the code, no functional change. [[email protected]: fix build] Signed-off-by: Xishi Qiu <[email protected]> Cc: Cody P Schafer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
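For reference, the three helpers named in the entries above amount to small inline accessors. The sketch below uses a hypothetical struct zone_model that carries only the fields involved. The bodies follow the substitutions the commit messages describe (a non-NULL wait_table, zero spanned_pages, zone_start_pfn + spanned_pages) rather than copying mmzone.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trimmed-down zone: only the fields the helpers read. */
struct zone_model {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
	void *wait_table; /* non-NULL once the zone has been initialized */
};

/* First pfn past the end of the zone. */
static inline unsigned long zone_end_pfn_model(const struct zone_model *z)
{
	return z->zone_start_pfn + z->spanned_pages;
}

/* Replaces open-coded "if (zone->wait_table)". */
static inline bool zone_is_initialized_model(const struct zone_model *z)
{
	return z->wait_table != NULL;
}

/* True when the zone spans no pages; "!zone_is_empty()" replaces
 * open-coded "if (zone->spanned_pages)". */
static inline bool zone_is_empty_model(const struct zone_model *z)
{
	return z->spanned_pages == 0;
}
```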
2013-09-11 | mm/hotplug: remove unnecessary BUG_ON in __offline_pages() | Xishi Qiu | 1 | -1/+0
I think we can remove "BUG_ON(start_pfn >= end_pfn)" in __offline_pages(), because in memory_block_action() "nr_pages = PAGES_PER_SECTION * sections_per_block" is always greater than 0: memory_block_action() -> offline_pages() -> __offline_pages() -> BUG_ON(start_pfn >= end_pfn). In v2.6.32, if info->length == 0, this path could hit the BUG_ON(): acpi_memory_disable_device() -> remove_memory(info->start_addr, info->length) -> offline_pages(). A later Fujitsu patch renamed this function, and the BUG_ON() is now unnecessary. Signed-off-by: Xishi Qiu <[email protected]> Reviewed-by: Dave Hansen <[email protected]> Cc: Toshi Kani <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-08-31 | PM / hibernate / memory hotplug: Rework mutual exclusion | Rafael J. Wysocki | 1 | -4/+0
Since all of the memory hotplug operations have to be carried out under device_hotplug_lock, they won't need to acquire pm_mutex if device_hotplug_lock is held around hibernation. For this reason, make the hibernation code acquire device_hotplug_lock after freezing user space processes and release it before thawing them. At the same time, drop the lock_system_sleep() and unlock_system_sleep() calls from lock_memory_hotplug() and unlock_memory_hotplug(), respectively. Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Toshi Kani <[email protected]>
2013-07-09 | mm/memory_hotplug.c: fix return value of online_pages() | Toshi Kani | 1 | -3/+3
online_pages() is called from memory_block_action() when a user requests to online a memory block via sysfs. This function needs to return a proper error value in case of error. Signed-off-by: Toshi Kani <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Tang Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-09 | mm/memory_hotplug.c: fix a comment typo in register_page_bootmem_info_node() | Tang Chen | 1 | -2/+2
Signed-off-by: Tang Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03 | Merge branch 'akpm' (updates from Andrew Morton) | Linus Torvalds | 1 | -32/+16
Merge first patch-bomb from Andrew Morton: - various misc bits - I've been patchmonkeying ocfs2 for a while, as Joel and Mark have been distracted. There has been quite a bit of activity. - About half the MM queue - Some backlight bits - Various lib/ updates - checkpatch updates - zillions more little rtc patches - ptrace - signals - exec - procfs - rapidio - nbd - aoe - pps - memstick - tools/testing/selftests updates * emailed patches from Andrew Morton <[email protected]>: (445 commits) tools/testing/selftests: don't assume the x bit is set on scripts selftests: add .gitignore for kcmp selftests: fix clean target in kcmp Makefile selftests: add .gitignore for vm selftests: add hugetlbfstest self-test: fix make clean selftests: exit 1 on failure kernel/resource.c: remove the unneeded assignment in function __find_resource aio: fix wrong comment in aio_complete() drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode drivers/memstick/host/r592.c: convert to module_pci_driver drivers/memstick/host/jmb38x_ms: convert to module_pci_driver pps-gpio: add device-tree binding and support drivers/pps/clients/pps-gpio.c: convert to module_platform_driver drivers/pps/clients/pps-gpio.c: convert to devm_* helpers drivers/parport/share.c: use kzalloc Documentation/accounting/getdelays.c: avoid strncpy in accounting tool aoe: update internal version number to v83 aoe: update copyright date aoe: perform I/O completions in parallel ...
2013-07-03 | mm/hotplug: prepare for removing num_physpages | Jiang Liu | 1 | -4/+0
Prepare for removing num_physpages. Signed-off-by: Jiang Liu <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Tang Chen <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03 | mm: correctly update zone->managed_pages | Jiang Liu | 1 | -13/+3
Enhance adjust_managed_page_count() to adjust totalhigh_pages for highmem pages, and change code which directly adjusts totalram_pages to use adjust_managed_page_count(), because it adjusts totalram_pages, totalhigh_pages and zone->managed_pages together in a safe way. Remove inc_totalhigh_pages() and dec_totalhigh_pages() from the xen/balloon driver because adjust_managed_page_count() already adjusts totalhigh_pages. This patch also fixes two bugs: 1) the virtio_balloon driver now adjusts totalhigh_pages when reserving/unreserving pages; 2) memory_hotplug.c now adjusts totalhigh_pages when hot-removing memory. We still need to deal with modifications of totalram_pages in arch/powerpc/platforms/pseries/cmm.c, but need help from PPC experts. [[email protected]: remove ifdef, per Wanpeng Li, virtio_balloon.c cleanup, per Sergei] [[email protected]: export adjust_managed_page_count() to modules, for drivers/virtio/virtio_balloon.c] Signed-off-by: Jiang Liu <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Rusty Russell <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Tang Chen <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jianguo Wu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Russell King <[email protected]> Cc: Sergei Shtylyov <[email protected]> Cc: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
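The point of routing every update through adjust_managed_page_count() is that one call keeps all three counters consistent. Below is a single-threaded sketch with hypothetical names (the _model suffix) and a plain bool standing in for the highmem test. A concurrent version would need a lock around the three updates, which this model omits.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for totalram_pages and totalhigh_pages. */
static long totalram_pages_model;
static long totalhigh_pages_model;

struct zone_counts_model {
	long managed_pages;
};

/*
 * Apply one delta to every counter that tracks the page, so a caller
 * cannot update totalram_pages and forget the zone or highmem counter.
 * Negative count = pages taken away (e.g. hot-removed or reserved).
 */
static void adjust_managed_page_count_model(struct zone_counts_model *zone,
					    bool highmem, long count)
{
	zone->managed_pages += count;
	totalram_pages_model += count;
	if (highmem)
		totalhigh_pages_model += count;
}
```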
2013-07-03 | mm: make __free_pages_bootmem() only available at boot time | Jiang Liu | 1 | -14/+2
In order to simplify management of totalram_pages and zone->managed_pages, make __free_pages_bootmem() only available at boot time. With this change applied, __free_pages_bootmem() will only be used by bootmem.c and nobootmem.c at boot time, so mark it as __init. Other callers of __free_pages_bootmem() have been converted to use free_reserved_page(), which handles totalram_pages and zone->managed_pages in a safer way. This patch also fixes a bug in free_pagetable() for x86_64, which should increase zone->managed_pages instead of zone->present_pages when freeing reserved pages. Now that managed_pages_count_lock protects totalram_pages and zone->managed_pages, remove the redundant ppb_lock lock in put_page_bootmem(). This greatly simplifies the locking rules. Signed-off-by: Jiang Liu <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Tang Chen <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Jianguo Wu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Will Deacon <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03 | mm: fix some trivial typos in comments | Jiang Liu | 1 | -1/+1
Fix some trivial typos in comments. Signed-off-by: Jiang Liu <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Tang Chen <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Jianguo Wu <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kamezawa Hiroyuki <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03 | mm/memory_hotplug.c: change normal message to use pr_debug | Toshi Kani | 1 | -1/+1
During early boot-up, iomem_resource is set up from the boot descriptor table, such as EFI Memory Table and e820. Later, acpi_memory_device_add() calls add_memory() for each ACPI memory device object as it enumerates ACPI namespace. This add_memory() call is expected to fail in register_memory_resource() at boot since iomem_resource has been set up from EFI/e820. As a result, add_memory() returns -EEXIST, which acpi_memory_device_add() handles as the normal case. This scheme works fine, but the following error message is logged for every ACPI memory device object during boot-up. "System RAM resource %pR cannot be added\n" This patch changes register_memory_resource() to use pr_debug() for the message as it shows up under the normal case. Signed-off-by: Toshi Kani <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03 | memory_hotplug: use pgdat_resize_lock() in __offline_pages() | Cody P Schafer | 1 | -0/+5
mmzone.h documents node_size_lock (which pgdat_resize_lock() locks) as follows: * Must be held any time you expect node_start_pfn, node_present_pages * or node_spanned_pages stay constant. [...] So actually hold it when we update node_present_pages in __offline_pages(). [[email protected]: fix build] Signed-off-by: Cody P Schafer <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03 | memory_hotplug: use pgdat_resize_lock() in online_pages() | Cody P Schafer | 1 | -0/+5
mmzone.h documents node_size_lock (which pgdat_resize_lock() locks) as follows: * Must be held any time you expect node_start_pfn, node_present_pages * or node_spanned_pages stay constant. [...] So actually hold it when we update node_present_pages in online_pages(). Signed-off-by: Cody P Schafer <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
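Both commits apply the same pattern: hold the node's resize lock while node_present_pages changes. The sketch below models it in userspace with a pthread mutex in place of the spinlock behind pgdat_resize_lock(). struct pgdat_model and update_present_pages_model() are hypothetical.

```c
#include <pthread.h>

/* Hypothetical node: the lock guards the size fields, per mmzone.h's rule. */
struct pgdat_model {
	pthread_mutex_t node_size_lock;
	unsigned long node_present_pages;
};

/*
 * Onlining passes a positive delta, offlining a negative one; either way
 * the update happens only while the resize lock is held, so readers that
 * also take the lock see a stable value.
 */
static void update_present_pages_model(struct pgdat_model *pgdat, long delta)
{
	pthread_mutex_lock(&pgdat->node_size_lock);
	pgdat->node_present_pages += delta; /* unsigned wraparound gives the right result */
	pthread_mutex_unlock(&pgdat->node_size_lock);
}
```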
2013-06-28 | Merge branch 'acpi-hotplug' | Rafael J. Wysocki | 1 | -70/+11
* acpi-hotplug: ACPI: Do not use CONFIG_ACPI_HOTPLUG_MEMORY_MODULE ACPI / cpufreq: Add ACPI processor device IDs to acpi-cpufreq Memory hotplug: Move alternative function definitions to header ACPI / processor: Fix potential NULL pointer dereference in acpi_processor_add() Memory hotplug / ACPI: Simplify memory removal ACPI / scan: Add second pass of companion offlining to hot-remove code Driver core / MM: Drop offline_memory_block() ACPI / processor: Pass processor object handle to acpi_bind_one() ACPI: Drop removal_type field from struct acpi_device Driver core / memory: Simplify __memory_block_change_state() ACPI / processor: Initialize per_cpu(processors, pr->id) properly CPU: Fix sysfs cpu/online of offlined CPUs Driver core: Introduce offline/online callbacks for memory blocks ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes ACPI / processor: Use common hotplug infrastructure ACPI / hotplug: Use device offline/online for graceful hot-removal Driver core: Use generic offline/online for CPU offline/online Driver core: Add offline/online device operations
2013-06-01 | Memory hotplug: Move alternative function definitions to header | Rafael J. Wysocki | 1 | -7/+1
Move the definitions of offline_pages() and remove_memory() for CONFIG_MEMORY_HOTREMOVE to memory_hotplug.h, where they belong, and make them static inline. Signed-off-by: Rafael J. Wysocki <[email protected]>
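The header pattern described above (real declarations when the option is enabled, static inline fallbacks when it is not) looks roughly like this. CONFIG_DEMO_HOTREMOVE, offline_pages_demo() and remove_memory_demo() are hypothetical names. The -EINVAL returned by the stub is an assumption, not a quote from memory_hotplug.h.

```c
#include <errno.h>

/* Hypothetical stand-in for CONFIG_MEMORY_HOTREMOVE; left undefined here. */
#ifdef CONFIG_DEMO_HOTREMOVE
int offline_pages_demo(unsigned long start_pfn, unsigned long nr_pages);
void remove_memory_demo(unsigned long long start, unsigned long long size);
#else
/*
 * Header-only fallbacks: callers compile unchanged and get an error (or a
 * no-op) instead of a link failure, and no out-of-line copy is emitted.
 */
static inline int offline_pages_demo(unsigned long start_pfn,
				     unsigned long nr_pages)
{
	(void)start_pfn;
	(void)nr_pages;
	return -EINVAL;
}

static inline void remove_memory_demo(unsigned long long start,
				      unsigned long long size)
{
	(void)start;
	(void)size;
}
#endif
```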
2013-06-01 | Memory hotplug / ACPI: Simplify memory removal | Rafael J. Wysocki | 1 | -63/+8
Now that the memory offlining should be taken care of by the companion device offlining code in acpi_scan_hot_remove(), the ACPI memory hotplug driver doesn't need to offline it in remove_memory() any more. Moreover, since the return value of remove_memory() is not used, it's better to make it a void function and trigger a BUG() if the memory scheduled for removal is not offline. Change the code in accordance with the above observations. Signed-off-by: Rafael J. Wysocki <[email protected]> Reviewed-by: Toshi Kani <[email protected]>
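A void remove_memory() implies that the caller guarantees the range is already offline, and that a violation is treated as a bug rather than reported as an error code. In this sketch, struct block_model, BUG_ON_MODEL() (which calls abort()) and remove_memory_model() are hypothetical. The whole range is checked before anything is marked removed.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-block state for the model. */
struct block_model {
	bool online;
	bool present;
};

/* Stand-in for BUG_ON(): an online block here is a caller bug. */
#define BUG_ON_MODEL(cond) do { if (cond) abort(); } while (0)

/*
 * Nothing to return: offlining already happened in the caller, so the
 * only possible failure is a broken invariant, which aborts.
 */
static void remove_memory_model(struct block_model *blocks, size_t first,
				size_t count)
{
	for (size_t i = first; i < first + count; i++)
		BUG_ON_MODEL(blocks[i].online);
	for (size_t i = first; i < first + count; i++)
		blocks[i].present = false;
}
```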
2013-06-01 | Driver core / MM: Drop offline_memory_block() | Rafael J. Wysocki | 1 | -1/+1
Since offline_memory_block(mem) is functionally equivalent to device_offline(&mem->dev), make the only caller of the former use the latter instead and drop offline_memory_block() entirely. Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Toshi Kani <[email protected]>
2013-05-24 | mm/memory_hotplug.c: fix printk format warnings | Randy Dunlap | 1 | -3/+6
Fix printk format warnings in mm/memory_hotplug.c by using "%pa": mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'resource_size_t' [-Wformat] mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'resource_size_t' [-Wformat] Signed-off-by: Randy Dunlap <[email protected]> Reported-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-12 | ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes | Rafael J. Wysocki | 1 | -1/+3
During ACPI memory hotplug configuration, bind memory blocks residing in modules removable through the standard ACPI mechanism to struct acpi_device objects associated with ACPI namespace objects representing those modules. Accordingly, unbind those memory blocks from the struct acpi_device objects when the memory modules in question are being removed. Once an "offline" operation for devices representing memory blocks is introduced, this will allow the ACPI core's device hot-remove code to use it to carry out remove_memory() for those memory blocks and check the results of that before it actually removes the modules holding them from the system. Since walk_memory_range() is used for accessing all memory blocks corresponding to a given ACPI namespace object, it is exported from memory_hotplug.c so that the code in acpi_memhotplug.c can use it. Signed-off-by: Rafael J. Wysocki <[email protected]> Tested-by: Vasilis Liaskovitis <[email protected]> Reviewed-by: Toshi Kani <[email protected]>
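walk_memory_range() uses a callback style: it visits each memory block in a pfn range and stops at the first non-zero return. The model below imitates that shape over a plain array. struct memory_block_model, PAGES_PER_BLOCK_MODEL (128 MiB of 4 KiB pages) and walk_memory_range_model() are assumptions, not the exported kernel interface.

```c
#include <stddef.h>

#define PAGES_PER_BLOCK_MODEL 32768UL /* assumed: 128 MiB of 4 KiB pages */

struct memory_block_model {
	unsigned long start_pfn;
	int bound; /* set by the callback in the test below */
};

/*
 * Call func once for every block overlapping [start_pfn, end_pfn); stop
 * early and return func's value if it is non-zero, else return 0.
 */
static int walk_memory_range_model(struct memory_block_model *blocks,
				   size_t nr_blocks, unsigned long start_pfn,
				   unsigned long end_pfn, void *arg,
				   int (*func)(struct memory_block_model *, void *))
{
	for (size_t i = 0; i < nr_blocks; i++) {
		unsigned long b_start = blocks[i].start_pfn;
		unsigned long b_end = b_start + PAGES_PER_BLOCK_MODEL; /* exclusive */

		if (b_start >= end_pfn || b_end <= start_pfn)
			continue; /* no overlap with the requested range */
		int ret = func(&blocks[i], arg);
		if (ret)
			return ret;
	}
	return 0;
}
```

A binding callback in the spirit of the ACPI code would record the owning device in each visited block and return 0 to keep walking.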
2013-04-29 | mm: fix memory_hotplug.c printk format warning | Randy Dunlap | 1 | -4/+8
PFN_PHYS() is a phys_addr_t, which can be u32 or u64. Fix the build warning when phys_addr_t is u32. mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'unsigned int' [-Wformat]: => 1685:3 mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'unsigned int' [-Wformat]: => 1685:3 Signed-off-by: Randy Dunlap <[email protected]> Reported-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
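The warnings above come from passing a type whose width depends on the configuration to a fixed-width format. In userspace C, a common portable fix is to widen the value explicitly before formatting. The sketch below does this with a hypothetical phys_addr_t_model typedef. This is an alternative technique, not the %pa specifier that the kernel's printk provides.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical: 32 bits here, as on a 32-bit build without PAE. Switching
 * it to uint64_t still compiles without a -Wformat warning below.
 */
typedef uint32_t phys_addr_t_model;

/* Widen both ends to unsigned long long so %llx is correct for any width. */
static int format_range_model(char *buf, size_t len, phys_addr_t_model start,
			      phys_addr_t_model end)
{
	return snprintf(buf, len, "[mem 0x%llx-0x%llx]",
			(unsigned long long)start, (unsigned long long)end);
}
```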
2013-04-29 | mm, hotplug: avoid compiling memory hotremove functions when disabled | David Rientjes | 1 | -33/+35
__remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC pseries will return -EOPNOTSUPP if unsupported. Adding an #ifdef causes several other functions it depends on to also become unnecessary, which saves .text space when disabled (it's disabled in most defconfigs besides powerpc, including x86). remove_memory_block() becomes static since it is not referenced outside of drivers/base/memory.c. Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled and disabled. Signed-off-by: David Rientjes <[email protected]> Acked-by: Toshi Kani <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Tang Chen <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>