Age | Commit message | Author | Files | Lines
2011-03-04 | x86-64, NUMA: Fix numa_emulation code with node0 without RAM | Yinghai Lu | 1 | -3/+1
On a system that has no RAM on node0 and has NUMA emulation compiled in, either

1. booting without numa=fake..., or
2. booting with numa=fake=128 so that emulation fails

results in:

[ 0.092026] ------------[ cut here ]------------
[ 0.096005] kernel BUG at arch/x86/mm/numa_emulation.c:439!
[ 0.096005] invalid opcode: 0000 [#1] SMP
[ 0.096005] last sysfs file:
[ 0.096005] CPU 0
[ 0.096005] Modules linked in:
[ 0.096005]
[ 0.096005] Pid: 0, comm: swapper Not tainted 2.6.38-rc6-tip-yh-03869-gcb0491d-dirty #684 Sun Microsystems Sun Fire X4240/Sun Fire X4240
[ 0.096005] RIP: 0010:[<ffffffff81cdc65b>] [<ffffffff81cdc65b>] numa_add_cpu+0x56/0xcf
[ 0.096005] RSP: 0000:ffffffff82437ed8 EFLAGS: 00010246
...
[ 0.096005] Call Trace:
[ 0.096005] [<ffffffff81cd7931>] identify_cpu+0x2d7/0x2df
[ 0.096005] [<ffffffff827e54fa>] identify_boot_cpu+0x10/0x30
[ 0.096005] [<ffffffff827e5704>] check_bugs+0x9/0x2d
[ 0.096005] [<ffffffff827dceda>] start_kernel+0x3d7/0x3f1
[ 0.096005] [<ffffffff827dc2cc>] x86_64_start_reservations+0x9c/0xa0
[ 0.096005] [<ffffffff827dc4ad>] x86_64_start_kernel+0x1dd/0x1e8
[ 0.096005] Code: 74 06 48 8d 04 90 eb 0f 48 c7 c0 30 d9 00 00 48 03 04 d5 90 0f 60 82 8b 00 83 f8 ff 74 0d 0f a3 05 8b 7e 92 00 19 d2 85 d2 75 02 <0f> 0b 48 98 be 00 01 00 00 48 c7 c7 e0 44 60 82 44 8b 2c 85 e0
[ 0.096005] RIP [<ffffffff81cdc65b>] numa_add_cpu+0x56/0xcf
[ 0.096005] RSP <ffffffff82437ed8>
[ 0.096026] ---[ end trace a7919e7f17c0a725 ]---

We need to use early_cpu_to_node() directly, because numa_cpu_node() returns node0, which is not online.

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
2011-03-04 | x86-64, NUMA: Revert NUMA affine page table allocation | Tejun Heo | 4 | -58/+8
This patch reverts the NUMA affine page table allocation added by commit 1411e0ec31 (x86-64, numa: Put pgtable to local node memory).

The commit made an undocumented change where the kernel linear mapping strictly follows the intersection of the e820 memory map and the NUMA configuration. If the physical memory configuration has holes or NUMA nodes are not properly aligned, this leads to an unnecessarily small mapping size, which increases TLB pressure. For details, see:

http://thread.gmane.org/gmane.linux.kernel/1104672

Patches to fix the problem have been proposed, but the underlying code needs more cleanup and the approach itself seems a bit heavy-handed, so it was decided to revert the feature for now and come back to it in the next development cycle.

http://thread.gmane.org/gmane.linux.kernel/1105959

As init_memory_mapping_high() call sites have been consolidated since the commit, the revert is done manually. Also, the RED-PEN comment in arch/x86/mm/init.c is not restored, as the problem no longer exists with memblock-based top-down early memory allocation.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
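To see why following every hole or node boundary can raise TLB pressure, the hypothetical helper below counts how many entries a greedy 1G/2M/4K mapping needs for a range. It is a minimal sketch under stated assumptions (usual x86-64 page sizes, 4K-aligned inputs, made-up boundaries), not the kernel's init_memory_mapping().

```c
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/*
 * Hypothetical helper: count the page-table entries needed to map
 * [start, end) when each step uses the largest page size that is
 * naturally aligned at the current address and still fits.
 * Inputs are assumed 4K aligned; a sub-4K tail is ignored.
 */
static uint64_t count_mapping_entries(uint64_t start, uint64_t end)
{
	static const uint64_t sizes[] = { SZ_1G, SZ_2M, SZ_4K };
	uint64_t entries = 0;

	while (start < end) {
		int mapped = 0;

		for (int i = 0; i < 3; i++) {
			uint64_t sz = sizes[i];

			if ((start & (sz - 1)) == 0 && end - start >= sz) {
				start += sz;
				entries++;
				mapped = 1;
				break;
			}
		}
		if (!mapped)
			break; /* unaligned start or sub-4K tail */
	}
	return entries;
}
```

Mapping [0, 2G) as one range takes two 1G entries. Splitting it at a hypothetical node boundary 4K past 1G takes 1024: one 1G page and one 4K page below the split, then 511 4K pages and 511 2M pages above it.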
2011-03-02 | x86-64, NUMA: Better explain numa_distance handling | Tejun Heo | 2 | -2/+15
Handling of out-of-bounds distances and allocation failure could use better documentation. Add it.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Acked-by: David Rientjes <[email protected]>
2011-03-02 | x86-64, NUMA: Fix distance table handling | Yinghai Lu | 2 | -11/+11
NUMA distance table handling has the following problems.

* numa_reset_distance() uses numa_distance_cnt * sizeof(numa_distance[0]) as the table size when it should be using the square of numa_distance_cnt.
* The same size miscalculation occurs when allocating space for phys_dist in numa_emulation().
* In numa_emulation(), phys_dist must be reserved; otherwise, the new emulated distance table may overlap it.

Fix them and, while at it, move the numa_distance_cnt reset in numa_reset_distance() out of the if block to simplify the code a bit.

David Rientjes reported incorrect handling of the distance table during emulation.

-tj: Edited out numa_alloc_distance() related changes which weren't necessary and rewrote the patch description.

-v2: Ingo was unhappy with line breaks induced by the 80-column limit. Let lines run over 80 columns.

Signed-off-by: Yinghai Lu <[email protected]>
Reported-by: David Rientjes <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Acked-by: David Rientjes <[email protected]>
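The size bug is plain arithmetic: a distance table for N nodes is an N x N matrix, so N * sizeof(entry) covers only one row. A minimal sketch, assuming one-byte entries; both helper names are hypothetical.

```c
#include <stddef.h>

/* Buggy size: one row of the table only. */
static size_t dist_table_size_buggy(size_t cnt, size_t entry_size)
{
	return cnt * entry_size;
}

/* Correct size: an N x N matrix holds cnt * cnt entries. */
static size_t dist_table_size(size_t cnt, size_t entry_size)
{
	return cnt * cnt * entry_size;
}
```

With four nodes and one-byte entries, the buggy formula frees or reserves 4 bytes of a 16-byte table; the two formulas agree only when there is a single node.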
2011-02-26 | mm: Move early_node_map[] reverse scan helpers under HAVE_MEMBLOCK | Yinghai Lu | 1 | -32/+32
Heiko found that a recent memblock change triggers these warnings on s390:

mm/page_alloc.c:3623:22: warning: 'last_active_region_index_in_nid' defined but not used
mm/page_alloc.c:3638:22: warning: 'previous_active_region_index_in_nid' defined but not used

These two functions need to move under HAVE_MEMBLOCK together with their only user, find_memory_core_early().

-tj: Minor updates to description.

Reported-by: Heiko Carstens <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
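The fix follows a common C pattern: a static helper is defined under the same preprocessor guard as its only caller, so a configuration that compiles the caller out does not also compile an unused static function (which GCC reports under -Wunused-function). A minimal sketch; the HAVE_MEMBLOCK macro here is a stand-in for CONFIG_HAVE_MEMBLOCK and the function names are hypothetical.

```c
/* Hypothetical stand-in for CONFIG_HAVE_MEMBLOCK. Removing this line
 * builds the "no memblock" variant, which then has no unused helpers. */
#define HAVE_MEMBLOCK

#ifdef HAVE_MEMBLOCK
/* Helper: reverse scan for the last region that belongs to @nid. */
static int last_region_index_in_nid(const int *region_nid, int nr, int nid)
{
	for (int i = nr - 1; i >= 0; i--)
		if (region_nid[i] == nid)
			return i;
	return -1;
}

/* The helper's only user lives under the same #ifdef. */
static int find_region_core(const int *region_nid, int nr, int nid)
{
	return last_region_index_in_nid(region_nid, nr, nid);
}
#endif /* HAVE_MEMBLOCK */
```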
2011-02-25 | x86-64, NUMA: Fix size of numa_distance array | David Rientjes | 1 | -1/+2
numa_distance should be sized like the SLIT, an NxN matrix where N is the highest node id + 1. This patch fixes the calculation to avoid overflowing the array on the subsequent iteration.

-tj: The original patch used the last index to calculate the size. Yinghai pointed out that it should be incremented so that the table size is calculated from the number of elements rather than the last index. Updated accordingly.

Signed-off-by: David Rientjes <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
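The off-by-one is easy to see with a sparse node set. This minimal sketch uses a plain unsigned long bitmask as a simplified stand-in for the kernel's nodemask_t; the helper names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Rows/columns needed for a distance table covering every node in
 * @mask: the highest set node id + 1 (not the id itself), or 0. */
static int distance_cnt_from_mask(unsigned long mask)
{
	int cnt = 0;

	for (int nid = 0; nid < (int)(CHAR_BIT * sizeof(mask)); nid++)
		if (mask & (1UL << nid))
			cnt = nid + 1;
	return cnt;
}

/* Bytes for a cnt x cnt table of one-byte entries. */
static size_t distance_table_bytes(unsigned long mask)
{
	size_t cnt = (size_t)distance_cnt_from_mask(mask);

	return cnt * cnt;
}
```

With nodes 0, 1 and 3 present (mask 0xB), sizing by the last index would give a 3 x 3 table of 9 bytes, yet the node 3 to node 3 entry would be indexed at 3 * 3 + 3 = 12, past the end; the correct count of 4 gives 16 bytes.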
2011-02-24 | x86: Rename e820_table_* to pgt_buf_* | Yinghai Lu | 5 | -20/+20
e820_table_{start|end|top}, which are used to buffer page table allocation during early boot, are now derived from memblock and have little to do with e820. Change the names so that they reflect what they're used for.

This patch doesn't introduce any behavior change.

-v2: Ingo found that the earlier patch "x86: Use early pre-allocated page table buffer top-down" caused a crash on 32bit and needed to be dropped. This patch was updated to reflect the change.

-tj: Updated commit description.

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
2011-02-24 | bootmem: Move __alloc_memory_core_early() to nobootmem.c | Yinghai Lu | 3 | -30/+25
Now that bootmem.c and nobootmem.c are separate, there's no reason to define __alloc_memory_core_early(), which is used only by nobootmem, inside #ifdef in page_alloc.c. Move it to nobootmem.c and make it static.

This patch doesn't introduce any behavior change.

-tj: Updated commit description.

Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: Andrew Morton <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
2011-02-24 | bootmem: Move contig_page_data definition to bootmem.c/nobootmem.c | Yinghai Lu | 3 | -9/+12
Now that bootmem.c and nobootmem.c are separate, it's cleaner to define contig_page_data in each file than in page_alloc.c with #ifdef. Move it.

This patch doesn't introduce any behavior change.

-v2: Fixed the struct layout as pointed out by Andrew.

-tj: Updated commit description.

Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: Andrew Morton <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
2011-02-24 | bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c | Yinghai Lu | 3 | -171/+415
mm/bootmem.c contained code paths for both bootmem and no-bootmem configurations. They implement about the same set of APIs in different ways and, as a result, bootmem.c contains a massive amount of #ifdef CONFIG_NO_BOOTMEM.

Separate out the CONFIG_NO_BOOTMEM code into mm/nobootmem.c. As the common part is relatively small, duplicate it in nobootmem.c instead of creating a common file or ifdef'ing in bootmem.c.

The following are duplicated.

* {min|max}_low_pfn, max_pfn, saved_max_pfn
* free_bootmem_late()
* ___alloc_bootmem()
* __alloc_bootmem_low()

The following are applicable only to nobootmem and are moved verbatim.

* __free_pages_memory()
* free_all_memory_core_early()

The following are not applicable to nobootmem and are omitted from nobootmem.c.

* reserve_bootmem_node()
* reserve_bootmem()

For the rest, function bodies are split according to CONFIG_NO_BOOTMEM.

The Makefile is updated so that only one of bootmem.c and nobootmem.c is built, depending on CONFIG_NO_BOOTMEM.

This patch doesn't introduce any behavior change.

-tj: Rewrote commit description.

Suggested-by: Ingo Molnar <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: Andrew Morton <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
2011-02-22 | x86-64, NUMA: Separate out numa_alloc_distance() from numa_set_distance() | Yinghai Lu | 1 | -35/+40
The allocation code is much bigger than the distance-setting code. Separate it out into numa_alloc_distance() for readability.

-v2: Let alloc_numa_distance() return -ENOMEM on the failure path, as requested by tj.

-tj: Description update. Minor tweaks including function name, location and return value check.

Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
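The split can be pictured as an allocator that returns 0 or -ENOMEM and a setter that allocates lazily and bails out on failure. A minimal userspace sketch: malloc() stands in for memblock, the LOCAL/REMOTE default values of 10 and 20 are assumed, and all names (including the fixed node count) are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

/* Hypothetical userspace model of the table state. */
static unsigned char *dist_table;
static int dist_cnt;
static int dist_nodes_wanted = 4; /* assumed: highest node id + 1 */

/* Allocation split out of the setter: returns 0 or -ENOMEM. */
static int alloc_distance(void)
{
	int cnt = dist_nodes_wanted;
	unsigned char *tbl = malloc((size_t)cnt * (size_t)cnt);

	if (!tbl)
		return -ENOMEM;
	for (int i = 0; i < cnt; i++)
		for (int j = 0; j < cnt; j++)
			tbl[i * cnt + j] = i == j ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	dist_table = tbl;
	dist_cnt = cnt;
	return 0;
}

/* The setter stays short: allocate on first use, give up on failure. */
static void set_distance(int from, int to, int distance)
{
	if (!dist_table && alloc_distance() < 0)
		return;
	if (from < 0 || to < 0 || from >= dist_cnt || to >= dist_cnt)
		return;
	dist_table[from * dist_cnt + to] = (unsigned char)distance;
}
```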
2011-02-22 | x86-64, NUMA: Add proper function comments to global functions | Tejun Heo | 2 | -10/+69
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Ingo Molnar <[email protected]>
2011-02-22 | x86-64, NUMA: Move NUMA emulation into numa_emulation.c | Tejun Heo | 4 | -476/+488
Create numa_emulation.c and move all NUMA emulation code there. The definitions of struct numa_memblk and numa_meminfo are moved to numa_64.h. Also, numa_remove_memblk_from(), numa_cleanup_meminfo() and numa_reset_distance(), along with numa_emulation(), are made global.

- v2: Internal declarations moved to numa_internal.h as suggested by Yinghai.

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Yinghai Lu <[email protected]>
Cc: Ingo Molnar <[email protected]>
2011-02-22 | x86-64, NUMA: Prepare numa_emulation() for moving NUMA emulation into a separate file | Tejun Heo | 1 | -23/+33
Update numa_emulation() such that it

- takes @numa_meminfo and @numa_dist_cnt instead of directly referencing the global variables.
- copies the distance table by iterating each distance with node_distance() instead of memcpy'ing the distance table.
- tests emu_cmdline to determine whether emulation is requested and fills emu_nid_to_phys[] with an identity mapping if emulation is not used. This allows the caller to call numa_emulation() unconditionally and makes the return value unnecessary.
- defines a dummy version if CONFIG_NUMA_EMU is disabled.

This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Ingo Molnar <[email protected]>
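The identity-mapping fallback and the dummy version can be sketched as follows. This is a minimal sketch under stated assumptions: NUMA_EMU stands in for CONFIG_NUMA_EMU, MAX_NUMNODES is a small made-up value, numa_emulation_sketch() is a hypothetical name, and building an actual emulated layout is deliberately left out.

```c
#include <stddef.h>

#define MAX_NUMNODES 8

/* Hypothetical stand-in for CONFIG_NUMA_EMU. */
#define NUMA_EMU

static int emu_nid_to_phys[MAX_NUMNODES];

#ifdef NUMA_EMU
/*
 * Without an emulation request, record the identity mapping so the
 * caller can run this unconditionally and needs no return value.
 */
static void numa_emulation_sketch(const char *emu_cmdline)
{
	if (!emu_cmdline || !*emu_cmdline) {
		for (int i = 0; i < MAX_NUMNODES; i++)
			emu_nid_to_phys[i] = i;
		return;
	}
	/* Emulated layout construction is omitted in this sketch. */
}
#else
/* Dummy version: does nothing when emulation is configured out. */
static inline void numa_emulation_sketch(const char *emu_cmdline)
{
	(void)emu_cmdline;
}
#endif
```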
2011-02-21 | x86-64, NUMA: Do not scan two times for setup_node_bootmem() | Yinghai Lu | 1 | -20/+12
By the time setup_node_bootmem() is called, all the memblocks are already registered. As node_data is allocated from these memblocks, calling it more than once doesn't make any difference. Drop the loop.

tj: Dropped the comment referring to the old behavior as suggested by David and rephrased the description.

Signed-off-by: Yinghai Lu <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
2011-02-17 | x86-64, NUMA: Put dummy_numa_init() in the init section | Yinghai Lu | 1 | -1/+1
dummy_numa_init() is used only during system boot. Put it in .init like other NUMA init functions.

- tj: Description update.

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
2011-02-17 | x86-64, NUMA: Don't call __pa() with invalid address in numa_reset_distance() | Yinghai Lu | 1 | -4/+6
Do not call __pa(numa_distance) if it was not allocated before. Calling it with an invalid address triggers VIRTUAL_BUG_ON() in __phys_addr() if CONFIG_DEBUG_VIRTUAL is enabled. Also reported by Ingo.

http://thread.gmane.org/gmane.linux.kernel/1101306/focus=1101785

- v2: Changed to check the existing path as tj requested.
- tj: Description update.

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Ingo Molnar <[email protected]>
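In userspace terms, the guard asks "was anything actually allocated?" before freeing. A minimal sketch, assuming malloc()/free() in place of memblock and __pa(); the failure marker is an assumption of this sketch, included to show why answering that question from the element count is more robust than testing the pointer. All names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical marker meaning "allocation was tried and failed"; it is
 * not a real pointer and must never be freed. */
#define DIST_ALLOC_FAILED ((unsigned char *)(uintptr_t)1)

static unsigned char *dist_table; /* NULL, the marker, or a real table */
static int dist_cnt;              /* nonzero only for a real table */

/* Free the table only if one was really allocated. Testing the count
 * instead of the pointer skips both NULL and the failure marker. */
static void reset_distance(void)
{
	if (dist_cnt)
		free(dist_table);
	dist_cnt = 0;
	dist_table = NULL;
}
```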
2011-02-16 | x86-64, NUMA: Unify emulated distance mapping | Tejun Heo | 6 | -176/+40
NUMA emulation needs to update node distance information. It did so by remapping the apicid to PXM mapping, even when amdtopology is being used. There is no reason to go through such convolution. The generic code has all the information necessary to transform the distance table to the emulated nid space.

Implement generic distance table transformation in numa_emulation() and drop the private implementations in srat_64 and amdtopology_64. This makes find_node_by_addr(), fake_physnodes() and related functions unnecessary; drop them.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
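The transformation only needs the emulated-to-physical node map: the distance between emulated nodes i and j is the distance between the physical nodes they were carved from. A minimal sketch with flat row-major tables; the function name and table layout are assumptions, not the kernel's numa_emulation().

```c
/*
 * Build a row-major emu_cnt x emu_cnt table @emu from the row-major
 * phys_cnt x phys_cnt table @phys, using @emu_nid_to_phys to find the
 * physical node behind each emulated node. Emulated nodes carved from
 * the same physical node end up at that node's local distance.
 */
static void transform_distance(const unsigned char *phys, int phys_cnt,
			       const int *emu_nid_to_phys, int emu_cnt,
			       unsigned char *emu)
{
	for (int i = 0; i < emu_cnt; i++) {
		for (int j = 0; j < emu_cnt; j++) {
			int pi = emu_nid_to_phys[i];
			int pj = emu_nid_to_phys[j];

			emu[i * emu_cnt + j] = phys[pi * phys_cnt + pj];
		}
	}
}
```

For example, two physical nodes at distance 21 split into four emulated nodes (0 and 1 on physical node 0, 2 and 3 on physical node 1) give distance 10 between emulated nodes 0 and 1 and 21 between emulated nodes 0 and 2.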
2011-02-16 | x86-64, NUMA: Unify emulated apicid -> node mapping transformation | Tejun Heo | 3 | -33/+16
NUMA emulation changes node mappings, and thus the apicid -> node mapping needs to be updated accordingly. srat_64 and amdtopology_64 did this separately; however, the only information needed is the mapping from emulated nodes to physical nodes, which is available in emu_nid_to_phys[].

Implement a common __apicid_to_node[] transformation in numa_emulation() and drop the duplicate implementations.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
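A sketch of such a transformation, rewriting an apicid -> physical-node table into apicid -> emulated-node using only emu_nid_to_phys[]. Picking the first emulated node carved from the matching physical node, and using NUMA_NO_NODE when none matches, are choices of this hypothetical sketch, not necessarily the kernel's policy.

```c
#define NUMA_NO_NODE (-1)

/* Rewrite each physical node id in @apicid_to_node into the first
 * emulated node whose physical node matches it. Entries without a
 * node, or without a matching emulated node, become NUMA_NO_NODE. */
static void transform_apicid_to_node(int *apicid_to_node, int nr_apicids,
				     const int *emu_nid_to_phys, int emu_cnt)
{
	for (int a = 0; a < nr_apicids; a++) {
		int phys = apicid_to_node[a];
		int emu = NUMA_NO_NODE;

		if (phys == NUMA_NO_NODE)
			continue;
		for (int j = 0; j < emu_cnt; j++) {
			if (emu_nid_to_phys[j] == phys) {
				emu = j;
				break;
			}
		}
		apicid_to_node[a] = emu;
	}
}
```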
2011-02-16 | x86-64, NUMA: Emulate directly from numa_meminfo | Tejun Heo | 1 | -100/+71
NUMA emulation built a physnodes[] array, which could only represent configurations from the physical meminfo, and emulated nodes using that information. There's no reason to take this extra level of indirection. Update the emulation functions so that they operate directly on numa_meminfo. This simplifies the code and makes the emulation layout behave better with interleaved physical nodes.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Wrap node ID during emulation | Tejun Heo | 1 | -10/+2
Both emulation layout functions, split_nodes[_size]_interleave(), didn't wrap the emulated nid while laying out the fake nodes and tried to avoid iterating over the specified number of nodes, which is fragile. Now that the emulation code generates numa_meminfo, the node memblks don't need to be consecutive and emulated node IDs can simply wrap. This makes the code more robust and is necessary for updates to better handle the cases where the physical nodes are interleaved.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
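Wrapping means the nid cursor simply cycles with a modulo instead of being bounded by loop counts, so an emulated node may own several non-consecutive chunks. A minimal sketch; assign_chunks() and its chunk model are hypothetical simplifications of the interleaved layout.

```c
/* Assign @nr_chunks memory chunks to @nr_emu emulated nodes in turn,
 * wrapping the nid with a modulo; a node may get several chunks. */
static void assign_chunks(int *chunk_nid, int nr_chunks, int nr_emu)
{
	int nid = 0;

	if (nr_emu <= 0)
		return;
	for (int c = 0; c < nr_chunks; c++) {
		chunk_nid[c] = nid;
		nid = (nid + 1) % nr_emu; /* wrap instead of stopping */
	}
}
```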
2011-02-16 | x86-64, NUMA: Make emulation code build numa_meminfo and share the registration path | Tejun Heo | 1 | -87/+86
NUMA emulation code built a nodes[] array and had its own registration path to set up the emulated nodes. Update it so that it generates an emulated numa_meminfo, returns control to initmem_init(), and shares the same registration path with the non-emulated cases.

Because {acpi|amd}_fake_nodes() expect a nodes[] parameter, fake_physnodes() now generates nodes[] from numa_meminfo. This will go away with further updates.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Build and use direct emulated nid -> phys nid mapping | Tejun Heo | 1 | -29/+35
NUMA emulation copied the physical NUMA configuration into physnodes[] and used it to reverse-map emulated nodes to physical nodes, which is unnecessarily convoluted. Build an emu_nid_to_phys[] array to map emulated nids directly to the matching physical nids and use it in numa_add_cpu().

physnodes[] will be removed by further patches.

- v2: Fixed a build failure with CONFIG_DEBUG_PER_CPU_MAPS caused by a missing local variable definition. Reported by Ingo.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Trivial changes to prepare for emulation updates | Tejun Heo | 1 | -16/+18
* Separate out numa_add_memblk_to() from numa_add_memblk() so that a different numa_meminfo can be used.
* Rename cmdline to emu_cmdline.
* Drop @start/last_pfn from numa_emulation() and use max_pfn directly.

This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Implement generic node distance handling | Tejun Heo | 5 | -17/+109
Node distance was computed by direct node comparison, ACPI PXM comparison or ACPI SLIT table lookup. This patch implements generic node distance handling. NUMA init methods can call numa_set_distance() to set the distance between nodes, and the common __node_distance() implementation will report the set distance.

Due to the way NUMA emulation is implemented, the generic node distance handling is used only when emulation is not in use. Later patches will update NUMA emulation to use the generic distance mechanism.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
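The generic mechanism boils down to a flat N x N table that init methods fill in and a common lookup that falls back to fixed local/remote values when no entry exists. A minimal userspace sketch; the names, the fixed maximum size and the fallback values of 10 and 20 are assumptions of this sketch rather than a copy of numa_set_distance() and __node_distance().

```c
#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20
#define MAX_DIST_NODES  8

/* Hypothetical flat cnt x cnt table; dist_cnt == 0 means "no table". */
static unsigned char dist[MAX_DIST_NODES * MAX_DIST_NODES];
static int dist_cnt;

/* Create a cnt x cnt table prefilled with LOCAL/REMOTE defaults. */
static void init_distance(int cnt)
{
	if (cnt > MAX_DIST_NODES)
		cnt = MAX_DIST_NODES;
	for (int i = 0; i < cnt; i++)
		for (int j = 0; j < cnt; j++)
			dist[i * cnt + j] = i == j ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	dist_cnt = cnt;
}

/* Called by an init method for each distance it discovers. */
static void set_distance(int from, int to, int d)
{
	if (from < 0 || to < 0 || from >= dist_cnt || to >= dist_cnt)
		return; /* ignore nodes outside the table */
	dist[from * dist_cnt + to] = (unsigned char)d;
}

/* Common lookup: table entry if present, LOCAL/REMOTE otherwise. */
static int node_distance(int from, int to)
{
	if (from < 0 || to < 0 || from >= dist_cnt || to >= dist_cnt)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return dist[from * dist_cnt + to];
}
```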
2011-02-16 | x86-64, NUMA: Kill mem_nodes_parsed | Tejun Heo | 4 | -13/+20
With all memory configuration information now carried in numa_meminfo, there's no need to keep mem_nodes_parsed separate. Drop it and use numa_nodes_parsed for CPU / memory-less nodes.

A new helper, numa_nodemask_from_meminfo(), is added to calculate the memnode mask on the fly; it is currently used to set node_possible_map.

This simplifies NUMA init methods a bit and removes a source of possible inconsistencies.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Rename cpu_nodes_parsed to numa_nodes_parsed | Tejun Heo | 4 | -10/+10
It's no longer necessary to keep both cpu_nodes_parsed and mem_nodes_parsed. In preparation for merging them, rename cpu_nodes_parsed to numa_nodes_parsed.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Kill numa_nodes[] | Tejun Heo | 4 | -58/+53
numa_nodes[] doesn't carry any information which isn't present in numa_meminfo. Each entry is simply the min/max range of all the memblks for the node. This is not only redundant but also inaccurate when memblks for different nodes interleave; for example, find_node_by_addr() can return the wrong nodeid.

Kill numa_nodes[] and always use numa_meminfo instead.

* nodes_cover_memory() is renamed to numa_meminfo_cover_memory() and now operates on numa_meminfo and returns bool.
* setup_node_bootmem() needs a min/max range. Compute the range on the fly. setup_node_bootmem() invocation is restructured to use an outer loop instead of hardcoding the double invocations.
* find_node_by_addr() now operates on numa_meminfo.
* setup_physnodes() builds physnodes[] from memblks. This will go away when the emulation code is updated to use struct numa_meminfo.

This patch also makes the following misc changes.

* Clearing of nodes_add[] is converted to memset().
* numa_add_memblk() in amd_numa_init() is moved down a bit for consistency.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
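The interleaving problem can be reproduced in a few lines: collapsing each node to one [min, max) span makes interleaved ranges overlap, while scanning the individual memblks stays exact. A minimal sketch; struct memblk_sketch is a hypothetical simplification loosely modeled on struct numa_memblk, and the 1G-interleaved layout is made up.

```c
#include <stdint.h>

#define NUMA_NO_NODE (-1)

struct memblk_sketch {
	uint64_t start, end; /* [start, end) */
	int nid;
};

/* Exact: find the memblk that contains @addr. */
static int find_node_by_memblk(const struct memblk_sketch *blk, int nr, uint64_t addr)
{
	for (int i = 0; i < nr; i++)
		if (addr >= blk[i].start && addr < blk[i].end)
			return blk[i].nid;
	return NUMA_NO_NODE;
}

/* Inaccurate: collapse each node to the [min, max) span of its memblks. */
static int find_node_by_span(const struct memblk_sketch *blk, int nr, uint64_t addr)
{
	for (int i = 0; i < nr; i++) {
		uint64_t lo = UINT64_MAX, hi = 0;

		for (int j = 0; j < nr; j++) {
			if (blk[j].nid != blk[i].nid)
				continue;
			if (blk[j].start < lo)
				lo = blk[j].start;
			if (blk[j].end > hi)
				hi = blk[j].end;
		}
		if (addr >= lo && addr < hi)
			return blk[i].nid;
	}
	return NUMA_NO_NODE;
}
```

With node 0 owning [0, 1G) and [2G, 3G) and node 1 owning [1G, 2G) and [3G, 4G), the address 1.5G belongs to node 1, but node 0's collapsed span [0, 3G) also covers it, so the span-based lookup answers 0.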
2011-02-16 | x86-64, NUMA: Add common find_node_by_addr() | Tejun Heo | 4 | -31/+20
srat_64.c and amdtopology_64.c had their own versions of find_node_by_addr(), which were basically the same. Add a common one in numa_64.c and remove the duplicates.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: consolidate and improve memblk sanity checks | Tejun Heo | 1 | -50/+49
memblk sanity checks were scattered around and incomplete. Consolidate and improve them.

* Conflict detection and cutoff_node() logic are moved to numa_cleanup_meminfo().
* numa_cleanup_meminfo() clears the unused memblks before returning.
* Check and warn about invalid input parameters in numa_add_memblk().
* Check that the maximum number of memblks isn't exceeded in numa_add_memblk().
* numa_cleanup_meminfo() is now called before numa_emulation() so that the emulation code also uses the cleaned-up version.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: make numa_cleanup_meminfo() prettier | Tejun Heo | 1 | -17/+19
* Factor out numa_remove_memblk_from().
* Hole detection doesn't need separate start/end. Calculate start/end once.
* Relocate comment.
* Define iterators at the top and remove unnecessary prefix increments.

This prepares for further improvements to the function.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Separate out numa_cleanup_meminfo() | Tejun Heo | 1 | -37/+46
Separate out numa_cleanup_meminfo() from numa_register_memblks(). node_possible_map initialization is moved to the top of the split numa_register_memblks().

This patch doesn't cause any behavior change.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Introduce struct numa_meminfo | Tejun Heo | 1 | -70/+75
Arrays for memblks and nodeids and their length lived in separate variables, making things unnecessarily cumbersome. Introduce struct numa_meminfo, which contains all memory configuration info.

This patch doesn't cause any behavior change.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Remove %NULL @nodeids handling from compute_hash_shift() | Tejun Heo | 1 | -7/+7
numa_emulation() called compute_hash_shift() with %NULL @nodeids, which meant an identity mapping between index and nodeid. Make numa_emulation() build an identity array and drop %NULL @nodeids handling from populate_memnodemap() and thus from compute_hash_shift().

This is to prepare for the transition to using memblks instead.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Kill {acpi|amd|dummy}_scan_nodes() | Tejun Heo | 5 | -26/+0
They are empty now. Kill them.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Unify the rest of memblk registration | Tejun Heo | 3 | -74/+68
Move the remaining memblk registration logic from acpi_scan_nodes() to numa_register_memblks() and initmem_init().

This applies the nodes_cover_memory() sanity check, memory node sorting and node_online() checking, which were only applied to acpi, to all init methods.

As all memblk registration is moved to common code, active range clearing is moved to initmem_init() too and removed from bad_srat().

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Unify use of memblk in all init methods | Tejun Heo | 4 | -29/+8
Make both amd and dummy use numa_add_memblk() to describe the detected memory blocks. This allows initmem_init() to call numa_register_memblks() regardless of the init method in use. Drop the custom memory registration code from amd and dummy.

After this change, memblk merge/cleanup in numa_register_memblks() is applied to all init methods.

As this leaves compute_hash_shift() and numa_register_memblks() used only inside numa_64.c, make them static.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Factor out memblk handling into numa_{add|register}_memblk() | Tejun Heo | 5 | -94/+117
Factor out memblk handling from srat_64.c into two functions in numa_64.c. This patch doesn't introduce any behavior change. The next patch will make all init methods use these functions.

- v2: Fixed build failure on 32bit due to misplaced NR_NODE_MEMBLKS. Reported by Ingo.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Kill {acpi|amd}_get_nodes() | Tejun Heo | 5 | -39/+10
With a common numa_nodes[], common code in numa_64.c can access it directly. Copy directly and kill {acpi|amd}_get_nodes().

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Use common numa_nodes[] | Tejun Heo | 4 | -42/+45
ACPI and amd are using separate nodes[] arrays. Add numa_nodes[] and use it in all NUMA init methods. cutoff_node() cleanup is moved from srat_64.c to numa_64.c and applied in initmem_init() regardless of init method.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init() | Tejun Heo | 1 | -20/+23
This brings amd initialization behavior closer to that of acpi.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Remove local variable found from amd_numa_init() | Tejun Heo | 1 | -4/+2
Use a weight count on mem_nodes_parsed instead.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Use common {cpu|mem}_nodes_parsed | Tejun Heo | 4 | -24/+31
ACPI and amd are using separate nodes_parsed masks. Add {cpu|mem}_nodes_parsed and use them in all NUMA init methods. Initialization of the masks and building node_possible_map are now handled commonly by initmem_init(). dummy_numa_init() is updated to set node 0 on both masks.

While at it, move the info messages from scan to init.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Restructure initmem_init() | Tejun Heo | 2 | -46/+52
Reorganize initmem_init() such that

* Different NUMA init methods are iterated in a consistent way.
* Each iteration re-initializes all the parameters, so a different method can be tried after a failure.
* Dummy init is handled the same as the other methods.

Apart from retrying after a failure, this patch doesn't change the behavior. The call sequences are kept equivalent across the conversion.

After the change, bad_srat() doesn't need to clear the apic to node mapping or worry about numa_off. Simplified accordingly.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
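Once every init method shares the same signature (no argument, 0 on success, -errno on failure), the retry loop can be a plain walk over an array of function pointers with a state reset before each attempt and dummy init as the last entry. A minimal sketch; the fake_* methods, reset_numa_state() and initmem_init_sketch() are hypothetical stand-ins, not the kernel's initmem_init().

```c
#include <errno.h>

/* Shared signature: no argument, 0 on success, -errno on failure. */
typedef int (*numa_init_fn)(void);

static int resets; /* counts state resets, for illustration */

/* Hypothetical: would clear masks, meminfo and distances. */
static void reset_numa_state(void)
{
	resets++;
}

/* Hypothetical methods: firmware-based ones fail, dummy always works. */
static int fake_acpi_init(void)  { return -ENOENT; }
static int fake_amd_init(void)   { return -ENODEV; }
static int fake_dummy_init(void) { return 0; }

/* Try each method on a clean slate; return the index that succeeded. */
static int initmem_init_sketch(const numa_init_fn *methods, int n)
{
	for (int i = 0; i < n; i++) {
		reset_numa_state();
		if (methods[i]() == 0)
			return i;
	}
	return -1;
}
```

Resetting before every attempt is what lets a later method run after an earlier one failed halfway.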
2011-02-16x86, NUMA: Move *_numa_init() invocations into initmem_init()Tejun Heo6-20/+20
There's no reason for these to live in setup_arch(). Move them inside initmem_init(). - v2: x86-32 initmem_init() wasn't updated, breaking 32-bit builds. Fixed. Found by Ankita. Signed-off-by: Tejun Heo <[email protected]> Cc: Ankita Garg <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16x86-64, NUMA: Wrap acpi_numa_init() so that failure can be indicated by return valueTejun Heo3-1/+12
Because of the way ACPI tables are parsed, the generic acpi_numa_init() couldn't return failure when an error was detected by arch hooks. Instead, the failure state was recorded, and the later arch-dependent init hook, acpi_scan_nodes(), would fail. Wrap acpi_numa_init() with x86_acpi_numa_init() so that failure can be indicated immediately as a return value. This is in preparation for further NUMA init cleanups. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return valuesTejun Heo7-19/+22
The functions used during NUMA initialization, *_numa_init() and *_scan_nodes(), have different arguments and return values. Unify them such that they all take no arguments and return 0 on success and -errno on failure. This is in preparation for further NUMA init cleanups. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16x86, NUMA: Drop @start/last_pfn from initmem_init()Tejun Heo6-23/+14
initmem_init() extensively accesses and modifies global data structures, and its parameters aren't even honored, depending on which path is taken. Drop @start/last_pfn and let it deal with @max_pfn directly. This is in preparation for further NUMA init cleanups. - v2: x86-32 initmem_init() wasn't updated, breaking 32-bit builds. Fixed. Found by Yinghai. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init()Tejun Heo1-18/+13
Hotplug node handling in acpi_numa_memory_affinity_init() was unnecessarily complicated: it stored the original nodes[] entry and restored it afterwards. Simplify it by not modifying the nodes[] entry for hotplug nodes in the first place. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
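A hedged sketch of the simplified control flow, not the kernel function: struct range, extend(), memory_affinity_sketch(), and the separate hotplug_ranges[] array are hypothetical stand-ins. It shows only the idea from the message: a hotplug range is routed elsewhere up front, so the nodes[] entry is never modified and there is nothing to save and restore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 8

/* Hypothetical range type; end is exclusive and end == 0 means "empty". */
struct range { uint64_t start, end; };

static struct range nodes[MAX_NODES];
static struct range hotplug_ranges[MAX_NODES];	/* hypothetical separate record */

/* Grow r to cover [start, end), or initialize it if it is still empty. */
static void extend(struct range *r, uint64_t start, uint64_t end)
{
	if (r->end == 0) {
		r->start = start;
		r->end = end;
		return;
	}
	if (start < r->start)
		r->start = start;
	if (end > r->end)
		r->end = end;
}

/* Simplified flow: hotplug memory never touches nodes[] at all. */
static void memory_affinity_sketch(int node, uint64_t start, uint64_t end,
				   bool hotplug)
{
	assert(node >= 0 && node < MAX_NODES && start < end);
	if (hotplug)
		extend(&hotplug_ranges[node], start, end);
	else
		extend(&nodes[node], start, end);
}
```

Branching before any write removes the save/restore pair entirely, which is the simplification the commit describes.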
2011-02-16x86-64, NUMA: Make dummy node initialization path similar to non-dummy onesTejun Heo1-2/+3
Dummy node initialization in initmem_init() didn't initialize the apicid to node mapping and instead set the cpu to node mapping directly by calling numa_set_node(), which differs from the non-dummy init paths. Update it so that the paths behave similarly: initialize the apicid to node mapping and call numa_init_array(). The actual cpu to node mapping is handled later by init_cpu_to_node(). Signed-off-by: Tejun Heo <[email protected]> Acked-by: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
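A loose userspace model of the two-stage mapping, not the kernel code: the table sizes, the example topology, dummy_init_apicid_map(), and init_cpu_to_node_sketch() are hypothetical, numa_init_array() is not modeled, and the fallback to node 0 for unmapped APIC IDs is an assumption of this sketch. It shows why the dummy path only needs to fill in the apicid to node table: the later pass derives cpu to node from it for every init path alike.

```c
#include <assert.h>

#define MAX_APICID 16	/* hypothetical small limit */
#define MAX_CPUS    4
#define NO_NODE   (-1)

static int apicid_to_node[MAX_APICID];
static int cpu_to_apicid[MAX_CPUS] = { 0, 2, 4, 6 };	/* hypothetical topology */
static int cpu_to_node[MAX_CPUS];

/* Dummy path, stage 1: no firmware info, so every APIC ID starts unmapped. */
static void dummy_init_apicid_map(void)
{
	for (int i = 0; i < MAX_APICID; i++)
		apicid_to_node[i] = NO_NODE;
}

/* Stage 2, shared by all paths: derive cpu -> node from the APIC ID table,
 * falling back to node 0 when the ID is unmapped (assumed default). */
static void init_cpu_to_node_sketch(void)
{
	for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
		int apicid = cpu_to_apicid[cpu];
		int node;

		assert(apicid >= 0 && apicid < MAX_APICID);
		node = apicid_to_node[apicid];
		cpu_to_node[cpu] = (node == NO_NODE) ? 0 : node;
	}
}
```

If firmware later maps APIC ID 4 to node 1, rerunning the second stage moves only the CPU with that APIC ID, without any change to the dummy stage.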