When calling free_all_bootmem(), the free areas under memblock's control
are released to the buddy allocator. Additionally, the reserved list is
freed if it was reallocated by memblock. The same should apply to the
memory list.
Signed-off-by: Philipp Hachtmann <[email protected]>
Reviewed-by: Tejun Heo <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Tang Chen <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Jianguo Wu <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It's recommended to use NUMA_NO_NODE everywhere to select "process any
node" behavior or to indicate that no node ID is specified.
Hence, update the __next_free_mem_range*() APIs to accept both
NUMA_NO_NODE and MAX_NUMNODES, but emit a warning once on MAX_NUMNODES,
and correct the corresponding API documentation to describe the new
behavior. Also, update other memblock/nobootmem APIs where MAX_NUMNODES
is used directly.
The change was suggested by Tejun Heo.
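For illustration, a minimal sketch of the two spellings after this change
(the wrapper function is hypothetical; memblock_alloc_nid() stands in for
any nid-taking memblock API):
    #include <linux/memblock.h>
    #include <linux/numa.h>

    static void __init nid_example(void)        /* hypothetical caller */
    {
            phys_addr_t a, b;

            a = memblock_alloc_nid(PAGE_SIZE, PAGE_SIZE, NUMA_NO_NODE);  /* preferred */
            b = memblock_alloc_nid(PAGE_SIZE, PAGE_SIZE, MAX_NUMNODES);  /* accepted, warns once */
    }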
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Russell King <[email protected]>
Cc: Tony Lindgren <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Reorder the parameters of memblock_find_in_range_node() to be consistent
with other memblock APIs.
The change was suggested by Tejun Heo <[email protected]>.
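A before/after sketch of the prototypes (the old order is inferred from
the memblock APIs of the time; the new order puts size/align first, like
memblock_alloc_nid()):
    /* before */
    phys_addr_t memblock_find_in_range_node(phys_addr_t start, phys_addr_t end,
                                            phys_addr_t size, phys_addr_t align,
                                            int nid);

    /* after */
    phys_addr_t memblock_find_in_range_node(phys_addr_t size, phys_addr_t align,
                                            phys_addr_t start, phys_addr_t end,
                                            int nid);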
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Russell King <[email protected]>
Cc: Tony Lindgren <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The Linux kernel cannot migrate pages used by the kernel. As a result,
hotpluggable memory used by the kernel cannot be hot-removed. To solve
this problem, the basic idea is to prevent memblock from allocating
hotpluggable memory for the kernel early during boot, and to arrange all
hotpluggable memory in the ACPI SRAT (System Resource Affinity Table) as
ZONE_MOVABLE when initializing zones.
In the previous patches, we have marked hotpluggable memory regions with
the MEMBLOCK_HOTPLUG flag in memblock.memory.
In this patch, we make memblock skip these hotpluggable memory regions
in the default top-down allocation function if the movable_node boot
option is specified.
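A simplified sketch of the skip; the real check sits inside the
free-range iterators, and the helper names (movable_node_is_enabled(),
memblock_is_hotpluggable()) are the ones this series relies on:
    struct memblock_region *r;

    for_each_memblock(memory, r) {
            /* skip hotpluggable memory regions if movable_node is specified */
            if (movable_node_is_enabled() && memblock_is_hotpluggable(r))
                    continue;
            /* ...otherwise the region may be used for the allocation... */
    }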
[[email protected]: coding-style fixes]
Signed-off-by: Tang Chen <[email protected]>
Signed-off-by: Zhang Yanfei <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "Rafael J . Wysocki" <[email protected]>
Cc: Chen Tang <[email protected]>
Cc: Gong Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Liu Jiang <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Taku Izumi <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thomas Renninger <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vasilis Liaskovitis <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Wen Congyang <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
[[email protected]: fix powerpc build]
Signed-off-by: Tang Chen <[email protected]>
Reviewed-by: Zhang Yanfei <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "Rafael J . Wysocki" <[email protected]>
Cc: Chen Tang <[email protected]>
Cc: Gong Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Liu Jiang <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Taku Izumi <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thomas Renninger <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vasilis Liaskovitis <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Wen Congyang <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
memblock, mem_hotplug: introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions
In find_hotpluggable_memory, once we find a memory region which is
hotpluggable, we want to mark it in memblock.memory so that we can later
tell the memblock allocator not to allocate hotpluggable memory for the
kernel.
To achieve this goal, we introduce the MEMBLOCK_HOTPLUG flag to indicate
hotpluggable memory regions in memblock, and a function
memblock_mark_hotplug() to mark hotpluggable memory when we find it.
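A hedged usage sketch; the SRAT-parsing context and the
hotpluggable/base/size variables are assumptions, not from this patch:
    /* while walking SRAT memory affinity entries */
    if (hotpluggable)
            memblock_mark_hotplug(base, size);      /* sets MEMBLOCK_HOTPLUG */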
[[email protected]: coding-style fixes]
Signed-off-by: Tang Chen <[email protected]>
Reviewed-by: Zhang Yanfei <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "Rafael J . Wysocki" <[email protected]>
Cc: Chen Tang <[email protected]>
Cc: Gong Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Liu Jiang <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Taku Izumi <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thomas Renninger <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vasilis Liaskovitis <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Wen Congyang <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is no flag in memblock to describe what type of memory a region
is. Sometimes we may use memblock to reserve some memory for special
usage, and we want to know what kind of memory it is - so we need a way
to tell these regions apart.
In a hotplug environment, we want to reserve hotpluggable memory so the
kernel won't be able to use it, and when the system is up, we have to
free this hotpluggable memory to the buddy allocator. So we need to mark
this memory first.
In order to do so, we need to mark out these special memory regions in
memblock. In this patch, we introduce a new "flags" member into
memblock_region:
struct memblock_region {
phys_addr_t base;
phys_addr_t size;
unsigned long flags; /* This is new. */
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
int nid;
#endif
};
This patch does the following things:
1) Add "flags" member to memblock_region.
2) Modify the prototypes of the following APIs:
   memblock_add_region()
   memblock_insert_region()
3) Add memblock_reserve_region() to support reserving memory with flags,
   keeping memblock_reserve()'s prototype unmodified.
4) Modify other APIs to support flags, but keep their prototypes
   unmodified.
The idea is from Wen Congyang <[email protected]> and Liu Jiang <[email protected]>.
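A sketch of item 3 above, keeping memblock_reserve()'s prototype while
routing through the flag-aware helper (the zero-flags default is an
assumption):
    int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
    {
            /* plain reservations carry no special flags */
            return memblock_reserve_region(base, size, MAX_NUMNODES, 0);
    }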
Suggested-by: Wen Congyang <[email protected]>
Suggested-by: Liu Jiang <[email protected]>
Signed-off-by: Tang Chen <[email protected]>
Reviewed-by: Zhang Yanfei <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "Rafael J . Wysocki" <[email protected]>
Cc: Chen Tang <[email protected]>
Cc: Gong Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Taku Izumi <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thomas Renninger <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vasilis Liaskovitis <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The Linux kernel cannot migrate pages used by the kernel. As a result,
kernel pages cannot be hot-removed, so we cannot allocate hotpluggable
memory for the kernel.
The ACPI SRAT (System Resource Affinity Table) contains the memory
hotplug info, but before the SRAT is parsed, memblock has already
started to allocate memory for the kernel. So we need to prevent
memblock from doing this.
In a memory hotplug system, any NUMA node the kernel resides in should
be non-hotpluggable, and on a modern server each node can have at least
16GB of memory, so memory around the kernel image is very likely
non-hotpluggable.
The basic idea is therefore to allocate memory upward from the end of
the kernel image. Since not much memory is allocated before the SRAT is
parsed, these allocations are very likely to land in the same node as
the kernel image.
The current memblock can only allocate memory top-down, so this patch
introduces a new bottom-up allocation mode. Later, when we allocate in
this direction, we will limit the start address to above the kernel.
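A minimal sketch of the intended use, assuming the
memblock_set_bottom_up() helper this series introduces:
    memblock_set_bottom_up(true);   /* before SRAT is parsed: allocate upward,
                                       starting just above the kernel image */
    /* ... early boot allocations ... */
    memblock_set_bottom_up(false);  /* SRAT parsed: back to top-down */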
Signed-off-by: Tang Chen <[email protected]>
Signed-off-by: Zhang Yanfei <[email protected]>
Acked-by: Toshi Kani <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Thomas Renninger <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Wen Congyang <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Taku Izumi <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kamezawa Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The current early_pfn_to_nid() on arches that support memblock walks
memblock.memory one entry at a time, so lookups take too many tries near
the end of the array. We can use the existing memblock_search() to find
the node ID for a given pfn instead, which saves some time on bigger
systems whose memblock.memory arrays have many entries.
Here are the timing differences for several machines. In each case, less
time was spent in __early_pfn_to_nid() with the patch.

                         3.11-rc5   with patch   difference (%)
                         --------   ----------   --------------
  UV1: 256 nodes  9TB:     411.66       402.47    -9.19 (2.23%)
  UV2: 255 nodes 16TB:    1141.02      1138.12    -2.90 (0.25%)
  UV2:  64 nodes  2TB:     128.15       126.53    -1.62 (1.26%)
  UV2:  32 nodes  2TB:     121.87       121.07    -0.80 (0.66%)

Times in seconds.
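A self-contained sketch of the idea - binary search over
memblock.memory, which is kept sorted by base address - close to the
in-tree memblock_search() helper:
    /* return the index of the region containing addr, or -1 */
    static int __init_memblock memblock_search(struct memblock_type *type,
                                               phys_addr_t addr)
    {
            unsigned int lo = 0, hi = type->cnt;

            while (lo < hi) {
                    unsigned int mid = (lo + hi) / 2;
                    struct memblock_region *r = &type->regions[mid];

                    if (addr < r->base)
                            hi = mid;
                    else if (addr >= r->base + r->size)
                            lo = mid + 1;
                    else
                            return mid;
            }
            return -1;
    }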
Signed-off-by: Yinghai Lu <[email protected]>
Cc: Tejun Heo <[email protected]>
Acked-by: Russ Anderson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Tim found:
WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
Hardware name: S2600CP
sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
smpboot: Booting Node 1, Processors #1
Modules linked in:
Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
Call Trace:
set_cpu_sibling_map+0x279/0x449
start_secondary+0x11d/0x1e5
Don Morris reproduced it on an HP z620 workstation, and bisected it to
commit e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")
It turns out movable_map has some problems, and it breaks several
things:
1. numa_init() is called several times, NOT just for SRAT, so those
     nodes_clear(numa_nodes_parsed)
     memset(&numa_meminfo, 0, sizeof(numa_meminfo))
   cannot simply be removed. The sequence numaq, srat, amd, dummy has to
   be considered, and the fallback path kept working.
2. acpi_numa_init() was simply split into early_parse_srat():
   a. early_parse_srat() is NOT called for ia64, so ia64 is broken.
   b. the loop
        for (i = 0; i < MAX_LOCAL_APIC; i++)
                set_apicid_to_node(i, NUMA_NO_NODE)
      is still left in numa_init(), so it just clears the result from
      early_parse_srat(). It should be moved before that.
   c. it breaks ACPI_TABLE_OVERRIDE, as the ACPI table scan is moved
      early, before the override from the initrd is settled.
3. the patch TITLE is totally misleading: there is NO x86 in the title,
   but it changes critical x86 code. That caused the x86 people not to
   pay attention and find the problem early. Those patches really should
   have been routed via tip/x86/mm.
4. after that commit, the following ranges cannot use movable RAM:
   a. real_mode code... well.. funny - could legacy Node0 [0,1M) really
      be hot-removed?
   b. initrd... it will be freed after booting, so it could be on
      movable RAM.
   c. crashkernel for kdump: it looks like we cannot put the kdump
      kernel above 4G anymore.
   d. init_mem_mapping: cannot put the page table high anymore.
   e. initmem_init: the vmemmap cannot be high on the local node
      anymore. That is not good.
If a node is hotpluggable, memory-related ranges like the page table and
vmemmap could be on that node without problems, and should be on that
node.
We have workaround patches that could fix some of the problems, but some
cannot be fixed.
So just remove that offending commit and related ones, including:
f7210e6c4ac7 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
protect movablecore_map in memblock_overlaps_region().")
01a178a94e8e ("acpi, memory-hotplug: support getting hotplug info from
SRAT")
27168d38fa20 ("acpi, memory-hotplug: extend movablemem_map ranges to
the end of node")
e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is
ready")
fb06bc8e5f42 ("page_alloc: bootmem limit with movablecore_map")
42f47e27e761 ("page_alloc: make movablemem_map have higher priority")
6981ec31146c ("page_alloc: introduce zone_movable_limit[] to keep
movable limit for nodes")
34b71f1e04fc ("page_alloc: add movable_memmap kernel parameter")
4d59a75125d5 ("x86: get pg_data_t's memory from other node")
Later we should have patches that make sure the kernel puts the page
table and vmemmap in local node RAM instead of pushing them down to
node 0. We also need to find a way to put other kernel-used RAM in local
node RAM.
Reported-by: Tim Gardner <[email protected]>
Reported-by: Don Morris <[email protected]>
Bisected-by: Don Morris <[email protected]>
Tested-by: Don Morris <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Thomas Renninger <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Tang Chen <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect
movablecore_map in memblock_overlaps_region()
The definition of struct movablecore_map is protected by
CONFIG_HAVE_MEMBLOCK_NODE_MAP, but its use in memblock_overlaps_region()
is not. So add CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect the use of
movablecore_map in memblock_overlaps_region().
Signed-off-by: Tang Chen <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Ensure that bootmem will not allocate memory from areas that may be
ZONE_MOVABLE. The map info comes from the movablecore_map boot option.
Signed-off-by: Tang Chen <[email protected]>
Reviewed-by: Wen Congyang <[email protected]>
Reviewed-by: Lai Jiangshan <[email protected]>
Tested-by: Lin Feng <[email protected]>
Cc: Wu Jianguo <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use it to get the memory size below limit_pfn, replacing the local
version in x86's reserve_initrd().
-v2: remove an unneeded cast, as pointed out by HPA.
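A hedged sketch of the x86 use; the variable names and panic message
approximate the reserve_initrd() path rather than quoting it:
    u64 mapped_size;

    /* total RAM below the already-mapped pfn limit */
    mapped_size = memblock_mem_size(max_pfn_mapped);
    if (ramdisk_size >= (mapped_size >> 1))
            panic("initrd too large to handle, disabling initrd");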
Signed-off-by: Yinghai Lu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
We will not map partial pages, so we need to make sure memblock
allocations do not hand those bytes out.
Also, we will use for_each_mem_pfn_range() to loop over and map the
memory ranges, to keep them consistent.
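An illustrative sketch of the loop; the mapping callee is an assumption,
not part of this patch:
    unsigned long start_pfn, end_pfn;
    int i, nid;

    /* page-aligned ranges only, so partial pages are never mapped */
    for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
            init_memory_mapping(PFN_PHYS(start_pfn), PFN_PHYS(end_pfn));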
Signed-off-by: Yinghai Lu <[email protected]>
Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
Acked-by: Jacob Shin <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: <[email protected]>
|
|
Commit 0ee332c14518 ("memblock: Kill early_node_map[]") removed
early_node_map[]. Clean up the comments to comply with that change.
Signed-off-by: Wanpeng Li <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Gavin Shan <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
memblock_free_reserved_regions() calls memblock_free(), but
memblock_free() could itself double the reserved.regions array, so we
could be freeing the old range of reserved.regions while it is still in
use.
Also, tj said there is another bug which could be related to this:
| I don't think we're saving any noticeable
| amount by doing this "free - give it to page allocator - reserve
| again" dancing. We should just allocate regions aligned to page
| boundaries and free them later when memblock is no longer in use.
In that case, with DEBUG_PAGEALLOC, we get a panic:
memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
BUG: unable to handle kernel paging request at ffff88102febd948
IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU 0
Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
RIP: 0010:[<ffffffff836a5774>] [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
See the discussion at https://lkml.org/lkml/2012/6/13/469.
So try to allocate the array with PAGE_SIZE alignment and free it later.
Reported-by: Sasha Levin <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now that all early memory information is in memblock when enabled, we
can implement a reverse free-area iterator and use it to implement a
NUMA-aware allocator, which is then wrapped for the simpler variants,
instead of the confusing and inefficient mending of information in a
separate NUMA-aware allocator.
Implement for_each_free_mem_range_reverse(), and use it to reimplement
memblock_find_in_range_node(), which in turn is used by all allocators.
The visible allocator interface is inconsistent and can probably use
some cleanup too.
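A simplified sketch of the reimplemented top-down search built on the
new iterator (start/end/size/align/nid come from the caller; bookkeeping
elided):
    phys_addr_t this_start, this_end, cand;
    u64 i;

    for_each_free_mem_range_reverse(i, nid, &this_start, &this_end, NULL) {
            this_start = clamp(this_start, start, end);
            this_end = clamp(this_end, start, end);
            if (this_end < size)
                    continue;
            cand = round_down(this_end - size, align);
            if (cand >= this_start)
                    return cand;    /* highest area that fits */
    }
    return 0;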
Signed-off-by: Tejun Heo <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Yinghai Lu <[email protected]>
|
|
Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEMBLOCK_NODE_MAP -
there's no user of early_node_map[] left. Kill early_node_map[] and
replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP. Also,
relocate for_each_mem_pfn_range() and its helper from mm.h to
memblock.h, as page_alloc.c no longer hosts an alternative
implementation.
This change is ultimately a one-to-one mapping and shouldn't cause any
observable difference; however, after the recent changes, there are some
functions which would now fit memblock.c better than page_alloc.c, and
depending on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK doesn't
make much sense for some of them. Further cleanup of the functions
inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice.
-v2: Fix a compile bug introduced by mis-spelling
CONFIG_HAVE_MEMBLOCK_NODE_MAP as CONFIG_MEMBLOCK_HAVE_NODE_MAP in
mmzone.h. Reported by Stephen Rothwell.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Chen Liqin <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
|
|
Implement memblock_add_node(), which can add a new memblock memory
region with a specific node ID.
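A sketch of the relationship to the existing non-NUMA call (the
delegation shown here is an assumption):
    int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
    {
            /* plain adds simply record "no node known yet" */
            return memblock_add_node(base, size, MAX_NUMNODES);
    }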
Signed-off-by: Tejun Heo <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Yinghai Lu <[email protected]>
|
|
The only function of memblock_analyze() is now to allow resizing of the
memblock region arrays. Rename it to memblock_allow_resize() and update
its users.
* The following users remain the same other than renaming.
arm/mm/init.c::arm_memblock_init()
microblaze/kernel/prom.c::early_init_devtree()
powerpc/kernel/prom.c::early_init_devtree()
openrisc/kernel/prom.c::early_init_devtree()
sh/mm/init.c::paging_init()
sparc/mm/init_64.c::paging_init()
unicore32/mm/init.c::uc32_memblock_init()
* In the following users, analyze was used to update the total size,
  which is no longer necessary.
powerpc/kernel/machine_kexec.c::reserve_crashkernel()
powerpc/kernel/prom.c::early_init_devtree()
powerpc/mm/init_32.c::MMU_init()
powerpc/mm/tlb_nohash.c::__early_init_mmu()
powerpc/platforms/ps3/mm.c::ps3_mm_add_memory()
powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups()
sh/kernel/machine_kexec.c::reserve_crashkernel()
* x86/kernel/e820.c::memblock_x86_fill() was directly setting
  memblock_can_resize before populating memblock and calling analyze
  afterwards. Call memblock_allow_resize() before starting to populate,
  as sketched below.
memblock_can_resize is now static inside memblock.c.
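The sketch referenced above - a simplified view of the new x86 ordering
(e820 filtering details elided):
    void __init memblock_x86_fill(void)
    {
            int i;

            memblock_allow_resize();        /* was: memblock_can_resize = 1 */

            for (i = 0; i < e820.nr_map; i++) {
                    struct e820entry *ei = &e820.map[i];

                    if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
                            continue;
                    memblock_add(ei->addr, ei->size);
            }

            memblock_dump_all();
    }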
Signed-off-by: Tejun Heo <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Russell King <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
|
|
The total size of memory regions was calculated by memblock_analyze(),
which had to be called explicitly between operations that can change
memory regions and any user of the total size. This was cumbersome and
fragile.
This patch makes each memblock_type track its total size automatically,
with minor modifications to the memblock manipulation functions, and
removes the requirement to call memblock_analyze().
[__]memblock_dump_all() now also dumps the total size of reserved
regions.
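A sketch of the bookkeeping; the total_size field follows the commit
text, and the accessor shown is the existing memblock_phys_mem_size():
    struct memblock_type {
            unsigned long cnt;              /* number of regions */
            unsigned long max;              /* size of the allocated array */
            phys_addr_t total_size;         /* kept current by add/remove */
            struct memblock_region *regions;
    };

    phys_addr_t __init_memblock memblock_phys_mem_size(void)
    {
            return memblock.memory.total_size;      /* no analyze step needed */
    }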
Signed-off-by: Tejun Heo <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Yinghai Lu <[email protected]>
|
|
memblock_init() initializes the arrays for regions and memblock itself;
however, all of this can be done with struct initializers, so
memblock_init() can be removed. This patch kills memblock_init() and
initializes memblock with struct initializers.
The only difference is that the first dummy entries don't have .nid set
to MAX_NUMNODES initially. This doesn't cause any behavior difference.
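A sketch of the struct initializer that replaces memblock_init()
(simplified; the init-array names follow the existing static arrays):
    struct memblock memblock __initdata_memblock = {
            .memory.regions         = memblock_memory_init_regions,
            .memory.cnt             = 1,    /* empty dummy entry */
            .memory.max             = INIT_MEMBLOCK_REGIONS,

            .reserved.regions       = memblock_reserved_init_regions,
            .reserved.cnt           = 1,    /* empty dummy entry */
            .reserved.max           = INIT_MEMBLOCK_REGIONS,

            .current_limit          = MEMBLOCK_ALLOC_ANYWHERE,
    };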
Signed-off-by: Tejun Heo <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Russell King <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
|
|
Add __memblock_dump_all(), which dumps the memblock configuration
whether memblock_debug is enabled or not.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Yinghai Lu <[email protected]>
|
|
memblock: Make memblock_{add|remove|free|reserve}() return int and update prototypes
memblock_{add|remove|free|reserve}() return either 0 or -errno but had
long as their return type. Change it to int. Also, drop 'extern' from
all prototypes in memblock.h - they are unnecessary and were used
inconsistently (especially once mm.h enters the picture).
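Before/after, illustrated on one of the four:
    /* before */
    extern long memblock_reserve(phys_addr_t base, phys_addr_t size);

    /* after: returns 0 or -errno, no 'extern' */
    int memblock_reserve(phys_addr_t base, phys_addr_t size);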
Signed-off-by: Tejun Heo <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Yinghai Lu <[email protected]>
|
|
Conflicts & resolutions:
* arch/x86/xen/setup.c
dc91c728fd "xen: allow extra memory to be in multiple regions"
24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."
conflicted on the xen_add_extra_mem() updates. The resolution is
trivial as the latter just wants to replace
memblock_x86_reserve_range() with memblock_reserve().
* drivers/pci/intel-iommu.c
166e9278a3f "x86/ia64: intel-iommu: move to drivers/iommu/"
5dfe8660a3d "bootmem: Replace work_with_active_regions() with..."
conflicted as the former moved the file under drivers/iommu/.
Resolved by applying the changes from the latter to the moved
file.
* mm/Kconfig
6661672053a "memblock: add NO_BOOTMEM config symbol"
c378ddd53f9 "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"
conflicted trivially. Both added config options. Just
letting both add their own options resolves the conflict.
* mm/memblock.c
d1f0ece6cdc "mm/memblock.c: small function definition fixes"
ed7b56a799c "memblock: Remove memblock_memory_can_coalesce()"
conflicted. The former updates a function removed by the
latter. The resolution is trivial.
Signed-off-by: Tejun Heo <[email protected]>
|
|
SPARC32 requires access to the start address. Add a new helper
memblock_start_of_DRAM() to give access to the address of the first
memblock - the one which contains the lowest address.
The awkward name was chosen to match the already-present
memblock_end_of_DRAM().
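Since memblock keeps its regions sorted by base address, the helper is a
one-liner (sketch):
    phys_addr_t __init_memblock memblock_start_of_DRAM(void)
    {
            return memblock.memory.regions[0].base; /* lowest address */
    }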
Signed-off-by: Sam Ravnborg <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Yinghai Lu <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Other than a sanity check and debug messages, the x86-specific versions
of the memblock reserve/free functions are simple wrappers around the
generic versions - memblock_reserve()/memblock_free().
This patch adds debug messages with caller identification to the generic
versions, replaces the x86-specific ones with them, and kills the
x86-specific ones.
arch/x86/include/asm/memblock.h and arch/x86/mm/memblock.c are empty
after this change and are removed.
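A sketch of the generic version with the debug message folded in (format
string approximate; %pF prints the calling function):
    long __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
    {
            struct memblock_type *_rgn = &memblock.reserved;

            memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
                         (unsigned long long)base,
                         (unsigned long long)(base + size),
                         (void *)_RET_IP_);

            return memblock_add_region(_rgn, base, size);
    }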
Signed-off-by: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: Yinghai Lu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
Make ARCH_DISCARD_MEMBLOCK a config option so that it can be handled
together with other MEMBLOCK options.
Signed-off-by: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: Yinghai Lu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
Implement for_each_free_mem_range(), which iterates over free memory
areas according to memblock (memory && !reserved). This will be used
to simplify memblock users.
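A minimal usage sketch (iterator signature as introduced here: cookie,
nid filter, then output pointers):
    phys_addr_t start, end;
    u64 i;

    for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
            pr_info("free: [%#016llx-%#016llx]\n",
                    (unsigned long long)start, (unsigned long long)end);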
Signed-off-by: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: Yinghai Lu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
Add an optional region->nid which can be enabled by the arch using
CONFIG_HAVE_MEMBLOCK_NODE_MAP. When enabled, memblock also carries NUMA
node information and replaces early_node_map[].
Newly added memblocks have MAX_NUMNODES as their nid. The arch can then
call memblock_set_node() to set node information. memblock takes care
of merging and node-affine allocations w.r.t. node information.
When MEMBLOCK_NODE_MAP is enabled, early_node_map[] and the related data
structures and functions to manipulate and iterate it are disabled. A
memblock version of __next_mem_pfn_range() is provided such that
for_each_mem_pfn_range() behaves the same and its users don't have to
be updated.
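A sketch of the arch-side usage (signature as of this patch; later
kernels also take a memblock_type argument):
    /* in arch NUMA init code */
    memblock_add(base, size);               /* new region: nid == MAX_NUMNODES */
    memblock_set_node(base, size, nid);     /* arch fills in the node id */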
-v2: Yinghai spotted a section mismatch caused by a missing
__init_memblock on memblock_set_node(). Fixed.
Signed-off-by: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: Yinghai Lu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
memblock will be extended to include early_node_map[], which is also
used during memory hotplug. Make memblock use __meminit[data] instead
of __init[data] so that memory hotplug code can safely reference it.
Signed-off-by: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Yinghai Lu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
An arch could implement memblock_memory_can_coalesce() to veto merging
of adjacent or overlapping memblock regions; however, no arch did, and
any vetoing would trigger WARN_ON(). Memblock regions are supposed to
deal with proper memory anyway. Remove the unused hook.
Signed-off-by: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: Yinghai Lu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
Node-affine memblock allocation logic is currently implemented across
memblock_alloc_nid() and memblock_alloc_nid_region(). This reorganizes
it to resemble the non-NUMA allocation API.
Area finding is collected and moved into the new exported function
memblock_find_in_range_node(), which is symmetrical to its non-NUMA
counterpart - it handles @start/@end and understands ANYWHERE and
ACCESSIBLE. memblock_alloc_nid() now simply calls
memblock_find_in_range_node() and reserves the returned area.
This makes memblock_alloc[_try]_nid() observe the ACCESSIBLE limit on
node-affine allocations too (again, this doesn't make any difference for
the current sole user - sparc64).
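A simplified sketch of the reorganized path (parameter order of this
era; MEMBLOCK_ALLOC_ACCESSIBLE acts as the "respect the current limit"
end marker):
    phys_addr_t __init memblock_alloc_nid(phys_addr_t size, phys_addr_t align,
                                          int nid)
    {
            phys_addr_t found;

            /* find a node-affine area, honoring the ACCESSIBLE limit... */
            found = memblock_find_in_range_node(0, MEMBLOCK_ALLOC_ACCESSIBLE,
                                                size, align, nid);
            /* ...and reserve it */
            if (found && !memblock_reserve(found, size))
                    return found;
            return 0;
    }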
Signed-off-by: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: Yinghai Lu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
memblock_nid_range() is used to implement memblock_[try_]alloc_nid().
The generic version determines the range by walking early_node_map with
for_each_mem_pfn_range(). The generic version is defined __weak to
allow arch overrides.
Currently, only sparc overrides it; however, with the previous update to
the generic implementation, there isn't much to be gained from an arch
override - sparc would behave exactly the same with the generic
implementation.
This patch disallows arch overrides for memblock_nid_range() and makes
both the generic and sparc versions static.
sparc is only compile-tested.
Signed-off-by: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: "David S. Miller" <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
memblock_find_base() is a static function with two callers in
memblock.c, and memblock_find_in_range() is a wrapper around it which
just changes the types and order of the parameters.
Make memblock_find_in_range() take phys_addr_t instead of u64 for
consistency, and replace memblock_find_base() with it.
Signed-off-by: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: Yinghai Lu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
25818f0f28 ("memblock: Make MEMBLOCK_ERROR be 0") thankfully made
MEMBLOCK_ERROR 0, and there already is code which expects the error
return to be 0. There's no point in keeping MEMBLOCK_ERROR around. End
its misery.
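A sketch of the resulting calling convention (variables assumed):
    phys_addr_t addr;

    addr = memblock_find_in_range(0, limit, size, align);
    if (!addr)      /* failure is plain 0; no MEMBLOCK_ERROR */
            panic("cannot allocate early memory");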
Signed-off-by: Tejun Heo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: Yinghai Lu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
On larger systems, information in the kernel log is lost because so much
early text is printed that it overflows the static log buffer before the
log_buf_len kernel parameter can be processed and a bigger log buffer
allocated.
Distros are reluctant to increase memory usage by increasing the size of
the static log buffer, so minimize the problem by allocating the new log
buffer as early as possible.
This patch:
Add an error return if CONFIG_HAVE_MEMBLOCK is not set, instead of
having to add #ifdef CONFIG_HAVE_MEMBLOCK around blocks of code calling
that function.
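A sketch of the pattern; which function gets the stub is an assumption
here:
    #ifdef CONFIG_HAVE_MEMBLOCK
    phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align);
    #else
    static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align)
    {
            return 0;       /* error return: no memblock support */
    }
    #endif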
Signed-off-by: Mike Travis <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Jack Steiner <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We need to round memory regions correctly -- specifically, we need to
round reserved regions in the more expansive direction (lower limit
down, upper limit up) whereas usable memory regions need to be rounded
in the more restrictive direction (lower limit up, upper limit down).
This introduces two sets of inlines:
memblock_region_memory_base_pfn()
memblock_region_memory_end_pfn()
memblock_region_reserved_base_pfn()
memblock_region_reserved_end_pfn()
Although they are antisymmetric (and therefore are technically
duplicates), the use of the different inlines explicitly documents the
programmer's intention.
The lack of proper rounding caused a bug on ARM, which was then found
to also affect other architectures.
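Sketches of the four, assuming PFN_UP()/PFN_DOWN() from <linux/pfn.h>;
note how the rounding directions swap between memory and reserved:
    /* usable memory: round restrictively (base up, end down) */
    static inline unsigned long
    memblock_region_memory_base_pfn(const struct memblock_region *reg)
    {
            return PFN_UP(reg->base);
    }

    static inline unsigned long
    memblock_region_memory_end_pfn(const struct memblock_region *reg)
    {
            return PFN_DOWN(reg->base + reg->size);
    }

    /* reserved regions: round expansively (base down, end up) */
    static inline unsigned long
    memblock_region_reserved_base_pfn(const struct memblock_region *reg)
    {
            return PFN_DOWN(reg->base);
    }

    static inline unsigned long
    memblock_region_reserved_end_pfn(const struct memblock_region *reg)
    {
            return PFN_UP(reg->base + reg->size);
    }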
Reported-by: Russell King <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
LKML-Reference: <[email protected]>
Cc: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
Stephen found a bunch of section mismatch warnings with the new memblock
changes.
Use __init_memblock to replace __init in memblock.c, and remove __init
from memblock.h. We should not use __init in header files.
Reported-by: Stephen Rothwell <[email protected]>
Tested-by: Stephen Rothwell <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
So we can avoid exporting memblock_reserved_init_regions().
Suggested by Ben.
-v2: use the __init_memblock attribute
Signed-off-by: Yinghai Lu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
This is a wrapper for memblock_find_base() using slightly different
arguments (start,end instead of start,size, for example) in order to
make it easier to convert existing arch/x86 code.
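A sketch of the wrapper - argument shuffling only (the u64 types
persisted until the phys_addr_t conversion noted in a later commit
above):
    u64 __init memblock_find_in_range(u64 start, u64 end, u64 size, u64 align)
    {
            return memblock_find_base(size, align, start, end);
    }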
Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
|
|
Arch code can define ARCH_DISCARD_MEMBLOCK in asm/memblock.h, which in
turn causes memblock code and data to go respectively into the .init
and .initdata sections. This will be used by the x86 architecture.
x86 architecture.
If ARCH_DISCARD_MEMBLOCK is defined, the debugfs files to inspect
the memblock arrays after boot are not created.
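A sketch of the mechanism behind it (annotation names as used by
memblock):
    #ifdef ARCH_DISCARD_MEMBLOCK
    #define __init_memblock         __init
    #define __initdata_memblock     __initdata
    #else
    #define __init_memblock
    #define __initdata_memblock
    #endif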
Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
|
|
This should make it easier to catch/debug incorrect use when
the CONFIG_ option isn't set.
Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
|
|
And ensure we don't hand out 0 as a valid allocation. We put the
low limit at PAGE_SIZE arbitrarily.
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
|
|
Will be used by the x86 memblock_x86_find_in_range_node() and the
nobootmem replacement.
Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
|
|
This exposes memblock_debug and the associated memblock_dbg() macro,
along with memblock_can_resize, so that x86 can use these when ported
to use memblock.
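The macro is a thin conditional wrapper (sketch, close to the actual
definition):
    #define memblock_dbg(fmt, ...) \
            if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)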
Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
|
|
The former is now strict: it will fail if it cannot honor the allocation
within the node. The latter implements the previous semantics, which
fall back to allocating anywhere.
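The "former" and "latter" here are memblock_alloc_nid() and
memblock_alloc_try_nid() (names assumed from the surrounding history); a
sketch of the fallback variant:
    phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size,
                                              phys_addr_t align, int nid)
    {
            phys_addr_t res = memblock_alloc_nid(size, align, nid);

            if (res)
                    return res;
            return memblock_alloc(size, align);     /* fall back: anywhere */
    }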
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
|
|
We now provide a default (weak) implementation of memblock_nid_range()
which uses early_pfn_map[] if CONFIG_ARCH_POPULATES_NODE_MAP is set.
Sparc still needs to use its own method due to the way pages can be
scattered between nodes.
This implementation is inefficient because our main algorithm and
callback construct want to work on ascending addresses while
early_pfn_map[] would rather work with nids (it's unsorted at that
stage). But it should work, and we can look into improving it
subsequently, possibly using arch compile options to choose a different
algorithm altogether.
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
|
|
Some archs, such as ARM, want to avoid coalescing across things such as
the lowmem/highmem boundary or similar. This provides the option to
control it via an arch callback, for which a weak default that always
allows coalescing is provided.
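A sketch of the weak default (signature assumed):
    int __weak memblock_memory_can_coalesce(phys_addr_t addr1, phys_addr_t size1,
                                            phys_addr_t addr2, phys_addr_t size2)
    {
            return 1;       /* default: always allow merging */
    }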
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
|
|
memblock: Move memblock arrays to static storage in memblock.c and make
their size a variable
This is in preparation for having resizable arrays.
Note that we still allocate one more than needed; this is unchanged from
the previous implementation.
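A sketch of the bootstrap arrays and the size variable; the extra slot
matches the "one more than needed" note above:
    static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS + 1];
    static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS + 1];

    /* at init time */
    memblock.memory.regions = memblock_memory_init_regions;
    memblock.memory.max = INIT_MEMBLOCK_REGIONS;    /* size is now a variable */
    memblock.reserved.regions = memblock_reserved_init_regions;
    memblock.reserved.max = INIT_MEMBLOCK_REGIONS;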
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
|