The memblock wrapper functions are marked both inline and __init, and the
__init annotation prevents them from actually being inlined, so the debug
output records the wrapper instead of the real caller.
Before:
[ 0.000000] memblock_alloc_try_nid: 1490 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 memblock_alloc+0x20/0x2c
[ 0.000000] memblock_reserve: [0x000000023f09a3c0-0x000000023f09a991] memblock_alloc_range_nid+0xc0/0x188
Drop __init from the memblock wrapper functions so that they are inlined,
which improves debuggability.
After:
[ 0.000000] memblock_alloc_try_nid: 1490 bytes align=0x40 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 start_kernel+0xa4/0x568
[ 0.000000] memblock_reserve: [0x000000023f09a3c0-0x000000023f09a991] memblock_alloc_range_nid+0xc0/0x188
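For reference, the wrappers in question look roughly like this
(illustrative sketch based on include/linux/memblock.h, not part of the
patch itself):

static inline void *memblock_alloc(phys_addr_t size, phys_addr_t align)
{
	/* with __init here, the .init.text placement defeats inlining */
	return memblock_alloc_try_nid(size, align, MEMBLOCK_LOW_LIMIT,
				      MEMBLOCK_ALLOC_ACCESSIBLE, NUMA_NO_NODE);
}

Once the wrapper is inlined, the _RET_IP_ recorded by the memblock debug
prints points at the real call site (start_kernel above) instead of the
wrapper.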
Signed-off-by: Faiyaz Mohammed <[email protected]>
Signed-off-by: Mike Rapoport <[email protected]>
|
|
for_each_memblock() is used to iterate over memblock.memory in a few
places that use data from memblock_region rather than the memory ranges.
Introduce separate for_each_mem_region() and
for_each_reserved_mem_region() to improve encapsulation of memblock
internals from its users.
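The new iterators are plain loops over the region arrays; roughly
(sketch, assuming the in-tree definitions):

#define for_each_mem_region(region)					\
	for (region = memblock.memory.regions;				\
	     region < (memblock.memory.regions + memblock.memory.cnt);	\
	     region++)

#define for_each_reserved_mem_region(region)				\
	for (region = memblock.reserved.regions;			\
	     region < (memblock.reserved.regions + memblock.reserved.cnt); \
	     region++)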
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Acked-by: Ingo Molnar <[email protected]> [x86]
Acked-by: Thomas Bogendoerfer <[email protected]> [MIPS]
Acked-by: Miguel Ojeda <[email protected]> [.clang-format]
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Emil Renner Berthing <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Iteration over memblock.reserved with for_each_reserved_mem_region() used
__next_reserved_mem_region() that implemented a subset of
__next_mem_region().
Use __for_each_mem_range() and, essentially, __next_mem_region() with
appropriate parameters to reduce code duplication.
While at it, rename for_each_reserved_mem_region() to
for_each_reserved_mem_range() for consistency.
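With the shared implementation, the renamed iterator becomes a thin
wrapper; roughly (sketch of the expected definition):

#define for_each_reserved_mem_range(i, p_start, p_end)			\
	__for_each_mem_range(i, &memblock.reserved, NULL, NUMA_NO_NODE,	\
			     MEMBLOCK_NONE, p_start, p_end, NULL)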
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Miguel Ojeda <[email protected]> [.clang-format]
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Emil Renner Berthing <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The only user of memblock_mem_size() was the x86 setup code; it is gone
now, so the memblock_mem_size() function can be removed.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Emil Renner Berthing <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently the for_each_mem_range() and for_each_mem_range_rev() iterators
are the most generic way to traverse memblock regions. As such, they have
eight parameters and are hardly convenient to use. Most users choose to
utilize one of their wrappers, and the only user that actually needs most
of the parameters is memblock itself.
To avoid yet another naming for memblock iterators, rename the existing
for_each_mem_range[_rev]() to __for_each_mem_range[_rev]() and add new
for_each_mem_range[_rev]() wrappers with only index, start and end
parameters.
The new wrapper nicely fits into init_unavailable_mem() and will be used
in upcoming changes to simplify memblock traversals.
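Typical usage of the new wrapper (illustrative sketch):

	phys_addr_t start, end;
	u64 i;

	for_each_mem_range(i, &start, &end)
		pr_info("mem: [%pa-%pa]\n", &start, &end);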
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Thomas Bogendoerfer <[email protected]> [MIPS]
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Emil Renner Berthing <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The only user of memblock_dbg() outside memblock was the s390 setup code,
and it has been converted to use pr_debug() instead. This makes it
possible to stop exposing memblock_debug and memblock_dbg() to the rest of
the kernel.
[[email protected]: make memblock_dbg() safer and neater]
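The macro kept inside memblock then becomes self-contained; roughly
(sketch of the amended definition):

#define memblock_dbg(fmt, ...)						\
	do {								\
		if (memblock_debug)					\
			pr_info(fmt, ##__VA_ARGS__);			\
	} while (0)

The do/while wrapper is the "safer and neater" part: the macro behaves
as a single statement, e.g. in an unbraced if/else.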
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Emil Renner Berthing <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
for_each_memblock_type() is not used outside mm/memblock.c; move it there
from include/linux/memblock.h.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Emil Renner Berthing <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
"physmem" in the memblock allocator is somewhat weird: it's not actually
used for allocation, it's simply information collected during boot, which
describes the unmodified physical memory map at boot time, without any
standby/hotplugged memory. It's only used on s390 and is currently the
only reason s390 keeps using CONFIG_ARCH_KEEP_MEMBLOCK.
Physmem isn't NUMA-aware and current users don't specify any flags. Let's
hide it from the user, exposing only for_each_physmem(), and simplify. The
interface for physmem is now really minimalistic:
- memblock_physmem_add() to add ranges
- for_each_physmem() / __next_physmem_range() to walk physmem ranges
Don't place it into an __init section and don't discard it without
CONFIG_ARCH_KEEP_MEMBLOCK. As we're reusing __next_mem_range(), remove
the __meminit annotation to avoid section mismatch warnings once
CONFIG_ARCH_KEEP_MEMBLOCK is no longer used with
CONFIG_HAVE_MEMBLOCK_PHYS_MAP.
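Walking the map then looks roughly like this (sketch; the full iterator
name and the NULL argument selecting memblock.physmem are assumptions
based on the description above):

	u64 i;
	phys_addr_t start, end;

	for_each_physmem_range(i, NULL, &start, &end)
		pr_info("boot memory: [%pa-%pa]\n", &start, &end);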
While fixing up the documentation, sneak in some related cleanups. We can
stop setting CONFIG_ARCH_KEEP_MEMBLOCK for s390 next.
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
|
|
Fix a minor typo ("usabe" -> "usable") in the description of the member
variable "memory" in struct memblock.
In addition, the member variable "base" in struct memblock_type is
currently described as the physical address of the memory region; since
the variable is declared as phys_addr_t, describing it as the base address
of the region is clearer.
Signed-off-by: chenqiwu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Using padata during deferred init has only been tested on x86, so for now
limit it to this architecture.
If another arch wants this, it can find the max thread limit that's best
for it and override deferred_page_init_max_threads().
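A sketch of the override mechanism (modelled on a weak default plus the
x86 variant; treat as illustrative):

/* mm/page_alloc.c: conservative default for all architectures */
__weak int __init
deferred_page_init_max_threads(const struct cpumask *node_cpumask)
{
	return 1;
}

/* arch/x86/mm/init_64.c: x86 can safely use all CPUs of the node */
int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask)
{
	return max_t(int, cpumask_weight(node_cpumask), 1);
}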
Signed-off-by: Daniel Jordan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Josh Triplett <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Robert Elliott <[email protected]>
Cc: Shile Zhang <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate initialization of
nodes and zones structures between the systems that have region to node
mapping in memblock and those that don't.
Currently all the NUMA architectures enable this option and for the
non-NUMA systems we can presume that all the memory belongs to node 0 and
therefore the compile time configuration option is not required.
The remaining few architectures that use DISCONTIGMEM without NUMA are
easily updated to use memblock_add_node() instead of memblock_add() and
thus have proper correspondence of memblock regions to NUMA nodes.
Still, free_area_init_node() must have a backward compatible version
because its semantics with and without CONFIG_HAVE_MEMBLOCK_NODE_MAP are
different. Once all architectures use the new semantics, the entire
compatibility layer can be dropped.
To avoid adding extra runtime memory to store the node id for
architectures that keep memblock but have only a single node, the node id
field of the memblock_region is guarded by CONFIG_NEED_MULTIPLE_NODES and
the corresponding accessors presume that in those cases it is always 0.
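The guarded field looks roughly like this (sketch of the resulting
struct):

struct memblock_region {
	phys_addr_t base;
	phys_addr_t size;
	enum memblock_flags flags;
#ifdef CONFIG_NEED_MULTIPLE_NODES
	int nid;	/* omitted on single-node configurations */
#endif
};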
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Hoan Tran <[email protected]> [arm64]
Acked-by: Catalin Marinas <[email protected]> [arm64]
Cc: Baoquan He <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I've noticed that there is no interface exposed by CMA which would let me
declare contiguous memory on a particular NUMA node.
This patchset adds the ability to try to allocate contiguous memory on a
specific node. It will fall back to other nodes if the specified one
doesn't work.
Implement a new method for declaring contiguous memory on a particular
node and keep cma_declare_contiguous() as a wrapper.
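The wrapper simply forwards to the NUMA-aware variant with no node
preference; roughly (sketch):

static inline int __init cma_declare_contiguous(phys_addr_t base,
			phys_addr_t size, phys_addr_t limit,
			phys_addr_t alignment, unsigned int order_per_bit,
			bool fixed, const char *name, struct cma **res_cma)
{
	return cma_declare_contiguous_nid(base, size, limit, alignment,
			order_per_bit, fixed, name, res_cma, NUMA_NO_NODE);
}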
[[email protected]: build fix]
Signed-off-by: Aslan Bakirov <[email protected]>
Signed-off-by: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Andreas Schaufler <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
On the s390 platform, the memblock.physmem array is built by calling
directly into memblock_add_range(), which is a low level function not
intended to be used outside of memblock. Hence, let's conditionally add
helper functions for the physmem array when HAVE_MEMBLOCK_PHYS_MAP is
enabled. Also use MAX_NUMNODES instead of 0 as the node ID, similar to
memblock_add() and memblock_reserve(). Make memblock_add_range() a
static function as it is no longer used outside of memblock.
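The helper is a thin, conditionally built wrapper; roughly (sketch, debug
print elided):

#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
int __init_memblock memblock_physmem_add(phys_addr_t base, phys_addr_t size)
{
	/* physmem entries carry no NUMA information, hence MAX_NUMNODES */
	return memblock_add_range(&memblock.physmem, base, size,
				  MAX_NUMNODES, 0);
}
#endif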
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Acked-by: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Collin Walling <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Philipp Rudo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
sparse_buffer_init() uses memblock_alloc_try_nid_raw() to allocate memory
for the page management structures; if the allocation from the specified
node fails, it falls back to allocating from other nodes.
Normally, the page management structures will not exceed 2% of the total
memory, but they require a large contiguous allocation. In most cases,
allocation from the specified node will succeed, but it will fail on a
node whose memory has become highly fragmented. In that case we would
rather allocate section by section on the correct node than take a large
block of memory from another NUMA node.
Add memblock_alloc_exact_nid_raw() for this situation: it allocates a
boot memory block strictly on the requested node. If the large contiguous
allocation in sparse_buffer_init() fails, it can then fall back to small
per-section allocations.
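The new API mirrors memblock_alloc_try_nid_raw() but forbids the node
fallback; roughly (sketch):

void * __init memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
					   phys_addr_t min_addr,
					   phys_addr_t max_addr, int nid)
{
	/* the trailing 'true' requests an exact-node allocation */
	return memblock_alloc_internal(size, align, min_addr, max_addr,
				       nid, true);
}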
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yunfeng Ye <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Allison Randal <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Most architectures do not need the memblock memory after the page
allocator is initialized, but only a few enable ARCH_DISCARD_MEMBLOCK in
the arch Kconfig.
Replacing ARCH_DISCARD_MEMBLOCK with ARCH_KEEP_MEMBLOCK and inverting the
logic makes it clear which architectures actually use memblock after
system initialization, and avoids the need to add ARCH_DISCARD_MEMBLOCK to
the architectures that are still missing that option.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Eric Biederman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add yet another iterator, for_each_free_mem_pfn_range_in_zone_from, and
then use it to support initializing and freeing pages in groups no larger
than MAX_ORDER_NR_PAGES. By doing this we can greatly improve the cache
locality of the pages while we do several loops over them in the init and
freeing process.
We are able to tighten the loops further with the "from" iterator: the
initial checks for first_init_pfn are performed once, in the first call to
the iterator, and subsequent iterations continue without repeating them.
I have added this functionality in deferred_init_mem_pfn_range_in_zone(),
which primes the iterator and causes us to exit if we encounter any
failure.
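Usage then follows a prime-and-continue pattern; roughly (illustrative
sketch):

	u64 i;
	unsigned long spfn, epfn;

	/* prime the iterator: find the first free range at/above first_init_pfn */
	if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
						 first_init_pfn))
		return 0;	/* nothing to initialize in this zone */

	/* continue from where priming left off; no repeated start checks */
	for_each_free_mem_pfn_range_in_zone_from(i, zone, &spfn, &epfn) {
		/* init and free pages in chunks of at most MAX_ORDER_NR_PAGES */
	}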
On my x86_64 test system with 384GB of memory per node I saw a reduction
in initialization time from 1.85s to 1.38s as a result of this patch.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Duyck <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Introduce a new iterator, for_each_free_mem_pfn_range_in_zone.
This iterator makes sure a given memory range is in fact contained within
a zone. It takes care of all the bounds checking we were doing in
deferred_grow_zone and deferred_init_memmap. In addition, it should help
to speed up the search a bit by iterating until the end of a range is
greater than the start of the zone pfn range, and it will exit completely
if the start is beyond the end of the zone.
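The iterator is built on a zone-aware next-range helper; its shape is
roughly (sketch):

#define for_each_free_mem_pfn_range_in_zone(i, zone, p_start, p_end)	\
	for (i = 0,							\
	     __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end);	\
	     i != U64_MAX;						\
	     __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end))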
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Duyck <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The memblock API provides dedicated helpers to set or clear a flag on a
memory region, e.g. memblock_{mark,clear}_hotplug().
The memblock_{set,clear}_region_flags() functions are used only by the
memblock internal function that adjusts the region flags. Drop these
functions and use an open-coded implementation instead.
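The open-coded replacement inside memblock_setclr_flag() is just a flag
toggle; roughly (sketch):

	struct memblock_region *r = &type->regions[i];

	if (set)
		r->flags |= flag;
	else
		r->flags &= ~flag;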
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As all the memblock allocation functions return NULL in case of error
rather than panic(), the duplicates with _nopanic suffix can be removed.
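Since the base functions no longer panic, callers that cannot recover
from an allocation failure check the result explicitly; the common idiom
(illustrative sketch):

	ptr = memblock_alloc(size, SMP_CACHE_BYTES);
	if (!ptr)
		panic("%s: Failed to allocate %zu bytes\n", __func__, size);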
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Petr Mladek <[email protected]> [printk]
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Guo Ren <[email protected]> [c-sky]
Cc: Heiko Carstens <[email protected]>
Cc: Juergen Gross <[email protected]> [Xen]
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
These functions are not used outside memblock. Make them static.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Guo Ren <[email protected]> [c-sky]
Cc: Heiko Carstens <[email protected]>
Cc: Juergen Gross <[email protected]> [Xen]
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, memblock has several internal functions with overlapping
functionality. They all call memblock_find_in_range_node() to find free
memory and then reserve the allocated range and mark it with kmemleak.
However, there are differences in the allocation constraints and in the
fallback strategies.
The allocations returning a physical address first attempt to find free
memory on the specified node within mirrored memory regions, then retry
on the same node without the requirement for memory mirroring, and
finally fall back to all available memory.
The allocations returning a virtual address start with clamping the
allowed range to memblock.current_limit, then attempt to allocate from
the specified node from regions with mirroring and with a user defined
minimal address. If such an allocation fails, the next attempt is done
with the node restriction lifted. Next, the allocation is retried with
the minimal address reset to zero, and finally without the requirement
for mirrored regions.
Let's consolidate the various fallback handling and make it more
consistent for the physical and virtual variants. Most of the fallback
handling is moved to memblock_alloc_range_nid(), which now handles node
and mirror fallbacks.
memblock_alloc_internal() uses memblock_alloc_range_nid() to get the
physical address of the allocated range and converts it to a virtual
address.
The fallback for allocation below the specified minimal address remains
in memblock_alloc_internal() because memblock_alloc_range_nid() is used
by CMA with exact requirement for lower bounds.
The memblock_phys_alloc_nid() function is completely dropped as it is not
used anywhere outside memblock and its only usage can be replaced by a
call to memblock_alloc_range_nid().
[[email protected]: fix parameter order in memblock_phys_alloc_try_nid()]
Link: http://lkml.kernel.org/r/20190203113915.GC8620@rapoport-lnx
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Tested-by: Michael Ellerman <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Guo Ren <[email protected]> [c-sky]
Cc: Heiko Carstens <[email protected]>
Cc: Juergen Gross <[email protected]> [Xen]
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The memblock_alloc_base() function tries to allocate memory up to the
limit specified by its max_addr parameter and panics if the allocation
fails.
fails. Replace its usage with memblock_phys_alloc_range() and make the
callers check the return value and panic in case of error.
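A converted call site then looks roughly like this (illustrative sketch):

	paddr = memblock_phys_alloc_range(size, align, 0, max_addr);
	if (!paddr)
		panic("Failed to allocate %pa bytes below %pa\n",
		      &size, &max_addr);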
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Michael Ellerman <[email protected]> [powerpc]
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Guo Ren <[email protected]> [c-sky]
Cc: Heiko Carstens <[email protected]>
Cc: Juergen Gross <[email protected]> [Xen]
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The __memblock_alloc_base() function tries to allocate memory up to the
limit specified by its max_addr parameter. Depending on the value of this
parameter, __memblock_alloc_base() can be replaced with the appropriate
memblock_phys_alloc*() variant.
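The replacement follows the max_addr value; roughly, in the spirit of the
semantic patches used elsewhere in this series (illustrative sketch):

/* no real limit */
- paddr = __memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
+ paddr = memblock_phys_alloc(size, align);

/* explicit upper limit */
- paddr = __memblock_alloc_base(size, align, max_addr);
+ paddr = memblock_phys_alloc_range(size, align, 0, max_addr);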
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Rob Herring <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Guo Ren <[email protected]> [c-sky]
Cc: Heiko Carstens <[email protected]>
Cc: Juergen Gross <[email protected]> [Xen]
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Make the memblock_phys_alloc() function an inline wrapper for
memblock_phys_alloc_range() and update the memblock_phys_alloc() callers
to check the returned value and panic in case of error.
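The wrapper is expected to look roughly like this (sketch):

static inline phys_addr_t memblock_phys_alloc(phys_addr_t size,
					      phys_addr_t align)
{
	return memblock_phys_alloc_range(size, align, 0,
					 MEMBLOCK_ALLOC_ACCESSIBLE);
}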
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Guo Ren <[email protected]> [c-sky]
Cc: Heiko Carstens <[email protected]>
Cc: Juergen Gross <[email protected]> [Xen]
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rename memblock_alloc_range() to memblock_phys_alloc_range() to
emphasize that it returns a physical address.
While at it, remove the 'enum memblock_flags' parameter from this
function, as its only user sets it to MEMBLOCK_NONE anyway, which is the
default for most memblock allocations.
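The resulting prototype change, roughly (sketch):

-phys_addr_t memblock_alloc_range(phys_addr_t size, phys_addr_t align,
-				 phys_addr_t start, phys_addr_t end,
-				 enum memblock_flags flags);
+phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
+				       phys_addr_t start, phys_addr_t end);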
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Guo Ren <[email protected]> [c-sky]
Cc: Heiko Carstens <[email protected]>
Cc: Juergen Gross <[email protected]> [Xen]
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
memblock_alloc_base_nid() is a one-line wrapper for
memblock_alloc_range_nid() without any side effects.
Replace its usage with direct calls to memblock_alloc_range_nid().
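A converted caller changes roughly like this (sketch; the parameter order
follows the then-current memblock_alloc_range_nid() signature and should
be treated as illustrative):

- paddr = memblock_alloc_base_nid(size, align, max_addr, nid, MEMBLOCK_NONE);
+ paddr = memblock_alloc_range_nid(size, align, 0, max_addr, nid, MEMBLOCK_NONE);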
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Guo Ren <[email protected]> [c-sky]
Cc: Heiko Carstens <[email protected]>
Cc: Juergen Gross <[email protected]> [Xen]
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In the irqchip and EFI code, we have what basically amounts to a quirk
to work around a peculiarity in the GICv3 architecture, which permits
the system memory address of LPI tables to be programmable only once
after a CPU reset. This means kexec kernels must use the same memory
as the first kernel, and thus ensure that this memory has not been
given out for other purposes by the time the ITS init code runs, which
is not very early for secondary CPUs.
On systems with many CPUs, these reservations could overflow the
memblock reservation table, and this was addressed in commit:
eff896288872 ("efi/arm: Defer persistent reservations until after paging_init()")
However, this turns out to have made things worse, since the allocation
of page tables and heap space for the resized memblock reservation table
itself may overwrite the regions we are attempting to reserve, which may
cause all kinds of corruption, also considering that the ITS will still
be poking bits into that memory in response to incoming MSIs.
So instead, let's grow the static memblock reservation table on such
systems so it can accommodate these reservations at an earlier time.
This will permit us to revert the above commit in a subsequent patch.
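The mechanism, roughly (sketch; the arm64 sizing formula is an assumption
based on the per-CPU LPI table reservations described above):

/* mm/memblock.c */
#ifndef INIT_MEMBLOCK_RESERVED_REGIONS
# define INIT_MEMBLOCK_RESERVED_REGIONS	INIT_MEMBLOCK_REGIONS
#endif

/* arch override, e.g. arm64 with EFI: one reservation per CPU, plus slack */
#define INIT_MEMBLOCK_RESERVED_REGIONS	(INIT_MEMBLOCK_REGIONS + NR_CPUS + 1)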
[ mingo: Minor cleanups. ]
Signed-off-by: Ard Biesheuvel <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Kmemleak does not play well with KASAN (tested on both HPE Apollo 70 and
Huawei TaiShan 2280 aarch64 servers).
After calling start_kernel()->setup_arch()->kasan_init(), the kmemleak
early log buffer went from something like 280 entries to 260000, which
caused kmemleak to be disabled and the crash dump memory reservation to
fail. The multitude of kmemleak_alloc() calls comes from nested loops
while KASAN is setting up full memory mappings, so let early kmemleak
allocations skip the memblock_alloc_internal() calls that come from
kasan_init(), given that those early KASAN memory mappings should not
reference other memory. Hence, no kmemleak false positives.
kasan_init
kasan_map_populate [1]
kasan_pgd_populate [2]
kasan_pud_populate [3]
kasan_pmd_populate [4]
kasan_pte_populate [5]
kasan_alloc_zeroed_page
memblock_alloc_try_nid
memblock_alloc_internal
kmemleak_alloc
[1] for_each_memblock(memory, reg)
[2] while (pgdp++, addr = next, addr != end)
[3] while (pudp++, addr = next, addr != end && pud_none(READ_ONCE(*pudp)))
[4] while (pmdp++, addr = next, addr != end && pmd_none(READ_ONCE(*pmdp)))
[5] while (ptep++, addr = next, addr != end && pte_none(READ_ONCE(*ptep)))
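The fix can be sketched as a sentinel max_addr value that kasan_init()
passes down so memblock_alloc_internal() skips the kmemleak registration
(illustrative; the MEMBLOCK_ALLOC_KASAN name is an assumption, not quoted
from the patch):

/* include/linux/memblock.h */
#define MEMBLOCK_ALLOC_ACCESSIBLE	0
#define MEMBLOCK_ALLOC_KASAN		1

/* mm/memblock.c, in memblock_alloc_internal() */
	if (max_addr != MEMBLOCK_ALLOC_KASAN)
		kmemleak_alloc(ptr, size, 0, 0);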
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Qian Cai <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
__memblock_free_early() is only used by the convenience wrappers, so
essentially we wrap a call to memblock_free() twice. Replace calls of
__memblock_free_early() with calls to memblock_free() and drop the former.
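After the change, the convenience wrappers are expected to call
memblock_free() directly; roughly (sketch):

static inline void memblock_free_early(phys_addr_t base, phys_addr_t size)
{
	memblock_free(base, size);
}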
Link: http://lkml.kernel.org/r/20181125102940.GE28634@rapoport-lnx
Signed-off-by: Mike Rapoport <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Wentao Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When the memblock allocation APIs are called with align = 0, the
alignment is implicitly set to SMP_CACHE_BYTES.
Implicit alignment is done deep in the memblock allocator and can come as
a surprise. Not that such an alignment would be wrong even when used
incorrectly, but it is better to be explicit for the sake of clarity and
the principle of least surprise.
Replace all such uses of memblock APIs with the 'align' parameter
explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
in the memblock internal allocation functions.
For the case when memblock APIs are used via helper functions, e.g. like
iommu_arena_new_node() in Alpha, the helper functions were detected with
Coccinelle's help and then manually examined and updated where
appropriate.
The direct memblock APIs users were updated using the semantic patch below:
@@
expression size, min_addr, max_addr, nid;
@@
(
|
- memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
|
- memblock_alloc(size, 0)
+ memblock_alloc(size, SMP_CACHE_BYTES)
|
- memblock_alloc_raw(size, 0)
+ memblock_alloc_raw(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from(size, 0, min_addr)
+ memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_nopanic(size, 0)
+ memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low(size, 0)
+ memblock_alloc_low(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low_nopanic(size, 0)
+ memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from_nopanic(size, 0, min_addr)
+ memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_node(size, 0, nid)
+ memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
)
[[email protected]: changelog update]
[[email protected]: coding-style fixes]
[[email protected]: fix missed uses of implicit alignment]
Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Paul Burton <[email protected]> [MIPS]
Acked-by: Michael Ellerman <[email protected]> [powerpc]
Acked-by: Michal Hocko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.
The includes were replaced with the semantic patch below, followed by
semi-automated removal of duplicated '#include <linux/memblock.h>' lines.
@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>
[[email protected]: dma-direct: fix up for the removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: powerpc: fix up for removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Serge Semin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Make it explicit that the caller gets a physical address rather than a
virtual one.
This will also allow using the memblock_alloc prefix for memblock
allocations returning a virtual address, which is done in the following
patches.
The conversion is done using the following semantic patch:
@@
expression e1, e2, e3;
@@
(
- memblock_alloc(e1, e2)
+ memblock_phys_alloc(e1, e2)
|
- memblock_alloc_nid(e1, e2, e3)
+ memblock_phys_alloc_nid(e1, e2, e3)
|
- memblock_alloc_try_nid(e1, e2, e3)
+ memblock_phys_alloc_try_nid(e1, e2, e3)
)
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Serge Semin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
All architectures use memblock for early memory management. There is no
need for the CONFIG_HAVE_MEMBLOCK configuration option.
[[email protected]: of/fdt: fixup #ifdefs]
Link: http://lkml.kernel.org/r/20180919103457.GA20545@rapoport-lnx
[[email protected]: csky: fixups after bootmem removal]
Link: http://lkml.kernel.org/r/20180926112744.GC4628@rapoport-lnx
[[email protected]: remove stale #else and the code it protects]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Serge Semin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: Fix for movable_node boot option", v3.
This patch series contains a fix for the movable_node boot option issue
which was introduced by commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM
regions into memblock.reserved").
The commit breaks the option because it changed the memory gap range to
reserved memblock. So, the node is marked as Normal zone even if the SRAT
has Hot pluggable affinity.
The first and second patches fix the original issue that the commit tried
to address; the third reverts the commit.
This patch (of 3):
There is a kernel panic that is triggered when reading /proc/kpageflags on
the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':
BUG: unable to handle kernel paging request at fffffffffffffffe
PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
RIP: 0010:stable_page_flags+0x27/0x3c0
Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
FS: 00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
Call Trace:
kpageflags_read+0xc7/0x120
proc_reg_read+0x3c/0x60
__vfs_read+0x36/0x170
vfs_read+0x89/0x130
ksys_pread64+0x71/0x90
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7efc42e75e23
Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24
According to kernel bisection, this problem became visible due to commit
f7f99100d8d9 which changes how struct pages are initialized.
Memblock layout affects the pfn ranges covered by node/zone. Consider
that we have a VM with 2 NUMA nodes and each node has 4GB memory, and the
default (no memmap= given) memblock layout is like below:
MEMBLOCK configuration:
memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
memory.cnt = 0x4
memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
memory[0x2] [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
memory[0x3] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
...
If you give memmap=1G!4G (so it just covers memory[0x2]),
the range [0x100000000-0x13fffffff] is gone:
MEMBLOCK configuration:
memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
memory.cnt = 0x3
memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
memory[0x2] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
...
This shrinks node 0's pfn range because it is calculated from the address
range of memblock.memory, so some of the struct pages in the gap range are
left uninitialized.
We have a function zero_resv_unavail() which zeroes the struct pages
outside memblock.memory, but currently it covers only the reserved
unavailable range (i.e. memblock.memory && !memblock.reserved). This
patch extends it to cover all unavailable ranges, which fixes the reported
issue.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Masayoshi Mizuma <[email protected]>
Tested-by: Oscar Salvador <[email protected]>
Tested-by: Masayoshi Mizuma <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
* make memblock_discard description kernel-doc compatible
* add brief description for memblock_setclr_flag and describe its
parameters
* fixup return value descriptions
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
Since kernel-doc does not like anonymous enums, a name is required for
adding documentation. While at it, I've also updated all the function
declarations to use 'enum memblock_flags' instead of unsigned long.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- Support for 4PB user address space on 64-bit, opt-in via mmap().
- Removal of POWER4 support, which was accidentally broken in 2016
and no one noticed, and blocked use of some modern instructions.
- Workarounds so that the hypervisor can enable Transactional Memory
on Power9.
- A series to disable the DAWR (Data Address Watchpoint Register) on
Power9.
- More information displayed in the meltdown/spectre_v1/v2 sysfs
files.
- A vpermxor (Power8 Altivec) implementation for the raid6 Q
Syndrome.
- A big series to make the allocation of our pacas (per cpu area),
kernel page tables, and per-cpu stacks NUMA aware when using the
Radix MMU on Power9.
And as usual many fixes, reworks and cleanups.
Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy,
Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual,
Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic
Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook,
Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe,
Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring,
Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira,
Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras,
Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher
Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev
Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav
Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun"
* tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits)
powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
powerpc/64s: Fix POWER9 DD2.2 and above in cputable features
powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit
powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits
Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"
powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}
powerpc: io.h: move iomap.h include so that it can use readq/writeq defs
cxl: Fix possible deadlock when processing page faults from cxllib
powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it
powerpc/mm/radix: Update command line parsing for disable_radix
powerpc/mm/radix: Parse disable_radix commandline correctly.
powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb
powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix
powerpc/mm/keys: Update documentation and remove unnecessary check
powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead
powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()
powerpc/powernv: Always stop secondaries before reboot/shutdown
powerpc: hard disable irqs in smp_send_stop loop
powerpc: use NMI IPI for smp_send_stop
powerpc/powernv: Fix SMT4 forcing idle code
...
|
|
Deferred page initialization allows the boot cpu to initialize a small
subset of the system's pages early in boot, with other cpus doing the
rest later on.
It is, however, problematic to know how many pages the kernel needs
during boot. Different modules and kernel parameters may change the
requirement, so the boot cpu either initializes too many pages or runs
out of memory.
To fix that, initialize early pages on demand. This ensures the kernel
does the minimum amount of work to initialize pages during boot and
leaves the rest to be divided in the multithreaded initialization path
(deferred_init_memmap).
The on-demand code is permanently disabled using static branching once
deferred pages are initialized. After the static branch is changed to
false, the overhead is up to two branch-always instructions if the zone
watermark check fails or if rmqueue fails.
Sergey Senozhatsky noticed that while deferred pages currently make
sense only on NUMA machines (we start one thread per latency node),
CONFIG_NUMA is not a requirement for CONFIG_DEFERRED_STRUCT_PAGE_INIT,
so that must also be addressed in the patch.
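A minimal sketch of the static-branch pattern described above (assuming
the kernel's static key API; _deferred_grow_zone stands in for the real
on-demand initialization path):
  /* Sketch: the on-demand path is guarded by a static key that is
   * flipped off once deferred init completes, making the check
   * nearly free afterwards. */
  static DEFINE_STATIC_KEY_TRUE(deferred_pages);

  static bool try_deferred_grow(struct zone *zone, unsigned int order)
  {
          if (static_branch_unlikely(&deferred_pages))
                  return _deferred_grow_zone(zone, order);
          return false;
  }

  /* Called once deferred_init_memmap() has finished on all nodes. */
  static void deferred_init_done(void)
  {
          static_branch_disable(&deferred_pages);
  }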
[[email protected]: fix typo in comment, make deferred_pages static]
[[email protected]: fix min() type mismatch warning]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: use zone_to_nid() in deferred_grow_zone()]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: might_sleep warning]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: s/spin_lock/spin_lock_irq/ in page_alloc_init_late()]
[[email protected]: v5]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: tweak comments]
[[email protected]: v6]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: coding-style fixes]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Reviewed-by: Steven Sistare <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Tested-by: Masayoshi Mizuma <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: AKASHI Takahiro <[email protected]>
Cc: Gioh Kim <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Yaowei Bai <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Miles Chen <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This will be used by powerpc to allocate per-cpu stacks and other
data structures node-local where possible.
Signed-off-by: Nicholas Piggin <[email protected]>
[mpe: Drop stray change to memblock_alloc_range() as noticed by akpm]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This reverts commit b92df1de5d28 ("mm: page_alloc: skip over regions of
invalid pfns where possible"). The commit was meant to speed up boot
init by skipping the loop in memmap_init_zone() for invalid pfns.
But given some specific memory mappings on x86_64 (or, more generally,
theoretically anywhere except on arm with CONFIG_HAVE_ARCH_PFN_VALID),
the implementation also skips valid pfns, which is plain wrong and causes
'kernel BUG at mm/page_alloc.c:1389!'
crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
kernel BUG at mm/page_alloc.c:1389!
invalid opcode: 0000 [#1] SMP
--
RIP: 0010: move_freepages+0x15e/0x160
--
Call Trace:
move_freepages_block+0x73/0x80
__rmqueue+0x263/0x460
get_page_from_freelist+0x7e1/0x9e0
__alloc_pages_nodemask+0x176/0x420
--
crash> page_init_bug -v | grep RAM
<struct resource 0xffff88067fffd2f8> 1000 - 9bfff System RAM (620.00 KiB)
<struct resource 0xffff88067fffd3a0> 100000 - 430bffff System RAM ( 1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
<struct resource 0xffff88067fffd410> 4b0c8000 - 4bf9cfff System RAM ( 14.83 MiB = 15188.00 KiB)
<struct resource 0xffff88067fffd480> 4bfac000 - 646b1fff System RAM (391.02 MiB = 400408.00 KiB)
<struct resource 0xffff88067fffd560> 7b788000 - 7b7fffff System RAM (480.00 KiB)
<struct resource 0xffff88067fffd640> 100000000 - 67fffffff System RAM ( 22.00 GiB)
crash> page_init_bug | head -6
<struct resource 0xffff88067fffd560> 7b788000 - 7b7fffff System RAM (480.00 KiB)
<struct page 0xffffea0001ede200> 1fffff00000000 0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32 4096 1048575
<struct page 0xffffea0001ede200> 505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
<struct page 0xffffea0001ed8000> 0 0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA 1 4095
<struct page 0xffffea0001edffc0> 1fffff00000400 0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32 4096 1048575
BUG, zones differ!
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001ed7fc0 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 0 0 <<<<
ffffea0001ede1c0 7b787000 0 0 0 0
ffffea0001ede200 7b788000 0 0 1 1fffff00000000
Link: http://lkml.kernel.org/r/[email protected]
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Make memblock_is_map/region_memory return bool, since these two
functions only use true or false as their return values.
No functional change.
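For reference, the resulting prototypes should look roughly like:
  /* Prototypes after the change (previously returned int): */
  bool memblock_is_map_memory(phys_addr_t addr);
  bool memblock_is_region_memory(phys_addr_t base, phys_addr_t size);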
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yaowei Bai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Some memory is reserved but unavailable: not present in memblock.memory
(because not backed by physical pages), but present in memblock.reserved.
Such memory has backing struct pages, but they are not initialized by
going through __init_single_page().
In some cases these struct pages are accessed even if they do not
contain any data. One example is that page_to_pfn() might access
page->flags if that is where the section information is stored
(CONFIG_SPARSEMEM, SECTION_IN_PAGE_FLAGS).
One example of such memory: trim_low_memory_range() unconditionally
reserves from pfn 0, but e820__memblock_setup() might provide the
existing memory from pfn 1 (e.g. on KVM).
Since struct pages are zeroed in __init_single_page(), and not during
allocation time, we must zero such struct pages explicitly.
The patch adds a new memblock iterator:
for_each_resv_unavail_range(i, p_start, p_end)
which iterates through the reserved && !memory ranges, and we zero the
struct pages explicitly by calling mm_zero_struct_page().
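A sketch of the iterator in use (close to, but not exactly, the patch):
  /* Sketch: walk the reserved && !memory ranges and zero their
   * struct pages explicitly. */
  static void __init zero_resv_unavail(void)
  {
          phys_addr_t start, end;
          unsigned long pfn;
          u64 i;

          for_each_resv_unavail_range(i, &start, &end)
                  for (pfn = PFN_DOWN(start); pfn < PFN_UP(end); pfn++)
                          mm_zero_struct_page(pfn_to_page(pfn));
  }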
===
Here is a more detailed example of the problem that this patch is
addressing. The test was run on qemu with the following arguments:
-enable-kvm -cpu kvm64 -m 512 -smp 2
This patch reports that there are 98 unavailable pages.
They are: pfn 0 and pfns in range [159, 255].
Note that trim_low_memory_range() reserves only pfns in the range
[0, 15]; it does not reserve the [159, 255] ones.
e820__memblock_setup() reports to Linux that the following physical
ranges are available:
[1 , 158]
[256, 130783]
Notice that exactly the unavailable pfns are missing!
Now, let's check what we have in zone 0: [1, 131039]
pfn 0 is not part of the zone, but pfns [1, 158] are.
However, the bigger problem with not initializing these struct pages is
memory hotplug, because that path operates at 2M boundaries (section_nr)
and checks whether a 2M range of pages is hot-removable. It starts with
the first pfn of a zone and rounds it down to a 2M boundary (struct
pages are allocated at 2M boundaries when the vmemmap is created), then
checks whether that section is hot-removable. In this case it starts
with pfn 1 and rounds it down to pfn 0. Later the pfn is converted to a
struct page, and some of its fields are checked. Now, if we do not zero
the struct pages, we get unpredictable results.
In fact, when CONFIG_DEBUG_VM is enabled and we explicitly set all
vmemmap memory to ones, the following panic is observed in a kernel test
without this patch applied:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: is_pageblock_removable_nolock+0x35/0x90
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT
...
task: ffff88001f4e2900 task.stack: ffffc90000314000
RIP: 0010:is_pageblock_removable_nolock+0x35/0x90
Call Trace:
? is_mem_section_removable+0x5a/0xd0
show_mem_removable+0x6b/0xa0
dev_attr_show+0x1b/0x50
sysfs_kf_seq_show+0xa1/0x100
kernfs_seq_show+0x22/0x30
seq_read+0x1ac/0x3a0
kernfs_fop_read+0x36/0x190
? security_file_permission+0x90/0xb0
__vfs_read+0x16/0x30
vfs_read+0x81/0x130
SyS_read+0x44/0xa0
entry_SYSCALL_64_fastpath+0x1f/0xbd
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Steven Sistare <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Reviewed-by: Bob Picco <[email protected]>
Tested-by: Bob Picco <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The for_each_memblock_type macro relies on an idx variable defined in
the caller's context. Silent macro arguments are almost always the
wrong thing to do; they make code harder to read and easier to get
wrong. Let's use an explicit iterator parameter for
for_each_memblock_type and make the code more obvious. This patch is a
mere cleanup and shouldn't introduce any functional change.
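Roughly, the reworked macro takes the iterator explicitly:
  /* Sketch of the reworked macro: idx is an explicit parameter
   * instead of being picked up silently from the caller's scope. */
  #define for_each_memblock_type(i, memblock_type, rgn)          \
          for (i = 0, rgn = &memblock_type->regions[0];          \
               i < memblock_type->cnt;                           \
               i++, rgn = &memblock_type->regions[i])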
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Gioh Kim <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is an existing use-after-free bug when deferred struct pages are
enabled:
memblock_add() allocates memory for the memory array if more than
128 entries are needed. See the comment in e820__memblock_setup():
* The bootstrap memblock region count maximum is 128 entries
* (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
* than that - so allow memblock resizing.
This memblock memory is freed here:
free_low_memory_core_early()
We access the freed memblock.memory later in boot when deferred pages
are initialized in this path:
deferred_init_memmap()
for_each_mem_pfn_range()
__next_mem_pfn_range()
type = &memblock.memory;
One possible explanation for why this use-after-free hasn't been hit
before is that the limit of INIT_MEMBLOCK_REGIONS has never been
exceeded, at least on systems where deferred struct pages were enabled.
Tested by reducing INIT_MEMBLOCK_REGIONS from the current 128 down to 4,
and verifying in qemu that this code is getting executed and that the
freed pages are sane.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 7e18adb4f80b ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Steven Sistare <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Reviewed-by: Bob Picco <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
movable_node_is_enabled is defined in memblock proper while it is
initialized from the memory hotplug proper. This is quite messy and it
creates a dependency between the two, so move movable_node along with
the helper functions to memory_hotplug.
To make it more entertaining, the kernel parameter is ignored unless
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y, because we do not have the node
information for each memblock otherwise. So let's warn when the
parameter is used while that option is disabled.
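The moved pieces are small; a sketch of their shape (close to, but not
necessarily identical to, the actual code):
  /* Sketch: the flag and its helpers move to memory_hotplug. */
  extern bool movable_node_enabled;

  static inline bool movable_node_is_enabled(void)
  {
          return movable_node_enabled;
  }

  static int __init cmdline_parse_movable_node(char *p)
  {
  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
          movable_node_enabled = true;
  #else
          pr_warn("movable_node parameter depends on CONFIG_HAVE_MEMBLOCK_NODE_MAP to work properly\n");
  #endif
          return 0;
  }
  early_param("movable_node", cmdline_parse_movable_node);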
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Reza Arbab <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Xishi Qiu <[email protected]>
Cc: Kani Toshimitsu <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Daniel Kiper <[email protected]>
Cc: Igor Mammedov <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 20b2f52b73fe ("numa: add CONFIG_MOVABLE_NODE for
movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a
good explanation of why it is actually useful.
It makes a lot of sense to make the movable node semantic opt-in, but
we already have that, because the feature has to be explicitly enabled
on the kernel command line. A config option on top only makes the
configuration space larger without a good reason. It also adds
additional ifdefery that pollutes the code.
Just drop the config option and make it de-facto always enabled. This
shouldn't introduce any change to the semantic.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Reza Arbab <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Xishi Qiu <[email protected]>
Cc: Kani Toshimitsu <[email protected]>
Cc: Chen Yucong <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Daniel Kiper <[email protected]>
Cc: Igor Mammedov <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We have seen an early OOM killer invocation on ppc64 systems with
crashkernel=4096M:
kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
kthreadd cpuset=/ mems_allowed=7
CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
Call Trace:
dump_stack+0xb0/0xf0 (unreliable)
dump_header+0xb0/0x258
out_of_memory+0x5f0/0x640
__alloc_pages_nodemask+0xa8c/0xc80
kmem_getpages+0x84/0x1a0
fallback_alloc+0x2a4/0x320
kmem_cache_alloc_node+0xc0/0x2e0
copy_process.isra.25+0x260/0x1b30
_do_fork+0x94/0x470
kernel_thread+0x48/0x60
kthreadd+0x264/0x330
ret_from_kernel_thread+0x5c/0xa4
Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:5 slab_unreclaimable:73
mapped:0 shmem:0 pagetables:0 bounce:0
free:0 free_pcp:0 free_cma:0
Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
0 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
819200 pages RAM
0 pages HighMem/MovableOnly
817481 pages reserved
0 pages cma reserved
0 pages hwpoisoned
The reason is that the managed memory is too low (only 110MB) while the
rest of the 50GB is still waiting for the deferred initialization to be
done. update_defer_init estimates the initial amount of memory to
initialize to be at least 2GB, but it doesn't consider any memory
allocated in that range. In this particular case we've had
Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)
so the low 2GB is mostly depleted.
Fix this by considering memblock allocations in the initial static
initialization estimation. Move max_initialise to
reset_deferred_meminit and implement a simple memblock_reserved_memory
helper which iterates over all reserved blocks and sums the sizes of
those that start below the given address. The cumulative size is then
added on top of the initial estimation. This is still not ideal,
because reset_deferred_meminit doesn't consider holes, so a reservation
might lie above the initial estimation, which we ignore; but let's keep
the logic simple until we really need to handle more complicated cases.
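A sketch of such a helper (the name follows the text above; the real
patch may differ in its exact parameters and bounds handling):
  /* Sketch: sum the sizes of all reserved regions that start
   * below the given address (bounds handling is illustrative). */
  static phys_addr_t memblock_reserved_memory(phys_addr_t limit)
  {
          struct memblock_region *r;
          phys_addr_t total = 0;

          for_each_memblock(reserved, r)
                  if (r->base < limit)
                          total += r->size;

          return total;
  }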
Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Tested-by: Srikar Dronamraju <[email protected]>
Cc: <[email protected]> [4.2+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add memblock_cap_memory_range(), which removes all memblock regions
except the memory range specified in the arguments. In addition,
memblock_mem_limit_remove_map() is reworked and re-implemented on top
of memblock_cap_memory_range().
This function, like memblock_mem_limit_remove_map(), will not remove
memblocks with the MEMBLOCK_NOMAP attribute, as they may be mapped and
accessed later as "device memory."
See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
address the mem limit issue").
This function is used, in a succeeding patch in the arm64 kdump support
series, to limit the range of usable memory, or System RAM, on the
crash dump kernel.
(Please note that the "mem=" parameter is of little use for this purpose.)
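For reference, a sketch of the intended use on the crash dump kernel
(the wrapper function and its call site are illustrative only):
  /* Sketch: keep only the usable System RAM range handed over for
   * the crash dump kernel; NOMAP regions are preserved. */
  static void __init cap_crash_kernel_memory(phys_addr_t base,
                                             phys_addr_t size)
  {
          memblock_cap_memory_range(base, size);
  }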
Signed-off-by: AKASHI Takahiro <[email protected]>
Reviewed-by: Will Deacon <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Dennis Chen <[email protected]>
Cc: [email protected]
Cc: Andrew Morton <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
|