Age | Commit message (Collapse) | Author | Files | Lines |
|
Allow sub-section sized ranges to be added to the memmap.
populate_section_memmap() takes an explict pfn range rather than
assuming a full section, and those parameters are plumbed all the way
through to vmmemap_populate(). There should be no sub-section usage in
current deployments. New warnings are added to clarify which memmap
allocation paths are sub-section capable.
Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
sub-section active bitmask, each bit representing a PMD_SIZE span of the
architecture's memory hotplug section size.
The implications of a partially populated section is that pfn_valid()
needs to go beyond a valid_section() check and either determine that the
section is an "early section", or read the sub-section active ranges
from the bitmask. The expectation is that the bitmask (subsection_map)
fits in the same cacheline as the valid_section() / early_section()
data, so the incremental performance overhead to pfn_valid() should be
negligible.
The rationale for using early_section() to short-ciruit the
subsection_map check is that there are legacy code paths that use
pfn_valid() at section granularity before validating the pfn against
pgdat data. So, the early_section() check allows those traditional
assumptions to persist while also permitting subsection_map to tell the
truth for purposes of populating the unused portions of early sections
with PMEM and other ZONE_DEVICE mappings.
Link: http://lkml.kernel.org/r/156092350874.979959.18185938451405518285.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Qian Cai <[email protected]>
Tested-by: Jane Chu <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In preparation for sub-section hotplug, track whether a given section
was created during early memory initialization, or later via memory
hotplug. This distinction is needed to maintain the coarse expectation
that pfn_valid() returns true for any pfn within a given section even if
that section has pages that are reserved from the page allocator.
For example one of the of goals of subsection hotplug is to support
cases where the system physical memory layout collides System RAM and
PMEM within a section. Several pfn_valid() users expect to just check
if a section is valid, but they are not careful to check if the given
pfn is within a "System RAM" boundary and instead expect pgdat
information to further validate the pfn.
Rather than unwind those paths to make their pfn_valid() queries more
precise a follow on patch uses the SECTION_IS_EARLY flag to maintain the
traditional expectation that pfn_valid() returns true for all early
sections.
Link: https://lore.kernel.org/lkml/[email protected]/
Link: http://lkml.kernel.org/r/156092350358.979959.5817209875548072819.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Qian Cai <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: Sub-section memory hotplug support", v10.
The memory hotplug section is an arbitrary / convenient unit for memory
hotplug. 'Section-size' units have bled into the user interface
('memblock' sysfs) and can not be changed without breaking existing
userspace. The section-size constraint, while mostly benign for typical
memory hotplug, has and continues to wreak havoc with 'device-memory'
use cases, persistent memory (pmem) in particular. Recall that pmem
uses devm_memremap_pages(), and subsequently arch_add_memory(), to
allocate a 'struct page' memmap for pmem. However, it does not use the
'bottom half' of memory hotplug, i.e. never marks pmem pages online and
never exposes the userspace memblock interface for pmem. This leaves an
opening to redress the section-size constraint.
To date, the libnvdimm subsystem has attempted to inject padding to
satisfy the internal constraints of arch_add_memory(). Beyond
complicating the code, leading to bugs [2], wasting memory, and limiting
configuration flexibility, the padding hack is broken when the platform
changes this physical memory alignment of pmem from one boot to the
next. Device failure (intermittent or permanent) and physical
reconfiguration are events that can cause the platform firmware to
change the physical placement of pmem on a subsequent boot, and device
failure is an everyday event in a data-center.
It turns out that sections are only a hard requirement of the
user-facing interface for memory hotplug and with a bit more
infrastructure sub-section arch_add_memory() support can be added for
kernel internal usages like devm_memremap_pages(). Here is an analysis
of the current design assumptions in the current code and how they are
addressed in the new implementation:
Current design assumptions:
- Sections that describe boot memory (early sections) are never
unplugged / removed.
- pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y, case devolves to a
valid_section() check
- __add_pages() and helper routines assume all operations occur in
PAGES_PER_SECTION units.
- The memblock sysfs interface only comprehends full sections
New design assumptions:
- Sections are instrumented with a sub-section bitmask to track (on
x86) individual 2MB sub-divisions of a 128MB section.
- Partially populated early sections can be extended with additional
sub-sections, and those sub-sections can be removed with
arch_remove_memory(). With this in place we no longer lose usable
memory capacity to padding.
- pfn_valid() is updated to look deeper than valid_section() to also
check the active-sub-section mask. This indication is in the same
cacheline as the valid_section() so the performance impact is
expected to be negligible. So far the lkp robot has not reported any
regressions.
- Outside of the core vmemmap population routines which are replaced,
other helper routines like shrink_{zone,pgdat}_span() are updated to
handle the smaller granularity. Core memory hotplug routines that
deal with online memory are not touched.
- The existing memblock sysfs user api guarantees / assumptions are not
touched since this capability is limited to !online
!memblock-sysfs-accessible sections.
Meanwhile the issue reports continue to roll in from users that do not
understand when and how the 128MB constraint will bite them. The current
implementation relied on being able to support at least one misaligned
namespace, but that immediately falls over on any moderately complex
namespace creation attempt. Beyond the initial problem of 'System RAM'
colliding with pmem, and the unsolvable problem of physical alignment
changes, Linux is now being exposed to platforms that collide pmem ranges
with other pmem ranges by default [3]. In short, devm_memremap_pages()
has pushed the venerable section-size constraint past the breaking point,
and the simplicity of section-aligned arch_add_memory() is no longer
tenable.
These patches are exposed to the kbuild robot on a subsection-v10 branch
[4], and a preview of the unit test for this functionality is available
on the 'subsection-pending' branch of ndctl [5].
[2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
[3]: https://github.com/pmem/ndctl/issues/76
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=subsection-v10
[5]: https://github.com/pmem/ndctl/commit/7c59b4867e1c
This patch (of 13):
Towards enabling memory hotplug to track partial population of a section,
introduce 'struct mem_section_usage'.
A pointer to a 'struct mem_section_usage' instance replaces the existing
pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house
a new 'subsection_map' bitmap. The new bitmap enables the memory
hot{plug,remove} implementation to act on incremental sub-divisions of a
section.
SUBSECTION_SHIFT is defined as global constant instead of per-architecture
value like SECTION_SIZE_BITS in order to allow cross-arch compatibility of
subsection users. Specifically a common subsection size allows for the
possibility that persistent memory namespace configurations be made
compatible across architectures.
The primary motivation for this functionality is to support platforms that
mix "System RAM" and "Persistent Memory" within a single section, or
multiple PMEM ranges with different mapping lifetimes within a single
section. The section restriction for hotplug has caused an ongoing saga
of hacks and bugs for devm_memremap_pages() users.
Beyond the fixups to teach existing paths how to retrieve the 'usemap'
from a section, and updates to usemap allocation path, there are no
expected behavior changes.
Link: http://lkml.kernel.org/r/156092349845.979959.73333291612799019.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: Further memory block device cleanups", v1.
Some further cleanups around memory block devices. Especially, clean up
and simplify walk_memory_range(). Including some other minor cleanups.
This patch (of 6):
We are using a mixture of "int" and "unsigned long". Let's make this
consistent by using "unsigned long" everywhere. We'll do the same with
memory block ids next.
While at it, turn the "unsigned long i" in removable_show() into an int
- sections_per_block is an int.
[[email protected]: s/unsigned long i/unsigned long nr/]
[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Baoquan He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In case of NODE_NOT_IN_PAGE_FLAGS is set, we store section's node id in
section_to_node_table[]. While for hot-add memory, this is missed.
Without this information, page_to_nid() may not give the right node id.
BTW, current online_pages works because it leverages nid in
memory_block. But the granularity of node id should be mem_section
wide.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The parameter is unused, so let's drop it. Memory removal paths should
never care about zones. This is the job of memory offlining and will
require more refactorings.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We want to improve error handling while adding memory by allowing to use
arch_remove_memory() and __remove_pages() even if
CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
arch_add_memory()
rc = do_something();
if (rc) {
arch_remove_memory();
}
We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
quite some dependencies for memory offlining.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The code comment above sparse_add_one_section() is obsolete and incorrect.
Clean it up and write a new one.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Baoquan He <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: Mukesh Ojha <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
memory to zones until online") introduced move_pfn_range_to_zone() which
calls memmap_init_zone() during onlining a memory block.
memmap_init_zone() will reset pagetype flags and makes migrate type to
be MOVABLE.
However, in __offline_pages(), it also call undo_isolate_page_range()
after offline_isolated_pages() to do the same thing. Due to commit
2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed
__first_valid_page() to skip offline pages, undo_isolate_page_range()
here just waste CPU cycles looping around the offlining PFN range while
doing nothing, because __first_valid_page() will return NULL as
offline_isolated_pages() has already marked all memory sections within
the pfn range as offline via offline_mem_sections().
Also, after calling the "useless" undo_isolate_page_range() here, it
reaches the point of no returning by notifying MEM_OFFLINE. Those pages
will be marked as MIGRATE_MOVABLE again once onlining. The only thing
left to do is to decrease the number of isolated pageblocks zone counter
which would make some paths of the page allocation slower that the above
commit introduced.
Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages
on ppc64, an "int" should still be enough to represent the number of
pageblocks there. Fix an incorrect comment along the way.
[[email protected]: v4]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages")
Signed-off-by: Qian Cai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]> [4.13+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As all the memblock allocation functions return NULL in case of error
rather than panic(), the duplicates with _nopanic suffix can be removed.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Petr Mladek <[email protected]> [printk]
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Guo Ren <[email protected]> [c-sky]
Cc: Heiko Carstens <[email protected]>
Cc: Juergen Gross <[email protected]> [Xen]
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add check for the return value of memblock_alloc*() functions and call
panic() in case of error. The panic message repeats the one used by
panicing memblock allocators with adjustment of parameters to include
only relevant ones.
The replacement was mostly automated with semantic patches like the one
below with manual massaging of format strings.
@@
expression ptr, size, align;
@@
ptr = memblock_alloc(size, align);
+ if (!ptr)
+ panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);
[[email protected]: use '%pa' with 'phys_addr_t' type]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix format strings for panics after memblock_alloc]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: don't panic if the allocation in sparse_buffer_init fails]
Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
[[email protected]: fix xtensa printk warning]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Anders Roxell <[email protected]>
Reviewed-by: Guo Ren <[email protected]> [c-sky]
Acked-by: Paul Burton <[email protected]> [MIPS]
Acked-by: Heiko Carstens <[email protected]> [s390]
Reviewed-by: Juergen Gross <[email protected]> [Xen]
Reviewed-by: Geert Uytterhoeven <[email protected]> [m68k]
Acked-by: Max Filippov <[email protected]> [xtensa]
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
next_present_section_nr() could only return an unsigned number -1, so
just check it specifically where compilers will convert -1 to unsigned
if needed.
mm/sparse.c: In function 'sparse_init_nid':
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:478:2: note: in expansion of macro
'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin, pnum) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:497:2: note: in expansion of macro
'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin, pnum) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/sparse.c: In function 'sparse_init':
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:520:2: note: in expansion of macro
'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin + 1, pnum_end) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
Link: http://lkml.kernel.org/r/[email protected]
Fixes: c4e1be9ec113 ("mm, sparsemem: break out of loops early")
Signed-off-by: Qian Cai <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since the information needed in sparse_add_one_section() is node id to
allocate proper memory, it is not necessary to pass its pgdat.
This patch changes the prototype of sparse_add_one_section() to pass node
id directly. This is intended to reduce misleading that
sparse_add_one_section() would touch pgdat.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
pgdat_resize_lock is used to protect pgdat's memory region information
like: node_start_pfn, node_present_pages, etc. While in function
sparse_add/remove_one_section(), pgdat_resize_lock is used to protect
initialization/release of one mem_section. This looks not proper.
These code paths are currently protected by mem_hotplug_lock currently but
should there ever be any reason for locking at the sparse layer a
dedicated lock should be introduced.
Following is the current call trace of sparse_add/remove_one_section()
mem_hotplug_begin()
arch_add_memory()
add_pages()
__add_pages()
__add_section()
sparse_add_one_section()
mem_hotplug_done()
mem_hotplug_begin()
arch_remove_memory()
__remove_pages()
__remove_section()
sparse_remove_one_section()
mem_hotplug_done()
The comment above the pgdat_resize_lock also mentions "Holding this will
also guarantee that any pfn_valid() stays that way.", which is true with
the current implementation and false after this patch. But current
implementation doesn't meet this comment. There isn't any pfn walkers to
take the lock so this looks like a relict from the past. This patch also
removes this comment.
[[email protected]: v4]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: changelog suggestion]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In hot remove, we try to clear poisoned pages, but a small optimization to
check if num_poisoned_pages is 0 helps remove the iteration through
nr_pages.
[[email protected]: tweak comment text]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Balbir Singh <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Presently the arches arm64, arm and sh have a function which loops
through each memblock and calls memory present. riscv will require a
similar function.
Introduce a common memblocks_present() function that can be used by all
the arches. Subsequent patches will cleanup the arches that make use of
this.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Logan Gunthorpe <[email protected]>
Acked-by: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When a memblock allocation APIs are called with align = 0, the alignment
is implicitly set to SMP_CACHE_BYTES.
Implicit alignment is done deep in the memblock allocator and it can
come as a surprise. Not that such an alignment would be wrong even
when used incorrectly but it is better to be explicit for the sake of
clarity and the prinicple of the least surprise.
Replace all such uses of memblock APIs with the 'align' parameter
explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
in the memblock internal allocation functions.
For the case when memblock APIs are used via helper functions, e.g. like
iommu_arena_new_node() in Alpha, the helper functions were detected with
Coccinelle's help and then manually examined and updated where
appropriate.
The direct memblock APIs users were updated using the semantic patch below:
@@
expression size, min_addr, max_addr, nid;
@@
(
|
- memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
nid)
|
- memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
+ memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
|
- memblock_alloc(size, 0)
+ memblock_alloc(size, SMP_CACHE_BYTES)
|
- memblock_alloc_raw(size, 0)
+ memblock_alloc_raw(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from(size, 0, min_addr)
+ memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_nopanic(size, 0)
+ memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low(size, 0)
+ memblock_alloc_low(size, SMP_CACHE_BYTES)
|
- memblock_alloc_low_nopanic(size, 0)
+ memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
|
- memblock_alloc_from_nopanic(size, 0, min_addr)
+ memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
|
- memblock_alloc_node(size, 0, nid)
+ memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
)
[[email protected]: changelog update]
[[email protected]: coding-style fixes]
[[email protected]: fix missed uses of implicit alignment]
Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Paul Burton <[email protected]> [MIPS]
Acked-by: Michael Ellerman <[email protected]> [powerpc]
Acked-by: Michal Hocko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.
The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>
@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>
[[email protected]: dma-direct: fix up for the removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: powerpc: fix up for removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Serge Semin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Drop BOOTMEM_ALLOC_ACCESSIBLE and BOOTMEM_ALLOC_ANYWHERE in favor of
identical MEMBLOCK definitions.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Serge Semin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With the align parameter memblock_alloc_node() can be used as drop in
replacement for alloc_bootmem_pages_node() and __alloc_bootmem_node(),
which is done in the following patches.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Serge Semin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The conversion is done using
sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
$(git grep -l memblock_virt_alloc)
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Serge Semin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Address issues slowing persistent memory initialization", v5.
The main thing this patch set achieves is that it allows us to initialize
each node worth of persistent memory independently. As a result we reduce
page init time by about 2 minutes because instead of taking 30 to 40
seconds per node and going through each node one at a time, we process all
4 nodes in parallel in the case of a 12TB persistent memory setup spread
evenly over 4 nodes.
This patch (of 3):
On systems with a large amount of memory it can take a significant amount
of time to initialize all of the page structs with the PAGE_POISON_PATTERN
value. I have seen it take over 2 minutes to initialize a system with
over 12TB of RAM.
In order to work around the issue I had to disable CONFIG_DEBUG_VM and
then the boot time returned to something much more reasonable as the
arch_add_memory call completed in milliseconds versus seconds. However in
doing that I had to disable all of the other VM debugging on the system.
In order to work around a kernel that might have CONFIG_DEBUG_VM enabled
on a system that has a large amount of memory I have added a new kernel
parameter named "vm_debug" that can be set to "-" in order to disable it.
Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: Pavel Tatashin <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rename new_sparse_init() to sparse_init() which enables it. Delete old
sparse_init() and all the code that became obsolete with.
[[email protected]: remove unused sparse_mem_maps_populate_node()]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Tested-by: Michael Ellerman <[email protected]> [powerpc]
Tested-by: Oscar Salvador <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Abdul Haleem <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
sparse_init() requires to temporary allocate two large buffers: usemap_map
and map_map. Baoquan He has identified that these buffers are so large
that Linux is not bootable on small memory machines, such as a kdump boot.
The buffers are especially large when CONFIG_X86_5LEVEL is set, as they
are scaled to the maximum physical memory size.
Baoquan provided a fix, which reduces these sizes of these buffers, but it
is much better to get rid of them entirely.
Add a new way to initialize sparse memory: sparse_init_nid(), which only
operates within one memory node, and thus allocates memory either in large
contiguous block or allocates section by section. This eliminates the
need for use of temporary buffers.
For simplified bisecting and review temporarly call sparse_init()
new_sparse_init(), the new interface is going to be enabled as well as old
code removed in the next patch.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Tested-by: Oscar Salvador <[email protected]>
Tested-by: Michael Ellerman <[email protected]> [powerpc]
Cc: Pasha Tatashin <[email protected]>
Cc: Abdul Haleem <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now that both variants of sparse memory use the same buffers to populate
memory map, we can move sparse_buffer_init()/sparse_buffer_fini() to the
common place.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Tested-by: Michael Ellerman <[email protected]> [powerpc]
Tested-by: Oscar Salvador <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Abdul Haleem <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
non-vmemmap sparse also allocated large contiguous chunk of memory, and if
fails falls back to smaller allocations. Use the same functions to
allocate buffer as the vmemmap-sparse
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Tested-by: Michael Ellerman <[email protected]> [powerpc]
Reviewed-by: Oscar Salvador <[email protected]>
Tested-by: Oscar Salvador <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Abdul Haleem <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "sparse_init rewrite", v6.
In sparse_init() we allocate two large buffers to temporary hold usemap
and memmap for the whole machine. However, we can avoid doing that if
we changed sparse_init() to operated on per-node bases instead of doing
it on the whole machine beforehand.
As shown by Baoquan
http://lkml.kernel.org/r/[email protected]
The buffers are large enough to cause machine stop to boot on small
memory systems.
Another benefit of these changes is that they also obsolete
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER.
This patch (of 5):
When struct pages are allocated for sparse-vmemmap VA layout, we first try
to allocate one large buffer, and than if that fails allocate struct pages
for each section as we go.
The code that allocates buffer is uses global variables and is spread
across several call sites.
Cleanup the code by introducing three functions to handle the global
buffer:
sparse_buffer_init() initialize the buffer
sparse_buffer_fini() free the remaining part of the buffer
sparse_buffer_alloc() alloc from the buffer, and if buffer is empty
return NULL
Define these functions in sparse.c instead of sparse-vmemmap.c because
later we will use them for non-vmemmap sparse allocations as well.
[[email protected]: use PTR_ALIGN()]
[[email protected]: s/BUG_ON/WARN_ON/]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Tested-by: Michael Ellerman <[email protected]> [powerpc]
Reviewed-by: Oscar Salvador <[email protected]>
Tested-by: Oscar Salvador <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Abdul Haleem <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In sparse_init(), two temporary pointer arrays, usemap_map and map_map
are allocated with the size of NR_MEM_SECTIONS. They are used to store
each memory section's usemap and mem map if marked as present. With the
help of these two arrays, continuous memory chunk is allocated for
usemap and memmap for memory sections on one node. This avoids too many
memory fragmentations. Like below diagram, '1' indicates the present
memory section, '0' means absent one. The number 'n' could be much
smaller than NR_MEM_SECTIONS on most of systems.
|1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
-------------------------------------------------------
0 1 2 3 4 5 i i+1 n-1 n
If we fail to populate the page tables to map one section's memmap, its
->section_mem_map will be cleared finally to indicate that it's not
present. After use, these two arrays will be released at the end of
sparse_init().
In 4-level paging mode, each array costs 4M which can be ignorable.
While in 5-level paging, they costs 256M each, 512M altogether. Kdump
kernel Usually only reserves very few memory, e.g 256M. So, even thouth
they are temporarily allocated, still not acceptable.
In fact, there's no need to allocate them with the size of
NR_MEM_SECTIONS. Since the ->section_mem_map clearing has been deferred
to the last, the number of present memory sections are kept the same
during sparse_init() until we finally clear out the memory section's
->section_mem_map if its usemap or memmap is not correctly handled.
Thus in the middle whenever for_each_present_section_nr() loop is taken,
the i-th present memory section is always the same one.
Here only allocate usemap_map and map_map with the size of
'nr_present_sections'. For the i-th present memory section, install its
usemap and memmap to usemap_map[i] and mam_map[i] during allocation.
Then in the last for_each_present_section_nr() loop which clears the
failed memory section's ->section_mem_map, fetch usemap and memmap from
usemap_map[] and map_map[] array and set them into mem_section[]
accordingly.
[[email protected]: coding-style fixes]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Baoquan He <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It's used to pass the size of map data unit into
alloc_usemap_and_memmap, and is preparation for next patch.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Baoquan He <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, system
will allocate one continuous memory chunk for mem maps on one node and
populate the relevant page tables to map memory section one by one. If
fail to populate for a certain mem section, print warning and its
->section_mem_map will be cleared to cancel the marking of being
present. Like this, the number of mem sections marked as present could
become less during sparse_init() execution.
Here just defer the ms->section_mem_map clearing if failed to populate
its page tables until the last for_each_present_section_nr() loop. This
is in preparation for later optimizing the mem map allocation.
[[email protected]: remove now-unused local `ms', per Oscar]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Baoquan He <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/sparse: Optimize memmap allocation during
sparse_init()", v6.
In sparse_init(), two temporary pointer arrays, usemap_map and map_map
are allocated with the size of NR_MEM_SECTIONS. They are used to store
each memory section's usemap and mem map if marked as present. In
5-level paging mode, this will cost 512M memory though they will be
released at the end of sparse_init(). System with few memory, like
kdump kernel which usually only has about 256M, will fail to boot
because of allocation failure if CONFIG_X86_5LEVEL=y.
In this patchset, optimize the memmap allocation code to only use
usemap_map and map_map with the size of nr_present_sections. This makes
kdump kernel boot up with normal crashkernel='' setting when
CONFIG_X86_5LEVEL=y.
This patch (of 5):
nr_present_sections is used to record how many memory sections are
marked as present during system boot up, and will be used in the later
patch.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Baoquan He <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Pankaj Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
sparse_init_one_section() is being called from two sites: sparse_init()
and sparse_add_one_section(). The former calls it from a
for_each_present_section_nr() loop, and the latter marks the section as
present before calling it. This means that when
sparse_init_one_section() gets called, we already know that the section
is present. So there is no point to double check that in the function.
This removes the check and makes the function void.
[[email protected]: fix error path in sparse_add_one_section]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: simplification suggested by Oscar]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In commit c4e1be9ec113 ("mm, sparsemem: break out of loops early")
__highest_present_section_nr is introduced to reduce the loop counts for
present section. This is also helpful for usemap and memmap allocation.
This patch uses __highest_present_section_nr + 1 to optimize the loop.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When searching a present section, there are two boundaries:
* __highest_present_section_nr
* NR_MEM_SECTIONS
And it is known, __highest_present_section_nr is a more strict boundary
than NR_MEM_SECTIONS. This means it would be necessary to check
__highest_present_section_nr only.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Yang <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Memory hotplug and hotremove operate with per-block granularity. If the
machine has a large amount of memory (more than 64G), the size of a
memory block can span multiple sections. By mistake, during hotremove
we set only the first section to offline state.
The bug was discovered because kernel selftest started to fail:
https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop
After commit, "mm/memory_hotplug: optimize probe routine". But, the bug
is older than this commit. In this optimization we also added a check
for sections to be in a proper state during hotplug operation.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
Signed-off-by: Pavel Tatashin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
During memory hotplugging we traverse struct pages three times:
1. memset(0) in sparse_add_one_section()
2. loop in __add_section() to set do: set_page_node(page, nid); and
SetPageReserved(page);
3. loop in memmap_init_zone() to call __init_single_pfn()
This patch removes the first two loops, and leaves only loop 3. All
struct pages are initialized in one place, the same as it is done during
boot.
The benefits:
- We improve memory hotplug performance because we are not evicting the
cache several times and also reduce loop branching overhead.
- Remove condition from hotpath in __init_single_pfn(), that was added
in order to fix the problem that was reported by Bharata in the above
email thread, thus also improve performance during normal boot.
- Make memory hotplug more similar to the boot memory initialization
path because we zero and initialize struct pages only in one
function.
- Simplifies memory hotplug struct page initialization code, and thus
enables future improvements, such as multi-threading the
initialization of struct pages in order to improve hotplug
performance even further on larger machines.
[[email protected]: v5]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pul removal of obsolete architecture ports from Arnd Bergmann:
"This removes the entire architecture code for blackfin, cris, frv,
m32r, metag, mn10300, score, and tile, including the associated device
drivers.
I have been working with the (former) maintainers for each one to
ensure that my interpretation was right and the code is definitely
unused in mainline kernels. Many had fond memories of working on the
respective ports to start with and getting them included in upstream,
but also saw no point in keeping the port alive without any users.
In the end, it seems that while the eight architectures are extremely
different, they all suffered the same fate: There was one company in
charge of an SoC line, a CPU microarchitecture and a software
ecosystem, which was more costly than licensing newer off-the-shelf
CPU cores from a third party (typically ARM, MIPS, or RISC-V). It
seems that all the SoC product lines are still around, but have not
used the custom CPU architectures for several years at this point. In
contrast, CPU instruction sets that remain popular and have actively
maintained kernel ports tend to all be used across multiple licensees.
[ See the new nds32 port merged in the previous commit for the next
generation of "one company in charge of an SoC line, a CPU
microarchitecture and a software ecosystem" - Linus ]
The removal came out of a discussion that is now documented at
https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
marking any ports as deprecated but remove them all at once after I
made sure that they are all unused. Some architectures (notably tile,
mn10300, and blackfin) are still being shipped in products with old
kernels, but those products will never be updated to newer kernel
releases.
After this series, we still have a few architectures without mainline
gcc support:
- unicore32 and hexagon both have very outdated gcc releases, but the
maintainers promised to work on providing something newer. At least
in case of hexagon, this will only be llvm, not gcc.
- openrisc, risc-v and nds32 are still in the process of finishing
their support or getting it added to mainline gcc in the first
place. They all have patched gcc-7.3 ports that work to some
degree, but complete upstream support won't happen before gcc-8.1.
Csky posted their first kernel patch set last week, their situation
will be similar
[ Palmer Dabbelt points out that RISC-V support is in mainline gcc
since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]"
This really says it all:
2498 files changed, 95 insertions(+), 467668 deletions(-)
* tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits)
MAINTAINERS: UNICORE32: Change email account
staging: iio: remove iio-trig-bfin-timer driver
tty: hvc: remove tile driver
tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers
serial: remove tile uart driver
serial: remove m32r_sio driver
serial: remove blackfin drivers
serial: remove cris/etrax uart drivers
usb: Remove Blackfin references in USB support
usb: isp1362: remove blackfin arch glue
usb: musb: remove blackfin port
usb: host: remove tilegx platform glue
pwm: remove pwm-bfin driver
i2c: remove bfin-twi driver
spi: remove blackfin related host drivers
watchdog: remove bfin_wdt driver
can: remove bfin_can driver
mmc: remove bfin_sdh driver
input: misc: remove blackfin rotary driver
input: keyboard: remove bf54x driver
...
|
|
CONFIG_NEED_NODE_MEMMAP_SIZE logic
node_memmap_size_bytes() has been unused since the v3.9 kernel, so remove it.
Signed-off-by: David Rientjes <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: f03574f2d5b2 ("x86-32, mm: Rip out x86_32 NUMA remapping code")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Tile was the only remaining architecture to implement alloc_remap(),
and since that is being removed, there is no point in keeping this
function.
Removing all callers simplifies the mem_map handling.
Reviewed-by: Pavel Tatashin <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Ross Zwisler:
- Require struct page by default for filesystem DAX to remove a number
of surprising failure cases. This includes failures with direct I/O,
gdb and fork(2).
- Add support for the new Platform Capabilities Structure added to the
NFIT in ACPI 6.2a. This new table tells us whether the platform
supports flushing of CPU and memory controller caches on unexpected
power loss events.
- Revamp vmem_altmap and dev_pagemap handling to clean up code and
better support future future PCI P2P uses.
- Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has
become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL
spec, and instead rely on the generic ND_CMD_CALL approach used by
the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}.
- Enhance nfit_test so we can test some of the new things added in
version 1.6 of the DSM specification. This includes testing firmware
download and simulating the Last Shutdown State (LSS) status.
* tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits)
libnvdimm, namespace: remove redundant initialization of 'nd_mapping'
acpi, nfit: fix register dimm error handling
libnvdimm, namespace: make min namespace size 4K
tools/testing/nvdimm: force nfit_test to depend on instrumented modules
libnvdimm/nfit_test: adding support for unit testing enable LSS status
libnvdimm/nfit_test: add firmware download emulation
nfit-test: Add platform cap support from ACPI 6.2a to test
libnvdimm: expose platform persistence attribute for nd_region
acpi: nfit: add persistent memory control flag for nd_region
acpi: nfit: Add support for detect platform CPU cache flush on power loss
device-dax: Fix trailing semicolon
libnvdimm, btt: fix uninitialized err_lock
dax: require 'struct page' by default for filesystem dax
ext2: auto disable dax instead of failing mount
ext4: auto disable dax instead of failing mount
mm, dax: introduce pfn_t_special()
mm: Fix devm_memremap_pages() collision handling
mm: Fix memory size alignment in devm_memremap_pages_release()
memremap: merge find_dev_pagemap into get_dev_pagemap
memremap: change devm_memremap_pages interface to use struct dev_pagemap
...
|
|
|
|
mem_map pointer
The comment is confusing. On the one hand, it refers to 32-bit
alignment (struct page alignment on 32-bit platforms), but this would
only guarantee that the 2 lowest bits must be zero. On the other hand,
it claims that at least 3 bits are available, and 3 bits are actually
used.
This is not broken, because there is a stronger alignment guarantee,
just less obvious. Let's fix the comment to make it clear how many bits
are available and why.
Although memmap arrays are allocated in various places, the resulting
pointer is encoded eventually, so I am adding a BUG_ON() here to enforce
at runtime that all expected bits are indeed available.
I have also added a BUILD_BUG_ON to check that PFN_SECTION_SHIFT is
sufficient, because this part of the calculation can be easily checked
at build time.
[[email protected]: v2]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Tesarik <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kemi Wang <[email protected]>
Cc: YASUAKI ISHIMATSU <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We can just pass this on instead of having to do a radix tree lookup
without proper locking a few levels into the callchain.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
We can just pass this on instead of having to do a radix tree lookup
without proper locking a few levels into the callchain.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
In commit 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime
for CONFIG_SPARSEMEM_EXTREME=y") mem_section is allocated at runtime to
save memory.
It allocates the first dimension of array with sizeof(struct mem_section).
It costs extra memory, should be sizeof(struct mem_section *).
Fix it.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
Signed-off-by: Baoquan He <[email protected]>
Tested-by: Dave Young <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Atsushi Kumagai <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
vmemmap_alloc_block() will no longer zero the block, so zero memory at
its call sites for everything except struct pages. Struct page memory
is zero'd by struct page initialization.
Replace allocators in sparse-vmemmap to use the non-zeroing version.
So, we will get the performance improvement by zeroing the memory in
parallel when struct pages are zeroed.
Add struct page zeroing as a part of initialization of other fields in
__init_single_page().
This single thread performance collected on: Intel(R) Xeon(R) CPU E7-8895
v3 @ 2.60GHz with 1T of memory (268400646 pages in 8 nodes):
BASE FIX
sparse_init 11.244671836s 0.007199623s
zone_sizes_init 4.879775891s 8.355182299s
--------------------------
Total 16.124447727s 8.362381922s
sparse_init is where memory for struct pages is zeroed, and the zeroing
part is moved later in this patch into __init_single_page(), which is
called from zone_sizes_init().
[[email protected]: make vmemmap_alloc_block_zero() private to sparse-vmemmap.c]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Steven Sistare <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Reviewed-by: Bob Picco <[email protected]>
Tested-by: Bob Picco <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Most of x86/mm is already in x86/asm, so merge the rest too.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Since commit:
83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
we allocate the mem_section array dynamically in sparse_memory_present_with_active_regions(),
but some architectures, like arm64, don't call the routine to initialize sparsemem.
Let's move the initialization into memory_present() it should cover all
architectures.
Reported-and-tested-by: Sudeep Holla <[email protected]>
Tested-by: Bjorn Andersson <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Will Deacon <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Conflicts:
arch/x86/kernel/cpu/Makefile
Signed-off-by: Ingo Molnar <[email protected]>
|