aboutsummaryrefslogtreecommitdiff
path: root/mm/sparse.c
AgeCommit message (Collapse)AuthorFilesLines
2019-07-18mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()Dan Williams1-23/+27
Allow sub-section sized ranges to be added to the memmap. populate_section_memmap() takes an explict pfn range rather than assuming a full section, and those parameters are plumbed all the way through to vmmemap_populate(). There should be no sub-section usage in current deployments. New warnings are added to clarify which memmap allocation paths are sub-section capable. Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64] Reviewed-by: Oscar Salvador <[email protected]> Cc: Michal Hocko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-18mm/sparsemem: add helpers track active portions of a section at bootDan Williams1-0/+35
Prepare for hot{plug,remove} of sub-ranges of a section by tracking a sub-section active bitmask, each bit representing a PMD_SIZE span of the architecture's memory hotplug section size. The implications of a partially populated section is that pfn_valid() needs to go beyond a valid_section() check and either determine that the section is an "early section", or read the sub-section active ranges from the bitmask. The expectation is that the bitmask (subsection_map) fits in the same cacheline as the valid_section() / early_section() data, so the incremental performance overhead to pfn_valid() should be negligible. The rationale for using early_section() to short-ciruit the subsection_map check is that there are legacy code paths that use pfn_valid() at section granularity before validating the pfn against pgdat data. So, the early_section() check allows those traditional assumptions to persist while also permitting subsection_map to tell the truth for purposes of populating the unused portions of early sections with PMEM and other ZONE_DEVICE mappings. Link: http://lkml.kernel.org/r/156092350874.979959.18185938451405518285.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Reported-by: Qian Cai <[email protected]> Tested-by: Jane Chu <[email protected]> Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64] Reviewed-by: Oscar Salvador <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Wei Yang <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-18mm/sparsemem: introduce a SECTION_IS_EARLY flagDan Williams1-11/+9
In preparation for sub-section hotplug, track whether a given section was created during early memory initialization, or later via memory hotplug. This distinction is needed to maintain the coarse expectation that pfn_valid() returns true for any pfn within a given section even if that section has pages that are reserved from the page allocator. For example one of the of goals of subsection hotplug is to support cases where the system physical memory layout collides System RAM and PMEM within a section. Several pfn_valid() users expect to just check if a section is valid, but they are not careful to check if the given pfn is within a "System RAM" boundary and instead expect pgdat information to further validate the pfn. Rather than unwind those paths to make their pfn_valid() queries more precise a follow on patch uses the SECTION_IS_EARLY flag to maintain the traditional expectation that pfn_valid() returns true for all early sections. Link: https://lore.kernel.org/lkml/[email protected]/ Link: http://lkml.kernel.org/r/156092350358.979959.5817209875548072819.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Reported-by: Qian Cai <[email protected]> Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64] Reviewed-by: Oscar Salvador <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Jane Chu <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-18mm/sparsemem: introduce struct mem_section_usageDan Williams1-41/+40
Patch series "mm: Sub-section memory hotplug support", v10. The memory hotplug section is an arbitrary / convenient unit for memory hotplug. 'Section-size' units have bled into the user interface ('memblock' sysfs) and can not be changed without breaking existing userspace. The section-size constraint, while mostly benign for typical memory hotplug, has and continues to wreak havoc with 'device-memory' use cases, persistent memory (pmem) in particular. Recall that pmem uses devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a 'struct page' memmap for pmem. However, it does not use the 'bottom half' of memory hotplug, i.e. never marks pmem pages online and never exposes the userspace memblock interface for pmem. This leaves an opening to redress the section-size constraint. To date, the libnvdimm subsystem has attempted to inject padding to satisfy the internal constraints of arch_add_memory(). Beyond complicating the code, leading to bugs [2], wasting memory, and limiting configuration flexibility, the padding hack is broken when the platform changes this physical memory alignment of pmem from one boot to the next. Device failure (intermittent or permanent) and physical reconfiguration are events that can cause the platform firmware to change the physical placement of pmem on a subsequent boot, and device failure is an everyday event in a data-center. It turns out that sections are only a hard requirement of the user-facing interface for memory hotplug and with a bit more infrastructure sub-section arch_add_memory() support can be added for kernel internal usages like devm_memremap_pages(). Here is an analysis of the current design assumptions in the current code and how they are addressed in the new implementation: Current design assumptions: - Sections that describe boot memory (early sections) are never unplugged / removed. - pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y, case devolves to a valid_section() check - __add_pages() and helper routines assume all operations occur in PAGES_PER_SECTION units. - The memblock sysfs interface only comprehends full sections New design assumptions: - Sections are instrumented with a sub-section bitmask to track (on x86) individual 2MB sub-divisions of a 128MB section. - Partially populated early sections can be extended with additional sub-sections, and those sub-sections can be removed with arch_remove_memory(). With this in place we no longer lose usable memory capacity to padding. - pfn_valid() is updated to look deeper than valid_section() to also check the active-sub-section mask. This indication is in the same cacheline as the valid_section() so the performance impact is expected to be negligible. So far the lkp robot has not reported any regressions. - Outside of the core vmemmap population routines which are replaced, other helper routines like shrink_{zone,pgdat}_span() are updated to handle the smaller granularity. Core memory hotplug routines that deal with online memory are not touched. - The existing memblock sysfs user api guarantees / assumptions are not touched since this capability is limited to !online !memblock-sysfs-accessible sections. Meanwhile the issue reports continue to roll in from users that do not understand when and how the 128MB constraint will bite them. The current implementation relied on being able to support at least one misaligned namespace, but that immediately falls over on any moderately complex namespace creation attempt. Beyond the initial problem of 'System RAM' colliding with pmem, and the unsolvable problem of physical alignment changes, Linux is now being exposed to platforms that collide pmem ranges with other pmem ranges by default [3]. In short, devm_memremap_pages() has pushed the venerable section-size constraint past the breaking point, and the simplicity of section-aligned arch_add_memory() is no longer tenable. These patches are exposed to the kbuild robot on a subsection-v10 branch [4], and a preview of the unit test for this functionality is available on the 'subsection-pending' branch of ndctl [5]. [2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com [3]: https://github.com/pmem/ndctl/issues/76 [4]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=subsection-v10 [5]: https://github.com/pmem/ndctl/commit/7c59b4867e1c This patch (of 13): Towards enabling memory hotplug to track partial population of a section, introduce 'struct mem_section_usage'. A pointer to a 'struct mem_section_usage' instance replaces the existing pointer to a 'pageblock_flags' bitmap. Effectively it adds one more 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house a new 'subsection_map' bitmap. The new bitmap enables the memory hot{plug,remove} implementation to act on incremental sub-divisions of a section. SUBSECTION_SHIFT is defined as global constant instead of per-architecture value like SECTION_SIZE_BITS in order to allow cross-arch compatibility of subsection users. Specifically a common subsection size allows for the possibility that persistent memory namespace configurations be made compatible across architectures. The primary motivation for this functionality is to support platforms that mix "System RAM" and "Persistent Memory" within a single section, or multiple PMEM ranges with different mapping lifetimes within a single section. The section restriction for hotplug has caused an ongoing saga of hacks and bugs for devm_memremap_pages() users. Beyond the fixups to teach existing paths how to retrieve the 'usemap' from a section, and updates to usemap allocation path, there are no expected behavior changes. Link: http://lkml.kernel.org/r/156092349845.979959.73333291612799019.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Wei Yang <[email protected]> Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64] Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Jane Chu <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Qian Cai <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-18mm: section numbers use the type "unsigned long"David Hildenbrand1-6/+6
Patch series "mm: Further memory block device cleanups", v1. Some further cleanups around memory block devices. Especially, clean up and simplify walk_memory_range(). Including some other minor cleanups. This patch (of 6): We are using a mixture of "int" and "unsigned long". Let's make this consistent by using "unsigned long" everywhere. We'll do the same with memory block ids next. While at it, turn the "unsigned long i" in removable_show() into an int - sections_per_block is an int. [[email protected]: s/unsigned long i/unsigned long nr/] [[email protected]: v3] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dan Williams <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Wei Yang <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Arun KS <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-18mm/sparse.c: set section nid for hot-add memoryWei Yang1-0/+1
In case of NODE_NOT_IN_PAGE_FLAGS is set, we store section's node id in section_to_node_table[]. While for hot-add memory, this is missed. Without this information, page_to_nid() may not give the right node id. BTW, current online_pages works because it leverages nid in memory_block. But the granularity of node id should be mem_section wide. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-18mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_sectionDavid Hildenbrand1-2/+2
The parameter is unused, so let's drop it. Memory removal paths should never care about zones. This is the job of memory offlining and will require more refactorings. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Andrew Banman <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Arun KS <[email protected]> Cc: Baoquan He <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chintan Pandya <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Jun Yao <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Mark Brown <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "[email protected]" <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Qian Cai <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Rich Felker <[email protected]> Cc: Rob Herring <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-18mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVEDavid Hildenbrand1-6/+0
We want to improve error handling while adding memory by allowing to use arch_remove_memory() and __remove_pages() even if CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like: arch_add_memory() rc = do_something(); if (rc) { arch_remove_memory(); } We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require quite some dependencies for memory offlining. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Alex Deucher <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Mark Brown <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Rob Herring <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: "[email protected]" <[email protected]> Cc: Andrew Banman <[email protected]> Cc: Arun KS <[email protected]> Cc: Qian Cai <[email protected]> Cc: Mathieu Malaterre <[email protected]> Cc: Baoquan He <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chintan Pandya <[email protected]> Cc: Dan Williams <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Jun Yao <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Wei Yang <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14mm/sparse.c: clean up obsolete code commentBaoquan He1-4/+12
The code comment above sparse_add_one_section() is obsolete and incorrect. Clean it up and write a new one. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Mukesh Ojha <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-03-29mm/hotplug: fix offline undo_isolate_page_range()Qian Cai1-1/+1
Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") introduced move_pfn_range_to_zone() which calls memmap_init_zone() during onlining a memory block. memmap_init_zone() will reset pagetype flags and makes migrate type to be MOVABLE. However, in __offline_pages(), it also call undo_isolate_page_range() after offline_isolated_pages() to do the same thing. Due to commit 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed __first_valid_page() to skip offline pages, undo_isolate_page_range() here just waste CPU cycles looping around the offlining PFN range while doing nothing, because __first_valid_page() will return NULL as offline_isolated_pages() has already marked all memory sections within the pfn range as offline via offline_mem_sections(). Also, after calling the "useless" undo_isolate_page_range() here, it reaches the point of no returning by notifying MEM_OFFLINE. Those pages will be marked as MIGRATE_MOVABLE again once onlining. The only thing left to do is to decrease the number of isolated pageblocks zone counter which would make some paths of the page allocation slower that the above commit introduced. Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages on ppc64, an "int" should still be enough to represent the number of pageblocks there. Fix an incorrect comment along the way. [[email protected]: v4] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") Signed-off-by: Qian Cai <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: <[email protected]> [4.13+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-03-12memblock: drop memblock_alloc_*_nopanic() variantsMike Rapoport1-4/+2
As all the memblock allocation functions return NULL in case of error rather than panic(), the duplicates with _nopanic suffix can be removed. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Petr Mladek <[email protected]> [printk] Cc: Catalin Marinas <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> [c-sky] Cc: Heiko Carstens <[email protected]> Cc: Juergen Gross <[email protected]> [Xen] Cc: Mark Salter <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Paul Burton <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Rob Herring <[email protected]> Cc: Rob Herring <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-03-12treewide: add checks for the return value of memblock_alloc*()Mike Rapoport1-4/+17
Add check for the return value of memblock_alloc*() functions and call panic() in case of error. The panic message repeats the one used by panicing memblock allocators with adjustment of parameters to include only relevant ones. The replacement was mostly automated with semantic patches like the one below with manual massaging of format strings. @@ expression ptr, size, align; @@ ptr = memblock_alloc(size, align); + if (!ptr) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align); [[email protected]: use '%pa' with 'phys_addr_t' type] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix format strings for panics after memblock_alloc] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: don't panic if the allocation in sparse_buffer_init fails] Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx [[email protected]: fix xtensa printk warning] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Anders Roxell <[email protected]> Reviewed-by: Guo Ren <[email protected]> [c-sky] Acked-by: Paul Burton <[email protected]> [MIPS] Acked-by: Heiko Carstens <[email protected]> [s390] Reviewed-by: Juergen Gross <[email protected]> [Xen] Reviewed-by: Geert Uytterhoeven <[email protected]> [m68k] Acked-by: Max Filippov <[email protected]> [xtensa] Cc: Catalin Marinas <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Mark Salter <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Rob Herring <[email protected]> Cc: Rob Herring <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-03-05mm/sparse: fix a bad comparisonQian Cai1-1/+1
next_present_section_nr() could only return an unsigned number -1, so just check it specifically where compilers will convert -1 to unsigned if needed. mm/sparse.c: In function 'sparse_init_nid': mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] ((section_nr >= 0) && \ ^~ mm/sparse.c:478:2: note: in expansion of macro 'for_each_present_section_nr' for_each_present_section_nr(pnum_begin, pnum) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] ((section_nr >= 0) && \ ^~ mm/sparse.c:497:2: note: in expansion of macro 'for_each_present_section_nr' for_each_present_section_nr(pnum_begin, pnum) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/sparse.c: In function 'sparse_init': mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] ((section_nr >= 0) && \ ^~ mm/sparse.c:520:2: note: in expansion of macro 'for_each_present_section_nr' for_each_present_section_nr(pnum_begin + 1, pnum_end) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~ Link: http://lkml.kernel.org/r/[email protected] Fixes: c4e1be9ec113 ("mm, sparsemem: break out of loops early") Signed-off-by: Qian Cai <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-12-28mm, sparse: pass nid instead of pgdat to sparse_add_one_section()Wei Yang1-4/+4
Since the information needed in sparse_add_one_section() is node id to allocate proper memory, it is not necessary to pass its pgdat. This patch changes the prototype of sparse_add_one_section() to pass node id directly. This is intended to reduce misleading that sparse_add_one_section() would touch pgdat. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-12-28mm, sparse: drop pgdat_resize_lock in sparse_add/remove_one_section()Wei Yang1-8/+1
pgdat_resize_lock is used to protect pgdat's memory region information like: node_start_pfn, node_present_pages, etc. While in function sparse_add/remove_one_section(), pgdat_resize_lock is used to protect initialization/release of one mem_section. This looks not proper. These code paths are currently protected by mem_hotplug_lock currently but should there ever be any reason for locking at the sparse layer a dedicated lock should be introduced. Following is the current call trace of sparse_add/remove_one_section() mem_hotplug_begin() arch_add_memory() add_pages() __add_pages() __add_section() sparse_add_one_section() mem_hotplug_done() mem_hotplug_begin() arch_remove_memory() __remove_pages() __remove_section() sparse_remove_one_section() mem_hotplug_done() The comment above the pgdat_resize_lock also mentions "Holding this will also guarantee that any pfn_valid() stays that way.", which is true with the current implementation and false after this patch. But current implementation doesn't meet this comment. There isn't any pfn walkers to take the lock so this looks like a relict from the past. This patch also removes this comment. [[email protected]: v4] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: changelog suggestion] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-12-28mm/hotplug: optimize clear_hwpoisoned_pages()Balbir Singh1-0/+9
In hot remove, we try to clear poisoned pages, but a small optimization to check if num_poisoned_pages is 0 helps remove the iteration through nr_pages. [[email protected]: tweak comment text] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Balbir Singh <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-12-14mm/sparse: add common helper to mark all memblocks presentLogan Gunthorpe1-0/+16
Presently the arches arm64, arm and sh have a function which loops through each memblock and calls memory present. riscv will require a similar function. Introduce a common memblocks_present() function that can be used by all the arches. Subsequent patches will cleanup the arches that make use of this. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Logan Gunthorpe <[email protected]> Acked-by: Andrew Morton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31memblock: stop using implicit alignment to SMP_CACHE_BYTESMike Rapoport1-1/+2
When a memblock allocation APIs are called with align = 0, the alignment is implicitly set to SMP_CACHE_BYTES. Implicit alignment is done deep in the memblock allocator and it can come as a surprise. Not that such an alignment would be wrong even when used incorrectly but it is better to be explicit for the sake of clarity and the prinicple of the least surprise. Replace all such uses of memblock APIs with the 'align' parameter explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment in the memblock internal allocation functions. For the case when memblock APIs are used via helper functions, e.g. like iommu_arena_new_node() in Alpha, the helper functions were detected with Coccinelle's help and then manually examined and updated where appropriate. The direct memblock APIs users were updated using the semantic patch below: @@ expression size, min_addr, max_addr, nid; @@ ( | - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid) + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr, nid) | - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid) + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr, nid) | - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid) + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid) | - memblock_alloc(size, 0) + memblock_alloc(size, SMP_CACHE_BYTES) | - memblock_alloc_raw(size, 0) + memblock_alloc_raw(size, SMP_CACHE_BYTES) | - memblock_alloc_from(size, 0, min_addr) + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr) | - memblock_alloc_nopanic(size, 0) + memblock_alloc_nopanic(size, SMP_CACHE_BYTES) | - memblock_alloc_low(size, 0) + memblock_alloc_low(size, SMP_CACHE_BYTES) | - memblock_alloc_low_nopanic(size, 0) + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES) | - memblock_alloc_from_nopanic(size, 0, min_addr) + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr) | - memblock_alloc_node(size, 0, nid) + memblock_alloc_node(size, SMP_CACHE_BYTES, nid) ) [[email protected]: changelog update] [[email protected]: coding-style fixes] [[email protected]: fix missed uses of implicit alignment] Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Paul Burton <[email protected]> [MIPS] Acked-by: Michael Ellerman <[email protected]> [powerpc] Acked-by: Michal Hocko <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michal Simek <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31mm: remove include/linux/bootmem.hMike Rapoport1-1/+0
Move remaining definitions and declarations from include/linux/bootmem.h into include/linux/memblock.h and remove the redundant header. The includes were replaced with the semantic patch below and then semi-automated removal of duplicated '#include <linux/memblock.h> @@ @@ - #include <linux/bootmem.h> + #include <linux/memblock.h> [[email protected]: dma-direct: fix up for the removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: powerpc: fix up for removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Serge Semin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31memblock: replace BOOTMEM_ALLOC_* with MEMBLOCK variantsMike Rapoport1-2/+3
Drop BOOTMEM_ALLOC_ACCESSIBLE and BOOTMEM_ALLOC_ANYWHERE in favor of identical MEMBLOCK definitions. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Serge Semin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31memblock: add align parameter to memblock_alloc_node()Mike Rapoport1-1/+1
With the align parameter memblock_alloc_node() can be used as drop in replacement for alloc_bootmem_pages_node() and __alloc_bootmem_node(), which is done in the following patches. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Simek <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Serge Semin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-31memblock: remove _virt from APIs returning virtual addressMike Rapoport1-6/+6
The conversion is done using sed -i 's@memblock_virt_alloc@memblock_alloc@g' \ $(git grep -l memblock_virt_alloc) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Simek <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Serge Semin <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-10-26mm: provide kernel parameter to allow disabling page init poisoningAlexander Duyck1-3/+1
Patch series "Address issues slowing persistent memory initialization", v5. The main thing this patch set achieves is that it allows us to initialize each node worth of persistent memory independently. As a result we reduce page init time by about 2 minutes because instead of taking 30 to 40 seconds per node and going through each node one at a time, we process all 4 nodes in parallel in the case of a 12TB persistent memory setup spread evenly over 4 nodes. This patch (of 3): On systems with a large amount of memory it can take a significant amount of time to initialize all of the page structs with the PAGE_POISON_PATTERN value. I have seen it take over 2 minutes to initialize a system with over 12TB of RAM. In order to work around the issue I had to disable CONFIG_DEBUG_VM and then the boot time returned to something much more reasonable as the arch_add_memory call completed in milliseconds versus seconds. However in doing that I had to disable all of the other VM debugging on the system. In order to work around a kernel that might have CONFIG_DEBUG_VM enabled on a system that has a large amount of memory I have added a new kernel parameter named "vm_debug" that can be set to "-" in order to disable it. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Pavel Tatashin <[email protected]> Signed-off-by: Alexander Duyck <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-17mm/sparse: delete old sparse_init and enable new onePavel Tatashin1-236/+1
Rename new_sparse_init() to sparse_init() which enables it. Delete old sparse_init() and all the code that became obsolete with. [[email protected]: remove unused sparse_mem_maps_populate_node()] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Tested-by: Michael Ellerman <[email protected]> [powerpc] Tested-by: Oscar Salvador <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Abdul Haleem <[email protected]> Cc: Baoquan He <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Souptick Joarder <[email protected]> Cc: Steven Sistare <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-17mm/sparse: add new sparse_init_nid() and sparse_init()Pavel Tatashin1-0/+85
sparse_init() requires to temporary allocate two large buffers: usemap_map and map_map. Baoquan He has identified that these buffers are so large that Linux is not bootable on small memory machines, such as a kdump boot. The buffers are especially large when CONFIG_X86_5LEVEL is set, as they are scaled to the maximum physical memory size. Baoquan provided a fix, which reduces these sizes of these buffers, but it is much better to get rid of them entirely. Add a new way to initialize sparse memory: sparse_init_nid(), which only operates within one memory node, and thus allocates memory either in large contiguous block or allocates section by section. This eliminates the need for use of temporary buffers. For simplified bisecting and review temporarly call sparse_init() new_sparse_init(), the new interface is going to be enabled as well as old code removed in the next patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Tested-by: Oscar Salvador <[email protected]> Tested-by: Michael Ellerman <[email protected]> [powerpc] Cc: Pasha Tatashin <[email protected]> Cc: Abdul Haleem <[email protected]> Cc: Baoquan He <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Souptick Joarder <[email protected]> Cc: Steven Sistare <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-17mm/sparse: move buffer init/fini to the common placePavel Tatashin1-7/+7
Now that both variants of sparse memory use the same buffers to populate memory map, we can move sparse_buffer_init()/sparse_buffer_fini() to the common place. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Tested-by: Michael Ellerman <[email protected]> [powerpc] Tested-by: Oscar Salvador <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Abdul Haleem <[email protected]> Cc: Baoquan He <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Souptick Joarder <[email protected]> Cc: Steven Sistare <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-17mm/sparse: use the new sparse buffer functions in non-vmemmapPavel Tatashin1-27/+14
non-vmemmap sparse also allocated large contiguous chunk of memory, and if fails falls back to smaller allocations. Use the same functions to allocate buffer as the vmemmap-sparse Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Tested-by: Michael Ellerman <[email protected]> [powerpc] Reviewed-by: Oscar Salvador <[email protected]> Tested-by: Oscar Salvador <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Abdul Haleem <[email protected]> Cc: Baoquan He <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Souptick Joarder <[email protected]> Cc: Steven Sistare <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-17mm/sparse: abstract sparse buffer allocationsPavel Tatashin1-1/+44
Patch series "sparse_init rewrite", v6. In sparse_init() we allocate two large buffers to temporary hold usemap and memmap for the whole machine. However, we can avoid doing that if we changed sparse_init() to operated on per-node bases instead of doing it on the whole machine beforehand. As shown by Baoquan http://lkml.kernel.org/r/[email protected] The buffers are large enough to cause machine stop to boot on small memory systems. Another benefit of these changes is that they also obsolete CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER. This patch (of 5): When struct pages are allocated for sparse-vmemmap VA layout, we first try to allocate one large buffer, and than if that fails allocate struct pages for each section as we go. The code that allocates buffer is uses global variables and is spread across several call sites. Cleanup the code by introducing three functions to handle the global buffer: sparse_buffer_init() initialize the buffer sparse_buffer_fini() free the remaining part of the buffer sparse_buffer_alloc() alloc from the buffer, and if buffer is empty return NULL Define these functions in sparse.c instead of sparse-vmemmap.c because later we will use them for non-vmemmap sparse allocations as well. [[email protected]: use PTR_ALIGN()] [[email protected]: s/BUG_ON/WARN_ON/] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Tested-by: Michael Ellerman <[email protected]> [powerpc] Reviewed-by: Oscar Salvador <[email protected]> Tested-by: Oscar Salvador <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Steven Sistare <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Dan Williams <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Souptick Joarder <[email protected]> Cc: Baoquan He <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wei Yang <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Abdul Haleem <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-17mm/sparse: optimize memmap allocation during sparse_init()Baoquan He1-9/+37
In sparse_init(), two temporary pointer arrays, usemap_map and map_map are allocated with the size of NR_MEM_SECTIONS. They are used to store each memory section's usemap and mem map if marked as present. With the help of these two arrays, continuous memory chunk is allocated for usemap and memmap for memory sections on one node. This avoids too many memory fragmentations. Like below diagram, '1' indicates the present memory section, '0' means absent one. The number 'n' could be much smaller than NR_MEM_SECTIONS on most of systems. |1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0| ------------------------------------------------------- 0 1 2 3 4 5 i i+1 n-1 n If we fail to populate the page tables to map one section's memmap, its ->section_mem_map will be cleared finally to indicate that it's not present. After use, these two arrays will be released at the end of sparse_init(). In 4-level paging mode, each array costs 4M which can be ignorable. While in 5-level paging, they costs 256M each, 512M altogether. Kdump kernel Usually only reserves very few memory, e.g 256M. So, even thouth they are temporarily allocated, still not acceptable. In fact, there's no need to allocate them with the size of NR_MEM_SECTIONS. Since the ->section_mem_map clearing has been deferred to the last, the number of present memory sections are kept the same during sparse_init() until we finally clear out the memory section's ->section_mem_map if its usemap or memmap is not correctly handled. Thus in the middle whenever for_each_present_section_nr() loop is taken, the i-th present memory section is always the same one. Here only allocate usemap_map and map_map with the size of 'nr_present_sections'. For the i-th present memory section, install its usemap and memmap to usemap_map[i] and mam_map[i] during allocation. Then in the last for_each_present_section_nr() loop which clears the failed memory section's ->section_mem_map, fetch usemap and memmap from usemap_map[] and map_map[] array and set them into mem_section[] accordingly. [[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pankaj Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-17mm/sparse.c: add a new parameter 'data_unit_size' for alloc_usemap_and_memmapBaoquan He1-3/+7
It's used to pass the size of map data unit into alloc_usemap_and_memmap, and is preparation for next patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Pankaj Gupta <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-17mm/sparsemem.c: defer the ms->section_mem_map clearingBaoquan He1-4/+8
In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, system will allocate one continuous memory chunk for mem maps on one node and populate the relevant page tables to map memory section one by one. If fail to populate for a certain mem section, print warning and its ->section_mem_map will be cleared to cancel the marking of being present. Like this, the number of mem sections marked as present could become less during sparse_init() execution. Here just defer the ms->section_mem_map clearing if failed to populate its page tables until the last for_each_present_section_nr() loop. This is in preparation for later optimizing the mem map allocation. [[email protected]: remove now-unused local `ms', per Oscar] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Acked-by: Dave Hansen <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Pankaj Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-17mm/sparse.c: add a static variable nr_present_sectionsBaoquan He1-0/+7
Patch series "mm/sparse: Optimize memmap allocation during sparse_init()", v6. In sparse_init(), two temporary pointer arrays, usemap_map and map_map are allocated with the size of NR_MEM_SECTIONS. They are used to store each memory section's usemap and mem map if marked as present. In 5-level paging mode, this will cost 512M memory though they will be released at the end of sparse_init(). System with few memory, like kdump kernel which usually only has about 256M, will fail to boot because of allocation failure if CONFIG_X86_5LEVEL=y. In this patchset, optimize the memmap allocation code to only use usemap_map and map_map with the size of nr_present_sections. This makes kdump kernel boot up with normal crashkernel='' setting when CONFIG_X86_5LEVEL=y. This patch (of 5): nr_present_sections is used to record how many memory sections are marked as present during system boot up, and will be used in the later patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Acked-by: Dave Hansen <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Pankaj Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-17mm/sparse.c: make sparse_init_one_section void and remove checkOscar Salvador1-9/+4
sparse_init_one_section() is being called from two sites: sparse_init() and sparse_add_one_section(). The former calls it from a for_each_present_section_nr() loop, and the latter marks the section as present before calling it. This means that when sparse_init_one_section() gets called, we already know that the section is present. So there is no point to double check that in the function. This removes the check and makes the function void. [[email protected]: fix error path in sparse_add_one_section] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: simplification suggested by Oscar] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm/sparse.c: pass the __highest_present_section_nr + 1 to alloc_func()Wei Yang1-1/+1
In commit c4e1be9ec113 ("mm, sparsemem: break out of loops early") __highest_present_section_nr is introduced to reduce the loop counts for present section. This is also helpful for usemap and memmap allocation. This patch uses __highest_present_section_nr + 1 to optimize the loop. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07mm/sparse.c: check __highest_present_section_nr only for a present sectionWei Yang1-3/+1
When searching a present section, there are two boundaries: * __highest_present_section_nr * NR_MEM_SECTIONS And it is known, __highest_present_section_nr is a more strict boundary than NR_MEM_SECTIONS. This means it would be necessary to check __highest_present_section_nr only. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-05-11mm: sections are not offlined during memory hotremovePavel Tatashin1-1/+1
Memory hotplug and hotremove operate with per-block granularity. If the machine has a large amount of memory (more than 64G), the size of a memory block can span multiple sections. By mistake, during hotremove we set only the first section to offline state. The bug was discovered because kernel selftest started to fail: https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop After commit, "mm/memory_hotplug: optimize probe routine". But, the bug is older than this commit. In this optimization we also added a check for sections to be in a proper state during hotplug operation. Link: http://lkml.kernel.org/r/[email protected] Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes") Signed-off-by: Pavel Tatashin <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Steven Sistare <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-05mm/memory_hotplug: optimize memory hotplugPavel Tatashin1-1/+7
During memory hotplugging we traverse struct pages three times: 1. memset(0) in sparse_add_one_section() 2. loop in __add_section() to set do: set_page_node(page, nid); and SetPageReserved(page); 3. loop in memmap_init_zone() to call __init_single_pfn() This patch removes the first two loops, and leaves only loop 3. All struct pages are initialized in one place, the same as it is done during boot. The benefits: - We improve memory hotplug performance because we are not evicting the cache several times and also reduce loop branching overhead. - Remove condition from hotpath in __init_single_pfn(), that was added in order to fix the problem that was reported by Bharata in the above email thread, thus also improve performance during normal boot. - Make memory hotplug more similar to the boot memory initialization path because we zero and initialize struct pages only in one function. - Simplifies memory hotplug struct page initialization code, and thus enables future improvements, such as multi-threading the initialization of struct pages in order to improve hotplug performance even further on larger machines. [[email protected]: v5] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Baoquan He <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Dan Williams <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Steven Sistare <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-02Merge tag 'arch-removal' of ↵Linus Torvalds1-15/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pul removal of obsolete architecture ports from Arnd Bergmann: "This removes the entire architecture code for blackfin, cris, frv, m32r, metag, mn10300, score, and tile, including the associated device drivers. I have been working with the (former) maintainers for each one to ensure that my interpretation was right and the code is definitely unused in mainline kernels. Many had fond memories of working on the respective ports to start with and getting them included in upstream, but also saw no point in keeping the port alive without any users. In the end, it seems that while the eight architectures are extremely different, they all suffered the same fate: There was one company in charge of an SoC line, a CPU microarchitecture and a software ecosystem, which was more costly than licensing newer off-the-shelf CPU cores from a third party (typically ARM, MIPS, or RISC-V). It seems that all the SoC product lines are still around, but have not used the custom CPU architectures for several years at this point. In contrast, CPU instruction sets that remain popular and have actively maintained kernel ports tend to all be used across multiple licensees. [ See the new nds32 port merged in the previous commit for the next generation of "one company in charge of an SoC line, a CPU microarchitecture and a software ecosystem" - Linus ] The removal came out of a discussion that is now documented at https://lwn.net/Articles/748074/. Unlike the original plans, I'm not marking any ports as deprecated but remove them all at once after I made sure that they are all unused. Some architectures (notably tile, mn10300, and blackfin) are still being shipped in products with old kernels, but those products will never be updated to newer kernel releases. After this series, we still have a few architectures without mainline gcc support: - unicore32 and hexagon both have very outdated gcc releases, but the maintainers promised to work on providing something newer. At least in case of hexagon, this will only be llvm, not gcc. - openrisc, risc-v and nds32 are still in the process of finishing their support or getting it added to mainline gcc in the first place. They all have patched gcc-7.3 ports that work to some degree, but complete upstream support won't happen before gcc-8.1. Csky posted their first kernel patch set last week, their situation will be similar [ Palmer Dabbelt points out that RISC-V support is in mainline gcc since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]" This really says it all: 2498 files changed, 95 insertions(+), 467668 deletions(-) * tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits) MAINTAINERS: UNICORE32: Change email account staging: iio: remove iio-trig-bfin-timer driver tty: hvc: remove tile driver tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers serial: remove tile uart driver serial: remove m32r_sio driver serial: remove blackfin drivers serial: remove cris/etrax uart drivers usb: Remove Blackfin references in USB support usb: isp1362: remove blackfin arch glue usb: musb: remove blackfin port usb: host: remove tilegx platform glue pwm: remove pwm-bfin driver i2c: remove bfin-twi driver spi: remove blackfin related host drivers watchdog: remove bfin_wdt driver can: remove bfin_can driver mmc: remove bfin_sdh driver input: misc: remove blackfin rotary driver input: keyboard: remove bf54x driver ...
2018-03-27x86/mm/32: Remove unused node_memmap_size_bytes() & ↵David Rientjes1-22/+0
CONFIG_NEED_NODE_MEMMAP_SIZE logic node_memmap_size_bytes() has been unused since the v3.9 kernel, so remove it. Signed-off-by: David Rientjes <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: f03574f2d5b2 ("x86-32, mm: Rip out x86_32 NUMA remapping code") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-03-16mm: remove obsolete alloc_remap()Arnd Bergmann1-15/+0
Tile was the only remaining architecture to implement alloc_remap(), and since that is being removed, there is no point in keeping this function. Removing all callers simplifies the mem_map handling. Reviewed-by: Pavel Tatashin <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2018-02-06Merge tag 'libnvdimm-for-4.16' of ↵Linus Torvalds1-18/+25
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Ross Zwisler: - Require struct page by default for filesystem DAX to remove a number of surprising failure cases. This includes failures with direct I/O, gdb and fork(2). - Add support for the new Platform Capabilities Structure added to the NFIT in ACPI 6.2a. This new table tells us whether the platform supports flushing of CPU and memory controller caches on unexpected power loss events. - Revamp vmem_altmap and dev_pagemap handling to clean up code and better support future future PCI P2P uses. - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}. - Enhance nfit_test so we can test some of the new things added in version 1.6 of the DSM specification. This includes testing firmware download and simulating the Last Shutdown State (LSS) status. * tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits) libnvdimm, namespace: remove redundant initialization of 'nd_mapping' acpi, nfit: fix register dimm error handling libnvdimm, namespace: make min namespace size 4K tools/testing/nvdimm: force nfit_test to depend on instrumented modules libnvdimm/nfit_test: adding support for unit testing enable LSS status libnvdimm/nfit_test: add firmware download emulation nfit-test: Add platform cap support from ACPI 6.2a to test libnvdimm: expose platform persistence attribute for nd_region acpi: nfit: add persistent memory control flag for nd_region acpi: nfit: Add support for detect platform CPU cache flush on power loss device-dax: Fix trailing semicolon libnvdimm, btt: fix uninitialized err_lock dax: require 'struct page' by default for filesystem dax ext2: auto disable dax instead of failing mount ext4: auto disable dax instead of failing mount mm, dax: introduce pfn_t_special() mm: Fix devm_memremap_pages() collision handling mm: Fix memory size alignment in devm_memremap_pages_release() memremap: merge find_dev_pagemap into get_dev_pagemap memremap: change devm_memremap_pages interface to use struct dev_pagemap ...
2018-02-03Merge branch 'for-4.16/nfit' into libnvdimm-for-nextRoss Zwisler1-1/+1
2018-01-31include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM ↵Petr Tesarik1-1/+5
mem_map pointer The comment is confusing. On the one hand, it refers to 32-bit alignment (struct page alignment on 32-bit platforms), but this would only guarantee that the 2 lowest bits must be zero. On the other hand, it claims that at least 3 bits are available, and 3 bits are actually used. This is not broken, because there is a stronger alignment guarantee, just less obvious. Let's fix the comment to make it clear how many bits are available and why. Although memmap arrays are allocated in various places, the resulting pointer is encoded eventually, so I am adding a BUG_ON() here to enforce at runtime that all expected bits are indeed available. I have also added a BUILD_BUG_ON to check that PFN_SECTION_SHIFT is sufficient, because this part of the calculation can be easily checked at build time. [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Tesarik <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kemi Wang <[email protected]> Cc: YASUAKI ISHIMATSU <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-08mm: pass the vmem_altmap to vmemmap_freeChristoph Hellwig1-10/+13
We can just pass this on instead of having to do a radix tree lookup without proper locking a few levels into the callchain. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2018-01-08mm: pass the vmem_altmap to vmemmap_populateChristoph Hellwig1-8/+12
We can just pass this on instead of having to do a radix tree lookup without proper locking a few levels into the callchain. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2018-01-04mm/sparse.c: wrong allocation for mem_sectionBaoquan He1-1/+1
In commit 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") mem_section is allocated at runtime to save memory. It allocates the first dimension of array with sizeof(struct mem_section). It costs extra memory, should be sizeof(struct mem_section *). Fix it. Link: http://lkml.kernel.org/r/[email protected] Fixes: 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") Signed-off-by: Baoquan He <[email protected]> Tested-by: Dave Young <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Atsushi Kumagai <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-11-15mm: stop zeroing memory during allocation in vmemmapPavel Tatashin1-3/+3
vmemmap_alloc_block() will no longer zero the block, so zero memory at its call sites for everything except struct pages. Struct page memory is zero'd by struct page initialization. Replace allocators in sparse-vmemmap to use the non-zeroing version. So, we will get the performance improvement by zeroing the memory in parallel when struct pages are zeroed. Add struct page zeroing as a part of initialization of other fields in __init_single_page(). This single thread performance collected on: Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz with 1T of memory (268400646 pages in 8 nodes): BASE FIX sparse_init 11.244671836s 0.007199623s zone_sizes_init 4.879775891s 8.355182299s -------------------------- Total 16.124447727s 8.362381922s sparse_init is where memory for struct pages is zeroed, and the zeroing part is moved later in this patch into __init_single_page(), which is called from zone_sizes_init(). [[email protected]: make vmemmap_alloc_block_zero() private to sparse-vmemmap.c] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: Steven Sistare <[email protected]> Reviewed-by: Daniel Jordan <[email protected]> Reviewed-by: Bob Picco <[email protected]> Tested-by: Bob Picco <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: David S. Miller <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-11-10Merge branch 'x86/mm' into x86/asm, to merge branchesIngo Molnar1-0/+10
Most of x86/mm is already in x86/asm, so merge the rest too. Signed-off-by: Ingo Molnar <[email protected]>
2017-11-07mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=yKirill A. Shutemov1-0/+10
Since commit: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") we allocate the mem_section array dynamically in sparse_memory_present_with_active_regions(), but some architectures, like arm64, don't call the routine to initialize sparsemem. Let's move the initialization into memory_present() it should cover all architectures. Reported-and-tested-by: Sudeep Holla <[email protected]> Tested-by: Bjorn Andersson <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-11-07Merge branch 'linus' into x86/asm, to pick up fixes and resolve conflictsIngo Molnar1-0/+1
Conflicts: arch/x86/kernel/cpu/Makefile Signed-off-by: Ingo Molnar <[email protected]>