aboutsummaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)AuthorFilesLines
2021-05-10rcu: Fix typo in comment: kthead -> kthreadRolf Eike Beer1-1/+1
Signed-off-by: Rolf Eike Beer <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-05-07mm: fix typos in commentsLu Jialin3-3/+3
succed -> succeed in mm/hugetlb.c wil -> will in mm/mempolicy.c wit -> with in mm/page_alloc.c Retruns -> Returns in mm/page_vma_mapped.c confict -> conflict in mm/secretmem.c No functionality changed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Lu Jialin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-07mm: fix typos in commentsIngo Molnar37-80/+80
Fix ~94 single-word typos in locking code comments, plus a few very obvious grammar mistakes. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Cc: Bhaskar Chowdhury <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-07mm/slab.c: fix spelling mistake "disired" -> "desired"Colin Ian King1-1/+1
There is a spelling mistake in a comment. Fix it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-07mm/vmalloc: remove vwrite()David Hildenbrand2-125/+1
The last user (/dev/kmem) is gone. Let's drop it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Minchan Kim <[email protected]> Cc: huang ying <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-07drivers/char: remove /dev/kmem for goodDavid Hildenbrand2-2/+2
Patch series "drivers/char: remove /dev/kmem for good". Exploring /dev/kmem and /dev/mem in the context of memory hot(un)plug and memory ballooning, I started questioning the existence of /dev/kmem. Comparing it with the /proc/kcore implementation, it does not seem to be able to deal with things like a) Pages unmapped from the direct mapping (e.g., to be used by secretmem) -> kern_addr_valid(). virt_addr_valid() is not sufficient. b) Special cases like gart aperture memory that is not to be touched -> mem_pfn_is_ram() Unless I am missing something, it's at least broken in some cases and might fault/crash the machine. Looks like its existence has been questioned before in 2005 and 2010 [1], after ~11 additional years, it might make sense to revive the discussion. CONFIG_DEVKMEM is only enabled in a single defconfig (on purpose or by mistake?). All distributions disable it: in Ubuntu it has been disabled for more than 10 years, in Debian since 2.6.31, in Fedora at least starting with FC3, in RHEL starting with RHEL4, in SUSE starting from 15sp2, and OpenSUSE has it disabled as well. 1) /dev/kmem was popular for rootkits [2] before it got disabled basically everywhere. Ubuntu documents [3] "There is no modern user of /dev/kmem any more beyond attackers using it to load kernel rootkits.". RHEL documents in a BZ [5] "it served no practical purpose other than to serve as a potential security problem or to enable binary module drivers to access structures/functions they shouldn't be touching" 2) /proc/kcore is a decent interface to have a controlled way to read kernel memory for debugging puposes. (will need some extensions to deal with memory offlining/unplug, memory ballooning, and poisoned pages, though) 3) It might be useful for corner case debugging [1]. KDB/KGDB might be a better fit, especially, to write random memory; harder to shoot yourself into the foot. 4) "Kernel Memory Editor" [4] hasn't seen any updates since 2000 and seems to be incompatible with 64bit [1]. For educational purposes, /proc/kcore might be used to monitor value updates -- or older kernels can be used. 5) It's broken on arm64, and therefore, completely disabled there. Looks like it's essentially unused and has been replaced by better suited interfaces for individual tasks (/proc/kcore, KDB/KGDB). Let's just remove it. [1] https://lwn.net/Articles/147901/ [2] https://www.linuxjournal.com/article/10505 [3] https://wiki.ubuntu.com/Security/Features#A.2Fdev.2Fkmem_disabled [4] https://sourceforge.net/projects/kme/ [5] https://bugzilla.redhat.com/show_bug.cgi?id=154796 Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Alexander A. Klimov" <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexandre Belloni <[email protected]> Cc: Andrew Lunn <[email protected]> Cc: Andrey Zhizhikin <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Brian Cain <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Corentin Labbe <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Gregory Clement <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hillf Danton <[email protected]> Cc: huang ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: James Troup <[email protected]> Cc: Jiaxun Yang <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kairui Song <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Cc: Kuninori Morimoto <[email protected]> Cc: Liviu Dudau <[email protected]> Cc: Lorenzo Pieralisi <[email protected]> Cc: Luc Van Oostenryck <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mikulas Patocka <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Niklas Schnelle <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Cc: [email protected] Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: "Pavel Machek (CIP)" <[email protected]> Cc: Pavel Machek <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Pierre Morel <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rich Felker <[email protected]> Cc: Robert Richter <[email protected]> Cc: Rob Herring <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Sebastian Hesselbarth <[email protected]> Cc: [email protected] Cc: Stafford Horne <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Sudeep Holla <[email protected]> Cc: Theodore Dubois <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: William Cohen <[email protected]> Cc: Xiaoming Ni <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-07mm: fix some typos and code style problemsShijie Luo5-7/+7
fix some typos and code style problems in mm. gfp.h: s/MAXNODES/MAX_NUMNODES mmzone.h: s/then/than rmap.c: s/__vma_split()/__vma_adjust() swap.c: s/__mod_zone_page_stat/__mod_zone_page_state, s/is is/is swap_state.c: s/whoes/whose z3fold.c: code style problem fix in z3fold_unregister_migration zsmalloc.c: s/of/or, s/give/given Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shijie Luo <[email protected]> Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-07delayacct: clear right task's flag after blkio completesYafang Shao1-4/+4
When I was implementing a latency analyzer tool by using task->delays and other things, I found an issue in delayacct. The issue is it should clear the target's flag instead of current's in delayacct_blkio_end(). When I git blame delayacct, I found there're some similar issues we have fixed in delayacct_blkio_end(). - Commit c96f5471ce7d ("delayacct: Account blkio completion on the correct task") fixed the issue that it should account blkio completion on the target task instead of current. - Commit b512719f771a ("delayacct: fix crash in delayacct_blkio_end() after delayacct init failure") fixed the issue that it should check target task's delays instead of current task'. It seems that delayacct_blkio_{begin, end} are error prone. So I introduce a new paratmeter - the target task 'p' - to these helpers. After that change, the callsite will specifilly set the right task, which should make it less error prone. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yafang Shao <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Josh Snyder <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05kfence: use power-efficient work queue to run delayed workMarco Elver1-2/+3
Use the power-efficient work queue, to avoid the pathological case where we keep pinning ourselves on the same possibly idle CPU on systems that want to be power-efficient (https://lwn.net/Articles/731052/). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Jann Horn <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05kfence: maximize allocation wait timeout durationMarco Elver1-1/+11
The allocation wait timeout was initially added because of warnings due to CONFIG_DETECT_HUNG_TASK=y [1]. While the 1 sec timeout is sufficient to resolve the warnings (given the hung task timeout must be 1 sec or larger) it may cause unnecessary wake-ups if the system is idle: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com Fix it by computing the timeout duration in terms of the current sysctl_hung_task_timeout_secs value. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Jann Horn <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05kfence: await for allocation using wait_eventMarco Elver1-16/+29
Patch series "kfence: optimize timer scheduling", v2. We have observed that mostly-idle systems with KFENCE enabled wake up otherwise idle CPUs, preventing such to enter a lower power state. Debugging revealed that KFENCE spends too much active time in toggle_allocation_gate(). While the first version of KFENCE was using all the right bits to be scheduling optimal, and thus power efficient, by simply using wait_event() + wake_up(), that code was unfortunately removed. As KFENCE was exposed to various different configs and tests, the scheduling optimal code slowly disappeared. First because of hung task warnings, and finally because of deadlocks when an allocation is made by timer code with debug objects enabled. Clearly, the "fixes" were not too friendly for devices that want to be power efficient. Therefore, let's try a little harder to fix the hung task and deadlock problems that we have with wait_event() + wake_up(), while remaining as scheduling friendly and power efficient as possible. Crucially, we need to defer the wake_up() to an irq_work, avoiding any potential for deadlock. The result with this series is that on the devices where we observed a power regression, power usage returns back to baseline levels. This patch (of 3): On mostly-idle systems, we have observed that toggle_allocation_gate() is a cause of frequent wake-ups, preventing an otherwise idle CPU to go into a lower power state. A late change in KFENCE's development, due to a potential deadlock [1], required changing the scheduling-friendly wait_event_timeout() and wake_up() to an open-coded wait-loop using schedule_timeout(). [1] https://lkml.kernel.org/r/[email protected] To avoid unnecessary wake-ups, switch to using wait_event_timeout(). Unfortunately, we still cannot use a version with direct wake_up() in __kfence_alloc() due to the same potential for deadlock as in [1]. Instead, add a level of indirection via an irq_work that is scheduled if we determine that the kfence_timer requires a wake_up(). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Jann Horn <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Hillf Danton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05kfence: zero guard page after out-of-bounds accessMarco Elver1-0/+1
After an out-of-bounds accesses, zero the guard page before re-protecting in kfence_guarded_free(). On one hand this helps make the failure mode of subsequent out-of-bounds accesses more deterministic, but could also prevent certain information leaks. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Acked-by: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Jann Horn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/process_vm_access.c: remove duplicate includeZhang Yunkai1-1/+0
'linux/compat.h' included in 'process_vm_access.c' is duplicated. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhang Yunkai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/mempool: minor coding style tweaksZhiyuan Dai14-23/+27
Various coding style tweaks to various files under mm/ [[email protected]: mm/swapfile: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/sparse: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/vmscan: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/compaction: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/oom_kill: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/shmem: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/page_alloc: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/filemap: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/mlock: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/frontswap: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/vmalloc: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/memory_hotplug: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: mm/mempolicy: minor coding style tweaks] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhiyuan Dai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/highmem.c: fix coding style issuesongqiang1-6/+5
Delete/add some blank lines and some blank spaces Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: songqiang <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.zhouchuangao1-4/+2
It can be optimized at compile time. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: zhouchuangao <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/zswap.c: switch from strlcpy to strscpyZhiyuan Dai1-1/+1
strlcpy is marked as deprecated in Documentation/process/deprecated.rst, and there is no functional difference when the caller expects truncation (when not checking the return value). strscpy is relatively better as it also avoids scanning the whole source string. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhiyuan Dai <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Vitaly Wool <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm,memory_hotplug: add kernel boot option to enable memmap_on_memoryOscar Salvador2-2/+13
Self stored memmap leads to a sparse memory situation which is unsuitable for workloads that requires large contiguous memory chunks, so make this an opt-in which needs to be explicitly enabled. To control this, let memory_hotplug have its own memory space, as suggested by David, so we can add memory_hotplug.memmap_on_memory parameter. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm,memory_hotplug: allocate memmap from the added memory rangeOscar Salvador3-11/+157
Physical memory hotadd has to allocate a memmap (struct page array) for the newly added memory section. Currently, alloc_pages_node() is used for those allocations. This has some disadvantages: a) an existing memory is consumed for that purpose (eg: ~2MB per 128MB memory section on x86_64) This can even lead to extreme cases where system goes OOM because the physically hotplugged memory depletes the available memory before it is onlined. b) if the whole node is movable then we have off-node struct pages which has performance drawbacks. c) It might be there are no PMD_ALIGNED chunks so memmap array gets populated with base pages. This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled. Vmemap page tables can map arbitrary memory. That means that we can reserve a part of the physically hotadded memory to back vmemmap page tables. This implementation uses the beginning of the hotplugged memory for that purpose. There are some non-obviously things to consider though. Vmemmap pages are allocated/freed during the memory hotplug events (add_memory_resource(), try_remove_memory()) when the memory is added/removed. This means that the reserved physical range is not online although it is used. The most obvious side effect is that pfn_to_online_page() returns NULL for those pfns. The current design expects that this should be OK as the hotplugged memory is considered a garbage until it is onlined. For example hibernation wouldn't save the content of those vmmemmaps into the image so it wouldn't be restored on resume but this should be OK as there no real content to recover anyway while metadata is reachable from other data structures (e.g. vmemmap page tables). The reserved space is therefore (de)initialized during the {on,off}line events (mhp_{de}init_memmap_on_memory). That is done by extracting page allocator independent initialization from the regular onlining path. The primary reason to handle the reserved space outside of {on,off}line_pages is to make each initialization specific to the purpose rather than special case them in a single function. As per above, the functions that are introduced are: - mhp_init_memmap_on_memory: Initializes vmemmap pages by calling move_pfn_range_to_zone(), calls kasan_add_zero_shadow(), and onlines as many sections as vmemmap pages fully span. - mhp_deinit_memmap_on_memory: Offlines as many sections as vmemmap pages fully span, removes the range from zhe zone by remove_pfn_range_from_zone(), and calls kasan_remove_zero_shadow() for the range. The new function memory_block_online() calls mhp_init_memmap_on_memory() before doing the actual online_pages(). Should online_pages() fail, we clean up by calling mhp_deinit_memmap_on_memory(). Adjusting of present_pages is done at the end once we know that online_pages() succedeed. On offline, memory_block_offline() needs to unaccount vmemmap pages from present_pages() before calling offline_pages(). This is necessary because offline_pages() tears down some structures based on the fact whether the node or the zone become empty. If offline_pages() fails, we account back vmemmap pages. If it succeeds, we call mhp_deinit_memmap_on_memory(). Hot-remove: We need to be careful when removing memory, as adding and removing memory needs to be done with the same granularity. To check that this assumption is not violated, we check the memory range we want to remove and if a) any memory block has vmemmap pages and b) the range spans more than a single memory block, we scream out loud and refuse to proceed. If all is good and the range was using memmap on memory (aka vmemmap pages), we construct an altmap structure so free_hugepage_table does the right thing and calls vmem_altmap_free instead of free_pagetable. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm,memory_hotplug: factor out adjusting present pages into ↵David Hildenbrand1-10/+12
adjust_present_page_count() Let's have a single place (inspired by adjust_managed_page_count()) where we adjust present pages. In contrast to adjust_managed_page_count(), only memory onlining or offlining is allowed to modify the number of present pages. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm,memory_hotplug: relax fully spanned sections checkOscar Salvador1-4/+18
We want {online,offline}_pages to operate on whole memblocks, but memmap_on_memory will poke pageblock_nr_pages aligned holes in the beginning, which is a special case we want to allow. Relax the check to account for that case. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/memory_hotplug: remove broken locking of zone PCP structures during hot ↵Mel Gorman1-4/+0
remove zone_pcp_reset allegedly protects against a race with drain_pages using local_irq_save but this is bogus. local_irq_save only operates on the local CPU. If memory hotplug is running on CPU A and drain_pages is running on CPU B, disabling IRQs on CPU A does not affect CPU B and offers no protection. This patch deletes IRQ disable/enable on the grounds that IRQs protect nothing and assumes the existing hotplug paths guarantees the PCP cannot be used after zone_pcp_enable(). That should be the case already because all the pages have been freed and there is no page to put on the PCP lists. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Minchan Kim <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05selftests/vm: gup_test: test faulting in kernel, and verify pinnable pagesPavel Tatashin1-0/+6
When pages are pinned they can be faulted in userland and migrated, and they can be faulted right in kernel without migration. In either case, the pinned pages must end-up being pinnable (not movable). Add a new test to gup_test, to help verify that the gup/pup (get_user_pages() / pin_user_pages()) behavior with respect to pinnable and movable pages is reasonable and correct. Specifically, provide a way to: 1) Verify that only "pinnable" pages are pinned. This is checked automatically for you. 2) Verify that gup/pup performance is reasonable. This requires comparing benchmarks between doing gup/pup on pages that have been pre-faulted in from user space, vs. doing gup/pup on pages that are not faulted in until gup/pup time (via FOLL_TOUCH). This decision is controlled with the new -z command line option. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: John Hubbard <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05selftests/vm: gup_test: fix test flagPavel Tatashin2-14/+12
In gup_test both gup_flags and test_flags use the same flags field. This is broken. Farther, in the actual gup_test.c all the passed gup_flags are erased and unconditionally replaced with FOLL_WRITE. Which means that test_flags are ignored, and code like this always performs pin dump test: 155 if (gup->flags & GUP_TEST_FLAG_DUMP_PAGES_USE_PIN) 156 nr = pin_user_pages(addr, nr, gup->flags, 157 pages + i, NULL); 158 else 159 nr = get_user_pages(addr, nr, gup->flags, 160 pages + i, NULL); 161 break; Add a new test_flags field, to allow raw gup_flags to work. Add a new subcommand for DUMP_USER_PAGES_TEST to specify that pin test should be performed. Remove unconditional overwriting of gup_flags via FOLL_WRITE. But, preserve the previous behaviour where FOLL_WRITE was the default flag, and add a new option "-W" to unset FOLL_WRITE. Rename flags with gup_flags. With the fix, dump works like this: root@virtme:/# gup_test -c ---- page #0, starting from user virt addr: 0x7f8acb9e4000 page:00000000d3d2ee27 refcount:2 mapcount:1 mapping:0000000000000000 index:0x0 pfn:0x100bcf anon flags: 0x300000000080016(referenced|uptodate|lru|swapbacked) raw: 0300000000080016 ffffd0e204021608 ffffd0e208df2e88 ffff8ea04243ec61 raw: 0000000000000000 0000000000000000 0000000200000000 0000000000000000 page dumped because: gup_test: dump_pages() test DUMP_USER_PAGES_TEST: done root@virtme:/# gup_test -c -p ---- page #0, starting from user virt addr: 0x7fd19701b000 page:00000000baed3c7d refcount:1025 mapcount:1 mapping:0000000000000000 index:0x0 pfn:0x108008 anon flags: 0x300000000080014(uptodate|lru|swapbacked) raw: 0300000000080014 ffffd0e204200188 ffffd0e205e09088 ffff8ea04243ee71 raw: 0000000000000000 0000000000000000 0000040100000000 0000000000000000 page dumped because: gup_test: dump_pages() test DUMP_USER_PAGES_TEST: done Refcount shows the difference between pin vs no-pin case. Also change type of nr from int to long, as it counts number of pages. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: John Hubbard <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/gup: longterm pin migration cleanupPavel Tatashin1-56/+37
When pages are longterm pinned, we must migrated them out of movable zone. The function that migrates them has a hidden loop with goto. The loop is to retry on isolation failures, and after successful migration. Make this code better by moving this loop to the caller. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/gup: change index type to long as it counts pagesPavel Tatashin1-1/+1
In __get_user_pages_locked() i counts number of pages which should be long, as long is used in all other places to contain number of pages, and 32-bit becomes increasingly small for handling page count proportional values. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/gup: migrate pinned pages out of movable zonePavel Tatashin1-33/+34
We should not pin pages in ZONE_MOVABLE. Currently, we do not pin only movable CMA pages. Generalize the function that migrates CMA pages to migrate all movable pages. Use is_pinnable_page() to check which pages need to be migrated Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: John Hubbard <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm: honor PF_MEMALLOC_PIN for all movable pagesPavel Tatashin2-12/+10
PF_MEMALLOC_PIN is only honored for CMA pages, extend this flag to work for any allocations from ZONE_MOVABLE by removing __GFP_MOVABLE from gfp_mask when this flag is passed in the current context. Add is_pinnable_page() to return true if page is in a pinnable page. A pinnable page is not in ZONE_MOVABLE and not of MIGRATE_CMA type. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm: apply per-task gfp constraints in fast pathPavel Tatashin1-7/+8
Function current_gfp_context() is called after fast path. However, soon we will add more constraints which will also limit zones based on context. Move this call into fast path, and apply the correct constraints for all allocations. Also update .reclaim_idx based on value returned by current_gfp_context() because it soon will modify the allowed zones. Note: With this patch we will do one extra current->flags load during fast path, but we already load current->flags in fast-path: __alloc_pages() prepare_alloc_pages() current_alloc_flags(gfp_mask, *alloc_flags); Later, when we add the zone constrain logic to current_gfp_context() we will be able to remove current->flags load from current_alloc_flags, and therefore return fast-path to the current performance level. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PINPavel Tatashin3-6/+6
PF_MEMALLOC_NOCMA is used ot guarantee that the allocator will not return pages that might belong to CMA region. This is currently used for long term gup to make sure that such pins are not going to be done on any CMA pages. When PF_MEMALLOC_NOCMA has been introduced we haven't realized that it is focusing on CMA pages too much and that there is larger class of pages that need the same treatment. MOVABLE zone cannot contain any long term pins as well so it makes sense to reuse and redefine this flag for that usecase as well. Rename the flag to PF_MEMALLOC_PIN which defines an allocation context which can only get pages suitable for long-term pins. Also rename: memalloc_nocma_save()/memalloc_nocma_restore to memalloc_pin_save()/memalloc_pin_restore() and make the new functions common. [[email protected]: fix renaming of PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: John Hubbard <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Mike Rapoport <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/gup: check for isolation errorsPavel Tatashin1-26/+34
It is still possible that we pin movable CMA pages if there are isolation errors and cma_page_list stays empty when we check again. Check for isolation errors, and return success only when there are no isolation errors, and cma_page_list is empty after checking. Because isolation errors are transient, we retry indefinitely. Link: https://lkml.kernel.org/r/[email protected] Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region") Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/gup: return an error on migration failurePavel Tatashin1-10/+7
When migration failure occurs, we still pin pages, which means that we may pin CMA movable pages which should never be the case. Instead return an error without pinning pages when migration failure happens. No need to retry migrating, because migrate_pages() already retries 10 times. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/gup: check every subpage of a compound page during isolationPavel Tatashin1-12/+7
When pages are isolated in check_and_migrate_movable_pages() we skip compound number of pages at a time. However, as Jason noted, it is not necessary correct that pages[i] corresponds to the pages that we skipped. This is because it is possible that the addresses in this range had split_huge_pmd()/split_huge_pud(), and these functions do not update the compound page metadata. The problem can be reproduced if something like this occurs: 1. User faulted huge pages. 2. split_huge_pmd() was called for some reason 3. User has unmapped some sub-pages in the range 4. User tries to longterm pin the addresses. The resulting pages[i] might end-up having pages which are not compound size page aligned. Link: https://lkml.kernel.org/r/[email protected] Fixes: aa712399c1e8 ("mm/gup: speed up check_and_migrate_cma_pages() on huge page") Signed-off-by: Pavel Tatashin <[email protected]> Reported-by: Jason Gunthorpe <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/gup: don't pin migrated cma pages in movable zonePavel Tatashin1-1/+1
Patch series "prohibit pinning pages in ZONE_MOVABLE", v11. When page is pinned it cannot be moved and its physical address stays the same until pages is unpinned. This is useful functionality to allows userland to implementation DMA access. For example, it is used by vfio in vfio_pin_pages(). However, this functionality breaks memory hotplug/hotremove assumptions that pages in ZONE_MOVABLE can always be migrated. This patch series fixes this issue by forcing new allocations during page pinning to omit ZONE_MOVABLE, and also to migrate any existing pages from ZONE_MOVABLE during pinning. It uses the same scheme logic that is currently used by CMA, and extends the functionality for all allocations. For more information read the discussion [1] about this problem. [1] https://lore.kernel.org/lkml/CA+CK2bBffHBxjmb9jmSKacm0fJMinyt3Nhk8Nx6iudcQSj80_w@mail.gmail.com This patch (of 14): In order not to fragment CMA the pinned pages are migrated. However, they are migrated to ZONE_MOVABLE, which also should not have pinned pages. Remove __GFP_MOVABLE, so pages can be migrated to zones where pinning is allowed. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: John Hubbard <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Michal Hocko <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Dan Williams <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Tyler Hicks <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: David Rientjes <[email protected]> Cc: John Hubbard <[email protected]> Cc: Ira Weiny <[email protected]> Cc: James Morris <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/util.c: fix typoBhaskar Chowdhury1-1/+1
s/condtion/condition/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Bhaskar Chowdhury <[email protected]> Acked-by: Randy Dunlap <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/util.c: reduce mem_dump_obj() object sizeJoe Perches1-10/+14
Simplify the code by using a temporary and reduce the object size by using a single call to pr_cont(). Reverse a test and unindent a block too. $ size mm/util.o* (defconfig x86-64) text data bss dec hex filename 7419 372 40 7831 1e97 mm/util.o.new 7477 372 40 7889 1ed1 mm/util.o.old Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joe Perches <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm: generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]Anshuman Khandual1-0/+6
ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] configs have duplicate definitions on platforms that subscribe them. Instead, just make them generic options which can be selected on applicable platforms. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Catalin Marinas <[email protected]> [arm64] Acked-by: Heiko Carstens <[email protected]> [s390] Cc: Will Deacon <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Helge Deller <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Russell King <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vineet Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm: generalize ARCH_HAS_CACHE_LINE_SIZEAnshuman Khandual1-0/+3
Patch series "mm: some config cleanups", v2. This series contains config cleanup patches which reduces code duplication across platforms and also improves maintainability. There is no functional change intended with this series. This patch (of 6): ARCH_HAS_CACHE_LINE_SIZE config has duplicate definitions on platforms that subscribe it. Instead, just make it a generic option which can be selected on applicable platforms. This change reduces code duplication and makes it cleaner. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Catalin Marinas <[email protected]> [arm64] Acked-by: Vineet Gupta <[email protected]> [arc] Cc: Will Deacon <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/mmap.c: don't unlock VMAs in remap_file_pages()Liam Howlett1-17/+1
Since this call uses MAP_FIXED, do_mmap() will munlock the necessary range. There is also an error in the loop test expression which will evaluate as false and the loop body has never execute. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Acked-by: Hugh Dickins <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05x86/mm: track linear mapping split eventsSaravanan D1-0/+4
To help with debugging the sluggishness caused by TLB miss/reload, we introduce monotonic hugepage [direct mapped] split event counts since system state: SYSTEM_RUNNING to be displayed as part of /proc/vmstat in x86 servers The lifetime split event information will be displayed at the bottom of /proc/vmstat .... swap_ra 0 swap_ra_hit 0 direct_map_level2_splits 94 direct_map_level3_splits 4 nr_unstable 0 .... One of the many lasting sources of direct hugepage splits is kernel tracing (kprobes, tracepoints). Note that the kernel's code segment [512 MB] points to the same physical addresses that have been already mapped in the kernel's direct mapping range. Source : Documentation/x86/x86_64/mm.rst When we enable kernel tracing, the kernel has to modify attributes/permissions of the text segment hugepages that are direct mapped causing them to split. Kernel's direct mapped hugepages do not coalesce back after split and remain in place for the remainder of the lifetime. An instance of direct page splits when we turn on dynamic kernel tracing .... cat /proc/vmstat | grep -i direct_map_level direct_map_level2_splits 784 direct_map_level3_splits 12 bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @ [pid, comm] = count(); }' cat /proc/vmstat | grep -i direct_map_level direct_map_level2_splits 789 direct_map_level3_splits 12 .... Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Saravanan D <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Dave Hansen <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm: /proc/sys/vm/stat_refresh stop checking monotonic numa statsHugh Dickins1-9/+0
All of the VM NUMA stats are event counts, incremented never decremented: it is not very useful for vmstat_refresh() to check them throughout their first aeon, then warn on them throughout their next. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm: /proc/sys/vm/stat_refresh skip checking known negative statsHugh Dickins1-0/+15
vmstat_refresh() can occasionally catch nr_zone_write_pending and nr_writeback when they are transiently negative. The reason is partly that the interrupt which decrements them in test_clear_page_writeback() can come in before __test_set_page_writeback() got to increment them; but transient negatives are still seen even when that is prevented, and I am not yet certain why (but see Roman's note below). Those stats are not buggy, they have never been seen to drift away from 0 permanently: so just avoid the annoyance of showing a warning on them. Similarly avoid showing a warning on nr_free_cma: CMA users have seen that one reported negative from /proc/sys/vm/stat_refresh too, but it does drift away permanently: I believe that's because its incrementation and decrementation are decided by page migratetype, but the migratetype of a pageblock is not guaranteed to be constant. Roman Gushchin points out: "For performance reasons, vmstat counters are incremented and decremented using per-cpu batches. vmstat_refresh() flushes the per-cpu batches on all CPUs, to get values as accurate as possible; but this method is not atomic, so the resulting value is not always precise. As a consequence, for those counters whose actual value is close to 0, a small negative value may occasionally be reported. If the value is small and the state is transient, it is not an indication of an error" Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Reported-by: Roman Gushchin <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm: no more EINVAL from /proc/sys/vm/stat_refreshHugh Dickins1-5/+0
EINVAL was good for drawing the refresher's attention to a warning in dmesg, but became very tiresome when running test suites scripted with "set -e": an underflow from a bug in one feature would cause unrelated tests much later to fail, just because their /proc/sys/vm/stat_refresh touch failed with that error. Stop doing that. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm: restore node stat checking in /proc/sys/vm/stat_refreshHugh Dickins1-0/+8
In v4.7 commit 52b6f46bc163 ("mm: /proc/sys/vm/stat_refresh to force vmstat update") introduced vmstat_refresh(), with its vmstat underflow checking; then in v4.8 commit 75ef71840539 ("mm, vmstat: add infrastructure for per-node vmstats") split NR_VM_NODE_STAT_ITEMS out of NR_VM_ZONE_STAT_ITEMS without updating vmstat_refresh(): so it has been missing out much of the vmstat underflow checking ever since. Reinstate it. Thanks to Roman Gushchin <[email protected]> for tangentially pointing this out. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm/ksm: remove unused parameter from remove_trailing_rmap_items()Chengyang Fan1-4/+3
Since commit 6514d511dbe5 ("ksm: singly-linked rmap_list") was merged, remove_trailing_rmap_items() doesn't use the 'mm_slot' parameter. So remove it, and update caller accordingly. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chengyang Fan <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05ksm: fix potential missing rmap_item for stable_nodeMiaohe Lin1-0/+1
When removing rmap_item from stable tree, STABLE_FLAG of rmap_item is cleared with head reserved. So the following scenario might happen: For ksm page with rmap_item1: cmp_and_merge_page stable_node->head = &migrate_nodes; remove_rmap_item_from_tree, but head still equal to stable_node; try_to_merge_with_ksm_page failed; return; For the same ksm page with rmap_item2, stable node migration succeed this time. The stable_node->head does not equal to migrate_nodes now. For ksm page with rmap_item1 again: cmp_and_merge_page stable_node->head != &migrate_nodes && rmap_item->head == stable_node return; We would miss the rmap_item for stable_node and might result in failed rmap_walk_ksm(). Fix this by set rmap_item->head to NULL when rmap_item is removed from stable tree. Link: https://lkml.kernel.org/r/[email protected] Fixes: 4146d2d673e8 ("ksm: make !merge_across_nodes migration safe") Signed-off-by: Miaohe Lin <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05ksm: remove dedicated macro KSM_FLAG_MASKMiaohe Lin1-3/+1
The macro KSM_FLAG_MASK is used in rmap_walk_ksm() only. So we can replace ~KSM_FLAG_MASK with PAGE_MASK to remove this dedicated macro and make code more consistent because PAGE_MASK is used elsewhere in this file. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05ksm: use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()Miaohe Lin1-2/+1
It's unnecessary to lock the page when get ksm page if we're going to remove the rmap item as page migration is irrelevant in this case. Use GET_KSM_PAGE_NOLOCK instead to save some page lock cycles. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05ksm: remove redundant VM_BUG_ON_PAGE() on stable_tree_search()Miaohe Lin1-2/+0
Patch series "Cleanup and fixup for ksm". This series contains cleanups to remove unnecessary VM_BUG_ON_PAGE and dedicated macro KSM_FLAG_MASK. Also this fixes potential missing rmap_item for stable_node which would result in failed rmap_walk_ksm(). More details can be found in the respective changelogs. This patch (of 4): The same VM_BUG_ON_PAGE() check is already done in the callee. Remove these extra caller one to simplify code slightly. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-05-05mm: use proper type for cma_[alloc|release]Minchan Kim2-11/+12
size_t in cma_alloc is confusing since it makes people think it's byte count, not pages. Change it to unsigned long[1]. The unsigned int in cma_release is also not right so change it. Since we have unsigned long in cma_release, free_contig_range should also respect it. [1] 67a2e213e7e9, mm: cma: fix incorrect type conversion for size during dma allocation Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>