|
Indirect calls are expensive, thanks to Spectre. Test for
TRANSHUGE_PAGE_DTOR and destroy the folio appropriately. Move the
free_compound_page() call into destroy_large_folio() to simplify later
patches.
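As an illustration of the shape of this change, here is a small stand-alone
sketch (plain user-space C, not the kernel code; the enum names mirror the
kernel's compound_dtor ids, but the bodies are stubs) of replacing the
indirect destructor call with a direct test:

#include <stdio.h>

enum compound_dtor_id {
        NULL_COMPOUND_DTOR,
        COMPOUND_PAGE_DTOR,
        HUGETLB_PAGE_DTOR,
        TRANSHUGE_PAGE_DTOR,
};

struct folio { enum compound_dtor_id dtor; };

static void free_transhuge_folio(struct folio *folio) { printf("transhuge path\n"); }
static void free_compound_page(struct folio *folio)   { printf("generic compound path\n"); }

/* Before: an indirect call through a table indexed by folio->dtor.
 * After: test the id and branch directly, avoiding a retpoline. */
static void destroy_large_folio(struct folio *folio)
{
        if (folio->dtor == TRANSHUGE_PAGE_DTOR) {
                free_transhuge_folio(folio);
                return;
        }
        free_compound_page(folio);
}

int main(void)
{
        struct folio thp = { TRANSHUGE_PAGE_DTOR }, other = { COMPOUND_PAGE_DTOR };

        destroy_large_folio(&thp);
        destroy_large_folio(&other);
        return 0;
}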
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Sidhartha Kumar <[email protected]>
Cc: Yanteng Si <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Pass a folio instead of the head page to save a few instructions. Update
the documentation, at least in English.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Sidhartha Kumar <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Indirect calls are expensive, thanks to Spectre. Call free_huge_page()
directly if the folio belongs to hugetlb.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Sidhartha Kumar <[email protected]>
Cc: Yanteng Si <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page faults from
gup/gup_fast") from 2012 documented the following as the primary reason
why we would want to handle NUMA hinting faults from GUP:
KVM secondary MMU page faults will trigger the NUMA hinting page
faults through gup_fast -> get_user_pages -> follow_page ->
handle_mm_fault.
That is still the case today, and the relevant KVM code has been converted
to manually set FOLL_HONOR_NUMA_FAULT. So let's stop setting
FOLL_HONOR_NUMA_FAULT for all GUP users and hope that few other users that
really require such handling for autonuma remain.
Possible interaction with MMU notifiers:
Assume a driver obtains a page using get_user_pages() to map it into
a secondary MMU, and uses the MMU notifier framework to get notified on
changes.
Assume get_user_pages() succeeded on a PROT_NONE-mapped page (because
FOLL_HONOR_NUMA_FAULT is not set) in an accessible VMA and the page is
mapped into a secondary MMU. Once user space turns that mapping
inaccessible using mprotect(PROT_NONE), the actual PTE in the page table
might not change. If the MMU notifier were smart and optimized for that
case ("why notify if the PTE didn't change?"), that could be problematic.
At least change_pmd_range() with MMU_NOTIFY_PROTECTION_VMA for now does an
unconditional mmu_notifier_invalidate_range_start() ->
mmu_notifier_invalidate_range_end() and should be fine.
Note that even if a PTE in an accessible VMA is pte_protnone(), the
underlying page might be accessed by a secondary MMU that does not set
FOLL_HONOR_NUMA_FAULT, and test_young() MMU notifiers would return "true".
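As a rough sketch of the decision being changed (stand-alone user-space C
with simplified stand-in types and flag values, not the real
follow_page_pte() code), GUP now only backs off on a pte_protnone() PTE in
an accessible VMA when the caller passed FOLL_HONOR_NUMA_FAULT:

#include <stdbool.h>
#include <stdio.h>

#define FOLL_HONOR_NUMA_FAULT   0x1     /* stand-in for the real flag value */

struct pte { bool present; bool protnone; };

/* Returns true when GUP should back off and let handle_mm_fault() run,
 * which is what actually triggers the NUMA hinting fault. */
static bool gup_must_fault(struct pte pte, unsigned int foll_flags)
{
        if (!pte.present)
                return true;
        if (pte.protnone && (foll_flags & FOLL_HONOR_NUMA_FAULT))
                return true;    /* honor the NUMA hinting fault */
        return false;           /* use the page, skipping autonuma handling */
}

int main(void)
{
        struct pte numa_pte = { .present = true, .protnone = true };

        printf("KVM-style caller (FOLL_HONOR_NUMA_FAULT): fault=%d\n",
               gup_must_fault(numa_pte, FOLL_HONOR_NUMA_FAULT));
        printf("plain GUP caller after this change:       fault=%d\n",
               gup_must_fault(numa_pte, 0));
        return 0;
}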
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: liubo <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Sparse is not happy to see a non-static variable without a declaration:
lib/vsprintf.c:61:6: warning: symbol 'no_hash_pointers' was not declared.
Should it be static?
Declare the respective variable in sprintf.h. Also add a comment to
discourage its use if there is no real need.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Marco Elver <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The old name is confusing because it implies the completion of the earlier
kmemleak_init(); the new name, kmemleak_late_initialized, represents the
completion of kmemleak_late_init().
No functional changes.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiaolei Wang <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Zhaoyang Huang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/kmemleak: use object_cache instead of
kmemleak_initialized", v3.
Use object_cache instead of kmemleak_initialized for the check in
set_track_prepare(), so that memory leaks after kmemleak_init() can be
recorded, and rename kmemleak_initialized to kmemleak_late_initialized:
unreferenced object 0xc674ca80 (size 64):
comm "swapper/0", pid 1, jiffies 4294938337 (age 204.880s)
hex dump (first 32 bytes):
80 55 75 c6 80 54 75 c6 00 55 75 c6 80 52 75 c6 .Uu..Tu..Uu..Ru.
00 53 75 c6 00 00 00 00 00 00 00 00 00 00 00 00 .Su..........
This patch (of 2):
kmemleak_initialized is only set in kmemleak_late_init(), which means that
objects leaked before kmemleak_late_init() have no call trace recorded.
Use object_cache instead of kmemleak_initialized for the check in
set_track_prepare(), so that a call trace is recorded when memory is
leaked by code that runs between kmemleak_init() and kmemleak_late_init():
unreferenced object 0xc674ca80 (size 64):
comm "swapper/0", pid 1, jiffies 4294938337 (age 204.880s)
hex dump (first 32 bytes):
80 55 75 c6 80 54 75 c6 00 55 75 c6 80 52 75 c6 .Uu..Tu..Uu..Ru.
00 53 75 c6 00 00 00 00 00 00 00 00 00 00 00 00 .Su..........
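A rough stand-in for the check being changed (stand-alone user-space C
with simplified names, not the kernel's set_track_prepare()):

#include <stdbool.h>
#include <stdio.h>

static void *object_cache;              /* non-NULL once kmemleak_init() has run */
static bool kmemleak_late_initialized;  /* only set in kmemleak_late_init() */

static unsigned long set_track_prepare(void)
{
        if (!object_cache)      /* was: if (!kmemleak_late_initialized) */
                return 0;       /* too early, no backtrace can be recorded */
        return 0xdead0001;      /* pretend this is a stack_depot handle */
}

int main(void)
{
        object_cache = &object_cache;   /* simulate: kmemleak_init() done */
        /* kmemleak_late_init() has not run yet, but a trace is recorded now */
        printf("late initialized: %d, trace handle: %#lx\n",
               kmemleak_late_initialized, set_track_prepare());
        return 0;
}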
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 56a61617dd22 ("mm: use stack_depot for recording kmemleak's backtrace")
Signed-off-by: Xiaolei Wang <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Zhaoyang Huang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
KSM currently maintains several statistics, which let you determine how
successful KSM is at sharing pages. However, they do not include a metric
for how much work it does.
This commit adds the pages scanned metric. This allows the administrator
to determine how many pages have been scanned over a period of time.
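A minimal way to read the new counter from user space is sketched below;
the sysfs path is assumed to mirror the existing KSM knobs rather than
quoted from this patch:

#include <stdio.h>

int main(void)
{
        unsigned long long pages_scanned;
        FILE *f = fopen("/sys/kernel/mm/ksm/pages_scanned", "r"); /* assumed path */

        if (!f) {
                perror("pages_scanned");
                return 1;
        }
        if (fscanf(f, "%llu", &pages_scanned) == 1)
                printf("KSM has scanned %llu pages\n", pages_scanned);
        fclose(f);
        return 0;
}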
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Stefan Roesch <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
By making maybe_unlock_mmap_for_io() handle the VMA lock correctly, we
make fault_dirty_shared_page() safe to be called without the mmap lock
held.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reported-by: David Hildenbrand <[email protected]>
Tested-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Saves four implicit calls to compound_head().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ZhangPeng <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm,thp: fix sloppy text output".
Three independent trivial patches, fixing sloppy text output which has
annoyed me; but might risk surprising a parser, so any can be dropped.
This patch (of 3):
The SysRq-m or OOM Mem-Info dmesg showed (long lines containing) ...
shmem:NkB shmem_thp: NkB shmem_pmdmapped: NkB anon_thp: NkB ...
Delete the space after the colon after shmem_thp, shmem_pmdmapped,
anon_thp: as the shmem example shows, no other fields have a space after
the colon in this output.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This sysctl has the very unusual behaviour of not allowing any user (even
CAP_SYS_ADMIN) to reduce the restriction setting, meaning that if you were
to set this sysctl to a more restrictive option in the host pidns you
would need to reboot your machine in order to reset it.
The justification given in [1] is that this is a security feature and thus
it should not be possible to disable. Aside from the fact that we have
plenty of security-related sysctls that can be disabled after being
enabled (fs.protected_symlinks for instance), the protection provided by
the sysctl is to stop users from being able to create a binary and then
execute it. A user with CAP_SYS_ADMIN can trivially do this without
memfd_create(2):
% cat mount-memfd.c
#define _GNU_SOURCE /* asprintf() */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mount.h> /* fsopen()/fsconfig()/fsmount() wrappers, glibc >= 2.36 */
#include <linux/mount.h>
#define SHELLCODE "#!/bin/echo this file was executed from this totally private tmpfs:"
int main(void)
{
        int fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC);
        assert(fsfd >= 0);
        assert(!fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 2));
        int dfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
        assert(dfd >= 0);
        int execfd = openat(dfd, "exe", O_CREAT | O_RDWR | O_CLOEXEC, 0755);
        assert(execfd >= 0);
        assert(write(execfd, SHELLCODE, strlen(SHELLCODE)) == strlen(SHELLCODE));
        assert(!close(execfd));
        char *execpath = NULL;
        char *argv[] = { "bad-exe", NULL }, *envp[] = { NULL };
        execfd = openat(dfd, "exe", O_PATH | O_CLOEXEC);
        assert(execfd >= 0);
        assert(asprintf(&execpath, "/proc/self/fd/%d", execfd) > 0);
        assert(!execve(execpath, argv, envp));
}
% ./mount-memfd
this file was executed from this totally private tmpfs: /proc/self/fd/5
%
Given that it is possible for CAP_SYS_ADMIN users to create executable
binaries without memfd_create(2) and without touching the host filesystem
(not to mention the many other things a CAP_SYS_ADMIN process would be
able to do that would be equivalent or worse), it seems strange to cause a
fair amount of headache to admins when there doesn't appear to be an
actual security benefit to blocking this. There appear to be concerns
about confused-deputy-esque attacks[2] but a confused deputy that can
write to arbitrary sysctls is a bigger security issue than executable
memfds.
/* New API */
The primary requirement from the original author appears to be more based
on the need to be able to restrict an entire system in a hierarchical
manner[3], such that child namespaces cannot re-enable executable memfds.
So, implement that behaviour explicitly -- the vm.memfd_noexec scope is
evaluated up the pidns tree to &init_pid_ns and you have the most
restrictive value applied to you. The lowest value you can now set
vm.memfd_noexec to is whatever limit applies to your parent.
Note that a pidns will inherit a copy of the parent pidns's effective
vm.memfd_noexec setting at unshare() time. This matches the existing
behaviour, and it also ensures that a pidns will never have its
vm.memfd_noexec setting *lowered* behind its back (but it will be raised
if the parent raises theirs).
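A stand-alone sketch of the hierarchical semantics described above
(illustrative types only, not the kernel's pid_namespace code): the
effective value is the most restrictive setting found walking from the
current pidns up to the root.

#include <stdio.h>

struct pidns {
        int memfd_noexec_scope;         /* 0, 1 or 2 */
        struct pidns *parent;           /* NULL for the root namespace */
};

static int pidns_memfd_noexec_scope(const struct pidns *ns)
{
        int scope = 0;

        for (; ns; ns = ns->parent)
                if (ns->memfd_noexec_scope > scope)
                        scope = ns->memfd_noexec_scope;
        return scope;
}

int main(void)
{
        struct pidns root  = { .memfd_noexec_scope = 2, .parent = NULL };
        struct pidns child = { .memfd_noexec_scope = 0, .parent = &root };

        /* a child can never be less restrictive than its parents */
        printf("effective scope in child: %d\n", pidns_memfd_noexec_scope(&child));
        return 0;
}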
/* Backwards Compatibility */
As the previous version of the sysctl didn't allow you to lower the
setting at all, there are no backwards compatibility issues with this
aspect of the change.
However, it should be noted that the setting is now completely
hierarchical. Previously, a cloned pidns would just copy the current
pidns setting, meaning that if the parent's vm.memfd_noexec was changed it
wouldn't propagate to existing pid namespaces. Now, the restriction
applies recursively. This is a uAPI change, however:
* The sysctl is very new, having been merged in 6.3.
* Several aspects of the sysctl were broken up until this patchset and
the other patchset by Jeff Xu last month.
And thus it seems incredibly unlikely that any real users would run into
this issue. In the worst case, if this causes userspace issues we could
make it so that modifying the setting follows the hierarchical rules but
the restriction checking uses the cached copy.
[1]: https://lore.kernel.org/CABi2SkWnAgHK1i6iqSqPMYuNEhtHBkO8jUuCvmG3RmUB5TKHJw@mail.gmail.com/
[2]: https://lore.kernel.org/CALmYWFs_dNCzw_pW1yRAo4bGCPEtykroEQaowNULp7svwMLjOg@mail.gmail.com/
[3]: https://lore.kernel.org/CALmYWFuahdUF7cT4cm7_TGLqPanuHXJ-hVSfZt7vpTnc18DPrw@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <[email protected]>
Cc: Dominique Martinet <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Daniel Verkamp <[email protected]>
Cc: Jeff Xu <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In order to incentivise userspace to switch to passing MFD_EXEC and
MFD_NOEXEC_SEAL, we need to provide a warning on each attempt to call
memfd_create() without the new flags. pr_warn_once() is not useful
because on most systems the one warning is burned up during the boot
process (on my system, systemd does this within the first second of boot)
and thus userspace will in practice never see the warnings to push them to
switch to the new flags.
The original patchset[1] used pr_warn_ratelimited(), however there were
concerns about the degree of spam in the kernel log[2,3]. The resulting
inability to detect every case was flagged as an issue at the time[4].
While we could come up with an alternative rate-limiting scheme such as
only outputting the message if vm.memfd_noexec has been modified, or only
outputting the message once for a given task, these alternatives have
downsides that don't make sense given how low-stakes a single kernel
warning message is. Switching to pr_info_ratelimited() instead should be
fine -- it's possible some monitoring tool will be unhappy with a stream
of warning-level messages but there's already plenty of info-level message
spam in dmesg.
[1]: https://lore.kernel.org/[email protected]/
[2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/
[3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
[4]: https://lore.kernel.org/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Daniel Verkamp <[email protected]>
Cc: Dominique Martinet <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Given the difficulty of auditing all of userspace to figure out whether
every memfd_create() user has switched to passing MFD_EXEC and
MFD_NOEXEC_SEAL flags, it seems far less disruptive to make it possible
for older programs that don't make use of executable memfds to run under
vm.memfd_noexec=2. Otherwise, a small dependency change can result in
spurious errors. For programs that don't use executable memfds, passing
MFD_NOEXEC_SEAL is functionally a no-op, so implying it by default under
vm.memfd_noexec=2 has the same effect for them.
In addition, every failure under vm.memfd_noexec=2 needs to print to the
kernel log so that userspace can figure out where the error came from.
The concerns about pr_warn_ratelimited() spam that caused the switch to
pr_warn_once()[1,2] do not apply to the vm.memfd_noexec=2 case.
This is a user-visible API change, but as it allows programs to do
something that would be blocked before, and the sysctl itself was broken
and recently released, it seems unlikely this will cause any issues.
[1]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
[2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <[email protected]>
Cc: Dominique Martinet <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Daniel Verkamp <[email protected]>
Cc: Jeff Xu <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This removes some direct accesses to struct page, working towards
splitting out struct ptdesc from struct page.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit e774a7bc7f0a ("mm: zswap: remove page reclaim logic from
z3fold"), zpool and zpool_ops have been removed, so also remove the
corresponding comments.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiu Jianfeng <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We have get_pageblock_migratetype() and get_pfnblock_migratetype() to get
the migratetype of a page. get_pfnblock_migratetype() accepts both the
page and its pfn from the caller, while get_pageblock_migratetype() only
accepts the page and derives the pfn from it with page_to_pfn().
In cases where we already have the pfn of the page recorded, we can simply
call get_pfnblock_migratetype() to avoid a page_to_pfn() call.
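The point is illustrated by the following self-contained sketch
(user-space C with fake conversions standing in for page_to_pfn() and the
pageblock bitmap lookup):

#include <stdio.h>

struct page { unsigned long pfn; };

static unsigned long page_to_pfn(const struct page *page)
{
        return page->pfn;                       /* stand-in conversion */
}

static int get_pfnblock_migratetype(const struct page *page, unsigned long pfn)
{
        (void)page;
        return (int)(pfn >> 9) & 7;             /* fake bitmap lookup keyed by pfn */
}

static int get_pageblock_migratetype(const struct page *page)
{
        /* has to derive the pfn itself */
        return get_pfnblock_migratetype(page, page_to_pfn(page));
}

int main(void)
{
        struct page page = { .pfn = 0x1234 };
        unsigned long pfn = page_to_pfn(&page); /* pfn already recorded */

        /* preferred when the pfn is at hand: no second page_to_pfn() */
        printf("%d %d\n", get_pfnblock_migratetype(&page, pfn),
               get_pageblock_migratetype(&page));
        return 0;
}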
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Two minor cleanups for get pageblock migratetype".
This series contains two minor cleanups for get pageblock migratetype.
More details can be found in respective patches.
This patch (of 2):
get_pfnblock_flags_mask() just calls the inline inner
__get_pfnblock_flags_mask() without doing any extra work. Open-code
__get_pfnblock_flags_mask() in get_pfnblock_flags_mask(), replace the
remaining calls to __get_pfnblock_flags_mask() with calls to
get_pfnblock_flags_mask(), and remove the now-unnecessary
__get_pfnblock_flags_mask().
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Just remove the redundant parameter alloc_order from
reserve_highatomic_pageblock(). No functional modification involved.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ZhangPeng <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Nanyong Sun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This patch is based on commit 5da226dbfce3 ("mm: skip CMA pages when
they are not available"), which skips reclaim of CMA pages when they are
not eligible for the current allocation context. In MGLRU, such pages are
added to the tail of the immediate generation to maintain better LRU
order, unlike the conventional LRU where such pages are directly added to
the head of the LRU list (akin to adding to the head of the youngest
generation in MGLRU).
There is no observable issue without this patch on MGLRU, but logically it
makes sense to skip CMA page reclaim when those pages cannot satisfy the
current allocation context.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Charan Teja Kalla <[email protected]>
Reviewed-by: Kalesh Singh <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Zhaoyang Huang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Parameter pgdat is not used in fragmentation_score_wmark. Just remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We get batch from the pcp and just pass it to nr_pcp_free() immediately.
Get batch from the pcp inside nr_pcp_free() instead and remove the
unnecessary batch parameter.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Two minor cleanups for pcp list in page_alloc".
There are two minor cleanups for pcp list in page_alloc. More details
can be found in respective patches.
This patch (of 2):
After commit fd56eef258a17 ("mm/page_alloc: simplify how many pages are
selected per pcp list during bulk free"), we drain all pages in the
selected pcp list, and we have ensured the passed count is < pcp->count.
The search therefore finishes before wrapping around, so tracking the
range of active PCP lists, which was intended for the wrap-around case, is
no longer needed.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
With memmap on memory, some architectures need more details about the
altmap, such as base_pfn, end_pfn, etc., to unmap the vmemmap memory.
Instead of computing them again when we remove a memory block, embed the
vmem_altmap details in struct memory_block if we are using the memmap on
memory feature.
[[email protected]: fix error return code in add_memory_resource()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Currently, the memmap_on_memory feature is only supported with memory
block sizes that result in vmemmap pages covering full pageblocks. This
is because the memory onlining/offlining code requires applicable ranges
to be pageblock-aligned, for example, to set the migratetypes properly.
This patch lifts that restriction by reserving more pages than required
for vmemmap space, which allows the start address to be pageblock-aligned
with different memory block sizes. Using this facility implies the kernel
will be reserving some pages for every memory block. This makes the
memmap on memory feature widely useful with different memory block size
values.
For example: with a 64K page size and a 256MiB memory block size, we
require 4 pages to map the vmemmap pages. To align things correctly we
end up adding a reserve of 28 pages, i.e. for every 4096 pages, 28 pages
get reserved.
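The arithmetic in the example can be reproduced with the short program
below; it assumes a 64-byte struct page and that the reservation is simply
rounded up to a pageblock boundary of 32 pages in this configuration (both
assumptions for illustration, not taken from the patch itself):

#include <stdio.h>

int main(void)
{
        unsigned long page_size    = 64UL * 1024;               /* 64K pages */
        unsigned long block_size   = 256UL * 1024 * 1024;       /* 256MiB memory block */
        unsigned long sizeof_page  = 64;                        /* assumed struct page size */
        unsigned long pageblock_nr = 32;                        /* assumed pageblock_nr_pages */

        unsigned long nr_pages   = block_size / page_size;              /* 4096 */
        unsigned long vmemmap_sz = nr_pages * sizeof_page;              /* 256KiB */
        unsigned long vmemmap_pg = vmemmap_sz / page_size;              /* 4 */
        unsigned long aligned    = (vmemmap_pg + pageblock_nr - 1) /
                                   pageblock_nr * pageblock_nr;         /* 32 */

        printf("vmemmap pages: %lu, extra reserved pages: %lu\n",
               vmemmap_pg, aligned - vmemmap_pg);       /* 4 and 28 */
        return 0;
}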
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Some architectures would want different restrictions. Hence add an
architecture-specific override.
The PMD_SIZE check is moved there.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
If not supported, fall back to not using memmap on memory. This avoids
the need for callers to do the fallback themselves.
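A sketch of the fallback flow (stand-alone C with stub names, not the
actual add_memory_resource() code):

#include <stdbool.h>
#include <stdio.h>

static bool mhp_supports_memmap_on_memory(unsigned long size)
{
        (void)size;
        return false;           /* pretend the alignment rules are not met */
}

static int add_memory_block(unsigned long size, bool want_memmap_on_memory)
{
        bool use_altmap = want_memmap_on_memory &&
                          mhp_supports_memmap_on_memory(size);

        /* the caller no longer has to retry without the flag itself */
        printf("adding block, memmap placed %s the hot-added range\n",
               use_altmap ? "inside" : "outside");
        return 0;
}

int main(void)
{
        return add_memory_block(256UL << 20, true);
}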
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Add support for memmap on memory feature on ppc64", v8.
This patch series updates the memmap on memory feature to fall back to
memmap allocation outside the memory block if the alignment rules are
not met. This makes the feature more useful on architectures like
ppc64, where the alignment rules are different with a 64K page size.
This patch (of 6):
Instead of adding a menu entry listing all supported architectures, add
an mm/Kconfig variable and select it from the supported architectures.
No functional change in this patch.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Vishal Verma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The check for root memcg will be done in wb_get_lookup(), so remove the
redundant one to simplify the code.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jinliang Zheng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit 0bb488498c98 ("mm: zswap: remove zswap_header"), 'offset'
has been replaced by swpentry; update the comment for it, and also add a
comment for 'objcg'.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiu Jianfeng <[email protected]>
Reviewed-by: Yosry Ahmed <[email protected]>
Acked-by: Nhat Pham <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
It is better not to expose too many internal variables of memtest, so add
a helper, memtest_report_meminfo(), to show the memtest results.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Tomas Mudrunka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
It's more readable to use helper macro BITS_PER_LONG and BITS_PER_BYTE.
No functional change intended.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Rapoport (IBM) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
It's more convenient to use helper macro llist_for_each_entry_safe().
No functional change intended.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Otherwise the kernel ends up with multiple copies:
$ nm vmlinux | grep dummy_vm_ops
ffffffff81e4ea00 d dummy_vm_ops.2
ffffffff81e11760 d dummy_vm_ops.254
ffffffff81e406e0 d dummy_vm_ops.4
ffffffff81e3c780 d dummy_vm_ops.7
While here prefix it with vma_.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mateusz Guzik <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
vma_prepare() is currently the central place where vmas are being locked
before vma_complete() applies changes to them. While this is convenient,
it also obscures vma locking and makes it harder to follow the locking
rules. Move vma locking out of vma_prepare() and take vma locks
explicitly at the locations where vmas are being modified. Move vma
locking and replace it with an assertion inside dup_anon_vma() to further
clarify the locking pattern inside vma_merge().
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Linus Torvalds <[email protected]>
Suggested-by: Liam R. Howlett <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Cc: Jann Horn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
While it's not strictly necessary to lock a newly created vma before
adding it into the vma tree (as long as no further changes are performed
to it), it seems like a good policy to lock it and prevent accidental
changes after it becomes visible to the page faults. Lock the vma before
adding it into the vma tree.
[[email protected]: fix reject fixing in vma_link(), per Jann]
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Jann Horn <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Jann Horn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Implicit vma locking inside vm_flags_reset() and vm_flags_reset_once() is
not obvious and makes it hard to understand where vma locking is happening.
Also, in some cases (like in dup_userfaultfd()) the vma should be locked
earlier than the vm_flags modification. To make locking more visible, change these
functions to assert that the vma write lock is taken and explicitly lock
the vma beforehand. Fix userfaultfd functions which should lock the vma
earlier.
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Vma write lock assertion always includes mmap write lock assertion and
additional vma lock checks when per-VMA locks are enabled. Replace
weaker mmap_assert_write_locked() assertions with stronger
vma_assert_write_locked() ones when we are operating on a vma which
is expected to be locked.
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Jann Horn <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use helper macro K() to improve code readability. No functional
modification involved.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ZhangPeng <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use helper macro K() to improve code readability. No functional
modification involved.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ZhangPeng <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use helper macro K() to improve code readability. No functional
modification involved.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ZhangPeng <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use helper macro K() to improve code readability. No functional
modification involved.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ZhangPeng <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use helper macro K() to improve code readability. No functional
modification involved.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ZhangPeng <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use helper macro K() to improve code readability. No functional
modification involved.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ZhangPeng <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "cleanup with helper macro K()".
Use helper macro K() to improve code readability. No functional
modification involved. Remove redundant K() macro definition.
This patch (of 7):
Since commit eb8589b4f8c1 ("mm: move mem_init_print_info() to mm_init.c"),
the K() macro definition has been moved to mm/internal.h. Therefore, the
definitions in mm/memcontrol.c, mm/backing-dev.c and mm/oom_kill.c are
redundant. Drop redundant definitions.
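For reference, a user-space rendering of the K() helper this series relies
on; the in-tree macro lives in mm/internal.h and converts a page count to
kibibytes (the 4K page size below is assumed for the demo):

#include <stdio.h>

#define PAGE_SHIFT 12                           /* assuming 4K pages */
#define K(x) ((x) << (PAGE_SHIFT - 10))         /* pages -> kB */

int main(void)
{
        unsigned long nr_pages = 300;

        printf("%lu pages = %lu kB\n", nr_pages, K(nr_pages)); /* 1200 kB */
        return 0;
}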
[[email protected]: oom_kill.c: remove "#undef K", per Kefeng]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ZhangPeng <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
On a system with kernelcore=mirror enabled but no mirrored memory
reported by EFI, the kernel can OOM during startup since all memory
besides zone DMA ends up in the movable zone, which prevents the kernel
from using it.
Zone DMA/DMA32 initialization is independent of mirrored memory and their
max pfn is set in zone_sizes_init(). Since the kernel can fall back to
zone DMA/DMA32 if there is no memory in zone Normal, these zones are
treated as mirrored memory no matter what their memory attributes are.
To solve this problem, disable kernelcore=mirror when no real mirrored
memory exists.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ma Wupeng <[email protected]>
Suggested-by: Kefeng Wang <[email protected]>
Suggested-by: Mike Rapoport <[email protected]>
Reviewed-by: Mike Rapoport (IBM) <[email protected]>
Reviewed-by: Kefeng Wang <[email protected]>
Cc: Levi Yun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Keep the same logic as update_pageblock_skip(): only set skip if
no_set_skip_hint is false, which is more reasonable.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Remove the unnecessary return from a void function.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|