mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved()
We currently initialize the memmap such that PG_reserved is set and the
refcount of the page is 1. In virtio-mem code, we have to manually clear
that PG_reserved flag to make memory offlining with partially hotplugged
memory blocks possible: has_unmovable_pages() would otherwise bail out on
such pages.
We want to avoid PG_reserved where possible and move to typed pages
instead. Furthermore, we want to enlighten memory offlining code
about PG_offline: offline pages in an online memory section. One example
is handling managed page count adjustments in a cleaner way during memory
offlining.
So let's initialize the pages with PG_offline instead of PG_reserved.
generic_online_page()->__free_pages_core() will now clear that flag before
handing that memory to the buddy.
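A rough sketch of that online path (not the exact upstream diff; the helper
name is made up, upstream the loop lives in __free_pages_core() for the
hotplug case):

  static void online_clear_pageoffline(struct page *page, unsigned int order)
  {
          unsigned long i;

          for (i = 0; i < (1UL << order); i++) {
                  struct page *p = page + i;

                  __ClearPageOffline(p);  /* was: __ClearPageReserved(p) */
                  set_page_count(p, 0);   /* drop the initial refcount of 1 */
          }
  }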
Note that the page refcount is still 1 and would forbid offlining of such
memory except when special care is taken during GOING_OFFLINE, as currently
only implemented by virtio-mem.
With this change, we can now get non-PageReserved() pages in the XEN
balloon list. From what I can tell, that can already happen via
decrease_reservation(), so that should be fine.
HV-balloon should not really observe a change: partial online memory
blocks still cannot get surprise-offlined, because the refcount of these
PageOffline() pages is 1.
Update virtio-mem, HV-balloon and XEN-balloon code to be aware that
hotplugged pages are now PageOffline() instead of PageReserved() before
they are handed over to the buddy.
We'll leave the ZONE_DEVICE case alone for now.
Note that self-hosted vmemmap pages will no longer be marked as
reserved. This matches ordinary vmemmap pages allocated from the buddy
during memory hotplug. Now, really only vmemmap pages allocated from
memblock during early boot will be marked reserved. Existing
PageReserved() checks seem to be handling all relevant cases correctly
even after this change.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Oscar Salvador <[email protected]> [generic memory-hotplug bits]
Cc: Alexander Potapenko <[email protected]>
Cc: Dexuan Cui <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eugenio Pérez <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Oleksandr Tyshchenko <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Xuan Zhuo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There are no more users of page_mkclean(). Remove it and update the
documentation and comments.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
After the last user of page_maybe_dma_pinned() is converted to
folio_maybe_dma_pinned(), remove page_maybe_dma_pinned() and update the
document and comment.
[[email protected]: fix pin_user_pages.rst underlining]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Let's simply reinitialize the page->_mapcount directly. We can now get
rid of page_mapcount_reset().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Tested-by: Sergey Senozhatsky <[email protected]> [zram/zsmalloc workloads]
Cc: Hyeonggon Yoo <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Let's clean it up: use a proper page type and store our data (offset into
a page) in the lower 16 bit as documented.
We won't be able to support 256 KiB base pages, which is acceptable.
Teach Kconfig to handle that cleanly using a new CONFIG_HAVE_ZSMALLOC.
Based on this, we should do a proper "struct zsdesc" conversion, as
proposed in [1].
This removes the last _mapcount/page_type offender.
[1] https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Tested-by: Sergey Senozhatsky <[email protected]> [zram/zsmalloc workloads]
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
As long as the owner sets a page type first, we can allow reuse of the
lower 16 bit: sufficient to store an offset into a 64 KiB page, which is
the maximum base page size in *common* configurations (ignoring the 256
KiB variant). Restrict it to the head page.
We'll use that for zsmalloc next, to set a proper type while still reusing
that field to store information (offset into a base page) that cannot go
elsewhere for now.
Let's reserve the lower 16 bit for that purpose and for catching mapcount
underflows, and let's reduce PAGE_TYPE_BASE to a single bit.
Note that the mapcount would still have to overflow quite a lot before it
would actually indicate a valid page type.
Start handing out the type bits from highest to lowest, to make it clearer
how many bits we have left for types. Out of the 15 bits we can use for
types, we currently use 6. If we run out of bits before we have better typing
(e.g., memdesc), we can always investigate storing a value instead [1].
[1] https://lore.kernel.org/all/[email protected]/
[[email protected]: fix PG_hugetlb typo, per David]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Tested-by: Sergey Senozhatsky <[email protected]> [zram/zsmalloc workloads]
Cc: Hyeonggon Yoo <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm: page_type, zsmalloc and page_mapcount_reset()", v2.
Wanting to remove the remaining abuser of _mapcount/page_type along with
page_mapcount_reset(), I stumbled over zsmalloc, which is yet to be
converted away from "struct page" [1].
Unfortunately, we cannot stop using the page_type field in zsmalloc code
completely for its own purposes. All other fields in "struct page" are
used one way or the other. Could we simply store a 2-byte offset value at
the beginning of each page? Likely, but that will require a bit more
work; and once we have memdesc we might want to move the offset in there
(struct zsdesc?) again.
... but we can limit the abuse to 16 bit, glue it to a page type that
must be set, and document it. page_has_type() will always successfully
indicate such zsmalloc pages, and such zsmalloc pages only.
We lose zsmalloc support for PAGE_SIZE > 64KB, which should be tolerable.
We could use more bits from the page type, but 16 bit sounds like a good
idea for now.
So clarify the _mapcount/page_type documentation, use a proper page_type
for zsmalloc, and remove page_mapcount_reset().
[1] https://lore.kernel.org/all/[email protected]/
This patch (of 6):
Let's make it clearer that _mapcount must no longer be used by subsystems
for their own purposes, and how _mapcount and page_type behave nowadays
(also in the
context of hugetlb folios, which are typed folios that will be mapped to
user space).
Move the documentation regarding "-1" over from page_mapcount_reset(),
which we will remove next. Move "page_type" before "mapcount", to make it
clearer what typed folios are.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Tested-by: Sergey Senozhatsky <[email protected]> [zram/zsmalloc workloads]
Cc: Hyeonggon Yoo <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Implement functions for supporting online update of DAMON context-level
parameters. The function receives two DAMON context structs. One is the
struct that is currently being used by a kdamond and is therefore to be
updated. The other one contains the parameters to be applied to the first
one. The function applies the new parameters to the destination struct
while keeping/updating the internal status and operation results. The
function should be called from a DAMON context-update-safe place, like
DAMON callbacks.
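A hedged sketch of the shape of such a commit helper (name and field usage
illustrative, not the actual implementation):

  /* Apply user-visible parameters from @src to the live context @dst. */
  static int damon_commit_ctx_sketch(struct damon_ctx *dst,
                                     struct damon_ctx *src)
  {
          /* user requests, e.g. sampling/aggregation intervals */
          dst->attrs = src->attrs;
          /*
           * Internal status and operation results of @dst (existing
           * targets, regions, scheme statistics) are kept or updated in
           * place rather than being overwritten by @src.
           */
          return 0;
  }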
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/damon: introduce DAMON parameters online commit function".
DAMON context struct (damon_ctx) contains user requests (parameters),
internal status, and operation results. For flexible usages, DAMON API
users are encouraged to manually manipulate the struct. That works well
for simple use cases. However, it has turned out that it is not that
simple at least for online parameter updates. It is easy to forget to
properly maintain the internal status and operation results. Also, such
manual manipulation for online tuning is implemented multiple times by
DAMON API users including the DAMON sysfs interface, DAMON_RECLAIM and
DAMON_LRU_SORT. As a result, we have multiple sources of bugs for the same
problem. We have actually found and fixed a few bugs in the online
parameter updating of DAMON API users.
Implement a function for online DAMON parameters update in the core layer,
and replace the DAMON API users' manual manipulation code for this use case.
core layer function could still have bugs, but this change reduces the
source of bugs for the problem to one place.
This patch (of 12):
Implement functions for supporting online update of DAMOS quota goal
parameters. The function receives two DAMOS quota structs. One is the
struct that is currently being used by a kdamond and is therefore to be
updated. The other one contains the parameters to be applied to the first
one. The function applies the new parameters to the destination struct
while keeping/updating the internal status. The function should be called
from a parameters-update-safe place, like DAMON callbacks.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This patch introduces the DAMOS_MIGRATE_HOT action, which is similar to
DAMOS_MIGRATE_COLD, but prioritizes hot pages.
It migrates pages inside the given region to the NUMA node specified by the
'target_nid' sysfs knob.
Here is an example usage of the 'migrate_hot' action.
$ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
$ cat contexts/<N>/schemes/<N>/action
migrate_hot
$ echo 0 > contexts/<N>/schemes/<N>/target_nid
$ echo commit > state
$ numactl -p 2 ./hot_cold 500M 600M &
$ numastat -c -p hot_cold
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
-------------- ------ ------ ------ -----
701 (hot_cold) 501 0 601 1101
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hyeongtak Ji <[email protected]>
Signed-off-by: Honggyu Kim <[email protected]>
Signed-off-by: SeongJae Park <[email protected]>
Cc: Gregory Price <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Masami Hiramatsu (Google) <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Rakie Kim <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This patch introduces the DAMOS_MIGRATE_COLD action, which is similar to
DAMOS_PAGEOUT, but migrates folios to the NUMA node given by the
'target_nid' sysfs knob instead of swapping them out.
The 'target_nid' sysfs knob specifies the migration target node ID.
Here is an example usage of the 'migrate_cold' action.
$ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
$ cat contexts/<N>/schemes/<N>/action
migrate_cold
$ echo 2 > contexts/<N>/schemes/<N>/target_nid
$ echo commit > state
$ numactl -p 0 ./hot_cold 500M 600M &
$ numastat -c -p hot_cold
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
-------------- ------ ------ ------ -----
701 (hot_cold) 501 0 601 1101
Since there are some common routines with pageout, many functions share
similar logic between pageout and migrate cold.
damon_pa_migrate_folio_list() is a minimized version of
shrink_folio_list().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Honggyu Kim <[email protected]>
Signed-off-by: Hyeongtak Ji <[email protected]>
Signed-off-by: SeongJae Park <[email protected]>
Cc: Gregory Price <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Masami Hiramatsu (Google) <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Rakie Kim <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The current patch series introduces DAMON-based migration across NUMA
nodes, so it'd be better to have a new migrate_reason in trace events.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Honggyu Kim <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: SeongJae Park <[email protected]>
Cc: Gregory Price <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Hyeongtak Ji <[email protected]>
Cc: Masami Hiramatsu (Google) <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Rakie Kim <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This patch adds target_nid under
/sys/kernel/mm/damon/admin/kdamonds/<N>/contexts/<N>/schemes/<N>/
The 'target_nid' can be used as the destination node for DAMOS actions
such as DAMOS_MIGRATE_{HOT,COLD} in the follow-up patches.
[[email protected]: document target_nid file]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hyeongtak Ji <[email protected]>
Signed-off-by: Honggyu Kim <[email protected]>
Signed-off-by: SeongJae Park <[email protected]>
Cc: Gregory Price <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Masami Hiramatsu (Google) <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Rakie Kim <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There are some functions only used inside mm. Move them into internal.h.
No functional change intended.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit 46df8e73a4a3 ("mm: free up PG_slab"), MF_MSG_SLAB becomes
unused. Remove it. No functional change intended.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The previous patch has already reconstructed the cftype attributes based
on the templates and saved them in dfl_cftypes and legacy_cftypes. Now
remove the old procedure and switch to the new cftypes.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiu Jianfeng <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Unlike other cgroup subsystems, the hugetlb cgroup does not provide a
static array of cftype that explicitly displays the properties, handling
functions, etc., of each file. Instead, it dynamically creates the
cftype attributes based on the hstate during the startup procedure. This
reduces the readability of the code.
To fix this issue, introduce two cftype templates, and rebuild the
attributes according to the hstate so that they are ready to be added to
the cgroup framework.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiu Jianfeng <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: kernel test robot <[email protected]>
From: Xiu Jianfeng <[email protected]>
Subject: mm/hugetlb_cgroup: register lockdep key for cftype
Date: Tue, 18 Jun 2024 07:19:22 +0000
When CONFIG_DEBUG_LOCK_ALLOC is enabled, the following commands can
trigger a bug,
mount -t cgroup2 none /sys/fs/cgroup
cd /sys/fs/cgroup
echo "+hugetlb" > cgroup.subtree_control
The log is as below:
BUG: key ffff8880046d88d8 has not been registered!
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 3 PID: 226 at kernel/locking/lockdep.c:4945 lockdep_init_map_type+0x185/0x220
Modules linked in:
CPU: 3 PID: 226 Comm: bash Not tainted 6.10.0-rc4-next-20240617-g76db4c64526c #544
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:lockdep_init_map_type+0x185/0x220
Code: 00 85 c0 0f 84 6c ff ff ff 8b 3d 6a d1 85 01 85 ff 0f 85 5e ff ff ff 48 c7 c6 21 99 4a 82 48 c7 c7 60 29 49 82 e8 3b 2e f5
RSP: 0018:ffffc9000083fc30 EFLAGS: 00000282
RAX: 0000000000000000 RBX: ffffffff828dd820 RCX: 0000000000000027
RDX: ffff88803cd9cac8 RSI: 0000000000000001 RDI: ffff88803cd9cac0
RBP: ffff88800674fbb0 R08: ffffffff828ce248 R09: 00000000ffffefff
R10: ffffffff8285e260 R11: ffffffff828b8eb8 R12: ffff8880046d88d8
R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880067281c0
FS: 00007f68601ea740(0000) GS:ffff88803cd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005614f3ebc740 CR3: 000000000773a000 CR4: 00000000000006f0
Call Trace:
<TASK>
? __warn+0x77/0xd0
? lockdep_init_map_type+0x185/0x220
? report_bug+0x189/0x1a0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1a/0x20
? lockdep_init_map_type+0x185/0x220
__kernfs_create_file+0x79/0x100
cgroup_addrm_files+0x163/0x380
? find_held_lock+0x2b/0x80
? find_held_lock+0x2b/0x80
? find_held_lock+0x2b/0x80
css_populate_dir+0x73/0x180
cgroup_apply_control_enable+0x12f/0x3a0
cgroup_subtree_control_write+0x30b/0x440
kernfs_fop_write_iter+0x13a/0x1f0
vfs_write+0x341/0x450
ksys_write+0x64/0xe0
do_syscall_64+0x4b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f68602d9833
Code: 8b 15 61 26 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 08
RSP: 002b:00007fff9bbdf8e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f68602d9833
RDX: 0000000000000009 RSI: 00005614f3ebc740 RDI: 0000000000000001
RBP: 00005614f3ebc740 R08: 000000000000000a R09: 0000000000000008
R10: 00005614f3db6ba0 R11: 0000000000000246 R12: 0000000000000009
R13: 00007f68603bd6a0 R14: 0000000000000009 R15: 00007f68603b8880
For lockdep, there is a sanity check in lockdep_init_map_type(): the
lock-class key must either have been allocated statically or registered as
a dynamic key. However, commit e18df2889ff9 ("mm/hugetlb_cgroup: prepare
cftypes based on template") changed the cftypes from statically allocated
objects to dynamically allocated objects, so cft->lockdep_key must be
registered proactively.
[[email protected]: fix BUG()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Fixes: e18df2889ff9 ("mm/hugetlb_cgroup: prepare cftypes based on template")
Signed-off-by: Xiu Jianfeng <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Tested-by: Marek Szyprowski <[email protected]>
Tested-by: SeongJae Park <[email protected]>
Closes: https://lore.kernel.org/[email protected]
Cc: Muchun Song <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Today, we do not have any observability of per-page metadata and how much
it takes away from the machine capacity. Thus, we want to describe the
amount of memory that is going towards per-page metadata, which can vary
depending on build configuration, machine architecture, and system use.
This patch adds 2 fields to /proc/vmstat that can be used as shown below:
Accounting per-page metadata allocated by boot-allocator:
/proc/vmstat:nr_memmap_boot * PAGE_SIZE
Accounting per-page metadata allocated by buddy-allocator:
/proc/vmstat:nr_memmap * PAGE_SIZE
Accounting total per-page metadata allocated on the machine:
(/proc/vmstat:nr_memmap_boot +
/proc/vmstat:nr_memmap) * PAGE_SIZE
Utility for userspace:
Observability: Describe the amount of memory overhead that is going to
per-page metadata on the system at any given time since this overhead is
not currently observable.
Debugging: Tracking the changes or absolute value in struct pages can help
detect anomalies as they can be correlated with other metrics in the
machine (e.g., memtotal, number of huge pages, etc).
page_ext overheads: Some kernel features such as page_owner and
page_table_check that use page_ext can be optionally enabled via kernel
parameters. Having the total per-page metadata information helps users
precisely measure the impact. Furthermore, page-metadata metrics will
reflect the amount of struct pages relinquished (or overhead reduced) when
hugetlbfs pages are reserved, which will vary depending on whether hugetlb
vmemmap optimization is enabled or not.
For background and results see:
lore.kernel.org/all/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sourav Panda <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Pasha Tatashin <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Chen Linxuan <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ivan Babrou <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tomas Mudrunka <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Xu <[email protected]>
Cc: Yang Yang <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add zswap_never_enabled() to skip the xarray lookup in zswap_load() if
zswap was never enabled on the system. It is implemented using static
branches for efficiency, as enabling zswap should be a rare event. This
could shave some cycles off zswap_load() when CONFIG_ZSWAP is used but
zswap is never enabled.
However, the real motivation behind this patch is two-fold:
- Incoming large folio swapin work will need to fall back to order-0
  folios if zswap was ever enabled, because any part of the folio could be
  in zswap, until proper handling of large folios with zswap is added.
- A warning and recovery attempt will be added in a following change in
  case the above is not done correctly. Zswap will fail the read if the
  folio is large and zswap was ever enabled.
Expose zswap_never_enabled() in the header for the swapin work to use
it later.
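A hedged sketch of the static-branch approach (the enable-path hook name is
illustrative):

  static DEFINE_STATIC_KEY_FALSE(zswap_ever_enabled);

  bool zswap_never_enabled(void)
  {
          return !static_branch_unlikely(&zswap_ever_enabled);
  }

  /* flipped once, the first time zswap is enabled on the system */
  static void zswap_note_enabled(void)
  {
          static_branch_enable(&zswap_ever_enabled);
  }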
[[email protected]: expose zswap_never_enabled() in the header]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yosry Ahmed <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Chengming Zhou <[email protected]>
Cc: Chris Li <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In preparation for introducing a similar function, rename
is_zswap_enabled() to use zswap_* prefix like other zswap functions.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yosry Ahmed <[email protected]>
Reviewed-by: Barry Song <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Cc: Chengming Zhou <[email protected]>
Cc: Chris Li <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When the user no longer requires the pages, they would use
madvise(MADV_FREE) to mark the pages as lazy free. Subsequently, they
typically would not re-write to that memory again.
During memory reclaim, if we detect that the large folio and its PMD are
both still marked as clean and there are no unexpected references (such as
GUP), we can just discard the memory lazily, improving the efficiency of
memory reclamation in this case.
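A simplified sketch of that check (helper name illustrative; the real check
is performed under the page table lock):

  static bool can_discard_lazyfree_thp(struct folio *folio, pmd_t pmdval)
  {
          /* redirtied after MADV_FREE? then it must not be discarded */
          if (pmd_dirty(pmdval) || folio_test_dirty(folio))
                  return false;

          /* unexpected extra references, e.g. GUP pins */
          if (folio_ref_count(folio) != folio_mapcount(folio) + 1)
                  return false;

          return true;
  }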
On an Intel i5 CPU, reclaiming 1GiB of lazyfree THPs using
mem_cgroup_force_empty() results in the following runtimes in seconds
(shorter is better):
--------------------------------------------
| Old | New | Change |
--------------------------------------------
| 0.683426 | 0.049197 | -92.80% |
--------------------------------------------
[[email protected]: minor changes per David]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lance Yang <[email protected]>
Suggested-by: Zi Yan <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Cc: Bang Li <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: Jeff Xie <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yin Fengwei <[email protected]>
Cc: Zach O'Keefe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
folios, start the pagewalk first, then call split_huge_pmd_address() to
split the folio.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lance Yang <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Suggested-by: Baolin Wang <[email protected]>
Acked-by: Zi Yan <[email protected]>
Cc: Bang Li <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: Jeff Xie <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yin Fengwei <[email protected]>
Cc: Zach O'Keefe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
It looks rather weird that totalhigh_pages() returns an "unsigned long"
but nr_free_highpages() returns an "unsigned int".
Let's return an "unsigned long" from nr_free_highpages() to be consistent.
While at it, use a plain "0" instead of a "0UL" in the !CONFIG_HIGHMEM
totalhigh_pages() implementation, to make these look alike as well.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/highmem: don't track highmem pages manually".
Let's remove highmem special-casing from adjust_managed_page_count(), to
result in less confusion why memblock manually adjusts totalram_pages, and
__free_pages_core() only adjusts the zone's managed pages -- what about
the highmem pages that adjust_managed_page_count() updates?
Now, we only maintain totalram_pages and a zone's managed pages
independent of highmem support. We can derive the number of highmem pages
simply by looking at the relevant zone's managed pages. I don't think
there is any particular fast path that needs a maximum-efficient
totalhigh_pages() implementation.
Note that highmem memory is currently initialized using
free_highmem_page()->free_reserved_page(), not __free_pages_core(). In
the future we might want to also use __free_pages_core() to initialize
highmem memory, to make that less special, and consider moving
totalram_pages updates into __free_pages_core() [1], so we can just use
adjust_managed_page_count() in there as well.
Booting a simple kernel in QEMU reveals no highmem accounting change:
Before:
Memory: 3095448K/3145208K available (14802K kernel code, 2073K rwdata,
5000K rodata, 740K init, 556K bss, 49760K reserved, 0K cma-reserved,
2244488K highmem)
After:
Memory: 3095276K/3145208K available (14802K kernel code, 2073K rwdata,
5000K rodata, 740K init, 556K bss, 49932K reserved, 0K cma-reserved,
2244488K highmem)
[1] https://lkml.kernel.org/r/[email protected]
This patch (of 2):
Can we get rid of the highmem ifdef in adjust_managed_page_count()?
Likely yes: we don't have that many totalhigh_pages() users, and none of
them seems to be very performance critical.
So let's implement totalhigh_pages() like nr_free_highpages(), collecting
information from all zones. This is now similar to what we do in
si_meminfo_node() to collect the per-node highmem page count.
In the common case (single node, 3-4 zones), we really shouldn't care. We
could optimize a bit further (only walk ZONE_HIGHMEM and ZONE_MOVABLE if
required), but there doesn't seem to be a real need for that.
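A hedged sketch of deriving the value from the zones instead of keeping a
dedicated counter:

  unsigned long totalhigh_pages(void)
  {
          unsigned long pages = 0;
          struct zone *zone;

          for_each_populated_zone(zone) {
                  if (is_highmem(zone))
                          pages += zone_managed_pages(zone);
          }
          return pages;
  }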
[[email protected]: fix build bot complaint]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
... and rename it to folio_precise_page_mapcount(). fs/proc is the last
remaining user, and that should stay that way.
While at it, cleanup kpagecount_read() a bit: there are still some legacy
leftovers -- when the interface was introduced it returned the page
refcount, but it was changed shortly afterwards to return the page mapcount.
Further, some simple folio conversion.
Once we stop using the per-page mapcounts of large folios, all
folio_precise_page_mapcount() users will have to implement an alternative
way to achieve what they are trying to achieve, possibly in a less precise
way.
[[email protected]: fix uninitialized variable in pagemap_pmd_range()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Lance Yang <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add mTHP counters for anonymous shmem.
[[email protected]: update Documentation/admin-guide/mm/transhuge.rst]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/4fd9e467d49ae4a747e428bcd821c7d13125ae67.1718090413.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Reviewed-by: Lance Yang <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Daniel Gomez <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Pankaj Raghav <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Commit 19eaf44954df adds multi-size THP (mTHP) for anonymous pages, which
allows THP to be configured through the sysfs interface located at
'/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
However, the anonymous shmem will ignore the anonymous mTHP rule
configured through the sysfs interface, and can only use the PMD-mapped
THP, which is not reasonable. Users expect to apply the mTHP rule to all
anonymous pages, including the anonymous shmem, in order to enjoy the
benefits of mTHP. For example, lower latency than PMD-mapped THP, smaller
memory bloat than PMD-mapped THP, contiguous PTEs on the ARM architecture
to reduce TLB misses, etc. In addition, the mTHP interfaces can be extended to
support all shmem/tmpfs scenarios in the future, especially for the shmem
mmap() case.
The primary strategy is similar to supporting anonymous mTHP. Introduce a
new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
which can have almost the same values as the top-level
'/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
additional "inherit" option and dropping the testing options 'force' and
'deny'. By default all sizes will be set to "never" except PMD size,
which is set to "inherit". This ensures backward compatibility with the
top-level anonymous shmem 'enabled' setting, while also allowing
independent control of anonymous shmem enablement for each mTHP size.
Link: https://lkml.kernel.org/r/65796c1e72e51e15f3410195b5c2d5b6c160d411.1718090413.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Daniel Gomez <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Lance Yang <[email protected]>
Cc: Pankaj Raghav <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
To support the use of mTHP with anonymous shmem, add a new sysfs interface
'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-xxkB/'
directory for each mTHP to control whether shmem is enabled for that mTHP,
with a value similar to the top level 'shmem_enabled', which can be set
to: "always", "inherit (to inherit the top level setting)", "within_size",
"advise", "never". An 'inherit' option is added to ensure compatibility
with these global settings, and the options 'force' and 'deny' are
dropped, which are rather testing artifacts from the old ages.
By default, PMD-sized hugepages have enabled="inherit" and all other
hugepage sizes have enabled="never" for
'/sys/kernel/mm/transparent_hugepage/hugepages-xxkB/shmem_enabled'.
In addition, if the top-level value is 'force', then only PMD-sized
hugepages can have enabled="inherit"; otherwise the configuration will
fail, and vice versa. That means we will now avoid using non-PMD-sized THP
to override the global huge allocation.
[[email protected]: fix transhuge.rst indentation]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: reflow transhuge.rst addition to 80 cols]
[[email protected]: move huge_shmem_orders_lock under CONFIG_SYSFS]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: huge_memory.c needs mm_types.h]
Link: https://lkml.kernel.org/r/ffddfa8b3cb4266ff963099ab78cfd7184c57ac7.1718090413.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Daniel Gomez <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Lance Yang <[email protected]>
Cc: Pankaj Raghav <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add a __this_cpu_try_cmpxchg() version of the percpu op.
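A hedged usage sketch (the per-cpu variable is made up; like the other
__this_cpu_*() ops, this assumes preemption is already disabled):

  static DEFINE_PER_CPU(unsigned long, my_state);

  static void mark_busy_once(void)
  {
          unsigned long old = 0;

          /* on failure, 'old' is updated with the current value */
          if (!__this_cpu_try_cmpxchg(my_state, &old, 1))
                  pr_debug("already busy: %lu\n", old);
  }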
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uros Bizjak <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The kernel test robot reported [1] a performance regression in the
will-it-scale test suite's page_fault2 test case for commit 70a64b7919cb
("memcg: dynamically allocate lruvec_stats"). After inspection it seems
like the commit has unintentionally introduced false cacheline sharing.
After the commit, the fields of mem_cgroup_per_node which get read on the
performance critical path share a cacheline with the fields which get
updated often on LRU page allocations or deallocations. This has caused
contention on that cacheline, and the workloads which manipulate a lot of
LRU pages regress, as reported.
The solution is to rearrange the fields of mem_cgroup_per_node such that
the false sharing is eliminated. Let's move all the read only pointers at
the start of the struct, followed by memcg-v1 only fields and at the end
fields which get updated often.
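A hedged sketch of the resulting ordering (illustrative subset of fields,
not the full struct definition):

  struct mem_cgroup_per_node {
          /* read-mostly pointers, hit on the fault/LRU fast path */
          struct mem_cgroup *memcg;
          struct lruvec_stats_percpu __percpu *lruvec_stats_percpu;
          struct lruvec_stats *lruvec_stats;

          /* memcg-v1 only fields would sit here */

          /* frequently written on LRU page alloc/free, kept at the end */
          struct lruvec lruvec ____cacheline_aligned_in_smp;
          unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
  };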
Experiment setup: Ran fallocate1, fallocate2, page_fault1, page_fault2 and
page_fault3 from the will-it-scale test suite inside a three level memcg
with /tmp mounted as tmpfs on two different machines, one a single numa
node and the other one, two node machine.
$ ./[testcase]_processes -t $NR_CPUS -s 50
Results for single node, 52 CPU machine:
Testcase         base        with-patch
fallocate1       1031081     1431291  (38.80 %)
fallocate2       1029993     1421421  (38.00 %)
page_fault1      2269440     3405788  (50.07 %)
page_fault2      2375799     3572868  (50.30 %)
page_fault3     28641143    28673950  ( 0.11 %)
Results for dual node, 80 CPU machine:
Testcase         base        with-patch
fallocate1       2976288     3641185  (22.33 %)
fallocate2       2979366     3638181  (22.11 %)
page_fault1      6221790     7748245  (24.53 %)
page_fault2      6482854     7847698  (21.05 %)
page_fault3     28804324    28991870  ( 0.65 %)
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 70a64b7919cb ("memcg: dynamically allocate lruvec_stats")
Signed-off-by: Shakeel Butt <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Yosry Ahmed <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Yin Fengwei <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
By using a folio in scan_movable_pages() we convert the last user of the
page-based hugetlb information macro functions to the folio version.
After this conversion, we can safely remove the page-based definitions
from include/linux/hugetlb.h.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Should do_swap_page() have the capability to directly map a large folio,
metadata restoration becomes necessary for a specified number of pages
denoted as nr. It's important to highlight that metadata restoration is
solely required by the SPARC platform, which, however, does not enable
THP_SWAP. Consequently, in the present kernel configuration, there exists
no practical scenario where users necessitate the restoration of nr
metadata. Platforms implementing THP_SWAP might invoke this function with
nr values exceeding 1, subsequent to do_swap_page() successfully mapping
an entire large folio. Nonetheless, their arch_do_swap_page_nr()
functions remain empty.
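A hedged sketch of the generic fallback, which is what every architecture
other than sparc ends up with (guard macro as used by the existing
arch_do_swap_page() fallback):

  #ifndef __HAVE_ARCH_DO_SWAP_PAGE
  static inline void arch_do_swap_page_nr(struct mm_struct *mm,
                                          struct vm_area_struct *vma,
                                          unsigned long addr,
                                          pte_t pte, pte_t oldpte, int nr)
  {
          /* nothing to restore on architectures without swap metadata */
  }
  #endif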
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Barry Song <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Reviewed-by: Khalid Aziz <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Chris Li <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Chuanhua Han <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
To streamline maintenance efforts, we propose removing the implementation
of swap_free(). Instead, we can simply invoke swap_free_nr() with nr set
to 1. swap_free_nr() is designed with a bitmap consisting of only one
long, resulting in overhead that can be ignored for cases where nr equals
1.
A prime candidate for leveraging swap_free_nr() lies within
kernel/power/swap.c. Implementing this change facilitates the adoption of
batch processing for hibernation.
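A hedged sketch of the resulting wrapper:

  /* single-entry callers keep their existing interface */
  static inline void swap_free(swp_entry_t entry)
  {
          swap_free_nr(entry, 1);
  }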
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Barry Song <[email protected]>
Suggested-by: "Huang, Ying" <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Acked-by: Chris Li <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Chuanhua Han <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "large folios swap-in: handle refault cases first", v5.
This patchset is extracted from the large folio swapin series[1],
primarily addressing the handling of scenarios involving large folios in
the swap cache. Currently, it is particularly focused on addressing the
refaulting of mTHP, which is still undergoing reclamation. This approach
aims to streamline code review and expedite the integration of this
segment into the MM tree.
It relies on Ryan's swap-out series[2], leveraging the helper function
swap_pte_batch() introduced by that series.
Presently, do_swap_page only encounters a large folio in the swap cache
before the large folio is released by vmscan. However, the code should
remain equally useful once we support large folio swap-in via
swapin_readahead(). This approach can effectively reduce page faults and
eliminate most redundant checks and early exits for MTE restoration in
recent MTE patchset[3].
The large folio swap-in for SWP_SYNCHRONOUS_IO and swapin_readahead() will
be split into separate patch sets and sent at a later time.
[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
[3] https://lore.kernel.org/linux-mm/[email protected]/
This patch (of 6):
While swapping in a large folio, we need to free swaps related to the
whole folio. To avoid frequently acquiring and releasing swap locks, it
is better to introduce an API for batched free. Furthermore, this new
function, swap_free_nr(), is designed to efficiently handle various
scenarios for releasing a specified number, nr, of swap entries.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Chuanhua Han <[email protected]>
Co-developed-by: Barry Song <[email protected]>
Signed-off-by: Barry Song <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Acked-by: Chris Li <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Commit 2916ecc0f9d4 ("mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY")
introduced a new MIGRATE_SYNC_NO_COPY mode to allow offloading the copy to
a device DMA engine. It is only used by __migrate_device_pages() to decide
whether or not to copy the old page, and the mode is only set in hmm. As
the setting of MIGRATE_SYNC_NO_COPY was removed by the previous cleanup,
we can now remove the unnecessary MIGRATE_SYNC_NO_COPY mode itself.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jiaqi Yan <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
migrate_folio_extra() is only called in migrate.c now; convert it to a
static function and add a new src_private argument that can be shared by
migrate_folio() and filemap_migrate_folio() to simplify the code a bit.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jiaqi Yan <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There are no users since commit 40d707f33db5 ("mm/ksm: use folio in
write_protect_page"), so remove DEFINE_PAGE_VMA_WALK().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
All callers are now converted, delete this compatibility wrapper. Also
fix up some comments which referred to page_mapping.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Sidhartha Kumar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
page_memcg() is only called by mod_memcg_page_state(), so squash it into
that caller to clean up and remove page_memcg().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
x86_64 already uses the number of CPUs of the node as the maximum number of
threads. Make that the default for all arches setting
DEFERRED_STRUCT_PAGE_INIT.
This returns to the behavior prior to making the function arch-specific
with commit ecd096506922 ("mm: make deferred init's max threads
arch-specific").
Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 platforms
shows faster deferred_init_memmap completions:
| | x13s | SA8775p-ride | Ampere R137-P31 | Ampere HR330 |
| | Metal, 32GB | VM, 36GB | VM, 58GB | Metal, 128GB |
| | 8cpus | 8cpus | 8cpus | 32cpus |
|---------|-------------|--------------|-----------------|--------------|
| threads | ms (%) | ms (%) | ms (%) | ms (%) |
|---------|-------------|--------------|-----------------|--------------|
| 1 | 108 (0%) | 72 (0%) | 224 (0%) | 324 (0%) |
| cpus | 24 (-77%) | 36 (-50%) | 40 (-82%) | 56 (-82%) |
Michael Ellerman reported:
: On a machine here (1TB, 40 cores, 4KB pages) the existing code gives:
:
: [ 0.500124] node 2 deferred pages initialised in 210ms
: [ 0.515790] node 3 deferred pages initialised in 230ms
: [ 0.516061] node 0 deferred pages initialised in 230ms
: [ 0.516522] node 7 deferred pages initialised in 230ms
: [ 0.516672] node 4 deferred pages initialised in 230ms
: [ 0.516798] node 6 deferred pages initialised in 230ms
: [ 0.517051] node 5 deferred pages initialised in 230ms
: [ 0.523887] node 1 deferred pages initialised in 240ms
:
: vs with the patch:
:
: [ 0.379613] node 0 deferred pages initialised in 90ms
: [ 0.380388] node 1 deferred pages initialised in 90ms
: [ 0.380540] node 4 deferred pages initialised in 100ms
: [ 0.390239] node 6 deferred pages initialised in 100ms
: [ 0.390249] node 2 deferred pages initialised in 100ms
: [ 0.390786] node 3 deferred pages initialised in 110ms
: [ 0.396721] node 5 deferred pages initialised in 110ms
: [ 0.397095] node 7 deferred pages initialised in 110ms
:
: Which is a nice speedup.
[[email protected]: v3]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric Chanudet <[email protected]>
Tested-by: Michael Ellerman <[email protected]> (powerpc)
Reviewed-by: Baoquan He <[email protected]>
Acked-by: Alexander Gordeev <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Added two explicit MF_MSG messages describing failure in
get_hwpoison_page(). Attempted to document the definitions of the various
action names, and made a few adjustments to the action_result() calls.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jane Chu <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Let's make update_mmu_tlb() simply a generic wrapper around
update_mmu_tlb_range(). Only the latter can now be overridden by the
architecture. We can now remove __HAVE_ARCH_UPDATE_MMU_TLB as well.
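A hedged sketch of the generic wrapper:

  static inline void update_mmu_tlb(struct vm_area_struct *vma,
                                    unsigned long address, pte_t *ptep)
  {
          update_mmu_tlb_range(vma, address, ptep, 1);
  }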
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bang Li <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Lance Yang <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Add update_mmu_tlb_range() to simplify code", v4.
This series of commits mainly adds update_mmu_tlb_range() to batch-update
the TLB for an address range and implements update_mmu_tlb() using
update_mmu_tlb_range().
After commit 19eaf44954df ("mm: thp: support allocation of anonymous
multi-size THP"), we may need to batch-update the TLB for a certain address
range by calling update_mmu_tlb() in a loop. Using
update_mmu_tlb_range(), we can simplify the code and possibly reduce the
execution of some unnecessary code in some architectures.
This patch (of 3):
Add update_mmu_tlb_range(), with which we can batch-update the TLB for an
address range.
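A hedged sketch of how a caller that installs nr contiguous PTEs simplifies
(function names made up for illustration):

  static void tlb_hint_old(struct vm_area_struct *vma, unsigned long addr,
                           pte_t *ptep, unsigned int nr)
  {
          unsigned int i;

          for (i = 0; i < nr; i++)        /* one hint per PTE */
                  update_mmu_tlb(vma, addr + i * PAGE_SIZE, ptep + i);
  }

  static void tlb_hint_new(struct vm_area_struct *vma, unsigned long addr,
                           pte_t *ptep, unsigned int nr)
  {
          update_mmu_tlb_range(vma, addr, ptep, nr);      /* one batched hint */
  }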
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bang Li <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Lance Yang <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Using insert_page() we might have previously ended up passing the zeropage
into rmap code. Make sure that won't happen again.
Note that we won't check the huge zeropage for now, which might still end
up in RMAP code.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
All users have been converted to use the folio version of these macros, we
can safely remove the page based interface.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There are two helpers for retrieving the index within address space for
mixed usage of swap cache and page cache:
- page_index
- folio_index
This commit drops page_index, as we have eliminated all users, and
converts folio_index's helper __page_file_index to use folio to avoid the
page conversion.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kairui Song <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Anna Schumaker <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: Chris Li <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Howells <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ilya Dryomov <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: Marc Dionne <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Ryusuke Konishi <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Xiubo Li <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
These two helpers were useful for mixed usage of swap cache and page
cache, which help retrieve the corresponding file or swap device offset of
a page or folio.
They were introduced in commit f981c5950fa8 ("mm: methods for teaching
filesystems about PG_swapcache pages") and used in commit d56b4ddf7781
("nfs: teach the NFS client how to treat PG_swapcache pages"), suppose to
be used with direct_IO for swap over fs.
But after commit e1209d3a7a67 ("mm: introduce ->swap_rw and use it for
reads from SWP_FS_OPS swap-space"), swap with direct_IO is no more, and
swap cache mapping is never exposed to fs.
Now we have dropped all users of page_file_offset and folio_file_pos, so
they can be deleted.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kairui Song <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Anna Schumaker <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: Chris Li <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Howells <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ilya Dryomov <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: Marc Dionne <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Ryusuke Konishi <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Xiubo Li <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
huge_anon_orders_always is accessed locklessly, so it is better to use the
READ_ONCE() wrapper. This is not fixing any visible bug; hopefully it can
quiet some KCSAN complaints in the future. Also do that for
huge_anon_orders_madvise.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ran Xiaokai <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Lu Zhongjun <[email protected]>
Reviewed-by: xu xin <[email protected]>
Cc: Yang Yang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm: convert to folio_alloc_mpol()".
This patch (of 4):
This adds a new folio_alloc_mpol(), which is like folio_alloc() but
allocates the folio according to a NUMA mempolicy.
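A hedged sketch of the expected prototype (the exact argument list may
differ upstream):

  struct folio *folio_alloc_mpol(gfp_t gfp, unsigned int order,
                                 struct mempolicy *mpol, pgoff_t ilx, int nid);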
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Introduce misc.peak to record the historical maximum usage of the
resource, as in some scenarios the value of misc.max could be
adjusted based on the peak usage of the resource.
Signed-off-by: Xiu Jianfeng <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|