|
Delete those ancient pr_debug()s - they were PDprintk()s in Andi Kleen's
original submission of core NUMA API, were useful when debugging shared
mempolicy lifetime back then, but have not been used recently.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Sidhartha Kumar <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
"man 2 migrate_pages" says "On success migrate_pages() returns the number
of pages that could not be moved". Although 5.3 and 5.4 commits fixed
mbind(MPOL_MF_STRICT|MPOL_MF_MOVE*) to fail with EIO when not all pages
could be moved (because some could not be isolated for migration),
migrate_pages(2) was left still reporting only those pages failing at the
migration stage, forgetting those failing at the earlier isolation stage.
Fix that by accumulating a long nr_failed count in struct queue_pages,
returned by queue_pages_range() when it's not returning an error, for
adding on to the nr_failed count from migrate_pages() in mm/migrate.c. A
count of pages? It's more a count of folios, but changing it to pages
would entail more work (also in mm/migrate.c): does not seem justified.
queue_pages_range() itself should only return -EIO in the "strictly
unmovable" case (STRICT without any MOVEs): in that case it's best to
break out as soon as nr_failed gets set; but otherwise it should continue
to isolate pages for MOVing even when nr_failed is set - as the mbind(2)
manpage promises.
There's one case where nr_failed should be incremented but was being
missed: queue_folios_pte_range() and queue_folios_hugetlb() now count the
transient migration entries, as queue_folios_pmd() already did. And
there's one case where nr_failed would have been incremented but should
not be: on meeting later PTEs of the same large folio, which can only be
isolated once - fixed by recording the current large folio in struct
queue_pages.
Clean up the affected functions, fixing or updating many comments. Make
migrate_folio_add() return bool, without -EIO: true if adding, or if
skipping shared (but its arguable folio_estimated_sharers() heuristic is
left unchanged).
Use MPOL_MF_WRLOCK flag to queue_pages_range(), instead of bool lock_vma.
Use explicit STRICT|MOVE* flags where queue_pages_test_walk() checks for
skipping, instead of hiding them behind MPOL_MF_VALID.
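For illustration, the added bookkeeping amounts to roughly the following
(a sketch only; the existing fields of the struct are abridged here):

	struct queue_pages {
		struct list_head *pagelist;
		unsigned long flags;
		nodemask_t *nmask;
		/* ... start, end, first: existing fields unchanged ... */
		struct folio *large;	/* last large folio met, isolated only once */
		long nr_failed;		/* folios that could not be isolated */
	};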
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Sidhartha Kumar <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
It seems strange that kernfs should be an outlier with a set_policy and
get_policy in its kernfs_vm_ops. Ah, it dates back to v2.6.30's commit
095160aee954 ("sysfs: fix some bin_vm_ops errors"), when I had crashed on
powerpc's pci_mmap_legacy_page_range() fallback to shmem_zero_setup().
Well, that was commendably thorough, to give sysfs-bin a set_policy and
get_policy, just to avoid the way it was coded resulting in EINVAL from
mmap when CONFIG_NUMA; but somehow feels a bit over-the-top to me now.
It's easier to say that nobody should expect to manage a shmem object's
shared NUMA mempolicy via some kernfs backdoor to that object: delete that
code (and there's no longer an EINVAL from mmap in the NUMA case).
This then leaves set_policy/get_policy as implemented only by shmem -
though importantly also by SysV SHM, which has to interface with shmem
which implements them, and with SHM_HUGETLB which does not.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Sidhartha Kumar <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mempolicy: cleanups leading to NUMA mpol without vma", v2.
Mostly cleanups in mm/mempolicy.c, but finally removing the pseudo-vma
from shmem folio allocation, and removing the mmap_lock around folio
migration for mbind and migrate_pages syscalls.
This patch (of 12):
hugetlbfs_fallocate() goes through the motions of pasting a shared NUMA
mempolicy onto its pseudo-vma, but how could there ever be a shared NUMA
mempolicy for this file? hugetlb_vm_ops has never offered a set_policy
method, and hugetlbfs_parse_param() has never supported any mpol options
for a mount-wide default policy.
It's just an illusion: clean it away so as not to confuse others, giving
us more freedom to adjust shmem's set_policy/get_policy implementation.
But hugetlbfs_inode_info is still required, just to accommodate seals.
Yes, shared NUMA mempolicy support could be added to hugetlbfs, with a
set_policy method and/or mpol mount option (Andi's first posting did
include an admitted-unsatisfactory hugetlb_set_policy()); but it seems
that nobody has bothered to add that in the nineteen years since v2.6.7
made it possible, and there is at least one company that has invested
enough into hugetlbfs that I guess they have learnt well enough how to
manage its NUMA without needing shared mempolicy.
Remove linux/mempolicy.h from linux/hugetlb.h: include linux/pagemap.h in
its place, because hugetlb.h's recently added use of filemap_lock_folio()
requires that (although most .configs and .c's get it in some other way).
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Sidhartha Kumar <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
damon_sysfs_set_targets() had a bug that could result in unexpected memory
usage and increased monitoring overhead. The bug has been fixed by a
previous commit. Add a unit test to avoid a similar bug in the future.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Brendan Higgins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When calculating the pseudo-moving access rate, DAMON divides some values
by the maximum nr_accesses. However, due to the type of the related
variables, simple division-based calculation of the divisor can return
zero. As a result, divide-by-zero is possible. Fix it by using
damon_max_nr_accesses(), which handles the case.
Note that this is a fix for a commit that is not in the mainline but in
the mm tree.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ace30fb21af5 ("mm/damon/core: use pseudo-moving sum for nr_accesses_bp")
Reported-by: Jakub Acs <[email protected]>
Signed-off-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When calculating the hotness threshold for lru_prio scheme of
DAMON_LRU_SORT, the module divides some values by the maximum nr_accesses.
However, due to the type of the related variables, simple division-based
calculation of the divisor can return zero. As a result, divide-by-zero
is possible. Fix it by using damon_max_nr_accesses(), which handles the
case.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 40e983cca927 ("mm/damon: introduce DAMON-based LRU-lists Sorting")
Signed-off-by: SeongJae Park <[email protected]>
Reported-by: Jakub Acs <[email protected]>
Cc: <[email protected]> [6.0+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
When calculating the hotness of each region for the under-quota regions
prioritization, DAMON divides some values by the maximum nr_accesses.
However, due to the type of the related variables, simple division-based
calculation of the divisor can return zero. As a result, divide-by-zero
is possible. Fix it by using damon_max_nr_accesses(), which handles the
case.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization")
Signed-off-by: SeongJae Park <[email protected]>
Reported-by: Jakub Acs <[email protected]>
Cc: <[email protected]> [5.16+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
When monitoring attributes are changed, DAMON updates access rate of the
monitoring results accordingly. For that, it divides some values by the
maximum nr_accesses. However, due to the type of the related variables,
simple division-based calculation of the divisor can return zero. As a
result, divide-by-zero is possible. Fix it by using
damon_max_nr_accesses(), which handles the case.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 2f5bef5a590b ("mm/damon/core: update monitoring results for new monitoring attributes")
Signed-off-by: SeongJae Park <[email protected]>
Reported-by: Jakub Acs <[email protected]>
Cc: <[email protected]> [6.3+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "avoid divide-by-zero due to max_nr_accesses overflow".
The maximum nr_accesses of a given DAMON context can be calculated by
dividing the aggregation interval by the sampling interval. Some logic
in DAMON uses the maximum nr_accesses as a divisor. Hence, the value
shouldn't be zero. Such a case is normally avoided since DAMON does not
allow setting the aggregation interval smaller than the sampling interval.
However, since nr_accesses is unsigned int while the intervals are
unsigned long, the maximum nr_accesses can become zero when the quotient
is cast down.
Avoid the divide-by-zero by implementing a function that handles the
corner case (first patch), and replacing the vulnerable direct maximum
nr_accesses calculations with it (remaining patches).
Note that the replacement patches are split per broken commit, to make
backporting to the required trees easier. In particular, the last patch
is for a commit that is not yet merged into the mainline but is in the mm
tree.
This patch (of 4):
The maximum nr_accesses of a given DAMON context can be calculated by
dividing the aggregation interval by the sampling interval. Some logic
in DAMON uses the maximum nr_accesses as a divisor. Hence, the value
shouldn't be zero. Such a case is normally avoided since DAMON does not
allow setting the aggregation interval smaller than the sampling interval.
However, since nr_accesses is unsigned int while the intervals are
unsigned long, the maximum nr_accesses can become zero when the quotient
is cast down. Implement a function that handles the corner case.
Note that this commit does not fix the real issue yet, since it only
introduces the safe function that will replace the problematic divisions.
The replacements will be made by followup commits, to make backporting to
stable series easier.
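A sketch of such a helper, assuming damon_attrs keeps both intervals as
unsigned long, could look like:

	static inline unsigned int damon_max_nr_accesses(const struct damon_attrs *attrs)
	{
		/* do the division in unsigned long and clamp it before the
		 * implicit narrowing to unsigned int */
		unsigned long max_nr_accesses =
			attrs->aggr_interval / attrs->sample_interval;

		return min(max_nr_accesses, (unsigned long)UINT_MAX);
	}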
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization")
Signed-off-by: SeongJae Park <[email protected]>
Reported-by: Jakub Acs <[email protected]>
Cc: <[email protected]> [5.16+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit dc68badcede4 ("mm: mlock: update mlock_pte_range to handle
large folio") I've just occasionally seen VM_WARN_ON_FOLIO(folio_test_ksm)
warnings from folio_within_range(), in a splurge after testing with KSM
hyperactive.
folio_referenced_one()'s use of folio_within_vma() is safe because it
checks folio_test_large() first; but allow_mlock_munlock() needs to do the
same to avoid those warnings (or check !folio_test_ksm() itself? Or move
either check into folio_within_range()? Hard to tell without more
examples of its use).
Link: https://lkml.kernel.org/r/[email protected]
Fixes: dc68badcede4 ("mm: mlock: update mlock_pte_range to handle large folio")
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Stefan Roesch <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit e509ad4d77e6 ("ext4: use bdev_getblk() to avoid memory
reclaim in readahead path") rightly replaced GFP_NOFAIL allocations by
GFP_NOWAIT allocations, I've occasionally been seeing "page allocation
failure: order:0" warnings under load: all with
ext4_sb_breadahead_unmovable() in the stack. I don't think those warnings
are of any interest: suppress them with __GFP_NOWARN.
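For illustration, the readahead allocation then becomes (the bdev_getblk()
call shape is assumed from the commit referenced above, abridged):

	/* readahead is best-effort: no reclaim, and no allocation-failure warning */
	struct buffer_head *bh = bdev_getblk(sb->s_bdev, block, sb->s_blocksize,
					     GFP_NOWAIT | __GFP_NOWARN);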
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e509ad4d77e6 ("ext4: use bdev_getblk() to avoid memory reclaim in readahead path")
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Hui Zhu <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When doing compaction, I found the lru_add_drain() is an obvious hotspot
when migrating pages. The distribution of this hotspot is as follows:
- 18.75% compact_zone
- 17.39% migrate_pages
- 13.79% migrate_pages_batch
- 11.66% migrate_folio_move
- 7.02% lru_add_drain
+ 7.02% lru_add_drain_cpu
+ 3.00% move_to_new_folio
1.23% rmap_walk
+ 1.92% migrate_folio_unmap
+ 3.20% migrate_pages_sync
+ 0.90% isolate_migratepages
The lru_add_drain() was added by commit c3096e6782b7 ("mm/migrate:
__unmap_and_move() push good newpage to LRU") to drain the newpage to LRU
immediately, to help to build up the correct newpage->mlock_count in
remove_migration_ptes() for mlocked pages. However, if there are no
mlocked pages are migrating, then we can avoid this lru drain operation,
especailly for the heavy concurrent scenarios.
So we can record the source pages' mlocked status in
migrate_folio_unmap(), and only drain the lru list when the mlocked status
is set in migrate_folio_move().
In addition, the page is already isolated from the lru when migrating, so
checking the mlocked status via folio_test_mlocked() in
migrate_folio_unmap() is stable.
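Conceptually, the change amounts to the following (names simplified; not
the exact migrate_folio_unmap()/migrate_folio_move() signatures):

	/* at unmap time: src is already isolated from the LRU, so this is stable */
	bool page_was_mlocked = folio_test_mlocked(src);

	/* at move time: only mlocked folios need the per-CPU LRU caches drained,
	 * so that remove_migration_ptes() sees the correct mlock_count */
	if (page_was_mlocked)
		lru_add_drain();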
After this patch, I can see the hotspot of the lru_add_drain() is gone:
- 9.41% migrate_pages_batch
- 6.15% migrate_folio_move
- 3.64% move_to_new_folio
+ 1.80% migrate_folio_extra
+ 1.70% buffer_migrate_folio
+ 1.41% rmap_walk
+ 0.62% folio_add_lru
+ 3.07% migrate_folio_unmap
Meanwhile, the compaction latency shows some improvements when running
thpscale:
base patched
Amean fault-both-1 1131.22 ( 0.00%) 1112.55 * 1.65%*
Amean fault-both-3 2489.75 ( 0.00%) 2324.15 * 6.65%*
Amean fault-both-5 3257.37 ( 0.00%) 3183.18 * 2.28%*
Amean fault-both-7 4257.99 ( 0.00%) 4079.04 * 4.20%*
Amean fault-both-12 6614.02 ( 0.00%) 6075.60 * 8.14%*
Amean fault-both-18 10607.78 ( 0.00%) 8978.86 * 15.36%*
Amean fault-both-24 14911.65 ( 0.00%) 11619.55 * 22.08%*
Amean fault-both-30 14954.67 ( 0.00%) 14925.66 * 0.19%*
Amean fault-both-32 16654.87 ( 0.00%) 15580.31 * 6.45%*
Link: https://lkml.kernel.org/r/06e9153a7a4850352ec36602df3a3a844de45698.1697859741.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yin Fengwei <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The directory this file is in was renamed but the reference didn't get
updated. Fix it.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ee65728e103b ("docs: rename Documentation/vm to Documentation/mm")
Signed-off-by: Vegard Nossum <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Acked-by: Mike Kravetz <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Wu XiangCheng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
For compound pages, the head sets the PG_head flag and the tail sets the
compound_head to indicate the head page. If a user allocates a compound
page and frees it with a different order, the compound page information
will not be properly initialized. To detect this problem,
compound_order(page) and the order argument are compared, but this check
is skipped when the order argument is zero. The error should be checked
regardless of the order.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hyesoo Yu <[email protected]>
Reviewed-by: Vishal Moola (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muhammad Muzammil <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Muhammad Muzammil <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This removes 2 calls to compound_head() and helps convert khugepaged to
use folios throughout.
Previously, if the address passed to collapse_pte_mapped_thp()
corresponded to a tail page, the scan would fail immediately. Using
filemap_lock_folio() we get the corresponding folio back and try to
operate on the folio instead.
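Roughly, the lookup becomes (a sketch, not the full function; the failure
value shown is illustrative):

	struct folio *folio;

	/* the folio covering the address is returned even for what used to
	 * be a tail-page hit, so the scan can continue on the folio */
	folio = filemap_lock_folio(vma->vm_file->f_mapping,
				   linear_page_index(vma, haddr));
	if (IS_ERR(folio))
		return SCAN_PAGE_NULL;	/* illustrative failure result */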
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Also remove count_memcg_page_event now that its last caller no longer uses
it, and rework hpage_collapse_alloc_page() into hpage_collapse_alloc_folio().
This removes 1 call to compound_head() and helps convert khugepaged to
use folios throughout.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Both callers of is_refcount_suitable() have been converted to use
folios, so convert it to take in a folio. Both callers only operate on
head pages of folios so mapcount/refcount conversions here are trivial.
Removes 3 calls to compound_head(), and removes 315 bytes of kernel text.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Replaces 5 calls to compound_head(), and removes 1385 bytes of kernel
text.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Some khugepaged folio conversions", v3.
This patchset converts a number of functions to use folios. This cleans
up some khugepaged code and removes a large number of hidden
compound_head() calls.
This patch (of 5):
Replaces 11 calls to compound_head() with 1, and removes 1348 bytes of
kernel text.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Kefeng Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In offline_pages(), if a node becomes memoryless, we will clear its
N_MEMORY state by calling node_states_clear_node(). But we do this
after rebuilding the zonelists by calling build_all_zonelists(), which
will cause this memoryless node to still be in the fallback nodes
(node_order[]) of other nodes.
To drop memoryless nodes from fallback nodes in this case, just call
node_states_clear_node() before calling build_all_zonelists().
In this way, we will not try to allocate pages from memoryless node0,
then the panic mentioned in [1] will also be fixed. Even though this
problem has been solved by dropping the NODE_MIN_SIZE constraint in x86
[2], it would be better to fix it in the core MM as well.
https://lore.kernel.org/all/[email protected]/ [1]
https://lore.kernel.org/all/[email protected]/ [2]
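Conceptually, the ordering change described above is just (simplified from
offline_pages()):

	/* clear N_MEMORY and related node states first ... */
	node_states_clear_node(node, &arg);
	/* ... so the rebuilt zonelists drop the now-memoryless node */
	build_all_zonelists(NULL);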
Link: https://lkml.kernel.org/r/9f1dbe7ee1301c7163b2770e32954ff5e3ecf2c4.1697711415.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "handle memoryless nodes more appropriately", v3.
Currently, during initialization or memory offlining, memoryless nodes
will still be built into the fallback lists of themselves or other nodes.
This is not what we expected, so this patch series removes memoryless
nodes from the fallback list entirely.
This patch (of 2):
In find_next_best_node(), we skipped the memoryless nodes when building
the zonelists of other normal nodes (N_NORMAL), but did not skip the
memoryless node itself when building the zonelist. This will cause it to
be traversed at runtime.
For example, say we have node0 and node1, where node0 is a memoryless
node; the fallback order of node0 and node1 is then as follows:
[ 0.153005] Fallback order for Node 0: 0 1
[ 0.153564] Fallback order for Node 1: 1
After this patch, we skip memoryless node0 entirely; the fallback order
of node0 and node1 is then as follows:
[ 0.155236] Fallback order for Node 0: 1
[ 0.155806] Fallback order for Node 1: 1
So it becomes completely invisible, which will reduce runtime
overhead.
And in this way, we will not try to allocate pages from memoryless node0,
then the panic mentioned in [1] will also be fixed. Even though this
problem has been solved by dropping the NODE_MIN_SIZE constraint in x86
[2], it would be better to fix it in core MM as well.
[1]. https://lore.kernel.org/all/[email protected]/
[2]. https://lore.kernel.org/all/[email protected]/
[[email protected]: update comment, per Ingo]
Link: https://lkml.kernel.org/r/7300fc00a057eefeb9a68c8ad28171c3f0ce66ce.1697799303.git.zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/157013e978468241de4a4c05d5337a44638ecb0e.1697711415.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add nr_split to trace_mm_migrate_pages for large folio (including THP)
split events.
[[email protected]: cleanup per Huang, Ying]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zi Yan <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
nr_failed was missing the large folio splits from migrate_pages_batch()
and can cause a mismatch between migrate_pages() return value and the
number of not migrated pages, i.e., when the return value of
migrate_pages() is 0, there are still pages left in the from page list.
This will happen when a non-PMD THP large folio fails to migrate due to
-ENOMEM and is split successfully, but not all the split pages are then
migrated: migrate_pages_batch() would return non-zero, but
astats.nr_thp_split = 0. nr_failed would be 0 and returned to the caller
of migrate_pages(), but the not-migrated pages are left in the from page
list without being added back to the LRU lists.
Fix it by adding a new nr_split counter for large folio splits and adding
it to nr_failed in migrate_pages_sync() after migrate_pages_batch() is
done.
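Conceptually (simplified; migrate_pages_sync() keeps several other
counters as well):

	/* folios split by the async pass but not fully migrated must count as
	 * failures, so the return value matches the pages left on the list */
	nr_failed += astats.nr_split;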
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 2ef7dbb26990 ("migrate_pages: try migrate in batch asynchronously firstly")
Signed-off-by: Zi Yan <[email protected]>
Acked-by: Huang Ying <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In the patch "mm: kmemleak: split __create_object into two functions", the
initialisation of the object was split across two places. Catalin said it
feels a bit weird and error prone. So leave __alloc_object() to just do
the actual allocation and let __link_object() do the full initialisation.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
delete_object_part() can be called by multiple callers at the same time.
If an object is found and removed by one caller, and then another caller
tries to find it too, the lookup fails and it returns directly. But the
object is still recorded by kmemleak even though it has already been freed
to the buddy allocator. With DEBUG on, kmemleak will report the following
warning,
kmemleak: Partially freeing unknown object at 0xa1af86000 (size 4096)
CPU: 0 PID: 742 Comm: test_huge Not tainted 6.6.0-rc3kmemleak+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x37/0x50
kmemleak_free_part_phys+0x50/0x60
hugetlb_vmemmap_optimize+0x172/0x290
? __pfx_vmemmap_remap_pte+0x10/0x10
__prep_new_hugetlb_folio+0xe/0x30
prep_new_hugetlb_folio.isra.0+0xe/0x40
alloc_fresh_hugetlb_folio+0xc3/0xd0
alloc_surplus_hugetlb_folio.constprop.0+0x6e/0xd0
hugetlb_acct_memory.part.0+0xe6/0x2a0
hugetlb_reserve_pages+0x110/0x2c0
hugetlbfs_file_mmap+0x11d/0x1b0
mmap_region+0x248/0x9a0
? hugetlb_get_unmapped_area+0x15c/0x2d0
do_mmap+0x38b/0x580
vm_mmap_pgoff+0xe6/0x190
ksys_mmap_pgoff+0x18a/0x1f0
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Expand __create_object() and move __alloc_object() to the beginning. Then
use kmemleak_lock to protect __find_and_remove_object() and
__link_object() as a whole, which guarantees that all objects are
processed sequentially.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 53238a60dd4a ("kmemleak: Allow partial freeing of memory blocks")
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Patrick Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add a new __find_and_remove_object() without kmemleak_lock protection, in
preparation for the next patch.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Patrick Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The kmemleak object is allocated by mem_pool_alloc(), which could come
from the slab or from mem_pool[], so it is not suitable to use
__kmem_cache_free() to free the object; use __mem_pool_free() instead.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0647398a8c7b ("mm: kmemleak: simple memory allocation pool for kmemleak objects")
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Patrick Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
__create_object() consists of two parts: the first part allocates a
kmemleak object and initialises it, and the second part inserts it into
the object tree. The function takes kmemleak_lock, but actually only the
second part needs the lock.
Split it into two functions: the first function, __alloc_object(), only
allocates a kmemleak object, and the second function, __link_object(),
initialises the object and inserts it into the object tree; use
kmemleak_lock to protect __link_object() only.
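A minimal sketch of the resulting shape (parameter lists abridged here):

	static void __create_object(unsigned long ptr, size_t size,
				    int min_count, gfp_t gfp)
	{
		struct kmemleak_object *object;
		unsigned long flags;

		object = __alloc_object(gfp);	/* allocation: no lock needed */
		if (!object)
			return;

		raw_spin_lock_irqsave(&kmemleak_lock, flags);
		__link_object(object, ptr, size, min_count);	/* init + tree insert */
		raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
	}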
[[email protected]: coding-style cleanups]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Patrick Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
With 0x%p, the pointer will be hashed and printed as (____ptrval____) instead.
And with 0x%pa, the pointer can be successfully printed but with duplicate
prefixes, which looks like:
kmemleak: kmemleak_free(0x(____ptrval____))
kmemleak: kmemleak_free_percpu(0x(____ptrval____))
kmemleak: kmemleak_free_part_phys(0x0x0000000a1af86000)
Use 0x%px instead of 0x%p or 0x%pa to print the pointer. Then the print
will be like:
kmemleak: kmemleak_free(0xffff9111c145b020)
kmemleak: kmemleak_free_percpu(0x00000000000333b0)
kmemleak: kmemleak_free_part_phys(0x0000000a1af80000)
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Patrick Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since kmemleak_alloc_phys() rather than kmemleak_alloc() was called from
memblock_alloc_range_nid(), kmemleak_free_part_phys() should be used to
delete the kmemleak object in free_bootmem_page(). In debug mode, there
is the following warning:
kmemleak: Partially freeing unknown object at 0xffff97345aff7000 (size 4096)
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 028725e73375 ("bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page")
Signed-off-by: Liu Shixin <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Patrick Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Some bugfix about kmemleak", v3.
Some bugfixes for kmemleak and the printed info from debug mode.
This patch (of 7):
Since kmemleak_alloc_phys() rather than kmemleak_alloc() was called from
memblock_alloc_range_nid(), kmemleak_free_part_phys() should be used to
delete the kmemleak object in put_page_bootmem(). In debug mode, there
is the following warning:
kmemleak: Partially freeing unknown object at 0xffff97345aff7000 (size 4096)
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: dd0ff4d12dd2 ("bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem")
Signed-off-by: Liu Shixin <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Patrick Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since all calls use folio_xchg_last_cpupid(), remove
page_cpupid_xchg_last().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Convert to use folio_xchg_last_cpupid() in wp_page_reuse(), and remove the
page variable. Since only normal and PMD-mapped pages are now handled by
NUMA balancing, it's enough to only update the entire folio's last cpupid.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Saves one compound_head() call, also in preparation for
page_cpupid_xchg_last() conversion.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Make finish_mkwrite_fault static since it is not used outside of
memory.c.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Convert to use folio_xchg_last_cpupid() in __split_huge_page_tail().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Convert to use folio_xchg_last_cpupid() in folio_migrate_flags(), also
directly use folio_nid() instead of page_to_nid(&folio->page).
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Convert to use folio_xchg_last_cpupid() in should_numa_migrate_memory().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add a folio_xchg_last_cpupid() wrapper, which is required to convert
page_cpupid_xchg_last() to the folio version later in the series.
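The wrapper itself is a thin forwarder (a sketch, assuming the existing
page_cpupid_xchg_last() signature):

	static inline int folio_xchg_last_cpupid(struct folio *folio, int cpupid)
	{
		return page_cpupid_xchg_last(&folio->page, cpupid);
	}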
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since all calls use folio_xchg_access_time(), remove
xchg_page_access_time().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use a folio in change_huge_pmd(), which helps to remove the last
xchg_page_access_time() caller.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use a folio in change_pte_range() to save three compound_head() calls.
Since only normal and PMD-mapped pages are now handled by NUMA balancing,
it is enough to only update the entire folio's access time.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Convert to use folio_xchg_access_time() in numa_hint_fault_latency().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add a folio_xchg_access_time() wrapper, which is required to convert
xchg_page_access_time() to the folio version later in the series.
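The wrapper itself is a thin forwarder (a sketch, assuming the existing
xchg_page_access_time() signature):

	static inline unsigned int folio_xchg_access_time(struct folio *folio,
							  unsigned int time)
	{
		return xchg_page_access_time(&folio->page, time);
	}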
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since all calls use folio_last_cpupid(), remove page_cpupid_last().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Convert to use folio_last_cpupid() in __split_huge_page_tail().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Convert to use folio_last_cpupid() in do_huge_pmd_numa_page().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Convert to use folio_last_cpupid() in do_numa_page().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|