|
Remove the following unused inline functions from mm_inline.h:
1. All uses of add_page_to_lru_list_tail() have been removed since
commit 7a3dbfe8a52b ("mm/swap: convert lru_deactivate_file to a
folio_batch"), and it can be replaced by lruvec_add_folio_tail().
2. All uses of __clear_page_lru_flags() have been removed since commit
188e8caee968 ("mm/swap: convert __page_cache_release() to use a
folio"), and it can be replaced by __folio_clear_lru_flags().
They are useless, so remove them.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add :collapse mod to userfaultfd selftest. Currently this mod is only
valid for "shmem" test type, but could be used for other test types.
When provided, memory allocated by ->allocate_area() will be
hugepage-aligned and enforced to be hugepage-sized. After the
UFFD-registered mapping has been populated by the UFFD minor fault
handler, userfaultfd_minor_test attempts to MADV_COLLAPSE the
UFFD-registered mapping to collapse the memory into a pmd-mapped THP.
This test is meant to be a functional test of what occurs during
UFFD-driven live migration of VMs backed by huge tmpfs where, after a
hugepage-sized region has been successfully migrated (in native page-sized
chunks, to avoid the latency of fetching a hugepage over the network), we want
to reclaim previous VM performance by remapping it at the PMD level.
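As a rough illustration of the collapse step exercised here, below is a
minimal, hypothetical sketch (not the selftest itself): it maps a
hugepage-aligned, hugepage-sized shmem (memfd) region, populates it as a
stand-in for the UFFD minor-fault population the real test performs via a
second mapping and UFFDIO_CONTINUE, and then collapses it with
MADV_COLLAPSE. The 2MiB hugepage size and the MADV_COLLAPSE fallback
define are assumptions.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value from <asm-generic/mman-common.h> */
#endif

int main(void)
{
	const size_t hpage = 2UL << 20;	/* assumed 2MiB PMD-sized hugepage */
	int fd = memfd_create("collapse-area", 0);
	void *area, *aligned;
	char *p;

	if (fd < 0 || ftruncate(fd, hpage))
		return 1;

	/* Reserve 2 * hpage so a hugepage-aligned mapping fits inside. */
	area = mmap(NULL, 2 * hpage, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return 1;
	aligned = (void *)(((unsigned long)area + hpage - 1) & ~(hpage - 1));

	/* Hugepage-aligned, hugepage-sized shmem mapping, as :collapse requires. */
	p = mmap(aligned, hpage, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_FIXED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/* Stand-in for populating the area via UFFD minor faults. */
	memset(p, 1, hpage);

	/* Ask for a synchronous collapse into a pmd-mapped THP. */
	if (madvise(p, hpage, MADV_COLLAPSE))
		perror("madvise(MADV_COLLAPSE)");

	close(fd);
	return 0;
}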
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zach O'Keefe <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This test exercises MADV_COLLAPSE acting on file/shmem memory for which
(1) the file extent mapped by the memory is already a huge page in the
page cache, and (2) the pmd mapping this memory in the target process is
none.
In practice, (1)+(2) is the state left over after khugepaged has
successfully collapsed file/shmem memory for a target VMA, but the memory
has not yet been refaulted. So, this test in effect tests MADV_COLLAPSE
racing with khugepaged to collapse the memory first.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zach O'Keefe <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add memory operations for shmem (memfd) memory, and reuse existing tests
with the new memory operations.
Shmem tests can be called with the "shmem" mem_type, and shmem tests are run
with "all" mem_type as well.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zach O'Keefe <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add memory operations for file-backed and tmpfs memory. Call existing
tests with these new memory operations to test collapse functionality of
khugepaged and MADV_COLLAPSE on file-backed and tmpfs memory. Not all
tests are reusable; for example, collapse_swapin_single_pte(), which
checks swap usage, is not.
Refactor test arguments. Usage is now:
Usage: ./khugepaged <test type> [dir]
<test type> : <context>:<mem_type>
<context> : [all|khugepaged|madvise]
<mem_type> : [all|anon|file]
"file,all" mem_type requires [dir] argument
"file,all" mem_type requires kernel built with
CONFIG_READ_ONLY_THP_FOR_FS=y
if [dir] is a (sub)directory of a tmpfs mount, tmpfs must be
mounted with huge=madvise option for khugepaged tests to work
Refactor the calling of tests to make it clear what collapse context /
memory operations they support, but only invoke tests requested by the
user. Also log what test is being run, and with what context / memory, to
make test logs more human readable.
A new test file is created and deleted for every test to ensure no pages
remain in the page cache between tests (tests also may attempt to collapse
different amounts of memory).
For file-backed memory where the file is stored on a block device, disable
/sys/block/<device>/queue/read_ahead_kb so that pages don't find their way
into the page cache without the tests faulting them in.
Add file and shmem wrappers to vm_util to check for file and shmem
hugepages in smaps.
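For reference, a minimal sketch of the kind of smaps check these wrappers
can perform (not the vm_util code itself): it sums the pmd-mapped file and
shmem THP counters, FilePmdMapped and ShmemPmdMapped, across
/proc/self/smaps; the per-mapping lookup done by the real helpers is
omitted.

#include <stdio.h>
#include <string.h>

/* Sum the kB reported for a given smaps field across all mappings. */
static long smaps_field_kb(const char *field)
{
	char line[256];
	long total = 0, kb;
	FILE *f = fopen("/proc/self/smaps", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, field, strlen(field)) &&
		    sscanf(line + strlen(field), ": %ld kB", &kb) == 1)
			total += kb;
	}
	fclose(f);
	return total;
}

int main(void)
{
	printf("FilePmdMapped:  %ld kB\n", smaps_field_kb("FilePmdMapped"));
	printf("ShmemPmdMapped: %ld kB\n", smaps_field_kb("ShmemPmdMapped"));
	return 0;
}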
[[email protected]: fix "add thp collapse file and tmpfs testing" for
tmpfs]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zach O'Keefe <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Modularize operations to setup, cleanup, fault, and check for huge pages,
for a given memory type. This allows reusing existing tests with
additional memory types by defining new memory operations. Following
patches will add file and shmem memory types.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zach O'Keefe <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
These files:
tools/testing/selftests/vm/vm_util.c
tools/testing/selftests/vm/khugepaged.c
Both contain logic to:
1) Determine hugepage size on current system
2) Read /proc/self/smaps to determine number of THPs at an address
Refactor selftests/vm/khugepaged.c to use the vm_util common helpers and
add it as a build dependency.
Since selftests/vm/khugepaged.c is the largest user of check_huge(),
change the signature of check_huge() to match selftests/vm/khugepaged.c's
usage: take an expected number of hugepages, and return a bool indicating
if the correct number of hugepages were found. Add a wrapper,
check_huge_anon(), in anticipation of checking smaps for file and shmem
hugepages.
Update existing callsites to use the new pattern / function.
Likewise, check_for_pattern() was duplicated, and it's a general enough
helper to include in vm_util helpers as well.
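To make the new shape concrete, here is a simplified sketch in the same
spirit (not the vm_util code itself): check_for_pattern() scans a stream
for a line starting with a pattern, and a check_huge_anon()-style wrapper
turns an expected hugepage count into a bool by parsing AnonHugePages from
smaps. The 2048 kB hugepage size is an assumption, and unlike the real
helpers this sketch looks at the first matching smaps entry rather than
the mapping covering a given address.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true and leave the matching line in buf if pattern is found. */
static bool check_for_pattern(FILE *fp, const char *pattern,
			      char *buf, size_t len)
{
	while (fgets(buf, len, fp)) {
		if (!strncmp(buf, pattern, strlen(pattern)))
			return true;
	}
	return false;
}

/* Expected number of anon hugepages -> bool, hpage size given in kB. */
static bool check_huge_anon(int nr_hpages, long hpage_kb)
{
	char buf[256];
	long kb = 0;
	FILE *fp = fopen("/proc/self/smaps", "r");

	if (!fp)
		return false;
	if (check_for_pattern(fp, "AnonHugePages:", buf, sizeof(buf)))
		sscanf(buf, "AnonHugePages: %ld kB", &kb);
	fclose(fp);
	return kb == nr_hpages * hpage_kb;
}

int main(void)
{
	/* This trivial process should have no anon THPs mapped. */
	printf("%s\n", check_huge_anon(0, 2048) ? "ok" : "unexpected");
	return 0;
}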
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zach O'Keefe <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add huge_memory:trace_mm_khugepaged_scan_file tracepoint to
hpage_collapse_scan_file() analogously to hpage_collapse_scan_pmd().
While this change is targeted at debugging the MADV_COLLAPSE pathway, the
"mm_khugepaged" prefix is retained for symmetry with
huge_memory:trace_mm_khugepaged_scan_pmd, which retains its legacy name
to prevent changing the kernel ABI as much as possible.
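As a usage note, the new event can be consumed from userspace like any
other tracepoint. Below is a minimal sketch; the tracefs mount point
(/sys/kernel/tracing) and the event path
(events/huge_memory/mm_khugepaged_scan_file) are assumptions based on the
naming above, and it needs sufficient privileges.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *enable =
		"/sys/kernel/tracing/events/huge_memory/mm_khugepaged_scan_file/enable";
	char buf[4096];
	ssize_t n;
	int efd = open(enable, O_WRONLY);
	int tfd = open("/sys/kernel/tracing/trace_pipe", O_RDONLY);

	if (efd < 0 || tfd < 0 || write(efd, "1", 1) != 1)
		return 1;

	/* Stream events emitted while khugepaged or MADV_COLLAPSE scans files. */
	while ((n = read(tfd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, n, stdout);

	close(efd);
	close(tfd);
	return 0;
}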
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zach O'Keefe <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add support for MADV_COLLAPSE to collapse shmem-backed and file-backed
memory into THPs (requires CONFIG_READ_ONLY_THP_FOR_FS=y).
On success, the backing memory will be a hugepage. For the memory range
and process provided, the page tables will synchronously have a huge pmd
installed, mapping the THP. Other mappings of the file extent mapped by
the memory range may be added to a set of entries that khugepaged will
later process and attempt update their page tables to map the THP by a
pmd.
This functionality unlocks two important uses:
(1) Immediately back executable text by THPs. Current support provided
by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large
system which might impair services from serving at their full rated
load after (re)starting. Tricks like mremap(2)'ing text onto
anonymous memory to immediately realize iTLB performance prevent
page sharing and demand paging, both of which increase steady state
memory footprint. Now, we can have the best of both worlds: Peak
upfront performance and lower RAM footprints.
(2) userfaultfd-based live migration of virtual machines satisfies UFFD
faults by fetching native-sized pages over the network (to avoid
latency of transferring an entire hugepage). However, after guest
memory has been fully copied to the new host, MADV_COLLAPSE can
be used to immediately increase guest performance.
Since khugepaged is single threaded, this change now introduces the
possibility of collapse contexts racing in the file collapse path. There
are a few important places to consider:
(1) hpage_collapse_scan_file(), when we xas_pause() and drop RCU.
We could have the memory collapsed out from under us, but
the next xas_for_each() iteration will correctly pick up the
hugepage. The hugepage might not be up to date (insofar as
copying of small page contents might not have completed - the
page still may be locked), but regardless what small page index
we were iterating over, we'll find the hugepage and identify it
as a suitably aligned compound page of order HPAGE_PMD_ORDER.
In the khugepaged path, we locklessly check the value of the pmd,
and only add it to the deferred collapse array if we find the pmd
mapping a pte table. This is fine, since other values that could
have raced in right afterwards denote failure, or that the
memory was successfully collapsed, so we don't need further
processing.
In the madvise path, we'll take mmap_lock in write mode to serialize
against page table updates and will know what to do based on the
true value of the pmd: recheck all ptes if we point to a pte table,
directly install the pmd if the pmd has been cleared but memory has
not yet been faulted, or do nothing at all if we find a huge pmd.
It's worth putting emphasis on how we treat the none pmd here.
If khugepaged has processed this mm's page tables
already, it will have left the pmd cleared (ready for refault by
the process). Depending on the VMA flags and sysfs settings, the
amount of RAM on the machine, and the current load, this could be a
relatively common occurrence - and as such is one we'd like to
handle successfully in MADV_COLLAPSE. When we see the none pmd
in collapse_pte_mapped_thp(), we've locked mmap_lock in write
and checked (a) hugepage_vma_check() to see if the backing
memory is appropriate still, along with VMA sizing and
appropriate hugepage alignment within the file, and (b) we've
found a hugepage head of order HPAGE_PMD_ORDER at the offset
in the file mapped by our hugepage-aligned virtual address.
Even though the common case is likely a race with khugepaged,
given these checks (regardless how we got here - we could be
operating on a completely different file than originally checked
in hpage_collapse_scan_file() for all we know) it should be safe
to directly make the pmd a huge pmd pointing to this hugepage.
(2) collapse_file() is mostly serialized on the same file extent by
lock sequence:
| lock hugepage
| lock mapping->i_pages
| lock 1st page
| unlock mapping->i_pages
| <page checks>
| lock mapping->i_pages
| page_ref_freeze(3)
| xas_store(hugepage)
| unlock mapping->i_pages
| page_ref_unfreeze(1)
| unlock 1st page
V unlock hugepage
Once a context (who already has their fresh hugepage locked)
locks mapping->i_pages exclusively, it will hold said lock
until it locks the first page, and it will hold that lock until
after the hugepage has been added to the page cache (and
will unlock the hugepage after page table update, though that
isn't important here).
A racing context that loses the race for mapping->i_pages will
then lose the race to locking the first page. Here - depending
on how far the other racing context has gotten - we might find
the new hugepage (in which case we'll exit cleanly when we
check PageTransCompound()), or we'll find the "old" 1st small
page (in which case we'll exit cleanly when we discover an unexpected
refcount of 2 after isolate_lru_page()). This is assuming we
are able to successfully lock the page we find - in shmem path,
we could just fail the trylock and exit cleanly anyway.
The failure path in collapse_file() is similar: once we hold the lock
on the 1st small page, we are serialized against other collapse
contexts. Before the 1st small page is unlocked, we add it
back to the pagecache and unfreeze the refcount appropriately.
Contexts that lost the race to the 1st small page will then find
the same 1st small page with the correct refcount and will be
able to proceed.
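For use case (1) above, a hypothetical userspace sketch of how a service
might collapse its own text after startup: it finds its executable's r-xp
mapping in /proc/self/maps and issues MADV_COLLAPSE on the
hugepage-aligned portion. This assumes a 2MiB PMD size, a kernel built
with CONFIG_READ_ONLY_THP_FOR_FS=y, and the MADV_COLLAPSE fallback define;
it is an illustration, not code from this series.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value from <asm-generic/mman-common.h> */
#endif

int main(void)
{
	const unsigned long hpage = 2UL << 20;	/* assumed 2MiB PMD size */
	char exe[4096] = { 0 }, line[4352];
	ssize_t n = readlink("/proc/self/exe", exe, sizeof(exe) - 1);
	FILE *maps = fopen("/proc/self/maps", "r");

	if (n < 0 || !maps)
		return 1;

	while (fgets(line, sizeof(line), maps)) {
		unsigned long start, end, astart, aend;
		char perms[8], path[4096] = { 0 };

		if (sscanf(line, "%lx-%lx %7s %*s %*s %*s %4095s",
			   &start, &end, perms, path) < 4)
			continue;
		if (strcmp(perms, "r-xp") || strcmp(path, exe))
			continue;

		/* Collapse only the hugepage-aligned part of the text mapping. */
		astart = (start + hpage - 1) & ~(hpage - 1);
		aend = end & ~(hpage - 1);
		if (aend > astart &&
		    madvise((void *)astart, aend - astart, MADV_COLLAPSE))
			perror("madvise(MADV_COLLAPSE)");
		break;
	}
	fclose(maps);
	return 0;
}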
[[email protected]: don't check pmd value twice in collapse_pte_mapped_thp()]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: Delete hugepage_vma_revalidate_anon(), remove
check for multi-add in khugepaged_add_pte_mapped_thp()]
Link: https://lore.kernel.org/linux-mm/CAHbLzkrtpM=ic7cYAHcqkubah5VTR8N5=k5RT8MTvv5rN1Y91w@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zach O'Keefe <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The main benefit of THPs is that they can be mapped at the pmd level,
increasing the likelihood of a TLB hit and spending fewer cycles in page
table walks. pte-mapped hugepages - that is - hugepage-aligned compound
pages of order HPAGE_PMD_ORDER mapped by ptes - although being contiguous
in physical memory, don't have this advantage. In fact, one could argue
they are detrimental to system performance overall since they occupy a
precious hugepage-aligned/sized region of physical memory that could
otherwise be used more effectively. Additionally, pte-mapped hugepages
can be the cheapest memory to collapse for khugepaged since no new
hugepage allocation or copying of memory contents is necessary - we only
need to update the mapping page tables.
In the anonymous collapse path, we are able to collapse pte-mapped
hugepages (albeit, perhaps suboptimally), but the file/shmem path makes no
effort when compound pages (of any order) are encountered.
Identify pte-mapped hugepages in the file/shmem collapse path. The
final step of which makes a racy check of the value of the pmd to
ensure it maps a pte table. This should be fine, since races that
result in false-positive (i.e. attempt collapse even though we
shouldn't) will fail later in collapse_pte_mapped_thp() once we
actually lock mmap_lock and reinspect the pmd value. Races that result
in false-negatives (i.e. where we decide to not attempt collapse, but
should have) shouldn't be an issue, since in the worst case, we do
nothing - which is what we've done up to this point. We make a similar
check in retract_page_tables(). If we do think we've found a
pte-mapped hugepage in khugepaged context, attempt to update the page
tables mapping this hugepage.
Note that these collapses still count towards the
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed counter,
and if the pte-mapped hugepage was also mapped into multiple processes'
address spaces, it could be incremented for each page table update. Since we
increment the counter when a pte-mapped hugepage is successfully added to
the list of to-collapse pte-mapped THPs, it's possible that we never
actually update the page table either. This is different from how
file/shmem pages_collapsed accounting works today where only a successful
page cache update is counted (it's also possible here that no page tables
are actually changed). Though it incurs some slop, this is preferred to
either not accounting for the event at all, or plumbing through data in
struct mm_slot on whether to account for the collapse or not.
Also note that work still needs to be done to support arbitrary compound
pages, and that this should all be converted to using folios.
[[email protected]: Spelling mistake, update comment, and add Documentation]
Link: https://lore.kernel.org/linux-mm/CAHbLzkpHwZxFzjfX9nxVoRhzup8WMjMfyL6Xiq8mZ9M-N3ombw@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zach O'Keefe <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm: add file/shmem support to MADV_COLLAPSE", v4.
This series builds on top of the previous "mm: userspace hugepage
collapse" series which introduced the MADV_COLLAPSE madvise mode and added
support for private, anonymous mappings[2], by adding support for file and
shmem backed memory to CONFIG_READ_ONLY_THP_FOR_FS=y kernels.
File and shmem support have been added with effort to align with existing
MADV_COLLAPSE semantics and policy decisions[3]. Collapse of shmem-backed
memory ignores kernel-guiding directives and heuristics including all
sysfs settings (transparent_hugepage/shmem_enabled), and tmpfs huge= mount
options (shmem always supports large folios). Like anonymous mappings, on
successful return of MADV_COLLAPSE on file/shmem memory, the contents of
memory mapped by the addresses provided will be synchronously pmd-mapped
THPs.
This functionality unlocks two important uses:
(1) Immediately back executable text by THPs. Current support provided
by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large
system which might impair services from serving at their full rated
load after (re)starting. Tricks like mremap(2)'ing text onto
anonymous memory to immediately realize iTLB performance prevent
page sharing and demand paging, both of which increase steady state
memory footprint. Now, we can have the best of both worlds: Peak
upfront performance and lower RAM footprints.
(2) userfaultfd-based live migration of virtual machines satisfies UFFD
faults by fetching native-sized pages over the network (to avoid
latency of transferring an entire hugepage). However, after guest
memory has been fully copied to the new host, MADV_COLLAPSE can
be used to immediately increase guest performance.
khugepaged has received a small improvement by association and can now
detect and collapse pte-mapped THPs. However, there is still work to be
done along the file collapse path. Compound pages of arbitrary order
still need to be supported and THP collapse needs to be converted to
use folios in general. Eventually, we'd like to move away from the
read-only and executable-mapped constraints currently imposed on eligible
files and support any inode claiming huge folio support. That said, I
think the series as-is covers enough to claim that MADV_COLLAPSE supports
file/shmem memory.
Patches 1-3 Implement the guts of the series.
Patch 4 Is a tracepoint for debugging.
Patches 5-9 Refactor existing khugepaged selftests to work with new
memory types + new collapse tests.
Patch 10 Adds a userfaultfd selftest mode to mimic a functional test
of UFFDIO_REGISTER_MODE_MINOR+MADV_COLLAPSE live migration.
(v4 note: "userfaultfd shmem" selftest is failing as of
Sep 22 mm-unstable)
[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
[3] https://lore.kernel.org/linux-mm/[email protected]/
[4] https://lore.kernel.org/linux-mm/[email protected]/
[5] https://lore.kernel.org/linux-mm/[email protected]/
This patch (of 10):
Extend 'mm/thp: add flag to enforce sysfs THP in hugepage_vma_check()' to
shmem, allowing callers to ignore
/sys/kernel/mm/transparent_hugepage/shmem_enabled and the tmpfs huge= mount
option.
This is intended to be used by MADV_COLLAPSE, and the rationale is
analogous to the anon/file case: MADV_COLLAPSE is not coupled to
directives that advise the kernel's decisions on when THPs should be
considered eligible. shmem/tmpfs always claims large folio support,
regardless of sysfs or mount options.
[[email protected]: test shmem_huge_force explicitly]
Link: https://lore.kernel.org/linux-mm/CAHbLzko3A5-TpS0BgBeKkx5cuOkWgLvWXQH=TdgW-baO4rPtdg@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zach O'Keefe <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
MADV_COLLAPSE is a best-effort request that will set errno to an
actionable value if the request cannot be performed.
For example, if pages are not found on the LRU, or if they are currently
locked by something else, MADV_COLLAPSE will fail and set errno to EAGAIN
to inform callers that they may try again.
Since the khugepaged selftest is the first public use of MADV_COLLAPSE,
set a best practice of checking errno and retrying on EAGAIN.
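A small helper sketch of that pattern, assuming the MADV_COLLAPSE fallback
define for older headers:

#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value from <asm-generic/mman-common.h> */
#endif

/* Retry MADV_COLLAPSE while the failure is the transient EAGAIN. */
int madvise_collapse_retry(void *addr, size_t len, int max_tries)
{
	int ret;

	do {
		ret = madvise(addr, len, MADV_COLLAPSE);
	} while (ret && errno == EAGAIN && --max_tries > 0);
	return ret;
}

A caller would invoke something like madvise_collapse_retry(p, len, 10) on
an already-populated, hugepage-aligned range and only then treat failure
as fatal.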
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9330694de59f ("selftests/vm: add MADV_COLLAPSE collapse context to selftests")
Signed-off-by: Zach O'Keefe <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
MADV_COLLAPSE is a best-effort request that attempts to set an actionable
errno value if the request cannot be fulfilled at the time. EAGAIN should
be used to communicate that a resource was temporarily unavailable, but
that the user may try again immediately.
SCAN_DEL_PAGE_LRU is an internal result code used when a page cannot be
isolated from its LRU list. Since this, like SCAN_PAGE_LRU, is likely a
transitory state, make MADV_COLLAPSE return EAGAIN so that users know they
may reattempt the operation.
Another important scenario to consider is a race with khugepaged.
khugepaged might isolate a page while MADV_COLLAPSE is interested in it.
Even though racing with khugepaged might mean that the memory has already
been collapsed, signalling an errno that is non-intrinsic to that memory
or arguments provided to madvise(2) lets the user know that future
attempts might (and in this case likely would) succeed, and avoids
false-negative assumptions by the user.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Signed-off-by: Zach O'Keefe <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
By the time we lock a page in collapse_pte_mapped_thp(), the page mapped
by the address pushed onto the slot's .pte_mapped_thp[] array might have
changed arbitrarily since we last looked at it. We revalidate that the
page is still the head of a compound page, but we don't revalidate if the
compound page is of order HPAGE_PMD_ORDER before applying rmap and page
table updates.
Since the kernel now supports large folios of arbitrary order, and since
replacing a page's pte mappings with a pmd mapping only makes sense for
compound pages of order HPAGE_PMD_ORDER, revalidate that the compound
page is indeed of order HPAGE_PMD_ORDER before proceeding.
Link: https://lore.kernel.org/linux-mm/CAHbLzkon+2ky8v9ywGcsTUgXM_B35jt5NThYqQKXW2YV_GUacw@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zach O'Keefe <[email protected]>
Suggested-by: Yang Shi <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Chris Kennelly <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rongwei Wang <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The vma_lock and hugetlb_fault_mutex are dropped before handling a
userfault and reacquired after handle_userfault(), but reacquiring the
vma_lock can lead to a UAF[1,2] due to the following race:
hugetlb_fault
hugetlb_no_page
/*unlock vma_lock */
hugetlb_handle_userfault
handle_userfault
/* unlock mm->mmap_lock*/
vm_mmap_pgoff
do_mmap
mmap_region
munmap_vma_range
/* clean old vma */
/* lock vma_lock again <--- UAF */
/* unlock vma_lock */
Since the vma_lock will unlock immediately after
hugetlb_handle_userfault(), let's drop the unneeded lock and unlock in
hugetlb_handle_userfault() to fix the issue.
[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1a1aad8a9b7b ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook")
Signed-off-by: Liu Shixin <[email protected]>
Signed-off-by: Kefeng Wang <[email protected]>
Reported-by: [email protected]
Reported-by: Liu Zixian <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Sidhartha Kumar <[email protected]>
Cc: <[email protected]> [4.14+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
cgroup_memory_noswap is used in many hot paths, so make it a static key
to lower the kernel overhead.
Using 8G of ZRAM as SWAP, benchmark with `perf stat -d -d -d --repeat 100`
and the following code snippet in a non-root cgroup:
#include <stdio.h>
#include <string.h>
#include <linux/mman.h>
#include <sys/mman.h>
#define MB 1024UL * 1024UL
int main(int argc, char **argv)
{
	void *p = mmap(NULL, 8000 * MB, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	memset(p, 0xff, 8000 * MB);
	madvise(p, 8000 * MB, MADV_PAGEOUT);
	memset(p, 0xff, 8000 * MB);
	return 0;
}
Before:
7,021.43 msec task-clock # 0.967 CPUs utilized ( +- 0.03% )
4,010 context-switches # 573.853 /sec ( +- 0.01% )
0 cpu-migrations # 0.000 /sec
2,052,057 page-faults # 293.661 K/sec ( +- 0.00% )
12,616,546,027 cycles # 1.805 GHz ( +- 0.06% ) (39.92%)
156,823,666 stalled-cycles-frontend # 1.25% frontend cycles idle ( +- 0.10% ) (40.25%)
310,130,812 stalled-cycles-backend # 2.47% backend cycles idle ( +- 4.39% ) (40.73%)
18,692,516,591 instructions # 1.49 insn per cycle
# 0.01 stalled cycles per insn ( +- 0.04% ) (40.75%)
4,907,447,976 branches # 702.283 M/sec ( +- 0.05% ) (40.30%)
13,002,578 branch-misses # 0.26% of all branches ( +- 0.08% ) (40.48%)
7,069,786,296 L1-dcache-loads # 1.012 G/sec ( +- 0.03% ) (40.32%)
649,385,847 L1-dcache-load-misses # 9.13% of all L1-dcache accesses ( +- 0.07% ) (40.10%)
1,485,448,688 L1-icache-loads # 212.576 M/sec ( +- 0.15% ) (39.49%)
31,628,457 L1-icache-load-misses # 2.13% of all L1-icache accesses ( +- 0.40% ) (39.57%)
6,667,311 dTLB-loads # 954.129 K/sec ( +- 0.21% ) (39.50%)
5,668,555 dTLB-load-misses # 86.40% of all dTLB cache accesses ( +- 0.12% ) (39.03%)
765 iTLB-loads # 109.476 /sec ( +- 21.81% ) (39.44%)
4,370,351 iTLB-load-misses # 214320.09% of all iTLB cache accesses ( +- 1.44% ) (39.86%)
149,207,254 L1-dcache-prefetches # 21.352 M/sec ( +- 0.13% ) (40.27%)
7.25869 +- 0.00203 seconds time elapsed ( +- 0.03% )
After:
6,576.16 msec task-clock # 0.953 CPUs utilized ( +- 0.10% )
4,020 context-switches # 605.595 /sec ( +- 0.01% )
0 cpu-migrations # 0.000 /sec
2,052,056 page-faults # 309.133 K/sec ( +- 0.00% )
11,967,619,180 cycles # 1.803 GHz ( +- 0.36% ) (38.76%)
161,259,240 stalled-cycles-frontend # 1.38% frontend cycles idle ( +- 0.27% ) (36.58%)
253,605,302 stalled-cycles-backend # 2.16% backend cycles idle ( +- 4.45% ) (34.78%)
19,328,171,892 instructions # 1.65 insn per cycle
# 0.01 stalled cycles per insn ( +- 0.10% ) (31.46%)
5,213,967,902 branches # 785.461 M/sec ( +- 0.18% ) (30.68%)
12,385,170 branch-misses # 0.24% of all branches ( +- 0.26% ) (34.13%)
7,271,687,822 L1-dcache-loads # 1.095 G/sec ( +- 0.12% ) (35.29%)
649,873,045 L1-dcache-load-misses # 8.93% of all L1-dcache accesses ( +- 0.11% ) (41.41%)
1,950,037,608 L1-icache-loads # 293.764 M/sec ( +- 0.33% ) (43.11%)
31,365,566 L1-icache-load-misses # 1.62% of all L1-icache accesses ( +- 0.39% ) (45.89%)
6,767,809 dTLB-loads # 1.020 M/sec ( +- 0.47% ) (48.42%)
6,339,590 dTLB-load-misses # 95.43% of all dTLB cache accesses ( +- 0.50% ) (46.60%)
736 iTLB-loads # 110.875 /sec ( +- 1.79% ) (48.60%)
4,314,836 iTLB-load-misses # 518653.73% of all iTLB cache accesses ( +- 0.63% ) (42.91%)
144,950,156 L1-dcache-prefetches # 21.836 M/sec ( +- 0.37% ) (41.39%)
6.89935 +- 0.00703 seconds time elapsed ( +- 0.10% )
The performance is clearly better. There is no significant hotspot
improvement according to perf report, as there are quite a few
callers of memcg_swap_enabled and do_memsw_account (which calls
memcg_swap_enabled). Many small optimizations resulted in lower
overhead for the branch predictor, and better performance.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kairui Song <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: Muchun Song <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm: memcontrol: cleanup and optimize for two accounting
params", v2.
This patch (of 2):
There are currently two helpers for checking if cgroup kmem
accounting is enabled:
- mem_cgroup_kmem_disabled
- memcg_kmem_enabled
mem_cgroup_kmem_disabled is a simple helper that returns true
if cgroup.memory=nokmem is specified, otherwise returns false.
memcg_kmem_enabled is a bit different: it returns true if
cgroup.memory=nokmem is not specified and at least one non-root
memory-controller-enabled cgroup has ever been created. This helps improve
performance when kmem accounting is not actually activated, and it is
optimized with a static branch.
mem_cgroup_kmem_disabled is meant for sub-systems that need to
preallocate data for kmem accounting, since they could be initialized
before kmem accounting is activated. But count_objcg_event doesn't
need that, so using memcg_kmem_enabled is better here.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kairui Song <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: Muchun Song <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The bodies of damon_{reclaim,lru_sort}_apply_parameters() contain
duplicated code. Add a common function,
damon_set_region_biggest_system_ram_default(), to remove the duplication.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kaixu Xia <[email protected]>
Suggested-by: SeongJae Park <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We had better return the 'err' value when kstrtoul() fails, so the user
will know why it really failed. This is a small change: just propagate the
'err' value on failure.
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: SeongJae Park <[email protected]>
Signed-off-by: Xin Hao <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: SeongJae Park <[email protected]>
Reviewed-by: Xin Hao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit 44042b449872 ("mm/page_alloc: allow high-order pages to be
stored on the per-cpu lists"), the per-cpu page allocator (PCP) is no
longer only for order-0 pages. Update the comments.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ran Xiaokai <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In the beginning there was only one damos_action, 'DAMOS_PAGEOUT', that
needed the coldness score of a region for a scheme, and it used
damon_pageout_score() to get it. But now other damos_action actions also
need the coldness score, so rename the function to damon_cold_score() to
make more sense.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When creating hugetlb pages, the hugetlb code must first allocate
contiguous pages from a low level allocator such as buddy, cma or
memblock. The pages returned from these low level allocators are ref
counted. This creates potential issues with other code taking speculative
references on these pages before they can be transformed to a hugetlb
page. This issue has been addressed with methods and code such as that
provided in [1].
Recent discussions about vmemmap freeing [2] have indicated that it would
be beneficial to freeze all sub pages, including the head page of pages
returned from low level allocators before converting to a hugetlb page.
This helps avoid races if we want to replace the page containing vmemmap
for the head page.
There have been proposals to change at least the buddy allocator to return
frozen pages as described at [3]. If such a change is made, it can be
employed by the hugetlb code. However, as mentioned above hugetlb uses
several low level allocators so each would need to be modified to return
frozen pages. For now, we can manually freeze the returned pages. This
is done in two places:
1) alloc_buddy_huge_page, only the returned head page is ref counted.
We freeze the head page, retrying once in the VERY rare case where
there may be an inflated ref count.
2) prep_compound_gigantic_page, for gigantic pages the current code
freezes all pages except the head page. New code will simply freeze
the head page as well.
In a few other places, code checks for inflated ref counts on newly
allocated hugetlb pages. With the modifications to freeze after
allocating, this code can be removed.
After hugetlb pages are freshly allocated, they are often added to the
hugetlb free lists. Since these pages were previously ref counted, this
was done via put_page() which would end up calling the hugetlb destructor:
free_huge_page. With changes to freeze pages, we simply call
free_huge_page directly to add the pages to the free list.
In a few other places, freshly allocated hugetlb pages were immediately
put into use, and the expectation was they were already ref counted. In
these cases, we must manually ref count the page.
[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
[3] https://lore.kernel.org/linux-mm/[email protected]/
[[email protected]: fix NULL pointer dereference]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Kravetz <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There are no architectures that can have holes in the memory map within a
pageblock since commit 859a85ddf90e ("mm: remove pfn_valid_within() and
CONFIG_HOLES_IN_ZONE"). Update the corresponding comment.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit dacb5d8875cc ("tcp: fix page frag corruption on page fault"),
there's no caller of gfpflags_normal_context(). Remove it as this helper
is strictly tied to the sk page frag usage and there won't be other users
in the future.
[[email protected]: fix htmldocs]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There's no need to check whether order > PAGE_ALLOC_COSTLY_ORDER again.
Minor readability improvement.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The local variable buddy_pfn could be passed to buddy_merge_likely()
without initialization if the passed in order is MAX_ORDER - 1. This
looks buggy, but buddy_pfn won't be used in this case as there's an order >=
MAX_ORDER - 2 check. Init buddy_pfn to 0 anyway to avoid possible future
misuse.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use helper macro SZ_1K and SZ_1M to do the size conversion. Minor
readability improvement.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
It's only used in mm/page_alloc.c now. Make it static.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Commit 390511e1476e ("mm, memory_hotplug: drop arch_free_nodedata") drops
the last caller of generic_free_nodedata(). Remove it too.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use local variable zone_idx directly since it holds the exact value of
zone_idx(). No functional change intended.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In MIGRATE_ISOLATE case, zone freepage state shouldn't be modified as
caller will take care of it. Add missing is_migrate_isolate() here to
avoid possible unbalanced freepage state. This would happen if someone
isolates the block, and then we face an MCE failure/soft-offline on a page
within that block. __mod_zone_freepage_state() would be triggered via the
below call trace, which had already been triggered back when the block was
isolated:
take_page_off_buddy
break_down_buddy_pages
set_page_guard
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages")
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There's no caller. Remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The size of struct per_cpu_zonestat can be 0 on !SMP && !NUMA. In that
case, zone->per_cpu_zonestats will always be equal to boot_zonestats. But in
zone_pcp_reset(), zone->per_cpu_zonestats is freed via free_percpu()
directly without checking against boot_zonestats first. boot_zonestats
will be released by free_percpu() unexpectedly.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 28f836b6777b ("mm/page_alloc: split per cpu page lists and zone stats")
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
It's only called by mm_init(). Add the __init annotation to it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit 43c95bcc51e4 ("mm/page_alloc: reduce duration that IRQs are
disabled for VM counters"), zone_statistics() is not called with
interrupts disabled. Update the corresponding comment.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit 8b10b465d0e1 ("mm/page_alloc: free pages in a single pass
during bulk free"), they're not used anymore. Remove them.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit b92ca18e8ca5 ("mm/page_alloc: disassociate the pcp->high from
pcp->batch"), zone_pcp_update() is only used in mm/page_alloc.c. Move
zone_pcp_update() up to avoid forward declaration and then make it static.
No functional change intended.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "A few cleanup patches for mm", v2.
This series contains a few cleanup patches to remove the obsolete comments
and functions, use helper macro to improve readability and so on. More
details can be found in the respective changelogs.
This patch (of 16):
If ALLOC_KSWAPD is set, wake_all_kswapds() will be called to ensure kswapd
doesn't accidentally go to sleep. But when reserve_flags is set,
alloc_flags will be overwritten and ALLOC_KSWAPD is thus lost. Preserve
the ALLOC_KSWAPD flag in alloc_flags to ensure kswapd won't go to sleep
accidentally.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0a79cdad5eb2 ("mm: use alloc_flags to record if kswapd can wake")
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Chih-En Lin <[email protected]>
Acked-by: Pasha Tatashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There is no point in returning an int from damon_set_schemes(). It always
returns 0 which is meaningless for the caller, so change it to return void
directly.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
It's an fs_initcall entry; add the __init annotation to it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiu Jianfeng <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
damon_lru_sort_wmarks is only used in lru_sort.c now, change it to static.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 189aa3d58206 ("mm/damon/lru_sort: use watermarks parameters generator macro")
Signed-off-by: Yang Yingliang <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
damon_reclaim_wmarks is only used in reclaim.c now, change it to static.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 89dd02d8abd1 ("mm/damon/reclaim: use watermarks parameters generator macro")
Signed-off-by: Yang Yingliang <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We could use 'struct damon_target *' directly instead of 'void *' in the
target_valid() operation to make the code simpler.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
damon_lru_sort_new_hot_scheme() and damon_lru_sort_new_cold_scheme() have
much in common, so combine them into a single function that only needs to
distinguish their differences.
[[email protected]: change damon_lru_sort_stub_pattern to static]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xin Hao <[email protected]>
Signed-off-by: SeongJae Park <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Suggested-by: SeongJae Park <[email protected]>
Reviewed-by: Xin Hao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In damon_sysfs_destroy_targets(), we call damon_target_has_pid() to check
whether the 'ctx' includes a valid pid, but there is no need to call
damon_target_has_pid() repeatedly; calling it once is enough.
[[email protected]: more simplified code calls damon_target_has_pid()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xin Hao <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Among other data, the CPU entry area holds exception stacks, so addresses
from this area can be passed to kmsan_get_metadata().
This previously led to kmsan_get_metadata() returning NULL, which in turn
resulted in a warning that triggered further attempts to call
kmsan_get_metadata() in the exception context, which quickly exhausted the
exception stack.
This patch allocates shadow and origin for the CPU entry area on x86 and
introduces arch_kmsan_get_meta_or_null(), which performs arch-specific
metadata mapping.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Fixes: 21d723a7c1409 ("kmsan: add KMSAN runtime core")
Cc: Alexander Viro <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ilya Leoshkevich <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vegard Nossum <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Make KMSAN usable by adding the necessary Kconfig bits.
Also declare x86-specific functions checking address validity in
arch/x86/include/asm/kmsan.h.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ilya Leoshkevich <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vegard Nossum <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Functions implementing the a_ops->write_end() interface accept the `void
*fsdata` parameter that is supposed to be initialized by the corresponding
a_ops->write_begin() (which accepts `void **fsdata`).
However not all a_ops->write_begin() implementations initialize `fsdata`
unconditionally, so it may get passed uninitialized to a_ops->write_end(),
resulting in undefined behavior.
Fix this by initializing fsdata with NULL before the call to
write_begin(), rather than doing so in all possible a_ops implementations.
This patch covers only the following cases found by running x86 KMSAN
under syzkaller:
- generic_perform_write()
- cont_expand_zero() and generic_cont_expand_simple()
- page_symlink()
Other cases of passing uninitialized fsdata may persist in the codebase.
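To make the hazard concrete, here is a small self-contained illustration
in plain C (hypothetical types, not the kernel a_ops): begin() may leave
the out-parameter untouched, and initializing the local to NULL before the
call is what keeps end() from reading an indeterminate value.

#include <stdio.h>

/* A write_begin/write_end-shaped interface: begin() may or may not set
 * *fsdata; end() consumes whatever the caller passes along. */
struct ops {
	int (*write_begin)(void **fsdata);
	void (*write_end)(void *fsdata);
};

static int begin_no_fsdata(void **fsdata)
{
	(void)fsdata;	/* some implementations never touch *fsdata */
	return 0;
}

static void end_uses_fsdata(void *fsdata)
{
	/* consumers may inspect fsdata, e.g. to pick a fast path */
	printf("fsdata = %p\n", fsdata);
}

static void perform_write(const struct ops *a_ops)
{
	void *fsdata = NULL;	/* the fix: initialize before ->write_begin() */

	if (!a_ops->write_begin(&fsdata))
		a_ops->write_end(fsdata);
}

int main(void)
{
	const struct ops a_ops = { begin_no_fsdata, end_uses_fsdata };

	perform_write(&a_ops);
	return 0;
}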
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ilya Leoshkevich <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vegard Nossum <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When executing BPF programs, certain registers may get passed
uninitialized to helper functions. E.g. when performing a JMP_CALL,
registers BPF_R1-BPF_R5 are always passed to the helper, no matter how
many of them are actually used.
Passing uninitialized values as function parameters is technically
undefined behavior, so we work around it by always initializing the
registers.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ilya Leoshkevich <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vegard Nossum <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|