try_get_folio() takes in a page, then chooses to do some folio operations
based on the flags (either FOLL_GET or FOLL_PIN). We can rewrite this
function to be more purpose-oriented.
After calling try_get_folio(), if neither FOLL_GET nor FOLL_PIN is set,
warn and fail. If FOLL_GET is set, we can return the result. If FOLL_GET
is not set, then FOLL_PIN is set, so we pin the folio.
This change assists with folio conversions and makes the function more
readable.
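As a rough illustration only, the control flow described above might look
like the sketch below; the wrapper's name and signature are assumptions
based on this text, not the actual patch:
----
/* Sketch, not the real gup code: flags-driven flow after try_get_folio(). */
static struct folio *try_grab_folio_sketch(struct page *page, int refs,
					   unsigned int flags)
{
	struct folio *folio;

	if (WARN_ON_ONCE(!(flags & (FOLL_GET | FOLL_PIN))))
		return NULL;		/* neither flag set: warn and fail */

	folio = try_get_folio(page, refs);
	if (!folio)
		return NULL;

	if (flags & FOLL_GET)
		return folio;		/* FOLL_GET: the reference is enough */

	/* Otherwise FOLL_PIN is set: pin the folio (pinning details omitted
	 * from this sketch). */
	return folio;
}
----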
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
verify_dma_pinned() checks that pages are dma-pinned. We can convert this
to use folios.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Introduce folio_migratetype() as a folio equivalent for
get_pageblock_migratetype(). The function returns the migratetype of the
pageblock the folio is located in, hence the name choice.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Replace is_longterm_pinnable_page()", v2.
This patchset introduces some more helper functions for the folio
conversions, and converts all callers of is_longterm_pinnable_page() to
use folios.
This patch (of 5):
Introduce folio_is_zone_movable() to act as a folio equivalent for
is_zone_movable_page(). This is to assist in later folio conversions.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
KASAN's boot time kernel parameter 'kasan.fault=' currently supports
'report' and 'panic', which results in either only reporting bugs or also
panicking on reports.
However, some users may wish to have more control over when a KASAN report
results in a kernel panic: in particular, KASAN-reported invalid _writes_
are of special interest, because they have greater potential to corrupt
random kernel memory or be more easily exploited.
To panic on invalid writes only, introduce 'kasan.fault=panic_on_write',
which allows users to choose to continue running on invalid reads, but
panic only on invalid writes.
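Conceptually, the new mode gates the panic decision on the access type.
The sketch below is illustrative only; the enum and function names are
hypothetical, not the actual KASAN implementation:
----
/* Hypothetical names throughout; only the decision logic is the point. */
enum kasan_fault_mode {
	KASAN_FAULT_REPORT,
	KASAN_FAULT_PANIC,
	KASAN_FAULT_PANIC_ON_WRITE,
};

static void kasan_end_report_sketch(enum kasan_fault_mode mode, bool is_write)
{
	if (mode == KASAN_FAULT_PANIC ||
	    (mode == KASAN_FAULT_PANIC_ON_WRITE && is_write))
		panic("kasan: terminating on fault\n");
}
----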
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
Cc: Aleksandr Nogikh <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Taras Madan <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The recompression threshold should be below the huge-size-class watermark.
Any object larger than the huge-size-class watermark is a "huge object"
and occupies a whole physical page on the zsmalloc side; in other words,
it is incompressible as far as zsmalloc is concerned.
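A minimal sketch of the validation this implies, assuming the watermark is
available as a huge_class_size value obtained from zsmalloc (illustrative
only, not the exact zram code):
----
/* Illustrative check only: reject recompression thresholds at or above the
 * huge-size-class watermark, because objects that large are stored
 * uncompressed ("huge objects") by zsmalloc anyway. */
static int validate_recomp_threshold(size_t threshold, size_t huge_class_size)
{
	if (threshold >= huge_class_size)
		return -EINVAL;
	return 0;
}
----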
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sergey Senozhatsky <[email protected]>
Suggested-by: Brian Geffon <[email protected]>
Acked-by: Brian Geffon <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When an entry started writeback, it used to be invalidated with ref count
logic alone, meaning that it would stay on the tree until all references
were put. The problem with this behavior is that as soon as the writeback
started, the ownership of the data held by the entry is passed to the
swapcache and should not be left in zswap too. Currently there are no
known issues because of this, but this change explicitly invalidates an
entry that started writeback to reduce opportunities for future bugs.
This patch is a follow up on the series titled "mm: zswap: move writeback
LRU from zpool to zswap" + commit f090b7949768("mm: zswap: support
exclusive loads").
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Domenico Cerasuolo <[email protected]>
Suggested-by: Johannes Weiner <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit c7c3dec1c9db ("mm: rmap: remove lock_page_memcg()"), there
are no more users, so kill lock_page_memcg() and unlock_page_memcg().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
cma: display pfn as well as pfn_to_page(pfn)
page_owner: display pfn in hex rather than decimal
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kassey Li <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Support large folios in block_truncate_page() and avoid three hidden calls
to compound_head().
[[email protected]: fix check of filemap_grab_folio() return value]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Bob Peterson <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Saves a call to compound_head() and may be needed to support block size >
PAGE_SIZE.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Bob Peterson <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Its one caller already has a folio, so switch it to use the folio API.
Removes a hidden call to compound_head().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Bob Peterson <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use the folio API and pass the folio from both callers. Saves a hidden
call to compound_head().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Bob Peterson <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Get a folio from the page cache instead of a page, then use the folio API
throughout. Removes a few calls to compound_head() and may be needed to
support block size > PAGE_SIZE.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Bob Peterson <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Most of the callers already have a folio; convert reiserfs_write_end() to
use a folio. Removes a couple of hidden calls to compound_head().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Bob Peterson <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This removes a hidden call to compound_head() inside
__block_commit_write() and moves it to those callers which are still page
based. Also make block_write_end() safe for large folios.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Bob Peterson <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
If any page in a folio is dirtied, dirty the entire folio. Removes a
number of hidden calls to compound_head() and references to page->mapping
and page->index. Fixes a pre-existing bug where we could mark a folio as
dirty if the file is truncated to a multiple of the page size just as we
take the page fault. I don't believe this bug has any bad effect, it's
just inefficient.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Bob Peterson <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Keep the interface as struct page, but work entirely on the folio
internally. Removes several PAGE_SIZE assumptions and removes some
references to page->index and page->mapping.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Tested-by: Bob Peterson <[email protected]>
Reviewed-by: Bob Peterson <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We may someday support folios larger than 4GB, so use a size_t for the
byte count within a folio to prevent unpleasant truncations.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Tested-by: Bob Peterson <[email protected]>
Reviewed-by: Bob Peterson <[email protected]>
Reviewed-by: Andreas Gruenbacher <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Remove nine hidden calls to compound_head() by using a folio instead of a
page.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Tested-by: Bob Peterson <[email protected]>
Reviewed-by: Bob Peterson <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add support for large folios and remove some accesses to page->mapping and
page->index.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Tested-by: Bob Peterson <[email protected]>
Reviewed-by: Bob Peterson <[email protected]>
Reviewed-by: Andreas Gruenbacher <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Remove a couple of folio->page conversions in the callers, and two calls
to compound_head() in the function itself. Rename it from
__gfs2_jdata_writepage() to __gfs2_jdata_write_folio().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Tested-by: Bob Peterson <[email protected]>
Reviewed-by: Bob Peterson <[email protected]>
Reviewed-by: Andreas Gruenbacher <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "gfs2/buffer folio changes for 6.5", v3.
This kind of started off as a gfs2 patch series, then became entwined with
buffer heads once I realised that gfs2 was the only remaining caller of
__block_write_full_page(). For those not in the gfs2 world, the big point
of this series is that block_write_full_page() should now handle large
folios correctly.
This patch (of 14):
Replace a few implicit calls to compound_head() with one explicit one.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Tested-by: Bob Peterson <[email protected]>
Reviewed-by: Bob Peterson <[email protected]>
Reviewed-by: Andreas Gruenbacher <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
These are equivalent, but DEFINE_READ_MOSTLY_HASHTABLE exists to define
a hashtable in the .data..read_mostly section.
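For illustration, the two declarations below are equivalent apart from how
the section placement is expressed; the table names and size are made up:
----
#include <linux/hashtable.h>

/* Open-coded definition, annotated by hand to land in .data..read_mostly: */
static struct hlist_head example_hash[1 << 10] __read_mostly =
	{ [0 ... ((1 << 10) - 1)] = HLIST_HEAD_INIT };

/* Same effect, using the dedicated macro: */
static DEFINE_READ_MOSTLY_HASHTABLE(example_hash2, 10);
----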
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When running the UnixBench/Execl throughput case, false sharing is observed
due to frequent reads of base_addr and writes to free_bytes and chunk_md.
UnixBench/Execl represents a class of workload where bash scripts are
spawned frequently to do short jobs. It issues the execl system call
frequently, and execl calls mm_init() to initialize the mm_struct of the
process. mm_init() calls __percpu_counter_init() for percpu counter
initialization. pcpu_alloc() is then called to read the base_addr of a
pcpu_chunk for memory allocation. Inside pcpu_alloc(), pcpu_alloc_area()
is called to allocate memory from a specified chunk. This function updates
"free_bytes" and "chunk_md" to record the remaining free bytes and other
metadata for the chunk. Correspondingly, pcpu_free_area() also updates
these two members when freeing memory.
Call trace from perf is as below:
+ 57.15% 0.01% execl [kernel.kallsyms] [k] __percpu_counter_init
+ 57.13% 0.91% execl [kernel.kallsyms] [k] pcpu_alloc
- 55.27% 54.51% execl [kernel.kallsyms] [k] osq_lock
- 53.54% 0x654278696e552f34
main
__execve
entry_SYSCALL_64_after_hwframe
do_syscall_64
__x64_sys_execve
do_execveat_common.isra.47
alloc_bprm
mm_init
__percpu_counter_init
pcpu_alloc
- __mutex_lock.isra.17
In the current pcpu_chunk layout, `base_addr' is in the same cache line as
`free_bytes' and `chunk_md', and `base_addr' occupies the last 8 bytes of
that line. This patch moves `bound_map' up next to `base_addr', so that
`base_addr' lands in a new cache line.
With this change, on Intel Sapphire Rapids 112C/224T platform, based on
v6.4-rc4, the 160 parallel score improves by 24%.
The pcpu_chunk struct is a backing data structure per chunk, so the
additional memory should not be dramatic. A chunk covers ballpark
between 64kb and 512kb memory depending on some config and boot time
stuff, so I believe the additional memory used here is nominal at best.
Working the #s on my desktop:
Percpu: 58624 kB
28 cores -> ~2.1MB of percpu memory.
At say ~128KB per chunk -> 33 chunks, generously 40 chunks.
Adding alignment might bump the chunk size ~64 bytes, so in total ~2KB
of overhead?
I believe we can do a little better to avoid eating that full padding,
so likely less than that.
[[email protected]: changelog details]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yu Ma <[email protected]>
Reviewed-by: Tim Chen <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
establish_demotion_targets() is only defined when CONFIG_MIGRATION is
enabled. There's no need to check it again.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add __meminit to kcompactd_run() and kcompactd_stop() to ensure they
default to __init when memory hotplug is not enabled.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
commit c7f8f31c00d1 ("mm: separate vma->lock from vm_area_struct")
left this behind.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: YueHaibing <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This inline function is unused, remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: YueHaibing <[email protected]>
Cc: Jeff Xu <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Android reported a performance regression in the userfaultfd unmap path.
A closer inspection of the userfaultfd_unmap_prep() change showed that a
second tree walk would be necessary in the reworked code.
Fix the regression by passing each VMA that will be unmapped through to
userfaultfd_unmap_prep() as it is added to the unmap list, instead of
re-walking the tree for the VMA.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs")
Signed-off-by: Liam R. Howlett <[email protected]>
Reported-by: Suren Baghdasaryan <[email protected]>
Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The patch ("mm/folio: Avoid special handling for order value 0 in
folio_set_order") [1] removed the need for special handling of order = 0
in folio_set_order. Now, folio_set_order and set_compound_order becomes
similar function. This patch removes the set_compound_order and uses
folio_set_order instead.
[1] https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tarun Sahu <[email protected]>
Reviewed-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Previously, zswap_header served the purpose of storing the swpentry within
zpool pages. This allowed zpool implementations to pass relevant
information to the writeback function. However, with the current
implementation, writeback is directly handled within zswap. Consequently,
there is no longer a necessity for zswap_header, as the swp_entry_t can be
stored directly in zswap_entry.
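Schematically, the swap entry moves from a header written alongside the
compressed data into zswap's own per-entry metadata. The struct below is a
sketch with an invented layout, not the actual zswap_entry:
----
/* Sketch only. Before: a zswap_header holding the swp_entry_t was written
 * into the zpool object in front of the compressed page. After: */
struct zswap_entry_sketch {
	swp_entry_t	swpentry;	/* now kept directly in zswap metadata */
	unsigned long	handle;		/* zpool handle for the compressed data */
	unsigned int	length;
	/* ... refcount, LRU linkage, etc. ... */
};
----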
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Domenico Cerasuolo <[email protected]>
Tested-by: Yosry Ahmed <[email protected]>
Suggested-by: Yosry Ahmed <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
zswap_writeback_entry() used to be a callback for the backends, which
don't know about struct zswap_entry.
Now that the only user is the generic zswap LRU reclaimer, it can be
simplified: pass the pinned zswap_entry directly, and consolidate the
refcount management in the shrink function.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Domenico Cerasuolo <[email protected]>
Tested-by: Yosry Ahmed <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Now that all three zswap backends have removed their shrink code, it is
no longer necessary for the zpool interface to include shrink/writeback
endpoints.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Domenico Cerasuolo <[email protected]>
Reviewed-by: Yosry Ahmed <[email protected]>
Acked-by: Nhat Pham <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Switch zsmalloc to the new generic zswap LRU and remove its custom
implementation.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Domenico Cerasuolo <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Nhat Pham <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Tested-by: Yosry Ahmed <[email protected]>
Acked-by: Sergey Senozhatsky <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Switch z3fold to the new generic zswap LRU and remove its custom
implementation.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Domenico Cerasuolo <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Switch zbud to the new generic zswap LRU and remove its custom
implementation.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Domenico Cerasuolo <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing
the LRU management. In the current implementation, the LRU is maintained
within each zpool driver, resulting in duplicated code across the three
drivers. The proposed change consists in moving the LRU management from
the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the
codebase. By unifying the reclaim loop and consolidating LRU handling
within zswap, we can eliminate redundant code and improve
maintainability. Additionally, this change enables the reclamation of
stored pages in their actual LRU order. Presently, the zpool drivers
link backing pages in an LRU, causing compressed pages with different
LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the
LRU and the reclaim loop in zswap, but it is not used yet because all
three driver implementations are marked as zpool_evictable.
The following three commits modify each zpool driver to be not
zpool_evictable, allowing the use of the reclaim loop in zswap.
As the drivers removed their shrink functions, the zpool interface is
then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink.
Finally, the code in zswap is further cleaned up by simplifying the
writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink
function, which is called from zpool_shrink. However, with this commit, a
unified shrink function is added to zswap. The ultimate goal is to
eliminate the need for zpool_shrink once all zpool implementations have
dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on
adding the mechanism itself. No modifications are made to the backends,
meaning that functionally, there are no immediate changes. The zswap
mechanism will only come into effect once the backends have removed their
shrink code. The subsequent commits will address the modifications needed
in the backends.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Domenico Cerasuolo <[email protected]>
Acked-by: Nhat Pham <[email protected]>
Tested-by: Yosry Ahmed <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Yosry Ahmed <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Remove all defines which aren't needed after correctly including the
kernel header files.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Stefan Roesch <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
It is wrong to include unprocessed user header files directly. They are
processed into "<source_tree>/usr/include" by running "make headers", and
they are included in selftests by the kselftest makefiles automatically
with the help of the KHDR_INCLUDES variable. These headers should always
be built first before building the kselftests.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 07115fcc15b4 ("selftests/mm: add new selftests for KSM")
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Stefan Roesch <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Convert all instances of direct pte_t* dereferencing to instead use
ptep_get() helper. This means that by default, the accesses change from a
C dereference to a READ_ONCE(). This is technically the correct thing to
do since where pgtables are modified by HW (for access/dirty) they are
volatile and therefore we should always ensure READ_ONCE() semantics.
But more importantly, by always using the helper, it can be overridden by
the architecture to fully encapsulate the contents of the pte. Arch code
is deliberately not converted, as the arch code knows best. It is
intended that arch code (arm64) will override the default with its own
implementation that can (e.g.) hide certain bits from the core code, or
determine young/dirty status by mixing in state from another source.
Conversion was done using Coccinelle:
----
// $ make coccicheck \
// COCCI=ptepget.cocci \
// SPFLAGS="--include-headers" \
// MODE=patch
virtual patch
@ depends on patch @
pte_t *v;
@@
- *v
+ ptep_get(v)
----
Then reviewed and hand-edited to avoid multiple unnecessary calls to
ptep_get(), instead opting to store the result of a single call in a
variable, where it is correct to do so. This aims to negate any cost of
READ_ONCE() and will benefit arch-overrides that may be more complex.
Included is a fix for an issue in an earlier version of this patch that
was pointed out by kernel test robot. The issue arose because config
MMU=n elides definition of the ptep helper functions, including
ptep_get(). HUGETLB_PAGE=n configs still define a simple
huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
So when both configs are disabled, this caused a build error because
ptep_get() is not defined. Fix by continuing to do a direct dereference
when MMU=n. This is safe because for this config the arch code cannot be
trying to virtualize the ptes because none of the ptep helpers are
defined.
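The transformation applied by the semantic patch above amounts to the
following; the helper below is purely illustrative:
----
/* Before/after of the conversion, shown on an illustrative helper: */
static inline bool example_pte_is_young(pte_t *ptep)
{
	/* old style: pte_t pte = *ptep; */
	pte_t pte = ptep_get(ptep);	/* READ_ONCE() by default, and the
					 * arch can override ptep_get() to
					 * encapsulate the pte contents */
	return pte_young(pte);
}
----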
Link: https://lkml.kernel.org/r/[email protected]
Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Ryan Roberts <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dimitri Sivanich <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oleksandr Tyshchenko <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There are many call sites that directly dereference a pte_t pointer. This
makes it very difficult to properly encapsulate a page table in the arch
code without having to allocate shadow page tables.
We will shortly solve this by replacing all the call sites with ptep_get()
calls. But there are call sites above the function definition in the
header file, so let's move ptep_get() to an earlier location to solve that
problem. And move pmdp_get() at the same time to keep it close to
ptep_get().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryan Roberts <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dimitri Sivanich <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oleksandr Tyshchenko <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Encapsulate PTE contents from non-arch code", v3.
A series to improve the encapsulation of pte entries by disallowing
non-arch code from directly dereferencing pte_t pointers.
This means that by default, the accesses change from a C dereference to a
READ_ONCE(). This is technically the correct thing to do since where
pgtables are modified by HW (for access/dirty) they are volatile and
therefore we should always ensure READ_ONCE() semantics.
But more importantly, by always using the helper, it can be overridden by
the architecture to fully encapsulate the contents of the pte. Arch code
is deliberately not converted, as the arch code knows best. It is
intended that arch code (arm64) will override the default with its own
implementation that can (e.g.) hide certain bits from the core code, or
determine young/dirty status by mixing in state from another source.
This patch (of 3):
The page table dumper uses walk_page_range_novma() to walk the page
tables, which does not lock the PTL before calling the pte_entry()
callback. Therefore, the page table dumper's callback must use
ptep_get_lockless() rather than ptep_get() to ensure that the pte it reads
is not torn or otherwise corrupt when racing with writers.
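A sketch of such a callback is shown below; the function name is
hypothetical, but the pte_entry signature and the lockless read are the
point being described:
----
/* Sketch: a pte_entry callback for walk_page_range_novma(), which runs
 * without the PTL held, so it must use ptep_get_lockless(). */
static int dump_pte_sketch(pte_t *ptep, unsigned long addr,
			   unsigned long next, struct mm_walk *walk)
{
	pte_t pte = ptep_get_lockless(ptep);	/* not ptep_get(): no lock held */

	if (pte_present(pte))
		pr_debug("present pte at %#lx\n", addr);
	return 0;
}
----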
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryan Roberts <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dimitri Sivanich <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oleksandr Tyshchenko <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: kernel test robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The sh architecture defines ARCH_DMA_MINALIGN in asm/page.h. Move it to
asm/cache.h to allow a generic ARCH_DMA_MINALIGN definition in
linux/cache.h without redefine errors/warnings.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The microblaze architecture defines ARCH_DMA_MINALIGN in asm/page.h. Move
it to asm/cache.h to allow a generic ARCH_DMA_MINALIGN definition in
linux/cache.h without redefine errors/warnings.
While at it, also move ARCH_SLAB_MINALIGN to asm/cache.h for
consistency.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Move the ARCH_DMA_MINALIGN definition to asm/cache.h".
The ARCH_KMALLOC_MINALIGN reduction series defines a generic
ARCH_DMA_MINALIGN in linux/cache.h:
https://lore.kernel.org/r/[email protected]/
Unfortunately, this causes a duplicate definition warning for
microblaze, powerpc (32-bit only) and sh as these architectures define
ARCH_DMA_MINALIGN in a different file than asm/cache.h. Move the macro
to asm/cache.h to avoid this issue and also bring them in line with the
other architectures.
This patch (of 3):
The powerpc architecture defines ARCH_DMA_MINALIGN in asm/page_32.h and
only if CONFIG_NOT_COHERENT_CACHE is enabled (32-bit platforms only).
Move this macro to asm/cache.h to allow a generic ARCH_DMA_MINALIGN
definition in linux/cache.h without redefine errors/warnings.
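The underlying pattern is the usual fallback-definition idiom; the sketch
below is illustrative, and the actual generic value and guards in
linux/cache.h may differ:
----
/* Per-architecture asm/cache.h (illustrative): */
#define ARCH_DMA_MINALIGN	L1_CACHE_BYTES

/* Generic fallback in linux/cache.h (illustrative only): the arch
 * definition, if any, wins and no redefinition warning is emitted. */
#ifndef ARCH_DMA_MINALIGN
#define ARCH_DMA_MINALIGN	__alignof__(unsigned long long)
#endif
----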
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
With the DMA bouncing of unaligned kmalloc() buffers now in place, enable
it for arm64 to allow the kmalloc-{8,16,32,48,96} caches. In addition,
always create the swiotlb buffer even when the end of RAM is within the
32-bit physical address range (the swiotlb buffer can still be disabled on
the kernel command line).
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Tested-by: Isaac J. Manjarres <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jerry Snitselaar <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Lars-Peter Clausen <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Saravana Kannan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
If an architecture opted in to DMA bouncing of unaligned kmalloc() buffers
(ARCH_WANT_KMALLOC_DMA_BOUNCE), reduce the minimum kmalloc() cache
alignment below cache-line size to ARCH_KMALLOC_MINALIGN.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Tested-by: Isaac J. Manjarres <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jerry Snitselaar <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Lars-Peter Clausen <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Saravana Kannan <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Similarly to the direct DMA, bounce small allocations as they may have
originated from a kmalloc() cache not safe for DMA. Unlike the direct
DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all
non-coherent devices as this would break some cases where the iova is
expected to be contiguous (dmabuf). Instead, scan the scatterlist for
any small sizes and only go the swiotlb path if any element of the list
needs bouncing (note that iommu_dma_map_page() would still only bounce
those buffers which are not DMA-aligned).
To avoid scanning the scatterlist on the 'sync' operations, introduce an
SG_DMA_SWIOTLB flag set by iommu_dma_map_sg_swiotlb(). The
dev_use_swiotlb() function together with the newly added
dev_use_sg_swiotlb() now check for both untrusted devices and unaligned
kmalloc() buffers (suggested by Robin Murphy).
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Tested-by: Isaac J. Manjarres <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jerry Snitselaar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Lars-Peter Clausen <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Saravana Kannan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
For direct DMA, if the size is small enough to have originated from a
kmalloc() cache below ARCH_DMA_MINALIGN, check its alignment against
dma_get_cache_alignment() and bounce if necessary. For larger sizes, it
is the responsibility of the DMA API caller to ensure proper alignment.
At this point, the kmalloc() caches are properly aligned but this will
change in a subsequent patch.
Architectures can opt in by selecting DMA_BOUNCE_UNALIGNED_KMALLOC.
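The decision described above amounts to something like the sketch below;
the helper name is hypothetical and the real kernel code is more involved:
----
/* Illustration only: small buffers may come from a kmalloc() cache aligned
 * below the cache-line size, so check the actual alignment and bounce. */
static bool sketch_needs_bounce(phys_addr_t paddr, size_t size)
{
	if (size > ARCH_DMA_MINALIGN)
		return false;	/* large buffers: caller guarantees alignment */

	return !IS_ALIGNED(paddr | size, dma_get_cache_alignment());
}
----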
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Tested-by: Isaac J. Manjarres <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jerry Snitselaar <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Lars-Peter Clausen <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Saravana Kannan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|