Age | Commit message | Author | Files | Lines
2022-03-24 | kasan, page_alloc: rework kasan_unpoison_pages call site | Andrey Konovalov | 1 | -7/+12
Rework the checks around kasan_unpoison_pages() call in post_alloc_hook(). The logical condition for calling this function is:
 - If a software KASAN mode is enabled, we need to mark shadow memory.
 - Otherwise, HW_TAGS KASAN is enabled, and it only makes sense to set tags if they haven't already been cleared by tag_clear_highpage(), which is indicated by init_tags.
This patch concludes the changes for post_alloc_hook(). Link: https://lkml.kernel.org/r/0ecebd0d7ccd79150e3620ea4185a32d3dfe912f.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
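A minimal sketch of the condition described above (structure and names are illustrative, not the exact resulting hunk):

	if (!IS_ENABLED(CONFIG_KASAN_HW_TAGS)) {
		/* Software KASAN modes: unpoison to mark the shadow memory. */
		kasan_unpoison_pages(page, order, init);
	} else if (!init_tags) {
		/* HW_TAGS: only set tags if tag_clear_highpage() did not already. */
		kasan_unpoison_pages(page, order, init);
	}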
2022-03-24 | kasan, page_alloc: move kernel_init_free_pages in post_alloc_hook | Andrey Konovalov | 1 | -4/+8
Pull the kernel_init_free_pages() call in post_alloc_hook() out of the big if clause for better code readability. This also allows for more simplifications in the following patch. This patch does no functional changes. Link: https://lkml.kernel.org/r/a7a76456501eb37ddf9fca6529cee9555e59cdb1.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | kasan, page_alloc: move SetPageSkipKASanPoison in post_alloc_hook | Andrey Konovalov | 1 | -3/+3
Pull the SetPageSkipKASanPoison() call in post_alloc_hook() out of the big if clause for better code readability. This also allows for more simplifications in the following patches. Also turn the kasan_has_integrated_init() check into the proper kasan_hw_tags_enabled() one. These checks evaluate to the same value, but logically skipping kasan poisoning has nothing to do with integrated init. Link: https://lkml.kernel.org/r/7214c1698b754ccfaa44a792113c95cc1f807c48.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | kasan, page_alloc: combine tag_clear_highpage calls in post_alloc_hook | Andrey Konovalov | 1 | -16/+16
Move tag_clear_highpage() loops out of the kasan_has_integrated_init() clause as a code simplification. This patch does no functional changes. Link: https://lkml.kernel.org/r/587e3fc36358b88049320a89cc8dc6deaecb0cda.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | kasan, page_alloc: merge kasan_alloc_pages into post_alloc_hook | Andrey Konovalov | 4 | -37/+16
Currently, the code responsible for initializing and poisoning memory in post_alloc_hook() is scattered across two locations: kasan_alloc_pages() hook for HW_TAGS KASAN and post_alloc_hook() itself. This is confusing. This and a few following patches combine the code from these two locations. Along the way, these patches restructure the many performed checks step by step to make them easier to follow. Replace the only caller of kasan_alloc_pages() with its implementation. As kasan_has_integrated_init() is only true when CONFIG_KASAN_HW_TAGS is enabled, moving the code does no functional changes. Also move the init and init_tags variable definitions out of the kasan_has_integrated_init() clause in post_alloc_hook(), as they have the same values regardless of what the if condition evaluates to. This patch is not useful by itself but makes the simplifications in the following patches easier to follow. Link: https://lkml.kernel.org/r/5ac7e0b30f5cbb177ec363ddd7878a3141289592.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | kasan, page_alloc: refactor init checks in post_alloc_hook | Andrey Konovalov | 1 | -8/+10
Separate code for zeroing memory from the code clearing tags in post_alloc_hook(). This patch is not useful by itself but makes the simplifications in the following patches easier to follow. This patch does no functional changes. Link: https://lkml.kernel.org/r/2283fde963adfd8a2b29a92066f106cc16661a3c.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | kasan: only apply __GFP_ZEROTAGS when memory is zeroed | Andrey Konovalov | 1 | -1/+2
__GFP_ZEROTAGS should only be effective if memory is being zeroed. Currently, hardware tag-based KASAN violates this requirement. Fix by including an initialization check along with checking for __GFP_ZEROTAGS. Link: https://lkml.kernel.org/r/f4f4593f7f675262d29d07c1938db5bd0cd5e285.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
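Roughly, the fix amounts to gating the tag-zeroing decision on the init flag as well (a sketch, not the exact hunk from the patch):

	bool init = !want_init_on_free() && want_init_on_alloc(flags);
	/* Only honour __GFP_ZEROTAGS when the memory is actually being zeroed. */
	bool init_tags = init && (flags & __GFP_ZEROTAGS);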
2022-03-24 | mm: clarify __GFP_ZEROTAGS comment | Andrey Konovalov | 1 | -2/+4
__GFP_ZEROTAGS is intended as an optimization: if memory is zeroed during allocation, it's possible to set memory tags at the same time with little performance impact. Clarify this intention of __GFP_ZEROTAGS in the comment. Link: https://lkml.kernel.org/r/cdffde013973c5634a447513e10ec0d21e8eee29.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | kasan: drop skip_kasan_poison variable in free_pages_prepare | Andrey Konovalov | 1 | -2/+1
skip_kasan_poison is only used in a single place. Call should_skip_kasan_poison() directly for simplicity. Link: https://lkml.kernel.org/r/1d33212e79bc9ef0b4d3863f903875823e89046f.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Suggested-by: Marco Elver <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | kasan, page_alloc: init memory of skipped pages on free | Andrey Konovalov | 1 | -3/+8
Since commit 7a3b83537188 ("kasan: use separate (un)poison implementation for integrated init"), when all init, kasan_has_integrated_init(), and skip_kasan_poison are true, free_pages_prepare() doesn't initialize the page. This is wrong. Fix it by remembering whether kasan_poison_pages() performed initialization, and call kernel_init_free_pages() if it didn't. Reordering kasan_poison_pages() and kernel_init_free_pages() is OK, since kernel_init_free_pages() can handle poisoned memory. Link: https://lkml.kernel.org/r/1d97df75955e52727a3dc1c4e33b3b50506fc3fd.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
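The described fix can be sketched as follows in free_pages_prepare() (helper and flag names follow the text above; the exact diff may differ):

	if (!should_skip_kasan_poison(page, fpi_flags)) {
		kasan_poison_pages(page, order, init);
		/* kasan_poison_pages() already initialized the memory if requested. */
		if (kasan_has_integrated_init())
			init = false;
	}
	if (init)
		kernel_init_free_pages(page, 1 << order);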
2022-03-24 | kasan, page_alloc: simplify kasan_poison_pages call site | Andrey Konovalov | 1 | -13/+5
Simplify the code around calling kasan_poison_pages() in free_pages_prepare(). This patch does no functional changes. Link: https://lkml.kernel.org/r/ae4f9bcf071577258e786bcec4798c145d718c46.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | kasan, page_alloc: merge kasan_free_pages into free_pages_prepare | Andrey Konovalov | 4 | -22/+5
Currently, the code responsible for initializing and poisoning memory in free_pages_prepare() is scattered across two locations: kasan_free_pages() for HW_TAGS KASAN and free_pages_prepare() itself. This is confusing. This and a few following patches combine the code from these two locations. Along the way, these patches also simplify the performed checks to make them easier to follow. Replaces the only caller of kasan_free_pages() with its implementation. As kasan_has_integrated_init() is only true when CONFIG_KASAN_HW_TAGS is enabled, moving the code does no functional changes. This patch is not useful by itself but makes the simplifications in the following patches easier to follow. Link: https://lkml.kernel.org/r/303498d15840bb71905852955c6e2390ecc87139.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | kasan, page_alloc: move tag_clear_highpage out of kernel_init_free_pages | Andrey Konovalov | 1 | -11/+13
Currently, kernel_init_free_pages() serves two purposes: it either only zeroes memory or zeroes both memory and memory tags via a different code path. As this function has only two callers, each using only one code path, this behaviour is confusing. Pull the code that zeroes both memory and tags out of kernel_init_free_pages(). As a result of this change, the code in free_pages_prepare() starts to look complicated, but this is improved in the few following patches. Those improvements are not integrated into this patch to make diffs easier to read. This patch does no functional changes. Link: https://lkml.kernel.org/r/7719874e68b23902629c7cf19f966c4fd5f57979.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | kasan, page_alloc: deduplicate should_skip_kasan_poison | Andrey Konovalov | 1 | -22/+33
Patch series "kasan, vmalloc, arm64: add vmalloc tagging support for SW/HW_TAGS", v6. This patchset adds vmalloc tagging support for SW_TAGS and HW_TAGS KASAN modes. About half of patches are cleanups I went for along the way. None of them seem to be important enough to go through stable, so I decided not to split them out into separate patches/series. The patchset is partially based on an early version of the HW_TAGS patchset by Vincenzo that had vmalloc support. Thus, I added a Co-developed-by tag into a few patches. SW_TAGS vmalloc tagging support is straightforward. It reuses all of the generic KASAN machinery, but uses shadow memory to store tags instead of magic values. Naturally, vmalloc tagging requires adding a few kasan_reset_tag() annotations to the vmalloc code. HW_TAGS vmalloc tagging support stands out. HW_TAGS KASAN is based on Arm MTE, which can only assigns tags to physical memory. As a result, HW_TAGS KASAN only tags vmalloc() allocations, which are backed by page_alloc memory. It ignores vmap() and others. This patch (of 39): Currently, should_skip_kasan_poison() has two definitions: one for when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, one for when it's not. Instead of duplicating the checks, add a deferred_pages_enabled() helper and use it in a single should_skip_kasan_poison() definition. Also move should_skip_kasan_poison() closer to its caller and clarify all conditions in the comment. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/658b79f5fb305edaf7dc16bc52ea870d3220d4a8.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Evgenii Stepanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | mm/migration: add trace events for base page and HugeTLB migrations | Anshuman Khandual | 4 | -2/+40
This adds two trace events for base page and HugeTLB page migrations. These events closely follow the implementation details like setting and removing of PTE migration entries, which are essential operations for migration. The new CREATE_TRACE_POINTS in <mm/rmap.c> covers both <events/migration.h> and <events/tlb.h> based trace events. Hence drop redundant CREATE_TRACE_POINTS from other places which could have otherwise conflicted during build. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Reported-by: kernel test robot <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Zi Yan <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: John Hubbard <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | mm/migration: add trace events for THP migrations | Anshuman Khandual | 3 | -1/+32
Patch series "mm/migration: Add trace events", v3. This adds trace events for all migration scenarios including base page, THP and HugeTLB. This patch (of 3): This adds two trace events for PMD based THP migration without split. These events closely follow the implementation details like setting and removing of PMD migration entries, which are essential operations for THP migration. This moves CREATE_TRACE_POINTS into generic THP from powerpc for these new trace events to be available on other platforms as well. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Zi Yan <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: John Hubbard <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | mm/thp: fix NR_FILE_MAPPED accounting in page_*_file_rmap() | Hugh Dickins | 1 | -17/+14
NR_FILE_MAPPED accounting in mm/rmap.c (for /proc/meminfo "Mapped" and /proc/vmstat "nr_mapped" and the memcg's memory.stat "mapped_file") is slightly flawed for file or shmem huge pages. It is well thought out, and looks convincing, but there's a racy case when the careful counting in page_remove_file_rmap() (without page lock) gets discarded. So that in a workload like two "make -j20" kernel builds under memory pressure, with cc1 on hugepage text, "Mapped" can easily grow by a spurious 5MB or more on each iteration, ending up implausibly bigger than most other numbers in /proc/meminfo. And, hypothetically, might grow to the point of seriously interfering in mm/vmscan.c's heuristics, which do take NR_FILE_MAPPED into some consideration. Fixed by moving the __mod_lruvec_page_state() down to where it will not be missed before return (and I've grown a bit tired of that oft-repeated but-not-everywhere comment on the __ness: it gets lost in the move here). Does page_add_file_rmap() need the same change? I suspect not, because page lock is held in all relevant cases, and its skipping case looks safe; but it's much easier to be sure, if we do make the same change. Link: https://lkml.kernel.org/r/[email protected] Fixes: dd78fedde4b9 ("rmap: support file thp") Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | mm: filemap_unaccount_folio() large skip mapcount fixup | Hugh Dickins | 1 | -13/+13
The page_mapcount_reset() when folio_mapped() while mapping_exiting() was devised long before there were huge or compound pages in the cache. It is still valid for small pages, but not at all clear what's right to check and reset on large pages. Just don't try when folio_test_large(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | mm: delete __ClearPageWaiters() | Hugh Dickins | 4 | -22/+9
The PG_waiters bit is not included in PAGE_FLAGS_CHECK_AT_FREE, and vmscan.c's free_unref_page_list() callers rely on that not to generate bad_page() alerts. So __page_cache_release(), put_pages_list() and release_pages() (and presumably copy-and-pasted free_zone_device_page()) are redundant and misleading to make a special point of clearing it (as the "__" implies, it could only safely be used on the freeing path). Delete __ClearPageWaiters(). Remark on this in one of the "possible" comments in folio_wake_bit(), and delete the superfluous comments. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hugh Dickins <[email protected]> Tested-by: Yu Zhao <[email protected]> Reviewed-by: Yang Shi <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | selftest/vm: add helpers to detect PAGE_SIZE and PAGE_SHIFT | Mike Rapoport | 2 | -4/+26
PAGE_SIZE is not 4096 in many configurations, particularly ppc64 uses 64K pages in the majority of cases. Add helpers to detect PAGE_SIZE and PAGE_SHIFT dynamically. Without this, tests are broken w.r.t. reading /proc/self/pagemap:
	if (pread(pagemap_fd, ent, sizeof(ent),
		  (uintptr_t)ptr >> (PAGE_SHIFT - 3)) != sizeof(ent))
		err(2, "read pagemap");
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
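One way such helpers can be written for the selftests (a userspace sketch based on sysconf(); the names and placement here are illustrative, the actual patch may differ):

	#include <unistd.h>

	static unsigned int page_size(void)
	{
		return sysconf(_SC_PAGESIZE);
	}

	static unsigned int page_shift(void)
	{
		unsigned int shift = 0, size = page_size();

		/* Derive PAGE_SHIFT from the runtime page size. */
		while (size > 1) {
			size >>= 1;
			shift++;
		}
		return shift;
	}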
2022-03-24 | selftest/vm: add util.h and move helper functions there | Aneesh Kumar K.V | 3 | -75/+52
Avoid code duplication by adding util.h. No functional change in this patch. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | mm: unexport page_init_poison | Christoph Hellwig | 1 | -1/+0
page_init_poison is only used in core MM code, so unexport it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort.c: support for user-defined culling rules | Jiajian Ye | 2 | -22/+157
When viewing page owner information, we may want to cull blocks of information with our own rules. So it is important to enhance the culling function to provide support for customizing culling rules. Therefore, the following adjustments are made:
1. Add a --cull option to support the culling of blocks of information with user-defined culling rules.
	./page_owner_sort <input> <output> --cull=<rules>
	./page_owner_sort <input> <output> --cull <rules>
<rules> is a single argument in the form of a comma-separated list to specify individual culling rules, by the sequence of keys k1,k2, .... Mixed use of abbreviated and complete-form of keys is allowed. For reference, please see the document (Documentation/vm/page_owner.rst). Now, assuming two blocks in the input file are as follows:
	Page allocated via order 0, mask xxxx, pid 1, tgid 1 (task_name_demo)
	PFN xxxx
	 prep_new_page+0xd0/0xf8
	 get_page_from_freelist+0x4a0/0x1290
	 __alloc_pages+0x168/0x340
	 alloc_pages+0xb0/0x158

	Page allocated via order 0, mask xxxx, pid 32, tgid 32 (task_name_demo)
	PFN xxxx
	 prep_new_page+0xd0/0xf8
	 get_page_from_freelist+0x4a0/0x1290
	 __alloc_pages+0x168/0x340
	 alloc_pages+0xb0/0x158
If we want to cull the blocks by stacktrace and task command name, we can use this command:
	./page_owner_sort <input> <output> --cull=stacktrace,name
The output would be like:
	2 times, 2 pages, task_comm_name: task_name_demo
	 prep_new_page+0xd0/0xf8
	 get_page_from_freelist+0x4a0/0x1290
	 __alloc_pages+0x168/0x340
	 alloc_pages+0xb0/0x158
As we can see, these two blocks are culled successfully, for they share the same stacktrace and task command name. However, if we want to cull the blocks by pid, stacktrace and task command name, we can use this command:
	./page_owner_sort <input> <output> --cull=stacktrace,name,pid
The output would be like:
	1 times, 1 pages, PID 1, task_comm_name: task_name_demo
	 prep_new_page+0xd0/0xf8
	 get_page_from_freelist+0x4a0/0x1290
	 __alloc_pages+0x168/0x340
	 alloc_pages+0xb0/0x158

	1 times, 1 pages, PID 32, task_comm_name: task_name_demo
	 prep_new_page+0xd0/0xf8
	 get_page_from_freelist+0x4a0/0x1290
	 __alloc_pages+0x168/0x340
	 alloc_pages+0xb0/0x158
As we can see, these two blocks fail to be culled, for their PIDs are different.
2. Add explanations of the --cull option to the document.
This work is coauthored by Yixuan Cao, Shenghong Han, Yinan Zhang, Chongxi Zhao and Yuhong Feng. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiajian Ye <[email protected]> Cc: Yixuan Cao <[email protected]> Cc: Shenghong Han <[email protected]> Cc: Yinan Zhang <[email protected]> Cc: Chongxi Zhao <[email protected]> Cc: Yuhong Feng <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Sean Anderson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort.c: support for selecting by PID, TGID or task command name | Jiajian Ye | 2 | -27/+98
When viewing page owner information, we may also need to select the blocks by PID, TGID or task command name, which helps to get more accurate page allocation information as needed. Therefore, the following adjustments are made:
1. Add three new options, including --pid, --tgid and --name, to support the selection of information blocks by a specific pid, tgid and task command name. In addition, multiple options are allowed to be used at the same time.
	./page_owner_sort [input] [output] --pid <PID>
	./page_owner_sort [input] [output] --tgid <TGID>
	./page_owner_sort [input] [output] --name <TASK_COMMAND_NAME>
Assuming a scenario when a multi-threaded program, ./demo (PID = 5280), is running, and ./demo creates a child process (PID = 5281).
	$ ps
	  PID TTY          TIME CMD
	 5215 pts/0    00:00:00 bash
	 5280 pts/0    00:00:00 ./demo
	 5281 pts/0    00:00:00 ./demo
	 5282 pts/0    00:00:00 ps
It would be better to filter out the records with tgid=5280 and the task name "demo" when debugging the parent process, and the specific usage is
	./page_owner_sort [input] [output] --tgid 5280 --name demo
2. Add explanations of the three new options, including --pid, --tgid and --name, to the document.
This work is coauthored by Shenghong Han <[email protected]>, Yixuan Cao <[email protected]>, Yinan Zhang <[email protected]>, Chongxi Zhao <[email protected]>, Yuhong Feng <[email protected]>. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiajian Ye <[email protected]> Cc: Sean Anderson <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Zhenliang Wei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort: support for sorting by task command name | Jiajian Ye | 2 | -1/+35
When viewing page owner information, we may also need the blocks to be sorted by task command name. Therefore, the following adjustments are made:
1. Add a member variable to record the task command name of a block.
2. Add a new -n option to sort the information of blocks by task command name.
3. Add a -n option explanation in the document.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiajian Ye <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Sean Anderson <[email protected]> Cc: Yixuan Cao <[email protected]> Cc: Zhenliang Wei <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort: fix three trivial places | Jiajian Ye | 1 | -18/+19
The following adjustments are made:
1. Instead of using another array to cull the blocks after sorting, reuse the old array, so there is no need to malloc a new array.
2. When enabling the '-f' option to filter out the blocks which have been released, only add those that have not been released to the list, rather than adding all of the blocks to the list and then doing the filtering when printing the result.
3. When enabling the '-c' option to cull the blocks by comparing stacktraces, print the stacktrace rather than the total block.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiajian Ye <[email protected]> Cc: <[email protected]> Cc: Sean Anderson <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Yixuan Cao <[email protected]> Cc: <[email protected]> Cc: Zhenliang Wei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort.c: support sorting by tgid and update documentation | Jiajian Ye | 2 | -3/+38
When the "page owner" information is read, the information sorted by TGID is expected. As a result, the following adjustments have been made: 1. Add a new -P option to sort the information of blocks by TGID in ascending order. 2. Adjust the order of member variables in block_list strust to avoid one 4 byte hole. 3. Add -P option explanation in the document. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiajian Ye <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Yixuan Cao <[email protected]> Cc: Zhenliang Wei <[email protected]> Cc: Yinan Zhang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort.c: add a security check | Jiajian Ye | 1 | -0/+6
Add a security check after using malloc() to allocate memory. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiajian Ye <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Yinan Zhang <[email protected]> Cc: Yixuan Cao <[email protected]> Cc: Zhenliang Wei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
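The check is of the usual form (a sketch; variable names are illustrative, and <stdio.h>/<stdlib.h> are already included by the tool):

	struct block_list *list = malloc(max_size * sizeof(*list));

	if (!list) {
		fprintf(stderr, "Out of memory\n");
		exit(1);
	}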
2022-03-24 | tools/vm/page_owner_sort.c: fix comments | Jiajian Ye | 1 | -2/+2
Two adjustments are made: 1. Correct a grammatical error: replace the "what" in "Do the job what you want to debug" with "that". 2. Replace "has not been" with "has been" in the description of the -f option: According to Commit b1c9ba071e7d ("tools/vm/page_owner_sort.c: fix the instructions for use"), the description of the "-f" option is "Filter out the information of blocks whose memory has been released." Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiajian Ye <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Yinan Zhang <[email protected]> Cc: Yixuan Cao <[email protected]> Cc: Zhenliang Wei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort.c: fix the instructions for use | Yixuan Cao | 1 | -1/+1
I noticed a discrepancy between the usage method and the code logic. If we enable the -f option, it should be "Filter out the information of blocks whose memory has been released". Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yixuan Cao <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Sean Anderson <[email protected]> Cc: Muchun Song <[email protected]> Cc: Zhenliang Wei <[email protected]> Cc: Tang Bin <[email protected]> Cc: Yinan Zhang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | mm/page_owner.c: record tgid | Yixuan Cao | 1 | -6/+9
In a single-threaded process, the pid in kernel task_struct is the same as the tgid, which can mark the process of page allocation. But in a multithreaded process, only the task_struct of the thread leader has the same pid as tgid, and the pids of other threads are different from tgid. Therefore, tgid is recorded to provide effective information for debugging and data statistics of multithreaded programs. This can also be achieved by observing the task name (executable file name) for a specific process. However, when the same program is started multiple times, the task name is the same and the tgid is different. Therefore, in the debugging of multi-threaded programs, combined with the task name and tgid, more accurate runtime information of a certain run of the program can be obtained. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yixuan Cao <[email protected]> Cc: Waiman Long <[email protected]> Cc: Rafael Aquini <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | mm/page_owner: record task command name | Waiman Long | 1 | -4/+10
The page_owner information currently includes the pid of the calling task. That is useful as long as the task is still running. Otherwise, the number is meaningless. To have more information about the allocating tasks that had exited by the time the page_owner information is retrieved, we need to store the command name of the task. Add a new comm field into page_owner structure to store the command name and display it when the page_owner information is retrieved. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Waiman Long <[email protected]> Acked-by: Rafael Aquini <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
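Conceptually, the new field is filled from the allocating task's comm, roughly like this (a sketch; the exact field size and call site may differ in the final patch):

	/* page_owner gains a new field, e.g. char comm[TASK_COMM_LEN], filled as: */
	strscpy(page_owner->comm, current->comm, sizeof(page_owner->comm));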
2022-03-24 | mm/page_owner: print memcg information | Waiman Long | 1 | -0/+42
It was found that a number of offline memcgs were not freed because they were pinned by some charged pages that were present. Even "echo 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These offline but not freed memcgs tend to increase in number over time with the side effect that percpu memory consumption as shown in /proc/meminfo also increases over time. In order to find out more information about those pages that pin offline memcgs, the page_owner feature is extended to print memory cgroup information especially whether the cgroup is offline or not. RCU read lock is taken when memcg is being accessed to make sure that it won't be freed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Waiman Long <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Rafael Aquini <[email protected]> Acked-by: Mike Rapoport <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
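The memcg lookup has to be done under RCU, roughly along these lines (a simplified sketch; the buffer names and the exact output format are illustrative, not the patch itself):

	rcu_read_lock();
	memcg = page_memcg_check(page);
	if (memcg) {
		bool online = (memcg->css.flags & CSS_ONLINE);

		ret += scnprintf(kbuf + ret, count - ret,
				 "Charged to %smemcg ", online ? "" : "offline ");
		/* the cgroup's name is printed here as well */
	}
	rcu_read_unlock();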
2022-03-24 | mm/page_owner: use scnprintf() to avoid excessive buffer overrun check | Waiman Long | 1 | -11/+3
The snprintf() function can return a length greater than the given input size. That will require a check for buffer overrun after each invocation of snprintf(). scnprintf(), on the other hand, will never return a greater length. By using scnprintf() in selected places, we can avoid some buffer overrun checks except after stack_depot_snprint() and after the last snprintf(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Waiman Long <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Acked-by: Rafael Aquini <[email protected]> Acked-by: Mike Rapoport <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
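The difference in practice (a sketch; kbuf, count, ret and the fields printed are illustrative):

	/* snprintf() may report more characters than the space that was available,
	 * so each call would need an overrun check before 'ret' is reused as an
	 * offset into kbuf. */
	ret += scnprintf(kbuf + ret, count - ret, "PFN %lu", pfn);
	ret += scnprintf(kbuf + ret, count - ret, " type %s\n", type);
	/* scnprintf() never returns more than it actually wrote, so 'ret' can
	 * never step past the end of the buffer. */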
2022-03-24 | lib/vsprintf: avoid redundant work with 0 size | Waiman Long | 1 | -3/+5
Patch series "mm/page_owner: Extend page_owner to show memcg information", v4. While debugging the constant increase in percpu memory consumption on a system that spawned large number of containers, it was found that a lot of offline mem_cgroup structures remained in place without being freed. Further investigation indicated that those mem_cgroup structures were pinned by some pages. In order to find out what those pages are, the existing page_owner debugging tool is extended to show memory cgroup information and whether those memcgs are offline or not. With the enhanced page_owner tool, the following is a typical page that pinned the mem_cgroup structure in my test case: Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 162970 (podman), ts 1097761405537 ns, free_ts 1097760838089 ns PFN 1925700 type Movable Block 3761 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff) prep_new_page+0xac/0xe0 get_page_from_freelist+0x1327/0x14d0 __alloc_pages+0x191/0x340 alloc_pages_vma+0x84/0x250 shmem_alloc_page+0x3f/0x90 shmem_alloc_and_acct_page+0x76/0x1c0 shmem_getpage_gfp+0x281/0x940 shmem_write_begin+0x36/0xe0 generic_perform_write+0xed/0x1d0 __generic_file_write_iter+0xdc/0x1b0 generic_file_write_iter+0x5d/0xb0 new_sync_write+0x11f/0x1b0 vfs_write+0x1ba/0x2a0 ksys_write+0x59/0xd0 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Charged to offline memcg libpod-conmon-15e4f9c758422306b73b2dd99f9d50a5ea53cbb16b4a13a2c2308a4253cc0ec8. So the page was not freed because it was part of a shmem segment. That is useful information that can help users to diagnose similar problems. With cgroup v1, /proc/cgroups can be read to find out the total number of memory cgroups (online + offline). With cgroup v2, the cgroup.stat of the root cgroup can be read to find the number of dying cgroups (most likely pinned by dying memcgs). The page_owner feature is not supposed to be enabled for production system due to its memory overhead. However, if it is suspected that dying memcgs are increasing over time, a test environment with page_owner enabled can then be set up with appropriate workload for further analysis on what may be causing the increasing number of dying memcgs. This patch (of 4): For *scnprintf(), vsnprintf() is always called even if the input size is 0. That is a waste of time, so just return 0 in this case. Note that vsnprintf() will never return -1 to indicate an error. So skipping the call to vsnprintf() when size is 0 will have no functional impact at all. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Waiman Long <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Rafael Aquini <[email protected]> Acked-by: Mike Rapoport <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Ira Weiny <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | Documentation/vm/page_owner.rst: fix unexpected indentation warns | Shuah Khan | 1 | -3/+3
Fix Unexpected indentation warns in page_owner:
	Documentation/vm/page_owner.rst:92: WARNING: Unexpected indentation.
	Documentation/vm/page_owner.rst:96: WARNING: Unexpected indentation.
	Documentation/vm/page_owner.rst:107: WARNING: Unexpected indentation.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shuah Khan <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | Documentation/vm/page_owner.rst: update the documentation | Shenghong Han | 1 | -2/+21
Update the documentation of ``page_owner``. [[email protected]: small grammatical tweaks] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shenghong Han <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Georgi Djakov <[email protected]> Cc: Liam Mark <[email protected]> Cc: Tang Bin <[email protected]> Cc: Zhang Shengju <[email protected]> Cc: Zhenliang Wei <[email protected]> Cc: Xiaoming Ni <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort.c: delete invalid duplicate code | Yixuan Cao | 1 | -2/+0
I noticed that there are two invalid lines of duplicate code. It's better to delete them. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yixuan Cao <[email protected]> Cc: Mark Brown <[email protected]> Cc: Sean Anderson <[email protected]> Cc: Zhenliang Wei <[email protected]> Cc: Tang Bin <[email protected]> Cc: Yinan Zhang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort.c: two trivial fixes | Shenghong Han | 1 | -3/+2
1) There is an unused variable. It's better to delete it. 2) One case is missing in the usage(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shenghong Han <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort.c: support sorting pid and time | Chongxi Zhao | 1 | -29/+148
When viewing the page owner information, we expect that the information can be sorted by PID, so that we can quickly combine PID with the program to check the information together. We also expect that the information can be sorted by time. Time sorting helps to view the running status of the program according to the time interval when the program hangs up. Finally, we hope that page_owner_sort.c can reduce part of the output and only output the block information whose memory has not been released, which lets us locate problems in the program faster. Therefore, the following adjustments have been made:
1. Add the static functions search_pattern and check_regcomp to improve the cleanliness.
2. Add member attributes and their corresponding sorting methods. When comparing times, an int would overflow because the unsigned long long values are too large, so the ternary operator is used instead.
3. Add the -f parameter to filter out the information of blocks whose memory has not been released.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chongxi Zhao <[email protected]> Reviewed-by: Sean Anderson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
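For the time comparison mentioned in point 2, the comparator avoids narrowing the unsigned long long difference into an int, e.g. (a sketch; struct and field names are illustrative):

	static int compare_ts(const void *p1, const void *p2)
	{
		const struct block_list *l1 = p1, *l2 = p2;

		/* 'return l1->ts_nsec - l2->ts_nsec;' could overflow an int. */
		return l1->ts_nsec < l2->ts_nsec ? -1 : 1;
	}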
2022-03-24 | tools/vm/page_owner_sort.c: add switch between culling by stacktrace and txt | Yinan Zhang | 1 | -3/+20
Culling by comparing stacktrace would cause loss of some information. For example, if there exist 2 blocks which have the same stacktrace but different head info:
	Page allocated via order 0, mask 0x108c48(...), pid 73696, ts 1578829190639010 ns, free_ts 1576583851324450 ns
	 prep_new_page+0x80/0xb8
	 get_page_from_freelist+0x924/0xee8
	 __alloc_pages+0x138/0xc18
	 alloc_pages+0x80/0xf0
	 __page_cache_alloc+0x90/0xc8

	Page allocated via order 0, mask 0x108c48(...), pid 61806, ts 1354113726046100 ns, free_ts 1354104926841400 ns
	 prep_new_page+0x80/0xb8
	 get_page_from_freelist+0x924/0xee8
	 __alloc_pages+0x138/0xc18
	 alloc_pages+0x80/0xf0
	 __page_cache_alloc+0x90/0xc8
After culling, it would be like this:
	2 times, 2 pages:
	Page allocated via order 0, mask 0x108c48(...), pid 73696, ts 1578829190639010 ns, free_ts 1576583851324450 ns
	 prep_new_page+0x80/0xb8
	 get_page_from_freelist+0x924/0xee8
	 __alloc_pages+0x138/0xc18
	 alloc_pages+0x80/0xf0
	 __page_cache_alloc+0x90/0xc8
The info of the second block is lost. So, add -c to turn on culling by stacktrace. By default, it will cull by txt. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yinan Zhang <[email protected]> Cc: Changhee Han <[email protected]> Cc: Sean Anderson <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Tang Bin <[email protected]> Cc: Zhang Shengju <[email protected]> Cc: Zhenliang Wei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort.c: support sorting by stack trace | Sean Anderson | 1 | -9/+14
This adds the ability to sort by stacktraces. This is helpful when comparing multiple dumps of page_owner taken at different times, since blocks will not be reordered if they were allocated/free'd. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sean Anderson <[email protected]> Cc: Zhenliang Wei <[email protected]> Cc: Changhee Han <[email protected]> Cc: Tang Bin <[email protected]> Cc: Zhang Shengju <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Yinan Zhang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-03-24 | tools/vm/page_owner_sort.c: sort by stacktrace before culling | Sean Anderson | 1 | -4/+6
The contents of page_owner have changed to include more information than the stack trace. On a modern kernel, the blocks look like
	Page allocated via order 0, mask 0x0(), pid 1, ts 165564237 ns, free_ts 0 ns
	 register_early_stack+0x4b/0x90
	 init_page_owner+0x39/0x250
	 kernel_init_freeable+0x11e/0x242
	 kernel_init+0x16/0x130
Sorting by the contents of .txt will result in almost no repeated pages, as the pid, ts, and free_ts will almost never be the same. Instead, sort by the contents of the stack trace, which we assume to be whatever is after the first line. [[email protected]: fix NULL-pointer dereference when comparing stack traces] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sean Anderson <[email protected]> Cc: Changhee Han <[email protected]> Cc: Tang Bin <[email protected]> Cc: Zhang Shengju <[email protected]> Cc: Zhenliang Wei <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Yinan Zhang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
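With that, the comparator can operate on just the stack-trace part of each block, e.g. (a sketch; field names are illustrative, and the NULL guard reflects the fix noted above):

	static int compare_stacktrace(const void *p1, const void *p2)
	{
		const struct block_list *l1 = p1, *l2 = p2;

		/* 'stacktrace' points past the first line of the block's text. */
		return strcmp(l1->stacktrace ?: "", l2->stacktrace ?: "");
	}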
2022-03-24 | Merge tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client | Linus Torvalds | 20 | -376/+577
Pull ceph updates from Ilya Dryomov:
 "The highlights are:
  - several changes to how snap context and snap realms are tracked (Xiubo Li). In particular, this should resolve a long-standing issue of high kworker CPU usage and various stalls caused by needless iteration over all inodes in the snap realm.
  - async create fixes to address hangs in some edge cases (Jeff Layton)
  - support for getvxattr MDS op for querying server-side xattrs, such as file/directory layouts and ephemeral pins (Milind Changire)
  - average latency is now maintained for all metrics (Venky Shankar)
  - some tweaks around handling inline data to make it fit better with netfs helper library (David Howells)
  Also a couple of memory leaks got plugged along with a few assorted fixups. Last but not least, Xiubo has stepped up to serve as a CephFS co-maintainer"
* tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client: (27 commits)
  ceph: fix memory leak in ceph_readdir when note_last_dentry returns error
  ceph: uninitialized variable in debug output
  ceph: use tracked average r/w/m latencies to display metrics in debugfs
  ceph: include average/stdev r/w/m latency in mds metrics
  ceph: track average r/w/m latency
  ceph: use ktime_to_timespec64() rather than jiffies_to_timespec64()
  ceph: assign the ci only when the inode isn't NULL
  ceph: fix inode reference leakage in ceph_get_snapdir()
  ceph: misc fix for code style and logs
  ceph: allocate capsnap memory outside of ceph_queue_cap_snap()
  ceph: do not release the global snaprealm until unmounting
  ceph: remove incorrect and unused CEPH_INO_DOTDOT macro
  MAINTAINERS: add Xiubo Li as cephfs co-maintainer
  ceph: eliminate the recursion when rebuilding the snap context
  ceph: do not update snapshot context when there is no new snapshot
  ceph: zero the dir_entries memory when allocating it
  ceph: move to a dedicated slabcache for ceph_cap_snap
  ceph: add getvxattr op
  libceph: drop else branches in prepare_read_data{,_cont}
  ceph: fix comments mentioning i_mutex
  ...
2022-03-24 | Merge tag 'xfs-5.18-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux | Linus Torvalds | 27 | -211/+344
Pull xfs updates from Darrick Wong:
 "The biggest change this cycle is bringing XFS' inode attribute setting code back towards alignment with what the VFS does. IOWs, setgid bit handling should be a closer match with ext4 and btrfs behavior.
  The rest of the branch is bug fixes around the filesystem -- patching gaps in quota enforcement, removing bogus selinux audit messages, and fixing log corruption and problems with log recovery. There will be a second pull request later on in the merge window with more bug fixes.
  Dave Chinner will be taking over as XFS maintainer for one release cycle, starting from the day 5.18-rc1 drops until 5.19-rc1 is tagged so that I can focus on starting a massive design review for the (feature complete after five years) online repair feature.
  Summary:
  - Fix some incorrect mapping state being passed to iomap during COW
  - Don't create bogus selinux audit messages when deciding to degrade gracefully due to lack of privilege
  - Fix setattr implementation to use VFS helpers so that we drop setgid consistently with the other filesystems
  - Fix link/unlink/rename to check quota limits
  - Constify xfs_name_dotdot to prevent abuse of in-kernel symbols
  - Fix log livelock between the AIL and inodegc threads during recovery
  - Fix a log stall when the AIL races with pushers
  - Fix stalls in CIL flushes due to pinned inode cluster buffers during recovery
  - Fix log corruption due to incorrect usage of xfs_is_shutdown vs xlog_is_shutdown because during an induced fs shutdown, AIL writeback must continue until the log is shut down, even if the filesystem has already shut down"
* tag 'xfs-5.18-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: xfs_is_shutdown vs xlog_is_shutdown cage fight
  xfs: AIL should be log centric
  xfs: log items should have a xlog pointer, not a mount
  xfs: async CIL flushes need pending pushes to be made stable
  xfs: xfs_ail_push_all_sync() stalls when racing with updates
  xfs: check buffer pin state after locking in delwri_submit
  xfs: log worker needs to start before intent/unlink recovery
  xfs: constify xfs_name_dotdot
  xfs: constify the name argument to various directory functions
  xfs: reserve quota for target dir expansion when renaming files
  xfs: reserve quota for dir expansion when linking/unlinking files
  xfs: refactor user/group quota chown in xfs_setattr_nonsize
  xfs: use setattr_copy to set vfs inode attributes
  xfs: don't generate selinux audit messages for capability testing
  xfs: add missing cmap->br_state = XFS_EXT_NORM update
2022-03-24 | Merge tag 'dax-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm | Linus Torvalds | 2 | -1/+3
Pull DAX updates from Dan Williams:
 "Andrew has been shepherding major dax features that touch the core -mm through his tree, but I still collect the dax updates that are core-mm independent.
  - Fix a crash due to a missing rcu_barrier() in dax_fs_exit()
  - Fix two miscellaneous doc issues"
* tag 'dax-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Fix missing kdoc for dax_device
  dax: make sure inodes are flushed before destroy cache
  fsdax: fix function description
2022-03-24 | Merge tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl | Linus Torvalds | 32 | -1225/+3725
Pull CXL (Compute Express Link) updates from Dan Williams:
 "This development cycle extends the subsystem to discover CXL resources throughout a CXL/PCIe switch topology and respond to hot add/remove events anywhere in that topology. This is more foundational infrastructure in preparation for dynamic memory region provisioning support. Recall that CXL memory regions, as the new "Theory of Operation" section of Documentation/driver-api/cxl/memory-devices.rst describes, bring storage volume striping semantics to memory. The hot add/remove behavior is validated with extensions to the cxl_test unit test environment and this test in the cxl-cli test suite: https://github.com/pmem/ndctl/blob/djbw/for-74/cxl/test/cxl-topology.sh
  Summary:
  - Add a driver for 'struct cxl_memdev' objects responsible for CXL.mem operation as distinct from 'cxl_pci' mailbox operations. Its primary responsibility is enumerating an endpoint 'struct cxl_port' and all the 'struct cxl_port' instances between an endpoint and the CXL platform root.
  - Add a driver for 'struct cxl_port' objects responsible for enumerating and operating all Host-managed Device Memory (HDM) decoder resources between the platform-level CXL memory description, all intervening host bridges / switches, and the HDM resources in endpoints.
  - Update the cxl_pci driver to validate CXL.mem operation precursors to HDM decoder operation like ready-polling, and legacy CXL 1.1 DVSEC based CXL.mem configuration.
  - Add basic lockdep coverage for usage of device_lock() on CXL subsystem objects similar to what exists for LIBNVDIMM. Include a compile-time switch for which subsystem to validate at run-time.
  - Update cxl_test to emulate a one level switch topology.
  - Document a "Theory of Operation" for the subsystem.
  - Add 'numa_node' and 'serial' attributes to cxl_memdev sysfs
  - Include miscellaneous fixes for spec / QEMU CXL emulation compatibility and static analysis reports"
* tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (48 commits)
  cxl/core/port: Fix NULL but dereferenced coccicheck error
  cxl/port: Hold port reference until decoder release
  cxl/port: Fix endpoint refcount leak
  cxl/core: Fix cxl_device_lock() class detection
  cxl/core/port: Fix unregister_port() lock assertion
  cxl/regs: Fix size of CXL Capability Header Register
  cxl/core/port: Handle invalid decoders
  cxl/core/port: Fix / relax decoder target enumeration
  tools/testing/cxl: Add a physical_node link
  tools/testing/cxl: Enumerate mock decoders
  tools/testing/cxl: Mock one level of switches
  tools/testing/cxl: Fix root port to host bridge assignment
  tools/testing/cxl: Mock dvsec_ranges()
  cxl/core/port: Add endpoint decoders
  cxl/core: Move target_list out of base decoder attributes
  cxl/mem: Add the cxl_mem driver
  cxl/core/port: Add switch port enumeration
  cxl/memdev: Add numa_node attribute
  cxl/pci: Emit device serial number
  cxl/pci: Implement wait for media active
  ...
2022-03-25 | fbdev: Fix cfb_imageblit() for arbitrary image widths | Thomas Zimmermann | 1 | -4/+24
Commit 0d03011894d2 ("fbdev: Improve performance of cfb_imageblit()") broke cfb_imageblit() for image widths that are not aligned to 8-bit boundaries. Fix this by handling the trailing pixels on each line separately. The performance improvements in the original commit do not regress by this change. Signed-off-by: Thomas Zimmermann <[email protected]> Fixes: 0d03011894d2 ("fbdev: Improve performance of cfb_imageblit()") Reported-by: Marek Szyprowski <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Javier Martinez Canillas <[email protected]> Cc: Sam Ravnborg <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Acked-by: Daniel Vetter <[email protected]> Reviewed-by: Javier Martinez Canillas <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-03-25 | fbdev: Fix sys_imageblit() for arbitrary image widths | Thomas Zimmermann | 1 | -4/+25
Commit 6f29e04938bf ("fbdev: Improve performance of sys_imageblit()") broke sys_imageblit() for image widths that are not aligned to 8-bit boundaries. Fix this by handling the trailing pixels on each line separately. The performance improvements in the original commit do not regress by this change. Signed-off-by: Thomas Zimmermann <[email protected]> Fixes: 6f29e04938bf ("fbdev: Improve performance of sys_imageblit()") Cc: Thomas Zimmermann <[email protected]> Cc: Javier Martinez Canillas <[email protected]> Cc: Sam Ravnborg <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Javier Martinez Canillas <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-03-25 | Merge tag 'drm-misc-next-fixes-2022-03-24-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-next | Dave Airlie | 3 | -10/+15
drm-misc-next-fixes for v5.18-rc1:
 - Make audio and color plane support checking only happen when a CEA extension block is found.
 - Fix a small regression from ttm_resource_fini()
 - Small selftest fix.
Signed-off-by: Dave Airlie <[email protected]> From: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]