aboutsummaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2022-10-03selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memoryZach O'Keefe2-38/+134
Add :collapse mod to userfaultfd selftest. Currently this mod is only valid for "shmem" test type, but could be used for other test types. When provided, memory allocated by ->allocate_area() will be hugepage-aligned enforced to be hugepage-sized. userfaultf_minor_test, after the UFFD-registered mapping has been populated by UUFD minor fault handler, attempt to MADV_COLLAPSE the UFFD-registered mapping to collapse the memory into a pmd-mapped THP. This test is meant to be a functional test of what occurs during UFFD-driven live migration of VMs backed by huge tmpfs where, after a hugepage-sized region has been successfully migrated (in native page-sized chunks, to avoid latency of fetched a hugepage over the network), we want to reclaim previous VM performance by remapping it at the PMD level. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: James Houghton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmdZach O'Keefe1-0/+30
This test tests that MADV_COLLAPSE acting on file/shmem memory for which (1) the file extent mapping by the memory is already a huge page in the page cache, and (2) the pmd mapping this memory in the target process is none. In practice, (1)+(2) is the state left over after khugepaged has successfully collapsed file/shmem memory for a target VMA, but the memory has not yet been refaulted. So, this test in-effect tests MADV_COLLAPSE racing with khugepaged to collapse the memory first. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: James Houghton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03selftests/vm: add thp collapse shmem testingZach O'Keefe1-2/+55
Add memory operations for shmem (memfd) memory, and reuse existing tests with the new memory operations. Shmem tests can be called with "shmem" mem_type, and shmem tests are ran with "all" mem_type as well. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: James Houghton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03selftests/vm: add thp collapse file and tmpfs testingZach O'Keefe3-56/+431
Add memory operations for file-backed and tmpfs memory. Call existing tests with these new memory operations to test collapse functionality of khugepaged and MADV_COLLAPSE on file-backed and tmpfs memory. Not all tests are reusable; for example, collapse_swapin_single_pte() which checks swap usage. Refactor test arguments. Usage is now: Usage: ./khugepaged <test type> [dir] <test type> : <context>:<mem_type> <context> : [all|khugepaged|madvise] <mem_type> : [all|anon|file] "file,all" mem_type requires [dir] argument "file,all" mem_type requires kernel built with CONFIG_READ_ONLY_THP_FOR_FS=y if [dir] is a (sub)directory of a tmpfs mount, tmpfs must be mounted with huge=madvise option for khugepaged tests to work Refactor calling tests to make it clear what collapse context / memory operations they support, but only invoke tests requested by user. Also log what test is being ran, and with what context / memory, to make test logs more human readable. A new test file is created and deleted for every test to ensure no pages remain in the page cache between tests (tests also may attempt to collapse different amount of memory). For file-backed memory where the file is stored on a block device, disable /sys/block/<device>/queue/read_ahead_kb so that pages don't find their way into the page cache without the tests faulting them in. Add file and shmem wrappers to vm_utils check for file and shmem hugepages in smaps. [[email protected]: fix "add thp collapse file and tmpfs testing" for tmpfs] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: James Houghton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03selftests/vm: modularize thp collapse memory operationsZach O'Keefe1-166/+207
Modularize operations to setup, cleanup, fault, and check for huge pages, for a given memory type. This allows reusing existing tests with additional memory types by defining new memory operations. Following patches will add file and shmem memory types. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: James Houghton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03selftests/vm: dedup THP helpersZach O'Keefe6-76/+32
These files: tools/testing/selftests/vm/vm_util.c tools/testing/selftests/vm/khugepaged.c Both contain logic to: 1) Determine hugepage size on current system 2) Read /proc/self/smaps to determine number of THPs at an address Refactor selftests/vm/khugepaged.c to use the vm_util common helpers and add it as a build dependency. Since selftests/vm/khugepaged.c is the largest user of check_huge(), change the signature of check_huge() to match selftests/vm/khugepaged.c's useage: take an expected number of hugepages, and return a bool indicating if the correct number of hugepages were found. Add a wrapper, check_huge_anon(), in anticipation of checking smaps for file and shmem hugepages. Update existing callsites to use the new pattern / function. Likewise, check_for_pattern() was duplicated, and it's a general enough helper to include in vm_util helpers as well. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Reviewed-by: Zi Yan <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: James Houghton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03selftests/vm: retry on EAGAIN for MADV_COLLAPSE selftestZach O'Keefe1-1/+22
MADV_COLLAPSE is a best-effort request that will set errno to an actionable value if the request cannot be performed. For example, if pages are not found on the LRU, or if they are currently locked by something else, MADV_COLLAPSE will fail and set errno to EAGAIN to inform callers that they may try again. Since the khugepaged selftest is the first public use of MADV_COLLAPSE, set a best practice of checking errno and retrying on EAGAIN. Link: https://lkml.kernel.org/r/[email protected] Fixes: 9330694de59f ("selftests/vm: add MADV_COLLAPSE collapse context to selftests") Signed-off-by: Zach O'Keefe <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: James Houghton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03objtool: kmsan: list KMSAN API functions as uaccess-safeAlexander Potapenko1-0/+20
KMSAN inserts API function calls in a lot of places (function entries and exits, local variables, memory accesses), so they may get called from the uaccess regions as well. KMSAN API functions are used to update the metadata (shadow/origin pages) for kernel memory accesses. The metadata pages for kernel pointers are also located in the kernel memory, so touching them is not a problem. For userspace pointers, no metadata is allocated. If an API function is supposed to read or modify the metadata, it does so for kernel pointers and ignores userspace pointers. If an API function is supposed to return a pair of metadata pointers for the instrumentation to use (like all __msan_metadata_ptr_for_TYPE_SIZE() functions do), it returns the allocated metadata for kernel pointers and special dummy buffers residing in the kernel memory for userspace pointers. As a result, none of KMSAN API functions perform userspace accesses, but since they might be called from UACCESS regions they use user_access_save/restore(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03selftest/damon: add a test for duplicate context dirs creationSeongJae Park2-0/+28
Patch series "mm/damon: minor fixes and cleanups". This patchset contains minor fixes and cleanups for DAMON including - selftest for a bug we found before (Patch 1), - fix of region holes in vaddr corner case and a kunit test for it (Patches 2 and 3), and - documents/Kconfig updates for title wordsmithing (Patch 4) and more aggressive DAMON debugfs interface deprecation announcement (Patches 5-7). This patch (of 7): Commit d26f60703606 ("mm/damon/dbgfs: avoid duplicate context directory creation") fixes a bug which could result in memory leak and DAMON disablement. This commit adds a selftest for verifying the fix and avoid regression. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: SeongJae Park <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Yun Levi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03memblock tests: add new pageblock related macroKefeng Wang1-0/+2
Add new pageblock_start_pfn() and pageblock_align() macro which are needed by memblock tests. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03mm/hmm/test: use char dev with struct device to get device nodeMika Penttilä1-10/+0
HMM selftests use an in-kernel pseudo device to emulate device memory. The pseudo device registers a major device range for two or four pseudo device instances. User space has a script that reads /proc/devices in order to find the assigned major number, and sends that to mknod(1), once for each node. Change this to properly use cdev and struct device APIs. Delete the /proc/devices parsing from the user-space test script, now that it is unnecessary. Also, delete an unused field in struct dmirror_device: devmem. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mika Penttilä <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03hugetlb_encode.h: fix undefined behaviour (34 << 26)Matthias Goergens1-13/+13
Left-shifting past the size of your datatype is undefined behaviour in C. The literal 34 gets the type `int`, and that one is not big enough to be left shifted by 26 bits. An `unsigned` is long enough (on any machine that has at least 32 bits for their ints.) For uniformity, we mark all the literals as unsigned. But it's only really needed for HUGETLB_FLAG_ENCODE_16GB. Thanks to Randy Dunlap for an initial review and suggestion. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthias Goergens <[email protected]> Acked-by: Randy Dunlap <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26mm: add merging after mremap resizeJakub Matěna1-1/+48
When mremap call results in expansion, it might be possible to merge the VMA with the next VMA which might become adjacent. This patch adds vma_merge call after the expansion is done to try and merge. [[email protected]: coding-style cleanups] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jakub Matěna <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26lib/test_maple_tree: add testing for maple treeLiam R. Howlett1-2/+7
This is a test suite that uses the radix test infrastructure. It has been split into its own commit to allow for easier review of the maple tree code. The testing includes: - Allocation of nodes - gfp flag allocation checks - Expansion & contraction of tree - preallocation checks - tree navigation by next/prev - tree navigation by iterators (mas_for_each, etc) - Number of nodes for a given number of entries - Generic tree construction tests - Addition and removal of entries in forward and reverse numerical indexes - gap searching both forward and reverse - Combining gaps by overwriting entries in different ways - splitting right-most node - splitting left-most node - overwriting multiple slots - overwriting across different levels of the tree - overwriting the middle of a tree - causing a 3-way split up to the root by overwriting the last slot and first slot of different nodes and spanning different levels - RCU stress testing of the tree with threads - Duplication of the tree by entry count - Tests which were generated by fuzzers have been added. - A large number of tests which come from recording crashing in a VM and reconstructing the tree (see check_erase2_set()) Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26radix tree test suite: add lockdep_is_held to headerLiam R. Howlett1-0/+2
maple tree uses lockdep_is_held, so define it as external in the header. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26radix tree test suite: add support for slab bulk APIsLiam R. Howlett2-2/+120
Add support for kmem_cache_free_bulk() and kmem_cache_alloc_bulk() to the radix tree test suite. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26radix tree test suite: add allocation counts and size to kmem_cacheLiam R. Howlett1-1/+27
Add functions to get the number of allocations, and total allocations from a kmem_cache. Also add a function to get the allocated size and a way to zero the total allocations. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26radix tree test suite: add kmem_cache_set_non_kernel()Liam R. Howlett1-2/+14
kmem_cache_set_non_kernel() is a mechanism to allow a certain number of kmem_cache_alloc requests to succeed even when GFP_KERNEL is not set in the flags. This functionality allows for testing different paths though the code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26radix tree test suite: add pr_err defineLiam R. Howlett1-0/+1
define pr_err to printk Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26Maple Tree: add new data structureLiam R. Howlett5-0/+74
Patch series "Introducing the Maple Tree" The maple tree is an RCU-safe range based B-tree designed to use modern processor cache efficiently. There are a number of places in the kernel that a non-overlapping range-based tree would be beneficial, especially one with a simple interface. If you use an rbtree with other data structures to improve performance or an interval tree to track non-overlapping ranges, then this is for you. The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes. With the increased branching factor, it is significantly shorter than the rbtree so it has fewer cache misses. The removal of the linked list between subsequent entries also reduces the cache misses and the need to pull in the previous and next VMA during many tree alterations. The first user that is covered in this patch set is the vm_area_struct, where three data structures are replaced by the maple tree: the augmented rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The long term goal is to reduce or remove the mmap_lock contention. The plan is to get to the point where we use the maple tree in RCU mode. Readers will not block for writers. A single write operation will be allowed at a time. A reader re-walks if stale data is encountered. VMAs would be RCU enabled and this mode would be entered once multiple tasks are using the mm_struct. Davidlor said : Yes I like the maple tree, and at this stage I don't think we can ask for : more from this series wrt the MM - albeit there seems to still be some : folks reporting breakage. Fundamentally I see Liam's work to (re)move : complexity out of the MM (not to say that the actual maple tree is not : complex) by consolidating the three complimentary data structures very : much worth it considering performance does not take a hit. This was very : much a turn off with the range locking approach, which worst case scenario : incurred in prohibitive overhead. Also as Liam and Matthew have : mentioned, RCU opens up a lot of nice performance opportunities, and in : addition academia[1] has shown outstanding scalability of address spaces : with the foundation of replacing the locked rbtree with RCU aware trees. A similar work has been discovered in the academic press https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf Sheer coincidence. We designed our tree with the intention of solving the hardest problem first. Upon settling on a b-tree variant and a rough outline, we researched ranged based b-trees and RCU b-trees and did find that article. So it was nice to find reassurances that we were on the right path, but our design choice of using ranges made that paper unusable for us. This patch (of 70): The maple tree is an RCU-safe range based B-tree designed to use modern processor cache efficiently. There are a number of places in the kernel that a non-overlapping range-based tree would be beneficial, especially one with a simple interface. If you use an rbtree with other data structures to improve performance or an interval tree to track non-overlapping ranges, then this is for you. The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes. With the increased branching factor, it is significantly shorter than the rbtree so it has fewer cache misses. The removal of the linked list between subsequent entries also reduces the cache misses and the need to pull in the previous and next VMA during many tree alterations. The first user that is covered in this patch set is the vm_area_struct, where three data structures are replaced by the maple tree: the augmented rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The long term goal is to reduce or remove the mmap_lock contention. The plan is to get to the point where we use the maple tree in RCU mode. Readers will not block for writers. A single write operation will be allowed at a time. A reader re-walks if stale data is encountered. VMAs would be RCU enabled and this mode would be entered once multiple tasks are using the mm_struct. There is additional BUG_ON() calls added within the tree, most of which are in debug code. These will be replaced with a WARN_ON() call in the future. There is also additional BUG_ON() calls within the code which will also be reduced in number at a later date. These exist to catch things such as out-of-range accesses which would crash anyways. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: David Howells <[email protected]> Tested-by: Sven Schnelle <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26Merge branch 'mm-hotfixes-stable' into mm-stableAndrew Morton2-20/+2
2022-09-11selftest: vm: remove deleted local_config.* from .gitignoreTarun Sahu1-1/+0
Commit d2d6cba5d6623 ("selftest: vm: remove orphaned references to local_config.{h,mk}") took care of removing orphaned references. This commit removes local_config from .gitignore. Parent patch commit 69007f156ba ("Kselftests: remove support of libhugetlbfs from kselftests") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tarun Sahu <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11Kselftests: remove support of libhugetlbfs from kselftestsTarun Sahu3-86/+74
libhugetlbfs, the user side utitlity to work with hugepages, does not have any active support. There are only 2 selftests which are part of in vm/hmm_test.c that depends on libhugetlbfs. This patch modifies the tests so that they will not require libhugetlb library. [[email protected]: : remove orphaned references to local_config.{h,mk}] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tarun Sahu <[email protected]> Signed-off-by: Axel Rasmussen <[email protected]> Tested-by: Zach O'Keefe <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Shivaprasad G Bhat <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Vaibhav Jain <[email protected]> Cc: Axel Rasmussen <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11tools/vm/page_owner_sort: fix -f optionYixuan Cao1-1/+6
The -f option is to filter out the information of blocks whose memory has not been released, I noticed some blocks should not be filtered out. Commit 9cc7e96aa846 ("mm/page_owner: record timestamp and pid") records the allocation timestamp (ts_nsec) of all pages. Commit 866b48526217 ("mm/page_owner: record the timestamp of all pages during free") records the free timestamp (free_ts_nsec) of all pages. When the page is allocated for the first time, the initial value of free_ts_nsec is 0, and the corresponding time will be obtained when the page is released. But during reallocation the free_ts_nsec will not reset to 0 again. In particular, when page migration occurs, these two timestamps will be the same. Now page_owner_sort removes all text blocks whose free_ts_nsec is not 0 when using -f option. However, this way can only select pages allocated for the first time. If a freed page is reallocated, free_ts_nsec will be less than ts_nsec; if page migration occurs, the two timestamps will be equal. These cases should be considered as pages are not released. So I fix the function is_need() to keep text blocks that meet the above two conditions when using -f option. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yixuan Cao <[email protected]> Cc: Chongxi Zhao <[email protected]> Cc: Jiajian Ye <[email protected]> Cc: Yuhong Feng <[email protected]> Cc: Liam Mark <[email protected]> Cc: Georgi Djakov <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11selftests: vm: add /dev/userfaultfd test cases to run_vmtests.shAxel Rasmussen1-7/+10
This new mode was recently added to the userfaultfd selftest. We want to exercise both userfaultfd(2) as well as /dev/userfaultfd, so add both test cases to the script. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Reviewed-by: Shuah Khan <[email protected]> Acked-by: Peter Xu <[email protected]> Cc: Al Viro <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dmitry V. Levin <[email protected]> Cc: Gleb Fotengauer-Malinovskiy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Zhang Yi <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11userfaultfd: selftests: modify selftest to use /dev/userfaultfdAxel Rasmussen1-10/+66
We clearly want to ensure both userfaultfd(2) and /dev/userfaultfd keep working into the future, so just run the test twice, using each interface. Instead of always testing both userfaultfd(2) and /dev/userfaultfd, let the user choose which to test. As with other test features, change the behavior based on a new command line flag. Introduce the idea of "test mods", which are generic (not specific to a test type) modifications to the behavior of the test. This is sort of borrowed from this RFC patch series [1], but simplified a bit. The benefit is, in "typical" configurations this test is somewhat slow (say, 30sec or something). Testing both clearly doubles it, so it may not always be desirable, as users are likely to use one or the other, but never both, in the "real world". [1]: https://patchwork.kernel.org/project/linux-mm/patch/[email protected]/ [[email protected]: modify selftest to exit with KSFT_SKIP *only* when features are unsupported, per Mike] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Acked-by: Peter Xu <[email protected]> Acked-by: Mike Rapoport <[email protected]> Cc: Al Viro <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dmitry V. Levin <[email protected]> Cc: Gleb Fotengauer-Malinovskiy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Zhang Yi <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.shAxel Rasmussen1-2/+4
Patch series "userfaultfd: add /dev/userfaultfd for fine grained access control", v7. Why not ...? ============ - Why not /proc/[pid]/userfaultfd? Two main points (additional discussion [1]): - /proc/[pid]/* files are all owned by the user/group of the process, and they don't really support chmod/chown. So, without extending procfs it doesn't solve the problem this series is trying to solve. - The main argument *for* this was to support creating UFFDs for remote processes. But, that use case clearly calls for CAP_SYS_PTRACE, so to support this we could just use the UFFD syscall as-is. - Why not use a syscall? Access to syscalls is generally controlled by capabilities. We don't have a capability which is used for userfaultfd access without also granting more / other permissions as well, and adding a new capability was rejected [2]. - It's possible a LSM could be used to control access instead, but I have some concerns. I don't think this approach would be as easy to use, particularly if we were to try to solve this with something heavyweight like SELinux. Maybe we could pursue adding a new LSM specifically for this user case, but it may be too narrow of a case to justify that. [1]: https://patchwork.kernel.org/project/linux-mm/cover/[email protected]/ [2]: https://lore.kernel.org/lkml/[email protected]/T/ This patch (of 5): This not being included was just a simple oversight. There are certain features (like minor fault support) which are only enabled on shared mappings, so without including hugetlb_shared we actually lose a significant amount of test coverage. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Reviewed-by: Shuah Khan <[email protected]> Reviewed-by: Peter Xu <[email protected]> Cc: Al Viro <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dmitry V. Levin <[email protected]> Cc: Gleb Fotengauer-Malinovskiy <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Zhang Yi <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11selftests/vm: add selftest to verify multi THP collapseZach O'Keefe1-67/+73
Add support to allocate and verify collapse of multiple hugepage-sized regions into multiple THPs. Add "nr" argument to check_huge() that instructs check_huge() to check for exactly "nr_hpages" THPs. This has the added benefit of now being able to check for exactly 0 THPs, and so callsites that previously checked the negation of exactly 1 THP are now more correct. ->collapse struct collapse_context hook has been expanded with a "nr_hpages" argument to collapse "nr_hpages" hugepages. The collapse_full() test has been repurposed to collapse 4 THPs at once. It is expected more tests will want to test multi THP collapse (e.g. file/shmem). This is of particular benefit to madvise collapse context given that it may do many THP collapses during a single syscall. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Cc: Alex Shi <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jens Axboe <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zi Yan <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: "Souptick Joarder (HPE)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11selftests/vm: add selftest to verify recollapse of THPsZach O'Keefe1-0/+31
Add selftest specific to madvise collapse context that tests MADV_COLLAPSE is "successful" if a hugepage-aligned/sized region is already pmd-mapped. This test also verifies that MADV_COLLAPSE can collapse memory into THPs even in "madvise" THP mode and the memory isn't marked VM_HUGEPAGE. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Cc: Alex Shi <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jens Axboe <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zi Yan <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: "Souptick Joarder (HPE)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11selftests/vm: add MADV_COLLAPSE collapse context to selftestsZach O'Keefe1-46/+125
Add madvise collapse context to hugepage collapse selftests. This context is tested with /sys/kernel/mm/transparent_hugepage/enabled set to "never" in order to avoid unwanted interaction with khugepaged during testing. Also, refactor updates to sysfs THP settings using a stack so that the THP settings from nested callers can be restored. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Cc: Alex Shi <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jens Axboe <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zi Yan <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: "Souptick Joarder (HPE)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11selftests/vm: dedup hugepage allocation logicZach O'Keefe1-39/+23
The code p = alloc_mapping(); printf("Allocate huge page..."); madvise(p, hpage_pmd_size, MADV_HUGEPAGE); fill_memory(p, 0, hpage_pmd_size); if (check_huge(p)) success("OK"); else fail("Fail"); Is repeated many times in different tests. Add a helper, alloc_hpage() to handle this. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Cc: Alex Shi <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jens Axboe <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zi Yan <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: "Souptick Joarder (HPE)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11selftests/vm: modularize collapse selftestsZach O'Keefe1-141/+110
Modularize the collapse action of khugepaged collapse selftests by introducing a struct collapse_context which specifies how to collapse a given memory range and the expected semantics of the collapse. This can be reused later to test other collapse contexts. Additionally, all tests have logic that checks if a collapse occurred via reading /proc/self/smaps, and report if this is different than expected. Move this logic into the per-context ->collapse() hook instead of repeating it in every test. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Cc: Alex Shi <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jens Axboe <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zi Yan <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: "Souptick Joarder (HPE)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11mm/madvise: introduce MADV_COLLAPSE sync hugepage collapseZach O'Keefe1-0/+2
This idea was introduced by David Rientjes[1]. Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request a synchronous collapse of memory at their own expense. The benefits of this approach are: * CPU is charged to the process that wants to spend the cycles for the THP * Avoid unpredictable timing of khugepaged collapse Semantics This call is independent of the system-wide THP sysfs settings, but will fail for memory marked VM_NOHUGEPAGE. If the ranges provided span multiple VMAs, the semantics of the collapse over each VMA is independent from the others. This implies a hugepage cannot cross a VMA boundary. If collapse of a given hugepage-aligned/sized region fails, the operation may continue to attempt collapsing the remainder of memory specified. The memory ranges provided must be page-aligned, but are not required to be hugepage-aligned. If the memory ranges are not hugepage-aligned, the start/end of the range will be clamped to the first/last hugepage-aligned address covered by said range. The memory ranges must span at least one hugepage-sized region. All non-resident pages covered by the range will first be swapped/faulted-in, before being internally copied onto a freshly allocated hugepage. Unmapped pages will have their data directly initialized to 0 in the new hugepage. However, for every eligible hugepage aligned/sized region to-be collapsed, at least one page must currently be backed by memory (a PMD covering the address range must already exist). Allocation for the new hugepage may enter direct reclaim and/or compaction, regardless of VMA flags. When the system has multiple NUMA nodes, the hugepage will be allocated from the node providing the most native pages. This operation operates on the current state of the specified process and makes no persistent changes or guarantees on how pages will be mapped, constructed, or faulted in the future Return Value If all hugepage-sized/aligned regions covered by the provided range were either successfully collapsed, or were already PMD-mapped THPs, this operation will be deemed successful. On success, process_madvise(2) returns the number of bytes advised, and madvise(2) returns 0. Else, -1 is returned and errno is set to indicate the error for the most-recently attempted hugepage collapse. Note that many failures might have occurred, since the operation may continue to collapse in the event a single hugepage-sized/aligned region fails. ENOMEM Memory allocation failed or VMA not found EBUSY Memcg charging failed EAGAIN Required resource temporarily unavailable. Try again might succeed. EINVAL Other error: No PMD found, subpage doesn't have Present bit set, "Special" page no backed by struct page, VMA incorrectly sized, address not page-aligned, ... Most notable here is ENOMEM and EBUSY (new to madvise) which are intended to provide the caller with actionable feedback so they may take an appropriate fallback measure. Use Cases An immediate user of this new functionality are malloc() implementations that manage memory in hugepage-sized chunks, but sometimes subrelease memory back to the system in native-sized chunks via MADV_DONTNEED; zapping the pmd. Later, when the memory is hot, the implementation could madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage coverage and dTLB performance. TCMalloc is such an implementation that could benefit from this[2]. Only privately-mapped anon memory is supported for now, but additional support for file, shmem, and HugeTLB high-granularity mappings[2] is expected. File and tmpfs/shmem support would permit: * Backing executable text by THPs. Current support provided by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which might impair services from serving at their full rated load after (re)starting. Tricks like mremap(2)'ing text onto anonymous memory to immediately realize iTLB performance prevents page sharing and demand paging, both of which increase steady state memory footprint. With MADV_COLLAPSE, we get the best of both worlds: Peak upfront performance and lower RAM footprints. * Backing guest memory by hugapages after the memory contents have been migrated in native-page-sized chunks to a new host, in a userfaultfd-based live-migration stack. [1] https://lore.kernel.org/linux-mm/[email protected]/ [2] https://github.com/google/tcmalloc/tree/master/tcmalloc [[email protected]: avoid possible memory leak in failure path] Link: https://lkml.kernel.org/r/[email protected] [[email protected] add missing kfree() to madvise_collapse()] Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] [[email protected]: delay computation of hpage boundaries until use]] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zach O'Keefe <[email protected]> Signed-off-by: "Souptick Joarder (HPE)" <[email protected]> Suggested-by: David Rientjes <[email protected]> Cc: Alex Shi <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jens Axboe <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rongwei Wang <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Song Liu <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zi Yan <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-11tools: fix compilation after gfp_types.h splitMatthew Wilcox (Oracle)2-20/+2
When gfp_types.h was split from gfp.h, it broke the radix test suite. Fix the test suite by using gfp_types.h in the tools gfp.h header. Link: https://lkml.kernel.org/r/[email protected] Fixes: cb5a065b4ea9 (headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>) Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reported-by: Liam R. Howlett <[email protected]> Reviewed-by: Yury Norov <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-08-28Merge tag 'x86-urgent-2022-08-28' of ↵Linus Torvalds1-16/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix PAT on Xen, which caused i915 driver failures - Fix compat INT 80 entry crash on Xen PV guests - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs - Fix RSB stuffing regressions - Fix ORC unwinding on ftrace trampolines - Add Intel Raptor Lake CPU model number - Fix (work around) a SEV-SNP bootloader bug providing bogus values in boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups. - Fix SEV-SNP early boot failure - Fix the objtool list of noreturn functions and annotate snp_abort(), which bug confused objtool on gcc-12. - Fix the documentation for retbleed * tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI: Mention retbleed vulnerability info file for sysfs x86/sev: Mark snp_abort() noreturn x86/sev: Don't use cc_platform_has() for early SEV-SNP calls x86/boot: Don't propagate uninitialized boot_params->cc_blob_address x86/cpu: Add new Raptor Lake CPU model number x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry x86/nospec: Fix i386 RSB stuffing x86/nospec: Unwreck the RSB stuffing x86/bugs: Add "unknown" reporting for MMIO Stale Data x86/entry: Fix entry_INT80_compat for Xen PV guests x86/PAT: Have pat_enabled() properly reflect state when running on Xen
2022-08-27perf stat: Capitalize topdown metrics' namesZhengjun Xing1-12/+12
Capitalize topdown metrics' names to follow the intel SDM. Before: # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 228,094.05 msec cpu-clock # 225.026 CPUs utilized 842 context-switches # 3.691 /sec 224 cpu-migrations # 0.982 /sec 70 page-faults # 0.307 /sec 23,164,105 cycles # 0.000 GHz 29,403,446 instructions # 1.27 insn per cycle 5,268,185 branches # 23.097 K/sec 33,239 branch-misses # 0.63% of all branches 136,248,990 slots # 597.337 K/sec 32,976,450 topdown-retiring # 24.2% retiring 4,651,918 topdown-bad-spec # 3.4% bad speculation 26,148,695 topdown-fe-bound # 19.2% frontend bound 72,515,776 topdown-be-bound # 53.2% backend bound 6,008,540 topdown-heavy-ops # 4.4% heavy operations # 19.8% light operations 3,934,049 topdown-br-mispredict # 2.9% branch mispredict # 0.5% machine clears 16,655,439 topdown-fetch-lat # 12.2% fetch latency # 7.0% fetch bandwidth 41,635,972 topdown-mem-bound # 30.5% memory bound # 22.7% Core bound 1.013634593 seconds time elapsed After: # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 228,081.94 msec cpu-clock # 225.003 CPUs utilized 824 context-switches # 3.613 /sec 224 cpu-migrations # 0.982 /sec 67 page-faults # 0.294 /sec 22,647,423 cycles # 0.000 GHz 28,870,551 instructions # 1.27 insn per cycle 5,167,099 branches # 22.655 K/sec 32,383 branch-misses # 0.63% of all branches 133,411,074 slots # 584.926 K/sec 32,352,607 topdown-retiring # 24.3% Retiring 4,456,977 topdown-bad-spec # 3.3% Bad Speculation 25,626,487 topdown-fe-bound # 19.2% Frontend Bound 70,955,316 topdown-be-bound # 53.2% Backend Bound 5,834,844 topdown-heavy-ops # 4.4% Heavy Operations # 19.9% Light Operations 3,738,781 topdown-br-mispredict # 2.8% Branch Mispredict # 0.5% Machine Clears 16,286,803 topdown-fetch-lat # 12.2% Fetch Latency # 7.0% Fetch Bandwidth 40,802,069 topdown-mem-bound # 30.6% Memory Bound # 22.6% Core Bound 1.013683125 seconds time elapsed Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Xing Zhengjun <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-27perf docs: Update the documentation for the save_type filterKan Liang1-0/+3
Update the documentation to reflect the kernel changes. Signed-off-by: Kan Liang <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-27perf sched: Fix memory leaks in __cmd_record detected with -fsanitize=addressIan Rogers1-5/+19
An array of strings is passed to cmd_record but not freed. As cmd_record modifies the array, add another array as a copy that can be mutated allowing the original array contents to all be freed. Detected with -fsanitize=address. Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-27perf record: Fix manpage formatting of description of support to hybrid systemsAndi Kleen2-12/+2
The Intel hybrid description is written in a different style than the rest of the perf record man page. There were some new command line options added after it which resulted in very strange section ordering. Move the hybrid include last. Also the sub sections in the hybrid document don't fit the record manpage well (especially since it talks about all kinds of unrelated commands). I left this for now, but would be better to separate this properly in the different man pages. It would be better to use sub sections for the other sections, but these don't seem to be supported in AsciiDoc? Some of the examples are still misrendered in the manpage with an indented troff command, but I don't know how to fix that. In any case it's now better than before. Signed-off-by: Andi Kleen <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-27perf test: Stat test for repeat with a weak groupIan Rogers1-0/+19
Breaking a weak group requires multiple passes of an evlist, with multiple runs this can introduce bugs ultimately leading to segfaults. Add a test to cover this. Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-27perf stat: Clear evsel->reset_group for each stat runIan Rogers1-0/+1
If a weak group is broken then the reset_group flag remains set for the next run. Having reset_group set means the counter isn't created and ultimately a segfault. A simple reproduction of this is: # perf stat -r2 -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}:W which will be added as a test in the next patch. Fixes: 4804e0111662d7d8 ("perf stat: Use affinity for opening events") Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Xing Zhengjun <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-27tools kvm headers arm64: Update KVM header from the kernel sourcesArnaldo Carvalho de Melo1-2/+4
To pick the changes from: ae3b1da95413614f ("KVM: arm64: Fix compile error due to sign extension") That doesn't result in any changes in tooling (when built on x86), only addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h' diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h Cc: Adrian Hunter <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Yang Yingliang <[email protected]> Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-27perf python: Fix build when PYTHON_CONFIG is user suppliedJames Clark1-1/+1
The previous change to Python autodetection had a small mistake where the auto value was used to determine the Python binary, rather than the user supplied value. The Python binary is only used for one part of the build process, rather than the final linking, so it was producing correct builds in most scenarios, especially when the auto detected value matched what the user wanted, or the system only had a valid set of Pythons. Change it so that the Python binary path is derived from either the PYTHON_CONFIG value or PYTHON value, depending on what is specified by the user. This was the original intention. This error was spotted in a build failure an odd cross compilation environment after commit 4c41cb46a732fe82 ("perf python: Prefer python3") was merged. Fixes: 630af16eee495f58 ("perf tools: Use Python devtools for version autodetection rather than runtime") Signed-off-by: James Clark <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2022-08-25Merge tag 'net-6.0-rc3' of ↵Linus Torvalds5-0/+90
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from ipsec and netfilter (with one broken Fixes tag). Current release - new code bugs: - dsa: don't dereference NULL extack in dsa_slave_changeupper() - dpaa: fix <1G ethernet on LS1046ARDB - neigh: don't call kfree_skb() under spin_lock_irqsave() Previous releases - regressions: - r8152: fix the RX FIFO settings when suspending - dsa: microchip: keep compatibility with device tree blobs with no phy-mode - Revert "net: macsec: update SCI upon MAC address change." - Revert "xfrm: update SA curlft.use_time", comply with RFC 2367 Previous releases - always broken: - netfilter: conntrack: work around exceeded TCP receive window - ipsec: fix a null pointer dereference of dst->dev on a metadata dst in xfrm_lookup_with_ifid - moxa: get rid of asymmetry in DMA mapping/unmapping - dsa: microchip: make learning configurable and keep it off while standalone - ice: xsk: prohibit usage of non-balanced queue id - rxrpc: fix locking in rxrpc's sendmsg Misc: - another chunk of sysctl data race silencing" * tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) net: lantiq_xrx200: restore buffer if memory allocation failed net: lantiq_xrx200: fix lock under memory pressure net: lantiq_xrx200: confirm skb is allocated before using net: stmmac: work around sporadic tx issue on link-up ionic: VF initial random MAC address if no assigned mac ionic: fix up issues with handling EAGAIN on FW cmds ionic: clear broken state on generation change rxrpc: Fix locking in rxrpc's sendmsg net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2 MAINTAINERS: rectify file entry in BONDING DRIVER i40e: Fix incorrect address type for IPv6 flow rules ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter net: Fix a data-race around sysctl_somaxconn. net: Fix a data-race around netdev_unregister_timeout_secs. net: Fix a data-race around gro_normal_batch. net: Fix data-races around sysctl_devconf_inherit_init_net. net: Fix data-races around sysctl_fb_tunnels_only_for_init_net. net: Fix a data-race around netdev_budget_usecs. net: Fix data-races around sysctl_max_skb_frags. net: Fix a data-race around netdev_budget. ...
2022-08-25x86/sev: Mark snp_abort() noreturnBorislav Petkov1-16/+18
Mark both the function prototype and definition as noreturn in order to prevent the compiler from doing transformations which confuse objtool like so: vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction This triggers with gcc-12. Add it and sev_es_terminate() to the objtool noreturn tracking array too. Sort it while at it. Suggested-by: Michael Matz <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-08-23Merge tag 'linux-kselftest-fixes-6.0-rc3' of ↵Linus Torvalds2-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: "Fixes to vm and sgx test builds" * tag 'linux-kselftest-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/vm: fix inability to build any vm tests selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning
2022-08-22selftests: include bonding tests into the kselftest infraJonathan Toppins5-0/+90
This creates a test collection in drivers/net/bonding for bonding specific kernel selftests. The first test is a reproducer that provisions a bond and given the specific order in how the ip-link(8) commands are issued the bond never transmits an LACPDU frame on any of its slaves. Signed-off-by: Jonathan Toppins <[email protected]> Acked-by: Jay Vosburgh <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-22perf tools: Fix compile error for x86Yang Jihong1-0/+4
Commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO") eradicates CC_HAS_ASM_GOTO, and in the process also causes the perf tool on x86 to use asm_volatile_goto when compiling __GEN_RMWcc. However, asm_volatile_goto is not declared in the perf tool headers, which causes a compilation error: In file included from tools/arch/x86/include/asm/atomic.h:7, from tools/include/asm/atomic.h:6, from tools/include/linux/atomic.h:5, from tools/include/linux/refcount.h:41, from tools/lib/perf/include/internal/cpumap.h:5, from tools/perf/util/cpumap.h:7, from tools/perf/util/env.h:7, from tools/perf/util/header.h:12, from pmu-events/pmu-events.c:9: tools/arch/x86/include/asm/atomic.h: In function ‘atomic_dec_and_test’: tools/arch/x86/include/asm/rmwcc.h:7:2: error: implicit declaration of function ‘asm_volatile_goto’ [-Werror=implicit-function-declaration] asm_volatile_goto (fullop "; j" cc " %l[cc_label]" \ ^~~~~~~~~~~~~~~~~ Define asm_volatile_goto in compiler_types.h if not declared, like the main kernel header files do. Fixes: a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO") Signed-off-by: Yang Jihong <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Ingo Molnar <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-08-21asm goto: eradicate CC_HAS_ASM_GOTONick Desaulniers1-21/+0
GCC has supported asm goto since 4.5, and Clang has since version 9.0.0. The minimum supported versions of these tools for the build according to Documentation/process/changes.rst are 5.1 and 11.0.0 respectively. Remove the feature detection script, Kconfig option, and clean up some fallback code that is no longer supported. The removed script was also testing for a GCC specific bug that was fixed in the 4.7 release. Also remove workarounds for bpftrace using clang older than 9.0.0, since other BPF backend fixes are required at this point. Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/ Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637 Acked-by: Borislav Petkov <[email protected]> Suggested-by: Masahiro Yamada <[email protected]> Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Nick Desaulniers <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Reviewed-by: Alexandre Belloni <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-08-20Merge tag 'perf-tools-fixes-for-v6.0-2022-08-19' of ↵Linus Torvalds27-235/+976
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix alignment for cpu map masks in event encoding. - Support reading PERF_FORMAT_LOST, perf tool counterpart for a feature that was added in this merge window. - Sync perf tools copies of kernel headers: socket, msr-index, fscrypt, cpufeatures, i915_drm, kvm, vhost, perf_event. * tag 'perf-tools-fixes-for-v6.0-2022-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf tools: Support reading PERF_FORMAT_LOST libperf: Add a test case for read formats libperf: Handle read format in perf_evsel__read() tools headers UAPI: Sync linux/perf_event.h with the kernel sources tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources tools headers UAPI: Sync KVM's vmx.h header with the kernel sources tools include UAPI: Sync linux/vhost.h with the kernel sources tools headers kvm s390: Sync headers with the kernel sources tools headers UAPI: Sync linux/kvm.h with the kernel sources tools headers UAPI: Sync drm/i915_drm.h with the kernel sources tools headers cpufeatures: Sync with the kernel sources tools headers UAPI: Sync linux/fscrypt.h with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources perf beauty: Update copy of linux/socket.h with the kernel sources perf cpumap: Fix alignment for masks in event encoding perf cpumap: Compute mask size in constant time perf cpumap: Synthetic events and const/static perf cpumap: Const map for max()