aboutsummaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)AuthorFilesLines
2020-08-07mm: filemap: add missing FGP_ flags in kerneldoc comment for pagecache_get_pageYang Shi1-0/+3
FGP_{WRITE|NOFS|NOWAIT} were missed in pagecache_get_page's kerneldoc comment. Signed-off-by: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Gang Deng <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm: filemap: clear idle flag for writesYang Shi1-0/+6
Since commit bbddabe2e436aa ("mm: filemap: only do access activations on reads"), mark_page_accessed() is called for reads only. But the idle flag is cleared by mark_page_accessed() so the idle flag won't get cleared if the page is write accessed only. Basically idle page tracking is used to estimate workingset size of workload, noticeable size of workingset might be missed if the idle flag is not maintained correctly. It seems good enough to just clear idle flag for write operations. Fixes: bbddabe2e436 ("mm: filemap: only do access activations on reads") Reported-by: Gang Deng <[email protected]> Signed-off-by: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, dump_page: do not crash with bad compound_mapcount()John Hubbard1-3/+3
If a compound page is being split while dump_page() is being run on that page, we can end up calling compound_mapcount() on a page that is no longer compound. This leads to a crash (already seen at least once in the field), due to the VM_BUG_ON_PAGE() assertion inside compound_mapcount(). (The above is from Matthew Wilcox's analysis of Qian Cai's bug report.) A similar problem is possible, via compound_pincount() instead of compound_mapcount(). In order to avoid this kind of crash, make dump_page() slightly more robust, by providing a pair of simpler routines that don't contain assertions: head_mapcount() and head_pincount(). For debug tools, we don't want to go *too* far in this direction, but this is a simple small fix, and the crash has already been seen, so it's a good trade-off. Reported-by: Qian Cai <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Signed-off-by: John Hubbard <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: William Kucharski <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/debug: print hashed address of struct pageMatthew Wilcox (Oracle)1-4/+4
The actual address of the struct page isn't particularly helpful, while the hashed address helps match with other messages elsewhere. Add the PFN that the page refers to in order to help diagnose problems where the page is improperly aligned for the purpose. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: John Hubbard <[email protected]> Acked-by: Mike Rapoport <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/debug: print the inode number in dump_pageMatthew Wilcox (Oracle)1-3/+3
The inode number helps correlate this page with debug messages elsewhere in the kernel. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: John Hubbard <[email protected]> Acked-by: Mike Rapoport <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/debug: switch dump_page to get_kernel_nofaultMatthew Wilcox (Oracle)1-20/+16
This is simpler to use than copy_from_kernel_nofault(). Also make some of the related error messages less verbose. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Mike Rapoport <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: John Hubbard <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: William Kucharski <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/debug: print head flags in dump_pageMatthew Wilcox (Oracle)1-1/+1
Tail page flags contain very little useful information. Print the head page's flags instead. While the flags will contain "head" for tail pages, this should not be too confusing as the previous line starts with the word "head:" and so the flags should be interpreted as belonging to the head page. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: John Hubbard <[email protected]> Acked-by: Mike Rapoport <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: William Kucharski <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/debug: dump compound page information on a second lineMatthew Wilcox (Oracle)1-18/+12
Simplify both the implementation and the output by splitting all the compound page information onto a second line. Reported-by: John Hubbard <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: John Hubbard <[email protected]> Reviewed-by: John Hubbard <[email protected]> Acked-by: Mike Rapoport <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: William Kucharski <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/debug: handle page->mapping better in dump_pageMatthew Wilcox (Oracle)1-2/+13
Patch series "Improvements for dump_page()", v2. Here's a sample dump of a pagecache tail page with all of the patches applied: page:000000006d1c49ca refcount:6 mapcount:0 mapping:00000000136b8d90 index:0x109 pfn:0x6c645 head:000000008bd38076 order:2 compound_mapcount:0 compound_pincount:0 aops:xfs_address_space_operations ino:800042 dentry name:"fd" flags: 0x4000000000012014(uptodate|lru|private|head) raw: 4000000000000000 ffffd46ac1b19101 ffffffff00000202 dead000000000004 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 head: 4000000000012014 ffffd46ac1b1bbc8 ffffd46ac1b1bc08 ffff91976f659560 head: 0000000000000108 ffff919773220680 00000006ffffffff 0000000000000000 page dumped because: testing This patch (of 6): If we can't call page_mapping() to get the page mapping, handle the anon/ksm/movable bits correctly. [[email protected]: augmented code comment from John] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: John Hubbard <[email protected]> Acked-by: Mike Rapoport <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: William Kucharski <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07Documentation/mm: add descriptions for arch page table helpersAnshuman Khandual1-0/+6
This adds a specific description file for all arch page table helpers which is in sync with the semantics being tested via CONFIG_DEBUG_VM_PGTABLE. All future changes either to these descriptions here or the debug test should always remain in sync. [[email protected]: fold in Mike's patch for the rst document, fix typos in the rst document] Link: http://lkml.kernel.org/r/[email protected] Suggested-by: Mike Rapoport <[email protected]> Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Mike Rapoport <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Zi Yan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/debug_vm_pgtable: add debug prints for individual testsAnshuman Khandual1-1/+45
This adds debug print information that enlists all tests getting executed on a given platform. With dynamic debug enabled, the following information will be splashed during boot. For compactness purpose, dropped both time stamp and prefix (i.e debug_vm_pgtable) from this sample output. [debug_vm_pgtable ]: Validating architecture page table helpers [pte_basic_tests ]: Validating PTE basic [pmd_basic_tests ]: Validating PMD basic [p4d_basic_tests ]: Validating P4D basic [pgd_basic_tests ]: Validating PGD basic [pte_clear_tests ]: Validating PTE clear [pmd_clear_tests ]: Validating PMD clear [pte_advanced_tests ]: Validating PTE advanced [pmd_advanced_tests ]: Validating PMD advanced [hugetlb_advanced_tests]: Validating HugeTLB advanced [pmd_leaf_tests ]: Validating PMD leaf [pmd_huge_tests ]: Validating PMD huge [pte_savedwrite_tests ]: Validating PTE saved write [pmd_savedwrite_tests ]: Validating PMD saved write [pmd_populate_tests ]: Validating PMD populate [pte_special_tests ]: Validating PTE special [pte_protnone_tests ]: Validating PTE protnone [pmd_protnone_tests ]: Validating PMD protnone [pte_devmap_tests ]: Validating PTE devmap [pmd_devmap_tests ]: Validating PMD devmap [pte_swap_tests ]: Validating PTE swap [swap_migration_tests ]: Validating swap migration [hugetlb_basic_tests ]: Validating HugeTLB basic [pmd_thp_tests ]: Validating PMD based THP Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Vineet Gupta <[email protected]> [arc] Reviewed-by: Zi Yan <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mike Rapoport <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/debug_vm_pgtable: add tests validating advanced arch page table helpersAnshuman Khandual1-0/+312
This adds new tests validating for these following arch advanced page table helpers. These tests create and test specific mapping types at various page table levels. 1. pxxp_set_wrprotect() 2. pxxp_get_and_clear() 3. pxxp_set_access_flags() 4. pxxp_get_and_clear_full() 5. pxxp_test_and_clear_young() 6. pxx_leaf() 7. pxx_set_huge() 8. pxx_(clear|mk)_savedwrite() 9. huge_pxxp_xxx() [[email protected]: drop RANDOM_ORVALUE from hugetlb_advanced_tests()] Link: http://lkml.kernel.org/r/[email protected] Suggested-by: Catalin Marinas <[email protected]> Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Vineet Gupta <[email protected]> [arc] Reviewed-by: Zi Yan <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Steven Price <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/debug_vm_pgtable: add tests validating arch helpers for core MM featuresAnshuman Khandual1-1/+301
Patch series "mm/debug_vm_pgtable: Add some more tests", v5. This series adds some more arch page table helper validation tests which are related to core and advanced memory functions. This also creates a documentation, enlisting expected semantics for all page table helpers as suggested by Mike Rapoport previously (https://lkml.org/lkml/2020/1/30/40). There are many TRANSPARENT_HUGEPAGE and ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD ifdefs scattered across the test. But consolidating all the fallback stubs is not very straight forward because ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD is not explicitly dependent on ARCH_HAS_TRANSPARENT_HUGEPAGE. Tested on arm64, x86 platforms but only build tested on all other enabled platforms through ARCH_HAS_DEBUG_VM_PGTABLE i.e powerpc, arc, s390. The following failure on arm64 still exists which was mentioned previously. It will be fixed with the upcoming THP migration on arm64 enablement series. WARNING .... mm/debug_vm_pgtable.c:860 debug_vm_pgtable+0x940/0xa54 WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd)))) This patch (of 4): This adds new tests validating arch page table helpers for these following core memory features. These tests create and test specific mapping types at various page table levels. 1. SPECIAL mapping 2. PROTNONE mapping 3. DEVMAP mapping 4. SOFTDIRTY mapping 5. SWAP mapping 6. MIGRATION mapping 7. HUGETLB mapping 8. THP mapping Suggested-by: Catalin Marinas <[email protected]> Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Vineet Gupta <[email protected]> [arc] Reviewed-by: Zi Yan <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Steven Price <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, kcsan: instrument SLAB/SLUB free with "ASSERT_EXCLUSIVE_ACCESS"Marco Elver2-0/+10
Provide the necessary KCSAN checks to assist with debugging racy use-after-frees. While KASAN is more reliable at generally catching such use-after-frees (due to its use of a quarantine), it can be difficult to debug racy use-after-frees. If a reliable reproducer exists, KCSAN can assist in debugging such issues. Note: ASSERT_EXCLUSIVE_ACCESS is a convenience wrapper if the size is simply sizeof(var). Instead, here we just use __kcsan_check_access() explicitly to pass the correct size. Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/slub.c: drop lockdep_assert_held() from put_map()Sebastian Andrzej Siewior1-2/+0
There is no point in using lockdep_assert_held() unlock that is about to be unlocked. It works only with lockdep and lockdep will complain if spin_unlock() is used on a lock that has not been locked. Remove superfluous lockdep_assert_held(). Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Christopher Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, slab/slub: improve error reporting and overhead of cache_from_obj()Vlastimil Babka3-45/+46
cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b: always get the cache from its page in kmem_cache_free()") to support kmemcg, where per-memcg cache can be different from the root one, so we can't use the kmem_cache pointer given to kmem_cache_free(). Prior to that commit, SLUB already had debugging check+warning that could be enabled to compare the given kmem_cache pointer to one referenced by the slab page where the object-to-be-freed resides. This check was moved to cache_from_obj(). Later the check was also enabled for SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate cache membership under freelist hardening"). These checks and warnings can be useful especially for the debugging, which can be improved. Commit 598a0717a816 changed the pr_err() with WARN_ON_ONCE() to WARN_ONCE() so only the first hit is now reported, others are silent. This patch changes it to WARN() so that all errors are reported. It's also useful to print SLUB allocation/free tracking info for the offending object, if tracking is enabled. Thus, export the SLUB print_tracking() function and provide an empty one for SLAB. For SLUB we can also benefit from the static key check in kmem_cache_debug_flags(), but we need to move this function to slab.h and declare the static key there. [1] https://lore.kernel.org/r/[email protected] [[email protected]: avoid bogus WARN()] Link: https://lore.kernel.org/r/20200623090213.GW5535@shao2-debian Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Garrett <[email protected]> Cc: Jann Horn <[email protected]> Cc: Vijayanand Jitta <[email protected]> Cc: Vinayak Menon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, slab/slub: move and improve cache_from_obj()Vlastimil Babka3-23/+29
The function cache_from_obj() was added by commit b9ce5ef49f00 ("sl[au]b: always get the cache from its page in kmem_cache_free()") to support kmemcg, where per-memcg cache can be different from the root one, so we can't use the kmem_cache pointer given to kmem_cache_free(). Prior to that commit, SLUB already had debugging check+warning that could be enabled to compare the given kmem_cache pointer to one referenced by the slab page where the object-to-be-freed resides. This check was moved to cache_from_obj(). Later the check was also enabled for SLAB_FREELIST_HARDENED configs by commit 598a0717a816 ("mm/slab: validate cache membership under freelist hardening"). These checks and warnings can be useful especially for the debugging, which can be improved. Commit 598a0717a816 changed the pr_err() with WARN_ON_ONCE() to WARN_ONCE() so only the first hit is now reported, others are silent. This patch changes it to WARN() so that all errors are reported. It's also useful to print SLUB allocation/free tracking info for the offending object, if tracking is enabled. We could export the SLUB print_tracking() function and provide an empty one for SLAB, or realize that both the debugging and hardening cases in cache_from_obj() are only supported by SLUB anyway. So this patch moves cache_from_obj() from slab.h to separate instances in slab.c and slub.c, where the SLAB version only does the kmemcg lookup and even could be completely removed once the kmemcg rework [1] is merged. The SLUB version can thus easily use the print_tracking() function. It can also use the kmem_cache_debug_flags() static key check for improved performance in kernels without the hardening and with debugging not enabled on boot. [1] https://lore.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kees Cook <[email protected]> Cc: Vijayanand Jitta <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, slub: extend checks guarded by slub_debug static keyVlastimil Babka1-3/+3
There are few more places in SLUB that could benefit from reduced overhead of the static key introduced by a previous patch: - setup_object_debug() called on each object in newly allocated slab page - setup_page_debug() called on newly allocated slab page - __free_slab() called on freed slab page Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kees Cook <[email protected]> Cc: Vijayanand Jitta <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, slub: introduce kmem_cache_debug_flags()Vlastimil Babka1-5/+16
There are few places that call kmem_cache_debug(s) (which tests if any of debug flags are enabled for a cache) immediately followed by a test for a specific flag. The compiler can probably eliminate the extra check, but we can make the code nicer by introducing kmem_cache_debug_flags() that works like kmem_cache_debug() (including the static key check) but tests for specific flag(s). The next patches will add more users. [[email protected]: change return from int to bool, per Kees. Add VM_WARN_ON_ONCE() for invalid flags, per Roman] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Jann Horn <[email protected]> Cc: Vijayanand Jitta <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, slub: introduce static key for slub_debug()Vlastimil Babka1-3/+13
One advantage of CONFIG_SLUB_DEBUG is that a generic distro kernel can be built with the option enabled, but it's inactive until simply enabled on boot, without rebuilding the kernel. With a static key, we can further eliminate the overhead of checking whether a cache has a particular debug flag enabled if we know that there are no such caches (slub_debug was not enabled during boot). We use the same mechanism also for e.g. page_owner, debug_pagealloc or kmemcg functionality. This patch introduces the static key and makes the general check for per-cache debug flags kmem_cache_debug() use it. This benefits several call sites, including (slow path but still rather frequent) __slab_free(). The next patches will add more uses. Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Jann Horn <[email protected]> Cc: Vijayanand Jitta <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, slub: make reclaim_account attribute read-onlyVlastimil Babka1-10/+1
The attribute reflects the SLAB_RECLAIM_ACCOUNT cache flag. It's not clear why this attribute was writable in the first place, as it's tied to how the cache is used by its creator, it's not a user tunable. Furthermore: - it affects slab merging, but that's not being checked while toggled - if affects whether __GFP_RECLAIMABLE flag is used to allocate page, but the runtime toggle doesn't update allocflags - it affects cache_vmstat_idx() so runtime toggling might lead to incosistency of NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE Thus make it read-only. Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Jann Horn <[email protected]> Cc: Vijayanand Jitta <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, slub: make remaining slub_debug related attributes read-onlyVlastimil Babka1-59/+3
SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be read to check if the respective debugging options are enabled for given cache. Some options, namely sanity_checks, trace, and failslab can be also enabled and disabled at runtime by writing into the files. The runtime toggling is racy. Some options disable __CMPXCHG_DOUBLE when enabled, which means that in case of concurrent allocations, some can still use __CMPXCHG_DOUBLE and some not, leading to potential corruption. The s->flags field is also not updated or checked atomically. The simplest solution is to remove the runtime toggling. The extended slub_debug boot parameter syntax introduced by earlier patch should allow to fine-tune the debugging configuration during boot with same granularity. Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Jann Horn <[email protected]> Cc: Vijayanand Jitta <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, slub: remove runtime allocation order changesVlastimil Babka1-18/+1
SLUB allows runtime changing of page allocation order by writing into the /sys/kernel/slab/<cache>/order file. Jann has reported [1] that this interface allows the order to be set too small, leading to crashes. While it's possible to fix the immediate issue, closer inspection reveals potential races. Storing the new order calls calculate_sizes() which non-atomically updates a lot of kmem_cache fields while the cache is still in use. Unexpected behavior might occur even if the fields are set to the same value as they were. This could be fixed by splitting out the part of calculate_sizes() that depends on forced_order, so that we only update kmem_cache.oo field. This could still race with init_cache_random_seq(), shuffle_freelist(), allocate_slab(). Perhaps it's possible to audit and e.g. add some READ_ONCE/WRITE_ONCE accesses, it might be easier just to remove the runtime order changes, which is what this patch does. If there are valid usecases for per-cache order setting, we could e.g. extend the boot parameters to do that. [1] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com Reported-by: Jann Horn <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Vijayanand Jitta <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, slub: make some slub_debug related attributes read-onlyVlastimil Babka1-43/+3
SLUB_DEBUG creates several files under /sys/kernel/slab/<cache>/ that can be read to check if the respective debugging options are enabled for given cache. The options can be also toggled at runtime by writing into the files. Some of those, namely red_zone, poison, and store_user can be toggled only when no objects yet exist in the cache. Vijayanand reports [1] that there is a problem with freelist randomization if changing the debugging option's state results in different number of objects per page, and the random sequence cache needs thus needs to be recomputed. However, another problem is that the check for "no objects yet exist in the cache" is racy, as noted by Jann [2] and fixing that would add overhead or otherwise complicate the allocation/freeing paths. Thus it would be much simpler just to remove the runtime toggling support. The documentation describes it's "In case you forgot to enable debugging on the kernel command line", but the neccessity of having no objects limits its usefulness anyway for many caches. Vijayanand describes an use case [3] where debugging is enabled for all but zram caches for memory overhead reasons, and using the runtime toggles was the only way to achieve such configuration. After the previous patch it's now possible to do that directly from the kernel boot option, so we can remove the dangerous runtime toggles by making the /sys attribute files read-only. While updating it, also improve the documentation of the debugging /sys files. [1] https://lkml.kernel.org/r/[email protected] [2] https://lore.kernel.org/r/CAG48ez31PP--h6_FzVyfJ4H86QYczAFPdxtJHUEEan+7VJETAQ@mail.gmail.com [3] https://lore.kernel.org/r/[email protected] Reported-by: Vijayanand Jitta <[email protected]> Reported-by: Jann Horn <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, slub: extend slub_debug syntax for multiple blocksVlastimil Babka1-51/+126
Patch series "slub_debug fixes and improvements". The slub_debug kernel boot parameter can either apply a single set of options to all caches or a list of caches. There is a use case where debugging is applied for all caches and then disabled at runtime for specific caches, for performance and memory consumption reasons [1]. As runtime changes are dangerous, extend the boot parameter syntax so that multiple blocks of either global or slab-specific options can be specified, with blocks delimited by ';'. This will also support the use case of [1] without runtime changes. For details see the updated Documentation/vm/slub.rst [1] https://lore.kernel.org/r/[email protected] [[email protected]: make parse_slub_debug_flags() static] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Jann Horn <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vijayanand Jitta <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/slab.c: update outdated kmem_list3 in a commentXiao Yang1-1/+1
kmem_list3 has been renamed to kmem_cache_node long long ago so update it. References: 6744f087ba2a ("slab: Common name for the per node structures") ce8eb6c424c7 ("slab: Rename list3/l3 to node") Signed-off-by: Xiao Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, slab: check GFP_SLAB_BUG_MASK before alloc_pages in kmalloc_orderLong Li4-14/+23
kmalloc cannot allocate memory from HIGHMEM. Allocating large amounts of memory currently bypasses the check and will simply leak the memory when page_address() returns NULL. To fix this, factor the GFP_SLAB_BUG_MASK check out of slab & slub, and call it from kmalloc_order() as well. In order to make the code clear, the warning message is put in one place. Signed-off-by: Long Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/20200704035027.GA62481@lilong Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/slab: add naive detection of double freeKees Cook1-2/+12
Similar to commit ce6fa91b9363 ("mm/slub.c: add a naive detection of double free or corruption"), add a very cheap double-free check for SLAB under CONFIG_SLAB_FREELIST_HARDENED. With this added, the "SLAB_FREE_DOUBLE" LKDTM test passes under SLAB: lkdtm: Performing direct entry SLAB_FREE_DOUBLE lkdtm: Attempting double slab free ... ------------[ cut here ]------------ WARNING: CPU: 2 PID: 2193 at mm/slab.c:757 ___cache _free+0x325/0x390 [[email protected]: fix misplaced __free_one()] Link: http://lkml.kernel.org/r/202006261306.0D82A2B@keescook Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Randy Dunlap <[email protected]> [build tested] Cc: Roman Gushchin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Alexander Popov <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Vinayak Menon <[email protected]> Cc: Matthew Garrett <[email protected]> Cc: Jann Horn <[email protected]> Cc: Vijayanand Jitta <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm: ksize() should silently accept a NULL pointerWilliam Kucharski1-9/+5
Other mm routines such as kfree() and kzfree() silently do the right thing if passed a NULL pointer, so ksize() should do the same. Signed-off-by: William Kucharski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm, treewide: rename kzfree() to kfree_sensitive()Waiman Long1-4/+4
As said by Linus: A symmetric naming is only helpful if it implies symmetries in use. Otherwise it's actively misleading. In "kzalloc()", the z is meaningful and an important part of what the caller wants. In "kzfree()", the z is actively detrimental, because maybe in the future we really _might_ want to use that "memfill(0xdeadbeef)" or something. The "zero" part of the interface isn't even _relevant_. The main reason that kzfree() exists is to clear sensitive information that should not be leaked to other future users of the same memory objects. Rename kzfree() to kfree_sensitive() to follow the example of the recently added kvfree_sensitive() and make the intention of the API more explicit. In addition, memzero_explicit() is used to clear the memory to make sure that it won't get optimized away by the compiler. The renaming is done by using the command sequence: git grep -w --name-only kzfree |\ xargs sed -i 's/kzfree/kfree_sensitive/' followed by some editing of the kfree_sensitive() kerneldoc and adding a kzfree backward compatibility macro in slab.h. [[email protected]: fs/crypto/inline_crypt.c needs linux/slab.h] [[email protected]: fix fs/crypto/inline_crypt.c some more] Suggested-by: Joe Perches <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: David Howells <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: James Morris <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: Joe Perches <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: "Jason A . Donenfeld" <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/shuffle: don't move pages between zones and don't read garbage memmapsDavid Hildenbrand1-9/+9
Especially with memory hotplug, we can have offline sections (with a garbage memmap) and overlapping zones. We have to make sure to only touch initialized memmaps (online sections managed by the buddy) and that the zone matches, to not move pages between zones. To test if this can actually happen, I added a simple BUG_ON(page_zone(page_i) != page_zone(page_j)); right before the swap. When hotplugging a 256M DIMM to a 4G x86-64 VM and onlining the first memory block "online_movable" and the second memory block "online_kernel", it will trigger the BUG, as both zones (NORMAL and MOVABLE) overlap. This might result in all kinds of weird situations (e.g., double allocations, list corruptions, unmovable allocations ending up in the movable zone). Fixes: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization") Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Wei Yang <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Dan Williams <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Huang Ying <[email protected]> Cc: Wei Yang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: <[email protected]> [5.2+] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm/migrate: fix migrate_pgmap_owner w/o CONFIG_MMU_NOTIFIERRalph Campbell1-3/+3
On x86_64, when CONFIG_MMU_NOTIFIER is not set/enabled, there is a compiler error: mm/migrate.c: In function 'migrate_vma_collect': mm/migrate.c:2481:7: error: 'struct mmu_notifier_range' has no member named 'migrate_pgmap_owner' range.migrate_pgmap_owner = migrate->pgmap_owner; ^ Fixes: 998427b3ad2c ("mm/notifier: add migration invalidation type") Reported-by: Randy Dunlap <[email protected]> Signed-off-by: Ralph Campbell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Randy Dunlap <[email protected]> Acked-by: Randy Dunlap <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Hubbard <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "Jason Gunthorpe" <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds2-3/+7
Pull networking updates from David Miller: 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan. 2) Support UDP segmentation in code TSO code, from Eric Dumazet. 3) Allow flashing different flash images in cxgb4 driver, from Vishal Kulkarni. 4) Add drop frames counter and flow status to tc flower offloading, from Po Liu. 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni. 6) Various new indirect call avoidance, from Eric Dumazet and Brian Vazquez. 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from Yonghong Song. 8) Support querying and setting hardware address of a port function via devlink, use this in mlx5, from Parav Pandit. 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson. 10) Switch qca8k driver over to phylink, from Jonathan McDowell. 11) In bpftool, show list of processes holding BPF FD references to maps, programs, links, and btf objects. From Andrii Nakryiko. 12) Several conversions over to generic power management, from Vaibhav Gupta. 13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry Yakunin. 14) Various https url conversions, from Alexander A. Klimov. 15) Timestamping and PHC support for mscc PHY driver, from Antoine Tenart. 16) Support bpf iterating over tcp and udp sockets, from Yonghong Song. 17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov. 18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan. 19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several drivers. From Luc Van Oostenryck. 20) XDP support for xen-netfront, from Denis Kirjanov. 21) Support receive buffer autotuning in MPTCP, from Florian Westphal. 22) Support EF100 chip in sfc driver, from Edward Cree. 23) Add XDP support to mvpp2 driver, from Matteo Croce. 24) Support MPTCP in sock_diag, from Paolo Abeni. 25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic infrastructure, from Jakub Kicinski. 26) Several pci_ --> dma_ API conversions, from Christophe JAILLET. 27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel. 28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki. 29) Refactor a lot of networking socket option handling code in order to avoid set_fs() calls, from Christoph Hellwig. 30) Add rfc4884 support to icmp code, from Willem de Bruijn. 31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei. 32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin. 33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin. 34) Support TCP syncookies in MPTCP, from Flowian Westphal. 35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano Brivio. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits) net: thunderx: initialize VF's mailbox mutex before first usage usb: hso: remove bogus check for EINPROGRESS usb: hso: no complaint about kmalloc failure hso: fix bailout in error case of probe ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM selftests/net: relax cpu affinity requirement in msg_zerocopy test mptcp: be careful on subflow creation selftests: rtnetlink: make kci_test_encap() return sub-test result selftests: rtnetlink: correct the final return value for the test net: dsa: sja1105: use detected device id instead of DT one on mismatch tipc: set ub->ifindex for local ipv6 address ipv6: add ipv6_dev_find() net: openvswitch: silence suspicious RCU usage warning Revert "vxlan: fix tos value before xmit" ptp: only allow phase values lower than 1 period farsync: switch from 'pci_' to 'dma_' API wan: wanxl: switch from 'pci_' to 'dma_' API hv_netvsc: do not use VF device if link is down dpaa2-eth: Fix passing zero to 'PTR_ERR' warning net: macb: Properly handle phylink on at91sam9x ...
2020-08-05Merge tag 'for-linus-hmm' of ↵Linus Torvalds2-6/+24
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "Ralph has been working on nouveau's use of hmm_range_fault() and migrate_vma() which resulted in this small series. It adds reporting of the page table order from hmm_range_fault() and some optimization of migrate_vma(): - Report the size of the page table mapping out of hmm_range_fault(). This makes it easier to establish a large/huge/etc mapping in the device's page table. - Allow devices to ignore the invalidations during migration in cases where the migration is not going to change pages. For instance migrating pages to a device does not require the device to invalidate pages already in the device. - Update nouveau and hmm_tests to use the above" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: mm/hmm/test: use the new migration invalidation nouveau/svm: use the new migration invalidation mm/notifier: add migration invalidation type mm/migrate: add a flags parameter to migrate_vma nouveau: fix storing invalid ptes nouveau/hmm: support mapping large sysmem pages nouveau: fix mapping 2MB sysmem pages nouveau/hmm: fault one page at a time mm/hmm: add tests for hmm_pfn_to_map_order() mm/hmm: provide the page mapping order in hmm_range_fault()
2020-08-04Merge tag 'docs-5.9' of git://git.lwn.net/linuxLinus Torvalds2-2/+2
Pull documentation updates from Jonathan Corbet: "It's been a busy cycle for documentation - hopefully the busiest for a while to come. Changes include: - Some new Chinese translations - Progress on the battle against double words words and non-HTTPS URLs - Some block-mq documentation - More RST conversions from Mauro. At this point, that task is essentially complete, so we shouldn't see this kind of churn again for a while. Unless we decide to switch to asciidoc or something...:) - Lots of typo fixes, warning fixes, and more" * tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits) scripts/kernel-doc: optionally treat warnings as errors docs: ia64: correct typo mailmap: add entry for <[email protected]> doc/zh_CN: add cpu-load Chinese version Documentation/admin-guide: tainted-kernels: fix spelling mistake MAINTAINERS: adjust kprobes.rst entry to new location devices.txt: document rfkill allocation PCI: correct flag name docs: filesystems: vfs: correct flag name docs: filesystems: vfs: correct sync_mode flag names docs: path-lookup: markup fixes for emphasis docs: path-lookup: more markup fixes docs: path-lookup: fix HTML entity mojibake CREDITS: Replace HTTP links with HTTPS ones docs: process: Add an example for creating a fixes tag doc/zh_CN: add Chinese translation prefer section doc/zh_CN: add clearing-warn-once Chinese version doc/zh_CN: add admin-guide index doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label futex: MAINTAINERS: Re-add selftests directory ...
2020-08-04Merge tag 'uninit-macro-v5.9-rc1' of ↵Linus Torvalds10-13/+13
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()
2020-08-03Merge tag 'core-rcu-2020-08-03' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: - kfree_rcu updates - RCU tasks updates - Read-side scalability tests - SRCU updates - Torture-test updates - Documentation updates - Miscellaneous fixes * tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (109 commits) torture: Remove obsolete "cd $KVM" torture: Avoid duplicate specification of qemu command torture: Dump ftrace at shutdown only if requested torture: Add kvm-tranform.sh script for qemu-cmd files torture: Add more tracing crib notes to kvm.sh torture: Improve diagnostic for KCSAN-incapable compilers torture: Correctly summarize build-only runs torture: Pass --kmake-arg to all make invocations rcutorture: Check for unwatched readers torture: Abstract out console-log error detection torture: Add a stop-run capability torture: Create qemu-cmd in --buildonly runs rcu/rcutorture: Replace 0 with false torture: Add --allcpus argument to the kvm.sh script torture: Remove whitespace from identify_qemu_vcpus output rcutorture: NULL rcu_torture_current earlier in cleanup code rcutorture: Handle non-statistic bang-string error messages torture: Set configfile variable to current scenario rcutorture: Add races with task-exit processing locktorture: Use true and false to assign to bool variables ...
2020-08-03Merge tag 'arm64-upstream' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 and cross-arch updates from Catalin Marinas: "Here's a slightly wider-spread set of updates for 5.9. Going outside the usual arch/arm64/ area is the removal of read_barrier_depends() series from Will and the MSI/IOMMU ID translation series from Lorenzo. The notable arm64 updates include ARMv8.4 TLBI range operations and translation level hint, time namespace support, and perf. Summary: - Removal of the tremendously unpopular read_barrier_depends() barrier, which is a NOP on all architectures apart from Alpha, in favour of allowing architectures to override READ_ONCE() and do whatever dance they need to do to ensure address dependencies provide LOAD -> LOAD/STORE ordering. This work also offers a potential solution if compilers are shown to convert LOAD -> LOAD address dependencies into control dependencies (e.g. under LTO), as weakly ordered architectures will effectively be able to upgrade READ_ONCE() to smp_load_acquire(). The latter case is not used yet, but will be discussed further at LPC. - Make the MSI/IOMMU input/output ID translation PCI agnostic, augment the MSI/IOMMU ACPI/OF ID mapping APIs to accept an input ID bus-specific parameter and apply the resulting changes to the device ID space provided by the Freescale FSL bus. - arm64 support for TLBI range operations and translation table level hints (part of the ARMv8.4 architecture version). - Time namespace support for arm64. - Export the virtual and physical address sizes in vmcoreinfo for makedumpfile and crash utilities. - CPU feature handling cleanups and checks for programmer errors (overlapping bit-fields). - ACPI updates for arm64: disallow AML accesses to EFI code regions and kernel memory. - perf updates for arm64. - Miscellaneous fixes and cleanups, most notably PLT counting optimisation for module loading, recordmcount fix to ignore relocations other than R_AARCH64_CALL26, CMA areas reserved for gigantic pages on 16K and 64K configurations. - Trivial typos, duplicate words" Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (82 commits) arm64: use IRQ_STACK_SIZE instead of THREAD_SIZE for irq stack arm64/mm: save memory access in check_and_switch_context() fast switch path arm64: sigcontext.h: delete duplicated word arm64: ptrace.h: delete duplicated word arm64: pgtable-hwdef.h: delete duplicated words bus: fsl-mc: Add ACPI support for fsl-mc bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver of/irq: Make of_msi_map_rid() PCI bus agnostic of/irq: make of_msi_map_get_device_domain() bus agnostic dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus of/device: Add input id to of_dma_configure() of/iommu: Make of_map_rid() PCI agnostic ACPI/IORT: Add an input ID to acpi_dma_configure() ACPI/IORT: Remove useless PCI bus walk ACPI/IORT: Make iort_msi_map_rid() PCI agnostic ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC arm64: enable time namespace support arm64/vdso: Restrict splitting VVAR VMA arm64/vdso: Handle faults on timens page ...
2020-08-03Merge tag 's390-5.9-1' of ↵Linus Torvalds1-28/+29
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Add support for function error injection. - Add support for custom exception handlers, as required by BPF_PROBE_MEM. - Add support for BPF_PROBE_MEM. - Add trace events for idle enter / exit for the s390 specific idle implementation. - Remove unused zcore memmmap device. - Remove unused "raw view" from s390 debug feature. - AP bus + zcrypt device driver code refactoring. - Provide cex4 cca sysfs attributes for cex3 for zcrypt device driver. - Expose only minimal interface to walk physmem for mm/memblock. This is a common code change and it has been agreed on with Mike Rapoport and Andrew Morton that this can go upstream via the s390 tree. - Rework of the s390 vmem/vmmemap code to allow for future memory hot remove. - Get rid of FORCE_MAX_ZONEORDER to finally allow for order-10 allocations again, instead of only order-8 allocations. - Various small improvements and fixes. * tag 's390-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits) s390/vmemmap: coding style updates s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections s390/vmemmap: remember unused sub-pmd ranges s390/vmemmap: fallback to PTEs if mapping large PMD fails s390/vmem: cleanup empty page tables s390/vmemmap: take the vmem_mutex when populating/freeing s390/vmemmap: cleanup when vmemmap_populate() fails s390/vmemmap: extend modify_pagetable() to handle vmemmap s390/vmem: consolidate vmem_add_range() and vmem_remove_range() s390/vmem: rename vmem_add_mem() to vmem_add_range() s390: enable HAVE_FUNCTION_ERROR_INJECTION s390/pci: clarify comment in s390_mmio_read/write s390/time: improve comparison for tod steering s390/time: select CLOCKSOURCE_VALIDATE_LAST_CYCLE s390/time: use CLOCKSOURCE_MASK s390/bpf: implement BPF_PROBE_MEM s390/kernel: expand exception table logic to allow new handling options s390/kernel: unify EX_TABLE* implementations s390/mm: allow order 10 allocations s390/mm: avoid trimming to MAX_ORDER ...
2020-08-03Merge tag 'for-5.9/io_uring-20200802' of git://git.kernel.dk/linux-blockLinus Torvalds1-27/+64
Pull io_uring updates from Jens Axboe: "Lots of cleanups in here, hardening the code and/or making it easier to read and fixing bugs, but a core feature/change too adding support for real async buffered reads. With the latter in place, we just need buffered write async support and we're done relying on kthreads for the fast path. In detail: - Cleanup how memory accounting is done on ring setup/free (Bijan) - sq array offset calculation fixup (Dmitry) - Consistently handle blocking off O_DIRECT submission path (me) - Support proper async buffered reads, instead of relying on kthread offload for that. This uses the page waitqueue to drive retries from task_work, like we handle poll based retry. (me) - IO completion optimizations (me) - Fix race with accounting and ring fd install (me) - Support EPOLLEXCLUSIVE (Jiufei) - Get rid of the io_kiocb unionizing, made possible by shrinking other bits (Pavel) - Completion side cleanups (Pavel) - Cleanup REQ_F_ flags handling, and kill off many of them (Pavel) - Request environment grabbing cleanups (Pavel) - File and socket read/write cleanups (Pavel) - Improve kiocb_set_rw_flags() (Pavel) - Tons of fixes and cleanups (Pavel) - IORING_SQ_NEED_WAKEUP clear fix (Xiaoguang)" * tag 'for-5.9/io_uring-20200802' of git://git.kernel.dk/linux-block: (127 commits) io_uring: flip if handling after io_setup_async_rw fs: optimise kiocb_set_rw_flags() io_uring: don't touch 'ctx' after installing file descriptor io_uring: get rid of atomic FAA for cq_timeouts io_uring: consolidate *_check_overflow accounting io_uring: fix stalled deferred requests io_uring: fix racy overflow count reporting io_uring: deduplicate __io_complete_rw() io_uring: de-unionise io_kiocb io-wq: update hash bits io_uring: fix missing io_queue_linked_timeout() io_uring: mark ->work uninitialised after cleanup io_uring: deduplicate io_grab_files() calls io_uring: don't do opcode prep twice io_uring: clear IORING_SQ_NEED_WAKEUP after executing task works io_uring: batch put_task_struct() tasks: add put_task_struct_many() io_uring: return locked and pinned page accounting io_uring: don't miscount pinned memory io_uring: don't open-code recv kbuf managment ...
2020-08-03Merge tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-blockLinus Torvalds3-145/+31
Pull core block updates from Jens Axboe: "Good amount of cleanups and tech debt removals in here, and as a result, the diffstat shows a nice net reduction in code. - Softirq completion cleanups (Christoph) - Stop using ->queuedata (Christoph) - Cleanup bd claiming (Christoph) - Use check_events, moving away from the legacy media change (Christoph) - Use inode i_blkbits consistently (Christoph) - Remove old unused writeback congestion bits (Christoph) - Cleanup/unify submission path (Christoph) - Use bio_uninit consistently, instead of bio_disassociate_blkg (Christoph) - sbitmap cleared bits handling (John) - Request merging blktrace event addition (Jan) - sysfs add/remove race fixes (Luis) - blk-mq tag fixes/optimizations (Ming) - Duplicate words in comments (Randy) - Flush deferral cleanup (Yufen) - IO context locking/retry fixes (John) - struct_size() usage (Gustavo) - blk-iocost fixes (Chengming) - blk-cgroup IO stats fixes (Boris) - Various little fixes" * tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits) block: blk-timeout: delete duplicated word block: blk-mq-sched: delete duplicated word block: blk-mq: delete duplicated word block: genhd: delete duplicated words block: elevator: delete duplicated word and fix typos block: bio: delete duplicated words block: bfq-iosched: fix duplicated word iocost_monitor: start from the oldest usage index iocost: Fix check condition of iocg abs_vdebt block: Remove callback typedefs for blk_mq_ops block: Use non _rcu version of list functions for tag_set_list blk-cgroup: show global disk stats in root cgroup io.stat blk-cgroup: make iostat functions visible to stat printing block: improve discard bio alignment in __blkdev_issue_discard() block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers block: defer flush request no matter whether we have elevator block: make blk_timeout_init() static block: remove retry loop in ioc_release_fn() block: remove unnecessary ioc nested locking block: integrate bd_start_claiming into __blkdev_get ...
2020-08-02list: add "list_del_init_careful()" to go with "list_empty_careful()"Linus Torvalds1-6/+1
That gives us ordering guarantees around the pair. Signed-off-by: Linus Torvalds <[email protected]>
2020-08-02mm: rewrite wait_on_page_bit_common() logicLinus Torvalds1-47/+85
It turns out that wait_on_page_bit_common() had several problems, ranging from just unfair behavioe due to re-queueing at the end of the wait queue when re-trying, and an outright bug that could result in missed wakeups (but probably never happened in practice). This rewrites the whole logic to avoid both issues, by simply moving the logic to check (and possibly take) the bit lock into the wakeup path instead. That makes everything much more straightforward, and means that we never need to re-queue the wait entry: if we get woken up, we'll be notified through WQ_FLAG_WOKEN, and the wait queue entry will have been removed, and everything will have been done for us. Link: https://lore.kernel.org/lkml/CAHk-=wjJA2Z3kUFb-5s=6+n0qbTs8ELqKFt9B3pH85a8fGD73w@mail.gmail.com/ Link: https://lore.kernel.org/lkml/[email protected]/ Reported-by: Oleg Nesterov <[email protected]> Reported-by: Hugh Dickins <[email protected]> Cc: Michal Hocko <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-07-31Merge branch 'for-mingo' of ↵Ingo Molnar2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull the v5.9 RCU bits from Paul E. McKenney: - Documentation updates - Miscellaneous fixes - kfree_rcu updates - RCU tasks updates - Read-side scalability tests - SRCU updates - Torture-test updates Signed-off-by: Ingo Molnar <[email protected]>
2020-07-28mm/notifier: add migration invalidation typeRalph Campbell1-1/+7
Currently migrate_vma_setup() calls mmu_notifier_invalidate_range_start() which flushes all device private page mappings whether or not a page is being migrated to/from device private memory. In order to not disrupt device mappings that are not being migrated, shift the responsibility for clearing device private mappings to the device driver and leave CPU page table unmapping handled by migrate_vma_setup(). To support this, the caller of migrate_vma_setup() should always set struct migrate_vma::pgmap_owner to a non NULL value that matches the device private page->pgmap->owner. This value is then passed to the struct mmu_notifier_range with a new event type which the driver's invalidation function can use to avoid device MMU invalidations. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ralph Campbell <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-07-28mm/migrate: add a flags parameter to migrate_vmaRalph Campbell1-2/+4
The src_owner field in struct migrate_vma is being used for two purposes, it acts as a selection filter for which types of pages are to be migrated and it identifies device private pages owned by the caller. Split this into separate parameters so the src_owner field can be used just to identify device private pages owned by the caller of migrate_vma_setup(). Rename the src_owner field to pgmap_owner to reflect it is now used only to identify which device private pages to migrate. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ralph Campbell <[email protected]> Reviewed-by: Bharata B Rao <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-07-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller8-21/+88
The UDP reuseport conflict was a little bit tricky. The net-next code, via bpf-next, extracted the reuseport handling into a helper so that the BPF sk lookup code could invoke it. At the same time, the logic for reuseport handling of unconnected sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace which changed the logic to carry on the reuseport result into the rest of the lookup loop if we do not return immediately. This requires moving the reuseport_has_conns() logic into the callers. While we are here, get rid of inline directives as they do not belong in foo.c files. The other changes were cases of more straightforward overlapping modifications. Signed-off-by: David S. Miller <[email protected]>
2020-07-24khugepaged: fix null-pointer dereference due to raceKirill A. Shutemov1-0/+3
khugepaged has to drop mmap lock several times while collapsing a page. The situation can change while the lock is dropped and we need to re-validate that the VMA is still in place and the PMD is still subject for collapse. But we miss one corner case: while collapsing an anonymous pages the VMA could be replaced with file VMA. If the file VMA doesn't have any private pages we get NULL pointer dereference: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] anon_vma_lock_write include/linux/rmap.h:120 [inline] collapse_huge_page mm/khugepaged.c:1110 [inline] khugepaged_scan_pmd mm/khugepaged.c:1349 [inline] khugepaged_scan_mm_slot mm/khugepaged.c:2110 [inline] khugepaged_do_scan mm/khugepaged.c:2193 [inline] khugepaged+0x3bba/0x5a10 mm/khugepaged.c:2238 The fix is to make sure that the VMA is anonymous in hugepage_vma_revalidate(). The helper is only used for collapsing anonymous pages. Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Reported-by: [email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Yang Shi <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-07-24mm/hugetlb: avoid hardcoding while checking if cma is enabledBarry Song1-5/+10
hugetlb_cma[0] can be NULL due to various reasons, for example, node0 has no memory. so NULL hugetlb_cma[0] doesn't necessarily mean cma is not enabled. gigantic pages might have been reserved on other nodes. This patch fixes possible double reservation and CMA leak. [[email protected]: fix CONFIG_CMA=n warning] [[email protected]: better checks before using hugetlb_cma] Link: http://lkml.kernel.org/r/[email protected] Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") Signed-off-by: Barry Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-07-24mm: memcg/slab: fix memory leak at non-root kmem_cache destroyMuchun Song1-7/+28
If the kmem_cache refcount is greater than one, we should not mark the root kmem_cache as dying. If we mark the root kmem_cache dying incorrectly, the non-root kmem_cache can never be destroyed. It resulted in memory leak when memcg was destroyed. We can use the following steps to reproduce. 1) Use kmem_cache_create() to create a new kmem_cache named A. 2) Coincidentally, the kmem_cache A is an alias for kmem_cache B, so the refcount of B is just increased. 3) Use kmem_cache_destroy() to destroy the kmem_cache A, just decrease the B's refcount but mark the B as dying. 4) Create a new memory cgroup and alloc memory from the kmem_cache B. It leads to create a non-root kmem_cache for allocating memory. 5) When destroy the memory cgroup created in the step 4), the non-root kmem_cache can never be destroyed. If we repeat steps 4) and 5), this will cause a lot of memory leak. So only when refcount reach zero, we mark the root kmem_cache as dying. Fixes: 92ee383f6daa ("mm: fix race between kmem_cache destroy, create and deactivate") Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>