aboutsummaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2021-06-30mm/thp: make ARCH_ENABLE_SPLIT_PMD_PTLOCK dependent on PGTABLE_LEVELS > 2Anshuman Khandual2-2/+2
ARCH_ENABLE_SPLIT_PMD_PTLOCK is irrelevant unless there are more than two page table levels including PMD (also per Documentation/vm/split_page_table_lock.rst). Make this dependency explicit on remaining platforms i.e x86 and s390 where ARCH_ENABLE_SPLIT_PMD_PTLOCK is subscribed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Gerald Schaefer <[email protected]> # s390 Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30arm64/mm: drop HAVE_ARCH_PFN_VALIDAnshuman Khandual3-39/+0
CONFIG_SPARSEMEM_VMEMMAP is now the only available memory model on arm64 platforms and free_unused_memmap() would just return without creating any holes in the memmap mapping. There is no need for any special handling in pfn_valid() and HAVE_ARCH_PFN_VALID can just be dropped. This also moves the pfn upper bits sanity check into generic pfn_valid(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Mike Rapoport <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30arm64: drop pfn_valid_within() and simplify pfn_valid()Mike Rapoport2-2/+1
The arm64's version of pfn_valid() differs from the generic because of two reasons: * Parts of the memory map are freed during boot. This makes it necessary to verify that there is actual physical memory that corresponds to a pfn which is done by querying memblock. * There are NOMAP memory regions. These regions are not mapped in the linear map and until the previous commit the struct pages representing these areas had default values. As the consequence of absence of the special treatment of NOMAP regions in the memory map it was necessary to use memblock_is_map_memory() in pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that generic mm functionality would not treat a NOMAP page as a normal page. Since the NOMAP regions are now marked as PageReserved(), pfn walkers and the rest of core mm will treat them as unusable memory and thus pfn_valid_within() is no longer required at all and can be disabled on arm64. pfn_valid() can be slightly simplified by replacing memblock_is_map_memory() with memblock_is_memory(). [[email protected]: fix merge fix] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Reviewed-by: Kefeng Wang <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30arm64: decouple check whether pfn is in linear map from pfn_valid()Mike Rapoport6-6/+19
The intended semantics of pfn_valid() is to verify whether there is a struct page for the pfn in question and nothing else. Yet, on arm64 it is used to distinguish memory areas that are mapped in the linear map vs those that require ioremap() to access them. Introduce a dedicated pfn_is_map_memory() wrapper for memblock_is_map_memory() to perform such check and use it where appropriate. Using a wrapper allows to avoid cyclic include dependencies. While here also update style of pfn_valid() so that both pfn_valid() and pfn_is_map_memory() declarations will be consistent. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Reviewed-by: Kefeng Wang <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30mm/kconfig: move HOLES_IN_ZONE into mmKefeng Wang3-9/+1
commit a55749639dc1 ("ia64: drop marked broken DISCONTIGMEM and VIRTUAL_MEM_MAP") drop VIRTUAL_MEM_MAP, so there is no need HOLES_IN_ZONE on ia64. Also move HOLES_IN_ZONE into mm/Kconfig, select it if architecture needs this feature. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Acked-by: Catalin Marinas <[email protected]> [arm64] Cc: Will Deacon <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30mm: sparsemem: use huge PMD mapping for vmemmap pagesMuchun Song1-6/+2
The preparation of splitting huge PMD mapping of vmemmap pages is ready, so switch the mapping from PTE to PMD. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Chen Huang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Xiongchun Duan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30powerpc/8xx: add support for huge pages on VMAP and VMALLOCChristophe Leroy2-1/+44
powerpc 8xx has 4 page sizes: - 4k - 16k - 512k - 8M At the time being, vmalloc and vmap only support huge pages which are leaf at PMD level. Here the PMD level is 4M, it doesn't correspond to any supported page size. For now, implement use of 16k and 512k pages which is done at PTE level. Support of 8M pages will be implemented later, it requires vmalloc to support hugepd tables. Link: https://lkml.kernel.org/r/8b972f1c03fb6bd59953035f0a3e4d26659de4f8.1620795204.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Uladzislau Rezki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30mm/pgtable: add stubs for {pmd/pub}_{set/clear}_hugeChristophe Leroy2-23/+31
For architectures with no PMD and/or no PUD, add stubs similar to what we have for architectures without P4D. [[email protected]: arm64: define only {pud/pmd}_{set/clear}_huge when useful] Link: https://lkml.kernel.org/r/73ec95f40cafbbb69bdfb43a7f53876fd845b0ce.1620990479.git.christophe.leroy@csgroup.eu [[email protected]: x86: define only {pud/pmd}_{set/clear}_huge when useful] Link: https://lkml.kernel.org/r/7fbf1b6bc3e15c07c24fa45278d57064f14c896b.1620930415.git.christophe.leroy@csgroup.eu Link: https://lkml.kernel.org/r/5ac5976419350e8e048d463a64cae449eb3ba4b0.1620795204.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Uladzislau Rezki <[email protected]> Cc: Naresh Kamboju <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30mm/hugetlb: change parameters of arch_make_huge_pte()Christophe Leroy5-14/+8
Patch series "Subject: [PATCH v2 0/5] Implement huge VMAP and VMALLOC on powerpc 8xx", v2. This series implements huge VMAP and VMALLOC on powerpc 8xx. Powerpc 8xx has 4 page sizes: - 4k - 16k - 512k - 8M At the time being, vmalloc and vmap only support huge pages which are leaf at PMD level. Here the PMD level is 4M, it doesn't correspond to any supported page size. For now, implement use of 16k and 512k pages which is done at PTE level. Support of 8M pages will be implemented later, it requires use of hugepd tables. To allow this, the architecture provides two functions: - arch_vmap_pte_range_map_size() which tells vmap_pte_range() what page size to use. A stub returning PAGE_SIZE is provided when the architecture doesn't provide this function. - arch_vmap_pte_supported_shift() which tells __vmalloc_node_range() what page shift to use for a given area size. A stub returning PAGE_SHIFT is provided when the architecture doesn't provide this function. This patch (of 5): At the time being, arch_make_huge_pte() has the following prototype: pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma, struct page *page, int writable); vma is used to get the pages shift or size. vma is also used on Sparc to get vm_flags. page is not used. writable is not used. In order to use this function without a vma, replace vma by shift and flags. Also remove the used parameters. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/f4633ac6a7da2f22f31a04a89e0a7026bb78b15b.1620795204.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <[email protected]> Acked-by: Mike Kravetz <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Uladzislau Rezki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30mm: hugetlb: add a kernel parameter hugetlb_free_vmemmapMuchun Song1-2/+6
Add a kernel parameter hugetlb_free_vmemmap to enable the feature of freeing unused vmemmap pages associated with each hugetlb page on boot. We disable PMD mapping of vmemmap pages for x86-64 arch when this feature is enabled. Because vmemmap_remap_free() depends on vmemmap being base page mapped. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: Barry Song <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Tested-by: Chen Huang <[email protected]> Tested-by: Bodeddula Balasubramaniam <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: HORIGUCHI NAOYA <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joao Martins <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Neukum <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Pawan Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Xiongchun Duan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAPMuchun Song1-1/+1
The option HUGETLB_PAGE_FREE_VMEMMAP allows for the freeing of some vmemmap pages associated with pre-allocated HugeTLB pages. For example, on X86_64 6 vmemmap pages of size 4KB each can be saved for each 2MB HugeTLB page. 4094 vmemmap pages of size 4KB each can be saved for each 1GB HugeTLB page. When a HugeTLB page is allocated or freed, the vmemmap array representing the range associated with the page will need to be remapped. When a page is allocated, vmemmap pages are freed after remapping. When a page is freed, previously discarded vmemmap pages must be allocated before remapping. The config option is introduced early so that supporting code can be written to depend on the option. The initial version of the code only provides support for x86-64. If config HAVE_BOOTMEM_INFO_NODE is enabled, the freeing vmemmap page code denpend on it to free vmemmap pages. Otherwise, just use free_reserved_page() to free vmemmmap pages. The routine register_page_bootmem_info() is used to register bootmem info. Therefore, make sure register_page_bootmem_info is enabled if HUGETLB_PAGE_FREE_VMEMMAP is defined. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Mike Kravetz <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Tested-by: Chen Huang <[email protected]> Tested-by: Bodeddula Balasubramaniam <[email protected]> Reviewed-by: Balbir Singh <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Barry Song <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: HORIGUCHI NAOYA <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joao Martins <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Oliver Neukum <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Pawan Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Xiongchun Duan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-30mm: memory_hotplug: factor out bootmem core functions to bootmem_info.cMuchun Song2-1/+3
Patch series "Free some vmemmap pages of HugeTLB page", v23. This patch series will free some vmemmap pages(struct page structures) associated with each HugeTLB page when preallocated to save memory. In order to reduce the difficulty of the first version of code review. In this version, we disable PMD/huge page mapping of vmemmap if this feature was enabled. This acutely eliminates a bunch of the complex code doing page table manipulation. When this patch series is solid, we cam add the code of vmemmap page table manipulation in the future. The struct page structures (page structs) are used to describe a physical page frame. By default, there is an one-to-one mapping from a page frame to it's corresponding page struct. The HugeTLB pages consist of multiple base page size pages and is supported by many architectures. See hugetlbpage.rst in the Documentation directory for more details. On the x86 architecture, HugeTLB pages of size 2MB and 1GB are currently supported. Since the base page size on x86 is 4KB, a 2MB HugeTLB page consists of 512 base pages and a 1GB HugeTLB page consists of 4096 base pages. For each base page, there is a corresponding page struct. Within the HugeTLB subsystem, only the first 4 page structs are used to contain unique information about a HugeTLB page. HUGETLB_CGROUP_MIN_ORDER provides this upper limit. The only 'useful' information in the remaining page structs is the compound_head field, and this field is the same for all tail pages. By removing redundant page structs for HugeTLB pages, memory can returned to the buddy allocator for other uses. When the system boot up, every 2M HugeTLB has 512 struct page structs which size is 8 pages(sizeof(struct page) * 512 / PAGE_SIZE). HugeTLB struct pages(8 pages) page frame(8 pages) +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ | | | 0 | -------------> | 0 | | | +-----------+ +-----------+ | | | 1 | -------------> | 1 | | | +-----------+ +-----------+ | | | 2 | -------------> | 2 | | | +-----------+ +-----------+ | | | 3 | -------------> | 3 | | | +-----------+ +-----------+ | | | 4 | -------------> | 4 | | 2MB | +-----------+ +-----------+ | | | 5 | -------------> | 5 | | | +-----------+ +-----------+ | | | 6 | -------------> | 6 | | | +-----------+ +-----------+ | | | 7 | -------------> | 7 | | | +-----------+ +-----------+ | | | | | | +-----------+ The value of page->compound_head is the same for all tail pages. The first page of page structs (page 0) associated with the HugeTLB page contains the 4 page structs necessary to describe the HugeTLB. The only use of the remaining pages of page structs (page 1 to page 7) is to point to page->compound_head. Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs will be used for each HugeTLB page. This will allow us to free the remaining 6 pages to the buddy allocator. Here is how things look after remapping. HugeTLB struct pages(8 pages) page frame(8 pages) +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+ | | | 0 | -------------> | 0 | | | +-----------+ +-----------+ | | | 1 | -------------> | 1 | | | +-----------+ +-----------+ | | | 2 | ----------------^ ^ ^ ^ ^ ^ | | +-----------+ | | | | | | | | 3 | ------------------+ | | | | | | +-----------+ | | | | | | | 4 | --------------------+ | | | | 2MB | +-----------+ | | | | | | 5 | ----------------------+ | | | | +-----------+ | | | | | 6 | ------------------------+ | | | +-----------+ | | | | 7 | --------------------------+ | | +-----------+ | | | | | | +-----------+ When a HugeTLB is freed to the buddy system, we should allocate 6 pages for vmemmap pages and restore the previous mapping relationship. Apart from 2MB HugeTLB page, we also have 1GB HugeTLB page. It is similar to the 2MB HugeTLB page. We also can use this approach to free the vmemmap pages. In this case, for the 1GB HugeTLB page, we can save 4094 pages. This is a very substantial gain. On our server, run some SPDK/QEMU applications which will use 1024GB HugeTLB page. With this feature enabled, we can save ~16GB (1G hugepage)/~12GB (2MB hugepage) memory. Because there are vmemmap page tables reconstruction on the freeing/allocating path, it increases some overhead. Here are some overhead analysis. 1) Allocating 10240 2MB HugeTLB pages. a) With this patch series applied: # time echo 10240 > /proc/sys/vm/nr_hugepages real 0m0.166s user 0m0.000s sys 0m0.166s # bpftrace -e 'kprobe:alloc_fresh_huge_page { @start[tid] = nsecs; } kretprobe:alloc_fresh_huge_page /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }' Attaching 2 probes... @latency: [8K, 16K) 5476 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [16K, 32K) 4760 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [32K, 64K) 4 | | b) Without this patch series: # time echo 10240 > /proc/sys/vm/nr_hugepages real 0m0.067s user 0m0.000s sys 0m0.067s # bpftrace -e 'kprobe:alloc_fresh_huge_page { @start[tid] = nsecs; } kretprobe:alloc_fresh_huge_page /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }' Attaching 2 probes... @latency: [4K, 8K) 10147 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [8K, 16K) 93 | | Summarize: this feature is about ~2x slower than before. 2) Freeing 10240 2MB HugeTLB pages. a) With this patch series applied: # time echo 0 > /proc/sys/vm/nr_hugepages real 0m0.213s user 0m0.000s sys 0m0.213s # bpftrace -e 'kprobe:free_pool_huge_page { @start[tid] = nsecs; } kretprobe:free_pool_huge_page /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }' Attaching 2 probes... @latency: [8K, 16K) 6 | | [16K, 32K) 10227 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [32K, 64K) 7 | | b) Without this patch series: # time echo 0 > /proc/sys/vm/nr_hugepages real 0m0.081s user 0m0.000s sys 0m0.081s # bpftrace -e 'kprobe:free_pool_huge_page { @start[tid] = nsecs; } kretprobe:free_pool_huge_page /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }' Attaching 2 probes... @latency: [4K, 8K) 6805 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [8K, 16K) 3427 |@@@@@@@@@@@@@@@@@@@@@@@@@@ | [16K, 32K) 8 | | Summary: The overhead of __free_hugepage is about ~2-3x slower than before. Although the overhead has increased, the overhead is not significant. Like Mike said, "However, remember that the majority of use cases create HugeTLB pages at or shortly after boot time and add them to the pool. So, additional overhead is at pool creation time. There is no change to 'normal run time' operations of getting a page from or returning a page to the pool (think page fault/unmap)". Despite the overhead and in addition to the memory gains from this series. The following data is obtained by Joao Martins. Very thanks to his effort. There's an additional benefit which is page (un)pinners will see an improvement and Joao presumes because there are fewer memmap pages and thus the tail/head pages are staying in cache more often. Out of the box Joao saw (when comparing linux-next against linux-next + this series) with gup_test and pinning a 16G HugeTLB file (with 1G pages): get_user_pages(): ~32k -> ~9k unpin_user_pages(): ~75k -> ~70k Usually any tight loop fetching compound_head(), or reading tail pages data (e.g. compound_head) benefit a lot. There's some unpinning inefficiencies Joao was fixing[2], but with that in added it shows even more: unpin_user_pages(): ~27k -> ~3.8k [1] https://lore.kernel.org/linux-mm/[email protected]/ [2] https://lore.kernel.org/linux-mm/[email protected]/ This patch (of 9): Move bootmem info registration common API to individual bootmem_info.c. And we will use {get,put}_page_bootmem() to initialize the page for the vmemmap pages or free the vmemmap pages to buddy in the later patch. So move them out of CONFIG_MEMORY_HOTPLUG_SPARSE. This is just code movement without any functional change. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Acked-by: Mike Kravetz <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Tested-by: Chen Huang <[email protected]> Tested-by: Bodeddula Balasubramaniam <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: [email protected] Cc: "H. Peter Anvin" <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Pawan Gupta <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Oliver Neukum <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Mina Almasry <[email protected]> Cc: David Rientjes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Barry Song <[email protected]> Cc: HORIGUCHI NAOYA <[email protected]> Cc: Joao Martins <[email protected]> Cc: Xiongchun Duan <[email protected]> Cc: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-01ARM: dts: aspeed: Update e3c246d4i vuart propertiesZev Weiss1-1/+3
This device-tree was merged with a provisional vuart IRQ-polarity property that was still under review and ended up taking a somewhat different form. This patch updates it to match the final form of the new vuart properties, which additionally allow specifying the SIRQ number and LPC address. Signed-off-by: Zev Weiss <[email protected]> Reviewed-by: Andrew Jeffery <[email protected]> Fixes: ca03042f0f12 ("serial: 8250_aspeed_vuart: add aspeed, lpc-io-reg and aspeed, lpc-interrupts DT properties") Reviewed-by: Joel Stanley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joel Stanley <[email protected]>
2021-06-30Merge tag 'net-next-5.14' of ↵Linus Torvalds13-135/+1212
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - BPF: - add syscall program type and libbpf support for generating instructions and bindings for in-kernel BPF loaders (BPF loaders for BPF), this is a stepping stone for signed BPF programs - infrastructure to migrate TCP child sockets from one listener to another in the same reuseport group/map to improve flexibility of service hand-off/restart - add broadcast support to XDP redirect - allow bypass of the lockless qdisc to improving performance (for pktgen: +23% with one thread, +44% with 2 threads) - add a simpler version of "DO_ONCE()" which does not require jump labels, intended for slow-path usage - virtio/vsock: introduce SOCK_SEQPACKET support - add getsocketopt to retrieve netns cookie - ip: treat lowest address of a IPv4 subnet as ordinary unicast address allowing reclaiming of precious IPv4 addresses - ipv6: use prandom_u32() for ID generation - ip: add support for more flexible field selection for hashing across multi-path routes (w/ offload to mlxsw) - icmp: add support for extended RFC 8335 PROBE (ping) - seg6: add support for SRv6 End.DT46 behavior - mptcp: - DSS checksum support (RFC 8684) to detect middlebox meddling - support Connection-time 'C' flag - time stamping support - sctp: packetization Layer Path MTU Discovery (RFC 8899) - xfrm: speed up state addition with seq set - WiFi: - hidden AP discovery on 6 GHz and other HE 6 GHz improvements - aggregation handling improvements for some drivers - minstrel improvements for no-ack frames - deferred rate control for TXQs to improve reaction times - switch from round robin to virtual time-based airtime scheduler - add trace points: - tcp checksum errors - openvswitch - action execution, upcalls - socket errors via sk_error_report Device APIs: - devlink: add rate API for hierarchical control of max egress rate of virtual devices (VFs, SFs etc.) - don't require RCU read lock to be held around BPF hooks in NAPI context - page_pool: generic buffer recycling New hardware/drivers: - mobile: - iosm: PCIe Driver for Intel M.2 Modem - support for Qualcomm MSM8998 (ipa) - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU) - NXP SJA1110 Automotive Ethernet 10-port switch - Qualcomm QCA8327 switch support (qca8k) - Mikrotik 10/25G NIC (atl1c) Driver changes: - ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP (our first foray into MAC/PHY description via ACPI) - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx - Mellanox/Nvidia NIC (mlx5) - NIC VF offload of L2 bridging - support IRQ distribution to Sub-functions - Marvell (prestera): - add flower and match all - devlink trap - link aggregation - Netronome (nfp): connection tracking offload - Intel 1GE (igc): add AF_XDP support - Marvell DPU (octeontx2): ingress ratelimit offload - Google vNIC (gve): new ring/descriptor format support - Qualcomm mobile (rmnet & ipa): inline checksum offload support - MediaTek WiFi (mt76) - mt7915 MSI support - mt7915 Tx status reporting - mt7915 thermal sensors support - mt7921 decapsulation offload - mt7921 enable runtime pm and deep sleep - Realtek WiFi (rtw88) - beacon filter support - Tx antenna path diversity support - firmware crash information via devcoredump - Qualcomm WiFi (wcn36xx) - Wake-on-WLAN support with magic packets and GTK rekeying - Micrel PHY (ksz886x/ksz8081): add cable test support" * tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits) tcp: change ICSK_CA_PRIV_SIZE definition tcp_yeah: check struct yeah size at compile time gve: DQO: Fix off by one in gve_rx_dqo() stmmac: intel: set PCI_D3hot in suspend stmmac: intel: Enable PHY WOL option in EHL net: stmmac: option to enable PHY WOL with PMT enabled net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} net: use netdev_info in ndo_dflt_fdb_{add,del} ptp: Set lookup cookie when creating a PTP PPS source. net: sock: add trace for socket errors net: sock: introduce sk_error_report net: dsa: replay the local bridge FDB entries pointing to the bridge dev too net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev net: dsa: include fdb entries pointing to bridge in the host fdb list net: dsa: include bridge addresses which are local in the host fdb list net: dsa: sync static FDB entries on foreign interfaces to hardware net: dsa: install the host MDB and FDB entries in the master's RX filter net: dsa: reference count the FDB addresses at the cross-chip notifier level net: dsa: introduce a separate cross-chip notifier type for host FDBs net: dsa: reference count the MDB entries at the cross-chip notifier level ...
2021-06-30Merge tag 'microblaze-v5.14' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds2-4/+1
Pull microblaze updates from Michal Simek: - Remove unused PAGE_UP/DOWN macros - Fix trivial spelling mistake * tag 'microblaze-v5.14' of git://git.monstr.eu/linux-2.6-microblaze: arch: microblaze: Fix spelling mistake "vesion" -> "version" microblaze: Cleanup unused functions
2021-06-30ubd: remove dead code in ubd_setup_commonChristoph Hellwig1-10/+0
Remove some leftovers of the fake major number parsing that cause complains from some compilers. Fixes: 2933a1b2c6f3 ("ubd: remove the code to register as the legacy IDE driver") Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-30ubd: use blk_mq_alloc_disk and blk_cleanup_diskChristoph Hellwig1-30/+14
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-30ubd: remove the code to register as the legacy IDE driverChristoph Hellwig1-94/+12
With the legacy IDE driver long deprecated, and modern userspace being much more flexible about dev_t assignments there is no reason to fake a registration as the legacy IDE driver in ubd. This registeration is a little problematic as it registers the same request_queue for multiple gendisks, so just remove it. Signed-off-by: Christoph Hellwig <[email protected]> Acked-By: Anton Ivanov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-06-30Merge tag 'clang-features-v5.14-rc1' of ↵Linus Torvalds6-18/+28
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull clang feature updates from Kees Cook: - Add CC_HAS_NO_PROFILE_FN_ATTR in preparation for PGO support in the face of the noinstr attribute, paving the way for PGO and fixing GCOV. (Nick Desaulniers) - x86_64 LTO coverage is expanded to 32-bit x86. (Nathan Chancellor) - Small fixes to CFI. (Mark Rutland, Nathan Chancellor) * tag 'clang-features-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR compiler_attributes.h: cleanups for GCC 4.9+ compiler_attributes.h: define __no_profile, add to noinstr x86, lto: Enable Clang LTO for 32-bit as well CFI: Move function_nocfi() into compiler.h MAINTAINERS: Add Clang CFI section
2021-06-30Merge tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-blockLinus Torvalds2-37/+12
Pull core block updates from Jens Axboe: - disk events cleanup (Christoph) - gendisk and request queue allocation simplifications (Christoph) - bdev_disk_changed cleanups (Christoph) - IO priority improvements (Bart) - Chained bio completion trace fix (Edward) - blk-wbt fixes (Jan) - blk-wbt enable/disable fix (Zhang) - Scheduler dispatch improvements (Jan, Ming) - Shared tagset scheduler improvements (John) - BFQ updates (Paolo, Luca, Pietro) - BFQ lock inversion fix (Jan) - Documentation improvements (Kir) - CLONE_IO block cgroup fix (Tejun) - Remove of ancient and deprecated block dump feature (zhangyi) - Discard merge fix (Ming) - Misc fixes or followup fixes (Colin, Damien, Dan, Long, Max, Thomas, Yang) * tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block: (129 commits) block: fix discard request merge block/mq-deadline: Remove a WARN_ON_ONCE() call blk-mq: update hctx->dispatch_busy in case of real scheduler blk: Fix lock inversion between ioc lock and bfqd lock bfq: Remove merged request already in bfq_requests_merged() block: pass a gendisk to bdev_disk_changed block: move bdev_disk_changed block: add the events* attributes to disk_attrs block: move the disk events code to a separate file block: fix trace completion for chained bio block/partitions/msdos: Fix typo inidicator -> indicator block, bfq: reset waker pointer with shared queues block, bfq: check waker only for queues with no in-flight I/O block, bfq: avoid delayed merge of async queues block, bfq: boost throughput by extending queue-merging times block, bfq: consider also creation time in delayed stable merge block, bfq: fix delayed stable merge check block, bfq: let also stably merged queues enjoy weight raising blk-wbt: make sure throttle is enabled properly blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled() ...
2021-06-30MIPS: Fix PKMAP with 32-bit MIPS huge page supportWei Li1-1/+1
When 32-bit MIPS huge page support is enabled, we halve the number of pointers a PTE page holds, making its last half go to waste. Correspondingly, we should halve the number of kmap entries, as we just initialized only a single pte table for that in pagetable_init(). Fixes: 35476311e529 ("MIPS: Add partial 32-bit huge page support") Signed-off-by: Wei Li <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2021-06-30MIPS: CI20: Add second percpu timer for SMP.周琰杰 (Zhou Yanjie)1-10/+14
1.Add a new TCU channel as the percpu timer of core1, this is to prepare for the subsequent SMP support. The newly added channel will not adversely affect the current single-core state. 2.Adjust the position of TCU node to make it consistent with the order in jz4780.dtsi file. Tested-by: Nikolaus Schaller <[email protected]> # on CI20 Signed-off-by: 周琰杰 (Zhou Yanjie) <[email protected]> Acked-by: Paul Cercueil <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2021-06-30MIPS: CI20: Reduce clocksource to 750 kHz.周琰杰 (Zhou Yanjie)1-2/+2
The original clock (3 MHz) is too fast for the clocksource, there will be a chance that the system may get stuck. Reported-by: Nikolaus Schaller <[email protected]> Tested-by: Nikolaus Schaller <[email protected]> # on CI20 Signed-off-by: 周琰杰 (Zhou Yanjie) <[email protected]> Acked-by: Paul Cercueil <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2021-06-30MIPS: Ingenic: Add MAC syscon nodes for Ingenic SoCs.周琰杰 (Zhou Yanjie)2-0/+14
Add MAC syscon nodes for X1000 SoC and X1830 SoC from Ingenic. Signed-off-by: 周琰杰 (Zhou Yanjie) <[email protected]> Acked-by: Paul Cercueil <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2021-06-30MIPS: X1830: Respect cell count of common properties.周琰杰 (Zhou Yanjie)1-5/+4
If N fields of X cells should be provided, then that's what the devicetree should represent, instead of having one single field of (N * X) cells. Signed-off-by: 周琰杰 (Zhou Yanjie) <[email protected]> Acked-by: Paul Cercueil <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2021-06-30powerpc/64s: move ret_from_fork etc above __end_soft_maskedNicholas Piggin1-26/+26
Code which runs with interrupts enabled should be moved above __end_soft_masked where possible, because maskable interrupts that hit below that symbol will need to consult the soft mask table, which is an extra cost. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-30powerpc/64s/interrupt: clean up interrupt return labelsNicholas Piggin1-3/+5
Normal kernel-interrupt exits can get interrupt_return_srr_user_restart in their backtrace, which is an unusual and notable function, and it is part of the user-interrupt exit path, which is doubly confusing. Add non-local labels for both user and kernel interrupt exit cases to address this and make the user and kernel cases more symmetric. Also get rid of an unused label. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-30powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbolsNicholas Piggin1-0/+6
If one interrupt exit symbol must not be kprobed, none of them can be, without more justification for why it's safe. Disallow kprobing on any of the (non-local) labels in the exit paths. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-30powerpc/64: enable MSR[EE] in irq replay pt_regsNicholas Piggin2-0/+5
Similar to commit 2b48e96be2f9f ("powerpc/64: fix irq replay pt_regs->softe value"), enable MSR_EE in pt_regs->msr. This makes the regs look more normal. It also allows some extra debug checks to be added to interrupt handler entry. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-30powerpc/64s/interrupt: preserve regs->softe for NMI interruptsNicholas Piggin1-0/+3
If an NMI interrupt hits in an implicit soft-masked region, regs->softe is modified to reflect that. This may not be necessary for correctness at the moment, but it is less surprising and it's unhelpful when debugging or adding checks. Make sure this is changed back to how it was found before returning. Fixes: 4ec5feec1ad0 ("powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled") Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-30powerpc/64s: add a table of implicit soft-masked addressesNicholas Piggin6-11/+106
Commit 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") ends up catching too much code, including ret_from_fork, and parts of interrupt and syscall return that do not expect to be interrupts to be soft-masked. If an interrupt gets marked pending, and then the code proceeds out of the implicit soft-masked region it will fail to deal with the pending interrupt. Fix this by adding a new table of addresses which explicitly marks the regions of code that are soft masked. This table is only checked for interrupts that below __end_soft_masked, so most kernel interrupts will not have the overhead of the table search. Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") Reported-by: Sachin Sant <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Tested-by: Sachin Sant <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-30powerpc/64e: remove implicit soft-masking and interrupt exit restart logicNicholas Piggin4-23/+40
The implicit soft-masking to speed up interrupt return was going to be used by 64e as well, but it has not been extensively tested on that platform and is not considered ready. It was intended to be disabled before merge. Disable it for now. Most of the restart code is common with 64s, so with more correctness and performance testing this could be re-enabled again by adding the extra soft-mask checks to interrupt handlers and flipping exit_must_hard_disable(). Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-30powerpc/64e: fix CONFIG_RELOCATABLE build warningsNicholas Piggin1-0/+11
CONFIG_RELOCATABLE=y causes build warnings from unresolved relocations. Fix these by using TOC addressing for these cases. Commit 24d33ac5b8ff ("powerpc/64s: Make prom_init require RELOCATABLE") caused some 64e configs to select RELOCATABLE resulting in these warnings, but the underlying issue was already there. This passes basic qemu testing. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-30powerpc/64s: fix hash page fault interrupt handlerNicholas Piggin1-13/+11
The early bad fault or key fault test in do_hash_fault() ends up calling into ___do_page_fault without having gone through an interrupt handler wrapper (except the initial _RAW one). This can end up calling local irq functions while the interrupt has not been reconciled, which will likely cause crashes and it trips up on a later patch that adds more assertions. pkey_exec_prot from selftests causes this path to be executed. There is no real reason to run the in_nmi() test should be performed before the key fault check. In fact if a perf interrupt in the hash fault code did a stack walk that was made to take a key fault somehow then running ___do_page_fault could possibly cause another hash fault causing problems. Move the in_nmi() test first, and then do everything else inside the regular interrupt handler function. Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers") Reported-by: Sachin Sant <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Tested-by: Sachin Sant <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-30powerpc/4xx: Fix setup_kuep() on SMPChristophe Leroy1-1/+5
On SMP, setup_kuep() is also called from start_secondary() since commit 86f46f343272 ("powerpc/32s: Initialise KUAP and KUEP in C"). start_secondary() is not an __init function. Remove the __init marker from setup_kuep() and bail out when not caller on the first CPU as the work is already done. Fixes: 10248dcba120 ("powerpc/44x: Implement Kernel Userspace Exec Protection (KUEP)") Fixes: 86f46f343272 ("powerpc/32s: Initialise KUAP and KUEP in C") Reported-by: kernel test robot <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/8ee05934288994a65743a987acb1558f12c0c8c1.1624969450.git.christophe.leroy@csgroup.eu
2021-06-30powerpc/32s: Fix setup_{kuap/kuep}() on SMPChristophe Leroy2-2/+2
On SMP, setup_kup() is also called from start_secondary(). start_secondary() is not an __init function. Remove the __init marker from setup_kuep() and setup_kuap(). Fixes: 86f46f343272 ("powerpc/32s: Initialise KUAP and KUEP in C") Reported-by: kernel test robot <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/42f4bd12b476942e4d5dc81c0e839d8871b20b1c.1624863319.git.christophe.leroy@csgroup.eu
2021-06-30arm: extend pfn_valid to take into account freed memory map alignmentMike Rapoport1-1/+12
When unused memory map is freed the preserved part of the memory map is extended to match pageblock boundaries because lots of core mm functionality relies on homogeneity of the memory map within pageblock boundaries. Since pfn_valid() is used to check whether there is a valid memory map entry for a PFN, make it return true also for PFNs that have memory map entries even if there is no actual memory populated there. Signed-off-by: Mike Rapoport <[email protected]> Tested-by: Kefeng Wang <[email protected]> Tested-by: Tony Lindgren <[email protected]>
2021-06-29Merge branch 'akpm' (patches from Andrew)Linus Torvalds75-794/+94
Merge misc updates from Andrew Morton: "191 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab, slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap, mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <[email protected]>: (191 commits) mm,hwpoison: make get_hwpoison_page() call get_any_page() mm,hwpoison: send SIGBUS with error virutal address mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes mm/page_alloc: allow high-order pages to be stored on the per-cpu lists mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA docs: remove description of DISCONTIGMEM arch, mm: remove stale mentions of DISCONIGMEM mm: remove CONFIG_DISCONTIGMEM m68k: remove support for DISCONTIGMEM arc: remove support for DISCONTIGMEM arc: update comment about HIGHMEM implementation alpha: remove DISCONTIGMEM and NUMA mm/page_alloc: move free_the_page mm/page_alloc: fix counting of managed_pages mm/page_alloc: improve memmap_pages dbg msg mm: drop SECTION_SHIFT in code comments mm/page_alloc: introduce vm.percpu_pagelist_high_fraction mm/page_alloc: limit the number of pages on PCP lists when reclaim is active mm/page_alloc: scale the number of pages that are batch freed ...
2021-06-29Merge tag 'acpi-5.14-rc1' of ↵Linus Torvalds1-71/+47
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to the 20210604 upstream revision, add preliminary support for the Platform Runtime Mechanism (PRM), address issues related to the handling of device dependencies in the ACPI device eunmeration code, improve the tracking of ACPI power resource states, improve the ACPI support for suspend-to-idle on AMD systems, continue the unification of message printing in the ACPI code, address assorted issues and clean up the code in a number of places. Specifics: - Update ACPICA code in the kernel to upstrea revision 20210604 including the following changes: - Add defines for the CXL Host Bridge Structureand and add the CFMWS structure definition to CEDT (Alison Schofield). - iASL: Finish support for the IVRS ACPI table (Bob Moore). - iASL: Add support for the SVKL table (Bob Moore). - iASL: Add full support for RGRT ACPI table (Bob Moore). - iASL: Add support for the BDAT ACPI table (Bob Moore). - iASL: add disassembler support for PRMT (Erik Kaneda). - Fix memory leak caused by _CID repair function (Erik Kaneda). - Add support for PlatformRtMechanism OpRegion (Erik Kaneda). - Add PRMT module header to facilitate parsing (Erik Kaneda). - Add _PLD panel positions (Fabian Wüthrich). - MADT: add Multiprocessor Wakeup Mailbox Structure and the SVKL table headers (Kuppuswamy Sathyanarayanan). - Use ACPI_FALLTHROUGH (Wei Ming Chen). - Add preliminary support for the Platform Runtime Mechanism (PRM) to allow the AML interpreter to call PRM functions (Erik Kaneda). - Address some issues related to the handling of device dependencies reported by _DEP in the ACPI device enumeration code and clean up some related pieces of it (Rafael Wysocki). - Improve the tracking of states of ACPI power resources (Rafael Wysocki). - Improve ACPI support for suspend-to-idle on AMD systems (Alex Deucher, Mario Limonciello, Pratik Vishwakarma). - Continue the unification and cleanup of message printing in the ACPI code (Hanjun Guo, Heiner Kallweit). - Fix possible buffer overrun issue with the description_show() sysfs attribute method (Krzysztof Wilczyński). - Improve the acpi_mask_gpe kernel command line parameter handling and clean up the core ACPI code related to sysfs (Andy Shevchenko, Baokun Li, Clayton Casciato). - Postpone bringing devices in the general ACPI PM domain to D0 during resume from system-wide suspend until they are really needed (Dmitry Torokhov). - Make the ACPI processor driver fix up C-state latency if not ordered (Mario Limonciello). - Add support for identifying devices depening on the given one that are not its direct descendants with the help of _DEP (Daniel Scally). - Extend the checks related to ACPI IRQ overrides on x86 in order to avoid false-positives (Hui Wang). - Add battery DPTF participant for Intel SoCs (Sumeet Pawnikar). - Rearrange the ACPI fan driver and device power management code to use a common list of device IDs (Rafael Wysocki). - Fix clang CFI violation in the ACPI BGRT table parsing code and clean it up (Nathan Chancellor). - Add GPE-related quirks for some laptops to the EC driver (Chris Chiu, Zhang Rui). - Make the ACPI PPTT table parsing code populate the cache-id value if present in the firmware (James Morse). - Remove redundant clearing of context->ret.pointer from acpi_run_osc() (Hans de Goede). - Add missing acpi_put_table() in acpi_init_fpdt() (Jing Xiangfeng). - Make ACPI APEI handle ARM Processor Error CPER records like Memory Error ones to avoid user space task lockups (Xiaofei Tan). - Stop warning about disabled ACPI in APEI (Jon Hunter). - Fix fall-through warning for Clang in the SBSHC driver (Gustavo A. R. Silva). - Add custom DSDT file as Makefile prerequisite (Richard Fitzgerald). - Initialize local variable to avoid garbage being returned (Colin Ian King). - Simplify assorted pieces of code, address assorted coding style and documentation issues and comment typos (Baokun Li, Christophe JAILLET, Clayton Casciato, Liu Shixin, Shaokun Zhang, Wei Yongjun, Yang Li, Zhen Lei)" * tag 'acpi-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (97 commits) ACPI: PM: postpone bringing devices to D0 unless we need them ACPI: tables: Add custom DSDT file as makefile prerequisite ACPI: bgrt: Use sysfs_emit ACPI: bgrt: Fix CFI violation ACPI: EC: trust DSDT GPE for certain HP laptop ACPI: scan: Simplify acpi_table_events_fn() ACPI: PM: Adjust behavior for field problems on AMD systems ACPI: PM: s2idle: Add support for new Microsoft UUID ACPI: PM: s2idle: Add support for multiple func mask ACPI: PM: s2idle: Refactor common code ACPI: PM: s2idle: Use correct revision id ACPI: sysfs: Remove tailing return statement in void function ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros ACPI: sysfs: Sort headers alphabetically ACPI: sysfs: Refactor param_get_trace_state() to drop dead code ACPI: sysfs: Unify pattern of memory allocations ACPI: sysfs: Allow bitmap list to be supplied to acpi_mask_gpe ACPI: sysfs: Make sparse happy about address space in use ACPI: scan: Fix race related to dropping dependencies ACPI: scan: Reorganize acpi_device_add() ...
2021-06-29Merge branches 'clk-legacy', 'clk-vc5', 'clk-allwinner', 'clk-nvidia' and ↵Stephen Boyd23-514/+302
'clk-imx' into clk-next * clk-legacy: clkdev: remove unused clkdev_alloc() interfaces clkdev: remove CONFIG_CLKDEV_LOOKUP m68k: coldfire: remove private clk_get/clk_put m68k: coldfire: use clkdev_lookup on most coldfire mips: ralink: convert to CONFIG_COMMON_CLK mips: ar7: convert to CONFIG_COMMON_CLK mips: ar7: convert to clkdev_lookup * clk-vc5: clk: vc5: fix output disabling when enabling a FOD * clk-allwinner: clk: sunxi-ng: v3s: fix incorrect postdivider on pll-audio * clk-nvidia: clk: tegra: clk-tegra124-dfll-fcpu: don't use devm functions for regulator clk: tegra: tegra124-emc: Fix clock imbalance in emc_set_timing() clk: tegra: Add stubs needed for compile-testing clk: tegra: Don't deassert reset on enabling clocks clk: tegra: Mark external clocks as not having reset control clk: tegra: cclk: Handle thermal DIV2 CPU frequency throttling clk: tegra: Don't allow zero clock rate for PLLs clk: tegra: Halve SCLK rate on Tegra20 clk: tegra: Ensure that PLLU configuration is applied properly clk: tegra: Fix refcounting of gate clocks clk: tegra30: Use 300MHz for video decoder by default * clk-imx: clk: imx8mq: remove SYS PLL 1/2 clock gates clk: imx: scu: Do not enable runtime PM for CPU clks clk: imx: scu: add parent save and restore clk: imx: scu: Only save DC SS clock using non-cached clock rate clk: imx: scu: Add A72 frequency scaling support clk: imx: scu: Add A53 frequency scaling support clk: imx: scu: bypass pi_pll enable status restore clk: imx: scu: detach pd if can't power up clk: imx: scu: bypass cpu clock save and restore clk: imx: scu: add parallel port clock ops clk: imx: scu: add more scu clocks clk: imx: scu: add enet rgmii gpr clocks clk: imx8qm: add clock valid resource checking clk: imx8qxp: add clock valid checking mechnism clk: imx: scu: add gpr clocks support clk: imx: scu: remove legacy scu clock binding support dt-bindings: arm: imx: scu: drop deprecated legacy clock binding dt-bindings: arm: imx: scu: fix naming typo of clk compatible string clk: imx: Remove the audio ipg clock from imx8mp
2021-06-29Merge tag 'x86-entry-2021-06-29' of ↵Linus Torvalds16-217/+126
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry code related updates from Thomas Gleixner: - Consolidate the macros for .byte ... opcode sequences - Deduplicate register offset defines in include files - Simplify the ia32,x32 compat handling of the related syscall tables to get rid of #ifdeffery. - Clear all EFLAGS which are not required for syscall handling - Consolidate the syscall tables and switch the generation over to the generic shell script and remove the CFLAGS tweaks which are not longer required. - Use 'int' type for system call numbers to match the generic code. - Add more selftests for syscalls * tag 'x86-entry-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/syscalls: Don't adjust CFLAGS for syscall tables x86/syscalls: Remove -Wno-override-init for syscall tables x86/uml/syscalls: Remove array index from syscall initializers x86/syscalls: Clear 'offset' and 'prefix' in case they are set in env x86/entry: Use int everywhere for system call numbers x86/entry: Treat out of range and gap system calls the same x86/entry/64: Sign-extend system calls on entry to int selftests/x86/syscall: Add tests under ptrace to syscall_numbering_64 selftests/x86/syscall: Simplify message reporting in syscall_numbering selftests/x86/syscall: Update and extend syscall_numbering_64 x86/syscalls: Switch to generic syscallhdr.sh x86/syscalls: Use __NR_syscalls instead of __NR_syscall_max x86/unistd: Define X32_NR_syscalls only for 64-bit kernel x86/syscalls: Stop filling syscall arrays with *_sys_ni_syscall x86/syscalls: Switch to generic syscalltbl.sh x86/entry/x32: Rename __x32_compat_sys_* to __x64_compat_sys_*
2021-06-29Merge tag 'x86-irq-2021-06-29' of ↵Linus Torvalds7-53/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 interrupt related updates from Thomas Gleixner: - Consolidate the VECTOR defines and the usage sites. - Cleanup GDT/IDT related code and replace open coded ASM with proper native helper functions. * tag 'x86-irq-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kexec: Set_[gi]dt() -> native_[gi]dt_invalidate() in machine_kexec_*.c x86: Add native_[ig]dt_invalidate() x86/idt: Remove address argument from idt_invalidate() x86/irq: Add and use NR_EXTERNAL_VECTORS and NR_SYSTEM_VECTORS x86/irq: Remove unused vectors defines
2021-06-29Merge tag 'timers-core-2021-06-29' of ↵Linus Torvalds2-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Time and clocksource/clockevent related updates: Core changes: - Infrastructure to support per CPU "broadcast" devices for per CPU clockevent devices which stop in deep idle states. This allows us to utilize the more efficient architected timer on certain ARM SoCs for normal operation instead of permanentely using the slow to access SoC specific clockevent device. - Print the name of the broadcast/wakeup device in /proc/timer_list - Make the clocksource watchdog more robust against delays between reading the current active clocksource and the watchdog clocksource. Such delays can be caused by NMIs, SMIs and vCPU preemption. Handle this by reading the watchdog clocksource twice, i.e. before and after reading the current active clocksource. In case that the two watchdog reads shows an excessive time delta, the read sequence is repeated up to 3 times. - Improve the debug output and add a test module for the watchdog mechanism. - Reimplementation of the venerable time64_to_tm() function with a faster and significantly smaller version. Straight from the source, i.e. the author of the related research paper contributed this! Driver changes: - No new drivers, not even new device tree bindings! - Fixes, improvements and cleanups and all over the place" * tag 'timers-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) time/kunit: Add missing MODULE_LICENSE() time: Improve performance of time64_to_tm() clockevents: Use list_move() instead of list_del()/list_add() clocksource: Print deviation in nanoseconds when a clocksource becomes unstable clocksource: Provide kernel module to test clocksource watchdog clocksource: Reduce clocksource-skew threshold clocksource: Limit number of CPUs checked for clock synchronization clocksource: Check per-CPU clock synchronization when marked unstable clocksource: Retry clock read if long delays detected clockevents: Add missing parameter documentation clocksource/drivers/timer-ti-dm: Drop unnecessary restore clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround clocksource/drivers/arm_global_timer: Remove duplicated argument in arm_global_timer clocksource/drivers/arm_global_timer: Make symbol 'gt_clk_rate_change_nb' static arm: zynq: don't disable CONFIG_ARM_GLOBAL_TIMER due to CONFIG_CPU_FREQ anymore clocksource/drivers/arm_global_timer: Implement rate compensation whenever source clock changes clocksource/drivers/ingenic: Rename unreasonable array names clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFG clocksource/drivers/mediatek: Ack and disable interrupts on suspend clocksource/drivers/samsung_pwm: Constify source IO memory ...
2021-06-29Merge tag 'irq-core-2021-06-29' of ↵Linus Torvalds31-12/+54
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core changes: - Cleanup and simplification of common code to invoke the low level interrupt flow handlers when this invocation requires irqdomain resolution. Add the necessary core infrastructure. - Provide a proper interface for modular PMU drivers to set the interrupt affinity. - Add a request flag which allows to exclude interrupts from spurious interrupt detection. Useful especially for IPI handlers which always return IRQ_HANDLED which turns the spurious interrupt detection into a pointless waste of CPU cycles. Driver changes: - Bulk convert interrupt chip drivers to the new irqdomain low level flow handler invocation mechanism. - Add device tree bindings for the Renesas R-Car M3-W+ SoC - Enable modular build of the Qualcomm PDC driver - The usual small fixes and improvements" * tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) dt-bindings: interrupt-controller: arm,gic-v3: Describe GICv3 optional properties irqchip: gic-pm: Remove redundant error log of clock bulk irqchip/sun4i: Remove unnecessary oom message irqchip/irq-imx-gpcv2: Remove unnecessary oom message irqchip/imgpdc: Remove unnecessary oom message irqchip/gic-v3-its: Remove unnecessary oom message irqchip/gic-v2m: Remove unnecessary oom message irqchip/exynos-combiner: Remove unnecessary oom message irqchip: Bulk conversion to generic_handle_domain_irq() genirq: Move non-irqdomain handle_domain_irq() handling into ARM's handle_IRQ() genirq: Add generic_handle_domain_irq() helper irqchip/nvic: Convert from handle_IRQ() to handle_domain_irq() irqdesc: Fix __handle_domain_irq() comment genirq: Use irq_resolve_mapping() to implement __handle_domain_irq() and co irqdomain: Introduce irq_resolve_mapping() irqdomain: Protect the linear revmap with RCU irqdomain: Cache irq_data instead of a virq number in the revmap irqdomain: Use struct_size() helper when allocating irqdomain irqdomain: Make normal and nomap irqdomains exclusive powerpc: Move the use of irq_domain_add_nomap() behind a config option ...
2021-06-29Merge tag 'hyperv-next-signed-20210629' of ↵Linus Torvalds2-48/+1
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: "Just a few minor enhancement patches and bug fixes" * tag 'hyperv-next-signed-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: PCI: hv: Add check for hyperv_initialized in init_hv_pci_drv() Drivers: hv: Move Hyper-V extended capability check to arch neutral code drivers: hv: Fix missing error code in vmbus_connect() x86/hyperv: fix logical processor creation hv_utils: Fix passing zero to 'PTR_ERR' warning scsi: storvsc: Use blk_mq_unique_tag() to generate requestIDs Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer hv_balloon: Remove redundant assignment to region_start
2021-06-29mm,hwpoison: send SIGBUS with error virutal addressNaoya Horiguchi1-2/+11
Now an action required MCE in already hwpoisoned address surely sends a SIGBUS to current process, but the SIGBUS doesn't convey error virtual address. That's not optimal for hwpoison-aware applications. To fix the issue, make memory_failure() call kill_accessing_process(), that does pagetable walk to find the error virtual address. It could find multiple virtual addresses for the same error page, and it seems hard to tell which virtual address is correct one. But that's rare and sending incorrect virtual address could be better than no address. So let's report the first found virtual address for now. [[email protected]: fix walk_page_range() return] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Cc: Aili Yao <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Jue Wang <[email protected]> Cc: Borislav Petkov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-29mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMAMike Rapoport26-40/+40
After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA configuration options are equivalent. Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead. Done with $ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \ $(git grep -wl CONFIG_NEED_MULTIPLE_NODES) $ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \ $(git grep -wl NEED_MULTIPLE_NODES) with manual tweaks afterwards. [[email protected]: fix arm boot crash] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matt Turner <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Vineet Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-29arch, mm: remove stale mentions of DISCONIGMEMMike Rapoport6-25/+4
There are several places that mention DISCONIGMEM in comments or have stale code guarded by CONFIG_DISCONTIGMEM. Remove the dead code and update the comments. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matt Turner <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Vineet Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-29m68k: remove support for DISCONTIGMEMMike Rapoport5-76/+1
DISCONTIGMEM was replaced by FLATMEM with freeing of the unused memory map in v5.11. Remove the support for DISCONTIGMEM entirely. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matt Turner <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Vineet Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-06-29arc: remove support for DISCONTIGMEMMike Rapoport3-61/+0
DISCONTIGMEM was replaced by FLATMEM with freeing of the unused memory map in v5.11. Remove the support for DISCONTIGMEM entirely. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Vineet Gupta <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Matt Turner <[email protected]> Cc: Richard Henderson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>