aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-08-12mm/x86: use general page fault accountingPeter Xu1-15/+2
Use the general page fault accounting by passing regs into handle_mm_fault(). Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: H. Peter Anvin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/sparc64: use general page fault accountingPeter Xu1-10/+1
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: David S. Miller <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/sparc32: use general page fault accountingPeter Xu1-10/+1
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: David S. Miller <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/sh: use general page fault accountingPeter Xu1-10/+1
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/s390: use general page fault accountingPeter Xu1-15/+1
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Gerald Schaefer <[email protected]> Acked-by: Gerald Schaefer <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/riscv: use general page fault accountingPeter Xu1-15/+1
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Acked-by: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Albert Ou <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/powerpc: use general page fault accountingPeter Xu1-8/+3
Use the general page fault accounting by passing regs into handle_mm_fault(). Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/parisc: use general page fault accountingPeter Xu1-5/+3
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in handle_mm_fault(). Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: Helge Deller <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/openrisc: use general page fault accountingPeter Xu1-5/+4
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in handle_mm_fault(). Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Stafford Horne <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Stefan Kristiansson <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/nios2: use general page fault accountingPeter Xu1-10/+4
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in handle_mm_fault(). Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Ley Foon Tan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/nds32: use general page fault accountingPeter Xu1-16/+3
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Fix PERF_COUNT_SW_PAGE_FAULTS perf event manually for page fault retries, by moving it before taking mmap_sem. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Greentime Hu <[email protected]> Cc: Nick Hu <[email protected]> Cc: Vincent Chen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/mips: use general page fault accountingPeter Xu1-11/+3
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Fix PERF_COUNT_SW_PAGE_FAULTS perf event manually for page fault retries, by moving it before taking mmap_sem. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Thomas Bogendoerfer <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/microblaze: use general page fault accountingPeter Xu1-5/+4
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in handle_mm_fault(). Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Michal Simek <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/m68k: use general page fault accountingPeter Xu1-10/+4
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in handle_mm_fault(). Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/ia64: use general page fault accountingPeter Xu1-5/+4
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in handle_mm_fault(). Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "Luck, Tony" <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/hexagon: use general page fault accountingPeter Xu1-5/+4
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in handle_mm_fault(). Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Brian Cain <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/csky: use general page fault accountingPeter Xu1-11/+1
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Guo Ren <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/arm64: use general page fault accountingPeter Xu1-23/+6
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. To do this, we pass pt_regs pointer into __do_page_fault(). Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Catalin Marinas <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/arm: use general page fault accountingPeter Xu1-19/+6
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. To do this, we need to pass the pt_regs pointer into __do_page_fault(). Fix PERF_COUNT_SW_PAGE_FAULTS perf event manually for page fault retries, by moving it before taking mmap_sem. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Russell King <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/arc: use general page fault accountingPeter Xu1-15/+3
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. Fix PERF_COUNT_SW_PAGE_FAULTS perf event manually for page fault retries, by moving it before taking mmap_sem. Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Vineet Gupta <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/alpha: use general page fault accountingPeter Xu1-5/+3
Use the general page fault accounting by passing regs into handle_mm_fault(). Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in handle_mm_fault(). Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm: do page fault accounting in handle_mm_faultPeter Xu31-34/+103
Patch series "mm: Page fault accounting cleanups", v5. This is v5 of the pf accounting cleanup series. It originates from Gerald Schaefer's report on an issue a week ago regarding to incorrect page fault accountings for retried page fault after commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times"): https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/ What this series did: - Correct page fault accounting: we do accounting for a page fault (no matter whether it's from #PF handling, or gup, or anything else) only with the one that completed the fault. For example, page fault retries should not be counted in page fault counters. Same to the perf events. - Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf event is used in an adhoc way across different archs. Case (1): for many archs it's done at the entry of a page fault handler, so that it will also cover e.g. errornous faults. Case (2): for some other archs, it is only accounted when the page fault is resolved successfully. Case (3): there're still quite some archs that have not enabled this perf event. Since this series will touch merely all the archs, we unify this perf event to always follow case (1), which is the one that makes most sense. And since we moved the accounting into handle_mm_fault, the other two MAJ/MIN perf events are well taken care of naturally. - Unify definition of "major faults": the definition of "major fault" is slightly changed when used in accounting (not VM_FAULT_MAJOR). More information in patch 1. - Always account the page fault onto the one that triggered the page fault. This does not matter much for #PF handlings, but mostly for gup. More information on this in patch 25. Patchset layout: Patch 1: Introduced the accounting in handle_mm_fault(), not enabled. Patch 2-23: Enable the new accounting for arch #PF handlers one by one. Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.) Patch 25: Cleanup GUP task_struct pointer since it's not needed any more This patch (of 25): This is a preparation patch to move page fault accountings into the general code in handle_mm_fault(). This includes both the per task flt_maj/flt_min counters, and the major/minor page fault perf events. To do this, the pt_regs pointer is passed into handle_mm_fault(). PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault handlers. So far, all the pt_regs pointer that passed into handle_mm_fault() is NULL, which means this patch should have no intented functional change. Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David S. Miller <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nick Hu <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/gup: use a standard migration target allocation callbackJoonsoo Kim1-48/+6
There is a well-defined migration target allocation callback. Use it. Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Roman Gushchin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/hugetlb: make hugetlb migration callback CMA awareJoonsoo Kim3-9/+10
new_non_cma_page() in gup.c requires to allocate the new page that is not on the CMA area. new_non_cma_page() implements it by using allocation scope APIs. However, there is a work-around for hugetlb. Normal hugetlb page allocation API for migration is alloc_huge_page_nodemask(). It consists of two steps. First is dequeing from the pool. Second is, if there is no available page on the queue, allocating by using the page allocator. new_non_cma_page() can't use this API since first step (deque) isn't aware of scope API to exclude CMA area. So, new_non_cma_page() exports hugetlb internal function for the second step, alloc_migrate_huge_page(), to global scope and uses it directly. This is suboptimal since hugetlb pages on the queue cannot be utilized. This patch tries to fix this situation by making the deque function on hugetlb CMA aware. In the deque function, CMA memory is skipped if PF_MEMALLOC_NOCMA flag is found. Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Mike Kravetz <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Roman Gushchin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/gup: restrict CMA region by using allocation scope APIJoonsoo Kim2-9/+10
We have well defined scope API to exclude CMA region. Use it rather than manipulating gfp_mask manually. With this change, we can now restore __GFP_MOVABLE for gfp_mask like as usual migration target allocation. It would result in that the ZONE_MOVABLE is also searched by page allocator. For hugetlb, gfp_mask is redefined since it has a regular allocation mask filter for migration target. __GPF_NOWARN is added to hugetlb gfp_mask filter since a new user for gfp_mask filter, gup, want to be silent when allocation fails. Note that this can be considered as a fix for the commit 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region"). However, "Fixes" tag isn't added here since it is just suboptimal but it doesn't cause any problem. Suggested-by: Michal Hocko <[email protected]> Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/page_alloc: remove a wrapper for alloc_migration_target()Joonsoo Kim2-12/+6
There is a well-defined standard migration target callback. Use it directly. Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Roman Gushchin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/mempolicy: use a standard migration target allocation callbackJoonsoo Kim3-28/+12
There is a well-defined migration target allocation callback. Use it. Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Roman Gushchin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/migrate: introduce a standard migration target allocation functionJoonsoo Kim7-22/+61
There are some similar functions for migration target allocation. Since there is no fundamental difference, it's better to keep just one rather than keeping all variants. This patch implements base migration target allocation function. In the following patches, variants will be converted to use this function. Changes should be mechanical, but, unfortunately, there are some differences. First, some callers' nodemask is assgined to NULL since NULL nodemask will be considered as all available nodes, that is, &node_states[N_MEMORY]. Second, for hugetlb page allocation, gfp_mask is redefined as regular hugetlb allocation gfp_mask plus __GFP_THISNODE if user provided gfp_mask has it. This is because future caller of this function requires to set this node constaint. Lastly, if provided nodeid is NUMA_NO_NODE, nodeid is set up to the node where migration source lives. It helps to remove simple wrappers for setting up the nodeid. Note that PageHighmem() call in previous function is changed to open-code "is_highmem_idx()" since it provides more readability. [[email protected]: tweak patch title, per Vlastimil] [[email protected]: fix typo in comment] Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Roman Gushchin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent ↵Joonsoo Kim1-0/+5
with regular THP allocations new_page_nodemask is a migration callback and it tries to use a common gfp flags for the target page allocation whether it is a base page or a THP. The later only adds GFP_TRANSHUGE to the given mask. This results in the allocation being slightly more aggressive than necessary because the resulting gfp mask will contain also __GFP_RECLAIM_KSWAPD. THP allocations usually exclude this flag to reduce over eager background reclaim during a high THP allocation load which has been seen during large mmaps initialization. There is no indication that this is a problem for migration as well but theoretically the same might happen when migrating large mappings to a different node. Make the migration callback consistent with regular THP allocations. [[email protected]: fix comment typo, per Vlastimil] Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Roman Gushchin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/hugetlb: unify migration callbacksJoonsoo Kim4-49/+33
There is no difference between two migration callback functions, alloc_huge_page_node() and alloc_huge_page_nodemask(), except __GFP_THISNODE handling. It's redundant to have two almost similar functions in order to handle this flag. So, this patch tries to remove one by introducing a new argument, gfp_mask, to alloc_huge_page_nodemask(). After introducing gfp_mask argument, it's caller's job to provide correct gfp_mask. So, every callsites for alloc_huge_page_nodemask() are changed to provide gfp_mask. Note that it's safe to remove a node id check in alloc_huge_page_node() since there is no caller passing NUMA_NO_NODE as a node id. Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Roman Gushchin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/migrate: move migration helper from .h to .cJoonsoo Kim2-28/+34
It's not performance sensitive function. Move it to .c. This is a preparation step for future change. Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: Mike Kravetz <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Roman Gushchin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm/page_isolation: prefer the node of the source pageJoonsoo Kim1-1/+3
Patch series "clean-up the migration target allocation functions", v5. This patch (of 9): For locality, it's better to migrate the page to the same node rather than the node of the current caller's cpu. Signed-off-by: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12ipc/shm.c: remove the superfluous breakLiao Pingfang1-1/+0
Remove the superfuous break, as there is a 'return' before it. Signed-off-by: Liao Pingfang <[email protected]> Signed-off-by: Yi Wang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12ipc: uninline functionsAlexey Dobriyan2-4/+2
Two functions are only called via function pointers, don't bother inlining them. Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12scripts/gdb: fix python 3.8 SyntaxWarningNick Desaulniers1-2/+2
Fixes the observed warnings: scripts/gdb/linux/rbtree.py:20: SyntaxWarning: "is" with a literal. Did you mean "=="? if node is 0: scripts/gdb/linux/rbtree.py:36: SyntaxWarning: "is" with a literal. Did you mean "=="? if node is 0: It looks like this is a new warning added in Python 3.8. I've only seen this once after adding the add-auto-load-safe-path rule to my ~/.gdbinit for a new tree. Fixes: commit 449ca0c95ea2 ("scripts/gdb: add rb tree iterating utilities") Signed-off-by: Nick Desaulniers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Cc: Jan Kiszka <[email protected]> Cc: Kieran Bingham <[email protected]> Cc: Aymeric Agon-Rambosson <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: https://adamj.eu/tech/2020/01/21/why-does-python-3-8-syntaxwarning-for-is-literal/ Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12kcov: make some symbols staticWei Yongjun1-3/+3
Fix sparse build warnings: kernel/kcov.c:99:1: warning: symbol '__pcpu_scope_kcov_percpu_data' was not declared. Should it be static? kernel/kcov.c:778:6: warning: symbol 'kcov_remote_softirq_start' was not declared. Should it be static? kernel/kcov.c:795:6: warning: symbol 'kcov_remote_softirq_stop' was not declared. Should it be static? Reported-by: Hulk Robot <[email protected]> Signed-off-by: Wei Yongjun <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12kcov: unconditionally add -fno-stack-protector to compiler optionsMarco Elver1-1/+1
Unconditionally add -fno-stack-protector to KCOV's compiler options, as all supported compilers support the option. This saves a compiler invocation to determine if the option is supported. Because Clang does not support -fno-conserve-stack, and -fno-stack-protector was wrapped in the same cc-option, we were missing -fno-stack-protector with Clang. Unconditionally adding this option fixes this for Clang. Suggested-by: Nick Desaulniers <[email protected]> Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Alexander Potapenko <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12panic: make print_oops_end_marker() staticYue Hu2-2/+1
Since print_oops_end_marker() is not used externally, also remove it in kernel.h at the same time. Signed-off-by: Yue Hu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Kees Cook <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12lib/Kconfig.debug: fix typo in the help text of CONFIG_PANIC_TIMEOUTTiezhu Yang1-1/+1
There exists duplicated "the" in the help text of CONFIG_PANIC_TIMEOUT, Remove it. Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Xuefeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12kernel/panic.c: make oops_may_print() return boolTiezhu Yang2-2/+2
The return value of oops_may_print() is true or false, so change its type to reflect that. Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Xuefeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12rapidio/rio_mport_cdev: use array_size() helper in copy_{from,to}_user()Gustavo A. R. Silva1-2/+2
Use array_size() helper instead of the open-coded version in copy_{from,to}_user(). These sorts of multiplication factors need to be wrapped in array_size(). This issue was found with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Matt Porter <[email protected]> Cc: Alexandre Bounine <[email protected]> Link: http://lkml.kernel.org/r/20200616183050.GA31840@embeddedor Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83 Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12drivers/rapidio/rio-scan.c: use struct_size() helperGustavo A. R. Silva1-5/+3
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. Also, while there, use the preferred form for passing a size of a struct. The alternative form where struct name is spelled out hurts readability and introduces an opportunity for a bug when the pointer variable type is changed but the corresponding sizeof that is passed as argument is not. This issue was found with the help of Coccinelle and, audited and fixed manually. Addresses KSPP ID: https://github.com/KSPP/linux/issues/83 Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Matt Porter <[email protected]> Cc: Alexandre Bounine <[email protected]> Link: http://lkml.kernel.org/r/20200619170445.GA22641@embeddedor Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12drivers/rapidio/devices/rio_mport_cdev.c: use struct_size() helperGustavo A. R. Silva1-2/+1
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This issue was found with the help of Coccinelle and, audited and fixed manually. Addresses KSPP ID: https://github.com/KSPP/linux/issues/83 Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Matt Porter <[email protected]> Cc: Alexandre Bounine <[email protected]> Link: http://lkml.kernel.org/r/20200619170843.GA24923@embeddedor Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12kdump: append kernel build-id string to VMCOREINFOVijay Balakrishna2-0/+56
Make kernel GNU build-id available in VMCOREINFO. Having build-id in VMCOREINFO facilitates presenting appropriate kernel namelist image with debug information file to kernel crash dump analysis tools. Currently VMCOREINFO lacks uniquely identifiable key for crash analysis automation. Regarding if this patch is necessary or matching of linux_banner and OSRELEASE in VMCOREINFO employed by crash(8) meets the need -- IMO, build-id approach more foolproof, in most instances it is a cryptographic hash generated using internal code/ELF bits unlike kernel version string upon which linux_banner is based that is external to the code. I feel each is intended for a different purpose. Also OSRELEASE is not suitable when two different kernel builds from same version with different features enabled. Currently for most linux (and non-linux) systems build-id can be extracted using standard methods for file types such as user mode crash dumps, shared libraries, loadable kernel modules etc., This is an exception for linux kernel dump. Having build-id in VMCOREINFO brings some uniformity for automation tools. Tyler said: : I think this is a nice improvement over today's linux_banner approach for : correlating vmlinux to a kernel dump. : : The elf notes parsing in this patch lines up with what is described in in : the "Notes (Nhdr)" section of the elf(5) man page. : : BUILD_ID_MAX is sufficient to hold a sha1 build-id, which is the default : build-id type today in GNU ld(2). It is also sufficient to hold the : "fast" build-id, which is the default build-id type today in LLVM lld(2). Signed-off-by: Vijay Balakrishna <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Tyler Hicks <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Dave Young <[email protected]> Cc: Vivek Goyal <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12exec: move path_noexec() check earlierKees Cook2-8/+8
The path_noexec() check, like the regular file check, was happening too late, letting LSMs see impossible execve()s. Check it earlier as well in may_open() and collect the redundant fs/exec.c path_noexec() test under the same robustness comment as the S_ISREG() check. My notes on the call path, and related arguments, checks, etc: do_open_execat() struct open_flags open_exec_flags = { .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, .acc_mode = MAY_EXEC, ... do_filp_open(dfd, filename, open_flags) path_openat(nameidata, open_flags, flags) file = alloc_empty_file(open_flags, current_cred()); do_open(nameidata, file, open_flags) may_open(path, acc_mode, open_flag) /* new location of MAY_EXEC vs path_noexec() test */ inode_permission(inode, MAY_OPEN | acc_mode) security_inode_permission(inode, acc_mode) vfs_open(path, file) do_dentry_open(file, path->dentry->d_inode, open) security_file_open(f) open() /* old location of path_noexec() test */ Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Aleksa Sarai <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Tetsuo Handa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12exec: move S_ISREG() check earlierKees Cook3-10/+16
The execve(2)/uselib(2) syscalls have always rejected non-regular files. Recently, it was noticed that a deadlock was introduced when trying to execute pipes, as the S_ISREG() test was happening too late. This was fixed in commit 73601ea5b7b1 ("fs/open.c: allow opening only regular files during execve()"), but it was added after inode_permission() had already run, which meant LSMs could see bogus attempts to execute non-regular files. Move the test into the other inode type checks (which already look for other pathological conditions[1]). Since there is no need to use FMODE_EXEC while we still have access to "acc_mode", also switch the test to MAY_EXEC. Also include a comment with the redundant S_ISREG() checks at the end of execve(2)/uselib(2) to note that they are present to avoid any mistakes. My notes on the call path, and related arguments, checks, etc: do_open_execat() struct open_flags open_exec_flags = { .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, .acc_mode = MAY_EXEC, ... do_filp_open(dfd, filename, open_flags) path_openat(nameidata, open_flags, flags) file = alloc_empty_file(open_flags, current_cred()); do_open(nameidata, file, open_flags) may_open(path, acc_mode, open_flag) /* new location of MAY_EXEC vs S_ISREG() test */ inode_permission(inode, MAY_OPEN | acc_mode) security_inode_permission(inode, acc_mode) vfs_open(path, file) do_dentry_open(file, path->dentry->d_inode, open) /* old location of FMODE_EXEC vs S_ISREG() test */ security_file_open(f) open() [1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/ Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Aleksa Sarai <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Tetsuo Handa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12exec: change uselib(2) IS_SREG() failure to EACCESKees Cook1-2/+1
Patch series "Relocate execve() sanity checks", v2. While looking at the code paths for the proposed O_MAYEXEC flag, I saw some things that looked like they should be fixed up. exec: Change uselib(2) IS_SREG() failure to EACCES This just regularizes the return code on uselib(2). exec: Move S_ISREG() check earlier This moves the S_ISREG() check even earlier than it was already. exec: Move path_noexec() check earlier This adds the path_noexec() check to the same place as the S_ISREG() check. This patch (of 3): Change uselib(2)' S_ISREG() error return to EACCES instead of EINVAL so the behavior matches execve(2), and the seemingly documented value. The "not a regular file" failure mode of execve(2) is explicitly documented[1], but it is not mentioned in uselib(2)[2] which does, however, say that open(2) and mmap(2) errors may apply. The documentation for open(2) does not include a "not a regular file" error[3], but mmap(2) does[4], and it is EACCES. [1] http://man7.org/linux/man-pages/man2/execve.2.html#ERRORS [2] http://man7.org/linux/man-pages/man2/uselib.2.html#ERRORS [3] http://man7.org/linux/man-pages/man2/open.2.html#ERRORS [4] http://man7.org/linux/man-pages/man2/mmap.2.html#ERRORS Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Christian Brauner <[email protected]> Cc: Aleksa Sarai <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Tetsuo Handa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12coredump: add %f for executable filenameLepton Wu2-5/+15
The document reads "%e" should be "executable filename" while actually it could be changed by things like pr_ctl PR_SET_NAME. People who uses "%e" in core_pattern get surprised when they find out they get thread name instead of executable filename. This is either a bug of document or a bug of code. Since the behavior of "%e" is there for long time, it could bring another surprise for users if we "fix" the code. So we just "fix" the document. And more, for users who really need the "executable filename" in core_pattern, we introduce a new "%f" for the real executable filename. We already have "%E" for executable path in kernel, so just reuse most of its code for the new added "%f" format. Signed-off-by: Lepton Wu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12test_kmod: avoid potential double free in trigger_config_run_type()Tiezhu Yang1-1/+1
Reset the member "test_fs" of the test configuration after a call of the function "kfree_const" to a null pointer so that a double memory release will not be performed. Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader") Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Luis Chamberlain <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Al Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Chuck Lever <[email protected]> Cc: David Howells <[email protected]> Cc: David S. Miller <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: James Morris <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Kees Cook <[email protected]> Cc: Lars Ellenberg <[email protected]> Cc: Nikolay Aleksandrov <[email protected]> Cc: Philipp Reisner <[email protected]> Cc: Roopa Prabhu <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: Sergei Trofimovich <[email protected]> Cc: Sergey Kvachonok <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Tony Vroon <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12kmod: remove redundant "be an" in the commentTiezhu Yang1-3/+2
There exists redundant "be an" in the comment, remove it. Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Luis Chamberlain <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Al Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Chuck Lever <[email protected]> Cc: David Howells <[email protected]> Cc: David S. Miller <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: James Morris <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: J. Bruce Fields <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Kees Cook <[email protected]> Cc: Lars Ellenberg <[email protected]> Cc: Nikolay Aleksandrov <[email protected]> Cc: Philipp Reisner <[email protected]> Cc: Roopa Prabhu <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: Sergei Trofimovich <[email protected]> Cc: Sergey Kvachonok <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Tony Vroon <[email protected]> Cc: Christoph Hellwig <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>