Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
|
|
Initialise global and static variable to 0 is always unnecessary.
Remove the unnecessary initialisations.
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
The double `the' is duplicated in line 41, remove one.
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
This enables ARCH_HAS_VM_GET_PAGE_PROT on the platform and exports
standard vm_get_page_prot() implementation via DECLARE_VM_GET_PAGE_PROT,
which looks up a private and static protection_map[] array. Subsequently
all __SXXX and __PXXX macros can be dropped which are no longer needed.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
I observed that for each of the shared file-backed page faults, we're very
likely to retry one more time for the 1st write fault upon no page. It's
because we'll need to release the mmap lock for dirty rate limit purpose
with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()).
Then after that throttling we return VM_FAULT_RETRY.
We did that probably because VM_FAULT_RETRY is the only way we can return
to the fault handler at that time telling it we've released the mmap lock.
However that's not ideal because it's very likely the fault does not need
to be retried at all since the pgtable was well installed before the
throttling, so the next continuous fault (including taking mmap read lock,
walk the pgtable, etc.) could be in most cases unnecessary.
It's not only slowing down page faults for shared file-backed, but also add
more mmap lock contention which is in most cases not needed at all.
To observe this, one could try to write to some shmem page and look at
"pgfault" value in /proc/vmstat, then we should expect 2 counts for each
shmem write simply because we retried, and vm event "pgfault" will capture
that.
To make it more efficient, add a new VM_FAULT_COMPLETED return code just to
show that we've completed the whole fault and released the lock. It's also
a hint that we should very possibly not need another fault immediately on
this page because we've just completed it.
This patch provides a ~12% perf boost on my aarch64 test VM with a simple
program sequentially dirtying 400MB shmem file being mmap()ed and these are
the time it needs:
Before: 650.980 ms (+-1.94%)
After: 569.396 ms (+-1.38%)
I believe it could help more than that.
We need some special care on GUP and the s390 pgfault handler (for gmap
code before returning from pgfault), the rest changes in the page fault
handlers should be relatively straightforward.
Another thing to mention is that mm_account_fault() does take this new
fault as a generic fault to be accounted, unlike VM_FAULT_RETRY.
I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do
not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
them as-is.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Vineet Gupta <[email protected]>
Acked-by: Guo Ren <[email protected]>
Acked-by: Max Filippov <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Acked-by: Michael Ellerman <[email protected]> (powerpc)
Acked-by: Catalin Marinas <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Acked-by: Russell King (Oracle) <[email protected]> [arm part]
Acked-by: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Janosch Frank <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add some build time checks to prevent that the various usages of
"ldil L%(TMPALIAS_MAP_START), %reg"
sign-extends into the upper 32 bits when building a 64-bit kernel.
Signed-off-by: Helge Deller <[email protected]>
|
|
Change the "BUG" to "WARNING" and disable the message because it triggers
occasionally in spite of the check in flush_cache_page_if_present.
The pte value extracted for the "from" page in copy_user_highpage is racy and
occasionally the pte is cleared before the flush is complete. I assume that
the page is simultaneously flushed by flush_cache_mm before the pte is cleared
as nullifying the fdc doesn't seem to cause problems.
I investigated various locking scenarios but I wasn't able to find a way to
sequence the flushes. This code is called for every COW break and locks impact
performance.
This patch is related to the bigger cache flush patch because we need the pte
on PA8800/PA8900 to flush using the vma context.
I have also seen this from copy_to_user_page and copy_from_user_page.
The messages appear infrequently when enabled.
Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
Currently, the parisc kernel does not fully support non-access TLB
fault handling for probe instructions. In the fast path, we set the
target register to zero if it is not a shadowed register. The slow
path is not implemented, so we call do_page_fault. The architecture
indicates that non-access faults should not cause a page fault from
disk.
This change adds to code to provide non-access fault support for
probe instructions. It also modifies the handling of faults on
userspace so that if the address lies in a valid VMA and the access
type matches that for the VMA, the probe target register is set to
one. Otherwise, the target register is set to zero.
This was done to make probe instructions more useful for userspace.
Probe instructions are not very useful if they set the target register
to zero whenever a page is not present in memory. Nominally, the
purpose of the probe instruction is determine whether read or write
access to a given address is allowed.
This fixes a problem in function pointer comparison noticed in the
glibc testsuite (stdio-common/tst-vfprintf-user-type). The same
problem is likely in glibc (_dl_lookup_address).
V2 adds flush and lpa instruction support to handle_nadtlb_fault.
Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
With huge kernel pages, we randomly eat a SPARC in map_pages(). This
is fixed by dropping __init from the declaration.
However, map_pages references the __init routine memblock_alloc_try_nid
via memblock_alloc. Thus, it needs to be marked with __ref.
memblock_alloc is only called before the kernel text is set to readonly.
The __ref on free_initmem is no longer needed.
Comment regarding map_pages being in the init section is removed.
Signed-off-by: John David Anglin <[email protected]>
Cc: [email protected] # v5.4+
Signed-off-by: Helge Deller <[email protected]>
|
|
Merge misc updates from Andrew Morton:
"146 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
damon)"
* emailed patches from Andrew Morton <[email protected]>: (146 commits)
mm/damon: hide kernel pointer from tracepoint event
mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
mm/damon/dbgfs: remove an unnecessary variable
mm/damon: move the implementation of damon_insert_region to damon.h
mm/damon: add access checking for hugetlb pages
Docs/admin-guide/mm/damon/usage: update for schemes statistics
mm/damon/dbgfs: support all DAMOS stats
Docs/admin-guide/mm/damon/reclaim: document statistics parameters
mm/damon/reclaim: provide reclamation statistics
mm/damon/schemes: account how many times quota limit has exceeded
mm/damon/schemes: account scheme actions that successfully applied
mm/damon: remove a mistakenly added comment for a future feature
Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
Docs/admin-guide/mm/damon/usage: remove redundant information
Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
mm/damon: convert macro functions to static inline functions
mm/damon: modify damon_rand() macro to static inline function
mm/damon: move damon_rand() definition into damon.h
...
|
|
Since commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple
times") allowed VM_FAULT_RETRY for multiple times, the
FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page
fault path, so the following check is no longer needed:
flags & FAULT_FLAG_ALLOW_RETRY
So just remove it.
[[email protected]: coding style fixes]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qi Zheng <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Kirill Shutemov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Chengming Zhou <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In debugging kernel panics, I believe it is useful to know what type
of page fault caused the termination. "Bad Address" is too vague.
Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
It is dangerous to call faulthandler_disabled() when user_mode(regs)
is true. The task pagefault_disabled counter is racy and it is not
updated atomically on parisc. As a result, calling faulthandler_disabled()
may cause erroneous termination.
We now handle execption fixups and termination when user_mode(regs) is
false in handle_interruption(). Thus, we can just remove the
faulthandler_disabled() check from do_page_fault().
Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
Use register r29 instead of register r8 to signal faults when accessing
user memory. In case of faults, the fixup routine will store -EFAULT in
this register.
This change saves up to 752 bytes on a 32bit kernel, partly because the
compiler doesn't need to save and restore the old r8 value on the stack.
bloat-o-meter results for usage with r29 register:
add/remove: 0/0 grow/shrink: 23/86 up/down: 228/-980 (-752)
bloat-o-meter results for usage with r28 register:
add/remove: 0/0 grow/shrink: 28/83 up/down: 296/-956 (-660)
Signed-off-by: Helge Deller <[email protected]>
|
|
Fix a kernel crash which happens on PA1.x CPUs while initializing the
FTRACE/KPROBE breakpoints. The PTE table entries for the fixmap area
were not created correctly.
Signed-off-by: Helge Deller <[email protected]>
Fixes: ccfbc68d41c2 ("parisc: add set_fixmap()/clear_fixmap()")
Cc: [email protected] # v5.2+
|
|
Signed-off-by: Yihao Han <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
I've got the following splat after enabling preemption:
[ 3.724721] BUG: using __this_cpu_add() in preemptible [00000000] code: swapper/0/1
[ 3.734630] caller is __this_cpu_preempt_check+0x38/0x50
[ 3.740635] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc4-64bit+ #324
[ 3.744605] Hardware name: 9000/785/C8000
[ 3.744605] Backtrace:
[ 3.744605] [<00000000401d9d58>] show_stack+0x74/0xb0
[ 3.744605] [<0000000040c27bd4>] dump_stack_lvl+0x10c/0x188
[ 3.744605] [<0000000040c27c84>] dump_stack+0x34/0x48
[ 3.744605] [<0000000040c33438>] check_preemption_disabled+0x178/0x1b0
[ 3.744605] [<0000000040c334f8>] __this_cpu_preempt_check+0x38/0x50
[ 3.744605] [<00000000401d632c>] flush_tlb_all+0x58/0x2e0
[ 3.744605] [<00000000401075c0>] 0x401075c0
[ 3.744605] [<000000004010b8fc>] 0x4010b8fc
[ 3.744605] [<00000000401080fc>] 0x401080fc
[ 3.744605] [<00000000401d5224>] do_one_initcall+0x128/0x378
[ 3.744605] [<0000000040102de8>] 0x40102de8
[ 3.744605] [<0000000040c33864>] kernel_init+0x60/0x3a8
[ 3.744605] [<00000000401d1020>] ret_from_kernel_thread+0x20/0x28
[ 3.744605]
Fix this by moving the __inc_irq_stat() into the locked section.
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
When adding kfence support, we need to tell kfence_handle_page_fault()
if the interrupted assembler statement is a read or write operation.
Signed-off-by: Helge Deller <[email protected]>
|
|
This is the order of the page table allocation, not the order of a PMD.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4.
This series tries to disable huge pmd unshare of hugetlbfs backed memory
for uffd-wp. Although uffd-wp of hugetlbfs is still during rfc stage,
the idea of this series may be needed for multiple tasks (Axel's uffd
minor fault series, and Mike's soft dirty series), so I picked it out
from the larger series.
This patch (of 4):
It is a preparation work to be able to behave differently in the per
architecture huge_pte_alloc() according to different VMA attributes.
Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call.
[[email protected]: build fix]
Link: https://lkml.kernel.org/r/20210304164653.GB397383@xz-x1Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Suggested-by: Mike Kravetz <[email protected]>
Cc: Adam Ruprecht <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Cannon Matthews <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chinwen Chang <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Dr . David Alan Gilbert" <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Lokesh Gidra <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michal Koutn" <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Mina Almasry <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Shawn Anastasio <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
mem_init_print_info() is called in mem_init() on each architecture, and
pass NULL argument, so using void argument and move it into mm_init().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: Dave Hansen <[email protected]> [x86]
Reviewed-by: Christophe Leroy <[email protected]> [powerpc]
Acked-by: David Hildenbrand <[email protected]>
Tested-by: Anatoly Pugachev <[email protected]> [sparc64]
Acked-by: Russell King <[email protected]> [arm]
Acked-by: Mike Rapoport <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: "Peter Zijlstra" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
On parisc a spinlock is stored in the next page behind the pgd which
protects against parallel accesses to the pgd. That's why one additional
page (PGD_ALLOC_ORDER) is allocated for the pgd.
Matthew Wilcox suggested that we instead should use a pointer in the
struct page table for this spinlock and noted, that the comments for the
PGD_ORDER and PMD_ORDER defines were wrong.
Both suggestions are addressed with this patch. Instead of having an own
spinlock to protect the pgd, we now switch to use the existing
page_table_lock. Additionally, beside loading the pgd into cr25 in
switch_mm_irqs_off(), the physical address of this lock is loaded into
cr28 (tr4), so that we can avoid implementing a complicated lookup in
assembly for this lock in the TLB fault handlers.
The existing Hybrid L2/L3 page table scheme (where the pmd is adjacent
to the pgd) has been dropped with this patch.
Remove the locking in set_pte() and the huge-page pte functions too.
They trigger a spinlock recursion on 32bit machines and seem unnecessary.
Suggested-by: Matthew Wilcox <[email protected]>
Fixes: b37d1c1898b2 ("parisc: Use per-pagetable spinlock")
Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.
Remove the quote operator # from compiler_attributes.h __section macro.
Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.
Conversion done using the script at:
https://lore.kernel.org/lkml/[email protected]/2-convert_section.pl
Signed-off-by: Joe Perches <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Miguel Ojeda <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <[email protected]>
|
|
Use the general page fault accounting by passing regs into
handle_mm_fault(). It naturally solve the issue of multiple page fault
accounting when page fault retry happened.
Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the
other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in
handle_mm_fault().
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Helge Deller <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: Page fault accounting cleanups", v5.
This is v5 of the pf accounting cleanup series. It originates from Gerald
Schaefer's report on an issue a week ago regarding to incorrect page fault
accountings for retried page fault after commit 4064b9827063 ("mm: allow
VM_FAULT_RETRY for multiple times"):
https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/
What this series did:
- Correct page fault accounting: we do accounting for a page fault
(no matter whether it's from #PF handling, or gup, or anything else)
only with the one that completed the fault. For example, page fault
retries should not be counted in page fault counters. Same to the
perf events.
- Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
event is used in an adhoc way across different archs.
Case (1): for many archs it's done at the entry of a page fault
handler, so that it will also cover e.g. errornous faults.
Case (2): for some other archs, it is only accounted when the page
fault is resolved successfully.
Case (3): there're still quite some archs that have not enabled
this perf event.
Since this series will touch merely all the archs, we unify this
perf event to always follow case (1), which is the one that makes most
sense. And since we moved the accounting into handle_mm_fault, the
other two MAJ/MIN perf events are well taken care of naturally.
- Unify definition of "major faults": the definition of "major
fault" is slightly changed when used in accounting (not
VM_FAULT_MAJOR). More information in patch 1.
- Always account the page fault onto the one that triggered the page
fault. This does not matter much for #PF handlings, but mostly for
gup. More information on this in patch 25.
Patchset layout:
Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
Patch 25: Cleanup GUP task_struct pointer since it's not needed any more
This patch (of 25):
This is a preparation patch to move page fault accountings into the
general code in handle_mm_fault(). This includes both the per task
flt_maj/flt_min counters, and the major/minor page fault perf events. To
do this, the pt_regs pointer is passed into handle_mm_fault().
PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
handlers.
So far, all the pt_regs pointer that passed into handle_mm_fault() is
NULL, which means this patch should have no intented functional change.
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: "Luck, Tony" <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
functions that call memory_present() for each region in memblock.memory:
sparse_memory_present_with_active_regions() and membocks_present().
Moreover, all architectures have a call to either of these functions
preceding the call to sparse_init() and in the most cases they are called
one after the other.
Mark the regions from memblock.memory as present during sparce_init() by
making sparse_init() call memblocks_present(), make memblocks_present()
and memory_present() functions static and remove redundant
sparse_memory_present_with_active_regions() function.
Also remove no longer required HAVE_MEMORY_PRESENT configuration option.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: cleanup usage of <asm/pgalloc.h>"
Most architectures have very similar versions of pXd_alloc_one() and
pXd_free_one() for intermediate levels of page table. These patches add
generic versions of these functions in <asm-generic/pgalloc.h> and enable
use of the generic functions where appropriate.
In addition, functions declared and defined in <asm/pgalloc.h> headers are
used mostly by core mm and early mm initialization in arch and there is no
actual reason to have the <asm/pgalloc.h> included all over the place.
The first patch in this series removes unneeded includes of
<asm/pgalloc.h>
In the end it didn't work out as neatly as I hoped and moving
pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
unnecessary changes to arches that have custom page table allocations, so
I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
to mm/.
This patch (of 8):
In most cases <asm/pgalloc.h> header is required only for allocations of
page table memory. Most of the .c files that include that header do not
use symbols declared in <asm/pgalloc.h> and do not require that header.
As for the other header files that used to include <asm/pgalloc.h>, it is
possible to move that include into the .c file that actually uses symbols
from <asm/pgalloc.h> and drop the include from the header file.
The process was somewhat automated using
sed -i -E '/[<"]asm\/pgalloc\.h/d' \
$(grep -L -w -f /tmp/xx \
$(git grep -E -l '[<"]asm/pgalloc\.h'))
where /tmp/xx contains all the symbols defined in
arch/*/include/asm/pgalloc.h.
[[email protected]: fix powerpc warning]
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Pekka Enberg <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]> [m68k]
Cc: Abdul Haleem <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Satheesh Rajendran <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Drop own open-coded implementation to set bits and use the kernel
provided BIT_MASK() and BIT_WORD() macros.
Signed-off-by: Helge Deller <[email protected]>
|
|
Convert comments that reference old mmap_sem APIs to reference
corresponding new mmap locking APIs instead.
Signed-off-by: Michel Lespinasse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ying Han <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.
The change is generated using coccinelle with the following rule:
// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .
@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)
Signed-off-by: Michel Lespinasse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Reviewed-by: Laurent Dufour <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ying Han <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The powerpc 32-bit implementation of pgtable has nice shortcuts for
accessing kernel PMD and PTE for a given virtual address. Make these
helpers available for all architectures.
[[email protected]: microblaze: fix page table traversal in setup_rt_frame()]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: s/pmd_ptr_k/pmd_off_k/ in various powerpc places]
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: consolidate definitions of page table accessors", v2.
The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once. For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.
Most of these definitions are actually identical and typically it boils
down to, e.g.
static inline unsigned long pmd_index(unsigned long address)
{
return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}
These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.
These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.
This patch (of 12):
The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g. pte_alloc() and
pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.
The include statements in such cases are remove with a simple loop:
for f in $(git grep -l "include <linux/mm.h>") ; do
sed -i -e '/include <asm\/pgtable.h>/ d' $f
done
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
free_area_init() only requires the definition of maximal PFN for each of
the supported zone rater than calculation of actual zone sizes and the
sizes of the holes between the zones.
After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP the free_area_init() is
available to all architectures.
Using this function instead of free_area_init_node() simplifies the zone
detection.
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Hoan Tran <[email protected]> [arm64]
Cc: Baoquan He <[email protected]>
Cc: Brian Cain <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ley Foon Tan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The Debian kernel v5.6 triggers this kernel panic:
Kernel panic - not syncing: Bad Address (null pointer deref?)
Bad Address (null pointer deref?): Code=26 (Data memory access rights trap) at addr 0000000000000000
CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-2-parisc64 #1 Debian 5.6.14-1
IAOQ[0]: mem_init+0xb0/0x150
IAOQ[1]: mem_init+0xb4/0x150
RP(r2): start_kernel+0x6c8/0x1190
Backtrace:
[<0000000040101ab4>] start_kernel+0x6c8/0x1190
[<0000000040108574>] start_parisc+0x158/0x1b8
on a HP-PARISC rp3440 machine with this memory layout:
Memory Ranges:
0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB
1) Start 0x0000004040000000 End 0x00000040ffdfffff Size 3070 MB
Fix the crash by avoiding virt_to_page() and similar functions in
mem_init() until the memory zones have been fully set up.
Signed-off-by: Helge Deller <[email protected]>
Cc: [email protected] # v5.0+
|
|
The idea comes from a discussion between Linus and Andrea [1].
Before this patch we only allow a page fault to retry once. We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time. This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle the
page fault on a single page. However that should hardly happen, and after
all for each code path to return a VM_FAULT_RETRY we'll first wait for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.
This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY. It means that the page fault handler
now can retry the page fault for multiple times if necessary without the
need to generate another page fault event. Meanwhile we still keep the
FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
page fault is the first attempt or not.
Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):
- ALLOW_RETRY and !TRIED: this means the page fault allows to
retry, and this is the first try
- ALLOW_RETRY and TRIED: this means the page fault allows to
retry, and this is not the first try
- !ALLOW_RETRY and !TRIED: this means the page fault does not allow
to retry at all
- !ALLOW_RETRY and TRIED: this is forbidden and should never be used
In existing code we have multiple places that has taken special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect
the first retry of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now
even the 2nd try will have the ALLOW_RETRY set, then use that helper in
all existing special paths. One example is in __lock_page_or_retry(), now
we'll drop the mmap_sem only in the first attempt of page fault and we'll
keep it in follow up retries, so old locking behavior will be retained.
This will be a nice enhancement for current code [2] at the same time a
supporting material for the future userfaultfd-writeprotect work, since in
that work there will always be an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault. It might also benefit
other potential users who will have similar requirement like userfault
write-protection.
GUP code is not touched yet and will be covered in follow up patch.
Please read the thread below for more information.
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/
Suggested-by: Linus Torvalds <[email protected]>
Suggested-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Brian Geffon <[email protected]>
Cc: Bobby Powers <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Denis Plotnikov <[email protected]>
Cc: "Dr . David Alan Gilbert" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Martin Cracauer <[email protected]>
Cc: Marty McFadden <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Maya Gokhale <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Although there're tons of arch-specific page fault handlers, most of them
are still sharing the same initial value of the page fault flags. Say,
merely all of the page fault handlers would allow the fault to be retried,
and they also allow the fault to respond to SIGKILL.
Let's define a default value for the fault flags to replace those initial
page fault flags that were copied over. With this, it'll be far easier to
introduce new fault flag that can be used by all the architectures instead
of touching all the archs.
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Brian Geffon <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Bobby Powers <[email protected]>
Cc: Denis Plotnikov <[email protected]>
Cc: "Dr . David Alan Gilbert" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Martin Cracauer <[email protected]>
Cc: Marty McFadden <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Maya Gokhale <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For most architectures, we've got a quick path to detect fatal signal
after a handle_mm_fault(). Introduce a helper for that quick path.
It cleans the current codes a bit so we don't need to duplicate the same
check across archs. More importantly, this will be an unified place that
we handle the signal immediately right after an interrupted page fault, so
it'll be much easier for us if we want to change the behavior of handling
signals later on for all the archs.
Note that currently only part of the archs are using this new helper,
because some archs have their own way to handle signals. In the follow up
patches, we'll try to apply this helper to all the rest of archs.
Another note is that the "regs" parameter in the new helper is not used
yet. It'll be used very soon. Now we kept it in this patch only to avoid
touching all the archs again in the follow up patches.
[[email protected]: fix sparse warnings]
Link: http://lkml.kernel.org/r/20200311145921.GD479302@xz-x1
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Brian Geffon <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Bobby Powers <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Denis Plotnikov <[email protected]>
Cc: "Dr . David Alan Gilbert" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Martin Cracauer <[email protected]>
Cc: Marty McFadden <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Maya Gokhale <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The current code uses '#if PTRS_PER_PMD == 1' to distinguish 2 vs 3 levels,
setup, it casts pgd to pgd to cope with page table folding and converts
addresses of page table entries from physical to virtual and back for no
good reason.
Simplify the accesses to the page table entries using proper unfolding of
the upper layers and replacing '#if PTRS_PER_PMD' with explicit
'#if CONFIG_PGTABLE_LEVELS == 3'
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
The commit d96885e277b5 ("parisc: use pgtable-nopXd instead of
4level-fixup") converted PA-RISC to use folded page tables, but it missed
the conversion of pgd_populate() to pud_populate() in maps_pages()
function. This caused the upper page table directory to remain empty and
the system would crash as a result.
Using pud_populate() that actually populates the page table instead of
dummy pgd_populate() fixes the issue.
Fixes: d96885e277b5 ("parisc: use pgtable-nopXd instead of 4level-fixup")
Reported-by: Meelis Roos <[email protected]>
Reported-by: Jeroen Roovers <[email protected]>
Reported-by: Mikulas Patocka <[email protected]>
Tested-by: Jeroen Roovers <[email protected]>
Tested-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Mike Rapoport <[email protected]>
Cc: Anatoly Pugachev <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Peter Rosin <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rolf Eike Beer <[email protected]>
Cc: Russell King <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Creasey <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vineet Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
parisc has two or three levels of page tables and can use appropriate
pgtable-nopXd and folding of the upper layers.
Replace usage of include/asm-generic/4level-fixup.h and explicit
definitions of __PAGETABLE_PxD_FOLDED in parisc with
include/asm-generic/pgtable-nopmd.h for two-level configurations and
with include/asm-generic/pgtable-nopud.h for three-lelve configurations
and adjust page table manipulation macros and functions accordingly.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Helge Deller <[email protected]>
Cc: Anatoly Pugachev <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Peter Rosin <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rolf Eike Beer <[email protected]>
Cc: Russell King <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Creasey <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Vineet Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
__ioremap is always called with the _PAGE_NO_CACHE, so fold the whole
thing and rename it to ioremap.
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Helge Deller <[email protected]>
|
|
Sven noticed that calling ioremap() and iounmap() multiple times leads
to a vmap memory leak:
vmap allocation for size 4198400 failed:
use vmalloc=<size> to increase size
It seems we missed calling vunmap() in iounmap().
Signed-off-by: Helge Deller <[email protected]>
Noticed-by: Sven Schnelle <[email protected]>
Cc: <[email protected]> # v3.16+
|
|
Fix a fall-through warning in fault.c.
Fixes: a035d552a93b ("Makefile: Globally enable fall-through warning")
Signed-off-by: Helge Deller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"Dynamic ftrace support by Sven Schnelle and a header guard fix by
Denis Efremov"
* 'parisc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: asm: psw.h: missing header guard
parisc: add dynamic ftrace
compiler.h: add CC_USING_PATCHABLE_FUNCTION_ENTRY
parisc: use pr_debug() in kernel/module.c
parisc: add WARN_ON() to clear_fixmap
parisc: add spinlock to patch function
parisc: add support for patching multiple words
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull force_sig() argument change from Eric Biederman:
"A source of error over the years has been that force_sig has taken a
task parameter when it is only safe to use force_sig with the current
task.
The force_sig function is built for delivering synchronous signals
such as SIGSEGV where the userspace application caused a synchronous
fault (such as a page fault) and the kernel responded with a signal.
Because the name force_sig does not make this clear, and because the
force_sig takes a task parameter the function force_sig has been
abused for sending other kinds of signals over the years. Slowly those
have been fixed when the oopses have been tracked down.
This set of changes fixes the remaining abusers of force_sig and
carefully rips out the task parameter from force_sig and friends
making this kind of error almost impossible in the future"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
signal: Remove the signal number and task parameters from force_sig_info
signal: Factor force_sig_info_to_task out of force_sig_info
signal: Generate the siginfo in force_sig
signal: Move the computation of force into send_signal and correct it.
signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
signal: Remove the task parameter from force_sig_fault
signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
signal: Explicitly call force_sig_fault on current
signal/unicore32: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from ptrace_break
signal/nds32: Remove tsk parameter from send_sigtrap
signal/riscv: Remove tsk parameter from do_trap
signal/sh: Remove tsk parameter from force_sig_info_fault
signal/um: Remove task parameter from send_sigtrap
signal/x86: Remove task parameter from send_sigtrap
signal: Remove task parameter from force_sig_mceerr
signal: Remove task parameter from force_sig
signal: Remove task parameter from force_sigsegv
...
|
|
This patch implements dynamic ftrace for PA-RISC. The required mcount
call sequences can get pretty long, so instead of patching the
whole call sequence out of the functions, we are using
-fpatchable-function-entry from gcc. This puts a configurable amount of
NOPS before/at the start of the function. Taking do_sys_open() as example,
which would look like this when the call is patched out:
1036b248: 08 00 02 40 nop
1036b24c: 08 00 02 40 nop
1036b250: 08 00 02 40 nop
1036b254: 08 00 02 40 nop
1036b258 <do_sys_open>:
1036b258: 08 00 02 40 nop
1036b25c: 08 03 02 41 copy r3,r1
1036b260: 6b c2 3f d9 stw rp,-14(sp)
1036b264: 08 1e 02 43 copy sp,r3
1036b268: 6f c1 01 00 stw,ma r1,80(sp)
When ftrace gets enabled for this function the kernel will patch these
NOPs to:
1036b248: 10 19 57 20 <address of ftrace>
1036b24c: 6f c1 00 80 stw,ma r1,40(sp)
1036b250: 48 21 3f d1 ldw -18(r1),r1
1036b254: e8 20 c0 02 bv,n r0(r1)
1036b258 <do_sys_open>:
1036b258: e8 3f 1f df b,l,n .-c,r1
1036b25c: 08 03 02 41 copy r3,r1
1036b260: 6b c2 3f d9 stw rp,-14(sp)
1036b264: 08 1e 02 43 copy sp,r3
1036b268: 6f c1 01 00 stw,ma r1,80(sp)
So the first NOP in do_sys_open() will be patched to jump backwards into
some minimal trampoline code which pushes a stackframe, saves r1 which
holds the return address, loads the address of the real ftrace function,
and branches to that location. For 64 Bit things are getting a bit more
complicated (and longer) because we must make sure that the address of
ftrace location is 8 byte aligned, and the offset passed to ldd for
fetching the address is 8 byte aligned as well.
Note that gcc has a bug which misplaces the function label, and needs a
patch to make dynamic ftrace work. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90751 for details.
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
Calling clear_fixmap() on an already cleared fixed mapping is
a bad thing to do. Add a WARN_ON() to catch such issues.
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
As synchronous exceptions really only make sense against the current
task (otherwise how are you synchronous) remove the task parameter
from from force_sig_fault to make it explicit that is what is going
on.
The two known exceptions that deliver a synchronous exception to a
stopped ptraced task have already been changed to
force_sig_fault_to_task.
The callers have been changed with the following emacs regular expression
(with obvious variations on the architectures that take more arguments)
to avoid typos:
force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
->
force_sig_fault(\1,\2,\3)
Signed-off-by: "Eric W. Biederman" <[email protected]>
|