path: root/arch/powerpc/mm
Age  Commit message  Author  Files  Lines
2021-06-17  powerpc/32s: Rework Kernel Userspace Access Protection  Christophe Leroy  1  -1/+13
On book3s/32, KUAP is provided by toggling the Ks bit in segment registers. One segment register addresses 256M of virtual memory. Currently, KUAP implements complex logic to apply the unlock/lock to the exact number of segments covering the user range to access, saving the boundaries of that range of segments in a member of the thread struct. But most if not all user accesses are within a single segment. Rework KUAP with a different approach: - Open only one segment, the one corresponding to the starting address of the range to be accessed. - If a second segment is involved, it will generate a page fault. That segment will then be opened by the page fault handler. The kuap member of the thread struct will now contain: - The start address of the current ongoing user access, which will be used to know which segment to lock at the end of the user access. - ~0 when no user access is open - ~1 when additional segments are opened by a page fault. Then, at lock time: - When only one segment is open, close it. - When several segments are open, close all user segments. Almost 100% of the time, only one segment will be involved. In interrupts, inline the functions that unlock/lock all segments, because not inlining them implies a lot of register saves/restores. With the patch, writing value 128 in userspace in perf_copy_attr() is done with 16 instructions: 3890: 93 82 04 dc stw r28,1244(r2) 3894: 7d 20 e5 26 mfsrin r9,r28 3898: 55 29 00 80 rlwinm r9,r9,0,2,0 389c: 7d 20 e1 e4 mtsrin r9,r28 38a0: 4c 00 01 2c isync 38a4: 39 20 00 80 li r9,128 38a8: 91 3c 00 00 stw r9,0(r28) 38ac: 81 42 04 dc lwz r10,1244(r2) 38b0: 39 00 ff ff li r8,-1 38b4: 91 02 04 dc stw r8,1244(r2) 38b8: 2c 0a ff fe cmpwi r10,-2 38bc: 41 82 00 88 beq 3944 <perf_copy_attr+0x36c> 38c0: 7d 20 55 26 mfsrin r9,r10 38c4: 65 29 40 00 oris r9,r9,16384 38c8: 7d 20 51 e4 mtsrin r9,r10 38cc: 4c 00 01 2c isync ... 3944: 48 00 00 01 bl 3944 <perf_copy_attr+0x36c> 3944: R_PPC_REL24 kuap_lock_all_ool Before the patch it was 118 instructions. In reality only 42 are executed in most cases, but GCC is not able to see that a properly aligned user access cannot involve more than one segment.
5060: 39 1d 00 04 addi r8,r29,4 5064: 3d 20 b0 00 lis r9,-20480 5068: 7c 08 48 40 cmplw r8,r9 506c: 40 81 00 08 ble 5074 <perf_copy_attr+0x2cc> 5070: 3d 00 b0 00 lis r8,-20480 5074: 39 28 ff ff addi r9,r8,-1 5078: 57 aa 00 06 rlwinm r10,r29,0,0,3 507c: 55 29 27 3e rlwinm r9,r9,4,28,31 5080: 39 29 00 01 addi r9,r9,1 5084: 7d 29 53 78 or r9,r9,r10 5088: 91 22 04 dc stw r9,1244(r2) 508c: 7d 20 ed 26 mfsrin r9,r29 5090: 55 29 00 80 rlwinm r9,r9,0,2,0 5094: 7c 08 50 40 cmplw r8,r10 5098: 40 81 00 c0 ble 5158 <perf_copy_attr+0x3b0> 509c: 7d 46 50 f8 not r6,r10 50a0: 7c c6 42 14 add r6,r6,r8 50a4: 54 c6 27 be rlwinm r6,r6,4,30,31 50a8: 7d 20 51 e4 mtsrin r9,r10 50ac: 3c ea 10 00 addis r7,r10,4096 50b0: 39 29 01 11 addi r9,r9,273 50b4: 7f 88 38 40 cmplw cr7,r8,r7 50b8: 55 29 02 06 rlwinm r9,r9,0,8,3 50bc: 40 9d 00 9c ble cr7,5158 <perf_copy_attr+0x3b0> 50c0: 2f 86 00 00 cmpwi cr7,r6,0 50c4: 41 9e 00 4c beq cr7,5110 <perf_copy_attr+0x368> 50c8: 2f 86 00 01 cmpwi cr7,r6,1 50cc: 41 9e 00 2c beq cr7,50f8 <perf_copy_attr+0x350> 50d0: 2f 86 00 02 cmpwi cr7,r6,2 50d4: 41 9e 00 14 beq cr7,50e8 <perf_copy_attr+0x340> 50d8: 7d 20 39 e4 mtsrin r9,r7 50dc: 39 29 01 11 addi r9,r9,273 50e0: 3c e7 10 00 addis r7,r7,4096 50e4: 55 29 02 06 rlwinm r9,r9,0,8,3 50e8: 7d 20 39 e4 mtsrin r9,r7 50ec: 39 29 01 11 addi r9,r9,273 50f0: 3c e7 10 00 addis r7,r7,4096 50f4: 55 29 02 06 rlwinm r9,r9,0,8,3 50f8: 7d 20 39 e4 mtsrin r9,r7 50fc: 3c e7 10 00 addis r7,r7,4096 5100: 39 29 01 11 addi r9,r9,273 5104: 7f 88 38 40 cmplw cr7,r8,r7 5108: 55 29 02 06 rlwinm r9,r9,0,8,3 510c: 40 9d 00 4c ble cr7,5158 <perf_copy_attr+0x3b0> 5110: 7d 20 39 e4 mtsrin r9,r7 5114: 39 29 01 11 addi r9,r9,273 5118: 3c c7 10 00 addis r6,r7,4096 511c: 55 29 02 06 rlwinm r9,r9,0,8,3 5120: 7d 20 31 e4 mtsrin r9,r6 5124: 39 29 01 11 addi r9,r9,273 5128: 3c c6 10 00 addis r6,r6,4096 512c: 55 29 02 06 rlwinm r9,r9,0,8,3 5130: 7d 20 31 e4 mtsrin r9,r6 5134: 39 29 01 11 addi r9,r9,273 5138: 3c c7 30 00 addis r6,r7,12288 513c: 55 29 02 06 rlwinm r9,r9,0,8,3 5140: 7d 20 31 e4 mtsrin r9,r6 5144: 3c e7 40 00 addis r7,r7,16384 5148: 39 29 01 11 addi r9,r9,273 514c: 7f 88 38 40 cmplw cr7,r8,r7 5150: 55 29 02 06 rlwinm r9,r9,0,8,3 5154: 41 9d ff bc bgt cr7,5110 <perf_copy_attr+0x368> 5158: 4c 00 01 2c isync 515c: 39 20 00 80 li r9,128 5160: 91 3d 00 00 stw r9,0(r29) 5164: 38 e0 00 00 li r7,0 5168: 90 e2 04 dc stw r7,1244(r2) 516c: 7d 20 ed 26 mfsrin r9,r29 5170: 65 29 40 00 oris r9,r9,16384 5174: 40 81 00 c0 ble 5234 <perf_copy_attr+0x48c> 5178: 7d 47 50 f8 not r7,r10 517c: 7c e7 42 14 add r7,r7,r8 5180: 54 e7 27 be rlwinm r7,r7,4,30,31 5184: 7d 20 51 e4 mtsrin r9,r10 5188: 3d 4a 10 00 addis r10,r10,4096 518c: 39 29 01 11 addi r9,r9,273 5190: 7c 08 50 40 cmplw r8,r10 5194: 55 29 02 06 rlwinm r9,r9,0,8,3 5198: 40 81 00 9c ble 5234 <perf_copy_attr+0x48c> 519c: 2c 07 00 00 cmpwi r7,0 51a0: 41 82 00 4c beq 51ec <perf_copy_attr+0x444> 51a4: 2c 07 00 01 cmpwi r7,1 51a8: 41 82 00 2c beq 51d4 <perf_copy_attr+0x42c> 51ac: 2c 07 00 02 cmpwi r7,2 51b0: 41 82 00 14 beq 51c4 <perf_copy_attr+0x41c> 51b4: 7d 20 51 e4 mtsrin r9,r10 51b8: 39 29 01 11 addi r9,r9,273 51bc: 3d 4a 10 00 addis r10,r10,4096 51c0: 55 29 02 06 rlwinm r9,r9,0,8,3 51c4: 7d 20 51 e4 mtsrin r9,r10 51c8: 39 29 01 11 addi r9,r9,273 51cc: 3d 4a 10 00 addis r10,r10,4096 51d0: 55 29 02 06 rlwinm r9,r9,0,8,3 51d4: 7d 20 51 e4 mtsrin r9,r10 51d8: 3d 4a 10 00 addis r10,r10,4096 51dc: 39 29 01 11 addi r9,r9,273 51e0: 7c 08 50 40 cmplw r8,r10 51e4: 55 29 02 06 rlwinm r9,r9,0,8,3 51e8: 40 81 00 4c ble 5234 
<perf_copy_attr+0x48c> 51ec: 7d 20 51 e4 mtsrin r9,r10 51f0: 39 29 01 11 addi r9,r9,273 51f4: 3c ea 10 00 addis r7,r10,4096 51f8: 55 29 02 06 rlwinm r9,r9,0,8,3 51fc: 7d 20 39 e4 mtsrin r9,r7 5200: 39 29 01 11 addi r9,r9,273 5204: 3c e7 10 00 addis r7,r7,4096 5208: 55 29 02 06 rlwinm r9,r9,0,8,3 520c: 7d 20 39 e4 mtsrin r9,r7 5210: 39 29 01 11 addi r9,r9,273 5214: 3c ea 30 00 addis r7,r10,12288 5218: 55 29 02 06 rlwinm r9,r9,0,8,3 521c: 7d 20 39 e4 mtsrin r9,r7 5220: 3d 4a 40 00 addis r10,r10,16384 5224: 39 29 01 11 addi r9,r9,273 5228: 7c 08 50 40 cmplw r8,r10 522c: 55 29 02 06 rlwinm r9,r9,0,8,3 5230: 41 81 ff bc bgt 51ec <perf_copy_attr+0x444> 5234: 4c 00 01 2c isync Signed-off-by: Christophe Leroy <[email protected]> [mpe: Export the ool handlers to fix build errors] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/d9121f96a7c4302946839a0771f5d1daeeb6968c.1622708530.git.christophe.leroy@csgroup.eu
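A minimal C sketch of the bookkeeping described in the commit above, assuming illustrative helper names (open_user_segment(), lock_user_segment(), lock_all_user_segments()) rather than the exact kernel code:

    /* thread.kuap: ~0 = no user access open, ~1 = a page fault opened
     * additional segments, otherwise the start address of the access. */
    #define KUAP_NONE   (~0UL)
    #define KUAP_ALL    (~1UL)

    static inline void kuap_open_sketch(unsigned long addr)
    {
            current->thread.kuap = addr;
            open_user_segment(addr);            /* clear Ks in the matching SR */
    }

    static inline void kuap_close_sketch(void)
    {
            unsigned long kuap = current->thread.kuap;

            current->thread.kuap = KUAP_NONE;
            if (kuap == KUAP_NONE)
                    return;
            if (kuap == KUAP_ALL)
                    lock_all_user_segments();   /* rare: a fault opened more SRs */
            else
                    lock_user_segment(kuap);    /* common case: a single segment */
    }

In the common single-segment case only the fast path runs; the out-of-line lock-all helper is only reached when a page fault widened the open window.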
2021-06-17  powerpc/32s: Allow disabling KUAP at boot time  Christophe Leroy  1  -4/+8
PPC64 uses MMU features to enable/disable KUAP at boot time. But feature fixups are applied way too early on PPC32. Now that all KUAP related actions are in C following the conversion of KUAP initial setup and context switch in C, static branches can be used to enable/disable KUAP. Signed-off-by: Christophe Leroy <[email protected]> [mpe: Export disable_kuap_key to fix build errors] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/cd79e8008455fba5395d099f9bb1305c039b931c.1622708530.git.christophe.leroy@csgroup.eu
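The static-branch mechanism this relies on looks roughly like the sketch below; disable_kuap_key is the key named in the commit, while the setup_kuap() body is simplified and only illustrative:

    DEFINE_STATIC_KEY_FALSE(disable_kuap_key);
    EXPORT_SYMBOL(disable_kuap_key);

    void __init setup_kuap(bool disabled)
    {
            if (disabled) {
                    static_branch_enable(&disable_kuap_key);
                    return;
            }
            /* ... arm KUAP on the boot CPU ... */
    }

    static __always_inline bool kuap_is_disabled(void)
    {
            return static_branch_unlikely(&disable_kuap_key);
    }

With the key left false, static_branch_unlikely() compiles down to a fall-through on the hot path, which is why this works even though MMU feature fixups run too early on PPC32.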
2021-06-17  powerpc/32s: Allow disabling KUEP at boot time  Christophe Leroy  1  -4/+7
PPC64 uses MMU features to enable/disable KUEP at boot time. But feature fixups are applied way too early on PPC32. Now that all KUEP related actions are in C following the conversion of KUEP initial setup and context switch in C, static branches can be used to enable/disable KUEP. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/7745a2c3a08ec46302920a3f48d1cb9b5469dbbb.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17  powerpc/32s: Initialise KUAP and KUEP in C  Christophe Leroy  2  -0/+12
In order to selectively activate KUAP and KUEP in a following patch, perform KUAP and KUEP initialisation in C. Unlike PPC64, PPC32 doesn't have an early_setup_secondary(), so do it in start_secondary(). Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/87be72023448dd4e476744ed279b8c04b8d08a1c.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17  powerpc/32s: Convert switch_mmu_context() to C  Christophe Leroy  1  -0/+35
switch_mmu_context() does things that can easily be done in C. For updating user segments, we have update_user_segments(). As mentioned in commit b5efec00b671 ("powerpc/32s: Move KUEP locking/unlocking in C"), update_user_segments() has the loop unrolled, which is a significant performance gain. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/05c0875ad8220c03452c3a334946e207c6ca04d6.1622708530.git.christophe.leroy@csgroup.eu
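As a rough illustration only (the real update_user_segments() unrolls the loop and computes each segment's VSID itself), a book3s/32 context switch amounts to rewriting one segment register per 256M of user address space:

    #include <linux/sizes.h>

    static inline void set_user_segment(u32 sr_value, u32 ea)
    {
            /* mtsrin selects the segment register from the top 4 bits of ea */
            asm volatile("mtsrin %0, %1" : : "r" (sr_value), "r" (ea) : "memory");
    }

    /* srs[] holds one precomputed SR value per 256M user segment */
    static void switch_user_segments_sketch(const u32 *srs)
    {
            u32 ea;

            for (ea = 0; ea < TASK_SIZE; ea += SZ_256M)
                    set_user_segment(srs[ea >> 28], ea);
            asm volatile("isync" : : : "memory");
    }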
2021-06-17  powerpc/32s: move CTX_TO_VSID() into mmu-hash.h  Christophe Leroy  1  -13/+0
In order to reuse it in switch_mmu_context(), this patch moves CTX_TO_VSID() macro into asm/book3s/32/mmu-hash.h Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/26b36ef2939234a04b37baf6ffe50cba81f5d1b7.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17  powerpc/32s: Refactor update of user segment registers  Christophe Leroy  1  -37/+0
KUEP implements the update of user segment registers. Move it into mmu-hash.h in order to use it from other places. And inline kuep_lock() and kuep_unlock(). Inlining kuep_lock() is important for system_call_exception(), otherwise system_call_exception() has to save into stack the system call parameters that are used just after, and doing that takes more instructions than kuep_lock() itself. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/24591ca480d14a62ef910e38a5273d551262c4a2.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17  powerpc/32s: Move setup_{kuep/kuap}() into {kuep/kuap}.c  Christophe Leroy  4  -20/+20
Avoids the #ifdef in mmu.c Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/0b7a13d414837e58264edc336b89c2fe9f35f9bc.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17  powerpc/8xx: Allow disabling KUAP at boot time  Christophe Leroy  1  -3/+8
PPC64 uses MMU features to enable/disable KUAP at boot time. But feature fixups are applied way too early on PPC32. But since commit c16728835eec ("powerpc/32: Manage KUAP in C"), all KUAP is in C so it is now possible to use static branches. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/3dca510ce555335261a47c4799167da698f569c0.1622782111.git.christophe.leroy@csgroup.eu
2021-06-17  powerpc/44x: Implement Kernel Userspace Exec Protection (KUEP)  Christophe Leroy  1  -0/+13
Powerpc 44x has two bits for exec protection in TLBs: one for user (UX) and one for supervisor (SX). Clear SX on user pages in TLB miss handlers to provide KUEP. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/169310e08152aa1d96c979770291d165ec6896ae.1622616032.git.christophe.leroy@csgroup.eu
2021-06-17  powerpc: Don't use 'struct ppc_inst' to reference instruction location  Christophe Leroy  1  -2/+2
'struct ppc_inst' is an internal representation of an instruction, but in-memory instructions are and will remain a table of 'u32' forever. Replace all 'struct ppc_inst *' used for locating an instruction in memory with 'u32 *'. This removes a lot of undue casts to 'struct ppc_inst *'. It also helps locate abuse of 'struct ppc_inst' dereferences. Signed-off-by: Christophe Leroy <[email protected]> [mpe: Fix ppc_inst_next(), use u32 instead of unsigned int] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/7062722b087228e42cbd896e39bfdf526d6a340a.1621516826.git.christophe.leroy@csgroup.eu
2021-06-10  KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling MMU  Nicholas Piggin  3  -68/+9
Rather than partitioning the guest PID space and flushing a rogue guest PID to work around this problem, fix it by always disabling the MMU when switching into or out of guest MMU context in HV mode. This may be a bit less efficient, but it is a lot less complicated and allows the P9 path to trivially implement the workaround too. Newer CPUs are not subject to this issue. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-06-10  powerpc: Add missing linux/{of.h,irqdomain.h} include directives  Marc Zyngier  1  -0/+1
A bunch of PPC files are missing the inclusion of linux/of.h and linux/irqdomain.h, relying on transitive inclusion from another file. As we are about to break this dependency, make sure these dependencies are explicit. Signed-off-by: Marc Zyngier <[email protected]>
2021-06-06  powerpc/mem: Add back missing header to fix 'no previous prototype' error  Christophe Leroy  1  -0/+1
Commit b26e8f27253a ("powerpc/mem: Move cache flushing functions into mm/cacheflush.c") removed asm/sparsemem.h which is required when CONFIG_MEMORY_HOTPLUG is selected to get the declaration of create_section_mapping(). Add it back. Fixes: b26e8f27253a ("powerpc/mem: Move cache flushing functions into mm/cacheflush.c") Reported-by: kernel test robot <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/3e5b63bb3daab54a1eb9c20221c2e9528c4db9b3.1622883330.git.christophe.leroy@csgroup.eu
2021-05-05  Merge branch 'akpm' (patches from Andrew)  Linus Torvalds  1  -1/+2
Merge more updates from Andrew Morton: "The remainder of the main mm/ queue. 143 patches. Subsystems affected by this patch series (all mm): pagecache, hugetlb, userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap, kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and kfence" * emailed patches from Andrew Morton <[email protected]>: (143 commits) kfence: use power-efficient work queue to run delayed work kfence: maximize allocation wait timeout duration kfence: await for allocation using wait_event kfence: zero guard page after out-of-bounds access mm/process_vm_access.c: remove duplicate include mm/mempool: minor coding style tweaks mm/highmem.c: fix coding style issue btrfs: use memzero_page() instead of open coded kmap pattern iov_iter: lift memzero_page() to highmem.h mm/zsmalloc: use BUG_ON instead of if condition followed by BUG. mm/zswap.c: switch from strlcpy to strscpy arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE mm,memory_hotplug: add kernel boot option to enable memmap_on_memory acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported mm,memory_hotplug: allocate memmap from the added memory range mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count() mm,memory_hotplug: relax fully spanned sections check drivers/base/memory: introduce memory_block_{online,offline} mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove ...
2021-05-05  hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share()  Peter Xu  1  -1/+2
Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4. This series tries to disable huge pmd unshare of hugetlbfs backed memory for uffd-wp. Although uffd-wp of hugetlbfs is still during rfc stage, the idea of this series may be needed for multiple tasks (Axel's uffd minor fault series, and Mike's soft dirty series), so I picked it out from the larger series. This patch (of 4): It is a preparation work to be able to behave differently in the per architecture huge_pte_alloc() according to different VMA attributes. Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call. [[email protected]: build fix] Link: https://lkml.kernel.org/r/20210304164653.GB397383@xz-x1Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Suggested-by: Mike Kravetz <[email protected]> Cc: Adam Ruprecht <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Cannon Matthews <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chinwen Chang <[email protected]> Cc: David Rientjes <[email protected]> Cc: "Dr . David Alan Gilbert" <[email protected]> Cc: Huang Ying <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: "Michal Koutn" <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Shawn Anastasio <[email protected]> Cc: Steven Price <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-30  Merge branch 'akpm' (patches from Andrew)  Linus Torvalds  3  -23/+1
Merge misc updates from Andrew Morton: "A few misc subsystems and some of MM. 175 patches. Subsystems affected by this patch series: ia64, kbuild, scripts, sh, ocfs2, kfifo, vfs, kernel/watchdog, and mm (slab-generic, slub, kmemleak, debug, pagecache, msync, gup, memremap, memcg, pagemap, mremap, dma, sparsemem, vmalloc, documentation, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <[email protected]>: (175 commits) mm/memory-failure: unnecessary amount of unmapping mm/mmzone.h: fix existing kernel-doc comments and link them to core-api mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1 net: page_pool: use alloc_pages_bulk in refill code path net: page_pool: refactor dma_map into own function page_pool_dma_map SUNRPC: refresh rq_pages using a bulk page allocator SUNRPC: set rq_page_end differently mm/page_alloc: inline __rmqueue_pcplist mm/page_alloc: optimize code layout for __alloc_pages_bulk mm/page_alloc: add an array-based interface to the bulk page allocator mm/page_alloc: add a bulk page allocator mm/page_alloc: rename alloced to allocated mm/page_alloc: duplicate include linux/vmalloc.h mm, page_alloc: avoid page_to_pfn() in move_freepages() mm/Kconfig: remove default DISCONTIGMEM_MANUAL mm: page_alloc: dump migrate-failed pages mm/mempolicy: fix mpol_misplaced kernel-doc mm/mempolicy: rewrite alloc_pages_vma documentation mm/mempolicy: rewrite alloc_pages documentation mm/mempolicy: rename alloc_pages_current to alloc_pages ...
2021-04-30  mm: move mem_init_print_info() into mm_init()  Kefeng Wang  1  -1/+0
mem_init_print_info() is called in mem_init() on each architecture, and pass NULL argument, so using void argument and move it into mm_init(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Acked-by: Dave Hansen <[email protected]> [x86] Reviewed-by: Christophe Leroy <[email protected]> [powerpc] Acked-by: David Hildenbrand <[email protected]> Tested-by: Anatoly Pugachev <[email protected]> [sparc64] Acked-by: Russell King <[email protected]> [arm] Acked-by: Mike Rapoport <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Guo Ren <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: "Peter Zijlstra" <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-30  mm/vmalloc: remove unmap_kernel_range  Nicholas Piggin  1  -1/+1
This is a shim around vunmap_range, get rid of it. Move the main API comment from the _noflush variant to the normal variant, and make _noflush internal to mm/. [[email protected]: fix nommu builds and a comment bug per sfr] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: move vunmap_range_noflush() stub inside !CONFIG_MMU, not !CONFIG_NUMA] [[email protected]: fix nommu builds] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Cédric Le Goater <[email protected]> Cc: Uladzislau Rezki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-30  powerpc: inline huge vmap supported functions  Nicholas Piggin  1  -21/+0
This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nicholas Piggin <[email protected]> Acked-by: Michael Ellerman <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ding Tianhong <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Russell King <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-30  mm: HUGE_VMAP arch support cleanup  Nicholas Piggin  1  -4/+4
This changes the awkward approach where architectures provide init functions to determine which levels they can provide large mappings for, to one where the arch is queried for each call. This removes code and indirection, and allows constant-folding of dead code for unsupported levels. This also adds a prot argument to the arch query. This is unused currently but could help with some architectures (e.g., some powerpc processors can't map uncacheable memory with large pages). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Ding Tianhong <[email protected]> Acked-by: Catalin Marinas <[email protected]> [arm64] Cc: Will Deacon <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Russell King <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
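Concretely, the per-level init hooks are replaced by per-call queries that each architecture can answer; a simplified sketch of the powerpc answers (the prot argument is the one added by this commit, unused here):

    /* Unsupported levels constant-fold to false, so the generic huge-vmap
     * code for them can be discarded by the compiler. */
    static inline bool arch_vmap_pud_supported(pgprot_t prot)
    {
            /* the hash MMU cannot cope with huge vmalloc mappings */
            return radix_enabled();
    }

    static inline bool arch_vmap_pmd_supported(pgprot_t prot)
    {
            return radix_enabled();
    }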
2021-04-23  powerpc/64s: Fix mm_cpumask memory ordering comment  Nicholas Piggin  1  -11/+13
The memory ordering comment no longer applies, because mm_ctx_id is no longer used anywhere, and at best it has always been difficult to follow. It's better to consider the load on which the slbmte depends, which the MMU depends on before it can start loading TLBs, rather than a store which may or may not have a subsequent dependency chain to the slbmte. So update the comment to use the load of the mm's user context ID. This is much more analogous to the radix ordering too, which is good. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-04-21  powerpc: Move copy_inst_from_kernel_nofault()  Christophe Leroy  1  -0/+21
When probe_kernel_read_inst() was created, there was no good place to put it, so a file called lib/inst.c was dedicated for it. Since then, probe_kernel_read_inst() has been renamed copy_inst_from_kernel_nofault(). And mm/maccess.h didn't exist at that time. Today, mm/maccess.h is related to copy_from_kernel_nofault(). Move copy_inst_from_kernel_nofault() into mm/maccess.c Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/9655d8957313906b77b8db5700a0e33ce06f45e5.1618405715.git.christophe.leroy@csgroup.eu
2021-04-17  powerpc/traps: Enhance readability for trap types  Xiongwei Song  2  -10/+10
Define macros listing the ppc interrupt types in interrupt.h and replace references to the raw trap hex values with these macros. The hex numbers are referenced in arch/powerpc/kernel/exceptions-64e.S, arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S, arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h. Signed-off-by: Xiongwei Song <[email protected]> [mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
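The macros are plain defines of the architectural interrupt vector offsets; an illustrative subset (names approximate the series, values are the classic vectors):

    #define INTERRUPT_MACHINE_CHECK   0x200
    #define INTERRUPT_DATA_STORAGE    0x300
    #define INTERRUPT_INST_STORAGE    0x400
    #define INTERRUPT_ALIGNMENT       0x600
    #define INTERRUPT_PROGRAM         0x700
    #define INTERRUPT_SYSCALL         0xc00

    /* usage: compare against the saved trap value instead of a bare 0x300 */
    if (TRAP(regs) == INTERRUPT_DATA_STORAGE)
            handle_data_storage_fault(regs);    /* illustrative handler name */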
2021-04-14  powerpc/mm/radix: Make radix__change_memory_range() static  Michael Ellerman  1  -2/+2
The lkp bot pointed out that with W=1 we get: arch/powerpc/mm/book3s64/radix_pgtable.c:183:6: error: no previous prototype for 'radix__change_memory_range' Which is really saying that it could be static, make it so. Reported-by: kernel test robot <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2021-04-14  powerpc: clean up do_page_fault  Nicholas Piggin  2  -28/+15
search_exception_tables + __bad_page_fault can be substituted with bad_page_fault, do_page_fault no longer needs to return a value to asm for any sub-architecture, and __bad_page_fault can be static. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-04-14  powerpc/64e/interrupt: handle bad_page_fault in C  Nicholas Piggin  1  -4/+1
With non-volatile registers saved on interrupt, bad_page_fault can now be called by do_page_fault. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-04-14  powerpc/mem: Use kmap_local_page() in flushing functions  Christophe Leroy  1  -9/+10
Flushing functions don't rely on preemption being disabled, so use kmap_local_page() instead of kmap_atomic(). Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/b6a880ea0ec7886b51edbb4979c188be549231c0.1617895813.git.christophe.leroy@csgroup.eu
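The substitution is mechanical; for a flush helper it is essentially the before/after below (sketch):

    /* before: preemption stays disabled for the whole flush */
    addr = kmap_atomic(page);
    flush_dcache_range((unsigned long)addr, (unsigned long)addr + PAGE_SIZE);
    kunmap_atomic(addr);

    /* after: the mapping is still CPU-local, but preemption may remain enabled */
    addr = kmap_local_page(page);
    flush_dcache_range((unsigned long)addr, (unsigned long)addr + PAGE_SIZE);
    kunmap_local(addr);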
2021-04-14  powerpc/mem: Inline flush_dcache_page()  Christophe Leroy  1  -15/+0
flush_dcache_page() is only a few lines, it is worth inlining. ia64, csky, mips, openrisc and riscv have a similar flush_dcache_page() and inline it. On pmac32_defconfig, we get a small size reduction. On ppc64_defconfig, we get a very small size increase. In both cases that's in the noise (less than 0.1%).
     text      data     bss      dec       hex      filename
 18991155   5934744  1497624  26423523  19330e3   vmlinux64.before
 18994829   5936732  1497624  26429185  1934701   vmlinux64.after
  9150963   2467502   184548  11803013   b41985   vmlinux32.before
  9149689   2467302   184548  11801539   b413c3   vmlinux32.after
Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/21c417488b70b7629dae316539fb7bb8bdef4fdd.1617895813.git.christophe.leroy@csgroup.eu
2021-04-14  powerpc/mem: Help GCC realise __flush_dcache_icache() flushes single pages  Christophe Leroy  1  -1/+1
'And' the given page address with PAGE_MASK to help GCC. With the patch: 00000024 <__flush_dcache_icache>: 24: 54 63 00 26 rlwinm r3,r3,0,0,19 28: 39 40 00 40 li r10,64 2c: 7c 69 1b 78 mr r9,r3 30: 7d 49 03 a6 mtctr r10 34: 7c 00 48 6c dcbst 0,r9 38: 39 29 00 20 addi r9,r9,32 3c: 7c 00 48 6c dcbst 0,r9 40: 39 29 00 20 addi r9,r9,32 44: 42 00 ff f0 bdnz 34 <__flush_dcache_icache+0x10> 48: 7c 00 04 ac hwsync 4c: 39 20 00 40 li r9,64 50: 7d 29 03 a6 mtctr r9 54: 7c 00 1f ac icbi 0,r3 58: 38 63 00 20 addi r3,r3,32 5c: 7c 00 1f ac icbi 0,r3 60: 38 63 00 20 addi r3,r3,32 64: 42 00 ff f0 bdnz 54 <__flush_dcache_icache+0x30> 68: 7c 00 04 ac hwsync 6c: 4c 00 01 2c isync 70: 4e 80 00 20 blr Without the patch: 00000024 <__flush_dcache_icache>: 24: 54 6a 00 34 rlwinm r10,r3,0,0,26 28: 39 23 10 1f addi r9,r3,4127 2c: 7d 2a 48 50 subf r9,r10,r9 30: 55 29 d9 7f rlwinm. r9,r9,27,5,31 34: 41 82 00 94 beq c8 <__flush_dcache_icache+0xa4> 38: 71 28 00 01 andi. r8,r9,1 3c: 38 c9 ff ff addi r6,r9,-1 40: 7d 48 53 78 mr r8,r10 44: 7d 27 4b 78 mr r7,r9 48: 40 82 00 6c bne b4 <__flush_dcache_icache+0x90> 4c: 54 e7 f8 7e rlwinm r7,r7,31,1,31 50: 7c e9 03 a6 mtctr r7 54: 7c 00 40 6c dcbst 0,r8 58: 39 08 00 20 addi r8,r8,32 5c: 7c 00 40 6c dcbst 0,r8 60: 39 08 00 20 addi r8,r8,32 64: 42 00 ff f0 bdnz 54 <__flush_dcache_icache+0x30> 68: 7c 00 04 ac hwsync 6c: 71 28 00 01 andi. r8,r9,1 70: 39 09 ff ff addi r8,r9,-1 74: 40 82 00 2c bne a0 <__flush_dcache_icache+0x7c> 78: 55 29 f8 7e rlwinm r9,r9,31,1,31 7c: 7d 29 03 a6 mtctr r9 80: 7c 00 57 ac icbi 0,r10 84: 39 4a 00 20 addi r10,r10,32 88: 7c 00 57 ac icbi 0,r10 8c: 39 4a 00 20 addi r10,r10,32 90: 42 00 ff f0 bdnz 80 <__flush_dcache_icache+0x5c> 94: 7c 00 04 ac hwsync 98: 4c 00 01 2c isync 9c: 4e 80 00 20 blr a0: 7c 00 57 ac icbi 0,r10 a4: 2c 08 00 00 cmpwi r8,0 a8: 39 4a 00 20 addi r10,r10,32 ac: 40 82 ff cc bne 78 <__flush_dcache_icache+0x54> b0: 4b ff ff e4 b 94 <__flush_dcache_icache+0x70> b4: 7c 00 50 6c dcbst 0,r10 b8: 2c 06 00 00 cmpwi r6,0 bc: 39 0a 00 20 addi r8,r10,32 c0: 40 82 ff 8c bne 4c <__flush_dcache_icache+0x28> c4: 4b ff ff a4 b 68 <__flush_dcache_icache+0x44> c8: 7c 00 04 ac hwsync cc: 7c 00 04 ac hwsync d0: 4c 00 01 2c isync d4: 4e 80 00 20 blr Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/23030822ea5cd0a122948b10226abe56602dc027.1617895813.git.christophe.leroy@csgroup.eu
2021-04-14  powerpc/mem: flush_dcache_icache_phys() is for HIGHMEM pages only  Christophe Leroy  1  -8/+9
__flush_dcache_icache() is usable for non-HIGHMEM pages on every platform. It is only for HIGHMEM pages that BOOKE needs kmap() and BOOK3S needs flush_dcache_icache_phys(). So make flush_dcache_icache_phys() dependent on CONFIG_HIGHMEM and call it only when it is a HIGHMEM page. We could make flush_dcache_icache_phys() available at all times, but as it is declared NOKPROBE_SYMBOL(), GCC doesn't optimise it out when it is not used. So define a stub for !CONFIG_HIGHMEM in order to remove the #ifdef in flush_dcache_icache_page() and use IS_ENABLED() instead. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/79ed5d7914f497cd5fcd681ca2f4d50a91719455.1617895813.git.christophe.leroy@csgroup.eu
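The shape of the change is the usual stub-plus-IS_ENABLED() pattern, roughly as follows (simplified sketch, helper names as used elsewhere in this series):

    #ifdef CONFIG_HIGHMEM
    static void flush_dcache_icache_phys(unsigned long physaddr);
    #else
    static void flush_dcache_icache_phys(unsigned long physaddr) { }
    #endif

    static void flush_dcache_icache_page_sketch(struct page *page)
    {
            if (IS_ENABLED(CONFIG_HIGHMEM) && PageHighMem(page))
                    flush_dcache_icache_phys(page_to_phys(page));
            else
                    __flush_dcache_icache(lowmem_page_address(page));
    }

The empty stub keeps the call compilable when CONFIG_HIGHMEM is off, while IS_ENABLED() lets the compiler drop the dead branch entirely.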
2021-04-14  powerpc/mem: Optimise flush_dcache_icache_hugepage()  Christophe Leroy  1  -6/+6
flush_dcache_icache_hugepage() is a static function, with only one caller. That caller calls it when PageCompound() is true, so bugging on !PageCompound() is useless if we can trust the compiler a little. Remove the BUG_ON(!PageCompound()). The number of elements of a page won't change over time, but GCC doesn't know about it, so it gets the value at every iteration. To avoid that, call compound_nr() outside the loop and save it in a local variable. Whether the page is a HIGHMEM page or not doesn't change over time. But GCC doesn't know it so it does the test on every iteration. Do the test outside the loop. When the page is not a HIGHMEM page, page_address() will fallback on lowmem_page_address(), so call lowmem_page_address() directly and don't suffer the call to page_address() on every iteration. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/ab03712b70105fccfceef095aa03007de9295a40.1617895813.git.christophe.leroy@csgroup.eu
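Putting those points together, the loop ends up shaped roughly like this (a sketch, shown with the kmap_local_page() form from the commit further up this log):

    static void flush_dcache_icache_hugepage_sketch(struct page *page)
    {
            unsigned int i, nr = compound_nr(page);   /* hoisted out of the loop */

            if (!PageHighMem(page)) {
                    void *start = lowmem_page_address(page);  /* skip page_address() */

                    for (i = 0; i < nr; i++)
                            __flush_dcache_icache(start + i * PAGE_SIZE);
            } else {
                    for (i = 0; i < nr; i++) {
                            void *start = kmap_local_page(page + i);

                            __flush_dcache_icache(start);
                            kunmap_local(start);
                    }
            }
    }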
2021-04-14  powerpc/mem: Call flush_coherent_icache() at higher level  Christophe Leroy  1  -8/+3
flush_coherent_icache() doesn't need the address anymore, so it can be called immediately when entering the public functions and doesn't need to be disseminated among lower level functions. And use page_to_phys() instead of open coding the calculation of phys address to call flush_dcache_icache_phys(). Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/5f063986e325d2efdd404b8f8c5f4bcbd4eb11a6.1617895813.git.christophe.leroy@csgroup.eu
2021-04-14  powerpc/mem: Remove address argument to flush_coherent_icache()  Christophe Leroy  1  -8/+5
flush_coherent_icache() can use any valid address, as mentioned in the comment. Use PAGE_OFFSET as the base address. This allows removing the user access stuff. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/742b6360ae4f344a1c6ecfadcf3b6645f443fa7a.1617895813.git.christophe.leroy@csgroup.eu
2021-04-14  powerpc/mem: Declare __flush_dcache_icache() static  Christophe Leroy  1  -30/+30
__flush_dcache_icache() is only used in mem.c. Move it before the functions that use it and declare it static. And also fix the name of the parameter in the comment. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/3fa903eb5a10b2bc7d99a8c559ffdaa05452d8e0.1617895813.git.christophe.leroy@csgroup.eu
2021-04-14  powerpc/mem: Move cache flushing functions into mm/cacheflush.c  Christophe Leroy  3  -282/+257
Cache flushing functions are in the middle of completely unrelated stuff in mm/mem.c. Create a dedicated mm/cacheflush.c for those functions. Also clean up the list of included headers. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/7bf6f1600acad146e541a4e220940062f2e5b03d.1617895813.git.christophe.leroy@csgroup.eu
2021-04-14  powerpc/32s: Define a MODULE area below kernel text all the time  Christophe Leroy  1  -7/+0
On book3s/32, the segment below kernel text is used for module allocation when CONFIG_STRICT_KERNEL_RWX is defined. In order to benefit from the powerpc specific module_alloc() function, which allocates modules within 32 Mbytes of the end of kernel text, use that segment below PAGE_OFFSET at all times. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/a46dcdd39a9e80b012d86c294c4e5cd8d31665f3.1617283827.git.christophe.leroy@csgroup.eu
2021-04-14  powerpc/mm: Add cond_resched() while removing hpte mappings  Vaibhav Jain  1  -1/+12
While removing a large number of mappings from the hash page tables on large memory systems, a soft lockup is reported because of the time spent inside htab_remove_mapping(), like the one below: watchdog: BUG: soft lockup - CPU#8 stuck for 23s! <snip> NIP plpar_hcall+0x38/0x58 LR pSeries_lpar_hpte_invalidate+0x68/0xb0 Call Trace: 0x1fffffffffff000 (unreliable) pSeries_lpar_hpte_removebolted+0x9c/0x230 hash__remove_section_mapping+0xec/0x1c0 remove_section_mapping+0x28/0x3c arch_remove_memory+0xfc/0x150 devm_memremap_pages_release+0x180/0x2f0 devm_action_release+0x30/0x50 release_nodes+0x28c/0x300 device_release_driver_internal+0x16c/0x280 unbind_store+0x124/0x170 drv_attr_store+0x44/0x60 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 __vfs_write+0x3c/0x70 vfs_write+0xd4/0x270 ksys_write+0xdc/0x130 system_call+0x5c/0x70 Fix this by adding a cond_resched() to the loop in htab_remove_mapping() that issues the hcall to remove the hpte mapping. The call to cond_resched() is issued every HZ jiffies, which should prevent the soft lockup from being reported. Suggested-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Vaibhav Jain <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
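The fix is the standard pattern of letting the scheduler in from a long teardown loop; a hedged sketch of the loop shape (not the exact kernel code):

    for (vaddr = vstart; vaddr < vend; vaddr += step) {
            rc = mmu_hash_ops.hpte_removebolted(vaddr, psize, ssize);
            if (rc == -ENOENT)
                    continue;
            if (rc < 0)
                    return rc;
            /* cond_resched() only yields when the scheduler actually wants the
             * CPU, so calling it each iteration keeps the soft-lockup watchdog
             * quiet without slowing the common case. */
            cond_resched();
    }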
2021-04-08  powerpc/mm/64s/hash: Add real-mode change_memory_range() for hash LPAR  Michael Ellerman  1  -1/+104
When we enabled STRICT_KERNEL_RWX we received some reports of boot failures when using the Hash MMU and running under phyp. The crashes are intermittent, and often exhibit as a completely unresponsive system, or possibly an oops. One example, which was caught in xmon: [ 14.068327][ T1] devtmpfs: mounted [ 14.069302][ T1] Freeing unused kernel memory: 5568K [ 14.142060][ T347] BUG: Unable to handle kernel instruction fetch [ 14.142063][ T1] Run /sbin/init as init process [ 14.142074][ T347] Faulting instruction address: 0xc000000000004400 cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0] pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80 lr: c0000000001862d4: update_rq_clock+0x44/0x110 sp: c00000000c747880 msr: 8000000040001031 current = 0xc00000000c60d380 paca = 0xc00000001ec9de80 irqmask: 0x03 irq_happened: 0x01 pid = 347, comm = kworker/2:1 ... enter ? for help [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable) [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0 [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0 [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390 [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610 [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480 [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950 [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120 [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640 [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0 [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c This shows that CPU 2, which was idle, woke up and then appears to randomly take an instruction fault on a completely valid area of kernel text. The cause turns out to be the call to hash__mark_rodata_ro(), late in boot. Due to the way we layout text and rodata, that function actually changes the permissions for all of text and rodata to read-only plus execute. To do the permission change we use a hypervisor call, H_PROTECT. On phyp that appears to be implemented by briefly removing the mapping of the kernel text, before putting it back with the updated permissions. If any other CPU is executing during that window, it will see spurious faults on the kernel text and/or data, leading to crashes. To fix it we use stop machine to collect all other CPUs, and then have them drop into real mode (MMU off), while we change the mapping. That way they are unaffected by the mapping temporarily disappearing. We don't see this bug on KVM because KVM always use VPM=1, where faults are directed to the hypervisor, and the fault will be serialised vs the h_protect() by HPTE_V_HVLOCK. Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
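The mechanism used to corral the other CPUs is the generic stop_machine() facility; a very rough sketch of the idea (the struct and helper names are illustrative, and the real code also parks the secondary CPUs in real mode while the update runs):

    static int change_memory_range_fn(void *data)
    {
            struct change_memory_parms *parms = data;

            /* only one CPU applies the H_PROTECT updates */
            if (smp_processor_id() == parms->master_cpu)
                    do_hash_permission_change(parms);

            return 0;
    }

    /* every online CPU is held while the mapping briefly disappears */
    stop_machine(change_memory_range_fn, &parms, cpu_online_mask);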
2021-04-08  powerpc/mm/64s/hash: Factor out change_memory_range()  Michael Ellerman  1  -8/+15
Pull the loop calling hpte_updateboltedpp() out of hash__change_memory_range() into a helper function. We need it to be a separate function for the next patch. Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-04-08  powerpc/64s: Use htab_convert_pte_flags() in hash__mark_rodata_ro()  Michael Ellerman  1  -2/+4
In hash__mark_rodata_ro() we pass the raw PP_RXXX value to hash__change_memory_range(). That has the effect of setting the key to zero, because PP_RXXX contains no key value. Fix it by using htab_convert_pte_flags(), which knows how to convert a pgprot into a pp value, including the key. Fixes: d94b827e89dc ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation") Signed-off-by: Michael Ellerman <[email protected]> Reviewed-by: Daniel Axtens <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-04-08  powerpc/64s: Fix pte update for kernel memory on radix  Jordan Niethe  1  -2/+2
When adding a PTE a ptesync is needed to order the update of the PTE with subsequent accesses otherwise a spurious fault may be raised. radix__set_pte_at() does not do this for performance gains. For non-kernel memory this is not an issue as any faults of this kind are corrected by the page fault handler. For kernel memory these faults are not handled. The current solution is that there is a ptesync in flush_cache_vmap() which should be called when mapping from the vmalloc region. However, map_kernel_page() does not call flush_cache_vmap(). This is troublesome in particular for code patching with Strict RWX on radix. In do_patch_instruction() the page frame that contains the instruction to be patched is mapped and then immediately patched. With no ordering or synchronization between setting up the PTE and writing to the page it is possible for faults. As the code patching is done using __put_user_asm_goto() the resulting fault is obscured - but using a normal store instead it can be seen: BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c Faulting instruction address: 0xc00000000008bd74 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: nop_module(PO+) [last unloaded: nop_module] CPU: 4 PID: 757 Comm: sh Tainted: P O 5.10.0-rc5-01361-ge3c1b78c8440-dirty #43 NIP: c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810 REGS: c000000016f634a0 TRAP: 0300 Tainted: P O (5.10.0-rc5-01361-ge3c1b78c8440-dirty) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44002884 XER: 00000000 CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1 This results in the kind of issue reported here: https://lore.kernel.org/linuxppc-dev/[email protected]/ Chris Riedl suggested a reliable way to reproduce the issue: $ mount -t debugfs none /sys/kernel/debug $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) & Turning ftrace on and off does a large amount of code patching which in usually less then 5min will crash giving a trace like: ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000) ------------[ ftrace bug ]------------ ftrace failed to modify [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390 actual: 11:3b:47:4b Setting ftrace call site to call ftrace function ftrace record flags: 80000001 (1) expected tramp: c00000000006c96c ------------[ cut here ]------------ WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8 Modules linked in: nop_module(PO-) [last unloaded: nop_module] CPU: 4 PID: 809 Comm: sh Tainted: P O 5.10.0-rc5-01360-gf878ccaf250a #1 NIP: c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0 REGS: c000000004c8b760 TRAP: 0700 Tainted: P O (5.10.0-rc5-01360-gf878ccaf250a) MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28008848 XER: 20040000 CFAR: c0000000001a9c98 IRQMASK: 0 GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022 GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8 GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118 GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000 GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008 GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8 GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020 GPR28: c000000000ff1b68 c000000000bf8e5c 
c00000000312f700 c000000000fbb9b0 NIP ftrace_bug+0x28c/0x2e8 LR ftrace_bug+0x288/0x2e8 Call Trace: ftrace_bug+0x288/0x2e8 (unreliable) ftrace_modify_all_code+0x168/0x210 arch_ftrace_update_code+0x18/0x30 ftrace_run_update_code+0x44/0xc0 ftrace_startup+0xf8/0x1c0 register_ftrace_function+0x4c/0xc0 function_trace_init+0x80/0xb0 tracing_set_tracer+0x2a4/0x4f0 tracing_set_trace_write+0xd4/0x130 vfs_write+0xf0/0x330 ksys_write+0x84/0x140 system_call_exception+0x14c/0x230 system_call_common+0xf0/0x27c To fix this, use a ptesync when updating kernel memory PTEs. Fixes: f1cb8f9beba8 ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags") Signed-off-by: Jordan Niethe <[email protected]> Reviewed-by: Nicholas Piggin <[email protected]> [mpe: Tidy up change log slightly] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
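The missing barrier is a single ptesync between the PTE store and the first access through the new mapping; in C it is just the following (the helper name is illustrative, not the kernel's API):

    static inline void kernel_pte_sync(void)
    {
            /* order the PTE update before any subsequent access through it,
             * so the MMU cannot raise a spurious fault on the new mapping */
            asm volatile("ptesync" : : : "memory");
    }

    /* e.g. when mapping a page that is written immediately afterwards: */
    set_pte_at(&init_mm, va, ptep, pte);
    kernel_pte_sync();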
2021-04-08  powerpc: Spelling/typo fixes  Bhaskar Chowdhury  1  -1/+1
Various spelling/typo fixes. Signed-off-by: Bhaskar Chowdhury <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2021-03-29  powerpc/64s: Fix hash fault to use TRAP accessor  Nicholas Piggin  1  -2/+2
Hash faults use the trap vector to decide whether this is an instruction or data fault. This should use the TRAP accessor rather than open-coded access to regs->trap. This won't cause a problem at the moment because 64s only uses trap flags for system call interrupts (the norestart flag), but that could change if any other trap flags get used in future. Fixes: a4922f5442e7e ("powerpc/64s: move the hash fault handling logic to C") Suggested-by: Christophe Leroy <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-03-29  powerpc/mm: Remove unneeded #ifdef CONFIG_PPC_MEM_KEYS  Christophe Leroy  1  -6/+1
In fault.c, #ifdef CONFIG_PPC_MEM_KEYS is not needed because all functions are always defined, and arch_vma_access_permitted() always returns true when CONFIG_PPC_MEM_KEYS is not defined, so access_pkey_error() returns false and bad_access_pkey() is never called. Include linux/pkeys.h to get a definition of vma_pkeys() for bad_access_pkey(). Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/8038392f38d81f2ad169347efac29146f553b238.1615819955.git.christophe.leroy@csgroup.eu
2021-03-29  powerpc/64s: Fold update_current_thread_[i]amr() into their only callers  Michael Ellerman  1  -15/+5
lkp reported warnings in some configurations due to update_current_thread_amr() being unused: arch/powerpc/mm/book3s64/pkeys.c:284:20: error: unused function 'update_current_thread_amr' static inline void update_current_thread_amr(u64 value) This is because its only use is inside an ifdef. We could move it inside the ifdef, but it's a single-line function and only has one caller, so just fold it in. Similarly update_current_thread_iamr() is small and only called once, so fold it in also. Fixes: 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.") Reported-by: kernel test robot <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-03-29  powerpc/mm/book3s64: Fix a typo in mmu_context.c  Bhaskar Chowdhury  1  -1/+1
s/detalis/details/ Signed-off-by: Bhaskar Chowdhury <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-03-29  powerpc/32s: Move KUEP locking/unlocking in C  Christophe Leroy  2  -0/+41
This can be done in C, do it. Unrolling the loop gains approx. 15% performance. From now on, prepare_transfer_to_handler() is only for interrupts from kernel. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/4eadd873927e9a73c3d1dfe2f9497353465514cf.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29  powerpc/32: Call bad_page_fault() from do_page_fault()  Christophe Leroy  1  -1/+1
Now that non-volatile registers are saved at all times, there is no need to split bad_page_fault() out of do_page_fault(). Remove handle_page_fault() and use do_page_fault() directly. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/cfb95be8863204cc2bf45a22ea44dd1d0dc16b7f.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29  powerpc/32: Always enable data translation in exception prolog  Christophe Leroy  1  -14/+0
If the code can use a stack in vm area, it can also use a stack in linear space. Simplify code by removing old non VMAP stack code on PPC32. That means the data translation is now re-enabled early in exception prolog in all cases, not only when using VMAP stacks. While we are touching EXCEPTION_PROLOG macros, remove the unused for_rtas parameter in EXCEPTION_PROLOG_1. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/7cd6440c60a7e8f4f035b245c57720f51e225aae.1615552866.git.christophe.leroy@csgroup.eu