Age | Commit message | Author | Files | Lines
2021-10-29 | powerpc/32e: Ignore ESR in instruction storage interrupt handler | Nicholas Piggin | 1 file | -3/+12
A e5500 machine running a 32-bit kernel sometimes hangs at boot, seemingly going into an infinite loop of instruction storage interrupts. The ESR (Exception Syndrome Register) has a value of 0x800000 (store) when this happens, which is likely set by a previous store. An instruction TLB miss interrupt would then leave ESR unchanged, and if no PTE exists it calls directly to the instruction storage interrupt handler without changing ESR. access_error() does not cause a segfault due to a store to a read-only vma because is_exec is true. Most subsequent fault handling does not check for a write fault on a read-only vma, and might do strange things like create a writeable PTE or call page_mkwrite on a read only vma or file. It's not clear what happens here to cause the infinite faulting in this case, a fault handler failure or low level PTE or TLB handling. In any case this can be fixed by having the instruction storage interrupt zero regs->dsisr rather than storing the ESR value to it. Fixes: a01a3f2ddbcd ("powerpc: remove arguments from fault handler functions") Cc: [email protected] # v5.12+ Reported-by: Jacques de Laval <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Tested-by: Jacques de Laval <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
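The fix is conceptually tiny; here is a hedged sketch of the idea (illustrative kernel-style C, not the actual arch/powerpc entry code, and the handler/helper names are assumptions):

    /*
     * Instruction storage interrupt: ESR may still hold a stale value
     * (e.g. the "store" bit) from a previous data-side fault, so never
     * copy it into regs->dsisr for an instruction fault.
     */
    static void instruction_storage_interrupt(struct pt_regs *regs)
    {
            regs->dsisr = 0;        /* was: regs->dsisr = mfspr(SPRN_ESR); */
            do_page_fault(regs);    /* fault handling no longer sees a bogus write flag */
    }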
2021-10-29 | powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload | Vasant Hegde | 1 file | -1/+11
Commit 587164cd, introduced new opal message type (OPAL_MSG_PRD2) and added opal notifier. But I missed to unregister the notifier during module unload path. This results in below call trace if you try to unload and load opal_prd module. Also add new notifier_block for OPAL_MSG_PRD2 message. Sample calltrace (modprobe -r opal_prd; modprobe opal_prd) BUG: Unable to handle kernel data access on read at 0xc0080000192200e0 Faulting instruction address: 0xc00000000018d1cc Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV CPU: 66 PID: 7446 Comm: modprobe Kdump: loaded Tainted: G E 5.14.0prd #759 NIP: c00000000018d1cc LR: c00000000018d2a8 CTR: c0000000000cde10 REGS: c0000003c4c0f0a0 TRAP: 0300 Tainted: G E (5.14.0prd) MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 24224824 XER: 20040000 CFAR: c00000000018d2a4 DAR: c0080000192200e0 DSISR: 40000000 IRQMASK: 1 ... NIP notifier_chain_register+0x2c/0xc0 LR atomic_notifier_chain_register+0x48/0x80 Call Trace: 0xc000000002090610 (unreliable) atomic_notifier_chain_register+0x58/0x80 opal_message_notifier_register+0x7c/0x1e0 opal_prd_probe+0x84/0x150 [opal_prd] platform_probe+0x78/0x130 really_probe+0x110/0x5d0 __driver_probe_device+0x17c/0x230 driver_probe_device+0x60/0x130 __driver_attach+0xfc/0x220 bus_for_each_dev+0xa8/0x130 driver_attach+0x34/0x50 bus_add_driver+0x1b0/0x300 driver_register+0x98/0x1a0 __platform_driver_register+0x38/0x50 opal_prd_driver_init+0x34/0x50 [opal_prd] do_one_initcall+0x60/0x2d0 do_init_module+0x7c/0x320 load_module+0x3394/0x3650 __do_sys_finit_module+0xd4/0x160 system_call_exception+0x140/0x290 system_call_common+0xf4/0x258 Fixes: 587164cd593c ("powerpc/powernv: Add new opal message type") Cc: [email protected] # v5.4+ Signed-off-by: Vasant Hegde <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
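The pairing the fix restores looks roughly like this (sketch; the callback and notifier_block names are assumed, only the opal_message_notifier_register()/unregister() calls are the real API):

    static struct notifier_block opal_prd_msg2_nb = {
            .notifier_call = opal_prd_msg_notifier,         /* assumed handler name */
    };

    static int opal_prd_probe(struct platform_device *pdev)
    {
            return opal_message_notifier_register(OPAL_MSG_PRD2, &opal_prd_msg2_nb);
    }

    static int opal_prd_remove(struct platform_device *pdev)
    {
            /* The missing piece: drop the PRD2 notifier on unload, so a later
             * modprobe does not walk a notifier_block left in freed module memory. */
            opal_message_notifier_unregister(OPAL_MSG_PRD2, &opal_prd_msg2_nb);
            return 0;
    }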
2021-10-29 | powerpc: Don't provide __kernel_map_pages() without ARCH_SUPPORTS_DEBUG_PAGEALLOC | Christophe Leroy | 1 file | -1/+1
When ARCH_SUPPORTS_DEBUG_PAGEALLOC is not selected, the user can still select CONFIG_DEBUG_PAGEALLOC, in which case __kernel_map_pages() is provided by mm/page_poison.c. So only define __kernel_map_pages() when both CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC are defined. Fixes: 68b44f94d637 ("powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE") Reported-by: kernel test robot <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/971b69739ff4746252e711a9845210465c023a9e.1635425947.git.christophe.leroy@csgroup.eu
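In other words the prototype ends up guarded by both symbols, roughly (sketch):

    #if defined(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && defined(CONFIG_DEBUG_PAGEALLOC)
    void __kernel_map_pages(struct page *page, int numpages, int enable);
    #endif  /* otherwise mm/page_poison.c supplies the implementation */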
2021-10-29 | Merge branch 'topic/ppc-kvm' into next | Michael Ellerman | 4 files | -3/+56
Merge a couple of KVM ppc patches we are keeping in a topic branch.
2021-10-28 | MAINTAINERS: Update powerpc KVM entry | Michael Ellerman | 1 file | -5/+2
Paul is no longer handling patches for kvmppc. Instead we'll treat them as regular powerpc patches, taking them via the powerpc tree, using the topic/ppc-kvm branch when necessary. Also drop the web reference, it doesn't have any information specifically relevant to powerpc KVM. Signed-off-by: Michael Ellerman <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-28 | powerpc/xmon: fix task state output | Denis Kirjanov | 1 file | -2/+1
p_state is unsigned since commit 2f064a59a11f ("sched: Change task_struct::state"). The patch also uses TASK_RUNNING instead of a bare 0. Fixes: 2f064a59a11f ("sched: Change task_struct::state") Signed-off-by: Denis Kirjanov <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-28 | powerpc/44x/fsp2: add missing of_node_put | Bixuan Cui | 1 file | -0/+2
Early exits from for_each_compatible_node() should decrement the node reference counter. Reported by Coccinelle: ./arch/powerpc/platforms/44x/fsp2.c:206:1-25: WARNING: Function "for_each_compatible_node" should have of_node_put() before return around line 218. Fixes: 7813043e1bbc ("powerpc/44x/fsp2: Add irq error handlers") Signed-off-by: Bixuan Cui <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
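The pattern being fixed, sketched with placeholder names (the compatible string and helper below are not the real fsp2.c contents):

    struct device_node *np;

    for_each_compatible_node(np, NULL, "ibm,example-device") {     /* placeholder */
            if (setup_node_irq(np)) {                               /* placeholder helper */
                    pr_err("fsp2: irq setup failed\n");
                    /* the fix: the loop drops the reference on the next iteration,
                     * but an early return must drop it explicitly */
                    of_node_put(np);
                    return;
            }
    }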
2021-10-28 | powerpc/dcr: Use cmplwi instead of 3-argument cmpli | Michael Ellerman | 1 file | -1/+1
In dcr-low.S we use cmpli with three arguments, instead of four arguments as defined in the ISA: cmpli cr0,r3,1024 This appears to be a PPC440-ism, looking at the "PPC440x5 CPU Core User’s Manual" it shows cmpli having no L field, but implied to be 0 due to the core being 32-bit. It mentions that the ISA defines four arguments and recommends using cmplwi. It also corresponds to the old POWER instruction set, which had no L field there, a reserved bit instead. dcr-low.S is only built 32-bit, because it is only built when DCR_NATIVE=y, which is only selected by 40x and 44x. Looking at the generated code (with gcc/gas) we see cmplwi as expected. Although gas is happy with the 3-argument version when building for 32-bit, the LLVM assembler is not and errors out with: arch/powerpc/sysdev/dcr-low.S:27:10: error: invalid operand for instruction cmpli 0,%r3,1024; ... ^ Switch to the cmplwi extended opcode, which avoids any confusion when reading the ISA, fixes the issue with the LLVM assembler, and also means the code could be built 64-bit in future (though that's very unlikely). Reported-by: Nick Desaulniers <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> BugLink: https://github.com/ClangBuiltLinux/linux/issues/1419 Link: https://lore.kernel.org/r/[email protected]
2021-10-28 | KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling | Laurent Vivier | 2 files | -3/+43
Commit 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs") moved guest_exit() into the interrupt protected area to avoid wrong context warning (or worse). The problem is that tick-based time accounting has not yet been updated at this point (because it depends on the timer interrupt firing), so the guest time gets incorrectly accounted to system time. To fix the problem, follow the x86 fix in commit 160457140187 ("Defer vtime accounting 'til after IRQ handling"), and allow host IRQs to run before accounting the guest exit time. In the case vtime accounting is enabled, this is not required because TB is used directly for accounting. Before this patch, with CONFIG_TICK_CPU_ACCOUNTING=y in the host and a guest running a kernel compile, the 'guest' fields of /proc/stat are stuck at zero. With the patch they can be observed increasing roughly as expected. Fixes: e233d54d4d97 ("KVM: booke: use __kvm_guest_exit") Fixes: 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs") Cc: [email protected] # 5.12+ Signed-off-by: Laurent Vivier <[email protected]> [np: only required for tick accounting, add Book3E fix, tweak changelog] Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-28 | powerpc/security: Use a mutex for interrupt exit code patching | Russell Currey | 1 file | -0/+11
The mitigation-patching.sh script in the powerpc selftests toggles all mitigations on and off simultaneously, revealing that rfi_flush and stf_barrier cannot safely operate at the same time due to races in updating the static key. On some systems, the static key code throws a warning and the kernel remains functional. On others, the kernel will hang or crash. Fix this by slapping on a mutex. Fixes: 13799748b957 ("powerpc/64: use interrupt restart table to speed up return from interrupt") Cc: [email protected] # v5.14+ Signed-off-by: Russell Currey <[email protected]> Acked-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
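Conceptually the fix serializes the two patchers with a single lock, along these lines (sketch; the lock and function names are illustrative, not the exact ones in the patch):

    static DEFINE_MUTEX(exit_flush_lock);           /* illustrative name */

    static void toggle_rfi_flush(bool enable)
    {
            mutex_lock(&exit_flush_lock);
            /* ...patch the interrupt exit code / flip the static key... */
            mutex_unlock(&exit_flush_lock);
    }

    static void toggle_stf_barrier(bool enable)
    {
            mutex_lock(&exit_flush_lock);           /* same lock, so the two cannot race */
            /* ...patch the interrupt exit code / flip the static key... */
            mutex_unlock(&exit_flush_lock);
    }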
2021-10-28 | powerpc/83xx/mpc8349emitx: Make mcu_gpiochip_remove() return void | Uwe Kleine-König | 1 file | -5/+2
Up to now mcu_gpiochip_remove() returns zero unconditionally. Make it return void instead which makes it easier to see in the callers that there is no error to handle. Also the return value of i2c remove callbacks is ignored anyway. Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-28 | powerpc/fsl_booke: Fix setting of exec flag when setting TLBCAMs | Christophe Leroy | 1 file | -1/+2
Building tqm8541_defconfig results in: arch/powerpc/mm/nohash/fsl_book3e.c: In function 'settlbcam': arch/powerpc/mm/nohash/fsl_book3e.c:126:40: error: '_PAGE_BAP_SX' undeclared (first use in this function) 126 | TLBCAM[index].MAS3 |= (flags & _PAGE_BAP_SX) ? MAS3_SX : 0; | ^~~~~~~~~~~~ arch/powerpc/mm/nohash/fsl_book3e.c:126:40: note: each undeclared identifier is reported only once for each function it appears in make[3]: *** [scripts/Makefile.build:277: arch/powerpc/mm/nohash/fsl_book3e.o] Error 1 make[2]: *** [scripts/Makefile.build:540: arch/powerpc/mm/nohash] Error 2 make[1]: *** [scripts/Makefile.build:540: arch/powerpc/mm] Error 2 make: *** [Makefile:1868: arch/powerpc] Error 2 This is because _PAGE_BAP_SX is not defined when using 32 bits PTE. Now that _PAGE_EXEC contains both _PAGE_BAP_SX and _PAGE_BAP_UX, it can be used instead. Fixes: 01116e6e98b0 ("powerpc/fsl_booke: Take exec flag into account when setting TLBCAMs") Reported-by: kernel test robot <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/91a0235e7f2a85308b84aa5b9efd8d022e2b899a.1635226743.git.christophe.leroy@csgroup.eu
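The change amounts to testing a different bit; roughly (the "after" line is inferred from the description, not quoted from the patch):

    /* before: _PAGE_BAP_SX is not defined with 32-bit PTEs */
    TLBCAM[index].MAS3 |= (flags & _PAGE_BAP_SX) ? MAS3_SX : 0;

    /* after: _PAGE_EXEC now carries both _PAGE_BAP_SX and _PAGE_BAP_UX */
    TLBCAM[index].MAS3 |= (flags & _PAGE_EXEC) ? MAS3_SX : 0;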
2021-10-28 | powerpc/book3e: Fix set_memory_x() and set_memory_nx() | Christophe Leroy | 4 files | -13/+20
set_memory_x() calls pte_mkexec() which sets _PAGE_EXEC. set_memory_nx() calls pte_exprotect() which clears _PAGE_EXEC. Book3e has 2 bits, UX and SX, which define the exec rights resp. for user (PR=1) and for kernel (PR=0). _PAGE_EXEC is defined as UX only. An executable kernel page is set with either _PAGE_KERNEL_RWX or _PAGE_KERNEL_ROX, which both have SX set and UX cleared. So a set_memory_nx() call for an executable kernel page does nothing because UX is already cleared. And set_memory_x() on a non-executable kernel page makes it executable for the user and keeps it non-executable for the kernel. Also, pte_exec() always returns 'false' on kernel pages, because it checks _PAGE_EXEC which doesn't include SX, so for instance the W+X check doesn't work. To fix this: - Change tlb_low_64e.S to use _PAGE_BAP_UX instead of _PAGE_USER. - Set both UX and SX in _PAGE_EXEC so that pte_exec() returns true whenever one of the two bits is set and pte_exprotect() clears both bits. - Define a book3e specific version of pte_mkexec() which sets either SX or UX based on UR. Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/c41100f9c144dc5b62e5a751b810190c6b5d42fd.1635226743.git.christophe.leroy@csgroup.eu
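A book3e-specific pte_mkexec() along the lines described would look roughly like this (sketch derived from the description above; the _PAGE_BAP_* names follow the existing convention):

    static inline pte_t pte_mkexec(pte_t pte)
    {
            /* user page (UR set): grant user exec, keep supervisor exec clear */
            if (pte_val(pte) & _PAGE_BAP_UR)
                    return __pte((pte_val(pte) & ~_PAGE_BAP_SX) | _PAGE_BAP_UX);
            /* kernel page: grant supervisor exec only */
            return __pte((pte_val(pte) & ~_PAGE_BAP_UX) | _PAGE_BAP_SX);
    }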
2021-10-28 | powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect() | Christophe Leroy | 2 files | -9/+30
Commit 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code") changed those two functions to use pte helpers to determine which bits to clear and which bits to set. This change was based on the assumption that bits to be set/cleared are always the same and can be determined by applying the pte manipulation helpers on __pte(0). But on platforms like book3e, the bits depend on whether the page is a user page or not. For the time being it more or less works because of _PAGE_EXEC being used for user pages only and exec right being set at all time on kernel page. But following patch will clean that and output of pte_mkexec() will depend on the page being a user or kernel page. Instead of trying to make an even more complicated helper where bits would become dependent on the final pte value, come back to a more static situation like before commit 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code"), by introducing an 8xx specific version of __ptep_set_access_flags() and ptep_set_wrprotect(). Fixes: 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/922bdab3a220781bae2360ff3dd5adb7fe4d34f1.1635226743.git.christophe.leroy@csgroup.eu
2021-10-28 | powerpc/bpf: Fix write protecting JIT code | Hari Bathini | 1 file | -1/+1
Running program with bpf-to-bpf function calls results in data access exception (0x300) with the below call trace: bpf_int_jit_compile+0x238/0x750 (unreliable) bpf_check+0x2008/0x2710 bpf_prog_load+0xb00/0x13a0 __sys_bpf+0x6f4/0x27c0 sys_bpf+0x2c/0x40 system_call_exception+0x164/0x330 system_call_vectored_common+0xe8/0x278 as bpf_int_jit_compile() tries writing to write protected JIT code location during the extra pass. Fix it by holding off write protection of JIT code until the extra pass, where branch target addresses fixup happens. Fixes: 62e3d4210ac9 ("powerpc/bpf: Write protect JIT code") Cc: [email protected] # v5.14+ Signed-off-by: Hari Bathini <[email protected]> Reviewed-by: Naveen N. Rao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
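The gist of the fix, sketched (variable names as in a typical BPF JIT; exact placement within bpf_int_jit_compile() may differ):

    /*
     * Branch targets of bpf-to-bpf calls are only known in the extra pass,
     * so keep the image writable until then and only seal it afterwards.
     */
    if (extra_pass) {
            /* ...fix up call addresses in the already-emitted image... */
            bpf_jit_binary_lock_ro(bpf_hdr);        /* write-protect only after fixups */
    }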
2021-10-27 | selftests/powerpc: Use date instead of EPOCHSECONDS in mitigation-patching.sh | Russell Currey | 1 file | -2/+2
The EPOCHSECONDS environment variable was added in bash 5.0 (released 2019). Some distributions of the "stable" and "long-term" variety ship older versions of bash than this, so swap to using the date command instead. "%s" was added to coreutils `date` in 1993 so we should be good, but who knows, it is a GNU extension and not part of the POSIX spec for `date`. Signed-off-by: Russell Currey <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-27 | powerpc/64s/interrupt: Fix check_return_regs_valid() false positive | Nicholas Piggin | 1 file | -1/+1
check_return_regs_valid() can cause a false positive if the return regs are marked as norestart and they are an HSRR type interrupt, because the low bit in the bottom of regs->trap causes interrupt type matching to fail. This can occur for example on bare metal with an HV privileged doorbell interrupt that causes a signal, but do_signal returns early because get_signal() fails, and takes the "No signal to deliver" path. In this case no signal was delivered, so the return location is not changed and the return SRRs are not invalidated, yet set_trap_norestart is called, which messes up the match. Building go-1.16.6 is known to reproduce this. Fix it by using the TRAP() accessor which masks out the low bit. Fixes: 6eaaf9de3599 ("powerpc/64s/interrupt: Check and fix srr_valid without crashing") Cc: [email protected] # v5.14+ Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
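Reduced to its essence (sketch; 0xe80 is used here only as an example of an HSRR-type vector):

    /* before: the low "norestart" flag bit left in regs->trap defeats the match */
    hsrr = (regs->trap == 0xe80);

    /* after: TRAP() masks the flag bit out before comparing */
    hsrr = (TRAP(regs) == 0xe80);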
2021-10-27 | powerpc/boot: Set LC_ALL=C in wrapper script | Christophe Leroy | 1 file | -0/+2
While trying to build a simple Image for ACADIA platform, I got the following error: WRAP arch/powerpc/boot/simpleImage.acadia INFO: Uncompressed kernel (size 0x6ae7d0) overlaps the address of the wrapper(0x400000) INFO: Fixing the link_address of wrapper to (0x700000) powerpc64-linux-gnu-ld : mode d'émulation non reconnu : -T Émulations prises en charge : elf64ppc elf32ppc elf32ppclinux elf32ppcsim elf64lppc elf32lppc elf32lppclinux elf32lppcsim make[1]: *** [arch/powerpc/boot/Makefile:424 : arch/powerpc/boot/simpleImage.acadia] Erreur 1 make: *** [arch/powerpc/Makefile:285 : simpleImage.acadia] Erreur 2 Trying again with V=1 shows the following command powerpc64-linux-gnu-ld -m -T arch/powerpc/boot/zImage.lds -Ttext 0x700000 --no-dynamic-linker -o arch/powerpc/boot/simpleImage.acadia -Map wrapper.map arch/powerpc/boot/fixed-head.o arch/powerpc/boot/simpleboot.o ./zImage.3278022.o arch/powerpc/boot/wrapper.a The argument of '-m' is missing. This is due to the wrapper script calling 'objdump -p vmlinux' and looking for 'file format', whereas the output of objdump is: vmlinux: format de fichier elf32-powerpc En-tête de programme: LOAD off 0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16 filesz 0x0069e1d4 memsz 0x006c128c flags rwx NOTE off 0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2 filesz 0x00000054 memsz 0x00000054 flags --- Add LC_ALL=C at the beginning of the wrapper script in order to get the output expected by the script: vmlinux: file format elf32-powerpc Program Header: LOAD off 0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16 filesz 0x0069e1d4 memsz 0x006c128c flags rwx NOTE off 0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2 filesz 0x00000054 memsz 0x00000054 flags --- Signed-off-by: Christophe Leroy <[email protected]> Acked-by: Segher Boessenkool <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/a9ff3bc98035f63b122c051f02dc47c7aed10430.1635256089.git.christophe.leroy@csgroup.eu
2021-10-27 | powerpc/64s: Default to 64K pages for 64 bit book3s | Joel Stanley | 11 files | -6/+5
For 64-bit book3s the default should be 64K as that's what modern CPUs are designed for. The following defconfigs already set CONFIG_PPC_64K_PAGES: cell_defconfig pasemi_defconfig powernv_defconfig ppc64_defconfig pseries_defconfig skiroot_defconfig They have the option removed from the defconfig, as it is now the default. The defconfigs that now need to set CONFIG_PPC_4K_PAGES to maintain their existing behaviour are: g5_defconfig maple_defconfig microwatt_defconfig ps3_defconfig Signed-off-by: Joel Stanley <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> BugLink: https://github.com/linuxppc/issues/issues/109 Link: https://lore.kernel.org/r/[email protected]
2021-10-27 | Revert "powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC" | Michael Ellerman | 5 files | -8/+135
This reverts commit 566af8cda399c088763d07464463dc871c943b54. This caused some conflicts vs the audit tree, and the audit maintainers would prefer we postpone this to the next merge window so we have more time for testing. Signed-off-by: Michael Ellerman <[email protected]>
2021-10-22 | powerpc/pseries/mobility: ignore ibm,platform-facilities updates | Nathan Lynch | 1 file | -0/+34
On VMs with NX encryption, compression, and/or RNG offload, these capabilities are described by nodes in the ibm,platform-facilities device tree hierarchy: $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/ /sys/firmware/devicetree/base/ibm,platform-facilities/ ├── ibm,compression-v1 ├── ibm,random-v1 └── ibm,sym-encryption-v1 3 directories The acceleration functions that these nodes describe are not disrupted by live migration, not even temporarily. But the post-migration ibm,update-nodes sequence firmware always sends "delete" messages for this hierarchy, followed by an "add" directive to reconstruct it via ibm,configure-connector (log with debugging statements enabled in mobility.c): mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285 mobility: removing node /ibm,platform-facilities/ibm,compression-v1:4294967284 mobility: removing node /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283 mobility: removing node /ibm,platform-facilities:4294967286 ... mobility: added node /ibm,platform-facilities:4294967286 Note we receive a single "add" message for the entire hierarchy, and what we receive from the ibm,configure-connector sequence is the top-level platform-facilities node along with its three children. The debug message simply reports the parent node and not the whole subtree. Also, significantly, the nodes added are almost completely equivalent to the ones removed; even phandles are unchanged. ibm,shared-interrupt-pool in the leaf nodes is the only property I've observed to differ, and Linux does not use that. So in practice, the sum of update messages Linux receives for this hierarchy is equivalent to minor property updates. We succeed in removing the original hierarchy from the device tree. But the vio bus code is ignorant of this, and does not unbind or relinquish its references. The leaf nodes, still reachable through sysfs, of course still refer to the now-freed ibm,platform-facilities parent node, which makes use-after-free possible: refcount_t: addition on 0; use-after-free. WARNING: CPU: 3 PID: 1706 at lib/refcount.c:25 refcount_warn_saturate+0x164/0x1f0 refcount_warn_saturate+0x160/0x1f0 (unreliable) kobject_get+0xf0/0x100 of_node_get+0x30/0x50 of_get_parent+0x50/0xb0 of_fwnode_get_parent+0x54/0x90 fwnode_count_parents+0x50/0x150 fwnode_full_name_string+0x30/0x110 device_node_string+0x49c/0x790 vsnprintf+0x1c0/0x4c0 sprintf+0x44/0x60 devspec_show+0x34/0x50 dev_attr_show+0x40/0xa0 sysfs_kf_seq_show+0xbc/0x200 kernfs_seq_show+0x44/0x60 seq_read_iter+0x2a4/0x740 kernfs_fop_read_iter+0x254/0x2e0 new_sync_read+0x120/0x190 vfs_read+0x1d0/0x240 Moreover, the "new" replacement subtree is not correctly added to the device tree, resulting in ibm,platform-facilities parent node without the appropriate leaf nodes, and broken symlinks in the sysfs device hierarchy: $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/ /sys/firmware/devicetree/base/ibm,platform-facilities/ 0 directories $ cd /sys/devices/vio ; find . 
-xtype l -exec file {} + ./ibm,sym-encryption-v1/of_node: broken symbolic link to ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,sym-encryption-v1 ./ibm,random-v1/of_node: broken symbolic link to ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,random-v1 ./ibm,compression-v1/of_node: broken symbolic link to ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,compression-v1 This is because add_dt_node() -> dlpar_attach_node() attaches only the parent node returned from configure-connector, ignoring any children. This should be corrected for the general case, but fixing that won't help with the stale OF node references, which is the more urgent problem. One way to address that would be to make the drivers respond to node removal notifications, so that node references can be dropped appropriately. But this would likely force the drivers to disrupt active clients for no useful purpose: equivalent nodes are immediately re-added. And recall that the acceleration capabilities described by the nodes remain available throughout the whole process. The solution I believe to be robust for this situation is to convert remove+add of a node with an unchanged phandle to an update of the node's properties in the Linux device tree structure. That would involve changing and adding a fair amount of code, and may take several iterations to land. Until that can be realized we have a confirmed use-after-free and the possibility of memory corruption. So add a limited workaround that discriminates on the node type, ignoring adds and removes. This should be amenable to backporting in the meantime. Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel") Cc: [email protected] Signed-off-by: Nathan Lynch <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-22 | powerpc/32: Don't use a struct based type for pte_t | Christophe Leroy | 3 files | -3/+19
Long time ago we had a config item called STRICT_MM_TYPECHECKS to build the kernel with pte_t defined as a structure in order to perform additional build checks or build it with pte_t defined as a simple type in order to get simpler generated code. Commit 670eea924198 ("powerpc/mm: Always use STRICT_MM_TYPECHECKS") made the struct based definition the only one, considering that the generated code was similar in both cases. That's right on ppc64 because the ABI is such that the content of a struct having a single simple type element is passed as register, but on ppc32 such a structure is passed via the stack like any structure. Simple test function: pte_t test(pte_t pte) { return pte; } Before this patch we get c00108ec <test>: c00108ec: 81 24 00 00 lwz r9,0(r4) c00108f0: 91 23 00 00 stw r9,0(r3) c00108f4: 4e 80 00 20 blr So, for PPC32, restore the simple type behaviour we got before commit 670eea924198, but instead of adding a config option to activate type check, do it when __CHECKER__ is set so that type checking is performed by 'sparse' and provides feedback like: arch/powerpc/mm/pgtable.c:466:16: warning: incorrect type in return expression (different base types) arch/powerpc/mm/pgtable.c:466:16: expected unsigned long arch/powerpc/mm/pgtable.c:466:16: got struct pte_t [usertype] x With this patch we now get c0010890 <test>: c0010890: 4e 80 00 20 blr Signed-off-by: Christophe Leroy <[email protected]> [mpe: Define STRICT_MM_TYPECHECKS rather than repeating the condition] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/c904599f33aaf6bb7ee2836a9ff8368509e0d78d.1631887042.git.christophe.leroy@csgroup.eu
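The resulting arrangement is roughly the following (sketch of the idea; as noted above, the real header expresses the condition through a single STRICT_MM_TYPECHECKS define):

    #if defined(CONFIG_PPC64) || defined(__CHECKER__)
    #define STRICT_MM_TYPECHECKS
    #endif

    #ifdef STRICT_MM_TYPECHECKS
    typedef struct { pte_basic_t pte; } pte_t;      /* strong typing, checked by sparse */
    #define __pte(x)        ((pte_t) { (x) })
    #define pte_val(x)      ((x).pte)
    #else
    typedef pte_basic_t pte_t;                      /* plain scalar: passed in a register on ppc32 */
    #define __pte(x)        ((pte_t)(x))
    #define pte_val(x)      (x)
    #endif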
2021-10-22 | powerpc/breakpoint: Cleanup | Christophe Leroy | 1 file | -12/+3
cache_op_size() does exactly the same as l1_dcache_bytes(). Remove it. MSR_64BIT already exists, so there is no need to enclose the check in #ifdef __powerpc64__. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/6184b08088312a7d787d450eb902584e4ae77f7a.1632317816.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc: Activate CONFIG_STRICT_KERNEL_RWX by default | Christophe Leroy | 1 file | -0/+1
CONFIG_STRICT_KERNEL_RWX should be set by default on every architecture (See https://github.com/KSPP/linux/issues/4) On PPC32 we have to find a compromise between performance and/or memory waste and the selection of STRICT_KERNEL_RWX, because it implies either smaller memory chunks or larger alignment between RO memory and RW memory. For instance the 8xx maps memory with 8M pages. So either the limit between RO and RW must be 8M aligned or it falls back to 512k pages, which implies more pressure on the TLB. book3s/32 maps memory with BATs as much as possible. BATs can have any power-of-two size between 128k and 256M but we have only 4 to 8 BATs so the alignment must be good enough to allow efficient use of the BATs and avoid falling back on standard page mapping which would kill performance. So let's go one step forward and make it the default but still allow users to unset it when wanted. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/057c40164084bfc7d77c0b2ff78d95dbf6a2a21b.1632503622.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/8xx: Simplify TLB handling | Christophe Leroy | 2 files | -0/+17
In the old days, TLB handling for 8xx was using tlbie and tlbia instructions directly as much as possible. But commit f048aace29e0 ("powerpc/mm: Add SMP support to no-hash TLB handling") broke that by introducing out-of-line unnecessary complex functions for booke/smp which don't have tlbie/tlbia instructions and require more complex handling. Restore direct use of tlbie and tlbia for 8xx which is never SMP. With this patch we now get c00ecc68 <ptep_clear_flush>: c00ecc68: 39 00 00 00 li r8,0 c00ecc6c: 81 46 00 00 lwz r10,0(r6) c00ecc70: 91 06 00 00 stw r8,0(r6) c00ecc74: 7c 00 2a 64 tlbie r5,r0 c00ecc78: 7c 00 04 ac hwsync c00ecc7c: 91 43 00 00 stw r10,0(r3) c00ecc80: 4e 80 00 20 blr Before it was c0012880 <local_flush_tlb_page>: c0012880: 2c 03 00 00 cmpwi r3,0 c0012884: 41 82 00 54 beq c00128d8 <local_flush_tlb_page+0x58> c0012888: 81 22 00 00 lwz r9,0(r2) c001288c: 81 43 00 20 lwz r10,32(r3) c0012890: 39 29 00 01 addi r9,r9,1 c0012894: 91 22 00 00 stw r9,0(r2) c0012898: 2c 0a 00 00 cmpwi r10,0 c001289c: 41 82 00 10 beq c00128ac <local_flush_tlb_page+0x2c> c00128a0: 81 2a 01 dc lwz r9,476(r10) c00128a4: 2c 09 ff ff cmpwi r9,-1 c00128a8: 41 82 00 0c beq c00128b4 <local_flush_tlb_page+0x34> c00128ac: 7c 00 22 64 tlbie r4,r0 c00128b0: 7c 00 04 ac hwsync c00128b4: 81 22 00 00 lwz r9,0(r2) c00128b8: 39 29 ff ff addi r9,r9,-1 c00128bc: 2c 09 00 00 cmpwi r9,0 c00128c0: 91 22 00 00 stw r9,0(r2) c00128c4: 4c a2 00 20 bclr+ 4,eq c00128c8: 81 22 00 70 lwz r9,112(r2) c00128cc: 71 29 00 04 andi. r9,r9,4 c00128d0: 4d 82 00 20 beqlr c00128d4: 48 65 76 74 b c0669f48 <preempt_schedule> c00128d8: 81 22 00 00 lwz r9,0(r2) c00128dc: 39 29 00 01 addi r9,r9,1 c00128e0: 91 22 00 00 stw r9,0(r2) c00128e4: 4b ff ff c8 b c00128ac <local_flush_tlb_page+0x2c> ... c00ecdc8 <ptep_clear_flush>: c00ecdc8: 94 21 ff f0 stwu r1,-16(r1) c00ecdcc: 39 20 00 00 li r9,0 c00ecdd0: 93 c1 00 08 stw r30,8(r1) c00ecdd4: 83 c6 00 00 lwz r30,0(r6) c00ecdd8: 91 26 00 00 stw r9,0(r6) c00ecddc: 93 e1 00 0c stw r31,12(r1) c00ecde0: 7c 08 02 a6 mflr r0 c00ecde4: 7c 7f 1b 78 mr r31,r3 c00ecde8: 7c 83 23 78 mr r3,r4 c00ecdec: 7c a4 2b 78 mr r4,r5 c00ecdf0: 90 01 00 14 stw r0,20(r1) c00ecdf4: 4b f2 5a 8d bl c0012880 <local_flush_tlb_page> c00ecdf8: 93 df 00 00 stw r30,0(r31) c00ecdfc: 7f e3 fb 78 mr r3,r31 c00ece00: 80 01 00 14 lwz r0,20(r1) c00ece04: 83 c1 00 08 lwz r30,8(r1) c00ece08: 83 e1 00 0c lwz r31,12(r1) c00ece0c: 7c 08 03 a6 mtlr r0 c00ece10: 38 21 00 10 addi r1,r1,16 c00ece14: 4e 80 00 20 blr Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/fb324f1c8f2ddb57cf6aad1cea26329558f1c1c0.1631887021.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/lib/sstep: Don't use __{get/put}_user() on kernel addresses | Christophe Leroy | 1 file | -57/+140
In the old days, when we didn't have kernel userspace access protection and had set_fs(), it was wise to use __get_user() and friends to read kernel memory. Nowadays, get_user() and put_user() grant userspace access and are exclusively for userspace access. Convert the single step emulation functions to user_access_begin() and friends and use unsafe_get_user() and unsafe_put_user(). When addressing kernel addresses, there is no need to open userspace access. And for book3s/32 it is particularly important not to try to open userspace access on a kernel address, because that would break the content of the kernel space segment registers. No guard has been put against that risk in order to avoid degrading performance. copy_from_kernel_nofault() and copy_to_kernel_nofault() should be used but they are out-of-line functions which would degrade performance. Those two functions are making use of the __get_kernel_nofault() and __put_kernel_nofault() macros. Those two macros are just wrappers behind __get_user_size_goto() and __put_user_size_goto(). unsafe_get_user() and unsafe_put_user() are also wrappers of __get_user_size_goto() and __put_user_size_goto(). Use them to access kernel space. That allows refactoring userspace and kernelspace access. Reported-by: Stan Johnson <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Depends-on: 4fe5cda9f89d ("powerpc/uaccess: Implement user_read_access_begin and user_write_access_begin") Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/22831c9d17f948680a12c5292e7627288b15f713.1631817805.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc: warn on emulation of dcbz instruction in kernel mode | Christophe Leroy | 1 file | -0/+1
The dcbz instruction shouldn't be used on non-cached memory. Using it on non-cached memory can result in an alignment exception that requires heavy handling. Instead of silently emulating the instruction and causing high performance degradation, warn whenever an alignment exception is taken in kernel mode due to dcbz, so that the user is made aware that the dcbz instruction has been used unexpectedly by the kernel. Reported-by: Stan Johnson <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/2e3acfe63d289c6fba366e16973c9ab8369e8b75.1631803922.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/32: Add support for out-of-line static calls | Christophe Leroy | 4 files | -1/+67
Add support for out-of-line static calls on PPC32. This change improve performance of calls to global function pointers by using direct calls instead of indirect calls. The trampoline is initialy populated with a 'blr' or branch to target, followed by an unreachable long jump sequence. In order to cater with parallele execution, the trampoline needs to be updated in a way that ensures it remains consistent at all time. This means we can't use the traditional lis/addi to load r12 with the target address, otherwise there would be a window during which the first instruction contains the upper part of the new target address while the second instruction still contains the lower part of the old target address. To avoid that the target address is stored just after the 'bctr' and loaded from there with a single instruction. Then, depending on the target distance, arch_static_call_transform() will either replace the first instruction by a direct 'bl <target>' or 'nop' in order to have the trampoline fall through the long jump sequence. For the special case of __static_call_return0(), to avoid the risk of a far branch, a version of it is inlined at the end of the trampoline. Performancewise the long jump sequence is probably not better than the indirect calls set by GCC when we don't use static calls, but such calls are unlikely to be required on powerpc32: With most configurations the kernel size is far below 32 Mbytes so only modules may happen to be too far. And even modules are likely to be close enough as they are allocated below the kernel core and as close as possible of the kernel text. static_call selftest is running successfully with this change. With this patch, __do_irq() has the following sequence to trace irq entries: c0004a00 <__SCT__tp_func_irq_entry>: c0004a00: 48 00 00 e0 b c0004ae0 <__traceiter_irq_entry> c0004a04: 3d 80 c0 00 lis r12,-16384 c0004a08: 81 8c 4a 1c lwz r12,18972(r12) c0004a0c: 7d 89 03 a6 mtctr r12 c0004a10: 4e 80 04 20 bctr c0004a14: 38 60 00 00 li r3,0 c0004a18: 4e 80 00 20 blr c0004a1c: 00 00 00 00 .long 0x0 ... c0005654 <__do_irq>: ... c0005664: 7c 7f 1b 78 mr r31,r3 ... c00056a0: 81 22 00 00 lwz r9,0(r2) c00056a4: 39 29 00 01 addi r9,r9,1 c00056a8: 91 22 00 00 stw r9,0(r2) c00056ac: 3d 20 c0 af lis r9,-16209 c00056b0: 81 29 74 cc lwz r9,29900(r9) c00056b4: 2c 09 00 00 cmpwi r9,0 c00056b8: 41 82 00 10 beq c00056c8 <__do_irq+0x74> c00056bc: 80 69 00 04 lwz r3,4(r9) c00056c0: 7f e4 fb 78 mr r4,r31 c00056c4: 4b ff f3 3d bl c0004a00 <__SCT__tp_func_irq_entry> Before this patch, __do_irq() was doing the following to trace irq entries: c0005700 <__do_irq>: ... c0005710: 7c 7e 1b 78 mr r30,r3 ... c000574c: 93 e1 00 0c stw r31,12(r1) c0005750: 81 22 00 00 lwz r9,0(r2) c0005754: 39 29 00 01 addi r9,r9,1 c0005758: 91 22 00 00 stw r9,0(r2) c000575c: 3d 20 c0 af lis r9,-16209 c0005760: 83 e9 f4 cc lwz r31,-2868(r9) c0005764: 2c 1f 00 00 cmpwi r31,0 c0005768: 41 82 00 24 beq c000578c <__do_irq+0x8c> c000576c: 81 3f 00 00 lwz r9,0(r31) c0005770: 80 7f 00 04 lwz r3,4(r31) c0005774: 7d 29 03 a6 mtctr r9 c0005778: 7f c4 f3 78 mr r4,r30 c000577c: 4e 80 04 21 bctrl c0005780: 85 3f 00 0c lwzu r9,12(r31) c0005784: 2c 09 00 00 cmpwi r9,0 c0005788: 40 82 ff e4 bne c000576c <__do_irq+0x6c> Behind the fact of now using a direct 'bl' instead of a 'load/mtctr/bctr' sequence, we can also see that we get one less register on the stack. 
Signed-off-by: Christophe Leroy <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/machdep: Remove stale functions from ppc_md structure | Christophe Leroy | 9 files | -44/+2
ppc_md.iommu_save() is not set anymore by any platform after commit c40785ad305b ("powerpc/dart: Use a cachable DART"). So iommu_save() has become a nop and can be removed. ppc_md.show_percpuinfo() is not set anymore by any platform after commit 4350147a816b ("[PATCH] ppc64: SMU based macs cpufreq support"). Last users of ppc_md.rtc_read_val() and ppc_md.rtc_write_val() were removed by commit 0f03a43b8f0f ("[POWERPC] Remove todc code from ARCH=powerpc"). Last user of kgdb_map_scc() was removed by commit 17ce452f7ea3 ("kgdb, powerpc: arch specific powerpc kgdb support"). ppc_md.machine_kexec_prepare() has not been used since commit 8ee3e0d69623 ("powerpc: Remove the main legacy iSeries platform code"). This allows the removal of machine_kexec_prepare() and the rename of default_machine_kexec_prepare() into machine_kexec_prepare(). Signed-off-by: Christophe Leroy <[email protected]> Reviewed-by: Daniel Axtens <[email protected]> [mpe: Drop prototype for default_machine_kexec_prepare() as noted by dja] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/24d4ca0ada683c9436a5f812a7aeb0a1362afa2b.1630398606.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/time: Remove generic_suspend_{dis/en}able_irqs() | Christophe Leroy | 1 file | -15/+7
Commit d75d68cfef49 ("powerpc: Clean up obsolete code relating to decrementer and timebase") made generic_suspend_enable_irqs() and generic_suspend_disable_irqs() static. Fold them into their only caller. Signed-off-by: Christophe Leroy <[email protected]> Reviewed-by: Daniel Axtens <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/c3f9ec9950394ef939014f7934268e6ee30ca04f.1630398566.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC | Christophe Leroy | 5 files | -135/+8
Commit e65e1fc2d24b ("[PATCH] syscall class hookup for all normal targets") added generic support for AUDIT but that didn't include support for bi-arch like powerpc. Commit 4b58841149dc ("audit: Add generic compat syscall support") added generic support for bi-arch. Convert powerpc to that bi-arch generic audit support. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/a4b3951d1191d4183d92a07a6097566bde60d00a.1629812058.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/32: Don't use lmw/stmw for saving/restoring non volatile regs | Christophe Leroy | 1 file | -2/+2
Instructions lmw/stmw are interesting for functions that are rarely used and not in the cache, because only one instruction is to be copied into the instruction cache instead of 19. However those instructions are less performant than 19x raw lwz/stw as they require synchronisation plus one additional cycle. SAVE_NVGPRS / REST_NVGPRS are used in only a few places which are mostly in interrupt entries/exits and in the task switch, so they are likely already in the cache. Using standard lwz improves the null_syscall selftest by: - 10 cycles on mpc832x. - 2 cycles on mpc8xx. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/316c543b8906712c108985c8463eec09c8db577b.1629732542.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/5200: dts: fix memory node unit name | Anatolij Gustschin | 12 files | -12/+12
Fixes build warnings: Warning (unit_address_vs_reg): /memory: node has a reg or ranges property, but no unit name Signed-off-by: Anatolij Gustschin <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-22 | powerpc/5200: dts: fix pci ranges warnings | Anatolij Gustschin | 10 files | -30/+30
Fix ranges property warnings: pci@f0000d00:ranges: 'oneOf' conditional failed, one must be fixed: Signed-off-by: Anatolij Gustschin <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-22 | powerpc/5200: dts: add missing pci ranges | Anatolij Gustschin | 1 file | -1/+3
Add ranges property to fix build warnings: Warning (pci_bridge): /pci@f0000d00: missing ranges for PCI bridge (or not a bridge) Signed-off-by: Anatolij Gustschin <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-22 | powerpc/vas: Fix potential NULL pointer dereference | Gustavo A. R. Silva | 1 file | -2/+2
(!ptr && !ptr->foo) strikes again. :) The expression (!ptr && !ptr->foo) is bogus and in case ptr is NULL, it leads to a NULL pointer dereference: ptr->foo. Fix this by converting && to || This issue was detected with the help of Coccinelle, and audited and fixed manually. Fixes: 1a0d0d5ed5e3 ("powerpc/vas: Add platform specific user window operations") Cc: [email protected] Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Tyrel Datwyler <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/20211015050345.GA1161918@embeddedor
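The bug class, reduced to a self-contained fragment (the struct and field names here are made up for illustration):

    struct window { void *vma; };

    static int window_ok(struct window *w)
    {
            /*
             * Bogus form: "if (!w && !w->vma)". When w is NULL the first operand
             * is true, so the second operand is still evaluated and dereferences NULL.
             */
            if (!w || !w->vma)      /* fixed: reject a NULL window OR a missing vma */
                    return -EINVAL;
            return 0;
    }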
2021-10-22 | powerpc/fsl_booke: Enable STRICT_KERNEL_RWX | Christophe Leroy | 1 file | -2/+9
Enable STRICT_KERNEL_RWX on fsl_booke. For that, we need additional TLBCAMs dedicated to the linear mapping, based on the alignment of _sinittext. By default, up to 768 Mbytes of memory are mapped. It uses 3 TLBCAMs of size 256 Mbytes. With a data alignment of 16, we need up to 9 TLBCAMs: 16/16/16/16/64/64/64/256/256 With a data alignment of 4, we need up to 12 TLBCAMs: 4/4/4/4/16/16/16/64/64/64/256/256 With a data alignment of 1, we need up to 15 TLBCAMs: 1/1/1/1/4/4/4/16/16/16/64/64/64/256/256 By default, set a 16 Mbytes alignment as a compromise between memory usage and the number of TLBCAMs. This can be adjusted manually when needed. For the time being, it doesn't work when the base is randomised. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/29f9e5d2bbbc83ae9ca879265426a6278bf4d5bb.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/fsl_booke: Update of TLBCAMs after init | Christophe Leroy | 2 files | -5/+29
After init, set readonly memory as ROX and set readwrite memory as RWX, if STRICT_KERNEL_RWX is enabled. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/66bef0b9c273e1121706883f3cf5ad0a053d863f.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/fsl_booke: Allocate separate TLBCAMs for readonly memory | Christophe Leroy | 1 file | -3/+22
Reorganise TLBCAM allocation so that when STRICT_KERNEL_RWX is enabled, TLBCAMs are allocated such that readonly memory uses different TLBCAMs. This results in an allocation looking like: Memory CAM mapping: 4/4/4/1/1/1/1/16/16/16/64/64/64/256/256 Mb, residual: 256Mb Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/8ca169bc288261a0e0558712f979023c3a960ebb.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/fsl_booke: Tell map_mem_in_cams() if init is done | Christophe Leroy | 4 files | -10/+10
In order to be able to call map_mem_in_cams() once more after init for STRICT_KERNEL_RWX, add an argument. For now, map_mem_in_cams() is always called only during init. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/3b69a7e0b393b16984ade882a5eae5d727717459.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/fsl_booke: Enable reloading of TLBCAM without switching to AS1 | Christophe Leroy | 1 file | -2/+6
Avoid switching to AS1 when reloading TLBCAM after init for STRICT_KERNEL_RWX. When we set up AS1 we expect the entire accessible memory to be mapped through one entry, but this is not the case anymore at the end of init. We are not changing the size of TLBCAMs, only flags, so there is no need to switch to AS1. So change loadcam_multi() to not switch to AS1 when the given temporary tlb entry is 0. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/a9d517fbfbc940f56103c46b323f6eb8f4485571.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/fsl_booke: Take exec flag into account when setting TLBCAMs | Christophe Leroy | 1 file | -4/+6
Don't force MAS3_SX and MAS3_UX at all times. Take the exec flag into account. While at it, fix a couple of nearby style problems (indentation with spaces and unnecessary parentheses) to keep things readable. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/5467044e59f27f9fcf709b9661779e3ce5f784f6.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/fsl_booke: Rename fsl_booke.c to fsl_book3e.c | Christophe Leroy | 2 files | -2/+2
We have a myriad of CONFIG symbols around different variants of BOOKEs, which would be worth tidying up one day. But at least, make file names and CONFIG option match: We have CONFIG_FSL_BOOKE and CONFIG_PPC_FSL_BOOK3E. fsl_booke.c is selected by and only by CONFIG_PPC_FSL_BOOK3E. So rename it fsl_book3e to reduce confusion. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/5dc871db1f67739319bec11f049ca450da1c13a2.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE | Christophe Leroy | 1 file | -3/+3
fsl_booke and 44x are not able to map kernel linear memory with pages, so they can't support DEBUG_PAGEALLOC and KFENCE, and STRICT_KERNEL_RWX is also a problem for now. Enable those only on book3s (both 32 and 64 except KFENCE), 8xx and 40x. Fixes: 88df6e90fa97 ("[POWERPC] DEBUG_PAGEALLOC for 32-bit") Fixes: 95902e6c8864 ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32") Fixes: 90cbac0e995d ("powerpc: Enable KFENCE for PPC32") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/d1ad9fdd9b27da3fdfa16510bb542ed51fa6e134.1634292136.git.christophe.leroy@csgroup.eu
2021-10-22 | powerpc/kexec_file: Add of_node_put() before goto | Wan Jiabing | 1 file | -0/+1
Fix following coccicheck warning: ./arch/powerpc/kexec/file_load_64.c:698:1-22: WARNING: Function for_each_node_by_type should have of_node_put() before goto Early exits from for_each_node_by_type should decrement the node reference counter. Signed-off-by: Wan Jiabing <[email protected]> Acked-by: Hari Bathini <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-22 | powerpc/pseries/iommu: Add of_node_put() before break | Wan Jiabing | 1 file | -1/+3
Fix following coccicheck warning: ./arch/powerpc/platforms/pseries/iommu.c:924:1-28: WARNING: Function for_each_node_with_property should have of_node_put() before break Early exits from for_each_node_with_property should decrement the node reference counter. Signed-off-by: Wan Jiabing <[email protected]> Reviewed-by: Leonardo Bras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-22 | powerpc/s64: Clarify that radix lacks DEBUG_PAGEALLOC | Joel Stanley | 5 files | -1/+23
The page_alloc.c code will call into __kernel_map_pages() when DEBUG_PAGEALLOC is configured and enabled. As the implementation assumes hash, this should crash spectacularly if not for a bit of luck in __kernel_map_pages(). In this function linear_map_hash_count is always zero, so the for loop exits without doing any damage. There are no other platforms that determine if they support debug_pagealloc at runtime. Instead of adding code to mm/page_alloc.c to do that, this change turns the map/unmap into a noop when in radix mode and prints a warning once. Signed-off-by: Joel Stanley <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> [mpe: Reformat if per Christophe's suggestion] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
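The shape of the change (sketch; the message text and the name of the hash-specific helper are assumptions):

    void __kernel_map_pages(struct page *page, int numpages, int enable)
    {
            if (radix_enabled()) {
                    pr_warn_once("DEBUG_PAGEALLOC not supported in radix mode\n");
                    return;         /* explicit noop instead of relying on luck */
            }
            hash__kernel_map_pages(page, numpages, enable); /* existing hash path, name assumed */
    }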
2021-10-14 | powerpc: Mark .opd section read-only | Christophe Leroy | 1 file | -6/+6
The .opd section contains the function descriptors used to locate functions in the kernel. If someone is able to modify a function descriptor, they will be able to run an arbitrary kernel function instead of the intended one. To avoid that, move the .opd section inside read-only memory. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/3cd40b682fb6f75bb40947b55ca0bac20cb3f995.1634136222.git.christophe.leroy@csgroup.eu
2021-10-14 | powerpc/perf: Fix cycles/instructions as PM_CYC/PM_INST_CMPL in power10 | Athira Rajeev | 2 files | -17/+35
On power9 and earlier platforms, the default events used for cycles and instructions are PM_CYC (0x0001e) and PM_INST_CMPL (0x00002) respectively. These events use two programmable PMCs and by default will count irrespective of the run latch state (idle state). But since they use programmable PMCs, these events can lead to multiplexing with other events, because there are only 4 programmable PMCs. Hence in power10, the performance monitoring unit (PMU) driver uses performance monitor counter 5 (PMC5) and performance monitor counter 6 (PMC6) for counting instructions and cycles. Currently on power10, the event used for cycles is PM_RUN_CYC (0x600F4) and instructions use PM_RUN_INST_CMPL (0x500fa). But counting of these events in idle state is controlled by the CC56RUN bit setting in Monitor Mode Control Register 0 (MMCR0). If the CC56RUN bit is zero, PMC5/6 will not count when CTRL[RUN] (run latch) is zero. This could lead to missing some counts if a thread is in idle state during system wide profiling. To fix it, set the CC56RUN bit in MMCR0 for power10, which makes PMC5 and PMC6 count instructions and cycles regardless of the run latch state. Since this change makes PMC5/6 count as PM_INST_CMPL/PM_CYC, rename the event code 0x600f4 as PM_CYC instead of PM_RUN_CYC and event code 0x500fa as PM_INST_CMPL instead of PM_RUN_INST_CMPL. The changes are only for PMC5/6 event codes and will not affect the behaviour of PM_RUN_CYC/PM_RUN_INST_CMPL if programmed in other PMCs. Fixes: a64e697cef23 ("powerpc/perf: power10 Performance Monitoring support") Signed-off-by: Athira Rajeev <[email protected]> Reviewed-by: Madhavan Srinivasan <[email protected]> [mpe: Tweak change log wording for style and consistency] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-13 | powerpc/eeh: Fix docstrings in eeh.c | Kai Song | 1 file | -4/+8
We fix the following warnings when building kernel with W=1: arch/powerpc/kernel/eeh.c:598: warning: Function parameter or member 'function' not described in 'eeh_pci_enable' arch/powerpc/kernel/eeh.c:774: warning: Function parameter or member 'edev' not described in 'eeh_set_dev_freset' arch/powerpc/kernel/eeh.c:774: warning: expecting prototype for eeh_set_pe_freset(). Prototype was for eeh_set_dev_freset() instead arch/powerpc/kernel/eeh.c:814: warning: Function parameter or member 'include_passed' not described in 'eeh_pe_reset_full' arch/powerpc/kernel/eeh.c:944: warning: Function parameter or member 'ops' not described in 'eeh_init' arch/powerpc/kernel/eeh.c:1451: warning: Function parameter or member 'include_passed' not described in 'eeh_pe_reset' arch/powerpc/kernel/eeh.c:1526: warning: Function parameter or member 'func' not described in 'eeh_pe_inject_err' arch/powerpc/kernel/eeh.c:1526: warning: Excess function parameter 'function' described in 'eeh_pe_inject_err' Signed-off-by: Kai Song <[email protected]> Reviewed-by: Daniel Axtens <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
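The warnings are resolved by completing the kernel-doc blocks, e.g. (sketch of the form, not the exact wording added):

    /**
     * eeh_pci_enable - Enable MMIO or DMA transfers for this slot
     * @pe: EEH PE
     * @function: EEH option to enable (the parameter line W=1 reports as missing)
     */
    int eeh_pci_enable(struct eeh_pe *pe, int function);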