aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/include/asm
AgeCommit message (Collapse)AuthorFilesLines
2021-11-24KVM: PPC: Book3S HV P9: Avoid cpu_in_guest atomics on entry and exitNicholas Piggin2-2/+0
cpu_in_guest is set to determine if a CPU needs to be IPI'ed to exit the guest and notice the need_tlb_flush bit. This can be implemented as a global per-CPU pointer to the currently running guest instead of per-guest cpumasks, saving 2 atomics per entry/exit. P7/8 doesn't require cpu_in_guest, nor does a nested HV (only the L0 does), so move it to the P9 HV path. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV P9: Improve mfmsr performance on entryNicholas Piggin1-0/+2
Rearrange the MSR saving on entry so it does not follow the mtmsrd to disable interrupts, avoiding a possible RAW scoreboard stall. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV Nested: Avoid extra mftb() in nested entryNicholas Piggin1-0/+1
mftb() is expensive and one can be avoided on nested guest dispatch. If the time checking code distinguishes between the L0 timer and the nested HV timer, then both can be tested in the same place with the same mftb() value. This also nicely illustrates the relationship between the L0 and nested HV timers. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV: Split P8 from P9 path guest vCPU TLB flushingNicholas Piggin1-2/+1
This creates separate functions for old and new paths for vCPU TLB flushing, which will reduce complexity of the next change. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV P9: Use Linux SPR save/restore to manage some host SPRsNicholas Piggin1-0/+1
Linux implements SPR save/restore including storage space for registers in the task struct for process context switching. Make use of this similarly to the way we make use of the context switching fp/vec save restore. This improves code reuse, allows some stack space to be saved, and helps with avoiding VRSAVE updates if they are not required. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV P9: Demand fault TM facility registersNicholas Piggin1-0/+3
Use HFSCR facility disabling to implement demand faulting for TM, with a hysteresis counter similar to the load_fp etc counters in context switching that implement the equivalent demand faulting for userspace facilities. This speeds up guest entry/exit by avoiding the register save/restore when a guest is not frequently using them. When a guest does use them often, there will be some additional demand fault overhead, but these are not commonly used facilities. Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Fabiano Rosas <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV P9: Demand fault EBB facility registersNicholas Piggin1-0/+1
Use HFSCR facility disabling to implement demand faulting for EBB, with a hysteresis counter similar to the load_fp etc counters in context switching that implement the equivalent demand faulting for userspace facilities. This speeds up guest entry/exit by avoiding the register save/restore when a guest is not frequently using them. When a guest does use them often, there will be some additional demand fault overhead, but these are not commonly used facilities. Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Fabiano Rosas <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV P9: Optimise timebase readsNicholas Piggin1-1/+1
Reduce the number of mfTB executed by passing the current timebase around entry and exit code rather than read it multiple times. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebaseNicholas Piggin2-1/+7
Change dec_expires to be relative to the guest timebase, and allow it to be moved into low level P9 guest entry functions, to improve SPR access scheduling. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRsNicholas Piggin1-0/+2
This reduces the number of mtmsrd required to enable facility bits when saving/restoring registers, by having the KVM code set all bits up front rather than using individual facility functions that set their particular MSR bits. Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Fabiano Rosas <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV P9: Implement PMU save/restore in CNicholas Piggin1-5/+0
Implement the P9 path PMU save/restore code in C, and remove the POWER9/10 code from the P7/8 path assembly. Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Athira Jajeev <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24powerpc/time: add API for KVM to re-arm the host timer/decrementerNicholas Piggin1-12/+4
Rather than have KVM look up the host timer and fiddle with the irq-work internal details, have the powerpc/time.c code provide a function for KVM to re-arm the Linux timer code when exiting a guest. This is implementation has an improvement over existing code of marking a decrementer interrupt as soft-pending if a timer has expired, rather than setting DEC to a -ve value, which tended to cause host timers to take two interrupts (first hdec to exit the guest, then the immediate dec). Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV P9: Use large decrementer for HDECNicholas Piggin1-0/+2
On processors that don't suppress the HDEC exceptions when LPCR[HDICE]=0, this could help reduce needless guest exits due to leftover exceptions on entering the guest. Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-24KVM: PPC: Book3S HV P9: Use host timer accounting to avoid decrementer readNicholas Piggin1-0/+5
There is no need to save away the host DEC value, as it is derived from the host timer subsystem which maintains the next timer time, so it can be restored from there. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-11-10Merge tag 'asm-generic-5.16' of ↵Linus Torvalds1-10/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cleanup from Arnd Bergmann: "This is a single cleanup from Peter Collingbourne, removing some dead code" * tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: arch: remove unused function syscall_set_arguments()
2021-11-06Merge tag 'pci-v5.16-changes' of ↵Linus Torvalds1-5/+0
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Conserve IRQs by setting up portdrv IRQs only when there are users (Jan Kiszka) - Rework and simplify _OSC negotiation for control of PCIe features (Joerg Roedel) - Remove struct pci_dev.driver pointer since it's redundant with the struct device.driver pointer (Uwe Kleine-König) Resource management: - Coalesce contiguous host bridge apertures from _CRS to accommodate BARs that cover more than one aperture (Kai-Heng Feng) Sysfs: - Check CAP_SYS_ADMIN before parsing user input (Krzysztof Wilczyński) - Return -EINVAL consistently from "store" functions (Krzysztof Wilczyński) - Use sysfs_emit() in endpoint "show" functions to avoid buffer overruns (Kunihiko Hayashi) PCIe native device hotplug: - Ignore Link Down/Up caused by resets during error recovery so endpoint drivers can remain bound to the device (Lukas Wunner) Virtualization: - Avoid bus resets on Atheros QCA6174, where they hang the device (Ingmar Klein) - Work around Pericom PI7C9X2G switch packet drop erratum by using store and forward mode instead of cut-through (Nathan Rossi) - Avoid trying to enable AtomicOps on VFs; the PF setting applies to all VFs (Selvin Xavier) MSI: - Document that /sys/bus/pci/devices/.../irq contains the legacy INTx interrupt or the IRQ of the first MSI (not MSI-X) vector (Barry Song) VPD: - Add pci_read_vpd_any() and pci_write_vpd_any() to access anywhere in the possible VPD space; use these to simplify the cxgb3 driver (Heiner Kallweit) Peer-to-peer DMA: - Add (not subtract) the bus offset when calculating DMA address (Wang Lu) ASPM: - Re-enable LTR at Downstream Ports so they don't report Unsupported Requests when reset or hot-added devices send LTR messages (Mingchuang Qiao) Apple PCIe controller driver: - Add driver for Apple M1 PCIe controller (Alyssa Rosenzweig, Marc Zyngier) Cadence PCIe controller driver: - Return success when probe succeeds instead of falling into error path (Li Chen) HiSilicon Kirin PCIe controller driver: - Reorganize PHY logic and add support for external PHY drivers (Mauro Carvalho Chehab) - Support PERST# GPIOs for HiKey970 external PEX 8606 bridge (Mauro Carvalho Chehab) - Add Kirin 970 support (Mauro Carvalho Chehab) - Make driver removable (Mauro Carvalho Chehab) Intel VMD host bridge driver: - If IOMMU supports interrupt remapping, leave VMD MSI-X remapping enabled (Adrian Huang) - Number each controller so we can tell them apart in /proc/interrupts (Chunguang Xu) - Avoid building on UML because VMD depends on x86 bare metal APIs (Johannes Berg) Marvell Aardvark PCIe controller driver: - Define macros for PCI_EXP_DEVCTL_PAYLOAD_* (Pali Rohár) - Set Max Payload Size to 512 bytes per Marvell spec (Pali Rohár) - Downgrade PIO Response Status messages to debug level (Marek Behún) - Preserve CRS SV (Config Request Retry Software Visibility) bit in emulated Root Control register (Pali Rohár) - Fix issue in configuring reference clock (Pali Rohár) - Don't clear status bits for masked interrupts (Pali Rohár) - Don't mask unused interrupts (Pali Rohár) - Avoid code repetition in advk_pcie_rd_conf() (Marek Behún) - Retry config accesses on CRS response (Pali Rohár) - Simplify emulated Root Capabilities initialization (Pali Rohár) - Fix several link training issues (Pali Rohár) - Fix link-up checking via LTSSM (Pali Rohár) - Fix reporting of Data Link Layer Link Active (Pali Rohár) - Fix emulation of W1C bits (Marek Behún) - Fix MSI domain .alloc() method to return zero on success (Marek Behún) - Read entire 16-bit MSI vector in MSI handler, not just low 8 bits (Marek Behún) - Clear Root Port I/O Space, Memory Space, and Bus Master Enable bits at startup; PCI core will set those as necessary (Pali Rohár) - When operating as a Root Port, set class code to "PCI Bridge" instead of the default "Mass Storage Controller" (Pali Rohár) - Add emulation for PCI_BRIDGE_CTL_BUS_RESET since aardvark doesn't implement this per spec (Pali Rohár) - Add emulation of option ROM BAR since aardvark doesn't implement this per spec (Pali Rohár) MediaTek MT7621 PCIe controller driver: - Add MediaTek MT7621 PCIe host controller driver and DT binding (Sergio Paracuellos) Qualcomm PCIe controller driver: - Add SC8180x compatible string (Bjorn Andersson) - Add endpoint controller driver and DT binding (Manivannan Sadhasivam) - Restructure to use of_device_get_match_data() (Prasad Malisetty) - Add SC7280-specific pcie_1_pipe_clk_src handling (Prasad Malisetty) Renesas R-Car PCIe controller driver: - Remove unnecessary includes (Geert Uytterhoeven) Rockchip DesignWare PCIe controller driver: - Add DT binding (Simon Xue) Socionext UniPhier Pro5 controller driver: - Serialize INTx masking/unmasking (Kunihiko Hayashi) Synopsys DesignWare PCIe controller driver: - Run dwc .host_init() method before registering MSI interrupt handler so we can deal with pending interrupts left by bootloader (Bjorn Andersson) - Clean up Kconfig dependencies (Andy Shevchenko) - Export symbols to allow more modular drivers (Luca Ceresoli) TI DRA7xx PCIe controller driver: - Allow host and endpoint drivers to be modules (Luca Ceresoli) - Enable external clock if present (Luca Ceresoli) TI J721E PCIe driver: - Disable PHY when probe fails after initializing it (Christophe JAILLET) MicroSemi Switchtec management driver: - Return error to application when command execution fails because an out-of-band reset has cleared the device BARs, Memory Space Enable, etc (Kelvin Cao) - Fix MRPC error status handling issue (Kelvin Cao) - Mask out other bits when reading of management VEP instance ID (Kelvin Cao) - Return EOPNOTSUPP instead of ENOTSUPP from sysfs show functions (Kelvin Cao) - Add check of event support (Logan Gunthorpe) Miscellaneous: - Remove unused pci_pool wrappers, which have been replaced by dma_pool (Cai Huoqing) - Use 'unsigned int' instead of bare 'unsigned' (Krzysztof Wilczyński) - Use kstrtobool() directly, sans strtobool() wrapper (Krzysztof Wilczyński) - Fix some sscanf(), sprintf() format mismatches (Krzysztof Wilczyński) - Update PCI subsystem information in MAINTAINERS (Krzysztof Wilczyński) - Correct some misspellings (Krzysztof Wilczyński)" * tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (137 commits) PCI: Add ACS quirk for Pericom PI7C9X2G switches PCI: apple: Configure RID to SID mapper on device addition iommu/dart: Exclude MSI doorbell from PCIe device IOVA range PCI: apple: Implement MSI support PCI: apple: Add INTx and per-port interrupt support PCI: kirin: Allow removing the driver PCI: kirin: De-init the dwc driver PCI: kirin: Disable clkreq during poweroff sequence PCI: kirin: Move the power-off code to a common routine PCI: kirin: Add power_off support for Kirin 960 PHY PCI: kirin: Allow building it as a module PCI: kirin: Add MODULE_* macros PCI: kirin: Add Kirin 970 compatible PCI: kirin: Support PERST# GPIOs for HiKey970 external PEX 8606 bridge PCI: apple: Set up reference clocks when probing PCI: apple: Add initial hardware bring-up PCI: of: Allow matching of an interrupt-map local to a PCI device of/irq: Allow matching of an interrupt-map local to an interrupt controller irqdomain: Make of_phandle_args_to_fwspec() generally available PCI: Do not enable AtomicOps on VFs ...
2021-11-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-14/+1
Merge misc updates from Andrew Morton: "257 patches. Subsystems affected by this patch series: scripts, ocfs2, vfs, and mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache, gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools, memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm, vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram, cleanups, kfence, and damon)" * emailed patches from Andrew Morton <[email protected]>: (257 commits) mm/damon: remove return value from before_terminate callback mm/damon: fix a few spelling mistakes in comments and a pr_debug message mm/damon: simplify stop mechanism Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions Docs/admin-guide/mm/damon/start: simplify the content Docs/admin-guide/mm/damon/start: fix a wrong link Docs/admin-guide/mm/damon/start: fix wrong example commands mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on mm/damon: remove unnecessary variable initialization Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM) selftests/damon: support watermarks mm/damon/dbgfs: support watermarks mm/damon/schemes: activate schemes based on a watermarks mechanism tools/selftests/damon: update for regions prioritization of schemes mm/damon/dbgfs: support prioritization weights mm/damon/vaddr,paddr: support pageout prioritization mm/damon/schemes: prioritize regions within the quotas mm/damon/selftests: support schemes quotas mm/damon/dbgfs: support quotas of schemes ...
2021-11-06mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSEDavid Hildenbrand1-1/+1
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Shuah Khan <[email protected]> [kselftest] Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Oscar Salvador <[email protected]> Cc: Alex Shi <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-06powerpc: use generic version of arch_is_kernel_initmem_freed()Christophe Leroy1-13/+0
The generic version of arch_is_kernel_initmem_freed() now does the same as powerpc version. Remove the powerpc version. Link: https://lkml.kernel.org/r/c53764eb45d41491e2b21da2e7812239897dbebb.1633001016.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Paul Mackerras <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-11-05Merge tag 'powerpc-5.16-1' of ↵Linus Torvalds20-58/+172
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Enable STRICT_KERNEL_RWX for Freescale 85xx platforms. - Activate CONFIG_STRICT_KERNEL_RWX by default, while still allowing it to be disabled. - Add support for out-of-line static calls on 32-bit. - Fix oopses doing bpf-to-bpf calls when STRICT_KERNEL_RWX is enabled. - Fix boot hangs on e5500 due to stale value in ESR passed to do_page_fault(). - Fix several bugs on pseries in handling of device tree cache information for hotplugged CPUs, and/or during partition migration. - Various other small features and fixes. Thanks to Alexey Kardashevskiy, Alistair Popple, Anatolij Gustschin, Andrew Donnellan, Athira Rajeev, Bixuan Cui, Bjorn Helgaas, Cédric Le Goater, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Denis Kirjanov, Fabiano Rosas, Frederic Barrat, Gustavo A. R. Silva, Hari Bathini, Jacques de Laval, Joel Stanley, Kai Song, Kajol Jain, Laurent Vivier, Leonardo Bras, Madhavan Srinivasan, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Niklas Schnelle, Oliver O'Halloran, Rob Herring, Russell Currey, Srikar Dronamraju, Stan Johnson, Tyrel Datwyler, Uwe Kleine-König, Vasant Hegde, Wan Jiabing, and Xiaoming Ni, * tag 'powerpc-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (73 commits) powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST powerpc/32e: Ignore ESR in instruction storage interrupt handler powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload powerpc: Don't provide __kernel_map_pages() without ARCH_SUPPORTS_DEBUG_PAGEALLOC MAINTAINERS: Update powerpc KVM entry powerpc/xmon: fix task state output powerpc/44x/fsp2: add missing of_node_put powerpc/dcr: Use cmplwi instead of 3-argument cmpli KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling powerpc/security: Use a mutex for interrupt exit code patching powerpc/83xx/mpc8349emitx: Make mcu_gpiochip_remove() return void powerpc/fsl_booke: Fix setting of exec flag when setting TLBCAMs powerpc/book3e: Fix set_memory_x() and set_memory_nx() powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect() powerpc/bpf: Fix write protecting JIT code selftests/powerpc: Use date instead of EPOCHSECONDS in mitigation-patching.sh powerpc/64s/interrupt: Fix check_return_regs_valid() false positive powerpc/boot: Set LC_ALL=C in wrapper script powerpc/64s: Default to 64K pages for 64 bit book3s Revert "powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC" ...
2021-11-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-3/+3
Pull KVM updates from Paolo Bonzini: "ARM: - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us RISC-V: - New KVM port. x86: - New API to control TSC offset from userspace - TSC scaling for nested hypervisors on SVM - Switch masterclock protection from raw_spin_lock to seqcount - Clean up function prototypes in the page fault code and avoid repeated memslot lookups - Convey the exit reason to userspace on emulation failure - Configure time between NX page recovery iterations - Expose Predictive Store Forwarding Disable CPUID leaf - Allocate page tracking data structures lazily (if the i915 KVM-GT functionality is not compiled in) - Cleanups, fixes and optimizations for the shadow MMU code s390: - SIGP Fixes - initial preparations for lazy destroy of secure VMs - storage key improvements/fixes - Log the guest CPNC Starting from this release, KVM-PPC patches will come from Michael Ellerman's PPC tree" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) RISC-V: KVM: fix boolreturn.cocci warnings RISC-V: KVM: remove unneeded semicolon RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions RISC-V: KVM: Factor-out FP virtualization into separate sources KVM: s390: add debug statement for diag 318 CPNC data KVM: s390: pv: properly handle page flags for protected guests KVM: s390: Fix handle_sske page fault handling KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol KVM: x86: On emulation failure, convey the exit reason, etc. to userspace KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info KVM: x86: Clarify the kvm_run.emulation_failure structure layout KVM: s390: Add a routine for setting userspace CPU state KVM: s390: Simplify SIGP Set Arch handling KVM: s390: pv: avoid stalls when making pages secure KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm KVM: s390: pv: avoid double free of sida page KVM: s390: pv: add macros for UVC CC values s390/mm: optimize reset_guest_reference_bit() s390/mm: optimize set_guest_storage_key() s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present ...
2021-11-01Merge tag 'trace-v5.16' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - kprobes: Restructured stack unwinder to show properly on x86 when a stack dump happens from a kretprobe callback. - Fix to bootconfig parsing - Have tracefs allow owner and group permissions by default (only denying others). There's been pressure to allow non root to tracefs in a controlled fashion, and using groups is probably the safest. - Bootconfig memory managament updates. - Bootconfig clean up to have the tools directory be less dependent on changes in the kernel tree. - Allow perf to be traced by function tracer. - Rewrite of function graph tracer to be a callback from the function tracer instead of having its own trampoline (this change will happen on an arch by arch basis, and currently only x86_64 implements it). - Allow multiple direct trampolines (bpf hooks to functions) be batched together in one synchronization. - Allow histogram triggers to add variables that can perform calculations against the event's fields. - Use the linker to determine architecture callbacks from the ftrace trampoline to allow for proper parameter prototypes and prevent warnings from the compiler. - Extend histogram triggers to key off of variables. - Have trace recursion use bit magic to determine preempt context over if branches. - Have trace recursion disable preemption as all use cases do anyway. - Added testing for verification of tracing utilities. - Various small clean ups and fixes. * tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits) tracing/histogram: Fix semicolon.cocci warnings tracing/histogram: Fix documentation inline emphasis warning tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together tracing: Show size of requested perf buffer bootconfig: Initialize ret in xbc_parse_tree() ftrace: do CPU checking after preemption disabled ftrace: disable preemption when recursion locked tracing/histogram: Document expression arithmetic and constants tracing/histogram: Optimize division by a power of 2 tracing/histogram: Covert expr to const if both operands are constants tracing/histogram: Simplify handling of .sym-offset in expressions tracing: Fix operator precedence for hist triggers expression tracing: Add division and multiplication support for hist triggers tracing: Add support for creating hist trigger variables from literal selftests/ftrace: Stop tracing while reading the trace file by default MAINTAINERS: Update KPROBES and TRACING entries test_kprobes: Move it from kernel/ to lib/ docs, kprobes: Remove invalid URL and add new reference samples/kretprobes: Fix return value if register_kretprobe() failed lib/bootconfig: Fix the xbc_get_info kerneldoc ...
2021-11-01Merge tag 'kspp-misc-fixes-5.16-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull hardening fixes and cleanups from Gustavo A. R. Silva: "Various hardening fixes and cleanups that I've been collecting during the last development cycle: Fix -Wcast-function-type error: - firewire: Remove function callback casts (Oscar Carter) Fix application of sizeof operator: - firmware/psci: fix application of sizeof to pointer (jing yangyang) Replace open coded instances with size_t saturating arithmetic helpers: - assoc_array: Avoid open coded arithmetic in allocator arguments (Len Baker) - writeback: prefer struct_size over open coded arithmetic (Len Baker) - aio: Prefer struct_size over open coded arithmetic (Len Baker) - dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic (Len Baker) Flexible array transformation: - KVM: PPC: Replace zero-length array with flexible array member (Len Baker) Use 2-factor argument multiplication form: - nouveau/svm: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva) - xfs: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva)" * tag 'kspp-misc-fixes-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: firewire: Remove function callback casts nouveau/svm: Use kvcalloc() instead of kvzalloc() firmware/psci: fix application of sizeof to pointer dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic KVM: PPC: Replace zero-length array with flexible array member aio: Prefer struct_size over open coded arithmetic writeback: prefer struct_size over open coded arithmetic xfs: Use kvcalloc() instead of kvzalloc() assoc_array: Avoid open coded arithmetic in allocator arguments
2021-11-01Merge tag 'cpu-to-thread_info-v5.16-rc1' of ↵Linus Torvalds2-16/+4
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull thread_info update to move 'cpu' back from task_struct from Kees Cook: "Cross-architecture update to move task_struct::cpu back into thread_info on arm64, x86, s390, powerpc, and riscv. All Acked by arch maintainers. Quoting Ard Biesheuvel: 'Move task_struct::cpu back into thread_info Keeping CPU in task_struct is problematic for architectures that define raw_smp_processor_id() in terms of this field, as it requires linux/sched.h to be included, which causes a lot of pain in terms of circular dependencies (aka 'header soup') This series moves it back into thread_info (where it came from) for all architectures that enable THREAD_INFO_IN_TASK, addressing the header soup issue as well as some pointless differences in the implementations of task_cpu() and set_task_cpu()'" * tag 'cpu-to-thread_info-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: riscv: rely on core code to keep thread_info::cpu updated powerpc: smp: remove hack to obtain offset of task_struct::cpu sched: move CPU field back into thread_info if THREAD_INFO_IN_TASK=y powerpc: add CPU field to struct thread_info s390: add CPU field to struct thread_info x86: add CPU field to struct thread_info arm64: add CPU field to struct thread_info
2021-11-01Merge tag 'x86_cc_for_v5.16_rc1' of ↵Linus Torvalds1-5/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull generic confidential computing updates from Borislav Petkov: "Add an interface called cc_platform_has() which is supposed to be used by confidential computing solutions to query different aspects of the system. The intent behind it is to unify testing of such aspects instead of having each confidential computing solution add its own set of tests to code paths in the kernel, leading to an unwieldy mess" * tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide: Replace the use of mem_encrypt_active() with cc_platform_has() x86/sev: Replace occurrences of sev_es_active() with cc_platform_has() x86/sev: Replace occurrences of sev_active() with cc_platform_has() x86/sme: Replace occurrences of sme_active() with cc_platform_has() powerpc/pseries/svm: Add a powerpc version of cc_platform_has() x86/sev: Add an x86 version of cc_platform_has() arch/cc: Introduce a function to check for confidential computing features x86/ioremap: Selectively build arch override encryption functions
2021-11-01Merge tag 'sched-core-2021-11-01' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - Revert the printk format based wchan() symbol resolution as it can leak the raw value in case that the symbol is not resolvable. - Make wchan() more robust and work with all kind of unwinders by enforcing that the task stays blocked while unwinding is in progress. - Prevent sched_fork() from accessing an invalid sched_task_group - Improve asymmetric packing logic - Extend scheduler statistics to RT and DL scheduling classes and add statistics for bandwith burst to the SCHED_FAIR class. - Properly account SCHED_IDLE entities - Prevent a potential deadlock when initial priority is assigned to a newly created kthread. A recent change to plug a race between cpuset and __sched_setscheduler() introduced a new lock dependency which is now triggered. Break the lock dependency chain by moving the priority assignment to the thread function. - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems. - Improve idle balancing in general and especially for NOHZ enabled systems. - Provide proper interfaces for live patching so it does not have to fiddle with scheduler internals. - Add cluster aware scheduling support. - A small set of tweaks for RT (irqwork, wait_task_inactive(), various scheduler options and delaying mmdrop) - The usual small tweaks and improvements all over the place * tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) sched/fair: Cleanup newidle_balance sched/fair: Remove sysctl_sched_migration_cost condition sched/fair: Wait before decaying max_newidle_lb_cost sched/fair: Skip update_blocked_averages if we are defering load balance sched/fair: Account update_blocked_averages in newidle_balance cost x86: Fix __get_wchan() for !STACKTRACE sched,x86: Fix L2 cache mask sched/core: Remove rq_relock() sched: Improve wake_up_all_idle_cpus() take #2 irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support. sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ sched: Add cluster scheduler level for x86 sched: Add cluster scheduler level in core and related Kconfig for ARM64 topology: Represent clusters of CPUs within a die sched: Disable -Wunused-but-set-variable sched: Add wrapper for get_wchan() to keep task blocked x86: Fix get_wchan() to support the ORC unwinder proc: Use task_is_running() for wchan in /proc/$pid/stat ...
2021-11-01Merge tag 'locking-core-2021-10-31' of ↵Linus Torvalds1-21/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: - Move futex code into kernel/futex/ and split up the kitchen sink into seperate files to make integration of sys_futex_waitv() simpler. - Add a new sys_futex_waitv() syscall which allows to wait on multiple futexes. The main use case is emulating Windows' WaitForMultipleObjects which allows Wine to improve the performance of Windows Games. Also native Linux games can benefit from this interface as this is a common wait pattern for this kind of applications. - Add context to ww_mutex_trylock() to provide a path for i915 to rework their eviction code step by step without making lockdep upset until the final steps of rework are completed. It's also useful for regulator and TTM to avoid dropping locks in the non contended path. - Lockdep and might_sleep() cleanups and improvements - A few improvements for the RT substitutions. - The usual small improvements and cleanups. * tag 'locking-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) locking: Remove spin_lock_flags() etc locking/rwsem: Fix comments about reader optimistic lock stealing conditions locking: Remove rcu_read_{,un}lock() for preempt_{dis,en}able() locking/rwsem: Disable preemption for spinning region docs: futex: Fix kernel-doc references futex: Fix PREEMPT_RT build futex2: Documentation: Document sys_futex_waitv() uAPI selftests: futex: Test sys_futex_waitv() wouldblock selftests: futex: Test sys_futex_waitv() timeout selftests: futex: Add sys_futex_waitv() test futex,arm: Wire up sys_futex_waitv() futex,x86: Wire up sys_futex_waitv() futex: Implement sys_futex_waitv() futex: Simplify double_lock_hb() futex: Split out wait/wake futex: Split out requeue futex: Rename mark_wake_futex() futex: Rename: match_futex() futex: Rename: hb_waiter_{inc,dec,pending}() futex: Split out PI futex ...
2021-10-30locking: Remove spin_lock_flags() etcArnd Bergmann1-21/+0
parisc, ia64 and powerpc32 are the only remaining architectures that provide custom arch_{spin,read,write}_lock_flags() functions, which are meant to re-enable interrupts while waiting for a spinlock. However, none of these can actually run into this codepath, because it is only called on architectures without CONFIG_GENERIC_LOCKBREAK, or when CONFIG_DEBUG_LOCK_ALLOC is set without CONFIG_LOCKDEP, and none of those combinations are possible on the three architectures. Going back in the git history, it appears that arch/mn10300 may have been able to run into this code path, but there is a good chance that it never worked. On the architectures that still exist, it was already impossible to hit back in 2008 after the introduction of CONFIG_GENERIC_LOCKBREAK, and possibly earlier. As this is all dead code, just remove it and the helper functions built around it. For arch/ia64, the inline asm could be cleaned up, but it seems safer to leave it untouched. Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Helge Deller <[email protected]> # parisc Link: https://lore.kernel.org/r/[email protected]
2021-10-29Merge branch 'topic/ppc-kvm' into nextMichael Ellerman1-0/+4
Merge a couple of KVM ppc patches we are keeping in a topic branch.
2021-10-28powerpc/book3e: Fix set_memory_x() and set_memory_nx()Christophe Leroy3-9/+16
set_memory_x() calls pte_mkexec() which sets _PAGE_EXEC. set_memory_nx() calls pte_exprotec() which clears _PAGE_EXEC. Book3e has 2 bits, UX and SX, which defines the exec rights resp. for user (PR=1) and for kernel (PR=0). _PAGE_EXEC is defined as UX only. An executable kernel page is set with either _PAGE_KERNEL_RWX or _PAGE_KERNEL_ROX, which both have SX set and UX cleared. So set_memory_nx() call for an executable kernel page does nothing because UX is already cleared. And set_memory_x() on a non-executable kernel page makes it executable for the user and keeps it non-executable for kernel. Also, pte_exec() always returns 'false' on kernel pages, because it checks _PAGE_EXEC which doesn't include SX, so for instance the W+X check doesn't work. To fix this: - change tlb_low_64e.S to use _PAGE_BAP_UX instead of _PAGE_USER - sets both UX and SX in _PAGE_EXEC so that pte_exec() returns true whenever one of the two bits is set and pte_exprotect() clears both bits. - Define a book3e specific version of pte_mkexec() which sets either SX or UX based on UR. Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/c41100f9c144dc5b62e5a751b810190c6b5d42fd.1635226743.git.christophe.leroy@csgroup.eu
2021-10-28powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect()Christophe Leroy2-9/+30
Commit 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code") changed those two functions to use pte helpers to determine which bits to clear and which bits to set. This change was based on the assumption that bits to be set/cleared are always the same and can be determined by applying the pte manipulation helpers on __pte(0). But on platforms like book3e, the bits depend on whether the page is a user page or not. For the time being it more or less works because of _PAGE_EXEC being used for user pages only and exec right being set at all time on kernel page. But following patch will clean that and output of pte_mkexec() will depend on the page being a user or kernel page. Instead of trying to make an even more complicated helper where bits would become dependent on the final pte value, come back to a more static situation like before commit 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code"), by introducing an 8xx specific version of __ptep_set_access_flags() and ptep_set_wrprotect(). Fixes: 26973fa5ac0e ("powerpc/mm: use pte helpers in generic code") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/922bdab3a220781bae2360ff3dd5adb7fe4d34f1.1635226743.git.christophe.leroy@csgroup.eu
2021-10-27Revert "powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC"Michael Ellerman1-7/+0
This reverts commit 566af8cda399c088763d07464463dc871c943b54. This caused some conflicts vs the audit tree, and the audit maintainers would prefer we postpone this to the next merge window so we have more time for testing. Signed-off-by: Michael Ellerman <[email protected]>
2021-10-22powerpc/32: Don't use a struct based type for pte_tChristophe Leroy2-2/+18
Long time ago we had a config item called STRICT_MM_TYPECHECKS to build the kernel with pte_t defined as a structure in order to perform additional build checks or build it with pte_t defined as a simple type in order to get simpler generated code. Commit 670eea924198 ("powerpc/mm: Always use STRICT_MM_TYPECHECKS") made the struct based definition the only one, considering that the generated code was similar in both cases. That's right on ppc64 because the ABI is such that the content of a struct having a single simple type element is passed as register, but on ppc32 such a structure is passed via the stack like any structure. Simple test function: pte_t test(pte_t pte) { return pte; } Before this patch we get c00108ec <test>: c00108ec: 81 24 00 00 lwz r9,0(r4) c00108f0: 91 23 00 00 stw r9,0(r3) c00108f4: 4e 80 00 20 blr So, for PPC32, restore the simple type behaviour we got before commit 670eea924198, but instead of adding a config option to activate type check, do it when __CHECKER__ is set so that type checking is performed by 'sparse' and provides feedback like: arch/powerpc/mm/pgtable.c:466:16: warning: incorrect type in return expression (different base types) arch/powerpc/mm/pgtable.c:466:16: expected unsigned long arch/powerpc/mm/pgtable.c:466:16: got struct pte_t [usertype] x With this patch we now get c0010890 <test>: c0010890: 4e 80 00 20 blr Signed-off-by: Christophe Leroy <[email protected]> [mpe: Define STRICT_MM_TYPECHECKS rather than repeating the condition] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/c904599f33aaf6bb7ee2836a9ff8368509e0d78d.1631887042.git.christophe.leroy@csgroup.eu
2021-10-22powerpc/8xx: Simplify TLB handlingChristophe Leroy1-0/+15
In the old days, TLB handling for 8xx was using tlbie and tlbia instructions directly as much as possible. But commit f048aace29e0 ("powerpc/mm: Add SMP support to no-hash TLB handling") broke that by introducing out-of-line unnecessary complex functions for booke/smp which don't have tlbie/tlbia instructions and require more complex handling. Restore direct use of tlbie and tlbia for 8xx which is never SMP. With this patch we now get c00ecc68 <ptep_clear_flush>: c00ecc68: 39 00 00 00 li r8,0 c00ecc6c: 81 46 00 00 lwz r10,0(r6) c00ecc70: 91 06 00 00 stw r8,0(r6) c00ecc74: 7c 00 2a 64 tlbie r5,r0 c00ecc78: 7c 00 04 ac hwsync c00ecc7c: 91 43 00 00 stw r10,0(r3) c00ecc80: 4e 80 00 20 blr Before it was c0012880 <local_flush_tlb_page>: c0012880: 2c 03 00 00 cmpwi r3,0 c0012884: 41 82 00 54 beq c00128d8 <local_flush_tlb_page+0x58> c0012888: 81 22 00 00 lwz r9,0(r2) c001288c: 81 43 00 20 lwz r10,32(r3) c0012890: 39 29 00 01 addi r9,r9,1 c0012894: 91 22 00 00 stw r9,0(r2) c0012898: 2c 0a 00 00 cmpwi r10,0 c001289c: 41 82 00 10 beq c00128ac <local_flush_tlb_page+0x2c> c00128a0: 81 2a 01 dc lwz r9,476(r10) c00128a4: 2c 09 ff ff cmpwi r9,-1 c00128a8: 41 82 00 0c beq c00128b4 <local_flush_tlb_page+0x34> c00128ac: 7c 00 22 64 tlbie r4,r0 c00128b0: 7c 00 04 ac hwsync c00128b4: 81 22 00 00 lwz r9,0(r2) c00128b8: 39 29 ff ff addi r9,r9,-1 c00128bc: 2c 09 00 00 cmpwi r9,0 c00128c0: 91 22 00 00 stw r9,0(r2) c00128c4: 4c a2 00 20 bclr+ 4,eq c00128c8: 81 22 00 70 lwz r9,112(r2) c00128cc: 71 29 00 04 andi. r9,r9,4 c00128d0: 4d 82 00 20 beqlr c00128d4: 48 65 76 74 b c0669f48 <preempt_schedule> c00128d8: 81 22 00 00 lwz r9,0(r2) c00128dc: 39 29 00 01 addi r9,r9,1 c00128e0: 91 22 00 00 stw r9,0(r2) c00128e4: 4b ff ff c8 b c00128ac <local_flush_tlb_page+0x2c> ... c00ecdc8 <ptep_clear_flush>: c00ecdc8: 94 21 ff f0 stwu r1,-16(r1) c00ecdcc: 39 20 00 00 li r9,0 c00ecdd0: 93 c1 00 08 stw r30,8(r1) c00ecdd4: 83 c6 00 00 lwz r30,0(r6) c00ecdd8: 91 26 00 00 stw r9,0(r6) c00ecddc: 93 e1 00 0c stw r31,12(r1) c00ecde0: 7c 08 02 a6 mflr r0 c00ecde4: 7c 7f 1b 78 mr r31,r3 c00ecde8: 7c 83 23 78 mr r3,r4 c00ecdec: 7c a4 2b 78 mr r4,r5 c00ecdf0: 90 01 00 14 stw r0,20(r1) c00ecdf4: 4b f2 5a 8d bl c0012880 <local_flush_tlb_page> c00ecdf8: 93 df 00 00 stw r30,0(r31) c00ecdfc: 7f e3 fb 78 mr r3,r31 c00ece00: 80 01 00 14 lwz r0,20(r1) c00ece04: 83 c1 00 08 lwz r30,8(r1) c00ece08: 83 e1 00 0c lwz r31,12(r1) c00ece0c: 7c 08 03 a6 mtlr r0 c00ece10: 38 21 00 10 addi r1,r1,16 c00ece14: 4e 80 00 20 blr Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/fb324f1c8f2ddb57cf6aad1cea26329558f1c1c0.1631887021.git.christophe.leroy@csgroup.eu
2021-10-22powerpc/32: Add support for out-of-line static callsChristophe Leroy1-0/+28
Add support for out-of-line static calls on PPC32. This change improve performance of calls to global function pointers by using direct calls instead of indirect calls. The trampoline is initialy populated with a 'blr' or branch to target, followed by an unreachable long jump sequence. In order to cater with parallele execution, the trampoline needs to be updated in a way that ensures it remains consistent at all time. This means we can't use the traditional lis/addi to load r12 with the target address, otherwise there would be a window during which the first instruction contains the upper part of the new target address while the second instruction still contains the lower part of the old target address. To avoid that the target address is stored just after the 'bctr' and loaded from there with a single instruction. Then, depending on the target distance, arch_static_call_transform() will either replace the first instruction by a direct 'bl <target>' or 'nop' in order to have the trampoline fall through the long jump sequence. For the special case of __static_call_return0(), to avoid the risk of a far branch, a version of it is inlined at the end of the trampoline. Performancewise the long jump sequence is probably not better than the indirect calls set by GCC when we don't use static calls, but such calls are unlikely to be required on powerpc32: With most configurations the kernel size is far below 32 Mbytes so only modules may happen to be too far. And even modules are likely to be close enough as they are allocated below the kernel core and as close as possible of the kernel text. static_call selftest is running successfully with this change. With this patch, __do_irq() has the following sequence to trace irq entries: c0004a00 <__SCT__tp_func_irq_entry>: c0004a00: 48 00 00 e0 b c0004ae0 <__traceiter_irq_entry> c0004a04: 3d 80 c0 00 lis r12,-16384 c0004a08: 81 8c 4a 1c lwz r12,18972(r12) c0004a0c: 7d 89 03 a6 mtctr r12 c0004a10: 4e 80 04 20 bctr c0004a14: 38 60 00 00 li r3,0 c0004a18: 4e 80 00 20 blr c0004a1c: 00 00 00 00 .long 0x0 ... c0005654 <__do_irq>: ... c0005664: 7c 7f 1b 78 mr r31,r3 ... c00056a0: 81 22 00 00 lwz r9,0(r2) c00056a4: 39 29 00 01 addi r9,r9,1 c00056a8: 91 22 00 00 stw r9,0(r2) c00056ac: 3d 20 c0 af lis r9,-16209 c00056b0: 81 29 74 cc lwz r9,29900(r9) c00056b4: 2c 09 00 00 cmpwi r9,0 c00056b8: 41 82 00 10 beq c00056c8 <__do_irq+0x74> c00056bc: 80 69 00 04 lwz r3,4(r9) c00056c0: 7f e4 fb 78 mr r4,r31 c00056c4: 4b ff f3 3d bl c0004a00 <__SCT__tp_func_irq_entry> Before this patch, __do_irq() was doing the following to trace irq entries: c0005700 <__do_irq>: ... c0005710: 7c 7e 1b 78 mr r30,r3 ... c000574c: 93 e1 00 0c stw r31,12(r1) c0005750: 81 22 00 00 lwz r9,0(r2) c0005754: 39 29 00 01 addi r9,r9,1 c0005758: 91 22 00 00 stw r9,0(r2) c000575c: 3d 20 c0 af lis r9,-16209 c0005760: 83 e9 f4 cc lwz r31,-2868(r9) c0005764: 2c 1f 00 00 cmpwi r31,0 c0005768: 41 82 00 24 beq c000578c <__do_irq+0x8c> c000576c: 81 3f 00 00 lwz r9,0(r31) c0005770: 80 7f 00 04 lwz r3,4(r31) c0005774: 7d 29 03 a6 mtctr r9 c0005778: 7f c4 f3 78 mr r4,r30 c000577c: 4e 80 04 21 bctrl c0005780: 85 3f 00 0c lwzu r9,12(r31) c0005784: 2c 09 00 00 cmpwi r9,0 c0005788: 40 82 ff e4 bne c000576c <__do_irq+0x6c> Behind the fact of now using a direct 'bl' instead of a 'load/mtctr/bctr' sequence, we can also see that we get one less register on the stack. Signed-off-by: Christophe Leroy <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu
2021-10-22powerpc/machdep: Remove stale functions from ppc_md structureChristophe Leroy3-20/+0
ppc_md.iommu_save() is not set anymore by any platform after commit c40785ad305b ("powerpc/dart: Use a cachable DART"). So iommu_save() has become a nop and can be removed. ppc_md.show_percpuinfo() is not set anymore by any platform after commit 4350147a816b ("[PATCH] ppc64: SMU based macs cpufreq support"). Last users of ppc_md.rtc_read_val() and ppc_md.rtc_write_val() were removed by commit 0f03a43b8f0f ("[POWERPC] Remove todc code from ARCH=powerpc") Last user of kgdb_map_scc() was removed by commit 17ce452f7ea3 ("kgdb, powerpc: arch specific powerpc kgdb support"). ppc.machine_kexec_prepare() has not been used since commit 8ee3e0d69623 ("powerpc: Remove the main legacy iSerie platform code"). This allows the removal of machine_kexec_prepare() and the rename of default_machine_kexec_prepare() into machine_kexec_prepare() Signed-off-by: Christophe Leroy <[email protected]> Reviewed-by: Daniel Axtens <[email protected]> [mpe: Drop prototype for default_machine_kexec_prepare() as noted by dja] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/24d4ca0ada683c9436a5f812a7aeb0a1362afa2b.1630398606.git.christophe.leroy@csgroup.eu
2021-10-22powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERICChristophe Leroy1-0/+7
Commit e65e1fc2d24b ("[PATCH] syscall class hookup for all normal targets") added generic support for AUDIT but that didn't include support for bi-arch like powerpc. Commit 4b58841149dc ("audit: Add generic compat syscall support") added generic support for bi-arch. Convert powerpc to that bi-arch generic audit support. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/a4b3951d1191d4183d92a07a6097566bde60d00a.1629812058.git.christophe.leroy@csgroup.eu
2021-10-22powerpc/32: Don't use lmw/stmw for saving/restoring non volatile regsChristophe Leroy1-2/+2
Instructions lmw/stmw are interesting for functions that are rarely used and not in the cache, because only one instruction is to be copied into the instruction cache instead of 19. However those instruction are less performant than 19x raw lwz/stw as they require synchronisation plus one additional cycle. SAVE_NVGPRS / REST_NVGPRS are used in only a few places which are mostly in interrupts entries/exits and in task switch so they are likely already in the cache. Using standard lwz improves null_syscall selftest by: - 10 cycles on mpc832x. - 2 cycles on mpc8xx. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/316c543b8906712c108985c8463eec09c8db577b.1629732542.git.christophe.leroy@csgroup.eu
2021-10-22powerpc/s64: Clarify that radix lacks DEBUG_PAGEALLOCJoel Stanley3-0/+15
The page_alloc.c code will call into __kernel_map_pages() when DEBUG_PAGEALLOC is configured and enabled. As the implementation assumes hash, this should crash spectacularly if not for a bit of luck in __kernel_map_pages(). In this function linear_map_hash_count is always zero, the for loop exits without doing any damage. There are no other platforms that determine if they support debug_pagealloc at runtime. Instead of adding code to mm/page_alloc.c to do that, this change turns the map/unmap into a noop when in radix mode and prints a warning once. Signed-off-by: Joel Stanley <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> [mpe: Reformat if per Christophe's suggestion] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-20KVM: PPC: Replace zero-length array with flexible array memberLen Baker1-1/+1
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use "flexible array members" [1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Also, make use of the struct_size() helper in kzalloc(). [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Signed-off-by: Len Baker <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Gustavo A. R. Silva <[email protected]>
2021-10-15sched: Add wrapper for get_wchan() to keep task blockedKees Cook1-1/+1
Having a stable wchan means the process must be blocked and for it to stay that way while performing stack unwinding. Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Acked-by: Russell King (Oracle) <[email protected]> [arm] Tested-by: Mark Rutland <[email protected]> [arm64] Link: https://lkml.kernel.org/r/[email protected]
2021-10-13KVM: PPC: Book3S HV: H_ENTER filter out reserved HPTE[B] valueNicholas Piggin1-0/+4
The HPTE B field is a 2-bit field with values 0b10 and 0b11 reserved. This field is also taken from the HPTE and used when KVM executes TLBIEs to set the B field of those instructions. Disallow the guest setting B to a reserved value with H_ENTER by rejecting it. This is the same approach already taken for rejecting reserved (unsupported) LLP values. This prevents the guest from being able to induce the host to execute TLBIE with reserved values, which is not known to be a problem with current processors but in theory it could prevent the TLBIE from working correctly in a future processor. Signed-off-by: Nicholas Piggin <[email protected]> Reviewed-by: Fabiano Rosas <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-12powerpc/eeh: Use dev_driver_string() instead of struct pci_dev->driver->nameUwe Kleine-König1-5/+0
Replace pdev->driver->name by dev_driver_string() for the corresponding struct device. This is a step toward removing pci_dev->driver. Move the function nearer its only user and instead of the ?: operator use a normal "if" which is more readable. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2021-10-09powerpc/paravirt: correct preempt debug splat in vcpu_is_preempted()Nathan Lynch1-1/+17
vcpu_is_preempted() can be used outside of preempt-disabled critical sections, yielding warnings such as: BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185 caller is rwsem_spin_on_owner+0x1cc/0x2d0 CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33 Call Trace: [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable) [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160 [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0 [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0 [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0 [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0 [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70 [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0 [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250 The result of vcpu_is_preempted() is always used speculatively, and the function does not access per-cpu resources in a (Linux) preempt-unsafe way. Use raw_smp_processor_id() to avoid such warnings, adding explanatory comments. Fixes: ca3f969dcb11 ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()") Signed-off-by: Nathan Lynch <[email protected]> Reviewed-by: Srikar Dronamraju <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-09powerpc/paravirt: vcpu_is_preempted() commentaryNathan Lynch1-4/+18
Add comments more clearly documenting that this function determines whether hypervisor-level preemption of the VM has occurred. Signed-off-by: Nathan Lynch <[email protected]> Reviewed-by: Srikar Dronamraju <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-09powerpc/asm: Remove UPD_CONSTR after GCC 4.9 removalNick Desaulniers4-11/+9
UPD_CONSTR was previously a preprocessor define for an old GCC 4.9 inline asm bug with m<> constraints. Fixes: 6563139d90ad ("powerpc: remove GCC version check for UPD_CONSTR") Suggested-by: Nathan Chancellor <[email protected]> Suggested-by: Christophe Leroy <[email protected]> Suggested-by: Michael Ellerman <[email protected]> Signed-off-by: Nick Desaulniers <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07powerpc/32s: Fix kuap_kernel_restore()Christophe Leroy1-0/+8
At interrupt exit, kuap_kernel_restore() calls kuap_unlock() with the value contained in regs->kuap. However, when regs->kuap contains 0xffffffff it means that KUAP was not unlocked so calling kuap_unlock() is unrelevant and results in jeopardising the contents of kernel space segment registers. So check that regs->kuap doesn't contain KUAP_NONE before calling kuap_unlock(). In the meantime it also means that if KUAP has not been correcly locked back at interrupt exit, it must be locked before continuing. This is done by checking the content of current->thread.kuap which was returned by kuap_get_and_assert_locked() Fixes: 16132529cee5 ("powerpc/32s: Rework Kernel Userspace Access Protection") Reported-by: Stan Johnson <[email protected]> Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/0d0c4d0f050a637052287c09ba521bad960a2790.1631715131.git.christophe.leroy@csgroup.eu
2021-10-07powerpc/64s: Fix unrecoverable MCE calling async handler from NMINicholas Piggin1-3/+2
The machine check handler is not considered NMI on 64s. The early handler is the true NMI handler, and then it schedules the machine_check_exception handler to run when interrupts are enabled. This works fine except the case of an unrecoverable MCE, where the true NMI is taken when MSR[RI] is clear, it can not recover, so it calls machine_check_exception directly so something might be done about it. Calling an async handler from NMI context can result in irq state and other things getting corrupted. This can also trigger the BUG at arch/powerpc/include/asm/interrupt.h:168 BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE)); Fix this by making an _async version of the handler which is called in the normal case, and a NMI version that is called for unrecoverable interrupts. Fixes: 2b43dd7653cc ("powerpc/64: enable MSR[EE] in irq replay pt_regs") Signed-off-by: Nicholas Piggin <[email protected]> Tested-by: Cédric Le Goater <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUGNicholas Piggin1-5/+8
If a NMI hits early in an interrupt handler before the irq soft-mask state is reconciled, that can cause a false-positive BUG with a CONFIG_PPC_IRQ_SOFT_MASK_DEBUG assertion. Remove that assertion and instead check the case that if regs->msr has EE clear, then regs->softe should be marked as disabled so the irq state looks correct to NMI handlers, the same as how it's fixed up in the case it was implicit soft-masked. This doesn't fix a known problem -- the change that was fixed by commit 4ec5feec1ad02 ("powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled") was the addition of a warning in the soft-nmi watchdog interrupt which can never actually fire when MSR[EE]=0. However it may be important if NMI handlers grow more code, and it's less surprising to anything using 'regs' - (I tripped over this when working in the area). Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-07powerpc/security: Add a helper to query stf_barrier typeNaveen N. Rao1-0/+5
Add a helper to return the stf_barrier type for the current processor. Signed-off-by: Naveen N. Rao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com