aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-07-25Merge tag 'v5.8-rc6' into locking/core, to pick up fixesIngo Molnar2-2/+2
Signed-off-by: Ingo Molnar <[email protected]>
2020-07-23Merge branch 'scv' support into nextMichael Ellerman9-27/+374
From Nick's cover letter: Linux powerpc new system call instruction and ABI System Call Vectored (scv) ABI ============================== The scv instruction is introduced with POWER9 / ISA3, it comes with an rfscv counter-part. The benefit of these instructions is performance (trading slower SRR0/1 with faster LR/CTR registers, and entering the kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR updates. The scv instruction has 128 levels (not enough to cover the Linux system call space). Assignment and advertisement ---------------------------- The proposal is to assign scv levels conservatively, and advertise them with HWCAP feature bits as we add support for more. Linux has not enabled FSCR[SCV] yet, so executing the scv instruction will cause the kernel to log a "SCV facility unavilable" message, and deliver a SIGILL with ILL_ILLOPC to the process. Linux has defined a HWCAP2 bit PPC_FEATURE2_SCV for SCV support, but does not set it. This change allocates the zero level ('scv 0'), advertised with PPC_FEATURE2_SCV, which will be used to provide normal Linux system calls (equivalent to 'sc'). Attempting to execute scv with other levels will cause a SIGILL to be delivered the same as before, but will not log a "SCV facility unavailable" message (because the processor facility is enabled). Calling convention ------------------ The proposal is for scv 0 to provide the standard Linux system call ABI with the following differences from sc convention[1]: - LR is to be volatile across scv calls. This is necessary because the scv instruction clobbers LR. From previous discussion, this should be possible to deal with in GCC clobbers and CFI. - cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the kernel system call exit to avoid restoring the volatile cr registers (although we probably still would anyway to avoid information leaks). - Error handling: The consensus among kernel, glibc, and musl is to move to using negative return values in r3 rather than CR0[SO]=1 to indicate error, which matches most other architectures, and is closer to a function call. Notes ----- - r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as submitted currently preserves them. This is to leave room for deciding which way to go with these. Some small benefit was found by preserving them[1] but I'm not convinced it's worth deviating from the C function call ABI just for this. Release code should follow the ABI. Previous discussions: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.html https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html [1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html
2020-07-23powerpc/powernv: Machine check handler for POWER10Nicholas Piggin3-0/+95
Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-23powerpc: Select ARCH_HAS_MEMBARRIER_SYNC_CORENicholas Piggin1-0/+6
powerpc return from interrupt and return from system call sequences are context synchronising. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-23powerpc/64: Fix an out of date comment about MMIO orderingPalmer Dabbelt1-1/+1
This primitive has been renamed, but because it was spelled incorrectly in the first place it must have escaped the fixup patch. As far as I can tell this logic is still correct: smp_mb__after_spinlock() uses the default smp_mb() implementation, which is "sync" rather than "hwsync" but those are the same (though I'm not that familiar with PowerPC). Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-23powerpc: Add a ppc_inst_as_str() helperJordan Niethe2-14/+14
There are quite a few places where instructions are printed, this is done using a '%x' format specifier. With the introduction of prefixed instructions, this does not work well. Currently in these places, ppc_inst_val() is used for the value for %x so only the first word of prefixed instructions are printed. When the instructions are word instructions, only a single word should be printed. For prefixed instructions both the prefix and suffix should be printed. To accommodate both of these situations, instead of a '%x' specifier use '%s' and introduce a helper, __ppc_inst_as_str() which returns a char *. The char * __ppc_inst_as_str() returns is buffer that is passed to it by the caller. It is cumbersome to require every caller of __ppc_inst_as_str() to now declare a buffer. To make it more convenient to use __ppc_inst_as_str(), wrap it in a macro that uses a compound statement to allocate a buffer on the caller's stack before calling it. Signed-off-by: Jordan Niethe <[email protected]> Reviewed-by: Joel Stanley <[email protected]> Acked-by: Segher Boessenkool <[email protected]> [mpe: Drop 0x prefix to match most existings uses, especially xmon] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-22powerpc/64s: system call support for scv/rfscv instructionsNicholas Piggin9-24/+350
Add support for the scv instruction on POWER9 and later CPUs. For now this implements the zeroth scv vector 'scv 0', as identical to 'sc' system calls, with the exception that LR is not preserved, nor are volatile CR registers, and error is not indicated with CR0[SO], but by returning a negative errno. rfscv is implemented to return from scv type system calls. It can not be used to return from sc system calls because those are defined to preserve LR. getpid syscall throughput on POWER9 is improved by 26% (428 to 318 cycles), largely due to reducing mtmsr and mtspr. Signed-off-by: Nicholas Piggin <[email protected]> [mpe: Fix ppc64e build] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-22powerpc/64s/exception: treat NIA below __end_interrupts as soft-maskedNicholas Piggin1-3/+24
The scv instruction causes an interrupt which can enter the kernel with MSR[EE]=1, thus allowing interrupts to hit at any time. These must not be taken as normal interrupts, because they come from MSR[PR]=0 context, and yet the kernel stack is not yet set up and r13 is not set to the PACA). Treat this as a soft-masked interrupt regardless of the soft masked state. This does not affect behaviour yet, because currently all interrupts are taken with MSR[EE]=0. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-22powerpc/perf: Add Power10 PMU feature to DT CPU featuresMadhavan Srinivasan1-0/+26
Add Power10 feature function to DT CPU features, along with a Power10 specific init() to initialize PMU SPRs, sets the oprofile_cpu_type and cpu_features. This will enable performance monitoring unit (PMU) for Power10 in CPU features with "performance-monitor-power10". For Power ISA v3.1, BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB feature at boot for Power10. Signed-off-by: Madhavan Srinivasan <[email protected]> [mpe: Move MMCRA_BHRB_DISABLE as noted by jpn, drop CPU setup changes] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-22KVM: PPC: Book3S HV: Save/restore new PMU registersAthira Rajeev1-0/+3
Power ISA v3.1 has added new performance monitoring unit (PMU) special purpose registers (SPRs). They are: Monitor Mode Control Register 3 (MMCR3) Sampled Instruction Event Register A (SIER2) Sampled Instruction Event Register B (SIER3) Add support to save/restore these new SPRs while entering/exiting guest. Also include changes to support KVM_REG_PPC_MMCR3/SIER2/SIER3. Add new SPRs to KVM API documentation. Signed-off-by: Athira Rajeev <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-22powerpc/perf: Add support for ISA3.1 PMU SPRsMadhavan Srinivasan1-0/+8
PowerISA v3.1 includes new performance monitoring unit(PMU) special purpose registers (SPRs). They are Monitor Mode Control Register 3 (MMCR3) Sampled Instruction Event Register 2 (SIER2) Sampled Instruction Event Register 3 (SIER3) MMCR3 is added for further sampling related configuration control. SIER2/SIER3 are added to provide additional information about the sampled instruction. Patch adds new PPMU flag called "PPMU_ARCH_31" to support handling of these new SPRs, updates the struct thread_struct to include these new SPRs, include MMCR3 in struct mmcr_regs. This is needed to support programming of MMCR3 SPR during event_enable/disable. Patch also adds the sysfs support for the MMCR3 SPR along with SPRN_ macros for these new pmu SPRs. Signed-off-by: Madhavan Srinivasan <[email protected]> [mpe: Rename to PPMU_ARCH_31 as noted by jpn] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-22KVM: PPC: Book3S HV: Cleanup updates for kvm vcpu MMCRAthira Rajeev1-0/+2
Currently `kvm_vcpu_arch` stores all Monitor Mode Control registers in a flat array in order: mmcr0, mmcr1, mmcra, mmcr2, mmcrs Split this to give mmcra and mmcrs its own entries in vcpu and use a flat array for mmcr0 to mmcr2. This patch implements this cleanup to make code easier to read. Signed-off-by: Athira Rajeev <[email protected]> [mpe: Fix MMCRA/MMCR2 uapi breakage as noted by paulus] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-22powerpc/64s: Remove PROT_SAO supportNicholas Piggin1-1/+1
ISA v3.1 does not support the SAO storage control attribute required to implement PROT_SAO. PROT_SAO was used by specialised system software (Lx86) that has been discontinued for about 7 years, and is not thought to be used elsewhere, so removal should not cause problems. We rather remove it than keep support for older processors, because live migrating guest partitions to newer processors may not be possible if SAO is in use (or worse allowed with silent races). - PROT_SAO stays in the uapi header so code using it would still build. - arch_validate_prot() is removed, the generic version rejects PROT_SAO so applications would get a failure at mmap() time. Signed-off-by: Nicholas Piggin <[email protected]> [mpe: Drop KVM change for the time being] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-20powerpc/book3s64/kuap: Move UAMOR setup to key init functionAneesh Kumar K.V2-6/+22
UAMOR values are not application-specific. The kernel initializes its value based on different reserved keys. Remove the thread-specific UAMOR value and don't switch the UAMOR on context switch. Move UAMOR initialization to key initialization code and remove thread_struct.uamor because it is not used anymore. Before commit: 4a4a5e5d2aad ("powerpc/pkeys: key allocation/deallocation must not change pkey registers") we used to update uamor based on key allocation and free. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-20powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexecAneesh Kumar K.V1-14/+0
As we kexec across kernels that use AMR/IAMR for different purposes we need to ensure that new kernels get kexec'd with a reset value of AMR/IAMR. For ex: the new kernel can use key 0 for kernel mapping and the old AMR value prevents access to key 0. This patch also removes reset if IAMR and AMOR in kexec_sequence. Reset of AMOR is not needed and the IAMR reset is partial (it doesn't do the reset on secondary cpus) and is redundant with this patch. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-20powerpc/book3s64/pkeys: Add MMU_FTR_PKEYAneesh Kumar K.V1-0/+5
Parse storage keys related device tree entry in early_init_devtree and enable MMU feature MMU_FTR_PKEY if pkeys are supported. MMU feature is used instead of CPU feature because this enables us to group MMU_FTR_KUAP and MMU_FTR_PKEY in asm feature fixup code. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-20powerpc/book3s64/pkeys: kill cpu feature key CPU_FTR_PKEYAneesh Kumar K.V1-6/+0
We don't use CPU_FTR_PKEY anymore. Remove the feature bit and mark it free. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-20powerpc/mce: Add MCE notification chainSantosh Sivaraj1-0/+15
Introduce notification chain which lets us know about uncorrected memory errors(UE). This would help prospective users in pmem or nvdimm subsystem to track bad blocks for better handling of persistent memory allocations. Signed-off-by: Santosh Sivaraj <[email protected]> Signed-off-by: Ganesh Goudar <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-20powerpc/prom: Enable Radix GTSE in cpu pa-featuresNicholas Piggin1-1/+1
When '029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.")' made GTSE an MMU feature, it was enabled by default in powerpc-cpu-features but was missed in pa-features. This causes random memory corruption during boot of PowerNV kernels where CONFIG_PPC_DT_CPU_FTRS isn't enabled. Fixes: 029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.") Reported-by: Qian Cai <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Bharata B Rao <[email protected]> [mpe: Unwrap long line] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-19net: remove compat_sys_{get,set}sockoptChristoph Hellwig1-2/+2
Now that the ->compat_{get,set}sockopt proto_ops methods are gone there is no good reason left to keep the compat syscalls separate. This fixes the odd use of unsigned int for the compat_setsockopt optlen and the missing sock_use_custom_sol_socket. It would also easily allow running the eBPF hooks for the compat syscalls, but such a large change in behavior does not belong into a consolidation patch like this one. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-19powerpc: use the generic dma_ops_bypass modeChristoph Hellwig1-81/+9
Use the DMA API bypass mechanism for direct window mappings. This uses common code and speed up the direct mapping case by avoiding indirect calls just when not using dma ops at all. It also fixes a problem where the sync_* methods were using the bypass check for DMA allocations, but those are part of the streaming ops. Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which has never been well defined, as is only used by a few drivers, which IIRC never showed up in the typical Cell blade setups that are affected by the ordering workaround. Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB") Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]>
2020-07-18Merge branch 'fixes' into nextMichael Ellerman2-2/+2
Merge our fixes branch, primarily to bring in the ebb selftests build fix and the pkey fix, which is a dependency for some future work.
2020-07-16powerpc/pseries: Detect secure and trusted boot state of the system.Nayna Jain1-2/+16
The device-tree properties to check secure and trusted boot state are different for guests (pseries) compared to baremetal (powernv). This patch updates the existing is_ppc_secureboot_enabled() and is_ppc_trustedboot_enabled() functions to add support for pseries. For pseries the secureboot and trustedboot state are exposed via device-tree properties /ibm,secure-boot and /ibm,trusted-boot. The values of ibm,secure-boot under pseries are interpreted as: 0 - Disabled 1 - Enabled in Log-only mode. This patch interprets this value as disabled, since audit mode is currently not supported for Linux. 2 - Enabled and enforced. 3-9 - Enabled and enforcing; requirements are at the discretion of the operating system. The values of ibm,trusted-boot under pseries are interpreted as: 0 - Disabled 1 - Enabled Signed-off-by: Nayna Jain <[email protected]> Reviewed-by: Daniel Axtens <[email protected]> Reviewed-by: Mimi Zohar <[email protected]> [mpe: Drop machdep.h inclusion, tweak change log slightly] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/vdso: Fix vdso cpu truncationMilton Miller1-1/+1
The code in vdso_cpu_init that exposes the cpu and numa node to userspace via SPRG_VDSO incorrctly masks the cpu to 12 bits. This means that any kernel running on a box with more than 4096 threads (NR_CPUS advertises a limit of of 8192 cpus) would expose userspace to two cpu contexts running at the same time with the same cpu number. Note: I'm not aware of any distro shipping a kernel with support for more than 4096 threads today, nor of any system image that currently exceeds 4096 threads. Found via code browsing. Fixes: 18ad51dd342a7eb09dbcd059d0b451b616d4dafc ("powerpc: Add VDSO version of getcpu") Signed-off-by: Milton Miller <[email protected]> Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/fadump: fix race between pstore write and fadump crash triggerSourabh Jain1-0/+24
When we enter into fadump crash path via system reset we fail to update the pstore. On the system reset path we first update the pstore then we go for fadump crash. But the problem here is when all the CPUs try to get the pstore lock to initiate the pstore write, only one CPUs will acquire the lock and proceed with the pstore write. Since it in NMI context CPUs that fail to get lock do not wait for their turn to write to the pstore and simply proceed with the next operation which is fadump crash. One of the CPU who proceeded with fadump crash path triggers the crash and does not wait for the CPU who gets the pstore lock to complete the pstore update. Timeline diagram to depicts the sequence of events that leads to an unsuccessful pstore update when we hit fadump crash path via system reset. 1 2 3 ... n CPU Threads | | | | | | | | Reached to -->|--->|---->| ----------->| system reset | | | | path | | | | | | | | Try to -->|--->|---->|------------>| acquire the | | | | pstore lock | | | | | | | | | | | | Got the -->| +->| | |<-+ pstore lock | | | | | |--> Didn't get the | --------------------------+ lock and moving | | | | ahead on fadump | | | | crash path | | | | Begins the -->| | | | process to | | | |<-- Got the chance to update the | | | | trigger the crash pstore | -> | | ... <- | | | | | | | | | | | | |<-- Triggers the | | | | | | crash | | | | | | ^ | | | | | | | Writing to -->| | | | | | | pstore | | | | | | | | | | ^ |__________________| | | CPU Relax | | | +-----------------------------------------+ | v Race: crash triggered before pstore update completes To avoid this race condition a barrier is added on crash_fadump path, it prevents the CPU to trigger the crash until all the online CPUs completes their task. A barrier is added to make sure all the secondary CPUs hit the crash_fadump function before we initiates the crash. A timeout is kept to ensure the primary CPU (one who initiates the crash) do not wait for secondary CPUs indefinitely. Signed-off-by: Sourabh Jain <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/rtasd: simplify handle_rtas_event(), emit message on eventsNathan Lynch1-25/+3
prrn_is_enabled() always returns false/0, so handle_rtas_event() can be simplified and some dead code can be removed. Use machine_is() instead of #ifdef to run this code only on pseries, and add an informational ratelimited message that we are ignoring the events. PRRN events are relatively rare in normal operation and usually arise from operator-initiated actions such as a DPO (Dynamic Platform Optimizer) run. Eventually we do want to consume these events and update the device tree, but that needs more care to be safe vs LPM and DLPAR. Signed-off-by: Nathan Lynch <[email protected]> Reviewed-by: Srikar Dronamraju <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/rtas: don't online CPUs for partition suspendNathan Lynch1-120/+2
Partition suspension, used for hibernation and migration, requires that the OS place all but one of the LPAR's processor threads into one of two states prior to calling the ibm,suspend-me RTAS function: * the architected offline state (via RTAS stop-self); or * the H_JOIN hcall, which does not return until the partition resumes execution Using H_CEDE as the offline mode, introduced by commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state"), means that any threads which are offline from Linux's point of view must be moved to one of those two states before a partition suspension can proceed. This was eventually addressed in commit 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation"), which added code to temporarily bring up any offline processor threads so they can call H_JOIN. Conceptually this is fine, but the implementation has had multiple races with cpu hotplug operations initiated from user space[1][2][3], the error handling is fragile, and it generates user-visible cpu hotplug events which is a lot of noise for a platform feature that's supposed to minimize disruption to workloads. With commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state") reverted, this code becomes unnecessary, so remove it. Since any offline CPUs now are truly offline from the platform's point of view, it is no longer necessary to bring up CPUs only to have them call H_JOIN and then go offline again upon resuming. Only active threads are required to call H_JOIN; stopped threads can be left alone. [1] commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and serialization during LPM") [2] commit 9fb603050ffd ("powerpc/rtas: retry when cpu offline races with suspend/migration") [3] commit dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration") Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation") Signed-off-by: Nathan Lynch <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/security: Allow for processors that flush the link stack using the ↵Nicholas Piggin1-8/+19
special bcctr If both count cache and link stack are to be flushed, and can be flushed with the special bcctr, patch that in directly to the flush/branch nop site. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/64s: Move branch cache flushing bcctr variant to ppc-ops.hNicholas Piggin1-4/+2
Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/security: split branch cache flush toggle from code patchingNicholas Piggin1-43/+51
Branch cache flushing code patching has inter-dependencies on both the link stack and the count cache flushing state. To make the code clearer and to separate the link stack and count cache handling, split the "toggle" (setting up variables and printing enable/disable) from the code patching. Signed-off-by: Nicholas Piggin <[email protected]> [mpe: Always print something, even if the flush is disabled] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/security: make display of branch cache flush more consistentNicholas Piggin1-4/+4
Make the count-cache and link-stack messages look the same Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/security: change link stack flush state to the flush type enumNicholas Piggin1-5/+5
Prepare to allow for hardware link stack flushing by using the none/sw/hw type, same as the count cache state. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/security: re-name count cache flush to branch cache flushNicholas Piggin2-22/+21
The count cache flush mostly refers to both count cache and link stack flushing. As a first step to untangling these a bit, re-name the bits that apply to both. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc: re-initialise lazy FPU/VEC counters on every faultNicholas Piggin2-6/+2
When a FP/VEC/VSX unavailable fault loads registers and enables the facility in the MSR, re-set the lazy restore counters to 1 rather than incrementing them so every fault gets the same number of restores before the next fault. This probably shouldn't be a practical change because if a lazy counter was non-zero then it should have been restored and would not cause a fault when userspace tries to access it. However the code and comment implies otherwise so that's misleading and unnecessary. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/64s: Fix restore_math unnecessarily changing MSRNicholas Piggin2-40/+68
Before returning to user, if there are missing FP/VEC/VSX bits from the user MSR then those registers had been saved and must be restored again before use. restore_math will decide whether to restore immediately, or skip the restore and let fp/vec/vsx unavailable faults demand load the registers. Each time restore_math restores one of the FP/VSX or VEC register sets is loaded, an 8-bit counter is incremented (load_fp and load_vec). When these wrap to zero, restore_math no longer restores that register set until after they are next demand faulted. It's quite usual for those counters to have different values, so if one wraps to zero and restore_math no longer restores its registers or user MSR bit but the other is not zero yet does not need to be restored (because the kernel is not frequently using the FPU), then restore_math will be called and it will also not return in the early exit check. This causes msr_check_and_set to test and set the MSR at every kernel exit despite having no work to do. This can cause workloads (e.g., a NULL syscall microbenchmark) to run fast for a time while both counters are non-zero, then slow down when one of the counters reaches zero, then speed up again after the second counter reaches zero. The cost is significant, about 10% slowdown on a NULL syscall benchmark, and the jittery behaviour is very undesirable. Fix this by having restore_math test all conditions first, and only update MSR if we will be loading registers. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/64s: restore_math remove TM testNicholas Piggin1-2/+1
The TM test in restore_math added by commit dc16b553c949e ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") is no longer necessary after commit a8318c13e79ba ("powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts"), which removed the cases where restore_math has to restore if TM is active. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-16powerpc/mm: Enable radix GTSE only if supported.Bharata B Rao2-5/+9
Make GTSE an MMU feature and enable it by default for radix. However for guest, conditionally enable it if hypervisor supports it via OV5 vector. Let prom_init ask for radix GTSE only if the support exists. Having GTSE as an MMU feature will make it easy to enable radix without GTSE. Currently radix assumes GTSE is enabled by default. Signed-off-by: Bharata B Rao <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-15powerpc/vdso64: Switch from __get_datapage() to get_datapage inline macroChristophe Leroy3-34/+12
On the same way as already done on PPC32, drop __get_datapage() function and use get_datapage inline macro instead. See commit ec0895f08f99 ("powerpc/vdso32: inline __get_datapage()") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/e13d95312e0b9792556b19b4bb8955cc1ff19fc7.1588079622.git.christophe.leroy@c-s.fr
2020-07-15powerpc/signal64: Don't opencode page prefaultingChristophe Leroy1-4/+3
Instead of doing a __get_user() from the first and last location into a tmp var which won't be used, use fault_in_pages_readable() Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/810bd8840ef990a200f58c9dea9abe767ca02a3a.1594146723.git.christophe.leroy@csgroup.eu
2020-07-15powerpc/signal_32: Simplify loop in PPC64 save_general_regs()Christophe Leroy1-10/+8
save_general_regs() which does special handling when i == PT_SOFTE. Rewrite it to minimise the specific part, especially the __put_user() and associated error handling is the same so make it common. Signed-off-by: Christophe Leroy <[email protected]> [mpe: Use a regular if rather than ternary operator] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/47a38df46cae5a5a88a558a64d71f75e9c4d9950.1594125164.git.christophe.leroy@csgroup.eu
2020-07-15powerpc/signal_32: Remove !FULL_REGS() special handling in PPC64 ↵Christophe Leroy1-2/+0
save_general_regs() Since commit ("1bd79336a426 powerpc: Fix various syscall/signal/swapcontext bugs"), getting save_general_regs() called without FULL_REGS() is very unlikely and generates a warning. The 32-bit version of save_general_regs() doesn't take care of it at all and copies all registers anyway since that commit. Moreover, commit 965dd3ad3076 ("powerpc/64/syscall: Remove non-volatile GPR save optimisation") is another reason why it would never happen. So the same with 64-bit, don't worry about FULL_REGS() and copy all registers all the time. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/173de3b659fa3a5f126a0eb170522cccd909950f.1594125164.git.christophe.leroy@csgroup.eu
2020-07-15powerpc/64/signal: Balance return predictor stack in signal trampolineNicholas Piggin2-18/+17
Returning from an interrupt or syscall to a signal handler currently begins execution directly at the handler's entry point, with LR set to the address of the sigreturn trampoline. When the signal handler function returns, it runs the trampoline. It looks like this: # interrupt at user address xyz # kernel stuff... signal is raised rfid # void handler(int sig) addis 2,12,.TOC.-.LCF0@ha addi 2,2,.TOC.-.LCF0@l mflr 0 std 0,16(1) stdu 1,-96(1) # handler stuff ld 0,16(1) mtlr 0 blr # __kernel_sigtramp_rt64 addi r1,r1,__SIGNAL_FRAMESIZE li r0,__NR_rt_sigreturn sc # kernel executes rt_sigreturn rfid # back to user address xyz Note the blr with no matching bl. This can corrupt the return predictor. Solve this by instead resuming execution at the signal trampoline which then calls the signal handler. qtrace-tools link_stack checker confirms the entire user/kernel/vdso cycle is balanced after this patch, whereas it's not upstream. Alan confirms the dwarf unwind info still looks good. gdb still recognises the signal frame and can step into parent frames if it break inside a signal handler. Performance is pretty noisy, not a very significant change on a POWER9 here, but branch misses are consistently a lot lower on a microbenchmark: Performance counter stats for './signal': 13,085.72 msec task-clock # 1.000 CPUs utilized 45,024,760,101 cycles # 3.441 GHz 65,102,895,542 instructions # 1.45 insn per cycle 11,271,673,787 branches # 861.372 M/sec 59,468,979 branch-misses # 0.53% of all branches 12,989.09 msec task-clock # 1.000 CPUs utilized 44,692,719,559 cycles # 3.441 GHz 65,109,984,964 instructions # 1.46 insn per cycle 11,282,136,057 branches # 868.585 M/sec 39,786,942 branch-misses # 0.35% of all branches Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-15powerpc/cacheinfo: Add per cpu per index shared_cpu_listSrikar Dronamraju1-1/+10
Unlike drivers/base/cacheinfo, powerpc cacheinfo code is not exposing shared_cpu_list under /sys/devices/system/cpu/cpu<n>/cache/index<m> Add shared_cpu_list to per cpu per index directory to maintain parity with x86. Some scripts (example: mmtests https://github.com/gormanm/mmtests) seem to be looking for shared_cpu_list instead of shared_cpu_map. Before this patch: # ls /sys/devices/system/cpu0/cache/index1 coherency_line_size number_of_sets size ways_of_associativity level shared_cpu_map type # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map 00ff # After this patch: # ls /sys/devices/system/cpu0/cache/index1 coherency_line_size number_of_sets shared_cpu_map type level shared_cpu_list size ways_of_associativity # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map 00ff # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_list 0-7 # Signed-off-by: Srikar Dronamraju <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-15powerpc/cacheinfo: Make cpumap_show code reusableSrikar Dronamraju1-2/+8
In anticipation of implementing shared_cpu_list, move code under shared_cpu_map_show() to a common function. No functional changes. Signed-off-by: Srikar Dronamraju <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-15powerpc/cacheinfo: Use cpumap_print to print cpumapSrikar Dronamraju1-6/+2
Tejun Heo had modified shared_cpu_map_show() to use scnprintf instead of cpumap_print during support for *pb[l] format. Refer commit 0c118b7bd09a ("powerpc: use %*pb[l] to print bitmaps including cpumasks and nodemasks"). cpumap_print_to_pagebuf() is a standard function to print cpumap. With commit 9cf79d115f0d ("bitmap: remove explicit newline handling using scnprintf format string"), there is no need to print explicit newline and trailing null character. cpumap_print_to_pagebuf() internally uses scnprintf(). Hence replace scnprintf() with cpumap_print_to_pagebuf(). Note: shared_cpu_map_show() in drivers/base/cacheinfo.c already uses cpumap_print_to_pagebuf(). Before this patch: # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map 00ff # (Notice the extra blank line). After this patch: # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map 00ff # Signed-off-by: Srikar Dronamraju <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-14powerpc/pseries/svm: Fix incorrect check for shared_lppaca_sizeSatheesh Rajendran1-1/+1
Early secure guest boot hits the below crash while booting with vcpus numbers aligned with page boundary for PAGE size of 64k and LPPACA size of 1k i.e 64, 128 etc. Partition configured for 64 cpus. CPU maps initialized for 1 thread per core ------------[ cut here ]------------ kernel BUG at arch/powerpc/kernel/paca.c:89! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries This is due to the BUG_ON() for shared_lppaca_total_size equal to shared_lppaca_size. Instead the code should only BUG_ON() if we have exceeded the total_size, which indicates we've overflowed the array. Fixes: bd104e6db6f0 ("powerpc/pseries/svm: Use shared memory for LPPACA structures") Cc: [email protected] # v5.4+ Signed-off-by: Satheesh Rajendran <[email protected]> Reviewed-by: Laurent Dufour <[email protected]> Reviewed-by: Thiago Jung Bauermann <[email protected]> [mpe: Reword change log to clarify we're fixing not removing the check] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-10powerpc64: Break asm/percpu.h vs spinlock_types.h dependencyPeter Zijlstra1-0/+2
In order to use <asm/percpu.h> in lockdep.h, we need to make sure asm/percpu.h does not itself depend on lockdep. The below seems to make that so and builds powerpc64-defconfig + PROVE_LOCKING. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> https://lkml.kernel.org/r/[email protected]
2020-07-08powerpc/64s/exception: Fix 0x1500 interrupt handler crashNicholas Piggin1-1/+1
A typo caused the interrupt handler to branch immediately to the common "unknown interrupt" handler and skip the special case test for denormal cause. This does not affect KVM softpatch handling (e.g., for POWER9 TM assist) because the KVM test was moved to common code by commit 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code") just before this bug was introduced. Fixes: 3f7fbd97d07d ("powerpc/64s/exception: Clean up SRR specifiers") Reported-by: Paul Menzel <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Tested-by: Paul Menzel <[email protected]> [mpe: Split selftest into a separate patch] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-07-07PCI: Use 'pci_channel_state_t' instead of 'enum pci_channel_state'Luc Van Oostenryck1-1/+1
The method struct pci_error_handlers.error_detected() is defined and documented as taking an 'enum pci_channel_state' for the second argument, but most drivers use 'pci_channel_state_t' instead. This 'pci_channel_state_t' is not a typedef for the enum but a typedef for a bitwise type in order to have better/stricter typechecking. Consolidate everything by using 'pci_channel_state_t' in the method's definition, in the related helpers and in the drivers. Enforce use of 'pci_channel_state_t' by replacing 'enum pci_channel_state' with an anonymous 'enum'. Note: Currently, from a typechecking point of view this patch changes nothing because only the constants defined by the enum are bitwise, not the enum itself (sparse doesn't have the notion of 'bitwise enum'). This may change in some not too far future, hence the patch. [bhelgaas: squash in https://lore.kernel.org/r/[email protected] https://lore.kernel.org/r/[email protected]] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Luc Van Oostenryck <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2020-07-07kbuild: remove cc-option test of -fno-stack-protectorMasahiro Yamada1-1/+1
Some Makefiles already pass -fno-stack-protector unconditionally. For example, arch/arm64/kernel/vdso/Makefile, arch/x86/xen/Makefile. No problem report so far about hard-coding this option. So, we can assume all supported compilers know -fno-stack-protector. GCC 4.8 and Clang support this option (https://godbolt.org/z/_HDGzN) Get rid of cc-option from -fno-stack-protector. Remove CONFIG_CC_HAS_STACKPROTECTOR_NONE, which is always 'y'. Note: arch/mips/vdso/Makefile adds -fno-stack-protector twice, first unconditionally, and second conditionally. I removed the second one. Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]>