Age  Commit message  Author  Files  Lines
2018-03-27  powerpc: Update xmon to use ppc_breakpoint_available()  [Michael Neuling]  1 file, -0/+4
The 'bd' command will now print an error and not set the breakpoint on P9. Signed-off-by: Michael Neuling <[email protected]> [mpe: Unsplit quoted string] Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc: Update ptrace to use ppc_breakpoint_available()  [Michael Neuling]  2 files, -2/+17
This updates the ptrace code to use ppc_breakpoint_available(). We now advertise zero breakpoints via PPC_PTRACE_GETHWDBGINFO when the DAWR is missing (i.e. POWER9). This results in GDB falling back to software emulation of the breakpoint (which is slow). For the features advertised by PPC_PTRACE_GETHWDBGINFO, we keep advertising DAWR because, if we don't, GDB assumes 1 breakpoint irrespective of the number of breakpoints advertised, and then fails later when trying to set this one breakpoint. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc: Add ppc_breakpoint_available()  [Michael Neuling]  2 files, -0/+13
Add ppc_breakpoint_available() to determine if a breakpoint is available currently via the DAWR or DABR. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
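A minimal sketch of what this predicate can look like, using the existing CPU_FTR_DAWR and CPU_FTR_ARCH_207S cputable bits (the logic here is illustrative rather than the verbatim patch):

    #include <asm/cputable.h>

    /*
     * A hardware breakpoint is available if we have a DAWR (POWER8), or
     * on pre-ISA-2.07 CPUs that still have a DABR. On POWER9 the DAWR
     * is unusable, so report that no breakpoint can be set.
     */
    bool ppc_breakpoint_available(void)
    {
            if (cpu_has_feature(CPU_FTR_DAWR))
                    return true;    /* POWER8 DAWR */
            if (cpu_has_feature(CPU_FTR_ARCH_207S))
                    return false;   /* POWER9: DAWR unavailable, no DABR */
            return true;            /* everything else has the DABR */
    }

The two commits above then gate xmon's 'bd' command and the ptrace PPC_PTRACE_GETHWDBGINFO reply on this returning true.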
2018-03-27  powerpc/eeh: Add eeh_state_active() helper  [Sam Bobroff]  3 files, -20/+14
Checking for a "fully active" device state requires testing two flag bits, which is open coded in several places, so add a function to do it. Signed-off-by: Sam Bobroff <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
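Since the check is just two flag bits, the helper is a one-liner; a sketch of its likely shape, assuming the existing EEH_STATE_* flag names:

    /* A PE is fully active only when both MMIO and DMA are enabled. */
    static inline bool eeh_state_active(int state)
    {
            return (state & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE))
                    == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
    }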
2018-03-27  powerpc/eeh: Factor out common code from eeh_reset_device()  [Sam Bobroff]  1 file, -22/+10
The caller will always pass NULL for 'rmv_data' when 'eeh_aware_driver' is true, so the first two calls to eeh_pe_dev_traverse() can be combined without changing behaviour, as can the two arms of the final 'if' block. This should not change behaviour. Signed-off-by: Sam Bobroff <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/eeh: Remove always-true tests in eeh_reset_device()  [Sam Bobroff]  1 file, -2/+2
eeh_reset_device() tests the value of 'bus' more than once, but the only caller, eeh_handle_normal_event(), does this test itself and will never pass NULL. So, remove the dead tests. This should not change behaviour. Signed-off-by: Sam Bobroff <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/eeh: Clarify arguments to eeh_reset_device()  [Sam Bobroff]  1 file, -9/+11
It is currently difficult to understand the behaviour of eeh_reset_device() due to the way its parameters are used. In particular, when 'bus' is NULL, its value is still necessary, so the same value is looked up again locally under a different name ('frozen_bus'), but behaviour is changed. To clarify this, add a new parameter 'driver_eeh_aware', have the caller set it when it would have passed NULL for 'bus', and always pass a value for 'bus'. Then change any test that was on 'bus' to one on '!driver_eeh_aware' and replace uses of 'frozen_bus' with 'bus'. Also update the function's comment. This should not change behaviour. Signed-off-by: Sam Bobroff <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/eeh: Rename frozen_bus to bus in eeh_handle_normal_event()  [Sam Bobroff]  1 file, -5/+5
The name "frozen_bus" is misleading: it's not necessarily frozen, it's just the PE's PCI bus. Signed-off-by: Sam Bobroff <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/eeh: Remove misleading test in eeh_handle_normal_event()  [Sam Bobroff]  1 file, -13/+11
Remove a test that checks if "frozen_bus" is NULL, because it cannot have changed since it was tested at the start of the function and so must be true here. Signed-off-by: Sam Bobroff <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/eeh: Fix misleading comment in __eeh_addr_cache_get_device()  [Sam Bobroff]  1 file, -2/+1
Commit 0ba178888b05 ("powerpc/eeh: Remove reference to PCI device") removed a call to pci_dev_get() from __eeh_addr_cache_get_device() but did not update the comment to match. Signed-off-by: Sam Bobroff <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/eeh: Manage EEH_PE_RECOVERING inside eeh_handle_normal_event()  [Sam Bobroff]  3 files, -21/+12
Currently the EEH_PE_RECOVERING flag for a PE is managed by both the caller and callee of eeh_handle_normal_event() (among other places not considered here). This is complicated by the fact that the PE may or may not have been invalidated by the call. So move the callee's handling into eeh_handle_normal_event(), which clarifies it and allows the return type to be changed to void (because it no longer needs to indicate that the PE has been invalidated). This should not change behaviour except in eeh_event_handler(), where it was previously possible to cause eeh_pe_state_clear() to be called on an invalid PE, which is now avoided. Signed-off-by: Sam Bobroff <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/eeh: Remove eeh_handle_event()  [Sam Bobroff]  3 files, -30/+19
The function eeh_handle_event(pe) does nothing other than switching between calling eeh_handle_normal_event(pe) and eeh_handle_special_event(). However it is only called in two places, one where pe can't be NULL and the other where it must be NULL (see eeh_event_handler()) so it does nothing but obscure the flow of control. So, remove it. Signed-off-by: Sam Bobroff <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/powernv/npu: Do not try invalidating 32bit table when 64bit table is enabled  [Alexey Kardashevskiy]  1 file, -3/+24
GPUs and the corresponding NVLink bridges get different PEs as they have separate translation validation entries (TVEs). We put these PEs in the same IOMMU group so they cannot be passed through separately. So the iommu_table_group_ops::set_window/unset_window for GPUs do set tables on the NPU PEs as well, which means that iommu_table's list of attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked. This list is used for TCE cache invalidation. The problem is that an NPU PE has just a single TVE and can be programmed to point to a 32bit or 64bit window, while a GPU PE has two (as any other PCI device). So we end up having a 32bit iommu_table struct linked to both PEs even though only the 64bit TCE table cache can be invalidated on the NPU, and a relatively recent skiboot detects this and prints errors. This changes the GPU's iommu_table_group_ops::set_window/unset_window to make sure that the NPU PE is only linked to the table actually used by the hardware. If there are two tables used by an IOMMU group, the NPU PE will use the last programmed one, which with the current use scenarios is expected to be a 64bit one. Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/mm: Fix typo in comments  [Alexey Kardashevskiy]  1 file, -7/+7
Fixes: 912cc87a6 ("powerpc/mm/radix: Add LPID based tlb flush helpers") Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/lpar/debug: Initialize flags before printing debug message  [Alexey Kardashevskiy]  1 file, -3/+3
With DEBUG enabled, there is a compile error: "error: ‘flags’ is used uninitialized in this function". This moves the pr_devel() a little further down, to after @flags is initialized. Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/init: Do not advertise radix during client-architecture-support  [Alexey Kardashevskiy]  1 file, -1/+2
Currently the pseries kernel advertises radix MMU support even if the actual support is disabled via the CONFIG_PPC_RADIX_MMU option. This adds a check for CONFIG_PPC_RADIX_MMU to avoid advertising radix to the hypervisor. Suggested-by: Paul Mackerras <[email protected]> Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/mm: Fix section mismatch warning in stop_machine_change_mapping()  [Mauricio Faria de Oliveira]  3 files, -10/+10
Fix the warning messages for stop_machine_change_mapping(), and a number of other affected functions in its call chain. All modified functions are under CONFIG_MEMORY_HOTPLUG, so __meminit is okay (keeps them / does not discard them). Boot-tested on powernv/power9/radix-mmu and pseries/power8/hash-mmu.

    $ make -j$(nproc) CONFIG_DEBUG_SECTION_MISMATCH=y vmlinux
    ...
      MODPOST vmlinux.o
    WARNING: vmlinux.o(.text+0x6b130): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
    The function stop_machine_change_mapping() references the function __meminit create_physical_mapping(). This is often because stop_machine_change_mapping lacks a __meminit annotation or the annotation of create_physical_mapping is wrong.
    WARNING: vmlinux.o(.text+0x6b13c): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
    The function stop_machine_change_mapping() references the function __meminit create_physical_mapping(). This is often because stop_machine_change_mapping lacks a __meminit annotation or the annotation of create_physical_mapping is wrong.
    ...

Signed-off-by: Mauricio Faria de Oliveira <[email protected]> Acked-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/64s: Wire up cpu_show_spectre_v2()  [Michael Ellerman]  1 file, -0/+33
Add a definition for cpu_show_spectre_v2() to override the generic version. This has several permutations; though in practice some may not occur, we cater for any combination. The most verbose is: "Mitigation: Indirect branch serialisation (kernel only), Indirect branch cache disabled, ori31 speculation barrier enabled". We don't treat the ori31 speculation barrier as a mitigation on its own, because it has to be *used* by code in order to be a mitigation and we don't know if userspace is doing that. So if that's all we see we say: "Vulnerable, ori31 speculation barrier enabled". Signed-off-by: Michael Ellerman <[email protected]>
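A condensed sketch of how such a show function can assemble those permutations from the security feature flags introduced by the "Add security feature flags for Spectre/Meltdown" patch below (SEC_FTR_* names and seq_buf usage as in the powerpc security code; treat the details as illustrative):

    #include <linux/seq_buf.h>
    #include <asm/security_features.h>

    ssize_t cpu_show_spectre_v2(struct device *dev,
                                struct device_attribute *attr, char *buf)
    {
            struct seq_buf s;
            bool bcs, ccd;

            seq_buf_init(&s, buf, PAGE_SIZE - 1);

            bcs = security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED);
            ccd = security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED);

            if (bcs || ccd) {
                    seq_buf_printf(&s, "Mitigation: ");
                    if (bcs)
                            seq_buf_printf(&s, "Indirect branch serialisation (kernel only)");
                    if (bcs && ccd)
                            seq_buf_printf(&s, ", ");
                    if (ccd)
                            seq_buf_printf(&s, "Indirect branch cache disabled");
            } else {
                    seq_buf_printf(&s, "Vulnerable");
            }

            /* Reported, but not counted as a mitigation on its own. */
            if (security_ftr_enabled(SEC_FTR_SPEC_BAR_ORI31))
                    seq_buf_printf(&s, ", ori31 speculation barrier enabled");

            seq_buf_printf(&s, "\n");
            return s.len;
    }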
2018-03-27  powerpc/64s: Wire up cpu_show_spectre_v1()  [Michael Ellerman]  1 file, -0/+8
Add a definition for cpu_show_spectre_v1() to override the generic version. Currently this just prints "Not affected" or "Vulnerable" based on the firmware flag. Although the kernel does have array_index_nospec() in a few places, we haven't yet audited all the powerpc code to see where it's necessary, so for now we don't list that as a mitigation. Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/pseries: Use the security flags in pseries_setup_rfi_flush()  [Michael Ellerman]  1 file, -15/+12
Now that we have the security flags we can simplify the code in pseries_setup_rfi_flush() because the security flags have pessimistic defaults. Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/powernv: Use the security flags in pnv_setup_rfi_flush()  [Michael Ellerman]  1 file, -31/+10
Now that we have the security flags we can significantly simplify the code in pnv_setup_rfi_flush(), because we can use the flags instead of checking device tree properties and because the security flags have pessimistic defaults. Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/64s: Enhance the information in cpu_show_meltdown()  [Michael Ellerman]  2 files, -2/+29
Now that we have the security feature flags we can make the information displayed in the "meltdown" file more informative. Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/64s: Move cpu_show_meltdown()  [Michael Ellerman]  2 files, -8/+11
This landed in setup_64.c for no good reason other than we had nowhere else to put it. Now that we have a security-related file, that is a better place for it so move it. Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/powernv: Set or clear security feature flags  [Michael Ellerman]  1 file, -0/+56
Now that we have feature flags for security related things, set or clear them based on what we see in the device tree provided by firmware. Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/pseries: Set or clear security feature flags  [Michael Ellerman]  1 file, -0/+43
Now that we have feature flags for security related things, set or clear them based on what we receive from the hypercall. Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc: Add security feature flags for Spectre/Meltdown  [Michael Ellerman]  3 files, -1/+81
This commit adds security feature flags to reflect the settings we receive from firmware regarding Spectre/Meltdown mitigations. The feature names reflect the names we are given by firmware on bare metal machines. See the hostboot source for details. Arguably these could be firmware features, but that would require them to be read early in boot so they're available prior to asm feature patching, even though we don't actually want to use them for patching. We may also want to dynamically update them in future, which would be incompatible with the way firmware features work (at the moment at least). So for now just make them separate flags. Signed-off-by: Michael Ellerman <[email protected]>
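The mechanism is a single global bitmask plus trivial set/clear/test helpers; a sketch of the shape (the helper names match how the patches above use them, but the bit assignments shown are illustrative):

    /* arch/powerpc/include/asm/security_features.h (abridged sketch) */
    extern unsigned long powerpc_security_features;

    static inline void security_ftr_set(unsigned long feature)
    {
            powerpc_security_features |= feature;
    }

    static inline void security_ftr_clear(unsigned long feature)
    {
            powerpc_security_features &= ~feature;
    }

    static inline bool security_ftr_enabled(unsigned long feature)
    {
            return !!(powerpc_security_features & feature);
    }

    /* One bit per feature, e.g. (values illustrative): */
    #define SEC_FTR_L1D_FLUSH_HV            0x0000000000000002ull
    #define SEC_FTR_SPEC_BAR_ORI31          0x0000000000000010ull
    #define SEC_FTR_BCCTRL_SERIALISED       0x0000000000000020ull

The "pessimistic defaults" referred to by the pseries/powernv patches above come from initialising powerpc_security_features to the vulnerable set and clearing bits only when firmware says a mitigation is not needed.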
2018-03-27  powerpc/pseries: Add new H_GET_CPU_CHARACTERISTICS flags  [Michael Ellerman]  1 file, -0/+3
Add some additional values which have been defined for the H_GET_CPU_CHARACTERISTICS hypercall. Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/rfi-flush: Call setup_rfi_flush() after LPM migration  [Michael Ellerman]  3 files, -1/+6
We might have migrated to a machine that uses a different flush type, or doesn't need flushing at all. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Mauricio Faria de Oliveira <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/rfi-flush: Differentiate enabled and patched flush types  [Mauricio Faria de Oliveira]  2 files, -4/+11
Currently the rfi-flush messages print 'Using <type> flush' for all enabled_flush_types, but that is not necessarily true -- as now the fallback flush is always enabled on pseries, but the fixup function overwrites its nop/branch slot with other flush types, if available. So, replace the 'Using <type> flush' messages with '<type> flush is available'. Also, print the patched flush types in the fixup function, so users can know what is (not) being used (e.g., the slower, fallback flush, or no flush type at all if flush is disabled via the debugfs switch). Suggested-by: Michael Ellerman <[email protected]> Signed-off-by: Mauricio Faria de Oliveira <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/rfi-flush: Always enable fallback flush on pseries  [Michael Ellerman]  1 file, -9/+1
This ensures the fallback flush area is always allocated on pseries, so that in case an LPAR is migrated from a patched to an unpatched system, it is possible to enable the fallback flush in the target system. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Mauricio Faria de Oliveira <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again  [Michael Ellerman]  2 files, -2/+6
For PowerVM migration we want to be able to call setup_rfi_flush() again after we've migrated the partition. To support that we need to check that we're not trying to allocate the fallback flush area after memblock has gone away (i.e., boot-time only). Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Mauricio Faria de Oliveira <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/rfi-flush: Move the logic to avoid a redo into the debugfs code  [Michael Ellerman]  1 file, -5/+8
rfi_flush_enable() includes a check to see if we're already enabled (or disabled), and in that case does nothing. But that means calling setup_rfi_flush() a 2nd time doesn't actually work, which is a bit confusing. Move that check into the debugfs code, where it really belongs. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Mauricio Faria de Oliveira <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/perf: Add blacklisted events for Power9 DD2.2  [Madhavan Srinivasan]  2 files, -0/+37
These events either do not count, or do not count correctly, so to prevent user confusion, block counting them at all. Signed-off-by: Madhavan Srinivasan <[email protected]> [mpe: Change log] Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/perf: Add blacklisted events for Power9 DD2.1  [Madhavan Srinivasan]  2 files, -0/+39
These events either do not count, or do not count correctly, so to prevent user confusion, block counting them at all. Signed-off-by: Madhavan Srinivasan <[email protected]> [mpe: Change log] Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/perf: Infrastructure to support addition of blacklisted events  [Madhavan Srinivasan]  2 files, -0/+23
Introduce code to support the addition of blacklisted events for a processor version. Blacklisted events are events that are known to not count correctly on that CPU revision, and so should be prevented from being counted so as to avoid user confusion. A pointer and an 'int' variable to hold the number of events are added to 'struct power_pmu', along with a generic function to loop through the list to validate a given event. The generic function is_event_blacklisted() is called in power_pmu_event_init() to detect and reject blacklisted events early. Signed-off-by: Madhavan Srinivasan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
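A sketch of the two pieces described: the new fields in struct power_pmu and the linear scan that rejects a blacklisted raw event early (names follow the commit message; details are illustrative):

    /* Additions to struct power_pmu (abridged): */
    struct power_pmu {
            /* ... existing fields ... */
            int     *blacklist_ev;          /* events that must not be counted */
            int     n_blacklist_ev;         /* number of entries above */
    };

    static bool is_event_blacklisted(u64 ev)
    {
            int i;

            for (i = 0; i < ppmu->n_blacklist_ev; i++) {
                    if (ppmu->blacklist_ev[i] == ev)
                            return true;
            }
            return false;
    }

    /* In power_pmu_event_init(), before accepting the event: */
    if (ppmu->blacklist_ev && is_event_blacklisted(ev))
            return -EINVAL;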
2018-03-27  powerpc/perf: Prevent kernel address leak via perf_get_data_addr()  [Madhavan Srinivasan]  1 file, -0/+4
The Sampled Data Address Register (SDAR) is a 64-bit register that contains the effective address of the storage operand of an instruction that was being executed, possibly out-of-order, at or around the time that the Performance Monitor alert occurred. In certain scenarios the SDAR can contain a kernel address, even for userspace-only sampling. Add checks to prevent it. Signed-off-by: Madhavan Srinivasan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-27  powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer  [Madhavan Srinivasan]  1 file, -0/+10
The current Branch History Rolling Buffer (BHRB) code does not check for any privilege levels before updating the data from BHRB. This could leak kernel addresses to userspace even when profiling only with userspace privileges. Add proper checks to prevent it. Acked-by: Balbir Singh <[email protected]> Signed-off-by: Madhavan Srinivasan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
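The shape of the added check, inside the loop in power_pmu_bhrb_read() that copies BHRB entries out (perf_paranoid_kernel(), capable() and is_kernel_addr() are existing helpers; exact placement is illustrative):

    /* For each branch address 'addr' read from the BHRB: */
    if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN) &&
        is_kernel_addr(addr))
            continue;       /* don't expose kernel addresses to userspace */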
2018-03-27  powerpc/perf: Fix kernel address leak via sampling registers  [Michael Ellerman]  1 file, -0/+15
Current code in power_pmu_disable() does not clear the sampling registers like Sampling Instruction Address Register (SIAR) and Sampling Data Address Register (SDAR) after disabling the PMU. Since these are userspace readable and could contain kernel addresses, add code to explicitly clear the content of these registers. Also add a "context synchronizing instruction" to enforce no further updates to these registers as suggested by Power ISA v3.0B. From section 9.4, on page 1108: "If an mtspr instruction is executed that changes the value of a Performance Monitor register other than SIAR, SDAR, and SIER, the change is not guaranteed to have taken effect until after a subsequent context synchronizing instruction has been executed (see Chapter 11. "Synchronization Requirements for Context Alterations" on page 1133)." Signed-off-by: Madhavan Srinivasan <[email protected]> [mpe: Massage change log and add ISA reference] Signed-off-by: Michael Ellerman <[email protected]>
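A sketch of the clearing described, placed in power_pmu_disable() after the PMU has been frozen (the PPMU_ARCH_207S gate and trailing isync() follow the commit message; treat the placement as illustrative):

    /*
     * SIAR and SDAR are user-readable and are not switched on context
     * switch, so clear them once the PMU is disabled to avoid leaking
     * kernel addresses to this or any other process.
     */
    if (ppmu->flags & PPMU_ARCH_207S) {
            mtspr(SPRN_SDAR, 0);
            mtspr(SPRN_SIAR, 0);
    }

    /* Context synchronising instruction, per ISA v3.0B section 9.4. */
    isync();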
2018-03-27  powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9  [Paul Mackerras]  2 files, -2/+12
On POWER9, since commit cc3d2940133d ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9", 2017-01-30), we set both the radix and HPT bits in the client-architecture-support (CAS) vector, which tells the hypervisor that we can do either radix or HPT. According to PAPR, if we use this combination we are promising to do a H_REGISTER_PROC_TBL hcall later on to let the hypervisor know whether we are doing radix or HPT. We currently do this call if we are doing radix but not if we are doing HPT. If the hypervisor is able to support both radix and HPT guests, it would be entitled to defer allocation of the HPT until the H_REGISTER_PROC_TBL call, and to fail any attempts to create HPTEs until the H_REGISTER_PROC_TBL call. Thus we need to do a H_REGISTER_PROC_TBL call when we are doing HPT; otherwise we may crash at boot time. This adds the code to call H_REGISTER_PROC_TBL in this case, before we attempt to create any HPT entries using H_ENTER. Fixes: cc3d2940133d ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9") Cc: [email protected] # v4.11+ Signed-off-by: Paul Mackerras <[email protected]> Reviewed-by: Suraj Jitindar Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
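A hedged sketch of the registration path for the hash (HPT) case: H_REGISTER_PROC_TBL and the PROC_TABLE_* flags are defined by PAPR, but the retry handling below is illustrative rather than the verbatim patch.

    static void pseries_lpar_register_process_table(unsigned long base,
                            unsigned long page_size, unsigned long table_size)
    {
            long rc;
            unsigned long flags = PROC_TABLE_NEW;

            if (radix_enabled())
                    flags |= PROC_TABLE_RADIX | PROC_TABLE_GTSE;
            else
                    flags |= PROC_TABLE_HPT_SLB;    /* new: register for HPT too */

            for (;;) {
                    rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags, base,
                                            page_size, table_size);
                    if (!H_IS_LONG_BUSY(rc))
                            break;
                    mdelay(get_longbusy_msecs(rc));
            }
            if (rc != H_SUCCESS)
                    pr_err("Failed to register process table (rc=%ld)\n", rc);
    }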
2018-03-26  powerpc/64s: Fix i-side SLB miss bad address handler saving nonvolatile GPRs  [Nicholas Piggin]  1 file, -1/+1
The SLB bad address handler's trap number fixup does not preserve the low bit that indicates nonvolatile GPRs have not been saved. This leads save_nvgprs to skip saving them, and subsequent functions and the return from interrupt will think they are saved. This causes kernel branch-to-garbage debugging to not have correct registers, and can also cause userspace to have its registers clobbered after a segfault. Fixes: f0f558b131db ("powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address") Cc: [email protected] # v4.9+ Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-24  Merge branch 'topic/ppc-kvm' into next  [Michael Ellerman]  27 files, -136/+842
This brings in two series from Paul, one of which touches KVM code and may need to be merged into the kvm-ppc tree to resolve conflicts.
2018-03-24  KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state  [Paul Mackerras]  1 file, -7/+10
This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors, where the contents of the TEXASR can get corrupted while a thread is in fake suspend state. The workaround is for the instruction emulation code to use the value saved at the most recent guest exit in real suspend mode. We achieve this by simply not saving the TEXASR into the vcpu struct on an exit in fake suspend state. We also have to take care to set the orig_texasr field only on guest exit in real suspend state. This also means that on guest entry in fake suspend state, TEXASR will be restored to the value it had on the last exit in real suspend state, effectively counteracting any hardware-caused corruption. This works because TEXASR may not be written in suspend state. With this, the guest might see the wrong values in TEXASR if it reads it while in suspend state, but will see the correct value in non-transactional state (e.g. after a treclaim), and treclaim will work correctly. With this workaround, the code will actually run slightly faster, and will operate correctly on systems without the TEXASR bug (since TEXASR may not be written in suspend state, and is only changed by failure recording, which will have already been done before we get into fake suspend state). Therefore these changes are not made subject to a CPU feature bit. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-24  KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode  [Suraj Jitindar Singh]  1 file, -4/+20
This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors, where a treclaim performed in fake suspend mode can cause subsequent reads from the XER register to return inconsistent values for the SO (summary overflow) bit. The inconsistent SO bit state can potentially be observed on any thread in the core. We have to do the treclaim because that is the only way to get the thread out of suspend state (fake or real) and into non-transactional state. The workaround for the bug is to force the core into SMT4 mode before doing the treclaim. This patch adds the code to do that, conditional on the CPU_FTR_P9_TM_XER_SO_BUG feature bit. Signed-off-by: Suraj Jitindar Singh <[email protected]> Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-24  KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9  [Paul Mackerras]  16 files, -10/+557
POWER9 has hardware bugs relating to transactional memory and thread reconfiguration (changes to hardware SMT mode). Specifically, the core does not have enough storage to store a complete checkpoint of all the architected state for all four threads. The DD2.2 version of POWER9 includes hardware modifications designed to allow hypervisor software to implement workarounds for these problems. This patch implements those workarounds in KVM code so that KVM guests see a full, working transactional memory implementation. The problems center around the use of TM suspended state, where the CPU has a checkpointed state but execution is not transactional. The workaround is to implement a "fake suspend" state, which looks to the guest like suspended state but the CPU does not store a checkpoint. In this state, any instruction that would cause a transition to transactional state (rfid, rfebb, mtmsrd, tresume) or would use the checkpointed state (treclaim) causes a "soft patch" interrupt (vector 0x1500) to the hypervisor so that it can be emulated. The trechkpt instruction also causes a soft patch interrupt. On POWER9 DD2.2, we avoid returning to the guest in any state which would require a checkpoint to be present. The trechkpt in the guest entry path which would normally create that checkpoint is replaced by either a transition to fake suspend state, if the guest is in suspend state, or a rollback to the pre-transactional state if the guest is in transactional state. Fake suspend state is indicated by a flag in the PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and reads back as 0. On exit from the guest, if the guest is in fake suspend state, we still do the treclaim instruction as we would in real suspend state, in order to get into non-transactional state, but we do not save the resulting register state since there was no checkpoint. Emulation of the instructions that cause a softpatch interrupt is handled in two paths. If the guest is in real suspend mode, we call kvmhv_p9_tm_emulation_early() to handle the cases where the guest is transitioning to transactional state. This is called before we do the treclaim in the guest exit path; because we haven't done treclaim, we can get back to the guest with the transaction still active. If the instruction is a case that kvmhv_p9_tm_emulation_early() doesn't handle, or if the guest is in fake suspend state, then we proceed to do the complete guest exit path and subsequently call kvmhv_p9_tm_emulation() in host context with the MMU on. This handles all the cases including the cases that generate program interrupts (illegal instruction or TM Bad Thing) and facility unavailable interrupts. The emulation is reasonably straightforward and is mostly concerned with checking for exception conditions and updating the state of registers such as MSR and CR0. The treclaim emulation takes care to ensure that the TEXASR register gets updated as if it were the guest treclaim instruction that had done failure recording, not the treclaim done in hypervisor state in the guest exit path. With this, the KVM_CAP_PPC_HTM capability returns true (1) even if transactional memory is not available to host userspace. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-24  powerpc/powernv: Provide a way to force a core into SMT4 mode  [Paul Mackerras]  6 files, -0/+110
POWER9 processors up to and including "Nimbus" v2.2 have hardware bugs relating to transactional memory and thread reconfiguration. One of these bugs has a workaround which is to get the core into SMT4 state temporarily. This workaround is only needed when running bare-metal. This patch provides a function which gets the core into SMT4 mode by preventing threads from going to a stop state, and waking up those which are already in a stop state. Once at least 3 threads are not in a stop state, the core will be in SMT4 and we can continue. To do this, we add a "dont_stop" flag to the paca to tell the thread not to go into a stop state. If this flag is set, power9_idle_stop() just returns immediately with a return value of 0. The pnv_power9_force_smt4_catch() function does the following:

1. Set the dont_stop flag for each thread in the core, except ourselves (in fact we use an atomic_inc() in case more than one thread is calling this function concurrently).
2. See how many threads are awake, indicated by their requested_psscr field in the paca being 0. If this is at least 3, skip to step 5.
3. Send a doorbell interrupt to each thread that was seen as being in a stop state in step 2.
4. Until at least 3 threads are awake, scan the threads to which we sent a doorbell interrupt and check if they are awake now.

This relies on the following properties:
- Once dont_stop is non-zero, requested_psscr can't go from zero to non-zero, except transiently (and without the thread doing stop).
- requested_psscr being zero guarantees that the thread isn't in a state-losing stop state where thread reconfiguration could occur.
- Doing stop with a PSSCR value of 0 won't be a state-losing stop and thus won't allow thread reconfiguration.
- Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core must be in SMT4 mode, since SMT modes are powers of 2.

This does add a sync to power9_idle_stop(), which is necessary to provide the correct ordering between setting requested_psscr and checking dont_stop. The overhead of the sync should be unnoticeable compared to the latency of going into and out of a stop state. Because some objected to incurring this extra latency on systems where the XER[SO] bug is not relevant, I have put the test in power9_idle_stop inside a feature section. This means that pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will probably hang the system. In order to cater for uses where the caller has an operation that has to be done while the core is in SMT4, the core continues to be kept in SMT4 after the pnv_power9_force_smt4_catch() function returns, until the pnv_power9_force_smt4_release() function is called. It undoes the effect of step 1 above and allows the other threads to go into a stop state. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
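A condensed, illustrative sketch of the catch/release pair following the steps above (the real code lives in the powernv idle code and is considerably more careful about ordering and timeouts; ppc_send_doorbell() stands in for whatever doorbell primitive is used and is hypothetical):

    void pnv_power9_force_smt4_catch(void)
    {
            int cpu = smp_processor_id();
            int cpu0 = cpu & ~(threads_per_core - 1);
            int thr, awake;

            /* Step 1: stop the sibling threads from entering a stop state. */
            for (thr = 0; thr < threads_per_core; ++thr)
                    if (cpu0 + thr != cpu)
                            atomic_inc(&paca[cpu0 + thr].dont_stop);
            smp_mb();       /* order dont_stop stores vs requested_psscr loads */

            /* Steps 2-4: poke sleepers until at least 3 threads are awake. */
            do {
                    awake = 0;
                    for (thr = 0; thr < threads_per_core; ++thr) {
                            if (!paca[cpu0 + thr].requested_psscr)
                                    ++awake;
                            else
                                    ppc_send_doorbell(cpu0 + thr);  /* hypothetical */
                    }
            } while (awake < threads_per_core - 1);
            /* Core is now in SMT4 and stays there until ..._release(). */
    }

    void pnv_power9_force_smt4_release(void)
    {
            int cpu = smp_processor_id();
            int cpu0 = cpu & ~(threads_per_core - 1);
            int thr;

            /* Undo step 1: let the siblings stop again. */
            for (thr = 0; thr < threads_per_core; ++thr)
                    if (cpu0 + thr != cpu)
                            atomic_dec(&paca[cpu0 + thr].dont_stop);
    }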
2018-03-24  powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2  [Paul Mackerras]  3 files, -3/+33
This adds a CPU feature bit which is set for POWER9 "Nimbus" DD2.2 processors which will be used to enable the hypervisor to assist hardware with the handling of checkpointed register values while the CPU is in suspend state, in order to work around hardware bugs. The hardware assistance for these workarounds introduced a new hardware bug relating to the XER[SO] bit. We add a separate feature bit for this bug in case future chips fix it while still requiring the hypervisor assistance with suspend state. When the dt_cpu_ftrs subsystem is in use, the software assistance can be enabled using a "tm-suspend-hypervisor-assist" node in the device tree, and a "tm-suspend-xer-so-bug" node enables the workarounds for the XER[SO] bug. In the absence of such nodes, a quirk enables both for POWER9 "Nimbus" DD2.2 processors. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-24  powerpc: Free up CPU feature bits on 64-bit machines  [Paul Mackerras]  3 files, -64/+73
This moves all the CPU feature bits that are only used on 32-bit machines to the top 20 bits of the CPU feature word and arranges for them to be defined only in 32-bit builds. The features that are common to 32-bit and 64-bit machines are moved to bits 0-11 of the CPU feature word. This means that for 64-bit platforms, bits 44-63 can now be used for new features that only exist on 64-bit machines. (These bit numbers are counting from the right, i.e. the LSB is bit 0.) Because CPU_FTR_L3_DISABLE_NAP moved from the low 16 bits to the high 16 bits, we have to adjust some assembly code. Also, CPU_FTR_EMB_HV moved from the high 16 bits to the low 16 bits. Note that CPU_FTR_REAL_LE only applies to 64-bit chips, because only 64-bit chips (POWER6, 7, 8, 9) have a true little-endian mode that is a CPU execution mode as opposed to being a page attribute. With this we now have 20 free CPU feature bits on 64-bit machines. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-24  powerpc: Book E: Remove unused CPU_FTR_L2CSR bit  [Paul Mackerras]  1 file, -4/+3
The CPU_FTR_L2CSR bit is never tested anywhere, so let's reclaim the bit. The last usage was removed in 86d63363defc ("powerpc/e500mc: Remove dead L2 flushing code in idle_e500.S") (Jun 2015). Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-24  powerpc: Use feature bit for RTC presence rather than timebase presence  [Paul Mackerras]  4 files, -55/+47
All PowerPC CPUs other than the original PPC601 have a timebase register rather than the "real-time clock" (RTC) register that the PPC601 (and the original POWER and POWER2 CPUs) had. Currently we have a CPU feature bit to indicate the presence of the timebase, but it makes more sense to use a bit to indicate the unusual situation rather than the common situation. This therefore defines a CPU_FTR_USE_RTC bit in place of the CPU_FTR_USE_TB bit, and arranges for it to be set on PPC601 systems. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-03-23  powerpc/mm: Fixup tlbie vs store ordering issue on POWER9  [Aneesh Kumar K.V]  7 files, -2/+50
On POWER9, under some circumstances, a broadcast TLB invalidation might complete before all previous stores have drained, potentially allowing stale stores to become visible after the invalidation. This works around it by doubling up those TLB invalidations, which was verified by HW to be sufficient to close the risk window. This will be documented in a yet-to-be-published errata. Fixes: 1a472c9dba6b ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: Aneesh Kumar K.V <[email protected]> [mpe: Enable the feature in the DT CPU features code for all Power9, rename the feature to CPU_FTR_P9_TLBIE_BUG per benh.] Signed-off-by: Michael Ellerman <[email protected]>
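The "doubling up" takes the form of a small fixup helper run after the primary invalidation sequence; a sketch of the radix variant under the new feature bit (per the commit message; the repeated tlbie targets an arbitrary fixed address purely to issue a second invalidation):

    static inline void fixup_tlbie(void)
    {
            unsigned long pid = 0;
            unsigned long va = ((1UL << 52) - 1);   /* arbitrary address */

            if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
                    /* Let the first tlbie's stores drain, then repeat it. */
                    asm volatile("ptesync" : : : "memory");
                    __tlbie_va(va, pid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
            }
    }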
On POWER9, under some circumstances, a broadcast TLB invalidation might complete before all previous stores have drained, potentially allowing stale stores from becoming visible after the invalidation. This works around it by doubling up those TLB invalidations which was verified by HW to be sufficient to close the risk window. This will be documented in a yet-to-be-published errata. Fixes: 1a472c9dba6b ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: Aneesh Kumar K.V <[email protected]> [mpe: Enable the feature in the DT CPU features code for all Power9, rename the feature to CPU_FTR_P9_TLBIE_BUG per benh.] Signed-off-by: Michael Ellerman <[email protected]>