Age | Commit message (Collapse) | Author | Files | Lines |
|
slb_shadow structures are avoided for radix environment.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We no longer allocate lppacas in an array, so this patch removes the
1kB static alignment for the structure, and enforces the PAPR
alignment requirements at allocation time. We can not reduce the 1kB
allocation size however, due to existing KVM hypervisors.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Change the paca array into an array of pointers to pacas. Allocate
pacas individually.
This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front then freeing unused.
This is slightly more overhead (one additional indirection) for cross
CPU paca references, but those aren't too common.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The "lppaca" is a structure registered with the hypervisor. This is
unnecessary when running on non-virtualised platforms. One field from
the lppaca (pmcregs_in_use) is also used by the host, so move the host
part out into the paca (lppaca field is still updated in
guest mode).
Signed-off-by: Nicholas Piggin <[email protected]>
[mpe: Fix non-pseries build with some #ifdefs]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In mpic_physmask() we loop over all CPUs up to 32, then get the hard
SMP processor id of that CPU.
Currently that's possibly walking off the end of the paca array, but
in a future patch we will change the paca array to be an array of
pointers, and in that case we will get a NULL for missing CPUs and
oops. eg:
Unable to handle kernel paging request for data at address 0x88888888888888b8
Faulting instruction address: 0xc00000000004e380
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP .mpic_set_affinity+0x60/0x1a0
LR .irq_do_set_affinity+0x48/0x100
Fix it by checking the CPU is possible, this also fixes the code if
there are gaps in the CPU numbering which probably never happens on
mpic systems but who knows.
Debugged-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Merge our fixes branch from the 4.16 cycle.
There were a number of important fixes merged, in particular some Power9
workarounds that we want in next for testing purposes. There's also been
some conflicting changes in the CPU features code which are best merged
and tested before going upstream.
|
|
Merge the DAWR series, which touches arch code and KVM code and may need
to be merged into the kvm-ppc tree.
|
|
Using the DAWR on POWER9 can cause xstops, hence we need to disable
it.
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This disables the DAWR on all POWER9 CPUs via cpu feature quirk.
Using the DAWR on POWER9 can cause xstops, hence we need to disable
it.
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
POWER9 with the DAWR disabled causes problems for partition
migration. Either we have to fail the migration (since we lose the
DAWR) or we silently drop the DAWR and allow the migration to pass.
This patch does the latter and allows the migration to pass (at the
cost of silently losing the DAWR). This is not ideal but hopefully the
best overall solution. This approach has been acked by Paulus.
With this patch kvmppc_set_one_reg() will store the DAWR in the vcpu
but won't actually set it on POWER9 hardware.
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
POWER7 compat mode guests can use h_set_dabr on POWER9. POWER9 should
use the DAWR but since it's disabled there we can't.
This returns H_UNSUPPORTED on a h_set_dabr() on POWER9 where the DAWR
is disabled.
Current Linux guests ignore this error, so they will silently not get
the DAWR (sigh). The same error code is being used by POWERVM in this
case.
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Return H_P2 on a h_set_mode(SET_DAWR) on POWER9 where the DAWR is
disabled.
Current Linux guests ignore this error, so they will silently not get
the DAWR (sigh). The same error code is being used by POWERVM in this
case.
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The 'bd' command will now print an error and not set the breakpoint on
P9.
Signed-off-by: Michael Neuling <[email protected]>
[mpe: Unsplit quoted string]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This updates the ptrace code to use ppc_breakpoint_available().
We now advertise via PPC_PTRACE_GETHWDBGINFO zero breakpoints when the
DAWR is missing (ie. POWER9). This results in GDB falling back to
software emulation of the breakpoint (which is slow).
For the features advertised by PPC_PTRACE_GETHWDBGINFO, we keep
advertising DAWR as if we don't GDB assumes 1 breakpoint irrespective
of the number of breakpoints advertised. GDB then fails later when
trying to set this one breakpoint.
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Add ppc_breakpoint_available() to determine if a breakpoint is
available currently via the DAWR or DABR.
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Checking for a "fully active" device state requires testing two flag
bits, which is open coded in several places, so add a function to do
it.
Signed-off-by: Sam Bobroff <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The caller will always pass NULL for 'rmv_data' when
'eeh_aware_driver' is true, so the first two calls to
eeh_pe_dev_traverse() can be combined without changing behaviour as
can the two arms of the final 'if' block.
This should not change behaviour.
Signed-off-by: Sam Bobroff <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
eeh_reset_device() tests the value of 'bus' more than once but the
only caller, eeh_handle_normal_device() does this test itself and will
never pass NULL.
So, remove the dead tests.
This should not change behaviour.
Signed-off-by: Sam Bobroff <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
It is currently difficult to understand the behaviour of
eeh_reset_device() due to the way it's parameters are used. In
particular, when 'bus' is NULL, it's value is still necessary so the
same value is looked up again locally under a different name
('frozen_bus') but behaviour is changed.
To clarify this, add a new parameter 'driver_eeh_aware', and have the
caller set it when it would have passed NULL for 'bus' and always pass
a value for 'bus'. Then change any test that was on 'bus' to one on
'!driver_eeh_aware' and replace uses of 'frozen_bus' with 'bus'.
Also update the function's comment.
This should not change behaviour.
Signed-off-by: Sam Bobroff <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The name "frozen_bus" is misleading: it's not necessarily frozen, it's
just the PE's PCI bus.
Signed-off-by: Sam Bobroff <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Remove a test that checks if "frozen_bus" is NULL, because it cannot
have changed since it was tested at the start of the function and so
must be true here.
Signed-off-by: Sam Bobroff <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Commit "0ba178888b05 powerpc/eeh: Remove reference to PCI device"
removed a call to pci_dev_get() from __eeh_addr_cache_get_device() but
did not update the comment to match.
Signed-off-by: Sam Bobroff <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently the EEH_PE_RECOVERING flag for a PE is managed by both the
caller and callee of eeh_handle_normal_event() (among other places not
considered here). This is complicated by the fact that the PE may
or may not have been invalidated by the call.
So move the callee's handling into eeh_handle_normal_event(), which
clarifies it and allows the return type to be changed to void (because
it no longer needs to indicate at the PE has been invalidated).
This should not change behaviour except in eeh_event_handler() where
it was previously possible to cause eeh_pe_state_clear() to be called
on an invalid PE, which is now avoided.
Signed-off-by: Sam Bobroff <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The function eeh_handle_event(pe) does nothing other than switching
between calling eeh_handle_normal_event(pe) and
eeh_handle_special_event(). However it is only called in two places,
one where pe can't be NULL and the other where it must be NULL (see
eeh_event_handler()) so it does nothing but obscure the flow of
control.
So, remove it.
Signed-off-by: Sam Bobroff <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
enabled
GPUs and the corresponding NVLink bridges get different PEs as they
have separate translation validation entries (TVEs). We put these PEs
to the same IOMMU group so they cannot be passed through separately.
So the iommu_table_group_ops::set_window/unset_window for GPUs do set
tables to the NPU PEs as well which means that iommu_table's list of
attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked.
This list is used for TCE cache invalidation.
The problem is that NPU PE has just a single TVE and can be programmed
to point to 32bit or 64bit windows while GPU PE has two (as any other
PCI device). So we end up having an 32bit iommu_table struct linked to
both PEs even though only the 64bit TCE table cache can be invalidated
on NPU. And a relatively recent skiboot detects this and prints
errors.
This changes GPU's iommu_table_group_ops::set_window/unset_window to
make sure that NPU PE is only linked to the table actually used by the
hardware. If there are two tables used by an IOMMU group, the NPU PE
will use the last programmed one which with the current use scenarios
is expected to be a 64bit one.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Fixes: 912cc87a6 "powerpc/mm/radix: Add LPID based tlb flush helpers"
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
With enabled DEBUG, there is a compile error:
"error: ‘flags’ is used uninitialized in this function".
This moves pr_devel() little further where @flags are initialized.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently the pseries kernel advertises radix MMU support even if
the actual support is disabled via the CONFIG_PPC_RADIX_MMU option.
This adds a check for CONFIG_PPC_RADIX_MMU to avoid advertising radix
to the hypervisor.
Suggested-by: Paul Mackerras <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Fix the warning messages for stop_machine_change_mapping(), and a number
of other affected functions in its call chain.
All modified functions are under CONFIG_MEMORY_HOTPLUG, so __meminit
is okay (keeps them / does not discard them).
Boot-tested on powernv/power9/radix-mmu and pseries/power8/hash-mmu.
$ make -j$(nproc) CONFIG_DEBUG_SECTION_MISMATCH=y vmlinux
...
MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x6b130): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
The function stop_machine_change_mapping() references
the function __meminit create_physical_mapping().
This is often because stop_machine_change_mapping lacks a __meminit
annotation or the annotation of create_physical_mapping is wrong.
WARNING: vmlinux.o(.text+0x6b13c): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
The function stop_machine_change_mapping() references
the function __meminit create_physical_mapping().
This is often because stop_machine_change_mapping lacks a __meminit
annotation or the annotation of create_physical_mapping is wrong.
...
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Add a definition for cpu_show_spectre_v2() to override the generic
version. This has several permuations, though in practice some may not
occur we cater for any combination.
The most verbose is:
Mitigation: Indirect branch serialisation (kernel only), Indirect
branch cache disabled, ori31 speculation barrier enabled
We don't treat the ori31 speculation barrier as a mitigation on its
own, because it has to be *used* by code in order to be a mitigation
and we don't know if userspace is doing that. So if that's all we see
we say:
Vulnerable, ori31 speculation barrier enabled
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Add a definition for cpu_show_spectre_v1() to override the generic
version. Currently this just prints "Not affected" or "Vulnerable"
based on the firmware flag.
Although the kernel does have array_index_nospec() in a few places, we
haven't yet audited all the powerpc code to see where it's necessary,
so for now we don't list that as a mitigation.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Now that we have the security flags we can simplify the code in
pseries_setup_rfi_flush() because the security flags have pessimistic
defaults.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Now that we have the security flags we can significantly simplify the
code in pnv_setup_rfi_flush(), because we can use the flags instead of
checking device tree properties and because the security flags have
pessimistic defaults.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Now that we have the security feature flags we can make the
information displayed in the "meltdown" file more informative.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This landed in setup_64.c for no good reason other than we had nowhere
else to put it. Now that we have a security-related file, that is a
better place for it so move it.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Now that we have feature flags for security related things, set or
clear them based on what we see in the device tree provided by
firmware.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Now that we have feature flags for security related things, set or
clear them based on what we receive from the hypercall.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This commit adds security feature flags to reflect the settings we
receive from firmware regarding Spectre/Meltdown mitigations.
The feature names reflect the names we are given by firmware on bare
metal machines. See the hostboot source for details.
Arguably these could be firmware features, but that then requires them
to be read early in boot so they're available prior to asm feature
patching, but we don't actually want to use them for patching. We may
also want to dynamically update them in future, which would be
incompatible with the way firmware features work (at the moment at
least). So for now just make them separate flags.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Add some additional values which have been defined for the
H_GET_CPU_CHARACTERISTICS hypercall.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We might have migrated to a machine that uses a different flush type,
or doesn't need flushing at all.
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently the rfi-flush messages print 'Using <type> flush' for all
enabled_flush_types, but that is not necessarily true -- as now the
fallback flush is always enabled on pseries, but the fixup function
overwrites its nop/branch slot with other flush types, if available.
So, replace the 'Using <type> flush' messages with '<type> flush is
available'.
Also, print the patched flush types in the fixup function, so users
can know what is (not) being used (e.g., the slower, fallback flush,
or no flush type at all if flush is disabled via the debugfs switch).
Suggested-by: Michael Ellerman <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This ensures the fallback flush area is always allocated on pseries,
so in case a LPAR is migrated from a patched to an unpatched system,
it is possible to enable the fallback flush in the target system.
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
For PowerVM migration we want to be able to call setup_rfi_flush()
again after we've migrated the partition.
To support that we need to check that we're not trying to allocate the
fallback flush area after memblock has gone away (i.e., boot-time only).
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
rfi_flush_enable() includes a check to see if we're already
enabled (or disabled), and in that case does nothing.
But that means calling setup_rfi_flush() a 2nd time doesn't actually
work, which is a bit confusing.
Move that check into the debugfs code, where it really belongs.
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
These events either do not count, or do not count correctly, so to
prevent user confusion block counting them at all.
Signed-off-by: Madhavan Srinivasan <[email protected]>
[mpe: Change log]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
These events either do not count, or do not count correctly, so to
prevent user confusion block counting them at all.
Signed-off-by: Madhavan Srinivasan <[email protected]>
[mpe: Change log]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Introduce code to support addition of blacklisted events for a
processor version. Blacklisted events are events that are known to not
count correctly on that CPU revision, and so should be prevented from
being counted so as to avoid user confusion.
A 'pointer' and 'int' variable to hold the number of events are added
to 'struct power_pmu', along with a generic function to loop through
the list to validate the given event. Generic function
'is_event_blacklisted' is called in power_pmu_event_init() to detect
and reject early.
Signed-off-by: Madhavan Srinivasan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Sampled Data Address Register (SDAR) is a 64-bit register that
contains the effective address of the storage operand of an
instruction that was being executed, possibly out-of-order, at or
around the time that the Performance Monitor alert occurred.
In certain scenario SDAR happen to contain the kernel address even for
userspace only sampling. Add checks to prevent it.
Signed-off-by: Madhavan Srinivasan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The current Branch History Rolling Buffer (BHRB) code does not check
for any privilege levels before updating the data from BHRB. This
could leak kernel addresses to userspace even when profiling only with
userspace privileges. Add proper checks to prevent it.
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Current code in power_pmu_disable() does not clear the sampling
registers like Sampling Instruction Address Register (SIAR) and
Sampling Data Address Register (SDAR) after disabling the PMU. Since
these are userspace readable and could contain kernel addresses, add
code to explicitly clear the content of these registers.
Also add a "context synchronizing instruction" to enforce no further
updates to these registers as suggested by Power ISA v3.0B. From
section 9.4, on page 1108:
"If an mtspr instruction is executed that changes the value of a
Performance Monitor register other than SIAR, SDAR, and SIER, the
change is not guaranteed to have taken effect until after a
subsequent context synchronizing instruction has been executed (see
Chapter 11. "Synchronization Requirements for Context Alterations"
on page 1133)."
Signed-off-by: Madhavan Srinivasan <[email protected]>
[mpe: Massage change log and add ISA reference]
Signed-off-by: Michael Ellerman <[email protected]>
|