Age | Commit message (Collapse) | Author | Files | Lines |
|
Currently, FADump is only supported on pSeries but that is going to
change soon with FADump support being added on PowerNV platform. So,
move rtas specific definitions to platform code to allow FADump
to have multiple platforms support.
Signed-off-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Declare helper functions, that can be reused by multiple platforms,
in the internal header file.
Signed-off-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add helper functions to setup & free CPU notes buffer and to find if a
given memory area is contiguous. Also, use boolean as return type for
the function that finds if boot memory area is contiguous. While at
it, save the virtual address of CPU notes buffer instead of physical
address as virtual address is used often.
Signed-off-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Though asm/fadump.h is meant to be used by other components dealing
with FADump, it also has macros/definitions internal to FADump code.
Move them to a new header file used within FADump code. This also
makes way for refactoring platform specific FADump code.
Signed-off-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
kvm_tmp is now in .text and so doesn't need a special overlap check.
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The build breaks when STACKTRACE=n, eg. skiroot_defconfig:
arch/powerpc/kernel/eeh_event.c:124:23: error: implicit declaration of function 'stack_trace_save'
Fix it with some ifdefs for now.
Fixes: 25baf3d81614 ("powerpc/eeh: Defer printing stack trace")
Signed-off-by: Michael Ellerman <[email protected]>
|
|
There's a bug in skiboot that causes the OPAL_XIVE_ALLOCATE_IRQ call
to return the 32-bit value 0xffffffff when OPAL has run out of IRQs.
Unfortunatelty, OPAL return values are signed 64-bit entities and
errors are supposed to be negative. If that happens, the linux code
confusingly treats 0xffffffff as a valid IRQ number and panics at some
point.
A fix was recently merged in skiboot:
e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()")
but we need a workaround anyway to support older skiboots already
in the field.
Internally convert 0xffffffff to OPAL_RESOURCE which is the usual error
returned upon resource exhaustion.
Cc: [email protected] # v4.12+
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
We have OPAL_MSG_PRD message type to pass prd related messages from
OPAL to `opal-prd`. It can handle messages upto 64 bytes. We have a
requirement to send bigger than 64 bytes of data from OPAL to
`opal-prd`. Lets add new message type (OPAL_MSG_PRD2) to pass bigger
data.
Signed-off-by: Vasant Hegde <[email protected]>
[mpe: Make the error string clear that it's the PRD2 event that failed]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The setjmp function should be declared as "returns_twice", or bad
things can happen[1]. This does not actually change generated code in
my testing.
The longjmp function should be declared as "noreturn", so that the
compiler can optimise calls to it better. This makes the generated
code a little shorter.
1: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-returns_005ftwice-function-attribute
Signed-off-by: Segher Boessenkool <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/c02ce4a573f3bac907e2c70957a2d1275f910013.1567605586.git.segher@kernel.crashing.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for 5.4
- New ITS translation cache
- Allow up to 512 CPUs to be supported with GICv3 (for real this time)
- Now call kvm_arch_vcpu_blocking early in the blocking sequence
- Tidy-up device mappings in S2 when DIC is available
- Clean icache invalidation on VMID rollover
- General cleanup
|
|
Commit 2874c5fd2842 ("treewide: Replace GPLv2 boilerplate/reference with
SPDX - rule 152") left an empty comment in machdep.h, as the boilerplate
was the only text in the comment. Remove the empty comment.
Signed-off-by: Jordan Niethe <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Introduce two options to control the use of the tlbie instruction. A
boot time option which completely disables the kernel using the
instruction, this is currently incompatible with HASH MMU, KVM, and
coherent accelerators.
And a debugfs option can be switched at runtime and avoids using tlbie
for invalidating CPU TLBs for normal process and kernel address
mappings. Coherent accelerators are still managed with tlbie, as will
KVM partition scope translations.
Cross-CPU TLB flushing is implemented with IPIs and tlbiel. This is a
basic implementation which does not attempt to make any optimisation
beyond the tlbie implementation.
This is useful for performance testing among other things. For example
in certain situations on large systems, using IPIs may be faster than
tlbie as they can be directed rather than broadcast. Later we may also
take advantage of the IPIs to do more interesting things such as trim
the mm cpumask more aggressively.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
No functional change.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
There should be no functional changes.
- Use calls to existing radix_tlb.c functions in flush_partition.
- Rename radix__flush_tlb_lpid to radix__flush_all_lpid and similar,
because they flush everything, matching flush_all_mm rather than
flush_tlb_mm for the lpid.
- Remove some unused radix_tlb.c flush primitives.
Signed-off: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This callback is only required because the partition table init comes
before process table allocation on powernv (aka bare metal aka native).
Change the order to allocate the process table first, and remove the
callback.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Currently we print a stack trace in the event handler to help with
debugging EEH issues. In the case of suprise hot-unplug this is unneeded,
so we want to prevent printing the stack trace unless we know it's due to
an actual device error. To accomplish this, we can save a stack trace at
the point of detection and only print it once the EEH recovery handler has
determined the freeze was due to an actual error.
Since the whole point of this is to prevent spurious EEH output we also
move a few prints out of the detection thread, or mark them as pr_debug
so anyone interested can get output from the eeh_check_dev_failure()
if they want.
Signed-off-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When hot-adding devices we rely on the hotplug driver to create pci_dn's
for the devices under the hotplug slot. Converse, when hot-removing the
driver will remove the pci_dn's that it created. This is a problem because
the pci_dev is still live until it's refcount drops to zero. This can
happen if the driver is slow to tear down it's internal state. Ideally, the
driver would not attempt to perform any config accesses to the device once
it's been marked as removed, but sometimes it happens. As a result, we
might attempt to access the pci_dn for a device that has been torn down and
the kernel may crash as a result.
To fix this, don't free the pci_dn unless the corresponding pci_dev has
been released. If the pci_dev is still live, then we mark the pci_dn with
a flag that indicates the pci_dev's release function should free it.
Signed-off-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The head-64.h code should deal only with the head code sections
and offset calculations.
No generated code change except BUG line number constants.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The common machine_check_event data structures and queues are mostly
platform independent, with powernv decoding SRR1/DSISR/etc., into
machine_check_event objects.
This patch converts pseries to use this infrastructure by decoding
fwnmi/rtas data into machine_check_event objects.
This allows queueing to be used by a subsequent change to delay the
virtual mode handling of machine checks that occur in kernel space
where it is unsafe to switch immediately to virtual mode, similarly
to powernv.
Signed-off-by: Nicholas Piggin <[email protected]>
[mpe: Fix implicit fallthrough warnings in mce_handle_error()]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
SWIOTLB checks range of incoming CPU addresses to be bounced and sees if
the device can access it through its DMA window without requiring bouncing.
In such cases it just chooses to skip bouncing. But for cases like secure
guests on powerpc platform all addresses need to be bounced into the shared
pool of memory because the host cannot access it otherwise. Hence the need
to do the bouncing is not related to device's DMA window and use of bounce
buffers is forced by setting swiotlb_force.
Also, connect the shared memory conversion functions into the
ARCH_HAS_MEM_ENCRYPT hooks and call swiotlb_update_mem_attributes() to
convert SWIOTLB's memory pool to shared memory.
Signed-off-by: Anshuman Khandual <[email protected]>
[ bauerman: Use ARCH_HAS_MEM_ENCRYPT hooks to share swiotlb memory pool. ]
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
A new kernel deserves a clean slate. Any pages shared with the hypervisor
is unshared before invoking the new kernel. However there are exceptions.
If the new kernel is invoked to dump the current kernel, or if there is a
explicit request to preserve the state of the current kernel, unsharing
of pages is skipped.
NOTE: While testing crashkernel, make sure at least 256M is reserved for
crashkernel. Otherwise SWIOTLB allocation will fail and crash kernel will
fail to boot.
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Secure guests need to share the DTL buffers with the hypervisor. To that
end, use a kmem_cache constructor which converts the underlying buddy
allocated SLUB cache pages into shared memory.
Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
LPPACA structures need to be shared with the host. Hence they need to be in
shared memory. Instead of allocating individual chunks of memory for a
given structure from memblock, a contiguous chunk of memory is allocated
and then converted into shared memory. Subsequent allocation requests will
come from the contiguous chunk which will be always shared memory for all
structures.
While we are able to use a kmem_cache constructor for the Debug Trace Log,
LPPACAs are allocated very early in the boot process (before SLUB is
available) so we need to use a simpler scheme here.
Introduce helper is_svm_platform() which uses the S bit of the MSR to tell
whether we're running as a secure guest.
Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Protected Execution Facility (PEF) is an architectural change for
POWER 9 that enables Secure Virtual Machines (SVMs). When enabled,
PEF adds a new higher privileged mode, called Ultravisor mode, to
POWER architecture.
The hardware changes include the following:
* There is a new bit in the MSR that determines whether the current
process is running in secure mode, MSR(S) bit 41. MSR(S)=1, process
is in secure mode, MSR(s)=0 process is in normal mode.
* The MSR(S) bit can only be set by the Ultravisor.
* HRFID cannot be used to set the MSR(S) bit. If the hypervisor needs
to return to a SVM it must use an ultracall. It can determine if
the VM it is returning to is secure.
* The privilege of a process is now determined by three MSR bits,
MSR(S, HV, PR). In each of the tables below the modes are listed
from least privilege to highest privilege. The higher privilege
modes can access all the resources of the lower privilege modes.
**Secure Mode MSR Settings**
+---+---+---+---------------+
| S | HV| PR|Privilege |
+===+===+===+===============+
| 1 | 0 | 1 | Problem |
+---+---+---+---------------+
| 1 | 0 | 0 | Privileged(OS)|
+---+---+---+---------------+
| 1 | 1 | 0 | Ultravisor |
+---+---+---+---------------+
| 1 | 1 | 1 | Reserved |
+---+---+---+---------------+
**Normal Mode MSR Settings**
+---+---+---+---------------+
| S | HV| PR|Privilege |
+===+===+===+===============+
| 0 | 0 | 1 | Problem |
+---+---+---+---------------+
| 0 | 0 | 0 | Privileged(OS)|
+---+---+---+---------------+
| 0 | 1 | 0 | Hypervisor |
+---+---+---+---------------+
| 0 | 1 | 1 | Problem (HV) |
+---+---+---+---------------+
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
[ cclaudio: Update the commit message ]
Signed-off-by: Claudio Carvalho <[email protected]>
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
These functions are used when the guest wants to grant the hypervisor
access to certain pages.
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Make the Enter-Secure-Mode (ESM) ultravisor call to switch the VM to secure
mode. Pass kernel base address and FDT address so that the Ultravisor is
able to verify the integrity of the VM using information from the ESM blob.
Add "svm=" command line option to turn on switching to secure mode.
Signed-off-by: Ram Pai <[email protected]>
[ andmike: Generate an RTAS os-term hcall when the ESM ucall fails. ]
Signed-off-by: Michael Anderson <[email protected]>
[ bauerman: Cleaned up the code a bit. ]
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Introduce CONFIG_PPC_SVM to control support for secure guests and include
Ultravisor-related helpers when it is selected
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Merge our ppc-kvm topic branch to bring in the Ultravisor support
patches.
|
|
When an SVM makes an hypercall or incurs some other exception, the
Ultravisor usually forwards (a.k.a. reflects) the exceptions to the
Hypervisor. After processing the exception, Hypervisor uses the
UV_RETURN ultracall to return control back to the SVM.
The expected register state on entry to this ultracall is:
* Non-volatile registers are restored to their original values.
* If returning from an hypercall, register R0 contains the return value
(unlike other ultracalls) and, registers R4 through R12 contain any
output values of the hypercall.
* R3 contains the ultracall number, i.e UV_RETURN.
* If returning with a synthesized interrupt, R2 contains the
synthesized interrupt number.
Thanks to input from Paul Mackerras, Ram Pai and Mike Anderson.
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Claudio Carvalho <[email protected]>
Acked-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In ultravisor enabled systems, PTCR becomes ultravisor privileged only
for writing and an attempt to write to it will cause a Hypervisor
Emulation Assitance interrupt.
This patch uses the set_ptcr_when_no_uv() function to restrict PTCR
writing to only when ultravisor is disabled.
Signed-off-by: Claudio Carvalho <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When Ultravisor (UV) is enabled, the partition table is stored in secure
memory and can only be accessed via the UV. The Hypervisor (HV) however
maintains a copy of the partition table in normal memory to allow Nest MMU
translations to occur (for normal VMs). The HV copy includes partition
table entries (PATE)s for secure VMs which would currently be unused
(Nest MMU translations cannot access secure memory) but they would be
needed as we add functionality.
This patch adds the UV_WRITE_PATE ucall which is used to update the PATE
for a VM (both normal and secure) when Ultravisor is enabled.
Signed-off-by: Michael Anderson <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
Signed-off-by: Ram Pai <[email protected]>
[ cclaudio: Write the PATE in HV's table before doing that in UV's ]
Signed-off-by: Claudio Carvalho <[email protected]>
Reviewed-by: Ryan Grimm <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In PEF enabled systems, some of the resources which were previously
hypervisor privileged are now ultravisor privileged and controlled by
the ultravisor firmware.
This adds FW_FEATURE_ULTRAVISOR to indicate if PEF is enabled.
The host kernel can use FW_FEATURE_ULTRAVISOR, for instance, to skip
accessing resources (e.g. PTCR and LDBAR) in case PEF is enabled.
Signed-off-by: Claudio Carvalho <[email protected]>
[ andmike: Device node name to "ibm,ultravisor" ]
Signed-off-by: Michael Anderson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The ultracalls (ucalls for short) allow the Secure Virtual Machines
(SVM)s and hypervisor to request services from the ultravisor such as
accessing a register or memory region that can only be accessed when
running in ultravisor-privileged mode.
This patch adds the ucall_norets() ultravisor call handler.
The specific service needed from an ucall is specified in register
R3 (the first parameter to the ucall). Other parameters to the
ucall, if any, are specified in registers R4 through R12.
Return value of all ucalls is in register R3. Other output values
from the ucall, if any, are returned in registers R4 through R12.
Each ucall returns specific error codes, applicable in the context
of the ucall. However, like with the PowerPC Architecture Platform
Reference (PAPR), if no specific error code is defined for a particular
situation, then the ucall will fallback to an erroneous
parameter-position based code. i.e U_PARAMETER, U_P2, U_P3 etc depending
on the ucall parameter that may have caused the error.
Every host kernel (powernv) needs to be able to do ucalls in case it
ends up being run in a machine with ultravisor enabled. Otherwise, the
kernel may crash early in boot trying to access ultravisor resources,
for instance, trying to set the partition table entry 0. Secure guests
also need to be able to do ucalls and its kernel may not have
CONFIG_PPC_POWERNV=y. For that reason, the ucall.S file is placed under
arch/powerpc/kernel.
If ultravisor is not enabled, the ucalls will be redirected to the
hypervisor which must handle/fail the call.
Thanks to inputs from Ram Pai and Michael Anderson.
Signed-off-by: Claudio Carvalho <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add the PowerPC name and the PPC_ELFNOTE_CAPABILITIES type in the
kernel binary ELF note. This type is a bitmap that can be used to
advertise kernel capabilities to userland.
This patch also defines PPCCAP_ULTRAVISOR_BIT as being the bit zero.
Suggested-by: Paul Mackerras <[email protected]>
Signed-off-by: Claudio Carvalho <[email protected]>
[ maxiwell: Define the 'PowerPC' type in the elfnote.h ]
Signed-off-by: Maxiwell S. Garcia <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
As now we have xchg_no_kill/tce_kill, these are not used anymore so
remove them.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
At the moment updates in a TCE table are made by iommu_table_ops::exchange
which update one TCE and invalidates an entry in the PHB/NPU TCE cache
via set of registers called "TCE Kill" (hence the naming).
Writing a TCE is a simple xchg() but invalidating the TCE cache is
a relatively expensive OPAL call. Mapping a 100GB guest with PCI+NPU
passed through devices takes about 20s.
Thankfully we can do better. Since such big mappings happen at the boot
time and when memory is plugged/onlined (i.e. not often), these requests
come in 512 pages so we call call OPAL 512 times less which brings 20s
from the above to less than 10s. Also, since TCE caches can be flushed
entirely, calling OPAL for 512 TCEs helps skiboot [1] to decide whether
to flush the entire cache or not.
This implements 2 new iommu_table_ops callbacks:
- xchg_no_kill() to update a single TCE with no TCE invalidation;
- tce_kill() to invalidate multiple TCEs.
This uses the same xchg_no_kill() callback for IODA1/2.
This implements 2 new wrappers on top of the new callbacks similar to
the existing iommu_tce_xchg().
This does not use the new callbacks yet, the next patches will;
so this should not cause any behavioral change.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This switches to using common code for the DMA allocations, including
potential use of the CMA allocator if configured.
Switching to the generic code enables DMA allocations from atomic
context, which is required by the DMA API documentation, and also
adds various other minor features drivers start relying upon. It
also makes sure we have on tested code base for all architectures
that require uncached pte bits for coherent DMA allocations.
Another advantage is that consistent memory allocations now share
the general vmalloc pool instead of needing an explicit careout
from it.
Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Christophe Leroy <[email protected]> # tested on 8xx
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Only 601 and e200 have unified I/D cache.
Drop the feature and use CONFIG_PPC_BOOK3S_601 and CONFIG_E200.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/b5902144266d2f4eed1ffea53915bd0245841e02.1566834712.git.christophe.leroy@c-s.fr
|
|
CPU_FTR_USE_RTC feature only applies to powerpc601.
Drop this feature and replace it with tests on CONFIG_PPC_BOOK3S_601.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/170411e2360861f4a95c21faad43519a08bc4040.1566834712.git.christophe.leroy@c-s.fr
|
|
Now that 601 is exclusive from other 6xx, CPU_FTR_601 and
associated fixups are useless.
Drop this feature and use #ifdefs instead.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/ecdb7194a17dbfa01865df6a82979533adc2c70b.1566834712.git.christophe.leroy@c-s.fr
|
|
Prior to commit 1bd98d7fbaf5 ("ppc64: Update BUG handling based on
ppc32"), BUG() family was using BUG_ILLEGAL_INSTRUCTION which
was an invalid instruction opcode to trap into program check
exception.
That commit converted them to using standard trap instructions,
but prom/prom_init and their PROM_BUG() macro were left over.
head_64.S and exception-64s.S were left aside as well.
Convert them to using the standard BUG infrastructure.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/cdaf4bbbb64c288a077845846f04b12683f8875a.1566817807.git.christophe.leroy@c-s.fr
|
|
Booting w/ppc64le_defconfig + CONFIG_PREEMPT on bare metal results in
the oops below due to calling into __spin_yield() when not running in
an SPLPAR, which means lppaca pointers are NULL.
We fixed a similar case previously in commit a6201da34ff9 ("powerpc:
Fix oops due to bad access of lppaca on bare metal"), by adding SPLPAR
checks in lppaca_shared_proc(). However when PREEMPT is enabled we can
call __spin_yield() directly from arch_spin_yield().
To fix it add spin_yield() and rw_yield() which check that
shared-processor LPAR is enabled before calling the SPLPAR-only
implementation of each.
BUG: Kernel NULL pointer dereference at 0x00000100
Faulting instruction address: 0xc000000000097f88
Oops: Kernel access of bad area, sig: 7 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.2.0-rc6-00491-g249155c20f9b #28
NIP: c000000000097f88 LR: c000000000c07a88 CTR: c00000000015ca10
REGS: c0000000727079f0 TRAP: 0300 Not tainted (5.2.0-rc6-00491-g249155c20f9b)
MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 84000424 XER: 20040000
CFAR: c000000000c07a84 DAR: 0000000000000100 DSISR: 00080000 IRQMASK: 1
GPR00: c000000000c07a88 c000000072707c80 c000000001546300 c00000007be38a80
GPR04: c0000000726f0c00 0000000000000002 c00000007279c980 0000000000000100
GPR08: c000000001581b78 0000000080000001 0000000000000008 c00000007279c9b0
GPR12: 0000000000000000 c000000001730000 c000000000142558 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: c00000007be38a80 c000000000c002f4 0000000000000000 0000000000000000
GPR28: c000000072221a00 c0000000726c2600 c00000007be38a80 c00000007be38a80
NIP [c000000000097f88] __spin_yield+0x48/0xa0
LR [c000000000c07a88] __raw_spin_lock+0xb8/0xc0
Call Trace:
[c000000072707c80] [c000000072221a00] 0xc000000072221a00 (unreliable)
[c000000072707cb0] [c000000000bffb0c] __schedule+0xbc/0x850
[c000000072707d70] [c000000000c002f4] schedule+0x54/0x130
[c000000072707da0] [c0000000001427dc] kthreadd+0x28c/0x2b0
[c000000072707e20] [c00000000000c1cc] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
4d9e0020 552a043e 210a07ff 79080fe0 0b080000 3d020004 3908b878 794a1f24
e8e80000 7ce7502a e8e70000 38e70100 <7ca03c2c> 70a70001 78a50020 4d820020
---[ end trace 474d6b2b8fc5cb7e ]---
Fixes: 499dcd41378e ("powerpc/64s: Allocate LPPACAs individually")
Signed-off-by: Christopher M. Riedl <[email protected]>
[mpe: Reword change log a bit]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The __rw_yield and __spin_yield locks only pertain to SPLPAR mode.
Rename them to make this relationship obvious.
Signed-off-by: Christopher M. Riedl <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Determining if a processor is in shared processor mode is not a constant
so don't hide it behind a #define.
Signed-off-by: Christopher M. Riedl <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Optimise LOAD_REG_IMMEDIATE_SYM() using a temporary register to
parallelise operations.
It reduces the path from 5 to 3 instructions.
Suggested-by: Segher Boessenkool <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/bad41ed02531bb0382420cbab50a0d7153b71767.1566311636.git.christophe.leroy@c-s.fr
|
|
Today LOAD_REG_IMMEDIATE() is a basic #define which loads all
parts on a value into a register, including the parts that are NUL.
This means always 2 instructions on PPC32 and always 5 instructions
on PPC64. And those instructions cannot run in parallele as they are
updating the same register.
Ex: LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) in head_64.S results in:
3c 20 00 00 lis r1,0
60 21 00 00 ori r1,r1,0
78 21 07 c6 rldicr r1,r1,32,31
64 21 00 00 oris r1,r1,0
60 21 40 00 ori r1,r1,16384
Rewrite LOAD_REG_IMMEDIATE() with GAS macro in order to skip
the parts that are NUL.
Rename existing LOAD_REG_IMMEDIATE() as LOAD_REG_IMMEDIATE_SYM()
and use that one for loading value of symbols which are not known
at compile time.
Now LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) in head_64.S results in:
38 20 40 00 li r1,16384
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/d60ce8dd3a383c7adbfc322bf1d53d81724a6000.1566311636.git.christophe.leroy@c-s.fr
|
|
ioremap does things differently depending on whether
SLAB is available or not at different levels.
Try to separate the early path from the beginning.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/3acd2dbe04b04f111475e7a59f2b6f2ab9b95ab6.1566309263.git.christophe.leroy@c-s.fr
|
|
PPC32 and PPC64 are doing the same once SLAB is available.
Create a do_ioremap() function that calls get_vm_area and
do the mapping.
For PPC64, we add the 4K PFN hack sanity check to __ioremap_caller()
in order to avoid using __ioremap_at(). Other checks in __ioremap_at()
are irrelevant for __ioremap_caller().
On PPC64, VM area is allocated in the range [ioremap_bot ; IOREMAP_END]
On PPC32, VM area is allocated in the range [VMALLOC_START ; VMALLOC_END]
Lets define IOREMAP_START is ioremap_bot for PPC64, and alias
IOREMAP_START/END to VMALLOC_START/END on PPC32
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/42e7e36ad32e0fdf76692426cc642799c9f689b8.1566309263.git.christophe.leroy@c-s.fr
|
|
book3s64's ioremap_range() is almost same as fallback ioremap_range(),
except that it calls radix__ioremap_range() when radix is enabled.
radix__ioremap_range() is also very similar to the other ones, expect
that it calls ioremap_page_range when slab is available.
PPC32 __ioremap_caller() have a loop doing the same thing as
ioremap_range() so use it on PPC32 as well.
Lets keep only one version of ioremap_range() which calls
ioremap_page_range() on all platforms when slab is available.
At the same time, drop the nid parameter which is not used.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/4b1dca7096b01823b101be7338983578641547f1.1566309263.git.christophe.leroy@c-s.fr
|
|
Drop multiple definitions of ioremap_bot and make one common to
all subarches.
Only CONFIG_PPC_BOOK3E_64 had a global static init value for
ioremap_bot. Now ioremap_bot is set in early_init_mmu_global().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/920eebfd9f36f14c79d1755847f5bf7c83703bdd.1566309262.git.christophe.leroy@c-s.fr
|