Age | Commit message (Collapse) | Author | Files | Lines |
|
module_trampoline_target()
module_trampoline_target() is quite a hot path used when
activating/deactivating function tracer.
Avoid the heavy copy_from_kernel_nofault() by doing four calls
to copy_inst_from_kernel_nofault().
Use __copy_inst_from_kernel_nofault() for the 3 last calls. First call
is done to copy_from_kernel_nofault() to check address is within
kernel space. No risk to wrap out the top of kernel space because the
last page is never mapped so if address is in last page the first copy
will fails and the other ones will never be performed.
And also make it notrace just like all functions that call it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/c55559103e014b7863161559d340e8e9484eaaa6.1652074503.git.christophe.leroy@csgroup.eu
|
|
On the same model as get_user() versus __get_user(),
introduce __copy_inst_from_kernel_nofault() which doesn't
check address.
To be used by callers that have already checked that the adress
is a kernel address.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/1f3702890d6dbd64702b61834753bcc96851c18c.1652074503.git.christophe.leroy@csgroup.eu
|
|
A lot of #ifdefs can be replaced by IS_ENABLED()
Do so.
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Fold in changes suggested by Naveen and Christophe on list]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/18ce6708d6f8c71d87436f9c6019f04df4125128.1652074503.git.christophe.leroy@csgroup.eu
|
|
Avoid ifdefs around expected_nop_sequence().
While at it make it a bool.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/305d22472f1f92127fba09692df6bb5d079a8cd0.1652074503.git.christophe.leroy@csgroup.eu
|
|
0x80000000 is SZ_2G. Use it.
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Fix comparison against unsigned -SZ_2G]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/bb6626e884acffe87b58736291df57db3deaa9b9.1652074503.git.christophe.leroy@csgroup.eu
|
|
PPC_RAW_xxx() macros are self explanatory and less error prone
than open coding.
Use them in ftrace.c
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/9292094c9a69cef6d29ee83f435a557b59c45065.1652074503.git.christophe.leroy@csgroup.eu
|
|
To make it explicit, use BRANCH_SET_LINK instead of value 1
when calling create_branch().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/d57847063ac93660a5af620d4df1847f10edf61a.1652074503.git.christophe.leroy@csgroup.eu
|
|
ftrace_plt_tramps table is never filled so it is useless.
Remove it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/daeeb618a6619e3a7e3f82f1bd83ca7c25af6330.1652074503.git.christophe.leroy@csgroup.eu
|
|
Since commit 0c0c52306f47 ("powerpc: Only support DYNAMIC_FTRACE not
static"), CONFIG_DYNAMIC_FTRACE is always selected when
CONFIG_FUNCTION_TRACER is selected.
To avoid confusion and have the reader wonder what's happen when
CONFIG_FUNCTION_TRACER is selected and CONFIG_DYNAMIC_FTRACE is not,
use CONFIG_FUNCTION_TRACER in ifdefs instead of CONFIG_DYNAMIC_FTRACE.
As CONFIG_FUNCTION_GRAPH_TRACER depends on CONFIG_FUNCTION_TRACER,
ftrace.o doesn't need to appear for both symbols in Makefile.
Then as ftrace.o is built only when CONFIG_FUNCTION_TRACER is selected
ifdef CONFIG_FUNCTION_TRACER is not needed in ftrace.c, and since it
implies CONFIG_DYNAMIC_FTRACE, CONFIG_DYNAMIC_FTRACE is not needed
in ftrace.c
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/628d357503eb90b4a034f99b7df516caaff4d279.1652074503.git.christophe.leroy@csgroup.eu
|
|
Since commit 7bea7ac0ca01 ("powerpc/syscalls: Fix syscall tracing")
ftrace.o is not needed anymore for CONFIG_FTRACE_SYSCALLS.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/275932a5d61543b825ff9a64f61abed6da5d4a2a.1652074503.git.christophe.leroy@csgroup.eu
|
|
Since c93d4f6ecf4b ("powerpc/ftrace: Add module_trampoline_target()
for PPC32"), __ftrace_make_nop() for PPC32 is very similar to the
one for PPC64.
Same for __ftrace_make_call().
Make them common.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/96f53c237316dab4b1b8c682685266faa92da816.1652074503.git.christophe.leroy@csgroup.eu
|
|
Now that we have CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2,
get rid of all indirect detection of ABI version.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/709d9d69523c14c8a9fba4486395dca0f2d675b1.1652074503.git.christophe.leroy@csgroup.eu
|
|
Replace all uses of PPC64_ELF_ABI_v1 and PPC64_ELF_ABI_v2 by
resp CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/ba13d59e8c50bc9aa6328f1c7f0c0d0278e0a3a7.1652074503.git.christophe.leroy@csgroup.eu
|
|
At the time being, we use CONFIG_CPU_LITTLE_ENDIAN and
CONFIG_CPU_BIG_ENDIAN to pass -mabi=elfv1 or elfv2 to
compiler, then define a PPC64_ELF_ABI_v1 or PPC64_ELF_ABI_v2
macro in asm/types.h based on _CALL_ELF define set by the compiler.
Make it more straight forward with a CONFIG option that
is directly usable.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/1eca1addbc550167da9841c7340a010d0c4b2200.1652074503.git.christophe.leroy@csgroup.eu
|
|
Instead of returning -EPERM when patch_instruction() fails,
just return what patch_instruction returns.
That simplifies ftrace_modify_code():
0: 94 21 ff c0 stwu r1,-64(r1)
4: 93 e1 00 3c stw r31,60(r1)
8: 7c 7f 1b 79 mr. r31,r3
c: 40 80 00 30 bge 3c <ftrace_modify_code+0x3c>
10: 93 c1 00 38 stw r30,56(r1)
14: 7c 9e 23 78 mr r30,r4
18: 7c a4 2b 78 mr r4,r5
1c: 80 bf 00 00 lwz r5,0(r31)
20: 7c 1e 28 40 cmplw r30,r5
24: 40 82 00 34 bne 58 <ftrace_modify_code+0x58>
28: 83 c1 00 38 lwz r30,56(r1)
2c: 7f e3 fb 78 mr r3,r31
30: 83 e1 00 3c lwz r31,60(r1)
34: 38 21 00 40 addi r1,r1,64
38: 48 00 00 00 b 38 <ftrace_modify_code+0x38>
38: R_PPC_REL24 patch_instruction
Before:
0: 94 21 ff c0 stwu r1,-64(r1)
4: 93 e1 00 3c stw r31,60(r1)
8: 7c 7f 1b 79 mr. r31,r3
c: 40 80 00 4c bge 58 <ftrace_modify_code+0x58>
10: 93 c1 00 38 stw r30,56(r1)
14: 7c 9e 23 78 mr r30,r4
18: 7c a4 2b 78 mr r4,r5
1c: 80 bf 00 00 lwz r5,0(r31)
20: 7c 08 02 a6 mflr r0
24: 90 01 00 44 stw r0,68(r1)
28: 7c 1e 28 40 cmplw r30,r5
2c: 40 82 00 48 bne 74 <ftrace_modify_code+0x74>
30: 7f e3 fb 78 mr r3,r31
34: 48 00 00 01 bl 34 <ftrace_modify_code+0x34>
34: R_PPC_REL24 patch_instruction
38: 80 01 00 44 lwz r0,68(r1)
3c: 20 63 00 00 subfic r3,r3,0
40: 83 c1 00 38 lwz r30,56(r1)
44: 7c 63 19 10 subfe r3,r3,r3
48: 7c 08 03 a6 mtlr r0
4c: 83 e1 00 3c lwz r31,60(r1)
50: 38 21 00 40 addi r1,r1,64
54: 4e 80 00 20 blr
It improves ftrace activation/deactivation duration by about 3%.
Modify patch_instruction() return on failure to -EPERM in order to
match with ftrace expectations. Other users of patch_instruction()
do not care about the exact error value returned.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/49a8597230713e2633e7d9d7b56140787c4a7e20.1652074503.git.christophe.leroy@csgroup.eu
|
|
Inlining ftrace_modify_code(), it increases a bit the
size of ftrace code but brings 5% improvment on ftrace
activation.
Usually in C files we let gcc decide what to do but here
it really help to 'help' gcc to decide to inline, thought
we don't want to force it with an __always_inline that
would be too much for CONFIG_CC_OPTIMIZE_FOR_SIZE.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/1597a06d57cfc80e6853838c4066e799bf6c7977.1652074503.git.christophe.leroy@csgroup.eu
|
|
create_branch() is a good candidate for inlining because:
- Flags can be folded in.
- Range tests are likely to be already done.
Hence reducing the create_branch() to only a set of instructions.
So inline it.
It improves ftrace activation by 10%.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/69851cc9a7bf8f03d025e6d29e165f2d0bd3bb6e.1652074503.git.christophe.leroy@csgroup.eu
|
|
Use is_offset_in_branch_range() instead of create_branch()
to check if a target is within branch range.
This patch together with the previous one improves
ftrace activation time by 7%
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/912ae51782f5a53c44e435497c8c3fb5cc632387.1652074503.git.christophe.leroy@csgroup.eu
|
|
Test in is_offset_in_branch_range() and is_offset_in_cond_branch_range()
are simple tests that are worth inlining.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/a05be0ccb7373e6a9789a1988fcd0c810f5f9269.1652074503.git.christophe.leroy@csgroup.eu
|
|
Since commit d5937db114e4 ("powerpc/code-patching: Fix patch_branch()
return on out-of-range failure") patch_branch() fails with -ERANGE
when trying to branch out of range.
No need to perform the test twice. Remove redundant create_branch()
calls.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/aa45fbad0b4b7493080835d8276c0cb4ce146503.1652074503.git.christophe.leroy@csgroup.eu
|
|
When we have CONFIG_DYNAMIC_FTRACE_WITH_ARGS,
prepare_ftrace_return() is called by ftrace_graph_func()
otherwise prepare_ftrace_return() is called from assembly.
Refactor prepare_ftrace_return() into a static
__prepare_ftrace_return() that will be called by both
prepare_ftrace_return() and ftrace_graph_func().
It will allow GCC to fold __prepare_ftrace_return() inside
ftrace_graph_func().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/0d42deafe353980c66cf19d3132638c05ba9f4a9.1652074503.git.christophe.leroy@csgroup.eu
|
|
rtas_call must not be called with the MMU disabled because in case
of rtas error, log_error is called which requires MMU enabled. Add
a test and warning for this.
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Laurent Dufour <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
PAPR specifies that RTAS may be called with MSR[RI] enabled if the
calling context is recoverable, and RTAS will manage RI as necessary.
Call the rtas entry point with RI enabled, and add a check to ensure
the caller has RI enabled.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
On 64-bit, PACA is saved in a SPRG so it does not need to be saved on
stack. We also don't need to mask off the top bits for real mode
addresses because the architecture does this for us.
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Laurent Dufour <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Disable MSR[EE] in C code rather than asm.
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Laurent Dufour <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The code was moved verbatim including whitespace cruft. Fix that.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This symbol is marked nokprobe on 32-bit but not 64-bit, add it.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This makes working on the code a bit easier.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Implement the AT_MINSIGSTKSZ AUXV entry, allowing userspace to
dynamically size stack allocations in a manner forward-compatible with
new processor state saved in the signal frame
For now these statically find the maximum signal frame size rather than
doing any runtime testing of features to minimise the size.
glibc 2.34 will take advantage of this, as will applications that use
use _SC_MINSIGSTKSZ and _SC_SIGSTKSZ.
Reviewed-by: Tulio Magno Quites Machado Filho <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
References: 94b07c1f8c39 ("arm64: signal: Report signal frame size to userspace via auxv")
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The sad tale of SIGSTKSZ and MINSIGSTKSZ is documented in glibc.git
commit f7c399cff5bd ("PowerPC SIGSTKSZ"), which explains why glibc
does not use the kernel defines for these constants.
Since then in fact there has been a further expansion of the signal
stack frame size on little-endian with linux commit
573ebfa6601f ("powerpc: Increase stack redzone for 64-bit userspace to
512 bytes"), which has caused it to exceed even the glibc defines.
See kernel commit 63dee5df43a3 ("powerpc: Allow 4224 bytes of stack
expansion for the signal frame") for more details of the history of the
expansion.
Increase MINSIGSTKSZ to 8192 which is double the current glibc value and
fits the current stack frame with room to grow. SIGSTKSZ is set to 4x
the minimum as convention.
glibc will have to be updated as well.
Reviewed-by: Tulio Magno Quites Machado Filho <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The PowerPC vDSO uses $(CC) to link, which differs from the rest of the
kernel, which uses $(LD) directly. As a result, the default linker of
the compiler is used, which may differ from the linker requested by the
builder. For example:
$ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/
...
$ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg
File: arch/powerpc/kernel/vdso/vdso32.so.dbg
String dump of section '.comment':
[ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
File: arch/powerpc/kernel/vdso/vdso64.so.dbg
String dump of section '.comment':
[ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
LLVM=1 sets LD=ld.lld but ld.lld is not used to link the vDSO; GNU ld is
because "ld" is the default linker for clang on most Linux platforms.
This is a problem for Clang's Link Time Optimization as implemented in
the kernel because use of GNU ld with LTO requires the LLVMgold plugin,
which is not technically supported for ld.bfd per
https://llvm.org/docs/GoldPlugin.html. Furthermore, if LLVMgold.so is
missing from a user's system, the build will fail, even though LTO as it
is implemented in the kernel requires ld.lld to avoid this dependency in
the first place.
Ultimately, the PowerPC vDSO should be converted to compiling and
linking with $(CC) and $(LD) respectively but there were issues last
time this was tried, potentially due to older but supported tool
versions. To avoid regressing GCC + binutils, use the compiler option
'-fuse-ld', which tells the compiler which linker to use when it is
invoked as both the compiler and linker. Use '-fuse-ld=lld' when
LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is
linked with the same linker as the rest of the kernel.
$ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg
File: arch/powerpc/kernel/vdso/vdso32.so.dbg
String dump of section '.comment':
[ 0] Linker: LLD 14.0.0
[ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
File: arch/powerpc/kernel/vdso/vdso64.so.dbg
String dump of section '.comment':
[ 0] Linker: LLD 14.0.0
[ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
LD can be a full path to ld.lld, which will not be handled properly by
'-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's
search path. '-fuse-ld' can take a path to the linker but it is
deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario.
Use '--ld-path' if it is supported, as it will handle a full path or
just 'ld.lld' properly. See the LLVM commit below for the full details
of '--ld-path'.
Signed-off-by: Nathan Chancellor <[email protected]>
Tested-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://github.com/ClangBuiltLinux/linux/issues/774
Link: https://github.com/llvm/llvm-project/commit/1bc5c84710a8c73ef21295e63c19d10a8c71f2f5
Link: https://lore.kernel.org/r/[email protected]
|
|
When linking vdso{32,64}.so.dbg with ld.lld, there is a warning about
not finding _start for the starting address:
ld.lld: warning: cannot find entry symbol _start; not setting start address
ld.lld: warning: cannot find entry symbol _start; not setting start address
Looking at GCC + GNU ld, the entry point address is 0x0:
$ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
File: vdso32.so.dbg
Entry point address: 0x0
File: vdso64.so.dbg
Entry point address: 0x0
This matches what ld.lld emits:
$ powerpc64le-linux-gnu-readelf -p .comment vdso{32,64}.so.dbg
File: vdso32.so.dbg
String dump of section '.comment':
[ 0] Linker: LLD 14.0.0
[ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
File: vdso64.so.dbg
String dump of section '.comment':
[ 0] Linker: LLD 14.0.0
[ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
$ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
File: vdso32.so.dbg
Entry point address: 0x0
File: vdso64.so.dbg
Entry point address: 0x0
Remove ENTRY to remove the warning, as it is unnecessary for the vDSO to
function correctly.
Signed-off-by: Nathan Chancellor <[email protected]>
Tested-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When the mmu_feature_keys[] was introduced in the commit c12e6f24d413
("powerpc: Add option to use jump label for mmu_has_feature()"),
it is unlikely that it would be used either directly or indirectly in
the out of tree modules. So we exported it as GPL only.
But with the evolution of the codes, especially the PPC_KUAP support, it
may be indirectly referenced by some primitive macro or inline functions
such as get_user() or __copy_from_user_inatomic(), this will make it
impossible to build many non GPL modules (such as ZFS) on ppc
architecture. Fix this by exposing the mmu_feature_keys[] to the non-GPL
modules too.
Fixes: 7613f5a66bec ("powerpc/64s/kuap: Use mmu_has_feature()")
Reported-by: Nathaniel Filardo <[email protected]>
Signed-off-by: Kevin Hao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The panic notifiers infrastructure is a bit limited in the scope of
the callbacks - basically every kind of functionality is dropped
in a list that runs in the same point during the kernel panic path.
This is not really on par with the complexities and particularities
of architecture / hypervisors' needs, and a refactor is ongoing.
As part of this refactor, it was observed that powerpc has 2 notifiers,
with mixed goals: one is just a KASLR offset dumper, whereas the other
aims to hard-disable IRQs (necessary on panic path), warn firmware of
the panic event (fadump) and run low-level platform-specific machinery
that might stop kernel execution and never come back.
Clearly, the 2nd notifier has opposed goals: disable IRQs / fadump
should run earlier while low-level platform actions should
run late since it might not even return. Hence, this patch decouples
the notifiers splitting them in three:
- First one is responsible for hard-disable IRQs and fadump,
should run early;
- The kernel KASLR offset dumper is really an informative notifier,
harmless and may run at any moment in the panic path;
- The last notifier should run last, since it aims to perform
low-level actions for specific platforms, and might never return.
It is also only registered for 2 platforms, pseries and ps3.
The patch better documents the notifiers and clears the code too,
also removing a useless header.
Currently no functionality change should be observed, but after
the planned panic refactor we should expect more panic reliability
with this patch.
Signed-off-by: Guilherme G. Piccoli <[email protected]>
Reviewed-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Merge our KVM topic branch.
|
|
We removed most of the vcore logic from the P9 path but there's still
a tracepoint that tried to dereference vc->runner.
Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
Signed-off-by: Fabiano Rosas <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Currently we have 2 sets of interrupt controller hypercalls handlers
for real and virtual modes, this is from POWER8 times when switching
MMU on was considered an expensive operation.
POWER9 however does not have dependent threads and MMU is enabled for
handling hcalls so the XIVE native or XICS-on-XIVE real mode handlers
never execute on real P9 and later CPUs.
This untemplate the handlers and only keeps the real mode handlers for
XICS native (up to POWER8) and remove the rest of dead code. Changes
in functions are mechanical except few missing empty lines to make
checkpatch.pl happy.
The default implemented hcalls list already contains XICS hcalls so
no change there.
This should not cause any behavioral change.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Acked-by: Cédric Le Goater <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When KVM_CAP_PPC_ENABLE_HCALL was introduced, H_GET_TCE and H_PUT_TCE
were already implemented and enabled by default; however H_GET_TCE
was missed out on PR KVM (probably because the handler was in
the real mode code at the time).
This enables H_GET_TCE by default. While at this, this wraps
the checks in ifdef CONFIG_SPAPR_TCE_IOMMU just like HV KVM.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
LoPAPR defines guest visible IOMMU with hypercalls to use it -
H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap
in the KVM in the real mode (with MMU off). The problem with the real mode
is some memory is not available and some API usage crashed the host but
enabling MMU was an expensive operation.
The problems with the real mode handlers are:
1. Occasionally these cannot complete the request so the code is
copied+modified to work in the virtual mode, very little is shared;
2. The real mode handlers have to be linked into vmlinux to work;
3. An exception in real mode immediately reboots the machine.
If the small DMA window is used, the real mode handlers bring better
performance. However since POWER8, there has always been a bigger DMA
window which VMs use to map the entire VM memory to avoid calling
H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT
(a bulk version of H_PUT_TCE) which virtual mode handler is even closer
to its real mode version.
On POWER9 hypercalls trap straight to the virtual mode so the real mode
handlers never execute on POWER9 and later CPUs.
So with the current use of the DMA windows and MMU improvements in
POWER9 and later, there is no point in duplicating the code.
The 32bit passed through devices may slow down but we do not have many
of these in practice. For example, with this applied, a 1Gbit ethernet
adapter still demostrates above 800Mbit/s of actual throughput.
This removes the real mode handlers from KVM and related code from
the powernv platform.
This updates the list of implemented hcalls in KVM-HV as the realmode
handlers are removed.
This changes ABI - kvmppc_h_get_tce() moves to the KVM module and
kvmppc_find_table() is static now.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Merge our fixes branch. In parciular this brings in the KVM TCE handling
fix, which is a prerequisite for a subsequent patch.
|
|
The hypervisor always sets AMOR to ~0, but let's ensure we're not
passing stale values around.
Signed-off-by: Fabiano Rosas <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Merge our fixes branch from this cycle. In particular this brings in a
papr_scm.c change which a subsequent patch has a dependency on.
|
|
The return value type defined in the function kvm_age_rmapp() is
"bool", but the return value type defined in the implementation of the
function kvm_age_rmapp() is "int".
Change the return value type to "bool".
Signed-off-by: Bo Liu <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The bug is here:
if (!p)
return ret;
The list iterator value 'p' will *always* be set and non-NULL by
list_for_each_entry(), so it is incorrect to assume that the iterator
value will be NULL if the list is empty or no element is found.
To fix the bug, Use a new value 'iter' as the list iterator, while use
the old value 'p' as a dedicated variable to point to the found element.
Fixes: dfaa973ae960 ("KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs")
Cc: [email protected] # v5.9+
Signed-off-by: Xiaomeng Tong <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
comment
kernel test robot reported kernel-doc warning for rm_host_ipi_action():
arch/powerpc/kvm/book3s_hv_rm_xics.c:887: warning: This comment starts with '/**', but isn't a kernel-doc comment.
* Host Operations poked by RM KVM
Since the function is static, remove the extraneous (second) asterisk at
the head of function comment.
Fixes: 0c2a66062470cd ("KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Bagas Sanjaya <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/linux-doc/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
|
|
The L1 should not be able to adjust LPES mode for the L2. Setting LPES
if the L0 needs it clear would cause external interrupts to be sent to
L2 and missed by the L0.
Clearing LPES when it may be set, as typically happens with XIVE enabled
could cause a performance issue despite having no native XIVE support in
the guest, because it will cause mediated interrupts for the L2 to be
taken in HV mode, which then have to be injected.
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The PowerNV L0 currently pushes the OS xive context when running a vCPU,
regardless of whether it is running a nested guest. The problem is that
xive OS ring interrupts will be delivered while the L2 is running.
At the moment, by default, the L2 guest runs with LPCR[LPES]=0, which
actually makes external interrupts go to the L0. That causes the L2 to
exit and the interrupt taken or injected into the L1, so in some
respects this behaves like an escalation. It's not clear if this was
deliberate or not, there's no comment about it and the L1 is actually
allowed to clear LPES in the L2, so it's confusing at best.
When the L2 is running, the L1 is essentially in a ceded state with
respect to external interrupts (it can't respond to them directly and
won't get scheduled again absent some additional event). So the natural
way to solve this is when the L0 handles a H_ENTER_NESTED hypercall to
run the L2, have it arm the escalation interrupt and don't push the L1
context while running the L2.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The differences between nested and !nested will become larger in
later changes so split them out for readability.
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Move the cede abort logic out of xive escalation rearming and into
the caller to prepare for handling a similar case with nested guest
entry.
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
If there is a pending xive interrupt, inject it at guest entry (if
MSR[EE] is enabled) rather than take another interrupt when the guest
is entered. If xive is enabled then LPCR[LPES] is set so this behaviour
should be expected.
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|