Age | Commit message (Collapse) | Author | Files | Lines |
|
H_GET_CPU_CHARACTERISTICS
This allows the hypervisor / firmware to describe these workarounds to
the guest.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This prevents spurious rebuilds of the lds and then wrappers.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
POWER9 and POWER10 asynchronous machine checks due to stores have their
cause reported in SRR1 but SRR1[42] is set, which in other cases
indicates DSISR cause.
Check for these cases and clear SRR1[42], so the cause matching uses
the i-side (SRR1) table.
Fixes: 7b9f71f974a1 ("powerpc/64s: POWER9 machine check handler")
Fixes: 201220bb0e8c ("powerpc/powernv: Machine check handler for POWER10")
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The POWER9 vCPU TLB management code assumes all threads in a core share
a TLB, and that TLBIEL execued by one thread will invalidate TLBs for
all threads. This is not the case for SMT8 capable POWER9 and POWER10
(big core) processors, where the TLB is split between groups of threads.
This results in TLB multi-hits, random data corruption, etc.
Fix this by introducing cpu_first_tlb_thread_sibling etc., to determine
which siblings share TLBs, and use that in the guest TLB flushing code.
[[email protected]: add changelog and comment]
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
NX generates an interrupt when sees a fault on the user space
buffer and the hypervisor forwards that interrupt to OS. Then
the kernel handles the interrupt by issuing H_GET_NX_FAULT hcall
to retrieve the fault CRB information.
This patch also adds changes to setup and free IRQ per each
window and also handles the fault by updating the CSB.
Signed-off-by: Haren Myneni <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This patch adds VAS window allocatioa/close with the corresponding
hcalls. Also changes to integrate with the existing user space VAS
API and provide register/unregister functions to NX pseries driver.
The driver register function is used to create the user space
interface (/dev/crypto/nx-gzip) and unregister to remove this entry.
The user space process opens this device node and makes an ioctl
to allocate VAS window. The close interface is used to deallocate
window.
Signed-off-by: Haren Myneni <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The hypervisor provides VAS capabilities for GZIP default and QoS
features. These capabilities gives information for the specific
features such as total number of credits available in LPAR,
maximum credits allowed per window, maximum credits allowed in
LPAR, whether usermode copy/paste is supported, and etc.
This patch adds the following:
- Retrieve all features that are provided by hypervisor using
H_QUERY_VAS_CAPABILITIES hcall with 0 as feature type.
- Retrieve capabilities for the specific feature using the same
hcall and the feature type (1 for QoS and 2 for default type).
Signed-off-by: Haren Myneni <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This patch adds the following hcall wrapper functions to allocate,
modify and deallocate VAS windows, and retrieve VAS capabilities.
H_ALLOCATE_VAS_WINDOW: Allocate VAS window
H_DEALLOCATE_VAS_WINDOW: Close VAS window
H_MODIFY_VAS_WINDOW: Setup window before using
H_QUERY_VAS_CAPABILITIES: Get VAS capabilities
Signed-off-by: Haren Myneni <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
PowerVM introduces two different type of credits: Default and Quality
of service (QoS).
The total number of default credits available on each LPAR depends
on CPU resources configured. But these credits can be shared or
over-committed across LPARs in shared mode which can result in
paste command failure (RMA_busy). To avoid NX HW contention, the
hypervisor ntroduces QoS credit type which makes sure guaranteed
access to NX esources. The system admins can assign QoS credits
or each LPAR via HMC.
Default credit type is used to allocate a VAS window by default as
on PowerVM implementation. But the process can pass
VAS_TX_WIN_FLAG_QOS_CREDIT flag with VAS_TX_WIN_OPEN ioctl to open
QoS type window.
Signed-off-by: Haren Myneni <[email protected]>
Acked-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This patch adds hcalls and other definitions. Also define structs
that are used in VAS implementation on PowerVM.
Signed-off-by: Haren Myneni <[email protected]>
Acked-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Many elements in vas_struct are used on PowerNV and PowerVM
platforms. vas_window is used for both TX and RX windows on
PowerNV and for TX windows on PowerVM. So some elements are
specific to these platforms.
So this patch defines common vas_window and platform
specific window structs (pnv_vas_window on PowerNV). Also adds
the corresponding changes in PowerNV vas code.
Signed-off-by: Haren Myneni <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
If a coprocessor encounters an error translating an address, the
VAS will cause an interrupt in the host. The kernel processes
the fault by updating CSB. This functionality is same for both
powerNV and pseries. So this patch moves these functions to
common vas-api.c and the actual functionality is not changed.
Signed-off-by: Haren Myneni <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Take pid and mm references when each window opens and drops during
close. This functionality is needed for powerNV and pseries. So
this patch defines the existing code as functions in common book3s
platform vas-api.c
Signed-off-by: Haren Myneni <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
PowerNV uses registers to open/close VAS windows, and getting the
paste address. Whereas the hypervisor calls are used on PowerVM.
This patch adds the platform specific user space window operations
and register with the common VAS user space interface.
Signed-off-by: Haren Myneni <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
powerNV and pseries drivers register / unregister to the corresponding
platform specific VAS separately. Then these VAS functions call the
common API with the specific window operations. So rename powerNV VAS
API register/unregister functions.
Signed-off-by: Haren Myneni <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The pseries platform will share vas and nx code and interfaces
with the PowerNV platform, so create the
arch/powerpc/platforms/book3s/ directory and move VAS API code
there. Functionality is not changed.
Signed-off-by: Haren Myneni <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The kernel handles the NX fault by updating CSB or sending
signal to process. In multithread applications, children can
open VAS windows and can exit without closing them. But the
parent can continue to send NX requests with these windows. To
prevent pid reuse, reference will be taken on pid and tgid
when the window is opened and release them during window close.
The current code is not releasing the tgid reference which can
cause pid leak and this patch fixes the issue.
Fixes: db1c08a740635 ("powerpc/vas: Take reference to PID and mm for user space windows")
Cc: [email protected] # 5.8+
Reported-by: Nicholas Piggin <[email protected]>
Signed-off-by: Haren Myneni <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix initrd corruption caused by our recent change to use relative jump
labels.
Fix a crash using perf record on systems without a hardware PMU
backend.
Rework our 64-bit signal handling slighty to make it more closely
match the old behaviour, after the recent change to use unsafe user
accessors.
Thanks to Anastasia Kovaleva, Athira Rajeev, Christophe Leroy, Daniel
Axtens, Greg Kurz, and Roman Bolshakov"
* tag 'powerpc-5.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf: Fix crash in perf_instruction_pointer() when ppmu is not set
powerpc: Fix initrd corruption with relative jump labels
powerpc/signal64: Copy siginfo before changing regs->nip
powerpc/mem: Add back missing header to fix 'no previous prototype' error
|
|
Change the type and name of task_struct::state. Drop the volatile and
shrink it to an 'unsigned int'. Rename it in order to find all uses
such that we can use READ_ONCE/WRITE_ONCE as appropriate.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Daniel Bristot de Oliveira <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Daniel Thompson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Acked-by: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This commit in sched/urgent moved the cfs_rq_is_decayed() function:
a7b359fc6a37: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
and this fresh commit in sched/core modified it in the old location:
9e077b52d86a: ("sched/pelt: Check that *_avg are null when *_sum are")
Merge the two variants.
Conflicts:
kernel/sched/fair.c
Signed-off-by: Ingo Molnar <[email protected]>
|
|
On systems without any specific PMU driver support registered, running
perf record causes Oops.
The relevant portion from call trace:
BUG: Kernel NULL pointer dereference on read at 0x00000040
Faulting instruction address: 0xc0021f0c
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K PREEMPT CMPCPRO
SAF3000 DIE NOTIFICATION
CPU: 0 PID: 442 Comm: null_syscall Not tainted 5.13.0-rc6-s3k-dev-01645-g7649ee3d2957 #5164
NIP: c0021f0c LR: c00e8ad8 CTR: c00d8a5c
NIP perf_instruction_pointer+0x10/0x60
LR perf_prepare_sample+0x344/0x674
Call Trace:
perf_prepare_sample+0x7c/0x674 (unreliable)
perf_event_output_forward+0x3c/0x94
__perf_event_overflow+0x74/0x14c
perf_swevent_hrtimer+0xf8/0x170
__hrtimer_run_queues.constprop.0+0x160/0x318
hrtimer_interrupt+0x148/0x3b0
timer_interrupt+0xc4/0x22c
Decrementer_virt+0xb8/0xbc
During perf record session, perf_instruction_pointer() is called to
capture the sample IP. This function in core-book3s accesses
ppmu->flags. If a platform specific PMU driver is not registered, ppmu
is set to NULL and accessing its members results in a crash. Fix this
crash by checking if ppmu is set.
Fixes: 2ca13a4cc56c ("powerpc/perf: Use regs->nip when SIAR is zero")
Cc: [email protected] # v5.11+
Reported-by: Christophe Leroy <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Make them the same type as vCPU stats. There is no reason
to limit the counters to unsigned long.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Merge some powerpc KVM patches from our topic branch.
In particular this brings in Nick's big series rewriting parts of the
guest entry/exit path in C.
Conflicts:
arch/powerpc/kernel/security.c
arch/powerpc/kvm/book3s_hv_rmhandlers.S
|
|
Update _tlbiel_pid() such that we can avoid build errors like below when
using this function in other places.
arch/powerpc/mm/book3s64/radix_tlb.c: In function ‘__radix__flush_tlb_range_psize’:
arch/powerpc/mm/book3s64/radix_tlb.c:114:2: warning: ‘asm’ operand 3 probably does not match constraints
114 | asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
| ^~~
arch/powerpc/mm/book3s64/radix_tlb.c:114:2: error: impossible constraint in ‘asm’
make[4]: *** [scripts/Makefile.build:271: arch/powerpc/mm/book3s64/radix_tlb.o] Error 1
m
With this fix, we can also drop the __always_inline in __radix_flush_tlb_range_psize
which was added by commit e12d6d7d46a6 ("powerpc/mm/radix: mark __radix__flush_tlb_range_psize() as __always_inline")
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When delivering a signal to a sigaction style handler (SA_SIGINFO), we
pass pointers to the siginfo and ucontext via r4 and r5.
Currently we populate the values in those registers by reading the
pointers out of the sigframe in user memory, even though the values in
user memory were written by the kernel just prior:
unsafe_put_user(&frame->info, &frame->pinfo, badframe_block);
unsafe_put_user(&frame->uc, &frame->puc, badframe_block);
...
if (ksig->ka.sa.sa_flags & SA_SIGINFO) {
err |= get_user(regs->gpr[4], (unsigned long __user *)&frame->pinfo);
err |= get_user(regs->gpr[5], (unsigned long __user *)&frame->puc);
ie. we write &frame->info into frame->pinfo, and then read frame->pinfo
back into r4, and similarly for &frame->uc.
The code has always been like this, since linux-fullhistory commit
d4f2d95eca2c ("Forward port of 2.4 ppc64 signal changes.").
There's no reason for us to read the values back from user memory,
rather than just setting the value in the gpr[4/5] directly. In fact
reading the value back from user memory opens up the possibility of
another user thread changing the values before we read them back.
Although any process doing that would be racing against the kernel
delivering the signal, and would risk corrupting the stack, so that
would be a userspace bug.
Note that this is 64-bit only code, so there's no subtlety with the size
of pointers differing between kernel and user. Also the frame variable
is not modified to point elsewhere during the function.
In the past reading the values back from user memory was not costly, but
now that we have KUAP on some CPUs it is, so we'd rather avoid it for
that reason too.
So change the code to just set the values directly, using the same
values we have written to the sigframe previously in the function.
Note also that this matches what our 32-bit signal code does.
Using a version of will-it-scale's signal1_threads that sets SA_SIGINFO,
this results in a ~4% increase in signals per second on a Power9, from
229,777 to 239,766.
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This implementation uses spin_until_cond in wd_smp_lock including
neither linux/processor.h nor asm/processor.h
This patch includes linux/processor.h here for spin_until_cond usage.
Signed-off-by: Sudeep Holla <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/5e8d2d50f301a346040362028c2ecba40685de9e.1623438544.git.christophe.leroy@csgroup.eu
|
|
linux/processor.h has exactly same defination for spin_until_cond.
Drop the redundant defination in asm/processor.h
Signed-off-by: Sudeep Holla <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/1fff2054e5dfc00329804dbd3f2a91667c9a8aff.1623438544.git.christophe.leroy@csgroup.eu
|
|
PPC_TRANSACTIONAL_MEM is only on book3s/64
SPE is only on booke
PPC_TRANSACTIONAL_MEM selects ALTIVEC and VSX
Therefore, within PPC_TRANSACTIONAL_MEM sections,
ALTIVEC and VSX are always defined while SPE never is.
Remove all SPE code and all #ifdef ALTIVEC and VSX in tm
functions.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/a069a348ee3c2fe3123a5a93695c2b35dc42cb40.1623340691.git.christophe.leroy@csgroup.eu
|
|
book3s/32 and 8xx don't use vmalloc for modules.
Print the modules area at startup as part of the virtual memory layout:
[ 0.000000] Kernel virtual memory layout:
[ 0.000000] * 0xffafc000..0xffffc000 : fixmap
[ 0.000000] * 0xc9000000..0xffafc000 : vmalloc & ioremap
[ 0.000000] * 0xb0000000..0xc0000000 : modules
[ 0.000000] Memory: 118480K/131072K available (7152K kernel code, 2320K rwdata, 1328K rodata, 368K init, 854K bss, 12592K reserved, 0K cma-reserved)
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/98394503e92d6fd6d8f657e0b263b32f21cf2790.1623438478.git.christophe.leroy@csgroup.eu
|
|
Make our stack-walking code KASAN-safe by using __no_sanitize_address.
Generic code, arm64, s390 and x86 all make accesses unchecked for similar
sorts of reasons: when unwinding a stack, we might touch memory that KASAN
has marked as being out-of-bounds. In ppc64 KASAN development, I hit this
sometimes when checking for an exception frame - because we're checking
an arbitrary offset into the stack frame.
See commit 20955746320e ("s390/kasan: avoid false positives during stack
unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
Prevent KASAN false positive warnings") and commit 6e22c8366416
("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer").
Signed-off-by: Daniel Axtens <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
update_power8_hid0() is used only by powernv platform subcore.c
Move it there.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/37f41d74faa0c66f90b373e243e8b1ee37a1f6fa.1623219019.git.christophe.leroy@csgroup.eu
|
|
proc_trap() has never been used, remove it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/827944ea12d470c2f862635f48b5ee6c1520351f.1623217909.git.christophe.leroy@csgroup.eu
|
|
Comment says that __main() is there to make GCC happy.
It's been there since the implementation of ppc arch in Linux 1.3.45.
ppc32 is the only architecture having that. Even ppc64 doesn't have it.
Seems like GCC is still happy without it.
Drop it for good.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/d01028f8166b98584eec536b52f14c5e3f98ff6b.1623172922.git.christophe.leroy@csgroup.eu
|
|
PTE_SIZE means PTE page table size in most placed, whereas
in hash_low.S in means size of one entry in the table.
Rename it PTE_T_SIZE, and define it directly in hash_low.S
instead of going through asm-offsets.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/83a008a9fd6cc3f2bbcb470f592555d260ed7a3d.1623063174.git.christophe.leroy@csgroup.eu
|
|
Don't duplicate swapper_pg_dir[] in each platform's head.S
Define it in mm/pgtable.c
Define MAX_PTRS_PER_PGD because on book3s/64 PTRS_PER_PGD is
not a constant.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/5e3f1b8a4695c33ccc80aa3870e016bef32b85e1.1623063174.git.christophe.leroy@csgroup.eu
|
|
At the time being, empty_zero_page[] is defined in each
platform head.S.
Define it in mm/mem.c instead, and put it in BSS section instead
of the DATA section. Commit 5227cfa71f9e ("arm64: mm: place
empty_zero_page in bss") explains why it is interesting to have
it in BSS.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/5838caffa269e0957c5a50cc85477876220298b0.1623063174.git.christophe.leroy@csgroup.eu
|
|
DEBUG_HARDER is not user selectable.
Remove it together with related messages.
Also remove two pr_devel() messages that should
likely have been pr_hard().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/0f25109b0e12fdd1e6541dedbb2212cc53526a57.1622712515.git.christophe.leroy@csgroup.eu
|
|
DEBUG_CLAMP_LAST_CONTEXT was there in the old days to reduce
number of contexts in order to ease debugging implementation
of context switching, but that's been quite stable during
years now.
As it is not user selectable, remove it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/da81837b452e8b9f1657b529b9c3050dc10b9770.1622712515.git.christophe.leroy@csgroup.eu
|
|
mmu_context handling has been there for years, so we
would know if there was problems with maps.
DEBUG_MAP_CONSISTENCY is not user selectable, remove it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/6fe2b88956db53f8d6ee221525b2c5dc6aec82c6.1622712515.git.christophe.leroy@csgroup.eu
|
|
Everything can be done even when CONFIG_SMP is not selected.
Just use IS_ENABLED() where relevant and rely on GCC to
opt out unneeded code and variables when CONFIG_SMP is not set.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/cc13b87b0f750a538621876ecc24c22a07e7c8be.1622712515.git.christophe.leroy@csgroup.eu
|
|
ppc8xx already has set_context() in C.
Other ones have it in assembly. The only thing it does is to
write the context id into SPRN_PID.
Do it in C.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/a5d0759064f3831c6b88af49ef5d3b05ba1c4dad.1622712515.git.christophe.leroy@csgroup.eu
|
|
Instead of duplicating the update of BDI2000 pointers in
set_context(), do it directly from switch_mmu_context().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/4c54997edd3548fa54717915e7c6ebaf60f208c0.1622712515.git.christophe.leroy@csgroup.eu
|
|
All KUAP helpers defined in asm/kup.h are single line functions
that should be inlined. But on book3s/32 build, we get many
instances of <prevent_write_to_user.constprop.0>.
Force inlining of those helpers.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/8479a862e165a57a855292d47e24c259a578f5a0.1622711627.git.christophe.leroy@csgroup.eu
|
|
prevent_user_access() doesn't use anymore to/from/size parameters.
Remove them.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/b7113662fd2c26e4c33e9d705de324bd3860822e.1622708530.git.christophe.leroy@csgroup.eu
|
|
book3s/32 was the only user of KUAP_CURRENT_XXX.
After rework of book3s/32 KUAP, it is not used anymore.
Remove them.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/549214ecf6887d965645e664520d4886663c5ffb.1622708530.git.christophe.leroy@csgroup.eu
|
|
Now that KUAP and KUEP have been significantly optimised and can be
disabled at boot time using 'nosmap' and 'nosmep' kernel parameters,
them can be active by default like in other powerpc platforms.
It is still possible to disable them completely in the configuration.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/86c7c74a3ba5312daea7e9658b096e2bcc6f4b64.1622708530.git.christophe.leroy@csgroup.eu
|
|
On book3s/32, KUAP is provided by toggling Ks bit in segment registers.
One segment register addresses 256M of virtual memory.
At the time being, KUAP implements a complex logic to apply the
unlock/lock on the exact number of segments covering the user range
to access, with saving the boundaries of the range of segments in
a member of thread struct.
But most if not all user accesses are within a single segment.
Rework KUAP with a different approach:
- Open only one segment, the one corresponding to the starting
address of the range to be accessed.
- If a second segment is involved, it will generate a page fault. The
segment will then be open by the page fault handler.
The kuap member of thread struct will now contain:
- The start address of the current on going user access, that will be
used to know which segment to lock at the end of the user access.
- ~0 when no user access is open
- ~1 when additionnal segments are opened by a page fault.
Then, at lock time
- When only one segment is open, close it.
- When several segments are open, close all user segments.
Almost 100% of the time, only one segment will be involved.
In interrupts, inline the function that unlock/lock all segments,
because not inlining them implies a lot of register save/restore.
With the patch, writing value 128 in userspace in perf_copy_attr() is
done with 16 instructions:
3890: 93 82 04 dc stw r28,1244(r2)
3894: 7d 20 e5 26 mfsrin r9,r28
3898: 55 29 00 80 rlwinm r9,r9,0,2,0
389c: 7d 20 e1 e4 mtsrin r9,r28
38a0: 4c 00 01 2c isync
38a4: 39 20 00 80 li r9,128
38a8: 91 3c 00 00 stw r9,0(r28)
38ac: 81 42 04 dc lwz r10,1244(r2)
38b0: 39 00 ff ff li r8,-1
38b4: 91 02 04 dc stw r8,1244(r2)
38b8: 2c 0a ff fe cmpwi r10,-2
38bc: 41 82 00 88 beq 3944 <perf_copy_attr+0x36c>
38c0: 7d 20 55 26 mfsrin r9,r10
38c4: 65 29 40 00 oris r9,r9,16384
38c8: 7d 20 51 e4 mtsrin r9,r10
38cc: 4c 00 01 2c isync
...
3944: 48 00 00 01 bl 3944 <perf_copy_attr+0x36c>
3944: R_PPC_REL24 kuap_lock_all_ool
Before the patch it was 118 instructions. In reality only 42 are
executed in most cases, but GCC is not able to see that a properly
aligned user access cannot involve more than one segment.
5060: 39 1d 00 04 addi r8,r29,4
5064: 3d 20 b0 00 lis r9,-20480
5068: 7c 08 48 40 cmplw r8,r9
506c: 40 81 00 08 ble 5074 <perf_copy_attr+0x2cc>
5070: 3d 00 b0 00 lis r8,-20480
5074: 39 28 ff ff addi r9,r8,-1
5078: 57 aa 00 06 rlwinm r10,r29,0,0,3
507c: 55 29 27 3e rlwinm r9,r9,4,28,31
5080: 39 29 00 01 addi r9,r9,1
5084: 7d 29 53 78 or r9,r9,r10
5088: 91 22 04 dc stw r9,1244(r2)
508c: 7d 20 ed 26 mfsrin r9,r29
5090: 55 29 00 80 rlwinm r9,r9,0,2,0
5094: 7c 08 50 40 cmplw r8,r10
5098: 40 81 00 c0 ble 5158 <perf_copy_attr+0x3b0>
509c: 7d 46 50 f8 not r6,r10
50a0: 7c c6 42 14 add r6,r6,r8
50a4: 54 c6 27 be rlwinm r6,r6,4,30,31
50a8: 7d 20 51 e4 mtsrin r9,r10
50ac: 3c ea 10 00 addis r7,r10,4096
50b0: 39 29 01 11 addi r9,r9,273
50b4: 7f 88 38 40 cmplw cr7,r8,r7
50b8: 55 29 02 06 rlwinm r9,r9,0,8,3
50bc: 40 9d 00 9c ble cr7,5158 <perf_copy_attr+0x3b0>
50c0: 2f 86 00 00 cmpwi cr7,r6,0
50c4: 41 9e 00 4c beq cr7,5110 <perf_copy_attr+0x368>
50c8: 2f 86 00 01 cmpwi cr7,r6,1
50cc: 41 9e 00 2c beq cr7,50f8 <perf_copy_attr+0x350>
50d0: 2f 86 00 02 cmpwi cr7,r6,2
50d4: 41 9e 00 14 beq cr7,50e8 <perf_copy_attr+0x340>
50d8: 7d 20 39 e4 mtsrin r9,r7
50dc: 39 29 01 11 addi r9,r9,273
50e0: 3c e7 10 00 addis r7,r7,4096
50e4: 55 29 02 06 rlwinm r9,r9,0,8,3
50e8: 7d 20 39 e4 mtsrin r9,r7
50ec: 39 29 01 11 addi r9,r9,273
50f0: 3c e7 10 00 addis r7,r7,4096
50f4: 55 29 02 06 rlwinm r9,r9,0,8,3
50f8: 7d 20 39 e4 mtsrin r9,r7
50fc: 3c e7 10 00 addis r7,r7,4096
5100: 39 29 01 11 addi r9,r9,273
5104: 7f 88 38 40 cmplw cr7,r8,r7
5108: 55 29 02 06 rlwinm r9,r9,0,8,3
510c: 40 9d 00 4c ble cr7,5158 <perf_copy_attr+0x3b0>
5110: 7d 20 39 e4 mtsrin r9,r7
5114: 39 29 01 11 addi r9,r9,273
5118: 3c c7 10 00 addis r6,r7,4096
511c: 55 29 02 06 rlwinm r9,r9,0,8,3
5120: 7d 20 31 e4 mtsrin r9,r6
5124: 39 29 01 11 addi r9,r9,273
5128: 3c c6 10 00 addis r6,r6,4096
512c: 55 29 02 06 rlwinm r9,r9,0,8,3
5130: 7d 20 31 e4 mtsrin r9,r6
5134: 39 29 01 11 addi r9,r9,273
5138: 3c c7 30 00 addis r6,r7,12288
513c: 55 29 02 06 rlwinm r9,r9,0,8,3
5140: 7d 20 31 e4 mtsrin r9,r6
5144: 3c e7 40 00 addis r7,r7,16384
5148: 39 29 01 11 addi r9,r9,273
514c: 7f 88 38 40 cmplw cr7,r8,r7
5150: 55 29 02 06 rlwinm r9,r9,0,8,3
5154: 41 9d ff bc bgt cr7,5110 <perf_copy_attr+0x368>
5158: 4c 00 01 2c isync
515c: 39 20 00 80 li r9,128
5160: 91 3d 00 00 stw r9,0(r29)
5164: 38 e0 00 00 li r7,0
5168: 90 e2 04 dc stw r7,1244(r2)
516c: 7d 20 ed 26 mfsrin r9,r29
5170: 65 29 40 00 oris r9,r9,16384
5174: 40 81 00 c0 ble 5234 <perf_copy_attr+0x48c>
5178: 7d 47 50 f8 not r7,r10
517c: 7c e7 42 14 add r7,r7,r8
5180: 54 e7 27 be rlwinm r7,r7,4,30,31
5184: 7d 20 51 e4 mtsrin r9,r10
5188: 3d 4a 10 00 addis r10,r10,4096
518c: 39 29 01 11 addi r9,r9,273
5190: 7c 08 50 40 cmplw r8,r10
5194: 55 29 02 06 rlwinm r9,r9,0,8,3
5198: 40 81 00 9c ble 5234 <perf_copy_attr+0x48c>
519c: 2c 07 00 00 cmpwi r7,0
51a0: 41 82 00 4c beq 51ec <perf_copy_attr+0x444>
51a4: 2c 07 00 01 cmpwi r7,1
51a8: 41 82 00 2c beq 51d4 <perf_copy_attr+0x42c>
51ac: 2c 07 00 02 cmpwi r7,2
51b0: 41 82 00 14 beq 51c4 <perf_copy_attr+0x41c>
51b4: 7d 20 51 e4 mtsrin r9,r10
51b8: 39 29 01 11 addi r9,r9,273
51bc: 3d 4a 10 00 addis r10,r10,4096
51c0: 55 29 02 06 rlwinm r9,r9,0,8,3
51c4: 7d 20 51 e4 mtsrin r9,r10
51c8: 39 29 01 11 addi r9,r9,273
51cc: 3d 4a 10 00 addis r10,r10,4096
51d0: 55 29 02 06 rlwinm r9,r9,0,8,3
51d4: 7d 20 51 e4 mtsrin r9,r10
51d8: 3d 4a 10 00 addis r10,r10,4096
51dc: 39 29 01 11 addi r9,r9,273
51e0: 7c 08 50 40 cmplw r8,r10
51e4: 55 29 02 06 rlwinm r9,r9,0,8,3
51e8: 40 81 00 4c ble 5234 <perf_copy_attr+0x48c>
51ec: 7d 20 51 e4 mtsrin r9,r10
51f0: 39 29 01 11 addi r9,r9,273
51f4: 3c ea 10 00 addis r7,r10,4096
51f8: 55 29 02 06 rlwinm r9,r9,0,8,3
51fc: 7d 20 39 e4 mtsrin r9,r7
5200: 39 29 01 11 addi r9,r9,273
5204: 3c e7 10 00 addis r7,r7,4096
5208: 55 29 02 06 rlwinm r9,r9,0,8,3
520c: 7d 20 39 e4 mtsrin r9,r7
5210: 39 29 01 11 addi r9,r9,273
5214: 3c ea 30 00 addis r7,r10,12288
5218: 55 29 02 06 rlwinm r9,r9,0,8,3
521c: 7d 20 39 e4 mtsrin r9,r7
5220: 3d 4a 40 00 addis r10,r10,16384
5224: 39 29 01 11 addi r9,r9,273
5228: 7c 08 50 40 cmplw r8,r10
522c: 55 29 02 06 rlwinm r9,r9,0,8,3
5230: 41 81 ff bc bgt 51ec <perf_copy_attr+0x444>
5234: 4c 00 01 2c isync
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Export the ool handlers to fix build errors]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/d9121f96a7c4302946839a0771f5d1daeeb6968c.1622708530.git.christophe.leroy@csgroup.eu
|
|
PPC64 uses MMU features to enable/disable KUAP at boot time.
But feature fixups are applied way too early on PPC32.
Now that all KUAP related actions are in C following the
conversion of KUAP initial setup and context switch in C,
static branches can be used to enable/disable KUAP.
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Export disable_kuap_key to fix build errors]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/cd79e8008455fba5395d099f9bb1305c039b931c.1622708530.git.christophe.leroy@csgroup.eu
|
|
PPC64 uses MMU features to enable/disable KUEP at boot time.
But feature fixups are applied way too early on PPC32.
Now that all KUEP related actions are in C following the
conversion of KUEP initial setup and context switch in C,
static branches can be used to enable/disable KUEP.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/7745a2c3a08ec46302920a3f48d1cb9b5469dbbb.1622708530.git.christophe.leroy@csgroup.eu
|