|
When a passthrough IO adapter is removed from a pseries machine using
hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
guest OS to clear all page table entries related to the adapter. If
some are still present, the RTAS call which isolates the PCI slot
returns error 9001 "valid outstanding translations" and the removal of
the IO adapter fails. This is because when the PHBs are scanned, Linux
automatically maps the INTx interrupts in the Linux interrupt number
space, but these mappings are never removed.
To solve this problem, we introduce a PPC platform specific
pcibios_remove_bus() routine which clears all interrupt mappings when
the bus is removed. This also clears the associated page table entries
of the ESB pages when using XIVE.
For this purpose, we record the logical interrupt numbers of the
mapped interrupt under the PHB structure and let pcibios_remove_bus()
do the clean up.
Since some PCI adapters, like GPUs, use the "interrupt-map" property
to describe interrupt mappings other than the legacy INTx interrupts,
we cannot restrict the size of the mapping array to PCI_NUM_INTX. The
number of interrupt mappings is computed from the "interrupt-map"
property and the mapping array is allocated accordingly.
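The bookkeeping described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct phb_irq_map` and the three `phb_irq_map_*()` helpers are hypothetical names, not the kernel's, and the `dispose` callback stands in for whatever routine actually tears down one interrupt mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-PHB record of the logical interrupt numbers mapped for it. */
struct phb_irq_map {
	unsigned int *irqs;	/* one slot per possible mapping, 0 = free slot */
	size_t count;		/* sized from the number of "interrupt-map" entries */
};

/* Allocate nr_maps slots; returns 0 on success, -1 on failure. */
static int phb_irq_map_alloc(struct phb_irq_map *map, size_t nr_maps)
{
	map->irqs = nr_maps ? calloc(nr_maps, sizeof(*map->irqs)) : NULL;
	map->count = map->irqs ? nr_maps : 0;
	return map->irqs ? 0 : -1;
}

/* Remember a mapped interrupt; returns -1 when the array is full. */
static int phb_irq_map_record(struct phb_irq_map *map, unsigned int irq)
{
	assert(irq != 0);	/* 0 marks a free slot in this sketch */
	for (size_t i = 0; i < map->count; i++) {
		if (map->irqs[i] == 0) {
			map->irqs[i] = irq;
			return 0;
		}
	}
	return -1;
}

/* Bus removal: dispose of every recorded mapping and release the array. */
static size_t phb_irq_map_dispose_all(struct phb_irq_map *map,
				      void (*dispose)(unsigned int irq))
{
	size_t disposed = 0;

	for (size_t i = 0; i < map->count; i++) {
		if (map->irqs[i] == 0)
			continue;
		if (dispose)
			dispose(map->irqs[i]);
		map->irqs[i] = 0;
		disposed++;
	}
	free(map->irqs);
	map->irqs = NULL;
	map->count = 0;
	return disposed;
}
```

Sizing the array at allocation time, rather than using a fixed PCI_NUM_INTX bound, is what lets adapters with larger "interrupt-map" properties be covered.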
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This driver does not restore stop > 3 state, so it limits itself
to states which do not lose full state or TB.
The POWER10 SPRs are sufficiently different from P9 that it seems
easier to split out the P10 code. The POWER10 deep sleep code
(e.g., the BHRB restore) has been taken out, but it can be re-added
when stop > 3 support is added.
Signed-off-by: Nicholas Piggin <[email protected]>
Tested-by: Pratik Rajesh Sampat <[email protected]>
Tested-by: Vaidyanathan Srinivasan <[email protected]>
Reviewed-by: Pratik Rajesh Sampat <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This ensures we don't do a partial mapping of memory. With nvdimm, when
creating namespaces with size not aligned to 16MB, the kernel ends up partially
mapping the pages. This can result in the kernel adding multiple hash page table
entries for the same range. A new namespace will result in
create_section_mapping() with start and end overlapping an already existing
bolted hash page table entry.
Commit 6acd7d5ef264 ("libnvdimm/namespace: Enforce memremap_compat_align()")
made sure that we always create namespaces aligned to 16MB. But we can do
better by avoiding mapping pages that are not aligned. This helps to catch
access to these partially mapped pages early.
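A minimal sketch of the idea of mapping only the aligned part of a range, assuming a 16MB mapping granule as named above. The helper names (`align_up()`, `align_down()`, `trim_to_aligned()`) and `MAP_ALIGN` are made up for this illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAP_ALIGN (16ULL << 20)	/* 16MB, must be a power of two */

/* Round an address up or down to a multiple of MAP_ALIGN. */
static uint64_t align_up(uint64_t addr)
{
	return (addr + MAP_ALIGN - 1) & ~(MAP_ALIGN - 1);
}

static uint64_t align_down(uint64_t addr)
{
	return addr & ~(MAP_ALIGN - 1);
}

/*
 * Shrink [*start, *end) to its fully aligned interior.
 * Returns false when no aligned chunk fits, i.e. nothing should be mapped.
 */
static bool trim_to_aligned(uint64_t *start, uint64_t *end)
{
	uint64_t s = align_up(*start);
	uint64_t e = align_down(*end);

	if (s >= e)
		return false;
	*start = s;
	*end = e;
	return true;
}
```

With this shape, the unaligned head and tail are simply left unmapped, so a stray access to them faults immediately instead of hitting a partially mapped page.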
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This addresses the following sparse warning:
arch/powerpc/platforms/ps3/spu.c:451:33: warning: symbol
'spu_management_ps3_ops' was not declared. Should it be static?
arch/powerpc/platforms/ps3/spu.c:592:28: warning: symbol
'spu_priv1_ps3_ops' was not declared. Should it be static?
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Jason Yan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Before the commit identified below, page table allocation was
performed after the allocation of final shadow area for linear memory.
But that commit switched the order, leading to page tables being
already allocated at the time 8xx kasan_init_shadow_8M() is called.
Due to this, kasan_init_shadow_8M() doesn't map the needed
shadow entries because there are already page tables.
kasan_init_shadow_8M() installs huge PMD entries instead of page
tables. We could at that time free the page tables, but there is no
point in creating page tables that get freed before being used.
Only book3s/32 hash needs early allocation of page tables. For other
variants, we can keep the initial order and create remaining page
tables after the allocation of final shadow memory for linear mem.
Move back the allocation of shadow page tables for
CONFIG_KASAN_VMALLOC into kasan_init() after the loop which creates
final shadow memory for linear mem.
Fixes: 41ea93cf7ba4 ("powerpc/kasan: Fix shadow pages allocation failure")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/8ae4554357da4882612644a74387ae05525b2aaa.1599800716.git.christophe.leroy@csgroup.eu
|
|
low_sleep_handler() has a hardcoded restore of segment registers
that doesn't take KUAP and KUEP into account.
Use head_32's load_segment_registers() routine instead.
Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection")
Fixes: 31ed2b13c48d ("powerpc/32s: Implement Kernel Userspace Execution Prevention.")
Cc: [email protected]
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/21b05f7298c1b18f73e6e5b4cd5005aafa24b6da.1599820109.git.christophe.leroy@csgroup.eu
|
|
Add a stub for __giveup_fpu() when CONFIG_PPC_FPU is
not selected, as done for CONFIG_SPE and CONFIG_ALTIVEC.
This allows removing some #ifdef CONFIG_PPC_FPU.
Also change one to IS_ENABLED().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/69c8b7954ceeccc6b849e52e1fa41b3a0f10f6c1.1597643221.git.christophe.leroy@csgroup.eu
|
|
cpu_has_feature(CPU_FTR_SPE) returns false when CONFIG_SPE is
not set.
There is no need to enclose the test in an #ifdef CONFIG_SPE.
Remove it.
CPU_FTR_SPE only exists on 32 bits. Define it as 0 on 64 bits.
We have a couple of places like:
#ifdef CONFIG_SPE
if (cpu_has_feature(CPU_FTR_SPE)) {
do_something_that_requires_CONFIG_SPE
} else {
return -EINVAL;
}
#else
return -EINVAL;
#endif
Replace them by a cleaner version:
if (cpu_has_feature(CPU_FTR_SPE)) {
#ifdef CONFIG_SPE
do_something_that_requires_CONFIG_SPE
#endif
} else {
return -EINVAL;
}
When CONFIG_SPE is not set, this resolves to an unconditional
return of -EINVAL.
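One way to see why the second form folds to a plain -EINVAL is the userspace model below. It is a sketch under assumptions: `cpu_has_feature_sketch()` takes an explicit feature word (the real helper does not), the bit value is invented, and `set_spe_mode_sketch()` is a hypothetical caller. CONFIG_SPE is deliberately left undefined to model a kernel built without SPE.

```c
#include <assert.h>
#include <errno.h>

/* CONFIG_SPE is intentionally not defined in this sketch. */
#ifdef CONFIG_SPE
#define CPU_FTR_SPE 0x01000000u	/* illustrative bit value only */
#else
#define CPU_FTR_SPE 0u		/* feature can never be present */
#endif

/* Hypothetical stand-in for cpu_has_feature(). */
static inline int cpu_has_feature_sketch(unsigned int cpu_features,
					 unsigned int feature)
{
	return (cpu_features & feature) != 0;
}

/* Hypothetical caller using the "cleaner version" of the pattern. */
static int set_spe_mode_sketch(unsigned int cpu_features, int *mode, int val)
{
	if (cpu_has_feature_sketch(cpu_features, CPU_FTR_SPE)) {
#ifdef CONFIG_SPE
		*mode = val;	/* do_something_that_requires_CONFIG_SPE */
#endif
	} else {
		return -EINVAL;
	}
	return 0;
}
```

Because the mask is the constant 0, the test is always false and the compiler can drop the first branch entirely, which is the behaviour the message relies on.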
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/698df8387555765b70ea42e4a7fa48141c309c1f.1597643221.git.christophe.leroy@csgroup.eu
|
|
cpu_has_feature(CPU_FTR_ALTIVEC) returns false when CONFIG_ALTIVEC is
not set.
There is no need to enclose the test in an #ifdef CONFIG_ALTIVEC.
Remove it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/03ba6b52344ca7c336df2bc6e3d31d736c804ae2.1597643221.git.christophe.leroy@csgroup.eu
|
|
cpu_has_feature(CPU_FTR_VSX) returns false when CONFIG_VSX is
not set.
There is no need to enclose the test in an #ifdef CONFIG_VSX.
Remove it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/0eb61cf0dc66d781d47deb2228498cd61d03a754.1597643221.git.christophe.leroy@csgroup.eu
|
|
That #endif is more than 100 lines after the matching #ifdef,
and there are several #ifdef/#else/#endif in between.
Tag it as /* CONFIG_PPC_BOOK3S_64 */ to help locate the
matching #ifdef.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/3612a8f8aaca16de3fc414a7e66293319d6e213c.1597643147.git.christophe.leroy@csgroup.eu
|
|
The #ifdef CONFIG_KALLSYMS encloses some printk which can
compile in all cases.
Replace by IS_ENABLED().
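For readers unfamiliar with the construct, the following is a reduced userspace imitation of how an IS_ENABLED()-style macro can turn "is this option defined to 1" into a constant 0 or 1. It only handles the built-in case, the helper macro names are local to this sketch, and the two CONFIG_DEMO_* options are hypothetical.

```c
#include <assert.h>

/* Expands to "0," only when pasted with the token 1. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
/* Defined option: TAKE_SECOND_ARG(0, 1, 0, 0) -> 1; otherwise (junk 1, 0, 0) -> 0. */
#define IS_DEFINED__(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_(val) IS_DEFINED__(ARG_PLACEHOLDER_##val)
#define IS_ENABLED(option) IS_DEFINED_(option)

#define CONFIG_DEMO_KALLSYMS 1	/* hypothetical option, enabled */
/* CONFIG_DEMO_BOOKE is intentionally left undefined. */

/* Both branches are parsed and type-checked; the dead one is optimised away. */
static int count_enabled(void)
{
	int n = 0;

	if (IS_ENABLED(CONFIG_DEMO_KALLSYMS))
		n++;
	if (IS_ENABLED(CONFIG_DEMO_BOOKE))
		n++;
	return n;
}
```

The practical benefit over #ifdef is that the disabled branch still gets compiled, so it cannot silently rot.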
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/2d89732a9062b2cf2651728804e4b8f6c9b9358e.1597643164.git.christophe.leroy@csgroup.eu
|
|
The #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) encloses some
printk which can be compiled in all cases.
Replace by IS_ENABLED().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/a1b6ef3d657c8f249193442f56868fc358ea5b6c.1597643160.git.christophe.leroy@csgroup.eu
|
|
This #ifdef CONFIG_PPC_BOOK3S_64 calls preload_new_slb_context()
when radix is not enabled.
radix_enabled() is always defined, and the prototype for
preload_new_slb_context() is always present, so the #ifdef
is unneeded.
Replace it by IS_ENABLED().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/d31506ca9bac9def68cf7424eded63fdc4fb6660.1597643167.git.christophe.leroy@csgroup.eu
|
|
isync() is always defined, no need for an #ifdef.
Replace it by IS_ENABLED(CONFIG_PPC_47x).
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/ac8da0e3baa91dda805e1e492fd65aecd90c1fb5.1597643156.git.christophe.leroy@csgroup.eu
|
|
We need r1 to be properly set before activating MMU, otherwise any new
exception taken while saving registers into the stack in exception
prologs will use the user stack, which is wrong and will even lockup
or crash when KUAP is selected.
Do that by switching the meaning of r11 and r1 until we have saved r1
to the stack: copy r1 into r11 and setup the new stack pointer in r1.
To avoid complicating and impacting all generic and specific prolog
code (and more), copy back r1 into r11 once r11 is saved onto
the stack.
We could get rid of copying r1 back and forth at the cost of
rewriting everything to use r1 instead of r11 all the way when
CONFIG_VMAP_STACK is set, but the effort is probably not worth it.
Fixes: 028474876f47 ("powerpc/32: prepare for CONFIG_VMAP_STACK")
Cc: [email protected]
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/8f85e8752ac5af602db7237ef53d634f4f3d3892.1599486108.git.christophe.leroy@csgroup.eu
|
|
We need r1 to be properly set before activating MMU, so
reading task_struct->stack must be done with MMU off.
This means we need an additional register to play with MSR
bits while r11 now points to the stack. For that, move r10
back to CR (as is already done for hash MMU) and use r10.
r1 is still not correct when we activate the MMU.
That is done in the following patch.
Fixes: 028474876f47 ("powerpc/32: prepare for CONFIG_VMAP_STACK")
Cc: [email protected]
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/a027d447022a006c9c4958ac734128e577a3c5c1.1599486108.git.christophe.leroy@csgroup.eu
|
|
__put_user_asm() and __put_user_asm2() are not used anymore.
Remove them.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/d66c4a372738d2fbd81f433ca86e4295871ace6a.1599216721.git.christophe.leroy@csgroup.eu
|
|
__patch_instruction() is the only user of __put_user_asm() outside
of asm/uaccess.h.
Switch to the new __put_user_asm_goto() to enable retirement of
__put_user_asm() in a later patch.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/b9745b122f4a9ae72cef445c61320022ab8b77b7.1599216721.git.christophe.leroy@csgroup.eu
|
|
__put_user_asm_goto() provides more flexibility to GCC and avoids using
a local variable to tell if the write succeeded or not.
GCC can then avoid implementing a cmp in the fast path.
See the difference for a small function like the PPC64 version of
save_general_regs() in arch/powerpc/kernel/signal_32.c:
Before the patch (unreachable nop removed):
0000000000000c10 <.save_general_regs>:
c10: 39 20 00 2c li r9,44
c14: 39 40 00 00 li r10,0
c18: 7d 29 03 a6 mtctr r9
c1c: 38 c0 00 00 li r6,0
c20: 48 00 00 14 b c34 <.save_general_regs+0x24>
c30: 42 40 00 40 bdz c70 <.save_general_regs+0x60>
c34: 28 2a 00 27 cmpldi r10,39
c38: 7c c8 33 78 mr r8,r6
c3c: 79 47 1f 24 rldicr r7,r10,3,60
c40: 39 20 00 01 li r9,1
c44: 41 82 00 0c beq c50 <.save_general_regs+0x40>
c48: 7d 23 38 2a ldx r9,r3,r7
c4c: 79 29 00 20 clrldi r9,r9,32
c50: 91 24 00 00 stw r9,0(r4)
c54: 2c 28 00 00 cmpdi r8,0
c58: 39 4a 00 01 addi r10,r10,1
c5c: 38 84 00 04 addi r4,r4,4
c60: 41 82 ff d0 beq c30 <.save_general_regs+0x20>
c64: 38 60 ff f2 li r3,-14
c68: 4e 80 00 20 blr
c70: 38 60 00 00 li r3,0
c74: 4e 80 00 20 blr
0000000000000000 <.fixup>:
cc: 39 00 ff f2 li r8,-14
d0: 48 00 00 00 b d0 <.fixup+0xd0>
d0: R_PPC64_REL24 .text+0xc54
After the patch:
0000000000001490 <.save_general_regs>:
1490: 39 20 00 2c li r9,44
1494: 39 40 00 00 li r10,0
1498: 7d 29 03 a6 mtctr r9
149c: 60 00 00 00 nop
14a0: 28 2a 00 27 cmpldi r10,39
14a4: 79 48 1f 24 rldicr r8,r10,3,60
14a8: 39 20 00 01 li r9,1
14ac: 41 82 00 0c beq 14b8 <.save_general_regs+0x28>
14b0: 7d 23 40 2a ldx r9,r3,r8
14b4: 79 29 00 20 clrldi r9,r9,32
14b8: 91 24 00 00 stw r9,0(r4)
14bc: 39 4a 00 01 addi r10,r10,1
14c0: 38 84 00 04 addi r4,r4,4
14c4: 42 00 ff dc bdnz 14a0 <.save_general_regs+0x10>
14c8: 38 60 00 00 li r3,0
14cc: 4e 80 00 20 blr
14d0: 38 60 ff f2 li r3,-14
14d4: 4e 80 00 20 blr
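The difference in control flow can be imitated in plain C. This is not inline assembly and not the kernel macros: `put_user_err()` and `put_user_goto()` are hypothetical stand-ins in which a NULL destination plays the role of a faulting user address, purely to contrast "set an error variable and test it" with "jump straight to a label".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Old shape: the store reports failure through an error variable. */
#define put_user_err(val, ptr, err) \
	do { if (ptr) *(ptr) = (val); else (err) = -EFAULT; } while (0)

/* New shape: the store jumps to a label on failure, no error variable. */
#define put_user_goto(val, ptr, label) \
	do { if (!(ptr)) goto label; *(ptr) = (val); } while (0)

/* Simulated user pointer: NULL base means every store "faults". */
static unsigned int *slot(unsigned int *dst, size_t i)
{
	return dst ? &dst[i] : NULL;
}

static int save_regs_err(const unsigned long *regs, unsigned int *dst, size_t n)
{
	int err = 0;

	for (size_t i = 0; i < n; i++) {
		put_user_err((unsigned int)regs[i], slot(dst, i), err);
		if (err)	/* extra compare on the fast path */
			return err;
	}
	return 0;
}

static int save_regs_goto(const unsigned long *regs, unsigned int *dst, size_t n)
{
	for (size_t i = 0; i < n; i++)
		put_user_goto((unsigned int)regs[i], slot(dst, i), failed);
	return 0;
failed:
	return -EFAULT;
}
```

In the listings above, the same effect shows up as the disappearance of the cmpdi/beq pair after each stw.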
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/94ba5a5138f99522e1562dbcdb38d31aa790dc89.1599216721.git.christophe.leroy@csgroup.eu
|
|
Enable pre-update addressing mode in __put_user_asm_goto().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/346f65d677adb11865f7762c25a1ca3c64404ba5.1599216023.git.christophe.leroy@csgroup.eu
|
|
The 8xx has 4 page sizes: 4k, 16k, 512k and 8M.
4k and 16k can be selected at build time as standard page sizes,
and 512k and 8M are hugepages.
When 4k standard pages are selected, 16k pages are not available.
Allow 16k pages as hugepages when 4k pages are used.
To allow that, implement arch_make_huge_pte() which receives
the necessary arguments to allow setting the PTE in accordance
with the page size:
- 512 k pages must have _PAGE_HUGE and _PAGE_SPS. They are set
by pte_mkhuge(). arch_make_huge_pte() does nothing.
- 16 k pages must have only _PAGE_SPS. arch_make_huge_pte() clears
_PAGE_HUGE.
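The two rules above can be sketched as follows. The bit values, the `pte_sketch_t` type and the `*_sketch` names are placeholders for illustration only; the real flag values and function signatures live in the 8xx headers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only, not the real 8xx values. */
#define PAGE_SPS_SKETCH		0x008u
#define PAGE_HUGE_SKETCH	0x800u

#define SZ_16K_SKETCH	(16u * 1024)
#define SZ_512K_SKETCH	(512u * 1024)

typedef uint32_t pte_sketch_t;

/* Models pte_mkhuge(): sets both flags, as needed for 512k pages. */
static pte_sketch_t pte_mkhuge_sketch(pte_sketch_t pte)
{
	return pte | PAGE_HUGE_SKETCH | PAGE_SPS_SKETCH;
}

/* Models arch_make_huge_pte(): adjusts the flags for the actual page size. */
static pte_sketch_t make_huge_pte_sketch(pte_sketch_t pte, unsigned int page_size)
{
	if (page_size == SZ_16K_SKETCH)
		pte &= ~PAGE_HUGE_SKETCH;	/* 16k: keep only the SPS flag */
	return pte;				/* 512k: nothing to do */
}
```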
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/a518abc29266a708dfbccc8fce9ae6694fe4c2c6.1598862623.git.christophe.leroy@csgroup.eu
|
|
On 8xx, the number of entries occupied by a PTE in the page tables
depends on the size of the page. At the moment, this calculation
is done in two places: in pte_update() and in set_huge_pte_at().
Refactor this calculation into a helper called
number_of_cells_per_pte(). For the time being, the val param is
unused. It will be used by the following patch.
Instead of open-coding is_hugepd(), use hugepd_ok() with a forward
declaration.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/f6ea2483c2c389567b007945948f704d18cfaeea.1598862623.git.christophe.leroy@csgroup.eu
|
|
The following random segfault is observed from time to time with
map_hugetlb selftest:
root@localhost:~# ./map_hugetlb 1 19
524288 kB hugepages
Mapping 1 Mbytes
Segmentation fault
[ 31.219972] map_hugetlb[365]: segfault (11) at 117 nip 77974f8c lr 779a6834 code 1 in ld-2.23.so[77966000+21000]
[ 31.220192] map_hugetlb[365]: code: 9421ffc0 480318d1 93410028 90010044 9361002c 93810030 93a10034 93c10038
[ 31.220307] map_hugetlb[365]: code: 93e1003c 93210024 8123007c 81430038 <80e90004> 814a0004 7f443a14 813a0004
[ 31.221911] BUG: Bad rss-counter state mm:(ptrval) type:MM_FILEPAGES val:33
[ 31.229362] BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:5
This fault is due to hugetlb_free_pgd_range() freeing page tables
that are also used by regular pages.
As explained in the comment at the beginning of
hugetlb_free_pgd_range(), the verification done in free_pgd_range()
on floor and ceiling is not done here, which means
hugetlb_free_pte_range() can free outside the expected range.
As the verification cannot be done in hugetlb_free_pgd_range(), it
must be done in hugetlb_free_pte_range().
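The kind of floor/ceiling verification meant here can be sketched as below, modelled on the usual shape of such checks in the generic page table freeing code. `TABLE_SPAN_SKETCH` (the address range covered by one page table) and `may_free_table()` are assumptions made for this example, not the code added by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: one page table covers 4MB of address space. */
#define TABLE_SPAN_SKETCH	(1UL << 22)
#define TABLE_MASK_SKETCH	(~(TABLE_SPAN_SKETCH - 1))

/*
 * A page table may only be freed if the whole span it covers lies inside
 * [floor, ceiling). ceiling == 0 means "no upper limit".
 */
static bool may_free_table(unsigned long addr, unsigned long end,
			   unsigned long floor, unsigned long ceiling)
{
	unsigned long start = addr & TABLE_MASK_SKETCH;

	if (start < floor)
		return false;	/* a neighbour below still uses this table */
	if (ceiling) {
		ceiling &= TABLE_MASK_SKETCH;
		if (!ceiling)
			return false;
	}
	if (end - 1 > ceiling - 1)
		return false;	/* a neighbour above still uses this table */
	return true;
}
```

Skipping this check is what allows a table shared with regular pages to be freed, which matches the rss-counter corruption shown in the log.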
Fixes: b250c8c08c79 ("powerpc/8xx: Manage 512k huge pages as standard pages.")
Cc: [email protected]
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/f0cb2a5477cd87d1eaadb128042e20aeb2bc2859.1598860677.git.christophe.leroy@csgroup.eu
|
|
Enabling CONFIG_TAU_INT causes random crashes:
Unrecoverable exception 1700 at c0009414 (msr=1000)
Oops: Unrecoverable exception, sig: 6 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-pmac-00043-gd5f545e1a8593 #5
NIP: c0009414 LR: c0009414 CTR: c00116fc
REGS: c0799eb8 TRAP: 1700 Not tainted (5.7.0-pmac-00043-gd5f545e1a8593)
MSR: 00001000 <ME> CR: 22000228 XER: 00000100
GPR00: 00000000 c0799f70 c076e300 00800000 0291c0ac 00e00000 c076e300 00049032
GPR08: 00000001 c00116fc 00000000 dfbd3200 ffffffff 007f80a8 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c075ce04
GPR24: c075ce04 dfff8880 c07b0000 c075ce04 00080000 00000001 c079ef98 c079ef5c
NIP [c0009414] arch_cpu_idle+0x24/0x6c
LR [c0009414] arch_cpu_idle+0x24/0x6c
Call Trace:
[c0799f70] [00000001] 0x1 (unreliable)
[c0799f80] [c0060990] do_idle+0xd8/0x17c
[c0799fa0] [c0060ba4] cpu_startup_entry+0x20/0x28
[c0799fb0] [c072d220] start_kernel+0x434/0x44c
[c0799ff0] [00003860] 0x3860
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX 3d20c07b XXXXXXXX XXXXXXXX XXXXXXXX 7c0802a6
XXXXXXXX XXXXXXXX XXXXXXXX 4e800421 XXXXXXXX XXXXXXXX XXXXXXXX 7d2000a6
---[ end trace 3a0c9b5cb216db6b ]---
Resolve this problem by disabling each THRMn comparator when handling
the associated THRMn interrupt and by disabling the TAU entirely when
updating THRMn thresholds.
Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Finn Thain <[email protected]>
Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/5a0ba3dc5612c7aac596727331284a3676c08472.1599260540.git.fthain@telegraphics.com.au
|
|
According to Freescale's documentation, MPC74XX processors have an
erratum that prevents the TAU interrupt from working, so don't try to
use it when running on those processors.
Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Finn Thain <[email protected]>
Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/c281611544768e758bd58fe812cf702a5bd2d042.1599260540.git.fthain@telegraphics.com.au
|
|
The commentary at the call site seems to disagree with the code. The
conditional prevents calling set_thresholds() via the exception handler,
which appears to crash. Perhaps that's because it immediately triggers
another TAU exception. Anyway, calling set_thresholds() from TAUupdate()
is redundant because tau_timeout() does so.
Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Finn Thain <[email protected]>
Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/d7c7ee33232cf72a6a6bbb6ef05838b2e2b113c0.1599260540.git.fthain@telegraphics.com.au
|
|
Since commit 19dbdcb8039cf ("smp: Warn on function calls from softirq
context") the Thermal Assist Unit driver causes a warning like the
following when CONFIG_SMP is enabled.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/smp.c:428 smp_call_function_many_cond+0xf4/0x38c
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-pmac #3
NIP: c00b37a8 LR: c00b3abc CTR: c001218c
REGS: c0799c60 TRAP: 0700 Not tainted (5.7.0-pmac)
MSR: 00029032 <EE,ME,IR,DR,RI> CR: 42000224 XER: 00000000
GPR00: c00b3abc c0799d18 c076e300 c079ef5c c0011fec 00000000 00000000 00000000
GPR08: 00000100 00000100 00008000 ffffffff 42000224 00000000 c079d040 c079d044
GPR16: 00000001 00000000 00000004 c0799da0 c079f054 c07a0000 c07a0000 00000000
GPR24: c0011fec 00000000 c079ef5c c079ef5c 00000000 00000000 00000000 00000000
NIP [c00b37a8] smp_call_function_many_cond+0xf4/0x38c
LR [c00b3abc] on_each_cpu+0x38/0x68
Call Trace:
[c0799d18] [ffffffff] 0xffffffff (unreliable)
[c0799d68] [c00b3abc] on_each_cpu+0x38/0x68
[c0799d88] [c0096704] call_timer_fn.isra.26+0x20/0x7c
[c0799d98] [c0096b40] run_timer_softirq+0x1d4/0x3fc
[c0799df8] [c05b4368] __do_softirq+0x118/0x240
[c0799e58] [c0039c44] irq_exit+0xc4/0xcc
[c0799e68] [c000ade8] timer_interrupt+0x1b0/0x230
[c0799ea8] [c0013520] ret_from_except+0x0/0x14
--- interrupt: 901 at arch_cpu_idle+0x24/0x6c
LR = arch_cpu_idle+0x24/0x6c
[c0799f70] [00000001] 0x1 (unreliable)
[c0799f80] [c0060990] do_idle+0xd8/0x17c
[c0799fa0] [c0060ba8] cpu_startup_entry+0x24/0x28
[c0799fb0] [c072d220] start_kernel+0x434/0x44c
[c0799ff0] [00003860] 0x3860
Instruction dump:
8129f204 2f890000 40beff98 3d20c07a 8929eec4 2f890000 40beff88 0fe00000
81220000 552805de 550802ef 4182ff84 <0fe00000> 3860ffff 7f65db78 7f44d378
---[ end trace 34a886e47819c2eb ]---
Don't call on_each_cpu() from a timer callback, call it from a worker
thread instead.
Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Finn Thain <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/bb61650bea4f4c91fb8e24b9a6f130a1438651a7.1599260540.git.fthain@telegraphics.com.au
|
|
According to the MPC750 Users Manual, the SITV value in Thermal
Management Register 3 is 13 bits long. The present code calculates the
SITV value as 60 * 500 cycles. This would overflow to give 10 us on
a 500 MHz CPU rather than the intended 60 us. (But according to the
Microprocessor Datasheet, there is also a factor of 266 that has to be
applied to this value on certain parts i.e. speed sort above 266 MHz.)
Always use the maximum cycle count, as recommended by the Datasheet.
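The overflow arithmetic can be checked with a few lines of C. This sketch only models the 13-bit truncation and the cycles-to-time conversion at 500 MHz; it ignores the additional factor of 266 mentioned above, and the helper names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SITV_BITS	13u
#define SITV_MAX	((1u << SITV_BITS) - 1)	/* 8191 cycles */

/* Value that actually lands in a 13-bit field when a larger count is written. */
static uint32_t sitv_truncate(uint32_t cycles)
{
	return cycles & SITV_MAX;
}

/* Duration in nanoseconds of a cycle count at the given clock in MHz. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t mhz)
{
	return cycles * 1000u / mhz;
}
```

60 * 500 = 30000 cycles does not fit in 13 bits; the field keeps 30000 - 3 * 8192 = 5424 cycles, which is about 10.8 us at 500 MHz instead of the intended 60 us. The maximum count of 8191 cycles gives about 16.4 us.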
Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Finn Thain <[email protected]>
Tested-by: Stan Johnson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/896f542e5f0f1d6cf8218524c2b67d79f3d69b3c.1599260540.git.fthain@telegraphics.com.au
|
|
MAX_PHYSMEM #define is used along with sparsemem to determine the SECTION_SHIFT
value. Powerpc also uses the same value to limit the max memory enabled on the
system. With 4K PAGE_SIZE and hash translation mode, we want to limit the max
memory enabled to 64TB due to page table size restrictions. However, with
radix translation, we don't have these restrictions. Hence split the radix
and hash MAX_PHYSMEM limits and use a different limit for each of them.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
With commit 0034d395f89d ("powerpc/mm/hash64: Map all the kernel
regions in the same 0xc range"), we now split the 64TB address range
into 4 contexts each of 16TB. That implies we can do only 16TB linear
mapping.
On some systems, e.g. Power9, memory attached to nodes > 0 will appear
above 16TB in the linear mapping. This resulted in a kernel crash when
we boot such systems in hash translation mode with 4K PAGE_SIZE.
This patch updates the kernel mapping such that we now support up to
61TB of memory with 4K. The kernel mapping now looks like below with 4K
PAGE_SIZE and hash translation.
vmalloc start = 0xc0003d0000000000
IO start = 0xc0003e0000000000
vmemmap start = 0xc0003f0000000000
Our MAX_PHYSMEM_BITS for 4K is still 64TB even though we can only map 61TB.
We prevent bolt mapping anything outside 61TB range by checking against
H_VMALLOC_START.
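The three start addresses follow directly from 1TB offsets inside the 0xc region, which a short sketch makes explicit. The macro and function names here are illustrative, and `can_bolt_map()` is a hypothetical model of the H_VMALLOC_START check rather than the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TB_SKETCH		(1ULL << 40)
#define KERN_BASE_SKETCH	0xc000000000000000ULL	/* start of the 0xc region */

/* Address that lies tb_offset terabytes into the 0xc region. */
static uint64_t region_start(unsigned int tb_offset)
{
	return KERN_BASE_SKETCH + tb_offset * TB_SKETCH;
}

/* Linear-map addresses at or beyond the vmalloc start must not be bolted. */
static bool can_bolt_map(uint64_t vaddr_end)
{
	return vaddr_end <= region_start(61);
}
```

61TB of linear mapping, followed by 1TB each for vmalloc, IO and vmemmap, fills the 64TB range exactly.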
Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range")
Reported-by: Cameron Berkenpas <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Implement page mapping percpu first chunk allocator as a fallback to
the embedding allocator. With 4K hash translation we limit our page
table range to 64TB and commit 0034d395f89d ("powerpc/mm/hash64: Map all the
kernel regions in the same 0xc range") moved all kernel mappings to
that 64TB range. In order to support a sparse memory layout we need
to increase our linear mapping space and reduce other mappings.
With such a layout the percpu embedded first chunk allocator will fail
because of the small vmalloc range. Add a fallback to the page mapping
percpu first chunk allocator for such failures.
The below dmesg output can be observed in such case.
percpu: max_distance=0x1ffffef00000 too large for vmalloc space 0x10000000000
PERCPU: auto allocator failed (-22), falling back to page size
percpu: 40 4K pages/cpu s148816 r0 d15024
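The failure in the log can be reproduced numerically. The sketch below assumes the embedding allocator rejects a layout when max_distance exceeds three quarters of the vmalloc space; treat that threshold, and the name `embed_distance_check()`, as assumptions of this example rather than a statement about the allocator's exact code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Returns 0 if the per-cpu units are close enough together to be addressed
 * through vmalloc space, -EINVAL otherwise (assumed 75% threshold).
 */
static int embed_distance_check(uint64_t max_distance, uint64_t vmalloc_total)
{
	if (max_distance > vmalloc_total / 4 * 3)
		return -EINVAL;
	return 0;
}
```

With the values from the dmesg output, max_distance is close to 32TB while the vmalloc space is 1TB, so the check fails and the page-size fallback is taken.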
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This updates the ppc64 version to be closer to x86/sparc.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
PPC_DEBUG_FEATURE_DATA_BP_ARCH_31 can be used to determine whether
we are running on an ISA 3.1 compliant machine, which is needed to
determine DAR behaviour, the 512 byte boundary limit, etc. This was
requested by Pedro Miraglia Franco de Carvalho for extending
watchpoint features in gdb. Note that availability of 2nd DAWR is
independent of this flag and should be checked using
ppc_debug_info->num_data_bps.
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
There are a couple of places where we set len but not hw_len. For
ptrace/perf watchpoints, when CONFIG_HAVE_HW_BREAKPOINT=Y, hw_len
will be calculated and set internally while parsing watchpoint.
But when CONFIG_HAVE_HW_BREAKPOINT=N, we need to manually set
'hw_len'. Similarly for xmon as well, hw_len needs to be set
directly.
Fixes: b57aeab811db ("powerpc/watchpoint: Fix length calculation for unaligned target")
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
On powerpc, ptrace watchpoints work in one-shot mode, i.e. the kernel
disables the event every time it fires and the user has to re-enable it.
Also, in the case of a ptrace watchpoint, the kernel notifies the ptrace
user before executing the instruction.
With CONFIG_HAVE_HW_BREAKPOINT=N, the kernel fails to disable the
ptrace event, which causes an infinite loop of exceptions.
This is especially harmful when the user watches data which is
also read/written by the kernel, e.g. syscall parameters. In such a case,
infinite exceptions happen in kernel mode, which causes a soft lockup.
Fixes: 9422de3e953d ("powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers")
Reported-by: Pedro Miraglia Franco de Carvalho <[email protected]>
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Power10 hw has multiple DAWRs but hw doesn't tell which DAWR caused
the exception. So we have a sw logic to detect that in hw_breakpoint.c.
But hw_breakpoint.c gets compiled only with CONFIG_HAVE_HW_BREAKPOINT=Y.
Move DAWR detection logic outside of hw_breakpoint.c so that it can be
reused when CONFIG_HAVE_HW_BREAKPOINT is not set.
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When the kernel is compiled with CONFIG_HAVE_HW_BREAKPOINT=N, the user can
still create a watchpoint using PPC_PTRACE_SETHWDEBUG, with limited
functionality. But such watchpoints never fire because of
the missing privilege settings. Fix that.
It's safe to set HW_BRK_TYPE_PRIV_ALL because we don't really leak
any kernel address in signal info. Setting HW_BRK_TYPE_PRIV_ALL will
also help to find scenarios when kernel accesses user memory.
Reported-by: Pedro Miraglia Franco de Carvalho <[email protected]>
Suggested-by: Pedro Miraglia Franco de Carvalho <[email protected]>
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Vector load/store instructions are special because they are always
aligned. Thus unaligned EA needs to be aligned down before comparing
it with watch ranges. Otherwise we might consider a valid event as
invalid.
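A small sketch of why the alignment step matters, assuming a 16-byte vector access size. The names (`vec_effective_start()`, `overlaps()`, `vector_access_hits()`) and the half-open range convention are choices made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VEC_SIZE_SKETCH 16u	/* assumed vector access size in bytes */

/* Vector accesses ignore the low EA bits, so align the EA down first. */
static uint64_t vec_effective_start(uint64_t ea)
{
	return ea & ~(uint64_t)(VEC_SIZE_SKETCH - 1);
}

/* True if the access [ea, ea + size) overlaps the watch range [ws, we). */
static bool overlaps(uint64_t ea, uint32_t size, uint64_t ws, uint64_t we)
{
	return ea < we && ea + size > ws;
}

/* Compare the aligned access, not the raw EA, against the watch range. */
static bool vector_access_hits(uint64_t ea, uint64_t ws, uint64_t we)
{
	return overlaps(vec_effective_start(ea), VEC_SIZE_SKETCH, ws, we);
}
```

For a watch range of 0x1000..0x1007 and a vector access with EA 0x100c, the raw comparison misses, while the aligned access starting at 0x1000 correctly hits.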
Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint")
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
On p10 predecessors, a watchpoint with quadword access is compared at
quadword length. If the watch range is a doubleword or less
in the first half of a quadword-aligned 16 bytes, and there is an
unaligned quadword access which touches only the second half, the
handler should consider it extraneous and emulate/single-step it
before continuing.
Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint")
Reported-by: Pedro Miraglia Franco de Carvalho <[email protected]>
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Yunhai Zhang recently fixed a VGA software scrollback bug in commit
ebfdfeeae8c0 ("vgacon: Fix for missing check in scrollback handling"),
but that then made people look more closely at some of this code, and
there were more problems on the vgacon side, but also the fbcon software
scrollback.
We don't really have anybody who maintains this code - probably because
nobody actually _uses_ it any more. Sure, people still use both VGA and
the framebuffer consoles, but they are no longer the main user
interfaces to the kernel, and haven't been for decades, so these kinds
of extra features end up bitrotting and not really being used.
So rather than try to maintain a likely unused set of code, I'll just
aggressively remove it, and see if anybody even notices. Maybe there
are people who haven't jumped on the whole GUI bandwagon yet, and think
it's just a fad. And maybe those people use the scrollback code.
If that turns out to be the case, we can resurrect this again, once
we've found the sucker^Wmaintainer for it who actually uses it.
Reported-by: NopNop Nop <[email protected]>
Tested-by: Willy Tarreau <[email protected]>
Cc: 张云海 <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Acked-by: Willy Tarreau <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
POWER secure guests (i.e., guests which use the Protected Execution
Facility) need to use SWIOTLB to be able to do I/O with the
hypervisor, but they don't need the SWIOTLB memory to be in low
addresses since the hypervisor doesn't have any addressing limitation.
This solves a SWIOTLB initialization problem we are seeing in secure
guests with 128 GB of RAM: they are configured with 4 GB of
crashkernel reserved memory, which leaves no space for SWIOTLB in low
addresses.
To do this, we use mostly the same code as swiotlb_init(), but
allocate the buffer using memblock_alloc() instead of
memblock_alloc_low().
Fixes: 2efbc58f157a ("powerpc/pseries/svm: Force SWIOTLB for secure guests")
Signed-off-by: Thiago Jung Bauermann <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When we added the VDSO32 kconfig symbol, which controls building of
the 32-bit VDSO, we made it depend on CPU_BIG_ENDIAN (for 64-bit).
That was because back then COMPAT was always enabled for 64-bit, so
depending on it would have left the 32-bit VDSO always enabled, which
we didn't want.
But since then we have made COMPAT selectable, and off by default for
ppc64le, so VDSO32 should really depend on that.
For most people this makes no difference, none of the defconfigs
change, it's only if someone is building ppc64le with COMPAT=y, they
will now also get VDSO32. If they've enabled COMPAT in order to run
32-bit binaries they presumably also want the 32-bit VDSO.
Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Bring in our fixes branch for this cycle which avoids some small
conflicts with upcoming commits.
|
|
The __phys_to_dma vs phys_to_dma distinction isn't exactly obvious. Try
to improve the situation by renaming __phys_to_dma to
phys_to_dma_unencryped, and not forcing architectures that want to
override phys_to_dma to actually provide __phys_to_dma.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
|
|
There is no harm in just always clearing the SME encryption bit, while
significantly simplifying the interface.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
|
|
The newly introduced 'perf_stats' attribute uses the default access
mode of 0444, allowing non-root users to access performance stats of
an nvdimm and potentially force the kernel into issuing a large number
of expensive hypercalls. Since the information exposed by this
attribute cannot be cached it is better to ward off access to this
attribute from users who don't need access to these performance
statistics.
Hence update the access mode of 'perf_stats' attribute to be only
readable by root users.
Fixes: 2d02bf835e57 ("powerpc/papr_scm: Fetch nvdimm performance stats from PHYP")
Reported-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Stop providing the possibility to override the address space using
set_fs() now that there is no need for that any more.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Provide __get_kernel_nofault and __put_kernel_nofault routines to
implement the maccess routines without messing with set_fs and without
opening up access to user space.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Add a CONFIG_SET_FS option that is selected by architectures that
implement set_fs, which is all of them initially. If the option is not
set stubs for routines related to overriding the address space are
provided so that architectures can start to opt out of providing set_fs.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|