Age | Commit message (Collapse) | Author | Files | Lines |
|
Functions pte_alloc_one(), pte_alloc_one_kernel(), pte_free(),
pte_free_kernel() are identical for the four subarches.
This patch moves their definition in a common place.
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
pte_alloc_one_kernel() and pte_alloc_one() are simple calls to
pte_fragment_alloc(), so they are good candidates for inlining as
already done on PPC64.
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In the same way as PPC64, implement early allocation functions and
avoid calling pte_alloc_kernel() before slab is available.
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
early_alloc_pgtable() is only used during init.
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Lets select PPC_MM_SLICES from the subarch config item instead of
doing it via defaults declaration in the PPC_MM_SLICES item itself.
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Those files have no real added values, especially the 64 bit
which only includes the common book3e mmu.h which is also
included from 32 bits side.
So lets do the final inclusion directly from nohash/mmu.h
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
pgtable_t is now identical for all subarches, move it to the
top level asm/mmu.h
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Book3E 64 is the only subarch not using pte_fragment. In order
to allow refactorisation, this patch converts it to pte_fragment.
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This has never been called (since Kernel has been in git at least),
drop it.
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
__find_linux_pte() is full of if/else which is hard to
follow allthough the handling is pretty simple.
Previous patches left a { } block. This patch removes it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
__find_linux_pte() is full of if/else which is hard to
follow allthough the handling is pretty simple.
Previous patch left { } blocks. This patch removes the first one
by shifting its content to the left.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
__find_linux_pte() is full of if/else which is hard to
follow allthough the handling is pretty simple.
This patch flattens the function by getting rid of as much if/else
as possible. In order to ease the review, this is done in three steps.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Only 3 subarches support huge pages. So when it is either 2 of them,
it is not the third one.
And mmu_has_feature() is known by all subarches so IS_ENABLED() can
be used instead of #ifdef
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Only book3s/64 may select default among several HPAGE_SHIFT at runtime.
8xx always defines 512K pages as default
FSL_BOOK3E always defines 4M pages as default
This patch limits HUGETLB_PAGE_SIZE_VARIABLE to book3s/64
moves the definitions in subarches files.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
No need to have this in asm/page.h, move it into asm/hugetlb.h
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Introduce a subarch specific helper check_and_get_huge_psize()
to check the huge page sizes and cleanup the ifdef mess in
add_huge_page_size()
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This patchs adds a subarch helper to populate hugepd.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Three subarches support hugepages:
- fsl book3e
- book3s/64
- 8xx
This patch splits asm/hugetlb.h to reduce the #ifdef mess.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
gup_huge_pd() is the only user of gup_hugepte() and it is
located in the same file. This patch moves gup_huge_pd()
after gup_hugepte() and makes gup_hugepte() static.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The only function in hugetlbpage.c which doesn't depend on
CONFIG_HUGETLB_PAGE is gup_hugepte(), and this function is
only called from gup_huge_pd() which depends on
CONFIG_HUGETLB_PAGE so all the content of hugetlbpage.c
depends on CONFIG_HUGETLB_PAGE.
This patch modifies Makefile to only compile hugetlbpage.c
when CONFIG_HUGETLB_PAGE is set.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
__find_linux_pte() is the only function in hugetlbpage.c
which is compiled in regardless on CONFIG_HUGETLBPAGE
This patch moves it in pgtable.c.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
As per Kconfig.cputype, only CONFIG_PPC_FSL_BOOK3E gets to
select SYS_SUPPORTS_HUGETLBFS so simplify accordingly.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
CONFIG_PPC_64K_PAGES cannot be selected by nohash/64.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This function is not used anymore, drop it.
Fixes: b42279f0165c ("powerpc/mm/nohash: MM_SLICE is only used by book3s 64")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This patch defines a subarch specific SLB_ADDR_LIMIT_DEFAULT
to remove the #ifdefs around the setup of mm->context.slb_addr_limit
It also generalises the use of mm_ctx_set_slb_addr_limit() helper.
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
get_slice_psize() can be defined regardless of CONFIG_PPC_MM_SLICES
to avoid ifdefs
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The 8xx only selects CONFIG_PPC_MM_SLICES when CONFIG_HUGETLB_PAGE
is set.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This patch replaces a couple of #ifdef CONFIG_PPC_64K_PAGES
by IS_ENABLED(CONFIG_PPC_64K_PAGES) to improve code maintainability.
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
For PPC32 that's a noop, gcc should be smart enough to ignore it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Now that slice_mask_for_size() is in mmu.h, the mm_ctx_slice_mask_xxx()
are not needed anymore, so drop them. Note that the 8xx ones where
not used anyway.
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Move slice_mask_for_size() into subarch mmu.h
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
[mpe: Retain the BUG_ON()s, rather than converting to VM_BUG_ON()]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
slice_mask_for_size() only uses mm->context, so hand directly a
pointer to the context. This will help moving the function in
subarch mmu.h in the next patch by avoiding having to include
the definition of struct mm_struct
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Only nohash/32 and book3s/64 support mm slices.
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Commit 67fda38f0d68 ("powerpc/mm: Move slb_addr_linit to
early_init_mmu") moved slb_addr_limit init out of setup_arch().
Commit 701101865f5d ("powerpc/mm: Reduce memory usage for mm_context_t
for radix") brought it back into setup_arch() by error.
This patch reverts that erroneous regress.
Fixes: 701101865f5d ("powerpc/mm: Reduce memory usage for mm_context_t for radix")
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Many files in arch/powerpc/mm are only for nohash. This patch
creates a subdirectory for them.
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Shorten new filenames]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Several files in arch/powerpc/mm are only for book3S32. This patch
creates a subdirectory for them.
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Shorten new filenames]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Many files in arch/powerpc/mm are only for book3S64. This patch
creates a subdirectory for them.
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Update the selftest sym links, shorten new filenames, cleanup some
whitespace and formatting in the new files.]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This patch make inclusion of mmu_decl.h independant of the location
of the file including it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
TRANSPARENT_HUGEPAGE is only supported by book3s
VMEMMAP_REGION_ID is never used
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
early_alloc_pgtable() never returns NULL as it panics on failure.
This patch drops the three BUG_ON() which check the non nullity
of early_alloc_pgtable() returned value.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Serge reported some crashes with CONFIG_STRICT_KERNEL_RWX enabled
on a book3s32 machine.
Analysis shows two issues:
- BATs addresses and sizes are not properly aligned.
- There is a gap between the last address covered by BATs and the
first address covered by pages.
Memory mapped with DBATs:
0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent
1: 0xc0800000-0xc0bfffff 0x00800000 Kernel RO coherent
2: 0xc0c00000-0xc13fffff 0x00c00000 Kernel RW coherent
3: 0xc1400000-0xc23fffff 0x01400000 Kernel RW coherent
4: 0xc2400000-0xc43fffff 0x02400000 Kernel RW coherent
5: 0xc4400000-0xc83fffff 0x04400000 Kernel RW coherent
6: 0xc8400000-0xd03fffff 0x08400000 Kernel RW coherent
7: 0xd0400000-0xe03fffff 0x10400000 Kernel RW coherent
Memory mapped with pages:
0xe1000000-0xefffffff 0x21000000 240M rw present dirty accessed
This patch fixes both issues. With the patch, we get the following
which is as expected:
Memory mapped with DBATs:
0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent
1: 0xc0800000-0xc0bfffff 0x00800000 Kernel RO coherent
2: 0xc0c00000-0xc0ffffff 0x00c00000 Kernel RW coherent
3: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent
4: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent
5: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent
6: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent
7: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
Memory mapped with pages:
0xe0000000-0xefffffff 0x20000000 256M rw present dirty accessed
Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
Reported-by: Serge Belyshev <[email protected]>
Acked-by: Segher Boessenkool <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Signed-off-by: Al Viro <[email protected]>
|
|
There is a kernel crash that happens if rt_sigreturn() is called inside
a transactional block.
This crash happens if the kernel hits an in-kernel page fault when
accessing userspace memory, usually through copy_ckvsx_to_user(). A
major page fault calls might_sleep() function, which can cause a task
reschedule. A task reschedule (switch_to()) reclaim and recheckpoint
the TM states, but, in the signal return path, the checkpointed memory
was already reclaimed, thus the exception stack has MSR that points to
MSR[TS]=0.
When the code returns from might_sleep() and a task reschedule
happened, then this task is returned with the memory recheckpointed,
and CPU MSR[TS] = suspended.
This means that there is a side effect at might_sleep() if it is
called with CPU MSR[TS] = 0 and the task has regs->msr[TS] != 0.
This side effect can cause a TM bad thing, since at the exception
entrance, the stack saves MSR[TS]=0, and this is what will be used at
RFID, but, the processor has MSR[TS] = Suspended, and this transition
will be invalid and a TM Bad thing will be raised, causing the
following crash:
Unexpected TM Bad Thing exception at c00000000000e9ec (msr 0x8000000302a03031) tm_scratch=800000010280b033
cpu 0xc: Vector: 700 (Program Check) at [c00000003ff1fd70]
pc: c00000000000e9ec: fast_exception_return+0x100/0x1bc
lr: c000000000032948: handle_rt_signal64+0xb8/0xaf0
sp: c0000004263ebc40
msr: 8000000302a03031
current = 0xc000000415050300
paca = 0xc00000003ffc4080 irqmask: 0x03 irq_happened: 0x01
pid = 25006, comm = sigfuz
Linux version 5.0.0-rc1-00001-g3bd6e94bec12 (breno@debian) (gcc version 8.2.0 (Debian 8.2.0-3)) #899 SMP Mon Jan 7 11:30:07 EST 2019
WARNING: exception is not recoverable, can't continue
enter ? for help
[c0000004263ebc40] c000000000032948 handle_rt_signal64+0xb8/0xaf0 (unreliable)
[c0000004263ebd30] c000000000022780 do_notify_resume+0x2f0/0x430
[c0000004263ebe20] c00000000000e844 ret_from_except_lite+0x70/0x74
--- Exception: c00 (System Call) at 00007fffbaac400c
SP (7fffeca90f40) is in userspace
The solution for this problem is running the sigreturn code with
regs->msr[TS] disabled, thus, avoiding hitting the side effect above.
This does not seem to be a problem since regs->msr will be replaced by
the ucontext value, so, it is being flushed already. In this case, it
is flushed earlier.
Signed-off-by: Breno Leitao <[email protected]>
Acked-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This patch fixes the below crash by making sure we touch the subpage
protection related structures only if we know they are allocated on
the platform. With radix translation we don't allocate hash context at
all and trying to access subpage_prot_table results in:
Faulting instruction address: 0xc00000000008bdb4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
....
NIP [c00000000008bdb4] sys_subpage_prot+0x74/0x590
LR [c00000000000b688] system_call+0x5c/0x70
Call Trace:
[c00020002c6b7d30] [c00020002c6b7d90] 0xc00020002c6b7d90 (unreliable)
[c00020002c6b7e20] [c00000000000b688] system_call+0x5c/0x70
Instruction dump:
fb61ffd8 fb81ffe0 fba1ffe8 fbc1fff0 fbe1fff8 f821ff11 e92d1178 f9210068
39200000 e92d0968 ebe90630 e93f03e8 <eb891038> 60000000 3860fffe e9410068
We also move the subpage_prot_table with mmp_sem held to avoid race
between two parallel subpage_prot syscall.
Fixes: 701101865f5d ("powerpc/mm: Reduce memory usage for mm_context_t for radix")
Reported-by: Sachin Sant <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Tested-by: Sachin Sant <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Print more information about MCE error whether it is an hardware or
software error.
Some of the MCE errors can be easily categorized as hardware or
software errors e.g. UEs are due to hardware error, where as error
triggered due to invalid usage of tlbie is a pure software bug. But
not all the MCE errors can be easily categorize into either software
or hardware. There are errors like multihit errors which are usually
result of a software bug, but in some rare cases a hardware failure
can cause a multihit error. In past, we have seen case where after
replacing faulty chip, multihit errors stopped occurring. Same with
parity errors, which are usually due to faulty hardware but there are
chances where multihit can also cause an parity error. Such errors are
difficult to determine what really caused it. Hence this patch
classifies MCE errors into following four categorize:
1. Hardware error:
UE and Link timeout failure errors.
2. Probable hardware error (some chance of software cause)
SLB/ERAT/TLB Parity errors.
3. Software error
Invalid tlbie form.
4. Probable software error (some chance of hardware cause)
SLB/ERAT/TLB Multihit errors.
Sample output:
MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 000001001b6e0320 [Recovered]
MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [00007fffa309dc60]
MCE: CPU80: Probable Software error (some chance of hardware cause)
Signed-off-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently all machine check errors are printed as severe errors which
isn't correct. Print soft errors as warning instead of severe errors.
Signed-off-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Also add cpu number while displaying MCE log. This will help cleaner
logs when MCE hits on multiple cpus simultaneously.
Before the changes the MCE output was:
Severe Machine check interrupt [Recovered]
NIP [d00000000ba80280]: insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb]
Initiator: CPU
Error type: SLB [Multihit]
Effective address: d00000000ba80280
After this patch series changes the MCE output will be:
MCE: CPU80: machine check (Warning) Host SLB Multihit [Recovered]
MCE: CPU80: NIP: [d00000000b550280] insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb]
MCE: CPU80: Probable software error (some chance of hardware cause)
UE in host application:
MCE: CPU48: machine check (Severe) Host UE Load/Store DAR: 00007fffc6079a80 paddr: 0000000f8e260000 [Not recovered]
MCE: CPU48: PID: 4584 Comm: find NIP: [0000000010023368]
MCE: CPU48: Hardware error
and for MCE in Guest:
MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 000001001b6e0320 [Recovered]
MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [00007fffa309dc60]
MCE: CPU80: Probable software error (some chance of hardware cause)
Signed-off-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
When analysing sources of OS jitter, I noticed that doorbells cannot be
traced.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In commit 2bf1071a8d50 ("powerpc/64s: Remove POWER9 DD1 support") the
function __switch_to remove usage for 'dummy_copy_buffer'. Since it is
not used anywhere else, remove it completely.
This remove the following warning:
arch/powerpc/kernel/process.c:1156:17: error: 'dummy_copy_buffer' defined but not used
Suggested-by: Christophe Leroy <[email protected]>
Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently error return from kobject_init_and_add() is not followed by
a call to kobject_put(). This means there is a memory leak.
Add call to kobject_put() in error path of kobject_init_and_add().
Signed-off-by: Tobin C. Harding <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Tyrel Datwyler <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|