Age | Commit message (Collapse) | Author | Files | Lines |
|
There is no big poing in not pinning kernel text anymore, as now
we can keep pinned TLB even with things like DEBUG_PAGEALLOC.
Remove CONFIG_PIN_TLB_TEXT, making it always right.
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Drop ifdef around mmu_pin_tlb() to fix build errors]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/203b89de491e1379f1677a2685211b7c32adfff0.1606231483.git.christophe.leroy@csgroup.eu
|
|
On hash 32 bits, handling minor protection faults like unsetting
dirty flag is heavy if done from the normal page_fault processing,
because it implies hash table software lookup for flushing the entry
and then a DSI is taken anyway to add the entry back.
When KUAP was implemented, as explained in commit a68c31fc01ef
("powerpc/32s: Implement Kernel Userspace Access Protection"),
protection faults has been diverted from hash_page() because
hash_page() was not able to identify a KUAP fault.
Implement KUAP verification in hash_page(), by clearing write
permission when the access is a kernel access and Ks is 1.
This works regardless of the address because kernel segments always
have Ks set to 0 while user segments have Ks set to 0 only
when kernel write to userspace is granted.
Then protection faults can be handled by hash_page() even for KUAP.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/8a4ffe4798e9ea32aaaccdf85e411bb1beed3500.1605542955.git.christophe.leroy@csgroup.eu
|
|
early_mmu_init() is independent of MMU type and not
directly linked to tlb handling.
In a following patch, tlb.c will be restricted to HASH mmu.
Move early_mmu_init() into mmu.c which is common.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/e51b5e2fe6bca623b33116403043d3a1b5eaf826.1603348103.git.christophe.leroy@csgroup.eu
|
|
flush_hash_entry() is a simple function calling
flush_hash_pages() if it's a hash MMU or doing nothing otherwise.
Inline it.
And use it also in __ptep_test_and_clear_young().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/9af895be7d4b404d40e749a2659552fd138e62c4.1603348103.git.christophe.leroy@csgroup.eu
|
|
On book3s/32, tlb_flush() does nothing when the CPU has a hash table,
it calls _tlbia() otherwise.
Inline it.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/ebc933d1c530a19ef3cf7983f6ae94814f6e92ac.1603348103.git.christophe.leroy@csgroup.eu
|
|
flush_range() handle both the MMU_FTR_HPTE_TABLE case and
the other case.
The non MMU_FTR_HPTE_TABLE case is trivial as it is only a call
to _tlbie()/_tlbia() which is not worth a dedicated function.
Make flush_range() a hash specific and call it from tlbflush.h based
on mmu_has_feature(MMU_FTR_HPTE_TABLE).
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/132ab19aae52abc8e06ab524ec86d4229b5b9c3d.1603348103.git.christophe.leroy@csgroup.eu
|
|
flush_tlb_range() and flush_tlb_kernel_range() are trivial calls to
flush_range().
Make flush_range() global and inline flush_tlb_range()
and flush_tlb_kernel_range().
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/c7029a78e78709ad9272d7a44260e06b649169b2.1603348103.git.christophe.leroy@csgroup.eu
|
|
flush_tlb_mm() and flush_tlb_page() handle both the MMU_FTR_HPTE_TABLE
case and the other case.
The non MMU_FTR_HPTE_TABLE case is trivial as it is only a call
to _tlbie()/_tlbia() which is not worth a dedicated function.
Make flush_tlb_mm() and flush_tlb_page() hash specific and call
them from tlbflush.h based on mmu_has_feature(MMU_FTR_HPTE_TABLE).
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/11e932ded41ba6d9b251d89b7afa33cc060d3aa4.1603348103.git.christophe.leroy@csgroup.eu
|
|
_tlbie() and _tlbia() are used only on 603 cores while the
other functions are used only on cores having a hash table.
Move them into a new file named nohash_low.S
Add mmu_hash_lock var is used by both, it needs to go
in a common file.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/9a265b1b17a64153463d361280cb4b43eb1266a4.1603348103.git.christophe.leroy@csgroup.eu
|
|
On non SMP, _tlbie() is just a tlbie plus a sync instruction.
Make it static inline.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/475136425541db5c7c8a0395d19d400525b251bc.1603348103.git.christophe.leroy@csgroup.eu
|
|
In order to use _tlbie() and _tlbia() directly
from asm/book3s/32/tlbflush.h, move their prototypes
from mm/mm_decl.h to there.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/867587af929973ad65f8ef6972f2474a80c1737a.1603348103.git.christophe.leroy@csgroup.eu
|
|
Hash related vars are used at init only.
Declare them in __initdata.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/3878ea30706839fcff9196790ff3f99c128c3f6a.1603348103.git.christophe.leroy@csgroup.eu
|
|
Hash var is used only locally in mmu.c now.
No need to set it in head_32.S anymore.
Make it static and initialises it to the early hash table.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/786c82a89cdfdaabb32b72a44f7c312fa81d192b.1603348103.git.christophe.leroy@csgroup.eu
|
|
Hash var
We now have an early hash table on hash MMU, so no need to check
Hash var to know if the Hash table is set of not.
Use mmu_has_feature(MMU_FTR_HPTE_TABLE) instead. This will allow
optimisation via jump_label.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/f1766631a9e014b6433f1a3c12c726ddfce34220.1603348103.git.christophe.leroy@csgroup.eu
|
|
This table is used only locally. Declare it static.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/054fec0c139fc4c0a306360b360784733c0a6e65.1603348103.git.christophe.leroy@csgroup.eu
|
|
This partially reverts commit eb232b162446 ("powerpc/book3s64/kuap: Improve
error reporting with KUAP") and update the fault handler to print
[ 55.022514] Kernel attempted to access user page (7e6725b70000) - exploit attempt? (uid: 0)
[ 55.022528] BUG: Unable to handle kernel data access on read at 0x7e6725b70000
[ 55.022533] Faulting instruction address: 0xc000000000e8b9bc
[ 55.022540] Oops: Kernel access of bad area, sig: 11 [#1]
....
when the kernel access userspace address without unlocking AMR.
bad_kuap_fault() is added as part of commit 5e5be3aed230 ("powerpc/mm: Detect
bad KUAP faults") to catch userspace access incorrectly blocked by AMR. Hence
retain the full stack dump there even with hash translation. Also, add a comment
explaining the difference between hash and radix.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Since commit c33165253492 ("powerpc: use non-set_fs based maccess
routines"), userspace access is not granted anymore when using
copy_from_kernel_nofault()
However, kthread_probe_data() uses copy_from_kernel_nofault()
to check validity of pointers. When the pointer is NULL,
it points to userspace, leading to a KUAP fault and triggering
the following big hammer warning many times when you request
a sysrq "show task":
[ 1117.202054] ------------[ cut here ]------------
[ 1117.202102] Bug: fault blocked by AP register !
[ 1117.202261] WARNING: CPU: 0 PID: 377 at arch/powerpc/include/asm/nohash/32/kup-8xx.h:66 do_page_fault+0x4a8/0x5ec
[ 1117.202310] Modules linked in:
[ 1117.202428] CPU: 0 PID: 377 Comm: sh Tainted: G W 5.10.0-rc5-01340-g83f53be2de31-dirty #4175
[ 1117.202499] NIP: c0012048 LR: c0012048 CTR: 00000000
[ 1117.202573] REGS: cacdbb88 TRAP: 0700 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.202625] MSR: 00021032 <ME,IR,DR,RI> CR: 24082222 XER: 20000000
[ 1117.202899]
[ 1117.202899] GPR00: c0012048 cacdbc40 c2929290 00000023 c092e554 00000001 c09865e8 c092e640
[ 1117.202899] GPR08: 00001032 00000000 00000000 00014efc 28082224 100d166a 100a0920 00000000
[ 1117.202899] GPR16: 100cac0c 100b0000 1080c3fc 1080d685 100d0000 100d0000 00000000 100a0900
[ 1117.202899] GPR24: 100d0000 c07892ec 00000000 c0921510 c21f4440 0000005c c0000000 cacdbc80
[ 1117.204362] NIP [c0012048] do_page_fault+0x4a8/0x5ec
[ 1117.204461] LR [c0012048] do_page_fault+0x4a8/0x5ec
[ 1117.204509] Call Trace:
[ 1117.204609] [cacdbc40] [c0012048] do_page_fault+0x4a8/0x5ec (unreliable)
[ 1117.204771] [cacdbc70] [c00112f0] handle_page_fault+0x8/0x34
[ 1117.204911] --- interrupt: 301 at copy_from_kernel_nofault+0x70/0x1c0
[ 1117.204979] NIP: c010dbec LR: c010dbac CTR: 00000001
[ 1117.205053] REGS: cacdbc80 TRAP: 0301 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.205104] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28082224 XER: 00000000
[ 1117.205416] DAR: 0000005c DSISR: c0000000
[ 1117.205416] GPR00: c0045948 cacdbd38 c2929290 00000001 00000017 00000017 00000027 0000000f
[ 1117.205416] GPR08: c09926ec 00000000 00000000 3ffff000 24082224
[ 1117.206106] NIP [c010dbec] copy_from_kernel_nofault+0x70/0x1c0
[ 1117.206202] LR [c010dbac] copy_from_kernel_nofault+0x30/0x1c0
[ 1117.206258] --- interrupt: 301
[ 1117.206372] [cacdbd38] [c004bbb0] kthread_probe_data+0x44/0x70 (unreliable)
[ 1117.206561] [cacdbd58] [c0045948] print_worker_info+0xe0/0x194
[ 1117.206717] [cacdbdb8] [c00548ac] sched_show_task+0x134/0x168
[ 1117.206851] [cacdbdd8] [c005a268] show_state_filter+0x70/0x100
[ 1117.206989] [cacdbe08] [c039baa0] sysrq_handle_showstate+0x14/0x24
[ 1117.207122] [cacdbe18] [c039bf18] __handle_sysrq+0xac/0x1d0
[ 1117.207257] [cacdbe48] [c039c0c0] write_sysrq_trigger+0x4c/0x74
[ 1117.207407] [cacdbe68] [c01fba48] proc_reg_write+0xb4/0x114
[ 1117.207550] [cacdbe88] [c0179968] vfs_write+0x12c/0x478
[ 1117.207686] [cacdbf08] [c0179e60] ksys_write+0x78/0x128
[ 1117.207826] [cacdbf38] [c00110d0] ret_from_syscall+0x0/0x34
[ 1117.207938] --- interrupt: c01 at 0xfd4e784
[ 1117.208008] NIP: 0fd4e784 LR: 0fe0f244 CTR: 10048d38
[ 1117.208083] REGS: cacdbf48 TRAP: 0c01 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.208134] MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 44002222 XER: 00000000
[ 1117.208470]
[ 1117.208470] GPR00: 00000004 7fc34090 77bfb4e0 00000001 1080fa40 00000002 7400000f fefefeff
[ 1117.208470] GPR08: 7f7f7f7f 10048d38 1080c414 7fc343c0 00000000
[ 1117.209104] NIP [0fd4e784] 0xfd4e784
[ 1117.209180] LR [0fe0f244] 0xfe0f244
[ 1117.209236] --- interrupt: c01
[ 1117.209274] Instruction dump:
[ 1117.209353] 714a4000 418200f0 73ca0001 40820084 73ca0032 408200f8 73c90040 4082ff60
[ 1117.209727] 0fe00000 3c60c082 386399f4 48013b65 <0fe00000> 80010034 3860000b 7c0803a6
[ 1117.210102] ---[ end trace 1927c0323393af3e ]---
To avoid that, copy_from_kernel_nofault_allowed() is used to check
whether the address is a valid kernel address. But the default
version of it returns true for any address.
Provide a powerpc version of copy_from_kernel_nofault_allowed()
that returns false when the address is below TASK_USER_MAX,
so that copy_from_kernel_nofault() will return -ERANGE.
Fixes: c33165253492 ("powerpc: use non-set_fs based maccess routines")
Reported-by: Qian Cai <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/18bcb456d32a3e74f5ae241fd6f1580c092d07f5.1607360230.git.christophe.leroy@csgroup.eu
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.10:
- Three commits fixing possible missed TLB invalidations for
multi-threaded processes when CPUs are hotplugged in and out.
- A fix for a host crash triggerable by host userspace (qemu) in KVM
on Power9.
- A fix for a host crash in machine check handling when running HPT
guests on a HPT host.
- One commit fixing potential missed TLB invalidations when using the
hash MMU on Power9 or later.
- A regression fix for machines with CPUs on node 0 but no memory.
Thanks to Aneesh Kumar K.V, Cédric Le Goater, Greg Kurz, Milan
Mohanty, Milton Miller, Nicholas Piggin, Paul Mackerras, and Srikar
Dronamraju"
* tag 'powerpc-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/powernv: Fix memory corruption when saving SLB entries on MCE
KVM: PPC: Book3S HV: XIVE: Fix vCPU id sanity check
powerpc/numa: Fix a regression on memoryless node 0
powerpc/64s: Trim offlined CPUs from mm_cpumasks
kernel/cpu: add arch override for clear_tasks_mm_cpumask() mm handling
powerpc/64s/pseries: Fix hash tlbiel_all_isa300 for guest kernels
powerpc/64s: Fix hash ISA v3.0 TLBIEL instruction generation
|
|
There is no defconfig selecting CONFIG_E200, and no platform.
e200 is an earlier version of booke, a predecessor of e500,
with some particularities like an unified cache instead of both an
instruction cache and a data cache.
Remove it.
Signed-off-by: Christophe Leroy <[email protected]>
Acked-by: Scott Wood <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/34ebc3ba2c768d97f363bd5f2deea2356e9ae127.1605589460.git.christophe.leroy@csgroup.eu
|
|
To check machine check handling, add support to inject slb
multihit errors.
Co-developed-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Ganesh Goudar <[email protected]>
[mpe: Use CONFIG_PPC_BOOK3S_64 to fix compile errors reported by [email protected]]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
440/460 variants and 470 variants are not compatible, no
need to make code supporting both and using MMU features.
Just use CONFIG_PPC_47x to decide what to build.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/c3e64da3d5d068c69a201e03bbae7da055761e5b.1603041883.git.christophe.leroy@csgroup.eu
|
|
Since commit 10b35d9978ac ("[PATCH] powerpc: merged asm/cputable.h"),
CPU_FTR_COHERENT_ICACHE has always been defined.
Remove the #ifndef CPU_FTR_COHERENT_ICACHE block.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/e26ddc1d6f6aca739dd8d2b7c67351ead559b084.1602489664.git.christophe.leroy@csgroup.eu
|
|
MMU_FTR_TYPE_44x cannot be checked by cpu_has_feature()
Use mmu_has_feature() instead
Fixes: 23eb7f560a2a ("powerpc: Convert flush_icache_range & friends to C")
Cc: [email protected]
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/ceede82fadf37f3b8275e61fcf8cf29a3e2ec7fe.1602351011.git.christophe.leroy@csgroup.eu
|
|
SPRN_SPRG_PGDIR is there mainly to speedup SW TLB miss handlers
for powerpc 603.
We need to free SPRN_SPRG2 to reduce the mess with CONFIG_VMAP_STACK.
In hash_page(), reading PGDIR from thread_struct will be in the noise
performance wise.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/4adca19b7120cdf619956768ed09e74fc6a558f3.1606285014.git.christophe.leroy@csgroup.eu
|
|
We now always map kernel text with BATs. Neither need to preload
hash with kernel text addresses nor ensure they are never evicted.
This is more or less a revert of commit ee4f2ea48674 ("[POWERPC] Fix
32-bit mm operations when not using BATs")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/0a0bab7fadd89aa829e33420fbc10d60c59040a7.1606285014.git.christophe.leroy@csgroup.eu
|
|
Since commit 2b279c0348af ("powerpc/32s: Allow mapping with BATs with
DEBUG_PAGEALLOC"), there is no real situation where mapping without
BATs is required.
In order to simplify memory handling, always map kernel text
and rodata with BATs even when "nobats" kernel parameter is set.
Also fix the 603 TLB miss exceptions that don't require anymore
kernel page table if DEBUG_PAGEALLOC.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/da51f7ec632825a4ce43290a904aad61648408c0.1606285013.git.christophe.leroy@csgroup.eu
|
|
Don't enable KUEP/KUAP if we support less than or equal to 3 keys.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
With hash translation use DSISR_KEYFAULT to identify a wrong access.
With Radix we look at the AMR value and type of fault.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Now that kernel correctly store/restore userspace AMR/IAMR values, avoid
manipulating AMR and IAMR from the kernel on behalf of userspace.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
On fork, we inherit from the parent and on exec, we should switch to default_amr values.
Also, avoid changing the AMR register value within the kernel. The kernel now runs with
different AMR values.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This patch updates kernel hash page table entries to use storage key 3
for its mapping. This implies all kernel access will now use key 3 to
control READ/WRITE. The patch also prevents the allocation of key 3 from
userspace and UAMOR value is updated such that userspace cannot modify key 3.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This is in preparation to adding support for kuap with hash translation.
In preparation for that rename/move kuap related functions to
non radix names. Also move the feature bit closer to MMU_FTR_KUEP.
MMU_FTR_KUEP is renamed to MMU_FTR_BOOK3S_KUEP to indicate the feature
is only relevant to BOOK3S_64
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The next set of patches adds support for kuep with hash translation.
In preparation for that rename/move kuap related functions to
non radix names.
Also set MMU_FTR_KUEP and add the missing isync().
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The next set of patches adds support for kuap with hash translation.
In preparation for that rename/move kuap related functions to
non radix names.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This patch consolidates UAMOR update across pkey, kuap and kuep features.
The boot cpu initialize UAMOR via pkey init and both radix/hash do the
secondary cpu UAMOR init in early_init_mmu_secondary.
We don't check for mmu_feature in radix secondary init because UAMOR
is a supported SPRN with all CPUs supporting radix translation.
The old code was not updating UAMOR if we had smap disabled and smep enabled.
This change handles that case.
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The config CONFIG_PPC_PKEY is used to select the base support that is
required for PPC_MEM_KEYS, KUAP, and KUEP. Adding this dependency
reduces the code complexity(in terms of #ifdefs) and enables us to
move some of the initialization code to pkeys.c
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Since ISA v3.0, SLB no longer uses the slb_cache, and stab_rr is no
longer correlated with SLB allocation. Move those to pre-3.0.
While here, improve some alignments and reduce whitespace.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Commit e75130f20b1f ("powerpc/numa: Offline memoryless cpuless node 0")
offlines node 0 and expects nodes to be subsequently onlined when CPUs
or nodes are detected.
Commit 6398eaa26816 ("powerpc/numa: Prefer node id queried from vphn")
skips onlining node 0 when CPUs are associated with node 0.
On systems with node 0 having CPUs but no memory, this causes node 0 be
marked offline. This causes issues at boot time when trying to set
memory node for online CPUs while building the zonelist.
0:mon> t
[link register ] c000000000400354 __build_all_zonelists+0x164/0x280
[c00000000161bda0] c0000000016533c8 node_states+0x20/0xa0 (unreliable)
[c00000000161bdc0] c000000000400384 __build_all_zonelists+0x194/0x280
[c00000000161be30] c000000001041800 build_all_zonelists_init+0x4c/0x118
[c00000000161be80] c0000000004020d0 build_all_zonelists+0x190/0x1b0
[c00000000161bef0] c000000001003cf8 start_kernel+0x18c/0x6a8
[c00000000161bf90] c00000000000adb4 start_here_common+0x1c/0x3e8
0:mon> r
R00 = c000000000400354 R16 = 000000000b57a0e8
R01 = c00000000161bda0 R17 = 000000000b57a6b0
R02 = c00000000161ce00 R18 = 000000000b5afee8
R03 = 0000000000000000 R19 = 000000000b6448a0
R04 = 0000000000000000 R20 = fffffffffffffffd
R05 = 0000000000000000 R21 = 0000000001400000
R06 = 0000000000000000 R22 = 000000001ec00000
R07 = 0000000000000001 R23 = c000000001175580
R08 = 0000000000000000 R24 = c000000001651ed8
R09 = c0000000017e84d8 R25 = c000000001652480
R10 = 0000000000000000 R26 = c000000001175584
R11 = c000000c7fac0d10 R27 = c0000000019568d0
R12 = c000000000400180 R28 = 0000000000000000
R13 = c000000002200000 R29 = c00000000164dd78
R14 = 000000000b579f78 R30 = 0000000000000000
R15 = 000000000b57a2b8 R31 = c000000001175584
pc = c000000000400194 local_memory_node+0x24/0x80
cfar= c000000000074334 mcount+0xc/0x10
lr = c000000000400354 __build_all_zonelists+0x164/0x280
msr = 8000000002001033 cr = 44002284
ctr = c000000000400180 xer = 0000000000000001 trap = 380
dar = 0000000000001388 dsisr = c00000000161bc90
0:mon>
Fix this by setting node to be online while onlining CPUs that belong to
node 0.
Fixes: e75130f20b1f ("powerpc/numa: Offline memoryless cpuless node 0")
Fixes: 6398eaa26816 ("powerpc/numa: Prefer node id queried from vphn")
Reported-by: Milan Mohanty <[email protected]>
Signed-off-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When offlining a CPU, powerpc/64s does not flush TLBs, rather it just
leaves the CPU set in mm_cpumasks, so it continues to receive TLBIEs
to manage its TLBs.
However the exit_flush_lazy_tlbs() function expects that after
returning, all CPUs (except self) have flushed TLBs for that mm, in
which case TLBIEL can be used for this flush. This breaks for offline
CPUs because they don't get the IPI to flush their TLB. This can lead
to stale translations.
Fix this by clearing the CPU from mm_cpumasks, then flushing all TLBs
before going offline.
These offlined CPU bits stuck in the cpumask also prevents the cpumask
from being trimmed back to local mode, which means continual broadcast
IPIs or TLBIEs are needed for TLB flushing. This patch prevents that
situation too.
A cast of many were involved in working this out, but in particular
Milton, Aneesh, Paul made key discoveries.
Fixes: 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask")
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Debugged-by: Milton Miller <[email protected]>
Debugged-by: Aneesh Kumar K.V <[email protected]>
Debugged-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
tlbiel_all() can not be usable in !HVMODE when running hash presently,
remove HV privileged flushes when running in guest to make it usable.
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
A typo has the R field of the instruction assigned by lucky dip a la
register allocator.
Fixes: d4748276ae14c ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid(). That
symbol is exported for modules. However, while the export in
mm/memory_hotplug.c exported the symbol in the configuration cases of:
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=y
...and:
CONFIG_NUMA_KEEP_MEMINFO=n
CONFIG_MEMORY_HOTPLUG=y
...it failed to export the symbol in the case of:
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=n
Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means that
memory_add_physaddr_to_nid() example that phys_to_target_node() copied
is broken too.
Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols. Move to the
common arch override design-pattern of an asm header defining a symbol
to replace the default implementation.
The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h. In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
Double-down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h. An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles
with the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.
The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations
of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.
[[email protected]: v4]
Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[[email protected]: powerpc: fix create_section_mapping compile warning]
Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Reported-by: Randy Dunlap <[email protected]>
Reported-by: Thomas Gleixner <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reported-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Tested-by: Thomas Gleixner <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's revert what we did in case something goes wrong and we return an
error - as already done on arm64 and s390x.
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The single caller (arch_remove_linear_mapping()) prints a proper
warning when this function fails. No need to eventually crash the
kernel - let's drop this WARN_ON.
Suggested-by: Oscar Salvador <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Let's print a warning similar to in arch_add_linear_mapping() instead of
WARN_ON_ONCE() and eventually crashing the kernel.
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This code currently relies on mem_hotplug_begin()/mem_hotplug_done() -
create_section_mapping()/remove_section_mapping() implementations
cannot tollerate getting called concurrently.
Let's prepare for callers (memtrace) not holding any such locks (and
don't force them to mess with memory hotplug locks).
Other parts in these functions don't seem to rely on external locking.
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
We want to stop abusing memory hotplug infrastructure in memtrace code
to perform allocations and remove the linear mapping. Instead we will use
alloc_contig_pages() and remove the linear mapping manually.
Let's factor out creating/removing the linear mapping into
arch_create_linear_mapping() / arch_remove_linear_mapping() - so in the
future, we might be able to have whole arch_add_memory() /
arch_remove_memory() be implemented in common code.
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Fixes coccicheck warning:
./arch/powerpc/mm/pgtable_32.c:87:11-12: WARNING comparing pointer to 0
Avoid pointer type value compared to 0.
Reported-by: Tosk Robot <[email protected]>
Signed-off-by: Kaixu Xia <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|