Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx
Pull x86 MPX removal from Dave Hansen:
"MPX requires recompiling applications, which requires compiler
support. Unfortunately, GCC 9.1 is expected to be be released without
support for MPX. This means that there was only a relatively small
window where folks could have ever used MPX. It failed to gain wide
adoption in the industry, and Linux was the only mainstream OS to ever
support it widely.
Support for the feature may also disappear on future processors.
This set completes the process that we started during the 5.4 merge
window when the MPX prctl()s were removed. XSAVE support is left in
place, which allows MPX-using KVM guests to continue to function"
* tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx:
x86/mpx: remove MPX from arch/x86
mm: remove arch_bprm_mm_init() hook
x86/mpx: remove bounds exception code
x86/mpx: remove build infrastructure
x86/alternatives: add missing insn.h include
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
Second KVM PPC update for 5.6
* Fix compile warning on 32-bit machines
* Fix locking error in secure VM support
|
|
Pull SCSI updates from James Bottomley:
"This series is slightly unusual because it includes Arnd's compat
ioctl tree here:
1c46a2cf2dbd Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue
Excluding Arnd's changes, this is mostly an update of the usual
drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas.
There are a couple of core and base updates around error propagation
and atomicity in the attribute container base we use for the SCSI
transport classes.
The rest is minor changes and updates"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits)
scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask
scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity
scsi: hisi_sas: Modify the file permissions of trigger_dump to write only
scsi: hisi_sas: Replace magic number when handle channel interrupt
scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock
scsi: hisi_sas: use threaded irq to process CQ interrupts
scsi: ufs: Use UFS device indicated maximum LU number
scsi: ufs: Add max_lu_supported in struct ufs_dev_info
scsi: ufs: Delete is_init_prefetch from struct ufs_hba
scsi: ufs: Inline two functions into their callers
scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init()
scsi: ufs: Split ufshcd_probe_hba() based on its called flow
scsi: ufs: Delete struct ufs_dev_desc
scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails
scsi: ufs-mediatek: enable low-power mode for hibern8 state
scsi: ufs: export some functions for vendor usage
scsi: ufs-mediatek: add dbg_register_dump implementation
scsi: qla2xxx: Fix a NULL pointer dereference in an error path
scsi: qla1280: Make checking for 64bit support consistent
scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Cleanup of the GOP [graphics output] handling code in the EFI stub
- Complete refactoring of the mixed mode handling in the x86 EFI stub
- Overhaul of the x86 EFI boot/runtime code
- Increase robustness for mixed mode code
- Add the ability to disable DMA at the root port level in the EFI
stub
- Get rid of RWX mappings in the EFI memory map and page tables,
where possible
- Move the support code for the old EFI memory mapping style into its
only user, the SGI UV1+ support code.
- plus misc fixes, updates, smaller cleanups.
... and due to interactions with the RWX changes, another round of PAT
cleanups make a guest appearance via the EFI tree - with no side
effects intended"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
efi/x86: Disable instrumentation in the EFI runtime handling code
efi/libstub/x86: Fix EFI server boot failure
efi/x86: Disallow efi=old_map in mixed mode
x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld
efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping
efi: Fix handling of multiple efi_fake_mem= entries
efi: Fix efi_memmap_alloc() leaks
efi: Add tracking for dynamically allocated memmaps
efi: Add a flags parameter to efi_memory_map
efi: Fix comment for efi_mem_type() wrt absent physical addresses
efi/arm: Defer probe of PCIe backed efifb on DT systems
efi/x86: Limit EFI old memory map to SGI UV machines
efi/x86: Avoid RWX mappings for all of DRAM
efi/x86: Don't map the entire kernel text RW for mixed mode
x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd
efi/libstub/x86: Fix unused-variable warning
efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode
efi/libstub/x86: Use const attribute for efi_is_64bit()
efi: Allow disabling PCI busmastering on bridges during boot
efi/x86: Allow translating 64-bit arguments for mixed mode calls
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The RCU changes in this cycle were:
- Expedited grace-period updates
- kfree_rcu() updates
- RCU list updates
- Preemptible RCU updates
- Torture-test updates
- Miscellaneous fixes
- Documentation updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
rcu: Remove unused stop-machine #include
powerpc: Remove comment about read_barrier_depends()
.mailmap: Add entries for old [email protected] addresses
srcu: Apply *_ONCE() to ->srcu_last_gp_end
rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask()
rcu: Move rcu_{expedited,normal} definitions into rcupdate.h
rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h
rcu: Remove the declaration of call_rcu() in tree.h
rcu: Fix tracepoint tracking RCU CPU kthread utilization
rcu: Fix harmless omission of "CONFIG_" from #if condition
rcu: Avoid tick_dep_set_cpu() misordering
rcu: Provide wrappers for uses of ->rcu_read_lock_nesting
rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special()
rcu: Clear ->rcu_read_unlock_special only once
rcu: Clear .exp_hint only when deferred quiescent state has been reported
rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU
rcu: Remove kfree_call_rcu_nobatch()
rcu: Remove kfree_rcu() special casing and lazy-callback handling
rcu: Add support for debug_objects debugging for kfree_rcu()
rcu: Add multiple in-flight batches of kfree_rcu() work
...
|
|
Implement user_access_save() and user_access_restore()
On 8xx and radix:
- On save, get the value of the associated special register then
prevent user access.
- On restore, set back the saved value to the associated special
register.
On book3s/32:
- On save, get the value stored in current->thread.kuap and prevent
user access.
- On restore, regenerate address range from the stored value and
reopen read/write access for that range.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/54f2f74938006b33c55a416674807b42ef222068.1579866752.git.christophe.leroy@c-s.fr
|
|
Today, when a function like strncpy_from_user() is called,
the userspace access protection is de-activated and re-activated
for every word read.
By implementing user_access_begin and friends, the protection
is de-activated at the beginning of the copy and re-activated at the
end.
Implement user_access_begin(), user_access_end() and
unsafe_get_user(), unsafe_put_user() and unsafe_copy_to_user()
For the time being, we keep user_access_save() and
user_access_restore() as nops.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/36d4fbf9e56a75994aca4ee2214c77b26a5a8d35.1579866752.git.christophe.leroy@c-s.fr
|
|
In preparation of implementing user_access_begin and friends
on powerpc, the book3s/32 version of prevent_user_access() need
to be prepared for user_access_end().
user_access_end() doesn't provide the address and size which
were passed to user_access_begin(), required by prevent_user_access()
to know which segment to modify.
The list of segments which where unprotected by allow_user_access()
are available in current->kuap. But we don't want prevent_user_access()
to read this all the time, especially everytime it is 0 (for instance
because the access was not a write access).
Implement a special direction named KUAP_CURRENT. In this case only,
the addr and end are retrieved from current->kuap.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/55bcc1f25d8200892a31f67a0b024ff3b816c3cc.1579866752.git.christophe.leroy@c-s.fr
|
|
NULL addr is a user address. Don't waste time checking it. If
someone tries to access it, it will SIGFAULT the same way as for
address 1, so no need to make it special.
The special case is when not doing a write, in that case we want
to drop the entire function. This is now handled by 'dir' param
and not by the nulity of 'to' anymore.
Also make beginning of prevent_user_access() similar
to beginning of allow_user_access(), and tell the compiler
that writing in kernel space or with a 0 length is unlikely
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/85e971223dfe6ace734637db1841678939a76155.1579866752.git.christophe.leroy@c-s.fr
|
|
__builtin_constant_p() always return 0 for pointers, so on RADIX
we always end up opening both direction (by writing 0 in SPR29):
0000000000000170 <._copy_to_user>:
...
1b0: 4c 00 01 2c isync
1b4: 39 20 00 00 li r9,0
1b8: 7d 3d 03 a6 mtspr 29,r9
1bc: 4c 00 01 2c isync
1c0: 48 00 00 01 bl 1c0 <._copy_to_user+0x50>
1c0: R_PPC64_REL24 .__copy_tofrom_user
...
0000000000000220 <._copy_from_user>:
...
2ac: 4c 00 01 2c isync
2b0: 39 20 00 00 li r9,0
2b4: 7d 3d 03 a6 mtspr 29,r9
2b8: 4c 00 01 2c isync
2bc: 7f c5 f3 78 mr r5,r30
2c0: 7f 83 e3 78 mr r3,r28
2c4: 48 00 00 01 bl 2c4 <._copy_from_user+0xa4>
2c4: R_PPC64_REL24 .__copy_tofrom_user
...
Use an explicit parameter for direction selection, so that GCC
is able to see it is a constant:
00000000000001b0 <._copy_to_user>:
...
1f0: 4c 00 01 2c isync
1f4: 3d 20 40 00 lis r9,16384
1f8: 79 29 07 c6 rldicr r9,r9,32,31
1fc: 7d 3d 03 a6 mtspr 29,r9
200: 4c 00 01 2c isync
204: 48 00 00 01 bl 204 <._copy_to_user+0x54>
204: R_PPC64_REL24 .__copy_tofrom_user
...
0000000000000260 <._copy_from_user>:
...
2ec: 4c 00 01 2c isync
2f0: 39 20 ff ff li r9,-1
2f4: 79 29 00 04 rldicr r9,r9,0,0
2f8: 7d 3d 03 a6 mtspr 29,r9
2fc: 4c 00 01 2c isync
300: 7f c5 f3 78 mr r5,r30
304: 7f 83 e3 78 mr r3,r28
308: 48 00 00 01 bl 308 <._copy_from_user+0xa8>
308: R_PPC64_REL24 .__copy_tofrom_user
...
Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Spell out the directions, s/KUAP_R/KUAP_READ/ etc.]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/f4e88ec4941d5facb35ce75026b0112f980086c3.1579866752.git.christophe.leroy@c-s.fr
|
|
At the moment, bad_kuap_fault() reports a fault only if a bad access
to userspace occurred while access to userspace was not granted.
But if a fault occurs for a write outside the allowed userspace
segment(s) that have been unlocked, bad_kuap_fault() fails to
detect it and the kernel loops forever in do_page_fault().
Fix it by checking that the accessed address is within the allowed
range.
Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection")
Cc: [email protected] # v5.2+
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/f48244e9485ada0a304ed33ccbb8da271180c80d.1579866752.git.christophe.leroy@c-s.fr
|
|
Pull ioremap updates from Christoph Hellwig:
"Remove the ioremap_nocache API (plus wrappers) that are always
identical to ioremap"
* tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap:
remove ioremap_nocache and devm_ioremap_nocache
MIPS: define ioremap_nocache to ioremap
|
|
Add support of KASAN_VMALLOC on PPC32.
To allow this, the early shadow covering the VMALLOC space
need to be removed once high_memory var is set and before
freeing memblock.
And the VMALLOC area need to be aligned such that boundaries
are covered by a full shadow page.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/031dec5487bde9b2181c8b3c9800e1879cf98c1a.1579024426.git.christophe.leroy@c-s.fr
|
|
In order to ease stack overflow detection, align
stack to 2 * THREAD_SIZE when using VMAP_STACK.
This allows overflow detection using a single bit check.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/60e9ae86b7d2cdcf21468787076d345663648f46.1576916812.git.christophe.leroy@c-s.fr
|
|
To support CONFIG_VMAP_STACK, the kernel has to activate Data MMU
Translation for accessing the stack. Before doing that it must save
SRR0, SRR1 and also DAR and DSISR when relevant, in order to not
loose them in case there is a Data TLB Miss once the translation is
reactivated.
This patch adds fields in thread struct for saving those registers.
It prepares entry_32.S to handle exception entry with
Data MMU Translation enabled and alters EXCEPTION_PROLOG macros to
save SRR0, SRR1, DAR and DSISR then reenables Data MMU.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/a775a1fea60f190e0f63503463fb775310a2009b.1576916812.git.christophe.leroy@c-s.fr
|
|
We must not use the pointer output without validating the
success of the random read.
Acked-by: Michael Ellerman <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
The generic interface uses bool not int; match that.
Reviewed-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
These symbols are currently part of the generic archrandom.h
interface, but are currently unused and can be removed.
Signed-off-by: Richard Henderson <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Commit a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with
KVM") introduced a number of workarounds as coming out of a guest with
the mmu enabled would make the cpu would start running in hypervisor
state with the PID value from the guest. The cpu will then start
prefetching for the hypervisor with that PID value.
In Power9 DD2.2 the cpu behaviour was modified to fix this. When
accessing Quadrant 0 in hypervisor mode with LPID != 0 prefetching will
not be performed. This means that we can get rid of the workarounds for
Power9 DD2.2 and later revisions. Add a new cpu feature
CPU_FTR_P9_RADIX_PREFETCH_BUG to indicate if the workarounds are needed.
Signed-off-by: Jordan Niethe <[email protected]>
Acked-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:
- Expedited grace-period updates
- kfree_rcu() updates
- RCU list updates
- Preemptible RCU updates
- Torture-test updates
- Miscellaneous fixes
- Documentation updates
Signed-off-by: Ingo Molnar <[email protected]>
|
|
'read_barrier_depends()' doesn't exist anymore so stop talking about it.
Signed-off-by: Will Deacon <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
Move the kvm_cpu_{un}init() calls to common PPC code as an intermediate
step towards removing kvm_cpu_{un}init() altogether.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Move allocation of all flavors of PPC vCPUs to common PPC code. All
variants either allocate 'struct kvm_vcpu' directly, or require that
the embedded 'struct kvm_vcpu' member be located at offset 0, i.e.
guarantee that the allocation can be directly interpreted as a 'struct
kvm_vcpu' object.
Remove the message from the build-time assertion regarding placement of
the struct, as compatibility with the arch usercopy region is no longer
the sole dependent on 'struct kvm_vcpu' being at offset zero.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
From: Dave Hansen <[email protected]>
MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).
arch_bprm_mm_init() is used at execve() time. The only non-stub
implementation is on x86 for MPX. Remove the hook entirely from
all architectures and generic code.
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: [email protected]
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
|
|
As reported by ./scripts/checkpatch.pl --strict:
CHECK: extern prototypes should be avoided in .h files
Signed-off-by: Greg Kurz <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The powerpc PCI code requires that a pci_dn structure exists for all
devices in the system. This is fine for real devices since at boot a pci_dn
is created for each PCI device in the DT and it's fine for hotplugged devices
since the hotplug slot driver will manage the pci_dn's devices in hotplug
slots. For SR-IOV, we need the platform / pcibios to manage the pci_dn for
virtual functions since firmware is unaware of VFs, and they aren't
"hot plugged" in the traditional sense.
Management of the pci_dn is handled by the, poorly named, functions:
add_pci_dev_data() and remove_pci_dev_data(). The entire body of these
functions is #ifdef`ed around CONFIG_PCI_IOV and they cannot be used
in any other context, so make them only available when CONFIG_PCI_IOV
is selected, and rename them to reflect their actual usage rather than
having them masquerade as generic code.
Signed-off-by: Oliver O'Halloran <[email protected]>
Reviewed-by: Sam Bobroff <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Unlike real PCI slots, opencapi slots are directly associated to
the (virtual) opencapi PHB, there's no intermediate bridge. So when
looking for a slot ID, we must start the search from the device node
itself and not its parent.
Also, the slot ID is not attached to a specific bdfn, so let's build
it from the PHB ID, like skiboot.
Signed-off-by: Frederic Barrat <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
On PPC32, the cache lines have a fixed size known at build time.
Don't read it from the datapage.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/dfa7b35e27e01964fcda84bf1ed8b2b31cf93826.1575273217.git.christophe.leroy@c-s.fr
|
|
__get_datapage() is only a few instructions to retrieve the
address of the page where the kernel stores data to the VDSO.
By inlining this function into its users, a bl/blr pair and
a mflr/mtlr pair is avoided, plus a few reg moves.
The improvement is noticeable (about 55 nsec/call on an 8xx)
vdsotest before the patch:
gettimeofday: vdso: 731 nsec/call
clock-gettime-realtime-coarse: vdso: 668 nsec/call
clock-gettime-monotonic-coarse: vdso: 745 nsec/call
vdsotest after the patch:
gettimeofday: vdso: 677 nsec/call
clock-gettime-realtime-coarse: vdso: 613 nsec/call
clock-gettime-monotonic-coarse: vdso: 690 nsec/call
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/c39ef7f3dfa25356b01e211d539671f279086c09.1575273217.git.christophe.leroy@c-s.fr
|
|
Unlike standard powerpc, Powerpc 8xx doesn't have SPRN_DABR, but
it has a breakpoint support based on a set of comparators which
allow more flexibility.
Commit 4ad8622dc548 ("powerpc/8xx: Implement hw_breakpoint")
implemented breakpoints by emulating the DABR behaviour. It did
this by setting one comparator the match 4 bytes at breakpoint address
and the other comparator to match 4 bytes at breakpoint address + 4.
Rewrite 8xx hw_breakpoint to make breakpoints match all addresses
defined by the breakpoint address and length by making full use of
comparators.
Now, comparator E is set to match any address greater than breakpoint
address minus one. Comparator F is set to match any address lower than
breakpoint address plus breakpoint length. Addresses are aligned
to 32 bits.
When the breakpoint range starts at address 0, the breakpoint is set
to match comparator F only. When the breakpoint range end at address
0xffffffff, the breakpoint is set to match comparator E only.
Otherwise the breakpoint is set to match comparator E and F.
At the same time, use registers bit names instead of hardcoded values.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/05105deeaf63bc02151aea2cdeaf525534e0e9d4.1574790198.git.christophe.leroy@c-s.fr
|
|
Selecting CONFIG_PPC_DEBUG_WX only impacts ptdump and pgtable_32/64
init calls. Declaring related functions in asm/pgtable.h implies
rebuilding almost everything.
Move ptdump_check_wx() declaration in mm/mmu_decl.h
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/bf34fd9dca61eadf9a134a9f89ebbc162cfd5f86.1578986011.git.christophe.leroy@c-s.fr
|
|
Commit 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in
the same 0xc range") has a bug in the definition of MIN_USER_CONTEXT.
The result is that the context id used for the vmemmap and the lowest
context id handed out to userspace are the same. The context id is
essentially the process identifier as far as the first stage of the
MMU translation is concerned.
This can result in multiple SLB entries with the same VSID (Virtual
Segment ID), accessible to the kernel and some random userspace
process that happens to get the overlapping id, which is not expected
eg:
07 c00c000008000000 40066bdea7000500 1T ESID= c00c00 VSID= 66bdea7 LLP:100
12 0002000008000000 40066bdea7000d80 1T ESID= 200 VSID= 66bdea7 LLP:100
Even though the user process and the kernel use the same VSID, the
permissions in the hash page table prevent the user process from
reading or writing to any kernel mappings.
It can also lead to SLB entries with different base page size
encodings (LLP), eg:
05 c00c000008000000 00006bde0053b500 256M ESID=c00c00000 VSID= 6bde0053b LLP:100
09 0000000008000000 00006bde0053bc80 256M ESID= 0 VSID= 6bde0053b LLP: 0
Such SLB entries can result in machine checks, eg. as seen on a G5:
Oops: Machine check, sig: 7 [#1]
BE PAGE SIZE=64K MU-Hash SMP NR_CPUS=4 NUMA Power Mac
NIP: c00000000026f248 LR: c000000000295e58 CTR: 0000000000000000
REGS: c0000000erfd3d70 TRAP: 0200 Tainted: G M (5.5.0-rcl-gcc-8.2.0-00010-g228b667d8ea1)
MSR: 9000000000109032 <SF,HV,EE,ME,IR,DR,RI> CR: 24282048 XER: 00000000
DAR: c00c000000612c80 DSISR: 00000400 IRQMASK: 0
...
NIP [c00000000026f248] .kmem_cache_free+0x58/0x140
LR [c088000008295e58] .putname 8x88/0xa
Call Trace:
.putname+0xB8/0xa
.filename_lookup.part.76+0xbe/0x160
.do_faccessat+0xe0/0x380
system_call+0x5c/ex68
This happens with 256MB segments and 64K pages, as the duplicate VSID
is hit with the first vmemmap segment and the first user segment, and
older 32-bit userspace maps things in the first user segment.
On other CPUs a machine check is not seen. Instead the userspace
process can get stuck continuously faulting, with the fault never
properly serviced, due to the kernel not understanding that there is
already a HPTE for the address but with inaccessible permissions.
On machines with 1T segments we've not seen the bug hit other than by
deliberately exercising it. That seems to be just a matter of luck
though, due to the typical layout of the user virtual address space
and the ranges of vmemmap that are typically populated.
To fix it we add 2 to MIN_USER_CONTEXT. This ensures the lowest
context given to userspace doesn't overlap with the VMEMMAP context,
or with the context for INVALID_REGION_ID.
Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range")
Cc: [email protected] # v5.2+
Reported-by: Christian Marillat <[email protected]>
Reported-by: Romain Dolbeau <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
[mpe: Account for INVALID_REGION_ID, mostly rewrite change log]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
A load on an ESB page returning all 1's means that the underlying
device has invalidated the access to the PQ state of the interrupt
through mmio. It may happen, for example when querying a PHB interrupt
while the PHB is in an error state.
In that case, we should consider the interrupt to be invalid when
checking its state in the irq_get_irqchip_state() handler.
Fixes: da15c03b047d ("powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race")
Cc: [email protected] # v5.4+
Signed-off-by: Frederic Barrat <[email protected]>
[clg: wrote a commit log, introduced XIVE_ESB_INVALID ]
Signed-off-by: Cédric Le Goater <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Implement the H_SVM_INIT_ABORT hcall which the Ultravisor can use to
abort an SVM after it has issued the H_SVM_INIT_START and before the
H_SVM_INIT_DONE hcalls. This hcall could be used when Ultravisor
encounters security violations or other errors when starting an SVM.
Note that this hcall is different from UV_SVM_TERMINATE ucall which
is used by HV to terminate/cleanup an VM that has becore secure.
The H_SVM_INIT_ABORT basically undoes operations that were done
since the H_SVM_INIT_START hcall - i.e page-out all the VM pages back
to normal memory, and terminate the SVM.
(If we do not bring the pages back to normal memory, the text/data
of the VM would be stuck in secure memory and since the SVM did not
go secure, its MSR_S bit will be clear and the VM wont be able to
access its pages even to do a clean exit).
Based on patches and discussion with Paul Mackerras, Ram Pai and
Bharata Rao.
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Bharata B Rao <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Add 'skip_page_out' parameter to kvmppc_uvmem_drop_pages() so the
callers can specify whetheter or not to skip paging out pages. This
will be needed in a follow-on patch that implements H_SVM_INIT_ABORT
hcall.
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux into arm/drivers
NXP/FSL SoC driver updates for v5.6
QUICC Engine drivers
- Improve the QE drivers to be compatible with ARM/ARM64/PPC64
architectures
- Various cleanups to the QE drivers
* tag 'soc-fsl-next-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux: (49 commits)
soc: fsl: qe: remove set but not used variable 'mm_gc'
soc: fsl: qe: remove PPC32 dependency from CONFIG_QUICC_ENGINE
soc: fsl: qe: remove unused #include of asm/irq.h from ucc.c
net: ethernet: freescale: make UCC_GETH explicitly depend on PPC32
net/wan/fsl_ucc_hdlc: reject muram offsets above 64K
net/wan/fsl_ucc_hdlc: fix reading of __be16 registers
net/wan/fsl_ucc_hdlc: avoid use of IS_ERR_VALUE()
soc: fsl: qe: avoid IS_ERR_VALUE in ucc_fast.c
soc: fsl: qe: drop pointless check in qe_sdma_init()
soc: fsl: qe: drop use of IS_ERR_VALUE in qe_sdma_init()
soc: fsl: qe: avoid IS_ERR_VALUE in ucc_slow.c
soc: fsl: qe: refactor cpm_muram_alloc_common to prevent BUG on error path
soc: fsl: qe: drop broken lazy call of cpm_muram_init()
soc: fsl: qe: make cpm_muram_free() ignore a negative offset
soc: fsl: qe: make cpm_muram_free() return void
soc: fsl: qe: change return type of cpm_muram_alloc() to s32
serial: ucc_uart: access __be32 field using be32_to_cpu
serial: ucc_uart: limit brg-frequency workaround to PPC32
serial: ucc_uart: use of_property_read_u32() in ucc_uart_probe()
serial: ucc_uart: stub out soft_uart_init for !CONFIG_PPC32
...
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Olof Johansson <[email protected]>
|
|
This implements the tricky tracing and soft irq handling bits in C,
leaving the low level bit to asm.
A functional difference is that this redirects the interrupt exit to
a return stub to execute blr, rather than the lr address itself. This
is probably barely measurable on real hardware, but it keeps the link
stack balanced.
Tested with QEMU.
Signed-off-by: Nicholas Piggin <[email protected]>
[mpe: Move power4_fixup_nap back into exceptions-64s.S]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When CONFIG_RELOCATABLE=y is set, VIRT_PHYS_OFFSET is a 64bit variable,
thus __pa() returns as 64bit value.
But when CONFIG_RELOCATABLE=n, __pa() returns 32bit value.
When CONFIG_PHYS_64BIT is set, __pa() should consistently return as
64bit value irrelevant to CONFIG_RELOCATABLE.
So we'd make __pa() consistently return phys_addr_t, which is 64bit
when CONFIG_PHYS_64BIT is set.
Signed-off-by: Bai Yingjie <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
ioremap has provided non-cached semantics by default since the Linux 2.6
days, so remove the additional ioremap_nocache interface.
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
|
|
H_PUT_TCE_INDIRECT allows packing up to 512 TCE updates into a single
hypercall; H_STUFF_TCE can clear lots in a single hypercall too.
However, unlike H_STUFF_TCE (which writes the same TCE to all entries),
H_PUT_TCE_INDIRECT uses a 4K page with new TCEs. In a secure VM
environment this means sharing a secure VM page with a hypervisor which
we would rather avoid.
This splits the FW_FEATURE_MULTITCE feature into FW_FEATURE_PUT_TCE_IND
and FW_FEATURE_STUFF_TCE. "hcall-multi-tce" in
the "/rtas/ibm,hypertas-functions" device tree property sets both;
the "multitce=off" kernel command line parameter disables both.
This should not cause behavioural change.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Thiago Jung Bauermann <[email protected]>
Tested-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
With the previous patch applied pcibios_setup_device() will always be run
when pcibios_bus_add_device() is called. There are several code paths where
pcibios_setup_bus_device() is still called (the PowerPC specific PCI
hotplug support is one) so with just the previous patch applied the setup
can be run multiple times on a device, once before the device is added
to the bus and once after.
There's no need to run the setup in the early case any more so just
remove it entirely.
Signed-off-by: Oliver O'Halloran <[email protected]>
Tested-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In order to avoid needless #ifdef CONFIG_COMPAT checks,
move the compat_ptr() definition to linux/compat.h
where it can be seen by any file regardless of the
architecture.
Only s390 needs a special definition, this can use the
self-#define trick we have elsewhere.
Reviewed-by: Ben Hutchings <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
Recently, the spinlock implementation grew a static key optimization,
but the jump_label.h header include was left out, leading to build
errors:
linux/arch/powerpc/include/asm/spinlock.h:44:7: error: implicit declaration of function ‘static_branch_unlikely’
44 | if (!static_branch_unlikely(&shared_processor))
This commit adds the missing header.
mpe: The build break is only seen with CONFIG_JUMP_LABEL=n.
Fixes: 656c21d6af5d ("powerpc/shared: Use static key to detect shared processor")
Signed-off-by: Jason A. Donenfeld <[email protected]>
Reviewed-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The KUAP implementation adds calls in clear_user() to enable and
disable access to userspace memory. However, it doesn't add these to
__clear_user(), which is used in the ptrace regset code.
As there's only one direct user of __clear_user() (the regset code),
and the time taken to set the AMR for KUAP purposes is going to
dominate the cost of a quick access_ok(), there's not much point
having a separate path.
Rename __clear_user() to __arch_clear_user(), and make __clear_user()
just call clear_user().
Reported-by: [email protected]
Reported-by: Daniel Axtens <[email protected]>
Suggested-by: Michael Ellerman <[email protected]>
Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection")
Signed-off-by: Andrew Donnellan <[email protected]>
[mpe: Use __arch_clear_user() for the asm version like arm64 & nds32]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
With the static key shared processor available, is_shared_processor()
can return without having to query the lppaca structure.
Signed-off-by: Srikar Dronamraju <[email protected]>
Acked-by: Phil Auld <[email protected]>
Acked-by: Waiman Long <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on
pre-empted vCPUs"), the scheduler avoids preempted vCPUs to schedule
tasks on wakeup. This leads to wrong choice of CPU, which in-turn
leads to larger wakeup latencies. Eventually, it leads to performance
regression in latency sensitive benchmarks like soltp, schbench etc.
On Powerpc, vcpu_is_preempted() only looks at yield_count. If the
yield_count is odd, the vCPU is assumed to be preempted. However
yield_count is increased whenever the LPAR enters CEDE state (idle).
So any CPU that has entered CEDE state is assumed to be preempted.
Even if vCPU of dedicated LPAR is preempted/donated, it should have
right of first-use since they are supposed to own the vCPU.
On a Power9 System with 32 cores:
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 16
NUMA node(s): 2
Model: 2.2 (pvr 004e 0202)
Model name: POWER9 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-63
NUMA node1 CPU(s): 64-127
# perf stat -a -r 5 ./schbench
v5.4 v5.4 + patch
Latency percentiles (usec) Latency percentiles (usec)
50.0000th: 45 50.0th: 45
75.0000th: 62 75.0th: 63
90.0000th: 71 90.0th: 74
95.0000th: 77 95.0th: 78
*99.0000th: 91 *99.0th: 82
99.5000th: 707 99.5th: 83
99.9000th: 6920 99.9th: 86
min=0, max=10048 min=0, max=96
Latency percentiles (usec) Latency percentiles (usec)
50.0000th: 45 50.0th: 46
75.0000th: 61 75.0th: 64
90.0000th: 72 90.0th: 75
95.0000th: 79 95.0th: 79
*99.0000th: 691 *99.0th: 83
99.5000th: 3972 99.5th: 85
99.9000th: 8368 99.9th: 91
min=0, max=16606 min=0, max=117
Latency percentiles (usec) Latency percentiles (usec)
50.0000th: 45 50.0th: 46
75.0000th: 61 75.0th: 64
90.0000th: 71 90.0th: 75
95.0000th: 77 95.0th: 79
*99.0000th: 106 *99.0th: 83
99.5000th: 2364 99.5th: 84
99.9000th: 7480 99.9th: 90
min=0, max=10001 min=0, max=95
Latency percentiles (usec) Latency percentiles (usec)
50.0000th: 45 50.0th: 47
75.0000th: 62 75.0th: 65
90.0000th: 72 90.0th: 75
95.0000th: 78 95.0th: 79
*99.0000th: 93 *99.0th: 84
99.5000th: 108 99.5th: 85
99.9000th: 6792 99.9th: 90
min=0, max=17681 min=0, max=117
Latency percentiles (usec) Latency percentiles (usec)
50.0000th: 46 50.0th: 45
75.0000th: 62 75.0th: 64
90.0000th: 73 90.0th: 75
95.0000th: 79 95.0th: 79
*99.0000th: 113 *99.0th: 82
99.5000th: 2724 99.5th: 83
99.9000th: 6184 99.9th: 93
min=0, max=9887 min=0, max=111
Performance counter stats for 'system wide' (5 runs):
context-switches 43,373 ( +- 0.40% ) 44,597 ( +- 0.55% )
cpu-migrations 1,211 ( +- 5.04% ) 220 ( +- 6.23% )
page-faults 15,983 ( +- 5.21% ) 15,360 ( +- 3.38% )
Waiman Long suggested using static_keys.
Fixes: 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted vCPUs")
Cc: [email protected] # v4.18+
Reported-by: Parth Shah <[email protected]>
Reported-by: Ihor Pasichnyk <[email protected]>
Tested-by: Juri Lelli <[email protected]>
Acked-by: Waiman Long <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Srikar Dronamraju <[email protected]>
Acked-by: Phil Auld <[email protected]>
Reviewed-by: Vaidyanathan Srinivasan <[email protected]>
Tested-by: Parth Shah <[email protected]>
[mpe: Move the key and setting of the key to pseries/setup.c]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
<linux/vmalloc.h>
In the x86 MM code we'd like to untangle various types of historic
header dependency spaghetti, but for this we'd need to pass to
the generic vmalloc code various vmalloc related defines that
customarily come via the <asm/page.h> low level arch header.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Some drivers, e.g. ucc_uart, need definitions from cpm.h. In order to
allow building those drivers for non-ppc based SOCs, move the header
to include/soc/fsl. For now, leave a trivial wrapper at the old
location so drivers can be updated one by one.
Reviewed-by: Timur Tabi <[email protected]>
Signed-off-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Li Yang <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:
"A few commits splitting the KASAN instrumented bitops header in three,
to match the split of the asm-generic bitops headers.
This is needed on powerpc because we use the generic bitops for the
non-atomic case only, whereas the existing KASAN instrumented bitops
assume all the underlying operations are provided by the arch as
arch_foo() versions.
Thanks to: Daniel Axtens & Christophe Leroy"
* tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
docs/core-api: Remove possibly confusing sub-headings from Bit Operations
powerpc: support KASAN instrumentation of bitops
kasan: support instrumented bitops combined with generic bitops
|