path: root/arch/powerpc/include/asm
2019-02-26  powerpc: sstep: Add support for maddhd, maddhdu, maddld instructions  (Sandipan Das, 1 file, -1/+14)

This adds emulation support for the following integer instructions:

* Multiply-Add High Doubleword (maddhd)
* Multiply-Add High Doubleword Unsigned (maddhdu)
* Multiply-Add Low Doubleword (maddld)

As suggested by Michael, this uses a raw .long for specifying the instruction word when using inline assembly to retain compatibility with older binutils.

Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

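For illustration, a minimal hedged sketch of the raw .long technique: maddld is VA-form (primary opcode 4, extended opcode 51), so the instruction word can be hand-encoded and emitted as data, which older assemblers accept. The helper name, register choices and test values below are the sketch's own, not the patch's, and the code only executes on ISA 3.0 (POWER9 or later) hardware.

    #include <stdio.h>

    /* Sketch: execute maddld r3,r4,r5,r6 via a hand-encoded .long so that
     * binutils without ISA 3.0 support still assembles the file.
     * Encoding: (4 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (6 << 6) | 51
     *         = 0x106429b3
     */
    static unsigned long maddld(unsigned long a, unsigned long b, unsigned long c)
    {
            register unsigned long rt asm("r3");
            register unsigned long ra asm("r4") = a;
            register unsigned long rb asm("r5") = b;
            register unsigned long rc asm("r6") = c;

            asm(".long 0x106429b3"  /* maddld r3,r4,r5,r6 */
                : "=r" (rt)
                : "r" (ra), "r" (rb), "r" (rc));
            return rt;
    }

    int main(void)
    {
            printf("%lu\n", maddld(2, 3, 4));       /* low 64 bits of 2*3+4 = 10 */
            return 0;
    }
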
2019-02-23  powerpc/64: Replace CURRENT_THREAD_INFO with PACA_THREAD_INFO  (Christophe Leroy, 2 files, -6/+2)

Now that current_thread_info is located at the beginning of the 'current' task struct, the CURRENT_THREAD_INFO macro is not really needed any more. This patch replaces it by loads of the value at PACA_THREAD_INFO(r13).

Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Add PACA_THREAD_INFO rather than using PACACURRENT]
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU  (Christophe Leroy, 1 file, -2/+0)

Now that thread_info is similar to task_struct, its address is in r2, so the CURRENT_THREAD_INFO() macro is useless. This patch removes it.

This patch also moves the 'tovirt(r2, r2)' down to just before the reactivation of MMU translation, so that we keep the physical address of 'current' in r2 until then. This avoids a few calls to tophys().

At the same time, as the 'cpu' field is no longer in thread_info, this patch renames TI_CPU to TASK_CPU. It also makes it possible to get rid of a couple of '#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE', as ACCOUNT_CPU_USER_ENTRY() and ACCOUNT_CPU_USER_EXIT() are empty when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not defined.

Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Fix a missed conversion of TI_CPU in idle_6xx.S]
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  powerpc: 'current_set' is now a table of task_struct pointers  (Christophe Leroy, 1 file, -2/+2)

The table of pointers 'current_set' has been used for retrieving the stack and current. They used to be thread_info pointers, as they were pointing to the stack, and current was taken from the 'task' field of the thread_info.

Now the entries of the 'current_set' table are simultaneously pointers to task_struct and pointers to thread_info. As they are used to get current, and the stack pointer is retrieved from current's stack field, this patch changes their type to task_struct, and renames secondary_ti to secondary_current.

Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  powerpc: regain entire stack space  (Christophe Leroy, 2 files, -7/+6)

thread_info is no longer in the stack, so the entire stack can now be used. There is also no longer any risk of corrupting task_cpu(p) with a stack overflow, so the patch removes that test.

When doing this, an explicit test for a NULL stack pointer is needed in validate_sp(), as it is no longer implicitly covered by the sizeof(thread_info) gap.

In the meantime, with the previous patch the pointers to the stacks are no longer pointers to thread_info, so this patch changes them to void*.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  powerpc: Activate CONFIG_THREAD_INFO_IN_TASK  (Christophe Leroy, 4 files, -23/+22)

This patch activates CONFIG_THREAD_INFO_IN_TASK, which moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack overflows.
- Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult.

This has the following consequences:
- thread_info is now located at the beginning of task_struct.
- The 'cpu' field is now in task_struct, and only exists when CONFIG_SMP is active.
- thread_info no longer has the 'task' field.

This patch:
- Removes all recopy of the thread_info struct when the stack changes.
- Changes the CURRENT_THREAD_INFO() macro to point to current.
- Selects CONFIG_THREAD_INFO_IN_TASK.
- Modifies raw_smp_processor_id() to get ->cpu from current, without including linux/sched.h to avoid circular inclusion and without including asm/asm-offsets.h to avoid symbol name duplication between ASM constants and C constants (see the sketch after this entry).
- Modifies klp_init_thread_info() to take a task_struct pointer argument.

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Add task_stack.h to livepatch.h to fix build fails]
Signed-off-by: Michael Ellerman <[email protected]>

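A hedged sketch of the raw_smp_processor_id() idea described above: read task_struct::cpu through a plain byte offset, so that asm/smp.h needs neither linux/sched.h nor asm/asm-offsets.h. TASK_CPU_OFFSET is a hypothetical stand-in for however the real patch provides that offset.

    /* Sketch, assuming the byte offset of task_struct::cpu is made
     * available as TASK_CPU_OFFSET by some mechanism outside this header;
     * 'current' is the usual pointer to the running task_struct. */
    #define raw_smp_processor_id() \
            (*(unsigned int *)((void *)current + TASK_CPU_OFFSET))
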
2019-02-23  powerpc: Use task_stack_page() in current_pt_regs()  (Christophe Leroy, 1 file, -1/+1)

Change current_pt_regs() to use task_stack_page() rather than current_thread_info() so that it keeps working once we enable THREAD_INFO_IN_TASK.

Signed-off-by: Christophe Leroy <[email protected]>
[mpe: Split out of large patch]
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  powerpc: Use linux/thread_info.h in processor.h  (Christophe Leroy, 1 file, -1/+1)

When we enable THREAD_INFO_IN_TASK we will remove our definition of current_thread_info(). Instead it will come from linux/thread_info.h. So switch processor.h to include the latter, so that it can continue to find current_thread_info().

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  powerpc: Use sizeof(struct thread_info) in INIT_SP_LIMIT  (Christophe Leroy, 1 file, -1/+1)

Currently INIT_SP_LIMIT uses sizeof(init_thread_info), but that symbol won't exist when we enable THREAD_INFO_IN_TASK. So just take sizeof of the type, which is the same value but will continue to work.

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>

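Schematically, the change is just the sizeof operand. The base expression below is an assumption for illustration, not the macro's actual definition:

    /* before (breaks once the init_thread_info object is gone):
     *   #define INIT_SP_LIMIT \
     *           ((unsigned long)&init_stack + sizeof(init_thread_info))
     * after: same numeric value, but only names the type
     */
    #define INIT_SP_LIMIT \
            ((unsigned long)&init_stack + sizeof(struct thread_info))
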
2019-02-23  powerpc: Update comments in preparation for THREAD_INFO_IN_TASK  (Christophe Leroy, 1 file, -1/+1)

Update a few comments that talk about current_thread_info() in preparation for THREAD_INFO_IN_TASK.

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  powerpc: call_do_[soft]irq() takes a pointer to the stack  (Christophe Leroy, 1 file, -2/+2)

The purpose of the pointer given to call_do_softirq() and call_do_irq() is to point to the new stack. Currently that's the same thing as the thread_info, but it won't be with THREAD_INFO_IN_TASK. So change the parameter to void* and rename it 'sp'.

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  powerpc: Avoid circular header inclusion in mmu-hash.h  (Christophe Leroy, 4 files, -96/+106)

When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes asm/current.h. This generates a circular dependency. To avoid that, asm/processor.h shall not be included in mmu-hash.h.

In order to do that, this patch moves all the TASK_SIZE related constants into new headers, asm/task_size_64.h and asm/task_size_32.h, which can then be included in mmu-hash.h directly.

Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
[mpe: Split out all the TASK_SIZE constants, not just the 64-bit ones]
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX  (Christophe Leroy, 1 file, -1/+2)

This patch implements handling of STRICT_KERNEL_RWX with large TLBs directly in the TLB miss handlers. To do so, etext and sinittext are aligned on 512kB boundaries, and the miss handlers use 512kB pages instead of 8MB pages for addresses close to the boundaries. It sets RO PP flags for addresses under sinittext.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX  (Christophe Leroy, 1 file, -0/+11)

Today, STRICT_KERNEL_RWX is based on the use of regular pages to map kernel pages. On Book3s 32, it has three consequences:
- Using pages instead of BATs for mapping kernel linear memory severely impacts performance.
- Exec protection is not effective, because no-execute cannot be set at page level (except on 603, which doesn't have hash tables).
- Write protection is not effective, because PP bits do not provide RO mode for kernel-only pages (except on 603, which handles it in software via PAGE_DIRTY).

On the 603+, we have:
- Independent IBATs and DBATs, allowing limitation of exec parts.
- An NX bit that can be set in segment registers to forbid execution on memory mapped by pages.
- RO mode on DBATs, even for kernel-only blocks.

On the 601, there is nothing much we can do other than warn the user about it, because:
- BATs are common to instructions and data.
- BATs do not provide RO mode for kernel-only blocks.
- Segment registers don't have the NX bit.

In order to use IBATs for exec protection, this patch:
- Aligns _etext to BAT block sizes (128kB).
- Sets the NX bit in the kernel segment registers (except on the vmalloc area when CONFIG_MODULES is selected).
- Maps kernel text with IBATs.

In order to use DBATs for write protection, this patch:
- Aligns RW DATA to BAT block sizes (4MB).
- Maps the kernel RO area with write-prohibited DBATs.
- Maps the remaining memory with the remaining DBATs.

Here is what we get with this patch on an 832x when activating STRICT_KERNEL_RWX:

Symbols:
  c0000000 T _stext
  c0680000 R __start_rodata
  c0680000 R _etext
  c0800000 T __init_begin
  c0800000 T _sinittext

~# cat /sys/kernel/debug/block_address_translation
---[ Instruction Block Address Translation ]---
0: 0xc0000000-0xc03fffff 0x00000000 Kernel EXEC coherent
1: 0xc0400000-0xc05fffff 0x00400000 Kernel EXEC coherent
2: 0xc0600000-0xc067ffff 0x00600000 Kernel EXEC coherent
3: -
4: -
5: -
6: -
7: -

---[ Data Block Address Translation ]---
0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent
1: 0xc0800000-0xc0ffffff 0x00800000 Kernel RW coherent
2: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent
3: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent
4: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent
5: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent
6: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
7: -

~# cat /sys/kernel/debug/segment_registers
---[ User Segments ]---
0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xa085d0
0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xa086e1
0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xa087f2
0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xa08903
0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xa08a14
0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xa08b25
0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xa08c36
0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xa08d47
0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xa08e58
0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xa08f69
0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xa0907a
0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xa0918b
---[ Kernel Segments ]---
0xc0000000-0xcfffffff Kern key 0 User key 1 No Exec VSID 0x000ccc
0xd0000000-0xdfffffff Kern key 0 User key 1 No Exec VSID 0x000ddd
0xe0000000-0xefffffff Kern key 0 User key 1 No Exec VSID 0x000eee
0xf0000000-0xffffffff Kern key 0 User key 1 No Exec VSID 0x000fff

Aligning _etext to 128kB allows mapping up to 32MB of text with 8 IBATs: 16MB + 8MB + 4MB + 2MB + 1MB + 512kB + 256kB + 128kB (+ 128kB) = 32MB. (A 9th IBAT is unneeded, as 32MB would need only a single 32MB block.)

Aligning data to 4MB allows mapping up to 512MB of data with 8 DBATs: 16MB + 8MB + 4MB + 4MB + 32MB + 64MB + 128MB + 256MB = 512MB.

Because some processors only have 4 BATs, and because some targets need DBATs for mapping other areas, the following patch will allow modifying the _etext and data alignment.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

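As a worked check of the arithmetic above (a hedged illustration, not kernel code): starting from an aligned base, a region decomposes into one naturally aligned power-of-two block per set bit of its size, so the number of BATs needed is just a popcount.

    #include <stdio.h>

    /* Each BAT covers one naturally aligned power-of-two block, and from
     * an aligned base a region splits into one block per set bit of its
     * size, largest first. So for a size rounded to the 128kB granule,
     * the BAT count is a popcount of the size. */
    static int bats_needed(unsigned long size)
    {
            return __builtin_popcountl(size);
    }

    int main(void)
    {
            /* worst case quoted above: 32MB - 128kB of text -> 8 IBATs */
            printf("%d\n", bats_needed((32UL << 20) - (128UL << 10)));
            return 0;
    }
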
2019-02-23  powerpc/mm/32s: add setibat() clearibat() and update_bats()  (Christophe Leroy, 1 file, -0/+2)

setibat() and clearibat() allow manipulating IBATs independently of DBATs. update_bats() allows updating the BATs after init. This is done with the MMU off.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

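For orientation, a hedged sketch of the mechanics only: each IBAT is a pair of consecutive SPRs programmed with mtspr. The real setibat() takes higher-level arguments (address, size, pgprot) and computes the BAT fields; the _raw names below are this sketch's own.

    /* Hypothetical sketch: program IBAT pair n directly. */
    static inline void setibat_raw(int n, u32 batu, u32 batl)
    {
            mtspr(SPRN_IBAT0U + 2 * n, batu);       /* upper half of the pair */
            mtspr(SPRN_IBAT0L + 2 * n, batl);       /* lower half of the pair */
    }

    static inline void clearibat_raw(int n)
    {
            mtspr(SPRN_IBAT0U + 2 * n, 0);
            mtspr(SPRN_IBAT0L + 2 * n, 0);
    }
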
2019-02-23  powerpc/kconfig: define PAGE_SHIFT inside Kconfig  (Christophe Leroy, 1 file, -11/+2)

This patch defines CONFIG_PPC_PAGE_SHIFT in order to be able to use the PAGE_SHIFT value inside Kconfig.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  powerpc/mmu: add is_strict_kernel_rwx() helper  (Christophe Leroy, 1 file, -0/+11)

Add a helper to know whether STRICT_KERNEL_RWX is enabled. This is based on the rodata_enabled flag, which is defined only when CONFIG_STRICT_KERNEL_RWX is selected.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

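A minimal sketch of what such a helper can look like, assuming only what the message states (the flag exists only under the config option); the exact name and placement in the real patch may differ.

    #ifdef CONFIG_STRICT_KERNEL_RWX
    static inline bool strict_kernel_rwx_enabled(void)
    {
            /* rodata_enabled only exists when CONFIG_STRICT_KERNEL_RWX=y */
            return rodata_enabled;
    }
    #else
    static inline bool strict_kernel_rwx_enabled(void)
    {
            return false;
    }
    #endif
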
2019-02-23  powerpc/32: add helper to write into segment registers  (Christophe Leroy, 1 file, -0/+5)

This patch adds a helper which wraps the 'mtsrin' instruction to write into segment registers.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

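A hedged sketch of such a wrapper: mtsrin takes the new value in one GPR and selects the segment register from the top nibble of an effective address in another, so an inline-asm helper is a one-liner. The real helper may differ in detail.

    /* Sketch: write 'val' into the segment register selected by the top
     * 4 bits of the effective address 'idx'. */
    static inline void mtsrin(u32 val, u32 idx)
    {
            asm volatile("mtsrin %0, %1" : : "r" (val), "r" (idx) : "memory");
    }
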
2019-02-23  powerpc: sstep: Add tests for addc[.] instruction  (Sandipan Das, 1 file, -0/+1)

This adds test cases for the addc[.] instruction.

Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-23  Revert "powerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling"  (Michael Ellerman, 1 file, -4/+4)

This reverts commit 78ca1108b10927b3d068c8da91352b0f4cd01fc5. It is causing boot failures with qemu mac99 in at least some configurations.

2019-02-22  Merge tag 'kvm-ppc-next-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-next  (Paolo Bonzini, 3 files, -2/+20)

PPC KVM update for 5.1

There are no major new features this time, just a collection of bug fixes and improvements in various areas, including machine check handling and context switching of protection-key-related registers.

2019-02-22  Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next  (Paul Mackerras, 2 files, -2/+3)

This merges in the "ppc-kvm" topic branch of the powerpc tree to get a series of commits that touch both general arch/powerpc code and KVM code. These commits will be merged both via the KVM tree and the powerpc tree.

Signed-off-by: Paul Mackerras <[email protected]>

2019-02-22  powerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling  (Christophe Leroy, 1 file, -4/+4)

For pages without _PAGE_USER, the PP field is 00. For pages with _PAGE_USER, the PP field is 10 for RW and 11 for RO.

This patch sets _PAGE_USER to 0x002 and _PAGE_RW to 0x001, in order to simplify TLB handling by reducing the amount of shifts.

The location of _PAGE_PRESENT and _PAGE_HASHPTE doesn't matter, as they are only SW-related flags.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

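One way to read the simplification, as a hedged C sketch of what the TLB handler does with fewer shifts: with _PAGE_RW at bit 0 and _PAGE_USER at bit 1, the PP bits fall out of the low PTE bits almost directly. The helper name is illustrative, not from the patch.

    /* Illustrative only: derive the PP bits from the low Linux PTE bits
     * after the reordering (_PAGE_RW = 0x001, _PAGE_USER = 0x002).
     *   kernel page          -> PP = 0b00
     *   _PAGE_USER|_PAGE_RW  -> PP = 0b10 (user RW)
     *   _PAGE_USER only      -> PP = 0b11 (user RO)
     */
    static inline unsigned int pp_bits(unsigned long pte)
    {
            if (!(pte & 0x002))             /* no _PAGE_USER */
                    return 0;
            return (pte & 0x003) ^ 1;       /* flip RW: 0b11 -> 0b10, 0b10 -> 0b11 */
    }
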
2019-02-22  powerpc/6xx: Store PGDIR physical address in a SPRG  (Christophe Leroy, 1 file, -0/+1)

Use SPRN_SPRG2 to store the current thread's PGDIR and avoid reading thread_struct.pgdir at every TLB miss.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-22  powerpc/6xx: Don't use SPRN_SPRG2 for storing stack pointer while in RTAS  (Christophe Leroy, 2 files, -1/+3)

When calling RTAS, the stack pointer is stored in SPRN_SPRG2 in order to be able to restore it in case of machine check in RTAS. As machine check is not a performance critical path, this patch frees SPRN_SPRG2 by using a field in the thread struct instead.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-22  powerpc: simplify BDI switch  (Christophe Leroy, 1 file, -0/+2)

There is no reason to re-read the pointer at location 0xf0 each time, as it is fixed and known.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-22  powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit  (Paul Mackerras, 1 file, -0/+2)

Commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug", 2017-07-21) added two calls to opal_slw_set_reg() inside pnv_cpu_offline(), with the aim of changing the LPCR value in the SLW image to disable wakeups from the decrementer while a CPU is offline.

However, pnv_cpu_offline() gets called each time a secondary CPU thread is woken up to participate in running a KVM guest, that is, not just when a CPU is offlined. Since opal_slw_set_reg() is a very slow operation (with observed execution times around 20 milliseconds), this means that an offline secondary CPU can often be busy doing the opal_slw_set_reg() call when the primary CPU wants to grab all the secondary threads so that it can run a KVM guest. This leads to messages like "KVM: couldn't grab CPU n" being printed and guest execution failing.

There is no need to reprogram the SLW image on every KVM guest entry and exit. So that we do it only when a CPU is really transitioning between online and offline, this moves the calls to pnv_program_cpu_hotplug_lpcr() into pnv_smp_cpu_kill_self().

Fixes: 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug")
Cc: [email protected] # v4.14+
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-22  powerpc/book3s: Remove pgd/pud/pmd_set() interfaces  (Aneesh Kumar K.V, 2 files, -18/+4)

When updating page tables, we need to make sure we fill the page table entry valid bits. We do this by or'ing in one of PGD/PUD/PMD_VAL_BITS. The page table 'set' interfaces allow updating the raw value of page table entries without setting the valid bits, so remove those interfaces to avoid incorrect usage in future.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
[mpe: Reword commit message based on mailing list discussion]
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-22  powerpc/eeh: Add eeh_force_recover to debugfs  (Oliver O'Halloran, 1 file, -0/+1)

This patch adds a debugfs interface to force scheduling a recovery event. This can be used to recover a specific PE, or to schedule a "special" recovery event that checks for errors at the PHB level.

To force a recovery of a normal PE, use:

  echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_recover

To force a scan for broken PHBs:

  echo 'hwcheck' > /sys/kernel/debug/powerpc/eeh_force_recover

Signed-off-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-22  powerpc/eeh: Allow disabling recovery  (Oliver O'Halloran, 1 file, -0/+1)

Currently when we detect an error we automatically invoke the EEH recovery handler. This can be annoying when debugging EEH problems, or when working on EEH itself, so this patch adds a debugfs knob that will prevent a recovery event from being queued up when an issue is detected.

Signed-off-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-22  powerpc/pci: Add pci_find_controller_for_domain()  (Oliver O'Halloran, 1 file, -0/+2)

Add a helper to find the pci_controller structure based on the domain number / phb id.

Signed-off-by: Oliver O'Halloran <[email protected]>
Reviewed-by: Sam Bobroff <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

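A hedged sketch of what such a lookup can look like, assuming the conventional powerpc hose_list of pci_controller structures and that the domain number is the PHB's global_number; the real patch may differ.

    /* Sketch: walk the global list of PHBs and match on the domain number. */
    struct pci_controller *pci_find_controller_for_domain(int domain_nr)
    {
            struct pci_controller *hose;

            list_for_each_entry(hose, &hose_list, list_node)
                    if (hose->global_number == domain_nr)
                            return hose;

            return NULL;
    }
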
2019-02-22  powerpc/eeh_cache: Add a way to dump the EEH address cache  (Oliver O'Halloran, 1 file, -0/+3)

Adds a debugfs file that can be read to view the contents of the EEH address cache. This is pretty similar to the existing eeh_addr_cache_print() function, but that function is intended for debugging issues inside of the kernel, since it's #ifdef'ed out by default and writes into the kernel log.

Signed-off-by: Oliver O'Halloran <[email protected]>
Reviewed-by: Sam Bobroff <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-22  powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes  (Oliver O'Halloran, 1 file, -1/+1)

There's no need for the custom getter/setter functions, so we should remove them in favour of using the generic ones. While we're here, change the type of eeh_max_freezes to u32 and print the value in decimal rather than hex, because printing it in hex makes no sense.

Signed-off-by: Oliver O'Halloran <[email protected]>
Reviewed-by: Sam Bobroff <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

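For reference, the generic helper takes the file name, mode, parent dentry and a pointer to the u32, so no custom get/set callbacks are needed. A hedged sketch of the registration (the directory pointer name is illustrative, not the real one):

    #include <linux/debugfs.h>

    extern u32 eeh_max_freezes;

    static void eeh_debugfs_init(struct dentry *powerpc_dir)
    {
            /* debugfs reads/writes the u32 directly; 'powerpc_dir' stands
             * in for the real powerpc debugfs root. */
            debugfs_create_u32("eeh_max_freezes", 0600, powerpc_dir,
                               &eeh_max_freezes);
    }
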
2019-02-22  powerpc: drop unused GENERIC_CSUM Kconfig item  (Christophe Leroy, 1 file, -4/+0)

Commit d4fde568a34a ("powerpc/64: Use optimized checksum routines on little-endian") converted the last powerpc user of GENERIC_CSUM. This patch does a final cleanup, dropping the Kconfig GENERIC_CSUM option, which is always 'n', and the associated piece of code in asm/checksum.h.

Fixes: d4fde568a34a ("powerpc/64: Use optimized checksum routines on little-endian")
Reported-by: Christoph Hellwig <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-22  powerpc/mm/hash: Increase vmalloc space to 512T with hash MMU  (Michael Ellerman, 1 file, -9/+23)

This patch updates the kernel non-linear virtual map to 512TB when we're built with 64K page size and are using the hash MMU. We allocate one context for the vmalloc region, and hence the max virtual area size is limited by the context map size (512TB for 64K and 64TB for 4K page size).

This patch fixes boot failures with large amounts of system RAM, where we need large vmalloc space to handle per cpu allocations.

Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]>

2019-02-22  Merge branch 'topic/ppc-kvm' into next  (Michael Ellerman, 2 files, -2/+3)

Merge commits we're sharing with the kvm-ppc tree.

2019-02-21  powerpc/64s: Better printing of machine check info for guest MCEs  (Paul Mackerras, 1 file, -1/+1)

This adds an "in_guest" parameter to machine_check_print_event_info() so that we can avoid trying to translate guest NIP values into symbolic form using the host kernel's symbol table.

Reviewed-by: Aravinda Prasad <[email protected]>
Reviewed-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-21  KVM: PPC: Book3S HV: Simplify machine check handling  (Paul Mackerras, 1 file, -1/+2)

This makes the handling of machine check interrupts that occur inside a guest simpler and more robust, with less done in assembler code and in real mode.

Now, when a machine check occurs inside a guest, we always get the machine check event struct and put a copy in the vcpu struct for the vcpu where the machine check occurred. We no longer call machine_check_queue_event() from kvmppc_realmode_mc_power7(), because on POWER8, when a vcpu is running on an offline secondary thread and we call machine_check_queue_event(), that calls irq_work_queue(), which doesn't work because the CPU is offline, but instead triggers the WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which fires again and again because nothing clears the condition).

All that machine_check_queue_event() actually does is to cause the event to be printed to the console. For a machine check occurring in the guest, we now print the event in kvmppc_handle_exit_hv() instead. The assembly code at label machine_check_realmode now just calls C code and then continues exiting the guest. We no longer either synthesize a machine check for the guest in assembly code or return to the guest without a machine check.

The code in kvmppc_handle_exit_hv() is extended to handle the case where the guest is not FWNMI-capable. In that case we now always synthesize a machine check interrupt for the guest. Previously, if the host thinks it has recovered the machine check fully, it would return to the guest without any notification that the machine check had occurred. If the machine check was caused by some action of the guest (such as creating duplicate SLB entries), it is much better to tell the guest that it has caused a problem. Therefore we now always generate a machine check interrupt for guests that are not FWNMI-capable.

Reviewed-by: Aravinda Prasad <[email protected]>
Reviewed-by: Mahesh Salgaonkar <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-21  Merge branch 'topic/dma' into next  (Michael Ellerman, 9 files, -121/+33)

Merge hch's big DMA rework series. This is in a topic branch in case he wants to merge it to minimise conflicts.

2019-02-20  KVM: Call kvm_arch_memslots_updated() before updating memslots  (Sean Christopherson, 1 file, -1/+1)

kvm_arch_memslots_updated() is at this point in time an x86-specific hook for handling MMIO generation wraparound. x86 stashes 19 bits of the memslots generation number in its MMIO sptes in order to avoid full page fault walks for repeat faults on emulated MMIO addresses. Because only 19 bits are used, wrapping the MMIO generation number is possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that the generation has changed so that it can invalidate all MMIO sptes in case the effective MMIO generation has wrapped, so as to avoid using a stale spte, e.g. a (very) old spte that was created with generation==0.

Given that the purpose of kvm_arch_memslots_updated() is to prevent consuming stale entries, it needs to be called before the new generation is propagated to memslots. Invalidating the MMIO sptes after updating memslots means that there is a window where a vCPU could dereference the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO spte that was created with (pre-wrap) generation==0.

Fixes: e59dbe09f8e6 ("KVM: Introduce kvm_arch_memslots_updated()")
Cc: <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

2019-02-20  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 1 file, -2/+2)

Two easily resolvable overlapping change conflicts, one in TCP and one in the eBPF verifier.

Signed-off-by: David S. Miller <[email protected]>

2019-02-19  Merge branch 'fixes' into next  (Michael Ellerman, 1 file, -17/+9)

There are a few important fixes in our fixes branch, in particular the pgd/pud_present() one, so merge it now.

2019-02-19  KVM: PPC: Book3S HV: Add KVM stat largepages_[2M/1G]  (Suraj Jitindar Singh, 1 file, -0/+2)

This adds an entry to the kvm_stats_debugfs directory which provides the number of large (2M or 1G) pages which have been used to set up the guest mappings, for radix guests.

Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>

2019-02-19  KVM: PPC: Book3S: Allow XICS emulation to work in nested hosts using XIVE  (Paul Mackerras, 1 file, -0/+14)

Currently, the KVM code assumes that if the host kernel is using the XIVE interrupt controller (the new interrupt controller that first appeared in POWER9 systems), then the in-kernel XICS emulation will use the XIVE hardware to deliver interrupts to the guest. However, this only works when the host is running in hypervisor mode and has full access to all of the XIVE functionality. It doesn't work in any nested virtualization scenario, either with PR KVM or nested-HV KVM, because the XICS-on-XIVE code calls directly into the native-XIVE routines, which are not initialized and cannot function correctly because they use OPAL calls, and OPAL is not available in a guest. This means that using the in-kernel XICS emulation in a nested hypervisor that is using XIVE as its interrupt controller will cause a (nested) host kernel crash.

To fix this, we change most of the places where the current code calls xive_enabled() to select between the XICS-on-XIVE emulation and the plain XICS emulation to call a new function, xics_on_xive(), which returns false in a guest.

However, there is a further twist. The plain XICS emulation has some functions which are used in real mode and access the underlying XICS controller (the interrupt controller of the host) directly. In the case of a nested hypervisor, this means doing XICS hypercalls directly. When the nested host is using XIVE as its interrupt controller, these hypercalls will fail. Therefore this also adds checks in the places where the XICS emulation wants to access the underlying interrupt controller directly, and if that is XIVE, makes the code use the virtual mode fallback paths, which call generic kernel infrastructure rather than doing direct XICS access.

Signed-off-by: Paul Mackerras <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>

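A hedged sketch of the selector's shape, assuming only what the message states (it must report XIVE in use by the host, but never when this kernel is itself a guest). kvm_in_guest_mode() is a hypothetical stand-in for however the real code detects running as a guest.

    /* Sketch: pick XICS-on-XIVE only when this kernel really owns the
     * XIVE hardware, i.e. XIVE is enabled and we are not ourselves
     * running as a guest. */
    static inline bool xics_on_xive(void)
    {
            return xive_enabled() && !kvm_in_guest_mode();
    }
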
2019-02-19  KVM: PPC: Book3S PR: Add emulation for slbfee. instruction  (Paul Mackerras, 1 file, -0/+1)

Recent kernels, since commit e15a4fea4dee ("powerpc/64s/hash: Add some SLB debugging tests", 2018-10-03), use the slbfee. instruction, which PR KVM currently does not have code to emulate. Consequently recent kernels fail to boot under PR KVM. This adds emulation of slbfee., enabling these kernels to boot successfully.

Signed-off-by: Paul Mackerras <[email protected]>

2019-02-18  powerpc/dma: trim the fat from <asm/dma-mapping.h>  (Christoph Hellwig, 2 files, -29/+10)

There is no need to provide anything but get_arch_dma_ops to <linux/dma-mapping.h>. Move the remaining declarations to <asm/iommu.h> and drop all the includes.

Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Christian Zigotzky <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-18  powerpc/dma: remove set_dma_offset  (Christoph Hellwig, 1 file, -6/+0)

There is no good reason for this helper, just open code it.

Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Christian Zigotzky <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

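Concretely, a hedged before/after sketch, assuming the helper simply stored the offset in dev->archdata (how such one-line wrappers are usually shaped); the call site and 'offset' value are illustrative.

    /* before: a trivial wrapper */
    set_dma_offset(dev, offset);

    /* after: the assignment it was hiding, written out directly */
    dev->archdata.dma_offset = offset;
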
2019-02-18  powerpc/dma: remove get_dma_offset  (Christoph Hellwig, 2 files, -18/+6)

Just fold the calculation into __phys_to_dma/__dma_to_phys, as those are the only places that should know about it.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
Tested-by: Christian Zigotzky <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

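A hedged sketch of what the folded-in conversions can look like, assuming the per-device offset lives in dev->archdata.dma_offset as in the previous entry; the real definitions may differ in detail.

    /* Sketch: DMA <-> physical conversion with the per-device offset
     * applied inline, instead of through a get_dma_offset() helper. */
    static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
    {
            return paddr + dev->archdata.dma_offset;
    }

    static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t daddr)
    {
            return daddr - dev->archdata.dma_offset;
    }
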
2019-02-18  powerpc/dma: use the generic direct mapping bypass  (Christoph Hellwig, 2 files, -12/+0)

Now that we've switched all the powerpc nommu and swiotlb methods to use the generic dma_direct_* calls, we can remove these ops vectors entirely and rely on the common direct mapping bypass that avoids indirect function calls entirely. This also allows removing a whole lot of boilerplate code related to setting up these operations.

Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Christian Zigotzky <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

2019-02-18  powerpc/dma: use the dma_direct mapping routines  (Christoph Hellwig, 1 file, -30/+0)

Switch the streaming DMA mapping and ownership transfer methods to the functionally identical dma_direct_ versions. Factor the cache maintenance helpers into the form expected by the common code for that.

Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Christian Zigotzky <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
