aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/kvm
AgeCommit message (Collapse)AuthorFilesLines
2017-04-10powerpc/xive: Native exploitation of the XIVE interrupt controllerBenjamin Herrenschmidt1-0/+8
The XIVE interrupt controller is the new interrupt controller found in POWER9. It supports advanced virtualization capabilities among other things. Currently we use a set of firmware calls that simulate the old "XICS" interrupt controller but this is fairly inefficient. This adds the framework for using XIVE along with a native backend which OPAL for configuration. Later, a backend allowing the use in a KVM or PowerVM guest will also be provided. This disables some fast path for interrupts in KVM when XIVE is enabled as these rely on the firmware emulation code which is no longer available when the XIVE is used natively by Linux. A latter patch will make KVM also directly exploit the XIVE, thus recovering the lost performance (and more). Signed-off-by: Benjamin Herrenschmidt <[email protected]> [mpe: Fixup pr_xxx("XIVE:"...), don't split pr_xxx() strings, tweak Kconfig so XIVE_NATIVE selects XIVE and depends on POWERNV, fix build errors when SMP=n, fold in fixes from Ben: Don't call cpu_online() on an invalid CPU number Fix irq target selection returning out of bounds cpu# Extra sanity checks on cpu numbers ] Signed-off-by: Michael Ellerman <[email protected]>
2017-04-07kvm: make KVM_CAP_COALESCED_MMIO architecture agnosticPaolo Bonzini1-5/+0
Remove code from architecture files that can be moved to virt/kvm, since there is already common code for coalesced MMIO. Signed-off-by: Paolo Bonzini <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> [Removed a pointless 'break' after 'return'.] Signed-off-by: Radim Krčmář <[email protected]>
2017-04-06KVM: PPC: Book3S HV: Check for kmalloc errors in ioctlDan Carpenter1-0/+4
kzalloc() won't actually fail because sizeof(*resize) is small, but static checkers complain. Signed-off-by: Dan Carpenter <[email protected]> Acked-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-03-31powerpc/mm/hash: Support 68 bit VAAneesh Kumar K.V1-1/+7
Inorder to support large effective address range (512TB), we want to increase the virtual address bits to 68. But we do have platforms like p4 and p5 that can only do 65 bit VA. We support those platforms by limiting context bits on them to 16. The protovsid -> vsid conversion is verified to work with both 65 and 68 bit va values. I also documented the restrictions in a table format as part of code comments. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-03-31powerpc/mm/hash: Abstract context id allocation for KVMMichael Ellerman1-1/+1
KVM wants to be able to allocate an MMU context id, which it does currently by calling __init_new_context(). We're about to rework that code, so provide a wrapper for KVM so it can not worry about the details. Signed-off-by: Michael Ellerman <[email protected]>
2017-03-09power/mm: update pte_write and pte_wrprotect to handle savedwriteAneesh Kumar K.V2-2/+2
We use pte_write() to check whethwer the pte entry is writable. This is mostly used to later mark the pte read only if it is writable. The other use of pte_write() is to check whether the pte_entry is writable so that hardware page table entry can be marked accordingly. This is used in kvm where we look at qemu page table entry and update hardware hash page table for the guest with correct write enable bit. With the above, for the first usage we should also check the savedwrite bit so that we can correctly clear the savedwite bit. For the later, we add a new variant __pte_write(). With this we can revert write_protect_page part of 595cd8f256d2 ("mm/ksm: handle protnone saved writes when making page write protect"). But I left it as it is as an example code for savedwrite check. Fixes: c137a2757b886 ("powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write") Link: http://lkml.kernel.org/r/1488203787-17849-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-03-04Merge tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-6/+7
Pull more KVM updates from Radim Krčmář: "Second batch of KVM changes for the 4.11 merge window: PPC: - correct assumption about ASDR on POWER9 - fix MMIO emulation on POWER9 x86: - add a simple test for ioperm - cleanup TSS (going through KVM tree as the whole undertaking was caused by VMX's use of TSS) - fix nVMX interrupt delivery - fix some performance counters in the guest ... and two cleanup patches" * tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: Fix pending events injection x86/kvm/vmx: remove unused variable in segment_base() selftests/x86: Add a basic selftest for ioperm x86/asm: Tidy up TSS limit code kvm: convert kvm.users_count from atomic_t to refcount_t KVM: x86: never specify a sample period for virtualized in_tx_cp counters KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9 KVM: PPC: Book3S HV: Fix software walk of guest process page tables
2017-03-02sched/headers: Prepare to use <linux/rcuupdate.h> instead of ↵Ingo Molnar1-0/+1
<linux/rculist.h> in <linux/sched.h> We don't actually need the full rculist.h header in sched.h anymore, we will be able to include the smaller rcupdate.h header instead. But first update code that relied on the implicit header inclusion. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare to remove the <linux/mm_types.h> dependency from ↵Ingo Molnar1-1/+1
<linux/sched.h> Update code that relied on sched.h including various MM types for them. This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/stat.h> We are going to split <linux/sched/stat.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/stat.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare to move signal wakeup & sigpending methods from ↵Ingo Molnar2-1/+2
<linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-01KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9Paul Mackerras1-4/+4
In HPT mode on POWER9, the ASDR register is supposed to record segment information for hypervisor page faults. It turns out that POWER9 DD1 does not record the page size information in the ASDR for faults in guest real mode. We have the necessary information in memory already, so by moving the checks for real mode that already existed, we can use the in-memory copy. Since a load is likely to be faster than reading an SPR, we do this unconditionally (not just for POWER9 DD1). Signed-off-by: Paul Mackerras <[email protected]>
2017-03-01KVM: PPC: Book3S HV: Fix software walk of guest process page tablesPaul Mackerras1-2/+3
This fixes some bugs in the code that walks the guest's page tables. These bugs cause MMIO emulation to fail whenever the guest is in virtial mode (MMU on), leading to the guest hanging if it tried to access a virtio device. The first bug was that when reading the guest's process table, we were using the whole of arch->process_table, not just the field that contains the process table base address. The second bug was that the mask used when reading the process table entry to get the radix tree base address, RPDB_MASK, had the wrong value. Fixes: 9e04ba69beec ("KVM: PPC: Book3S HV: Add basic infrastructure for radix guests") Fixes: e99833448c5f ("powerpc/mm/radix: Add partition table format & callback") Signed-off-by: Paul Mackerras <[email protected]>
2017-02-24mm: cma_alloc: allow to specify GFP maskLucas Stach1-1/+2
Most users of this interface just want to use it with the default GFP_KERNEL flags, but for cases where DMA memory is allocated it may be called from a different context. No functional change yet, just passing through the flag to the underlying alloc_contig_range function. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Lucas Stach <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Radim Krcmar <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmfDave Jiang1-2/+2
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [[email protected]: fix ARM build] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jan Kara <[email protected]> Cc: Dan Williams <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-20Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini4-5/+17
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD Paul Mackerras writes: "Please do a pull from my kvm-ppc-next branch to get some fixes which I would like to have in 4.11. There are four small commits there; two are fixes for potential host crashes in the new HPT resizing code, and the other two are changes to printks to make KVM on PPC a little less noisy."
2017-02-18KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for nowPaul Mackerras2-1/+8
The new HPT resizing code added in commit b5baa6877315 ("KVM: PPC: Book3S HV: KVM-HV HPT resizing implementation", 2016-12-20) doesn't have code to handle the new HPTE format which POWER9 uses. Thus it would be best not to advertise it to userspace on POWER9 systems until it works properly. Also, since resize_hpt_rehash_hpte() contains BUG_ON() calls that could be hit on POWER9, let's prevent it from being called on POWER9 for now. Acked-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-02-17KVM: race-free exit from KVM_RUN without POSIX signalsPaolo Bonzini1-1/+5
The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick" a VCPU out of KVM_RUN through a POSIX signal. A signal is attached to a dummy signal handler; by blocking the signal outside KVM_RUN and unblocking it inside, this possible race is closed: VCPU thread service thread -------------------------------------------------------------- check flag set flag raise signal (signal handler does nothing) KVM_RUN However, one issue with KVM_SET_SIGNAL_MASK is that it has to take tsk->sighand->siglock on every KVM_RUN. This lock is often on a remote NUMA node, because it is on the node of a thread's creator. Taking this lock can be very expensive if there are many userspace exits (as is the case for SMP Windows VMs without Hyper-V reference time counter). As an alternative, we can put the flag directly in kvm_run so that KVM can see it: VCPU thread service thread -------------------------------------------------------------- raise signal signal handler set run->immediate_exit KVM_RUN check run->immediate_exit Reviewed-by: Radim Krčmář <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-02-17KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug messageThomas Huth1-2/+2
The average user likely does not know what a "htab" or "LPID" is, and it's annoying that these messages are quickly filling the dmesg log when you're doing boot cycle tests, so let's turn it into a debug message instead. Signed-off-by: Thomas Huth <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-02-17KVM: PPC: Book3S PR: Ratelimit copy data failure error messagesVipin K Parashar2-2/+4
kvm_ppc_mmu_book3s_32/64 xlat() logs "KVM can't copy data" error upon failing to copy user data to kernel space. This floods kernel log once such fails occur in short time period. Ratelimit this error to avoid flooding kernel logs upon copy data failures. Signed-off-by: Vipin K Parashar <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-02-16KVM: PPC: Book3S HV: Prevent double-free on HPT resize commit pathDavid Gibson1-0/+3
resize_hpt_release(), called once the HPT resize of a KVM guest is completed (successfully or unsuccessfully) frees the state structure for the resize. It is currently not safe to call with a NULL pointer. However, one of the error paths in kvm_vm_ioctl_resize_hpt_commit() can invoke it with a NULL pointer. This will occur if userspace improperly invokes KVM_PPC_RESIZE_HPT_COMMIT without previously calling KVM_PPC_RESIZE_HPT_PREPARE, or if it calls COMMIT twice without an intervening PREPARE. To fix this potential crash bug - and maybe others like it, make it safe (and a no-op) to call resize_hpt_release() with a NULL resize pointer. Found by Dan Carpenter with a static checker. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-02-09KVM: PPC: Book 3S: Fix error return in kvm_vm_ioctl_create_spapr_tce()Wei Yongjun1-0/+1
Fix to return error code -ENOMEM from the memory alloc error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-02-08Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras2-30/+14
This merges in a fix which touches both PPC and KVM code, which was therefore put into a topic branch in the powerpc tree. Signed-off-by: Paul Mackerras <[email protected]>
2017-02-07powerpc/powernv: Remove separate entry for OPAL real mode callsBenjamin Herrenschmidt2-30/+14
All entry points already read the MSR so they can easily do the right thing. Signed-off-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Advertise availablity of HPT resizing on KVM HVDavid Gibson1-0/+3
This updates the KVM_CAP_SPAPR_RESIZE_HPT capability to advertise the presence of in-kernel HPT resizing on KVM HV. Signed-off-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-01-31KVM: PPC: Book3S HV: KVM-HV HPT resizing implementationDavid Gibson1-1/+187
This adds the "guts" of the implementation for the HPT resizing PAPR extension. It has the code to allocate and clear a new HPT, rehash an existing HPT's entries into it, and accomplish the switchover for a KVM guest from the old HPT to the new one. Signed-off-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Outline of KVM-HV HPT resizing implementationDavid Gibson2-0/+216
This adds a not yet working outline of the HPT resizing PAPR extension. Specifically it adds the necessary ioctl() functions, their basic steps, the work function which will handle preparation for the resize, and synchronization between these, the guest page fault path and guest HPT update path. The actual guts of the implementation isn't here yet, so for now the calls will always fail. Signed-off-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Create kvmppc_unmap_hpte_helper()David Gibson1-33/+44
The kvm_unmap_rmapp() function, called from certain MMU notifiers, is used to force all guest mappings of a particular host page to be set ABSENT, and removed from the reverse mappings. For HPT resizing, we will have some cases where we want to set just a single guest HPTE ABSENT and remove its reverse mappings. To prepare with this, we split out the logic from kvm_unmap_rmapp() to evict a single HPTE, moving it to a new helper function. Signed-off-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT sizeDavid Gibson2-17/+17
The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page table (HPT) that userspace expects a guest VM to have, and is also used to clear that HPT when necessary (e.g. guest reboot). At present, once the ioctl() is called for the first time, the HPT size can never be changed thereafter - it will be cleared but always sized as from the first call. With upcoming HPT resize implementation, we're going to need to allow userspace to resize the HPT at reset (to change it back to the default size if the guest changed it). So, we need to allow this ioctl() to change the HPT size. This patch also updates Documentation/virtual/kvm/api.txt to reflect the new behaviour. In fact the documentation was already slightly incorrect since 572abd5 "KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl" Signed-off-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Split HPT allocation from activationDavid Gibson2-53/+61
Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT) and sets it up as the active page table for a VM. For the upcoming HPT resize implementation we're going to want to allocate HPTs separately from activating them. So, split the allocation itself out into kvmppc_allocate_hpt() and perform the activation with a new kvmppc_set_hpt() function. Likewise we split kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt() which unsets it as an active HPT, then frees it. We also move the logic to fall back to smaller HPT sizes if the first try fails into the single caller which used that behaviour, kvmppc_hv_setup_htab_rma(). This introduces a slight semantic change, in that previously if the initial attempt at CMA allocation failed, we would fall back to attempting smaller sizes with the page allocator. Now, we try first CMA, then the page allocator at each size. As far as I can tell this change should be harmless. To match, we make kvmppc_free_hpt() just free the actual HPT itself. The call to kvmppc_free_lpid() that was there, we move to the single caller. Signed-off-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Don't store values derivable from HPT orderDavid Gibson2-24/+22
Currently the kvm_hpt_info structure stores the hashed page table's order, and also the number of HPTEs it contains and a mask for its size. The last two can be easily derived from the order, so remove them and just calculate them as necessary with a couple of helper inlines. Signed-off-by: David Gibson <[email protected]> Reviewed-by: Thomas Huth <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Gather HPT related variables into sub-structureDavid Gibson3-78/+78
Currently, the powerpc kvm_arch structure contains a number of variables tracking the state of the guest's hashed page table (HPT) in KVM HV. This patch gathers them all together into a single kvm_hpt_info substructure. This makes life more convenient for the upcoming HPT resizing implementation. Signed-off-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Rename kvm_alloc_hpt() for clarityDavid Gibson2-8/+8
The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at all obvious from the name. In practice kvmppc_alloc_hpt() allocates an HPT by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate it with CMA only. To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma(). Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma(). Signed-off-by: David Gibson <[email protected]> Reviewed-by: Thomas Huth <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2017-01-31Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras11-121/+1203
This merges in the POWER9 radix MMU host and guest support, which was put into a topic branch because it touches both powerpc and KVM code. Signed-off-by: Paul Mackerras <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Enable radix guest supportPaul Mackerras4-27/+109
This adds a few last pieces of the support for radix guests: * Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and KVM_PPC_GET_RMMU_INFO ioctls for radix guests * On POWER9, allow secondary threads to be on/off-lined while guests are running. * Set up LPCR and the partition table entry for radix guests. * Don't allocate the rmap array in the kvm_memory_slot structure on radix. * Don't try to initialize the HPT for radix guests, since they don't have an HPT. * Take out the code that prevents the HV KVM module from initializing on radix hosts. At this stage, we only support radix guests if the host is running in radix mode, and only support HPT guests if the host is running in HPT mode. Thus a guest cannot switch from one mode to the other, which enables some simplifications. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Invalidate ERAT on guest entry/exit for POWER9 DD1Paul Mackerras1-0/+6
On POWER9 DD1, we need to invalidate the ERAT (effective to real address translation cache) when changing the PIDR register, which we do as part of guest entry and exit. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Allow guest exit path to have MMU onPaul Mackerras3-17/+58
If we allow LPCR[AIL] to be set for radix guests, then interrupts from the guest to the host can be delivered by the hardware with relocation on, and thus the code path starting at kvmppc_interrupt_hv can be executed in virtual mode (MMU on) for radix guests (previously it was only ever executed in real mode). Most of the code is indifferent to whether the MMU is on or off, but the calls to OPAL that use the real-mode OPAL entry code need to be switched to use the virtual-mode code instead. The affected calls are the calls to the OPAL XICS emulation functions in kvmppc_read_one_intr() and related functions. We test the MSR[IR] bit to detect whether we are in real or virtual mode, and call the opal_rm_* or opal_* function as appropriate. The other place that depends on the MMU being off is the optimization where the guest exit code jumps to the external interrupt vector or hypervisor doorbell interrupt vector, or returns to its caller (which is __kvmppc_vcore_entry). If the MMU is on and we are returning to the caller, then we don't need to use an rfid instruction since the MMU is already on; a simple blr suffices. If there is an external or hypervisor doorbell interrupt to handle, we branch to the relocation-on version of the interrupt vector. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movementPaul Mackerras3-14/+80
With radix, the guest can do TLB invalidations itself using the tlbie (global) and tlbiel (local) TLB invalidation instructions. Linux guests use local TLB invalidations for translations that have only ever been accessed on one vcpu. However, that doesn't mean that the translations have only been accessed on one physical cpu (pcpu) since vcpus can move around from one pcpu to another. Thus a tlbiel might leave behind stale TLB entries on a pcpu where the vcpu previously ran, and if that task then moves back to that previous pcpu, it could see those stale TLB entries and thus access memory incorrectly. The usual symptom of this is random segfaults in userspace programs in the guest. To cope with this, we detect when a vcpu is about to start executing on a thread in a core that is a different core from the last time it executed. If that is the case, then we mark the core as needing a TLB flush and then send an interrupt to any thread in the core that is currently running a vcpu from the same guest. This will get those vcpus out of the guest, and the first one to re-enter the guest will do the TLB flush. The reason for interrupting the vcpus executing on the old core is to cope with the following scenario: CPU 0 CPU 1 CPU 4 (core 0) (core 0) (core 1) VCPU 0 runs task X VCPU 1 runs core 0 TLB gets entries from task X VCPU 0 moves to CPU 4 VCPU 0 runs task X Unmap pages of task X tlbiel (still VCPU 1) task X moves to VCPU 1 task X runs task X sees stale TLB entries That is, as soon as the VCPU starts executing on the new core, it could unmap and tlbiel some page table entries, and then the task could migrate to one of the VCPUs running on the old core and potentially see stale TLB entries. Since the TLB is shared between all the threads in a core, we only use the bit of kvm->arch.need_tlb_flush corresponding to the first thread in the core. To ensure that we don't have a window where we can miss a flush, this moves the clearing of the bit from before the actual flush to after it. This way, two threads might both do the flush, but we prevent the situation where one thread can enter the guest before the flush is finished. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix modePaul Mackerras1-0/+14
If the guest is in radix mode, then it doesn't have a hashed page table (HPT), so all of the hypercalls that manipulate the HPT can't work and should return an error. This adds checks to make them return H_FUNCTION ("function not supported"). Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Implement dirty page logging for radix guestsPaul Mackerras3-32/+138
This adds code to keep track of dirty pages when requested (that is, when memslot->dirty_bitmap is non-NULL) for radix guests. We use the dirty bits in the PTEs in the second-level (partition-scoped) page tables, together with a bitmap of pages that were dirty when their PTE was invalidated (e.g., when the page was paged out). This bitmap is stored in the first half of the memslot->dirty_bitmap area, and kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the bitmap that gets returned to userspace. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: MMU notifier callbacks for radix guestsPaul Mackerras2-21/+97
This adapts our implementations of the MMU notifier callbacks (unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva) to call radix functions when the guest is using radix. These implementations are much simpler than for HPT guests because we have only one PTE to deal with, so we don't need to traverse rmap chains. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Page table construction and page faults for radix guestsPaul Mackerras4-3/+407
This adds the code to construct the second-level ("partition-scoped" in architecturese) page tables for guests using the radix MMU. Apart from the PGD level, which is allocated when the guest is created, the rest of the tree is all constructed in response to hypervisor page faults. As well as hypervisor page faults for missing pages, we also get faults for reference/change (RC) bits needing to be set, as well as various other error conditions. For now, we only set the R or C bit in the guest page table if the same bit is set in the host PTE for the backing page. This code can take advantage of the guest being backed with either transparent or ordinary 2MB huge pages, and insert 2MB page entries into the guest page tables. There is no support for 1GB huge pages yet. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle radix guestsPaul Mackerras1-11/+46
This adds code to branch around the parts that radix guests don't need - clearing and loading the SLB with the guest SLB contents, saving the guest SLB contents on exit, and restoring the host SLB contents. Since the host is now using radix, we need to save and restore the host value for the PID register. On hypervisor data/instruction storage interrupts, we don't do the guest HPT lookup on radix, but just save the guest physical address for the fault (from the ASDR register) in the vcpu struct. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Add basic infrastructure for radix guestsPaul Mackerras3-3/+149
This adds a field in struct kvm_arch and an inline helper to indicate whether a guest is a radix guest or not, plus a new file to contain the radix MMU code, which currently contains just a translate function which knows how to traverse the guest page tables to translate an address. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9Paul Mackerras1-0/+8
POWER9 adds a register called ASDR (Access Segment Descriptor Register), which is set by hypervisor data/instruction storage interrupts to contain the segment descriptor for the address being accessed, assuming the guest is using HPT translation. (For radix guests, it contains the guest real address of the access.) Thus, for HPT guests on POWER9, we can use this register rather than looking up the SLB with the slbfee. instruction. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9Paul Mackerras2-5/+32
This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl for HPT guests on POWER9. With this, we can return 1 for the KVM_CAP_PPC_MMU_HASH_V3 capability. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMUPaul Mackerras2-0/+45
This adds two capabilities and two ioctls to allow userspace to find out about and configure the POWER9 MMU in a guest. The two capabilities tell userspace whether KVM can support a guest using the radix MMU, or using the hashed page table (HPT) MMU with a process table and segment tables. (Note that the MMUs in the POWER9 processor cores do not use the process and segment tables when in HPT mode, but the nest MMU does). The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify whether a guest will use the radix MMU or the HPT MMU, and to specify the size and location (in guest space) of the process table. The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about the radix MMU. It returns a list of supported radix tree geometries (base page size and number of bits indexed at each level of the radix tree) and the encoding used to specify the various page sizes for the TLB invalidate entry instruction. Initially, both capabilities return 0 and the ioctls return -EINVAL, until the necessary infrastructure for them to operate correctly is added. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-31KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interruptsNicholas Piggin2-3/+16
64-bit Book3S exception handlers must find the dynamic kernel base to add to the target address when branching beyond __end_interrupts, in order to support kernel running at non-0 physical address. Support this in KVM by branching with CTR, similarly to regular interrupt handlers. The guest CTR saved in HSTATE_SCRATCH1 and restored after the branch. Without this, the host kernel hangs and crashes randomly when it is running at a non-0 address and a KVM guest is started. Signed-off-by: Nicholas Piggin <[email protected]> Acked-by: Paul Mackerras <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-01-27KVM: PPC: Book3S PR: Refactor program interrupt related code into separate ↵Thomas Huth1-65/+65
function The function kvmppc_handle_exit_pr() is quite huge and thus hard to read, and even contains a "spaghetti-code"-like goto between the different case labels of the big switch statement. This can be made much more readable by moving the code related to injecting program interrupts / instruction emulation into a separate function instead. Signed-off-by: Thomas Huth <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>