aboutsummaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2019-04-30KVM: PPC: Book3S HV: XIVE: Prevent races when releasing devicePaul Mackerras2-16/+78
Now that we have the possibility of a XIVE or XICS-on-XIVE device being released while the VM is still running, we need to be careful about races and potential use-after-free bugs. Although the kvmppc_xive struct is not freed, but kept around for re-use, the kvmppc_xive_vcpu structs are freed, and they are used extensively in both the XIVE native and XICS-on-XIVE code. There are various ways in which XIVE code gets invoked: - VCPU entry and exit, which do push and pull operations on the XIVE hardware - one_reg get and set functions (vcpu->mutex is held) - XICS hypercalls (but only inside guest execution, not from kvmppc_pseries_do_hcall) - device creation calls (kvm->lock is held) - device callbacks - get/set attribute, mmap, pagefault, release/destroy - set_mapped/clr_mapped calls (kvm->lock is held) - connect_vcpu calls - debugfs file read callbacks Inside a device release function, we know that userspace cannot have an open file descriptor referring to the device, nor can it have any mmapped regions from the device. Therefore the device callbacks are excluded, as are the connect_vcpu calls (since they need a fd for the device). Further, since the caller holds the kvm->lock mutex, no other device creation calls or set/clr_mapped calls can be executing concurrently. To exclude VCPU execution and XICS hypercalls, we temporarily set kvm->arch.mmu_ready to 0. This forces any VCPU task that is trying to enter the guest to take the kvm->lock mutex, which is held by the caller of the release function. Then, sending an IPI to all other CPUs forces any VCPU currently executing in the guest to exit. Finally, we take the vcpu->mutex for each VCPU around the process of cleaning up and freeing its XIVE data structures, in order to exclude any one_reg get/set calls. To exclude the debugfs read callbacks, we just need to ensure that debugfs_remove is called before freeing any data structures. Once it returns we know that no CPU can be executing the callbacks (for our kvmppc_xive instance). Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' methodCédric Le Goater5-12/+101
When a P9 sPAPR VM boots, the CAS negotiation process determines which interrupt mode to use (XICS legacy or XIVE native) and invokes a machine reset to activate the chosen mode. We introduce 'release' methods for the XICS-on-XIVE and the XIVE native KVM devices which are called when the file descriptor of the device is closed after the TIMA and ESB pages have been unmapped. They perform the necessary cleanups : clear the vCPU interrupt presenters that could be attached and then destroy the device. The 'release' methods replace the 'destroy' methods as 'destroy' is not called anymore once 'release' is. Compatibility with older QEMU is nevertheless maintained. This is not considered as a safe operation as the vCPUs are still running and could be referencing the KVM device through their presenters. To protect the system from any breakage, the kvmppc_xive objects representing both KVM devices are now stored in an array under the VM. Allocation is performed on first usage and memory is freed only when the VM exits. [[email protected] - Moved freeing of xive structures to book3s.c, put it under #ifdef CONFIG_KVM_XICS.] Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation modeCédric Le Goater1-3/+4
Full support for the XIVE native exploitation mode is now available, advertise the capability KVM_CAP_PPC_IRQ_XIVE for guests running on PowerNV KVM Hypervisors only. Support for nested guests (pseries KVM Hypervisor) is not yet available. XIVE should also have been activated which is default setting on POWER9 systems running a recent Linux kernel. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Add passthrough supportCédric Le Goater3-0/+65
The KVM XICS-over-XIVE device and the proposed KVM XIVE native device implement an IRQ space for the guest using the generic IPI interrupts of the XIVE IC controller. These interrupts are allocated at the OPAL level and "mapped" into the guest IRQ number space in the range 0-0x1FFF. Interrupt management is performed in the XIVE way: using loads and stores on the addresses of the XIVE IPI interrupt ESB pages. Both KVM devices share the same internal structure caching information on the interrupts, among which the xive_irq_data struct containing the addresses of the IPI ESB pages and an extra one in case of pass-through. The later contains the addresses of the ESB pages of the underlying HW controller interrupts, PHB4 in all cases for now. A guest, when running in the XICS legacy interrupt mode, lets the KVM XICS-over-XIVE device "handle" interrupt management, that is to perform the loads and stores on the addresses of the ESB pages of the guest interrupts. However, when running in XIVE native exploitation mode, the KVM XIVE native device exposes the interrupt ESB pages to the guest and lets the guest perform directly the loads and stores. The VMA exposing the ESB pages make use of a custom VM fault handler which role is to populate the VMA with appropriate pages. When a fault occurs, the guest IRQ number is deduced from the offset, and the ESB pages of associated XIVE IPI interrupt are inserted in the VMA (using the internal structure caching information on the interrupts). Supporting device passthrough in the guest running in XIVE native exploitation mode adds some extra refinements because the ESB pages of a different HW controller (PHB4) need to be exposed to the guest along with the initial IPI ESB pages of the XIVE IC controller. But the overall mechanic is the same. When the device HW irqs are mapped into or unmapped from the guest IRQ number space, the passthru_irq helpers, kvmppc_xive_set_mapped() and kvmppc_xive_clr_mapped(), are called to record or clear the passthrough interrupt information and to perform the switch. The approach taken by this patch is to clear the ESB pages of the guest IRQ number being mapped and let the VM fault handler repopulate. The handler will insert the ESB page corresponding to the HW interrupt of the device being passed-through or the initial IPI ESB page if the device is being removed. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Add a mapping for the source ESB pagesCédric Le Goater2-0/+58
Each source is associated with an Event State Buffer (ESB) with a even/odd pair of pages which provides commands to manage the source: to trigger, to EOI, to turn off the source for instance. The custom VM fault handler will deduce the guest IRQ number from the offset of the fault, and the ESB page of the associated XIVE interrupt will be inserted into the VMA using the internal structure caching information on the interrupts. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Add a TIMA mappingCédric Le Goater4-0/+53
Each thread has an associated Thread Interrupt Management context composed of a set of registers. These registers let the thread handle priority management and interrupt acknowledgment. The most important are : - Interrupt Pending Buffer (IPB) - Current Processor Priority (CPPR) - Notification Source Register (NSR) They are exposed to software in four different pages each proposing a view with a different privilege. The first page is for the physical thread context and the second for the hypervisor. Only the third (operating system) and the fourth (user level) are exposed the guest. A custom VM fault handler will populate the VMA with the appropriate pages, which should only be the OS page for now. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Add get/set accessors for the VP XIVE stateCédric Le Goater4-0/+113
The state of the thread interrupt management registers needs to be collected for migration. These registers are cached under the 'xive_saved_state.w01' field of the VCPU when the VPCU context is pulled from the HW thread. An OPAL call retrieves the backup of the IPB register in the underlying XIVE NVT structure and merges it in the KVM state. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Add a control to dirty the XIVE EQ pagesCédric Le Goater2-0/+86
When migration of a VM is initiated, a first copy of the RAM is transferred to the destination before the VM is stopped, but there is no guarantee that the EQ pages in which the event notifications are queued have not been modified. To make sure migration will capture a consistent memory state, the XIVE device should perform a XIVE quiesce sequence to stop the flow of event notifications and stabilize the EQs. This is the purpose of the KVM_DEV_XIVE_EQ_SYNC control which will also marks the EQ pages dirty to force their transfer. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Add a control to sync the sourcesCédric Le Goater2-0/+37
This control will be used by the H_INT_SYNC hcall from QEMU to flush event notifications on the XIVE IC owning the source. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Add a global reset controlCédric Le Goater2-0/+86
This control is to be used by the H_INT_RESET hcall from QEMU. Its purpose is to clear all configuration of the sources and EQs. This is necessary in case of a kexec (for a kdump kernel for instance) to make sure that no remaining configuration is left from the previous boot setup so that the new kernel can start safely from a clean state. The queue 7 is ignored when the XIVE device is configured to run in single escalation mode. Prio 7 is used by escalations. The XIVE VP is kept enabled as the vCPU is still active and connected to the XIVE device. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configurationCédric Le Goater5-6/+281
These controls will be used by the H_INT_SET_QUEUE_CONFIG and H_INT_GET_QUEUE_CONFIG hcalls from QEMU to configure the underlying Event Queue in the XIVE IC. They will also be used to restore the configuration of the XIVE EQs and to capture the internal run-time state of the EQs. Both 'get' and 'set' rely on an OPAL call to access the EQ toggle bit and EQ index which are updated by the XIVE IC when event notifications are enqueued in the EQ. The value of the guest physical address of the event queue is saved in the XIVE internal xive_q structure for later use. That is when migration needs to mark the EQ pages dirty to capture a consistent memory state of the VM. To be noted that H_INT_SET_QUEUE_CONFIG does not require the extra OPAL call setting the EQ toggle bit and EQ index to configure the EQ, but restoring the EQ state will. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Add a control to configure a sourceCédric Le Goater4-2/+115
This control will be used by the H_INT_SET_SOURCE_CONFIG hcall from QEMU to configure the target of a source and also to restore the configuration of a source when migrating the VM. The XIVE source interrupt structure is extended with the value of the Effective Interrupt Source Number. The EISN is the interrupt number pushed in the event queue that the guest OS will use to dispatch events internally. Caching the EISN value in KVM eases the test when checking if a reconfiguration is indeed needed. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: add a control to initialize a sourceCédric Le Goater4-4/+125
The XIVE KVM device maintains a list of interrupt sources for the VM which are allocated in the pool of generic interrupts (IPIs) of the main XIVE IC controller. These are used for the CPU IPIs as well as for virtual device interrupts. The IRQ number space is defined by QEMU. The XIVE device reuses the source structures of the XICS-on-XIVE device for the source blocks (2-level tree) and for the source interrupts. Under XIVE native, the source interrupt caches mostly configuration information and is less used than under the XICS-on-XIVE device in which hcalls are still necessary at run-time. When a source is initialized in KVM, an IPI interrupt source is simply allocated at the OPAL level and then MASKED. KVM only needs to know about its type: LSI or MSI. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: XIVE: Introduce a new capability KVM_CAP_PPC_IRQ_XIVECédric Le Goater6-41/+258
The user interface exposes a new capability KVM_CAP_PPC_IRQ_XIVE to let QEMU connect the vCPU presenters to the XIVE KVM device if required. The capability is not advertised for now as the full support for the XIVE native exploitation mode is not yet available. When this is case, the capability will be advertised on PowerNV Hypervisors only. Nested guests (pseries KVM Hypervisor) are not supported. Internally, the interface to the new KVM device is protected with a new interrupt mode: KVMPPC_IRQ_XIVE. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation modeCédric Le Goater6-2/+198
This is the basic framework for the new KVM device supporting the XIVE native exploitation mode. The user interface exposes a new KVM device to be created by QEMU, only available when running on a L0 hypervisor. Support for nested guests is not available yet. The XIVE device reuses the device structure of the XICS-on-XIVE device as they have a lot in common. That could possibly change in the future if the need arise. Signed-off-by: Cédric Le Goater <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras11-20/+218
This merges in the ppc-kvm topic branch from the powerpc tree to get patches which touch both general powerpc code and KVM code, one of which is a prerequisite for following patches. Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: Save/restore vrsave register in kvmhv_p9_guest_entry()Suraj Jitindar Singh1-0/+2
On POWER9 and later processors where the host can schedule vcpus on a per thread basis, there is a streamlined entry path used when the guest is radix. This entry path saves/restores the fp and vr state in kvmhv_p9_guest_entry() by calling store_[fp/vr]_state() and load_[fp/vr]_state(). This is the same as the old entry path however the old entry path also saved/restored the VRSAVE register, which isn't done in the new entry path. This means that the vrsave register is now volatile across guest exit, which is an incorrect change in behaviour. Fix this by saving/restoring the vrsave register in kvmhv_p9_guest_entry(). This restores the old, correct, behaviour. Fixes: 95a6432ce9038 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Signed-off-by: Suraj Jitindar Singh <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: Flush TLB on secondary radix threadsPaul Mackerras4-63/+53
When running on POWER9 with kvm_hv.indep_threads_mode = N and the host in SMT1 mode, KVM will run guest VCPUs on offline secondary threads. If those guests are in radix mode, we fail to load the LPID and flush the TLB if necessary, leading to the guest crashing with an unsupported MMU fault. This arises from commit 9a4506e11b97 ("KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on", 2018-05-17), which didn't consider the case where indep_threads_mode = N. For simplicity, this makes the real-mode guest entry path flush the TLB in the same place for both radix and hash guests, as we did before 9a4506e11b97, though the code is now C code rather than assembly code. We also have the radix TLB flush open-coded rather than calling radix__local_flush_tlb_lpid_guest(), because the TLB flush can be called in real mode, and in real mode we don't want to invoke the tracepoint code. Fixes: 9a4506e11b97 ("KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on") Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: Move HPT guest TLB flushing to C codePaul Mackerras3-34/+35
This replaces assembler code in book3s_hv_rmhandlers.S that checks the kvm->arch.need_tlb_flush cpumask and optionally does a TLB flush with C code in book3s_hv_builtin.c. Note that unlike the radix version, the hash version doesn't do an explicit ERAT invalidation because we will invalidate and load up the SLB before entering the guest, and that will invalidate the ERAT. Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: Handle virtual mode in XIVE VCPU push codeSuraj Jitindar Singh1-11/+25
The code in book3s_hv_rmhandlers.S that pushes the XIVE virtual CPU context to the hardware currently assumes it is being called in real mode, which is usually true. There is however a path by which it can be executed in virtual mode, in the case where indep_threads_mode = N. A virtual CPU executing on an offline secondary thread can take a hypervisor interrupt in virtual mode and return from the kvmppc_hv_entry() call after the kvm_secondary_got_guest label. It is possible for it to be given another vcpu to execute before it gets to execute the stop instruction. In that case it will call kvmppc_hv_entry() for the second VCPU in virtual mode, and the XIVE vCPU push code will be executed in virtual mode. The result in that case will be a host crash due to an unexpected data storage interrupt caused by executing the stdcix instruction in virtual mode. This fixes it by adding a code path for virtual mode, which uses the virtual TIMA pointer and normal load/store instructions. [[email protected] - wrote patch description] Signed-off-by: Suraj Jitindar Singh <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: Fix XICS-on-XIVE H_IPI when priority = 0Paul Mackerras1-38/+40
This fixes a bug in the XICS emulation on POWER9 machines which is triggered by the guest doing a H_IPI with priority = 0 (the highest priority). What happens is that the notification interrupt arrives at the destination at priority zero. The loop in scan_interrupts() sees that a priority 0 interrupt is pending, but because xc->mfrr is zero, we break out of the loop before taking the notification interrupt out of the queue and EOI-ing it. (This doesn't happen when xc->mfrr != 0; in that case we process the priority-0 notification interrupt on the first iteration of the loop, and then break out of a subsequent iteration of the loop with hirq == XICS_IPI.) To fix this, we move the prio >= xc->mfrr check down to near the end of the loop. However, there are then some other things that need to be adjusted. Since we are potentially handling the notification interrupt and also delivering an IPI to the guest in the same loop iteration, we need to update pending and handle any q->pending_count value before the xc->mfrr check, rather than at the end of the loop. Also, we need to update the queue pointers when we have processed and EOI-ed the notification interrupt, since we may not do it later. Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30arm64: KVM: Fix perf cycle counter support for VHEAndrew Murray1-2/+9
The kvm_vcpu_pmu_{read,write}_evtype_direct functions do not handle the cycle counter use-case, this leads to inaccurate counts and a WARN message when using perf with the cycle counter (-e cycle). Let's fix this by adding a use case for pmccfiltr_el0. Fixes: 39e3406a090a ("arm64: KVM: Avoid isb's by using direct pmxevtyper sysreg") Reported-by: Suzuki K Poulose <[email protected]> Signed-off-by: Andrew Murray <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2019-04-30ARM: mvebu: drop return from void functionNicholas Mc Guire1-1/+0
The return statement is unnecessary here - so drop it. Signed-off-by: Nicholas Mc Guire <[email protected]> Signed-off-by: Gregory CLEMENT <[email protected]>
2019-04-30ARM: mvebu: prefix coprocessor operand with pStefan Agner2-2/+2
In every other instance where mrc is used the coprocessor operand is prefix with p (e.g. p15). Use the p prefix in this case too. This fixes a build issue when using LLVM's integrated assembler: arch/arm/mach-mvebu/coherency_ll.S:69:6: error: invalid operand for instruction mrc 15, 0, r3, cr0, cr0, 5 ^ arch/arm/mach-mvebu/pmsu_ll.S:19:6: error: invalid operand for instruction mrc 15, 0, r0, cr0, cr0, 5 @ get the CPU ID ^ Signed-off-by: Stefan Agner <[email protected]> Acked-by: Nicolas Pitre <[email protected]> Signed-off-by: Gregory CLEMENT <[email protected]>
2019-04-30ARM: mvebu: drop unnecessary labelStefan Agner1-1/+0
The label mvebu_boot_wa_start is not necessary and causes a build issue when building with LLVM's integrated assembler: AS arch/arm/mach-mvebu/pmsu_ll.o arch/arm/mach-mvebu/pmsu_ll.S:59:1: error: invalid symbol redefinition mvebu_boot_wa_start: ^ Drop the label. Signed-off-by: Stefan Agner <[email protected]> Acked-by: Nicolas Pitre <[email protected]> Signed-off-by: Gregory CLEMENT <[email protected]>
2019-04-30ARM: mvebu: fix a leaked reference by adding missing of_node_putWen Yang1-3/+8
The call to of_get_next_child returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage. Detected by coccinelle with the following warnings: ./arch/arm/mach-mvebu/pm-board.c:135:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 88, but without a corresponding object release within this functio Signed-off-by: Wen Yang <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Andrew Lunn <[email protected]> Cc: Gregory Clement <[email protected]> Cc: Sebastian Hesselbarth <[email protected]> Cc: Russell King <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Gregory CLEMENT <[email protected]>
2019-04-30Merge tag 'v5.1-rc7' into x86/mm, to pick up fixesIngo Molnar12-55/+100
Signed-off-by: Ingo Molnar <[email protected]>
2019-04-30KVM: PPC: Book3S HV: smb->smp comment fixupPalmer Dabbelt1-1/+1
I made the same typo when trying to grep for uses of smp_wmb and figured I might as well fix it. Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S: Allocate guest TCEs on demand tooAlexey Kardashevskiy4-33/+110
We already allocate hardware TCE tables in multiple levels and skip intermediate levels when we can, now it is a turn of the KVM TCE tables. Thankfully these are allocated already in 2 levels. This moves the table's last level allocation from the creating helper to kvmppc_tce_put() and kvm_spapr_tce_fault(). Since such allocation cannot be done in real mode, this creates a virtual mode version of kvmppc_tce_put() which handles allocations. This adds kvmppc_rm_ioba_validate() to do an additional test if the consequent kvmppc_tce_put() needs a page which has not been allocated; if this is the case, we bail out to virtual mode handlers. The allocations are protected by a new mutex as kvm->lock is not suitable for the task because the fault handler is called with the mmap_sem held but kvmhv_setup_mmu() locks kvm->lock and mmap_sem in the reverse order. Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: Avoid lockdep debugging in TCE realmode handlersAlexey Kardashevskiy3-33/+44
The kvmppc_tce_to_ua() helper is called from real and virtual modes and it works fine as long as CONFIG_DEBUG_LOCKDEP is not enabled. However if the lockdep debugging is on, the lockdep will most likely break in kvm_memslots() because of srcu_dereference_check() so we need to use PPC-own kvm_memslots_raw() which uses realmode safe rcu_dereference_raw_notrace(). This creates a realmode copy of kvmppc_tce_to_ua() which replaces kvm_memslots() with kvm_memslots_raw(). Since kvmppc_rm_tce_to_ua() becomes static and can only be used inside HV KVM, this moves it earlier under CONFIG_KVM_BOOK3S_HV_POSSIBLE. This moves truly virtual-mode kvmppc_tce_to_ua() to where it belongs and drops the prmap parameter which was never used in the virtual mode. Fixes: d3695aa4f452 ("KVM: PPC: Add support for multiple-TCE hcalls", 2016-02-15) Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: Fix lockdep warning when entering the guestAlexey Kardashevskiy1-7/+8
The trace_hardirqs_on() sets current->hardirqs_enabled and from here the lockdep assumes interrupts are enabled although they are remain disabled until the context switches to the guest. Consequent srcu_read_lock() checks the flags in rcu_lock_acquire(), observes disabled interrupts and prints a warning (see below). This moves trace_hardirqs_on/off closer to __kvmppc_vcore_entry to prevent lockdep from being confused. DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) WARNING: CPU: 16 PID: 8038 at kernel/locking/lockdep.c:4128 check_flags.part.25+0x224/0x280 [...] NIP [c000000000185b84] check_flags.part.25+0x224/0x280 LR [c000000000185b80] check_flags.part.25+0x220/0x280 Call Trace: [c000003fec253710] [c000000000185b80] check_flags.part.25+0x220/0x280 (unreliable) [c000003fec253780] [c000000000187ea4] lock_acquire+0x94/0x260 [c000003fec253840] [c00800001a1e9768] kvmppc_run_core+0xa60/0x1ab0 [kvm_hv] [c000003fec253a10] [c00800001a1ed944] kvmppc_vcpu_run_hv+0x73c/0xec0 [kvm_hv] [c000003fec253ae0] [c00800001a1095dc] kvmppc_vcpu_run+0x34/0x48 [kvm] [c000003fec253b00] [c00800001a1056bc] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] [c000003fec253b90] [c00800001a0f3618] kvm_vcpu_ioctl+0x460/0x850 [kvm] [c000003fec253d00] [c00000000041c4f4] do_vfs_ioctl+0xe4/0x930 [c000003fec253db0] [c00000000041ce04] ksys_ioctl+0xc4/0x110 [c000003fec253e00] [c00000000041ce78] sys_ioctl+0x28/0x80 [c000003fec253e20] [c00000000000b5a4] system_call+0x5c/0x70 Instruction dump: 419e0034 3d220004 39291730 81290000 2f890000 409e0020 3c82ffc6 3c62ffc5 3884be70 386329c0 4bf6ea71 60000000 <0fe00000> 3c62ffc6 3863be90 4801273d irq event stamp: 1025 hardirqs last enabled at (1025): [<c00800001a1e9728>] kvmppc_run_core+0xa20/0x1ab0 [kvm_hv] hardirqs last disabled at (1024): [<c00800001a1e9358>] kvmppc_run_core+0x650/0x1ab0 [kvm_hv] softirqs last enabled at (0): [<c0000000000f1210>] copy_process.isra.4.part.5+0x5f0/0x1d00 softirqs last disabled at (0): [<0000000000000000>] (null) ---[ end trace 31180adcc848993e ]--- possible reason: unannotated irqs-off. irq event stamp: 1025 hardirqs last enabled at (1025): [<c00800001a1e9728>] kvmppc_run_core+0xa20/0x1ab0 [kvm_hv] hardirqs last disabled at (1024): [<c00800001a1e9358>] kvmppc_run_core+0x650/0x1ab0 [kvm_hv] softirqs last enabled at (0): [<c0000000000f1210>] copy_process.isra.4.part.5+0x5f0/0x1d00 softirqs last disabled at (0): [<0000000000000000>] (null) Fixes: 8b24e69fc47e ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry", 2017-06-26) Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: Implement real mode H_PAGE_INIT handlerSuraj Jitindar Singh3-1/+147
Implement a real mode handler for the H_CALL H_PAGE_INIT which can be used to zero or copy a guest page. The page is defined to be 4k and must be 4k aligned. The in-kernel real mode handler halves the time to handle this H_CALL compared to handling it in userspace for a hash guest. Signed-off-by: Suraj Jitindar Singh <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30KVM: PPC: Book3S HV: Implement virtual mode H_PAGE_INIT handlerSuraj Jitindar Singh1-0/+80
Implement a virtual mode handler for the H_CALL H_PAGE_INIT which can be used to zero or copy a guest page. The page is defined to be 4k and must be 4k aligned. The in-kernel handler halves the time to handle this H_CALL compared to handling it in userspace for a radix guest. Signed-off-by: Suraj Jitindar Singh <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>
2019-04-30powerpc/watchdog: Use hrtimers for per-CPU heartbeatNicholas Piggin1-41/+40
Using a jiffies timer creates a dependency on the tick_do_timer_cpu incrementing jiffies. If that CPU has locked up and jiffies is not incrementing, the watchdog heartbeat timer for all CPUs stops and creates false positives and confusing warnings on local CPUs, and also causes the SMP detector to stop, so the root cause is never detected. Fix this by using hrtimer based timers for the watchdog heartbeat, like the generic kernel hardlockup detector. Cc: Gautham R. Shenoy <[email protected]> Reported-by: Ravikumar Bangoria <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> Tested-by: Ravi Bangoria <[email protected]> Reported-by: Ravi Bangoria <[email protected]> Reviewed-by: Gautham R. Shenoy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2019-04-29riscv: vdso: drop unnecessary cc-ldoptionNick Desaulniers1-1/+1
Towards the goal of removing cc-ldoption, it seems that --hash-style= was added to binutils 2.17.50.0.2 in 2006. The minimal required version of binutils for the kernel according to Documentation/process/changes.rst is 2.20. Link: https://gcc.gnu.org/ml/gcc/2007-01/msg01141.html Cc: [email protected] Suggested-by: Masahiro Yamada <[email protected]> Signed-off-by: Nick Desaulniers <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Palmer Dabbelt <[email protected]>
2019-04-29function_graph: Place ftrace_graph_entry_stub() prototype in ↵Steven Rostedt (VMware)2-2/+0
include/linux/ftrace.h ftrace_graph_entry_stub() is defined in generic code, its prototype should be in the generic header and not defined throughout architecture specific code in order to use it. Cc: Greentime Hu <[email protected]> Cc: Vincent Chen <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: [email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-04-29powerpc/powernv/npu: Use pci_dev_id() helperHeiner Kallweit1-8/+6
Use new helper pci_dev_id() to simplify the code. Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]>
2019-04-29y2038: Make CONFIG_64BIT_TIME unconditionalArnd Bergmann1-1/+1
As Stepan Golosunov points out, there is a small mistake in the get_timespec64() function in the kernel. It was originally added under the assumption that CONFIG_64BIT_TIME would get enabled on all 32-bit and 64-bit architectures, but when the conversion was done, it was only turned on for 32-bit ones. The effect is that the get_timespec64() function never clears the upper half of the tv_nsec field for 32-bit tasks in compat mode. Clearing this is required for POSIX compliant behavior of functions that pass a 'timespec' structure with a 64-bit tv_sec and a 32-bit tv_nsec, plus uninitialized padding. The easiest fix for linux-5.1 is to just make the Kconfig symbol unconditional, as it was originally intended. As a follow-up, the #ifdef CONFIG_64BIT_TIME can be removed completely.. Note: for native 32-bit mode, no change is needed, this works as designed and user space should never need to clear the upper 32 bits of the tv_nsec field, in or out of the kernel. Fixes: 00bf25d693e7 ("y2038: use time32 syscall names on 32-bit") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Joseph Myers <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Deepa Dinamani <[email protected]> Cc: Lukasz Majewski <[email protected]> Cc: Stepan Golosunov <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected]
2019-04-29Merge tag 'bitmain-soc-5.2' of ↵Olof Johansson2-0/+211
git://git.kernel.org/pub/scm/linux/kernel/git/mani/linux-bitmain into arm/dt Bitmain SoC changes for v5.2: - Added GPIO support for BM1880 SoC based on Designware APB GPIO controller - Added GPIO line names for Sophon Edge board based on 96Boards CE specification for accessing GPIOs using line names from userspace tools like MRAA. - Added pinctrl node for BM1880 SoC as a child node of sctrl syscon node. - Added pinctrl support to UARTs exposed on the Sophon Edge board. * tag 'bitmain-soc-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/linux-bitmain: arm64: dts: bitmain: Add UART pinctrl support for Sophon Edge arm64: dts: bitmain: Add pinctrl support for BM1880 SoC arm64: dts: bitmain: Add GPIO Line names for Sophon Edge board arm64: dts: bitmain: Add GPIO support for BM1880 SoC Signed-off-by: Olof Johansson <[email protected]>
2019-04-29Merge tag 'v5.2-rockchip-defconfig32-1' of ↵Olof Johansson1-2/+13
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/defconfig Enable more options needed by Veyron Chromebooks. * tag 'v5.2-rockchip-defconfig32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: ARM: multi_v7_defconfig: Enable missing drivers for supported Chromebooks Signed-off-by: Olof Johansson <[email protected]>
2019-04-29Merge tag 'v5.2-rockchip-soc32-1' of ↵Olof Johansson2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/soc Missing of_node_put and some added __init contants. * tag 'v5.2-rockchip-soc32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: ARM: rockchip: add missing of_node_put in rockchip_smp_prepare_pmu ARM: rockchip: Mark pm-init functions __init Signed-off-by: Olof Johansson <[email protected]>
2019-04-29x86: make ZERO_PAGE() at least parse its argumentLinus Torvalds1-1/+1
This doesn't really do anything, but at least we now parse teh ZERO_PAGE() address argument so that we'll catch the most obvious errors in usage next time they'll happen. See commit 6a5c5d26c4c6 ("rdma: fix build errors on s390 and MIPS due to bad ZERO_PAGE use") what happens when we don't have any use of the macro argument at all. Signed-off-by: Linus Torvalds <[email protected]>
2019-04-29Merge tag 'mvebu-arm64-5.2-1' of git://git.infradead.org/linux-mvebu into ↵Olof Johansson1-0/+1
arm/defconfig mvebu arm64 for 5.2 (part 1) - Update the defconfig to enable the mv-xor driver found on the Armada 3700 * tag 'mvebu-arm64-5.2-1' of git://git.infradead.org/linux-mvebu: arm64: defconfig: enable mv-xor driver Signed-off-by: Olof Johansson <[email protected]>
2019-04-29Merge tag 'qcom-defconfig-for-5.2' of ↵Olof Johansson1-1/+12
git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into arm/defconfig Qualcomm ARM Based defconfig Updates for v5.2 * Enable options for LG Nexus 5 phone * tag 'qcom-defconfig-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux: ARM: qcom_defconfig: add options for LG Nexus 5 phone Signed-off-by: Olof Johansson <[email protected]>
2019-04-29Merge tag 'imx-dt64-5.2' of ↵Olof Johansson21-13/+2725
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/dt i.MX arm64 device tree update for 5.2: - Add initial i.MX8MM SoC and EVK board support. - Enable OPP table for cpufreq support on i.MX8MQ, i.MX8QXP and i.MX8MM. - A series from Andrey Smirnov to enable PCIe support for i.MX8MQ. - Add TMU (Thermal Management Unit) device on i.MX8MQ for managing thermal of CPU, GPU, and VPU. - Add SDMA and SAI2 devices for i.MX8MQ SoC and enable wm8524 audio support on EVK board. - Add LPUART, OCOTP and GPU devices for i.MX8MQ SoC. - Add initial i.MX8MQ based Zii Ultra board support - Add SCU general IRQ and watchdog support for i.MX8QXP. - Add audio related devices and PMU for LS1028A. - Enable SATA and cpuidle support for LX2160A. - Other small random updates. * tag 'imx-dt64-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: (41 commits) arm64: dts: lx2160a: add cpu idle support arm64: dts: imx8mq: fix GPU clock frequency arm64: dts: fsl: imx8mq-evk: link regulator to GPU domain arm64: dts: imx8mm: Add cpufreq properties arm64: dts: imx8qxp-mek: Add i2c1 with pca9646 arm64: dts: imx8qxp: enable scu general irq channel arm64: dts: imx8mq: add GPU node arm64: dts: imx: add Zii Ultra board support arm64: dts: imx8mq: fix higher CPU operating point arm64: dts: imx8mq-evk: Enable PCIE0 interface arm64: dts: imx8mq: Add nodes for PCIe IP blocks arm64: dts: imx8mq: Combine PCIE power domains arm64: dts: imx8mq: Add a node for SRC IP block arm64: dts: imx8mq: Mark iomuxc_gpr as i.MX6Q compatible arm64: dts: imx8qxp: Add lpuart1/lpuart2/lpuart3 nodes arm64: dts: lx2160a: add sata node support arm64: dts: ls1028a: Corrected the SATA ecc address arm64: dts: imx8mq: Change ahb clock for imx8mq arm64: dts: imx8mq: Fix the fsl,imx8mq-sdma compatible string arm64: dts: imx8qxp: add system controller watchdog support ... Signed-off-by: Olof Johansson <[email protected]>
2019-04-29Merge tag 'imx-soc-5.2' of ↵Olof Johansson1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/soc i.MX SoC update for 5.2: - Optimize i.MX6 cpuidle driver a little bit by omitting the unnecessary unmask of GINT for WAIT_CLOCKED mode. * tag 'imx-soc-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: imx6: cpuidle: omit the unnecessary unmask of GINT Signed-off-by: Olof Johansson <[email protected]>
2019-04-29Merge tag 'lpc32xx-dt-for-5.2' of ↵Olof Johansson3-18/+24
https://github.com/vzapolskiy/linux-lpc32xx into arm/dt ARM: lpc32xx: devicetree updates for v5.2 Here are the changes for ARM NXP LPC32xx devicetree files: * disabled I2S and MAC controllers by default, * set default #address-cells = <1> / #size-cells = <0> for SPI slaves, * fix notation of hexadecimal values, * switched lpc32xx.dtsi to SPDX license identifier. * tag 'lpc32xx-dt-for-5.2' of https://github.com/vzapolskiy/linux-lpc32xx: ARM: dts: lpc32xx: use SPDX license identifier ARM: dts: lpc32xx: add address and size cell values to SPI controller nodes ARM: dts: lpc32xx: disable MAC controller by default ARM: dts: lpc32xx: disable I2S controllers by default ARM: dts: lpc32xx: change hexadecimal values to lower case Signed-off-by: Olof Johansson <[email protected]>
2019-04-29Merge tag 'lpc32xx-soc-for-5.2' of ↵Olof Johansson1-39/+3
https://github.com/vzapolskiy/linux-lpc32xx into arm/soc ARM: lpc32xx: platform updates for v5.2 Here are the changes for ARM NXP LPC32xx platform files: * removed TEST_CLK_SEL setup out of common clock framework control, * unnecessary header files are removed from inclusion, * registration of SSP0 and SSP1 is removed as done through device tree, * switched the main platform file to SPDX license identifier. * tag 'lpc32xx-soc-for-5.2' of https://github.com/vzapolskiy/linux-lpc32xx: ARM: lpc32xx: use SPDX license identifier ARM: lpc32xx: remove platform data of SSP0 and SSP1 controllers ARM: lpc32xx: remove redundant included headers ARM: lpc32xx: stop overwriting TEST_CLK_SEL Signed-off-by: Olof Johansson <[email protected]>
2019-04-29arm64: mmap: Ensure file offset is treated as unsignedBoyang Zhou1-1/+1
The file offset argument to the arm64 sys_mmap() implementation is scaled from bytes to pages by shifting right by PAGE_SHIFT. Unfortunately, the offset is passed in as a signed 'off_t' type and therefore large offsets (i.e. with the top bit set) are incorrectly sign-extended by the shift. This has been observed to cause false mmap() failures when mapping GPU doorbells on an arm64 server part. Change the type of the file offset argument to sys_mmap() from 'off_t' to 'unsigned long' so that the shifting scales the value as expected. Cc: <[email protected]> Signed-off-by: Boyang Zhou <[email protected]> [will: rewrote commit message] Signed-off-by: Will Deacon <[email protected]>
2019-04-29arm64: Kconfig: Tidy up errata workaround help textWill Deacon1-16/+13
The nature of silicon errata means that the Kconfig help text for our various software workarounds has been written by many different people. Along the way, we've accumulated typos and inconsistencies which make the options needlessly difficult to read. Fix up minor issues with the help text. Signed-off-by: Will Deacon <[email protected]>