Age | Commit message (Collapse) | Author | Files | Lines |
|
Now that we have the possibility of a XIVE or XICS-on-XIVE device being
released while the VM is still running, we need to be careful about
races and potential use-after-free bugs. Although the kvmppc_xive
struct is not freed, but kept around for re-use, the kvmppc_xive_vcpu
structs are freed, and they are used extensively in both the XIVE native
and XICS-on-XIVE code.
There are various ways in which XIVE code gets invoked:
- VCPU entry and exit, which do push and pull operations on the XIVE hardware
- one_reg get and set functions (vcpu->mutex is held)
- XICS hypercalls (but only inside guest execution, not from
kvmppc_pseries_do_hcall)
- device creation calls (kvm->lock is held)
- device callbacks - get/set attribute, mmap, pagefault, release/destroy
- set_mapped/clr_mapped calls (kvm->lock is held)
- connect_vcpu calls
- debugfs file read callbacks
Inside a device release function, we know that userspace cannot have an
open file descriptor referring to the device, nor can it have any mmapped
regions from the device. Therefore the device callbacks are excluded,
as are the connect_vcpu calls (since they need a fd for the device).
Further, since the caller holds the kvm->lock mutex, no other device
creation calls or set/clr_mapped calls can be executing concurrently.
To exclude VCPU execution and XICS hypercalls, we temporarily set
kvm->arch.mmu_ready to 0. This forces any VCPU task that is trying to
enter the guest to take the kvm->lock mutex, which is held by the caller
of the release function. Then, sending an IPI to all other CPUs forces
any VCPU currently executing in the guest to exit.
Finally, we take the vcpu->mutex for each VCPU around the process of
cleaning up and freeing its XIVE data structures, in order to exclude
any one_reg get/set calls.
To exclude the debugfs read callbacks, we just need to ensure that
debugfs_remove is called before freeing any data structures. Once it
returns we know that no CPU can be executing the callbacks (for our
kvmppc_xive instance).
Signed-off-by: Paul Mackerras <[email protected]>
|
|
When a P9 sPAPR VM boots, the CAS negotiation process determines which
interrupt mode to use (XICS legacy or XIVE native) and invokes a
machine reset to activate the chosen mode.
We introduce 'release' methods for the XICS-on-XIVE and the XIVE
native KVM devices which are called when the file descriptor of the
device is closed after the TIMA and ESB pages have been unmapped.
They perform the necessary cleanups : clear the vCPU interrupt
presenters that could be attached and then destroy the device. The
'release' methods replace the 'destroy' methods as 'destroy' is not
called anymore once 'release' is. Compatibility with older QEMU is
nevertheless maintained.
This is not considered as a safe operation as the vCPUs are still
running and could be referencing the KVM device through their
presenters. To protect the system from any breakage, the kvmppc_xive
objects representing both KVM devices are now stored in an array under
the VM. Allocation is performed on first usage and memory is freed
only when the VM exits.
[[email protected] - Moved freeing of xive structures to book3s.c,
put it under #ifdef CONFIG_KVM_XICS.]
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Full support for the XIVE native exploitation mode is now available,
advertise the capability KVM_CAP_PPC_IRQ_XIVE for guests running on
PowerNV KVM Hypervisors only. Support for nested guests (pseries KVM
Hypervisor) is not yet available. XIVE should also have been activated
which is default setting on POWER9 systems running a recent Linux
kernel.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
The KVM XICS-over-XIVE device and the proposed KVM XIVE native device
implement an IRQ space for the guest using the generic IPI interrupts
of the XIVE IC controller. These interrupts are allocated at the OPAL
level and "mapped" into the guest IRQ number space in the range 0-0x1FFF.
Interrupt management is performed in the XIVE way: using loads and
stores on the addresses of the XIVE IPI interrupt ESB pages.
Both KVM devices share the same internal structure caching information
on the interrupts, among which the xive_irq_data struct containing the
addresses of the IPI ESB pages and an extra one in case of pass-through.
The later contains the addresses of the ESB pages of the underlying HW
controller interrupts, PHB4 in all cases for now.
A guest, when running in the XICS legacy interrupt mode, lets the KVM
XICS-over-XIVE device "handle" interrupt management, that is to
perform the loads and stores on the addresses of the ESB pages of the
guest interrupts. However, when running in XIVE native exploitation
mode, the KVM XIVE native device exposes the interrupt ESB pages to
the guest and lets the guest perform directly the loads and stores.
The VMA exposing the ESB pages make use of a custom VM fault handler
which role is to populate the VMA with appropriate pages. When a fault
occurs, the guest IRQ number is deduced from the offset, and the ESB
pages of associated XIVE IPI interrupt are inserted in the VMA (using
the internal structure caching information on the interrupts).
Supporting device passthrough in the guest running in XIVE native
exploitation mode adds some extra refinements because the ESB pages
of a different HW controller (PHB4) need to be exposed to the guest
along with the initial IPI ESB pages of the XIVE IC controller. But
the overall mechanic is the same.
When the device HW irqs are mapped into or unmapped from the guest
IRQ number space, the passthru_irq helpers, kvmppc_xive_set_mapped()
and kvmppc_xive_clr_mapped(), are called to record or clear the
passthrough interrupt information and to perform the switch.
The approach taken by this patch is to clear the ESB pages of the
guest IRQ number being mapped and let the VM fault handler repopulate.
The handler will insert the ESB page corresponding to the HW interrupt
of the device being passed-through or the initial IPI ESB page if the
device is being removed.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Each source is associated with an Event State Buffer (ESB) with a
even/odd pair of pages which provides commands to manage the source:
to trigger, to EOI, to turn off the source for instance.
The custom VM fault handler will deduce the guest IRQ number from the
offset of the fault, and the ESB page of the associated XIVE interrupt
will be inserted into the VMA using the internal structure caching
information on the interrupts.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Each thread has an associated Thread Interrupt Management context
composed of a set of registers. These registers let the thread handle
priority management and interrupt acknowledgment. The most important
are :
- Interrupt Pending Buffer (IPB)
- Current Processor Priority (CPPR)
- Notification Source Register (NSR)
They are exposed to software in four different pages each proposing a
view with a different privilege. The first page is for the physical
thread context and the second for the hypervisor. Only the third
(operating system) and the fourth (user level) are exposed the guest.
A custom VM fault handler will populate the VMA with the appropriate
pages, which should only be the OS page for now.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
The state of the thread interrupt management registers needs to be
collected for migration. These registers are cached under the
'xive_saved_state.w01' field of the VCPU when the VPCU context is
pulled from the HW thread. An OPAL call retrieves the backup of the
IPB register in the underlying XIVE NVT structure and merges it in the
KVM state.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
When migration of a VM is initiated, a first copy of the RAM is
transferred to the destination before the VM is stopped, but there is
no guarantee that the EQ pages in which the event notifications are
queued have not been modified.
To make sure migration will capture a consistent memory state, the
XIVE device should perform a XIVE quiesce sequence to stop the flow of
event notifications and stabilize the EQs. This is the purpose of the
KVM_DEV_XIVE_EQ_SYNC control which will also marks the EQ pages dirty
to force their transfer.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
This control will be used by the H_INT_SYNC hcall from QEMU to flush
event notifications on the XIVE IC owning the source.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
This control is to be used by the H_INT_RESET hcall from QEMU. Its
purpose is to clear all configuration of the sources and EQs. This is
necessary in case of a kexec (for a kdump kernel for instance) to make
sure that no remaining configuration is left from the previous boot
setup so that the new kernel can start safely from a clean state.
The queue 7 is ignored when the XIVE device is configured to run in
single escalation mode. Prio 7 is used by escalations.
The XIVE VP is kept enabled as the vCPU is still active and connected
to the XIVE device.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
These controls will be used by the H_INT_SET_QUEUE_CONFIG and
H_INT_GET_QUEUE_CONFIG hcalls from QEMU to configure the underlying
Event Queue in the XIVE IC. They will also be used to restore the
configuration of the XIVE EQs and to capture the internal run-time
state of the EQs. Both 'get' and 'set' rely on an OPAL call to access
the EQ toggle bit and EQ index which are updated by the XIVE IC when
event notifications are enqueued in the EQ.
The value of the guest physical address of the event queue is saved in
the XIVE internal xive_q structure for later use. That is when
migration needs to mark the EQ pages dirty to capture a consistent
memory state of the VM.
To be noted that H_INT_SET_QUEUE_CONFIG does not require the extra
OPAL call setting the EQ toggle bit and EQ index to configure the EQ,
but restoring the EQ state will.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
This control will be used by the H_INT_SET_SOURCE_CONFIG hcall from
QEMU to configure the target of a source and also to restore the
configuration of a source when migrating the VM.
The XIVE source interrupt structure is extended with the value of the
Effective Interrupt Source Number. The EISN is the interrupt number
pushed in the event queue that the guest OS will use to dispatch
events internally. Caching the EISN value in KVM eases the test when
checking if a reconfiguration is indeed needed.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
The XIVE KVM device maintains a list of interrupt sources for the VM
which are allocated in the pool of generic interrupts (IPIs) of the
main XIVE IC controller. These are used for the CPU IPIs as well as
for virtual device interrupts. The IRQ number space is defined by
QEMU.
The XIVE device reuses the source structures of the XICS-on-XIVE
device for the source blocks (2-level tree) and for the source
interrupts. Under XIVE native, the source interrupt caches mostly
configuration information and is less used than under the XICS-on-XIVE
device in which hcalls are still necessary at run-time.
When a source is initialized in KVM, an IPI interrupt source is simply
allocated at the OPAL level and then MASKED. KVM only needs to know
about its type: LSI or MSI.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
The user interface exposes a new capability KVM_CAP_PPC_IRQ_XIVE to
let QEMU connect the vCPU presenters to the XIVE KVM device if
required. The capability is not advertised for now as the full support
for the XIVE native exploitation mode is not yet available. When this
is case, the capability will be advertised on PowerNV Hypervisors
only. Nested guests (pseries KVM Hypervisor) are not supported.
Internally, the interface to the new KVM device is protected with a
new interrupt mode: KVMPPC_IRQ_XIVE.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
This is the basic framework for the new KVM device supporting the XIVE
native exploitation mode. The user interface exposes a new KVM device
to be created by QEMU, only available when running on a L0 hypervisor.
Support for nested guests is not available yet.
The XIVE device reuses the device structure of the XICS-on-XIVE device
as they have a lot in common. That could possibly change in the future
if the need arise.
Signed-off-by: Cédric Le Goater <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
This merges in the ppc-kvm topic branch from the powerpc tree to get
patches which touch both general powerpc code and KVM code, one of
which is a prerequisite for following patches.
Signed-off-by: Paul Mackerras <[email protected]>
|
|
On POWER9 and later processors where the host can schedule vcpus on a
per thread basis, there is a streamlined entry path used when the guest
is radix. This entry path saves/restores the fp and vr state in
kvmhv_p9_guest_entry() by calling store_[fp/vr]_state() and
load_[fp/vr]_state(). This is the same as the old entry path however the
old entry path also saved/restored the VRSAVE register, which isn't done
in the new entry path.
This means that the vrsave register is now volatile across guest exit,
which is an incorrect change in behaviour.
Fix this by saving/restoring the vrsave register in kvmhv_p9_guest_entry().
This restores the old, correct, behaviour.
Fixes: 95a6432ce9038 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
When running on POWER9 with kvm_hv.indep_threads_mode = N and the host
in SMT1 mode, KVM will run guest VCPUs on offline secondary threads.
If those guests are in radix mode, we fail to load the LPID and flush
the TLB if necessary, leading to the guest crashing with an
unsupported MMU fault. This arises from commit 9a4506e11b97 ("KVM:
PPC: Book3S HV: Make radix handle process scoped LPID flush in C,
with relocation on", 2018-05-17), which didn't consider the case
where indep_threads_mode = N.
For simplicity, this makes the real-mode guest entry path flush the
TLB in the same place for both radix and hash guests, as we did before
9a4506e11b97, though the code is now C code rather than assembly code.
We also have the radix TLB flush open-coded rather than calling
radix__local_flush_tlb_lpid_guest(), because the TLB flush can be
called in real mode, and in real mode we don't want to invoke the
tracepoint code.
Fixes: 9a4506e11b97 ("KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on")
Signed-off-by: Paul Mackerras <[email protected]>
|
|
This replaces assembler code in book3s_hv_rmhandlers.S that checks
the kvm->arch.need_tlb_flush cpumask and optionally does a TLB flush
with C code in book3s_hv_builtin.c. Note that unlike the radix
version, the hash version doesn't do an explicit ERAT invalidation
because we will invalidate and load up the SLB before entering the
guest, and that will invalidate the ERAT.
Signed-off-by: Paul Mackerras <[email protected]>
|
|
The code in book3s_hv_rmhandlers.S that pushes the XIVE virtual CPU
context to the hardware currently assumes it is being called in real
mode, which is usually true. There is however a path by which it can
be executed in virtual mode, in the case where indep_threads_mode = N.
A virtual CPU executing on an offline secondary thread can take a
hypervisor interrupt in virtual mode and return from the
kvmppc_hv_entry() call after the kvm_secondary_got_guest label.
It is possible for it to be given another vcpu to execute before it
gets to execute the stop instruction. In that case it will call
kvmppc_hv_entry() for the second VCPU in virtual mode, and the XIVE
vCPU push code will be executed in virtual mode. The result in that
case will be a host crash due to an unexpected data storage interrupt
caused by executing the stdcix instruction in virtual mode.
This fixes it by adding a code path for virtual mode, which uses the
virtual TIMA pointer and normal load/store instructions.
[[email protected] - wrote patch description]
Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
This fixes a bug in the XICS emulation on POWER9 machines which is
triggered by the guest doing a H_IPI with priority = 0 (the highest
priority). What happens is that the notification interrupt arrives
at the destination at priority zero. The loop in scan_interrupts()
sees that a priority 0 interrupt is pending, but because xc->mfrr is
zero, we break out of the loop before taking the notification
interrupt out of the queue and EOI-ing it. (This doesn't happen
when xc->mfrr != 0; in that case we process the priority-0 notification
interrupt on the first iteration of the loop, and then break out of
a subsequent iteration of the loop with hirq == XICS_IPI.)
To fix this, we move the prio >= xc->mfrr check down to near the end
of the loop. However, there are then some other things that need to
be adjusted. Since we are potentially handling the notification
interrupt and also delivering an IPI to the guest in the same loop
iteration, we need to update pending and handle any q->pending_count
value before the xc->mfrr check, rather than at the end of the loop.
Also, we need to update the queue pointers when we have processed and
EOI-ed the notification interrupt, since we may not do it later.
Signed-off-by: Paul Mackerras <[email protected]>
|
|
The kvm_vcpu_pmu_{read,write}_evtype_direct functions do not handle
the cycle counter use-case, this leads to inaccurate counts and a
WARN message when using perf with the cycle counter (-e cycle).
Let's fix this by adding a use case for pmccfiltr_el0.
Fixes: 39e3406a090a ("arm64: KVM: Avoid isb's by using direct pmxevtyper sysreg")
Reported-by: Suzuki K Poulose <[email protected]>
Signed-off-by: Andrew Murray <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
|
|
The return statement is unnecessary here - so drop it.
Signed-off-by: Nicholas Mc Guire <[email protected]>
Signed-off-by: Gregory CLEMENT <[email protected]>
|
|
In every other instance where mrc is used the coprocessor operand
is prefix with p (e.g. p15). Use the p prefix in this case too.
This fixes a build issue when using LLVM's integrated assembler:
arch/arm/mach-mvebu/coherency_ll.S:69:6: error: invalid operand for instruction
mrc 15, 0, r3, cr0, cr0, 5
^
arch/arm/mach-mvebu/pmsu_ll.S:19:6: error: invalid operand for instruction
mrc 15, 0, r0, cr0, cr0, 5 @ get the CPU ID
^
Signed-off-by: Stefan Agner <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
Signed-off-by: Gregory CLEMENT <[email protected]>
|
|
The label mvebu_boot_wa_start is not necessary and causes a build
issue when building with LLVM's integrated assembler:
AS arch/arm/mach-mvebu/pmsu_ll.o
arch/arm/mach-mvebu/pmsu_ll.S:59:1: error: invalid symbol redefinition
mvebu_boot_wa_start:
^
Drop the label.
Signed-off-by: Stefan Agner <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
Signed-off-by: Gregory CLEMENT <[email protected]>
|
|
The call to of_get_next_child returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./arch/arm/mach-mvebu/pm-board.c:135:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 88, but without a corresponding object release within this functio
Signed-off-by: Wen Yang <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Gregory Clement <[email protected]>
Cc: Sebastian Hesselbarth <[email protected]>
Cc: Russell King <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Gregory CLEMENT <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
I made the same typo when trying to grep for uses of smp_wmb and figured
I might as well fix it.
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
We already allocate hardware TCE tables in multiple levels and skip
intermediate levels when we can, now it is a turn of the KVM TCE tables.
Thankfully these are allocated already in 2 levels.
This moves the table's last level allocation from the creating helper to
kvmppc_tce_put() and kvm_spapr_tce_fault(). Since such allocation cannot
be done in real mode, this creates a virtual mode version of
kvmppc_tce_put() which handles allocations.
This adds kvmppc_rm_ioba_validate() to do an additional test if
the consequent kvmppc_tce_put() needs a page which has not been allocated;
if this is the case, we bail out to virtual mode handlers.
The allocations are protected by a new mutex as kvm->lock is not suitable
for the task because the fault handler is called with the mmap_sem held
but kvmhv_setup_mmu() locks kvm->lock and mmap_sem in the reverse order.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
The kvmppc_tce_to_ua() helper is called from real and virtual modes
and it works fine as long as CONFIG_DEBUG_LOCKDEP is not enabled.
However if the lockdep debugging is on, the lockdep will most likely break
in kvm_memslots() because of srcu_dereference_check() so we need to use
PPC-own kvm_memslots_raw() which uses realmode safe
rcu_dereference_raw_notrace().
This creates a realmode copy of kvmppc_tce_to_ua() which replaces
kvm_memslots() with kvm_memslots_raw().
Since kvmppc_rm_tce_to_ua() becomes static and can only be used inside
HV KVM, this moves it earlier under CONFIG_KVM_BOOK3S_HV_POSSIBLE.
This moves truly virtual-mode kvmppc_tce_to_ua() to where it belongs and
drops the prmap parameter which was never used in the virtual mode.
Fixes: d3695aa4f452 ("KVM: PPC: Add support for multiple-TCE hcalls", 2016-02-15)
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
The trace_hardirqs_on() sets current->hardirqs_enabled and from here
the lockdep assumes interrupts are enabled although they are remain
disabled until the context switches to the guest. Consequent
srcu_read_lock() checks the flags in rcu_lock_acquire(), observes
disabled interrupts and prints a warning (see below).
This moves trace_hardirqs_on/off closer to __kvmppc_vcore_entry to
prevent lockdep from being confused.
DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
WARNING: CPU: 16 PID: 8038 at kernel/locking/lockdep.c:4128 check_flags.part.25+0x224/0x280
[...]
NIP [c000000000185b84] check_flags.part.25+0x224/0x280
LR [c000000000185b80] check_flags.part.25+0x220/0x280
Call Trace:
[c000003fec253710] [c000000000185b80] check_flags.part.25+0x220/0x280 (unreliable)
[c000003fec253780] [c000000000187ea4] lock_acquire+0x94/0x260
[c000003fec253840] [c00800001a1e9768] kvmppc_run_core+0xa60/0x1ab0 [kvm_hv]
[c000003fec253a10] [c00800001a1ed944] kvmppc_vcpu_run_hv+0x73c/0xec0 [kvm_hv]
[c000003fec253ae0] [c00800001a1095dc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[c000003fec253b00] [c00800001a1056bc] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
[c000003fec253b90] [c00800001a0f3618] kvm_vcpu_ioctl+0x460/0x850 [kvm]
[c000003fec253d00] [c00000000041c4f4] do_vfs_ioctl+0xe4/0x930
[c000003fec253db0] [c00000000041ce04] ksys_ioctl+0xc4/0x110
[c000003fec253e00] [c00000000041ce78] sys_ioctl+0x28/0x80
[c000003fec253e20] [c00000000000b5a4] system_call+0x5c/0x70
Instruction dump:
419e0034 3d220004 39291730 81290000 2f890000 409e0020 3c82ffc6 3c62ffc5
3884be70 386329c0 4bf6ea71 60000000 <0fe00000> 3c62ffc6 3863be90 4801273d
irq event stamp: 1025
hardirqs last enabled at (1025): [<c00800001a1e9728>] kvmppc_run_core+0xa20/0x1ab0 [kvm_hv]
hardirqs last disabled at (1024): [<c00800001a1e9358>] kvmppc_run_core+0x650/0x1ab0 [kvm_hv]
softirqs last enabled at (0): [<c0000000000f1210>] copy_process.isra.4.part.5+0x5f0/0x1d00
softirqs last disabled at (0): [<0000000000000000>] (null)
---[ end trace 31180adcc848993e ]---
possible reason: unannotated irqs-off.
irq event stamp: 1025
hardirqs last enabled at (1025): [<c00800001a1e9728>] kvmppc_run_core+0xa20/0x1ab0 [kvm_hv]
hardirqs last disabled at (1024): [<c00800001a1e9358>] kvmppc_run_core+0x650/0x1ab0 [kvm_hv]
softirqs last enabled at (0): [<c0000000000f1210>] copy_process.isra.4.part.5+0x5f0/0x1d00
softirqs last disabled at (0): [<0000000000000000>] (null)
Fixes: 8b24e69fc47e ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry", 2017-06-26)
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Implement a real mode handler for the H_CALL H_PAGE_INIT which can be
used to zero or copy a guest page. The page is defined to be 4k and must
be 4k aligned.
The in-kernel real mode handler halves the time to handle this H_CALL
compared to handling it in userspace for a hash guest.
Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Implement a virtual mode handler for the H_CALL H_PAGE_INIT which can be
used to zero or copy a guest page. The page is defined to be 4k and must
be 4k aligned.
The in-kernel handler halves the time to handle this H_CALL compared to
handling it in userspace for a radix guest.
Signed-off-by: Suraj Jitindar Singh <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
|
|
Using a jiffies timer creates a dependency on the tick_do_timer_cpu
incrementing jiffies. If that CPU has locked up and jiffies is not
incrementing, the watchdog heartbeat timer for all CPUs stops and
creates false positives and confusing warnings on local CPUs, and
also causes the SMP detector to stop, so the root cause is never
detected.
Fix this by using hrtimer based timers for the watchdog heartbeat,
like the generic kernel hardlockup detector.
Cc: Gautham R. Shenoy <[email protected]>
Reported-by: Ravikumar Bangoria <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Tested-by: Ravi Bangoria <[email protected]>
Reported-by: Ravi Bangoria <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Towards the goal of removing cc-ldoption, it seems that --hash-style=
was added to binutils 2.17.50.0.2 in 2006. The minimal required version
of binutils for the kernel according to
Documentation/process/changes.rst is 2.20.
Link: https://gcc.gnu.org/ml/gcc/2007-01/msg01141.html
Cc: [email protected]
Suggested-by: Masahiro Yamada <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
include/linux/ftrace.h
ftrace_graph_entry_stub() is defined in generic code, its prototype should
be in the generic header and not defined throughout architecture specific
code in order to use it.
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: [email protected]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Use new helper pci_dev_id() to simplify the code.
Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
|
|
As Stepan Golosunov points out, there is a small mistake in the
get_timespec64() function in the kernel. It was originally added under the
assumption that CONFIG_64BIT_TIME would get enabled on all 32-bit and
64-bit architectures, but when the conversion was done, it was only turned
on for 32-bit ones.
The effect is that the get_timespec64() function never clears the upper
half of the tv_nsec field for 32-bit tasks in compat mode. Clearing this is
required for POSIX compliant behavior of functions that pass a 'timespec'
structure with a 64-bit tv_sec and a 32-bit tv_nsec, plus uninitialized
padding.
The easiest fix for linux-5.1 is to just make the Kconfig symbol
unconditional, as it was originally intended. As a follow-up, the #ifdef
CONFIG_64BIT_TIME can be removed completely..
Note: for native 32-bit mode, no change is needed, this works as
designed and user space should never need to clear the upper 32
bits of the tv_nsec field, in or out of the kernel.
Fixes: 00bf25d693e7 ("y2038: use time32 syscall names on 32-bit")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Joseph Myers <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Deepa Dinamani <[email protected]>
Cc: Lukasz Majewski <[email protected]>
Cc: Stepan Golosunov <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mani/linux-bitmain into arm/dt
Bitmain SoC changes for v5.2:
- Added GPIO support for BM1880 SoC based on Designware APB GPIO
controller
- Added GPIO line names for Sophon Edge board based on 96Boards CE
specification for accessing GPIOs using line names from userspace
tools like MRAA.
- Added pinctrl node for BM1880 SoC as a child node of sctrl syscon
node.
- Added pinctrl support to UARTs exposed on the Sophon Edge board.
* tag 'bitmain-soc-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/linux-bitmain:
arm64: dts: bitmain: Add UART pinctrl support for Sophon Edge
arm64: dts: bitmain: Add pinctrl support for BM1880 SoC
arm64: dts: bitmain: Add GPIO Line names for Sophon Edge board
arm64: dts: bitmain: Add GPIO support for BM1880 SoC
Signed-off-by: Olof Johansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/defconfig
Enable more options needed by Veyron Chromebooks.
* tag 'v5.2-rockchip-defconfig32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: multi_v7_defconfig: Enable missing drivers for supported Chromebooks
Signed-off-by: Olof Johansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/soc
Missing of_node_put and some added __init contants.
* tag 'v5.2-rockchip-soc32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: rockchip: add missing of_node_put in rockchip_smp_prepare_pmu
ARM: rockchip: Mark pm-init functions __init
Signed-off-by: Olof Johansson <[email protected]>
|
|
This doesn't really do anything, but at least we now parse teh
ZERO_PAGE() address argument so that we'll catch the most obvious errors
in usage next time they'll happen.
See commit 6a5c5d26c4c6 ("rdma: fix build errors on s390 and MIPS due to
bad ZERO_PAGE use") what happens when we don't have any use of the macro
argument at all.
Signed-off-by: Linus Torvalds <[email protected]>
|
|
arm/defconfig
mvebu arm64 for 5.2 (part 1)
- Update the defconfig to enable the mv-xor driver found on the
Armada 3700
* tag 'mvebu-arm64-5.2-1' of git://git.infradead.org/linux-mvebu:
arm64: defconfig: enable mv-xor driver
Signed-off-by: Olof Johansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into arm/defconfig
Qualcomm ARM Based defconfig Updates for v5.2
* Enable options for LG Nexus 5 phone
* tag 'qcom-defconfig-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux:
ARM: qcom_defconfig: add options for LG Nexus 5 phone
Signed-off-by: Olof Johansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/dt
i.MX arm64 device tree update for 5.2:
- Add initial i.MX8MM SoC and EVK board support.
- Enable OPP table for cpufreq support on i.MX8MQ, i.MX8QXP and
i.MX8MM.
- A series from Andrey Smirnov to enable PCIe support for i.MX8MQ.
- Add TMU (Thermal Management Unit) device on i.MX8MQ for managing
thermal of CPU, GPU, and VPU.
- Add SDMA and SAI2 devices for i.MX8MQ SoC and enable wm8524 audio
support on EVK board.
- Add LPUART, OCOTP and GPU devices for i.MX8MQ SoC.
- Add initial i.MX8MQ based Zii Ultra board support
- Add SCU general IRQ and watchdog support for i.MX8QXP.
- Add audio related devices and PMU for LS1028A.
- Enable SATA and cpuidle support for LX2160A.
- Other small random updates.
* tag 'imx-dt64-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: (41 commits)
arm64: dts: lx2160a: add cpu idle support
arm64: dts: imx8mq: fix GPU clock frequency
arm64: dts: fsl: imx8mq-evk: link regulator to GPU domain
arm64: dts: imx8mm: Add cpufreq properties
arm64: dts: imx8qxp-mek: Add i2c1 with pca9646
arm64: dts: imx8qxp: enable scu general irq channel
arm64: dts: imx8mq: add GPU node
arm64: dts: imx: add Zii Ultra board support
arm64: dts: imx8mq: fix higher CPU operating point
arm64: dts: imx8mq-evk: Enable PCIE0 interface
arm64: dts: imx8mq: Add nodes for PCIe IP blocks
arm64: dts: imx8mq: Combine PCIE power domains
arm64: dts: imx8mq: Add a node for SRC IP block
arm64: dts: imx8mq: Mark iomuxc_gpr as i.MX6Q compatible
arm64: dts: imx8qxp: Add lpuart1/lpuart2/lpuart3 nodes
arm64: dts: lx2160a: add sata node support
arm64: dts: ls1028a: Corrected the SATA ecc address
arm64: dts: imx8mq: Change ahb clock for imx8mq
arm64: dts: imx8mq: Fix the fsl,imx8mq-sdma compatible string
arm64: dts: imx8qxp: add system controller watchdog support
...
Signed-off-by: Olof Johansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/soc
i.MX SoC update for 5.2:
- Optimize i.MX6 cpuidle driver a little bit by omitting the
unnecessary unmask of GINT for WAIT_CLOCKED mode.
* tag 'imx-soc-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: imx6: cpuidle: omit the unnecessary unmask of GINT
Signed-off-by: Olof Johansson <[email protected]>
|
|
https://github.com/vzapolskiy/linux-lpc32xx into arm/dt
ARM: lpc32xx: devicetree updates for v5.2
Here are the changes for ARM NXP LPC32xx devicetree files:
* disabled I2S and MAC controllers by default,
* set default #address-cells = <1> / #size-cells = <0> for SPI slaves,
* fix notation of hexadecimal values,
* switched lpc32xx.dtsi to SPDX license identifier.
* tag 'lpc32xx-dt-for-5.2' of https://github.com/vzapolskiy/linux-lpc32xx:
ARM: dts: lpc32xx: use SPDX license identifier
ARM: dts: lpc32xx: add address and size cell values to SPI controller nodes
ARM: dts: lpc32xx: disable MAC controller by default
ARM: dts: lpc32xx: disable I2S controllers by default
ARM: dts: lpc32xx: change hexadecimal values to lower case
Signed-off-by: Olof Johansson <[email protected]>
|
|
https://github.com/vzapolskiy/linux-lpc32xx into arm/soc
ARM: lpc32xx: platform updates for v5.2
Here are the changes for ARM NXP LPC32xx platform files:
* removed TEST_CLK_SEL setup out of common clock framework control,
* unnecessary header files are removed from inclusion,
* registration of SSP0 and SSP1 is removed as done through device tree,
* switched the main platform file to SPDX license identifier.
* tag 'lpc32xx-soc-for-5.2' of https://github.com/vzapolskiy/linux-lpc32xx:
ARM: lpc32xx: use SPDX license identifier
ARM: lpc32xx: remove platform data of SSP0 and SSP1 controllers
ARM: lpc32xx: remove redundant included headers
ARM: lpc32xx: stop overwriting TEST_CLK_SEL
Signed-off-by: Olof Johansson <[email protected]>
|
|
The file offset argument to the arm64 sys_mmap() implementation is
scaled from bytes to pages by shifting right by PAGE_SHIFT.
Unfortunately, the offset is passed in as a signed 'off_t' type and
therefore large offsets (i.e. with the top bit set) are incorrectly
sign-extended by the shift. This has been observed to cause false mmap()
failures when mapping GPU doorbells on an arm64 server part.
Change the type of the file offset argument to sys_mmap() from 'off_t'
to 'unsigned long' so that the shifting scales the value as expected.
Cc: <[email protected]>
Signed-off-by: Boyang Zhou <[email protected]>
[will: rewrote commit message]
Signed-off-by: Will Deacon <[email protected]>
|
|
The nature of silicon errata means that the Kconfig help text for our
various software workarounds has been written by many different people.
Along the way, we've accumulated typos and inconsistencies which make
the options needlessly difficult to read.
Fix up minor issues with the help text.
Signed-off-by: Will Deacon <[email protected]>
|