path: root/arch
Age    Commit message    Author    Files    Lines
2016-03-09Merge branch 'ib-mfd-regulator-gpio-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into develLinus Walleij82-442/+598
2016-03-09arm64: KVM: vgic-v3: Only wipe LRs on vcpu exitMarc Zyngier1-5/+4
So far, we're always writing all possible LRs, setting the empty ones with a zero value. This is obviously doing a lot of work for nothing, and we're better off clearing only those we've actually dirtied on the exit path (it is very rare to inject more than one interrupt at a time anyway). Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
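To illustrate the approach described above, here is a minimal sketch; the struct, field and accessor names are assumptions for the example, not the actual vgic-v3 code.

    #include <linux/types.h>

    /* Illustrative state only; the real vgic-v3 structures differ. */
    struct lr_state_sketch {
            u64 vgic_lr[16];
            int used_lrs;           /* LRs actually programmed on guest entry */
    };

    /* On exit, save and wipe only the LRs we dirtied. */
    static void save_and_wipe_used_lrs(struct lr_state_sketch *s,
                                       u64 (*read_lr)(int),
                                       void (*write_lr)(int, u64))
    {
            int i;

            for (i = 0; i < s->used_lrs; i++) {
                    s->vgic_lr[i] = read_lr(i);
                    write_lr(i, 0);         /* clear only what we used */
            }
            /* LRs beyond used_lrs were never written, so nothing to clear. */
    }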
2016-03-09arm64: KVM: vgic-v3: Reset LRs at boot timeMarc Zyngier2-0/+10
In order to let the GICv3 code be more lazy in the way it accesses the LRs, it is necessary to start with a clean slate. Let's reset the LRs on each CPU when the vgic is probed (which includes a round trip to EL2...). Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2016-03-09arm64: KVM: vgic-v3: Do not save an LR known to be emptyMarc Zyngier1-2/+9
On exit, any empty LR will be signaled in ICH_ELRSR_EL2, which means that we do not have to save it and can just clear its state in the in-memory copy. Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2016-03-09arm64: KVM: vgic-v3: Save maintenance interrupt state only if requiredMarc Zyngier1-2/+31
Next on our list of useless accesses are the maintenance interrupt status registers (ICH_MISR_EL2, ICH_EISR_EL2). It is pointless to save them if we haven't asked for a maintenance interrupt in the first place, which can only happen for two reasons:
- Underflow: ICH_HCR_UIE will be set,
- EOI: ICH_LR_EOI will be set.
These conditions can be checked on the in-memory copies of the regs. Should either of these two conditions be true, we must read ICH_MISR_EL2. We can then check for ICH_MISR_EOI, and only when it is set read ICH_EISR_EL2. This means that in most cases, we don't have to save them at all. Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
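A hedged sketch of that decision logic follows; only the ICH_* bit names come from the GICv3 definitions, the in-memory copy layout and register accessors are assumed for illustration.

    #include <linux/irqchip/arm-gic-v3.h>
    #include <linux/types.h>

    /* Sketch: decide whether the maintenance interrupt state must be saved. */
    static void save_maint_state_sketch(u64 hcr, const u64 *lr, int used_lrs,
                                        u64 (*read_misr)(void),
                                        u64 (*read_eisr)(void),
                                        u64 *misr, u64 *eisr)
    {
            bool expect_mi = hcr & ICH_HCR_UIE;     /* underflow requested? */
            int i;

            for (i = 0; !expect_mi && i < used_lrs; i++)
                    expect_mi = lr[i] & ICH_LR_EOI; /* EOI maintenance requested? */

            if (!expect_mi)
                    return;                         /* nothing asked for, skip the reads */

            *misr = read_misr();                    /* ICH_MISR_EL2 */
            if (*misr & ICH_MISR_EOI)
                    *eisr = read_eisr();            /* ICH_EISR_EL2, only when needed */
    }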
2016-03-09arm64: KVM: vgic-v3: Avoid accessing ICH registersMarc Zyngier1-113/+180
Just like on GICv2, we're a bit hammer-happy with GICv3, and access its registers more often than we should. Adopt a policy similar to what we do for GICv2, only saving/restoring the minimal set of registers. As we don't access the registers linearly anymore (we may skip some), the convoluted accessors become slightly simpler, and we can drop the ugly indexing macro that tended to confuse the reviewers. Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2016-03-09powerpc: New possible return value from hcallChristophe Lombard1-0/+1
The hcalls introduced for cxl can return a new value: H_STATE (invalid state). Co-authored-by: Frederic Barrat <[email protected]> Signed-off-by: Frederic Barrat <[email protected]> Signed-off-by: Christophe Lombard <[email protected]> Reviewed-by: Manoj Kumar <[email protected]> Acked-by: Ian Munsie <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-09powerpc/eeh: eeh_pci_enable(): fix checking of post-request stateAndrew Donnellan1-1/+1
In eeh_pci_enable(), after making the request to set the new options, we call eeh_ops->wait_state() to check that the request finished successfully. At the moment, if eeh_ops->wait_state() returns 0, we return 0 without checking that it reflects the expected outcome. This can lead to callers further up the chain incorrectly assuming the slot has been successfully unfrozen and continuing to attempt recovery. On powernv, this will occur if pnv_eeh_get_pe_state() or pnv_eeh_get_phb_state() return 0, which in turn occurs if the relevant OPAL call returns OPAL_EEH_STOPPED_MMIO_DMA_FREEZE or OPAL_EEH_PHB_ERROR respectively. On pseries, this will occur if pseries_eeh_get_state() returns 0, which in turn occurs if RTAS reports that the PE is in the MMIO Stopped and DMA Stopped states. Obviously, none of these cases represent a successful completion of a request to thaw MMIO or DMA. Fix the check so that a wait_state() return value of 0 won't be considered successful for the EEH_OPT_THAW_MMIO or EEH_OPT_THAW_DMA cases. Signed-off-by: Andrew Donnellan <[email protected]> Acked-by: Gavin Shan <[email protected]> Reviewed-by: Daniel Axtens <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
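A simplified sketch of the tightened check, assuming the EEH_OPT_ and EEH_STATE_ flag names from the powerpc EEH headers; the surrounding function and its arguments are illustrative, not the actual eeh_pci_enable() body.

    #include <linux/errno.h>
    #include <asm/eeh.h>

    /* Sketch of the tightened check; not the actual eeh_pci_enable() body. */
    static int wait_and_check_sketch(struct eeh_pe *pe, int function, int timeout)
    {
            int rc = eeh_ops->wait_state(pe, timeout);

            if (rc < 0)
                    return rc;

            /* A state of 0 must no longer count as success for thaw requests. */
            if ((function == EEH_OPT_THAW_MMIO && !(rc & EEH_STATE_MMIO_ENABLED)) ||
                (function == EEH_OPT_THAW_DMA  && !(rc & EEH_STATE_DMA_ENABLED)))
                    return -EIO;

            return 0;
    }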
2016-03-09powerpc/eeh: Remove duplicated check in eeh_dump_pe_log()Gavin Shan1-7/+0
eeh_dump_pe_log() is only called by eeh_slot_error_detail(), which already checks that the PE isn't in the PCI config blocked state, so we don't need the duplicated check in eeh_dump_pe_log(). This removes the duplicated check from eeh_dump_pe_log(). No change in logic is introduced. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Andrew Donnellan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-09powerpc/eeh: Synchronize recovery in host/guestGavin Shan1-0/+11
When passing SRIOV VFs through to a guest, we may encounter an EEH error on the PF. In this case, the VF PEs are put into the frozen state. The error could be reported to the guest before it's captured by the host, which means the guest could attempt to recover errors on the VFs before the host gets a chance to recover errors on the PF; the VFs then won't be recovered successfully. This enforces the recovery order for the above case: the recovery of a child PE in the guest is held until the recovery of its parent PE in the host is completed. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Russell Currey <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-09powerpc/eeh: Don't remove passed VFsGavin Shan1-0/+3
When we have a partial hotplug as part of the error recovery on the PF, the VFs that are bound to the vfio-pci driver would experience a hotplug as well. That's not allowed. This checks whether the VF PE has been passed through to a guest; if it has, we leave the VF in place without removing it. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Russell Currey <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-09powerpc/eeh: Don't propagate error to guestGavin Shan1-5/+5
When an EEH error happens on the parent PE of PEs that have been passed through to a guest, the error is propagated to the guest domain and the VFIO driver's error handlers are called. That's not correct, as an error in the host domain shouldn't be propagated to guests and affect them. This adds one more limitation when calling EEH error handlers: if the PE has been passed through to a guest, the error handlers won't be called. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Russell Currey <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-09powerpc/eeh: Support error recovery for VF PEWei Yang6-26/+127
PFs are enumerated on the PCI bus, while VFs are created by the PF's driver. EEH recovery has two cases:
1. The device and driver are EEH aware: the error handlers are called.
2. The device and driver are not EEH aware: un-plug the device and plug it again by re-enumerating it.
The special handling is needed for the second case. For a PF we can use the regular PCI core to enumerate the bus, while for VFs we need to record which VFs were un-plugged and then plug them again. The patch also caches the VF index in pci_dn, which can be used to calculate the VF's bus, device and function number. That information helps to locate the VF's PCI device instance when doing hotplug during EEH recovery, if necessary. Signed-off-by: Wei Yang <[email protected]> Acked-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
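For illustration, this is how a cached VF index can be mapped back to the VF's bus and devfn using the standard SR-IOV routing-ID arithmetic; the helper names are made up for the sketch (the PCI core provides pci_iov_virtfn_bus()/pci_iov_virtfn_devfn() for the same purpose).

    #include <linux/pci.h>

    /* Sketch: SR-IOV routing-ID arithmetic from a cached VF index.
     * 'offset' and 'stride' are First VF Offset and VF Stride from the
     * PF's SR-IOV capability. */
    static int vf_bus_sketch(struct pci_dev *pf, int vf_index, u16 offset, u16 stride)
    {
            return pf->bus->number +
                   ((pf->devfn + offset + stride * vf_index) >> 8);
    }

    static int vf_devfn_sketch(struct pci_dev *pf, int vf_index, u16 offset, u16 stride)
    {
            return (pf->devfn + offset + stride * vf_index) & 0xff;
    }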
2016-03-09powerpc/powernv: Support PCI config restore for VFsWei Yang2-3/+93
After a PE reset, the OPAL API opal_pci_reinit() is called on all devices contained in the PE to reinitialize them. However, skiboot is not aware of VFs, so we have to implement the equivalent in the kernel to reinitialize VFs after a reset of their PE. In this patch, two functions, pnv_pci_fixup_vf_mps() and pnv_eeh_restore_vf_config(), both manipulate the MPS of the VF, since there are three cases for a VF:
1. Normal creation of a VF: pnv_pci_fixup_vf_mps() is called to set the MPS to a proper value compared with its parent.
2. EEH recovery without the VF being removed: the MPS is stored in pci_dn and pnv_eeh_restore_vf_config() is called to restore it and reinitialize the other parts.
3. EEH recovery with the VF removed: the VF is removed and then re-created, so both functions are called. First pnv_pci_fixup_vf_mps() stores the proper MPS in pci_dn, then pnv_eeh_restore_vf_config() restores it.
This introduces two functions: pnv_pci_fixup_vf_mps(), which fixes up the VF's MPS to make sure it is equal to its parent's and stores the value in pci_dn for future use, and pnv_eeh_restore_vf_config(), which re-initializes the VF by restoring the MPS, disabling the completion timeout, enabling SERR, etc. Signed-off-by: Wei Yang <[email protected]> Acked-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
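A rough sketch of the first step (clamping the VF's MPS to its parent's), using the generic pcie_get_mps()/pcie_set_mps() accessors; the caching into pci_dn is only hinted at in a comment, and the function name is hypothetical.

    #include <linux/pci.h>

    /* Sketch of step 1: make the VF's MPS match its parent and remember it. */
    static void fixup_vf_mps_sketch(struct pci_dev *vf)
    {
            struct pci_dev *parent = vf->bus->self;
            int mps = pcie_get_mps(parent);         /* parent's Max Payload Size */

            pcie_set_mps(vf, mps);                  /* clamp the VF to match */
            /* ...then cache 'mps' per-VF (pci_dn in the patch) so it can be
             * written back by the restore step after an EEH reset. */
    }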
2016-03-09powerpc/powernv: Support EEH reset for VF PEWei Yang3-4/+133
PEs for VFs don't have a primary bus, so they have to have their own reset backend, which is used during EEH recovery. The patch implements the reset backend for a VF's PE by issuing an FLR or AF FLR to the VFs contained in the PE. Signed-off-by: Wei Yang <[email protected]> Acked-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
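A very rough sketch of issuing an FLR to a single VF with the generic PCIe capability accessors; the real backend also handles AF FLR and walks all VFs in the PE, and the function name here is hypothetical.

    #include <linux/delay.h>
    #include <linux/pci.h>

    /* Sketch: function-level reset of one VF. */
    static void vf_flr_sketch(struct pci_dev *vf)
    {
            pcie_capability_set_word(vf, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
            msleep(100);    /* PCIe allows up to 100ms for the FLR to complete */
    }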
2016-03-09powerpc/eeh: Create PE for VFsWei Yang3-2/+25
This creates PEs for VFs in the weak function pcibios_bus_add_device(). Those PEs for VFs are identified with the newly introduced flag EEH_PE_VF so that we treat them differently during EEH recovery. Signed-off-by: Wei Yang <[email protected]> Acked-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-09powerpc/eeh: EEH device for VFWei Yang2-0/+16
VFs and their corresponding pdn are created and released dynamically when their PF's SRIOV capability is enabled and disabled. This creates and releases EEH devices for VFs when creating and releasing their pdn instances, which means EEH devices and pdn instances have the same life cycle. Also, a VF's EEH device is identified by (struct eeh_dev::physfn). Signed-off-by: Wei Yang <[email protected]> Acked-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-09powerpc/eeh: Cache normal BARs, not windows or IOV BARsWei Yang1-6/+5
This restricts the EEH address cache to use only the first 7 BARs. This makes __eeh_addr_cache_insert_dev() ignore PCI bridge windows and IOV BARs. As a result of this change, eeh_addr_cache_get_dev() will return VFs from the VFs' resource addresses instead of the parent PFs. This also removes the PCI bridge check, as limiting __eeh_addr_cache_insert_dev() to 7 BARs effectively excludes PCI bridges from being cached. Signed-off-by: Wei Yang <[email protected]> Acked-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-09powerpc/pci: Remove VFs prior to PFWei Yang1-1/+1
As commit ac205b7bb72f ("PCI: make sriov work with hotplug remove") indicates, VFs which are on the same PCI bus as their PF should be removed before the PF. Otherwise, we might run into a kernel crash at PCI unplugging time. This applies the above pattern to the powerpc PCI hotplug path. Signed-off-by: Wei Yang <[email protected]> Acked-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-09powerpc/eeh: Reworked eeh_pe_bus_get()Gavin Shan1-16/+12
The original implementation is ugly: unnecessary if statements and an "out" label. This reworks the function to avoid the above weaknesses. No functional changes are introduced. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Andrew Donnellan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-03-08PCI: Include pci/hotplug Kconfig directly from pci/KconfigBjorn Helgaas11-21/+0
Include pci/hotplug/Kconfig directly from pci/Kconfig, so arches don't have to source both pci/Kconfig and pci/hotplug/Kconfig. Note that this effectively adds pci/hotplug/Kconfig to the following arches, because they already sourced drivers/pci/Kconfig but they previously did not source drivers/pci/hotplug/Kconfig: alpha arm avr32 frv m68k microblaze mn10300 sparc unicore32 Inspired-by-patch-from: Bogicevic Sasa <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2016-03-08PCI: Include pci/pcie/Kconfig directly from pci/KconfigBogicevic Sasa9-14/+0
Include pci/pcie/Kconfig directly from pci/Kconfig, so arches don't have to source both pci/Kconfig and pci/pcie/Kconfig. Note that this effectively adds pci/pcie/Kconfig to the following arches, because they already sourced drivers/pci/Kconfig but they previously did not source drivers/pci/pcie/Kconfig: alpha avr32 blackfin frv m32r m68k microblaze mn10300 parisc sparc unicore32 xtensa [bhelgaas: changelog, source pci/pcie/Kconfig at top of pci/Kconfig, whitespace] Signed-off-by: Sasa Bogicevic <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2016-03-08microblaze/PCI: Support generic Xilinx AXI PCIe Host Bridge IP driverBharat Kumar Gogada2-46/+13
Modify the Microblaze PCI subsystem to work with the generic drivers/pci/host/pcie-xilinx.c driver on Microblaze and Zynq. [bhelgaas: changelog] Signed-off-by: Bharat Kumar Gogada <[email protected]> Signed-off-by: Ravi Kiran Gummaluri <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Michal Simek <[email protected]>
2016-03-08Input: ad7879 - move header to platform_data directoryStefan Agner5-5/+5
The header file is used by the SPI and I2C variant of the driver. Therefore, move it to a more generic place under platform_data. Signed-off-by: Stefan Agner <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]>
2016-03-08PCI: Set ROM shadow location in arch code, not in PCI coreBjorn Helgaas2-12/+32
IORESOURCE_ROM_SHADOW means there is a copy of a device's option ROM in RAM. The existence of such a copy and its location are arch-specific. Previously the IORESOURCE_ROM_SHADOW flag was set in arch code, but the 0xC0000-0xDFFFF location was hard-coded into the PCI core. If we're using a shadow copy in RAM, disable the ROM BAR and release the address space it was consuming. Move the location information from the PCI core to the arch code that sets IORESOURCE_ROM_SHADOW. Save the location of the RAM copy in the struct resource for PCI_ROM_RESOURCE. After this change, pci_map_rom() will call pci_assign_resource() and pci_enable_rom() for these IORESOURCE_ROM_SHADOW resources, which we did not do before. This is safe because:
- pci_assign_resource() will do nothing because the resource is marked IORESOURCE_PCI_FIXED, which means we can't move it, and
- pci_enable_rom() will not turn on the ROM BAR's enable bit because the resource is marked IORESOURCE_ROM_SHADOW, which means it is in RAM rather than in PCI memory space.
Storing the location in the struct resource means "lspci" will show the shadow location, not the value from the ROM BAR. Signed-off-by: Bjorn Helgaas <[email protected]>
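A minimal sketch of what such an arch fixup records for a shadowed VGA ROM; the 0xC0000-0xDFFFF range is the one quoted above, while the function name is hypothetical.

    #include <linux/ioport.h>
    #include <linux/pci.h>

    /* Sketch: record the RAM copy's location in the ROM resource. */
    static void mark_rom_shadow_sketch(struct pci_dev *pdev)
    {
            struct resource *res = &pdev->resource[PCI_ROM_RESOURCE];

            res->start = 0xC0000;
            res->end   = 0xDFFFF;
            res->flags = IORESOURCE_MEM | IORESOURCE_ROM_SHADOW |
                         IORESOURCE_PCI_FIXED;
    }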
2016-03-08PCI: Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXEDBjorn Helgaas2-2/+4
A shadow copy of an option ROM is placed by the BIOS at a fixed address. Set IORESOURCE_PCI_FIXED to indicate that we can't move the shadow copy. This prevents warnings like the following when we assign resources:
  BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
This warning is emitted by pdev_sort_resources(), which already ignores IORESOURCE_PCI_FIXED resources. Link: http://lkml.kernel.org/r/CA+55aFyVMfTBB0oz_yx8+eQOEJnzGtCsYSj9QuhEpdZ9BHdq5A@mail.gmail.com Signed-off-by: Bjorn Helgaas <[email protected]>
2016-03-08x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARsBjorn Helgaas1-0/+7
The Home Agent and PCU PCI devices in Broadwell-EP have a non-BAR register where a BAR should be. We don't know what the side effects of sizing the "BAR" would be, and we don't know what address space the "BAR" might appear to describe. Mark these devices as having non-compliant BARs so the PCI core doesn't touch them. Signed-off-by: Bjorn Helgaas <[email protected]> Tested-by: Andi Kleen <[email protected]> CC: [email protected]
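A hedged sketch of such a header quirk, assuming the pci_dev non_compliant_bars flag that the core checks; the device ID below is a placeholder, not the actual Home Agent or PCU ID.

    #include <linux/pci.h>

    /* Sketch: mark the device so the core never sizes or assigns its "BARs". */
    static void quirk_non_compliant_bars_sketch(struct pci_dev *dev)
    {
            dev->non_compliant_bars = 1;
    }
    DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0000 /* placeholder ID */,
                            quirk_non_compliant_bars_sketch);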
2016-03-08unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definitionBjorn Helgaas1-5/+0
HAVE_ARCH_PCI_SET_DMA_MASK is unused, so remove the #define for it. Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: GUAN Xuetao <[email protected]>
2016-03-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller108-381/+603
Several cases of overlapping changes, as well as one instance (vxlan) of a bug fix in 'net' overlapping with code movement in 'net-next'. Signed-off-by: David S. Miller <[email protected]>
2016-03-08x86/mm, x86/mce: Add memcpy_mcsafe()Tony Luck3-0/+132
Make use of the EXTABLE_FAULT exception table entries to write a kernel copy routine that doesn't crash the system if it encounters a machine check. The prime use case for this is to copy from large arrays of non-volatile memory used as storage. We have to use an unrolled copy loop for now because current hardware implementations treat a machine check in "rep mov" as fatal. When that is fixed we can simplify. The return type is a "bool". True means that we copied OK, false means that we didn't. Signed-off-by: Tony Luck <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Link: http://lkml.kernel.org/r/a44e1055efc2d2a9473307b22c91caa437aa3f8b.1456439214.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <[email protected]>
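A usage sketch based only on the changelog's description of the return value (true = copy completed, false = a machine check was hit); the wrapper function is hypothetical.

    #include <linux/errno.h>
    #include <linux/string.h>

    /* Sketch: fail the I/O instead of crashing if the source is poisoned. */
    static int copy_from_pmem_sketch(void *dst, const void *src, size_t len)
    {
            if (!memcpy_mcsafe(dst, src, len))
                    return -EIO;    /* machine check hit during the copy */
            return 0;
    }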
2016-03-08perf/x86/intel/rapl: Simplify quirk handling even moreBorislav Petkov1-19/+13
Drop the quirk() function pointer in favor of a simple boolean which says whether the quirk should be applied or not. Update comment while at it. Signed-off-by: Borislav Petkov <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Harish Chegondi <[email protected]> Cc: Jacob Pan <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08s390: Fix misspellings in commentsAdam Buchbinder5-6/+6
Signed-off-by: Adam Buchbinder <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2016-03-08s390/mm: split arch/s390/mm/pgtable.cMartin Schwidefsky13-1405/+1464
The pgtable.c file is quite big; before it grows any larger, split it into pgtable.c, pgalloc.c and gmap.c. In addition, move the gmap related header definitions into the new gmap.h header and all of the pgste helpers from pgtable.h to pgtable.c. Signed-off-by: Martin Schwidefsky <[email protected]>
2016-03-08s390/mm: uninline pmdp_xxx functions from pgtable.hMartin Schwidefsky3-112/+124
The pmdp_xxx functions are smaller than their ptep_xxx counterparts, but to keep things symmetrical uninline them as well. Signed-off-by: Martin Schwidefsky <[email protected]>
2016-03-08s390/mm: uninline ptep_xxx functions from pgtable.hMartin Schwidefsky3-353/+318
The code in the various ptep_xxx functions has grown quite large, consolidate them into four out-of-line functions:
- ptep_xchg_direct to exchange a pte with another with immediate flushing,
- ptep_xchg_lazy to exchange a pte with another in a batched update,
- ptep_modify_prot_start to begin a protection flags update,
- ptep_modify_prot_commit to commit a protection flags update.
Signed-off-by: Martin Schwidefsky <[email protected]>
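The assumed shape of the four consolidated helpers, purely as an illustration; only the function names come from the changelog, the exact signatures are guesses.

    #include <linux/mm.h>

    /* Assumed declarations; comments restate the roles given in the changelog. */
    pte_t ptep_xchg_direct(struct mm_struct *mm, unsigned long addr,
                           pte_t *ptep, pte_t new);       /* exchange, immediate flush */
    pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr,
                         pte_t *ptep, pte_t new);         /* exchange, batched update */
    pte_t ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr,
                                 pte_t *ptep);            /* begin protection update */
    void ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
                                 pte_t *ptep, pte_t pte); /* commit protection update */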
2016-03-08arm64: dts: juno/vexpress: fix node name unit-address presence warningsSudeep Holla6-36/+36
Commit fa38a82096a1 ("scripts/dtc: Update to upstream version 53bf130b1cdd") added warnings on node name unit-address presence/absence mismatches in device trees. This patch fixes those warnings on all the juno/vexpress platforms where a unit-address is present in the node name while the reg/ranges property is not present. It also adds unit-addresses to all smb bus nodes. Signed-off-by: Sudeep Holla <[email protected]>
2016-03-08Merge branch 'linus' into x86/fpu, to pick up fixesIngo Molnar188-659/+1244
Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08x86/asm-offsets: Remove PARAVIRT_enabledAndy Lutomirski1-1/+0
It no longer has any users. Signed-off-by: Andy Lutomirski <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luis R. Rodriguez <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabledAndy Lutomirski3-13/+35
x86_64 has very clean espfix handling on paravirt: espfix64 is set up in native_iret, so paravirt systems that override iret bypass espfix64 automatically. This is robust and straightforward. x86_32 is messier. espfix is set up before the IRET paravirt patch point, so it can't be directly conditionalized on whether we use native_iret. We also can't easily move it into native_iret without regressing performance due to a bizarre consideration. Specifically, on 64-bit kernels, the logic is:
  if (regs->ss & 0x4)
          setup_espfix;
On 32-bit kernels, the logic is:
  if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 && (regs->flags & X86_EFLAGS_VM) == 0)
          setup_espfix;
The performance of setup_espfix itself is essentially irrelevant, but the comparison happens on every IRET so its performance matters. On x86_64, there's no need for any registers except flags to implement the comparison, so we fold the whole thing into native_iret. On x86_32, we don't do that because we need a free register to implement the comparison efficiently. We therefore do espfix setup before restoring registers on x86_32. This patch gets rid of the explicit paravirt_enabled check by introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE to skip espfix on paravirt systems where iret != native_iret. This is also messy, but it's at least in line with other things we do. This improves espfix performance by removing a branch, but no one cares. More importantly, it removes a paravirt_enabled user, which is good because paravirt_enabled is ill-defined and is going away. Signed-off-by: Andy Lutomirski <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: Andrew Cooper <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luis R. Rodriguez <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-03-08KVM: s390: allocate only one DMA page per VMDavid Hildenbrand4-48/+41
We can fit the 2k for the STFLE interpretation and the crypto control block into one DMA page. As we now only have to allocate one DMA page, we can clean up the code a bit. As a nice side effect, this also fixes a problem with crycbd alignment in case special allocation debug options are enabled, debugged by Sascha Silbe. Acked-by: Christian Borntraeger <[email protected]> Reviewed-by: Dominik Dingel <[email protected]> Acked-by: Cornelia Huck <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-03-08KVM: s390: enable STFLE interpretation only if enabled for the guestDavid Hildenbrand1-1/+2
Not setting the facility list designation disables STFLE interpretation, this is what we want if the guest was told to not have it. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-03-08KVM: s390: wake up when the VCPU cpu timer expiresDavid Hildenbrand1-13/+35
When the VCPU cpu timer expires, we have to wake up just like when the ckc triggers. Until now, setting up a cpu timer in the guest and going into enabled wait would never lead to a wakeup. This patch fixes this problem. Just as for the ckc, we have to take care of waking up too early: we have to recalculate the sleep time and go back to sleep. Please note that the timer callback calls kvm_s390_get_cpu_timer() from interrupt context. As the timer is canceled when leaving handle_wait(), and we don't do any VCPU cpu timer writes/updates in that function, we can be sure that we will never try to read the VCPU cpu timer from the same cpu that is currently updating the timer (deadlock). Reported-by: Sascha Silbe <[email protected]> Tested-by: Sascha Silbe <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-03-08KVM: s390: step the VCPU timer while in enabled waitDavid Hildenbrand2-2/+7
The cpu timer is a means to measure task execution time. We want to account to a VCPU everything it is responsible for; therefore, if the VCPU wants to sleep, that time shall be accounted to it. We can easily get this done by not disabling cpu timer accounting when scheduled out while sleeping because of enabled wait. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-03-08KVM: s390: protect VCPU cpu timer with a seqcountDavid Hildenbrand2-8/+30
For now, only the owning VCPU thread (that has loaded the VCPU) can get a consistent cpu timer value when calculating the delta. However, other threads might also be interested in a more recent, consistent value. Of special interest will be the timer callback of a VCPU that executes without having the VCPU loaded and could run in parallel with the VCPU thread. The cpu timer has a nice property: it is only updated by the owning VCPU thread. And speaking about accounting, a consistent value can only be calculated by looking at cputm_start and the cpu timer itself in one shot, otherwise the result might be wrong. As we only have one writing thread at a time (owning VCPU thread), we can use a seqcount instead of a seqlock and retry if the VCPU refreshed its cpu timer. This avoids any heavy locking and only introduces a counter update/check plus a handful of smp_wmb(). The owning VCPU thread should never have to retry on reads, and also for other threads this might be a very rare scenario. Please note that we have to use the raw_* variants for locking the seqcount as lockdep will produce false warnings otherwise. The rq->lock held during vcpu_load/put is also acquired from hardirq context. Lockdep cannot know that we avoid potential deadlocks by disabling preemption and thereby disable concurrent write locking attempts (via vcpu_put/load). Reviewed-by: Christian Borntraeger <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
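A generic sketch of the seqcount pattern the changelog describes, with illustrative struct, field and function names; only the seqlock.h primitives themselves are real.

    #include <linux/seqlock.h>
    #include <linux/types.h>

    /* Illustrative state; the real fields live in the VCPU architecture struct. */
    struct cputm_sketch {
            seqcount_t seq;
            u64 start;      /* host clock when accounting was (re)started */
            u64 cputm;      /* cpu timer value at that point */
    };

    /* Writer: only the owning VCPU thread, with preemption disabled. */
    static void cputm_update_sketch(struct cputm_sketch *t, u64 now, u64 value)
    {
            raw_write_seqcount_begin(&t->seq);      /* raw_: see the lockdep note above */
            t->start = now;
            t->cputm = value;
            raw_write_seqcount_end(&t->seq);
    }

    /* Reader: any thread; retries if it raced with the writer. */
    static u64 cputm_read_sketch(struct cputm_sketch *t, u64 now)
    {
            unsigned int seq;
            u64 value;

            do {
                    seq = read_seqcount_begin(&t->seq);
                    value = t->cputm - (now - t->start);    /* the timer counts down */
            } while (read_seqcount_retry(&t->seq, seq));

            return value;
    }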
2016-03-08KVM: s390: step VCPU cpu timer during kvm_run ioctlDavid Hildenbrand2-2/+76
Architecturally we should only provide steal time if we are scheduled away, and not if the host interprets a guest exit. We have to step the guest CPU timer in these cases. In the first shot, we will step the VCPU timer only during the kvm_run ioctl. Therefore all time spent e.g. in interception handlers or on irq delivery will be accounted to that VCPU. We have to take care of a few special cases:
- Other VCPUs can test for pending irqs. We can only report a consistent value for the VCPU thread itself when adding the delta.
- We have to take care of STP sync, therefore we have to extend kvm_clock_sync() and disable preemption accordingly.
- During any call to disable/enable/start/stop we could get preempted and therefore get start/stop calls. Therefore we have to make sure we don't get into an inconsistent state.
Whenever a VCPU is scheduled out, sleeping, in user space or just about to enter the SIE, the guest cpu timer isn't stepped. Please note that all primitives are prepared to be called from both environments (cpu timer accounting enabled or not), although not completely used in this patch yet (e.g. kvm_s390_set_cpu_timer() will never be called while cpu timer accounting is enabled). Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-03-08KVM: s390: abstract access to the VCPU cpu timerDavid Hildenbrand3-10/+28
We want to manually step the cpu timer in certain scenarios in the future. Let's abstract any access to the cpu timer, so we can hide the complexity internally. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-03-08KVM: s390: store cpu id in vcpu->cpu when scheduled inDavid Hildenbrand1-0/+2
By storing the cpu id, we have a way to verify if the current cpu is owning a VCPU. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-03-08KVM: s390: Add diag "watchdog functions" to trace event decodingAlexander Yarygin1-0/+1
DIAG 0x288 may occur now. Let's add its code to the diag table in sie.h. Signed-off-by: Alexander Yarygin <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]>
2016-03-08x86/nmi: Mark 'ignore_nmis' as __read_mostlyKostenzer Felix1-1/+2
ignore_nmis is used in two distinct places:
1. modified through {stop,restart}_nmi by alternative_instructions
2. read by do_nmi to determine if default_do_nmi should be called or not
Thus the access pattern conforms to __read_mostly, and do_nmi() is a fastpath. Signed-off-by: Kostenzer Felix <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
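Concretely, such an annotation is a one-word change on the declaration; a sketch of the resulting form (assuming the variable is file-local):

    #include <linux/cache.h>

    /* Grouped into .data..read_mostly so frequent reads in do_nmi() don't
     * share a cache line with often-written data. */
    static int ignore_nmis __read_mostly;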
2016-03-08KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUSDavid Hildenbrand1-1/+1
With MACHINE_HAS_VX, we convert the floating point registers from the vector registers when storing the status. For other VCPUs, these are stored to vcpu->run->s.regs.vrs, but we are using current->thread.fpu.vxrs, which resolves to the currently loaded VCPU. So kvm_s390_store_status_unloaded() currently writes the wrong floating point registers (converted from the vector registers) when called from another VCPU on a z13. This is only the case for old user space not handling SIGP STORE STATUS and SIGP STOP AND STORE STATUS, but relying on the kernel implementation. All other calls come from the loaded VCPU via kvm_s390_store_status(). Fixes: 9abc2a08a7d6 ("KVM: s390: fix memory overwrites when vx is disabled") Reviewed-by: Christian Borntraeger <[email protected]> Cc: [email protected] # v4.4+ Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Christian Borntraeger <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>