aboutsummaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2016-05-12perf/x86: Fix undefined shift on 32-bit kernelsAndrey Ryabinin1-1/+1
Jim reported: UBSAN: Undefined behaviour in arch/x86/events/intel/core.c:3708:12 shift exponent 35 is too large for 32-bit type 'long unsigned int' The use of 'unsigned long' type obviously is not correct here, make it 'unsigned long long' instead. Reported-by: Jim Cromie <[email protected]> Signed-off-by: Andrey Ryabinin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Imre Palik <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Fixes: 2c33645d366d ("perf/x86: Honor the architectural performance monitoring version") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-12perf/x86/msr: Fix SMI overflowPeter Zijlstra1-1/+1
We compute 'delta' and properly sign extend it and then ignore it and recompute the raw value, loosing the sign extention. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-12perf/x86/intel/uncore: Fix CHA registers configuration procedure for Knights ↵hchrzani1-0/+7
Landing platform CHA events in Knights Landing platform require programming filter registers properly. Remote node, local node and NonNearMemCachable bits should be set to 1 at all times. Signed-off-by: Hubert Chrzaniuk <[email protected]> Signed-off-by: Lawrence F Meadows <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: 77af0037de0a ('perf/x86/intel/uncore: Add Knights Landing uncore PMU support') Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-12Merge branch 'smp/hotplug' into sched/core, to resolve conflictsIngo Molnar2-2/+2
Conflicts: kernel/sched/core.c Signed-off-by: Ingo Molnar <[email protected]>
2016-05-12Merge branch 'sched/urgent' into sched/core to pick up fixesIngo Molnar98-381/+862
Signed-off-by: Ingo Molnar <[email protected]>
2016-05-12x86/RAS: Add SMCA support to AMD Error InjectorYazen Ghannam1-6/+25
Use SMCA MSRs when writing to MCA_{STATUS,ADDR,MISC} and MCA_DE{STAT,ADDR} when injecting Deferred Errors on SMCA platforms. Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Aravind Gopalakrishnan <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: linux-edac <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-12x86/mce: Update AMD mcheck init to use cpu_has() facilitiesYazen Ghannam1-5/+3
Use cpu_has() facilities to find available RAS features rather than directly reading CPUID 0x80000007_EBX. Signed-off-by: Yazen Ghannam <[email protected]> [ Use the struct cpuinfo_x86 ptr instead. ] Signed-off-by: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: linux-edac <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-12x86/cpu: Add detection of AMD RAS CapabilitiesYazen Ghannam3-4/+14
Add a new CPUID leaf to hold the contents of CPUID 0x80000007_EBX (RasCap). Define bits that are currently in use: Bit 0: McaOverflowRecov Bit 1: SUCCOR Bit 3: ScalableMca Signed-off-by: Yazen Ghannam <[email protected]> [ Shorten comment. ] Signed-off-by: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: linux-edac <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-12x86/mce/AMD: Save an indentation level in prepare_threshold_block()Borislav Petkov1-40/+38
Do the !SMCA work first and then save us an indentation level for the SMCA code. No functionality change. Signed-off-by: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Aravind Gopalakrishnan <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Yazen Ghannam <[email protected]> Cc: linux-edac <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-12x86/mce/AMD: Disable LogDeferredInMcaStat for SMCA systemsYazen Ghannam1-9/+29
Disable Deferred Error logging in MCA_{STATUS,ADDR} additionally for SMCA systems as this information will retrieved from MCA_DE{STAT,ADDR} on those systems. Signed-off-by: Yazen Ghannam <[email protected]> [ Simplify, drop SMCA_MCAX_EN_OFF define too. ] Signed-off-by: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Aravind Gopalakrishnan <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: linux-edac <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-12x86/mce/AMD: Log Deferred Errors using SMCA MCA_DE{STAT,ADDR} registersYazen Ghannam2-8/+28
Scalable MCA provides new registers for all banks for logging deferred errors: MCA_DESTAT and MCA_DEADDR. Deferred errors are always logged to these registers. Update the AMD deferred error handler to use these registers, if available. Signed-off-by: Yazen Ghannam <[email protected]> [ Sanity-check __log_error() args, massage a bit. ] Signed-off-by: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Aravind Gopalakrishnan <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: linux-edac <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-05-12KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interruptsPaul Mackerras2-10/+16
Commit c9a5eccac1ab ("kvm/eventfd: add arch-specific set_irq", 2015-10-16) added the possibility for architecture-specific code to handle the generation of virtual interrupts in atomic context where possible, without having to schedule a work function. Since we can easily generate virtual interrupts on XICS without having to do anything worse than take a spinlock, we define a kvm_arch_set_irq_inatomic() for XICS. We also remove kvm_set_msi() since it is not used any more. The one slightly tricky thing is that with the new interface, we don't get told whether the interrupt is an MSI (or other edge sensitive interrupt) vs. level-sensitive. The difference as far as interrupt generation is concerned is that for LSIs we have to set the asserted flag so it will continue to fire until it is explicitly cleared. In fact the XICS code gets told which interrupts are LSIs by userspace when it configures the interrupt via the KVM_DEV_XICS_GRP_SOURCES attribute group on the XICS device. To store this information, we add a new "lsi" field to struct ics_irq_state. With that we can also do a better job of returning accurate values when reading the attribute group. Signed-off-by: Paul Mackerras <[email protected]>
2016-05-11kvm: Conditionally register IRQ bypass consumerAlex Williamson1-11/+8
If we don't support a mechanism for bypassing IRQs, don't register as a consumer. This eliminates meaningless dev_info()s when the connect fails between producer and consumer, such as on AMD systems where kvm_x86_ops->update_pi_irte is not implemented Signed-off-by: Alex Williamson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-05-11kvm: introduce KVM_MAX_VCPU_IDGreg Kurz1-0/+3
The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and also the upper limit for vCPU ids. This is okay for all archs except PowerPC which can have higher ids, depending on the cpu/core/thread topology. In the worst case (single threaded guest, host with 8 threads per core), it limits the maximum number of vCPUS to KVM_MAX_VCPUS / 8. This patch separates the vCPU numbering from the total number of vCPUs, with the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids plus one. The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids before passing them to KVM_CREATE_VCPU. This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC. Other archs continue to return KVM_MAX_VCPUS instead. Suggested-by: Radim Krcmar <[email protected]> Signed-off-by: Greg Kurz <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2016-05-11Merge tag 'kvm-arm-for-4.7' of ↵Paolo Bonzini13-361/+676
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for Linux v4.7 Reworks our stage 2 page table handling to have page table manipulation macros separate from those of the host systems as the underlying hardware page tables can be configured to be noticably different in layout from the stage 1 page tables used by the host. Adds 16K page size support based on the above. Adds a generic firmware probing layer for the timer and GIC so that KVM initializes using the same logic based on both ACPI and FDT. Finally adds support for hardware updating of the access flag.
2016-05-11x86/extable: ensure entries are swapped completely when sortingMathias Krause1-0/+8
The x86 exception table sorting was changed in commit 29934b0fb8ff ("x86/extable: use generic search and sort routines") to use the arch independent code in lib/extable.c. However, the patch was mangled somehow on its way into the kernel from the last version posted at [1]. The committed version kind of attempted to incorporate the changes of commit 548acf19234d ("x86/mm: Expand the exception table logic to allow new handling options") as in _completely_ _ignoring_ the x86 specific 'handler' member of struct exception_table_entry. This effectively broke the sorting as entries will only partly be swapped now. Fortunately, the x86 Kconfig selects BUILDTIME_EXTABLE_SORT, so the exception table doesn't need to be sorted at runtime. However, in case that ever changes, we better not break the exception table sorting just because of that. [ Ard Biesheuvel points out that BUILDTIME_EXTABLE_SORT applies to the core image only, but we still rely on the sorting routines for modules in that case - Linus ] Fix this by providing a swap_ex_entry_fixup() macro that takes care of the 'handler' member. [1] https://lkml.org/lkml/2016/1/27/232 Signed-off-by: Mathias Krause <[email protected]> Fixes: 29934b0fb8f ("x86/extable: use generic search and sort routines") Reviewed-by: Ard Biesheuvel <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-05-11Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-1/+8
Pull KVM fixes from Paolo Bonzini: "Two small x86 patches, improving "make kvmconfig" and fixing an objtool warning for CONFIG_PROFILE_ALL_BRANCHES" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvmconfig: add more virtio drivers x86/kvm: Add stack frame dependency to fastop() inline asm
2016-05-11Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar51-143/+332
Signed-off-by: Ingo Molnar <[email protected]>
2016-05-11ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMCBoris Brezillon1-1/+1
The memory range assigned to the PMC (Power Management Controller) was not including the PMC_PCR register which are used to control peripheral clocks. This was working fine thanks to the page granularity of ioremap(), but started to fail when we switched to syscon/regmap, because regmap is making sure that all accesses are falling into the reserved range. Signed-off-by: Boris Brezillon <[email protected]> Reported-by: Richard Genoud <[email protected]> Tested-by: Richard Genoud <[email protected]> Fixes: 863a81c3be1d ("clk: at91: make use of syscon to share PMC registers in several drivers") Cc: <[email protected]> Signed-off-by: Nicolas Ferre <[email protected]>
2016-05-11ARM: aspeed: adapt defconfigs for new CONFIG_PRINTK_TIMEArnd Bergmann2-2/+2
Commit 94ea9c7a579f ("lib, switch CONFIG_PRINTK_TIME to int") changes the type of CONFIG_PRINTK_TIME and adapts all existing defconfig files, while we add two new defconfig files with d0bc3483da87 ("arm/configs: Add Aspeed defconfig"), with the old type. This changes the two new defconfig files to match the other ones. As a result, we get harmless warnings for the arm-soc defconfig branch by itself, but everything will work when 4.7-rc1 is out. Signed-off-by: Arnd Bergmann <[email protected]>
2016-05-11powerpc/powernv/npu: Enable NVLink pass throughAlexey Kardashevskiy3-6/+176
IBM POWER8 NVlink systems come with Tesla K40-ish GPUs each of which also has a couple of fast speed links (NVLink). The interface to links is exposed as an emulated PCI bridge which is included into the same IOMMU group as the corresponding GPU. In the kernel, NPUs get a separate PHB of the PNV_PHB_NPU type and a PE which behave pretty much as the standard IODA2 PHB except NPU PHB has just a single TVE in the hardware which means it can have either 32bit window or 64bit window or DMA bypass but never two of these. In order to make these links work when GPU is passed to the guest, these bridges need to be passed as well; otherwise performance will degrade. This implements and exports API to manage NPU state in regard to VFIO; it replicates iommu_table_group_ops. This defines a new pnv_pci_ioda2_npu_ops which is assigned to the IODA2 bridge if there are NPUs for a GPU on the bridge. The new callbacks call the default IODA2 callbacks plus new NPU API. This adds a gpe_table_group_to_npe() helper to find NPU PE for the IODA2 table_group, it is not expected to fail as the helper is only called from the pnv_pci_ioda2_npu_ops. This does not define NPU-specific .release_ownership() so after VFIO is finished, DMA on NPU is disabled which is ok as the nvidia driver sets DMA mask when probing which enable 32 or 64bit DMA on NPU. This adds a pnv_pci_npu_setup_iommu() helper which adds NPUs to the GPU group if any found. The helper uses helpers to look for the "ibm,gpu" property in the device tree which is a phandle of the corresponding GPU. This adds an additional loop over PEs in pnv_ioda_setup_dma() as the main loop skips NPU PEs as they do not have 32bit DMA segments. As pnv_npu_set_window() and pnv_npu_unset_window() are started being used by the new IODA2-NPU IOMMU group, this makes the helpers public and adds the DMA window number parameter. Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-By: Alistair Popple <[email protected]> [mpe: Add pnv_pci_ioda_setup_iommu_api() to fix build with IOMMU_API=n] Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv/npu: Rework TCE Kill handlingAlexey Kardashevskiy3-112/+27
The pnv_ioda_pe struct keeps an array of peers. At the moment it is only used to link GPU and NPU for 2 purposes: 1. Access NPU quickly when configuring DMA for GPU - this was addressed in the previos patch by removing use of it as DMA setup is not what the kernel would constantly do. 2. Invalidate TCE cache for NPU when it is invalidated for GPU. GPU and NPU are in different PE. There is already a mechanism to attach multiple iommu_table_group to the same iommu_table (used for VFIO), we can reuse it here so does this patch. This gets rid of peers[] array and PNV_IODA_PE_PEER flag as they are not needed anymore. While we are here, add TCE cache invalidation after enabling bypass. Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-By: Alistair Popple <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv/npu: Add set/unset window helpersAlexey Kardashevskiy1-10/+55
The upcoming NVLink passthrough support will require NPU code to cope with two DMA windows. This adds a pnv_npu_set_window() helper which programs 32bit window to the hardware. This also adds multilevel TCE support. This adds a pnv_npu_unset_window() helper which removes the DMA window from the hardware. This does not make difference now as the caller - pnv_npu_dma_set_bypass() - enables bypass in the hardware but the next patch will use it to manage TCE table lists for TCE Kill handling. Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-By: Alistair Popple <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv/ioda2: Export debug helper pe_level_printk()Alexey Kardashevskiy2-8/+10
This exports debugging helper pe_level_printk() and corresponding macroses so they can be used in npu-dma.c. Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-By: Alistair Popple <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv/npu: Simplify DMA setupAlexey Kardashevskiy3-68/+52
NPU devices are emulated in firmware and mainly used for NPU NVLink training; one NPU device is per a hardware link. Their DMA/TCE setup must match the GPU which is connected via PCIe and NVLink so any changes to the DMA/TCE setup on the GPU PCIe device need to be propagated to the NVLink device as this is what device drivers expect and it doesn't make much sense to do anything else. This makes NPU DMA setup explicit. pnv_npu_ioda_controller_ops::pnv_npu_dma_set_mask is moved to pci-ioda, made static and prints warning as dma_set_mask() should never be called on this function as in any case it will not configure GPU; so we make this explicit. Instead of using PNV_IODA_PE_PEER and peers[] (which the next patch will remove), we test every PCI device if there are corresponding NVLink devices. If there are any, we propagate bypass mode to just found NPU devices by calling the setup helper directly (which takes @bypass) and avoid guessing (i.e. calculating from DMA mask) whether we need bypass or not on NPU devices. Since DMA setup happens in very rare occasion, this will not slow down booting or VFIO start/stop much. This renames pnv_npu_disable_bypass to pnv_npu_dma_set_32 to make it more clear what the function really does which is programming 32bit table address to the TVT ("disabling bypass" means writing zeroes to the TVT). This removes pnv_npu_dma_set_bypass() from pnv_npu_ioda_fixup() as the DMA configuration on NPU does not matter until dma_set_mask() is called on GPU and that will do the NPU DMA configuration. This removes phb->dma_dev_setup initialization for NPU as pnv_pci_ioda_dma_dev_setup is no-op for it anyway. This stops using npe->tce_bypass_base as it never changes and values other than zero are not supported. Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: David Gibson <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv/npu: Use the correct IOMMU page sizeAlexey Kardashevskiy1-6/+5
This uses the page size from iommu_table instead of hard-coded 4K. This should cause no change in behavior. While we are here, move bits around to prepare for further rework which will define and use iommu_table_group_ops. Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: David Gibson <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv/npu: TCE Kill helpers cleanupAlexey Kardashevskiy3-52/+25
NPU PHB TCE Kill register is exactly the same as in the rest of POWER8 so let's reuse the existing code for NPU. The only bit missing is a helper to reset the entire TCE cache so this moves such a helper from NPU code and renames it. Since pnv_npu_tce_invalidate() does really invalidate the entire cache, this uses pnv_pci_ioda2_tce_invalidate_entire() directly for NPU. This adds an explicit comment for workaround for invalidating NPU TCE cache. Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: David Gibson <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: Define TCE Kill flagsAlexey Kardashevskiy1-2/+5
This replaces magic constants for TCE Kill IODA2 register with macros. Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: Rename pnv_pci_ioda2_tce_invalidate_entireAlexey Kardashevskiy1-3/+3
As in fact pnv_pci_ioda2_tce_invalidate_entire() invalidates TCEs for the specific PE rather than the entire cache, rename it to pnv_pci_ioda2_tce_invalidate_pe(). In later patches we will add a proper pnv_pci_ioda2_tce_invalidate_entire(). Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: David Gibson <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()Gavin Shan1-10/+2
The function pnv_pci_reset_secondary_bus() is called like below. It's impossible for call the function on root bus. So it's safe to remove the root bus case in the function. No functional changes introduced. pci_parent_bus_reset() / pci_bus_reset() / pci_try_reset_bus() pci_reset_bridge_secondary_bus() pcibios_reset_secondary_bus() pnv_pci_reset_secondary_bus() Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Daniel Axtens <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: Simplify pnv_eeh_reset()Gavin Shan1-36/+31
This drops unnecessary nested if statements in pnv_eeh_reset() to improve the code readability. After the changes, the unused local variable "ret" is dropped as well. No logical changes introduced. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Andrew Donnellan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/pci: Don't scan empty slotGavin Shan1-1/+2
In hotplug case, function pci_add_pci_devices() is called to rescan the specified PCI bus, which might not have any child devices. Access to the PCI bus's child device node will cause kernel crash without exception. This adds one more check to skip scanning PCI bus that doesn't have any subordinate devices from device-tree, in order to avoid kernel crash. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/pci: Export pci_traverse_device_nodes()Gavin Shan3-10/+15
This renames traverse_pci_devices() to pci_traverse_device_nodes(). The function traverses all subordinate device nodes of the specified one. Also, below cleanup applied to the function. No logical changes introduced. * Rename "pre" to "fn". * Avoid assignment in if condition reported from checkpatch.pl. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/pci: Introduce pci_remove_device_node_info()Gavin Shan2-0/+24
This implements and exports pci_remove_device_node_info(). It's used to remove the pdn (struct pci_dn) for the indicated device node. The function is going to be used by PowerNV PCI hotplug driver. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/pci: Export pci_add_device_node_info()Gavin Shan3-13/+22
This renames update_dn_pci_info() to pci_add_device_node_info() with corresponding adjustment on the parameter type and exports it. The function is used to create pdn (struct pci_dn) for the indicated device node. Another function add_pdn(), almost wrapper of pci_add_device_node_info(), to be used in traverse_pci_devices(). No logical changes introduced. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/pci: Move pci_find_bus_by_node() aroundGavin Shan2-31/+29
This moves pci_find_bus_by_node() from arch/powerpc/platforms/ pseries/pci_dlpar.c to arch/powerpc/kernel/pci-hotplug.c so that the function can be used by pSeries and PowerNV platform at the same time. Also, below cleanup applied. No functional changes introduced. * Remove variable "busdn" in find_bus_among_children() * Use PCI_DN() to convert device node to pci_dn Signed-off-by: Gavin Shan <[email protected]> Acked-by: Benjamin Herrenschmidt <[email protected]> Reviewed-by: Andrew Donnellan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/pci: Rename pcibios_find_pci_bus()Gavin Shan2-4/+3
This renames pcibios_find_pci_bus() to pci_find_bus_by_node() to avoid conflicts with those PCI subsystem weak function names, which have prefix "pcibios". No logical changes introduced. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/pci: Rename pcibios_{add, remove}_pci_devices()Gavin Shan3-16/+15
This renames pcibios_{add,remove}_pci_devices() to avoid conflicts with names of the weak functions in PCI subsystem, which have the prefix "pcibios". No logical changes introduced. Signed-off-by: Gavin Shan <[email protected]> Reviewed-By: Alistair Popple <[email protected]> Reviewed-by: Andrew Donnellan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: Use PE instead of number during setup and releaseGavin Shan2-47/+59
In current implementation, the PEs that are allocated or picked from the reserved list are identified by PE number. The PE instance has to be picked according to the PE number eventually. We have same issue when PE is released. For pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe(), this returns PE instance so that pnv_ioda_setup_bus_PE() can use the allocated or reserved PE instance directly. Also, pnv_ioda_setup_bus_PE() returns the reserved/allocated PE instance to be used in subsequent patches. On the other hand, pnv_ioda_free_pe() uses PE instance (not number) as its argument. No logical changes introduced. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv/ioda1: Improve DMA32 segment trackGavin Shan2-56/+66
In current implementation, the DMA32 segments required by one specific PE isn't calculated with the information hold in the PE independently. It conflicts with the PCI hotplug design: PE centralized, meaning the PE's DMA32 segments should be calculated from the information hold in the PE independently. This introduces an array (@dma32_segmap) for every PHB to track the DMA32 segmeng usage. Besides, this moves the logic calculating PE's consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe() so that PE's DMA32 segments are calculated/allocated from the information hold in the PE (DMA32 weight). Also the logic is improved: we try to allocate as much DMA32 segments as we can. It's acceptable that number of DMA32 segments less than the expected number are allocated. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: Remove DMA32 PE listGavin Shan2-112/+78
PEs are put into PHB DMA32 list (phb->ioda.pe_dma_list) according to their DMA32 weight. The PEs on the list are iterated to setup their TCE32 tables at system booting time. The list is used for once at boot time and no need to keep it. This moves the logic calculating DMA32 weight of PHB and PE to pnv_ioda_setup_dma() to drop PHB's DMA32 list. Also, every PE traces the consumed DMA32 segment by @tce32_seg and @tce32_segcount are useless and they're removed. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZEGavin Shan1-13/+18
Currently, there is one macro (TCE32_TABLE_SIZE) representing the TCE table size for one DMA32 segment. The constant representing the DMA32 segment size (1 << 28) is still used in the code. This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32 segment size. the TCE table size can be calcualted when the page has fixed 4KB size. So all the related calculation depends on one macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced. Signed-off-by: Gavin Shan <[email protected]> Reviewed-By: Alistair Popple <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()Gavin Shan1-4/+5
This renames pnv_pci_ioda_setup_dma_pe() to pnv_pci_ioda1_setup_dma_pe() as it's the counter-part of IODA2's pnv_pci_ioda2_setup_dma_pe(). No logical changes introduced. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv/ioda1: M64 support on P7IOCGavin Shan1-3/+86
This enables M64 window on P7IOC, which has been enabled on PHB3. Different from PHB3 where 16 M64 BARs are supported and each of them can be owned by one particular PE# exclusively or divided evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each of them are divided to 8 segments. So every P7IOC PHB supports 128 M64 segments in total. P7IOC has M64DT, which helps mapping one particular M64 segment# to arbitrary PE#. PHB3 doesn't have M64DT, indicating that one M64 segment can only be pinned to the fixed PE#. In order to unified M64 support M64 on P7IOC and PHB3, we just provide 128 M64 segments on every P7IOC PHB and each of them is pinned to the fixed PE# by bypassing the function of M64DT. In turn, we just need different phb->init_m64() for P7IOC and PHB3 and maps M64 segment in pnv_ioda_reserve_m64_pe() for P7IOC, most of the code are shared by them. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: Rename M64 related functionsGavin Shan1-11/+11
This renames those functions picking PE number based on consumed M64 segments, mapping M64 segments to PEs as those functions are going to be shared by IODA1/IODA2 in next patch. No logical changes introduced. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: Track M64 segment consumptionGavin Shan2-2/+9
When unplugging PCI devices, their parent PEs might be offline. The consumed M64 resource by the PEs should be released at that time. As we track M32 segment consumption, this introduces an array to the PHB to track the mapping between M64 segment and PE number. Note: M64 mapping isn't covered by pnv_ioda_setup_pe_seg() as IODA2 doesn't support the mapping explicitly while it's supported on IODA1. Until now, no M64 is supported on IODA1 in software. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: IO and M32 mapping based on PCI device resourcesGavin Shan1-3/+16
Currently, the IO and M32 segments are mapped to the corresponding PE based on the windows of the parent bridge of PE's primary bus. It's not going to work when the windows of root port or upstream port of the PCIe switch behind root port are extended to PHB's apertures in order to support hotplug in subsequent patch. This fixes the issue by mapping IO and M32 segments based on the resources of the PCI devices included in the PE, instead of the windows of the parent bridge of the PE's primary bus. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()Gavin Shan1-59/+62
pnv_ioda_setup_pe_seg() associates the IO and M32 segments with the owner PE. The code mapping segments should be fixed and immune from logic changes introduced to pnv_ioda_setup_pe_seg(). This moves the code mapping segments to helper pnv_ioda_setup_pe_res(). The data type for @rc is changed to "int64_t". Also, argument @hose is removed from pnv_ioda_setup_pe() as it can be got from @pe. No functional changes introduced. Signed-off-by: Gavin Shan <[email protected]> Reviewed-By: Alistair Popple <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: Fix initial IO and M32 segmapGavin Shan1-1/+7
There are two arrays for IO and M32 segment maps on every PHB. The index of the arrays are segment number and the value stored in the corresponding element is PE number, indicating the segment is assigned to the PE. Initially, all elements in those two arrays are zeroes, meaning all segments are assigned to PE#0. It's wrong. This fixes the initial values in the elements of those two arrays to IODA_INVALID_PE, meaning all segments aren't assigned to any PE. Signed-off-by: Gavin Shan <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-05-11powerpc/powernv: Data type unsigned int for PE numberGavin Shan4-9/+9
This changes the data type of PE number from "int" to "unsigned int" in order to match the fact PE number is never negative: * The number of PE to which the specified PCI device is attached. * The PE number map for SRIOV VFs. * The returned PE number from pnv_ioda_alloc_pe(). * The returned PE number from pnv_ioda2_pick_m64_pe(). Suggested-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Gavin Shan <[email protected]> Reviewed-By: Alistair Popple <[email protected]> Reviewed-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>