Age | Commit message (Collapse) | Author | Files | Lines |
|
Jim reported:
UBSAN: Undefined behaviour in arch/x86/events/intel/core.c:3708:12
shift exponent 35 is too large for 32-bit type 'long unsigned int'
The use of 'unsigned long' type obviously is not correct here, make it
'unsigned long long' instead.
Reported-by: Jim Cromie <[email protected]>
Signed-off-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Imre Palik <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Fixes: 2c33645d366d ("perf/x86: Honor the architectural performance monitoring version")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
We compute 'delta' and properly sign extend it and then ignore it and
recompute the raw value, loosing the sign extention.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Landing platform
CHA events in Knights Landing platform require programming filter registers properly.
Remote node, local node and NonNearMemCachable bits should be set to 1 at all times.
Signed-off-by: Hubert Chrzaniuk <[email protected]>
Signed-off-by: Lawrence F Meadows <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 77af0037de0a ('perf/x86/intel/uncore: Add Knights Landing uncore PMU support')
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Conflicts:
kernel/sched/core.c
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use SMCA MSRs when writing to MCA_{STATUS,ADDR,MISC} and
MCA_DE{STAT,ADDR} when injecting Deferred Errors on SMCA platforms.
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Aravind Gopalakrishnan <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: linux-edac <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Use cpu_has() facilities to find available RAS features rather than
directly reading CPUID 0x80000007_EBX.
Signed-off-by: Yazen Ghannam <[email protected]>
[ Use the struct cpuinfo_x86 ptr instead. ]
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: linux-edac <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add a new CPUID leaf to hold the contents of CPUID 0x80000007_EBX (RasCap).
Define bits that are currently in use:
Bit 0: McaOverflowRecov
Bit 1: SUCCOR
Bit 3: ScalableMca
Signed-off-by: Yazen Ghannam <[email protected]>
[ Shorten comment. ]
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: linux-edac <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Do the !SMCA work first and then save us an indentation level for the
SMCA code.
No functionality change.
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Aravind Gopalakrishnan <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Yazen Ghannam <[email protected]>
Cc: linux-edac <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Disable Deferred Error logging in MCA_{STATUS,ADDR} additionally for
SMCA systems as this information will retrieved from MCA_DE{STAT,ADDR}
on those systems.
Signed-off-by: Yazen Ghannam <[email protected]>
[ Simplify, drop SMCA_MCAX_EN_OFF define too. ]
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Aravind Gopalakrishnan <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: linux-edac <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Scalable MCA provides new registers for all banks for logging deferred
errors: MCA_DESTAT and MCA_DEADDR. Deferred errors are always logged to
these registers.
Update the AMD deferred error handler to use these registers, if
available.
Signed-off-by: Yazen Ghannam <[email protected]>
[ Sanity-check __log_error() args, massage a bit. ]
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Aravind Gopalakrishnan <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: linux-edac <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Commit c9a5eccac1ab ("kvm/eventfd: add arch-specific set_irq",
2015-10-16) added the possibility for architecture-specific code
to handle the generation of virtual interrupts in atomic context
where possible, without having to schedule a work function.
Since we can easily generate virtual interrupts on XICS without
having to do anything worse than take a spinlock, we define a
kvm_arch_set_irq_inatomic() for XICS. We also remove kvm_set_msi()
since it is not used any more.
The one slightly tricky thing is that with the new interface, we
don't get told whether the interrupt is an MSI (or other edge
sensitive interrupt) vs. level-sensitive. The difference as far
as interrupt generation is concerned is that for LSIs we have to
set the asserted flag so it will continue to fire until it is
explicitly cleared.
In fact the XICS code gets told which interrupts are LSIs by userspace
when it configures the interrupt via the KVM_DEV_XICS_GRP_SOURCES
attribute group on the XICS device. To store this information, we add
a new "lsi" field to struct ics_irq_state. With that we can also do a
better job of returning accurate values when reading the attribute
group.
Signed-off-by: Paul Mackerras <[email protected]>
|
|
If we don't support a mechanism for bypassing IRQs, don't register as
a consumer. This eliminates meaningless dev_info()s when the connect
fails between producer and consumer, such as on AMD systems where
kvm_x86_ops->update_pi_irte is not implemented
Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
also the upper limit for vCPU ids. This is okay for all archs except PowerPC
which can have higher ids, depending on the cpu/core/thread topology. In the
worst case (single threaded guest, host with 8 threads per core), it limits
the maximum number of vCPUS to KVM_MAX_VCPUS / 8.
This patch separates the vCPU numbering from the total number of vCPUs, with
the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids
plus one.
The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
before passing them to KVM_CREATE_VCPU.
This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
Other archs continue to return KVM_MAX_VCPUS instead.
Suggested-by: Radim Krcmar <[email protected]>
Signed-off-by: Greg Kurz <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM Changes for Linux v4.7
Reworks our stage 2 page table handling to have page table manipulation
macros separate from those of the host systems as the underlying
hardware page tables can be configured to be noticably different in
layout from the stage 1 page tables used by the host.
Adds 16K page size support based on the above.
Adds a generic firmware probing layer for the timer and GIC so that KVM
initializes using the same logic based on both ACPI and FDT.
Finally adds support for hardware updating of the access flag.
|
|
The x86 exception table sorting was changed in commit 29934b0fb8ff
("x86/extable: use generic search and sort routines") to use the arch
independent code in lib/extable.c. However, the patch was mangled
somehow on its way into the kernel from the last version posted at [1].
The committed version kind of attempted to incorporate the changes of
commit 548acf19234d ("x86/mm: Expand the exception table logic to allow
new handling options") as in _completely_ _ignoring_ the x86 specific
'handler' member of struct exception_table_entry. This effectively
broke the sorting as entries will only partly be swapped now.
Fortunately, the x86 Kconfig selects BUILDTIME_EXTABLE_SORT, so the
exception table doesn't need to be sorted at runtime. However, in case
that ever changes, we better not break the exception table sorting just
because of that.
[ Ard Biesheuvel points out that BUILDTIME_EXTABLE_SORT applies to the
core image only, but we still rely on the sorting routines for modules
in that case - Linus ]
Fix this by providing a swap_ex_entry_fixup() macro that takes care of
the 'handler' member.
[1] https://lkml.org/lkml/2016/1/27/232
Signed-off-by: Mathias Krause <[email protected]>
Fixes: 29934b0fb8f ("x86/extable: use generic search and sort routines")
Reviewed-by: Ard Biesheuvel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pull KVM fixes from Paolo Bonzini:
"Two small x86 patches, improving "make kvmconfig" and fixing an
objtool warning for CONFIG_PROFILE_ALL_BRANCHES"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvmconfig: add more virtio drivers
x86/kvm: Add stack frame dependency to fastop() inline asm
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The memory range assigned to the PMC (Power Management Controller) was
not including the PMC_PCR register which are used to control peripheral
clocks.
This was working fine thanks to the page granularity of ioremap(), but
started to fail when we switched to syscon/regmap, because regmap is
making sure that all accesses are falling into the reserved range.
Signed-off-by: Boris Brezillon <[email protected]>
Reported-by: Richard Genoud <[email protected]>
Tested-by: Richard Genoud <[email protected]>
Fixes: 863a81c3be1d ("clk: at91: make use of syscon to share PMC registers in several drivers")
Cc: <[email protected]>
Signed-off-by: Nicolas Ferre <[email protected]>
|
|
Commit 94ea9c7a579f ("lib, switch CONFIG_PRINTK_TIME to int") changes
the type of CONFIG_PRINTK_TIME and adapts all existing defconfig files,
while we add two new defconfig files with d0bc3483da87 ("arm/configs:
Add Aspeed defconfig"), with the old type.
This changes the two new defconfig files to match the other ones.
As a result, we get harmless warnings for the arm-soc defconfig
branch by itself, but everything will work when 4.7-rc1 is out.
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
IBM POWER8 NVlink systems come with Tesla K40-ish GPUs each of which
also has a couple of fast speed links (NVLink). The interface to links
is exposed as an emulated PCI bridge which is included into the same
IOMMU group as the corresponding GPU.
In the kernel, NPUs get a separate PHB of the PNV_PHB_NPU type and a PE
which behave pretty much as the standard IODA2 PHB except NPU PHB has
just a single TVE in the hardware which means it can have either
32bit window or 64bit window or DMA bypass but never two of these.
In order to make these links work when GPU is passed to the guest,
these bridges need to be passed as well; otherwise performance will
degrade.
This implements and exports API to manage NPU state in regard to VFIO;
it replicates iommu_table_group_ops.
This defines a new pnv_pci_ioda2_npu_ops which is assigned to
the IODA2 bridge if there are NPUs for a GPU on the bridge.
The new callbacks call the default IODA2 callbacks plus new NPU API.
This adds a gpe_table_group_to_npe() helper to find NPU PE for the IODA2
table_group, it is not expected to fail as the helper is only called
from the pnv_pci_ioda2_npu_ops.
This does not define NPU-specific .release_ownership() so after
VFIO is finished, DMA on NPU is disabled which is ok as the nvidia
driver sets DMA mask when probing which enable 32 or 64bit DMA on NPU.
This adds a pnv_pci_npu_setup_iommu() helper which adds NPUs to
the GPU group if any found. The helper uses helpers to look for
the "ibm,gpu" property in the device tree which is a phandle of
the corresponding GPU.
This adds an additional loop over PEs in pnv_ioda_setup_dma() as the main
loop skips NPU PEs as they do not have 32bit DMA segments.
As pnv_npu_set_window() and pnv_npu_unset_window() are started being used
by the new IODA2-NPU IOMMU group, this makes the helpers public and
adds the DMA window number parameter.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-By: Alistair Popple <[email protected]>
[mpe: Add pnv_pci_ioda_setup_iommu_api() to fix build with IOMMU_API=n]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The pnv_ioda_pe struct keeps an array of peers. At the moment it is only
used to link GPU and NPU for 2 purposes:
1. Access NPU quickly when configuring DMA for GPU - this was addressed
in the previos patch by removing use of it as DMA setup is not what
the kernel would constantly do.
2. Invalidate TCE cache for NPU when it is invalidated for GPU.
GPU and NPU are in different PE. There is already a mechanism to
attach multiple iommu_table_group to the same iommu_table (used for VFIO),
we can reuse it here so does this patch.
This gets rid of peers[] array and PNV_IODA_PE_PEER flag as they are
not needed anymore.
While we are here, add TCE cache invalidation after enabling bypass.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-By: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The upcoming NVLink passthrough support will require NPU code to cope
with two DMA windows.
This adds a pnv_npu_set_window() helper which programs 32bit window to
the hardware. This also adds multilevel TCE support.
This adds a pnv_npu_unset_window() helper which removes the DMA window
from the hardware. This does not make difference now as the caller -
pnv_npu_dma_set_bypass() - enables bypass in the hardware but the next
patch will use it to manage TCE table lists for TCE Kill handling.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-By: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This exports debugging helper pe_level_printk() and corresponding macroses
so they can be used in npu-dma.c.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-By: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
NPU devices are emulated in firmware and mainly used for NPU NVLink
training; one NPU device is per a hardware link. Their DMA/TCE setup
must match the GPU which is connected via PCIe and NVLink so any changes
to the DMA/TCE setup on the GPU PCIe device need to be propagated to
the NVLink device as this is what device drivers expect and it doesn't
make much sense to do anything else.
This makes NPU DMA setup explicit.
pnv_npu_ioda_controller_ops::pnv_npu_dma_set_mask is moved to pci-ioda,
made static and prints warning as dma_set_mask() should never be called
on this function as in any case it will not configure GPU; so we make
this explicit.
Instead of using PNV_IODA_PE_PEER and peers[] (which the next patch will
remove), we test every PCI device if there are corresponding NVLink
devices. If there are any, we propagate bypass mode to just found NPU
devices by calling the setup helper directly (which takes @bypass) and
avoid guessing (i.e. calculating from DMA mask) whether we need bypass
or not on NPU devices. Since DMA setup happens in very rare occasion,
this will not slow down booting or VFIO start/stop much.
This renames pnv_npu_disable_bypass to pnv_npu_dma_set_32 to make it
more clear what the function really does which is programming 32bit
table address to the TVT ("disabling bypass" means writing zeroes to
the TVT).
This removes pnv_npu_dma_set_bypass() from pnv_npu_ioda_fixup() as
the DMA configuration on NPU does not matter until dma_set_mask() is
called on GPU and that will do the NPU DMA configuration.
This removes phb->dma_dev_setup initialization for NPU as
pnv_pci_ioda_dma_dev_setup is no-op for it anyway.
This stops using npe->tce_bypass_base as it never changes and values
other than zero are not supported.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This uses the page size from iommu_table instead of hard-coded 4K.
This should cause no change in behavior.
While we are here, move bits around to prepare for further rework
which will define and use iommu_table_group_ops.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
NPU PHB TCE Kill register is exactly the same as in the rest of POWER8
so let's reuse the existing code for NPU. The only bit missing is
a helper to reset the entire TCE cache so this moves such a helper
from NPU code and renames it.
Since pnv_npu_tce_invalidate() does really invalidate the entire cache,
this uses pnv_pci_ioda2_tce_invalidate_entire() directly for NPU.
This adds an explicit comment for workaround for invalidating NPU TCE
cache.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This replaces magic constants for TCE Kill IODA2 register with macros.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
As in fact pnv_pci_ioda2_tce_invalidate_entire() invalidates TCEs for
the specific PE rather than the entire cache, rename it to
pnv_pci_ioda2_tce_invalidate_pe(). In later patches we will add
a proper pnv_pci_ioda2_tce_invalidate_entire().
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The function pnv_pci_reset_secondary_bus() is called like below.
It's impossible for call the function on root bus. So it's safe
to remove the root bus case in the function. No functional changes
introduced.
pci_parent_bus_reset() / pci_bus_reset() / pci_try_reset_bus()
pci_reset_bridge_secondary_bus()
pcibios_reset_secondary_bus()
pnv_pci_reset_secondary_bus()
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Daniel Axtens <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This drops unnecessary nested if statements in pnv_eeh_reset() to
improve the code readability. After the changes, the unused local
variable "ret" is dropped as well. No logical changes introduced.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In hotplug case, function pci_add_pci_devices() is called to rescan
the specified PCI bus, which might not have any child devices. Access
to the PCI bus's child device node will cause kernel crash without
exception.
This adds one more check to skip scanning PCI bus that doesn't have
any subordinate devices from device-tree, in order to avoid kernel
crash.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This renames traverse_pci_devices() to pci_traverse_device_nodes().
The function traverses all subordinate device nodes of the specified
one. Also, below cleanup applied to the function. No logical changes
introduced.
* Rename "pre" to "fn".
* Avoid assignment in if condition reported from checkpatch.pl.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This implements and exports pci_remove_device_node_info(). It's
used to remove the pdn (struct pci_dn) for the indicated device
node. The function is going to be used by PowerNV PCI hotplug
driver.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This renames update_dn_pci_info() to pci_add_device_node_info()
with corresponding adjustment on the parameter type and exports it.
The function is used to create pdn (struct pci_dn) for the indicated
device node. Another function add_pdn(), almost wrapper of
pci_add_device_node_info(), to be used in traverse_pci_devices(). No
logical changes introduced.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This moves pci_find_bus_by_node() from arch/powerpc/platforms/
pseries/pci_dlpar.c to arch/powerpc/kernel/pci-hotplug.c so that
the function can be used by pSeries and PowerNV platform at the
same time. Also, below cleanup applied. No functional changes
introduced.
* Remove variable "busdn" in find_bus_among_children()
* Use PCI_DN() to convert device node to pci_dn
Signed-off-by: Gavin Shan <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This renames pcibios_find_pci_bus() to pci_find_bus_by_node() to
avoid conflicts with those PCI subsystem weak function names, which
have prefix "pcibios". No logical changes introduced.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This renames pcibios_{add,remove}_pci_devices() to avoid conflicts
with names of the weak functions in PCI subsystem, which have the
prefix "pcibios". No logical changes introduced.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-By: Alistair Popple <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In current implementation, the PEs that are allocated or picked
from the reserved list are identified by PE number. The PE instance
has to be picked according to the PE number eventually. We have
same issue when PE is released.
For pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe(), this returns
PE instance so that pnv_ioda_setup_bus_PE() can use the allocated
or reserved PE instance directly. Also, pnv_ioda_setup_bus_PE()
returns the reserved/allocated PE instance to be used in subsequent
patches. On the other hand, pnv_ioda_free_pe() uses PE instance
(not number) as its argument. No logical changes introduced.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
In current implementation, the DMA32 segments required by one specific
PE isn't calculated with the information hold in the PE independently.
It conflicts with the PCI hotplug design: PE centralized, meaning the
PE's DMA32 segments should be calculated from the information hold in
the PE independently.
This introduces an array (@dma32_segmap) for every PHB to track the
DMA32 segmeng usage. Besides, this moves the logic calculating PE's
consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe() so that PE's
DMA32 segments are calculated/allocated from the information hold in
the PE (DMA32 weight). Also the logic is improved: we try to allocate
as much DMA32 segments as we can. It's acceptable that number of DMA32
segments less than the expected number are allocated.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
PEs are put into PHB DMA32 list (phb->ioda.pe_dma_list) according
to their DMA32 weight. The PEs on the list are iterated to setup
their TCE32 tables at system booting time. The list is used for
once at boot time and no need to keep it.
This moves the logic calculating DMA32 weight of PHB and PE to
pnv_ioda_setup_dma() to drop PHB's DMA32 list. Also, every PE
traces the consumed DMA32 segment by @tce32_seg and @tce32_segcount
are useless and they're removed.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently, there is one macro (TCE32_TABLE_SIZE) representing the
TCE table size for one DMA32 segment. The constant representing
the DMA32 segment size (1 << 28) is still used in the code.
This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32
segment size. the TCE table size can be calcualted when the page
has fixed 4KB size. So all the related calculation depends on one
macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-By: Alistair Popple <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This renames pnv_pci_ioda_setup_dma_pe() to pnv_pci_ioda1_setup_dma_pe()
as it's the counter-part of IODA2's pnv_pci_ioda2_setup_dma_pe().
No logical changes introduced.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This enables M64 window on P7IOC, which has been enabled on PHB3.
Different from PHB3 where 16 M64 BARs are supported and each of
them can be owned by one particular PE# exclusively or divided
evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
of them are divided to 8 segments. So every P7IOC PHB supports
128 M64 segments in total. P7IOC has M64DT, which helps mapping
one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
M64DT, indicating that one M64 segment can only be pinned to the
fixed PE#.
In order to unified M64 support M64 on P7IOC and PHB3, we just
provide 128 M64 segments on every P7IOC PHB and each of them is
pinned to the fixed PE# by bypassing the function of M64DT. In
turn, we just need different phb->init_m64() for P7IOC and PHB3
and maps M64 segment in pnv_ioda_reserve_m64_pe() for P7IOC, most
of the code are shared by them.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This renames those functions picking PE number based on consumed
M64 segments, mapping M64 segments to PEs as those functions are
going to be shared by IODA1/IODA2 in next patch. No logical changes
introduced.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
When unplugging PCI devices, their parent PEs might be offline.
The consumed M64 resource by the PEs should be released at that
time. As we track M32 segment consumption, this introduces an
array to the PHB to track the mapping between M64 segment and
PE number.
Note: M64 mapping isn't covered by pnv_ioda_setup_pe_seg() as
IODA2 doesn't support the mapping explicitly while it's supported
on IODA1. Until now, no M64 is supported on IODA1 in software.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently, the IO and M32 segments are mapped to the corresponding
PE based on the windows of the parent bridge of PE's primary bus.
It's not going to work when the windows of root port or upstream
port of the PCIe switch behind root port are extended to PHB's
apertures in order to support hotplug in subsequent patch.
This fixes the issue by mapping IO and M32 segments based on the
resources of the PCI devices included in the PE, instead of the
windows of the parent bridge of the PE's primary bus.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
pnv_ioda_setup_pe_seg() associates the IO and M32 segments with the
owner PE. The code mapping segments should be fixed and immune from
logic changes introduced to pnv_ioda_setup_pe_seg().
This moves the code mapping segments to helper pnv_ioda_setup_pe_res().
The data type for @rc is changed to "int64_t". Also, argument @hose is
removed from pnv_ioda_setup_pe() as it can be got from @pe. No functional
changes introduced.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-By: Alistair Popple <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
There are two arrays for IO and M32 segment maps on every PHB.
The index of the arrays are segment number and the value stored
in the corresponding element is PE number, indicating the segment
is assigned to the PE. Initially, all elements in those two arrays
are zeroes, meaning all segments are assigned to PE#0. It's wrong.
This fixes the initial values in the elements of those two arrays
to IODA_INVALID_PE, meaning all segments aren't assigned to any
PE.
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This changes the data type of PE number from "int" to "unsigned int"
in order to match the fact PE number is never negative:
* The number of PE to which the specified PCI device is attached.
* The PE number map for SRIOV VFs.
* The returned PE number from pnv_ioda_alloc_pe().
* The returned PE number from pnv_ioda2_pick_m64_pe().
Suggested-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-By: Alistair Popple <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|