Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt core and driver subsystem:
The bulk is the rework of the MSI subsystem to support per device MSI
interrupt domains. This solves conceptual problems of the current
PCI/MSI design which are in the way of providing support for
PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device.
IMS (Interrupt Message Store] is a new specification which allows
device manufactures to provide implementation defined storage for MSI
messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified
message store which is uniform accross all devices). The PCI/MSI[-X]
uniformity allowed us to get away with "global" PCI/MSI domains.
IMS not only allows to overcome the size limitations of the MSI-X
table, but also gives the device manufacturer the freedom to store the
message in arbitrary places, even in host memory which is shared with
the device.
There have been several attempts to glue this into the current MSI
code, but after lengthy discussions it turned out that there is a
fundamental design problem in the current PCI/MSI-X implementation.
This needs some historical background.
When PCI/MSI[-X] support was added around 2003, interrupt management
was completely different from what we have today in the actively
developed architectures. Interrupt management was completely
architecture specific and while there were attempts to create common
infrastructure the commonalities were rudimentary and just providing
shared data structures and interfaces so that drivers could be written
in an architecture agnostic way.
The initial PCI/MSI[-X] support obviously plugged into this model
which resulted in some basic shared infrastructure in the PCI core
code for setting up MSI descriptors, which are a pure software
construct for holding data relevant for a particular MSI interrupt,
but the actual association to Linux interrupts was completely
architecture specific. This model is still supported today to keep
museum architectures and notorious stragglers alive.
In 2013 Intel tried to add support for hot-pluggable IO/APICs to the
kernel, which was creating yet another architecture specific mechanism
and resulted in an unholy mess on top of the existing horrors of x86
interrupt handling. The x86 interrupt management code was already an
incomprehensible maze of indirections between the CPU vector
management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X]
implementation.
At roughly the same time ARM struggled with the ever growing SoC
specific extensions which were glued on top of the architected GIC
interrupt controller.
This resulted in a fundamental redesign of interrupt management and
provided the today prevailing concept of hierarchical interrupt
domains. This allowed to disentangle the interactions between x86
vector domain and interrupt remapping and also allowed ARM to handle
the zoo of SoC specific interrupt components in a sane way.
The concept of hierarchical interrupt domains aims to encapsulate the
functionality of particular IP blocks which are involved in interrupt
delivery so that they become extensible and pluggable. The X86
encapsulation looks like this:
|--- device 1
[Vector]---[Remapping]---[PCI/MSI]--|...
|--- device N
where the remapping domain is an optional component and in case that
it is not available the PCI/MSI[-X] domains have the vector domain as
their parent. This reduced the required interaction between the
domains pretty much to the initialization phase where it is obviously
required to establish the proper parent relation ship in the
components of the hierarchy.
While in most cases the model is strictly representing the chain of IP
blocks and abstracting them so they can be plugged together to form a
hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the
hardware it's clear that the actual PCI/MSI[-X] interrupt controller
is not a global entity, but strict a per PCI device entity.
Here we took a short cut on the hierarchical model and went for the
easy solution of providing "global" PCI/MSI domains which was possible
because the PCI/MSI[-X] handling is uniform across the devices. This
also allowed to keep the existing PCI/MSI[-X] infrastructure mostly
unchanged which in turn made it simple to keep the existing
architecture specific management alive.
A similar problem was created in the ARM world with support for IP
block specific message storage. Instead of going all the way to stack
a IP block specific domain on top of the generic MSI domain this ended
in a construct which provides a "global" platform MSI domain which
allows overriding the irq_write_msi_msg() callback per allocation.
In course of the lengthy discussions we identified other abuse of the
MSI infrastructure in wireless drivers, NTB etc. where support for
implementation specific message storage was just mindlessly glued into
the existing infrastructure. Some of this just works by chance on
particular platforms but will fail in hard to diagnose ways when the
driver is used on platforms where the underlying MSI interrupt
management code does not expect the creative abuse.
Another shortcoming of today's PCI/MSI-X support is the inability to
allocate or free individual vectors after the initial enablement of
MSI-X. This results in an works by chance implementation of VFIO (PCI
pass-through) where interrupts on the host side are not set up upfront
to avoid resource exhaustion. They are expanded at run-time when the
guest actually tries to use them. The way how this is implemented is
that the host disables MSI-X and then re-enables it with a larger
number of vectors again. That works by chance because most device
drivers set up all interrupts before the device actually will utilize
them. But that's not universally true because some drivers allocate a
large enough number of vectors but do not utilize them until it's
actually required, e.g. for acceleration support. But at that point
other interrupts of the device might be in active use and the MSI-X
disable/enable dance can just result in losing interrupts and
therefore hard to diagnose subtle problems.
Last but not least the "global" PCI/MSI-X domain approach prevents to
utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact
that IMS is not longer providing a uniform storage and configuration
model.
The solution to this is to implement the missing step and switch from
global PCI/MSI domains to per device PCI/MSI domains. The resulting
hierarchy then looks like this:
|--- [PCI/MSI] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
which in turn allows to provide support for multiple domains per
device:
|--- [PCI/MSI] device 1
|--- [PCI/IMS] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
|--- [PCI/IMS] device N
This work converts the MSI and PCI/MSI core and the x86 interrupt
domains to the new model, provides new interfaces for post-enable
allocation/free of MSI-X interrupts and the base framework for
PCI/IMS. PCI/IMS has been verified with the work in progress IDXD
driver.
There is work in progress to convert ARM over which will replace the
platform MSI train-wreck. The cleanup of VFIO, NTB and other creative
"solutions" are in the works as well.
Drivers:
- Updates for the LoongArch interrupt chip drivers
- Support for MTK CIRQv2
- The usual small fixes and updates all over the place"
* tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits)
irqchip/ti-sci-inta: Fix kernel doc
irqchip/gic-v2m: Mark a few functions __init
irqchip/gic-v2m: Include arm-gic-common.h
irqchip/irq-mvebu-icu: Fix works by chance pointer assignment
iommu/amd: Enable PCI/IMS
iommu/vt-d: Enable PCI/IMS
x86/apic/msi: Enable PCI/IMS
PCI/MSI: Provide pci_ims_alloc/free_irq()
PCI/MSI: Provide IMS (Interrupt Message Store) support
genirq/msi: Provide constants for PCI/IMS support
x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN
PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X
PCI/MSI: Provide prepare_desc() MSI domain op
PCI/MSI: Split MSI-X descriptor setup
genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN
genirq/msi: Provide msi_domain_alloc_irq_at()
genirq/msi: Provide msi_domain_ops:: Prepare_desc()
genirq/msi: Provide msi_desc:: Msi_data
genirq/msi: Provide struct msi_map
x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
...
|
|
- Remove unnecessary <linux/of_irq.h> includes (Bjorn Helgaas)
* pci/kbuild:
PCI: Drop of_match_ptr() to avoid unused variables
PCI: Remove unnecessary <linux/of_irq.h> includes
PCI: xgene-msi: Include <linux/irqdomain.h> explicitly
PCI: mvebu: Include <linux/irqdomain.h> explicitly
PCI: microchip: Include <linux/irqdomain.h> explicitly
PCI: altera-msi: Include <linux/irqdomain.h> explicitly
# Conflicts:
# drivers/pci/controller/pci-mvebu.c
|
|
- Fix whitespace issues (Michal Simek)
* pci/ctrl/xilinx:
PCI: xilinx-nwl: Fix coding style violations
|
|
- Switch to the gpiod API so we can make of_get_named_gpio_flags() private
(Dmitry Torokhov)
* pci/ctrl/mvebu:
PCI: mvebu: Switch to using gpiod API
|
|
- Switch to using devm_gpiod_get_optional() so we can stop exporting
devm_gpiod_get_from_of_node() (Dmitry Torokhov)
* pci/ctrl/aardvark:
PCI: aardvark: Switch to using devm_gpiod_get_optional()
|
|
- Register notifier if core_init_notifier is enabled in pci-epf-test
(Kunihiko Hayashi)
- Fixup Kconfig indentation (Shunsuke Mie)
* remotes/lorenzo/pci/misc:
PCI: endpoint: Fix Kconfig indent style
PCI: pci-epf-test: Register notifier if only core_init_notifier is enabled
|
|
- Restore MSI remapping configuration during resume because the
configuration is cleared out by firmware when suspending (Nirmal Patel)
- Reset the hierarchy below VMD when probing the VMD; we attempted this
before, but with the wrong device, so it didn't work (Francisco Munoz)
* remotes/lorenzo/pci/vmd:
PCI: vmd: Fix secondary bus reset for Intel bridges
PCI: vmd: Disable MSI remapping after suspend
|
|
- Switch from devm_gpiod_get_from_of_node() to devm_fwnode_gpiod_get()
(Dmitry Torokhov)
* remotes/lorenzo/pci/tegra:
PCI: tegra: Switch to using devm_fwnode_gpiod_get
|
|
- Add DT and driver support for SC8280XP/SA8540P basic interconnects where
interconnect bandwidth must be requested before enabling interconnect
clocks (Johan Hovold)
- Add 'dma-coherent' property (Johan Hovold)
* remotes/lorenzo/pci/qcom:
dt-bindings: PCI: qcom: Allow 'dma-coherent' property
PCI: qcom: Add basic interconnect support
dt-bindings: PCI: qcom: Add SC8280XP/SA8540P interconnects
|
|
- Add sentinel to mt7621_pcie_quirks_match[] to prevent oops when parsing
the table (John Thomson)
* remotes/lorenzo/pci/mt7621:
PCI: mt7621: Add sentinel to quirks table
|
|
- Add a .release() callback for the Endpoint Controller library so an
Endpoint driver is removable (Yoshihiro Shimoda)
- Fix pci-epf-vntb kernel-doc and whitespace (Frank Li)
- Fix pci-epf-vntb error path usage of pci_epc_mem_free_addr() (Frank Li)
- Remove pci-epf-vntb unused epf_db_phy (Frank Li)
- Fix pci-epf-vntb sparse warnings (Frank Li)
* remotes/lorenzo/pci/endpoint:
PCI: endpoint: pci-epf-vntb: Fix sparse ntb->reg build warning
PCI: endpoint: pci-epf-vntb: Fix sparse build warning for epf_db
PCI: endpoint: pci-epf-vntb: Replace hardcoded 4 with sizeof(u32)
PCI: endpoint: pci-epf-vntb: Remove unused epf_db_phy struct member
PCI: endpoint: pci-epf-vntb: Fix call pci_epc_mem_free_addr() in error path
PCI: endpoint: pci-epf-vntb: Fix struct epf_ntb_ctrl indentation
PCI: endpoint: pci-epf-vntb: Clean up kernel_doc warning
PCI: endpoint: Fix WARN() when an endpoint driver is removed
|
|
- Fix n_fts[] array overrun (Vidya Sagar)
- Don't advertise PTM Responder role for Endpoints (Vidya Sagar)
- Fix qcom "reset assert" error message (Manivannan Sadhasivam)
- Downgrade "link didn't come up" message to dev_info (Vidya Sagar)
- Initialize PHY before deasserting core reset so the link comes up on
boards where the PHY provides the reference clock (this was a regression
in v6.0) (Sascha Hauer)
- Switch histb to the gpiod API (Dmitry Torokhov)
- Fix imx6sx and imx8mq clock names in DT binding (Serge Semin)
- Fix visconti MSI interrupt in DT binding (Serge Semin)
- Consolidate reset-gpio, cdm, windows info in common DT shared by both
Root Port and Endpoint bindings (Serge Semin)
- Remove bus node from DT examples (Serge Semin)
- Add common phys, phy-names to DT (Serge Semin)
- Add default max-link-speed of Gen5 to DT (Serge Semin)
- Apply generic schema for generic device (Serge Semin)
- Add default max-functions of 32 to DT (Serge Semin)
- Add common interrupts, interrupt-names to DT (Serge Semin)
- Add common regs, reg-names to DT (Serge Semin)
- Add common clocks, resets to DT (Serge Semin)
- Add dma-coherent to DT (Serge Semin)
- Apply common schema to Rockchip DT (Serge Semin)
- Add Baikal-T1 DT bindings (Serge Semin)
- Add dma-ranges support in DesignWare core (Serge Semin)
- Add dw_pcie_cap_is() for testing controller capabilities (Serge Semin)
- Add generic resources getter to DesignWare core (Serge Semin)
- Combine iATU detection procedures (Serge Semin)
- Add generic clock and reset names to DesignWare core (Serge Semin)
- Add Baikal-T1 PCIe controller driver (Serge Semin)
* remotes/lorenzo/pci/dwc:
PCI: dwc: Add Baikal-T1 PCIe controller support
PCI: dwc: Introduce generic platform clocks and resets
PCI: dwc: Combine iATU detection procedures
PCI: dwc: Introduce generic resources getter
PCI: dwc: Introduce generic controller capabilities interface
PCI: dwc: Introduce dma-ranges property support for RC-host
dt-bindings: PCI: dwc: Add Baikal-T1 PCIe Root Port bindings
dt-bindings: PCI: dwc: Apply common schema to Rockchip DW PCIe nodes
dt-bindings: PCI: dwc: Add dma-coherent property
dt-bindings: PCI: dwc: Add clocks/resets common properties
dt-bindings: PCI: dwc: Add reg/reg-names common properties
dt-bindings: PCI: dwc: Add interrupts/interrupt-names common properties
dt-bindings: PCI: dwc: Add max-functions EP property
dt-bindings: PCI: dwc: Apply generic schema for generic device only
dt-bindings: PCI: dwc: Add max-link-speed common property
dt-bindings: PCI: dwc: Add phys/phy-names common properties
dt-bindings: PCI: dwc: Remove bus node from the examples
dt-bindings: PCI: dwc: Detach common RP/EP DT bindings
dt-bindings: visconti-pcie: Fix interrupts array max constraints
dt-bindings: imx6q-pcie: Fix clock names for imx6sx and imx8mq
PCI: histb: Switch to using gpiod API
PCI: imx6: Initialize PHY before deasserting core reset
PCI: dwc: Use dev_info for PCIe link down event logging
PCI: qcom: Fix error message for reset_control_assert()
PCI: designware-ep: Disable PTM capabilities for EP mode
PCI: Add PCI_PTM_CAP_RES macro
PCI: dwc: Fix n_fts[] array overrun
|
|
- Enable Multi-MSI (Jim Quinlan)
- Wait for 100ms after PERST# deassert for power and clocks to stabilize
(Jim Quinlan)
- Use readl_poll_timeout_atomic() instead of hand-rolled timeout loop (Jim
Quinlan)
- Drop needless "inline" annotations (Jim Quinlan)
- Set RCB_MPS mode bit so data for reads up to MPS are returned in a single
completion (Jim Quinlan)
* remotes/lorenzo/pci/brcmstb:
PCI: brcmstb: Set RCB_{MPS,64B}_MODE bits
PCI: brcmstb: Drop needless 'inline' annotations
PCI: brcmstb: Replace status loops with read_poll_timeout_atomic()
PCI: brcmstb: Wait for 100ms following PERST# deassert
PCI: brcmstb: Enable Multi-MSI
|
|
- Fix a double free in the error path of creating sysfs "resource%d"
attributes (Sascha Hauer)
* pci/sysfs:
PCI/sysfs: Fix double free in error path
|
|
- Remove EfiMemoryMappedIO regions from the E820 map to allow PCI core to
allocate BARs from them. The only purpose of EfiMemoryMappedIO is to
tell the OS to map things needed by EFI runtime services, so it's often
used for PCI host bridge apertures. If we can't allocate from those
apertures, we can't hot-add devices (Bjorn Helgaas)
* pci/resource:
x86/PCI: Use pr_info() when possible
x86/PCI: Fix log message typo
x86/PCI: Tidy E820 removal messages
PCI: Skip allocate_resource() if too little space available
efi/x86: Remove EfiMemoryMappedIO from E820 map
|
|
- Squash portdrv_core.c and portdrv_pci.c into portdrv.c to make it easier
to find things (Bjorn Helgaas)
- Allow AER service only for Root Ports & RCECs so portdrv can successfully
bind to other devices that have AER but lack MSI (which they don't need
for AER), which allows power management for those devices (Bjorn Helgaas)
* pci/portdrv:
PCI/portdrv: Allow AER service only for Root Ports & RCECs
PCI/portdrv: Unexport pcie_port_service_register(), pcie_port_service_unregister()
PCI/portdrv: Move private things to portdrv.c
PCI/portdrv: Squash into portdrv.c
|
|
- Remove unused 'state' parameter to pci_legacy_suspend_late() (Bjorn
Helgaas)
* pci/pm:
PCI/PM: Remove unused 'state' parameter to pci_legacy_suspend_late()
|
|
- Use METHOD_NAME__UID instead of plain string to make it easier to find
all uses (Yipeng Zou)
* pci/misc:
PCI/ACPI: Use METHOD_NAME__UID instead of plain string
|
|
- Enable pciehp by default if USB4 is enabled because USB4/Thunderbolt
tunneling depends on native PCIe hotplug (Albert Zhou)
- Make sure pciehp binds only to Downstream Ports, not Upstream Ports
(Rafael J. Wysocki)
- Remove unused get_mode1_ECC_cap callback in shpchp (Ian Cowan)
- Enable pciehp Command Completed Interrupt only if supported to reduce
confusion when looking at lspci output (Pali Rohár)
* pci/hotplug:
PCI: pciehp: Enable Command Completed Interrupt only if supported
PCI: shpchp: Remove unused get_mode1_ECC_cap callback
PCI: acpiphp: Avoid setting is_hotplug_bridge for PCIe Upstream Ports
PCI/portdrv: Set PCIE_PORT_SERVICE_HP for Root and Downstream Ports only
PCI: pciehp: Enable by default if USB4 enabled
|
|
- Only read/write PCIe Link 2 registers for devices with Links and PCIe
Capability version >= 2 (Maciej W. Rozycki)
- Revert a patch that cleared PCI_STATUS during enumeration because it
broke Linux guests on Apple's virtualization framework (Bjorn Helgaas)
- Assign PCI domain IDs using IDAs so IDs can be easily reused after
loading/unloading host bridge drivers (Pali Rohár)
- Fix pci_device_is_present(), which previously always returned "false" for
VFs because their vendor ID is always 0xfff (Michael S. Tsirkin)
- Check for alloc failure in pci_request_irq() (Zeng Heng)
* pci/enumeration:
PCI: Check for alloc failure in pci_request_irq()
PCI: Fix pci_device_is_present() for VFs by checking PF
PCI: Assign PCI domain IDs by ida_alloc()
Revert "PCI: Clear PCI_STATUS when setting up device"
PCI: Access Link 2 registers only for devices with Links
|
|
pci_bus_alloc_from_region() allocates MMIO space by iterating through all
the resources available on the bus. The available resource might be
reduced if the caller requires 32-bit space or we're avoiding BIOS or E820
areas.
Don't bother calling allocate_resource() if we need more space than is
available in this resource. This prevents some pointless and annoying
messages about avoided areas.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Hans de Goede <[email protected]>
|
|
Previously portdrv allowed the AER service for any device with an AER
capability (assuming Linux had control of AER) even though the AER service
driver only attaches to Root Port and RCECs.
Because get_port_device_capability() included AER for non-RP, non-RCEC
devices, we tried to initialize the AER IRQ even though these devices
don't generate AER interrupts.
Intel DG1 and DG2 discrete graphics cards contain a switch leading to a
GPU. The switch supports AER but not MSI, so initializing an AER IRQ
failed, and portdrv failed to claim the switch port at all. The GPU itself
could be suspended, but the switch could not be put in a low-power state
because it had no driver.
Don't allow the AER service on non-Root Port, non-Root Complex Event
Collector devices. This means we won't enable Bus Mastering if the device
doesn't require MSI, the AER service will not appear in sysfs, and the AER
service driver will not bind to the device.
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Based-on-patch-by: Mika Westerberg <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
|
|
Fix code alignments and remove additional newline.
Link: https://lore.kernel.org/r/17c75e7003bb8c43a0f45ae3d7c45cac230ef852.1670503129.git.michal.simek@amd.com
Signed-off-by: Michal Simek <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Switch the driver away from legacy gpio/of_gpio API to gpiod API, and
remove use of of_get_named_gpio_flags() which I want to make private to
gpiolib.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
The No Command Completed Support bit in the Slot Capabilities register
indicates whether Command Completed Interrupt Enable is unsupported.
We already check whether No Command Completed Support bit is set in
pcie_wait_cmd(), and do not wait in this case.
Don't enable this Command Completed Interrupt at all if NCCS is set, so
that when users dump configuration space from userspace, the dump does not
confuse them by saying that Command Completed Interrupt is not supported,
but it is enabled.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Pali Rohár <[email protected]>
Signed-off-by: Marek Behún <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Lukas Wunner <[email protected]>
|
|
Switch the driver to the generic version of gpiod API (and away from
OF-specific variant), so that we can stop exporting
devm_gpiod_get_from_of_node().
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Acked-by: Pali Rohár <[email protected]>
|
|
Current driver is missing a sentinel in the struct soc_device_attribute
array, which causes an oops when assessed by the
soc_device_match(mt7621_pcie_quirks_match) call.
This was only exposed once the CONFIG_SOC_MT7621 mt7621 soc_dev_attr
was fixed to register the SOC as a device, in:
commit 7c18b64bba3b ("mips: ralink: mt7621: do not use kzalloc too early")
Fix it by adding the required sentinel.
Link: https://lore.kernel.org/lkml/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Fixes: b483b4e4d3f6 ("staging: mt7621-pci: add quirks for 'E2' revision using 'soc_device_attribute'")
Signed-off-by: John Thomson <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Acked-by: Sergio Paracuellos <[email protected]>
|
|
The reset was never applied in the current implementation because Intel
Bridges owned by VMD are parentless. Internally, pci_reset_bus() applies
a reset to the parent of the PCI device supplied as argument, but in this
case it failed because there wasn't a parent.
In more detail, this change allows the VMD driver to enumerate NVMe devices
in pass-through configurations when guest reboots are performed. There was
an attempted to fix this, but later we discovered that the code inside
pci_reset_bus() wasn’t triggering secondary bus resets. Therefore, we
updated the parameters passed to it, and now NVMe SSDs attached to VMD
bridges are properly enumerated in VT-d pass-through scenarios.
Link: https://lore.kernel.org/r/[email protected]
Fixes: 6aab5622296b ("PCI: vmd: Clean up domain before enumeration")
Signed-off-by: Francisco Munoz <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Reviewed-by: Nirmal Patel <[email protected]>
Reviewed-by: Jonathan Derrick <[email protected]>
|
|
Single vector allocation which allocates the next free index in the IMS
space. The free function releases.
All allocated vectors are released also via pci_free_vectors() which is
also releasing MSI/MSI-X vectors.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
IMS (Interrupt Message Store) is a new specification which allows
implementation specific storage of MSI messages contrary to the
strict standard specified MSI and MSI-X message stores.
This requires new device specific interrupt domains to handle the
implementation defined storage which can be an array in device memory or
host/guest memory which is shared with hardware queues.
Add a function to create IMS domains for PCI devices. IMS domains are using
the new per device domain mechanism and are configured by the device driver
via a template. IMS domains are created as secondary device domains so they
work side on side with MSI[-X] on the same device.
The IMS domains have a few constraints:
- The index space is managed by the core code.
Device memory based IMS provides a storage array with a fixed size
which obviously requires an index. But there is no association between
index and functionality so the core can randomly allocate an index in
the array.
System memory based IMS does not have the concept of an index as the
storage is somewhere in memory. In that case the index is purely
software based to keep track of the allocations.
- There is no requirement for consecutive index ranges
This is currently a limitation of the MSI core and can be implemented
if there is a justified use case by changing the internal storage from
xarray to maple_tree. For now it's single vector allocation.
- The interrupt chip must provide the following callbacks:
- irq_mask()
- irq_unmask()
- irq_write_msi_msg()
- The interrupt chip must provide the following optional callbacks
when the irq_mask(), irq_unmask() and irq_write_msi_msg() callbacks
cannot operate directly on hardware, e.g. in the case that the
interrupt message store is in queue memory:
- irq_bus_lock()
- irq_bus_unlock()
These callbacks are invoked from preemptible task context and are
allowed to sleep. In this case the mandatory callbacks above just
store the information. The irq_bus_unlock() callback is supposed to
make the change effective before returning.
- Interrupt affinity setting is handled by the underlying parent
interrupt domain and communicated to the IMS domain via
irq_write_msi_msg(). IMS domains cannot have a irq_set_affinity()
callback. That's a reasonable restriction similar to the PCI/MSI
device domain implementations.
The domain is automatically destroyed when the PCI device is removed.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
MSI-X vectors can be allocated after the initial MSI-X enablement, but this
needs explicit support of the underlying interrupt domains.
Provide a function to query the ability and functions to allocate/free
individual vectors post-enable.
The allocation can either request a specific index in the MSI-X table or
with the index argument MSI_ANY_INDEX it allocates the next free vector.
The return value is a struct msi_map which on success contains both index
and the Linux interrupt number. In case of failure index is negative and
the Linux interrupt number is 0.
The allocation function is for a single MSI-X index at a time as that's
sufficient for the most urgent use case VFIO to get rid of the 'disable
MSI-X, reallocate, enable-MSI-X' cycle which is prone to lost interrupts
and redirections to the legacy and obviously unhandled INTx.
As single index allocation is also sufficient for the use cases Jason
Gunthorpe pointed out: Allocation of a MSI-X or IMS vector for a network
queue. See Link below.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
The setup of MSI descriptors for PCI/MSI-X interrupts depends partially on
the MSI index for which the descriptor is initialized.
Dynamic MSI-X vector allocation post MSI-X enablement allows to allocate
vectors at a given index or at any free index in the available table
range. The latter requires that the descriptor is initialized after the
MSI core has chosen an index.
Implement the prepare_desc() op in the PCI/MSI-X specific msi_domain_ops
which is invoked before the core interrupt descriptor and the associated
Linux interrupt number is allocated.
That callback is also provided for the upcoming PCI/IMS implementations so
the implementation specific interrupt domain can do their domain specific
initialization of the MSI descriptors.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The upcoming mechanism to allocate MSI-X vectors after enabling MSI-X needs
to share some of the MSI-X descriptor setup.
The regular descriptor setup on enable has the following code flow:
1) Allocate descriptor
2) Setup descriptor with PCI specific data
3) Insert descriptor
4) Allocate interrupts which in turn scans the inserted
descriptors
This cannot be easily changed because the PCI/MSI code needs to handle the
legacy architecture specific allocation model and the irq domain model
where quite some domains have the assumption that the above flow is how it
works.
Ideally the code flow should look like this:
1) Invoke allocation at the MSI core
2) MSI core allocates descriptor
3) MSI core calls back into the irq domain which fills in
the domain specific parts
This could be done for underlying parent MSI domains which support
post-enable allocation/free but that would create significantly different
code pathes for MSI/MSI-X enable.
Though for dynamic allocation which wants to share the allocation code with
the upcoming PCI/IMS support it's the right thing to do.
Split the MSI-X descriptor setup into the preallocation part which just sets
the index and fills in the horrible hack of virtual IRQs and the real PCI
specific MSI-X setup part which solely depends on the index in the
descriptor. This allows to provide a common dynamic allocation interface at
the MSI core level for both PCI/MSI-X and PCI/IMS.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The check for special MSI domains like VMD which prevents the interrupt
remapping code to overwrite device::msi::domain is not longer required and
has been replaced by an x86 specific version which is aware of MSI parent
domains.
Remove it.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Provide a template and the necessary callbacks to create PCI/MSI and
PCI/MSI-X domains.
The domains are created when MSI or MSI-X is enabled. The domain's lifetime
is either the device lifetime or in case that e.g. MSI-X was tried first
and failed, then the MSI-X domain is removed and a MSI domain is created as
both are mutually exclusive and reside in the default domain ID slot of the
per device domain pointer array.
Also expand pci_msi_domain_supports() to handle feature checks correctly
even in the case that the per device domain was not yet created by checking
the features supported by the MSI parent.
Add the necessary setup calls into the MSI and MSI-X enable code path.
These setup calls are backwards compatible. They return success when there
is no parent domain found, which means the existing global domains or the
legacy allocation path keep just working.
Co-developed-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The upcoming per device MSI domains will create different domains for MSI
and MSI-X. Split the write message function into MSI and MSI-X helpers so
they can be used by those new domain functions seperately.
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Pick up CXL AER handling and correctable error extensions. Resolve
conflicts with cxl_pmem_wq reworks and RCH support.
|
|
Switch to the new domain id aware interfaces to phase out the previous
ones. No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This reflects the functionality better. No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Use bullet-list RST syntax for kernel-doc parameters' flags and interrupt
mode descriptions. Otherwise Sphinx produces "Unexpected identation" errors
and warnings.
Fixes: 5c0997dc33ac24 ("PCI/MSI: Move pci_alloc_irq_vectors() to api.c")
Fixes: 017239c8db2093 ("PCI/MSI: Move pci_irq_vector() to api.c")
Fixes: be37b8428b7b77 ("PCI/MSI: Move pci_irq_get_affinity() to api.c")
Reported-by: Stephen Rothwell <[email protected]>
Suggested-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Bagas Sanjaya <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Ahmed S. Darwish <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Some new devices such as CXL devices may want to record additional error
information on a corrected error. Add a callback to allow the PCI device
driver to do additional logging such as providing additional stats for user
space RAS monitoring.
For CXL device, this is actually a need due to CXL needing to write to the
CXL RAS capability structure correctable error status register in order to
clear the unmasked correctable errors. See CXL spec rev3.0 8.2.4.16.
Suggested-by: Jonathan Cameron <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/166984619233.2804404.3966368388544312674.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <[email protected]>
|
|
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version.
The rc fix was done a different way when iommufd patches reworked this
code.
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The function hv_set_affinity was removed in commit 831c1ae7 ("PCI: hv:
Make the code arch neutral by adding arch specific interfaces").
Signed-off-by: Olaf Hering <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
pci-epf-vntb.c:1128:33: sparse: expected void [noderef] __iomem *base
pci-epf-vntb.c:1128:33: sparse: got struct epf_ntb_ctrl *reg
Add __iomem type cast in vntb_epf_peer_spad_read() and
vntb_epf_peer_spad_write().
Link: https://lore.kernel.org/r/[email protected]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Frank Li <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Acked-by: Manivannan Sadhasivam <[email protected]>
|
|
Use epf_db[i] dereference instead of readl() because epf_db is
in memory allocated by dma_alloc_coherent(), not I/O.
Remove useless/duplicated readl() in the process.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Frank Li <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
NTB spad entry item size is sizeof(u32), replace hardcoded 4 with it.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Frank Li <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Acked-by: Manivannan Sadhasivam <[email protected]>
|
|
epf_db_phy member in struct epf_ntb is not used, remove it.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Frank Li <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Acked-by: Manivannan Sadhasivam <[email protected]>
|
|
Replace pci_epc_mem_free_addr() with pci_epf_free_space() in the
error handle path to match pci_epf_alloc_space().
Link: https://lore.kernel.org/r/[email protected]
Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Signed-off-by: Frank Li <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
Align the indentation of struct epf_ntb_ctrl with other structs in
the driver.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Frank Li <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
Cleanup warning found by scripts/kernel-doc.
Consolidate terms:
- host, host1 to HOST
- vhost, vHost, Vhost, VHOST2 to VHOST
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Frank Li <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Acked-by: Manivannan Sadhasivam <[email protected]>
|