Age | Commit message (Collapse) | Author | Files | Lines |
|
- Select CONFIG_REGMAP_MMIO so kirin driver links correctly (Josh Triplett)
* pci/controller/kirin:
PCI: kirin: Select REGMAP_MMIO
|
|
- Use the PCI_CONF1_ADDRESS() macro to simplify config space address
computation (Pali Rohár)
* pci/controller/ixp4xx:
PCI: ixp4xx: Use PCI_CONF1_ADDRESS() macro
|
|
- Install i.MX6 PCI abort handler only when DT contains a PCI controller
claimed by the imx6 driver (H. Nikolaus Schaller)
* pci/controller/dwc:
PCI: imx6: Install the fault handler only on compatible match
|
|
- Add pci_dev_for_each_resource() and pci_bus_for_each_resource() iterators
to simplify loops (Andy Shevchenko)
* pci/resource:
EISA: Drop unused pci_bus_for_each_resource() index argument
PCI: Make pci_bus_for_each_resource() index optional
PCI: Document pci_bus_for_each_resource()
PCI: Introduce pci_dev_for_each_resource()
PCI: Introduce pci_resource_n()
|
|
- Wait longer for devices to become ready after resume (as we do for reset)
to accommodate Intel Titan Ridge xHCI devices (Mika Westerberg)
- Drop pci_bridge_wait_for_secondary_bus() timeout parameter since all
callers pass the same value (Mika Westerberg)
- Extend D3hot delay for NVIDIA HDA controllers to avoid unrecoverable
devices after a bus reset (Alex Williamson)
* pci/reset:
PCI/PM: Extend D3hot delay for NVIDIA HDA controllers
PCI/PM: Drop pci_bridge_wait_for_secondary_bus() timeout parameter
PCI/PM: Increase wait time after resume
|
|
- Fix pci_p2pmem_find_many() kernel-doc (Cai Huoqing)
* pci/p2pdma:
PCI/P2PDMA: Fix pci_p2pmem_find_many() kernel-doc
|
|
- Fix pciehp AB-BA deadlock between reset_lock and device_lock (Lukas
Wunner)
* pci/hotplug:
PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock
|
|
- Use of_property_present(), instead of lower-level functions like
of_get_property(), for testing DT property presence (Rob Herring)
* pci/enumeration:
PCI: Use of_property_present() for testing DT property presence
|
|
Commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled status")
checked the firmware device status for both DT and ACPI devices. That
caused a regression in some ACPI systems. The exact reason isn't clear.
It's possibly a firmware bug. For now, at least, refactor the check to
be for DT based systems only.
Note that the original implementation leaked a refcount which is now
correctly handled.
[bhelgaas: Per ACPI r6.5, sec 6.3.7, for devices on an enumerable bus, _STA
must return with bit[0] ("device is present") set]
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status")
Link: https://lore.kernel.org/r/[email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217317
Reported-by: Donald Hunter <[email protected]>
Reported-by: Vitaly Kuznetsov <[email protected]>
Tested-by: Donald Hunter <[email protected]>
Tested-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Binbin Zhou <[email protected]>
Cc: Liu Peibao <[email protected]>
Cc: Huacai Chen <[email protected]>
|
|
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property()/of_find_property() functions for reading properties. As
part of this, convert of_get_property()/of_find_property() calls to the
recently added of_property_present() helper when we just want to test for
presence of a property and nothing more.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]> # pcie-mediatek
|
|
An upcoming user of DOE is CMA (Component Measurement and Authentication,
PCIe r6.0 sec 6.31).
It builds on SPDM (Security Protocol and Data Model):
https://www.dmtf.org/dsp/DSP0274
SPDM message sizes are not always a multiple of dwords. To transport
them over DOE without using bounce buffers, allow sending requests and
receiving responses whose final dword is only partially populated.
To be clear, PCIe r6.0 sec 6.30.1 specifies the Data Object Header 2
"Length" in dwords and pci_doe_send_req() and pci_doe_recv_resp()
read/write dwords. So from a spec point of view, DOE is still specified
in dwords and allowing non-dword request/response buffers is merely for
the convenience of callers.
Tested-by: Ira Weiny <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Ming Li <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/151b1a6a1794afb65d941287ecbc032c5b8004b9.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <[email protected]>
|
|
The PCI core has just been amended to create a pci_doe_mb struct for
every DOE instance on device enumeration. CXL (the only in-tree DOE
user so far) has been migrated to use those mailboxes instead of
creating its own.
That leaves pcim_doe_create_mb() and pci_doe_for_each_off() without any
callers, so drop them.
pci_doe_supports_prot() is now only used internally, so declare it
static.
pci_doe_destroy_mb() is no longer used as callback for
devm_add_action(), so refactor it to accept a struct pci_doe_mb pointer
instead of a generic void pointer.
Because pci_doe_create_mb() is only called on device enumeration, i.e.
before driver binding, the workqueue name never contains a driver name.
So replace dev_driver_string() with dev_bus_name() when generating the
workqueue name.
Tested-by: Ira Weiny <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Ming Li <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/64f614b6584982986c55d2c6229b4ee2b276dd59.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <[email protected]>
|
|
Currently a DOE instance cannot be shared by multiple drivers because
each driver creates its own pci_doe_mb struct for a given DOE instance.
For the same reason a DOE instance cannot be shared between the PCI core
and a driver.
Moreover, finding out which protocols a DOE instance supports requires
creating a pci_doe_mb for it. If a device has multiple DOE instances,
a driver looking for a specific protocol may need to create a pci_doe_mb
for each of the device's DOE instances and then destroy those which
do not support the desired protocol. That's obviously an inefficient
way to do things.
Overcome these issues by creating mailboxes in the PCI core on device
enumeration.
Provide a pci_find_doe_mailbox() API call to allow drivers to get a
pci_doe_mb for a given (pci_dev, vendor, protocol) triple. This API is
modeled after pci_find_capability() and can later be amended with a
pci_find_next_doe_mailbox() call to iterate over all mailboxes of a
given pci_dev which support a specific protocol.
On removal, destroy the mailboxes in pci_destroy_dev(), after the driver
is unbound. This allows drivers to use DOE in their ->remove() hook.
On surprise removal, cancel ongoing DOE exchanges and prevent new ones
from being scheduled. Thereby ensure that a hot-removed device doesn't
needlessly wait for a running exchange to time out.
Tested-by: Ira Weiny <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Ming Li <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/40a6f973f72ef283d79dd55e7e6fddc7481199af.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <[email protected]>
|
|
DOE mailbox creation is currently only possible through a devres-managed
API. The lifetime of mailboxes thus ends with driver unbinding.
An upcoming commit will create DOE mailboxes upon device enumeration by
the PCI core. Their lifetime shall not be limited by a driver.
Therefore rework pcim_doe_create_mb() into the non-devres-managed
pci_doe_create_mb(). Add pci_doe_destroy_mb() for mailbox destruction
on device removal.
Provide a devres-managed wrapper under the existing pcim_doe_create_mb()
name.
The error path of pcim_doe_create_mb() previously called xa_destroy() if
alloc_ordered_workqueue() failed. That's unnecessary because the xarray
is still empty at that point. It doesn't need to be destroyed until
it's been populated by pci_doe_cache_protocols(). Arrange the error
path of the new pci_doe_create_mb() accordingly.
pci_doe_cancel_tasks() is no longer used as callback for
devm_add_action(), so refactor it to accept a struct pci_doe_mb pointer
instead of a generic void pointer.
Tested-by: Ira Weiny <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Ming Li <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/7c9a63867d70233c5e9d26cd8bf956742cd6d650.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <[email protected]>
|
|
When a DOE mailbox is torn down, its workqueue is flushed once in
pci_doe_flush_mb() through a call to flush_workqueue() and subsequently
flushed once more in pci_doe_destroy_workqueue() through a call to
destroy_workqueue().
Deduplicate by dropping flush_workqueue() from pci_doe_flush_mb().
Rename pci_doe_flush_mb() to pci_doe_cancel_tasks() to more aptly
describe what it now does.
Tested-by: Ira Weiny <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Ming Li <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/1f009f60b326d1c6d776641d4b20aff27de0c234.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <[email protected]>
|
|
A synchronous API for DOE has just been introduced. CXL (the only
in-tree DOE user so far) was converted to use it instead of the
asynchronous API.
Consequently, pci_doe_submit_task() as well as the pci_doe_task struct
are only used internally, so make them private.
Tested-by: Ira Weiny <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Ming Li <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/cc19544068483681e91dfe27545c2180cd09f931.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <[email protected]>
|
|
The DOE API only allows asynchronous exchanges and forces callers to
provide a completion callback. Yet all existing callers only perform
synchronous exchanges. Upcoming commits for CMA (Component Measurement
and Authentication, PCIe r6.0 sec 6.31) likewise require only
synchronous DOE exchanges.
Provide a synchronous pci_doe() API call which builds on the internal
asynchronous machinery.
Convert the internal pci_doe_discovery() to the new call.
The new API allows submission of const-declared requests, necessitating
the addition of a const qualifier in struct pci_doe_task.
Tested-by: Ira Weiny <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Ming Li <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/0f444206da9615c56301fbaff459c0f45d27f122.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <[email protected]>
|
|
Assignment of NVIDIA Ampere-based GPUs have seen a regression since the
below referenced commit, where the reduced D3hot transition delay appears
to introduce a small window where a D3hot->D0 transition followed by a bus
reset can wedge the device. The entire device is subsequently unavailable,
returning -1 on config space read and is unrecoverable without a host
reset.
This has been observed with RTX A2000 and A5000 GPU and audio functions
assigned to a Windows VM, where shutdown of the VM places the devices in
D3hot prior to vfio-pci performing a bus reset when userspace releases the
devices. The issue has roughly a 2-3% chance of occurring per shutdown.
Restoring the HDA controller d3hot_delay to the effective value before the
below commit has been shown to resolve the issue. NVIDIA confirms this
change should be safe for all of their HDA controllers.
Fixes: 3e347969a577 ("PCI/PM: Reduce D3hot delay with usleep_range()")
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Zhiyi Guo <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Tarun Gupta <[email protected]>
Cc: Abhishek Sahu <[email protected]>
Cc: Tarun Gupta <[email protected]>
|
|
For PCI pass-thru devices in a Confidential VM, Hyper-V requires
that PCI config space be accessed via hypercalls. In normal VMs,
config space accesses are trapped to the Hyper-V host and emulated.
But in a confidential VM, the host can't access guest memory to
decode the instruction for emulation, so an explicit hypercall must
be used.
Add functions to make the new MMIO read and MMIO write hypercalls.
Update the PCI config space access functions to use the hypercalls
when such use is indicated by Hyper-V flags. Also, set the flag to
allow the Hyper-V PCI driver to be loaded and used in a Confidential
VM (a.k.a., "Isolation VM"). The driver has previously been hardened
against a malicious Hyper-V host[1].
[1] https://lore.kernel.org/all/[email protected]/
Co-developed-by: Dexuan Cui <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Reviewed-by: Haiyang Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
|
|
pci_msix_validate_entries() validates the entries array which is handed in
by the caller for a MSI-X interrupt allocation. Aside of consistency
failures it also detects a failure when the size of the MSI-X hardware table
in the device is smaller than the size of the entries array.
That's wrong for the case of range allocations where the caller provides
the minimum and the maximum number of vectors to allocate, when the
hardware size is greater or equal than the mininum, but smaller than the
maximum.
Remove the hardware size check completely from that function and just
ensure that the entires array up to the maximum size is consistent.
The limitation and range checking versus the hardware size happens
independently of that afterwards anyway because the entries array is
optional.
Fixes: 4644d22eb673 ("PCI/MSI: Validate MSI-X contiguous restriction early")
Reported-by: David Laight <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/87v8i3sg62.ffs@tglx
|
|
SM8550 requires two additional clocks for proper working.
Add these two clocks as optional clocks (as only required by this
platform) and compatible for this platform.
While at it, let's also rename the reset variable to "rst" from
"pci_reset" to match the existing naming preference.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Abel Vesa <[email protected]>
[[email protected]: commit log rewording]
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Reviewed-by: Konrad Dybcio <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Reviewed-by: Johan Hovold <[email protected]>
|
|
Add support for SDX55 SoC reusing the 1.9.0 config. The PCIe controller is
of version 1.10.0 but it is compatible with the 1.9.0 config. This SoC also
requires "sleep" clock which is added as an optional clock in the driver,
since it is not required on other SoCs.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
Qcom PCIe RC driver waits for the PHY link to be up during the probe;
this consumes several milliseconds during boot.
Enable async probe by default so that other drivers can load in parallel
while this driver waits for the link to be up.
Suggested-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Johan Hovold <[email protected]>
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Reviewed-by: Johan Hovold <[email protected]>
|
|
During the system suspend, vote for minimal interconnect bandwidth (1KiB)
to keep the interconnect path active for config access and also turn OFF
the resources like clock and PHY if there are no active devices connected
to the controller. For the controllers with active devices, the resources
are kept ON as removing the resources will trigger access violation during
the late end of suspend cycle as kernel tries to access the config space of
PCIe devices to mask the MSIs.
Also, it is not desirable to put the link into L2/L3 state as that
implies VDD supply will be removed and the devices may go into powerdown
state. This will affect the lifetime of storage devices like NVMe.
And finally, during resume, turn ON the resources if the controller was
truly suspended (resources OFF) and update the interconnect bandwidth
based on PCIe Gen speed.
Suggested-by: Krishna chaitanya chundru <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Johan Hovold <[email protected]>
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Reviewed-by: Johan Hovold <[email protected]>
Acked-by: Dhruva Gole <[email protected]>
|
|
All callers of pci_bridge_wait_for_secondary_bus() supply a timeout of
PCIE_RESET_READY_POLL_MS, so drop the parameter. Move the definition of
PCIE_RESET_READY_POLL_MS into pci.c, the only user.
[bhelgaas: extracted from
https://lore.kernel.org/r/[email protected]]
Signed-off-by: Mika Westerberg <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
PCIe r6.0 sec 6.6.1 prescribes that a device must be able to respond to
config requests within 1.0 s (PCI_RESET_WAIT) after exiting conventional
reset and this same delay is prescribed when coming out of D3cold (as that
involves reset too).
A device that requires more than 1 second to initialize after reset may
respond to config requests with Request Retry Status completions (sec
2.3.1), and we accommodate that in Linux with a 60 second cap
(PCIE_RESET_READY_POLL_MS).
Previously we waited up to PCIE_RESET_READY_POLL_MS only in the reset code
path, not in the resume path. However, a device has surfaced, namely Intel
Titan Ridge xHCI, which requires a longer delay also in the resume code
path.
Make the resume code path to use this same extended delay as the reset
path.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216728
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Chris Chiu <[email protected]>
Signed-off-by: Mika Westerberg <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Lukas Wunner <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Provide pci_msix_can_alloc_dyn() stub when CONFIG_PCI_MSI unset to
avoid build errors (Reinette Chatre)
- Quirk AMD XHCI controller that loses MSI-X state in D3hot to avoid
broken USB after hotplug or suspend/resume (Basavaraj Natikar)
- Fix use-after-free in pci_bus_release_domain_nr() (Rob Herring)
* tag 'pci-v6.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Fix use-after-free in pci_bus_release_domain_nr()
x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()
|
|
In 2013, commits
2e35afaefe64 ("PCI: pciehp: Add reset_slot() method")
608c388122c7 ("PCI: Add slot reset option to pci_dev_reset()")
amended PCIe hotplug to mask Presence Detect Changed events during a
Secondary Bus Reset. The reset thus no longer causes gratuitous slot
bringdown and bringup.
However the commits neglected to serialize reset with code paths reading
slot registers. For instance, a slot bringup due to an earlier hotplug
event may see the Presence Detect State bit cleared during a concurrent
Secondary Bus Reset.
In 2018, commit
5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset")
retrofitted the missing locking. It introduced a reset_lock which
serializes a Secondary Bus Reset with other parts of pciehp.
Unfortunately the locking turns out to be overzealous: reset_lock is
held for the entire enumeration and de-enumeration of hotplugged devices,
including driver binding and unbinding.
Driver binding and unbinding acquires device_lock while the reset_lock
of the ancestral hotplug port is held. A concurrent Secondary Bus Reset
acquires the ancestral reset_lock while already holding the device_lock.
The asymmetric locking order in the two code paths can lead to AB-BA
deadlocks.
Michael Haeuptle reports such deadlocks on simultaneous hot-removal and
vfio release (the latter implies a Secondary Bus Reset):
pciehp_ist() # down_read(reset_lock)
pciehp_handle_presence_or_link_change()
pciehp_disable_slot()
__pciehp_disable_slot()
remove_board()
pciehp_unconfigure_device()
pci_stop_and_remove_bus_device()
pci_stop_bus_device()
pci_stop_dev()
device_release_driver()
device_release_driver_internal()
__device_driver_lock() # device_lock()
SYS_munmap()
vfio_device_fops_release()
vfio_device_group_close()
vfio_device_close()
vfio_device_last_close()
vfio_pci_core_close_device()
vfio_pci_core_disable() # device_lock()
__pci_reset_function_locked()
pci_reset_bus_function()
pci_dev_reset_slot_function()
pci_reset_hotplug_slot()
pciehp_reset_slot() # down_write(reset_lock)
Ian May reports the same deadlock on simultaneous hot-removal and an
AER-induced Secondary Bus Reset:
aer_recover_work_func()
pcie_do_recovery()
aer_root_reset()
pci_bus_error_reset()
pci_slot_reset()
pci_slot_lock() # device_lock()
pci_reset_hotplug_slot()
pciehp_reset_slot() # down_write(reset_lock)
Fix by releasing the reset_lock during driver binding and unbinding,
thereby splitting and shrinking the critical section.
Driver binding and unbinding is protected by the device_lock() and thus
serialized with a Secondary Bus Reset. There's no need to additionally
protect it with the reset_lock. However, pciehp does not bind and
unbind devices directly, but rather invokes PCI core functions which
also perform certain enumeration and de-enumeration steps.
The reset_lock's purpose is to protect slot registers, not enumeration
and de-enumeration of hotplugged devices. That would arguably be the
job of the PCI core, not the PCIe hotplug driver. After all, an
AER-induced Secondary Bus Reset may as well happen during boot-time
enumeration of the PCI hierarchy and there's no locking to prevent that
either.
Exempting *de-enumeration* from the reset_lock is relatively harmless:
A concurrent Secondary Bus Reset may foil config space accesses such as
PME interrupt disablement. But if the device is physically gone, those
accesses are pointless anyway. If the device is physically present and
only logically removed through an Attention Button press or the sysfs
"power" attribute, PME interrupts as well as DMA cannot come through
because pciehp_unconfigure_device() disables INTx and Bus Master bits.
That's still protected by the reset_lock in the present commit.
Exempting *enumeration* from the reset_lock also has limited impact:
The exempted call to pci_bus_add_device() may perform device accesses
through pcibios_bus_add_device() and pci_fixup_device() which are now
no longer protected from a concurrent Secondary Bus Reset. Otherwise
there should be no impact.
In essence, the present commit seeks to fix the AB-BA deadlocks while
still retaining a best-effort reset protection for enumeration and
de-enumeration of hotplugged devices -- until a general solution is
implemented in the PCI core.
Link: https://lore.kernel.org/linux-pci/CS1PR8401MB0728FC6FDAB8A35C22BD90EC95F10@CS1PR8401MB0728.NAMPRD84.PROD.OUTLOOK.COM
Link: https://lore.kernel.org/linux-pci/[email protected]
Link: https://lore.kernel.org/linux-pci/[email protected]
Link: https://lore.kernel.org/linux-pci/[email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215590
Fixes: 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset")
Link: https://lore.kernel.org/r/fef2b2e9edf245c049a8c5b94743c0f74ff5008a.1681191902.git.lukas@wunner.de
Reported-by: Michael Haeuptle <[email protected]>
Reported-by: Ian May <[email protected]>
Reported-by: Andrey Grodzovsky <[email protected]>
Reported-by: Rahul Kumar <[email protected]>
Reported-by: Jialin Zhang <[email protected]>
Tested-by: Anatoli Antonovitch <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: [email protected] # v4.19+
Cc: Dan Stein <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Alex Michon <[email protected]>
Cc: Xiongfeng Wang <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Mika Westerberg <[email protected]>
Cc: Sathyanarayanan Kuppuswamy <[email protected]>
|
|
Qualcomm PCIe controllers have debug registers in the MHI region that
count PCIe link transitions. Expose them over debugfs to userspace to
help debug the low power issues.
Note that even though the registers are prefixed as PARF_, they don't
live under the "parf" register region. The register naming is following
the Qualcomm's internal documentation as like other registers.
While at it, let's arrange the local variables in probe function to follow
reverse XMAS tree order.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
qcom_pcie_config_sid_sm8250() function no longer applies only to SM8250.
So let's rename it to reflect the actual IP version and also move its
definition to keep it sorted as per IP revisions.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
To keep uniformity, let's use macros to define the total number of clocks
and supplies in qcom_pcie_resources_{2_7_0/2_9_0} structs.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
All the resets are asserted and deasserted at the same time. So the bulk
reset APIs can be used to handle them together. This simplifies the code
a lot.
It should be noted that there were delays in-between the reset asserts and
deasserts. But going by the config used by other revisions, those delays
are not really necessary. So a single delay after all asserts and one after
deasserts is used.
The total number of resets supported is 12 but only ipq4019 is using all of
them.
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Sricharan Ramabadhran <[email protected]>
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
All the resets are asserted and deasserted at the same time. So the bulk
reset APIs can be used to handle them together. This simplifies the code
a lot.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
All the clocks are enabled and disabled at the same time. So the bulk clock
APIs can be used to handle them together. This simplifies the code a lot.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
All the clocks are enabled and disabled at the same time. So the bulk clock
APIs can be used to handle them together. This simplifies the code a lot.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
All the clocks are enabled and disabled at the same time. So the bulk clock
APIs can be used to handle them together. This simplifies the code a lot.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
All the resets are asserted and deasserted at the same time. So the bulk
reset APIs can be used to handle them together. This simplifies the code
a lot.
While at it, let's also move the qcom_pcie_resources_2_1_0 struct below
qcom_pcie_resources_1_0_0 to keep it sorted.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
To maintain uniformity, let's use lower case for representing hexadecimal
numbers.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
Some of the registers are changed using hardcoded bitfields without macros.
This provides no information on what the register setting is about. So add
the macros to those fields for making the code more understandable.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
To maintain uniformity throughout the driver and also to make the code
easier to read, let's make use of bitfield definitions for register fields.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
Sorting the registers and their bit definitions will make it easier to add
more definitions in the future and it also helps in maintenance.
While at it, let's also group the registers and bit definitions separately
as done in the pcie-qcom-ep driver.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
The PCIE part is redundant and 20 doesn't represent anything across the
SoCs supported now. So let's get rid of the prefix.
This involves adding the IP version suffix to one definition of
PARF_SLV_ADDR_SPACE_SIZE that defines offset specific to that version.
The other definition is generic for the rest of the versions.
Also, the register PCIE20_LNK_CONTROL2_LINK_STATUS2 is not used anywhere,
hence removed.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
|
|
Qcom PCIe IP version v2.7.0 and its derivatives don't contain the
PCIE20_PARF_AXI_MSTR_WR_ADDR_HALT register. Instead, they have the new
PCIE20_PARF_AXI_MSTR_WR_ADDR_HALT_V2 register. So fix the incorrect
register usage which is modifying a different register.
Also in this IP version, this register change doesn't depend on MSI
being enabled. So remove that check also.
Link: https://lore.kernel.org/r/[email protected]
Fixes: ed8cc3b1fc84 ("PCI: qcom: Add support for SDM845 PCIe controller")
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Cc: <[email protected]> # 5.6+
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link (cxl) fixes from Dan Williams:
"Several fixes for driver startup regressions that landed during the
merge window as well as some older bugs.
The regressions were due to a lack of testing with what the CXL
specification calls Restricted CXL Host (RCH) topologies compared to
the testing with Virtual Host (VH) CXL topologies. A VH topology is
typical PCIe while RCH topologies map CXL endpoints as Root Complex
Integrated endpoints. The impact is some driver crashes on startup.
This merge window also added compatibility for range registers (the
mechanism that CXL 1.1 defined for mapping memory) to treat them like
HDM decoders (the mechanism that CXL 2.0 defined for mapping
Host-managed Device Memory). That work collided with the new region
enumeration code that was tested with CXL 2.0 setups, and fails with
crashes at startup.
Lastly, the DOE (Data Object Exchange) implementation for retrieving
an ACPI-like data table from CXL devices is being reworked for v6.4.
Several fixes fell out of that work that are suitable for v6.3.
All of this has been in linux-next for a while, and all reported
issues [1] have been addressed.
Summary:
- Fix several issues with region enumeration in RCH topologies that
can trigger crashes on driver startup or shutdown.
- Fix CXL DVSEC range register compatibility versus region
enumeration that leads to startup crashes
- Fix CDAT endiannes handling
- Fix multiple buffer handling boundary conditions
- Fix Data Object Exchange (DOE) workqueue usage vs
CONFIG_DEBUG_OBJECTS warn splats"
Link: http://lore.kernel.org/r/[email protected] [1]
* tag 'cxl-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/hdm: Extend DVSEC range register emulation for region enumeration
cxl/hdm: Limit emulation to the number of range registers
cxl/region: Move coherence tracking into cxl_region_attach()
cxl/region: Fix region setup/teardown for RCDs
cxl/port: Fix find_cxl_root() for RCDs and simplify it
cxl/hdm: Skip emulation when driver manages mem_enable
cxl/hdm: Fix double allocation of @cxlhdm
PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
cxl/pci: Handle excessive CDAT length
cxl/pci: Handle truncated CDAT entries
cxl/pci: Handle truncated CDAT header
cxl/pci: Fix CDAT retrieval on big endian
|
|
EDR documentation is a bit sketchy. Add a couple comments to
edr_handle_event() about the devices involved.
Link: https://lore.kernel.org/r/20230407215259.GA3825733@bhelgaas
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
|
|
During EDR recovery, the OS must clear error status of the port that
triggered DPC even if firmware retains control of DPC and AER (see the
implementation note in the PCI Firmware spec r3.3, sec 4.6.12).
Prior to 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status errors only if
OS owns AER"), the port Device Status was cleared in this path:
edr_handle_event
dpc_process_error(dev) # "dev" triggered DPC
pcie_do_recovery(dev, dpc_reset_link)
dpc_reset_link # exit DPC
pcie_clear_device_status(dev) # clear Device Status
After 068c29a248b6, pcie_do_recovery() no longer clears Device Status when
firmware controls AER, so the error bit remains set even after recovery.
Per the "Downstream Port Containment configuration control" bit in the
returned _OSC Control Field (sec 4.5.1), the OS is allowed to clear error
status until it evaluates _OST, so clear Device Status in
edr_handle_event() if the error recovery was successful.
[bhelgaas: commit log]
Fixes: 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status errors only if OS owns AER")
Link: https://lore.kernel.org/r/20230315235449.1279209-1-sathyanarayanan.kuppuswamy@linux.intel.com
Reported-by: Tsaur Erwin <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Commit c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
introduced a use-after-free bug in the bus removal cleanup. The issue was
found with kfence:
[ 19.293351] BUG: KFENCE: use-after-free read in pci_bus_release_domain_nr+0x10/0x70
[ 19.302817] Use-after-free read at 0x000000007f3b80eb (in kfence-#115):
[ 19.309677] pci_bus_release_domain_nr+0x10/0x70
[ 19.309691] dw_pcie_host_deinit+0x28/0x78
[ 19.309702] tegra_pcie_deinit_controller+0x1c/0x38 [pcie_tegra194]
[ 19.309734] tegra_pcie_dw_probe+0x648/0xb28 [pcie_tegra194]
[ 19.309752] platform_probe+0x90/0xd8
...
[ 19.311457] kfence-#115: 0x00000000063a155a-0x00000000ba698da8, size=1072, cache=kmalloc-2k
[ 19.311469] allocated by task 96 on cpu 10 at 19.279323s:
[ 19.311562] __kmem_cache_alloc_node+0x260/0x278
[ 19.311571] kmalloc_trace+0x24/0x30
[ 19.311580] pci_alloc_bus+0x24/0xa0
[ 19.311590] pci_register_host_bridge+0x48/0x4b8
[ 19.311601] pci_scan_root_bus_bridge+0xc0/0xe8
[ 19.311613] pci_host_probe+0x18/0xc0
[ 19.311623] dw_pcie_host_init+0x2c0/0x568
[ 19.311630] tegra_pcie_dw_probe+0x610/0xb28 [pcie_tegra194]
[ 19.311647] platform_probe+0x90/0xd8
...
[ 19.311782] freed by task 96 on cpu 10 at 19.285833s:
[ 19.311799] release_pcibus_dev+0x30/0x40
[ 19.311808] device_release+0x30/0x90
[ 19.311814] kobject_put+0xa8/0x120
[ 19.311832] device_unregister+0x20/0x30
[ 19.311839] pci_remove_bus+0x78/0x88
[ 19.311850] pci_remove_root_bus+0x5c/0x98
[ 19.311860] dw_pcie_host_deinit+0x28/0x78
[ 19.311866] tegra_pcie_deinit_controller+0x1c/0x38 [pcie_tegra194]
[ 19.311883] tegra_pcie_dw_probe+0x648/0xb28 [pcie_tegra194]
[ 19.311900] platform_probe+0x90/0xd8
...
[ 19.313579] CPU: 10 PID: 96 Comm: kworker/u24:2 Not tainted 6.2.0 #4
[ 19.320171] Hardware name: /, BIOS 1.0-d7fb19b 08/10/2022
[ 19.325852] Workqueue: events_unbound deferred_probe_work_func
The stack trace is a bit misleading as dw_pcie_host_deinit() doesn't
directly call pci_bus_release_domain_nr(). The issue turns out to be in
pci_remove_root_bus() which first calls pci_remove_bus() which frees the
struct pci_bus when its struct device is released. Then
pci_bus_release_domain_nr() is called and accesses the freed struct
pci_bus. Reordering these fixes the issue.
Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Jon Hunter <[email protected]>
Tested-by: Jon Hunter <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
Cc: [email protected] # v6.2+
Cc: Pali Rohár <[email protected]>
|
|
Remove reference to pci_p2pmem_dma(), which has never existed.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Cai Huoqing <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Logan Gunthorpe <[email protected]>
|
|
Refactor pci_bus_for_each_resource() in the same way as
pci_dev_for_each_resource(). This allows the index to be hidden inside the
implementation so the caller can omit it when it's not used otherwise.
No functional changes intended.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Krzysztof Wilczyński <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
|
|
Instead of open-coding it everywhere introduce a tiny helper that can be
used to iterate over each resource of a PCI device, and convert the most
obvious users into it.
While at it drop doubled empty line before pdev_sort_resources().
No functional changes intended.
Suggested-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mika Westerberg <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Krzysztof Wilczyński <[email protected]>
|