Age | Commit message (Collapse) | Author | Files | Lines |
|
Currently we try to keep PCIe ports runtime suspended over system suspend
if possible. This mostly happens when entering suspend-to-idle because
there is no need to re-configure wake settings.
This causes problems if the parent port goes into D3cold and it gets
resumed upon exit from system suspend. This may happen for example if the
port is part of PCIe switch and the same switch is connected to a PCIe
endpoint that needs to be resumed. The way exit from D3cold works according
PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold reset is
signaled. After this the device is in D0unitialized state keeping PME
context if it supports wake from D3cold.
The problem occurs when a PCIe hotplug port is left suspended and the
parent port goes into D3cold and back to D0: the port keeps its PME context
but since everything else is reset back to defaults (D0unitialized) it is
not set to detect hotplug events anymore.
For this reason change the PCIe portdrv power management logic so that it
is fine to keep the port runtime suspended over system suspend but it needs
to be resumed upon exit to make sure it gets properly re-initialized.
Signed-off-by: Mika Westerberg <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
The spec has timing requirements when waiting for a link to become active
after a conventional reset. Implement those hard delays when waiting for
an active link so pciehp and dpc drivers don't need to duplicate this.
For devices that don't support data link layer active reporting, wait the
fixed time recommended by the PCIe spec.
Signed-off-by: Keith Busch <[email protected]>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
Bring surprise removals and permanent failures together so we no longer
need separate flags. The implementation enforces that error handling will
not be able to override a surprise removal's permanent channel failure.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
A device still participates in error recovery even if it doesn't have
the error callbacks.
Always provide the status for user event watchers.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
There is no point in having a generic broadcast function if it needs to
have special cases for each callback it broadcasts.
Abstract the error broadcast to only the necessary information and removes
the now unnecessary helper to walk the bus.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
If an Endpoint reported an error with ERR_FATAL, we previously ran driver
error recovery callbacks only for the Endpoint's driver. But if we reset a
Link to recover from the error, all downstream components are affected,
including the Endpoint, any multi-function peers, and children of those
peers.
Initiate the Link reset from the deepest Downstream Port that is
reliable, and call the error recovery callbacks for all its children.
If a Downstream Port (including a Root Port) reports an error, we assume
the Port itself is reliable and we need to reset its downstream Link. In
all other cases (Switch Upstream Ports, Endpoints, Bridges, etc), we assume
the Link leading to the component needs to be reset, so we initiate the
reset at the parent Downstream Port.
This allows two other clean-ups. First, we currently only use a Link
reset, which can only be initiated using a Downstream Port, so we can
remove checks for Endpoints. Second, the Downstream Port where we initiate
the Link reset is reliable (unlike components downstream from it), so the
special cases for error detect and resume are no longer necessary.
Signed-off-by: Keith Busch <[email protected]>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
We don't need to be paranoid about the topology changing while handling an
error. If the device has changed in a hotplug capable slot, we can rely on
the presence detection handling to react to a changing topology.
Restore the fatal error handling behavior that existed before merging DPC
with AER with 7e9084b36740 ("PCI/AER: Handle ERR_FATAL with removal and
re-enumeration of devices").
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
The secondary bus reset may have link side effects that a hotplug capable
port may incorrectly react to. Use the slot specific reset for hotplug
ports, fixing the undesirable link down-up handling during error
recovering.
Signed-off-by: Keith Busch <[email protected]>
[bhelgaas: fold in
https://lore.kernel.org/linux-pci/[email protected]
for issue reported by Stephen Rothwell <[email protected]>]
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
The AER driver has never read the config space of an endpoint that reported
a fatal error because the link to that device is considered unreliable.
An ERR_FATAL from an upstream port almost certainly indicates an error on
its upstream link, so we can't expect to reliably read its config space for
the same reason.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
Error handling may be running in parallel with a hot removal. Reference
count the device during AER handling so the device can not be freed while
AER wants to reference it.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
This patch provides DPC save and restore capabilities. This is necessary
for the driver to observe DPC events in the event the configuration space
needs to be restored after a reset.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
The port's config space may be cleared after a link reset, which wipes out
the bridge's bus and memory windows. Restore the config space that was
saved during probe so we can access downstream devices.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
The PCI port driver saves the PCI state after initializing the device with
the applicable service devices. This was, however, before the service
drivers were even registered because PCI probe happens before the
device_initcall initialized those service drivers. The config space state
that the services set up were not being saved. The end result would cause
PCI devices to not react to events that the drivers think they did if the
PCI state ever needed to be restored.
Fix this by changing the service drivers from using the init calls to
having the portdrv driver calling the services directly. This will get the
state saved as desired, while making the relationship between the port
driver and the services under it more explicit in the code.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
|
|
Now that ASPM is configured for *all* PCIe devices at boot, a problem is
seen with systems that set the FADT NO_ASPM bit. This bit indicates that
the OS should not alter the ASPM state, but when
pcie_aspm_init_link_state() runs it only checks for !aspm_support_enabled.
This misses the ACPI_FADT_NO_ASPM case because that is setting
aspm_disabled.
The result is systems may hang at boot after 1302fcf; avoidable if they
boot with pcie_aspm=off (sets !aspm_support_enabled).
Fix this by having aspm_init_link_state() check for either
!aspm_support_enabled or acpm_disabled.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201001
Fixes: 1302fcf0d03e ("PCI: Configure *all* devices, not just hot-added ones")
Signed-off-by: Patrick Talbert <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Commit 89ee9f768003 ("PCI: Add device disconnected state") iterates over
the devices on a parent bus, marks each as disconnected, then marks
each device's children as disconnected using pci_walk_bus().
The same can be achieved more succinctly by calling pci_walk_bus() on
the parent bus. Moreover, this does not need to wait until acquiring
pci_lock_rescan_remove(), so move it out of that critical section.
The critical section in err.c contains a pci_dev_get() / pci_dev_put()
pair which was apparently copy-pasted from pciehp_pci.c. In the latter
it serves the purpose of holding the struct pci_dev in place until the
Command register is updated. err.c doesn't do anything like that, hence
the pair is unnecessary. Remove it.
Signed-off-by: Lukas Wunner <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Oza Pawandeep <[email protected]>
Cc: Sinan Kaya <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
|
|
Upon removal of the last device on a bus, the link_state of the bridge
leading to that bus is sought to be torn down by having pci_stop_dev()
call pcie_aspm_exit_link_state().
When ASPM was originally introduced by commit 7d715a6c1ae5 ("PCI: add
PCI Express ASPM support"), it determined whether the device being
removed is the last one by calling list_empty() on the bridge's
subordinate devices list. That didn't work because the device is only
removed from the list slightly later in pci_destroy_dev().
Commit 3419c75e15f8 ("PCI: properly clean up ASPM link state on device
remove") attempted to fix it by calling list_is_last(), but that's not
correct either because it checks whether the device is at the *end* of
the list, not whether it's the last one *left* in the list. If the user
removes the device which happens to be at the end of the list via sysfs
but other devices are preceding the device in the list, the link_state
is torn down prematurely.
The real fix is to move the invocation of pcie_aspm_exit_link_state() to
pci_destroy_dev() and reinstate the call to list_empty(). Remove a
duplicate check for dev->bus->self because pcie_aspm_exit_link_state()
already contains an identical check.
Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Signed-off-by: Lukas Wunner <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: [email protected] # v2.6.26
|
|
- To avoid bus errors, enable PASID only if entire path supports End-End
TLP prefixes (Sinan Kaya)
- Unify slot and bus reset functions and remove hotplug knowledge from
callers (Sinan Kaya)
- Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
fix guest reboot issues (Alex Williamson)
- Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller
(Bjorn Helgaas)
* pci/virtualization:
PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
PCI: Delay after FLR of Intel DC P3700 NVMe
PCI: Disable Samsung SM961/PM961 NVMe before FLR
PCI: Export pcie_has_flr()
PCI: Rename pci_try_reset_bus() to pci_reset_bus()
PCI: Deprecate pci_reset_bus() and pci_reset_slot() functions
PCI: Unify try slot and bus reset API
PCI: Hide pci_reset_bridge_secondary_bus() from drivers
IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset
PCI: Handle error return from pci_reset_bridge_secondary_bus()
PCI/IOV: Tidy pci_sriov_set_totalvfs()
PCI: Enable PASID only if entire path supports End-End TLP prefixes
# Conflicts:
# drivers/pci/hotplug/pciehp_hpc.c
|
|
- Simplify SHPC existence/permission checks (Bjorn Helgaas)
- Remove hotplug sample skeleton driver (Lukas Wunner)
- Convert pciehp to threaded IRQ handling (Lukas Wunner)
- Improve pciehp tolerance of missed events and initially unstable links
(Lukas Wunner)
- Clear spurious pciehp events on resume (Lukas Wunner)
- Add pciehp runtime PM support, including for Thunderbolt controllers
(Lukas Wunner)
- Support interrupts from pciehp bridges in D3hot (Lukas Wunner)
* pci/hotplug:
PCI: pciehp: Deduplicate presence check on probe & resume
PCI: pciehp: Avoid implicit fallthroughs in switch statements
PCI: Whitelist Thunderbolt ports for runtime D3
PCI: Whitelist native hotplug ports for runtime D3
PCI: sysfs: Resume to D0 on function reset
PCI: pciehp: Resume parent to D0 on config space access
PCI: pciehp: Resume to D0 on enable/disable
PCI: pciehp: Support interrupts sent from D3hot
PCI: pciehp: Obey compulsory command delay after resume
PCI: pciehp: Clear spurious events earlier on resume
PCI: portdrv: Deduplicate PM callback iterator
PCI: pciehp: Avoid slot access during reset
PCI: pciehp: Always enable occupied slot on probe
PCI: pciehp: Become resilient to missed events
PCI: pciehp: Tolerate initially unstable link
PCI: pciehp: Declare pciehp_enable/disable_slot() static
PCI: pciehp: Drop enable/disable lock
PCI: pciehp: Enable/disable exclusively from IRQ thread
PCI: pciehp: Track enable/disable status
PCI: pciehp: Publish to user space last on probe
PCI: hotplug: Demidlayer registration with the core
PCI: pciehp: Drop slot workqueue
PCI: pciehp: Handle events synchronously
PCI: pciehp: Stop blinking on slot enable failure
PCI: pciehp: Convert to threaded polling
PCI: pciehp: Convert to threaded IRQ
PCI: pciehp: Document struct slot and struct controller
PCI: pciehp: Declare pciehp_unconfigure_device() void
PCI: pciehp: Drop unnecessary NULL pointer check
PCI: pciehp: Fix unprotected list iteration in IRQ handler
PCI: pciehp: Fix use-after-free on unplug
PCI: hotplug: Don't leak pci_slot on registration failure
PCI: hotplug: Delete skeleton driver
PCI: shpchp: Separate existence of SHPC and permission to use it
|
|
- Defer DPC event handling to work queue (Keith Busch)
- Use threaded IRQ for DPC bottom half (Keith Busch)
- Print AER status while handling DPC events (Keith Busch)
* pci/dpc:
PCI/DPC: Remove indirection waiting for inactive link
PCI/DPC: Use threaded IRQ for bottom half handling
PCI/DPC: Print AER status in DPC event handling
PCI/DPC: Remove rp_pio_status from dpc struct
PCI/DPC: Defer event handling to work queue
PCI/DPC: Leave interrupts enabled while handling event
|
|
- Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
Shevchenko)
- Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)
* pci/aspm:
PCI: Remove unnecessary include of <linux/pci-aspm.h>
iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
ath9k: Remove unnecessary include of <linux/pci-aspm.h>
igb: Remove unnecessary include of <linux/pci-aspm.h>
PCI/ASPM: Convert to use sysfs_match_string() helper
|
|
- Decode AER errors with names similar to "lspci" (Tyler Baicar)
- Expose AER statistics in sysfs (Rajat Jain)
- Clear AER status bits selectively based on the type of recovery (Oza
Pawandeep)
- Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
Gagniuc)
- Don't clear AER status bits if we're using the "Firmware-First"
strategy where firmware owns the registers (Alexandru Gagniuc)
* pci/aer:
PCI/AER: Don't clear AER bits if error handling is Firmware-First
PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition
PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset
PCI/AER: Clear device status bits during ERR_COR handling
PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL
PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
PCI/AER: Factor out ERR_NONFATAL status bit clearing
PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
PCI/AER: Clear only ERR_FATAL status bits during fatal recovery
PCI/AER: Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST
PCI/AER: Add sysfs attributes for rootport cumulative stats
PCI/AER: Add sysfs attributes to provide AER stats and breakdown
PCI/AER: Define aer_stats structure for AER capable devices
PCI/AER: Move internal declarations to drivers/pci/pci.h
PCI/AER: Adopt lspci names for AER error decoding
PCI/AER: Expose internal API for obtaining AER information
# Conflicts:
# drivers/pci/pci.h
|
|
If the platform requests Firmware-First error handling, firmware is
responsible for reading and clearing AER status bits. If OSPM also clears
them, we may miss errors. See ACPI v6.2, sec 18.3.2.5 and 18.4.
This race is mostly of theoretical significance, as it is not easy to
reasonably demonstrate it in testing.
Signed-off-by: Alexandru Gagniuc <[email protected]>
[bhelgaas: add similar guards to pci_cleanup_aer_uncorrect_error_status()
and pci_aer_clear_fatal_status()]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
The sysfs_match_string() helper returns index of the matching string in an
array. Use it in pcie_aspm_set_policy() to simplify the code.
Signed-off-by: Andy Shevchenko <[email protected]>
[bhelgaas: squash sysfs_match_string() fix into original patch for issue
Reported-by: Heiner Kallweit <[email protected]>]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
PCI_EXP_AER_FLAGS was defined twice (with identical definitions), once
under #ifdef CONFIG_ACPI_APEI, and again at the top level. This looks like
my merge error from these commits:
fd3362cb73de ("PCI/AER: Squash aerdrv_core.c into aerdrv.c")
41cbc9eb1a82 ("PCI/AER: Squash ecrc.c into aerdrv.c")
Remove the duplicate PCI_EXP_AER_FLAGS definition.
Fixes: 41cbc9eb1a82 ("PCI/AER: Squash ecrc.c into aerdrv.c")
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Oza Pawandeep <[email protected]>
|
|
Thunderbolt hotplug ports that were occupied before system sleep resume
with their downstream link in "off" state. Only after the Thunderbolt
controller has reestablished the PCIe tunnels does the link go up.
As a result, a spurious Presence Detect Changed and/or Data Link Layer
State Changed event occurs.
The events are not immediately acted upon because tunnel reestablishment
happens in the ->resume_noirq phase, when interrupts are still disabled.
Also, notification of events may initially be disabled in the Slot
Control register when coming out of system sleep and is reenabled in the
->resume_noirq phase through:
pci_pm_resume_noirq()
pci_pm_default_resume_early()
pci_restore_state()
pci_restore_pcie_state()
It is not guaranteed that the events are acted upon at all: PCIe r4.0,
sec 6.7.3.4 says that "a port may optionally send an MSI when there are
hot-plug events that occur while interrupt generation is disabled, and
interrupt generation is subsequently enabled." Note the "optionally".
If an MSI is sent, pciehp will gratuitously turn the slot off and back
on once the ->resume_early phase has commenced.
If an MSI is not sent, the extant, unacknowledged events in the Slot
Status register will prevent future notification of presence or link
changes.
Commit 13c65840feab ("PCI: pciehp: Clear Presence Detect and Data Link
Layer Status Changed on resume") fixed the latter by clearing the events
in the ->resume phase. Move this to the ->resume_noirq phase to also
fix the gratuitous disable/enablement of the slot.
The commit further restored the Slot Control register in the ->resume
phase, but that's dispensable because as shown above it's already been
done in the ->resume_noirq phase.
Signed-off-by: Lukas Wunner <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Cc: Mika Westerberg <[email protected]>
|
|
Replace suspend_iter() and resume_iter() with a single function pm_iter()
to allow addition of port service callbacks for further power management
phases without having to add another iterator each time.
No functional change intended.
Signed-off-by: Lukas Wunner <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
When an fatal error is received by a non-bridge device, the device is
removed, and pci_stop_and_remove_bus_device() deallocates the device
structure. The freed device structure is used by subsequent code to send
uevents and print messages.
Hold a reference on the device until we're finished using it. This is not
an ideal fix because pcie_do_fatal_recovery() should not use the device at
all after removing it, but that's too big a project for right now.
Fixes: 7e9084b36740 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices")
Signed-off-by: Thomas Tai <[email protected]>
[bhelgaas: changelog, reduce get/put coverage]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
The pci_error_handlers.slot_reset() callback is only used for non-bridge
devices (see broadcast_error_message()). Since portdrv only binds to
bridges, we don't need pcie_portdrv_slot_reset(), so remove it.
Signed-off-by: Oza Pawandeep <[email protected]>
[bhelgaas: changelog, remove pcie_portdrv_slot_reset() completely]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
In case of correctable error, the Correctable Error Detected bit in the
Device Status register is set. Clear it after handling the error.
Signed-off-by: Oza Pawandeep <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Clear the device status bits while handling both ERR_FATAL and ERR_NONFATAL
cases.
Signed-off-by: Oza Pawandeep <[email protected]>
[bhelgaas: rename to pci_aer_clear_device_status(), declare internal to PCI
core instead of exposing it everywhere]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
broadcast_error_message() is only used for ERR_NONFATAL events, when the
state is always pci_channel_io_normal, so remove the unused alternate path.
Signed-off-by: Oza Pawandeep <[email protected]>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
aer_error_resume() clears all ERR_NONFATAL error status bits. This is
exactly what pci_cleanup_aer_uncorrect_error_status(), so use that instead
of duplicating the code.
Signed-off-by: Oza Pawandeep <[email protected]>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
pci_cleanup_aer_uncorrect_error_status() is called by driver .slot_reset()
methods when handling ERR_NONFATAL errors. Previously this cleared *all*
the bits, including ERR_FATAL bits.
Since we're only handling ERR_NONFATAL errors, clear only the ERR_NONFATAL
error status bits.
Signed-off-by: Oza Pawandeep <[email protected]>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
During recovery from fatal errors, we previously called
pci_cleanup_aer_uncorrect_error_status(), which cleared *all* uncorrectable
error status bits (both ERR_FATAL and ERR_NONFATAL).
Instead, call a new pci_aer_clear_fatal_status() that clears only the
ERR_FATAL bits (as indicated by the PCI_ERR_UNCOR_SEVER register).
Based-on-patch-by: Oza Pawandeep <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Rename pci_reset_bridge_secondary_bus() to pci_bridge_secondary_bus_reset()
and move the declaration from linux/pci.h to drivers/pci.h to be used
internally in PCI directory only.
Signed-off-by: Sinan Kaya <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Commit 01fd61c0b9bd ("PCI: Add a return type for
pci_reset_bridge_secondary_bus()") added a return value to the function to
return if a device is accessible following a reset. Callers are not
checking the value.
Pass error code up high in the stack if device is not accessible.
Fixes: 01fd61c0b9bd ("PCI: Add a return type for pci_reset_bridge_secondary_bus()")
Signed-off-by: Sinan Kaya <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Simplify waiting for the contained link to become inactive, removing the
indirection to a unnecessary DPC-specific handler.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
Reviewed-by: Oza Pawandeep <[email protected]>
|
|
Remove the work struct that was being used to handle a DPC event and use a
threaded IRQ instead.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
Reviewed-by: Oza Pawandeep <[email protected]>
|
|
A DPC enabled device suppresses ERR_(NON)FATAL messages, preventing the AER
handler from reporting error details. If the DPC trigger reason says the
downstream port detected the error, collect the AER uncorrectable status
for logging, then clear the status.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
Reviewed-by: Oza Pawandeep <[email protected]>
|
|
We don't need to save the rp pio status across multiple contexts as all
DPC event handling occurs in a single work queue context.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
Reviewed-by: Oza Pawandeep <[email protected]>
|
|
Move all event handling to the existing work queue, which will
make it simpler to pass event information to the handler.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
Reviewed-by: Oza Pawandeep <[email protected]>
|
|
Now that the DPC driver clears the interrupt status before exiting the
IRQ handler, we don't need to abuse the DPC control register to know if
a shared interrupt is for a new DPC event: a DPC port can not trigger
a second interrupt until the host clears the trigger status later in the
work queue handler.
Signed-off-by: Keith Busch <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
Reviewed-by: Oza Pawandeep <[email protected]>
|
|
According to the documentation, "pcie_ports=native", linux should use
native AER and DPC services. While that is true for the _OSC method
parsing, this is not the only place that is checked. Should the HEST
list PCIe ports as firmware-first, linux will not use native services.
This happens because aer_acpi_firmware_first() doesn't take 'pcie_ports'
into account. This is wrong. DPC uses the same logic when it decides
whether to load or not, so fixing this also fixes DPC not loading.
Signed-off-by: Alexandru Gagniuc <[email protected]>
[bhelgaas: return "false" from bool function (from kbuild robot)]
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Add sysfs attributes for rootport statistics (that are cumulative of all
the ERR_* messages seen on this PCI hierarchy).
Signed-off-by: Rajat Jain <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Add sysfs attributes to provide total and breakdown of the AERs seen,
into different type of correctable, fatal and nonfatal errors:
/sys/bus/pci/devices/<dev>/aer_dev_correctable
/sys/bus/pci/devices/<dev>/aer_dev_fatal
/sys/bus/pci/devices/<dev>/aer_dev_nonfatal
Signed-off-by: Rajat Jain <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Define a structure to hold the AER statistics. There are 2 groups of
statistics: dev_* counters that are to be collected for all AER capable
devices and rootport_* counters that are collected for all (AER capable)
rootports only. Allocate and free this structure when device is added or
released (thus counters survive the lifetime of the device).
Signed-off-by: Rajat Jain <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Since pci_aer_init() and pci_no_aer() are used only internally, move their
declarations to the PCI internal header file. Also, no one cares about
return value of pci_aer_init(), so make it void.
Signed-off-by: Rajat Jain <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
lspci uses abbreviated naming for AER error strings. Adopt the same naming
convention for the AER printing so they match.
Signed-off-by: Tyler Baicar <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Oza Pawandeep <[email protected]>
|
|
Export some common AER functions and structures for other PCI core drivers
to use. Since this is making the function externally visible inside the
PCI core, prepend "aer_" to the function name.
Signed-off-by: Keith Busch <[email protected]>
[bhelgaas: move AER declarations from linux/aer.h to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Sinan Kaya <[email protected]>
Reviewed-by: Oza Pawandeep <[email protected]>
|
|
Use "PCI Express" consistently in Kconfig text. No functional change
intended.
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
|