From cdea98bf1faef23166262825ce44648be6ebff42 Mon Sep 17 00:00:00 2001 From: Daniel Drake Date: Wed, 28 Feb 2024 08:53:15 +0100 Subject: PCI: Disable D3cold on Asus B1400 PCI-NVMe bridge The Asus B1400 with original shipped firmware versions and VMD disabled cannot resume from suspend: the NVMe device becomes unresponsive and inaccessible. This appears to be an untested D3cold transition by the vendor; Intel socwatch shows that Windows leaves the NVMe device and parent bridge in D0 during suspend, even though these firmware versions have StorageD3Enable=1. The NVMe device and parent PCI bridge both share the same "PXP" ACPI power resource, which gets turned off as both devices are put into D3cold during suspend. The _OFF() method calls DL23() which sets a L23E bit at offset 0xe2 into the PCI configuration space for this root port. This is the specific write that the _ON() routine is unable to recover from. This register is not documented in the public chipset datasheet. Disallow D3cold on the PCI bridge to enable successful suspend/resume. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742 Link: https://lore.kernel.org/r/20240228075316.7404-1-drake@endlessos.org Signed-off-by: Daniel Drake Signed-off-by: Bjorn Helgaas Acked-by: Jian-Hong Pan Acked-by: Rafael J. Wysocki --- arch/x86/pci/fixup.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c index f347c20247d3..b33afb240601 100644 --- a/arch/x86/pci/fixup.c +++ b/arch/x86/pci/fixup.c @@ -907,6 +907,54 @@ static void chromeos_fixup_apl_pci_l1ss_capability(struct pci_dev *dev) DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_save_apl_pci_l1ss_capability); DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_fixup_apl_pci_l1ss_capability); +/* + * Disable D3cold on Asus B1400 PCI-NVMe bridge + * + * On this platform with VMD off, the NVMe device cannot successfully power + * back on from D3cold. This appears to be an untested transition by the + * vendor: Windows leaves the NVMe and parent bridge in D0 during suspend. + * + * We disable D3cold on the parent bridge for simplicity, and the fact that + * both parent bridge and NVMe device share the same power resource. + * + * This is only needed on BIOS versions before 308; the newer versions flip + * StorageD3Enable from 1 to 0. + */ +static const struct dmi_system_id asus_nvme_broken_d3cold_table[] = { + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.304"), + }, + }, + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.305"), + }, + }, + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.306"), + }, + }, + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_BIOS_VERSION, "B1400CEAE.307"), + }, + }, + {} +}; + +static void asus_disable_nvme_d3cold(struct pci_dev *pdev) +{ + if (dmi_check_system(asus_nvme_broken_d3cold_table) > 0) + pci_d3cold_disable(pdev); +} +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9a09, asus_disable_nvme_d3cold); + #ifdef CONFIG_SUSPEND /* * Root Ports on some AMD SoCs advertise PME_Support for D3hot and D3cold, but -- cgit From cb98555fcd8eee98c30165537c7e394f3a66e809 Mon Sep 17 00:00:00 2001 From: Daniel Drake Date: Wed, 28 Feb 2024 08:53:16 +0100 Subject: Revert "ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default" This reverts commit d52848620de00cde4a3a5df908e231b8c8868250, which was originally put in place to work around a s2idle failure on this platform where the NVMe device was inaccessible upon resume. After extended testing, we found that the firmware's implementation of S3 is buggy and intermittently fails to wake up the system. We need to revert to s2idle mode. The NVMe issue has now been solved more precisely in the commit titled "PCI: Disable D3cold on Asus B1400 PCI-NVMe bridge" Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742 Link: https://lore.kernel.org/r/20240228075316.7404-2-drake@endlessos.org Signed-off-by: Daniel Drake Signed-off-by: Bjorn Helgaas Acked-by: Jian-Hong Pan Acked-by: Rafael J. Wysocki --- drivers/acpi/sleep.c | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c index 808484d11209..728acfeb774d 100644 --- a/drivers/acpi/sleep.c +++ b/drivers/acpi/sleep.c @@ -385,18 +385,6 @@ static const struct dmi_system_id acpisleep_dmi_table[] __initconst = { DMI_MATCH(DMI_PRODUCT_NAME, "20GGA00L00"), }, }, - /* - * ASUS B1400CEAE hangs on resume from suspend (see - * https://bugzilla.kernel.org/show_bug.cgi?id=215742). - */ - { - .callback = init_default_s3, - .ident = "ASUS B1400CEAE", - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), - DMI_MATCH(DMI_PRODUCT_NAME, "ASUS EXPERTBOOK B1400CEAE"), - }, - }, {}, }; -- cgit From fa885b06ec7ee07c2498ad0ce4e0717c3b7dd487 Mon Sep 17 00:00:00 2001 From: Raag Jadav Date: Tue, 27 Feb 2024 11:56:48 +0530 Subject: PCI/PM: Allow runtime PM with no PM callbacks at all Commit c5eb1190074c ("PCI / PM: Allow runtime PM without callback functions") eliminated the need for PM callbacks in pci_pm_runtime_suspend() and pci_pm_runtime_resume(), but didn't do the same for pci_pm_runtime_idle(). Therefore, runtime suspend worked as long as the driver implemented at least one PM callback. But if the driver doesn't implement any PM callbacks at all (driver->pm is NULL), pci_pm_runtime_idle() returned -ENOSYS, which prevented runtime suspend. Modify pci_pm_runtime_idle() to allow PCI device power state transitions without runtime PM callbacks and complete the original intention of commit c5eb1190074c ("PCI / PM: Allow runtime PM without callback functions"). Link: https://lore.kernel.org/r/20240227062648.16579-1-raag.jadav@intel.com Signed-off-by: Raag Jadav [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas Tested-by: Jarkko Nikula Reviewed-by: Andy Shevchenko Reviewed-by: Stanislaw Gruszka Acked-by: Rafael J. Wysocki --- drivers/pci/pci-driver.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 51ec9e7e784f..bb7f6775b350 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -1382,10 +1382,7 @@ static int pci_pm_runtime_idle(struct device *dev) if (!pci_dev->driver) return 0; - if (!pm) - return -ENOSYS; - - if (pm->runtime_idle) + if (pm && pm->runtime_idle) return pm->runtime_idle(dev); return 0; -- cgit From 9d5286d4e7f68beab450deddbb6a32edd5ecf4bf Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 5 Mar 2024 11:45:38 +0100 Subject: PCI/PM: Drain runtime-idle callbacks before driver removal A race condition between the .runtime_idle() callback and the .remove() callback in the rtsx_pcr PCI driver leads to a kernel crash due to an unhandled page fault [1]. The problem is that rtsx_pci_runtime_idle() is not expected to be running after pm_runtime_get_sync() has been called, but the latter doesn't really guarantee that. It only guarantees that the suspend and resume callbacks will not be running when it returns. However, if a .runtime_idle() callback is already running when pm_runtime_get_sync() is called, the latter will notice that the runtime PM status of the device is RPM_ACTIVE and it will return right away without waiting for the former to complete. In fact, it cannot wait for .runtime_idle() to complete because it may be called from that callback (it arguably does not make much sense to do that, but it is not strictly prohibited). Thus in general, whoever is providing a .runtime_idle() callback needs to protect it from running in parallel with whatever code runs after pm_runtime_get_sync(). [Note that .runtime_idle() will not start after pm_runtime_get_sync() has returned, but it may continue running then if it has started earlier.] One way to address that race condition is to call pm_runtime_barrier() after pm_runtime_get_sync() (not before it, because a nonzero value of the runtime PM usage counter is necessary to prevent runtime PM callbacks from being invoked) to wait for the .runtime_idle() callback to complete should it be running at that point. A suitable place for doing that is in pci_device_remove() which calls pm_runtime_get_sync() before removing the driver, so it may as well call pm_runtime_barrier() subsequently, which will prevent the race in question from occurring, not just in the rtsx_pcr driver, but in any PCI drivers providing .runtime_idle() callbacks. Link: https://lore.kernel.org/lkml/20240229062201.49500-1-kai.heng.feng@canonical.com/ # [1] Link: https://lore.kernel.org/r/5761426.DvuYhMxLoT@kreacher Reported-by: Kai-Heng Feng Signed-off-by: Rafael J. Wysocki Signed-off-by: Bjorn Helgaas Tested-by: Ricky Wu Acked-by: Kai-Heng Feng Cc: --- drivers/pci/pci-driver.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index bb7f6775b350..652506c54a54 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -473,6 +473,13 @@ static void pci_device_remove(struct device *dev) if (drv->remove) { pm_runtime_get_sync(dev); + /* + * If the driver provides a .runtime_idle() callback and it has + * started to run already, it may continue to run in parallel + * with the code below, so wait until all of the runtime PM + * activity has completed. + */ + pm_runtime_barrier(dev); drv->remove(pci_dev); pm_runtime_put_noidle(dev); } -- cgit