aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-03-12Merge branch 'pci/enumeration'Bjorn Helgaas6-218/+206
- Collect interrupt-related code in irq.c (Ilpo Järvinen) - Mark 3ware-9650SE Root Port Extended Tags as broken (Jörg Wedekind) * pci/enumeration: PCI: Mark 3ware-9650SE Root Port Extended Tags as broken PCI: Place interrupt related code into irq.c # Conflicts: # drivers/pci/Makefile
2024-03-12Merge branch 'pci/dpc'Bjorn Helgaas2-1/+63
- After a DPC event, print all logged TLP Prefixes instead of printing the first prefix several times (Ilpo Järvinen) - Ignore the expected Surprise Down error that may cause a DPC event when hot-removing a device (Smita Koralahalli) - Add an RP PIO log size quirk for Intel Raptor Lake Root Ports, which still don't advertise the correct log size, which prevented logging of RP PIO Log registers when DPC is triggered (Paul Menzel) * pci/dpc: PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports PCI/DPC: Ignore Surprise Down error on hot removal PCI/DPC: Print all TLP Prefixes, not just the first
2024-03-12Merge branch 'pci/devres'Bjorn Helgaas12-469/+484
- Unmap MMIO mappings in pci_iounmap() to avoid a leak when ARCH_HAS_GENERIC_IOPORT_MAP is defined (Philipp Stanner) - Move pci_iomap.c to drivers/pci/ since it's all PCI-related (Philipp Stanner) - Move other PCI-related devres code from lib/devres.c to drivers/pci/ (Philipp Stanner) - Move other devres code from pci.c to devres.c (Philipp Stanner) * pci/devres: PCI: Move devres code from pci.c to devres.c PCI: Move PCI-specific devres code to drivers/pci/ PCI: Move pci_iomap.c to drivers/pci/ pci_iounmap(): Fix MMIO mapping leak
2024-03-12Merge branch 'pci/aspm'Bjorn Helgaas6-130/+292
- Collect ASPM-related code into aspm.c (David E. Box) - Save and restore ASPM L1 PM Substates configuration so these states continue working after suspend/resume (David E. Box) - Move the ASPM L1.2-related LTR save/restore next to the ASPM save/restore (David E. Box) - Move the required L1 disable before L1 Substate configuration into pci_restore_aspm_l1ss_state() (Bjorn Helgaas) - Update save_save when ASPM config is changed, so a .slot_reset() during error recovery restores the changed config, not the .probe()-time config (Vidya Sagar) * pci/aspm: PCI/ASPM: Update save_state when configuration changes PCI/ASPM: Disable L1 before configuring L1 Substates PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state() PCI/ASPM: Save L1 PM Substates Capability for suspend/resume PCI/ASPM: Move pci_save_ltr_state() to aspm.c PCI/ASPM: Always build aspm.c PCI/ASPM: Move pci_configure_ltr() to aspm.c
2024-03-12Merge branch 'pci/aer'Bjorn Helgaas9-47/+80
- Fix sysfs paths in aer_rootport_total_err_* documentation (Johan Hovold) - Block runtime suspend while handling AER errors (Stanislaw Gruszka) - Add a generic Header Log structure and reader shared by AER and DPC (Ilpo Järvinen) * pci/aer: PCI/AER: Generalize TLP Header Log reading PCI/AER: Use explicit register size for PCI_ERR_CAP PCI/AER: Block runtime suspend when handling errors PCI/AER: Clean up version indentation in ABI docs PCI/AER: Fix rootport attribute paths in ABI docs
2024-03-12PCI/ASPM: Update save_state when configuration changesVidya Sagar1-1/+33
Many PCIe device drivers save the configuration state of their device during probe and restore it when their .slot_reset() hook is called during PCIe error recovery. If the ASPM configuration is changed after the driver's probe is called and before an error event occurs, .slot_reset() restores the ASPM configuration to what it was at the time of probe, not to what it was just before the occurrence of the error event. This leads to a mismatch in ASPM configuration between the device and its upstream device. Update the saved configuration of the device when the ASPM configuration changes. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vidya Sagar <[email protected]> [bhelgaas: commit log, rebase to pci/aspm, rename to pci_update_aspm_saved_state() since it updates only LNKCTL, update only ASPMC and CLKREQ_EN in LNKCTL] Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]> Reviewed-by: David E. Box <[email protected]>
2024-03-12PCI/ASPM: Disable L1 before configuring L1 SubstatesBjorn Helgaas2-13/+22
Per PCIe r6.1, sec 5.5.4, L1 must be disabled while setting ASPM L1 PM Substates enable bits. Previously this was enforced by clearing PCI_EXP_LNKCTL_ASPMC before calling pci_restore_aspm_l1ss_state(). Move the L1 (and L0s, although that doesn't seem required) disable into pci_restore_aspm_l1ss_state() itself so it's closer to the code that depends on it. Link: https://lore.kernel.org/r/20240223213733.GA115410@bhelgaas Signed-off-by: Bjorn Helgaas <[email protected]>
2024-03-12PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()David E. Box1-7/+7
ASPM state is saved and restored from pci_save/restore_pcie_state(). Since the LTR Capability is linked with ASPM, move the LTR save and restore calls there as well. No functional change intended. Suggested-by: Bjorn Helgaas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: David E. Box <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-03-12PCI/ASPM: Save L1 PM Substates Capability for suspend/resumeDavid E. Box5-6/+119
4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume") restored the L1 PM Substates Capability after resume, which reduced power consumption by making the ASPM L1.x states work after resume. a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"") reverted 4ff116d0d5fd because resume failed on some systems, so power consumption after resume increased again. a7152be79b62 mentioned that we restore L1 PM substate configuration even though ASPM L1 may already be enabled. This is due the fact that the pci_restore_aspm_l1ss_state() was called before pci_restore_pcie_state(). Save and restore the L1 PM Substates Capability, following PCIe r6.1, sec 5.5.4 more closely by: 1) Do not restore ASPM configuration in pci_restore_pcie_state() but do that after PCIe capability is restored in pci_restore_aspm_state() following PCIe r6.1, sec 5.5.4. 2) If BIOS reenables L1SS, particularly L1.2, we need to clear the enables in the right order, downstream before upstream. Defer restoring the L1SS config until we are at the downstream component. Then update the config for both ends of the link in the prescribed order. 3) Program ASPM L1 PM substate configuration before L1 enables. 4) Program ASPM L1 PM substate enables last, after rest of the fields in the capability are programmed. [bhelgaas: commit log, squash L1SS-related patches, do both LNKCTL restores in pci_restore_pcie_state()] Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217321 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877 Co-developed-by: Mika Westerberg <[email protected]> Co-developed-by: David E. Box <[email protected]> Reported-by: Koba Ko <[email protected]> Signed-off-by: Mika Westerberg <[email protected]> Signed-off-by: David E. Box <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Tested-by: Tasev Nikola <[email protected]> # Asus UX305FA Cc: Mark Enriquez <[email protected]> Cc: Thomas Witt <[email protected]> Cc: Werner Sembach <[email protected]> Cc: Vidya Sagar <[email protected]>
2024-03-08PCI/AER: Generalize TLP Header Log readingIlpo Järvinen7-35/+48
Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1 secs 7.8.4 & 7.9.14) to convey error diagnostics but the struct is named after AER as the struct aer_header_log_regs. Also, not all places that handle TLP Header Log use the struct and the struct members are named individually. Generalize the struct name and members, and use it consistently where TLP Header Log is being handled so that a pcie_read_tlp_log() helper can be easily added. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ilpo Järvinen <[email protected]> [bhelgaas: drop ixgbe changes for now, tidy whitespace] Signed-off-by: Bjorn Helgaas <[email protected]>
2024-03-08PCI/AER: Use explicit register size for PCI_ERR_CAPIlpo Järvinen1-3/+3
Use u32 for PCIe AER Capability register variable and name it "aercc" (Advanced Error Capabilities and Control register, PCIe r6.1 sec 7.8.4.7) instead of "temp". Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ilpo Järvinen <[email protected]> [bhelgaas: make subject more specific and match similar previous patches] Signed-off-by: Bjorn Helgaas <[email protected]>
2024-03-07PCI/AER: Block runtime suspend when handling errorsStanislaw Gruszka1-0/+20
PM runtime can be done simultaneously with AER error handling. Avoid that by using pm_runtime_get_sync() before and pm_runtime_put() after reset in pcie_do_recovery() for all recovering devices. pm_runtime_get_sync() will increase dev->power.usage_count counter to prevent any possible future request to runtime suspend a device. It will also resume a device, if it was previously in D3hot state. I tested with igc device by doing simultaneous aer_inject and rpm suspend/resume via /sys/bus/pci/devices/PCI_ID/power/control and can reproduce: igc 0000:02:00.0: not ready 65535ms after bus reset; giving up pcieport 0000:00:1c.2: AER: Root Port link has been reset (-25) pcieport 0000:00:1c.2: AER: subordinate device reset failed pcieport 0000:00:1c.2: AER: device recovery failed igc 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible The problem disappears when this patch is applied. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stanislaw Gruszka <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Cc: <[email protected]>
2024-03-07PCI/ASPM: Move pci_save_ltr_state() to aspm.cDavid E. Box3-40/+45
Even when CONFIG_PCIEASPM is not set, we save and restore the LTR Capability so that if ASPM L1.2 and LTR were configured by the platform, ASPM L1.2 will still work after suspend/resume, when that platform configuration may be lost. See dbbfadf23190 ("PCI/ASPM: Save LTR Capability for suspend/resume"). Since ASPM L1.2 depends on the LTR Capability, move the save/restore code to the part of aspm.c that is always compiled regardless of CONFIG_PCIEASPM. No functional change intended. Suggested-by: Bjorn Helgaas <[email protected]> Link: https://lore.kernel.org/r/[email protected] [bhelgaas: commit log, reorder to make this a pure move] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: David E. Box <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-03-07PCI/ASPM: Always build aspm.cDavid E. Box2-1/+5
Some ASPM-related tasks, such as save and restore of LTR and L1SS capabilities, still need to be performed when CONFIG_PCIEASPM is not enabled. To prepare for these changes, wrap the current code in aspm.c with an #ifdef and always build the file. Link: https://lore.kernel.org/r/[email protected] [bhelgaas: split build change from function moves] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: David E. Box <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-03-07PCI/ASPM: Move pci_configure_ltr() to aspm.cDavid E. Box4-80/+79
The Latency Tolerance Reporting (LTR) mechanism supports the ASPM L1.2 state and is only configured when CONFIG_PCIEASPM is set. Move pci_configure_ltr() and pci_bridge_reconfigure_ltr() into aspm.c since they only build when CONFIG_PCIEASPM is set. No functional change intended. Suggested-by: Bjorn Helgaas <[email protected]> Link: https://lore.kernel.org/r/[email protected] [bhelgaas: commit log, split build change from function moves] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: David E. Box <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-03-05PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root PortsPaul Menzel1-0/+2
Commit 5459c0b70467 ("PCI/DPC: Quirk PIO log size for certain Intel Root Ports") and commit 3b8803494a06 ("PCI/DPC: Quirk PIO log size for Intel Ice Lake Root Ports") add quirks for Ice, Tiger and Alder Lake Root Ports. System firmware for Raptor Lake still has the bug, so Linux logs the warning below on several Raptor Lake systems like Dell Precision 3581 with Intel Raptor Lake processor (0W18NX) system firmware/BIOS version 1.10.1. pci 0000:00:07.0: [8086:a76e] type 01 class 0x060400 pci 0000:00:07.0: DPC: RP PIO log size 0 is invalid pci 0000:00:07.1: [8086:a73f] type 01 class 0x060400 pci 0000:00:07.1: DPC: RP PIO log size 0 is invalid Apply the quirk for Raptor Lake Root Ports as well. This also enables the DPC driver to dump the RP PIO Log registers when DPC is triggered. Link: https://lore.kernel.org/r/[email protected] Reported-by: Niels van Aert <[email protected]> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218560 Signed-off-by: Paul Menzel <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Cc: <[email protected]> Cc: Mika Westerberg <[email protected]> Cc: Niels van Aert <[email protected]>
2024-02-28PCI/DPC: Ignore Surprise Down error on hot removalSmita Koralahalli1-0/+60
According to PCIe r6.0 sec 6.7.6 [1], async removal with DPC may result in surprise down error. This error is expected and is just a side-effect of async remove. Ignore surprise down error generated as a side-effect of async remove. Typically, this error is benign as the pciehp handler invoked by PDC or/and DLLSC alongside DPC, de-enumerates and brings down the device appropriately, but the error messages might confuse users. Get rid of these irritating log messages with a 1s delay while pciehp waits for DPC recovery. The implementation is as follows: On an async remove a DPC is triggered along with a Presence Detect State change and/or DLL State Change. Determine it's an async remove by checking for DPC Trigger Status in DPC Status Register and Surprise Down Error Status in AER Uncorrected Error Status to be non-zero. If true, treat the DPC event as a side-effect of async remove, clear the error status registers and continue with hot-plug tear down routines. If not, follow the existing routine to handle AER and DPC errors. Masking Surprise Down Errors was explored as an alternative approach, but discarded due to the odd behavior that masking only avoids the interrupt, but still records an error per PCIe r6.0, sec 6.2.3.2.2. That stale error would be reported the next time some error other than Surprise Down is handled. Dmesg before: pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000 pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID) pcieport 0000:00:01.4: device [1022:14ab] error status/mask=00000020/04004000 pcieport 0000:00:01.4: [ 5] SDES (First) nvme nvme2: frozen state error detected, reset controller pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec pcieport 0000:00:01.4: AER: subordinate device reset failed pcieport 0000:00:01.4: AER: device recovery failed pcieport 0000:00:01.4: pciehp: Slot(16): Link Down nvme2n1: detected capacity change from 1953525168 to 0 pci 0000:04:00.0: Removing from iommu group 49 Dmesg after: pcieport 0000:00:01.4: pciehp: Slot(16): Link Down nvme1n1: detected capacity change from 1953525168 to 0 pci 0000:04:00.0: Removing from iommu group 37 [1] PCI Express Base Specification Revision 6.0, Dec 16 2021. https://members.pcisig.com/wg/PCI-SIG/document/16609 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Smita Koralahalli <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Lukas Wunner <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]> Reviewed-by: Ilpo Järvinen <[email protected]>
2024-02-20PCI: Mark 3ware-9650SE Root Port Extended Tags as brokenJörg Wedekind1-0/+1
Per PCIe r6.1, sec 2.2.6.2 and 7.5.3.4, a Requester may not use 8-bit Tags unless its Extended Tag Field Enable is set, but all Receivers/Completers must handle 8-bit Tags correctly regardless of their Extended Tag Field Enable. Some devices do not handle 8-bit Tags as Completers, so add a quirk for them. If we find such a device, we disable Extended Tags for the entire hierarchy to make peer-to-peer DMA possible. The 3ware 9650SE seems to have issues with handling 8-bit tags. Mark it as broken. This fixes PCI Parity Errors like : 3w-9xxx: scsi0: ERROR: (0x06:0x000C): PCI Parity Error: clearing. 3w-9xxx: scsi0: ERROR: (0x06:0x000D): PCI Abort: clearing. 3w-9xxx: scsi0: ERROR: (0x06:0x000E): Controller Queue Error: clearing. 3w-9xxx: scsi0: ERROR: (0x06:0x0010): Microcontroller Error: clearing. Link: https://lore.kernel.org/r/[email protected] Fixes: 60db3a4d8cc9 ("PCI: Enable PCIe Extended Tags if supported") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=202425 Signed-off-by: Jörg Wedekind <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-02-12PCI: Move devres code from pci.c to devres.cPhilipp Stanner3-249/+262
The file pci.c is very large and contains a number of devres functions. These functions should now reside in devres.c. Move as much devres-specific code from pci.c to devres.c as possible. There are a few callers left in pci.c that do devres operations. These should be ported in the future. Add corresponding TODOs. The reason they are not moved right now in this commit is that PCI's devres currently implements a sort of "hybrid-mode": pci_request_region(), for instance, does not have a corresponding pcim_ equivalent, yet. Instead, the function can be made managed by previously calling pcim_enable_device() (instead of pci_enable_device()). This makes it unreasonable to move pci_request_region() to devres.c. Moving the functions would require changes to PCI's API and is, therefore, left for future work. In summary, this commit serves as a preparation step for a following patch series that will cleanly separate the PCI's managed and unmanaged API. Link: https://lore.kernel.org/r/[email protected] Suggested-by: Danilo Krummrich <[email protected]> Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-02-12PCI: Move PCI-specific devres code to drivers/pci/Philipp Stanner4-208/+212
The pcim_*() functions in lib/devres.c are guarded by an #ifdef CONFIG_PCI and, thus, don't belong to this file. They are only ever used for PCI and are not generic infrastructure. Move all pcim_*() functions in lib/devres.c to drivers/pci/devres.c. Adjust the Makefile. Add drivers/pci/devres.c to Documentation. Link: https://lore.kernel.org/r/[email protected] Suggested-by: Danilo Krummrich <[email protected]> Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-02-12PCI: Move pci_iomap.c to drivers/pci/Philipp Stanner8-11/+9
The entirety of pci_iomap.c is guarded by an #ifdef CONFIG_PCI. It, consequently, does not belong to lib/ because it is not generic infrastructure. Move pci_iomap.c to drivers/pci/ and implement the necessary changes to Makefiles and Kconfigs. Update MAINTAINERS file. Update Documentation. Link: https://lore.kernel.org/r/[email protected] [bhelgaas: squash in https://lore.kernel.org/r/[email protected]] Suggested-by: Danilo Krummrich <[email protected]> Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]>
2024-02-02PCI/AER: Clean up version indentation in ABI docsJohan Hovold1-6/+6
The 'KernelVersion' lines use a single space as separator instead of a tab so the values are not aligned with the other AER attribute fields. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-02-02PCI/AER: Fix rootport attribute paths in ABI docsJohan Hovold1-3/+3
The 'aer_stats' directory never made it into the sixth and final revision of the series adding the sysfs AER attributes. Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/lkml/[email protected]/ Fixes: 12833017e581 ("PCI/AER: Add sysfs attributes for rootport cumulative stats") Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-01-31pci_iounmap(): Fix MMIO mapping leakPhilipp Stanner1-1/+1
The #ifdef ARCH_HAS_GENERIC_IOPORT_MAP accidentally also guards iounmap(), which means MMIO mappings are leaked. Move the guard so we call iounmap() for MMIO mappings. Fixes: 316e8d79a095 ("pci_iounmap'2: Electric Boogaloo: try to make sense of it all") Link: https://lore.kernel.org/r/[email protected] Reported-by: Danilo Krummrich <[email protected]> Suggested-by: Arnd Bergmann <[email protected]> Signed-off-by: Philipp Stanner <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Cc: <[email protected]> # v5.15+
2024-01-29PCI: Place interrupt related code into irq.cIlpo Järvinen5-218/+205
Interrupt related code is spread into irq.c, pci.c, and setup-irq.c. Group them into pre-existing irq.c. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2024-01-22PCI/DPC: Print all TLP Prefixes, not just the firstIlpo Järvinen1-1/+1
The TLP Prefix Log Register consists of multiple DWORDs (PCIe r6.1 sec 7.9.14.13) but the loop in dpc_process_rp_pio_error() keeps reading from the first DWORD, so we print only the first PIO TLP Prefix (duplicated several times), and we never print the second, third, etc., Prefixes. Add the iteration count based offset calculation into the config read. Fixes: f20c4ea49ec4 ("PCI/DPC: Add eDPC support") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ilpo Järvinen <[email protected]> [bhelgaas: add user-visible details to commit log] Signed-off-by: Bjorn Helgaas <[email protected]>
2024-01-21Linux 6.8-rc1Linus Torvalds1-2/+2
2024-01-21Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefsLinus Torvalds78-1426/+1629
Pull more bcachefs updates from Kent Overstreet: "Some fixes, Some refactoring, some minor features: - Assorted prep work for disk space accounting rewrite - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this makes our trigger context more explicit - A few fixes to avoid excessive transaction restarts on multithreaded workloads: fstests (in addition to ktest tests) are now checking slowpath counters, and that's shaking out a few bugs - Assorted tracepoint improvements - Starting to break up bcachefs_format.h and move on disk types so they're with the code they belong to; this will make room to start documenting the on disk format better. - A few minor fixes" * tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits) bcachefs: Improve inode_to_text() bcachefs: logged_ops_format.h bcachefs: reflink_format.h bcachefs; extents_format.h bcachefs: ec_format.h bcachefs: subvolume_format.h bcachefs: snapshot_format.h bcachefs: alloc_background_format.h bcachefs: xattr_format.h bcachefs: dirent_format.h bcachefs: inode_format.h bcachefs; quota_format.h bcachefs: sb-counters_format.h bcachefs: counters.c -> sb-counters.c bcachefs: comment bch_subvolume bcachefs: bch_snapshot::btime bcachefs: add missing __GFP_NOWARN bcachefs: opts->compression can now also be applied in the background bcachefs: Prep work for variable size btree node buffers bcachefs: grab s_umount only if snapshotting ...
2024-01-21Merge tag 'timers-core-2024-01-21' of ↵Linus Torvalds7-12/+41
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for time and clocksources: - A fix for the idle and iowait time accounting vs CPU hotplug. The time is reset on CPU hotplug which makes the accumulated systemwide time jump backwards. - Assorted fixes and improvements for clocksource/event drivers" * tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug clocksource/drivers/ep93xx: Fix error handling during probe clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings clocksource/timer-riscv: Add riscv_clock_shutdown callback dt-bindings: timer: Add StarFive JH8100 clint dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs
2024-01-21Merge tag 'powerpc-6.8-2' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Aneesh Kumar: - Increase default stack size to 32KB for Book3S Thanks to Michael Ellerman. * tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Increase default stack size to 32KB
2024-01-21bcachefs: Improve inode_to_text()Kent Overstreet1-7/+18
Add line breaks - inode_to_text() is now much easier to read. Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: logged_ops_format.hKent Overstreet2-27/+31
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: reflink_format.hKent Overstreet3-47/+48
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs; extents_format.hKent Overstreet2-279/+284
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: ec_format.hKent Overstreet2-16/+20
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: subvolume_format.hKent Overstreet2-32/+36
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: snapshot_format.hKent Overstreet2-33/+37
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: alloc_background_format.hKent Overstreet2-93/+94
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: xattr_format.hKent Overstreet2-15/+20
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: dirent_format.hKent Overstreet2-39/+43
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: inode_format.hKent Overstreet2-164/+167
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs; quota_format.hKent Overstreet2-42/+48
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: sb-counters_format.hKent Overstreet2-95/+100
bcachefs_format.h has gotten too big; let's do some organizing. Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: counters.c -> sb-counters.cKent Overstreet5-8/+7
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: comment bch_subvolumeKent Overstreet1-0/+3
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: bch_snapshot::btimeKent Overstreet2-0/+3
Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: add missing __GFP_NOWARNKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: opts->compression can now also be applied in the backgroundKent Overstreet11-23/+24
The "apply this compression method in the background" paths now use the compression option if background_compression is not set; this means that setting or changing the compression option will cause existing data to be compressed accordingly in the background. Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: Prep work for variable size btree node buffersKent Overstreet18-97/+87
bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <[email protected]>
2024-01-21bcachefs: grab s_umount only if snapshottingSu Yue1-6/+5
When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] *** DEADLOCK *** [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>