aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2024-05-24Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds3-10/+33
Pull drm fixes from Dave Airlie: "Some fixes for the end of the merge window, mostly amdgpu and panthor, with one nouveau uAPI change that fixes a bad decision we made a few months back. nouveau: - fix bo metadata uAPI for vm bind panthor: - Fixes for panthor's heap logical block. - Reset on unrecoverable fault - Fix VM references. - Reset fix. xlnx: - xlnx compile and doc fixes. amdgpu: - Handle vbios table integrated info v2.3 amdkfd: - Handle duplicate BOs in reserve_bo_and_cond_vms - Handle memory limitations on small APUs dp/mst: - MST null deref fix. bridge: - Don't let next bridge create connector in adv7511 to make probe work" * tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel: drm/amdgpu/atomfirmware: add intergrated info v2.3 table drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2 drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms drm/bridge: adv7511: Attach next bridge without creating connector drm/buddy: Fix the warn on's during force merge drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations drm/panthor: Call panthor_sched_post_reset() even if the reset failed drm/panthor: Reset the FW VM to NULL on unplug drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level drm/panthor: Force an immediate reset on unrecoverable faults drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints drm/panthor: Fix an off-by-one in the heap context retrieval logic drm/panthor: Relax the constraints on the tiler chunk size drm/panthor: Make sure the tiler initial/max chunks are consistent drm/panthor: Fix tiler OOM handling to allow incremental rendering drm: xlnx: zynqmp_dpsub: Fix compilation error drm: xlnx: zynqmp_dpsub: Fix few function comments
2024-05-23drm/amdgpu: correct hbm field in boot statusHawking Zhang1-1/+1
hbm filed takes bit 13 and bit 14 in boot status. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: add gfx queue support for gfx11 ipdumpSunil Khatri1-0/+92
Add support of all the CP GFX queues for gfx11 ipdump to be used by devcoredump. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: add cp queue registers for gfx11 ipdumpSunil Khatri1-2/+109
Add gfx11 support of CP queue registers for all queues to be used by devcoredump. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: add print support for gfx11 ipdumpSunil Khatri1-1/+16
Add support of gfx11 ipdump print so devcoredump could trigger it to dump the captured registers in devcoredump. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: add gfx11 registers support in ipdumpSunil Khatri1-1/+106
Add general registers of gfx11 in ipdump for devcoredump support. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: add more device info to the devcoredumpSunil Khatri1-2/+19
Adding more device information: a. PCI info b. VRAM and GTT info c. GDC config Also correct the print layout and section information for in devcoredump. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: add prints in IP State dumpSunil Khatri1-0/+2
add prints before and after ip state is dumped. It avoids user to think of system being stuck/hung as dump could take some time after a gpu hang. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: add gfx queue support of gfx10 in ipdumpSunil Khatri2-0/+91
Add gfx queue register for all instances in devcoredump for gfx10. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: Add cp queues support fro gfx10 in ipdumpSunil Khatri2-4/+113
Add support to dump registers of all instances of cp queue registers of gfx10 to devcoredump. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: rename the ip_dump to ip_dump_coreSunil Khatri2-8/+8
Rename the memory pointer from ip_dump to ip_dump_core to make it specific to core registers and rest other registers to be dumped in their respective memories. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: Add CRC16 selection in configLijo Lazar1-0/+1
KFD uses crc16 for gpu_id generation. Fixes: 3ed181b8ff43 ("drm/amdkfd: Ensure gpu_id is unique") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amd/amdgpu: fix the inst passed to amdgpu_virt_rlcg_reg_rwVictor Zhao2-11/+11
the inst passed to amdgpu_virt_rlcg_reg_rw should be physical instance. Fix the miss matched code. Signed-off-by: Victor Zhao <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu - optimize rlc spm cntlJane Jian3-20/+38
v1 - driver MMIO read the register to check whether write is required - if write is required, sriov full time to use rlcg, otherwise use KIQ v2 - include gfx v11 sriov runtime case Signed-off-by: Jane Jian <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: correct hbm field in boot statusHawking Zhang1-1/+1
hbm filed takes bit 13 and bit 14 in boot status. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: program device_cntl2 through pci cfg spaceFrank Min1-5/+8
device_cntl2 is accessible from pci config space, so program it through pci cfg space instead of mmio. Signed-off-by: Frank Min <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu/atomfirmware: add intergrated info v2.3 tableLi Ma1-0/+15
[Why] The vram width value is 0. Because the integratedsysteminfo table in VBIOS has updated to 2.3. [How] Driver needs a new intergrated info v2.3 table too. Then the vram width value will be correct. Signed-off-by: Li Ma <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ringSrinivasan Shanmugam1-1/+1
This commit fixes a format truncation issue arosed by the snprintf function potentially writing more characters into the ring->name buffer than it can hold, in the amdgpu_gfx_kiq_init_ring function The issue occurred because the '%d' format specifier could write between 1 and 10 bytes into a region of size between 0 and 8, depending on the values of xcc_id, ring->me, ring->pipe, and ring->queue. The snprintf function could output between 12 and 41 bytes into a destination of size 16, leading to potential truncation. To resolve this, the snprintf line was modified to use the '%hhu' format specifier for xcc_id, ring->me, ring->pipe, and ring->queue. The '%hhu' specifier is used for unsigned char variables and ensures that these values are printed as unsigned decimal integers. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_kiq_init_ring’: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:61: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size between 0 and 8 [-Wformat-truncation=] 332 | snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", | ^~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:50: note: directive argument in the range [0, 2147483647] 332 | snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", | ^~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:9: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 16 332 | snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 333 | xcc_id, ring->me, ring->pipe, ring->queue); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 345a36c4f1ba ("drm/amdgpu: prefer snprintf over sprintf") Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: fix invadate operation for pg_flagsJesse Zhang1-2/+2
Since the type of pg_flags is u32, adev->pg_flags >> 16 >> 16 is 0 regardless of the values of its operands. So removing the operations upper_32_bits and lower_32_bits. Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Tim Huang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu/mes12: mes hw_fini fix for mode1 resetJack Xiao1-3/+4
Port mes11 hw_fini to mes12, fix for mode1 reset. Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: fix invadate operation for umschJesse Zhang1-3/+2
Since the type of data_size is uint32_t, adev->umsch_mm.data_size - 1 >> 16 >> 16 is 0 regardless of the values of its operands So removing the operations upper_32_bits and lower_32_bits. Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Tim Huang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/admgpu: fix dereferencing null pointer contextJesse Zhang1-1/+1
When user space sets an invalid ta type, the pointer context will be empty. So it need to check the pointer context before using it Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Tim Huang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: skip to create ras xxx_err_count node when ACA is enabledYang Wang1-0/+6
skip to create 'xxx_err_count' node when ACA is enabled. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-23drm/amdgpu: remove amdgpu_connector_edid() and stop using edid_blob_ptrJani Nikula6-25/+8
amdgpu_connector_edid() copies the EDID from edid_blob_ptr as a side effect if amdgpu_connector->edid isn't initialized. However, everywhere that the returned EDID is used, the EDID should have been set beforehands. Only the drm EDID code should look at the EDID property, anyway, so stop using it. Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: Pan Cc: [email protected] Reviewed-by: Robert Foss <[email protected]> Acked-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/1463862965d76e9458551598fd4d287a08d3d264.1715353572.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <[email protected]>
2024-05-22tracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)1-8/+8
With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Julia Lawall <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]> Acked-by: Jani Nikula <[email protected]> Acked-by: Christian König <[email protected]> for the amdgpu parts. Acked-by: Thomas Hellström <[email protected]> #for Acked-by: Rafael J. Wysocki <[email protected]> # for thermal Acked-by: Takashi Iwai <[email protected]> Acked-by: Darrick J. Wong <[email protected]> # xfs Tested-by: Guenter Roeck <[email protected]>
2024-05-22drm/amdgpu/atomfirmware: add intergrated info v2.3 tableLi Ma1-0/+15
[Why] The vram width value is 0. Because the integratedsysteminfo table in VBIOS has updated to 2.3. [How] Driver needs a new intergrated info v2.3 table too. Then the vram width value will be correct. Signed-off-by: Li Ma <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-05-21Merge tag 'pci-v6.10-changes' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Skip E820 checks for MCFG ECAM regions for new (2016+) machines, since there's no requirement to describe them in E820 and some platforms require ECAM to work (Bjorn Helgaas) - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien Le Moal) - Remove last user and pci_enable_device_io() (Heiner Kallweit) - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen) - Skip waiting for devices that have been disconnected while suspended (Ilpo Järvinen) - Clear Secondary Status errors after enumeration since Master Aborts and Unsupported Request errors are an expected part of enumeration (Vidya Sagar) MSI: - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas) Error handling: - Mask Genesys GL975x SD host controller Replay Timer Timeout correctable errors caused by a hardware defect; the errors cause interrupts that prevent system suspend (Kai-Heng Feng) - Fix EDR-related _DSM support, which previously evaluated revision 5 but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan) ASPM: - Simplify link state definitions and mask calculation (Ilpo Järvinen) Power management: - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS apparently doesn't know how to put them back in D0 (Mario Limonciello) CXL: - Support resetting CXL devices; special handling required because CXL Ports mask Secondary Bus Reset by default (Dave Jiang) DOE: - Support DOE Discovery Version 2 (Alexey Kardashevskiy) Endpoint framework: - Set endpoint BAR to be 64-bit if the driver says that's all the device supports, in addition to doing so if the size is >2GB (Niklas Cassel) - Simplify endpoint BAR allocation and setting interfaces (Niklas Cassel) Cadence PCIe controller driver: - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof Kozlowski) Cadence PCIe endpoint driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) Freescale Layerscape PCIe controller driver: - Convert DT binding to YAML (Frank Li) MediaTek MT7621 PCIe controller driver: - Add DT binding missing 'reg' property for child Root Ports (Krzysztof Kozlowski) - Fix theoretical string truncation in PHY name (Sergio Paracuellos) NVIDIA Tegra194 PCIe controller driver: - Return success for endpoint probe instead of falling through to the failure path (Vidya Sagar) Renesas R-Car PCIe controller driver: - Add DT binding missing IOMMU properties (Geert Uytterhoeven) - Add DT binding R-Car V4H compatible for host and endpoint mode (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski) - Set the Subsystem Vendor ID, which was previously zero because it was masked incorrectly (Rick Wertenbroek) Synopsys DesignWare PCIe controller driver: - Restructure DBI register access to accommodate devices where this requires Refclk to be active (Manivannan Sadhasivam) - Remove the deinit() callback, which was only need by the pcie-rcar-gen4, and do it directly in that driver (Manivannan Sadhasivam) - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean up things like eDMA (Manivannan Sadhasivam) - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel to dw_pcie_ep_init() (Manivannan Sadhasivam) - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to reflect the actual functionality (Manivannan Sadhasivam) - Call dw_pcie_ep_init_registers() directly from all the glue drivers, not just those that require active Refclk from the host (Manivannan Sadhasivam) - Remove the "core_init_notifier" flag, which was an obscure way for glue drivers to indicate that they depend on Refclk from the host (Manivannan Sadhasivam) TI J721E PCIe driver: - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli) - Add DT binding J722S SoC support (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Add DT binding missing num-viewport, phys and phy-name properties (Jan Kiszka) Miscellaneous: - Constify and annotate with __ro_after_init (Heiner Kallweit) - Convert DT bindings to YAML (Krzysztof Kozlowski) - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming Zhou)" * tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits) PCI: Do not wait for disconnected devices when resuming x86/pci: Skip early E820 check for ECAM region PCI: Remove unused pci_enable_device_io() ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io() PCI: Update pci_find_capability() stub return types PCI: Remove PCI_IRQ_LEGACY scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios Revert "genirq/msi: Provide constants for PCI/IMS support" Revert "x86/apic/msi: Enable PCI/IMS" Revert "iommu/vt-d: Enable PCI/IMS" Revert "iommu/amd: Enable PCI/IMS" Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support" ...
2024-05-20drm/amdkfd: Let VRAM allocations go to GTT domain on small APUsLang Yu2-9/+16
Small APUs(i.e., consumer, embedded products) usually have a small carveout device memory which can't satisfy most compute workloads memory allocation requirements. We can't even run a Basic MNIST Example with a default 512MB carveout. https://github.com/pytorch/examples/tree/main/mnist. Error Log: "torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes is free. Of the allocated memory 103.83 MiB is allocated by PyTorch, and 22.17 MiB is reserved by PyTorch but unallocated" Though we can change BIOS settings to enlarge carveout size, which is inflexible and may bring complaint. On the other hand, the memory resource can't be effectively used between host and device. The solution is MI300A approach, i.e., let VRAM allocations go to GTT. Then device and host can flexibly and effectively share memory resource. v2: Report local_mem_size_private as 0. (Felix) Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-20drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vmsLang Yu1-1/+2
Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used. Two attachments use the same VM, root PD would be locked twice. [ 57.910418] Call Trace: [ 57.793726] ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu] [ 57.793820] amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu] [ 57.793923] ? idr_get_next_ul+0xbe/0x100 [ 57.793933] kfd_process_device_free_bos+0x7e/0xf0 [amdgpu] [ 57.794041] kfd_process_wq_release+0x2ae/0x3c0 [amdgpu] [ 57.794141] ? process_scheduled_works+0x29c/0x580 [ 57.794147] process_scheduled_works+0x303/0x580 [ 57.794157] ? __pfx_worker_thread+0x10/0x10 [ 57.794160] worker_thread+0x1a2/0x370 [ 57.794165] ? __pfx_worker_thread+0x10/0x10 [ 57.794167] kthread+0x11b/0x150 [ 57.794172] ? __pfx_kthread+0x10/0x10 [ 57.794177] ret_from_fork+0x3d/0x60 [ 57.794181] ? __pfx_kthread+0x10/0x10 [ 57.794184] ret_from_fork_asm+0x1b/0x30 Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-20drm/amdgpu: Fix amdgpu_vm_is_bo_always_valid kerneldocTvrtko Ursulin1-1/+1
Align kerneldoc with the function argument name. Signed-off-by: Tvrtko Ursulin <[email protected]> Reported-by: Stephen Rothwell <[email protected]> Fixes: 26e20235ce00 ("drm/amdgpu: Add amdgpu_bo_is_vm_bo helper") Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-20drm/amdgpu: remove unused struct 'hqd_registers'Dr. David Alan Gilbert1-38/+0
'hqd_registers' used to be used in a member of the 'bonaire_mqd' struct. 'bonaire_mqd' was removed by commit 486d807cd9a9 ("drm/amdgpu: remove duplicate definition of cik_mqd") It's now unused. Remove 'hqd_registers' as well. Signed-off-by: Dr. David Alan Gilbert <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-20drm/amdgpu: update type of buf size to u32 for eeprom functionsTao Zhou2-5/+5
Avoid overflow issue. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-20drm/amdgpu: Queue KFD reset workitem in VF FEDVictor Skvortsov1-1/+1
The guest recovery sequence is buggy in Fatal Error when both FLR & KFD reset workitems are queued at the same time. In addition, FLR guest recovery sequence is out of order when PF/VF communication breaks due to a GPU fatal error As a temporary work around, perform a KFD style reset (Initiate reset request from the guest) inside the pf2vf thread on FED. Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-20drm/amdgpu: Extend KIQ reg polling wait for VFVictor Skvortsov1-3/+3
Runtime KIQ interface to read/write registers in VF may take longer than expected for BM environment. Extend the timeout. Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-18Merge tag 'kbuild-v6.10' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Avoid 'constexpr', which is a keyword in C23 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of 'dt_binding_check' - Fix weak references to avoid GOT entries in position-independent code generation - Convert the last use of 'optional' property in arch/sh/Kconfig - Remove support for the 'optional' property in Kconfig - Remove support for Clang's ThinLTO caching, which does not work with the .incbin directive - Change the semantics of $(src) so it always points to the source directory, which fixes Makefile inconsistencies between upstream and downstream - Fix 'make tar-pkg' for RISC-V to produce a consistent package - Provide reasonable default coverage for objtool, sanitizers, and profilers - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc. - Remove the last use of tristate choice in drivers/rapidio/Kconfig - Various cleanups and fixes in Kconfig * tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits) kconfig: use sym_get_choice_menu() in sym_check_prop() rapidio: remove choice for enumeration kconfig: lxdialog: remove initialization with A_NORMAL kconfig: m/nconf: merge two item_add_str() calls kconfig: m/nconf: remove dead code to display value of bool choice kconfig: m/nconf: remove dead code to display children of choice members kconfig: gconf: show checkbox for choice correctly kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal Makefile: remove redundant tool coverage variables kbuild: provide reasonable defaults for tool coverage modules: Drop the .export_symbol section from the final modules kconfig: use menu_list_for_each_sym() in sym_check_choice_deps() kconfig: use sym_get_choice_menu() in conf_write_defconfig() kconfig: add sym_get_choice_menu() helper kconfig: turn defaults and additional prompt for choice members into error kconfig: turn missing prompt for choice members into error kconfig: turn conf_choice() into void function kconfig: use linked list in sym_set_changed() kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED kconfig: gconf: remove debug code ...
2024-05-17drm/amdgpu: fix ACA no query result after gpu resetYang Wang3-13/+4
fix ACA no query result after gpu reset. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amd/pm: Remove legacy interface for xgmi plpdLijo Lazar1-2/+2
Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI PLPD mode. Also, xgmi_plpd_policy sysfs node is not used by any client. Remove that as well. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: change bank cache lock type to spinlockYang Wang2-7/+6
modify the lock type to 'spinlock' to avoid schedule issue in interrupt context. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: change aca bank error lock type to spinlockYang Wang2-11/+11
modify the lock type to 'spinlock' to avoid schedule issue in interrupt context. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: Describe all object placements in debugfsTvrtko Ursulin1-0/+15
Accurately show all placements when describing objects in debugfs, instead of bunching them up under the 'CPU' placement. Signed-off-by: Tvrtko Ursulin <[email protected]> Cc: Christian König <[email protected]> Cc: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: Reduce mem_type to domain double indirectionTvrtko Ursulin2-19/+13
All apart from AMDGPU_GEM_DOMAIN_GTT memory domains map 1:1 to TTM placements. And the former be either AMDGPU_PL_PREEMPT or TTM_PL_TT, depending on AMDGPU_GEM_CREATE_PREEMPTIBLE. Simplify a few places in the code which convert the TTM placement into a domain by checking against the current placement directly. In the conversion AMDGPU_PL_PREEMPT either does not have to be handled because amdgpu_mem_type_to_domain() cannot return that value anyway. v2: * Remove AMDGPU_PL_PREEMPT handling. v3: * Rebase. Signed-off-by: Tvrtko Ursulin <[email protected]> Reviewed-by: Christian König <[email protected]> # v1 Reviewed-by: Felix Kuehling <[email protected]> # v2 Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: Add amdgpu_bo_is_vm_bo helperTvrtko Ursulin3-16/+29
Help code readability by replacing a bunch of: bo->tbo.base.resv == vm->root.bo->tbo.base.resv With: amdgpu_vm_is_bo_always_valid(vm, bo) No functional changes. v2: * Rename helper and move to amdgpu_vm. (Christian) v3: * Use Christian's kerneldoc. v4: * Fixed logic inversion in amdgpu_vm_bo_get_memory. Signed-off-by: Tvrtko Ursulin <[email protected]> Cc: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: Remove duplicate check for *is_queue_unmap in ↵Srinivasan Shanmugam1-4/+0
sdma_v7_0_ring_set_wptr This commit removes a duplicate check for *is_queue_unmap in the sdma_v7_0_ring_set_wptr function. The check at line 171 was considered dead code because at this point in the code, we already know that *is_queue_unmap is false due to the check at line 161. By removing this unnecessary check, improves the readability of the code Fixes the below: drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c:171 sdma_v7_0_ring_set_wptr() warn: duplicate check '*is_queue_unmap' (previous on line 161) drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c 140 static void sdma_v7_0_ring_set_wptr(struct amdgpu_ring *ring) 141 { 142 struct amdgpu_device *adev = ring->adev; 143 uint32_t *wptr_saved; 144 uint32_t *is_queue_unmap; 145 uint64_t aggregated_db_index; 146 uint32_t mqd_size = adev->mqds[AMDGPU_HW_IP_DMA].mqd_size; 147 148 DRM_DEBUG("Setting write pointer\n"); 149 150 if (ring->is_mes_queue) { 151 wptr_saved = (uint32_t *)(ring->mqd_ptr + mqd_size); 152 is_queue_unmap = (uint32_t *)(ring->mqd_ptr + mqd_size + ^^^^^^^^^^^^^^^^ Set here 153 sizeof(uint32_t)); 154 aggregated_db_index = 155 amdgpu_mes_get_aggregated_doorbell_index(adev, 156 ring->hw_prio); 157 158 atomic64_set((atomic64_t *)ring->wptr_cpu_addr, 159 ring->wptr << 2); 160 *wptr_saved = ring->wptr << 2; 161 if (*is_queue_unmap) { ^^^^^^^^^^^^^^^ Checked here 162 WDOORBELL64(aggregated_db_index, ring->wptr << 2); 163 DRM_DEBUG("calling WDOORBELL64(0x%08x, 0x%016llx)\n", 164 ring->doorbell_index, ring->wptr << 2); 165 WDOORBELL64(ring->doorbell_index, ring->wptr << 2); 166 } else { 167 DRM_DEBUG("calling WDOORBELL64(0x%08x, 0x%016llx)\n", 168 ring->doorbell_index, ring->wptr << 2); 169 WDOORBELL64(ring->doorbell_index, ring->wptr << 2); 170 --> 171 if (*is_queue_unmap) ^^^^^^^^^^^^^^^ This is dead code. We know it's false. 172 WDOORBELL64(aggregated_db_index, 173 ring->wptr << 2); 174 } 175 } else { 176 if (ring->use_doorbell) { 177 DRM_DEBUG("Using doorbell -- " 178 "wptr_offs == 0x%08x " Fixes: b412351e91bd ("drm/amdgpu: Add sdma v7_0 ip block support (v7)") Cc: Likun Gao <[email protected]> Cc: Hawking Zhang <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Likun Gao <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1Tim Van Patten1-1/+0
The following commit updated gmc->noretry from 0 to 1 for GC HW IP 9.3.0: commit 5f3854f1f4e2 ("drm/amdgpu: add more cases to noretry=1") This causes the device to hang when a page fault occurs, until the device is rebooted. Instead, revert back to gmc->noretry=0 so the device is still responsive. Fixes: 5f3854f1f4e2 ("drm/amdgpu: add more cases to noretry=1") Signed-off-by: Tim Van Patten <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu/vcn: update vcn5 enc/dec capabilitiesRuijing Dong1-5/+5
Update the capabilities for supporting 8k encoding. Reviewed-by: David (Ming Qiang) Wu <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Ruijing Dong <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: Remove duplicate amdgpu_umsch_mm.h headerJiapeng Chong1-1/+0
./drivers/gpu/drm/amd/amdgpu/amdgpu.h: amdgpu_umsch_mm.h is included more than once. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9063 Signed-off-by: Jiapeng Chong <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amd/amdgpu: add module parameter for jpegKenneth Feng3-0/+8
add module parameter for jpeg. this is a temporary workaround for jpeg unit test fail on vcn 5.0 now. will be removed later. Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Sonny Jiang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: fix documentation errors in gmc v12.0Alex Deucher1-0/+5
Fix up parameter descriptions. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdkfd: Check correct memory types for is_system variableSreekant Somasekharan1-1/+2
To catch GPU mapping of system memory, TTM_PL_TT and AMDGPU_PL_PREEMPT must be checked. Fixes: 628e1ace2379 ("drm/amdkfd: mark GFX12 system and peer GPU memory mappings as MTYPE_NC") Signed-off-by: Sreekant Somasekharan <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: fix documentation errors in sdma v7.0Alex Deucher1-9/+14
Fix up parameter descriptions. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>