aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2023-05-11drm/amdgpu: change gfx 11.0.4 external_id rangeYifan Zhang1-1/+1
gfx 11.0.4 range starts from 0x80. Fixes: 311d52367d0a ("drm/amdgpu: add soc21 common ip block support for GC 11.0.4") Cc: [email protected] Signed-off-by: Yifan Zhang <[email protected]> Reported-by: Yogesh Mohan Marimuthu <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-05-11drm/amdgpu/jpeg: Remove harvest checking for JPEG3Saleemkhan Jamadar1-0/+1
Register CC_UVD_HARVESTING is obsolete for JPEG 3.1.2 Signed-off-by: Saleemkhan Jamadar <[email protected]> Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.1.x
2023-05-11drm/amdgpu/gfx: disable gfx9 cp_ecc_error_irq only when enabling legacy gfx rasGuchun Chen1-1/+2
gfx9 cp_ecc_error_irq is only enabled when legacy gfx ras is assert. So in gfx_v9_0_hw_fini, interrupt disablement for cp_ecc_error_irq should be executed under such condition, otherwise, an amdgpu_irq_put calltrace will occur. [ 7283.170322] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu] [ 7283.170964] RSP: 0018:ffff9a5fc3967d00 EFLAGS: 00010246 [ 7283.170967] RAX: ffff98d88afd3040 RBX: ffff98d89da20000 RCX: 0000000000000000 [ 7283.170969] RDX: 0000000000000000 RSI: ffff98d89da2bef8 RDI: ffff98d89da20000 [ 7283.170971] RBP: ffff98d89da20000 R08: ffff98d89da2ca18 R09: 0000000000000006 [ 7283.170973] R10: ffffd5764243c008 R11: 0000000000000000 R12: 0000000000001050 [ 7283.170975] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105 [ 7283.170978] FS: 0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000 [ 7283.170981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7283.170983] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0 [ 7283.170986] Call Trace: [ 7283.170988] <TASK> [ 7283.170989] gfx_v9_0_hw_fini+0x1c/0x6d0 [amdgpu] [ 7283.171655] amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu] [ 7283.172245] amdgpu_device_suspend+0x103/0x180 [amdgpu] [ 7283.172823] amdgpu_pmops_freeze+0x21/0x60 [amdgpu] [ 7283.173412] pci_pm_freeze+0x54/0xc0 [ 7283.173419] ? __pfx_pci_pm_freeze+0x10/0x10 [ 7283.173425] dpm_run_callback+0x98/0x200 [ 7283.173430] __device_suspend+0x164/0x5f0 v2: drop gfx11 as it's fixed in a different solution by retiring cp_ecc_irq funcs(Hawking) Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-05-11drm/amdgpu: disable sdma ecc irq only when sdma RAS is enabled in suspendGuchun Chen1-3/+5
sdma_v4_0_ip is shared on a few asics, but in sdma_v4_0_hw_fini, driver unconditionally disables ecc_irq which is only enabled on those asics enabling sdma ecc. This will introduce a warning in suspend cycle on those chips with sdma ip v4.0, while without sdma ecc. So this patch correct this. [ 7283.166354] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu] [ 7283.167001] RSP: 0018:ffff9a5fc3967d08 EFLAGS: 00010246 [ 7283.167019] RAX: ffff98d88afd3770 RBX: 0000000000000001 RCX: 0000000000000000 [ 7283.167023] RDX: 0000000000000000 RSI: ffff98d89da30390 RDI: ffff98d89da20000 [ 7283.167025] RBP: ffff98d89da20000 R08: 0000000000036838 R09: 0000000000000006 [ 7283.167028] R10: ffffd5764243c008 R11: 0000000000000000 R12: ffff98d89da30390 [ 7283.167030] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105 [ 7283.167032] FS: 0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000 [ 7283.167036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7283.167039] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0 [ 7283.167041] Call Trace: [ 7283.167046] <TASK> [ 7283.167048] sdma_v4_0_hw_fini+0x38/0xa0 [amdgpu] [ 7283.167704] amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu] [ 7283.168296] amdgpu_device_suspend+0x103/0x180 [amdgpu] [ 7283.168875] amdgpu_pmops_freeze+0x21/0x60 [amdgpu] [ 7283.169464] pci_pm_freeze+0x54/0xc0 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-05-11drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)Lin.Cao1-1/+5
v1: Vmbo->shadow is used to back vram bo up when vram lost. So that we should set shadow as vmbo->shadow to recover vmbo->bo v2: Modify if(vmbo->shadow) shadow = vmbo->shadow as if(!vmbo->shadow) continue; Fixes: e18aaea733da ("drm/amdgpu: move shadow_list to amdgpu_bo_vm") Reviewed-by: Christian König <[email protected]> Signed-off-by: Lin.Cao <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-05-11drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcsHoratio Zhang2-49/+5
The gfx.cp_ecc_error_irq is retired in gfx11. In gfx_v11_0_hw_fini still use amdgpu_irq_put to disable this interrupt, which caused the call trace in this function. [ 102.873958] Call Trace: [ 102.873959] <TASK> [ 102.873961] gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [ 102.874019] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.874072] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.874122] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.874172] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.874223] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.874321] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.874375] process_one_work+0x21f/0x3f0 [ 102.874377] worker_thread+0x200/0x3e0 [ 102.874378] ? process_one_work+0x3f0/0x3f0 [ 102.874379] kthread+0xfd/0x130 [ 102.874380] ? kthread_complete_and_exit+0x20/0x20 [ 102.874381] ret_from_fork+0x22/0x30 v2: - Handle umc and gfx ras cases in separated patch - Retired the gfx_v11_0_cp_ecc_error_irq_funcs in gfx11 v3: - Improve the subject and code comments - Add judgment on gfx11 in the function of amdgpu_gfx_ras_late_init v4: - Drop the define of CP_ME1_PIPE_INST_ADDR_INTERVAL and SET_ECC_ME_PIPE_STATE which using in gfx_v11_0_set_cp_ecc_error_state - Check cp_ecc_error_irq.funcs rather than ip version for a more sustainable life v5: - Simplify judgment conditions Signed-off-by: Horatio Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-05-11drm/amdgpu: set gfx9 onwards APU atomics support to be trueYifan Zhang1-0/+6
APUs w/ gfx9 onwards doesn't reply on PCIe atomics, rather it is internal path w/ native atomic support. Set have_atomics_support to true. Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Lang Yu <[email protected]> Acked-by: Felix Kuehling <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-05-11drm/amdgpu/nv: update VCN 3 max HEVC encoding resolutionThong Thai1-6/+16
Update the maximum resolution reported for HEVC encoding on VCN 3 devices to reflect its 8K encoding capability. v2: Also update the max height for H.264 encoding to match spec. (Ruijing) Signed-off-by: Thong Thai <[email protected]> Reviewed-by: Ruijing Dong <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-05-05Merge tag 'drm-next-2023-05-05' of git://anongit.freedesktop.org/drm/drmLinus Torvalds12-65/+76
Pull more drm fixes from Dave Airlie: "This is the fixes for the last couple of weeks for i915 and last 3 weeks for amdgpu, lots of them but pretty scattered around and all pretty small. amdgpu: - SR-IOV fixes - DCN 3.2 fixes - DC mclk handling fixes - eDP fixes - SubVP fixes - HDCP regression fix - DSC fixes - DC FP fixes - DCN 3.x fixes - Display flickering fix when switching between vram and gtt - Z8 power saving fix - Fix hang when skipping modeset - GPU reset fixes - Doorbell fix when resizing BARs - Fix spurious warnings in gmc - Locking fix for AMDGPU_SCHED IOCTL - SR-IOV fix - DCN 3.1.4 fix - DCN 3.2 fix - Fix job cleanup when CS is aborted i915: - skl pipe source size check - mtl transcoder mask fix - DSI power on sequence fix - GuC versioning corner case fix" * tag 'drm-next-2023-05-05' of git://anongit.freedesktop.org/drm/drm: (48 commits) drm/amdgpu: drop redundant sched job cleanup when cs is aborted drm/amd/display: filter out invalid bits in pipe_fuses drm/amd/display: Change default Z8 watermark values drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOV drm/amdgpu: add a missing lock for AMDGPU_SCHED drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini() drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini drm/amdgpu: Enable doorbell selfring after resize FB BAR drm/amdgpu: Use the default reset when loading or reloading the driver drm/amdgpu: Fix mode2 reset for sienna cichlid drm/i915/dsi: Use unconditional msleep() instead of intel_dsi_msleep() drm/i915/mtl: Add the missing CPU transcoder mask in intel_device_info drm/i915/guc: Actually return an error if GuC version range check fails drm/amd/display: Lowering min Z8 residency time drm/amd/display: fix flickering caused by S/G mode drm/amd/display: Set min_width and min_height capability for DCN30 drm/amd/display: Isolate remaining FPU code in DCN32 drm/amd/display: Update bounding box values for DCN321 drm/amd/display: Do not clear GPINT register when releasing DMUB from reset ...
2023-05-03drm/amdgpu: drop redundant sched job cleanup when cs is abortedGuchun Chen1-10/+3
Once command submission failed due to userptr invalidation in amdgpu_cs_submit, legacy code will perform cleanup of scheduler job. However, it's not needed at all, as former commit has integrated job cleanup stuff into amdgpu_job_free. Otherwise, because of double free, a NULL pointer dereference will occur in such scenario. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2457 Fixes: f7d66fb2ea43 ("drm/amdgpu: cleanup scheduler job initialization v2") Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-05-03drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOVHorace Chen1-4/+1
[Why] This WPTR_POLL_ENABLE is a hardware contigious polling which will cause FCLK and UCLK to keep on a high level. Mostly its case can be covered by F32_WPTR_POLL_ENABLE which polls by firmware. So to save power, SR-IOV also needs to disable this bit Signed-off-by: Horace Chen <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-05-03drm/amdgpu: add a missing lock for AMDGPU_SCHEDChia-I Wu1-1/+5
mgr->ctx_handles should be protected by mgr->lock. v2: improve commit message v3: add a Fixes tag Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Christian König <[email protected]> Fixes: 52c6a62c64fa ("drm/amdgpu: add interface for editing a foreign process's priority v3") Signed-off-by: Alex Deucher <[email protected]>
2023-05-03drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini()Hamza Mahfooz1-1/+0
As made mention of in commit 08c677cb0b43 ("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini") and commit 13af556104fa ("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini"). It is meaningless to call amdgpu_irq_put() for gmc.ecc_irq. So, remove it from gmc_v9_0_hw_fini(). Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: 3029c855d79f ("drm/amdgpu: Fix desktop freezed after gpu-reset") Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-05-03drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_finiHoratio Zhang1-1/+0
The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v10_0_hw_fini, which also leads to the call trace. [ 82.340264] Call Trace: [ 82.340265] <TASK> [ 82.340269] gmc_v10_0_hw_fini+0x83/0xa0 [amdgpu] [ 82.340447] gmc_v10_0_suspend+0xe/0x20 [amdgpu] [ 82.340623] amdgpu_device_ip_suspend_phase2+0x127/0x1c0 [amdgpu] [ 82.340789] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 82.340955] amdgpu_device_pre_asic_reset+0xdd/0x2b0 [amdgpu] [ 82.341122] amdgpu_device_gpu_recover.cold+0x4dd/0xbb2 [amdgpu] [ 82.341359] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 82.341529] process_one_work+0x21d/0x3f0 [ 82.341535] worker_thread+0x1fa/0x3c0 [ 82.341538] ? process_one_work+0x3f0/0x3f0 [ 82.341540] kthread+0xff/0x130 [ 82.341544] ? kthread_complete_and_exit+0x20/0x20 [ 82.341547] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: c8b5a95b5709 ("drm/amdgpu: Fix desktop freezed after gpu-reset") Cc: [email protected]
2023-05-03drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_finiHoratio Zhang1-1/+0
The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v11_0_hw_fini, which also leads to the call trace. [ 102.980303] Call Trace: [ 102.980303] <TASK> [ 102.980304] gmc_v11_0_hw_fini+0x54/0x90 [amdgpu] [ 102.980357] gmc_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.980409] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.980459] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.980520] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.980573] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.980687] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.980740] process_one_work+0x21f/0x3f0 [ 102.980741] worker_thread+0x200/0x3e0 [ 102.980742] ? process_one_work+0x3f0/0x3f0 [ 102.980743] kthread+0xfd/0x130 [ 102.980743] ? kthread_complete_and_exit+0x20/0x20 [ 102.980744] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: c8b5a95b5709 ("drm/amdgpu: Fix desktop freezed after gpu-reset") Cc: [email protected]
2023-05-03drm/amdgpu: Enable doorbell selfring after resize FB BARShane Xiao3-30/+41
[Why] The selfring doorbell aperture will change when resize FB BAR successfully during gmc sw init, we should reorder the sequence of enabling doorbell selfring aperture. [How] Move enable_doorbell_selfring_aperture from *_common_hw_init to *_common_late_init. This fixes the potential issue that GPU ring its own doorbell when this device is in translated mode when iommu is on. v2: Remove *_enable_doorbell_aperture functions (Christian) v3: Add comments to note that why we need enable doorbell selfring late (Christian) Signed-off-by: Shane Xiao <[email protected]> Signed-off-by: Aaron Liu <[email protected]> Tested-by: Xiaomeng Hou <[email protected]> Reviewed-by: Christian K�nig <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-05-03drm/amdgpu: Use the default reset when loading or reloading the driverlyndonli1-0/+7
Below call trace and errors are observed when reloading amdgpu driver with the module parameter reset_method=3. It should do a default reset when loading or reloading the driver, regardless of the module parameter reset_method. v2: add comments inside and modify commit messages. [ +2.180243] [drm] psp gfx command ID_LOAD_TOC(0x20) failed and response status is (0x0) [ +0.000011] [drm:psp_hw_start [amdgpu]] *ERROR* Failed to load toc [ +0.000890] [drm:psp_hw_start [amdgpu]] *ERROR* PSP tmr init failed! [ +0.020683] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off. [ +0.000003] RIP: 0010:amdgpu_bo_release_notify+0x1ef/0x210 [amdgpu] [ +0.000004] Call Trace: [ +0.000003] <TASK> [ +0.000008] ttm_bo_release+0x2c4/0x330 [amdttm] [ +0.000026] amdttm_bo_put+0x3c/0x70 [amdttm] [ +0.000020] amdgpu_bo_free_kernel+0xe6/0x140 [amdgpu] [ +0.000728] psp_v11_0_ring_destroy+0x34/0x60 [amdgpu] [ +0.000826] psp_hw_init+0xe7/0x2f0 [amdgpu] [ +0.000813] amdgpu_device_fw_loading+0x1ad/0x2d0 [amdgpu] [ +0.000731] amdgpu_device_init.cold+0x108e/0x2002 [amdgpu] [ +0.001071] ? do_pci_enable_device+0xe1/0x110 [ +0.000011] amdgpu_driver_load_kms+0x1a/0x160 [amdgpu] [ +0.000729] amdgpu_pci_probe+0x179/0x3a0 [amdgpu] Signed-off-by: lyndonli <[email protected]> Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-05-03drm/amdgpu: Fix mode2 reset for sienna cichlidlyndonli1-1/+1
Before this change, sienna_cichlid_get_reset_handler will always return NULL, although the module parameter reset_method is 3 when loading amdgpu driver. Signed-off-by: lyndonli <[email protected]> Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-25Merge tag 'drm-next-2023-04-24' of git://anongit.freedesktop.org/drm/drmLinus Torvalds107-1605/+5565
Pull drm updates from Dave Airlie: "There is a new Qualcomm accel driver for their QAIC, dma-fence got a deadline feature added, lots of refactoring around fbdev emulation, and the usual pre-release hw enablements from AMD and Intel and fixes everywhere. New drivers: - add QAIC acceleration driver dma-buf: - constify kobj_type structs - Reject prime DMA-Buf attachment if get_sg_table is missing. fbdev: - cmdline parser fixes - implement fbdev emulation for GEM DMA drivers - always use shadow buffer in fbdev emulation helpers dma-fence: - add deadline hint to fences - signal private stub fence core: - improve DisplayID 2.0 and EDID parsing - add gem eviction function + callback - prep to convert shmem helper to GEM resv lock - move suballocator from radeon/amdgpu to core for Xe - HPD polling fixes - Documentation improvements - Add atomic enable_plane callback - use tgid instead of pid for client tracking - DP: Add SDP Error Detection Configuration Register - Add prime import/export to vram-helper - use pci aperture helpers in more drivers panel: - Radxa 8/10HD support - Samsung AMD495QA01 support - Elida KD50T048A - Sony TD4353 - Novatek NT36523 - STARRY 2081101QFH032011-53G - B133UAN01.0 - AUO NE135FBM-N41 i915: - More MTL enabling - fix s/r problems with MEI/PXP - Implement fb_dirty for PSR,FBC,DRRS fixes - Fix eDP+DSI dual panel systems - Fix issue #6333: "list_add corruption" and full system lockup from performance monitoring - Don't use stolen memory or BAR for ring buffers on LLC platforms - Make sure DSM size has correct 1MiB granularity on Gen12+ - Whitelist COMMON_SLICE_CHICKEN3 for UMD access on Gen12+ - Add engine TLB invalidation for Meteorlake - Fix GSC races on driver load/unload on Meteorlake+ - Make kobj_type structures constant - Move fd_install after last use of fence - wm/vblank refactoring - display code refactoring - Create GSC submission targeting HDCP and PXP usages on MTL+ - Enable HDCP2.x via GSC CS - Fix context runtime accounting on sysfs fdinfo for heavy workloads - Use i915 instead of dev_priv insied the file_priv structure - Replace fake flex-array with flexible-array member amdgpu: - Make kobj structures const - Generalize dmabuf import to work with KFD - Add capped/uncapped workload handling for supported APUs - Expose additional memory stats via fdinfo - Register vga_switcheroo for apple-gmux - Initial NBIO7.9, GC 9.4.3, GFXHUB 1.2, MMHUB 1.8 support - Initial DC FAM infrastructure - Link DC backlight to connector device rather than PCI device - Add sysfs nodes for secondary VCN clocks amdkfd: - Make kobj structures const - Support for exporting buffers via dmabuf - Multi-VMA page migration fixes - initial GC 9.4.3 support radeon: - iMac fix - convert to client based fbdev emulation habanalabs: - Add opcodes to the CS ioctl to allow user to stall/resume specific engines inside Gaudi2. - INFO ioctl the amount of device memory that the driver and f/w reserve for themselves. - INFO ioctl a bit-mask of the available rotator engines - INFO ioctl the register's address of the f/w that should be used to trigger interrupts - INFO ioctl two new opcodes to fetch information on h/w and f/w events - Enable graceful reset mechanism for compute-reset. - Align to the latest firmware specs. - Enforce the release order of the compute device and dma-buf. msm: - UBWC decoder programming rework - SM8550, SM8450 bindings update - uapi C++ fix - a3xx and a4xx devfreq support - GPU and GEM updates to avoid allocations which could trigger reclaim (shrinker) in fence signaling path - dma-fence deadline hint support and wait-boost - a640/650 speed bin support cirrus: - convert to regular atomic helpers - add damage clipping mediatek: - 10-bit overlay support - mt8195 support - Only trigger DRM HPD events if bridge is attached - Change the aux retries times when receiving AUX_DEFER rockchip: - add 4K support vc4: - use drm_gem_objects virtio: - allow KMS support to be disabled - add damage clipping vmwgfx: - buffer object lifetime fixes exynos: - move MIPI DSI driver to drm bridge for iMX sharing - use kernel fbdev emulation panfrost: - add support for mali MT81xx devices - add speed binning support lima: - add usage stats tegra: - fbdev client conversion vkms: - Add primary plane positioning support" * tag 'drm-next-2023-04-24' of git://anongit.freedesktop.org/drm/drm: (1495 commits) drm/i915/dp_mst: Fix active port PLL selection for secondary MST streams drm/exynos: Implement fbdev emulation as in-kernel client drm/exynos: Initialize fbdev DRM client drm/exynos: Remove fb_helper from struct exynos_drm_private drm/exynos: Remove struct exynos_drm_fbdev drm/exynos: Remove exynos_gem from struct exynos_drm_fbdev drm/i915: Fix memory leaks in i915 selftests drm/i915: Make intel_get_crtc_new_encoder() less oopsy drm/i915/gt: Avoid out-of-bounds access when loading HuC drm/amdgpu: add some basic elements for multiple XCD case drm/amdgpu: move vmhub out of amdgpu_ring_funcs (v4) Revert "drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV" drm/amdgpu: add common ip block for GC 9.4.3 drm/amd/display: Add logging when DP link training Clock recovery is Successful drm/amdgpu: add common early init support for GC 9.4.3 drm/amdgpu: switch to v9_4_3 gfx_funcs callbacks for GC 9.4.3 drm/amd/display: Add logging when setting DP sink power state fails drm/amdkfd: Add gfx_target_version for GC 9.4.3 drm/amdkfd: Enable HW_UPDATE_RPTR on GC 9.4.3 drm/amdgpu: reserve the old gc_11_0_*_mes.bin ...
2023-04-18drm/amdgpu: Fix desktop freezed after gpu-resetAlan Liu1-0/+3
[Why] After gpu-reset, sometimes the driver fails to enable vblank irq, causing flip_done timed out and the desktop freezed. During gpu-reset, we disable and enable vblank irq in dm_suspend() and dm_resume(). Later on in amdgpu_irq_gpu_reset_resume_helper(), we check irqs' refcount and decide to enable or disable the irqs again. However, we have 2 sets of API for controling vblank irq, one is dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has its own refcount and flag to store the state of vblank irq, and they are not synchronized. In drm we use the first API to control vblank irq but in amdgpu_irq_gpu_reset_resume_helper() we use the second set of API. The failure happens when vblank irq was enabled by dm_vblank_get() before gpu-reset, we have vblank->enabled true. However, during gpu-reset, in amdgpu_irq_gpu_reset_resume_helper() vblank irq's state checked from amdgpu_irq_update() is DISABLED. So finally it disables vblank irq again. After gpu-reset, if there is a cursor plane commit, the driver will try to enable vblank irq by calling drm_vblank_enable(), but the vblank->enabled is still true, so it fails to turn on vblank irq and causes flip_done can't be completed in vblank irq handler and desktop become freezed. [How] Combining the 2 vblank control APIs by letting drm's API finally calls amdgpu_irq's API, so the irq's refcount and state of both APIs can be synchronized. Also add a check to prevent refcount from being less then 0 in amdgpu_irq_put(). v2: - Add warning in amdgpu_irq_enable() if the irq is already disabled. - Call dc_interrupt_set() in dm_set_vblank() to avoid refcount change if it is in gpu-reset. v3: - Improve commit message and code comments. Signed-off-by: Alan Liu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-04-18drm/amdgpu: release gpu full access after "amdgpu_device_ip_late_init"Chong Li1-15/+17
[WHY] Function "amdgpu_irq_update()" called by "amdgpu_device_ip_late_init()" is an atomic context. We shouldn't access registers through KIQ since "msleep()" may be called in "amdgpu_kiq_rreg()". [HOW] Move function "amdgpu_virt_release_full_gpu()" after function "amdgpu_device_ip_late_init()", to ensure that registers be accessed through RLCG instead of KIQ. Call Trace: <TASK> show_stack+0x52/0x69 dump_stack_lvl+0x49/0x6d dump_stack+0x10/0x18 __schedule_bug.cold+0x4f/0x6b __schedule+0x473/0x5d0 ? __wake_up_klogd.part.0+0x40/0x70 ? vprintk_emit+0xbe/0x1f0 schedule+0x68/0x110 schedule_timeout+0x87/0x160 ? timer_migration_handler+0xa0/0xa0 msleep+0x2d/0x50 amdgpu_kiq_rreg+0x18d/0x1f0 [amdgpu] amdgpu_device_rreg.part.0+0x59/0xd0 [amdgpu] amdgpu_device_rreg+0x3a/0x50 [amdgpu] amdgpu_sriov_rreg+0x3c/0xb0 [amdgpu] gfx_v10_0_set_gfx_eop_interrupt_state.constprop.0+0x16c/0x190 [amdgpu] gfx_v10_0_set_eop_interrupt_state+0xa5/0xb0 [amdgpu] amdgpu_irq_update+0x53/0x80 [amdgpu] amdgpu_irq_get+0x7c/0xb0 [amdgpu] amdgpu_fence_driver_hw_init+0x58/0x90 [amdgpu] amdgpu_device_init.cold+0x16b7/0x2022 [amdgpu] Signed-off-by: Chong Li <[email protected]> Reviewed-by: [email protected] Signed-off-by: Alex Deucher <[email protected]>
2023-04-18drm/amdgpu/vcn: fix mmsch ctx table sizeJane Jian1-1/+1
add jpeg table size to ctx table size rather than override it Signed-off-by: Jane Jian <[email protected]> Reviewed-by: JingWen Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14drm/amdgpu: add some basic elements for multiple XCD caseLe Ma3-2/+18
Add some basic definitions and structure member. Inscrease MAX_WB slots to 1024 to support the increasing number of rings for multiple partitions. v2: unify naming style Signed-off-by: Le Ma <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14drm/amdgpu: move vmhub out of amdgpu_ring_funcs (v4)Le Ma31-220/+93
It looks better to place this field in ring structure. Also drop the repeated ring funcs definitions if there's no difference except for vmhub field. v2: rename the field to vm_hub like others (Le) v3: apply the changes to new ip blocks (Hawking) v4: fix vcn sw ring (Alex) Signed-off-by: Le Ma <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14Revert "drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV"Jane Jian1-1/+0
This reverts commit fe120b9f5ce873516a2604e4ff0c19084be94e8c. This patch impacts sriov multi-vf stability Signed-off-by: Jane Jian <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14drm/amdgpu: add common ip block for GC 9.4.3Hawking Zhang1-0/+1
Add common IP handling for GC 9.4.3 Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14drm/amdgpu: add common early init support for GC 9.4.3Hawking Zhang1-0/+5
init asic funcs and cp/pg flags for GC 9.4.3 Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14drm/amdgpu: switch to v9_4_3 gfx_funcs callbacks for GC 9.4.3Hawking Zhang2-29/+126
add gfx_funcs callbacks implemenation based on gc_v9_4_3 ip headers Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14drm/amdgpu: reserve the old gc_11_0_*_mes.binLi Ma1-0/+5
Reserve the MOUDLE_FIRMWARE declaration of gc_11_0_*_mes.bin to fix falling back to old mes bin on failure via autoload. Fixes: 97998b893c30 ("drm/amd/amdgpu: introduce gc_*_mes_2.bin v2") Signed-off-by: Li Ma <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14drm/amdgpu: change the reference clock for raven/raven2Jesse Zhang1-4/+3
Due to switch to golden tsc register to get clock counter for raven/ raven2. Chang the reference clock from 25MHZ to 100MHZ. Suggested-by: shanshengwang <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14drm/amdgpu: skip kfd-iommu suspend/resume for S0ixAaron Liu1-3/+5
GFX is in gfxoff mode during s0ix so we shouldn't need to actually execute kfd_iommu_suspend/kfd_iommu_resume operation. Signed-off-by: Aaron Liu <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14drm/amdgpu: add gc v9_4_3 rlc_funcs implementationHawking Zhang3-0/+364
all the gc v9_4_3 registers fall in gc_rlcpdec address range have different relative offsets and base_idx from the ones defined in gc v9_0 ip headers. gc_v9_0_rlc_funcs can not be reused anymore for gc v9_4_3 v2: drop unused handshake function (Alex) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-13drm/amdgpu: drop temp programming for pagefault handlingHawking Zhang1-22/+0
Was introduced as workaround. not needed anymore Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Jack Gui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-13drm/amdgpu: include protection for doorbell.hShashank Sharma1-0/+4
This patch adds double include protection for doorbell.h Cc: Christian Koenig <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Christian Koenig <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-13drm/amdgpu: rename num_doorbellsShashank Sharma3-15/+17
Rename doorbell.num_doorbells to doorbell.num_kernel_doorbells to make it more readable. Cc: Alex Deucher <[email protected]> Cc: Christian Koenig <[email protected]> Acked-by: Christian Koenig <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-13drm/amdgpu: switch to golden tsc registers for raven/raven2Jesse Zhang1-0/+40
Due to raven/raven2 maybe enable  sclk slow down, they cannot get clock count by the RLC at the auto level of dpm performance. So switch to golden tsc register. Suggested-by: shanshengwang <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-13drm/amdgpu: add gfx v11_0_3 fed irq handling for sriovYiPeng Chai1-3/+11
Add gfx v11_0_3 fed irq handling for sriov. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-13drm/amdgpu: Rework retry fault removalMukul Joshi2-3/+34
Rework retry fault removal from the software filter by storing an expired timestamp for a fault that is being removed. When a new fault comes, and it matches an entry in the sw filter, it will be added as a new fault only when its timestamp is greater than the timestamp expiry of the fault in the sw filter. This helps in avoiding stale faults being added back into the filter and preventing legitimate faults from being handled. Suggested-by: Felix Kuehling <[email protected]> Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Philip Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-13drm/amdgpu: Enable IH retry CAM on GFX9Mukul Joshi4-48/+62
This patch enables the IH retry CAM on GFX9 series cards. This retry filter is used to prevent sending lots of retry interrupts in a short span of time and overflowing the IH ring buffer. This will also help reduce CPU interrupt workload. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-13drm/amdgpu: simplify amdgpu_ras_eeprom.cAlex Deucher1-52/+20
All chips that support RAS also support IP discovery, so use the IP versions rather than a mix of IP versions and asic types. Checking the validity of the atom_ctx pointer is not required as the vbios is already fetched at this point. v2: add comments to id asic types based on feedback from Luben Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: Luben Tuikov <[email protected]>
2023-04-11drm/amdgpu: Enable GFX11 SDMA context empty interruptGraham Sider1-10/+18
Enable SDMA queue empty context switching. SDMA context switch due to quantum programming no longer done here (as of sdma v6), so re-name sdma_v6_0_ctx_switch_enable to sdma_v6_0_ctxempty_int_enable to reflect this. Also program SDMAx_QUEUEx_SCHEDULE_CNTL for context switch due to quantum in KFD. Set to amdgpu_sdma_phase_quantum (defaults to 32 i.e. 3200us). Signed-off-by: Graham Sider <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Reviewed-by: Stanley Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-11drm/amdkfd: Check PCIe atomics support on GFX11 to set CP_HQD_HQ_STATUS0[29]Sreekant Somasekharan1-1/+1
CP_HQD_HQ_STATUS0[29] bit will be used by CPFW to acknowledge whether PCIe atomics are supported. The default value of this bit is set to 0. Driver will check whether PCIe atomics are supported and set the bit to 1 if supported. This will force CPFW to use real atomic ops. If the bit is not set, CPFW will default to read/modify/write using the firmware itself. This is applicable only to GFX11 RS64 CP with MEC FW >= 509. If MEC FW < 509 and for all GFX11 F32 CP, PCIe atomics needs to be supported else it will skip the device. This commit also involves moving amdgpu_amdkfd_device_probe() function call after per-IP early_init loop in amdgpu_device_ip_early_init() function so as to check for RS64 enabled device. Signed-off-by: Sreekant Somasekharan <[email protected]> Reviewed-by: Graham Sider <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-11drm/amdgpu: correct ras enabled flagStanley.Yang1-0/+7
XGMI RAS should be according to the gmc xgmi physical nodes number, XGMI RAS should not be enabled if xgmi num_physical_nodes is zero. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-11drm/amdgpu: fix unexpected block idStanley.Yang2-0/+6
Aldebaran supports VCN and JPEG RAS, it reports unexpected block id message during VCN and JPEG RAS initialization if VCN and JPEG block id not defined. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-11drm/amdgpu: use sdma_v6 single packet invalidationPierre-Eric Pelloux-Prayer1-1/+22
This achieves the same result as the sequence used in emit_flush_gpu_tlb but the invalidation is now a single packet instead of the 3 packets required to implement reg_write_reg_wait. Signed-off-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Monk Liu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-11drm/amdgpu: Fix warningsLijo Lazar1-1/+1
Fix below warning due to incompatible types in conditional operator ../pm/swsmu/smu13/smu_v13_0_6_ppt.c:315:17: sparse: sparse: incompatible types in conditional expression (different base types): Signed-off-by: Lijo Lazar <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Alex Deucher <[email protected]>
2023-04-11drm/amdgpu: refine get gpu clock counter methodTong Liu011-2/+15
[why] regGOLDEN_TSC_COUNT_LOWER/regGOLDEN_TSC_COUNT_UPPER are protected and unaccessible under sriov. The clock counter high bit may update during reading process. [How] Replace regGOLDEN_TSC_COUNT_LOWER/regGOLDEN_TSC_COUNT_UPPER with regCP_MES_MTIME_LO/regCP_MES_MTIME_HI to get gpu clock under sriov. Refine get gpu clock counter method to make the result more precise. Signed-off-by: Tong Liu01 <[email protected]> Acked-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-11drm/amdgpu: optimize redundant code in umc_v6_7YiPeng Chai1-91/+70
Optimize redundant code in umc_v6_7. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-11drm/amdgpu: Fix sdma v4 sw fini errorlyndonli1-1/+1
Fix sdma v4 sw fini error for sdma 4.2.2 to solve the following general protection fault [ +0.108196] general protection fault, probably for non-canonical address 0xd5e5a4ae79d24a32: 0000 [#1] PREEMPT SMP PTI [ +0.000018] RIP: 0010:free_fw_priv+0xd/0x70 [ +0.000022] Call Trace: [ +0.000012] <TASK> [ +0.000011] release_firmware+0x55/0x80 [ +0.000021] amdgpu_ucode_release+0x11/0x20 [amdgpu] [ +0.000415] amdgpu_sdma_destroy_inst_ctx+0x4f/0x90 [amdgpu] [ +0.000360] sdma_v4_0_sw_fini+0xce/0x110 [amdgpu] Signed-off-by: lyndonli <[email protected]> Reviewed-by: Likun Gao <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-11drm/amdgpu: DROP redundant drm_prime_sg_to_dma_addr_arrayShane Xiao1-3/+0
For DMA-MAP userptr on other GPUs, the dma address array will be populated in amdgpu_ttm_backend_bind. Remove the redundant call from the driver. v2: update the comment Signed-off-by: Shane Xiao <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>