aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-09-02drm/amdgpu/gfx10: use rlc safe mode for soft recoveryAlex Deucher1-0/+2
Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx11: use rlc safe mode for soft recoveryAlex Deucher1-0/+2
Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx12: use rlc safe mode for soft recoveryAlex Deucher1-0/+2
Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx12: use proper rlc safe mode helpersAlex Deucher1-2/+2
Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx11: use proper rlc safe mode helpersAlex Deucher1-4/+4
Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx10: use proper rlc safe mode helpersAlex Deucher1-2/+2
Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx12: per queue reset only on bare metalAlex Deucher1-0/+6
It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx11: per queue reset only on bare metalAlex Deucher1-0/+6
It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx10: per queue reset only on bare metalAlex Deucher1-0/+6
It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/mes11: implement mmio queue reset for gfx11Jiadong Zhu1-0/+80
Implement queue reset for graphic and compute queue. v2: use amdgpu_gfx_rlc funcs to enter/exit safe mode. v3: use gfx_v11_0_request_gfx_index_mutex() v4: fix mutex handling Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Jiadong Zhu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/mes: implement amdgpu_mes_reset_hw_queue_mmioJiadong Zhu1-0/+20
The reset_queue api could be used from kfd or kgd. v2: add use_mmio parameter for mes_reset_legacy_queue. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Jiadong Zhu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/mes: modify mes api for mmio queue resetJiadong Zhu4-4/+17
Add me/pipe/queue parameters for queue reset input. v2: fix build (Alex) Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Jiadong Zhu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx12: fallback to driver reset compute queue directlyAlex Deucher1-14/+79
Since the MES FW resets kernel compute queue always failed, this may caused by the KIQ failed to process unmap KCQ. So, before MES FW work properly that will fallback to driver executes dequeue and resets SPI directly. Besides, rework the ring reset function and make the busy ring type reset in each function respectively. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx12: add ring reset callbacksAlex Deucher1-0/+18
Add ring reset callbacks for gfx and compute. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx10: rework reset sequenceAlex Deucher1-7/+19
To match other GFX IPs. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx10: wait for reset done before remapJiadong Zhu1-11/+30
There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. v2: fix KIQ locking (Alex) v3: fix KIQ locking harder (Jessie) Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Jiadong Zhu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx10: remap queue after reset successfullyJiadong Zhu1-11/+35
Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. v2: fix up error handling (Alex) Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Jiadong Zhu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx10: add ring reset callbacksAlex Deucher1-0/+91
Add ring reset callbacks for gfx and compute. v2: fix gfx handling v3: wait for KIQ to complete Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx11: wait for reset done before remapJiadong Zhu1-1/+14
There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Jiadong Zhu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx11: rename gfx_v11_0_gfx_init_queue()Alex Deucher1-3/+3
Rename to gfx_v11_0_kgq_init_queue() to better align with the other naming in the file. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx11: fallback to driver reset compute queue directly (v2)Prike Liang1-13/+71
Since the MES FW resets kernel compute queue always failed, this may caused by the KIQ failed to process unmap KCQ. So, before MES FW work properly that will fallback to driver executes dequeue and resets SPI directly. Besides, rework the ring reset function and make the busy ring type reset in each function respectively. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Prike Liang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amd/display: 3.2.299Aric Cyr2-2/+2
This version brings along the following: - DCN35 fixes - DML2 fixes - IPS fixes - ODM fixes - Miscellaneous cleanups - MST fixes - SPL fixes Acked-by: Aurabindo Pillai <[email protected]> Signed-off-by: Aric Cyr <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amd/display: Fix flickering caused by dccgHansen Dsouza3-73/+72
Always allow un-gating. Follow legacy workaround for repeated dppclk dto updates Reviewed-by: Muhammad Ahmed <[email protected]> Signed-off-by: Hansen Dsouza <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amd/display: Block timing sync for different signals in PMODillon Varone1-1/+2
PMO assumes that like timings can be synchronized, but DC only allows this if the signal types match. Reviewed-by: Austin Zheng <[email protected]> Signed-off-by: Dillon Varone <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amd/display: fix graphics hang in multi-display mst caseGabe Teeger4-49/+34
[what] Graphics hang observed with 3 displays connected to DP2.0 mst dock. [why] There's a mismatch in dml and dc between the assignments of hpo link encoders. [how] Add a new array in dml that tracks the current mapping of HPO stream encoders to HPO link encoders in dc. Reviewed-by: Sung joon Kim <[email protected]> Reviewed-by: Nicholas Kazlauskas <[email protected]> Signed-off-by: Gabe Teeger <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amd/display: Add sharpness control interfaceRelja Vojvodic9-192/+138
- Add interface for controlling shapness level input into DCN. - Update SPL to support custom sharpness values. - Add support for different sharpness values depending on YUV/RGB content. Reviewed-by: Samson Tam <[email protected]> Signed-off-by: Relja Vojvodic <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02Revert "drm/amd/display: Wait for all pending cleared before full update"Dillon Varone24-161/+34
This reverts commit f0b7dcf25834afd17df316367dfe5d4c890c713c. It is causing graphics hangs. Reviewed-by: Martin Leung <[email protected]> Signed-off-by: Dillon Varone <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amd/display: disable sharpness if HDR Multiplier is too largeSamson Tam1-1/+5
[Why] Certain profiles have higher HDR multiplier than SDR boost max which is not currently supported [How] Disable sharpness for these profiles Fixes: 1b0ce903fe74 ("drm/amd/display: add improvements for text display and HDR DWM and MPO") Reviewed-by: Martin Leung <[email protected]> Signed-off-by: Samson Tam <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amd/display: Add dpia debug option to control power managementMeenakshikumar Somasundaram1-1/+2
[Why] To provide option to dpia control power management [How] By adding disable_usb4_pm_support bit field in dpia_debug option to control dpia power management Reviewed-by: Jun Lei <[email protected]> Signed-off-by: Meenakshikumar Somasundaram <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx11: add ring reset callbacksAlex Deucher1-0/+18
Add ring reset callbacks for gfx and compute. Acked-by: Vitaly Prosyak <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amd/display: re-enable Dynamic ODM policySamson Tam1-2/+1
[Why] Previous disable ODM policy due to underflow issue with sharpener. Issue is resolved after updating sharpening policy to apply to both windowed and fullscreen video [How] Remove sharpness check disabling Dynamic ODM policy Reviewed-by: Martin Leung <[email protected]> Signed-off-by: Samson Tam <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amd/display: Lock DC and exit IPS when changing backlightLeo Li1-1/+12
Backlight updates require aux and/or register access. Therefore, driver needs to disallow IPS beforehand. So, acquire the dc lock before calling into dc to update backlight - we should be doing this regardless of IPS. Then, while the lock is held, disallow IPS before calling into dc, then allow IPS afterwards (if it was previously allowed). Reviewed-by: Aurabindo Pillai <[email protected]> Reviewed-by: Roman Li <[email protected]> Signed-off-by: Leo Li <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amd/display: only trigger BIOS related assert for older ASICsDaniel Sa1-1/+1
[Why] Some asserts are always hit on startup/Pnp when they should only be used to indicate when something has gone wrong. [How] Ignore result of getting function from bios cmd table for newer asics. Reviewed-by: Jun Lei <[email protected]> Signed-off-by: Daniel Sa <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amd/display: Fix DCN35 set min dispclk logicNicholas Susanto1-3/+3
[Why] Setting min dispclk to 50Mhz outside clock lowering function causes unnecessary calls to SMU to lower dispclk and causes dentist hangs when there is no stream on the pipes. [How] Move the set minimum dispclk logic inside the lowering dispclk if statement. Fixes: 234441320552 ("DCN35 set min dispclk to 50Mhz") Reviewed-by: Nicholas Kazlauskas <[email protected]> Signed-off-by: Nicholas Susanto <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-02drm/amdgpu/gfx9.4.3: Implement compute pipe resetPrike Liang1-19/+108
Implement the compute pipe reset, and the driver will fallback to pipe reset when queue reset fails. The pipe reset only deactivates the queue which is scheduled in the pipe, and meanwhile the MEC pipe will be reset to the firmware _start pointer. So, it seems pipe reset will cost more cycles than the queue reset; therefore, the driver tries to recover by doing queue reset first. Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Prike Liang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amdgpu: always allocate cleared VRAM for GEM allocationsAlex Deucher1-0/+3
This adds allocation latency, but aligns better with user expectations. The latency should improve with the drm buddy clearing patches that Arun has been working on. In addition this fixes the high CPU spikes seen when doing wipe on release. v2: always set AMDGPU_GEM_CREATE_VRAM_CLEARED (Christian) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3528 Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Acked-by: Arunpravin Paneer Selvam <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> (v1) Signed-off-by: Alex Deucher <[email protected]> Cc: Arunpravin Paneer Selvam <[email protected]> Cc: Christian König <[email protected]>
2024-08-29drm/amdgpu/mes: add mes mapping legacy queue switchJack Xiao4-20/+43
For mes11 old firmware has issue to map legacy queue, add a flag to switch mes to map legacy queue. Fixes: f9d8c5c7855d ("drm/amdgpu/gfx: enable mes to map legacy queue support") Reported-by: Andrew Worsley <[email protected]> Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amdkfd: Don't drain ih1 for APUYifan Zhang1-5/+8
ih1 is not initialized for APUs. Don't drain it or NULL pointer error will be triggered. Fixes: 6ef29715ac06 ("drm/amdkfd: Change kfd/svm page fault drain handling") Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amdgpu/gfx12: return early in preempt_ib()Alex Deucher1-0/+3
When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Acked-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amdgpu/gfx11: return early in preempt_ib()Alex Deucher1-0/+3
When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Acked-by: Srinivasan Shanmugam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amd/display: Determine IPS mode by ASIC and PMFW versionsLeo Li1-1/+25
[Why] DCN IPS interoperates with other system idle power features, such as Zstates. On DCN35, there is a known issue where system Z8 + DCN IPS2 causes a hard hang. We observe this on systems where the SBIOS allows Z8. Though there is a SBIOS fix, there's no guarantee that users will get it any time soon, or even install it. A workaround is needed to prevent this from rearing its head in the wild. [How] For DCN35, check the pmfw version to determine whether the SBIOS has the fix. If not, set IPS1+RCG as the deepest possible state in all cases except for s0ix and display off (DPMS). Otherwise, enable all IPS Signed-off-by: Leo Li <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amdgpu: Move the dumping log out of for loopSunil Khatri1-3/+2
log message "Dumping IP State Completed" needs to be logged only once when state dumping is complete. Hence moving it out of the for loop. Signed-off-by: Sunil Khatri <[email protected]> Acked-by: Trigger Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amd/amdgpu: move drain_workqueue before shutdown is setVictor Zhao1-3/+3
[background] when unloading amdgpu driver right after running a workload, drain_workqueue is causing "Fence fallback timer expired on ring sdma0.0". Under sriov, this issue will cause sriov full access timeout and a reset happening. move drain_workqueue before shutdown is set to allow ih process and before enter full access under sriov to avoid full access time cost. Signed-off-by: Victor Zhao <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amdgpu: Do core dump immediately when job tmoTrigger Huang1-1/+67
Do the coredump immediately after a job timeout to get a closer representation of GPU's error status. V2: This will skip printing vram_lost as the GPU reset is not happened yet (Alex) V3: Unconditionally call the core dump as we care about all the reset functions(soft-recovery and queue reset and full adapter reset, Alex) V4: Do the dump after adev->job_hang = true (Sunil) Signed-off-by: Trigger Huang <[email protected]> Acked-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amdgpu: skip printing vram_lost if neededTrigger Huang3-14/+15
The vm lost status can only be obtained after a GPU reset occurs, but sometimes a dev core dump can be happened before GPU reset. So a new argument is added to tell the dev core dump implementation whether to skip printing the vram_lost status in the dump. And this patch is also trying to decouple the core dump function from the GPU reset function, by replacing the argument amdgpu_reset_context with amdgpu_job to specify the context for core dump. V2: Inform user if VRAM lost check is skipped so users don't assume VRAM wasn't lost (Alex) Signed-off-by: Trigger Huang <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amdgpu/gfx9: put queue resets behind a debug optionAlex Deucher3-0/+14
Pending extended validation. Reviewed-and-tested-by: Jiadong Zhu <[email protected]> Acked-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amdgpu: add experimental resets debug flagAlex Deucher2-0/+7
Add this flag to enable experimental resets for testing before they are fully validated. Reviewed-and-tested-by: Jiadong Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-29drm/amdgpu/display: Fix a mistake in revert commitFangzhi Zuo1-1/+1
[why] It is to fix in try_disable_dsc() due to misrevert of commit 338567d17627 ("drm/amd/display: Fix MST BW calculation Regression") [How] Fix restoring minimum compression bw by 'max_kbps', instead of native bw 'stream_kbps' Signed-off-by: Fangzhi Zuo <[email protected]> Reviewed-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-27drm/amdgpu/swsmu: always force a state reprogram on initAlex Deucher1-6/+9
Always reprogram the hardware state on init. This ensures the PMFW state is explicitly programmed and we are not relying on the default PMFW state. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Reviewed-by: Kenneth Feng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-27drm/amdgpu/display: remove unnecessary TODO spl_os_types.hZaeem Mohamed1-1/+0
Remove unnecessary TODO from spl_os_types.h Reviewed-by: Hamza Mahfooz <[email protected]> Signed-off-by: Zaeem Mohamed <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>