aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
2024-02-01drm/xe/vm: Subclass userptr vmasThomas Hellström5-93/+137
The construct allocating only parts of the vma structure when the userptr part is not needed is very fragile. A developer could add additional fields below the userptr part, and the code could easily attempt to access the userptr part even if its not persent. So introduce xe_userptr_vma which subclasses struct xe_vma the proper way, and accordingly modify a couple of interfaces. This should also help if adding userptr helpers to drm_gpuvm. v2: - Fix documentation of to_userptr_vma() (Matthew Brost) - Fix allocation and freeing of vmas to clearer distinguish between the types. Closes: https://lore.kernel.org/intel-xe/[email protected]/T/#u Fixes: a4cc60a55fd9 ("drm/xe: Only alloc userptr part of xe_vma for userptrs") Cc: Rodrigo Vivi <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Lucas De Marchi <[email protected]> Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 5bd24e78829ad569fa1c3ce9a05b59bb97b91f3d) Signed-off-by: Thomas Hellström <[email protected]>
2024-02-01drm/xe: Use LRC prefix rather than CTX prefix in lrc desc definesMatthew Brost1-7/+7
The sparc build fails [1] due to CTX_VALID being redefined. Fix this by using a better naming convention of LRC_VALID as this define is used in setting bits in the lrc descriptor. To be uniform, change other define with LRC prefix too. [1] https://lore.kernel.org/all/[email protected]/ v2: - s/LEGACY_64B_CONTEXT/LRC_LEGACY_64B_CONTEXT (Lucas) Fixes: 0bc519d20ffa ("drm/xe: Remove GEN[0-9]*_ prefixes") Cc: Thomas Hellström <[email protected]> Cc: Lucas De Marchi <[email protected]> Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 152ca51d8db03f08a71c25e999812e263839fdce) Signed-off-by: Thomas Hellström <[email protected]>
2024-02-01drm/xe: Don't use __user error pointersThomas Hellström1-25/+25
The error pointer macros are not aware of __user pointers and as a consequence sparse warns. Have the copy_mask() function return an integer instead of a __user pointer. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rodrigo Vivi <[email protected]> Cc: Matthew Brost <[email protected]> Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 78366eed6853aa6a5deccb2eb182f9334d2bd208) Signed-off-by: Thomas Hellström <[email protected]>
2024-02-01drm/xe: Annotate mcr_[un]lock()Thomas Hellström1-2/+2
These functions acquire and release the gt::mcr_lock. Annotate accordingly. Fix the corresponding sparse warning. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: fb1d55efdfcb ("drm/xe: Cleanup OPEN_BRACE style issues") Cc: Francois Dugast <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Matthew Brost <[email protected]> Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 97fd7a7e4e877676a2ab1a687ba958b70931abcc) Signed-off-by: Thomas Hellström <[email protected]>
2024-02-01drm/xe: Only allow 1 ufence per exec / bind IOCTLMatthew Brost3-2/+23
The way exec ufences are coded only 1 ufence per IOCTL will be signaled. It is possible to fix this but for current use cases 1 ufence per IOCTL is sufficient. Enforce a limit of 1 ufence per IOCTL (both exec and bind to be uniform). v2: - Add fixes tag (Thomas) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Mika Kahola <[email protected]> Cc: Thomas Hellström <[email protected]> Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Brian Welty <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit d1df9bfbf68c65418f30917f406b6d5bd597714e) Signed-off-by: Thomas Hellström <[email protected]>
2024-02-01drm/xe: Grab mem_access when disabling C6 on skip_guc_pc platformsMatt Roper1-0/+2
If skip_guc_pc is set for a platform, C6 is disabled directly without acquiring a mem_access reference, triggering an assertion inside xe_gt_idle_disable_c6. Fixes: 975e4a3795d4 ("drm/xe: Manually setup C6 when skip_guc_pc is set") Cc: Rodrigo Vivi <[email protected]> Cc: Vinay Belgaumkar <[email protected]> Signed-off-by: Matt Roper <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 9f5971bdf78e0937206556534247243ad56cd735) Signed-off-by: Thomas Hellström <[email protected]>
2024-02-01drm/xe: Fix crash in trace_dma_fence_init()José Roberto de Souza1-3/+3
trace_dma_fence_init() uses dma_fence_ops functions like get_driver_name() and get_timeline_name() to generate trace information but the Xe KMD implementation of those functions makes use of xe_hw_fence_ctx that was being set after dma_fence_init(). So here just inverting the order to fix the crash. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: José Roberto de Souza <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit c6878e47431c72168da08dfbc1496c09b2d3c246) Signed-off-by: Thomas Hellström <[email protected]>
2024-02-01drm/xe/vm: Subclass userptr vmasThomas Hellström5-93/+137
The construct allocating only parts of the vma structure when the userptr part is not needed is very fragile. A developer could add additional fields below the userptr part, and the code could easily attempt to access the userptr part even if its not persent. So introduce xe_userptr_vma which subclasses struct xe_vma the proper way, and accordingly modify a couple of interfaces. This should also help if adding userptr helpers to drm_gpuvm. v2: - Fix documentation of to_userptr_vma() (Matthew Brost) - Fix allocation and freeing of vmas to clearer distinguish between the types. Closes: https://lore.kernel.org/intel-xe/[email protected]/T/#u Fixes: a4cc60a55fd9 ("drm/xe: Only alloc userptr part of xe_vma for userptrs") Cc: Rodrigo Vivi <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Lucas De Marchi <[email protected]> Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-01-31drm/xe: Use LRC prefix rather than CTX prefix in lrc desc definesMatthew Brost1-7/+7
The sparc build fails [1] due to CTX_VALID being redefined. Fix this by using a better naming convention of LRC_VALID as this define is used in setting bits in the lrc descriptor. To be uniform, change other define with LRC prefix too. [1] https://lore.kernel.org/all/[email protected]/ v2: - s/LEGACY_64B_CONTEXT/LRC_LEGACY_64B_CONTEXT (Lucas) Fixes: 0bc519d20ffa ("drm/xe: Remove GEN[0-9]*_ prefixes") Cc: Thomas Hellström <[email protected]> Cc: Lucas De Marchi <[email protected]> Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-01-31drm/amdgpu: Reset IH OVERFLOW_CLEAR bitFriedrich Vock10-0/+59
Allows us to detect subsequent IH ring buffer overflows as well. Cc: Joshua Ashton <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: [email protected] Signed-off-by: Friedrich Vock <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspendYifan Zhang2-36/+0
There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status. Beside, current set function callbacks are empty with no real effect. Signed-off-by: Yifan Zhang <[email protected]> Acked-by: Saleemkhan Jamadar <[email protected]> Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0Yifan Zhang1-22/+0
No need to set GC golden settings in driver from gfx 11.5.0 onwards. Signed-off-by: Yifan Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdkfd: reserve the BO before validating itLang Yu3-5/+21
Fix a warning. v2: Avoid unmapping attachment repeatedly when ERESTARTSYS. v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix) [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] <TASK> [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush") Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'Srinivasan Shanmugam4-8/+8
Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()' Fixes the below: drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r' Fixes: fac4ebd79fed ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'") Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'Srinivasan Shanmugam1-1/+1
The error message buffer overflow 'dc->links' 12 <= 12 suggests that the code is trying to access an element of the dc->links array that is beyond its bounds. In C, arrays are zero-indexed, so an array with 12 elements has valid indices from 0 to 11. Trying to access dc->links[12] would be an attempt to access the 13th element of a 12-element array, which is a buffer overflow. To fix this, ensure that the loop does not go beyond the last valid index when accessing dc->links[i + 1] by subtracting 1 from the loop condition. This would ensure that i + 1 is always a valid index in the array. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12 Fixes: 59f1622a5f05 ("drm/amd/display: Add dpia display mode validation logic") Cc: PeiChen Huang <[email protected]> Cc: Aric Cyr <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Aurabindo Pillai <[email protected]> Cc: Meenakshikumar Somasundaram <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Tom Chung <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'Srinivasan Shanmugam1-0/+4
Add a NULL check for the kzalloc call that allocates memory for dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously, if kzalloc failed to allocate memory and returned NULL, the code would attempt to use the NULL pointer. The fix is to check if kzalloc returns NULL, and if so, log an error message and skip the rest of the current loop iteration with the continue statement. This prevents the code from attempting to use the NULL pointer. Cc: Julia Lawall <[email protected]> Cc: Aurabindo Pillai <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Alex Hung <[email protected]> Cc: Alex Deucher <[email protected]> Reported-by: Julia Lawall <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Fixes: 135fd1b35690 ("drm/amd/display: Reduce stack size") Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd: Don't init MEC2 firmware when it fails to loadDavid McFarland1-2/+0
The same calls are made directly above, but conditional on the firmware loading and validating successfully. Cc: [email protected] Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init") Signed-off-by: David McFarland <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: Fix the warning info in mode1 resetMa Jun5-12/+24
Fix the warning info below during mode1 reset. [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000006] ? show_regs+0x6e/0x80 [ +0.000011] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000005] ? __warn+0x91/0x150 [ +0.000009] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000006] ? report_bug+0x19d/0x1b0 [ +0.000013] ? handle_bug+0x46/0x80 [ +0.000012] ? exc_invalid_op+0x1d/0x80 [ +0.000011] ? asm_exc_invalid_op+0x1f/0x30 [ +0.000014] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000007] ? __flush_work.isra.0+0x208/0x390 [ +0.000007] ? _prb_read_valid+0x216/0x290 [ +0.000008] __cancel_work_timer+0x11d/0x1a0 [ +0.000007] ? try_to_grab_pending+0xe8/0x190 [ +0.000012] cancel_work_sync+0x14/0x20 [ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched] [ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu] This warning info was printed after applying the patch "drm/sched: Convert drm scheduler to use a work queue rather than kthread". The root cause is that amdgpu driver tries to use the uninitialized work_struct in the struct drm_gpu_scheduler v2: - Rename the function to amdgpu_ring_sched_ready and move it to amdgpu_ring.c (Alex) v3: - Fix a few more checks based on Vitaly's patch (Alex) v4: - squash in fix noticed by Bert in https://gitlab.freedesktop.org/drm/amd/-/issues/3139 Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling") Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Vitaly Prosyak <[email protected]> Signed-off-by: Ma Jun <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd/display: Fix dcn35 8k30 Underflow/Corruption IssueFangzhi Zuo2-18/+13
[why] odm calculation is missing for pipe split policy determination and cause Underflow/Corruption issue. [how] Add the odm calculation. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Cc: [email protected] Reviewed-by: Charlene Liu <[email protected]> Acked-by: Tom Chung <[email protected]> Signed-off-by: Fangzhi Zuo <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd/display: Underflow workaround by increasing SR exit latencyNicholas Susanto2-18/+18
[Why] On 14us for exit latency time causes underflow for 8K monitor with HDR on. Increasing the latency to 28us fixes the underflow. [How] Increase the latency to 28us. This workaround should be sufficient before we figure out why SR exit so long. Reviewed-by: Chaitanya Dhere <[email protected]> Acked-by: Tom Chung <[email protected]> Signed-off-by: Nicholas Susanto <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd/display: fix incorrect mpc_combine array sizeWenjing Liu1-1/+1
[why] MAX_SURFACES is per stream, while MAX_PLANES is per asic. The mpc_combine is an array that records all the planes per asic. Therefore MAX_PLANES should be used as the array size. Using MAX_SURFACES causes array overflow when there are more than 3 planes. [how] Use the MAX_PLANES for the mpc_combine array size. Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Cc: [email protected] Reviewed-by: Rodrigo Siqueira <[email protected]> Reviewed-by: Nevenko Stupar <[email protected]> Reviewed-by: Chaitanya Dhere <[email protected]> Acked-by: Tom Chung <[email protected]> Signed-off-by: Wenjing Liu <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd/display: Fix DPSTREAM CLK on and off sequenceDmytro Laktyushkin2-7/+6
[Why] Secondary DP2 display fails to light up in some instances [How] Clock needs to be on when DPSTREAMCLK*_EN =1. This change moves dtbclk_p enable/disable point to make sure this is the case Reviewed-by: Charlene Liu <[email protected]> Reviewed-by: Dmytro Laktyushkin <[email protected]> Acked-by: Tom Chung <[email protected]> Signed-off-by: Daniel Miess <[email protected]> Signed-off-by: Dmytro Laktyushkin <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd/display: fix USB-C flag update after enc10 feature initCharlene Liu2-4/+4
[why] BIOS's integration info table not following the original order which is phy instance is ext_displaypath's array index. [how] Move them to follow the original order. Reviewed-by: Muhammad Ahmed <[email protected]> Acked-by: Tom Chung <[email protected]> Signed-off-by: Charlene Liu <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd/display: increased min_dcfclk_mhz and min_fclk_mhzSohaib Nadeem1-1/+1
[why] Originally, PMFW said min FCLK is 300Mhz, but min DCFCLK can be increased to 400Mhz because min FCLK is now 600Mhz so FCLK >= 1.5 * DCFCLK hardware requirement will still be satisfied. Increasing min DCFCLK addresses underflow issues (underflow occurs when phantom pipe is turned on for some Sub-Viewport configs). [how] Increasing DCFCLK by raising the min_dcfclk_mhz Reviewed-by: Chaitanya Dhere <[email protected]> Reviewed-by: Alvin Lee <[email protected]> Acked-by: Tom Chung <[email protected]> Signed-off-by: Sohaib Nadeem <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31Revert "drm/amd/display: initialize all the dpm level's stutter latency"Charlene Liu1-3/+1
Revert commit 885c71ad791c ("drm/amd/display: initialize all the dpm level's stutter latency") Because it causes some regression Reviewed-by: Muhammad Ahmed <[email protected]> Acked-by: Tom Chung <[email protected]> Signed-off-by: Charlene Liu <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdkfd: Use correct drm device for cgroup permission checkMukul Joshi1-2/+7
On GFX 9.4.3, for a given KFD node, fetch the correct drm device from XCP manager when checking for cgroup permissions. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdkfd: Use S_ENDPGM_SAVED in trap handlerJay Cornwall3-9/+9
This instruction has no functional difference to S_ENDPGM but allows performance counters to track save events correctly. Signed-off-by: Jay Cornwall <[email protected]> Reviewed-by: Laurent Morichetti <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdkfd: Correct partial migration virtual addrPhilip Yang1-1/+1
Partial migration to system memory should use migrate.addr, not prange->start as virtual address to allocate system memory page. Fixes: a546a2768440 ("drm/amdkfd: Use partial migrations/mapping for GPU/CPU page faults in SVM") Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Xiaogang Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: move the drm client creation behind drm device registrationLe Ma3-11/+27
This patch is to eliminate interrupt warning below: "[drm] Fence fallback timer expired on ring sdma0.0". An early vm pt clearing job is sent to SDMA ahead of interrupt enabled. And re-locating the drm client creation following after drm_dev_register looks like a more proper flow. v2: wrap the drm client creation Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Signed-off-by: Le Ma <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/nouveau/svm: remove unused but set variablesJani Nikula1-7/+3
Fix the W=1 warning -Wunused-but-set-variable. Cc: Karol Herbst <[email protected]> Cc: Lyude Paul <[email protected]> Cc: Danilo Krummrich <[email protected]> Cc: [email protected] Signed-off-by: Jani Nikula <[email protected]> Reviewed-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/8b133e7ec0e9aef728be301ac019c5ddcb3bbf51.1704908087.git.jani.nikula@intel.com
2024-01-31drm/nouveau/acr/ga102: remove unused but set variableJani Nikula1-2/+1
Fix the W=1 warning -Wunused-but-set-variable. Cc: Karol Herbst <[email protected]> Cc: Lyude Paul <[email protected]> Cc: Danilo Krummrich <[email protected]> Cc: [email protected] Signed-off-by: Jani Nikula <[email protected]> Reviewed-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/4d9f62fa6963acfd8b7d8f623799ba3a516e347d.1704908087.git.jani.nikula@intel.com
2024-01-31drm/amdgpu: Reset IH OVERFLOW_CLEAR bitFriedrich Vock10-0/+59
Allows us to detect subsequent IH ring buffer overflows as well. Cc: Joshua Ashton <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: [email protected] Signed-off-by: Friedrich Vock <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspendYifan Zhang2-36/+0
There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status. Beside, current set function callbacks are empty with no real effect. Signed-off-by: Yifan Zhang <[email protected]> Acked-by: Saleemkhan Jamadar <[email protected]> Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: use PSP address query commandTao Zhou1-7/+39
Get UMC physical address from PSP in RAS error address coversion. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: add PSP RAS address query commandTao Zhou3-0/+64
Convert mca address to physical address or vice versa via RAS TA. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0Yifan Zhang1-22/+0
No need to set GC golden settings in driver from gfx 11.5.0 onwards. Signed-off-by: Yifan Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdkfd: reserve the BO before validating itLang Yu3-5/+21
Fix a warning. v2: Avoid unmapping attachment repeatedly when ERESTARTSYS. v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix) [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] <TASK> [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush") Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'Srinivasan Shanmugam1-1/+1
The error message buffer overflow 'dc->links' 12 <= 12 suggests that the code is trying to access an element of the dc->links array that is beyond its bounds. In C, arrays are zero-indexed, so an array with 12 elements has valid indices from 0 to 11. Trying to access dc->links[12] would be an attempt to access the 13th element of a 12-element array, which is a buffer overflow. To fix this, ensure that the loop does not go beyond the last valid index when accessing dc->links[i + 1] by subtracting 1 from the loop condition. This would ensure that i + 1 is always a valid index in the array. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12 Fixes: 59f1622a5f05 ("drm/amd/display: Add dpia display mode validation logic") Cc: PeiChen Huang <[email protected]> Cc: Aric Cyr <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Aurabindo Pillai <[email protected]> Cc: Meenakshikumar Somasundaram <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Tom Chung <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'Srinivasan Shanmugam1-0/+4
Add a NULL check for the kzalloc call that allocates memory for dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously, if kzalloc failed to allocate memory and returned NULL, the code would attempt to use the NULL pointer. The fix is to check if kzalloc returns NULL, and if so, log an error message and skip the rest of the current loop iteration with the continue statement. This prevents the code from attempting to use the NULL pointer. Cc: Julia Lawall <[email protected]> Cc: Aurabindo Pillai <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Alex Hung <[email protected]> Cc: Alex Deucher <[email protected]> Reported-by: Julia Lawall <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Fixes: 135fd1b35690 ("drm/amd/display: Reduce stack size") Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'Srinivasan Shanmugam4-8/+8
Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()' Fixes the below: drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r' Fixes: fac4ebd79fed ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'") Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: Need to resume ras during gpu reset for gfx v9_4_3 sriovYiPeng Chai1-0/+1
Need to resume ras during gpu reset for gfx v9_4_3 sriov Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: disable RAS feature when finiTao Zhou1-1/+1
Send RAS disable feature command in fini. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amdgpu: Update boot time errors polling sequenceHawking Zhang2-1/+18
Update boot time errors polling sequence to align with the latest firmware change. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Frank Min <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/amd: Don't init MEC2 firmware when it fails to loadDavid McFarland1-2/+0
The same calls are made directly above, but conditional on the firmware loading and validating successfully. Cc: [email protected] Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init") Signed-off-by: David McFarland <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/xe: Make all GuC ABI shift values unsignedMatthew Brost5-21/+21
All GuC ABI definitions are unsigned and not defining as unsigned is causing build errors [1]. [1] https://lore.kernel.org/all/[email protected]/ Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Thomas Hellström <[email protected]> Cc: Lucas De Marchi <[email protected]> Cc: Michal Wajdeczko <[email protected]> Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-01-31drm/xe: Convert job timeouts from assert to warningMatt Roper1-3/+6
xe_assert() is intended to be used only for "impossible" situations that should never be hit (and if they are hit it means there's a driver bug somewhere); assertions are only compiled into debug builds. Although we expect jobs submitted by the kernel to be well-behaved and run without error, timeouts are a legitimate possibility for reasons beyond our control (bad firmware, flaky hardware, etc.). We should use a real WARN if we encounter these, even for non-debug builds, to ensure the issue is being properly highlighted in bug reports and such. Also give the WARNs more human-readable messages and move them below the general notice-level message that gets printed for any kind of timeout to make the errors a bit more understandable. v2: - Convert the VM / exec_queue_killed assertion as well. (MattB) Signed-off-by: Matt Roper <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-01-31drm/amdgpu: Fix the warning info in mode1 resetMa Jun5-12/+24
Fix the warning info below during mode1 reset. [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000006] ? show_regs+0x6e/0x80 [ +0.000011] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000005] ? __warn+0x91/0x150 [ +0.000009] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000006] ? report_bug+0x19d/0x1b0 [ +0.000013] ? handle_bug+0x46/0x80 [ +0.000012] ? exc_invalid_op+0x1d/0x80 [ +0.000011] ? asm_exc_invalid_op+0x1f/0x30 [ +0.000014] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000007] ? __flush_work.isra.0+0x208/0x390 [ +0.000007] ? _prb_read_valid+0x216/0x290 [ +0.000008] __cancel_work_timer+0x11d/0x1a0 [ +0.000007] ? try_to_grab_pending+0xe8/0x190 [ +0.000012] cancel_work_sync+0x14/0x20 [ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched] [ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu] This warning info was printed after applying the patch "drm/sched: Convert drm scheduler to use a work queue rather than kthread". The root cause is that amdgpu driver tries to use the uninitialized work_struct in the struct drm_gpu_scheduler v2: - Rename the function to amdgpu_ring_sched_ready and move it to amdgpu_ring.c (Alex) v3: - Fix a few more checks based on Vitaly's patch (Alex) v4: - squash in fix noticed by Bert in https://gitlab.freedesktop.org/drm/amd/-/issues/3139 Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling") Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Vitaly Prosyak <[email protected]> Signed-off-by: Ma Jun <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-31drm/xe: Don't use __user error pointersThomas Hellström1-25/+25
The error pointer macros are not aware of __user pointers and as a consequence sparse warns. Have the copy_mask() function return an integer instead of a __user pointer. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rodrigo Vivi <[email protected]> Cc: Matthew Brost <[email protected]> Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-01-31drm/xe: Annotate mcr_[un]lock()Thomas Hellström1-2/+2
These functions acquire and release the gt::mcr_lock. Annotate accordingly. Fix the corresponding sparse warning. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: fb1d55efdfcb ("drm/xe: Cleanup OPEN_BRACE style issues") Cc: Francois Dugast <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Matthew Brost <[email protected]> Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-01-31drm/xe: drop display/ subdir from include directoriesJani Nikula5-7/+6
There are very few places that need to include anything from under display/. Require the display/ prefix in #include directives, and drop the subdirectory from the header search path. Sort the include lists while at it. Signed-off-by: Jani Nikula <[email protected]> Reviewed-by: Rodrigo Vivi <[email protected]> Acked-by: Lucas De Marchi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]