aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2024-06-14drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recoverYunxiang Li1-2/+0
At this point the gart is not set up, there's no point to invalidate tlb here and it could even be harmful. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: fix sriov host flr handlerYunxiang Li6-52/+50
We send back the ready to reset message before we stop anything. This is wrong. Move it to when we are actually ready for the FLR to happen. In the current state since we take tens of seconds to stop everything, it is very likely that host would give up waiting and reset the GPU before we send ready, so it would be the same as before. But this gets rid of the hack with reset_domain locking and also let us tell how slow ready to reset actually is from the host. The ready to reset speed can be improved later. Signed-off-by: Yunxiang Li <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Emily Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: add skip_hw_access checks for sriovYunxiang Li1-0/+9
Accessing registers via host is missing the check for skip_hw_access and the lockdep check that comes with it. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: add reset source in various casesEric Huang3-0/+3
To fullfill the reset event description. Suggested-by: Lijo Lazar <[email protected]> Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: fix NULL pointer in amdgpu_reset_get_descEric Huang1-4/+2
amdgpu_job_ring may return NULL, which causes kernel NULL pointer error, using another way to print ring name instead of ring->name. Suggested-by: Lijo Lazar <[email protected]> Signed-off-by: Eric Huang <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: remove dead code in atom_get_src_intJesse Zhang1-4/+4
Since the range of align is 0~7, the expression is: align = (attr >> 3) & 7. In the case of ATOM_ARG_IMM, the code cannot reach the default case. So there is no need for "break". Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Tim Huang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: add sdma 7.0 support for copy dcc bufferFrank Min4-4/+27
1. Add dcc buffer flag for copy buffer 2. Add sdma 7.0 support copy dcc buffer Signed-off-by: Likun Gao <[email protected]> Signed-off-by: Frank Min <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: support for DCC featureLikun Gao2-0/+9
Deal with AMDGPU_GEM_CREATE_GFX12_DCC to set DCC bit when needed. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: add additional VM bitsAlex Deucher1-0/+1
Add additional VM PTE bits. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-11Merge tag 'amd-drm-next-6.11-2024-06-07' of ↵Dave Airlie121-1498/+14182
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.11-2024-06-07: amdgpu: - DCN 4.0.x support - DCN 3.5 updates - GC 12.0 support - DP MST fixes - Cursor fixes - MES11 updates - MMHUB 4.1 support - DML2 Updates - DCN 3.1.5 fixes - IPS fixes - Various code cleanups - GMC 12.0 support - SDMA 7.0 support - SMU 13 updates - SR-IOV fixes - VCN 5.x fixes - MES12 support - SMU 14.x updates - Devcoredump improvements - Fixes for HDP flush on platforms with >4k pages - GC 9.4.3 fixes - RAS ACA updates - Silence UBSAN flex array warnings - MMHUB 3.3 updates amdkfd: - Contiguous VRAM allocations - GC 12.0 support - SDMA 7.0 support - SR-IOV fixes radeon: - Backlight workaround for iMac - Silence UBSAN flex array warnings UAPI: - GFX12 modifier and DCC support Proposed Mesa changes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29510 - KFD GFX ALU exceptions Proposed ROCdebugger changes: https://github.com/ROCm/ROCdbgapi/commit/08c760622b6601abf906f75abbc5e21d9fd425df https://github.com/ROCm/ROCgdb/commit/944fe1c1414a68700414e86e32273b6bfa62ba6f - KFD Contiguous VRAM allocation flag Proposed ROCr/HIP changes: https://github.com/ROCm/ROCT-Thunk-Interface/commit/f7b4a269914a3ab4f1e2453c2879adb97b5cc9e5 https://github.com/ROCm/ROCR-Runtime/pull/214/commits/26e8530d05a775872cb06dde6693db72be0c454a https://github.com/ROCm/clr/commit/1d48f2a1ab38b632919c4b7274899b3faf4279ff Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-05drm/amdgpu: add RAS is_rma flagTao Zhou4-11/+12
Set the flag to true if bad page number reaches threshold. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: fix failure mapping legacy queue when FLRLin.Cao1-0/+2
Flag "mes.ring.shced.ready" will be set as true after mes hw init and set as false when mes hw fini to avoid duplicate initialization. But hw fini will not be called when function level reset, which will cause mes hw init be skipped during FLR, which will leads to mapping legacy queue fail. Set this flag as false when post reset will fix this issue. Signed-off-by: Lin.Cao <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdkfd: add reset cause in gpu pre-reset smi eventEric Huang3-6/+14
reset cause is requested by customer as additional info for gpu reset smi event. v2: integerate reset sources suggested by Lijo Lazar Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: Set PTE_IS_PTE bit for gfx12Frank Min1-0/+1
Set PTE_IS_PTE bit while PRT is enabled on gfx12. Signed-off-by: Frank Min <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: add reset sources in gpu reset contextEric Huang2-0/+47
reset source or reset cause is very useful info for reset context, it will be used by events API. Suggested-by: Lijo Lazar <[email protected]> Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_VG10Shane Xiao3-21/+21
This patch changes the implementation of AMDGPU_PTE_MTYPE_VG10, clear the bits before setting the new one. Suggested-by: Alex Deucher <[email protected]> Signed-off-by: longlyao <[email protected]> Signed-off-by: Shane Xiao <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05Revert "drm/amdgpu/gfx11: enable gfx pipe1 hardware support"Alex Deucher1-3/+3
This reverts commit 6670142d25f3cc3166f2a6c8454acd310bf2776a. Pierre-Eric reported problems with this on his navi33. Revert for now until we understand what is going wrong. Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-06-05drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_NV10Shane Xiao3-20/+21
This patch changes the implementation of AMDGPU_PTE_MTYPE_NV10, clear the bits before setting the new one. Suggested-by: Alex Deucher <[email protected]> Signed-off-by: longlyao <[email protected]> Signed-off-by: Shane Xiao <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: disable lane0 L1TLB and enable lane1 L1TLBYifan Zhang1-0/+13
This patch to disable lane0 L1TLB and enable lane1 L1TLB. Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: init SAW registers for mmhub v3.3Yifan Zhang1-0/+40
This patch to configure mmhub3.3 SAW registers Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: fix comments and error message for ipdumpSunil Khatri3-8/+8
Fix comments and error messages to rightly represent the information. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: rename ip_dump_cp_queues to compute queuesSunil Khatri4-22/+22
Rename the variable ip_dump_cp_queues to ip_dump_compute_queue as it represent compute queues. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: add cp queue registers for gfx9 ipdumpSunil Khatri1-2/+108
Add gfx9 support of CP queue registers for all queues to be used by devcoredump. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: add print support for gfx9 ipdumpSunil Khatri1-1/+17
Add support of gfx9 ipdump print so devcoredump could trigger it to dump the captured registers in devcoredump. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: add gfx9 register support in ipdumpSunil Khatri1-1/+123
Add general registers of gfx9 in ipdump for devcoredump support. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: Fix type mismatch in amdgpu_gfx_kiq_init_ringSrinivasan Shanmugam1-1/+2
This commit fixes a type mismatch in the amdgpu_gfx_kiq_init_ring function triggered by the snprintf function expecting unsigned char arguments due to the '%hhu' format specifier, but receiving int and u32 arguments. The issue occurred because the arguments xcc_id, ring->me, ring->pipe, and ring->queue were of type int and u32, not unsigned char. This led to a type mismatch when these arguments were passed to snprintf. To resolve this, the snprintf line was modified to cast these arguments to unsigned char. This ensures that the arguments are of the correct type for the '%hhu' format specifier and resolves the warning. Fixes the below: >> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:4: warning: format >> specifies type 'unsigned char' but the argument has type 'int' >> [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~ >> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:12: warning: format >> specifies type 'unsigned char' but the argument has type 'u32' (aka >> 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:22: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:34: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~~~~ 4 warnings generated. Fixes: 0ea554455542 ("drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Cc: Lijo Lazar <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgu: fix Unintentional integer overflow for mall sizeJesse Zhang1-1/+1
Potentially overflowing expression mall_size_per_umc * adev->gmc.num_umc with type unsigned int (32 bits, unsigned) is evaluated using 32-bit arithmetic,and then used in a context that expects an expression of type u64 (64 bits, unsigned). Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: Update programming for boot error reportingHawking Zhang2-57/+46
AMDGPU_RAS_GPU_ERR_BOOT_STATUS field is no longer valid. The polling sequence is also simplifed according to the latest firmware change. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu/soc24: use common nbio callback to set remap offsetAlex Deucher1-3/+1
This fixes HDP flushes on systems with non-4K pages. Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: Estimate RAS reservation when report capacity v2Hawking Zhang3-2/+31
Add estimate of how much vram we need to reserve for RAS when caculating the total available vram. v2: apply the change to MP0 v13_0_2 and v13_0_14 Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: use u32 for buf size in __amdgpu_eeprom_xferTao Zhou1-5/+5
And also make sure the value of msg[1].len should be in the range of u16. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: drop some kernel messages in VCN codeDavid (Ming Qiang) Wu3-13/+1
We have messages when the VCN fails to initialize and there is no need to report on success. Also PSP loading FWs is the default for production. Acked-by: Christian König <[email protected]> Reviewed-by: Sonny Jiang <[email protected]> Signed-off-by: David (Ming Qiang) Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-05drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_GFX12Shane Xiao2-12/+14
This patch changes the implementation of AMDGPU_PTE_MTYPE_GFX12, clear the bits before setting the new one. This fixed the potential issue that GFX12 setting memory to NC. v2: Clear mtype field before setting the new one (Alex) v3: Fix typo (Felix) Suggested-by: Alex Deucher <[email protected]> Signed-off-by: longlyao <[email protected]> Signed-off-by: Shane Xiao <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-31Merge tag 'amd-drm-fixes-6.10-2024-05-30' of ↵Dave Airlie5-18/+27
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.10-2024-05-30: amdgpu: - RAS fix - Fix colorspace property for MST connectors - Fix for PCIe DPM - Silence UBSAN warning - GPUVM robustness fix - Partition fix - Drop deprecated I2C_CLASS_SPD amdkfd: - Revert unused changes for certain 11.0.3 devices - Simplify APU VRAM handling Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-05-29drm/amdgpu: Make CPX mode auto default in NPS4Rajneesh Bhardwaj1-1/+1
On GFXIP9.4.3, make CPX mode as the default compute mode if the node is setup in NPS4 memory partition mode. This change is only applicable for dGPU, for APU, continue to use TPX mode. Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdkfd: simplify APU VRAM handlingAlex Deucher1-8/+8
With commit 89773b85599a ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code. v2: clean up a few more places (Lang) Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdgpu: fix dereference null return value for the function ↵Jesse Zhang1-1/+5
amdgpu_vm_pt_parent The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent. Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()Alex Deucher1-7/+12
Use current speed/width on devices which don't support dynamic PCIe switching. Fixes: 466a7d115326 ("drm/amd: Use the first non-dGPU PCI device for BW limits") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289 Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdgpu/gfx11: enable gfx pipe1 hardware supportAlex Deucher1-3/+3
Enable gfx pipe1 hardware support. Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdgpu: Make CPX mode auto default in NPS4Rajneesh Bhardwaj1-1/+1
On GFXIP9.4.3, make CPX mode as the default compute mode if the node is setup in NPS4 memory partition mode. This change is only applicable for dGPU, for APU, continue to use TPX mode. Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdgpu/gfx11: handle priority setup for gfx pipe1Alex Deucher1-11/+25
Set up pipe1 as a high priority queue. Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdgpu/gfx11: select HDP ref/mask according to gfx ring pipeAlex Deucher1-1/+1
Use correct ref/mask for differnent gfx ring pipe. Ported from ZhenGuo's patch for gfx10. Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdgpu: Add lock around VF RLCG interfaceVictor Skvortsov3-0/+9
flush_gpu_tlb may be called from another thread while device_gpu_recover is running. Both of these threads access registers through the VF RLCG interface during VF Full Access. Add a lock around this interface to prevent race conditions between these threads. Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdkfd: simplify APU VRAM handlingAlex Deucher1-8/+8
With commit 89773b85599a ("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs") big and small APU "VRAM" handling in KFD was unified. Since AMD_IS_APU is set for both big and small APUs, we can simplify the checks in the code. v2: clean up a few more places (Lang) Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdgpu: fix typo in amdgpu_ras_aca_sysfs_read() functionYang Wang1-1/+1
fix typo "info.ue_count" in amdgpu_ras_aca_sysfs_read() function. Fixes: 865d3397630b ("drm/amdgpu: add aca deferred error type support") Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdgpu: fix dereference null return value for the function ↵Jesse Zhang1-1/+5
amdgpu_vm_pt_parent The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer parent. Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amd/amdgpu: add AMD_PG_SUPPORT_VCN_DPG flagDavid (Ming Qiang) Wu1-1/+2
AMD_PG_SUPPORT_VCN_DPG is needed for secure parts and should/can be enabled by now. Signed-off-by: David (Ming Qiang) Wu <[email protected]> Reviewed-by: Sonny Jiang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdgpu: drop MES 10.1 support v3Alex Deucher6-1450/+71
It was an enablement vehicle for MES 11 and was never productized. Remove it. v2: drop additional checks in the GFX10 code. v3: drop mes_api_def.h Acked-by: Christian König <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-29drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()Alex Deucher1-7/+12
Use current speed/width on devices which don't support dynamic PCIe switching. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289 Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-27Merge drm/drm-next into drm-misc-nextMaxime Ripard10-57/+88
Let's start the new release cycle. Signed-off-by: Maxime Ripard <[email protected]>