aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2024-05-02drm/amdgpu: Fix two reset triggered in a rowYunxiang Li5-13/+14
Some times a hang GPU causes multiple reset sources to schedule resets. The second source will be able to trigger an unnecessary reset if they schedule after we call amdgpu_device_stop_pending_resets. Move amdgpu_device_stop_pending_resets to after the reset is done. Since at this point the GPU is supposedly in a good state, any reset scheduled after this point would be a legitimate reset. Remove unnecessary and incorrect checks for amdgpu_in_reset that was kinda serving this purpose. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-02drm/amdgpu: update vf to pf message retry from 2 to 5Zhigang Luo1-1/+1
increase retry times to wait host has enough time to complete reset. Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-02drm/amdgpu: avoid reading vf2pf info size from FBZhigang Luo1-1/+1
VF can't access FB when host is doing mode1 reset. Using sizeof to get vf2pf info size, instead of reading it from vf2pf header stored in FB. Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-02Revert "drm: Switch DRM_DISPLAY_HELPER to depends on"Geert Uytterhoeven1-4/+2
This reverts commit e075e496f516bf92bc0cbaf94d64e8d4a6b58321, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/1ba76cc4d96a8afefff5d1bc42fb1e1329c5da68.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <[email protected]>
2024-05-02Revert "drm: Switch DRM_DISPLAY_DP_HELPER to depends on"Geert Uytterhoeven1-1/+1
This reverts commit 0323287de87d7e6e9c22c57d7440aa353a2298d0, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/89ac456805746b6d0c888f10c5120b11aacd3319.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <[email protected]>
2024-05-02Revert "drm: Switch DRM_DISPLAY_HDCP_HELPER to depends on"Geert Uytterhoeven1-1/+1
This reverts commit 3166e7e6d935caaef07605a5c90773fbf9ffeaf4, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/a40e70a0abd3d841c23c107d452a43fdd70ef37a.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <[email protected]>
2024-05-02Revert "drm: Switch DRM_DISPLAY_HDMI_HELPER to depends on"Geert Uytterhoeven1-1/+1
This reverts commit f6d2dc03fa8546b284dd8c1af027d9fac5725921, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/bd288a5943dab8609f2d1f2bf413595a61df727a.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <[email protected]>
2024-05-02drm/fbdev-generic: Convert to fbdev-ttmThomas Zimmermann1-3/+3
Only TTM-based drivers use fbdev-generic. Rename it to fbdev-ttm and change the symbol infix from _generic_ to _ttm_. Link the source file into TTM helpers, so that it is only build if TTM-based drivers have been selected. Select DRM_TTM_HELPER for loongson. Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Javier Martinez Canillas <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-04-30drm/amdgpu: fix doorbell regressionShashank Sharma1-1/+1
This patch adds a missed handling of PL domain doorbell while handling VRAM faults. Cc: Christian Koenig <[email protected]> Cc: Alex Deucher <[email protected]> Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults") Reviewed-by: Christian Koenig <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2Christian König3-28/+38
This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move notification before actually moving things because only this way we have the correct order for DMA-buf and VM move notifications as well. Also rework the statistic handling so that we don't update the eviction counter before the move. v2: add missing NULL check Signed-off-by: Christian König <[email protected]> Fixes: 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171 Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
2024-04-30drm/amdgpu: Fix VRAM memory accountingMukul Joshi1-1/+1
Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: fix uninitialized scalar variable warningTim Huang1-0/+2
Clear warning that field bp is uninitialized when calling amdgpu_virt_ras_add_bps. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/discovery: add sdma v7_0 ip blockLikun Gao1-0/+5
Add sdma v7_0 ip block. v2: squash in updates (Alex) Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: provide more ucode name shown via idLikun Gao1-0/+24
Provide some lost ucode name shown via firmware ID. v2: fix whitespace (Alex) Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: support SDMA v3 struct fw front door loadLikun Gao4-0/+19
Add support for new SDMA firmware struct (V3) with PSP front door load type. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/sdma7: set sdma hang watchdogJack Xiao1-0/+7
Set SDMAx_WATCHDOG_CNTL.QUEUE_HANG_COUNT registers to improve SDMA reliability. Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add sdma v7_0 ip block support (v7)Likun Gao4-1/+1668
v1: Add sdma v7_0 ip block support. (Likun) v2: Move vmhub from ring_funcs to ring. (Hawking) v3: Switch to AMDGPU_GFXHUB(0). (Hawking) v4: Move microcode init into early_init. (Likun) v5: Fix warnings (Alex) v6: Squash in various fixes (Alex) v7: Rebase (Alex) v8: Rebase (Alex) Signed-off-by: Likun Gao <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add sdma fw v3 structureLikun Gao2-0/+15
Add sdma firmware struct version 3 to support sdma v7_0 firmware. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Fix signedness bug in sdma_v4_0_process_trap_irq()Dan Carpenter1-1/+1
The "instance" variable needs to be signed for the error handling to work. Fixes: 8b2faf1a4f3b ("drm/amdgpu: add error handle to avoid out-of-bounds") Reviewed-by: Bob Zhou <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add new members for sdma v7_0 fwLikun Gao1-0/+4
Add new members in sdma instance structure for sdma v7_0 firmware. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/discovery: Add gmc v12_0 ip blockLikun Gao1-0/+5
Add gmc v12_0 ip block. v2: Squash in updates (Alex) Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: fix doorbell regressionShashank Sharma1-1/+1
This patch adds a missed handling of PL domain doorbell while handling VRAM faults. Cc: Christian Koenig <[email protected]> Cc: Alex Deucher <[email protected]> Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults") Reviewed-by: Christian Koenig <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Arvind Yadav <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: support gfx v12 specific pte/pde fieldsHawking Zhang5-12/+18
Add gfx v12 pte/pde support to gmc common helper. v2: squash in fixes (Alex) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Set pte_is_pte flag in gmc v12 gartHawking Zhang1-1/+2
pte_is_pte is new flag introduced in gmc v12 that needs to be set by default for pte. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add gmc v12_0 ip block support (v7)Hawking Zhang3-1/+1031
Add initial support for GMC v12. v1: Add gmc v12_0 ip block support. v2: Switch to gfx.kiq array. v3: Switch to vmhubs_mask. v4: Switch to AMDGPU_MMHUB0(0) and AMDGPU_GFXHUB(0) v5: Rebase (Alex) v6: Squash in fixes for AGP handling, gfxhub init order, vmhub index (Alex) v7: Rebase (Alex) v8: squash in ecc fix (Alex) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add gfx v12 pte/pde format changeHawking Zhang1-0/+13
Add gfx v12 pte/pde format change. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add gfxhub v12_0 ip block support (v3)Likun Gao3-1/+531
Add initial gfxhub v12 support. v1: Add gfxhub v12_0 ip block support (Likun) v2: Switch to AMDGPU_GFXHUB(0) (Hawking) v3: Squash in keep default error response mode (Hawking) Signed-off-by: Likun Gao <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/mes11: increase waiting time for engine readyJack Xiao1-1/+1
mes schq engine require more waiting time for engine ready before packet submission. Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/gfx: enable mes to map legacy queue supportJack Xiao1-9/+41
Enable mes to map legacy queue support. v2: kiq_set_resources is required. Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdkfd: Evict BO itself for contiguous allocationPhilip Yang1-1/+18
If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to system memory first to free the VRAM space, then allocate contiguous VRAM space, and then move it from system memory back to VRAM. v6: user context should use interruptible call (Felix) Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Remove redundant function callYiPeng Chai1-16/+6
Remove redundant function call. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30rm/amdgpu: Remove unused codeYiPeng Chai1-71/+0
Remove unused code. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: fix overflowed array index read warningTim Huang1-1/+2
Clear overflowed array index read warning by cast operation. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: fix potential resource leak warningTim Huang1-0/+5
Clear resource leak warning that when the prepare fails, the allocated amdgpu job object will never be released. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: avoid dump mca bank log muti times during ras ISRYang Wang2-0/+28
because the ue valid mca count will only be cleared after gpu reset, so only dump mca log on the first time to get mca bank after receive RAS interrupt. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: add MCA smu cache supportYang Wang3-7/+116
v1: because SMU CE valid mca bank will be cleared after reading, this patch adds mca cache at the driver level to ensure that the mca bank is not lost. v2: refine amdgpu_mca_init/fini/reset() function name. v3: add mca_cache.lock support only add CE bank to mca bank cache. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: add amdgpu MCA bank dispatch function supportYang Wang1-42/+55
- Refine mca driver code. - Centralize mca bank dispatch code logic. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Add mmhub v4_1_0 ip block support (v4)Hawking Zhang3-1/+683
Add initial support for MMHUB 4.1.0. v1: Add mmhub v4_1_0 ip block support. v2: Switch to AMDGPU_MMHUB0(0). v3: squash in fix for ip version check (Alex) v4: squash in vm_contexts_disable fix (Alex) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Evict BOs from same process for contiguous allocationPhilip Yang1-1/+2
When TTM failed to alloc VRAM, TTM try evict BOs from VRAM to system memory then retry the allocation, this skips the KFD BOs from the same process because KFD require all BOs are resident for user queues. If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc contiguous VRAM, allow TTM evict KFD BOs from the same process, this will evict the user queues first, and restore the queues later after contiguous VRAM allocation. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Handle sg size limit for contiguous allocationPhilip Yang1-6/+6
Define macro AMDGPU_MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist length is unsigned int, and some users of it cast to a signed int, so every segment of sg table is limited to size 2GB maximum. For contiguous VRAM allocation, don't limit the max buddy block size in order to get contiguous VRAM memory. To workaround the sg table segment size limit, allocate multiple segments if contiguous size is bigger than AMDGPU_MAX_SG_SEGMENT_SIZE. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: remove unused MCA driver codesYang Wang2-133/+82
- remove unused callback functions. - make part of mca functions static and refine the function order. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/discovery: Add common soc24 ip blockLikun Gao1-0/+5
Add common soc24 ip block. v2: squash in updates (Alex) Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/mes11: adjust mes initialization sequenceJack Xiao1-1/+8
Adjust mes queue initialization before kgq/kcq initialization to enable mes mapping legacy queue. Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu/mes11: add mes mapping legacy queue supportJack Xiao1-0/+26
Add mes11 map legacy queue packet submission. Signed-off-by: Jack Xiao <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2Christian König3-28/+38
This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move notification before actually moving things because only this way we have the correct order for DMA-buf and VM move notifications as well. Also rework the statistic handling so that we don't update the eviction counter before the move. v2: add missing NULL check Signed-off-by: Christian König <[email protected]> Fixes: 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171 Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
2024-04-30drm/amdgpu: Add soc24 common ip block (v2)Hawking Zhang3-1/+561
Add initial soc24 support. v1: Add soc24 common ip block. v2: Switch to new select_se_sh/enter_safe_mode interface. v3: squash in correct ext rev id, etc. (Alex) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Support contiguous VRAM allocationPhilip Yang1-0/+4
RDMA device with limited scatter-gather ability requires contiguous VRAM buffer allocation for RDMA peer direct support. Add a new KFD alloc memory flag and store as bo alloc flag AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA peerdirect access, this will set TTM_PL_FLAG_CONTIFUOUS flag, and ask VRAM buddy allocator to get contiguous VRAM. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: Fix uninitialized variable warning in amdgpu_afmt_acrMa Jun1-0/+1
Assign value to clock to fix the warning below: "Using uninitialized value res. Field res.clock is uninitialized" Signed-off-by: Ma Jun <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30Merge tag 'amd-drm-next-6.10-2024-04-26' of ↵Dave Airlie98-141/+1322
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.10-2024-04-26: amdgpu: - Misc code cleanups and refactors - Support setting reset method at runtime - Report OD status - SMU 14.0.1 fixes - SDMA 4.4.2 fixes - VPE fixes - MES fixes - Update BO eviction priorities - UMSCH fixes - Reset fixes - Freesync fixes - GFXIP 9.4.3 fixes - SDMA 5.2 fixes - MES UAF fix - RAS updates - Devcoredump updates for dumping IP state - DSC fixes - JPEG fix - Fix VRAM memory accounting - VCN 5.0 fixes - MES fixes - UMC 12.0 updates - Modify contiguous flags handling - Initial support for mapping kernel queues via MES amdkfd: - Fix rescheduling of restore worker - VRAM accounting for SVM migrations - mGPU fix - Enable SQ watchpoint for gfx10 Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-04-29Merge v6.9-rc6 into drm-nextDaniel Vetter10-26/+33
Thomas needs the defio fixes, Maíra needs the vkms fixes and Joonas has some fun with i915-gem conflicts. Signed-off-by: Daniel Vetter <[email protected]>