aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2023-10-20drm/amdgpu: drop status query/reset for GCEA 9.4.3 and MMEA 1.8Tao Zhou2-203/+0
PMFW will be responsible for them. v2: remove query interfaces. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: update the xgmi ta interface headerShiwu Zhang2-17/+51
Update the header file to the v20.00.00.13 v1: rename TA_COMMAND_XGMI__GET_GET_TOPOLOGY_INFO to TA_COMMAND_XGMI__GET_TOPOLOGY_INFO And also rename struct ta_xgmi_cmd_get_peer_link_info_output to ta_xgmi_cmd_get_peer_link_info accordingly v2: add structs to support xgmi GET_EXTEND_PEER_LINK command Signed-off-by: Shiwu Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: add set/get mca debug mode operationsTao Zhou2-0/+26
Record the debug mode status in RAS. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: replace reset_error_count with amdgpu_ras_reset_error_countTao Zhou5-25/+10
Simplify the code. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: add clockgating support for NBIO v7.7.1Li Ma1-0/+1
add clockgating support for NBIO ip 7.7.1 Signed-off-by: Li Ma <[email protected]> Reviewed-by: Tim Huang <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: fix missing stuff in NBIO v7.11Li Ma2-0/+79
add get_clockgating_state, update_medium_grain_light_sleep and update_medium_grain_clock_gating in nbio_v7_11_funcs v1: add missing funcs in nbio_v7_11.c v2: modify the if condition and add spport for nbio v7.11 clockgating. Signed-off-by: Li Ma <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: Enable RAS feature by default for APUStanley.Yang1-12/+2
Enable RAS feature by default for aqua vanjaram on apu platform. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: fix typo for amdgpu ras error data printYang Wang1-2/+2
typo fix. Fixes: 5b1270beb380 ("drm/amdgpu: add ras_err_info to identify RAS error source") Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Candice Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P4Bokun Zhang1-16/+55
- In VCN 4 SRIOV code path, add code to enable RB decouple feature Signed-off-by: Bokun Zhang <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P3Bokun Zhang1-7/+41
- Update VCN header for RB decouple feature - Add metadata struct, metadata will be placed after each RB Signed-off-by: Bokun Zhang <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P2Bokun Zhang1-0/+4
- Add function to check if RB decouple is enabled under SRIOV Signed-off-by: Bokun Zhang <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P1Bokun Zhang1-2/+3
- Update SRIOV header with RB decouple flag Signed-off-by: Bokun Zhang <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: Fix delete nodes that have been relesedStanley.Yang1-3/+1
Fix delete nodes that it has been freed. Fixes: 5b1270beb380 ("drm/amdgpu: add ras_err_info to identify RAS error source") Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: Add UVD_VCPU_INT_EN2 to dpg sramHawking Zhang1-0/+5
Add RAS sepcifc programming to dpg sram. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: Enable software RAS in vcn v4_0_3Hawking Zhang1-1/+3
Set VCN/JPEG RAS masks to enable software RAS for VCN and JPEG. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: define ras_reset_error_count functionTao Zhou2-4/+17
Make the code architecture more simple. v2: reuse ras_reset_error_count in ras_reset_error_status. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20drm/amdgpu: Log UE corrected by replay as correctable errorCandice Li1-3/+4
Support replay mode where UE could be converted to CE. Signed-off-by: Candice Li <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-20Merge tag 'drm-misc-fixes-2023-10-19' of ↵Dave Airlie1-2/+3
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: amdgpu: - Disable AMD_CTX_PRIORITY_UNSET bridge: - ti-sn65dsi86: Fix device lifetime edid: - Add quirk for BenQ GW2765 ivpu: - Extend address range for MMU mmap nouveau: - DP-connector fixes - Documentation fixes panel: - Move AUX B116XW03 into panel-simple scheduler: - Eliminate DRM_SCHED_PRIORITY_UNSET ttm: - Fix possible NULL-ptr deref in cleanup Signed-off-by: Dave Airlie <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/20231019114605.GA22540@linux-uq9g
2023-10-19drm/amdgpu: Reserve fences for VM updateFelix Kuehling1-1/+4
In amdgpu_dma_buf_move_notify reserve fences for the page table updates in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON in dma_resv_add_fence when using SDMA for page table updates. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amdgpu: Fix possible null pointer dereferenceFelix Kuehling1-1/+2
abo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amdgpu: Reserve fences for VM updateFelix Kuehling1-1/+4
In amdgpu_dma_buf_move_notify reserve fences for the page table updates in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON in dma_resv_add_fence when using SDMA for page table updates. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amdgpu: Fix possible null pointer dereferenceFelix Kuehling1-1/+2
abo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amdgpu: Workaround to skip kiq ring test during ras gpu recoveryStanley.Yang1-0/+21
This is workaround, kiq ring test failed in suspend stage when do ras recovery. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amd: Read IMU FW version from scratch register during hw_initMario Limonciello1-0/+4
If the IMU version wasn't discovered from the header, such as when the firmware was directly loaded by PSP then there is no firmware version to show to userspace from sysfs or IOCTL. The IMU F/W stores the version in the first scratch register though, so fetch it in these cases to let the driver export. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amd: Don't parse IMU ucode version if it won't be loadedMario Limonciello1-2/+2
When the IMU ucode is loaded by the PSP parsing the version that comes from Linux will vary. Rather than showing the wrong data to kernel interface consumers, avoid populating it in this case. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amd: Move microcode init step to early_init()Mario Limonciello1-8/+8
The intention for early init is to find any missing microcode early and fail the driver load if it's missing. Move this step to earlier in driver init to match other IP blocks. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amdgpu: update retry times for psp BL waitAsad Kamal1-1/+1
Increase retry time for PSP BL wait, to compensate for longer time to set c2pmsg 35 ready bit during mode1 with RAS Signed-off-by: Asad Kamal <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amdgpu/mes11: remove aggregated doorbell codeAlex Deucher3-185/+40
It's not enabled in hardware so the code is dead. Remove it. Reviewed-by: Jack Xiao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amdgpu : Add hive ras recovery checkAsad Kamal2-2/+8
If one of the devices in the hive detects a fatal error, need to send ras recovery reset message to PMFW of all devices in the hive. For that add a flag in hive to indicate that it's undergoing ras recovery Signed-off-by: Asad Kamal <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19Revert "drm/amdgpu: Program xcp_ctl registers as needed"Mangesh Gadre1-12/+11
This reverts commit 0bdebfef3fb2b6291000765eaa9c6c8030293fce. XCP_CTL register is programmed by firmware and register access is protected. Signed-off-by: Mangesh Gadre <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amdgpu/umsch: add suspend and resume callbackLang Yu1-0/+16
Add missing IP callbacks. Signed-off-by: Lang Yu <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-19drm/amdgpu: ignore duplicate BOs againChristian König1-1/+2
Looks like RADV is actually hitting this. Signed-off-by: Christian König <[email protected]> Fixes: ca6c1e210aa7 ("drm/amdgpu: use the new drm_exec object for CS v3") Acked-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-10-18Merge tag 'amd-drm-next-6.7-2023-10-13' of ↵Dave Airlie174-1620/+10727
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.7-2023-10-13: amdgpu: - DC replay fixes - Misc code cleanups and spelling fixes - Documentation updates - RAS EEPROM Updates - FRU EEPROM Updates - IP discovery updates - SR-IOV fixes - RAS updates - DC PQ fixes - SMU 13.0.6 updates - GC 11.5 Support - NBIO 7.11 Support - GMC 11 Updates - Reset fixes - SMU 11.5 Updates - SMU 13.0 OD support - Use flexible arrays for bo list handling - W=1 Fixes - SubVP fixes - DPIA fixes - DCN 3.5 Support - Devcoredump fixes - VPE 6.1 support - VCN 4.0 Updates - S/G display fixes - DML fixes - DML2 Support - MST fixes - VRR fixes - Enable seamless boot in more cases - Enable content type property for HDMI - OLED fixes - Rework and clean up GPUVM TLB flushing - DC ODM fixes - DP 2.x fixes - AGP aperture fixes - SDMA firmware loading cleanups - Cyan Skillfish GPU clock counter fix - GC 11 GART fix - Cache GPU fault info for userspace queries - DC cursor check fixes - eDP fixes - DC FP handling fixes - Variable sized array fixes - SMU 13.0.x fixes - IB start and size alignment fixes for VCN - SMU 14 Support - Suspend and resume sequence rework - vkms fix amdkfd: - GC 11 fixes - GC 10 fixes - Doorbell fixes - CWSR fixes - SVM fixes - Clean up GC info enumeration - Rework memory limit handling - Coherent memory handling fixes - Use partial migrations in GPU faults - TLB flush fixes - DMA unmap fixes - GC 9.4.3 fixes - SQ interrupt fix - GTT mapping fix - GC 11.5 Support radeon: - Misc code cleanups - W=1 Fixes - Fix possible buffer overflow - Fix possible NULL pointer dereference UAPI: - Add EXT_COHERENT memory allocation flags. These allow for system scope atomics. Proposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88 - Add support for new VPE engine. This is a memory to memory copy engine with advanced scaling, CSC, and color management features Proposed mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25713 - Add INFO IOCTL interface to query GPU faults Proposed Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238 Proposed libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/298 Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-10-17gpu/drm: Eliminate DRM_SCHED_PRIORITY_UNSETLuben Tuikov1-1/+2
Eliminate DRM_SCHED_PRIORITY_UNSET, value of -2, whose only user was amdgpu. Furthermore, eliminate an index bug, in that when amdgpu boots, it calls drm_sched_entity_init() with DRM_SCHED_PRIORITY_UNSET, which uses it to index sched->sched_rq[]. Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Acked-by: Alex Deucher <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-17drm/amdgpu: Unset context priority is now invalidLuben Tuikov1-1/+1
A context priority value of AMD_CTX_PRIORITY_UNSET is now invalid--instead of carrying it around and passing it to the Direct Rendering Manager--and it becomes AMD_CTX_PRIORITY_NORMAL in amdgpu_ctx_ioctl(), the gateway to context creation. Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Acked-by: Alex Deucher <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-13drm/amdgpu/vkms: fix a possible null pointer dereferenceMa Ke1-0/+2
In amdgpu_vkms_conn_get_modes(), the return value of drm_cvt_mode() is assigned to mode, which will lead to a NULL pointer dereference on failure of drm_cvt_mode(). Add a check to avoid null pointer dereference. Signed-off-by: Ma Ke <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: add RAS error info support for umc_v12_0Yang Wang1-6/+14
add RAS error info support for umc_v12_0. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: add RAS error info support for mmhub_v1_8Yang Wang1-2/+13
add RAS error info support for mmhub_v1_8. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: add RAS error info support for gfx_v9_4_3Yang Wang1-2/+8
add RAS error info support for gfx_v9_4_3. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: add RAS error info support for sdma_v4_4_2.Yang Wang1-1/+8
add RAS error info support for sdma_v4_4_2. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: add ras_err_info to identify RAS error sourceYang Wang6-53/+312
introduced "ras_err_info" to better identify a RAS ERROR source. NOTE: For legacy chips, keep the original RAS error print format. v1: RAS errors may come from different dies during a RAS error query, therefore, need a new data structure to identify the source of RAS ERROR. v2: - use new data structure 'amdgpu_smuio_mcm_config_info' instead of ras_err_id (in v1 patch) - refine ras error dump function name - refine ras error dump log format Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: flush the correct vmid tlb for specific pasidYifan Zhang1-1/+1
flush the correct vmid tlb for specific pasid on gmc 11. Fixes: 041a5743883d ("drm/amdgpu: fix and cleanup gmc_v11_0_flush_gpu_tlb_pasid") Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: make err_data structure built-in for ras_managerYang Wang1-1/+4
(No effect outside the ras_mgr data structure) Since a new member was added to the ras_err_data data structure, it becomes unreasonable for the ras_mgr instance to contain this data, because ras mgr only uses the 2 member information of ue_count/ce_count in err_data. This patch changes the code err_data into built-in structure members, making the code directly compatible. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: disable GFXOFF and PG during compute for GFX9Jesse Zhang1-0/+7
Temporary workaround to fix issues observed in some compute applications when GFXOFF is enabled on GFX9. Signed-off-by: Jesse Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu/umsch: fix missing stuff during rebaseLang Yu1-0/+3
These are missed during rebase. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu/umsch: correct IP version formatLang Yu1-4/+7
FW uses IP_VERSION_MAJ_MIN_REV format. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: don't use legacy invalidation on MMHUB v3.3Lang Yu1-1/+1
Legacy invalidation is not supported. This is missed during rebase. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: correct NBIO v7.11 programingLang Yu1-29/+27
Use v7.7 before, switch to v7.11 now. Fix incorrect programing. Signed-off-by: Lang Yu <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: Correctly use bo_va->ref_count in compute VMsXiaogang Chen1-3/+11
This is needed to correctly handle BOs imported into compute VM from gfx. Both kfd and gfx should use same bo_va and set bo_va->ref_count correctly when map the Bos into same VM, otherwise we may trigger kernel general protection when iterate mappings over bo_va's valids or invalids list. Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Xiaogang Chen <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Ramesh Errabolu <[email protected]> Tested-by: Xiaogang Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amd/pm: Add P2S tables for SMU v13.0.6Lijo Lazar1-0/+7
Add P2S table load support on SMU v13.0.6 ASICs. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>