path: root/drivers/gpu/drm/amd/amdgpu
Age  Commit message  Author  Files  Lines
2024-01-15drm/amdgpu: Clean up errors in jpeg_v2_5.cchenxuebing1-6/+4
Fix the following errors reported by checkpatch: ERROR: space required before the open parenthesis '(' ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in gfx_v9_4.cchenxuebing1-2/+3
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in amdgpu_drv.cchenxuebing1-2/+2
Fix the following errors reported by checkpatch: ERROR: do not initialise globals to 0 Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amd: Clean up errors in amdgpu_vkms.cchenxuebing1-2/+1
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Fix the null pointer when load rlc firmwareMa Jun1-9/+6
If the RLC firmware is invalid because of wrong header size, the pointer to the rlc firmware is released in function amdgpu_ucode_request. There will be a null pointer error in subsequent use. So skip validation to fix it. Signed-off-by: Ma Jun <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Centralize ras cap query to amdgpu_ras_check_supportedHawking Zhang1-77/+93
Move the RAS capability check to amdgpu_ras_check_supported. The driver will query RAS capability through the PSP interface, the VBIOS interface, or specific IP callbacks. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Query ras capablity from psp v2Hawking Zhang3-0/+38
Instead of traditional atomfirmware interfaces for RAS capability, host driver can query ras capability from psp starting from psp v13_0_6. v2: drop redundant local variable from get_ras_capability. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in amdgpu_rlc.cchenxuebing1-1/+1
Fix the following errors reported by checkpatch: ERROR: space prohibited before that '++' (ctx:WxB) Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amd: Clean up errors in sdma_v2_4.cchenxuebing1-9/+6
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line ERROR: trailing statements should be on next line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amd/amdgpu: Clean up errors in amdgpu_umr.hchenxuebing1-2/+2
Fix the following errors reported by checkpatch: spaces required around that '=' (ctx:VxV) Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in amdgpu_atomfirmware.hchenxuebing1-1/+1
Fix the following errors reported by checkpatch: ERROR: "foo* bar" should be "foo *bar" Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in clearstate_gfx9.hchenxuebing1-18/+9
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in navi10_ih.cchenxuebing1-2/+1
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: check PS, WS indexAlexander Richards8-47/+75
Theoretically, it would be possible for a buggy or malicious VBIOS to overwrite past the bounds of the passed parameters (or its own workspace); add bounds checking to prevent this from happening. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3093 Signed-off-by: Alexander Richards <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
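The bounds checking described in this commit can be illustrated with a minimal user-space sketch. The struct and function names below (`demo_ctx`, `demo_ps_read`, `demo_ws_write`) are hypothetical and are not the driver's actual interpreter code; the sketch only assumes that the parameter space (PS) and workspace (WS) are arrays of 32-bit entries with known sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical execution context; not the driver's real structure. */
struct demo_ctx {
    uint32_t *ps;   /* parameter space passed in by the caller */
    size_t ps_size; /* number of 32-bit entries in ps */
    uint32_t *ws;   /* interpreter workspace */
    size_t ws_size; /* number of 32-bit entries in ws */
};

/* Read ps[idx]; refuse the access instead of reading out of bounds. */
bool demo_ps_read(const struct demo_ctx *ctx, size_t idx, uint32_t *out)
{
    assert(ctx != NULL && out != NULL);
    if (idx >= ctx->ps_size)
        return false;
    *out = ctx->ps[idx];
    return true;
}

/* Write ws[idx]; refuse the access instead of writing out of bounds. */
bool demo_ws_write(struct demo_ctx *ctx, size_t idx, uint32_t val)
{
    assert(ctx != NULL);
    if (idx >= ctx->ws_size)
        return false;
    ctx->ws[idx] = val;
    return true;
}
```

Returning a status from each accessor lets the caller abort interpretation of an untrusted table rather than continue with corrupted state.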
2024-01-15drm/amdgpu: Align ras block enum with firmwareHawking Zhang1-0/+2
Driver and firmware share the same ras block enum. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: replace MCA macro with ACA for XGMIYang Wang1-12/+12
Use the new ACA macro instead of MCA. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Log deferred error separatelyCandice Li6-58/+139
Separate deferred error from UE and CE and log it individually. Signed-off-by: Candice Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Do bad page retirement for deferred errorsCandice Li1-6/+4
Bad page retirement is also needed for deferred errors. v2: Drop unused dev_info. Signed-off-by: YiPeng Chai <[email protected]> Signed-off-by: Candice Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add xgmi v6.4.0 ACA supportYang Wang1-1/+60
add xgmi v6.4.0 ACA driver support Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add mmhub v1.8 ACA supportYang Wang1-0/+87
v1: add mmhub v1.8 ACA driver support v2: use macro to define smn address value. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add sdma v4.4.2 ACA supportYang Wang1-0/+72
v1: add sdma v4.4.2 ACA driver support v2: use macro to define smn address value. v3: squash in fix for unbalanced irqs Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add gfx v9.4.3 ACA supportYang Wang1-0/+88
v1: add gfx v9.4.3 ACA driver support v2: use macro to define smn address value. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Check extended configuration space register when system uses large barMa Jun1-0/+4
Some customer platforms do not enable mmconfig for various reasons, such as a BIOS bug, and therefore cannot access the GPU's extended configuration space through MMIO. When the system enters the d3cold state and resumes, the amdgpu driver fails to resume because the GPU's extended configuration space registers can't be restored. At this point, we usually see only some failure messages printed to dmesg by the amdgpu driver, which makes the root cause difficult to find. Therefore, print a warning message if the system can't access the extended configuration space register when using large bar. Signed-off-by: Ma Jun <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add umc v12.0 ACA supportYang Wang1-0/+58
add umc v12.0 ACA driver support Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add aca sysfs supportYang Wang4-1/+51
add aca sysfs node support Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add amdgpu ras aca query interfaceYang Wang4-18/+109
v1: add ACA error query interface v2: Add a new helper function to determine whether to use ACA or MCA. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: move kiq_reg_write_reg_wait() out of amdgpu_virt.cAlex Deucher7-70/+74
It's used for more than just SR-IOV now, so move it to amdgpu_gmc.c and rename it to better match the functionality and update the comments in the code paths to better document when each path is used and why. No functional change. Reviewed-by: Shaoyun.liu <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Cc: [email protected]
2024-01-15drm/amdgpu: add new INFO IOCTL query for input powerAlex Deucher1-0/+9
Some chips provide both average and input power. Previously we just exposed average power, add a new query for input power. Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Query boot status if boot failedHawking Zhang1-2/+9
Check and report firmware boot status if it doesn't reach steady status. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add ACA bank dump debugfs supportYang Wang4-0/+136
add ACA bank dump debugfs support Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add ACA kernel hardware error log supportYang Wang1-0/+29
add ACA kernel hardware error log support. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Query boot status if discovery failedHawking Zhang1-1/+5
Check and report boot status if discovery failed. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Add ras helper to query boot errors v2Hawking Zhang3-1/+110
Add ras helper function to query boot time gpu errors. v2: use aqua_vanjaram smn addressing pattern Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: implement RAS ACA driver frameworkYang Wang6-1/+881
v1: implement new RAS ACA driver code framework. v2: - rename aca_bank_set to aca_banks. - rename aca_source_xxx to aca_handle_xxx. v3: Optimize some function implementation details. (from Hawking's suggestion) Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Init pcie_index/data address as fallback (v2)Hawking Zhang1-5/+18
To allow using this helper for indirect access when nbio funcs is not available. For instance, in ip discovery phase. v2: define macro for pcie_index/data/index_hi fallback. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: drop psp v13 query_boot_status implementationHawking Zhang4-99/+0
Will replace it with new implementation to cover boot fails in ip discovery phase. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Replace DRM_* with dev_* in amdgpu_psp.cHawking Zhang1-69/+75
So that kernel messages include the device's PCIe BDF information, which helps with debugging, especially on multi-GPU systems. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Auto-validate DMABuf imports in compute VMsFelix Kuehling7-32/+135
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the process_info->kfd_bo_list. There is no explicit KFD API call to validate them or add eviction fences to them. This patch automatically validates and fences dynamic DMABuf imports when they are added to a compute VM. Revalidation after evictions is handled in the VM code. v2: * Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate * Eliminated evicted_user state, use evicted state for VM BOs and user BOs * Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs * Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into amdgpu_vm_validate, outside the vm->status_lock * Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds without KFD v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the reservation with its fences is shared with the export, as long as all imports are from KFD, with the exports already reserved, validated and fenced by the KFD restore worker. v5: Reintroduced separate evicted_user state to simplify the state machine and CS error handling when amdgpu_vm_validate is called without a ticket. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: drop exp hw support check for GC 9.4.3Alex Deucher1-2/+0
No longer needed. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.7.x
2024-01-15drm/amdgpu: move debug options init prior to amdgpu device initLe Ma1-2/+2
To bring debug options into effect in early initialization phase Signed-off-by: Le Ma <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add debug flag to place fw bo on vram for frontdoor loadingLe Ma4-2/+10
Use debug_mask=0x8 param to help isolating data path issues on new systems in early phase. v2: rename the flag for explicitness (lijo) Signed-off-by: Le Ma <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
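A debug bit mask of this kind is typically decoded once at startup. The sketch below shows the idea in plain C; only the value 0x8 comes from the commit message, while the macro, struct, and function names are hypothetical and do not claim to match the driver's identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit 3 (0x8) is the value named in the commit message; the macro and
 * struct names are assumptions made for this illustration. */
#define DEMO_DEBUG_FW_BUF_IN_VRAM (1u << 3)

struct demo_debug_opts {
    bool fw_buf_in_vram; /* place the firmware buffer in VRAM */
};

/* Decode the module-parameter style mask into individual options. */
void demo_parse_debug_mask(unsigned int debug_mask, struct demo_debug_opts *opts)
{
    assert(opts != NULL);
    opts->fw_buf_in_vram = (debug_mask & DEMO_DEBUG_FW_BUF_IN_VRAM) != 0;
}
```

Folding such switches into one mask avoids adding a dedicated module parameter for each debugging aid, which is the motivation the accompanying revert gives.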
2024-01-15Revert "drm/amdgpu: add param to specify fw bo location for front-door loading"Le Ma4-10/+2
This reverts commit c572abffe9f50c8ba33060865449313b3f588c35. Will use debug module param instead of independent module param. Signed-off-by: Le Ma <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: update regGL2C_CTRL4 value in golden settingYifan Zhang1-1/+1
Update regGL2C_CTRL4 in the golden settings. Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.7.x
2024-01-15drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'Srinivasan Shanmugam1-0/+1
In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' - 'adev->pm.fw' may not be released before return. Using the function release_firmware() to release adev->pm.fw. Thus fixing the below: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1571 amdgpu_device_need_post() warn: 'adev->pm.fw' from request_firmware() not released on lines: 1554. Cc: Monk Liu <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
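The leak pattern fixed here (a requested resource not released before an early return) can be sketched in user space. Everything below is hypothetical: `demo_request_fw` and `demo_release_fw` merely stand in for a request/release pair, and the version threshold is invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MIN_FW_VERSION 0x100u /* hypothetical threshold */

struct demo_fw {
    uint32_t version;
};

int demo_live_fw; /* number of handles requested but not yet released */

/* Stand-in for a firmware request; returns NULL on failure. */
struct demo_fw *demo_request_fw(uint32_t version)
{
    struct demo_fw *fw = malloc(sizeof(*fw));

    if (!fw)
        return NULL;
    fw->version = version;
    demo_live_fw++;
    return fw;
}

/* Stand-in for the matching release call. */
void demo_release_fw(struct demo_fw *fw)
{
    if (!fw)
        return;
    assert(demo_live_fw > 0);
    free(fw);
    demo_live_fw--;
}

/* Copy what is needed out of the blob, then release it before any return. */
bool demo_need_post(uint32_t installed_version)
{
    struct demo_fw *fw = demo_request_fw(installed_version);
    bool too_old;

    if (!fw)
        return true;
    too_old = fw->version < DEMO_MIN_FW_VERSION;
    demo_release_fw(fw);
    return too_old;
}
```

Reading the needed field into a local before releasing keeps every return path after the request free of the handle.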
2024-01-15drm/amdgpu: Fix unsigned comparison with less than zero in vpe_u1_8_from_fraction()Srinivasan Shanmugam1-8/+2
The variables 'numerator' and 'denominator' are unsigned 16-bit integer types that can never be less than 0. Thus fixing the below: drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:62 vpe_u1_8_from_fraction() warn: unsigned 'numerator' is never less than zero. drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:63 vpe_u1_8_from_fraction() warn: unsigned 'denominator' is never less than zero. Cc: Peyton Lee <[email protected]> Cc: Lang Yu <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Peyton Lee <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
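Because both inputs are unsigned, a fraction-to-fixed-point conversion needs no sign handling at all. The following is a simplified sketch of a U1.8 conversion (1 integer bit, 8 fractional bits) under that assumption; it is not the driver's algorithm, and the name `demo_u1_8_from_fraction` and the truncating, clamping behaviour are choices made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_U1_8_MAX 0x1FFu /* largest value representable in 9 bits */

/* Convert numerator/denominator to U1.8, truncating and clamping.
 * No "< 0" checks are needed: uint16_t values are never negative. */
uint16_t demo_u1_8_from_fraction(uint16_t numerator, uint16_t denominator)
{
    uint32_t scaled;

    assert(denominator != 0);
    /* Widen before shifting so the scaled numerator cannot overflow. */
    scaled = ((uint32_t)numerator << 8) / denominator;
    return (uint16_t)(scaled > DEMO_U1_8_MAX ? DEMO_U1_8_MAX : scaled);
}
```

For example, 1/2 maps to 128 (0x080) and 1/1 maps to 256 (0x100), while any ratio of 2 or more saturates at 0x1FF.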
2024-01-15drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'Srinivasan Shanmugam1-7/+14
The amdgpu_gmc_vram_checking() function in emulation checks whether all of the memory range of shared system memory could be accessed by GPU, from this aspect, -EIO is returned for error scenarios. Fixes the below: drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:919 gmc_v6_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1103 gmc_v7_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1223 gmc_v8_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2344 gmc_v9_0_hw_init() warn: missing error code? 'r' Cc: Xiaojian Du <[email protected]> Cc: Lijo Lazar <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
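The "missing error code" warnings arise when a caller bails out on a failed check without a meaningful negative return value. A minimal sketch of the corrected shape follows, assuming a hypothetical pattern check over an ordinary buffer; `demo_check_pattern` and `demo_hw_init` are illustrative names, not driver functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Return 0 if every word matches the pattern, -EIO on the first mismatch. */
int demo_check_pattern(const uint32_t *buf, size_t count, uint32_t pattern)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < count; i++) {
        if (buf[i] != pattern)
            return -EIO;
    }
    return 0;
}

/* Caller propagates the negative errno instead of returning success. */
int demo_hw_init(const uint32_t *buf, size_t count)
{
    int r = demo_check_pattern(buf, count, 0xA5A5A5A5u);

    if (r)
        return r;
    return 0;
}
```

With the checker returning a negative errno directly, each init path can simply forward `r`.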
2024-01-15drm/amdgpu: Do not program VM_L2_CNTL under SRIOVVictor Lu1-4/+6
VM_L2_CNTL* should not be programmed on driver unload under SRIOV. These regs are skipped during SRIOV driver init. Signed-off-by: Victor Lu <[email protected]> Reviewed-by: Vignesh Chander <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: update ATHUB_MISC_CNTL offset for athub v3.3Yifan Zhang1-0/+8
Update the ATHUB_MISC_CNTL offset for athub v3.3. v2: correct a typo (Tim) v3: correct patch title (Lang) Signed-off-by: Yifan Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: fall back to INPUT power for AVG power via INFO IOCTLAlex Deucher1-1/+6
For backwards compatibility with userspace. Fixes: 47f1724db4fe ("drm/amd: Introduce `AMDGPU_PP_SENSOR_GPU_INPUT_POWER`") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2897 Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
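The fallback can be pictured as "try the average-power sensor, and if that fails, answer with input power instead". The sketch below models this with a callback; the enum, the callback type, and the sample board are all hypothetical and do not reflect the driver's sensor API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum demo_sensor {
    DEMO_SENSOR_AVG_POWER,
    DEMO_SENSOR_INPUT_POWER,
};

typedef int (*demo_read_fn)(enum demo_sensor sensor, uint32_t *out);

/* Hypothetical board that only reports input power. */
int demo_input_only_board(enum demo_sensor sensor, uint32_t *out)
{
    if (sensor != DEMO_SENSOR_INPUT_POWER)
        return -EOPNOTSUPP;
    *out = 42;
    return 0;
}

/* Serve an average-power query, falling back to input power on failure. */
int demo_query_avg_power(demo_read_fn read_sensor, uint32_t *out)
{
    int r;

    assert(read_sensor != NULL && out != NULL);
    r = read_sensor(DEMO_SENSOR_AVG_POWER, out);
    if (r)
        r = read_sensor(DEMO_SENSOR_INPUT_POWER, out);
    return r;
}
```

Older userspace that only knows the average-power query thus keeps receiving a value on chips that expose input power alone.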
2024-01-09drm/amdgpu: make a correction on commentJames Zhu1-1/+1
Use a generic comment for AMDGPU_VM_RESERVED_VRAM size. Signed-off-by: James Zhu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>