aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2024-01-18drm/amdgpu: Remove unnecessary NULL checkFelix Kuehling1-1/+1
A static checker pointed out, that bo_va->base.bo was already derefenced earlier in the same scope. Therefore this check is unnecessary here. Reported-by: Dan Carpenter <[email protected]> Fixes: 50661eb1a2c8 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Reviewed-by: Kent Russell <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-18drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"Christian König4-65/+1
Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the HW in an active state and is an unbalanced use of the IP callbacks. Using the IP callbacks like this can lead to memory leaks, double free and imbalanced reference counters. Leaving the HW in an active state can lead to DMA accesses to memory now freed by the driver. Both is a complete no-go for driver unload so completely revert the workaround for now. This reverts commit f5c7e7797060255dbc8160734ccc5ad6183c5e04. Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-18drm/amdgpu: Remove usage of the deprecated ida_simple_xx() APIChristophe JAILLET1-4/+3
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). Note that the upper limit of ida_simple_get() is exclusive, but the one of ida_alloc_range() is inclusive. So a -1 has been added when needed. Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-18drm/amdkfd: init drm_client with funcs hookFlora Cui1-1/+4
otherwise drm_client_dev_unregister() would try to kfree(&adev->kfd.client). Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Signed-off-by: Flora Cui <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-18drm/amdgpu: Clean up errors in umc_v6_0.cchenxuebing1-1/+1
Fix the following errors reported by checkpatch: ERROR: space required after that ',' (ctx:VxV) Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-18drm/amdgpu: Clean up errors in clearstate_si.hchenxuebing1-16/+8
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-18drm: remove I2C_CLASS_DDC supportHeiner Kallweit1-1/+0
After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC. Class-based device auto-detection is a legacy mechanism and shouldn't be used in new code. So we can remove this class completely now. Acked-by: Alex Deucher <[email protected]> Acked-by: Dmitry Baryshkov <[email protected]> Acked-by: Harry Wentland <[email protected]> Acked-by: Heiko Stuebner <[email protected]> Acked-by: Jani Nikula <[email protected]> Acked-by: Jernej Skrabec <[email protected]> Reviewed-by: Thomas Zimmermann <[email protected]> Reviewed-by: Andi Shyti <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Reviewed-by: Neil Armstrong <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in amdgpu.hchenxuebing1-6/+3
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in amdgpu_gmc.cchenxuebing1-1/+1
Fix the following errors reported by checkpatch: ERROR: need consistent spacing around '-' (ctx:WxV) Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in jpeg_v2_5.cchenxuebing1-6/+4
Fix the following errors reported by checkpatch: ERROR: space required before the open parenthesis '(' ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in gfx_v9_4.cchenxuebing1-2/+3
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in amdgpu_drv.cchenxuebing1-2/+2
Fix the following errors reported by checkpatch: ERROR: do not initialise globals to 0 Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amd: Clean up errors in amdgpu_vkms.cchenxuebing1-2/+1
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Fix the null pointer when load rlc firmwareMa Jun1-9/+6
If the RLC firmware is invalid because of wrong header size, the pointer to the rlc firmware is released in function amdgpu_ucode_request. There will be a null pointer error in subsequent use. So skip validation to fix it. Signed-off-by: Ma Jun <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Centralize ras cap query to amdgpu_ras_check_supportedHawking Zhang1-77/+93
Move ras capablity check to amdgpu_ras_check_supported. Driver will query ras capablity through psp interace, or vbios interface, or specific ip callbacks. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Query ras capablity from psp v2Hawking Zhang3-0/+38
Instead of traditional atomfirmware interfaces for RAS capability, host driver can query ras capability from psp starting from psp v13_0_6. v2: drop redundant local variable from get_ras_capability. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in amdgpu_rlc.cchenxuebing1-1/+1
Fix the following errors reported by checkpatch: ERROR: space prohibited before that '++' (ctx:WxB) Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amd: Clean up errors in sdma_v2_4.cchenxuebing1-9/+6
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line ERROR: trailing statements should be on next line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amd/amdgpu: Clean up errors in amdgpu_umr.hchenxuebing1-2/+2
Fix the following errors reported by checkpatch: spaces required around that '=' (ctx:VxV) Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in amdgpu_atomfirmware.hchenxuebing1-1/+1
Fix the following errors reported by checkpatch: ERROR: "foo* bar" should be "foo *bar" Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in clearstate_gfx9.hchenxuebing1-18/+9
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in navi10_ih.cchenxuebing1-2/+1
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: check PS, WS indexAlexander Richards8-47/+75
Theoretically, it would be possible for a buggy or malicious VBIOS to overwrite past the bounds of the passed parameters (or its own workspace); add bounds checking to prevent this from happening. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3093 Signed-off-by: Alexander Richards <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Align ras block enum with firmwareHawking Zhang1-0/+2
Driver and firmware share the same ras block enum. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: replace MCA macro with ACA for XGMIYang Wang1-12/+12
use new ACA macro to instead of MCA Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Log deferred error separatelyCandice Li6-58/+139
Separate deferred error from UE and CE and log it individually. Signed-off-by: Candice Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Do bad page retirement for deferred errorsCandice Li1-6/+4
Needs to do bad page retirement for deferred errors. v2: Drop unused dev_info. Signed-off-by: YiPeng Chai <[email protected]> Signed-off-by: Candice Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add xgmi v6.4.0 ACA supportYang Wang1-1/+60
add xgmi v6.4.0 ACA driver support Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add mmhub v1.8 ACA supportYang Wang1-0/+87
v1: add mmhub v1.8 ACA driver support v2: use macro to define smn address value. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add sdma v4.4.2 ACA supportYang Wang1-0/+72
v1: add sdma v4.4.2 ACA driver support v2: use macro to define smn address value. v3: squash in fix for unbalanced irqs Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add gfx v9.4.3 ACA supportYang Wang1-0/+88
v1: add gfx v9.4.3 ACA driver support v2: use macro to define smn address value. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Check extended configuration space register when system uses ↵Ma Jun1-0/+4
large bar Some customer platforms do not enable mmconfig for various reasons, such as bios bug, and therefore cannot access the GPU extend configuration space through mmio. When the system enters the d3cold state and resumes, the amdgpu driver fails to resume because the extend configuration space registers of GPU can't be restored. At this point, Usually we only see some failure dmesg log printed by amdgpu driver, it is difficult to find the root cause. Therefor print a warnning message if the system can't access the extended configuration space register when using large bar. Signed-off-by: Ma Jun <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add umc v12.0 ACA supportYang Wang1-0/+58
add umc v12.0 ACA driver support Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add aca sysfs supportYang Wang4-1/+51
add aca sysfs node support Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add amdgpu ras aca query interfaceYang Wang4-18/+109
v1: add ACA error query interface v2: Add a new helper function to determine whether to use ACA or MCA. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: move kiq_reg_write_reg_wait() out of amdgpu_virt.cAlex Deucher7-70/+74
It's used for more than just SR-IOV now, so move it to amdgpu_gmc.c and rename it to better match the functionality and update the comments in the code paths to better document when each path is used and why. No functional change. Reviewed-by: Shaoyun.liu <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Cc: [email protected]
2024-01-15drm/amdgpu: add new INFO IOCTL query for input powerAlex Deucher1-0/+9
Some chips provide both average and input power. Previously we just exposed average power, add a new query for input power. Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Query boot status if boot failedHawking Zhang1-2/+9
Check and report firmware boot status if it doesn't reach steady status. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add ACA bank dump debugfs supportYang Wang4-0/+136
add ACA bank dump debugfs support Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add ACA kernel hardware error log supportYang Wang1-0/+29
add ACA kernel hardware error log support. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Query boot status if discovery failedHawking Zhang1-1/+5
Check and report boot status if discovery failed. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Add ras helper to query boot errors v2Hawking Zhang3-1/+110
Add ras helper function to query boot time gpu errors. v2: use aqua_vanjaram smn addressing pattern Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: implement RAS ACA driver frameworkYang Wang6-1/+881
v1: implement new RAS ACA driver code framework. v2: - rename aca_bank_set to aca_banks. - rename aca_source_xxx to aca_handle_xxx. v3: Optimize some function implementation details. (from Hawking's suggestion) Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Init pcie_index/data address as fallback (v2)Hawking Zhang1-5/+18
To allow using this helper for indirect access when nbio funcs is not available. For instance, in ip discovery phase. v2: define macro for pcie_index/data/index_hi fallback. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: drop psp v13 query_boot_status implementationHawking Zhang4-99/+0
Will replace it with new implementation to cover boot fails in ip discovery phase. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Yang Wang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Replace DRM_* with dev_* in amdgpu_psp.cHawking Zhang1-69/+75
So kernel message has the device pcie bdf information, which helps issue debugging especially in multiple GPU system. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Auto-validate DMABuf imports in compute VMsFelix Kuehling7-32/+135
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the process_info->kfd_bo_list. There is no explicit KFD API call to validate them or add eviction fences to them. This patch automatically validates and fences dymanic DMABuf imports when they are added to a compute VM. Revalidation after evictions is handled in the VM code. v2: * Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate * Eliminated evicted_user state, use evicted state for VM BOs and user BOs * Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs * Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into amdgpu_vm_validate, outside the vm->status_lock * Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds without KFD v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the reservation with its fences is shared with the export, as long as all imports are from KFD, with the exports already reserved, validated and fenced by the KFD restore worker. v5: Reintroduced separate evicted_user state to simplify the state machine and CS error handling when amdgpu_vm_validate is called without a ticket. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: drop exp hw support check for GC 9.4.3Alex Deucher1-2/+0
No longer needed. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] # 6.7.x
2024-01-15drm/amdgpu: move debug options init prior to amdgpu device initLe Ma1-2/+2
To bring debug options into effect in early initialization phase Signed-off-by: Le Ma <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: add debug flag to place fw bo on vram for frontdoor loadingLe Ma4-2/+10
Use debug_mask=0x8 param to help isolating data path issues on new systems in early phase. v2: rename the flag for explicitness (lijo) Signed-off-by: Le Ma <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>