aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2019-08-02drm/amdgpu: replace AMDGPU_RAS_UE with AMDGPU_RAS_SUCCESSTao Zhou4-4/+4
ce can also trigger interrupt, and even both ce and ue error can be found in one ras query, distinguishing between ce and ue in interrupt handler is uncessary. Signed-off-by: Tao Zhou <[email protected]> Suggested-by: Guchun Chen <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: only uncorrectable error needs gpu resetTao Zhou1-1/+5
we only read error information for correctable error in interrupt handler, gpu reset is unnecessary since there is no data lost in correctable error Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: update the calc algorithm of umc ecc error countTao Zhou1-4/+6
the initial value of ecc error count can be adjusted Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: implement umc ras init functionTao Zhou2-0/+39
enable umc ce interrupt and initialize ecc error count Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: support ce interrupt in ras moduleTao Zhou1-4/+8
correctable error can also trigger interrupt in some ras blocks Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: add error address query for umc rasTao Zhou2-0/+10
umc error address query can get ce/ue error address and clear error status Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: apply umc_for_each_channel macro to umc_6_1Tao Zhou1-56/+28
use umc_for_each_channel to make code simpler Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: add macro of umc for each channelTao Zhou1-0/+23
common function for all umc versions, loop for each umc channel is a frequent used operation in umc block, define it as a macro to simplify code Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: initialize new parameters and functions for amdgpu_umc structureTao Zhou3-3/+17
add initialization for new members of amdgpu_umc structure Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: add more parameters and functions to amdgpu_umc structureTao Zhou2-0/+15
expose more parameters and functions of specific umc version to common umc layer, so amdgpu_umc layer and other blocks could access them Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: remove the clear of MCA_ADDRTao Zhou1-2/+0
clearing MCA_STATUS is enough to reset the whole MCA, writing zero to MCA_ADDR is unnecessary Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: fix unsigned variable instance compared to less than zeroColin Ian King1-1/+2
Currenly the error check on variable instance is always false because it is a uint32_t type and this is never less than zero. Fix this by making it an int type. Addresses-Coverity: ("Unsigned compared against 0") Fixes: 7d0e6329dfdc ("drm/amdgpu: update more sdma instances irq support") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdkfd: Extend CU mask to 8 SEs (v3)Jay Cornwall1-0/+4
Following bitmap layout logic introduced by: "drm/amdgpu: support get_cu_info for Arcturus". v2: squash in fixup for gfx_v9_0.c (Alex) v3: squash in debug print output fix Signed-off-by: Jay Cornwall <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: support get_cu_info for ArcturusLe Ma1-7/+28
This change is because SE/SH layout on Arcturus is 8*1, different from 4*2(or 4*1) on Vega ASICs. Currently the cu bitmap array is 4x4 size, and besides the bitmap is used widely across SW stack. To mostly reduce the scale of impact, we make the cu bitmap array compatible with SE/SH layout on Arcturus. Then the store of cu bits of each shader array for Arcturus will be like below: SE0,SH0 --> bitmap[0][0] SE1,SH0 --> bitmap[1][0] SE2,SH0 --> bitmap[2][0] SE3,SH0 --> bitmap[3][0] SE4,SH0 --> bitmap[0][1] SE5,SH0 --> bitmap[1][1] SE6,SH0 --> bitmap[2][1] SE7,SH0 --> bitmap[3][1] Signed-off-by: Le Ma <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: Fix pcie_bw on Vega20Kent Russell1-8/+52
The registers used for VG20 are different in that certain performance counters were split off to TXCLK3/4. Vega10/12 doesn't have this, so add a new vg20_get_pcie_usage to reflect this change. Signed-off-by: Kent Russell <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: Add amdgpu_asic_funcs.reset_method for Vega20Andrey Grodzovsky1-0/+1
Fixes GPU reset crash. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: Mark KFD VRAM allocations for wipe on releaseFelix Kuehling1-1/+1
Memory used by KFD applications can contain sensitive information that should not be leaked to other processes. The current approach to prevent leaks is to clear VRAM at allocation time. This is not effective because memory can be reused in other ways without being cleared. Synchronously clearing memory on the allocation path also carries a significant performance penalty. Stop clearing memory at allocation time. Instead mark the memory for wipe on release. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: Implement VRAM wipe on releaseFelix Kuehling4-3/+56
Wipe VRAM memory containing sensitive data when moving or releasing BOs. Clearing the memory is pipelined to minimize any impact on subsequent memory allocation latency. Use of a poison value should help debug future use-after-free bugs. When moving BOs, the existing ttm_bo_pipelined_move ensures that the memory won't be reused before being wiped. When releasing BOs, the BO is fenced with the memory fill operation, which results in queuing the BO for a delayed delete. v2: Move amdgpu_amdkfd_unreserve_memory_limit into amdgpu_bo_release_notify so that KFD can use memory that's still being cleared in the background Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: fix double ucode load by PSP(v3)Monk Liu1-21/+38
previously the ucode loading of PSP was repreated, one executed in phase_1 init/re-init/resume and the other in fw_loading routine Avoid this double loading by clearing ip_blocks.status.hw in suspend or reset prior to the FW loading and any block's hw_init/resume v2: still do the smu fw loading since it is needed by bare-metal v3: drop the change in reinit_early_sriov, just clear all block's status.hw in the head place and set the status.hw after hw_init done is enough Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Emily Deng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: fix incorrect judge on sos fw versionMonk Liu1-1/+1
for SRIOV the SOS fw of PSP is loaded in hypervisor thus guest won't tell the version of it, and judging feature by reading the sos fw version in guest side is completely wrong Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Emily Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: cleanup vega10 SRIOV code pathMonk Liu11-118/+38
we can simplify all those unnecessary function under SRIOV for vega10 since: 1) PSP L1 policy is by force enabled in SRIOV 2) original logic always set all flags which make itself a dummy step besides, 1) the ih_doorbell_range set should also be skipped for VEGA10 SRIOV. 2) the gfx_common registers should also be skipped for VEGA10 SRIOV. Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Emily Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02Merge tag 'drm-fixes-5.3-2019-07-31' of ↵Dave Airlie4-41/+64
git://people.freedesktop.org/~agd5f/linux into drm-fixes drm-fixes-5.3-2019-07-31: amdgpu: - Fix temperature granularity for navi - Fix stable pstate setting for navi - Fix VCN DPM enablement on navi - Fix error handling on CS ioctl when processing dependencies - Fix possible information leak in debugfs amdkfd: - fix memory alignment for VegaM Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-07-31drm/amdkfd: enable KFD support for navi14Alex Deucher1-0/+1
Same as navi10. Reviewed-by: Xiaojie Yuan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: disable inject for failed subblocks of gfxDennis Li1-116/+165
some subblocks of gfx fail in inject test, disable them Signed-off-by: Dennis Li <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: support gfx ras error injection and err_cnt queryDennis Li2-3/+18
check gfx error count in both ras querry function and ras interrupt handler. gfx ras is still disabled by default due to known stability issue found in gpu reset. Signed-off-by: Dennis Li <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: add RAS callback for gfxDennis Li2-1/+531
Add functions for RAS error inject and query error counter Signed-off-by: Dennis Li <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: add define for gfx ras subblockDennis Li2-0/+431
Signed-off-by: Dennis Li <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: remove ras_reserve_vram in ras injectionTao Zhou1-11/+10
error injection address is not in gpu address space Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: add check for ras error typeTao Zhou1-3/+8
only ue and ce errors are supported Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: update interrupt callback for all ras clientsTao Zhou3-2/+6
add err_data parameter in interrupt cb for ras clients Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: allow ras interrupt callback to return error dataTao Zhou2-21/+22
add error data as parameter for ras interrupt cb and process it Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: query umc ras error addressTao Zhou1-0/+80
query umc ras error address, translate it to gpu 4k page view and save it. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: add structures for umc error address translationTao Zhou2-0/+12
add related registers, callback function and channel index table Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: add support for recording ras error addressTao Zhou2-1/+3
more than one error address may be recorded in one query Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: update algorithm of umc uncorrectable error countingTao Zhou1-6/+6
remove the check of ErrorCodeExt v2: refine the if condition for ue counting Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: switch to amdgpu_umc structureTao Zhou4-6/+16
create new amdgpu_umc structure to for more umc settings in future and switch to the new structure Signed-off-by: Tao Zhou <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: use 64bit operation macros for umcTao Zhou1-17/+8
replace some 32bit macros with 64bit operations to simplify code Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: add RREG64/WREG64(_PCIE) operationsTao Zhou3-0/+129
add 64 bits register access functions v2: implement 64 bit functions in low level Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: add ras error count after each query (v2)Tao Zhou1-0/+11
v1: increase ras ce/ue error count v2: log the number of correctable and uncorrectable errors Signed-off-by: Tao Zhou <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: querry umc error countHawking Zhang2-1/+13
check umc error count in both ras querry function and ras interrupt handler Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: init umc v6_1 functions for vega20Hawking Zhang1-0/+14
init umc callback function for vega20 in sw early init phase Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: add umc v6_1 query error count supportHawking Zhang3-0/+205
Implement umc query_ras_error_count function to support querry both correctable and uncorrectable error Signed-off-by: Hawking Zhang <[email protected]> Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: add amdgpu_umc_functions structureHawking Zhang2-0/+31
This is common structure as UMC callback function Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: init RSMU and UMC ip base address for vega20Hawking Zhang2-0/+4
the driver needs to program RSMU and UMC registers to support vega20 RAS feature Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: move some ras data structure to amdgpu_ras.hHawking Zhang2-69/+68
These are common structures that can be included by IP specific source files Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: drop drmP.h from vcn_v2_5.cAlex Deucher1-1/+1
Unused. Acked-by: Sam Ravnborg <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: drop drmP.h from vcn_v2_0.cAlex Deucher1-2/+2
And fix the fallout. Acked-by: Sam Ravnborg <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: drop drmP.h from sdma_v5_0.cAlex Deucher1-3/+6
And fix the fallout. Acked-by: Sam Ravnborg <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: drop drmP.h from nv.cAlex Deucher1-1/+2
And fix up the fallout. Acked-by: Sam Ravnborg <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-31drm/amdgpu: drop drmP.h from navi10_ih.cAlex Deucher1-1/+2
And fix the fallout. Acked-by: Sam Ravnborg <[email protected]> Signed-off-by: Alex Deucher <[email protected]>