aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2019-09-13drm/amdgpu: Hook EEPROM table to RASTao Zhou1-28/+81
support eeprom records load and save for ras, move EEPROM records storing to bad page reserving v2: remove redundant check for con->eh_data Signed-off-by: Tao Zhou <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: change ras bps type to eeprom table record structureTao Zhou2-27/+43
change bps type from retired page to eeprom table record, prepare for saving umc error records to eeprom Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/madgpu: Fix EEPROM Checksum calculation.Andrey Grodzovsky1-2/+2
Fix typo which messed up the calculation. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-and-tested-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: Remove clock gating restore.Andrey Grodzovsky1-1/+9
Restoring clock gating break SMU opeartion afterwards, avoid this until this further invistigated with SMU. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-and-tested-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdkfd: Query kfd device info by CHIP id instead of pci device idYong Zhao3-4/+7
This optimizes out the pci device id usage in KFD and makes the code more maintainable. Signed-off-by: Yong Zhao <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: Disable page faults while reading user wptrsFelix Kuehling1-0/+8
These wptrs must be pinned and GPU accessible when this is called from hqd_load functions. So they should never fault. This resolves a circular lock dependency issue involving four locks including the DQM lock and mmap_sem. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oak Zeng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: clean up load TMR sequenceJohn Clements1-6/+0
Removed redundant goto statement Signed-off-by: John Clements <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: enable TA load support in ArcturusJohn Clements1-1/+2
Add support for loading XGMI/RAS FW Signed-off-by: John Clements <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: change r type to int in gmc_v9_0_late_initTao Zhou1-1/+1
change r type from bool to int, suitable for both bool and int return value Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amd/amdgpu: add sw_fini interface for df_funcsJack Zhang4-0/+17
add sw_fini interface of df_funcs. This interface will remove sysfs file of df_cntr_avail function. The old behavior only create sysfs of df_cntr_avail in sw_init, but never remove it for lack of sw_fini interface. With this,driver will report create sysfs fail when it's loaded for the second time. Signed-off-by: Jack Zhang <[email protected]> Reviewed-by: Jonathan Kim <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: init UMC & RSMU register base addressHawking Zhang1-0/+2
UMC RAS feature requires access to UMC & RSMU registers Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu/nbio: switch to amdgpu_nbio_ras_late_init helper functionHawking Zhang4-49/+75
amdgpu_nbio_ras_late_init is used to init nbio specfic ras debugfs/sysfs node and nbio specific interrupt handler. It can be shared among nbio generations Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu/mmhub: switch to amdgpu_mmhub_ras_late_init helper functionHawking Zhang4-32/+60
amdgpu_mmhub_ras_late_init is used to init mmhub specfic ras debugfs/sysfs node and mmhub specific interrupt handler. It can be shared among mmhub generations Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu/sdma: switch to amdgpu_sdma_ras_late_init helper functionHawking Zhang3-41/+55
amdgpu_sdma_ras_late_init is used to init sdma specfic ras debugfs/sysfs node and sdma specific interrupt handler. It can be shared among sdma generations Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu/gfx: switch to amdgpu_gfx_ras_late_init helper functionHawking Zhang3-35/+54
amdgpu_gfx_ras_late_init is used to init gfx specfic ras debugfs/sysfs node and gfx specific interrupt handler. It can be shared among gfx generations Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu/gmc: switch to amdgpu_gmc_ras_late_init helper functionHawking Zhang3-34/+53
amdgpu_gmc_ras_late_init is used to init gmc specfic ras debugfs/sysfs node and gmc specific interrupt handler. It can be shared among gmc generations. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: set ip specific ras interface pointer to NULL after free itHawking Zhang5-7/+24
to prevent access to dangling pointers Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13dmr/amdgpu: Add system auto reboot to RAS.Andrey Grodzovsky3-2/+23
In case of RAS error allow user configure auto system reboot through ras_ctrl. This is also part of the temproray work around for the RAS hang problem. v4: Use latest kernel API for disk sync. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: Avoid HW GPU reset for RAS.Andrey Grodzovsky12-42/+155
Problem: Under certain conditions, when some IP bocks take a RAS error, we can get into a situation where a GPU reset is not possible due to issues in RAS in SMU/PSP. Temporary fix until proper solution in PSP/SMU is ready: When uncorrectable error happens the DF will unconditionally broadcast error event packets to all its clients/slave upon receiving fatal error event and freeze all its outbound queues, err_event_athub interrupt will be triggered. In such case and we use this interrupt to issue GPU reset. THe GPU reset code is modified for such case to avoid HW reset, only stops schedulers, deatches all in progress and not yet scheduled job's fences, set error code on them and signals. Also reject any new incoming job submissions from user space. All this is done to notify the applications of the problem. v2: Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c Remove print param from amdgpu_ras_query_error_count v3: Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset for other XGMI hive memebers. Signed-off-by: Andrey Grodzovsky <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: Fix bugs in amdgpu_device_gpu_recover in XGMI case.Andrey Grodzovsky1-13/+10
Issue 1: In XGMI case amdgpu_device_lock_adev for other devices in hive was called to late, after access to their repsective schedulers. So relocate the lock to the begining of accessing the other devs. Issue 2: Using amdgpu_device_ip_need_full_reset to switch the device list from all devices in hive to the single 'master' device who owns this reset call is wrong because when stopping schedulers we iterate all the devices in hive but when restarting we will only reactivate the 'master' device. Also, in case amdgpu_device_pre_asic_reset conlcudes that full reset IS needed we then have to stop schedulers for all devices in hive and not only the 'master' but with amdgpu_device_ip_need_full_reset we already missed the opprotunity do to so. So just remove this logic and always stop and start all schedulers for all devices in hive. Also minor cleanup and print fix. v4: Minor coding style fix. Signed-off-by: Andrey Grodzovsky <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: remove amdgpu_cs_try_evictChristian König2-71/+1
Trying to evict things from the current working set doesn't work that well anymore because of per VM BOs. Rely on reserving VRAM for page tables to avoid contention. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: reserve at least 4MB of VRAM for page tables v2Christian König3-8/+22
This hopefully helps reduce the contention for page tables. v2: adjust maximum reported VRAM size as well Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: use moving fence instead of exclusive for VM updatesChristian König1-1/+1
Make VM updates depend on the moving fence instead of the exclusive one. Makes it less likely to actually have a dependency. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: only apply gds clearing workaround when ras is supportedHawking Zhang1-0/+4
gds clearing workaround should only be applied on asics that support gfx ras Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: fix memory leak when ras is not supported on specific ip blockHawking Zhang4-4/+7
free ras_if if ras is not supported Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: check mmhub_funcs pointer before refering to itHawking Zhang1-1/+1
mmhub callback functions are not initialized for all the ASICs Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: Remove unnecessary TLB workaround (v2)Felix Kuehling1-16/+1
This workaround is better handled in user mode in a way that doesn't require allocating extra memory and breaking userptr BOs. The TLB bug is a performance bug, not a functional or security bug. Hence it is safe to remove this kernel part of the workaround to allow a better workaround using only virtual address alignments in user mode. v2: Removed VI_BO_SIZE_ALIGN definition Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: Use optimal mtypes and PTE bits for ArcturusFelix Kuehling2-2/+22
For compute VRAM allocations on Arturus use the new RW mtype for non-coherent local memory, CC mtype for coherent local memory and PTE_SNOOPED bit for invalidating non-dirty cache lines on remote XGMI mappings. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Tested-by: Amber Lin <[email protected]> Reviewed-by: Shaoyun Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: Determing PTE flags separately for each mapping (v3)Felix Kuehling2-17/+24
The same BO can be mapped with different PTE flags by different GPUs. Therefore determine the PTE flags separately for each mapping instead of storing them in the KFD buffer object. Add a helper function to determine the PTE flags to be extended with ASIC and memory-type-specific logic in subsequent commits. v2: Split Arcturus-specific MTYPE changes into separate commit v3: Fix return type of get_pte_flags to uint64_t Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Shaoyun Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: Support new arcturus mtypeOak Zeng1-0/+3
Arcturus repurposed mtype WC to RW. Modify gmc functions to support the new mtype Signed-off-by: Oak Zeng <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Shaoyun Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: switch to amdgpu_ras_late_init for nbio v7_4 (v2)Hawking Zhang1-1/+12
call helper function in late init phase to handle ras init for nbio ip block v2: init local var r to 0 in case the function return failure on asics that don't have ras_late_init implementation Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: add ras_late_init callback function for nbio v7_4 (v3)Hawking Zhang2-0/+47
ras_late_init callback function will be used to do common ras init in late init phase. v2: call ras_late_fini to do cleanup when fails to enable interrupt v3: rename sysfs/debugfs node name to pcie_bif_xxx Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: add mmhub ras_late_init callback function (v2)Hawking Zhang3-22/+35
The function will be called in late init phase to do mmhub ras init v2: check ras_late_init function pointer before invoking the function Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: switch to amdgpu_ras_late_init for gmc v9 block (v2)Hawking Zhang1-112/+47
call helper function in late init phase to handle ras init for gmc ip block v2: call ras_late_fini to do clean up when fail to enable interrupt Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: switch to amdgpu_ras_late_init for sdma v4 block (v2)Hawking Zhang1-74/+24
call helper function in late init phase to handle ras init for sdma ip block v2: call ras_late_fini to do clean up when fail to enable interrupt Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: switch to amdgpu_ras_late_init for gfx v9 block (v2)Hawking Zhang1-71/+21
call helper function in late init phase to handle ras init for gfx ip block v2: call ras_late_fini to do clean up when fail to enable interrupt Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: add helper function to do common ras_late_init/fini (v3)Hawking Zhang2-0/+79
In late_init for ras, the helper function will be used to 1). disable ras feature if the IP block is masked as disabled 2). send enable feature command if the ip block was masked as enabled 3). create debugfs/sysfs node per IP block 4). register interrupt handler v2: check ih_info.cb to decide add interrupt handler or not v3: add ras_late_fini for cleanup all the ras fs node and remove interrupt handler Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: poll ras_controller_irq and err_event_athub_irq statusHawking Zhang1-0/+12
For the hardware that can not enable BIF ring for IH cookies for both ras_controller_irq and err_event_athub_irq, the driver has to poll the status register in irq handling and ack the hardware properly when there is interrupt triggered Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: add ras_controller and err_event_athub interrupt supportHawking Zhang3-0/+143
Ras controller interrupt and Ras err event athub interrupt are two dedicated interrupts for RAS support. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu/nbio: add functions to query ras specific interrupt statusHawking Zhang2-0/+34
ras_controller_interrupt and err_event_interrupt are ras specific interrupts. add functions to check their status and ack them if they are generated. both funcitons should only be invoked in ISR when BIF ring is disabled or even not initialized. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: switch to new amdgpu_nbio structureHawking Zhang22-148/+102
no functional change, just switch to new structures Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: add new amdgpu nbio header fileHawking Zhang1-0/+87
More nbio funcitonalities will be added and nbio could be treated as an ip block like gfx/sdma.etc Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-11drm/amdgpu: switch to gem vma offset managerGerd Hoffmann1-1/+1
Pass gem vma_offset_manager to ttm_bo_device_init(), so ttm uses it instead of its own embedded struct. This makes some gem functions (specifically drm_gem_object_lookup) work on ttm objects. Signed-off-by: Gerd Hoffmann <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Reviewed-by: Christian König <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-09-11drm/ttm: turn ttm_bo_device.vma_manager into a pointerGerd Hoffmann1-0/+1
Rename the embedded struct vma_offset_manager, new name is _vma_manager. ttm_bo_device.vma_manager changed to a pointer. The ttm_bo_device_init() function gets an additional vma_manager argument which allows to initialize ttm with a different vma manager. When passing NULL the embedded _vma_manager is used. All callers are updated to pass NULL, so the behavior doesn't change. Signed-off-by: Gerd Hoffmann <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Reviewed-by: Christian König <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-09-06Merge tag 'drm-next-5.4-2019-08-30' of ↵Dave Airlie22-71/+1460
git://people.freedesktop.org/~agd5f/linux into drm-next drm-next-5.4-2019-08-30: amdgpu: - Add DC support for Renoir - Add some GPUVM hw bug workarounds - add support for the smu11 i2c controller - GPU reset vram lost bug fixes - Navi1x powergating fixes - Navi12 power fixes - Renoir power fixes - Misc bug fixes and cleanups Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-08-30drm/amdgpu: Fix undefined dm_ip_block for navi12Petr Cvek1-0/+2
There is missing "if defined" CONFIG_DRM_AMD_DC block for non DC configurations. This will cause link error. The patch is fixing that. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=110979 Signed-off-by: Petr Cvek <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-30drm/amdgpu: fix no interrupt issue for renoir emu (v2)Aaron Liu1-8/+10
In renoir's vega10_ih model, there's a security change in mmIH_CHICKEN register, that limits IH to use physical address (FBPA, GPA) directly. Those chicken bits need to be programmed first. Signed-off-by: Aaron Liu <[email protected]> Reviewed-by: Huang Rui <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-30drm/amdgpu: Handle job is NULL use case in amdgpu_device_gpu_recoverAndrey Grodzovsky1-6/+4
This should be checked at all places job is accessed. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-29drm/amdgpu: Enable DC on RenoirRoman Li2-0/+9
Enable DC support for renoir. Acked-by: Harry Wentland <[email protected]> Signed-off-by: Roman Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-29drm/amdgpu: Initialize and update SDMA power gatingPrike Liang1-0/+1
Init SDMA HW base configuration and enable idle INT for rn. Signed-off-by: Prike Liang <[email protected]> Reviewed-by: Aaron Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>