aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
AgeCommit message (Collapse)AuthorFilesLines
2017-12-18drm/amdgpu: rename amdgpu_gpu_recoverAlex Deucher1-1/+1
add device to the name for consistency. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-15drm/amdgpu: Simplify amdgpu_lockup_timeout usage.Andrey Grodzovsky1-1/+1
With introduction of amdgpu_gpu_recovery we don't need any more to rely on amdgpu_lockup_timeout == 0 for disabling GPU reset. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-15drm/amdgpu: Add gpu_recovery parameterAndrey Grodzovsky1-1/+1
Add new parameter to control GPU recovery procedure. v2: Add auto logic where reset is disabled for bare metal and enabled for SR-IOV. Allow forced reset from debugfs. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-08drm/admgpu: Reduce the usage of soc15ip.hShaoyun Liu1-1/+0
Remove the header where it's not used. Acked-by: Christian Konig <christian.koenig@amd.com> Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-06drm/amd/include:cleanup vega10 header files.Feifei Xu1-1/+1
Remove asic_reg/vega10 folder. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-06drm/amd/include:cleanup vega10 nbio header files.Feifei Xu1-2/+2
Cleanup asic_reg/vega10/NBIO folder. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-06drm/amd/include:cleanup vega10 gc header files.Feifei Xu1-2/+2
Cleanup asic_reg/vega10/GC folder. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-04drm/amdgpu:fix random missing of FLR NOTIFYMonk Liu1-3/+11
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-04drm/amdgpu:implement new GPU recover(v3)Monk Liu1-1/+1
1,new imple names amdgpu_gpu_recover which gives more hint on what it does compared with gpu_reset 2,gpu_recover unify bare-metal and SR-IOV, only the asic reset part is implemented differently 3,gpu_recover will increase hang job karma and mark its entity/context as guilty if exceeds limit V2: 4,in scheduler main routine the job from guilty context will be immedialy fake signaled after it poped from queue and its fence be set with "-ECANCELED" error 5,in scheduler recovery routine all jobs from the guilty entity would be dropped 6,in run_job() routine the real IB submission would be skipped if @skip parameter equales true or there was VRAM lost occured. V3: 7,replace deprecated gpu reset, use new gpu recover Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-12-04drm/amdgpu/virt: implement wait_reset callbacks for vi/aipding1-0/+1
Reviewed-by: Monk Liu <monk.liu@amd.com> Signed-off-by: pding <Pixel.Ding@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-10-19drm/amdgpu: SR-IOV data exchange between PF&VFHorace Chen1-0/+6
SR-IOV need to exchange some data between PF&VF through shared VRAM PF will copy some necessary firmware and information to the shared VRAM. It also requires some information from VF. PF will send a key through mailbox2 to help guest calculate checksum so that it can verify whether the data is correct. So check the data on the specified offset of the shared VRAM, if the checksum is right, read values from it and write some VF information next to the data from PF. Signed-off-by: Horace Chen <horace.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-07-14drm/amdgpu: Support passing amdgpu critical error to host via GPU Mailbox.Gavin Wan1-20/+26
This feature works for SRIOV enviroment. For non-SRIOV enviroment, the trans_error function does nothing. The error information includes error_code (16bit), error_flags(16bit) and error_data(64bit). Since there are not many errors, we keep the errors in an array and transfer all errors to Host before amdgpu initialization function (amdgpu_device_init) exit. Signed-off-by: Gavin Wan <Gavin.Wan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-05-24drm/amdgpu:only call flr_work under infinite timeoutMonk Liu1-6/+9
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-05-24drm/amdgpu:use job* to replace voluntaryMonk Liu1-1/+1
that way we can know which job cause hang and can do per sched reset/recovery instead of all sched. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-05-24drm/amdgpu/virt: change AI ack-irq message to debug levelXiangliang Yu1-1/+1
Change message to debug level as VI does. Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Reviewed-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-05-24drm/amdgpu:need som change on vega10 mailboxMonk Liu1-8/+10
if sriov gpu reset is invoked by job timeout, it is run in a global work-queue which is very slow and better not call msleep ortherwise it takes long time to get back CPU. so make below changes: 1: Change msleep 1 to mdelay 5 2: Ignore the ack fail from pf after time out, because VF FLR will clear ack, sometime VF FLR is done prior to the beginning of poll_ack so we can ignore this ack TODO: Put job_timedout (and the following gpu reset) in a driver thread, instead of the global work_struct. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-05-24drm/amdgpu:fix cannot receive rcv/ack irq bugMonk Liu1-2/+2
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-04-06drm/amdgpu:implement the reset MB func for vega10Monk Liu1-0/+133
they are lack in the bringup stage, we need them for GPU reset feature. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-04-06drm/amdgpu:fix typo for mxgpu_aiMonk Liu1-2/+2
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-29drm/amdgpu/virt: impl mailbox for aiXiangliang Yu1-0/+207
Implement mailbox protocol for AI so that guest vf can communicate with GPU hypervisor. Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>