aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
AgeCommit message (Collapse)AuthorFilesLines
2019-12-23drm/amdgpu: use true, false for bool variable in mxgpu_ai.czhengbin1-2/+2
Fixes coccicheck warning: drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c:253:2-20: WARNING: Assignment of 0/1 to bool variable drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c:265:2-20: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <[email protected]> Signed-off-by: zhengbin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-12-11drm/amd/powerplay: enable pp one vf mode for vega10Yintian Tao1-78/+0
Originally, due to the restriction from PSP and SMU, VF has to send message to hypervisor driver to handle powerplay change which is complicated and redundant. Currently, SMU and PSP can support VF to directly handle powerplay change by itself. Therefore, the old code about the handshake between VF and PF to handle powerplay will be removed and VF will use new the registers below to handshake with SMU. mmMP1_SMN_C2PMSG_101: register to handle SMU message mmMP1_SMN_C2PMSG_102: register to handle SMU parameter mmMP1_SMN_C2PMSG_103: register to handle SMU response v2: remove module parameter pp_one_vf v3: fix the parens v4: forbid vf to change smu feature v5: use hwmon_attributes_visible to skip sepicified hwmon atrribute v6: change skip condition at vega10_copy_table_to_smc Signed-off-by: Yintian Tao <[email protected]> Acked-by: Evan Quan <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-08-02drm/amdgpu: cleanup vega10 SRIOV code pathMonk Liu1-15/+0
we can simplify all those unnecessary function under SRIOV for vega10 since: 1) PSP L1 policy is by force enabled in SRIOV 2) original logic always set all flags which make itself a dummy step besides, 1) the ih_doorbell_range set should also be skipped for VEGA10 SRIOV. 2) the gfx_common registers should also be skipped for VEGA10 SRIOV. Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Emily Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-06-11drm/amdgpu: Hardcode reg access using L1 securityTrigger Huang1-9/+6
Under Vega10 SR-IOV VF, L1 register access mode should be enabled by default as the non-security VF will no longer be supported. Signed-off-by: Trigger Huang <[email protected]> Reviewed-by: Emily Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-05-24drm/amdgpu: init vega10 SR-IOV reg access modeTrigger Huang1-0/+19
Set different register access mode according to the features provided by firmware Signed-off-by: Trigger Huang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-05-24drm/amdgpu: enable separate timeout setting for every ring type V4Evan Quan1-1/+1
Every ring type can have its own timeout setting. - V2: update lockup_timeout parameter format and cosmetic fixes - V3: invalidate 0 and negative values - V4: update lockup_timeout parameter format Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-05-06drm/amdgpu: Add IDH_QUERY_ALIVE event for SR-IOVTrigger Huang1-0/+3
SR-IOV host side will send IDH_QUERY_ALIVE to guest VM to check if this guest VM is still alive (not destroyed). The only thing guest KMD need to do is to send ACK back to host. Signed-off-by: Trigger Huang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-04-10drm/amdgpu: support dpm level modification under virtualization v3Yintian Tao1-0/+78
Under vega10 virtualuzation, smu ip block will not be added. Therefore, we need add pp clk query and force dpm level function at amdgpu_virt_ops to support the feature. v2: add get_pp_clk existence check and use kzalloc to allocate buf v3: return -ENOMEM for allocation failure and correct the coding style Signed-off-by: Yintian Tao <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-02-13drm/amdgpu: tighten gpu_recover in mailbox_flr to avoid duplicate recover in ↵wentalou1-1/+2
sriov sriov's gpu_recover inside xgpu_ai_mailbox_flr_work would cause duplicate recover in TDR. TDR's gpu_recover would be triggered by amdgpu_job_timedout, that could avoid vk-cts failure by unexpected recover. Signed-off-by: Wentao Lou <[email protected]> Acked-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-01-14drm/amdgpu/sriov:Correct pfvf exchange logicEmily Deng1-1/+1
The pfvf exchange need be in exclusive mode. And add pfvf exchange in gpu reset. Signed-off-by: Emily Deng <[email protected]> Reviewed-By: Xiangliang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-08-27drm/amdgpu: cleanup GPU recovery check a bit (v2)Christian König1-2/+2
Check if we should call the function instead of providing the forced flag. v2: rebase on KFD changes (Alex) Signed-off-by: Christian König <[email protected]> Acked-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-05-15drm/amdgpu/sriov: Need to set in_gpu_reset flag to back after gpu resetEmily Deng1-1/+3
After host os reset gpu reset, need to set flag in_gpu_reset to zero. Signed-off-by: Emily Deng <[email protected]> Reviewed-by: Monk Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-03-22drm/amdgpu: fix spelling mistake: "asssert" -> "assert"Colin Ian King1-1/+1
Trivial fix to spelling mistake in pr_err error message text Acked-by: Christian König <[email protected]> Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-03-14drm/amdgpu: Move IH clientid defs to separate fileOak Zeng1-2/+2
This is preparation for sharing client ID definitions between amdgpu and amdkfd Signed-off-by: Oak Zeng <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-03-14drm/amdgpu: refactoring mailbox to fix TDR handshake bugs(v2)Monk Liu1-93/+103
this patch actually refactor mailbox implmentations, and all below changes are needed together to fix all those mailbox handshake issues exposured by heavey TDR test. 1)refactor all mailbox functions based on byte accessing for mb_control reason is to avoid touching non-related bits when writing trn/rcv part of mailbox_control, this way some incorrect INTR sent to hypervisor side could be avoided, and it fixes couple handshake bug. 2)trans_msg function re-impled: put a invalid logic before transmitting message to make sure the ACK bit is in a clear status, otherwise there is chance that ACK asserted already before transmitting message and lead to fake ACK polling. (hypervisor side have some tricks to workaround ACK bit being corrupted by VF FLR which hase an side effects that may make guest side ACK bit asserted wrongly), and clear TRANS_MSG words after message transferred. 3)for mailbox_flr_work, it is also re-worked: it takes the mutex lock first if invoked, to block gpu recover's participate too early while hypervisor side is doing VF FLR. (hypervisor sends FLR_NOTIFY to guest before doing VF FLR and sentds FLR_COMPLETE after VF FLR done, and the FLR_NOTIFY will trigger interrupt to guest which lead to mailbox_flr_work being invoked) This can avoid the issue that mailbox trans msg being cleared by its VF FLR. 4)for mailbox_rcv_irq IRQ routine, it should only peek msg and schedule mailbox_flr_work, instead of ACK to hypervisor itself, because FLR_NOTIFY msg sent from hypervisor side doesn't need VF's ACK (this is because VF's ACK would lead to hypervisor clear its trans_valid/msg, and this would cause handshake bug if trans_valid/msg is cleared not due to correct VF ACK but from a wrong VF ACK like this "FLR_NOTIFY" one) This fixed handshake bug that sometimes GUEST always couldn't receive "READY_TO_ACCESS_GPU" msg from hypervisor. 5)seperate polling time limite accordingly: POLL ACK cost no more than 500ms POLL MSG cost no more than 12000ms POLL FLR finish cost no more than 500ms 6) we still need to set adev into in_gpu_reset mode after we received FLR_NOTIFY from host side, this can prevent innocent app wrongly succesed to open amdgpu dri device. FLR_NOFITY is received due to an IDLE hang detected from hypervisor side which indicating GPU is already die in this VF. v2: use MACRO as the offset of mailbox_control register don't test if NOTIFY_CMPL event in rcv_msg since it won't recieve that message anymore Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Pixel Ding <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-12-18drm/amdgpu: rename amdgpu_gpu_recoverAlex Deucher1-1/+1
add device to the name for consistency. Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-12-15drm/amdgpu: Simplify amdgpu_lockup_timeout usage.Andrey Grodzovsky1-1/+1
With introduction of amdgpu_gpu_recovery we don't need any more to rely on amdgpu_lockup_timeout == 0 for disabling GPU reset. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-12-15drm/amdgpu: Add gpu_recovery parameterAndrey Grodzovsky1-1/+1
Add new parameter to control GPU recovery procedure. v2: Add auto logic where reset is disabled for bare metal and enabled for SR-IOV. Allow forced reset from debugfs. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-12-08drm/admgpu: Reduce the usage of soc15ip.hShaoyun Liu1-1/+0
Remove the header where it's not used. Acked-by: Christian Konig <[email protected]> Signed-off-by: Shaoyun Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-12-06drm/amd/include:cleanup vega10 header files.Feifei Xu1-1/+1
Remove asic_reg/vega10 folder. Signed-off-by: Feifei Xu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-12-06drm/amd/include:cleanup vega10 nbio header files.Feifei Xu1-2/+2
Cleanup asic_reg/vega10/NBIO folder. Signed-off-by: Feifei Xu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-12-06drm/amd/include:cleanup vega10 gc header files.Feifei Xu1-2/+2
Cleanup asic_reg/vega10/GC folder. Signed-off-by: Feifei Xu <[email protected]> Signed-off-by: Feifei Xu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-12-04drm/amdgpu:fix random missing of FLR NOTIFYMonk Liu1-3/+11
Signed-off-by: Monk Liu <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-12-04drm/amdgpu:implement new GPU recover(v3)Monk Liu1-1/+1
1,new imple names amdgpu_gpu_recover which gives more hint on what it does compared with gpu_reset 2,gpu_recover unify bare-metal and SR-IOV, only the asic reset part is implemented differently 3,gpu_recover will increase hang job karma and mark its entity/context as guilty if exceeds limit V2: 4,in scheduler main routine the job from guilty context will be immedialy fake signaled after it poped from queue and its fence be set with "-ECANCELED" error 5,in scheduler recovery routine all jobs from the guilty entity would be dropped 6,in run_job() routine the real IB submission would be skipped if @skip parameter equales true or there was VRAM lost occured. V3: 7,replace deprecated gpu reset, use new gpu recover Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-12-04drm/amdgpu/virt: implement wait_reset callbacks for vi/aipding1-0/+1
Reviewed-by: Monk Liu <[email protected]> Signed-off-by: pding <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-10-19drm/amdgpu: SR-IOV data exchange between PF&VFHorace Chen1-0/+6
SR-IOV need to exchange some data between PF&VF through shared VRAM PF will copy some necessary firmware and information to the shared VRAM. It also requires some information from VF. PF will send a key through mailbox2 to help guest calculate checksum so that it can verify whether the data is correct. So check the data on the specified offset of the shared VRAM, if the checksum is right, read values from it and write some VF information next to the data from PF. Signed-off-by: Horace Chen <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-07-14drm/amdgpu: Support passing amdgpu critical error to host via GPU Mailbox.Gavin Wan1-20/+26
This feature works for SRIOV enviroment. For non-SRIOV enviroment, the trans_error function does nothing. The error information includes error_code (16bit), error_flags(16bit) and error_data(64bit). Since there are not many errors, we keep the errors in an array and transfer all errors to Host before amdgpu initialization function (amdgpu_device_init) exit. Signed-off-by: Gavin Wan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-05-24drm/amdgpu:only call flr_work under infinite timeoutMonk Liu1-6/+9
Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-05-24drm/amdgpu:use job* to replace voluntaryMonk Liu1-1/+1
that way we can know which job cause hang and can do per sched reset/recovery instead of all sched. Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-05-24drm/amdgpu/virt: change AI ack-irq message to debug levelXiangliang Yu1-1/+1
Change message to debug level as VI does. Signed-off-by: Xiangliang Yu <[email protected]> Reviewed-by: Monk Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-05-24drm/amdgpu:need som change on vega10 mailboxMonk Liu1-8/+10
if sriov gpu reset is invoked by job timeout, it is run in a global work-queue which is very slow and better not call msleep ortherwise it takes long time to get back CPU. so make below changes: 1: Change msleep 1 to mdelay 5 2: Ignore the ack fail from pf after time out, because VF FLR will clear ack, sometime VF FLR is done prior to the beginning of poll_ack so we can ignore this ack TODO: Put job_timedout (and the following gpu reset) in a driver thread, instead of the global work_struct. Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Xiangliang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-05-24drm/amdgpu:fix cannot receive rcv/ack irq bugMonk Liu1-2/+2
Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Xiangliang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-04-06drm/amdgpu:implement the reset MB func for vega10Monk Liu1-0/+133
they are lack in the bringup stage, we need them for GPU reset feature. Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-04-06drm/amdgpu:fix typo for mxgpu_aiMonk Liu1-2/+2
Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-03-29drm/amdgpu/virt: impl mailbox for aiXiangliang Yu1-0/+207
Implement mailbox protocol for AI so that guest vf can communicate with GPU hypervisor. Signed-off-by: Xiangliang Yu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Monk Liu <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>