path: root/drivers/gpu/drm/amd/amdgpu
Age  Commit message  Author  Files  Lines
2022-03-15drm/amdgpu: only check for _PR3 on dGPUsAlex Deucher1-2/+4
We don't support runtime pm on APUs. They support more dynamic power savings using clock and powergating. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15drm/amdgpu: drop xmgi23 error query/reset supportHawking Zhang1-22/+0
xgmi_ras is only initialized when the host-to-GPU interface is PCIe. In that case, xgmi23 is disabled and protected by security firmware, so host access will result in a security violation. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15drm/amdgpu: fix aldebaran xgmi topology for vfJonathan Kim1-2/+4
VFs must also distinguish whether or not the TA supports full duplex or half duplex link records in order to report the correct xGMI topology. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15drm/amdgpu: message smu to update bad channel infoStanley.Yang5-2/+42
Notify the SMU to update bad channel info when an uncorrectable error is detected in the UMC block. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15drm/amdgpu: Disable baco dummy modeLijo Lazar1-0/+15
On aldebaran, BACO dummy mode may be enabled during reset. Disable it during resume. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-09drm/amdgpu: fix a wrong ib referenceLang Yu1-5/+2
It should be p->job->ibs[j] instead of p->job->ibs[i] here. Fixes: cdc7893fc93f19 ("drm/amdgpu: use job and ib structures directly in CS parsers") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu: initialize the vmid_wait with the stub fenceChristian König2-1/+2
This way we don't need to check for NULL any more. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu: properly embed the IBs into the jobChristian König2-8/+5
We now have standard macros for that. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu: use job and ib structures directly in CS parsersChristian König9-114/+130
Instead of providing the ib index, provide the job and ib pointers directly to the patch and parse functions for UVD and VCE. Also move the set/get functions for IB values to the IB declarations. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu: header cleanupChristian König10-100/+132
No function change, just move a bunch of definitions from amdgpu.h into separate header files. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amd/amdgpu: set disabled vcn to no_schdulerJingwen Chen1-0/+2
[Why] After the reset domain was introduced, sched.ready is initialized after hw_init, which overwrites the setup done in vcn hw_init and causes the vcn ib test to fail. [How] Set disabled vcn to no_scheduler. Fixes: 5fd8518d187ed0 ("drm/amdgpu: Move scheduler init to after XGMI is ready") Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu: install ctx entities with cmpxchgChristian König1-1/+7
Since we removed the context lock, we need to make sure that no two threads try to install an entity at the same time. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 461fa7b0ac565e ("drm/amdgpu: remove ctx->lock") Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
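The lock-free install described here can be sketched in userspace C11. This is only an illustration of the compare-and-exchange idea, not the amdgpu code: `struct entity`, `slot`, and `get_or_install` are hypothetical names, and C11 `atomic_compare_exchange_strong` stands in for the kernel's `cmpxchg`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a context entity; not the amdgpu structure. */
struct entity {
    int ring;
};

/* One slot that several threads may try to populate concurrently. */
static _Atomic(struct entity *) slot;

/*
 * Return the entity installed in the slot, creating it if the slot is
 * still empty. Only one caller's allocation wins; a loser frees its own
 * copy and uses the winner's pointer instead.
 */
struct entity *get_or_install(int ring)
{
    struct entity *cur = atomic_load(&slot);
    if (cur)
        return cur;

    struct entity *fresh = malloc(sizeof(*fresh));
    if (!fresh)
        return NULL;
    fresh->ring = ring;

    struct entity *expected = NULL;
    if (atomic_compare_exchange_strong(&slot, &expected, fresh))
        return fresh;           /* we installed our entity */

    /* The exchange failed, so the slot already held a non-NULL pointer. */
    assert(expected != NULL);
    free(fresh);                /* someone else won the race */
    return expected;            /* the failed exchange stored the winner here */
}
```

Calling `get_or_install(1)` and then `get_or_install(2)` returns the same pointer both times, with `ring` still set to 1. The design choice is that losing the race costs one wasted allocation rather than a lock on every lookup.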
2022-03-04drm/amdkfd: implement get_atc_vmid_pasid_mapping_info for gfx10.3Yifan Zhang1-1/+15
This patch implements get_atc_vmid_pasid_mapping_info for gfx10.3 Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu/vcn: Add vcn firmware logRuijing Dong9-1/+163
vcn fwlog is for debugging purpose only, by default, it is disabled. Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu/vcn: Update fw shared data structureRuijing Dong5-35/+61
Add fw log in fw shared data structure. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu: Add DFC CAP support for aldebaranDavid Yu2-1/+2
Add DFC CAP support for aldebaran. Initialize cap microcode in psp_init_sriov_microcode; the ta microcode will be initialized in psp_vxx_init_microcode. Signed-off-by: David Yu <David.Yu@amd.com> Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu: Set correct DMA mask for aldebaranHarish Kasiviswanathan1-3/+4
Aldebaran has 48-bit physical address support Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu: Refactor mode2 reset logic for v13.0.2Lijo Lazar2-20/+54
Use IP version and refactor reset logic to apply to a list of devices. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: remove redundant null checkWeiguo Li1-6/+0
Remove the redundant null check since the caller ensures that 'ctx' is never NULL. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Weiguo Li <liwg06@foxmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu/sdma5: drop unused cyan skillfish firmwareAlex Deucher1-7/+1
Leftover from bring up. Not used anymore. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu/gfx10: drop unused cyan skillfish firmwareAlex Deucher1-11/+1
Leftover from bring up. Not used anymore. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: remove unused gpu_info firmwaresAlex Deucher1-23/+0
These were leftover from bring up and are no longer necessary. The information is available via the IP discovery table. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Use IP versions in convert_tiling_flags_to_modifier()Alex Deucher1-3/+3
Rather than checking the asic_type. Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: enable gfxoff routine for GC 10.3.7Prike Liang1-0/+3
Enable gfxoff routine for GC 10.3.7. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: enable gfx power gating for GC 10.3.7Prike Liang2-1/+4
Enable gfx power gating for GC 10.3.7. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu/nv: enable clock gating for GC 10.3.7 subblockPrike Liang1-1/+11
This will enable clock gating for the following blocks:
- MC
- SDMA
- HDP
- ATHUB
- IH
- VCN/JPEG
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: enable gfx clock gating control for GC 10.3.7Prike Liang1-0/+1
Enable gfx cg gate/ungate control for GC 10.3.7. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: fix suspend/resume hang regressionQiang Yu1-1/+2
Regression has been reported that suspend/resume may hang with the previous vm ready check commit. So bring back the evicted list check as a temp fix. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1922 Fixes: c1a66c3bc425 ("drm/amdgpu: check vm ready by amdgpu_vm->evicting flag") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Qiang Yu <qiang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Move CAP firmware loading to the beginning of PSP firmware listYifan Zha1-2/+2
[Why]
As PSP needs to verify the signature, CAP firmware must be loaded first when PSP loads firmware. Otherwise, when the DFC feature is enabled, the CP firmware fails to load.
[ 1149.160480] [drm] MM table gpu addr = 0x800022f000, cpu addr = 00000000a62afcea.
[ 1149.209874] [drm] failed to load ucode CP_CE(0x8)
[ 1149.209878] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.215914] [drm] failed to load ucode CP_PFP(0x9)
[ 1149.215917] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.221941] [drm] failed to load ucode CP_ME(0xA)
[ 1149.221944] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.228082] [drm] failed to load ucode CP_MEC1(0xB)
[ 1149.228085] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.234209] [drm] failed to load ucode CP_MEC2(0xD)
[ 1149.234212] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.242379] [drm] failed to load ucode VCN(0x1C)
[ 1149.242382] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[How]
Move CAP UCODE ID to the beginning of the AMDGPU_UCODE_ID enum list.
Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Reviewed-by: Bokun Zhang <Bokun.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
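Why an enum reorder changes load order can be sketched as follows, assuming the loader walks the firmware IDs in ascending enum order (the commit message implies this but does not show the loop). The `fw_id` enum and `fw_load_order` are hypothetical and much smaller than the real AMDGPU_UCODE_ID list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical firmware IDs; the real enum has many more entries. */
enum fw_id {
    FW_ID_CAP = 0,  /* placed first so it is handled before everything else */
    FW_ID_SDMA,
    FW_ID_CP_CE,
    FW_ID_CP_PFP,
    FW_ID_VCN,
    FW_ID_COUNT
};

/*
 * Write the load order into out[] (capacity FW_ID_COUNT) and return the
 * number of entries. The order is simply the enum's declaration order,
 * so moving an enumerator also moves its position in the load sequence.
 */
size_t fw_load_order(enum fw_id out[FW_ID_COUNT])
{
    assert(out != NULL);

    size_t n = 0;
    for (int id = 0; id < FW_ID_COUNT; id++)
        out[n++] = (enum fw_id)id;
    return n;
}
```

With this layout the first entry returned is always `FW_ID_CAP`. The trade-off of tying order to the enum is that no separate ordering table is needed, but every enumerator's numeric value shifts when one is moved.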
2022-03-02drm/amdgpu: Bump minor version for hot plug tests enabling.Andrey Grodzovsky1-1/+2
This allows the tests to be enabled only on kernels that include the latest fix, after which the tests passed on my system. I tested on NV21 standalone, and on Vega 10 and Polaris as a pair with DRI_PRIME. There may still be issues on ASICs I don't have in my possession, but that is the point of finally enabling the tests: if other people encounter errors during testing, they will report them and I will be able to fix them. The related merge request for enabling the libdrm test suite is at https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/227 Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Fix sigsev when accessing MMIO on hot unplug.Andrey Grodzovsky1-2/+8
Protect with drm_dev_enter/exit Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: convert code name to ip version for noretry setYifan Zhang1-6/+5
Use IP version rather than codename for noretry set. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: move amdgpu_gmc_noretry_set after ip_versions populatedYifan Zhang1-1/+1
otherwise adev->ip_versions is still empty when amdgpu_gmc_noretry_set is called. Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant .ras_fini initialization in some ras blocksyipechai9-26/+8
1. Define amdgpu_ras_block_late_fini_default in amdgpu_ras.c as the common .ras_fini function, which is called when the .ras_fini of a ras block isn't initialized.
2. Remove the code that uses amdgpu_ras_block_late_fini to initialize .ras_fini in ras blocks.
Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in mca ras blockyipechai3-27/+3
Remove redundant calls of amdgpu_ras_block_late_fini in mca ras block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in sdma ras blockyipechai3-9/+1
Remove redundant calls of amdgpu_ras_block_late_fini in sdma ras block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in hdp ras blockyipechai3-5/+2
Remove redundant calls of amdgpu_ras_block_late_fini in hdp ras block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in xgmi ras blockyipechai1-8/+1
Remove redundant calls of amdgpu_ras_block_late_fini in xgmi ras block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in umc ras blockyipechai4-10/+2
Remove redundant calls of amdgpu_ras_block_late_fini in umc ras block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in nbio ras blockyipechai3-9/+1
Remove redundant calls of amdgpu_ras_block_late_fini in nbio ras block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in mmhub ras blockyipechai3-5/+2
Remove redundant calls of amdgpu_ras_block_late_fini in mmhub ras block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in gfx ras blockyipechai3-9/+1
Remove redundant calls of amdgpu_ras_block_late_fini in gfx ras block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: centrally calls the .ras_fini function of all ras blocksyipechai5-26/+14
Centrally call the .ras_fini function of all ras blocks. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Optimize xxx_ras_fini function of each ras blockyipechai11-21/+21
1. Move the variables of ras block instance members from specific xxx_ras_fini to general ras_fini call.
2. Function calls inside the modules only use parameters passed from xxx_ras_fini instead of ras block instance members.
Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Modify .ras_fini function pointer parameteryipechai19-24/+24
Modify .ras_fini function pointer parameter so that we can remove redundant intermediate calls in some ras blocks. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Fix realloc of ptrTom Rix1-2/+7
Clang static analysis reports this error:
amdgpu_debugfs.c:1690:9: warning: 1st function call argument is an uninitialized value
tmp = krealloc_array(tmp, i + 1,
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
realloc uses tmp, so tmp cannot be garbage, and the return value needs to be checked.
Fixes: 5ce5a584cb82 ("drm/amdgpu: add debugfs for reset registers list") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
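Both requirements (the pointer must start out valid and the return value must be checked) can be shown with a userspace analogue. This sketch uses standard `realloc` rather than the kernel's `krealloc_array`, and `append_value` is a hypothetical helper, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Append one value to a growable array. *list must be NULL or a pointer
 * previously produced by this function; *count is the current length.
 * Returns 0 on success and -1 on failure, in which case the old array is
 * left untouched and still owned by the caller.
 */
int append_value(uint32_t **list, size_t *count, uint32_t value)
{
    assert(list != NULL && count != NULL);

    /* Refuse sizes whose byte count would overflow size_t. */
    if (*count >= SIZE_MAX / sizeof(**list))
        return -1;

    /* Keep the result in a temporary so a failure does not lose *list. */
    uint32_t *grown = realloc(*list, (*count + 1) * sizeof(**list));
    if (!grown)
        return -1;

    grown[*count] = value;
    *list = grown;
    (*count)++;
    return 0;
}
```

A caller starts with `uint32_t *regs = NULL; size_t n = 0;` so the first call behaves like a plain allocation, and frees `regs` once when done. Assigning the result straight back to the original pointer would leak the old buffer on failure, which is why the temporary is used.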
2022-03-01Merge tag 'amd-drm-next-5.18-2022-02-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-nextDave Airlie22-472/+462
amd-drm-next-5.18-2022-02-25:
amdgpu:
- Raven2 suspend/resume fix
- SDMA 5.2.6 updates
- VCN 3.1.2 updates
- SMU 13.0.5 updates
- DCN 3.1.5 updates
- Virtual display fixes
- SMU code cleanup
- Harvest fixes
- Expose benchmark tests via debugfs
- Drop no longer relevant gart aperture tests
- More RAS restructuring
- W=1 fixes
- PSR rework
- DP/VGA adapter fixes
- DP MST fixes
- GPUVM eviction fix
- GPU reset debugfs register dumping support
- Misc display fixes
- SR-IOV fix
- Aldebaran mGPU fix
- Add module parameter to disable XGMI for testing
amdkfd:
- IH ring overflow logging fixes
- CRIU fixes
- Misc fixes
Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220225183535.5907-1-alexander.deucher@amd.com
2022-02-28Backmerge tag 'v5.17-rc6' into drm-nextDave Airlie5-7/+14
This backmerges v5.17-rc6 so I can merge some amdgpu and some tegra changes on top. Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-02-24drm/amdgpu: Exclude PCI reset method for now.Andrey Grodzovsky2-2/+7
According to my recent investigation, PCI reset is not working. The kernel PCI code rejects SBR when there is more than one PF under the same bridge, which we always have (at least the AUDIO PF, but usually more), because SBR resets all the PFs and devices under that bridge and you cannot assume they all support SBR. Once we enable FLR support we can re-enable this option, as FLR can be done on a single PF and doesn't have this restriction. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-24drm/amdgpu: Add use_xgmi_p2p module parameterAlex Sierra3-1/+11
This parameter controls xGMI p2p communication, which is enabled by default and can be disabled by setting the parameter to 0. If xGMI p2p is disabled on a dGPU, the PCIe p2p interface is used instead. This parameter is ignored on GPUs that do not support xGMI p2p configuration. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>