path: root/drivers/gpu/drm/amd/amdgpu
Age  Commit message  Author  Files  Lines
2020-01-16drm/amdgpu: add arcturus to gpu recovery check code pathHawking Zhang1-0/+1
Support checking whether the driver should try GPU recovery for Arcturus. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: check if driver should try recovery in ras recovery pathHawking Zhang1-1/+2
Give the user the flexibility to disable GPU recovery in the RAS recovery path via the module parameter amdgpu_gpu_recovery. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amd/powerplay: a quick fix for the deadlock issue belowEvan Quan1-15/+43
INFO: task ocltst:2028 blocked for more than 120 seconds.
Tainted: G OE 5.0.0-37-generic #40~18.04.1-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ocltst D 0 2028 2026 0x00000000
Call Trace:
 __schedule+0x2c0/0x870
 schedule+0x2c/0x70
 schedule_preempt_disabled+0xe/0x10
 __mutex_lock.isra.9+0x26d/0x4e0
 __mutex_lock_slowpath+0x13/0x20
 ? __mutex_lock_slowpath+0x13/0x20
 mutex_lock+0x2f/0x40
 amdgpu_dpm_set_powergating_by_smu+0x64/0xe0 [amdgpu]
 gfx_v8_0_enable_gfx_static_mg_power_gating+0x3c/0x70 [amdgpu]
 gfx_v8_0_set_powergating_state+0x66/0x260 [amdgpu]
 amdgpu_device_ip_set_powergating_state+0x62/0xb0 [amdgpu]
 pp_dpm_force_performance_level+0xe7/0x100 [amdgpu]
 amdgpu_set_dpm_forced_performance_level+0x129/0x330 [amdgpu]
Fixes: a64c9e15e624 ("drm/amd/powerplay: cleanup the interfaces for powergate setting through SMU") Signed-off-by: Evan Quan <[email protected]> Reported-by: Rui Teng <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: only set cp active field for kiq queueHuang Rui3-6/+15
The MEC ucode sets the CP_HQD_ACTIVE bit when the queue is mapped by the MAP_QUEUES packet, so we only need to set the CP active field for the KIQ queue. Signed-off-by: Huang Rui <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu/pm: clean up return typesAlex Deucher1-15/+24
count is size_t so don't use negative values. Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
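The pitfall this commit message refers to can be illustrated with a minimal userspace sketch. It is not the driver's code: `store_error_in_count` and `parse_result` are hypothetical names, and the pattern shown (keep the status in a signed value, touch the unsigned count only on success) is one common way to avoid the problem, assumed here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Pitfall: an errno-style negative value stored in a size_t wraps to a
 * huge positive count instead of staying negative. */
static size_t store_error_in_count(int err)
{
	return (size_t)err;
}

/* Hypothetical pattern: keep the status in a signed int and only write
 * the unsigned count once the value is known to be non-negative. */
static int parse_result(int ret, size_t *count)
{
	if (ret < 0)
		return ret;       /* propagate the error as a signed value */
	*count = (size_t)ret;     /* safe: ret is non-negative here */
	return 0;
}
```

A caller that tests the unsigned count for an error can never see one, which is why the error has to travel through a signed return value instead.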
2020-01-16drm/amdgpu/vcn2.5: implement indirect DPG SRAM modeJames Zhu2-20/+52
Implement indirect DPG SRAM mode for vcn2.5 Signed-off-by: James Zhu <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu/vcn2.5: add dpg pause modeJames Zhu1-0/+70
Add dpg pause mode support for vcn2.5 Signed-off-by: James Zhu <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu/vcn2.5: add DPG mode start and stopJames Zhu1-2/+288
Add DPG mode start and stop functions for vcn2.5 v2: Correct firmware ucode index in vcn_v2_5_mc_resume_dpg_mode Signed-off-by: James Zhu <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu/vcn: move macro from vcn2.0 to share amdgpu_vcn (v2)James Zhu3-12/+12
Move macro from vcn2.0 to amdgpu_vcn to share with vcn2.5 v2: squash in macro fix Signed-off-by: James Zhu <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu/vcn: support multiple instance direct SRAM read and write (v2)James Zhu3-84/+83
Add multiple instance direct SRAM read and write support for vcn2.5 v2: squash in indexing fix Signed-off-by: James Zhu <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu/vcn: support multiple-instance dpg pause modeJames Zhu4-9/+9
Add multiple-instance dpg pause mode support for VCN2.5 Signed-off-by: James Zhu <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: fix modprobe failure of the secondary GPU when GDDR6 training enabled(V5)Tianci.Yin2-1/+31
[why] In a dual-GPU scenario, stolen_size is set to zero on the secondary GPU, since no pre-OS console uses that memory. The bottom region of VRAM was then allocated as GTT. Unfortunately, a small region of bottom VRAM was encroached on by UMC firmware during GDDR6 BIST training, which caused a page fault. [how] Force stolen_size to 3MB so that the bottom region of VRAM is allocated as stolen memory, avoiding GTT corruption. Reviewed-by: Christian König <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Tianci.Yin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu/gfx10: update gfx golden settings for navi14Tianci.Yin1-1/+1
remove registers: mmSPI_CONFIG_CNTL add registers: mmSPI_CONFIG_CNTL_1 Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Tianci.Yin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu/gfx10: update gfx golden settingsTianci.Yin1-1/+1
remove registers: mmSPI_CONFIG_CNTL add registers: mmSPI_CONFIG_CNTL_1 Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Tianci.Yin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: check rlc_g firmware pointer is valid before using itshaoyunl1-4/+5
In SRIOV, rlc_g firmware is loaded by the host; the guest driver does not load it, so the rlc_fw pointer is NULL. Signed-off-by: shaoyunl <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: drop amdgpu_job.ownerChristian König3-3/+0
Entirely unused. Signed-off-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: error out on entity with no run queueNirmoy Das1-0/+5
The entity of a disabled HW IP is initialized with a NULL rq. We should not process any submit request from userspace for a disabled HW IP. Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
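The guard described in this commit message can be sketched in plain C as follows. This is a simplified model, not the driver's code: `struct run_queue`, `struct entity`, `submit_job`, and the choice of `-ENOENT` as the error code are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduler types. */
struct run_queue { int id; };
struct entity { struct run_queue *rq; };

/* Reject a submission early when the entity has no run queue, which in
 * this model means the backing HW IP is disabled. */
static int submit_job(const struct entity *entity)
{
	if (entity == NULL || entity->rq == NULL)
		return -ENOENT;   /* assumed error code for this sketch */
	return 0;                 /* the job would be queued here */
}
```

Checking before any work is done means a submission for a disabled block fails cleanly instead of dereferencing a NULL run queue later.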
2020-01-16drm/amdkfd: use map_queues for hiq on gfx v10 as wellHuang Rui1-21/+61
To align with gfx v9, we use the map_queues packet to load hiq MQD. Signed-off-by: Huang Rui <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdkfd: use kiq to load the mqd of hiq queue for gfx v9 (v6)Aaron Liu3-20/+63
The CP checks that the HIQ queue is configured and mapped with the KIQ ring; otherwise, it is unable to read back the secure buffer while gfxoff is enabled, even with trusted IP blocks. v1 -> v2: - Fix to remove surplus set_resources packets. - Fill the whole configuration in MQD. - Change the author as Aaron because he addressed the key point of this issue. - Add kiq ring lock. v2 -> v3: - Free the lock while in error return case. - Remove the programming only needed when the queue is unmapped. v3 -> v4: - Remove doorbell programming because it's used for restarting queue. - Remove CP scheduler programming because map_queue packet will handle this. v4 -> v5: - Remove cp_hqd_active because mec ucode will enable it while use map_queues. - Revise goto out_unlock. - Correct the right doorbell offset for HIQ that kfd driver assigned in the packet. v5 -> v6: - Merge Arcturus fix into this patch because it will get oops in Arcturus platform. Reported-by: Lisa Saturday <[email protected]> Signed-off-by: Aaron Liu <[email protected]> Signed-off-by: Huang Rui <[email protected]> Reviewed-and-Tested-by: Aaron Liu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: flush TLB functions removal from kfd2kgd interfaceAlex Sierra6-249/+0
[Why] kfd2kgd interface will be deprecated. This removal only covers TLB invalidation for now. They have been replaced in amdgpu_amdkfd API. [How] TLB invalidate functions removed from the different amdkfd_gfx_v* versions. Signed-off-by: Alex Sierra <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: GPU TLB flush API moved to amdgpu_amdkfdAlex Sierra2-0/+34
[Why] TLB flush method has been deprecated using kfd2kgd interface. This implementation is now on the amdgpu_amdkfd API. [How] TLB flush functions now implemented in amdgpu_amdkfd. Signed-off-by: Alex Sierra <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: export function to flush TLB via pasidAlex Sierra5-0/+223
This can be used directly from amdgpu and amdkfd to invalidate TLB through pasid. It supports gmc v7, v8, v9 and v10. Signed-off-by: Alex Sierra <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: replace kcq enable/disable functions on gfx_v9Alex Sierra1-100/+2
[Why] There are HW-independent functions that enable and disable kcq. These functions use the kiq_pm4_funcs implementation. [How] The local kcq enable and disable functions are removed and replaced by the generic kcq enable under amdgpu_gfx. Signed-off-by: Alex Sierra <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: implement tlbs invalidate on gfx9 gfx10Alex Sierra3-0/+33
A TLB invalidate function pointer is added to the kiq_pm4_funcs struct. This way, a TLB flush can be done through the kiq member. TLB invalidation is implemented for gfx9 and gfx10. Signed-off-by: Alex Sierra <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: kiq pm4 function implementation for gfx_v9Alex Sierra1-0/+115
Functions implemented from kiq_pm4_funcs struct members for gfx_v9 version. Signed-off-by: Alex Sierra <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: Avoid reclaim fs while eviction lockAlex Sierra2-8/+38
[Why] Avoid filesystem reclaim while the eviction lock is held, since that lock is taken in code called from the MMU notifier. [How] Set the PF_MEMALLOC_NOFS flag while the eviction mutex is locked, using the memalloc_nofs_save / memalloc_nofs_restore API. Signed-off-by: Alex Sierra <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-15drm/amdgpu: update goldensetting for renoirAaron Liu1-1/+1
Update mmSDMA0_UTCL1_WATERMK golden setting for renoir. Signed-off-by: Aaron Liu <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu/debugfs: properly handle runtime pmAlex Deucher2-7/+134
If driver debugfs files are accessed, power up the GPU when necessary. Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu/pm: properly handle runtime pmAlex Deucher1-208/+614
If power management sysfs or debugfs files are accessed, power up the GPU when necessary. Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: add header file for macro SZ_1MFlora Cui1-0/+1
Fixes: 4dee6e4ca50a ("drm/amdgpu: use linux size macro to simplify ONE_Kib & One_Mib") Signed-off-by: Flora Cui <[email protected]> Reviewed-by: Kevin Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu/psp: declare navi1x ta firmwareAlex Deucher1-0/+3
So that it gets included in the initrd. At the moment this is optional firmware that contains support for HDCP. Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: Match TC hash settings to DF settings (v2)Joseph Greathouse3-0/+50
On Arcturus, data fabric hashing is set by the VBIOS, and affects which addresses map to which memory channels. The gfx core's caches also need to know this mapping, but the hash settings for these caches are set by the driver. This change queries the DF to understand how the VBIOS configured DF, then matches the TC hash configuration bits to do the same thing. v2: squash in warning fix Signed-off-by: Joseph Greathouse <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: Create generic DF struct in adevJoseph Greathouse8-49/+90
The only data fabric information the adev struct currently contains is a function pointer table. In the near future, we will be adding some cached DF information into adev. As such, this patch creates a new amdgpu_df struct for adev. Right now, it only contains the old function pointer table, but new stuff will be added soon. Signed-off-by: Joseph Greathouse <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: preserve RSMU UMC index mode stateJohn Clements1-2/+41
Restore the previous RSMU UMC index mode state between UMC RAS error register accesses. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: disable XGMI TA unload for arcturusJohn Clements1-0/+5
in event of GPU reset, XGMI TA unload causes unrecoverable GPU hang Acked-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: update goldensetting for renoirAaron Liu1-1/+1
Update mmSDMA0_UTCL1_WATERMK golden setting for renoir. Signed-off-by: Aaron Liu <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu/gmc10: free stolen memory in late_initAlex Deucher1-0/+2
We don't need to store the pre-OS console memory after the driver has loaded so free it. Reviewed-by: Huang Rui <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu/gmc10: remove dead codeAlex Deucher1-9/+0
Leftover from bring up. We look up the actual pre-OS memory usage value later in the same function. Reviewed-by: Huang Rui <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: enable S/G display on PCO and RV2 (v2)Alex Deucher1-6/+16
It should work on all Raven variants, but some users have reported issues with original Raven with IOMMU enabled. So far there have been no issues observed with PCO or RV2. v2: split out the dm init and domain changes into separate patches. Acked-by: Harry Wentland <[email protected]> Acked-by: Huang Rui <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu/gfx9: remove unused sdma headersAlex Deucher1-9/+0
All of the sdma stuff these were used for has moved to the sdma code, so remove them. Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: check sdma ras funcs pointer before accessingHawking Zhang1-2/+6
sdma ras funcs are not supported by ASIC prior to vega20 Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: calculate MCUMC_ADDRT0 per asic's UMC offsetGuchun Chen1-4/+6
A hardcoded offset is not friendly. Another benefit of this patch is that it keeps read and write access to this register consistent with other similar UMC registers in this file. Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu/sriov: workaround on rev_id for Navi12 under sriovTiecheng Zhou1-0/+6
guest vm gets 0xffffffff when reading RCC_DEV0_EPF0_STRAP0, as a consequence, the rev_id and external_rev_id are wrong. workaround it by hardcoding the rev_id to 0, which is the default value. v2. add comment in the code Signed-off-by: Tiecheng Zhou <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
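The workaround described above amounts to distrusting an all-ones readback in a guest and falling back to a default. A minimal sketch under stated assumptions: `rev_id_from_strap`, the `is_guest_vm` flag, and the 4-bit field mask are hypothetical and do not reflect the real register layout or the driver's actual check.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_READBACK 0xffffffffu  /* value the guest VM reads back */
#define DEFAULT_REV_ID   0u           /* default stated in the commit message */

/* Return the revision ID, falling back to the default when a guest VM
 * sees the all-ones pattern instead of a real strap value. */
static uint32_t rev_id_from_strap(uint32_t strap, bool is_guest_vm)
{
	if (is_guest_vm && strap == INVALID_READBACK)
		return DEFAULT_REV_ID;
	return strap & 0xfu;  /* hypothetical field: low 4 bits */
}
```

On bare metal the readback is taken at face value, so the fallback only changes behavior for the virtualized case the commit targets.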
2020-01-14drm/amdgpu: read sdma edc counter to clear the countersHawking Zhang2-10/+8
SDMA edc counter registers were added to the gfx edc counters array. When querying a gfx error counter in that array, there is no way to differentiate the sdma instance number for different ASICs, which results in a NULL pointer access when trying to read the sdma register base address for instances greater than 2 on Vega20. In addition, this also results in wrong gfx error counters since sdma edc counters were actually added to them. Therefore, sdma edc counter registers should be separated from the gfx edc counter register array and only get initialized when the driver tries to enable sdma ras. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: add ras_late_init and ras_fini for sdma v4Hawking Zhang2-2/+7
move ras_late_init and ras_fini to sdma_ras_funcs table Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: support error reporting for sdma ip blockHawking Zhang1-0/+8
invoke sdma query_ras_error_count to get sdma single bit error count Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: add query_ras_error_count function for sdma v4Hawking Zhang2-0/+169
query_ras_error_count function will be invoked to query single bit error count detected in sdma ip block Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: enable VCN2.5 IP block for ArcturusLeo Liu1-2/+1
With default PSP FW loading Signed-off-by: Leo Liu <[email protected]> Reviewed-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu/vcn2.5: fix PSP FW loading for the second instanceLeo Liu1-2/+2
ucodes for instances are from different location Signed-off-by: Leo Liu <[email protected]> Reviewed-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: catch amdgpu_irq_add_id failureNirmoy Das1-0/+4
Do not ignore amdgpu_irq_add_id return value while registering VMC page fault interrupt. Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>