Age | Commit message (Collapse) | Author | Files | Lines |
|
support check if dirver should try gpu recovery for
arcturus
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Guchun Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
To allow the flexibilty for user to disable gpu recovery
in RAS recovery path by module parameter amdgpu_gpu_recovery
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Guchun Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
NFO: task ocltst:2028 blocked for more than 120 seconds.
Tainted: G OE 5.0.0-37-generic #40~18.04.1-Ubuntu
echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cltst D 0 2028 2026 0x00000000
all Trace:
__schedule+0x2c0/0x870
schedule+0x2c/0x70
schedule_preempt_disabled+0xe/0x10
__mutex_lock.isra.9+0x26d/0x4e0
__mutex_lock_slowpath+0x13/0x20
? __mutex_lock_slowpath+0x13/0x20
mutex_lock+0x2f/0x40
amdgpu_dpm_set_powergating_by_smu+0x64/0xe0 [amdgpu]
gfx_v8_0_enable_gfx_static_mg_power_gating+0x3c/0x70 [amdgpu]
gfx_v8_0_set_powergating_state+0x66/0x260 [amdgpu]
amdgpu_device_ip_set_powergating_state+0x62/0xb0 [amdgpu]
pp_dpm_force_performance_level+0xe7/0x100 [amdgpu]
amdgpu_set_dpm_forced_performance_level+0x129/0x330 [amdgpu]
Fixes: a64c9e15e624 ("drm/amd/powerplay: cleanup the interfaces for powergate setting through SMU")
Signed-off-by: Evan Quan <[email protected]>
Reported-by: Rui Teng <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The mec ucode will set the CP_HQD_ACTIVE bit while the queue is mapped by
MAP_QUEUES packet. So we only need set cp active field for kiq queue.
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
count is size_t so don't use negative values.
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Implement indirect DPG SRAM mode for vcn2.5
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add dpg pause mode support for vcn2.5
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add DPG mode start and stop functions for vcn2.5
v2: Correct firmware ucode index in vcn_v2_5_mc_resume_dpg_mode
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move macro from vcn2.0 to amdgpu_vcn to share with vcn2.5
v2: squash in macro fix
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add multiple instance direct SRAM read and write support for vcn2.5
v2: squash in indexing fix
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add multiple-instance dpg pause mode support for VCN2.5
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
enabled(V5)
[why]
In dual GPUs scenario, stolen_size is assigned to zero on the secondary GPU,
since there is no pre-OS console using that memory. Then the bottom region of
VRAM was allocated as GTT, unfortunately a small region of bottom VRAM was
encroached by UMC firmware during GDDR6 BIST training, this cause page fault.
[how]
Forcing stolen_size to 3MB, then the bottom region of VRAM was
allocated as stolen memory, GTT corruption avoid.
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Tianci.Yin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
remove registers: mmSPI_CONFIG_CNTL
add registers: mmSPI_CONFIG_CNTL_1
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Tianci.Yin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
remove registers: mmSPI_CONFIG_CNTL
add registers: mmSPI_CONFIG_CNTL_1
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Tianci.Yin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In SRIOV, rlc_g firmware is loaded by host, guest driver won't load it which will
cause the rlc_fw pointer is null
Signed-off-by: shaoyunl <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Entirely unused.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Disabled HW IP's entity initialized with NULL rq. We should not
process any submit request from userspace for a disabled HW IP.
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
To align with gfx v9, we use the map_queues packet to load hiq MQD.
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There is an issue that CP will check the HIQ queue to be configured and mapped
with KIQ ring, otherwise, it will be unable to read back the secure buffer while
the gfxoff is enabled even with trusted IP blocks.
v1 -> v2:
- Fix to remove surplus set_resources packets.
- Fill the whole configuration in MQD.
- Change the author as Aaron because he addressed the key point of this issue.
- Add kiq ring lock.
v2 -> v3:
- Free the lock while in error return case.
- Remove the programming only needed by the queue is unmapped.
v3 -> v4:
- Remove doorbell programming because it's used for restarting queue.
- Remove CP scheduler programming because map_queue packet will handle this.
v4 -> v5:
- Remove cp_hqd_active because mec ucode will enable it while use map_queues.
- Revise goto out_unlock.
- Correct the right doorbell offset for HIQ that kfd driver assigned in the
packet.
v5 -> v6:
- Merge Arcturus fix into this patch because it will get oops in Arcturus
platform.
Reported-by: Lisa Saturday <[email protected]>
Signed-off-by: Aaron Liu <[email protected]>
Signed-off-by: Huang Rui <[email protected]>
Reviewed-and-Tested-by: Aaron Liu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
kfd2kgd interface will be deprecated. This removal only covers TLB
invalidation for now. They have been replaced in amdgpu_amdkfd API.
[How]
TLB invalidate functions removed from the different amdkfd_gfx_v*
versions.
Signed-off-by: Alex Sierra <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
TLB flush method has been deprecated using kfd2kgd interface.
This implementation is now on the amdgpu_amdkfd API.
[How]
TLB flush functions now implemented in amdgpu_amdkfd.
Signed-off-by: Alex Sierra <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This can be used directly from amdgpu and amdkfd to invalidate
TLB through pasid.
It supports gmc v7, v8, v9 and v10.
Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
There are HW-indpendent functions that enables and disables kcq. These functions use
the kiq_pm4_funcs implementation.
[How]
Local kcq enable and disable functions removed and replace it by the generic kcq
enable under amdgpu_gfx
Signed-off-by: Alex Sierra <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
tlbs invalidate pointer function added to kiq_pm4_funcs struct.
This way, tlb flush can be done through kiq member.
TLBs invalidatation implemented for gfx9 and gfx10.
Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Functions implemented from kiq_pm4_funcs struct members
for gfx_v9 version.
Signed-off-by: Alex Sierra <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
Avoid reclaim filesystem while eviction lock is held called from
MMU notifier.
[How]
Setting PF_MEMALLOC_NOFS flags while eviction mutex is locked.
Using memalloc_nofs_save / memalloc_nofs_restore API.
Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update mmSDMA0_UTCL1_WATERMK golden setting for renoir.
Signed-off-by: Aaron Liu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If driver debugfs files are accessed, power up the GPU
when necessary.
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If power management sysfs or debugfs files are accessed,
power up the GPU when necessary.
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fixes: 4dee6e4ca50a ("drm/amdgpu: use linux size macro to simplify ONE_Kib & One_Mib")
Signed-off-by: Flora Cui <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
So that it gets included in the initrd. At the moment
this is optional firmware that contains support for HDCP.
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On Arcturus, data fabric hashing is set by the VBIOS, and
affects which addresses map to which memory channels. The
gfx core's caches also need to know this mapping, but the
hash settings for these these caches is set by the driver.
This change queries the DF to understand how the VBIOS
configured DF, then matches the TC hash configuration bits
to do the same thing.
v2: squash in warning fix
Signed-off-by: Joseph Greathouse <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The only data fabric information the adev struct currently
contains is a function pointer table. In the near future,
we will be adding some cached DF information into adev. As
such, this patch creates a new amdgpu_df struct for adev.
Right now, it only containst the old function pointer table,
but new stuff will be added soon.
Signed-off-by: Joseph Greathouse <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
between UMC RAS err register access restore previous RSMU UMC index mode state
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: John Clements <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
in event of GPU reset, XGMI TA unload causes unrecoverable GPU hang
Acked-by: Hawking Zhang <[email protected]>
Signed-off-by: John Clements <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update mmSDMA0_UTCL1_WATERMK golden setting for renoir.
Signed-off-by: Aaron Liu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We don't need to store the pre-OS console memory after
the driver has loaded so free it.
Reviewed-by: Huang Rui <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Leftover from bring up. We look up the actual pre-OS memory usage
value later in the same function.
Reviewed-by: Huang Rui <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It should work on all Raven variants, but some users have
reported issues with original Raven with IOMMU enabled.
So far there have been no issues observed with PCO or RV2.
v2: split out the dm init and domain changes into separate
patches.
Acked-by: Harry Wentland <[email protected]>
Acked-by: Huang Rui <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
All of the sdma stuff these were used for moves to
the sdma code, so remove them.
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
sdma ras funcs are not supported by ASIC prior
to vega20
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Hardcoded offset is not friendly. And another benifit of this
patch is to keep read and write access to this register be
consistent with other similar UMC regsiters in this file.
Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
guest vm gets 0xffffffff when reading RCC_DEV0_EPF0_STRAP0,
as a consequence, the rev_id and external_rev_id are wrong.
workaround it by hardcoding the rev_id to 0, which is the default value.
v2. add comment in the code
Signed-off-by: Tiecheng Zhou <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
SDMA edc counter registers were added in gfx edc counters
array. When querying gfx error counter in that array, there
is no way to differentiate sdma instance number for different
asic and then results to NULL pointer access when trying to
read sdma register base address for instances greater
than 2 on Vega20.
In addition, this also results to wrong gfx error counters
since it actually added sdma edc counters.
Therefore, sdma edc counter registers should be separated
from gfx edc counter regsiter array and only get initialized
when driver tries to enable sdma ras.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
move ras_late_init and ras_fini to sdma_ras_funcs table
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
invoke sdma query_ras_error_count to get sdma single
bit error count
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
query_ras_error_count function will be invoked to query
single bit error count detected in sdma ip block
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
With default PSP FW loading
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
ucodes for instances are from different location
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Do not ignore amdgpu_irq_add_id return value while registering
VMC page fault interrupt.
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|