Age | Commit message (Collapse) | Author | Files | Lines |
|
The pfvf exchange need be in exclusive mode. And add pfvf exchange in gpu
reset.
Signed-off-by: Emily Deng <[email protected]>
Reviewed-By: Xiangliang Yu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For virtual display feature, no need to pin cursor bo.
Signed-off-by: Emily Deng <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
XGMI hive put kfd_pre_reset into amdgpu_device_lock_adev,
but outside req_full_gpu of sriov.
It would make sriov hang during reset.
Signed-off-by: Wentao Lou <[email protected]>
Reviewed-by: Shaoyun Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
I retested Bonaire (gfx7 dGPU) and it works fine.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
SI does not use doorbells, move asic doorbell init later
asic check.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108920
Reviewed-by: Oak Zeng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use per hive wq to concurrently send reset commands to all nodes
in the hive.
v2:
Switch to system_highpri_wq after dropping dedicated queue.
Fix non XGMI code path KASAN error.
Stop the hive reset for each node loop if there
is a reset failure on any of the nodes.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
XGMI hive has some resources allocted on device init which
needs to be deallocated when the device is unregistered.
v2: Remove creation of dedicated wq for XGMI hive reset.
v3: Use the gmc.xgmi.supported flag
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When paging queue is enabled, it use the second page of doorbell.
The AMDGPU_DOORBELL64_MAX_ASSIGNMENT definition assumes all the
kernel doorbells are in the first page. So with paging queue enabled,
the total kernel doorbell range should be original num_doorbell plus
one page (0x400 in dword), not *2.
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For XGMI hive case do reset in steps where each step iterates over
all devs in hive. This especially important for asic reset
since all PSP FW in hive must come up within a limited time
(around 1 sec) to properply negotiate the link.
Do this by refactoring amdgpu_device_gpu_recover and amdgpu_device_reset
into pre_asic_reset, asic_reset and post_asic_reset functions where is part
is exectued for all the GPUs in the hive before going to the next step.
v2: Update names for amdgpu_device_lock/unlock functions.
v3: Introduce per hive locking to avoid multiple resets for GPUs
in same hive.
v4:
Remove delayed_workqueue()/ttm_bo_unlock_delayed_workqueue() - they
are copy & pasted over from radeon and on amdgpu there isn't
any reason for that any more.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This is prep work for updating each PSP FW in hive after
GPU reset.
Split into build topology SW state and update each PSP FW in the hive.
Save topology and count of XGMI devices for reuse.
v2: Create seperate header for XGMI.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
ASIC specific doorbell layout is used instead of enum definition
Signed-off-by: Oak Zeng <[email protected]>
Suggested-by: Felix Kuehling <[email protected]>
Suggested-by: Alex Deucher <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Also call functioin amdgpu_device_doorbell_init after
amdgpu_device_ip_early_init because the former depends
on the later to set up asic-specific init_doorbell_index
function
Signed-off-by: Oak Zeng <[email protected]>
Suggested-by: Felix Kuehling <[email protected]>
Suggested-by: Alex Deucher <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Because increase SDMA_DOORBELL_RANGE to add new SDMA doorbell for paging queue will
break SRIOV, instead we can reserve and map two doorbell pages for amdgpu, paging
queues doorbell index use same index as SDMA gfx queues index but on second page.
For Vega20, after we change doorbell layout to increase SDMA doorbell for 8 SDMA RLC
queues later, we could use new doorbell index for paging queue.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Setup and tear down xgmi as part of psp.
v2:
- make psp_xgmi_terminate static
- squash in:
drm/amdgpu: only issue xgmi cmd when it is enabled
drm/amdgpu/psp: terminate xgmi ta in suspend and hw_fini phase
Signed-off-by: Hawking Zhang <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There is no functional changes,
Use function arguments for SRIOV special variables which
is hardcode in those functions.
so we can share those functions in baremetal.
Reviewed-by: Monk Liu <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
After testing looks like these subset of ASICs has GPU reset
working for the most part. Enable reset due to job timeout.
v2: Switch from GFX version to ASIC type.
v3: Fix identation
Signed-off-by: Andrey Grodzovsky <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This is still completely breaking my Raven system.
This reverts commit cdf2f910fa969adca1b0e3ad2b487821233dc038.
Revert until we sort out the sbios and firmware combinations that work
correctly.
bug: https://bugs.freedesktop.org/show_bug.cgi?id=108606
Cc: [email protected] # v4.19
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Problem:
During GPU recover DAL would hang in
amdgpu_pm_compute_clocks->amdgpu_fence_wait_empty
Fix:
Turns out there was a typo introduced by
3320b8d drm/amdgpu: remove job->ring which caused skipping
amdgpu_fence_driver_force_completion and so the hangged job
was never force signaled and this would cause the hang later in DAL.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Need to check adev->powerplay.pp_funcs.
Signed-off-by: Emily Deng <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Extract the function of fw loading out of powerplay.
Do fw loading between hw_init/resuem_phase1 and phase2
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We need to do some IPs earlier to deal with ordering issues
similar to how resume is split into two phases.
Will do fw loading via smu/psp between the two phases.
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
1. one is for create/free bo when init/fini
2. one is for fill the bo before fw loading
the ucode bo only need to be created when load driver
and free when driver unload.
when resume/reset, driver only need to re-fill the bo
if the bo is allocated in vram.
Suggested by Christian.
v2: Return error when bo create failed.
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix cg/pg unexpected set in hw init failed case.
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
1. only call late_init when hw_init successful,
so check status.hw instand of status.valid in late_init.
2. set status.late_initialized true if late_init was not implemented.
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move in_suspend flag to adev from gfx, so
can be used in other ip blocks, also keep
consistent with gpu_in_reset flag.
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
MGPU fan boost feature is enabled only when two or more dGPUs
in the system.
Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Don't grab the reservation lock any more and simplify the handling quite
a bit.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It shouldn't add much overhead and we should make sure that critical
VRAM content is always restored.
Signed-off-by: Christian König <[email protected]>
Acked-by: Junwei Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Treat them all as Raven rather than adding a new picasso
asic type. This simplifies a lot of code and also handles the
case of rv2 chips with the 0x15d8 pci id. It also fixes dmcu
fw handling for picasso.
Acked-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add gpu_info firmware for raven2.
Signed-off-by: Feifei Xu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
enable gfxoff in non-sriov and stutter mode by default
Signed-off-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add support for picasso to the display manager.
Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add the IP blocks, clock and powergating flags, and common clockgating support.
Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add picasso to amd_asic_type enum and amdgpu_asic_name[].
Signed-off-by: Likun Gao <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Driver will save an array of XGMI hive info, each hive will have a list of devices
that have the same hive ID.
Signed-off-by: Shaoyun Liu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
since we use PSP to program IH regs now
Signed-off-by: Monk Liu <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Emily Deng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move that into amdgpu_gmc.c since we are really deadling with GMC
address space here.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Junwei Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It will be more safe to make full-acess include both phase1 and phase2.
Then accessing special registeris wherever at phase1 or phase2 will not
block any shutdown and suspend process under virtualization.
Signed-off-by: Yintian Tao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Check if we should call the function instead of providing the forced
flag.
v2: rebase on KFD changes (Alex)
Signed-off-by: Christian König <[email protected]>
Acked-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There is no need for gpu full access for suspend phase1
because under virtualization there is no hw register access
for dce block.
Signed-off-by: Yintian Tao <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
After set power ungate state, set clock ungate state
before when suspend or fini.
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Unify to set power ungate state at the begin of suspend/fini.
Remove the workaround code for gfx off feature in
amdgpu_device.c.
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There are no any logical changes here.
1. change function names:
amdgpu_device_ip_late_set_pg/cg_state to
amdgpu_device_set_pg/cg_state.
2. add a function argument cg/pg_state, so
we can enable/disable cg/pg through those functions
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of the fixed round robin use let the scheduler balance the load
of page table updates.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Cancel the delay work to avoid the corner case that
ib test was not running when suspend
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
there may be gfx off delay work pending when suspend/driver
unload, need to cancel them first.
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
use amdgpu_gfx_off_ctrl function so driver can arbitrate
whether the gfx ip can be power off or power on.
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
delay to enable gfx off feature to avoid gfx on/off frequently
suggested by Alex and Evan.
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v2:
1. drop the special handling for the hw IP
suggested by hawking and Christian.
2. refine the variable name suggested by Flora.
This funciton as the entry of gfx off feature.
we arbitrat gfx off feature enable/disable in this
function.
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts commit 8624c3c4dbfe24fc6740687236a2e196f5f4bfb0.
We need CONFIG_DRM_AMD_DC_DCN1_0 to guard code that is using fp math.
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Leo (Sunpeng) Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|