Age | Commit message (Collapse) | Author | Files | Lines |
|
A subset of VM fault types currently send retry XNACK to the client.
This causes a storm of interrupts from the VM to the host.
Until the storm is throttled by other means send no-retry XNACK for
all fault types instead. No change in behavior to the client which
will stall indefinitely with the current configuration in any case.
Improves system stability under GC or MMHUB faults.
Signed-off-by: Jay Cornwall <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: John Bridgman <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Yong Zhao <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set a configurable SDMA phase quantum when enabling SDMA context
switching. The default value significantly reduces SDMA latency
in page table updates when user-mode SDMA queues have concurrent
activity, compared to the initial HW setting.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Andres Rodriguez <[email protected]>
Reviewed-by: Shaoyun Liu <[email protected]>
Acked-by: Chunming Zhou <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable SDMA context switching on CIK (copied from sdma_v3_0.c).
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For GFX context, the ATC bit in SDMA*_GFX_VIRTUAL_ADDRESS can be cleared
to perform in VM mode. For RLC context, to support ATC mode , ATC bit in
SDMA*_RLC*_VIRTUAL_ADDRESS should be set. SDMA_CNTL.ATC_L1_ENABLE bit is
global setting that enables the L1-L2 translation for ATC address.
Signed-off-by: shaoyun liu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This gives BOs which haven't been accessed by the CPU since they were
moved to visible VRAM another chance to stay in VRAM when another BO
needs to go to visible VRAM.
This should allow BOs to stay in VRAM longer in some cases.
v2:
* Only do this for BOs which don't have the
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag set.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There is no need for page faults to force BOs into visible VRAM if it's
full, and the time it takes to do so is great enough to cause noticeable
stuttering. Add GTT as a possible placement so that if visible VRAM is
full, page faults move BOs to GTT instead of evicting other BOs from VRAM.
Suggested-by: Michel Dänzer <[email protected]>
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When a BO is moved to VRAM, clear AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED.
This allows it to potentially later move to invisible VRAM if the CPU
does not access it again.
Setting the CPU_ACCESS flag in amdgpu_bo_fault_reserve_notify() also means
that we can remove the loop to restrict lpfn to the end of visible VRAM,
because amdgpu_ttm_placement_init() will do it for us.
v3 [Michel Dänzer]
* Use AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED instead of a new flag
(Christian König)
* Clear flag in amdgpu_bo_move instead of amdgpu_move_ram_vram
(Christian)
* Explicitly mention amdgpu_bo_fault_reserve_notify in amdgpu_bo_move
* Also clear flag in amdgpu_bo_create_restricted
Suggested-by: Michel Dänzer <[email protected]>
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The BO move throttling code is designed to allow VRAM to fill quickly if it
is relatively empty. However, this does not take into account situations
where the visible VRAM is smaller than total VRAM, and total VRAM may not
be close to full but the visible VRAM segment is under pressure. In such
situations, visible VRAM would experience unrestricted swapping and
performance would drop.
Add a separate counter specifically for moves involving visible VRAM, and
check it before moving BOs there.
v2: Only perform calculations for separate counter if visible VRAM is
smaller than total VRAM. (Michel Dänzer)
v3: [Michel Dänzer]
* Use BO's location rather than the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
flag to determine whether to account a move for visible VRAM in most
cases.
* Use a single
if (adev->mc.visible_vram_size < adev->mc.real_vram_size) {
block in amdgpu_cs_get_threshold_for_moves.
Fixes: 95844d20ae02 (drm/amdgpu: throttle buffer migrations at CS using a fixed MBps limit (v2))
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Allow specifying a limit on visible VRAM via a module parameter. This is
helpful for testing performance under visible VRAM pressure.
v2: Add cast to 64-bit (Christian König)
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Limit the default GART size and save a lot of VRAM.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This allows setting the gtt size independent of the gart size.
v2: fix copy and paste typo
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We should only cover the GART size with the GTT manager.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Rename symbols from gtt_ to gart_ as appropriate.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Not used any more.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No functional change, just cleanup.
v2: rebased, keep gart name.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Rather than checking the CONGIG_MEMSIZE register as that may
not be reliable on some APUs.
v2: The scratch register is only used on CIK+
Reviewed-by: Junwei Zhang <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Call nbio init registers on hw_init to set up any
nbio registers that need initialization at hw init time.
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Used for nbio registers that need to be initialized. Currently
only used for a golden setting that got missed on some boards.
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
- v2: rename param 'en' as 'lock'
Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
- v2: added missing spinlock init
Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use the TTM values instead of the hardware config here.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We can finally remove this now.
v2: remove now unused max_size variable as well.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No need to map BOs to GTT on eviction and intermediate transfers any more.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This way we don't need to map the full BO at a time any more.
v2: use fixed windows for src/dst
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This allows us to write the mapped PTEs into
an IB instead of the table directly.
v2: fix build with debugfs enabled, remove unused assignment
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We want to use them as remap address space.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The arrays pctl0_data and pctl1_data do not need to be in global scope,
so them both static.
Cleans up sparse warnings:
symbol 'pctl0_data' was not declared. Should it be static?
symbol 'pctl1_data' was not declared. Should it be static?
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Get it from the system info table.
Reviewed-by: Hawking Zhang <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Implement support using the new atomfirmware system info table.
Reviewed-by: Hawking Zhang <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Not all vbios images seem to set the version appropriately.
Switch the check based on asic type instead.
Reviewed-by: Hawking Zhang <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Alex Xie <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Reviewed-by: Christian König<[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Rex Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Rex Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
1. simplify avfs state switch.
2. delete save/restore VFT table functions as not support
by fiji.
3. implement thermal_avfs_enable funciton.
Signed-off-by: Rex Zhu <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Rex Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Rex Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In previous case, driver can't enable psp via the kernel parameter for raven.
We should open this path and set it as direct by default till psp firmware
loading is workable.
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Junwei Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Junwei Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Junwei Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Junwei Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
nbio hdp flush routine are called within atomic context.
Avoid use KIQ when write to the HDP_MEM_COHERENCY_FLUSH_CNTL register
since this register has its own VF copy
Signed-off-by: Shaoyun Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Shaoyun Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
for SR-IOV, we must keep the pipeline-sync in the protection
of COND_EXEC, otherwise the command consumed by CPG is not
consistent when world switch triggerd, e.g.:
world switch hit and the IB frame is skipped so the fence
won't signal, thus CP will jump to the next DMAframe's pipeline-sync
command, and it will make CP hang foever.
after pipelin-sync moved into COND_EXEC the consistency can be
guaranteed
Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Rex Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Rex Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|