| Age | Commit message (Collapse) | Author | Files | Lines |
|
The bank number of both VML2 and ATCL2 are changed to 8, so refine
related codes to avoid defining long name arrays.
Signed-off-by: Dennis Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add edc counter/status reset and query functions for gfx block of
aldebaran.
v2: change to clear edc counter explicitly
aldebaran hardware will not clear edc counter after driver reading them,
so driver should clear them explicitly.
Signed-off-by: Dennis Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add GC power brake feature support for Aldebaran.
v2: squash in fixes (Alex)
Signed-off-by: Kevin Wang <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The golden setting was changed recently. update to
the latest one
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
golden settings that should be applied
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Those registers should be programmed as one-time initialization
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Initialization of TRAP_DATA0/1 is still required for the debugger to detect
new waves on Aldebaran. Also, per-vmid global trap enablement may be
required outside of debugger scope so move to init phase.
v2: just add the gfx 9.4.2 changes (Alex)
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Create dedicated Aldebaran kfd2kgd callbacks to prepare
for new per-vmid register instructions for debug trap
setting functions and sending host traps.
v2: rebase (Alex)
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Oak Zeng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This is to keep wavefront context for debug purpose
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Match existing asics.
v2: rebase (Alex)
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
With a recent gart page table re-construction, the gart page
table is now 2-level for some ASICs: PDB0->PTB.
In the case of 2-level gart page table, the page_table_base
of vmid0 should point to PDB0 instead of PTB.
Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Oak Zeng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
More accurate words are used to address a
code review feedback
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For the new 2-level GART table, the last PDE0 points
to PTB. Since PTB is in vram and right now we are
runing under s=0 mode (vram is treated as FB carveout),
so the s bit of this PDE0 should be set to 0.
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
update mmhub client id table for Aldebaran.
Signed-off-by: Alex Sierra <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Aldebaran can share the same initializing shader code witn
arcturus.
Signed-off-by: Dennis Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
With the 2-level gart page table, vram is squeezed into gart aperture
and FB aperture is disabled. Therefore all VRAM virtual addresses are
in the GART aperture. However currently PSP requires TMR addresses
in FB aperture. So we need some design change at PSP FW level to support
this 2-level gart table driver change. Right now this PSP FW support
doesn't exist. To workaround this issue temporarily, FB aperture is
added back and the gart aperture address is converted back to FB aperture
for this PSP TMR address.
Will revert it after we get a fix from PSP FW.
v2: squash in tmr fix for other asics (Kevin)
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set up HW for 2-level vmid0 page table: 1. Set up
PAGE_TABLE_START/END registers. Currently only plan
to do 2-level page table for ALDEBARAN, so only gfxhub1.0
and mmhub1.7 is changed. 2. Set page table base register.
For 2-level page table, the page table base should point
to PDB0. 3. Disable AGP and FB aperture as they are not
used.
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Christian Konig <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If use gart for FB translation, allocate and fill
PDB0.
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Christian Konig <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add functions to allocate PDB0, map it for CPU access,
and fill it.
Those functions are only used for 2-level vmid0 page
table construction
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If use gart for FB translation, we will squeeze vram into
sysvm aperture. This requires 2 level gart table. Add
page table depth and page table block size parameters
to gmc. This is prepare work to 2-level gart table
construction
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Christian Konig <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If use GART for FB translation, place both vram and gart to sysvm
aperture. AGP aperture is not set up in this case because it
is not used
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Christian Konig <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Modify the comment to reflect the fact that, if
use GART for vram address translation for vmid0,
[vram_start, vram_end] will be placed inside SYSVM
aperture, together with GART.
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Christian Konig <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In amdgpu_gmc_gart_location function, gart_size is adjusted
by a smu_prv_buffer_size. This logic shouldn't belong to
this function. Move the logic to the mc_init functions
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Christian Konig <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On A+A platform, CPU write page directory and page table in cached
mode. So it is necessary for page table walker to snoop CPU cache.
This setting is necessary for page walker to snoop page directory
and page table data out of CPU cache.
Signed-off-by: Oak Zeng <[email protected]>
Acked-by: Christian Konig <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On A+A platform, vram can be mapped as WB. Not necessarily
to always map vram as WC on such platform.
Calling function arch_io_reserve_memtype_wc will mark the
whole vram region as WC. So don't call it for A+A platform.
Signed-off-by: Oak Zeng <[email protected]>
Suggested-by: Alex Deucher <[email protected]>
Acked-by: Christian Konig <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The psp supplies the link type in the upper 2 bits of the psp xgmi node
information num_hops field. With a new link type, Aldebaran has these
bits set to a non-zero value (1 = xGMI3) so the KFD topology will report
the incorrect IO link weights without proper masking.
The actual number of hops is located in the 3 least significant bits of
this field so mask if off accordingly before passing it to the KFD.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Amber Lin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
By default this timestamp is 32 bit counter. It gets
overflowed in around 10 minutes.
Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
If xnack is on, VM retry fault interrupt send to IH ring1, and ring1
will be full quickly. IH cannot receive other interrupts, this causes
deadlock if migrating buffer using sdma and waiting for sdma done while
handling retry fault.
Remove VMC from IH storm client, enable ring1 write pointer overflow,
then IH will drop retry fault interrupts and be able to receive other
interrupts while driver is handling retry fault.
IH ring1 write pointer doesn't writeback to memory by IH, and ring1
write pointer recorded by self-irq is not updated, so always read
the latest ring1 write pointer from register.
Signed-off-by: Philip Yang <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
With the current kfd memory accounting scheme, kfd applications
can use up to 15/16 of total system memory. For system which
has small total system memory size it leaves small system memory
for OS. For example, if the system has totally 16GB of system
memory, this scheme leave OS and non-kfd applications only 1GB
of system memory. In many cases, this leads to OOM killer.
This patch changed the KFD system memory accounting scheme.
15/16 of free system memory when kfd driver load. This deduct
the system memory that OS already use.
Signed-off-by: Oak Zeng <[email protected]>
Suggested-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Arcturus and onwards products should follow the same sequence
that have pmfw loading ahead of tmr setup
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Aldebaran MMHUB CG/LS logic is controlled by VBIOS. Enable the state
change logic only if driver is used for control.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
v1: The interrupts need to be enabled to move to DS clocks.
v2: Don't enable GFX IDLE interrupts if there are no GFX rings.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Aldebaran clock gating support for GFX,SDMA,IH blocks
VCN/JPEG blocks are excluded in this patch, to be enabled later
Signed-off-by: Lijo Lazar <[email protected]>
Acked-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add the mmhub client id table for aldebaran.
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable dpg indirect sram mode on aldebaran.
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable vcn dpg mode on aldebaran
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable vcn and jpeg 2.6 on aldebaran.
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable smu13 block on aldebaran
Signed-off-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
global noretry setting now is cached to gmc.noretry
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
hdp read cache is removed in aldebaran. don't issue
an mmio write or write data packet to hardware.
v2: rebase
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Simplify all Aldebaran DIDs into one ASIC type.
Signed-off-by: Amber Lin <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[Why]
This causes infinite retries on the UTCL1 RB, preventing
higher priority RB such as paging RB.
[How]
Set to one the SDMAx_UTLC1_TIMEOUT registers for all SDMAs.
Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
0x61 is assigned to HBM2E in atom_dgpu_vram_type.
Signed-off-by: Feifei Xu <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
driver should use the gfx_info atomfirmware interface
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For ASICs that don't support ip discovery feature, query
gfx configuration through atomfirmware interface, rather
than gpu_info firmware.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Temporarily add smu_pptable module parameter for aldebaran.This is used
to force soft PPTable use overriding any VBIOS PPTable.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Disable PCIe BAR resizing on A+A config. It's not needed because we won't use the
PCIe BAR, but it breaks the PCI BAR configuration with the current SBIOS.
Error message of FB BAR resize failure under A+A:
[ 154.913731] [drm:amdgpu_device_resize_fb_bar [amdgpu]] *ERROR* Problem resizing BAR0 (-22).
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian Koenig <[email protected]>
Tested-by: Amber Lin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For A+A configuration, device memory is supposed to be mapped as
cachable from CPU side. For kernel pre-map gpu device memory using
ioremap_cache
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Christian Koenig <[email protected]>
Tested-by: Amber Lin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
shall revisit the change later
Signed-off-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
replace vega10 ih block with vega20 ih block for
aldebaran.
Signed-off-by: Hawking Zhang <[email protected]>
Acked-by: Christian König <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|