Age | Commit message (Collapse) | Author | Files | Lines |
|
Enable coarse grain clockgating/light sleep for GC v9.4.3. Remove
programming that is not meant for GC 9.4.3.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Program different ranges in each XCC with MEC_DOORBELL_RANGE_LOWER/HIGHER.
Keeping the same range causes CPF in other XCCs also to be busy when an IB
packet is submitted to KCQ. Only the XCC which processes the packet
comes back to idle afterwards and this causes other CPs not be idle.
This in turn affects clockgating behavior as RLC doesn't get idle
interrupt.
LOWER/HIGHER covers only KIQ/KCQs which are per XCC queues. Assigning
different ranges doesn't seem to have any side effect as user queue ranges
are outside of this range. User queue tests - PM4 through KFD and AQL
through rocr - have the same results after this change.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
During ASIC wide reset, SDMA shouldn't be clockgated and be ready to
accept freeze requests from PMFW. For that, don't stop SDMA engine
during reset and keep the clocks active.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
GFXIP 9.4.3 could be in APU or carveout mode but we cannot use the
xgmi.connected_to_cpu flag to identify the iolinks type. Use appropriate
APU or Carveout mode based condition to report xgmi connection in kfd
topology.
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add num_xcps return.
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
[WA] Increase AMDGPU_MAX_HWIP_RINGS to 64 to support more compute
ring resource. Later need redesign with queue/prirority/scheduler
factors to reduce AMDGPU_MAX_HWIP_RINGS.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Certain instances of VCN/JPEG IPs may not be usable. Fetch the information
from harvest table.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Revert temporary dGPU VRAM MTYPE setting and align with expected
coherency protocol.
Signed-off-by: Graham Sider <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
SDMA v4.4.2 doesn't need explicit power gating control through PMFW
Signed-off-by: Asad kamal <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable vcn/jpeg on vcn_v4_0_3.
Signed-off-by: James Zhu <[email protected]>
Acked-by Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable indirect_sram mode on vcn_v4_0_3.
Signed-off-by: James Zhu <[email protected]>
Acked-by Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add unified queue support on vcn_v4_0_3.
Signed-off-by: James Zhu <[email protected]>
Acked-by Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add fwlog support on vcn_v4_0_3.
Signed-off-by: James Zhu <[email protected]>
Acked-by Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
vcn_v4_0_3 increased jpeg instances,
need increasing MAX resources setting accordlingly.
Signed-off-by: James Zhu <[email protected]>
Acked-by Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Get information about active XCC and SDMAs from discovery table.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When overridden with module param, directly read discovery info
from discovery binary instead of reading from VRAM.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Some AMD APUs may not have a dedicated VRAM. On such platforms the GART
table should be allocated on the system memory. When real vram size is
zero, place the GART table in system memory and create an SG BO to make
it GPU accessible.
v2: fix includes
Reviewed-by: Felix Kuehling <[email protected]>
(rajneesh: removed set_memory_wc workaround)
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add logic for fine grain clock gating logic for GFX v9.4.3. The feature
will be controlled using CG flags. Also, make a change so that RLC safe
mode entry/exit is done only once during CG update sequence.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On AMD APP APUs, to make UTCL2 snoop CPU caches, its not sufficient to
rely on xgmi connected flag so add the logic to use is_app_apu to
program the PDE_REQUEST_PHYSICAL bit correctly for gfxhub and mmhub
both.
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For GFX v9_4_3, set MTYPE_UC for memory access over PCIe.
v4 - add missing indentation pointed out by Felix and add his
reviewed-by tag.
v3 - add missing logic for the svm path.
v2 - add amdgpu_xgmi_same_hive to separate access over xgmi from pcie
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Amber Lin <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Each compute cluster gets 8 compute queues in GFX v9.4.3. Fix the EOP
buffer allocation so that compute queue on every XCC gets a unique
address.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Tested-and-Reviewed-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
ASICs with GFX 9.4.3 support 48-bit addressing.
Signed-off-by: Lijo Lazar <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use the right register for semaphore release during invalidation.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Setup rolling current_logical_xcc_id in MQD for GFX9.4.3
to ensure each queue starts at a different place and prevent
hotspotting issues. Also, remove updating current_logical_xcc_id
during queue update.
Suggested-by: Joseph Greathouse <[email protected]>
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There is no need to check return value, as the function internally
used - amdgpu_discovery_read_binary_from_vram() - returns void.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The AMDGPU_GFXHUB was bind to each xcc in the logical order.
Thus convert the node_id to logical xcc_id to index the
correct AMDGPU_GFXHUB. And "node_id / 4" can get the correct
AMDGPU_MMHUB0 index.
Signed-off-by: Le Ma <[email protected]>
Tested-by: Asad kamal <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In GFX 9.4.3, there can be more than 8 SDMA engines.
As a result, extended_engine_sel and engine_sel fields
in MAP_QUEUES packet need to be updated to allow correct
mapping of SDMA queues to these SDMA engines.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Apply the GFXIP 9.4.3 specific snoop and mtype settings for various
scenarios such as APU, APU in Carveout mode and dGPU mode.
Note: This is expected to change due to:
1 - NPS > 1 support in future
2 - Hardware bugs found during initial asic bringup.
Cc: Graham Sider <[email protected]>
Cc: Hawking Zhang <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use a mask of available active clusters instead of using only the number
of active clusters.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
SDMA instances per active cluster and SDMA instance mask are used
to find the number of active clusters.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move soc specific configuration details to aqua vanjaram specific file.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Just like the KIQ, KCQ need to clear the doorbell related regs as well
to avoid hangs when to load driver again after unloading.
Signed-off-by: Shiwu Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On ASICs with PSPv13.0.6, TMR is reserved at boot time. There is no need
to allocate TMR region by driver. However, it's still required to send
SETUP_TMR command to PSP.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Initialize with the IP specific functions needed for GFXHUB, GFX and
SDMA.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add functions to suspend/resume GFX instances belonging to an XCP.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add functions required to suspend/resume instances of SDMA which
are part of an XCP.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add functions required for suspend/resume of GFXHUB instances which are
part of an XCP.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For GFXv9.4.3, use SOC level partition switch implementation rather than
keeping them at GFX IP level. Change the exisiting implementation in
GFX IP for keeping partition mode and restrict it to only GFX related
switch.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add function to initialize soc configuration information for GC 9.4.3
ASICs. Use it to map IPs and other SOC related information once IP
configuration information is available through discovery.
For GC9.4.3 compute partition related callbacks are initialized as part
of configuration init.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Switching the partition mode configuration of ASIC is SOC
level function rather than something at GFX core level. Add
partition mode switch functions as SOC specific callbacks.
Implement the XCP manager callbacks needed for partition
switch for GC 9.4.3 based ASICs.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Within a device, an accelerator core partition can be constituted with
different IP instances. These partitions are spatial in nature. Number
of partitions which can exist at the same time depends on the 'partition
mode'. Add a manager entity which is responsible for switching between
different partition modes and maintaining partitions. It is also
responsible for suspend/resume of different partitions.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
SDMA 4.4.2 supports multiple instances. Add functions to support
handling of each SDMA instance separately.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
GFXHUB 1.2 supports multiple XCC instances. Add XCC specific functions
to handle XCC instances separately.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add more XCC specific functions and use them from IP block functions.
RLC, CP functions are further split to have xcc specific versions.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add 'xcc' prefix to xcc specific functions to distinguish from IP block
functions.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On GPXIP 9.4.3 APU, in no carveout mode there is no real vram heap and
could be emulated by the driver over the interleaved NUMA system memory
and the APU could also be in the carveout mode during early development
stage or otherwise for debugging purpose so introduce a new member in
amdgpu_gmc to figure out whether the APU is in the native mode as per
the production configuration. AMD_IS_APU cannot be used for Accelerated
Processing Platform APUs as it might be used in a different context on
previous generations or on small APUs.
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Tested-by: Graham Sider <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Output IH cookie node_id and translate it to the corresponding AID id
and XCC id, to help debug the GPU page fault.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For driver de-init like rmmod operations those partition specific
attributes need to be removed accordingly.
Signed-off-by: Shiwu Zhang <[email protected]>
Reviewed-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For gfx_v9_4_3 and beyond, struct kiq has its own mqd_backup pointer
rather than using the last pointer from mec struct. Then the kfree
operation on the pointer from the mec struct should be removed otherwise
it will cause double free on the first kcq's mqd_backup buffer on XCD1.
Signed-off-by: Shiwu Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|