|
Use WREG_SOC15x() instead of WREG32(SOC15_REG_OFFSET())
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Number of instances is extended.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
New ip_discovery binary size is increased.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Version 4 supports 64-bit IP base addresses.
Signed-off-by: Le Ma <[email protected]>
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
New doorbell index assignment is used by aqua_vanjaram.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For aqua vanjaram, add mapping for logical to physical
instances.
v2:
Register accesses on bare metal should be based on physical
instance. Use GET_INST() to get physical instance.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add a mask of SDMA instances available for use. On certain ASIC configs,
not all SDMA instances are available for software use.
v2:
Change sdma mask type to uint32_t (Le)
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add XCC logical to physical instance map for aqua vanjaram
v2:
Keep look up table only for required IPs, for others return
default mapping (Felix).
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There are four basic reasons for this change:
1. The number of rings expands a lot on aqua_vanjaram, and adjusting the old
assignment cannot place each ring in a contiguous doorbell space.
2. The SDMA doorbell index should not exceed 0x1FF on aqua_vanjaram due to the
regDOORBELLx_CTRL_ENTRY.BIF_DOORBELLx_RANGE_OFFSET_ENTRY field width.
3. Redesigning the doorbell assignment and unifying the calculation as
"start + ring/inst id" makes the code much more concise.
4. Defining only START/END keeps the table simple.
v2: (Lijo)
1. replace name
2. use num_inst_per_aid/sdma_doorbell_range instead of hardcoding
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
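The "start + ring/inst id" calculation and the 0x1FF limit mentioned above can be illustrated with a minimal sketch. The struct, the function names and the example range values are hypothetical and are not taken from the driver; only the 0x1FF limit comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Limit stated in the commit message for SDMA doorbells on aqua_vanjaram. */
#define SDMA_DOORBELL_MAX 0x1FFu

/* Hypothetical table entry that only records START and END. */
struct doorbell_range {
    uint32_t start;
    uint32_t end;
};

/*
 * Unified calculation: index = start + ring/instance id.
 * Returns -1 when the id would fall outside [start, end].
 */
static int64_t doorbell_index(const struct doorbell_range *r, uint32_t id)
{
    assert(r->start <= r->end);
    if (id > r->end - r->start)
        return -1;
    return (int64_t)r->start + id;
}

/* Check a computed index against the SDMA field-width limit. */
static int sdma_doorbell_valid(int64_t index)
{
    return index >= 0 && index <= (int64_t)SDMA_DOORBELL_MAX;
}
```

With a START/END pair per IP block, every ring derives its doorbell from one formula, so no per-ring table entries are needed.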
|
|
In GC v9.4.3 there are multiple XCCs. The physical instance number is
required to get the right register offset. Use the GET_INST API for
that.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On initialization set the partition mode correctly to SPX (default) or
any other user specified partition mode. Use switch_compute_partition
API so that all settings are initialized correctly.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update interrupt handling in CPX mode for GFX9.4.3 by using the
VMID space instead of SDMA client id to determine if an interrupt
should be processed by a KFD node. This is especially needed for
handling retry faults from MMHUB.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
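As a rough illustration of filtering interrupts by VMID space, the sketch below assumes each node owns one contiguous VMID range. The struct and field names are hypothetical, and the real driver logic may differ.

```c
#include <assert.h>

/* Hypothetical per-node VMID range; assumed contiguous for this sketch. */
struct node_vmid_range {
    unsigned int first_vmid;
    unsigned int last_vmid;
};

/*
 * Decide whether an interrupt should be processed by a node by checking
 * that its VMID lies inside the node's VMID range, rather than looking
 * at the client id of the interrupt source.
 */
static int irq_belongs_to_node(const struct node_vmid_range *node,
                               unsigned int vmid)
{
    assert(node->first_vmid <= node->last_vmid);
    return vmid >= node->first_vmid && vmid <= node->last_vmid;
}
```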
|
|
Fix the if condition which causes dynamic repartitioning
to fail when trying to switch to DPX mode.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Amber Lin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For GFX 9.4.3, use the logical to physical mapping table
to get the correct XCD instance when accessing registers on
bare metal.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Amber Lin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
GFX_9_4_3 supports multi-XCDs and multi-AIDs in one GPU device. SWS needs
to program IH_VMID_x_LUT with the specified XCC instance and the corresponding
AID instance.
Signed-off-by: Amber Lin <[email protected]>
Reviewed-by: Mukul Joshi <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It's not required for the compute pipeline and causes a soft lockup on emulation
because the write takes a long time.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use s2a entry 5/6 registers to decode sdma doorbell trans on different AIDs,
which aligns with the entry table in the SHUB spec, and leave entry 4 dedicated to
the VCN doorbell to avoid conflicts.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On GFX 9.4.3, there can be multiple KFD nodes. As a result,
SMI events for SVM, queue evict/restore should be raised for
each node independently.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Program partition status register to reflect the current partition mode.
Partition capability register is for capability and is a one-time setting.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This work is required for GC 9.4.3 before memory partitions per node
can be supported in SVM. When multiple partitions are configured,
every BO should be allocated inside one specific partition, which
corresponds to the current amdgpu_device and kfd_node.
v2: squash in compilation fix (Alex)
v3: squash in fix for pre-gfx 9.4.3 (Alex)
v4: squash in best_loc fix (Alex)
Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The packet expects only a 16-bit register offset. Hence pass the register
offset that is local to each XCC.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add VCN multiple AIDs support.
v2: squash in FW setting fix (Alex)
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update clock gate setting.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add JPEG multiple AIDs support.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update vcn doorbell range to support multiple AIDs.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It needs to be done only for XCC instances in non-AID0. Use the physical
instance to determine non-AID0 XCC instances.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For ASICs with sdma IP v4.4.2, add mapping for logical to physical
instances.
v2:
Register accesses on bare metal should be based on physical
instance. Use GET_INST() to get physical instance.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add a mask of SDMA instances available for use. On certain ASIC configs,
not all SDMA instances are available for software use.
v2:
Change sdma mask type to uint32_t (Le)
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Register accesses need to be based on physical instance on bare metal.
Pass the right instance using logical to physical instance lookup
table before accessing registers. Add a macro GET_INST to get the right
physical instance of an IP corresponding to a logical instance.
v2: fix gfx_v9_4_3_check_rlcg_range() (Alex)
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add a map for logical to physical instances of an IP. For example, on some device
configurations, the first logical XCC may not be the first physical XCC.
Software may continue to access in logical IP instance order. The map
provides a convenient way to get to the actual physical instance.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
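The logical-to-physical map described in this commit can be sketched as follows. This is a minimal illustration, not the driver's data structure: the struct, the `MAX_INST` bound and the helper names are hypothetical, and building the map from a bitmask of present instances is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_INST 8 /* hypothetical upper bound on instances of one IP */

/* Hypothetical map: index = logical instance, value = physical instance. */
struct inst_map {
    uint8_t logical_to_phys[MAX_INST];
    unsigned int num_logical;
};

/*
 * Build the map from a bitmask of physically present instances.
 * Logical instance i corresponds to the i-th set bit of the mask.
 */
static void inst_map_init(struct inst_map *map, uint32_t phys_mask)
{
    unsigned int phys;

    map->num_logical = 0;
    for (phys = 0; phys < MAX_INST; phys++) {
        if (phys_mask & (1u << phys))
            map->logical_to_phys[map->num_logical++] = (uint8_t)phys;
    }
}

/* Translate a logical instance into the physical instance to access. */
static unsigned int inst_map_get_phys(const struct inst_map *map,
                                      unsigned int logical)
{
    assert(logical < map->num_logical);
    return map->logical_to_phys[logical];
}
```

Software can keep iterating over logical instances 0..num_logical-1 and translate only at the point where a register is actually accessed.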
|
|
GFX9.4.3 will support dynamic repartitioning of the GPU through sysfs.
Add device repartitioning support in KFD to repartition the GPU from one
mode to another.
v2: squash in fix ("drm/amdkfd: Fix warning kgd2kfd_unlock_kfd defined but not used")
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Currently, even if kfd_locked is set, a process is first
created and then removed to work around a race condition
in updating the kfd_locked flag. Rework kfd_locked handling to
ensure no process is created if kfd_locked is set. This
is achieved by updating kfd_locked under kfd_processes_mutex.
With this there is no need for kfd_locked to be an atomic
counter. Instead, it can be a regular integer.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
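The locking pattern described here can be sketched in user-space C. This is only an illustration under stated assumptions: it uses a POSIX mutex in place of the kernel mutex, and all names (`processes_mutex`, `locked_count`, `create_process`, and so on) are hypothetical stand-ins rather than KFD symbols.

```c
#include <assert.h>
#include <pthread.h>

/* One mutex guards both the lock counter and process creation. */
static pthread_mutex_t processes_mutex = PTHREAD_MUTEX_INITIALIZER;
static int locked_count;  /* plain int: only accessed under the mutex */
static int process_count; /* stand-in for the process table */

static void subsystem_lock(void)
{
    pthread_mutex_lock(&processes_mutex);
    locked_count++;
    pthread_mutex_unlock(&processes_mutex);
}

static void subsystem_unlock(void)
{
    pthread_mutex_lock(&processes_mutex);
    assert(locked_count > 0);
    locked_count--;
    pthread_mutex_unlock(&processes_mutex);
}

/* Returns 0 on success, -1 if creation is refused because we are locked. */
static int create_process(void)
{
    int ret = 0;

    pthread_mutex_lock(&processes_mutex);
    if (locked_count > 0)
        ret = -1;        /* checked under the same mutex: no race window */
    else
        process_count++; /* create only when not locked */
    pthread_mutex_unlock(&processes_mutex);
    return ret;
}
```

Because the check and the creation happen inside one critical section, there is no window in which a process can be created and then has to be removed again.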
|
|
Configure the sdma doorbell settings on NBIF0 and SYSHUB of each AID
v2: fetch aid_id from amdgpu_sdma_instance (Lijo)
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
On multiple-AID platforms, bits [34:32] of the SMN address are used to access
non-AID0 register SMN addresses, and a new PCI_INDEX_HI register is introduced
to access the higher bits.
v2: rebase on latest register accessors (Alex)
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
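A minimal sketch of how an address wider than 32 bits could be split into a low index and a high index is shown below. The struct and function names are hypothetical, and the sketch only models the bits [34:32] mentioned in the commit message, not the actual register programming sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of values written to the index and index-hi registers. */
struct smn_index {
    uint32_t lo; /* address bits [31:0] */
    uint32_t hi; /* address bits [34:32] */
};

/* Split an SMN address into its low 32 bits and bits [34:32]. */
static struct smn_index smn_split(uint64_t smn_addr)
{
    struct smn_index idx;

    assert((smn_addr >> 35) == 0); /* this sketch only models bits [34:0] */
    idx.lo = (uint32_t)(smn_addr & 0xFFFFFFFFull);
    idx.hi = (uint32_t)((smn_addr >> 32) & 0x7u);
    return idx;
}
```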
|
|
On GC 9.4.3, we are removing the EOP buffer.
If we specify 0 for the size, CP_HQD_EOP_CONTROL ends up with
an incorrect value as the order_size_2 calculation does not handle 0.
Fix it by using zero for the MQD entry for EOP size 0.
v2: Reworked code with a conditional assignment and fixed style issues.
Signed-off-by: David Belanger <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
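The conditional assignment mentioned in v2 can be sketched as follows. The helper names are hypothetical, and the encoding used here (log2 of the size in dwords, minus one) is an assumption made for illustration rather than a statement of the hardware format.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(log2(n)) for n >= 1. Like most log2 helpers, undefined for 0. */
static unsigned int order_base_2_sketch(uint32_t n)
{
    unsigned int order = 0;

    assert(n != 0);
    while ((1ull << order) < n)
        order++;
    return order;
}

/*
 * Compute the EOP size field. A size of 0 (no EOP buffer) is handled
 * explicitly instead of being fed into the log2 calculation.
 */
static uint32_t eop_control_value(uint32_t eop_size_bytes)
{
    if (eop_size_bytes == 0)
        return 0;
    assert(eop_size_bytes >= 8); /* keeps "order - 1" from underflowing */
    return order_base_2_sketch(eop_size_bytes / 4) - 1;
}
```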
|
|
Similar to GFX9.4.2 non-A+A devices, GFX9.4.3 psp xgmi topology info is
half duplex and requires the driver to fill in the bidirectional info.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Shiwu Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use amdgpu_vcn4_fw_shared for vcn 4.0.3.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Remove unused code.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use common amdgpu_vcn_setup_ucode for ucode setup.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
New doorbell map is used for VCN 4.0.3.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add aid_id in jpeg header to support multiple AIDs.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add aid_id in vcn header to support multiple AIDs
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use vcn4 irqsrc header for VCN 4.0.3.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of number of XCCs, keep a mask of XCCs for the exact XCCs
available on the ASIC. XCC configuration could differ based on
different ASIC configs.
v2:
Rename num_xcd to num_xcc (Hawking)
Use smaller xcc_mask size, changed to u16 (Le)
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
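Keeping a mask instead of a count still allows the count to be derived, while also recording exactly which instances exist. A minimal sketch, with hypothetical helper names and the u16 mask width taken from the v2 note:

```c
#include <assert.h>
#include <stdint.h>

/* Number of XCCs = number of set bits in the mask. */
static unsigned int xcc_count(uint16_t xcc_mask)
{
    unsigned int n = 0;

    while (xcc_mask) {
        xcc_mask &= (uint16_t)(xcc_mask - 1); /* clear lowest set bit */
        n++;
    }
    return n;
}

/* Test whether a given physical XCC instance is present. */
static int xcc_is_present(uint16_t xcc_mask, unsigned int inst)
{
    assert(inst < 16);
    return (xcc_mask >> inst) & 1u;
}
```

Two configurations with the same number of XCCs but different populated instances have equal counts and different masks, which a plain counter cannot express.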
|
|
Add the xgmi LFB_CNTL/LFB_SIZE reg addresses to fetch the xgmi info from.
v2: move get_xgmi_info() to GC_V9_4_3 specific source files to utilize
the register definitions specific for GC_V9_4_3
v3: remove the duplicated register definitions
v4: enable xgmi based on asic_type as XGMI_IP ver is not available
yet for IP discovery
Signed-off-by: Shiwu Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Acked-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
HBM3 will be supported in some dGPU programs.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The local memory info needs to be fetched before the GPU node is added
to topology. Without this, the sysfs is incorrectly populated and the
size is reported as 0. This was causing rocr tests to fail. This issue
was caused by a bad merge.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Amber Lin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update SDMA queue information for SDMA 4.4.2.
Signed-off-by: Mukul Joshi <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When creating a user-mode SDMA queue, CP FW expects the
driver to set the virtual SDMA engine id in the MAP_QUEUES
packet instead of the physical SDMA engine id.
Each partition node's virtual SDMA numbering should start
from 0. However, when allocating a doorbell for the queue,
KFD needs to allocate it from the doorbell space
corresponding to the physical SDMA engine id, otherwise
the hardware will not see the doorbell press.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Amber Lin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
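The split between virtual and physical engine ids can be sketched as below. This assumes, purely for illustration, that each partition node owns a contiguous block of physical SDMA engines; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical description of the SDMA engines owned by one node. */
struct node_sdma {
    unsigned int first_phys_engine; /* first physical engine of this node */
    unsigned int num_engines;       /* engines owned by this node */
};

/* Ids recorded for one user-mode SDMA queue. */
struct sdma_queue_ids {
    unsigned int virt_engine; /* reported to firmware, starts at 0 per node */
    unsigned int phys_engine; /* selects the doorbell space */
};

/* Derive both ids from the node-local (virtual) engine number. */
static struct sdma_queue_ids sdma_queue_ids_for(const struct node_sdma *node,
                                                unsigned int virt_engine)
{
    struct sdma_queue_ids ids;

    assert(virt_engine < node->num_engines);
    ids.virt_engine = virt_engine;
    ids.phys_engine = node->first_phys_engine + virt_engine;
    return ids;
}
```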
|
|
The PSP TA will only provide xGMI topology info for links between GPU
sockets so links between partitions from different sockets will be
hardcoded as 3 xGMI hops with 1 hop weighted as xGMI and 2 hops
weighted with a new intra-socket weight to indicate the longest
possible distance.
If the link between a partition and the CPU is non-PCIe, then assume
the CPU (CCDs) is located within the same socket as the partition
and represent the link as an intra-socket weighted single hop XGMI link
with memory bandwidth.
Links between partitions within a single socket will be abstracted as
single hop xGMI links weighted with the new intra-socket weight and
will have memory bandwidth.
Finally, use the unused function bits in the location ID to represent the
coordinates of the compute partition within its socket.
A follow-on patch will resolve the requirement for GPU socket xGMI
link representation sometime later.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|