Age | Commit message (Collapse) | Author | Files | Lines |
|
On GC 9.4.3, we are removing the EOP buffer.
If we specify 0 for the size, CP_HQD_EOP_CONTROL ends up with
incorrect value as order_size_2 calculations does not handle 0.
Fix it by using zero for the MQD entry for EOP size 0.
v2: Reworked code with a conditional assignment and fixed style issues.
Signed-off-by: David Belanger <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Similar to GFX9.4.2 non-A+A devices, GFX9.4.3 psp xgmi topology info is
half duplex and requires the driver to fill in the bidirectional info.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Shiwu Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use amdgpu_vcn4_fw_shared for vcn 4.0.3.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Remove unused code.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use common amdgpu_vcn_setup_ucode for ucode setup.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
New doorbell map is used for VCN 4.0.3.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add aid_id in jpeg header to support multiple AIDs.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add aid_id in vcn header to support multiple AIDs
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use vcn4 irqsrc header for VCN 4.0.3.
Signed-off-by: James Zhu <[email protected]>
Acked-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of number of XCCs, keep a mask of XCCs for the exact XCCs
available on the ASIC. XCC configuration could differ based on
different ASIC configs.
v2:
Rename num_xcd to num_xcc (Hawking)
Use smaller xcc_mask size, changed to u16 (Le)
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add the xgmi LFB_CNTL/LBF_SIZE reg addresses to fetch the xgmi info from.
v2: move get_xgmi_info() to GC_V9_4_3 sepecific source files to utilize
the register definitions specific for GC_V9_4_3
v3: remove the duplicated register definitions
v4: enable xgmi based on asic_type as XGMI_IP ver is not available
yet for IP discovery
Signed-off-by: Shiwu Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Ack-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
hbm3 will be supported in some dgpu program
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The local memory info needs to be fetched before the GPU node is added
to topology. Without this, the sysfs is incorrectly populated and the
size is reported as 0. This was causing rocr tests to fail. This issue
was caused because of a bad merge.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Amber Lin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update SDMA queue information for SDMA 4.4.2.
Signed-off-by: Mukul Joshi <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When creating a user-mode SDMA queue, CP FW expects
driver to use/set virtual SDMA engine id in MAP_QUEUES
packet instead of using the physical SDMA engine id.
Each partition node's virtual SDMA number should start
from 0. However, when allocating doorbell for the queue,
KFD needs to allocate the doorbell from doorbell space
corresponding to the physical SDMA engine id, otherwise
the hwardware will not see the doorbell press.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Amber Lin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The PSP TA will only provide xGMI topology info for links between GPU
sockets so links between partitions from different sockets will be
hardcoded as 3 xGMI hops with 1 hops weighted as xGMI and 2 hops
weighted with a new intra-socket weight to indicate the longest
possible distance.
If the link between a partition and the CPU is non-PCIe, then assume
the CPU (CCDs) is located within the same socket as the partition
and represent the link as an intra-socket weighted single hop XGMI link
with memory bandwidth.
Links between partitions within a single socket will be abstracted as
single hop xGMI links weighted with the new intra-socket weight and
will have memory bandwidth.
Finally, use the unused function bits in the location ID to represent the
coordinates of the compute partition within its socket.
A follow on patch will resolve the requirement for GPU socket xGMI
link representation sometime later.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Four basic reasons as below to do the change:
1. number of ring expand a lot on GC 9.4.3, and adjustment on old
assignment cannot make each ring in a continuous doorbell space.
2. the SDMA doorbell index should not exceed 0x1FF on SDMA 4.2.2 due to
regDOORBELLx_CTRL_ENTRY.BIF_DOORBELLx_RANGE_OFFSET_ENTRY field width.
3. re-design the doorbell assignment and unify the calculation as
"start + ring/inst id" will make the code much concise.
4. only defining the START/END makes the table look simple
v2: (Lijo)
1. replace name
2. use num_inst_per_aid/sdma_doorbell_range instead of hardcoding
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Otherwise the EOP interrupt on non-AID0 cannot route to IH0.
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
KIQ doorbell_index is non-zero from XCC1, thus need to left-shift it like
other rings.
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Like GFXHUB, set MMHUB0 bitmask for each AID.
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
With different node_id, the SDMA interrupt from multiple AIDs can be
distinguished by sw driver.
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Correct this though the value is same across different vmhub.
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Mmhub on each AID needs to be initialized respectively
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Allocate new sdma doorbell index for the instances only on AID1 for now.
Todo: there's limitation that SDMA doorbell index on SDMA 4.4.2 needs to be
less than 0x1FF, so the tail part in _AMDGPU_VEGA20_DOORBELL_ASSIGNMENT is not
enough to store sdma doorbell range on maximum 4 AIDs if doorbell_range is 20.
So it looks better to create a new doorbell index assignment table for 4.4.2.
v2: change "(x << 1) + 2" to "(x + 1) << 1" for readability.
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Initialize SDMA instances on each AID.
v2: revise coding fault in hw_fini
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
add some elements below:
- num_aid
- aid_id for each sdma instance
- num_inst_per_aid for sdma
and extend macro size below:
- SDMA_MAX_INSTANCES to 16
- AMDGPU_MAX_RINGS to 96
- AMDGPU_MAX_HWIP_RINGS to 32
v2: move aid_id from amdgpu_ring to amdgpu_sdma_instance. (Lijo)
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Previously for vega10, the sdma_doorbell_range is only enough for sdma
gfx queue, thus the index on second doorbell page is allocated for sdma
page queue. From vega20, the sdma_doorbell_range on 1st page is enlarged.
Therefore, just leverage these index instead of allocation on 2nd page.
v2: change "(x << 1) + 2" to "(x + 1) << 1" for readability and add comments.
Signed-off-by: Le Ma <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set RETRY_PERMISSION_OR_INVALID_PAGE_FAULT bit in VM_CONTEXT1_CNTL
as well so XNACK can be enabled in the SQ per process.
Signed-off-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use the new atomfirmware initialization logic for GC 9.4.3 based ASICs
also. ASIC init logic doesn't consider boot clocks during init.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Recently introduced commit "drm/amdgpu: Set cache coherency
for GC 9.4.3" did not update the settings applicable for svm ranges.
Add the coherence settings for svm ranges for GFX IP 9.4.3.
Reviewed-by: Amber Lin <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Currently, in CPX mode, the CP_HYP_XCP_CTL register is programmed
incorrectly with the number of XCCs in the partition. As a result,
HIQ doesn't work in CPX mode. Fix this by programming the correct
number of XCCs in a partition, which is 1, in CPX mode.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch updates SDMA queue management for multi XCC in GFX9.4.3.
- Allocate/deallocate SDMA queues from the correct SDMA engines
based on the partition mode.
- Updates the kgd2kfd interface to fetch the correct SDMA register
addresses.
- It also fixes dumping correct SDMA queue info in debugfs.
v2: squash in fix "drm/amdkfd: Fix XGMI SDMA user-mode queue allocation"
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update simd_count and array_count node properties to report
values multiplied by number of XCCs in the partition.
Signed-off-by: Mukul Joshi <[email protected]>
Tested-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
During DQM tear down, call DQM stop to unitialize HIQ and
associated memory allocated during packet manager init.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Fix VM fault reporting and clear VM fault register
for XCC1.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Context save handling needs to be updated for a multi XCC
setup:
- On a multi XCC setup, KFD needs to report context save base
address and size for each XCC in MQD.
- Thunk will allocate a large context save area covering all
XCCs which will be equal to: num_of_xccs in a partition * size
of context save area for 1 XCC. However, it will report only the
size of context save area for 1 XCC only in the ioctl call.
- Driver then setups the MQD correctly using the size passed from
Thunk and information about number of XCCs in a partition.
- Update get_wave_state function to return context save area
for all XCCs in the partition.
v2: update the get_wave_state function for mqd manager v11 (Morris)
Signed-off-by: Mukul Joshi <[email protected]>
Tested-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Morris Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add XCC instance to select the correct KIQ ring when
flushing TLBs on a multi-XCC setup.
Signed-off-by: Mukul Joshi <[email protected]>
Tested-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Gfx 9 starts to have multiple XCC instances in one device. Add instance
parameter to kgd2kfd functions where XCC instance was hard coded as 0.
Also, update code to pass the correct instance number when running
on a multi-XCC setup.
v2: introduce the XCC instance to gfx v11 (Morris)
v3: rebase (Alex)
Signed-off-by: Amber Lin <[email protected]>
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Tested-by: Amber Lin <[email protected]>
Signed-off-by: Morris Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In a device that supports multiple XCCs, unlike AQL queues, the PM4 queue
will be only processed in one XCC in the partitioning. This patch
re-purposes the queue percentage variable in create queue and update
queue ioctl for the user space to specify the target XCC.
Signed-off-by: Amber Lin <[email protected]>
Signed-off-by: Mukul Joshi <[email protected]>
Tested-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update MQD management for both HIQ and user-mode compute
queues on a multi XCC setup. MQDs needs to be allocated,
initialized, loaded and destroyed for each XCC in the KFD
node.
v2: squash in fix "drm/amdkfd: Fix SDMA+HIQ HQD allocation on GFX9.4.3"
Signed-off-by: Mukul Joshi <[email protected]>
Signed-off-by: Amber Lin <[email protected]>
Tested-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch introduces multi-partition support in KFD.
This patch includes:
- Support for maximum 8 spatial partitions in KFD.
- Initialize one HIQ per partition.
- Management of VMID range depending on partition mode.
- Management of doorbell aperture space between all
partitions.
- Each partition does its own queue management, interrupt
handling, SMI event reporting.
- IOMMU, if enabled with multiple partitions, will only work
on first partition.
- SPM is only supported on the first partition.
- Currently, there is no support for resetting individual
partitions. All partitions will reset together.
Signed-off-by: Mukul Joshi <[email protected]>
Tested-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Introduce a new structure, kfd_node, which will now represent
a compute node. kfd_node is carved out of kfd_dev structure.
kfd_dev struct now will become the parent of kfd_node, and will
store common resources such as doorbells, GTT sub-alloctor etc.
kfd_node struct will store all resources specific to a compute
node, such as device queue manager, interrupt handling etc.
This is the first step in adding compute partition support in KFD.
v2: introduce kfd_node struct to gc v11 (Hawking)
v3: make reference to kfd_dev struct through kfd_node (Morris)
v4: use kfd_node instead for kfd isr/mqd functions (Morris)
v5: rebase (Alex)
Signed-off-by: Mukul Joshi <[email protected]>
Tested-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Hawking Zhang <[email protected]>
Signed-off-by: Morris Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Mode2 reset for v13.0.6 has similar workflow as v13.0.2
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add additional XCC programming sequences.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
A new field nodeid in interrupt cookie indicates the node ID.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Shiwu Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
With the mec FW update to utilize the mqd base set by
driver for kcq mapping, slave kcq ring test and IB test
can be re-enabled.
Signed-off-by: Shiwu Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Each xcc has its own sratch_reg offset
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Should use vcn_ring0_1 instead of doorbell index to
set nbio doorbell range.
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Sonny Jiang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable jpeg doorbell for jpeg4.0.3.
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Enable vcn doorbell for vcn4.0.3.
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|