Age | Commit message | Author | Files | Lines
2023-06-09 | drm/amdkfd: EOP Removal - Handle size 0 correctly | David Belanger | 1 | -2/+7
On GC 9.4.3, the EOP buffer is being removed. If 0 is specified for the size, CP_HQD_EOP_CONTROL ends up with an incorrect value because the order_size_2 calculation does not handle 0. Fix it by using zero for the MQD entry when the EOP size is 0. v2: Reworked code with a conditional assignment and fixed style issues. Signed-off-by: David Belanger <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
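The zero-size guard described above could look roughly like the following. This is a minimal sketch, not the driver's code: `ceil_log2_u32` and `eop_control_field` are hypothetical names, and the encoding (log2 of the size in dwords, minus one) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest n such that (1 << n) >= v, for v >= 1. */
static unsigned int ceil_log2_u32(uint32_t v)
{
    unsigned int n = 0;

    assert(v >= 1);
    while (n < 32 && ((uint64_t)1 << n) < v)
        n++;
    return n;
}

/*
 * Assumed encoding: field = log2(size in dwords) - 1.
 * A size of 0 means "no EOP buffer" and maps directly to 0, so the
 * log2 math is never run on a zero input.
 */
static uint32_t eop_control_field(uint32_t eop_size_bytes)
{
    if (eop_size_bytes == 0)
        return 0;

    /* Assumption: a non-zero buffer holds at least two dwords. */
    assert(eop_size_bytes >= 8);
    return ceil_log2_u32(eop_size_bytes / 4) - 1;
}
```

Without the early return, a size of 0 would reach the logarithm and the subtraction, which is the kind of input the commit message says the original calculation did not handle.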
2023-06-09 | drm/amdgpu: reflect psp xgmi topology info for gfx9.4.3 | Jonathan Kim | 1 | -4/+7
Similar to GFX9.4.2 non-A+A devices, GFX9.4.3 psp xgmi topology info is half duplex and requires the driver to fill in the bidirectional info. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Shiwu Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu/vcn: update amdgpu_fw_shared to amdgpu_vcn4_fw_shared | James Zhu | 1 | -29/+11
Use amdgpu_vcn4_fw_shared for vcn 4.0.3. Signed-off-by: James Zhu <[email protected]> Acked-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu/vcn: remove unused code | James Zhu | 1 | -121/+0
Remove unused code. Signed-off-by: James Zhu <[email protected]> Acked-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu/vcn: update ucode setup | James Zhu | 1 | -10/+1
Use common amdgpu_vcn_setup_ucode for ucode setup. Signed-off-by: James Zhu <[email protected]> Acked-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu/vcn: update new doorbell map | James Zhu | 3 | -5/+5
New doorbell map is used for VCN 4.0.3. Signed-off-by: James Zhu <[email protected]> Acked-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu/jpeg: update jpeg header to support multiple AIDs | James Zhu | 1 | -0/+2
Add aid_id in jpeg header to support multiple AIDs. Signed-off-by: James Zhu <[email protected]> Acked-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu/vcn: update vcn header to support multiple AIDs | James Zhu | 1 | -0/+3
Add aid_id in vcn header to support multiple AIDs Signed-off-by: James Zhu <[email protected]> Acked-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu/vcn: use vcn4 irqsrc header for VCN 4.0.3 | James Zhu | 1 | -3/+3
Use vcn4 irqsrc header for VCN 4.0.3. Signed-off-by: James Zhu <[email protected]> Acked-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Change num_xcd to xcc_mask | Lijo Lazar | 7 | -99/+141
Instead of number of XCCs, keep a mask of XCCs for the exact XCCs available on the ASIC. XCC configuration could differ based on different ASIC configs. v2: Rename num_xcd to num_xcc (Hawking) Use smaller xcc_mask size, changed to u16 (Le) Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
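To illustrate why a mask carries more information than a count, here is a small sketch with hypothetical helper names (`num_xcc`, `xcc_present`). It assumes only what the message states: the mask is 16 bits wide and each set bit marks one available XCC.

```c
#include <assert.h>
#include <stdint.h>

/* Number of XCCs available: the count of set bits in the mask. */
static unsigned int num_xcc(uint16_t xcc_mask)
{
    unsigned int n = 0;

    while (xcc_mask) {
        xcc_mask &= (uint16_t)(xcc_mask - 1); /* clear lowest set bit */
        n++;
    }
    return n;
}

/* Whether a specific XCC id is present on this configuration. */
static int xcc_present(uint16_t xcc_mask, unsigned int id)
{
    assert(id < 16);
    return (xcc_mask >> id) & 1u;
}
```

A count of 4 cannot distinguish the masks 0x000F and 0x0055, whereas the mask records exactly which XCCs exist, and the count can still be derived from it.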
2023-06-09 | drm/amdgpu: add the support of XGMI link for GC 9.4.3 | Shiwu Zhang | 2 | -4/+47
Add the xgmi LFB_CNTL/LFB_SIZE register addresses from which to fetch the xgmi info. v2: move get_xgmi_info() to GC_V9_4_3 specific source files to use the register definitions specific to GC_V9_4_3 v3: remove the duplicated register definitions v4: enable xgmi based on asic_type as XGMI_IP ver is not available yet for IP discovery Signed-off-by: Shiwu Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Acked-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: add new vram type for dgpu | Hawking Zhang | 2 | -0/+2
hbm3 will be supported in some dgpu program Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Populate memory info before adding GPU node to topology | Mukul Joshi | 1 | -2/+2
The local memory info needs to be fetched before the GPU node is added to topology. Without this, the sysfs is incorrectly populated and the size is reported as 0. This was causing rocr tests to fail. This issue was caused because of a bad merge. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Amber Lin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Add SDMA info for SDMA 4.4.2 | Mukul Joshi | 1 | -0/+1
Update SDMA queue information for SDMA 4.4.2. Signed-off-by: Mukul Joshi <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Fix SDMA in CPX mode | Mukul Joshi | 1 | -4/+15
When creating a user-mode SDMA queue, CP FW expects the driver to set the virtual SDMA engine id in the MAP_QUEUES packet instead of the physical SDMA engine id. Each partition node's virtual SDMA numbering should start from 0. However, when allocating the doorbell for the queue, KFD needs to allocate it from the doorbell space corresponding to the physical SDMA engine id; otherwise the hardware will not see the doorbell press. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Amber Lin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
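The two-id scheme described above can be sketched as follows. This is an illustration only: it assumes each partition owns a contiguous run of physical SDMA engines, and the struct and function names are hypothetical rather than taken from the driver.

```c
#include <assert.h>

/* Hypothetical description of the SDMA engines owned by one partition. */
struct partition_sdma {
    unsigned int first_phys_engine; /* first physical engine owned */
    unsigned int num_engines;       /* number of engines owned */
};

/* Id for the MAP_QUEUES packet: numbered from 0 within the partition. */
static unsigned int sdma_virtual_id(const struct partition_sdma *p,
                                    unsigned int phys_engine)
{
    assert(phys_engine >= p->first_phys_engine);
    assert(phys_engine < p->first_phys_engine + p->num_engines);
    return phys_engine - p->first_phys_engine;
}

/* Id for choosing the doorbell space: the physical engine, unchanged. */
static unsigned int sdma_doorbell_engine(unsigned int phys_engine)
{
    return phys_engine;
}
```

For a partition that owns physical engines 4 and 5, a queue on engine 5 would be mapped with virtual id 1 while its doorbell still comes from the range belonging to physical engine 5.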
2023-06-09 | drm/amdkfd: add gpu compute cores io links for gfx9.4.3 | Jonathan Kim | 4 | -16/+47
The PSP TA will only provide xGMI topology info for links between GPU sockets, so links between partitions from different sockets will be hardcoded as 3 xGMI hops, with 1 hop weighted as xGMI and 2 hops weighted with a new intra-socket weight to indicate the longest possible distance. If the link between a partition and the CPU is non-PCIe, then assume the CPU (CCDs) is located within the same socket as the partition and represent the link as an intra-socket weighted single hop XGMI link with memory bandwidth. Links between partitions within a single socket will be abstracted as single hop xGMI links weighted with the new intra-socket weight and will have memory bandwidth. Finally, use the unused function bits in the location ID to represent the coordinates of the compute partition within its socket. A follow-on patch will resolve the requirement for GPU socket xGMI link representation sometime later. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: introduce new doorbell assignment table for GC 9.4.3 | Le Ma | 4 | -32/+33
There are four basic reasons for the change: 1. the number of rings expands a lot on GC 9.4.3, and adjusting the old assignment cannot place each ring in a contiguous doorbell space. 2. the SDMA doorbell index should not exceed 0x1FF on SDMA 4.4.2 due to the regDOORBELLx_CTRL_ENTRY.BIF_DOORBELLx_RANGE_OFFSET_ENTRY field width. 3. re-designing the doorbell assignment and unifying the calculation as "start + ring/inst id" makes the code much more concise. 4. defining only the START/END makes the table look simple. v2: (Lijo) 1. replace name 2. use num_inst_per_aid/sdma_doorbell_range instead of hardcoding Signed-off-by: Le Ma <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
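The unified "start + ring/inst id" calculation can be pictured with a short sketch. The struct, function name, and range values here are hypothetical and chosen only for illustration; the actual table layout is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table entry: only the START and END of a range are stored. */
struct doorbell_range {
    uint32_t start;
    uint32_t end; /* inclusive */
};

/* Doorbell index for instance (or ring) number inst within the range. */
static uint32_t doorbell_index(const struct doorbell_range *r, uint32_t inst)
{
    uint32_t idx = r->start + inst;

    assert(idx <= r->end); /* the instance must fit inside the range */
    return idx;
}
```

Because every ring type uses the same formula, the table only needs a start and an end per range instead of one named entry per ring.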
2023-06-09 | drm/amdgpu: program GRBM_MCM_ADDR for non-AID0 GRBM | Le Ma | 1 | -0/+3
Otherwise the EOP interrupt on non-AID0 cannot route to IH0. Signed-off-by: Le Ma <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: convert the doorbell_index to 2 dwords offset for kiq | Le Ma | 1 | -4/+3
KIQ doorbell_index is non-zero from XCC1, thus need to left-shift it like other rings. Signed-off-by: Le Ma <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
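A short sketch of the conversion named in the subject line, assuming each doorbell slot occupies two dwords. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a doorbell slot index into an offset counted in dwords. */
static uint32_t doorbell_dword_offset(uint32_t doorbell_index)
{
    assert(doorbell_index <= UINT32_MAX / 2); /* avoid shifting out bits */
    return doorbell_index << 1;
}
```

With an index of 0 the shift changes nothing, which is consistent with the message: the missing shift only becomes visible once the KIQ index is non-zero, from XCC1 onward.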
2023-06-09 | drm/amdgpu: set mmhub bitmask for multiple AIDs | Le Ma | 1 | -1/+1
Like GFXHUB, set MMHUB0 bitmask for each AID. Signed-off-by: Le Ma <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: complement the IH node_id table for multiple AIDs | Le Ma | 3 | -1/+12
With different node_id, the SDMA interrupt from multiple AIDs can be distinguished by sw driver. Signed-off-by: Le Ma <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: correct the vmhub reference for each XCD in gfxhub init | Le Ma | 1 | -4/+8
Correct this even though the value is the same across the different vmhubs. Signed-off-by: Le Ma <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: do mmhub init for multiple AIDs | Le Ma | 1 | -261/+348
The MMHUB on each AID needs to be initialized separately. Signed-off-by: Le Ma <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: assign the doorbell index for sdma on non-AID0 | Le Ma | 2 | -3/+21
Allocate a new sdma doorbell index for the instances on AID1 only, for now. Todo: there is a limitation that the SDMA doorbell index on SDMA 4.4.2 needs to be less than 0x1FF, so the tail part of _AMDGPU_VEGA20_DOORBELL_ASSIGNMENT is not large enough to store the sdma doorbell range for a maximum of 4 AIDs if doorbell_range is 20. It therefore looks better to create a new doorbell index assignment table for 4.4.2. v2: change "(x << 1) + 2" to "(x + 1) << 1" for readability. Signed-off-by: Le Ma <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: add support for SDMA on multiple AIDs | Le Ma | 2 | -7/+21
Initialize SDMA instances on each AID. v2: revise coding fault in hw_fini Signed-off-by: Le Ma <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: adjust some basic elements for multiple AID case | Le Ma | 3 | -3/+6
add some elements below: - num_aid - aid_id for each sdma instance - num_inst_per_aid for sdma and extend macro size below: - SDMA_MAX_INSTANCES to 16 - AMDGPU_MAX_RINGS to 96 - AMDGPU_MAX_HWIP_RINGS to 32 v2: move aid_id from amdgpu_ring to amdgpu_sdma_instance. (Lijo) Signed-off-by: Le Ma <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: assign the doorbell index in 1st page to sdma page queue | Le Ma | 3 | -7/+17
Previously, for vega10, the sdma_doorbell_range was only large enough for the sdma gfx queue, so the index on the second doorbell page was allocated for the sdma page queue. From vega20 onward, the sdma_doorbell_range on the 1st page is enlarged. Therefore, use these indices instead of allocating on the 2nd page. v2: change "(x << 1) + 2" to "(x + 1) << 1" for readability and add comments. Signed-off-by: Le Ma <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
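The v2 note swaps one expression for another that yields the same value, since (x + 1) << 1 = 2x + 2 = (x << 1) + 2. The sketch below checks that equivalence; the function names are hypothetical, and what x stands for in the driver is not stated in this log.

```c
#include <assert.h>

/* Original form quoted in the v2 note. */
static unsigned int shift_then_add(unsigned int x)
{
    return (x << 1) + 2;
}

/* Replacement form quoted in the v2 note. */
static unsigned int add_then_shift(unsigned int x)
{
    return (x + 1) << 1;
}

/* Assert that both forms agree for every x in [0, limit). */
static void check_forms_agree(unsigned int limit)
{
    for (unsigned int x = 0; x < limit; x++)
        assert(shift_then_add(x) == add_then_shift(x));
}
```

The second form reads as "the slot after x, converted to a two-dword offset", which is presumably the readability gain the note refers to.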
2023-06-09 | drm/amdgpu: Set XNACK per process on GC 9.4.3 | Amber Lin | 2 | -3/+5
Set RETRY_PERMISSION_OR_INVALID_PAGE_FAULT bit in VM_CONTEXT1_CNTL as well so XNACK can be enabled in the SQ per process. Signed-off-by: Amber Lin <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Use new atomfirmware init for GC 9.4.3 | Lijo Lazar | 1 | -1/+2
Use the new atomfirmware initialization logic for GC 9.4.3 based ASICs also. ASIC init logic doesn't consider boot clocks during init. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Update coherence settings for svm ranges | Rajneesh Bhardwaj | 2 | -0/+19
Recently introduced commit "drm/amdgpu: Set cache coherency for GC 9.4.3" did not update the settings applicable for svm ranges. Add the coherence settings for svm ranges for GFX IP 9.4.3. Reviewed-by: Amber Lin <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Fix CP_HYP_XCP_CTL register programming in CPX mode | Mukul Joshi | 1 | -1/+1
Currently, in CPX mode, the CP_HYP_XCP_CTL register is programmed incorrectly with the number of XCCs in the partition. As a result, HIQ doesn't work in CPX mode. Fix this by programming the correct number of XCCs in a partition, which is 1, in CPX mode. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Update SDMA queue management for GFX9.4.3 | Mukul Joshi | 5 | -41/+227
This patch updates SDMA queue management for multi XCC in GFX9.4.3. - Allocate/deallocate SDMA queues from the correct SDMA engines based on the partition mode. - Updates the kgd2kfd interface to fetch the correct SDMA register addresses. - It also fixes dumping correct SDMA queue info in debugfs. v2: squash in fix "drm/amdkfd: Fix XGMI SDMA user-mode queue allocation" Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Update sysfs node properties for multi XCC | Mukul Joshi | 1 | -2/+4
Update simd_count and array_count node properties to report values multiplied by number of XCCs in the partition. Signed-off-by: Mukul Joshi <[email protected]> Tested-by: Amber Lin <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Call DQM stop during DQM uninitialize | Mukul Joshi | 1 | -0/+8
During DQM teardown, call DQM stop to uninitialize HIQ and the associated memory allocated during packet manager init. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Fix VM fault reporting on XCC1 | Mukul Joshi | 1 | -3/+5
Fix VM fault reporting and clear VM fault register for XCC1. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Update context save handling for multi XCC setup (v2) | Mukul Joshi | 6 | -3/+67
Context save handling needs to be updated for a multi XCC setup: - On a multi XCC setup, KFD needs to report the context save base address and size for each XCC in the MQD. - Thunk will allocate a large context save area covering all XCCs, equal to: num_of_xccs in a partition * size of context save area for 1 XCC. However, it will report only the size of the context save area for 1 XCC in the ioctl call. - The driver then sets up the MQD correctly using the size passed from Thunk and information about the number of XCCs in a partition. - Update the get_wave_state function to return the context save area for all XCCs in the partition. v2: update the get_wave_state function for mqd manager v11 (Morris) Signed-off-by: Mukul Joshi <[email protected]> Tested-by: Amber Lin <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Morris Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
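The sizing rule above can be sketched as plain arithmetic. The total-size formula comes from the message; the per-XCC base address calculation assumes the slices are laid out contiguously in XCC order, which the message does not state. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Total area allocated by user space: one slice per XCC in the partition. */
static uint64_t ctx_save_total_size(uint32_t num_xccs, uint32_t size_per_xcc)
{
    assert(num_xccs >= 1);
    return (uint64_t)num_xccs * size_per_xcc;
}

/*
 * Assumed layout: slices are contiguous, so XCC i starts at
 * base + i * size_per_xcc. Only size_per_xcc is passed in the ioctl;
 * the driver supplies the XCC count itself.
 */
static uint64_t ctx_save_base_for_xcc(uint64_t base, uint32_t xcc,
                                      uint32_t size_per_xcc)
{
    return base + (uint64_t)xcc * size_per_xcc;
}
```

Under these assumptions, a 4-XCC partition with a 0x1000-byte slice needs 0x4000 bytes in total, and the slice for XCC 2 starts 0x2000 bytes past the base.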
2023-06-09 | drm/amdgpu: Add XCC inst to PASID TLB flushing | Mukul Joshi | 9 | -18/+25
Add XCC instance to select the correct KIQ ring when flushing TLBs on a multi-XCC setup. Signed-off-by: Mukul Joshi <[email protected]> Tested-by: Amber Lin <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Add XCC instance to kgd2kfd interface (v3) | Mukul Joshi | 17 | -218/+270
Gfx 9 starts to have multiple XCC instances in one device. Add instance parameter to kgd2kfd functions where XCC instance was hard coded as 0. Also, update code to pass the correct instance number when running on a multi-XCC setup. v2: introduce the XCC instance to gfx v11 (Morris) v3: rebase (Alex) Signed-off-by: Amber Lin <[email protected]> Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Tested-by: Amber Lin <[email protected]> Signed-off-by: Morris Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Add PM4 target XCC | Mukul Joshi | 4 | -4/+22
In a device that supports multiple XCCs, unlike AQL queues, the PM4 queue will be only processed in one XCC in the partitioning. This patch re-purposes the queue percentage variable in create queue and update queue ioctl for the user space to specify the target XCC. Signed-off-by: Amber Lin <[email protected]> Signed-off-by: Mukul Joshi <[email protected]> Tested-by: Amber Lin <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Update MQD management on multi XCC setup | Mukul Joshi | 10 | -57/+380
Update MQD management for both HIQ and user-mode compute queues on a multi XCC setup. MQDs need to be allocated, initialized, loaded and destroyed for each XCC in the KFD node. v2: squash in fix "drm/amdkfd: Fix SDMA+HIQ HQD allocation on GFX9.4.3" Signed-off-by: Mukul Joshi <[email protected]> Signed-off-by: Amber Lin <[email protected]> Tested-by: Amber Lin <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Add spatial partitioning support in KFD | Mukul Joshi | 7 | -77/+208
This patch introduces multi-partition support in KFD. This patch includes: - Support for maximum 8 spatial partitions in KFD. - Initialize one HIQ per partition. - Management of VMID range depending on partition mode. - Management of doorbell aperture space between all partitions. - Each partition does its own queue management, interrupt handling, SMI event reporting. - IOMMU, if enabled with multiple partitions, will only work on first partition. - SPM is only supported on the first partition. - Currently, there is no support for resetting individual partitions. All partitions will reset together. Signed-off-by: Mukul Joshi <[email protected]> Tested-by: Amber Lin <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Introduce kfd_node struct (v5) | Mukul Joshi | 38 | -495/+573
Introduce a new structure, kfd_node, which will now represent a compute node. kfd_node is carved out of the kfd_dev structure. The kfd_dev struct will now become the parent of kfd_node and will store common resources such as doorbells, the GTT sub-allocator, etc. The kfd_node struct will store all resources specific to a compute node, such as the device queue manager, interrupt handling, etc. This is the first step in adding compute partition support in KFD. v2: introduce kfd_node struct to gc v11 (Hawking) v3: make reference to kfd_dev struct through kfd_node (Morris) v4: use kfd_node instead for kfd isr/mqd functions (Morris) v5: rebase (Alex) Signed-off-by: Mukul Joshi <[email protected]> Tested-by: Amber Lin <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Signed-off-by: Morris Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add mode2 reset logic for v13.0.6 | Lijo Lazar | 1 | -0/+2
Mode2 reset for v13.0.6 has similar workflow as v13.0.2 Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add some XCC programming | Lijo Lazar | 1 | -0/+26
Add additional XCC programming sequences. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: add node_id to physical id conversion in EOP handler | Le Ma | 3 | -2/+29
A new field nodeid in interrupt cookie indicates the node ID. Signed-off-by: Le Ma <[email protected]> Reviewed-by: Shiwu Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: enable the ring and IB test for slave kcq | Shiwu Zhang | 3 | -44/+32
With the mec FW update to utilize the mqd base set by driver for kcq mapping, slave kcq ring test and IB test can be re-enabled. Signed-off-by: Shiwu Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: support gc v9_4_3 ring_test running on all xcc | Hawking Zhang | 1 | -4/+7
Each xcc has its own scratch_reg offset. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: fix vcn doorbell range setting | James Zhu | 1 | -1/+1
Should use vcn_ring0_1 instead of doorbell index to set nbio doorbell range. Signed-off-by: James Zhu <[email protected]> Reviewed-by: Sonny Jiang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu/jpeg: enable jpeg doorbell for jpeg4.0.3 | James Zhu | 1 | -2/+11
Enable jpeg doorbell for jpeg4.0.3. Signed-off-by: James Zhu <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu/vcn: enable vcn doorbell for vcn4.0.3 | James Zhu | 1 | -1/+9
Enable vcn doorbell for vcn4.0.3. Signed-off-by: James Zhu <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>