aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)AuthorFilesLines
2023-06-09drm/amdkfd: Show KFD node memory partition infoPhilip Yang2-1/+11
Show KFD node memory partition id and size, add helper function KFD_XCP_MEMORY_SIZE to get kfd node memory size, will be used later to support memory accounting per partition. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add memory partition id to amdgpu_vmPhilip Yang3-4/+10
If xcp_mgr is initialized, add mem_id to amdgpu_vm structure to store memory partition number when creating amdgpu_vm for the xcp. The xcp number is decided when opening the render device, for example /dev/dri/renderD129 is xcp_id 0, /dev/dri/renderD130 is xcp_id 1. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: Store drm node minor number for kfd nodesPhilip Yang1-2/+6
From KFD topology, application will find kfd node with the corresponding drm device node minor number, for example if partition drm node starts from /dev/dri/renderD129, then KFD node 0 with store drm node minor number 129. Application will open drm node /dev/dri/renderD129 to create amdgpu vm for kfd node 0 with the correct vm->mem_id to indicate the memory partition. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add xcp manager num_xcp_per_mem_partitionPhilip Yang2-0/+4
Used by KFD to check memory limit accounting. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: update ref_cnt before ctx freeJames Zhu3-2/+23
Update ref_cnt before ctx free. Signed-off-by: James Zhu <[email protected]> Acked-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: run partition schedule if it is supportedJames Zhu1-2/+13
Run partition schedule if it is supported during ctx init entity. Signed-off-by: James Zhu <[email protected]> Acked-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add partition schedule for GC(9, 4, 3)James Zhu1-0/+41
Implement partition schedule for GC(9, 4, 3). Signed-off-by: James Zhu <[email protected]> Acked-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: keep amdgpu_ctx_mgr in ctx structureJames Zhu2-0/+3
Keep amdgpu_ctx_mgr in ctx structure to track fpriv. v2: add missing fpriv declaration lost in rebase Signed-off-by: James Zhu <[email protected]> Acked-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add partition scheduler list updateJames Zhu3-1/+70
Add partition scheduler list update in late init and xcp partition mode switch. Signed-off-by: James Zhu <[email protected]> Acked-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: update header to support partition schedulingJames Zhu1-0/+15
Update header to support partition scheduling. Signed-off-by: James Zhu <[email protected]> Acked-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: add partition ID track in ringJames Zhu2-0/+42
Keep track partition ID in ring. Signed-off-by: James Zhu <[email protected]> Acked-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: find partition ID when open deviceJames Zhu4-0/+38
Find partition ID when open device from render device minor. Signed-off-by: Christian König <[email protected]> Signed-off-by: James Zhu <[email protected]> Reviewed-and-tested-by: Philip Yang<[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: support partition drm devicesJames Zhu6-1/+99
Support partition drm devices on GC_HWIP IP_VERSION(9, 4, 3). This is a temporary solution and will be superceded. Signed-off-by: Christian König <[email protected]> Signed-off-by: James Zhu <[email protected]> Reviewed-and-tested-by: Philip Yang<[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu/bu: update mtype_local parameter settingsGraham Sider3-8/+9
Update mtype_local module parameter to use MTYPE_RW by default. 0: MTYPE_RW (default) 1: MTYPE_NC 2: MTYPE_CC Signed-off-by: Graham Sider <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu/bu: add mtype_local as a module parameterDavid Francis4-10/+22
Selects the MTYPE to be used for local memory, (0 = MTYPE_CC (default), 1 = MTYPE_NC, 2 = MTYPE_RW) v2: squash in build fix (Alex) Reviewed-by: Graham Sider <[email protected]> Signed-off-by: David Francis <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Override MTYPE per page on GFXv9.4.3 APUsFelix Kuehling3-3/+90
On GFXv9.4.3 NUMA APUs, system memory locality must be determined per page to choose the correct MTYPE. This patch adds a GMC callback that can provide this per-page override and implements it for native mode. Carve-out mode is not yet supported and will use the safe default (remote) MTYPE for system memory. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Philip Yang <[email protected]> Reviewed-and-tested-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Fix per-BO MTYPE selection for GFXv9.4.3Felix Kuehling2-34/+30
Treat system memory on NUMA systems as remote by default. Overriding with a more efficient MTYPE per page will be implemented in the next patch. No need for a special case for APP APUs. System memory is handled the same for carve-out and native mode. And VRAM doesn't exist in native mode. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Philip Yang <[email protected]> Reviewed-and-tested-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu/bu: Add use_mtype_cc_wa module paramGraham Sider4-5/+20
By default, set use_mtype_cc_wa to 1 to set PTE coherence flag MTYPE_CC instead of MTYPE_RW by default. This is required for the time being to mitigate a bug causing XCCs to hit stale data due to TCC marking fully dirty lines as exclusive. Signed-off-by: Graham Sider <[email protected]> Reviewed-by: Joseph Greathouse <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Use legacy TLB flush for gfx943Graham Sider1-0/+12
Invalidate TLBs via a legacy flush request (flush_type=0) prior to the heavyweight flush requests (flush_type=2) in gmc_v9_0.c. This is temporarily required to mitigate a bug causing CPC UTCL1 to return stale translations after invalidation requests in address range mode. v2: squash in long term fix "drm/amdgpu: disable extra gfx943 legacy flush on rev1+" Signed-off-by: Graham Sider <[email protected]> Reviewed-by: Philip Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: For GFX 9.4.3 APU fix vram_usage valueHarish Kasiviswanathan1-3/+6
For GFX 9.4.3 APP APU VRAM is allocated in GTT domain. While freeing memory check for GTT domain instead of VRAM if it is APP APU Signed-off-by: Harish Kasiviswanathan <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Enable NPS4 CPX modePhilip Yang1-3/+3
CPX compute mode is valid mode for NPS4 memory partition mode. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: Move pgmap to amdgpu_kfd_dev structurePhilip Yang6-13/+14
VRAM pgmap resource is allocated every time when switching compute partitions because kfd_dev is re-initialized by post_partition_switch, As a result, it causes memory region resource leaking and system memory usage accounting unbalanced. pgmap resource should be allocated and registered only once when loading driver and freed when unloading driver, move it from kfd_dev to amdgpu_kfd_dev. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Skip halting RLC on GFX v9.4.3Lijo Lazar1-16/+7
RLC-PMFW handshake happens periodically when GFXCLK DPM is enabled and halting RLC may cause unexpected results. Avoid halting RLC from driver side. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Fix register accesses in GFX v9.4.3Lijo Lazar1-13/+3
Access registers with the right xcc id. Also, remove the unused logic as PG is not used in GFX v9.4.3 Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: Increase queue number per process to 255 on GFX9.4.3Mukul Joshi1-0/+7
Increase the maximum number of queues that can be created per process to 255 on GFX 9.4.3. There is no HWS limitation restricting the number queues that can be created. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Adjust the sequence to query ras error infoHawking Zhang1-6/+7
It turns out STATUS_VALID_FLAG needs to be checked ahead of any other fields. ADDRESS_VALID_FLAG and ERR_INFO_VALID_FLAG only manages ADDRESS and ERR_INFO field respectively. driver should continue poll ERR CNT field even ERR_INFO_VALD_FLAG is not set. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Initialize jpeg v4_0_3 ras functionHawking Zhang1-0/+26
Initialize jpeg v4_0_3 ras function. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add reset_ras_error_count for jpeg v4_0_3Hawking Zhang1-0/+22
Add reset_ras_error_count callback for jpeg v4_0_3. It will be used to reset jpeg ras error count. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add query_ras_error_count for jpeg v4_0_3Hawking Zhang1-0/+64
Add query_ras_error_count callback for jpeg v4_0_3. It will be used to query and log jpeg error count. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Re-enable VCN RAS if DPG is enabledHawking Zhang1-1/+26
VCN RAS enablement sequence needs to be added in DPG HW init sequence. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Initialize vcn v4_0_3 ras functionHawking Zhang1-0/+26
Initialize vcn v4_0_3 ras function Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add reset_ras_error_count for vcn v4_0_3Hawking Zhang1-0/+22
Add reset_ras_error_count callback for vcn v4_0_3. It will be used to reset vcn ras error count. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add query_ras_error_count for vcn v4_0_3Hawking Zhang1-0/+36
Add query_ras_error_count callback for vcn v4_0_3. It will be used to query and log vcn error count. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add vcn/jpeg ras err status registersHawking Zhang2-0/+573
Add new ras error status registers introduced in vcn v4_0_3 to log vcn and jpeg ras error. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Checked if the pointer NULL before use it.Gavin Wan1-8/+12
For SRIOV on some parts, the host driver does not post VBIOS. So the guest cannot get bios information. Therefore, adev->virt.fw_reserve.p_pf2vf and adev->mode_info.atom_context are NULL. Signed-off-by: Gavin Wan <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Acked-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Set memory partitions to 1 for SRIOV.Gavin Wan1-1/+7
For SRIOV, the memory partitions are set on host drover. Each VF only has one memory partition. We need set the memory partitions to 1 on guest driver for SRIOV. V2: sqaush in fix ("drm/amdgpu: Fix memory range info of GC 9.4.3 VFs") Signed-off-by: Gavin Wan <[email protected]> Acked-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Skip using MC FB Offset when APU flag is set for SRIOV.Gavin Wan1-1/+2
The MC_VM_FB_OFFSET is PF only register. It cannot be read on VF. So, the driver should not use MC_VM_FB_OFFSET address to set the address of dev->gmc.aper_base. Signed-off-by: Gavin Wan <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add PSP supporting PSP 13.0.6 SRIOV ucode init.Gavin Wan1-0/+3
Add PSP supporting PSP 13.0.6 SRIOV ucode init. Signed-off-by: Gavin Wan <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add PSP spatial parition interfaceLijo Lazar3-0/+32
Add PSP ring command interface for spatial partitioning. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Return error on invalid compute modeLijo Lazar2-3/+11
Return error if an invalid compute partition mode is requested. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add compute mode descriptor functionLijo Lazar2-23/+22
Keep a helper function to get description of compute partition mode. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Fix unmapping of apertureLijo Lazar3-6/+6
When aperture size is zero, there is no mapping done. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Fix xGMI access P2P mapping failure on GFXIP 9.4.3Rajneesh Bhardwaj1-1/+1
On GFXIP 9.4.3, we dont need to rely on xGMI hive info to determine P2P access. Reviewed-by: Felix Kuehling <[email protected]> Acked-and-tested-by: Mukul Joshi <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: Native mode memory partition supportRajneesh Bhardwaj3-0/+26
For native mode, after amdgpu_bo is created on CPU domain, then call amdgpu_ttm_tt_set_mem_pool to select the TTM pool using bo->mem_id. ttm_bo_validate will allocate the memory to the correct memory partition before mapping to GPUs. Reviewed-by: Felix Kuehling <[email protected]> Acked-and-tested-by: Mukul Joshi <[email protected]> Signed-off-by: Philip Yang <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Set TTM pools for memory partitionsPhilip Yang2-2/+60
For native mode only, create TTM pool for each memory partition to store the NUMA node id, then the TTM pool will be selected using memory partition id to allocate memory from the correct partition. Acked-by: Christian König <[email protected]> (rajneesh: changed need_swiotlb and need_dma32 to false for pool init) Reviewed-by: Felix Kuehling <[email protected]> Acked-and-tested-by: Mukul Joshi <[email protected]> Signed-off-by: Philip Yang <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Add auto mode for compute partitionLijo Lazar4-5/+35
When auto mode is specified, driver will choose the right compute partition mode. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Reviewed-by: Philip Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Check memory ranges for valid xcp modeLijo Lazar1-6/+13
Check the memory ranges available to the device also for deciding a valid partition mode. Only select combinations are valid for a particular mode. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Reviewed-by: Philip Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: Use xcc mask for identifying xccLijo Lazar8-95/+95
Instead of start xcc id and number of xcc per node, use the xcc mask which is the mask of logical ids of xccs belonging to a parition. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdkfd: Add xcp reference to kfd nodeLijo Lazar2-6/+16
Fetch xcp information from xcp_mgr and also add xcc_mask to kfd node. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Move initialization of xcp before kfdLijo Lazar3-11/+12
After partition switch, fill all relevant xcp information before kfd starts initialization. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>