Age | Commit message | Author | Files | Lines
2023-06-09 | drm/amdkfd: Flush TLB after unmapping for GFX v9.4.3 | Philip Yang | 1 | -3/+3
kfd_flush_tlb_after_unmap should return true for GFX v9.4.3 so that a heavyweight TLB flush is done after unmapping from the GPU. This guarantees that the GPU will not access pages after they have been unmapped. It also improves performance when mapping to the GPU: without it, KFD accidentally flushes the TLB after mapping, because the VM update sequence number was increased by the previous unmapping.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
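The sequence-number interaction can be modelled in a few lines of plain C. This is a hypothetical sketch, not the KFD code: `struct vm_model`, `unmap_from_gpu()`, `map_to_gpu()` and the `flush_after_unmap` flag are illustrative names. It assumes that only invalidating updates (unmaps) bump the sequence number, and that a map flushes only to catch up on invalidations that were never flushed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the VM/TLB bookkeeping; not the driver's types. */
struct vm_model {
	uint64_t tlb_seq;      /* bumped by updates that invalidate PTEs */
	uint64_t flushed_seq;  /* tlb_seq observed at the last flush */
	unsigned int flushes;  /* heavyweight flushes issued so far */
};

static void tlb_flush(struct vm_model *vm)
{
	vm->flushed_seq = vm->tlb_seq;
	vm->flushes++;
}

static bool tlb_flush_pending(const struct vm_model *vm)
{
	return vm->tlb_seq != vm->flushed_seq;
}

/* Unmapping invalidates PTEs, so it bumps the sequence number. Flushing
 * right away stops the GPU from using stale translations. */
static void unmap_from_gpu(struct vm_model *vm, bool flush_after_unmap)
{
	vm->tlb_seq++;
	if (flush_after_unmap)
		tlb_flush(vm);
}

/* Mapping only fills PTEs; it flushes solely because an earlier
 * invalidation is still unflushed (the "accidental" flush). */
static void map_to_gpu(struct vm_model *vm)
{
	if (tlb_flush_pending(vm))
		tlb_flush(vm);
}
```

With `flush_after_unmap` set, the flush happens at unmap time and the following map costs nothing; without it, the flush is deferred to the next map and the unmapped pages stay reachable in the meantime.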
2023-06-09 | drm/amdgpu: Add fallback path for discovery info | Lijo Lazar | 1 | -5/+16
If the SOC doesn't expose dedicated VRAM, the discovery region may be available through system memory. Rename the existing interface to the generic read_binary_from_mem and add a fallback path to read from system memory.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
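A minimal sketch of the "try VRAM, then fall back to system memory" shape described above. Only the name `read_binary_from_mem` comes from the commit; `struct dev_model`, `read_from_vram()` and `read_from_sysmem()` are hypothetical stand-ins, and the two regions are modelled as plain buffers that may be absent.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device model: either copy of the discovery blob may be absent. */
struct dev_model {
	const unsigned char *vram_discovery;   /* dedicated VRAM copy, or NULL */
	const unsigned char *sysmem_discovery; /* reserved system-memory copy, or NULL */
	size_t discovery_size;
};

static int read_from_vram(const struct dev_model *d, unsigned char *buf, size_t len)
{
	if (!d->vram_discovery || len > d->discovery_size)
		return -ENOENT;
	memcpy(buf, d->vram_discovery, len);
	return 0;
}

static int read_from_sysmem(const struct dev_model *d, unsigned char *buf, size_t len)
{
	if (!d->sysmem_discovery || len > d->discovery_size)
		return -ENOENT;
	memcpy(buf, d->sysmem_discovery, len);
	return 0;
}

/* Generic reader: prefer VRAM, fall back to system memory. */
static int read_binary_from_mem(const struct dev_model *d, unsigned char *buf, size_t len)
{
	int ret = read_from_vram(d, buf, len);

	if (ret)
		ret = read_from_sysmem(d, buf, len);
	return ret;
}
```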
2023-06-09 | drm/amdgpu: Read discovery info from system memory | Lijo Lazar | 1 | -0/+23
On certain ASICs, discovery info is available in a reserved region of system memory. The location is available through the ACPI interface. Add an API to read discovery info from there.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add API to get tmr info from acpi | Lijo Lazar | 2 | -0/+27
In certain configs, TMR information is available from ACPI. Add an API to fetch the information.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add parsing of acpi xcc objects | Lijo Lazar | 3 | -0/+297
Add parsing of ACPI XCC objects and fill in the relevant info from them by invoking the DSM methods.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-and-tested-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: Enable SVM on Native mode | Mukul Joshi | 2 | -1/+10
This patch enables SVM capability on GFX 9.4.3 when running in Native mode. It also sets the best_prefetch and best_restore locations to the CPU, as there is no VRAM.
Signed-off-by: Mukul Joshi <[email protected]>
Acked-by: Rajneesh Bhardwaj <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
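The location choice can be sketched as a tiny helper. This is a hypothetical illustration: `struct svm_range_model`, `best_location()` and `LOCATION_CPU` are invented names, and using 0 to denote system memory is an assumption made for the example. The point is only that, with no VRAM, the CPU is the sole valid target for both prefetch and restore.

```c
#include <stdint.h>

#define LOCATION_CPU 0u /* assumption: 0 denotes system memory */

/* Hypothetical per-range preference, e.g. a GPU ID the user asked for. */
struct svm_range_model {
	uint32_t prefetch_loc;
};

/* With no VRAM there is nowhere on the GPU to migrate to, so both the
 * best prefetch and the best restore location collapse to the CPU. */
static uint32_t best_location(const struct svm_range_model *r, uint64_t vram_size)
{
	if (!vram_size)
		return LOCATION_CPU;
	return r->prefetch_loc;
}
```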
2023-06-09 | drm/amdgpu: Add FGCG for GFX v9.4.3 | Lijo Lazar | 1 | -2/+3
It is not truly fine-grain clock gating (FGCG); it behaves similarly to medium-grain clock gating (MGCG).
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Use transient mode during xcp switch | Lijo Lazar | 2 | -3/+16
During a partition switch, keep the state in transient mode. Fetch the latest state if the switch fails.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
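The switch sequence above can be sketched as a small state machine. All names here (`enum xcp_mode`, `XCP_MODE_TRANS`, `struct xcp_mgr_model`, `hw_switch()`, `hw_query()`) are hypothetical; the sketch only shows the pattern of publishing a transient state while the switch runs and re-reading the hardware state if it fails.

```c
#include <stdbool.h>

/* Hypothetical partition modes; TRANS marks a switch in progress. */
enum xcp_mode { XCP_MODE_SPX, XCP_MODE_DPX, XCP_MODE_CPX, XCP_MODE_TRANS };

struct xcp_mgr_model {
	enum xcp_mode mode;    /* state visible to the rest of the driver */
	enum xcp_mode hw_mode; /* state the hardware actually reports */
	bool fail_next_switch; /* test hook: simulate a failed switch */
};

static int hw_switch(struct xcp_mgr_model *m, enum xcp_mode target)
{
	if (m->fail_next_switch)
		return -1;
	m->hw_mode = target;
	return 0;
}

static enum xcp_mode hw_query(const struct xcp_mgr_model *m)
{
	return m->hw_mode;
}

static int switch_partition_mode(struct xcp_mgr_model *m, enum xcp_mode target)
{
	int ret;

	m->mode = XCP_MODE_TRANS;            /* readers see "switch in progress" */
	ret = hw_switch(m, target);
	m->mode = ret ? hw_query(m) : target; /* on failure, re-read the real state */
	return ret;
}
```

Re-querying on failure, rather than restoring the old value, avoids guessing how far a partial switch got.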
2023-06-09 | drm/amdgpu: Add flags for partition mode query | Lijo Lazar | 5 | -7/+15
It's not required to take the lock in all cases while querying the partition mode. Querying the partition mode during the KFD init process doesn't need to take a lock, because init after a switch already happens under the lock. Control the behaviour by adding flags to xcp_query_partition_mode.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
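A minimal sketch of a flag-controlled query. Only the idea of passing flags to `xcp_query_partition_mode` comes from the commit; `XCP_FL_NONE`, `XCP_FL_LOCKED`, `struct xcp_mgr_model` and the boolean stand-in for a mutex are hypothetical, chosen so the lock behaviour can be observed.

```c
#include <stdbool.h>

#define XCP_FL_NONE   0u
#define XCP_FL_LOCKED (1u << 0) /* caller already holds the manager lock */

struct xcp_mgr_model {
	bool locked;             /* stand-in for a mutex */
	unsigned int lock_calls; /* how often the query took the lock itself */
	int mode;
};

static void mgr_lock(struct xcp_mgr_model *m)
{
	m->locked = true;
	m->lock_calls++;
}

static void mgr_unlock(struct xcp_mgr_model *m)
{
	m->locked = false;
}

static int query_partition_mode(struct xcp_mgr_model *m, unsigned int flags)
{
	int mode;

	if (!(flags & XCP_FL_LOCKED))
		mgr_lock(m);
	mode = m->mode; /* read the current partition mode */
	if (!(flags & XCP_FL_LOCKED))
		mgr_unlock(m);
	return mode;
}
```

A caller already running under the switch's lock (such as KFD init after a switch) passes the "locked" flag, so the query neither deadlocks nor drops a lock it does not own.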
2023-06-09 | drm/amd/pm: fix wrong smu socclk value | Yang Wang | 1 | -1/+1
Fix a typo in the SMU socclk value.
Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add mode-2 reset in SMU v13.0.6 | Lijo Lazar | 2 | -9/+16
Modify the mode-2 reset flow for SMU v13.0.6 ASICs.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amd/pm: Notify PMFW about driver unload cases | Lijo Lazar | 1 | -2/+23
On SMU v13.0.6 APUs, FW needs to take some actions if the driver is going to halt the RLC. Notify PMFW that the driver is not going to manage the device, so that FW takes care of the required actions.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amd/pm: Update PMFW headers for version 85.54 | Lijo Lazar | 2 | -19/+2
It adds message support for FW notification on driver unload.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Le Ma <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amd/pm: Expose mem temperature for GC version 9.4.3 | Asad Kamal | 1 | -5/+5
Add memory temperature to the hwmon attributes for GC version 9.4.3.
Signed-off-by: Asad Kamal <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amd/pm: Update hw mon attributes for GC version 9.4.3 | Asad Kamal | 1 | -11/+25
Update the hwmon attributes for GC version 9.4.3 to the ones that are valid on APU and non-APU systems.
v2: Group checks along with the existing ones; add power limit and mclock for GC version 9.4.3.
Signed-off-by: Asad Kamal <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amd/pm: Initialize power limit for SMU v13.0.6 | Lijo Lazar | 1 | -15/+0
PMFW initializes the power limit values even if the PPT throttler feature is disabled. Fetch the limit value from FW.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amd/pm: Keep interface version in PMFW header | Lijo Lazar | 15 | -52/+17
Use the interface version directly from the PMFW interface header file rather than keeping another definition in the common smu13 file.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Asad kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amd/pm: Add ih for SMU v13.0.6 thermal throttling | Asad kamal | 1 | -3/+104
Add an interrupt handler for thermal throttler events from PMFW on SMU v13.0.6.
Signed-off-by: Asad kamal <[email protected]>
Acked-by: Evan Quan <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amd/pm: Update pmfw header files for SMU v13.0.6 | Asad kamal | 2 | -1/+12
Update the driver interface for SMU v13.0.6 to be compatible with PMFW version 85.48.
Signed-off-by: Asad kamal <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amd/pm: Update gfx clock frequency for SMU v13.0.6 | Asad kamal | 1 | -1/+10
Update the gfx clock frequency from the metrics table for SMU v13.0.6.
Signed-off-by: Asad kamal <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amd/pm: Update pmfw header files for SMU v13.0.6 | Asad kamal | 2 | -3/+7
Update the driver metrics table for SMU v13.0.6 to be compatible with PMFW version 85.47.
Signed-off-by: Asad kamal <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: fix sdma instance | Stanley.Yang | 1 | -2/+5
Convert the logical instance to the device instance when querying RAS info.
Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: change the print level to warn for ip block disabled | Le Ma | 1 | -1/+1
Avoid misleading users, as a disabled IP block is not a real error.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Reviewed-by: Amber Lin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Increase Max GPU instance to 64 | Mukul Joshi | 1 | -1/+1
Increase the maximum number of GPU instances to 64 to handle multi-socket systems with the GFX 9.4.3 ASIC.
Signed-off-by: Mukul Joshi <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: increase AMDGPU_MAX_RINGS | Le Ma | 1 | -1/+1
On newer GPUs, the number of kernel rings has increased.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Create VRAM BOs on GTT for GFXIP9.4.3 | Rajneesh Bhardwaj | 1 | -2/+16
On the GFXIP 9.4.3 APP APU, where there is no dedicated VRAM domain, handle VRAM BO allocation requests in the CPU domain and validate them in GTT. Support for multi-socket systems and for multiple NUMA partitions within a socket will be added by future patches; this patch enables the 1P NPS1 ASIC bring-up configuration.
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
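The domain redirection can be sketched as a pure function. The constant values mirror the `AMDGPU_GEM_DOMAIN_CPU`/`GTT`/`VRAM` bits of the amdgpu uAPI, but `struct bo_placement` and `place_bo()` are hypothetical names for illustration, not the driver's code path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror AMDGPU_GEM_DOMAIN_{CPU,GTT,VRAM} from the amdgpu uAPI. */
#define DOMAIN_CPU  0x1u
#define DOMAIN_GTT  0x2u
#define DOMAIN_VRAM 0x4u

struct bo_placement {
	uint32_t alloc_domain;    /* domain the BO is created in */
	uint32_t validate_domain; /* domain it is validated in for GPU use */
};

/* Hypothetical helper: without dedicated VRAM, a VRAM request is created
 * in the CPU domain and validated in GTT; other requests pass through. */
static struct bo_placement place_bo(uint32_t requested, bool has_vram)
{
	struct bo_placement p = { requested, requested };

	if ((requested & DOMAIN_VRAM) && !has_vram) {
		p.alloc_domain = DOMAIN_CPU;
		p.validate_domain = DOMAIN_GTT;
	}
	return p;
}
```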
2023-06-09 | drm/amdgpu: Implement new dummy vram manager | Rajneesh Bhardwaj | 1 | -7/+60
This adds a dummy VRAM manager to support ASICs that do not have a dedicated or carved-out VRAM domain.
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
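One way to picture a "dummy" manager is a vtable whose allocator always refuses. This is a hypothetical sketch loosely modelled on a TTM-style resource manager: `struct res_mgr_ops`, `dummy_vram_alloc()` and the choice of `-ENOSPC` as the error code are assumptions for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical resource-manager vtable. */
struct res_mgr_ops {
	int (*alloc)(size_t size, void **out);
	void (*free)(void *res);
};

/* The VRAM domain exists so generic code can register it, but nothing
 * backs it: every allocation fails and callers fall back elsewhere. */
static int dummy_vram_alloc(size_t size, void **out)
{
	(void)size;
	*out = NULL;
	return -ENOSPC;
}

static void dummy_vram_free(void *res)
{
	(void)res; /* nothing was ever allocated */
}

static const struct res_mgr_ops dummy_vram_mgr_ops = {
	.alloc = dummy_vram_alloc,
	.free = dummy_vram_free,
};
```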
2023-06-09 | drm/amdgpu: Handle VRAM dependencies on GFXIP9.4.3 | Rajneesh Bhardwaj | 11 | -47/+99
[For 1P NPS1 mode driver bring-up] Changes required to initialize the amdgpu driver with front-door firmware loading and discovery=2, using the native-mode SBIOS that enables CPU-GPU unified interleaved memory:
    sudo modprobe amdgpu discovery=2
Once the PSP TMR region is reported via the ACPI interface, the dependency on ip_discovery.bin will be removed. The choice of where to allocate the driver table is left to each IP version. In general, both the GTT and VRAM domains are considered. If one of the tables is strictly restricted to the VRAM domain, then only the VRAM domain is considered.
Reviewed-by: Felix Kuehling <[email protected]>
(lijo: Modified the handling for SMU Tables)
Signed-off-by: Lijo Lazar <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
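The driver-table domain rule can be written as a small reduction over per-table descriptors. `struct table_desc` and `driver_table_domain()` are hypothetical names, and the domain values simply mirror the amdgpu uAPI GTT/VRAM bits; the sketch only encodes "GTT or VRAM by default, VRAM only if any table insists".

```c
#include <stddef.h>
#include <stdint.h>

#define DOMAIN_GTT  0x2u
#define DOMAIN_VRAM 0x4u

/* Hypothetical per-table descriptor filled in by each IP version. */
struct table_desc {
	const char *name;
	uint32_t domains; /* domains this table may live in */
};

/* Default to VRAM|GTT; narrow to VRAM only if any table requires it. */
static uint32_t driver_table_domain(const struct table_desc *t, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (t[i].domains == DOMAIN_VRAM)
			return DOMAIN_VRAM;
	return DOMAIN_VRAM | DOMAIN_GTT;
}
```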
2023-06-09 | drm/amdgpu: Enable CG for IH v4.4.2 | Asad kamal | 1 | -1/+2
Enable clock gating on IH v4.4.2 versions.
Signed-off-by: Asad kamal <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Enable persistent edc harvesting in APP APU | Hawking Zhang | 1 | -1/+2
Persistent EDC harvesting is supported in the APP APU.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Initialize mmhub v1_8 ras function | Hawking Zhang | 3 | -0/+17
Initialize mmhub v1_8 ras function.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add reset_ras_error_status for mmhub v1_8 | Hawking Zhang | 1 | -0/+91
Add reset_ras_error_status callback for mmhub v1_8. It will be used to reset mmhub error status.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add query_ras_error_status for mmhub v1_8 | Hawking Zhang | 1 | -0/+56
Add query_ras_error_status callback for mmhub v1_8. It will be used to log mmhub error status.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add reset_ras_error_count for mmhub v1_8 | Hawking Zhang | 1 | -0/+28
Add reset_ras_error_count callback for mmhub v1_8. It will be used to reset mmhub ras error count.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add query_ras_error_count for mmhub v1_8 | Hawking Zhang | 2 | -0/+116
Add query_ras_error_count callback for mmhub v1_8. It will be used to query and log mmhub error count and memory block.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add mmhub v1_8_0 ras err status registers | Hawking Zhang | 2 | -8/+373
Add the new RAS error status registers introduced in mmhub v1_8_0 to log MMEA and MM_CANE RAS errors, including MMEAx_UE|CE_ERR_STATUS_LO|HI and MM_CANE_UE|CE_ERR_STATUS_LO|HI.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Initialize sdma v4_4_2 ras function | Hawking Zhang | 1 | -9/+39
Initialize sdma v4_4_2 ras function and interrupt handler.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add reset_ras_error_count for sdma v4_4_2 | Hawking Zhang | 1 | -0/+23
Add reset_ras_error_count callback for sdma v4_4_2. It will be used to reset sdma ras error count.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add query_ras_error_count for sdma v4_4_2 | Hawking Zhang | 2 | -0/+92
Add query_ras_error_count callback for sdma v4_4_2. It will be used to query and log sdma uncorrectable error count and memory block.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add sdma v4_4_2 ras registers | Hawking Zhang | 2 | -0/+28
SDMA_UE_ERR_STATUS_HI|LO are introduced in v4_4_2 to replace the SDMA_EDC_COUNTER/COUNTER2 registers for logging SDMA RAS errors.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Add common helper to reset ras error | Hawking Zhang | 2 | -0/+24
Add a common helper to reset RAS error status. It applies to IP blocks that follow the new RAS error logging register design, where writing 0 resets the error status. IP blocks that don't support the new design should still implement an IP-specific helper.
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
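The "write 0 to reset" rule lends itself to a table-driven helper. This sketch is hypothetical: the `mmio[]` array stands in for register space, and `struct ras_err_status_reg`, `wreg32()` and `ras_reset_err_status()` are invented names, not the amdgpu helper or its register-entry struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file standing in for MMIO space (offsets < 16). */
static uint32_t mmio[16];

/* Hypothetical entry: offsets of one LO/HI error-status register pair. */
struct ras_err_status_reg {
	uint32_t lo;
	uint32_t hi;
};

static void wreg32(uint32_t off, uint32_t v)
{
	mmio[off] = v;
}

/* Under the new register design, writing 0 clears the error status. */
static void ras_reset_err_status(const struct ras_err_status_reg *regs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		wreg32(regs[i].lo, 0);
		wreg32(regs[i].hi, 0);
	}
}
```

Because every block in the new design resets the same way, one helper driven by a per-IP register list replaces per-IP reset code; blocks with a different reset rule keep their own helper.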
2023-06-09 | drm/amdgpu: Add common helper to query ras error (v2) | Hawking Zhang | 2 | -0/+173
Add a common helper to query RAS error status and log error information, including the memory block ID and error count. The helpers are applicable to IP blocks that follow the new RAS error logging design. IP blocks that don't support the new design should still implement an IP-specific helper to query RAS errors.
v2: optimize struct amdgpu_ras_err_status_reg_entry and the implementation in the helper (Lijo/Tao)
Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
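The query side can be sketched the same way: decode each LO/HI sample, skip invalid ones, and accumulate counts. The bit layout below is invented for illustration and is NOT the real register specification; `struct ras_err_status_sample` and `ras_query_err_count()` are hypothetical, and the sketch records only the last reporting memory block instead of logging each one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field layout (NOT the real register spec):
 *   LO[0]     status valid
 *   LO[31:24] memory block id
 *   HI[7:0]   error count
 */
#define LO_VALID(v)   ((v) & 0x1u)
#define LO_MEM_ID(v)  (((v) >> 24) & 0xffu)
#define HI_ERR_CNT(v) ((v) & 0xffu)

struct ras_err_status_sample {
	uint32_t lo;
	uint32_t hi;
};

/* Sum error counts over all instances with a valid status and remember
 * the last memory block that reported one (-1 if none did). */
static unsigned long ras_query_err_count(const struct ras_err_status_sample *s,
					 size_t n, int *last_mem_id)
{
	unsigned long total = 0;
	size_t i;

	*last_mem_id = -1;
	for (i = 0; i < n; i++) {
		if (!LO_VALID(s[i].lo))
			continue;
		total += HI_ERR_CNT(s[i].hi);
		*last_mem_id = (int)LO_MEM_ID(s[i].lo);
	}
	return total;
}
```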
2023-06-09 | drm/amdgpu: Enable MGCG on SDMAv4.4.2 | Lijo Lazar | 2 | -11/+15
Enable clock gating on SDMA v4.4.2 versions. Leave memory light sleep at its default setting.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: enable context empty interrupt on sdma v4.4.2 | Le Ma | 1 | -0/+2
With SDMA_CNTL.CTXEMPTY_INT_ENABLE set, the F32 clock can be gated when SDMA finishes all jobs and goes idle. No specific interrupt handling is required in the driver.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: add vcn_4_0_3 codec query | Sonny Jiang | 1 | -0/+24
Add support for the vcn_4_0_3 video codec query.
Signed-off-by: Sonny Jiang <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdkfd: bind cpu and hiveless gpu to a hive if xgmi connected | Jonathan Kim | 1 | -1/+8
If a CPU and GPU are xGMI-connected but the GPU is hiveless with respect to other GPUs, create a new CPU-GPU hive using the GPU's PCI device location ID as the new hive ID, to maintain fine-grained memory access usage.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
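The hive-ID rule is a two-condition fallback. In this hypothetical sketch, `struct gpu_model` and `effective_hive_id()` are invented names, and treating a zero hive ID as "not in a GPU hive" is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device record. */
struct gpu_model {
	uint64_t hive_id;           /* assumption: 0 means "no GPU hive" */
	bool xgmi_connected_to_cpu;
	uint32_t pci_location_id;   /* packed PCI domain/bus/devfn */
};

/* A hiveless GPU that is xGMI-connected to the CPU forms a CPU-GPU hive
 * whose ID is the GPU's own PCI location ID; otherwise keep the old ID. */
static uint64_t effective_hive_id(const struct gpu_model *g)
{
	if (!g->hive_id && g->xgmi_connected_to_cpu)
		return g->pci_location_id;
	return g->hive_id;
}
```

Reusing the PCI location ID gives each such hive a stable, unique identifier without a separate allocator.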
2023-06-09 | drm/amdkfd: Cleanup KFD nodes creation | Philip Yang | 1 | -16/+2
Allocating a KFD node outside the kfd->num_nodes loop is not needed and causes a memory leak, because kfd->num_nodes is always at least 1, so the loop allocates that node again.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
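The fixed allocation pattern, sketched in userspace C with `calloc()` standing in for a kernel zeroing allocator. `struct node_model`, `create_nodes()` and `destroy_nodes()` are hypothetical names. Every node, including node 0, is allocated exactly once inside the loop, and a failure frees everything allocated so far.

```c
#include <errno.h>
#include <stdlib.h>

struct node_model {
	int id;
};

/* Pre-allocating node 0 before the loop and then allocating it again at
 * i == 0 would leak the first allocation; allocating only here avoids it. */
static int create_nodes(struct node_model **nodes, unsigned int num_nodes)
{
	unsigned int i;

	for (i = 0; i < num_nodes; i++) {
		nodes[i] = calloc(1, sizeof(*nodes[i]));
		if (!nodes[i]) {
			while (i--) /* unwind the nodes created so far */
				free(nodes[i]);
			return -ENOMEM;
		}
		nodes[i]->id = (int)i;
	}
	return 0;
}

static void destroy_nodes(struct node_model **nodes, unsigned int num_nodes)
{
	unsigned int i;

	for (i = 0; i < num_nodes; i++)
		free(nodes[i]);
}
```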
2023-06-09 | drm/ttm: add NUMA node id to the pool | Rajneesh Bhardwaj | 3 | -7/+12
This allows backing a ttm_tt structure with pages from different NUMA pools.
Tested-by: Graham Sider <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Christian König <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amdgpu: Fix mqd init on GFX v9.4.3 | Lijo Lazar | 1 | -11/+11
For MQD init, an XCC's queue is selected with GRBM select. However, the MQD is initialized with values read from the logical XCC0 registers. This results in garbage values being read from XCC0, whose queue is not selected. Change the code to read from the right XCC for MQD initialization.
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
2023-06-09 | drm/amd: fix compiler error to support older compilers | Harish Kasiviswanathan | 1 | -2/+2
Older compilers reject the code with: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
Signed-off-by: Harish Kasiviswanathan <[email protected]>
Reviewed-by: Mukul Joshi <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
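The usual fix for this error is to hoist the loop variable out of the `for` statement. The function below, `sum_array()`, is a hypothetical example of the portable form, not the code touched by the commit; GCC emits the quoted error when C89/gnu89 mode is in effect (for instance with `-std=gnu89`).

```c
/* Rejected in C89/gnu89 mode:
 *     for (int i = 0; i < n; i++) ...
 * Portable form: declare the index at the top of the block. */
static int sum_array(const int *v, int n)
{
	int i;
	int sum = 0;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum;
}
```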