aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
AgeCommit message (Collapse)AuthorFilesLines
2024-10-15drm/amdgpu: Fix off by one in current_memory_partition_show()Dan Carpenter1-1/+1
The >= ARRAY_SIZE() should be > ARRAY_SIZE() to prevent an out of bounds read. Fixes: 012be6f22c01 ("drm/amdgpu: Add sysfs interfaces for NPS mode") Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-10-15drm/amdgpu: Fetch NPS mode for GCv9.4.3 VFsLijo Lazar1-5/+7
Use the memory ranges published in discovery table to deduce NPS mode of GC v9.4.3 VFs. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Vignesh Chander <[email protected]> Tested-by: Vignesh Chander <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-10-15drm/amdgpu: Check gmc requirement for reset on initLijo Lazar1-1/+12
Add a callback to check if there is any condition detected by GMC block for reset on init. One case is if a pending NPS change request is detected. If reset is done because of NPS switch, refresh NPS info from discovery table. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-10-15drm/amdgpu: Place NPS mode request on unloadLijo Lazar1-0/+47
If a user has requested NPS mode switch, place the request through PSP during unload of the driver. For devices which are part of a hive, all requests are placed together. If one of them fails, revert back to the current NPS mode. Signed-off-by: Lijo Lazar <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-10-07drm/amdgpu: Add sysfs interfaces for NPS modeLijo Lazar1-16/+98
Add a sysfs interface to see available NPS modes to switch to - cat /sys/bus/pci/devices/../available_memory_paritition Make the current_memory_partition sysfs node read/write for requesting a new NPS mode. The request is only cached and at a later point a driver unload/reload is required to switch to the new NPS mode. Ex: echo NPS1 > /sys/bus/pci/devices/../current_memory_paritition echo NPS4 > /sys/bus/pci/devices/../current_memory_paritition The above interfaces will be available only if the SOC supports more than one NPS mode. Also modify the current memory partition sysfs logic to be more generic. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-10-07drm/amdgpu: Add gmc interface to request NPS modeLijo Lazar1-0/+16
Add a common interface in GMC to request NPS mode through PSP. Also add a variable in hive and gmc control to track the last requested mode. Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-10-01drm/amdgpu: Add option to refresh NPS dataLijo Lazar1-1/+1
In certain use cases, NPS data needs to be refreshed again from discovery table. Add API parameter to refresh NPS data from discovery table. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-09-26drm/amdgpu: Remove unused amdgpu_gmc_vram_cpu_paDr. David Alan Gilbert1-12/+0
amdgpu_gmc_vram_cpu_pa has been unused since commit 087451f372bf ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") Remove it. Signed-off-by: Dr. David Alan Gilbert <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-16drm/amdgpu: abort KIQ waits when there is a pending resetVictor Skvortsov1-1/+2
Stop waiting for the KIQ to return back when there is a reset pending. It's quite likely that the KIQ will never response. Signed-off-by: Koenig Christian <[email protected]> Suggested-by: Lazar Lijo <[email protected]> Tested-by: Victor Skvortsov <[email protected]> Signed-off-by: Victor Skvortsov <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-08-13drm/amdgpu/mes: add multiple mes ring instances supportJack Xiao1-2/+3
Add multiple mes ring instances in mes structure to support multiple mes pipes. Signed-off-by: Jack Xiao <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-02drm/amdgpu: add tmz support for GC IP v11.5.2Tim Huang1-0/+1
Add tmz support for GC 11.5.2. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: add missing error handling in function ↵Bob Zhou1-1/+5
amdgpu_gmc_flush_gpu_tlb_pasid Fix the unchecked return value warning reported by Coverity, so add error handling. Signed-off-by: Bob Zhou <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: fix locking scope when flushing tlbYunxiang Li1-32/+34
Which method is used to flush tlb does not depend on whether a reset is in progress or not. We should skip flush altogether if the GPU will get reset. So put both path under reset_domain read lock. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
2024-06-14drm/amdgpu: call flush_gpu_tlb directly in gfxhub enableYunxiang Li1-3/+1
Here since we are in reset and takes the reset_domain write side lock already. We can't use the flush tlb helper which tries to take the read side. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1Tim Van Patten1-1/+0
The following commit updated gmc->noretry from 0 to 1 for GC HW IP 9.3.0: commit 5f3854f1f4e2 ("drm/amdgpu: add more cases to noretry=1") This causes the device to hang when a page fault occurs, until the device is rebooted. Instead, revert back to gmc->noretry=0 so the device is still responsive. Fixes: 5f3854f1f4e2 ("drm/amdgpu: add more cases to noretry=1") Signed-off-by: Tim Van Patten <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-17drm/amdgpu: Use NPS ranges from discovery tableLijo Lazar1-0/+76
Add GMC API to fetch NPS range information from discovery table. Use NPS range information in GMC 9.4.3 SOCs when available, otherwise fallback to software method. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-13drm/amdgpu: remove structurally dead code for amd_gmcJesse Zhang1-2/+0
This code cannot be reached: return sysfs_emit(buf, "UNK....) Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-05-02drm/amdgpu: Add gfx v9_4_4 ip blockHawking Zhang1-0/+1
Add gfx v9_4_4 ip block support Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-30drm/amdgpu: support gfx v12 specific pte/pde fieldsHawking Zhang1-2/+2
Add gfx v12 pte/pde support to gmc common helper. v2: squash in fixes (Alex) Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-22drm/amdgpu: add tmz support for GC IP v11.5.1Yifan Zhang1-0/+1
Add tmz support for GC 11.5.1. Signed-off-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: Clean up errors in amdgpu_gmc.cchenxuebing1-1/+1
Fix the following errors reported by checkpatch: ERROR: need consistent spacing around '-' (ctx:WxV) Signed-off-by: chenxuebing <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-01-15drm/amdgpu: move kiq_reg_write_reg_wait() out of amdgpu_virt.cAlex Deucher1-0/+53
It's used for more than just SR-IOV now, so move it to amdgpu_gmc.c and rename it to better match the functionality and update the comments in the code paths to better document when each path is used and why. No functional change. Reviewed-by: Shaoyun.liu <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Cc: [email protected]
2024-01-15drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'Srinivasan Shanmugam1-7/+14
The amdgpu_gmc_vram_checking() function in emulation checks whether all of the memory range of shared system memory could be accessed by GPU, from this aspect, -EIO is returned for error scenarios. Fixes the below: drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:919 gmc_v6_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1103 gmc_v7_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1223 gmc_v8_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2344 gmc_v9_0_hw_init() warn: missing error code? 'r' Cc: Xiaojian Du <[email protected]> Cc: Lijo Lazar <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-11-29drm/amdgpu: fix AGP addressing when GART is not at 0Alex Deucher1-0/+3
This worked by luck if the GART aperture ended up at 0. When we ended up moving GART on some chips, the GART aperture ended up offsetting the AGP address since the resource->start is a GART offset, not an MC address. Fix this by moving the AGP address setup into amdgpu_bo_gpu_offset_no_check(). v2: check mem_type before checking agp v3: check if the ttm bo has a ttm_tt allocated yet Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11") Tested-by: Mario Limonciello <[email protected]> Reported-by: Jesse Zhang <[email protected]> Reported-by: Yifan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Cc: [email protected]
2023-11-07drm/amd: Disable XNACK on SRIOV environmentSurbhi Kakarya1-1/+4
The purpose of this patch is to disable XNACK or set XNACK OFF mode on SRIOV platform which doesn't support it. This will prevent user-space application to fail or result into unexpected behaviour whenever the application need to run test-case in XNACK ON mode. Signed-off-by: Surbhi Kakarya <[email protected]> Reviewed-by: Shaoyun Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-26drm/amdgpu: add tmz support for GC IP v11.5.0Jiadong Zhu1-0/+1
Add tmz support for GC 11.5.0. Signed-off-by: Jiadong Zhu <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-13drm/amdgpu: Address member 'gart_placement' not described in ↵Srinivasan Shanmugam1-0/+1
'amdgpu_gmc_gart_location' Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c:274: warning: Function parameter or member 'gart_placement' not described in 'amdgpu_gmc_gart_location' Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Cc: "Pan, Xinhui" <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-10-04drm/amdgpu/gmc: add a way to force a particular placement for GARTAlex Deucher1-5/+17
We normally place GART based on the location of VRAM and the available address space around that, but provide an option to force a particular location for hardware that needs it. v2: Switch to passing the placement via parameter Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-26drm/amdgpu/gmc: set a default disable value for AGPAlex Deucher1-8/+19
To disable AGP, the start needs to be set to a higher value than the end. Set a default disable value for the AGP aperture and allow the IP specific GMC code to enable it selectively be calling amdgpu_gmc_agp_location(). Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-26drm/amdgpu: further move TLB hw workarounds a layer upChristian König1-0/+19
For the PASID flushing we already handled that at a higher layer, apply those workarounds to the standard flush as well. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-26drm/amdgpu: rework lock handling for flush_tlb v2Christian König1-0/+9
Instead of each implementation doing this more or less correctly move taking the reset lock at a higher level. v2: fix typo Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-26drm/amdgpu: drop error return from flush_gpu_tlb_pasidChristian König1-3/+4
That function never fails, drop the error return. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-26drm/amdgpu: fix and cleanup gmc_v9_0_flush_gpu_tlb_pasidChristian König1-0/+60
Testing for reset is pointless since the reset can start right after the test. The same PASID can be used by more than one VMID, invalidate each of them. Move the KIQ and all the workaround handling into common GMC code. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-26drm/amdgpu: rework gmc_v10_0_flush_gpu_tlb v2Christian König1-0/+48
Move the SDMA workaround necessary for Navi 1x into a higher layer. v2: use dev_err Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-26drm/amdkfd: Don't use sw fault filter if retry cam enabledPhilip Yang1-1/+4
If retry cam enabled, we don't use sw retry fault filter and add fault into sw filter ring, so we shouldn't remove fault from sw filter. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amd: Drop special case for yellow carp without discoveryMario Limonciello1-6/+0
`amdgpu_gmc_get_vbios_allocations` has a special case for how to bring up yellow carp when amdgpu discovery is turned off. As this ASIC ships with discovery turned on, it's generally dead code and worse it causes `adev->mman.keep_stolen_vga_memory` to not be initialized for yellow carp. Remove it. Signed-off-by: Mario Limonciello <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-09-20drm/amdgpu: Use function for IP version checkLijo Lazar1-2/+2
Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-08-31drm/amdgpu: reserve mmhub engine 3 for UMSCH FWLang Yu1-1/+5
UMSCH FW uses mmhub engine 3 for invalidation. Signed-off-by: Lang Yu <[email protected]> Acked-by: Leo Liu <[email protected]> Acked-by: Veerabadhran Gopalakrishnan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: enable tmz by default for GC 11.0.1Ikshwaku Chauhan1-1/+2
Add IP GC 11.0.1 in the list of target to have tmz enabled by default. Signed-off-by: Ikshwaku Chauhan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: Move memory partition query to gmcLijo Lazar1-0/+44
GMC block handles memory related information, it makes more sense to keep memory partition functions in gmc block. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: alloc vm inv engines for every vmhubShiwu Zhang1-8/+7
There are AMDGPU_MAX_VMHUBS of vmhub in maximum and need to init the vm_inv_engs for all of them. In this way, the below error can be ruled out. [ 217.317752] amdgpu 0000:02:00.0: amdgpu: no VM inv eng for ring sdma0 Signed-off-by: Shiwu Zhang <[email protected]> Reviewed-by: Christian Koenig <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-06-09drm/amdgpu: introduce vmhub definition for multi-partition cases (v3)Hawking Zhang1-2/+2
v1: Each partition has its own gfxhub or mmhub. adjust the num of MAX_VMHUBS and the GFXHUB/MMHUB layout (Le) v2: re-design the AMDGPU_GFXHUB/AMDGPU_MMHUB layout (Le) v3: apply the gfxhub/mmhub layout to new IPs (Hawking) v4: fix up gmc11 (Alex) v5: rebase (Alex) Signed-off-by: Le Ma <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-14drm/amdgpu: move vmhub out of amdgpu_ring_funcs (v4)Le Ma1-2/+2
It looks better to place this field in ring structure. Also drop the repeated ring funcs definitions if there's no difference except for vmhub field. v2: rename the field to vm_hub like others (Le) v3: apply the changes to new ip blocks (Hawking) v4: fix vcn sw ring (Alex) Signed-off-by: Le Ma <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-13drm/amdgpu: Rework retry fault removalMukul Joshi1-3/+33
Rework retry fault removal from the software filter by storing an expired timestamp for a fault that is being removed. When a new fault comes, and it matches an entry in the sw filter, it will be added as a new fault only when its timestamp is greater than the timestamp expiry of the fault in the sw filter. This helps in avoiding stale faults being added back into the filter and preventing legitimate faults from being handled. Suggested-by: Felix Kuehling <[email protected]> Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Philip Yang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-03-31drm/amdkfd: Set noretry/xnack for GC 9.4.3Amber Lin1-0/+1
For GC 9.4.3, disable retry as default and XNACK can be different modes per process. Signed-off-by: Amber Lin <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-03-15drm/amdgpu: Rework xgmi_wafl_pcs ras sw_initHawking Zhang1-5/+4
To align with other IP blocks. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-03-15drm/amdgpu: Rework mca ras sw_initHawking Zhang1-0/+13
To align with other IP blocks Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-03-13drm/amdgpu: Move hdp ras block init to ras sw_initHawking Zhang1-0/+5
Initialize hdp ras block only when mmhub ip block supports ras features. Driver queries ras capabilities after early_init, ras block init needs to be moved to sw_init. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-03-13drm/amdgpu: Move mmhub ras block init to ras sw_initHawking Zhang1-0/+5
Initialize mmhub ras block only when mmhub ip block supports ras features. Driver queries ras capabilities after early_init, ras block init needs to be moved to sw_init. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-03-13drm/amdgpu: Move umc ras block init to gmc ras sw_initHawking Zhang1-1/+8
Initialize umc ras block only when umc ip block supports ras. Driver queries ras capabilities after early_init, ras block init needs to be moved to sw_init. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Stanley Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>