aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
AgeCommit message (Collapse)AuthorFilesLines
2021-11-17drm/amd/amdgpu: fix potential memleakBernard Zhao1-0/+1
In function amdgpu_get_xgmi_hive, when kobject_init_and_add failed There is a potential memleak if not call kobject_put. Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Bernard Zhao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-11-05drm/amdgpu: correct xgmi ras error count resetTao Zhou1-2/+2
The error count reset for xgmi3x16 pcs is missed. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-26drm/amdgpu: Add support for RAS XGMI err queryJohn Clements1-0/+65
Update XGMI RAS to support error query on aldebaran Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-24drm/amdgpu: Update RAS XGMI Error QueryJohn Clements1-1/+3
Resolve bug querying error on unsupported ASIC Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amdgpu: get extended xgmi topology dataJonathan Kim1-2/+56
The TA has a limit to the amount of data that can be retrieved from GET_TOPOLOGY. For setups that exceed this limit, the xGMI topology needs to be re-initialized and data needs to be re-fetched from the extended link records by setting a flag in the shared command buffer. The number of hops and the number of links must be accumulated by the driver. Other data points are all fetched from the first request. Because the TA has already exceeded its link record limit, it cannot hold bidirectional information. Otherwise the driver would have to do more than two fetches so the driver has to reflect the topology information in the opposite direction. v2: squashed with internal reviewed fix Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/amdgpu: remove unnecessary RAS context fieldCandice Li1-1/+0
Delete ras_if->name in the RAS ctx structure and remove related lines. Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-23drm/amdkfd: report xgmi bandwidth between direct peers to the kfdJonathan Kim1-0/+12
Report the min/max bandwidth in megabytes to the kfd for direct xgmi connections only. Indirect peers will report 0 since indirect route is unknown. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: move xgmi ras functions to xgmi_ras_funcsHawking Zhang1-7/+14
xgmi ras is not managed by gpu driver when gpu is connected to cpu through xgmi. move all xgmi ras functions to xgmi_ras_funcs so gpu driver only initializes xgmi ras functions when it manages xgmi ras. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Dennis Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amdgpu: Convert sysfs sprintf/snprintf family to sysfs_emitTian Tao1-2/+2
Fix the following coccicheck warning: drivers/gpu//drm/amd/amdgpu/amdgpu_ras.c:434:9-17: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:220:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:249:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/df_v3_6.c:208:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_psp.c:2973:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:75:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:112:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:58:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:93:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:125:9-17: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:52:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:71:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:140:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:164:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:186:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:208:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_atombios.c:1916:8-16: WARNING: use scnprintf or sprintf Signed-off-by: Tian Tao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-04-09drm/amd/pm: label these APIs used internally as staticEvan Quan1-1/+0
Also drop unnecessary header file and declarations. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-03-23drm/amdgpu: Reset the devices in the XGMI hive duirng probeshaoyunl1-3/+5
In passthrough configuration, hypervisior will trigger the SBR(Secondary bus reset) to the devices without sync to each other. This could cause device hang since for XGMI configuration, all the devices within the hive need to be reset at a limit time slot. This serial of patches try to solve this issue by co-operate with new SMU which will only do minimum house keeping to response the SBR request but don't do the real reset job and leave it to driver. Driver need to do the whole sw init and minimum HW init to bring up the SMU and trigger the reset(possibly BACO) on all the ASICs at the same time Signed-off-by: shaoyunl <[email protected]> Acked-by: Andrey Grodzovsky [email protected] Signed-off-by: Alex Deucher <[email protected]>
2021-03-23drm/amdgpu: mask the xgmi number of hops reported from psp to kfdJonathan Kim1-1/+8
The psp supplies the link type in the upper 2 bits of the psp xgmi node information num_hops field. With a new link type, Aldebaran has these bits set to a non-zero value (1 = xGMI3) so the KFD topology will report the incorrect IO link weights without proper masking. The actual number of hops is located in the 3 least significant bits of this field so mask if off accordingly before passing it to the KFD. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Amber Lin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-02-09drm/amdgpu: optimize list operation in amdgpu_xgmiKevin Wang1-6/+4
simplify the list operation. Signed-off-by: Kevin Wang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-11-13drm/amdgpu: check hive pointer before accessHawking Zhang1-4/+9
in case it is an invalid one Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Kevin Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-10-09drm/amdgpu: Fix inconsistent of format with argument type in amdgpu_xgmi.cYe Bin1-1/+1
Fix follow warning: [drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c:249]: (warning) %d in format string (no. 1) requires 'int' but the argument type is 'unsigned int'. Reported-by: Hulk Robot <[email protected]> Signed-off-by: Ye Bin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-08-24drm/amdgpu: Get DRM dev from adev by inline-fLuben Tuikov1-1/+1
Add a static inline adev_to_drm() to obtain the DRM device pointer from an amdgpu_device pointer. Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-08-24drm/amdgpu: drm_device to amdgpu_device by inline-f (v2)Luben Tuikov1-2/+2
Get the amdgpu_device from the DRM device by use of an inline function, drm_to_adev(). The inline function resolves a pointer to struct drm_device to a pointer to struct amdgpu_device. v2: Use a typed visible static inline function instead of an invisible macro. Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-08-24drm/amdgpu: refine create and release logic of hive infoDennis Li1-95/+113
Change to dynamically create and release hive info object, which help driver support more hives in the future. v2: Change to save hive object pointer in adev, to avoid locking xgmi_mutex every time when calling amdgpu_get_xgmi_hive. v3: 1. Change type of hive object pointer in adev from void* to amdgpu_hive_info*. 2. remove unnecessary variable initialization. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-08-24drm/amdgpu: refine codes to avoid reentering GPU recoveryDennis Li1-2/+1
if other threads have holden the reset lock, recovery will fail to try_lock. Therefore we introduce atomic hive->in_reset and adev->in_gpu_reset, to avoid reentering GPU recovery. v2: drop "? true : false" in the definition of amdgpu_in_reset Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-08-14drm/amdgpu: revert "fix system hang issue during GPU reset"Christian König1-7/+4
The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929. Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-08-07drm/amdgpu: unlock mutex on errorDennis Li1-3/+3
Make sure to unlock the mutex when error happen v2: 1. correct syntax error in the commit comments 2. remove change-Id Acked-by: Nirmoy Das <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-07-27drm/amdgpu: fix system hang issue during GPU resetDennis Li1-4/+7
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: Hawking Zhang <[email protected]> Suggested-by: Andrey Grodzovsky <[email protected]> Suggested-by: Christian König <[email protected]> Suggested-by: Felix Kuehling <[email protected]> Suggested-by: Lijo Lazar <[email protected]> Suggested-by: Luben Tukov <[email protected]> Signed-off-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-07-01drm/amdgpu: remove unused functionsNirmoy Das1-5/+0
Remove unused amdgpu_xgmi_hive_try_lock() and smu7_reset_asic_tasks(). Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-05-21drm/amdgpu fix incorrect sysfs remove behavior for xgmiJack Zhang1-7/+16
Under xgmi setup,some sysfs fail to create for the second time of kmd driver loading. It's due to sysfs nodes are not removed appropriately in the last unlod time. Changes of this patch: 1. remove sysfs for dev_attr_xgmi_error 2. remove sysfs_link adev->dev->kobj with target name. And it only needs to be removed once for a xgmi setup 3. remove sysfs_link hive->kobj with target name In amdgpu_xgmi_remove_device: 1. amdgpu_xgmi_sysfs_rem_dev_info needs to be run per device 2. amdgpu_xgmi_sysfs_destroy needs to be run on the last node of device. v2: initialize array with memset Signed-off-by: Jack Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-05-14drm/amdgpu: remove redundant assignment to variable retColin Ian King1-1/+1
The variable ret is being initializeed with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-05-08drm/amdgpu: use node_id and node_size to calcualte dram_base_addressHawking Zhang1-25/+2
physical_node_id * node_segment_size should be the dram_base_address for current gpu node in xgmi config Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-27drm/amdgpu: sw pstate switch should only be for vega20Jonathan Kim1-1/+3
Driver steered p-state switching is designed for Vega20 only. Also simplify early return for temporary disable due to SMU FW bug. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu: fix race between pstate and remote buffer mapJonathan Kim1-32/+36
Vega20 arbitrates pstate at hive level and not device level. Last peer to remote buffer unmap could drop P-State while another process is still remote buffer mapped. With this fix, P-States still needs to be disabled for now as SMU bug was discovered on synchronous P2P transfers. This should be fixed in the next FW update. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-01drm/amdgpu: added xgmi ras error reset sequenceJohn Clements1-0/+30
added mechanism to clear xgmi ras status inbetween error queries Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-03-10drm/amdgpu: call ras_debugfs_create_all in debugfs_initTao Zhou1-1/+0
and remove each ras IP's own debugfs creation this is required to fix ras when the driver does not use the drm load and unload callbacks due to ordering issues with the drm device node. Signed-off-by: Tao Zhou <[email protected]> Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-03-06drm/amdgpu: enable PCS error report on arcturusHawking Zhang1-0/+31
add arcturus xgmi/wafl pcs err status group to support PCS error detection and report on arcturus Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-03-06drm/amdgpu: add helper funcs to detect PCS errorHawking Zhang1-0/+173
Since from vega20, hardware supports run-time detect and report XGMI/WAFL PCS ras error. Add helper functions to walkthrough every type of ras error and report it if any. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-02-26drm/amdgpu: toggle DF-Cstate to protect DF reg accessHawking Zhang1-5/+20
driver needs to take DF out Cstate before any DF register access. otherwise, the DF register may not be accessible. Signed-off-by: Hawking Zhang <[email protected]> Acked-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-02-26drm/amdgpu: move get_xgmi_relative_phy_addr to amdgpu_xgmi.cHawking Zhang1-0/+15
centralize all the xgmi related function to amdgpu_xgmi.c Signed-off-by: Hawking Zhang <[email protected]> Acked-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-02-06drm/amdgpu: move xgmi init/fini to xgmi_add/remove_device call (v2)Hawking Zhang1-3/+12
For sriov, psp ip block has to be initialized before ih block for the dynamic register programming interface that needed for vf ih ring buffer. On the other hand, current psp ip block hw_init function will initialize xgmi session which actaully depends on interrupt to return session context. This results an empty xgmi ta session id and later failures on all the xgmi ta cmd invoked from vf. xgmi ta session initialization has to be done after ih ip block hw_init call. to unify xgmi session init/fini for both bare-metal sriov virtualization use scenario, move xgmi ta init to xgmi_add_device call, and accordingly terminate xgmi ta session in xgmi_remove_device call. The existing suspend/resume sequence will not be changed. v2: squash in return fix from Nirmoy Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Frank Min <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amdgpu: Create generic DF struct in adevJoseph Greathouse1-3/+3
The only data fabric information the adev struct currently contains is a function pointer table. In the near future, we will be adding some cached DF information into adev. As such, this patch creates a new amdgpu_df struct for adev. Right now, it only containst the old function pointer table, but new stuff will be added soon. Signed-off-by: Joseph Greathouse <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-14drm/amd/powerplay: cover the powerplay implementation details V3Evan Quan1-7/+1
This can save users much troubles. As they do not actually need to care whether swSMU or traditional powerplay routine should be used. V2: apply the fixes to vi.c and cik.c also V3: squash in oops fix Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-12-18drm/amdgpu: Add task barrier to XGMI hive.Andrey Grodzovsky1-0/+4
Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-11-07drm/amdgpu: fix vega20 pstate status changeJonathan Kim1-3/+4
vega20 only requires all devices be set to same pstate level for low pstate and not high. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-11-06drm/amdgpu: fix possible pstate switch race conditionEvan Quan1-2/+32
Added lock protection so that the p-state switch will be guarded to be sequential. Also update the hive pstate only all device from the hive are in the same state. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-11-06drm/amd/powerplay: support xgmi pstate setting on powerplay routine V2Evan Quan1-0/+5
Add xgmi pstate setting on powerplay routine. V2: split the change of is_support_sw_smu_xgmi into a separate patch Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-10-03drm/amdgpu: move xgmi ras fini to xgmi blockTao Zhou1-0/+14
it's more suitable to put xgmi ras fini in xgmi block Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-16drm/amdgpu: initialize ras structures for xgmi block (v2)Hawking Zhang1-0/+36
init ras common interface and fs node for xgmi block v2: remove unnecessary physical node number check before invoking amdgpu_xgmi_ras_late_init Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-30drm/amdgpu: adding xgmi error monitoringJonathan Kim1-2/+36
monitor xgmi errors via mc pie status through fica registers. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Kent Russell <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-18amd/powerplay: No SW XGMI dpm for Arcturus rev 2Yong Zhao1-1/+1
xgmi dpm is handled by the SMU. Signed-off-by: Yong Zhao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-18drm/amdgpu: skip get/update xgmi topology info when no psp existsLe Ma1-22/+25
We don't currently have psp support for arcturus so provide a alternative mechanism in the meantime. Signed-off-by: Le Ma <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-18drm/amdgpu: Hack xgmi topology info when there is no psp fwOak Zeng1-11/+16
This is only needed on emulation platform where psp fw might not be available, to hack xgmi topology info such as hive id and node id. v2: Add offset to hacked hive/node id v3: Don't use introduce new module parameter. Signed-off-by: Oak Zeng <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-05-24drm/amd/doc: Add XGMI sysfs documentationTom St Denis1-0/+28
Acked-by: Slava Abramov <[email protected]> Signed-off-by: Tom St Denis <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-05-24drm/amdgpu: Update latest xgmi topology info after each device is enumulatedshaoyunl1-12/+20
Adjust the sequence of set/get xgmi topology, so driver can have the latest XGMI topology info for future usage Signed-off-by: shaoyunl <[email protected]> Acked-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-05-24drm/amdgpu: Implement get num of hops between two xgmi deviceshaoyunl1-5/+18
KFD need to provide the info for upper level to determine the data path Signed-off-by: shaoyunl <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>