author | Yunxiang Li <[email protected]> | 2024-04-22 15:04:52 -0400 |
---|---|---|
committer | Alex Deucher <[email protected]> | 2024-05-02 15:41:05 -0400 |
commit | 6e4aa08fa9c6c0c027fc86f242517c925d159393 (patch) | |
tree | 4ddddca78ac62e823b8852fef3c32a41a56483dc /scripts/generate_rust_analyzer.py | |
parent | a5b843269a8f664df85948ec41db1dbcbc2a2d8b (diff) |
drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic
The retry loop for SRIOV reset has refcount and memory leak issues.
Depending on which function call fails, it can call
amdgpu_amdkfd_pre/post_reset a different number of times, which leaves
the kfd_locked count wrong. This blocks all future attempts at
opening /dev/kfd. The retry loop also leaks resources by calling
amdgpu_virt_init_data_exchange multiple times without calling the
corresponding fini function.
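
The imbalance described above can be illustrated with a small standalone
model. This is only a sketch, not the driver code: model_locked stands in
for kfd_locked, and every model_* name is hypothetical. It assumes the
simplest failing shape, where the "pre" hook runs on every attempt but the
"post" hook is only reached on the attempt that succeeds.

```c
/* Hypothetical stand-in for the kfd_locked count; not the real driver state. */
static int model_locked;

/* Hypothetical stand-ins for the pre/post reset hooks. */
static void model_pre_reset(void)  { model_locked++; }
static void model_post_reset(void) { model_locked--; }

/*
 * Unbalanced pattern: the "pre" hook is inside the retry loop, the "post"
 * hook is only reached when an attempt succeeds. The first `failures`
 * attempts are made to fail. Returns 0 on success, -1 if retries run out.
 */
static int model_unbalanced_reset(int failures, int max_retries)
{
    int attempt;

    for (attempt = 0; attempt <= max_retries; attempt++) {
        model_pre_reset();
        if (attempt < failures)
            continue;           /* step failed: retry without undoing pre */
        model_post_reset();
        return 0;
    }
    return -1;
}
```

In this model, each failed attempt leaves model_locked one higher, so the
count stays above zero even after the reset eventually succeeds.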
Align with the bare-metal reset path, which doesn't have these issues.
This means taking the amdgpu_amdkfd_pre/post_reset functions out of the
reset loop and calling amdgpu_device_pre_asic_reset on each retry, which
properly frees the resources from the previous try by calling
amdgpu_virt_fini_data_exchange.
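
The restructured flow can be sketched as a standalone model. This is an
illustration under assumptions, not the patch itself: all sketch_* names
are hypothetical, sketch_locked models kfd_locked, sketch_live_buffers
models the data-exchange resources, and calling the "post" hook once
regardless of the outcome is an assumption of the sketch.

```c
#include <stdbool.h>

static int sketch_locked;        /* models the kfd_locked count */
static int sketch_live_buffers;  /* models data-exchange resources */

static void sketch_kfd_pre_reset(void)  { sketch_locked++; }
static void sketch_kfd_post_reset(void) { sketch_locked--; }

static void sketch_init_data_exchange(void) { sketch_live_buffers++; }

static void sketch_fini_data_exchange(void)
{
    if (sketch_live_buffers > 0)
        sketch_live_buffers--;
}

/* Per-attempt preparation: release whatever the previous try set up. */
static void sketch_pre_asic_reset(void)
{
    sketch_fini_data_exchange();
}

/* One reset attempt; sets up data exchange, then succeeds or fails. */
static int sketch_reset_once(bool fail)
{
    sketch_init_data_exchange();
    return fail ? -1 : 0;
}

/* The pre/post hooks sit outside the loop, so they always stay paired. */
static int sketch_reset_with_retry(int failures, int max_retries)
{
    int ret = -1;
    int attempt;

    sketch_kfd_pre_reset();
    for (attempt = 0; attempt <= max_retries; attempt++) {
        sketch_pre_asic_reset();
        ret = sketch_reset_once(attempt < failures);
        if (ret == 0)
            break;
    }
    sketch_kfd_post_reset();
    return ret;
}
```

However many attempts fail, the lock count returns to zero and at most one
set of data-exchange resources is live when the function returns.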
Signed-off-by: Yunxiang Li <[email protected]>
Reviewed-by: Emily Deng <[email protected]>
Reviewed-by: Zhigang Luo <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Diffstat (limited to 'scripts/generate_rust_analyzer.py')
0 files changed, 0 insertions, 0 deletions