aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2024-03-20drm/amdgpu: drop setting buffer funcs in sdma442Le Ma1-22/+1
To fix the entity rq NULL issue. This setting has been moved to upper level. Fixes: b70438004a14 ("drm/amdgpu: move buffer funcs setting up a level") Signed-off-by: Le Ma <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20Revert "drm/amdgpu/vpe: don't emit cond exec command under collaborate mode"Lang Yu1-3/+0
Ready now. Remove this workaround. This reverts commit d40f6213b52c161fd4634933acbc32103a283363. Signed-off-by: Lang Yu <[email protected]> Tested-by: Alan Liu <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in ↵Ma Jun1-10/+6
amdgpu_device_init()" This patch causes the following iounmap erorr and calltrace iounmap: bad address 00000000d0b3631f The original patch was unjustified because amdgpu_device_fini_sw() will always cleanup the rmmio mapping. This reverts commit eb4f139888f636614dab3bcce97ff61cefc4b3a7. Signed-off-by: Ma Jun <[email protected]> Suggested-by: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20drm/amdgpu: Bypass display ta if display hw is not availableHawking Zhang1-0/+18
Do not load/invoke display TA if display hardware is not available. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20drm/amdgpu: correct the KGQ fallback messagePrike Liang1-1/+1
Fix the KGQ fallback function name, as this will help differentiate the failure in the KCQ enablement. Signed-off-by: Prike Liang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20drm/amdgpu: Skip access PF-only registers on gfx10/gfxhub2_1 under SRIOVZhenGuo Yin2-2/+9
[Why] RLCG interface returns "out-of-range" error under SRIOV VF when accessing PF-only registers. [How] Skip access PF-only registers on gfx10/gfxhub2_1 under SRIOV. Acked-by: Alex Deucher <[email protected]> Signed-off-by: ZhenGuo Yin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20drm/amdgpu: Init zone device and drm client after mode-1 reset on reloadAhmad Rehman2-2/+5
In passthrough environment, when amdgpu is reloaded after unload, mode-1 is triggered after initializing the necessary IPs, That init does not include KFD, and KFD init waits until the reset is completed. KFD init is called in the reset handler, but in this case, the zone device and drm client is not initialized, causing app to create kernel panic. v2: Removing the init KFD condition from amdgpu_amdkfd_drm_client_create. As the previous version has the potential of creating DRM client twice. v3: v2 patch results in SDMA engine hung as DRM open causes VM clear to SDMA before SDMA init. Adding the condition to in drm client creation, on top of v1, to guard against drm client creation call multiple times. Signed-off-by: Ahmad Rehman <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20drm/amdgpu: amdgpu_ttm_gart_bind set gtt bound flagPhilip Yang1-0/+1
Otherwise after the GTT bo is released, the GTT and gart space is freed but amdgpu_ttm_backend_unbind will not clear the gart page table entry and leave valid mapping entry pointing to the stale system page. Then if GPU access the gart address mistakely, it will read undefined value instead page fault, harder to debug and reproduce the real issue. Cc: [email protected] Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20drm/amdgpu/vcn: enable vcn1 fw load for VCN 4_0_6Saleemkhan Jamadar10-38/+52
v1 - update the fw header for each vcn instance (Veera) VCN1 has different FW binary in VCN v4_0_6. Add changes to load the VCN1 fw binary Signed-off-by: Saleemkhan Jamadar <[email protected]> Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20drm/amdgpu: Reset IH OVERFLOW_EN bit for IH 7.0Friedrich Vock1-0/+6
IH 7.0 support landed shortly after the original patch for resetting the bit on all other generations, but without that patch applied. Fixes: 12443fc53e7d ("drm/amdgpu: Add ih v7_0 ip block support") Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Friedrich Vock <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20drm/amdgpu: fix mmhub client id out-of-bounds accessLang Yu1-4/+3
Properly handle cid 0x140. Fixes: aba2be41470a ("drm/amdgpu: add mmhub 3.3.0 support") Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20drm/amdgpu: fix use-after-free bugVitaly Prosyak1-4/+16
The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl to the AMDGPU DRM driver on any ASICs with an invalid address and size. The bug was reported by Joonkyo Jung <[email protected]>. For example the following code: static void Syzkaller1(int fd) { struct drm_amdgpu_gem_userptr arg; int ret; arg.addr = 0xffffffffffff0000; arg.size = 0x80000000; /*2 Gb*/ arg.flags = 0x7; ret = drmIoctl(fd, 0xc1186451/*amdgpu_gem_userptr_ioctl*/, &arg); } Due to the address and size are not valid there is a failure in amdgpu_hmm_register->mmu_interval_notifier_insert->__mmu_interval_notifier_insert-> check_shl_overflow, but we even the amdgpu_hmm_register failure we still call amdgpu_hmm_unregister into amdgpu_gem_object_free which causes access to a bad address. The following stack is below when the issue is reproduced when Kazan is enabled: [ +0.000014] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000009] RIP: 0010:mmu_interval_notifier_remove+0x327/0x340 [ +0.000017] Code: ff ff 49 89 44 24 08 48 b8 00 01 00 00 00 00 ad de 4c 89 f7 49 89 47 40 48 83 c0 22 49 89 47 48 e8 ce d1 2d 01 e9 32 ff ff ff <0f> 0b e9 16 ff ff ff 4c 89 ef e8 fa 14 b3 ff e9 36 ff ff ff e8 80 [ +0.000014] RSP: 0018:ffffc90002657988 EFLAGS: 00010246 [ +0.000013] RAX: 0000000000000000 RBX: 1ffff920004caf35 RCX: ffffffff8160565b [ +0.000011] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8881a9f78260 [ +0.000010] RBP: ffffc90002657a70 R08: 0000000000000001 R09: fffff520004caf25 [ +0.000010] R10: 0000000000000003 R11: ffffffff8161d1d6 R12: ffff88810e988c00 [ +0.000010] R13: ffff888126fb5a00 R14: ffff88810e988c0c R15: ffff8881a9f78260 [ +0.000011] FS: 00007ff9ec848540(0000) GS:ffff8883cc880000(0000) knlGS:0000000000000000 [ +0.000012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000010] CR2: 000055b3f7e14328 CR3: 00000001b5770000 CR4: 0000000000350ef0 [ +0.000010] Call Trace: [ +0.000006] <TASK> [ +0.000007] ? show_regs+0x6a/0x80 [ +0.000018] ? __warn+0xa5/0x1b0 [ +0.000019] ? mmu_interval_notifier_remove+0x327/0x340 [ +0.000018] ? report_bug+0x24a/0x290 [ +0.000022] ? handle_bug+0x46/0x90 [ +0.000015] ? exc_invalid_op+0x19/0x50 [ +0.000016] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000017] ? kasan_save_stack+0x26/0x50 [ +0.000017] ? mmu_interval_notifier_remove+0x23b/0x340 [ +0.000019] ? mmu_interval_notifier_remove+0x327/0x340 [ +0.000019] ? mmu_interval_notifier_remove+0x23b/0x340 [ +0.000020] ? __pfx_mmu_interval_notifier_remove+0x10/0x10 [ +0.000017] ? kasan_save_alloc_info+0x1e/0x30 [ +0.000018] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __kasan_kmalloc+0xb1/0xc0 [ +0.000018] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_read+0x11/0x20 [ +0.000020] amdgpu_hmm_unregister+0x34/0x50 [amdgpu] [ +0.004695] amdgpu_gem_object_free+0x66/0xa0 [amdgpu] [ +0.004534] ? __pfx_amdgpu_gem_object_free+0x10/0x10 [amdgpu] [ +0.004291] ? do_syscall_64+0x5f/0xe0 [ +0.000023] ? srso_return_thunk+0x5/0x5f [ +0.000017] drm_gem_object_free+0x3b/0x50 [drm] [ +0.000489] amdgpu_gem_userptr_ioctl+0x306/0x500 [amdgpu] [ +0.004295] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004270] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __this_cpu_preempt_check+0x13/0x20 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? sysvec_apic_timer_interrupt+0x57/0xc0 [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ +0.000022] ? drm_ioctl_kernel+0x17b/0x1f0 [drm] [ +0.000496] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004272] ? drm_ioctl_kernel+0x190/0x1f0 [drm] [ +0.000492] drm_ioctl_kernel+0x140/0x1f0 [drm] [ +0.000497] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004297] ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm] [ +0.000489] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? __kasan_check_write+0x14/0x20 [ +0.000016] drm_ioctl+0x3da/0x730 [drm] [ +0.000475] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004293] ? __pfx_drm_ioctl+0x10/0x10 [drm] [ +0.000506] ? __pfx_rpm_resume+0x10/0x10 [ +0.000016] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? __kasan_check_write+0x14/0x20 [ +0.000010] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? _raw_spin_lock_irqsave+0x99/0x100 [ +0.000015] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? preempt_count_sub+0x18/0xc0 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000010] ? _raw_spin_unlock_irqrestore+0x27/0x50 [ +0.000019] amdgpu_drm_ioctl+0x7e/0xe0 [amdgpu] [ +0.004272] __x64_sys_ioctl+0xcd/0x110 [ +0.000020] do_syscall_64+0x5f/0xe0 [ +0.000021] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ +0.000015] RIP: 0033:0x7ff9ed31a94f [ +0.000012] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ +0.000013] RSP: 002b:00007fff25f66790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000016] RAX: ffffffffffffffda RBX: 000055b3f7e133e0 RCX: 00007ff9ed31a94f [ +0.000012] RDX: 000055b3f7e133e0 RSI: 00000000c1186451 RDI: 0000000000000003 [ +0.000010] RBP: 00000000c1186451 R08: 0000000000000000 R09: 0000000000000000 [ +0.000009] R10: 0000000000000008 R11: 0000000000000246 R12: 00007fff25f66ca8 [ +0.000009] R13: 0000000000000003 R14: 000055b3f7021ba8 R15: 00007ff9ed7af040 [ +0.000024] </TASK> [ +0.000007] ---[ end trace 0000000000000000 ]--- v2: Consolidate any error handling into amdgpu_hmm_register which applied to kfd_bo also. (Christian) v3: Improve syntax and comment (Christian) Cc: Christian Koenig <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Joonkyo Jung <[email protected]> Cc: Dokyung Song <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Signed-off-by: Vitaly Prosyak <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-20drm/amdgpu: Handle duplicate BOs during process restoreMukul Joshi1-4/+10
In certain situations, some apps can import a BO multiple times (through IPC for example). To restore such processes successfully, we need to tell drm to ignore duplicate BOs. While at it, also add additional logging to prevent silent failures when process restore fails. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-14Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. * tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) nilfs2: prevent kernel bug at submit_bh_wbc() nilfs2: fix failure to detect DAT corruption in btree and direct mappings ocfs2: enable ocfs2_listxattr for special files ocfs2: remove SLAB_MEM_SPREAD flag usage assoc_array: fix the return value in assoc_array_insert_mid_shortcut() buildid: use kmap_local_page() watchdog/core: remove sysctl handlers from public header nilfs2: use div64_ul() instead of do_div() mul_u64_u64_div_u64: increase precision by conditionally swapping a and b kexec: copy only happens before uchunk goes to zero get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig get_signal: don't abuse ksig->info.si_signo and ksig->sig const_structs.checkpatch: add device_type Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>" dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace() list: leverage list_is_head() for list_entry_is_head() nilfs2: MAINTAINERS: drop unreachable project mirror site smp: make __smp_processor_id() 0-argument macro fat: fix uninitialized field in nostale filehandles ...
2024-03-11Merge tag 'amd-drm-next-6.9-2024-03-08-1' of ↵Dave Airlie34-562/+1265
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-03-08-1: amdgpu: - DCN 3.5.1 support - Fixes for IOMMUv2 removal - UAF fix - Misc small fixes and cleanups - SR-IOV fixes - MCBP cleanup - devcoredump update - NBIF 6.3.1 support - VPE 6.1.1 support amdkfd: - Misc fixes and cleanups - GFX10.1 trap fixes Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-03-08Merge drm/drm-next into drm-misc-nextThomas Zimmermann41-283/+391
Backmerging to get the latest fixes from drm-next; specifically the build fix from the patchset at [1]. Also fixes the build by removing an unused variable from rzg2l_du_vsp_atomic_flush(). Signed-off-by: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/series/130720/ # 1
2024-03-08Merge tag 'amd-drm-next-6.9-2024-03-01' of ↵Dave Airlie41-283/+391
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-03-01: amdgpu: - GC 11.5.1 updates - Misc display cleanups - NBIO 7.9 updates - Backlight fixes - DMUB fixes - MPO fixes - atomfirmware table updates - SR-IOV fixes - VCN 4.x updates - use RMW accessors for pci config registers - PSR fixes - Suspend/resume fixes - RAS fixes - ABM fixes - Misc code cleanups - SI DPM fix - Revert freesync video amdkfd: - Misc cleanups - Error handling fixes radeon: - use RMW accessors for pci config registers From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Dave Airlie <[email protected]>
2024-03-07drm/amdgpu/soc21: add mode2 asic reset for SMU IP v14.0.1lima10021-0/+1
Set the default reset method to mode2 for SMU IP v14.0.1 Signed-off-by: lima1002 <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-07drm/amdgpu: add VPE 6.1.1 discovery supportAlex Deucher1-0/+1
Enable VPE 6.1.1. Signed-off-by: Alex Deucher <[email protected]>
2024-03-07drm/amdgpu/vpe: add VPE 6.1.1 supportLang Yu2-106/+207
Add initial support for VPE 6.1.1. v2: squash in updates (Alex) Signed-off-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-07drm/amdgpu/vpe: don't emit cond exec command under collaborate modeLang Yu1-0/+3
Not ready now. Signed-off-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-07drm/amdgpu/vpe: add collaborate mode support for VPELang Yu2-0/+21
Under clollaborate mode, multiple VPE instances share a ring buferr and work together to finish a job. Signed-off-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-07drm/amdgpu/vpe: add PRED_EXE and COLLAB_SYNC OPCODELang Yu1-1/+2
To support multi VPE collaborate mode. Signed-off-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-07drm/amdgpu/vpe: add multi instance VPE supportLang Yu2-0/+13
Add support for multi instance VPE processing. Signed-off-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-07drm/amdgpu/discovery: add nbif v6_3_1 ip blockLikun Gao1-0/+5
Add nbif v6_3_1 ip block. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-07drm/amdgpu: Add nbif v6_3_1 ip block supportHawking Zhang3-1/+529
Add nbif v6_3_1 ip block support. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-06drm/amdgpu: add ring timeout information in devcoredumpSunil Khatri2-0/+15
Add ring timeout related information in the amdgpu devcoredump file for debugging purposes. During the gpu recovery process the registered call is triggered and add the debug information in data file created by devcoredump framework under the directory /sys/class/devcoredump/devcdx/ Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-06drm/amdgpu: add dcn3.5.1 supportYifan Zhang1-0/+1
This patch to add dcn3.5.1 support. Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-06drm/amdgpu: disable ring_muxer if mcbp is offPierre-Eric Pelloux-Prayer2-11/+14
Using the ring_muxer without preemption adds overhead for no reason since mcbp cannot be triggered. Moving back to a single queue in this case also helps when high priority app are used: in this case the gpu_scheduler priority handling will work as expected - much better than ring_muxer with its 2 independant schedulers competing for the same hardware queue. This change requires moving amdgpu_device_set_mcbp above amdgpu_device_ip_early_init because we use adev->gfx.mcbp. Signed-off-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]> Acked-by: Jiadong Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-06drm/amdgpu: remove unused codeJesse Zhang2-47/+15
Remove the unused function - amdgpu_vm_pt_is_root_clean and remove the impossible condition v1: entries == 0 is not possible any more, so this condition could probably be removed (Felix) Signed-off-by: Jesse Zhang <[email protected]> Suggested-by:Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-04drm/amdgpu: workaround to avoid SET_Q_MODE packets v2Christian König2-16/+93
It turned out that executing the SET_Q_MODE packet on every submission creates to much overhead. Implement a workaround which allows skipping the SET_Q_MODE packet if subsequent submissions all use the same parameters. v2: add a NULL check for ring_obj Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-04drm/amdgpu: cleanup conditional executionChristian König11-184/+99
First of all calculating the number of dw to patch into a conditional execution is not something HW generation specific. This is just standard ring buffer calculations. While at it also reduce the BUG_ON() into WARN_ON(). Then instead of a random bit pattern use 0 as default value for the number of dw skipped, this way it's not mandatory any more to patch the conditional execution. And last make the address to check a parameter of the conditional execution instead of getting this from the ring. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-04drm/amdgpu: Use rpm_mode flag instead of checking it again for rpmMa Jun1-12/+12
Because the rpm_mode flag is already set when the driver is initialized, we use it directly for runtime suspend/resume instead of checking it again Signed-off-by: Ma Jun <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-04drm/amdgpu: change vm->task_info handlingShashank Sharma13-119/+249
This patch changes the handling and lifecycle of vm->task_info object. The major changes are: - vm->task_info is a dynamically allocated ptr now, and its uasge is reference counted. - introducing two new helper funcs for task_info lifecycle management - amdgpu_vm_get_task_info: reference counts up task_info before returning this info - amdgpu_vm_put_task_info: reference counts down task_info - last put to task_info() frees task_info from the vm. This patch also does logistical changes required for existing usage of vm->task_info. V2: Do not block all the prints when task_info not found (Felix) V3: Fixed review comments from Felix - Fix wrong indentation - No debug message for -ENOMEM - Add NULL check for task_info - Do not duplicate the debug messages (ti vs no ti) - Get first reference of task_info in vm_init(), put last in vm_fini() V4: Fixed review comments from Felix - fix double reference increment in create_task_info - change amdgpu_vm_get_task_info_pasid - additional changes in amdgpu_gem.c while porting Cc: Christian Koenig <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Felix Kuehling <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-04Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for ↵Jesse Zhang3-81/+1
Raven fix the issue: "amdgpu: Failed to create process VM object". [Why]when amdgpu initialized, seq64 do mampping and update bo mapping in vm page table. But when clifo run. It also initializes a vm for a process device through the function kfd_process_device_init_vm and ensure the root PD is clean through the function amdgpu_vm_pt_is_root_clean. So they have a conflict, and clinfo always failed. v1: - remove all the pte_supports_ats stuff from the amdgpu_vm code (Felix) Signed-off-by: Jesse Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-01drm/amdgpu: use GTT only as fallback for VRAM|GTTChristian König1-0/+6
Try to fill up VRAM as well by setting the busy flag on GTT allocations. This fixes the issue that when VRAM was evacuated for suspend it's never filled up again unless the application is restarted. Signed-off-by: Christian König <[email protected]> Reviewed-by: Zack Rusin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-29drm/amdgpu: remove misleading amdgpu_pmops_runtime_idle() commentBjorn Helgaas1-2/+1
After 4020c2280233 ("drm/amdgpu: don't runtime suspend if there are displays attached (v3)"), "ret" is unconditionally set later before being used, so there's point in initializing it and the associated comment is no longer meaningful. Remove the comment and the unnecessary initialization. Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-29Revert "drm/amd: Remove freesync video mode amdgpu parameter"Alex Deucher2-0/+28
This reverts commit e94e787e37b99645e7c02d20d0a1ba0f8a18a82a. This conflicts with how compositors want to handle VRR. Now that compositors actually handle VRR, we probably don't need freesync video. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2985 Acked-by: Hamza Mahfooz <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-29drm/amdgpu: add deferred error check for UMC v12 address queryTao Zhou1-1/+2
Both RAS UE and deferred errors need page retirement. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-28amd/amdkfd: remove unused parameterEric Huang2-3/+2
The adev can be found from bo by amdgpu_ttm_adev(bo->tbo.bdev), and adev is also not used in the function amdgpu_amdkfd_map_gtt_bo_to_gart(). Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Harish Kasiviswanathan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-27drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()Srinivasan Shanmugam1-6/+10
This ensures that the memory mapped by ioremap for adev->rmmio, is properly handled in amdgpu_device_init(). If the function exits early due to an error, the memory is unmapped. If the function completes successfully, the memory remains mapped. Reported by smatch: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4337 amdgpu_device_init() warn: 'adev->rmmio' from ioremap() not released on lines: 4035,4045,4051,4058,4068,4337 Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-27drm/amdgpu: Fix missing break in ATOM_ARG_IMM Case of atom_get_src_int()Srinivasan Shanmugam1-1/+1
Missing break statement in the ATOM_ARG_IMM case of a switch statement, adds the missing break statement, ensuring that the program's control flow is as intended. Fixes the below: drivers/gpu/drm/amd/amdgpu/atom.c:323 atom_get_src_int() warn: ignoring unreachable code. Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)") Cc: Jammy Zhou <[email protected]> Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-27drm/amd: Drop abm_level propertyMario Limonciello2-10/+0
This vendor specific property has never been used by userspace software and conflicts with the panel_power_savings sysfs file. That is a compositor and user could fight over the same data. Fixes: 63d0b87213a0 ("drm/amd/display: add panel_power_savings sysfs entry to eDP connectors") Suggested-by: Harry Wentland <[email protected]> Cc: Hamza Mahfooz <[email protected]> Cc: "Sun peng Li (Leo)" <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-26drm/amdgpu: reserve more memory for MES runtime DRAMTim Huang1-3/+10
This patch fixes a MES firmware boot failure issue when backdoor loading the MES firmware. MES firmware runtime DRAM size is changed to 512k, the driver needs to reserve this amount of memory in FB, otherwise adjacent memory will be overwritten by the MES firmware startup code. Signed-off-by: Tim Huang <[email protected]> Reviewed-by: Yifan Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-26drm/amdgpu: Enable gpu reset for S3 abort cases on Raven seriesPrike Liang1-20/+25
Currently, GPU resets can now be performed successfully on the Raven series. While GPU reset is required for the S3 suspend abort case. So now can enable gpu reset for S3 abort cases on the Raven series. Signed-off-by: Prike Liang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-26drm/amdgpu: Do not program SQ_TIMEOUT_CONFIG in SRIOVVictor Lu1-0/+3
VF should not program this register. Signed-off-by: Victor Lu <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-26drm/amdgpu: Fix ineffective ras_mask settingsStanley.Yang1-0/+1
Check amdgpu_ras_mask to fix ineffective ras_mask setting due to special asic without sram ecc enable but with poison supported. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-26drm/amdkfd: Skip packet submission on fatal errorLijo Lazar2-0/+6
If fatal error is detected, packet submission won't go through. Return error in such cases. Also, avoid waiting for fence when fatal error is detected. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-26drm/amdgpu: Add fatal error detected flagLijo Lazar3-0/+39
For a RAS error that needs a full reset to recover, set the fatal error status. Clear the status once the device is reset. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Asad Kamal <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-02-26drm/amdgpu: Fix the runtime resume failure issueMa Jun1-0/+3
Don't set power state flag when system enter runtime suspend, or it may cause runtime resume failure issue. Fixes: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend") Signed-off-by: Ma Jun <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]>