aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2024-06-27drm/amdgpu: refine poison consumption interrupt handlerYiPeng Chai2-23/+43
1. The poison fifo is only used for poison consumption requests. 2. Merge reset requests when poison fifo caches multiple poison consumption messages Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: refine poison creation interrupt handlerYiPeng Chai2-22/+18
In order to apply to the case where a large number of ras poison interrupts: 1. Change to use variable to record poison creation requests to avoid fifo full. 2. Prioritize handling poison creation requests instead of following the order of requests received by the driver. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: process RAS fatal error MB notificationVignesh Chander7-8/+55
For RAS error scenario, VF guest driver will check mailbox and set fed flag to avoid unnecessary HW accesses. additionally, poll for reset completion message first to avoid accidentally spamming multiple reset requests to host. v2: add another mailbox check for handling case where kfd detects timeout first v3: set host_flr bit and use wait_for_reset Signed-off-by: Vignesh Chander <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: add variable to record the deferred error number read by driverYiPeng Chai3-21/+48
Add variable to record the deferred error number read by driver. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: Use dev_ prints for virtualization as it supports multi adapterVignesh Chander2-16/+26
So we can get clearer per device logging. Signed-off-by: Vignesh Chander <[email protected]> Reviewed-by: Zhigang Luo <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: clear RB_OVERFLOW bit when enabling interruptsDanijel Slivka1-0/+28
Why: Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit if RB_ENABLE is not set. How to fix: Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set. The RB_ENABLE bit is required to be set, together with WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit would clear the RB_OVERFLOW. Signed-off-by: Danijel Slivka <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27Revert "drm/amd/amdgpu: add module parameter for jpeg"Kenneth Feng3-8/+0
This reverts commit d3620eeae82cccf8316e6754f8ddb52473e2e5ea. Revert this due to a final solution: commit ed3165d660d8 ("drm/amdgpu/jpeg5: reprogram doorbell setting after power up for each playback") Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Sonny Jiang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: Don't show false warning for reg listLijo Lazar2-6/+24
If reg list is already loaded on PSP 13.0.2 SOCs, psp will give TEE_ERR_CANCEL response on second time load. Avoid printing warn message for it. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: avoid using null object of framebufferJulia Zhang1-2/+16
Instead of using state->fb->obj[0] directly, get object from framebuffer by calling drm_gem_fb_get_obj() and return error code when object is null to avoid using null object of framebuffer. Reported-by: Fusheng Huang <[email protected]> Signed-off-by: Julia Zhang <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: Fix smatch static checker warningHawking Zhang1-4/+4
adev->gfx.imu.funcs could be NULL Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Likun Gao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: add missing error handling in function ↵Bob Zhou1-1/+5
amdgpu_gmc_flush_gpu_tlb_pasid Fix the unchecked return value warning reported by Coverity, so add error handling. Signed-off-by: Bob Zhou <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: Fix register access violationHawking Zhang3-7/+12
fault_status is read only register. fault_cntl is not accessible from guest environment. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: access ltr through pci cfg spaceFrank Min1-4/+10
Access ltr through pci cfg space instead of mmio while programing aspm on gfx12 Signed-off-by: Frank Min <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: refine gfx12 firmware loadingYang Wang1-12/+10
refine gfx12 firmware loading Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: update MTYPE mapping for gfx12Frank Min1-4/+0
gfx12 only support MTYPE UC and NC, so update it accordingly. Signed-off-by: Frank Min <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27amdgpu: don't dereference a NULL resource in sysfs codePierre-Eric Pelloux-Prayer1-30/+33
dma_resv_trylock being successful doesn't guarantee that bo->tbo.base.resv is not NULL, so check its validity before using it. Signed-off-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: Fix pci state save during mode-1 resetLijo Lazar1-2/+5
Cache the PCI state before bus master is disabled. The saved state is later used for other cases like restoring config space after mode-2 reset. Fixes: 5c03e5843e6b ("drm/amdgpu:add smu mode1/2 support for aldebaran") Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu: refine gfx11 firmware loadingYang Wang1-14/+12
refine gfx11 firmware loading Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu/jpeg5: reprogram doorbell setting after power up for each playbackSonny Jiang1-4/+4
Doorbell needs to be configured after power up during each playback Signed-off-by: Sonny Jiang <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27drm/amdgpu/atomfirmware: fix parsing of vram_infoAlex Deucher1-1/+1
v3.x changed the how vram width was encoded. The previous implementation actually worked correctly for most boards. Fix the implementation to work correctly everywhere. This fixes the vram width reported in the kernel log on some boards. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-27Merge tag 'amd-drm-next-6.11-2024-06-22' of ↵Dave Airlie80-657/+674
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.11-2024-06-22: amdgpu: - HPD fixes - PSR fixes - DCC updates - DCN 4.0.1 fixes - FAMS fixes - Misc code cleanups - SR-IOV fixes - GPUVM TLB flush cleanups - Make VCN less verbose - ACPI backlight fixes - MES fixes - Firmware loading cleanups - Replay fixes - LTTPR fixes - Trap handler fixes - Cursor and overlay fixes - Primary plane zpos fixes - DML 2.1 fixes - RAS updates - USB4 fixes - MALL fixes - Reserved VMID fix - Silence UBSAN warnings amdkfd: - Misc code cleanups From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Dave Airlie <[email protected]>
2024-06-25drm/amdgpu: Don't show false warning for reg listLijo Lazar2-6/+24
If reg list is already loaded on PSP 13.0.2 SOCs, psp will give TEE_ERR_CANCEL response on second time load. Avoid printing warn message for it. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-25drm/amdgpu: avoid using null object of framebufferJulia Zhang1-2/+16
Instead of using state->fb->obj[0] directly, get object from framebuffer by calling drm_gem_fb_get_obj() and return error code when object is null to avoid using null object of framebuffer. Reported-by: Fusheng Huang <[email protected]> Signed-off-by: Julia Zhang <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-06-25drm/amdgpu: Fix pci state save during mode-1 resetLijo Lazar1-2/+5
Cache the PCI state before bus master is disabled. The saved state is later used for other cases like restoring config space after mode-2 reset. Fixes: 5c03e5843e6b ("drm/amdgpu:add smu mode1/2 support for aldebaran") Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-25drm/amdgpu/atomfirmware: fix parsing of vram_infoAlex Deucher1-1/+1
v3.x changed the how vram width was encoded. The previous implementation actually worked correctly for most boards. Fix the implementation to work correctly everywhere. This fixes the vram width reported in the kernel log on some boards. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-06-21Merge tag 'drm-misc-next-2024-05-30' of ↵Dave Airlie8-34/+13
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.11: UAPI Changes: - Deprecate DRM date and return a 0 date in DRM_IOCTL_VERSION Core Changes: - connector: Create a set of helpers to help with HDMI support - fbdev: Create memory manager optimized fbdev emulation - panic: Allow to select fonts, improve drm_fb_dma_get_scanout_buffer Driver Changes: - Remove driver owner assignments - Allow more drivers to compile with COMPILE_TEST - Conversions to drm_edid - ivpu: hardware scheduler support, profiling support, improvements to the platform support layer - mgag200: general reworks and improvements - nouveau: Add NVreg_RegistryDwords command line option - rockchip: Conversion to the hdmi helpers - sun4i: Conversion to the hdmi helpers - vc4: Conversion to the hdmi helpers - v3d: Perf counters improvements - zynqmp: IRQ and debugfs improvements - bridge: - Remove redundant checks on bridge->encoder - panels: - Switch panels from register table initialization to proper code - Now that the panel code tracks the panel state, remove every ad-hoc implementation in the panel drivers - New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology 13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE nv110wum-l60, IVO t109nw41 Signed-off-by: Dave Airlie <[email protected]> From: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/20240530-hilarious-flat-magpie-5fa186@houat
2024-06-19drm/amdgpu: init TA fw for psp v14Likun Gao1-0/+5
Add support to init TA firmware for psp v14. Signed-off-by: Likun Gao <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19drm/amdgpu: cleanup MES11 command submissionChristian König1-28/+48
The approach of having a separate WB slot for each submission doesn't really work well and for example breaks GPU reset. Use a status query packet for the fence update instead since those should always succeed we can use the fence of the original packet to signal the state of the operation. While at it cleanup the coding style. Fixes: eef016ba8986 ("drm/amdgpu/mes11: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <[email protected]> Signed-off-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2Christian König3-51/+0
This reverts commit b8c415e3bf98 ("drm/amdgpu: take runtime pm reference when we attach a buffer") and commit 425285d39afd ("drm/amdgpu: add amdgpu runpm usage trace for separate funcs"). Taking a runtime pm reference for DMA-buf is actually completely unnecessary and even dangerous. The problem is that calling pm_runtime_get_sync() from the DMA-buf callbacks is illegal because we have the reservation locked here which is also taken during resume. So this would deadlock. When the buffer is in GTT it is still accessible even when the GPU is powered down and when it is in VRAM the buffer gets migrated to GTT before powering down. The only use case which would make it mandatory to keep the runtime pm reference would be if we pin the buffer into VRAM, and that's not something we currently do. v2: improve the commit message Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
2024-06-19drm/amdgpu: Indicate CU havest info to CPHarish Kasiviswanathan1-2/+13
To achieve full occupancy CP hardware needs to know if CUs in SE are symmetrically or asymmetrically harvested v2: Reset is_symmetric_cus for each loop Signed-off-by: Harish Kasiviswanathan <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19drm/amdgpu: fix locking scope when flushing tlbYunxiang Li1-32/+34
Which method is used to flush tlb does not depend on whether a reset is in progress or not. We should skip flush altogether if the GPU will get reset. So put both path under reset_domain read lock. Signed-off-by: Yunxiang Li <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
2024-06-19drm/amdgpu: init TA fw for psp v14Likun Gao1-0/+5
Add support to init TA firmware for psp v14. Signed-off-by: Likun Gao <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19drm/amdgpu: refine gfx6 firmware loadingYang Wang1-10/+9
refine gfx6 firmware loading Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19Revert "drm/amdgpu: change aca bank error lock type to spinlock"Yang Wang2-11/+11
This reverts commit f6bce954f432c556659a57be9e18fecdc575affb. Revert this patch to modify lock type back to 'mutex' to avoid kernel calltrace issue. [ 602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu] [ 602.668939] Call Trace: [ 602.668940] <TASK> [ 602.668941] dump_stack_lvl+0x4c/0x70 [ 602.668945] dump_stack+0x14/0x20 [ 602.668946] __schedule_bug+0x5a/0x70 [ 602.668950] __schedule+0x940/0xb30 [ 602.668952] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668955] ? hrtimer_reprogram+0x77/0xb0 [ 602.668957] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668959] ? hrtimer_start_range_ns+0x126/0x370 [ 602.668961] schedule+0x39/0xe0 [ 602.668962] schedule_hrtimeout_range_clock+0xb1/0x140 [ 602.668964] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 602.668966] schedule_hrtimeout_range+0x17/0x20 [ 602.668967] usleep_range_state+0x69/0x90 [ 602.668970] psp_cmd_submit_buf+0x132/0x570 [amdgpu] [ 602.669066] psp_ras_invoke+0x75/0x1a0 [amdgpu] [ 602.669156] psp_ras_query_address+0x9c/0x120 [amdgpu] [ 602.669245] umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu] [ 602.669337] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669339] ? stack_depot_save+0x12/0x20 [ 602.669342] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669343] ? set_track_prepare+0x52/0x70 [ 602.669346] ? kmemleak_alloc+0x4f/0x90 [ 602.669348] ? __kmalloc_node+0x34b/0x450 [ 602.669352] amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu] [ 602.669438] mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu] [ 602.669554] mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu] [ 602.669655] amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu] [ 602.669743] ? kmemleak_free+0x36/0x60 [ 602.669745] ? kvfree+0x32/0x40 [ 602.669747] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669749] ? kfree+0x15d/0x2a0 [ 602.669752] amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu] [ 602.669839] amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu] [ 602.669924] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669925] ? __call_rcu_common.constprop.0+0xa6/0x2b0 [ 602.669929] amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu] [ 602.670014] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670017] amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu] [ 602.670103] amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu] [ 602.670187] ? srso_alias_return_thunk+0x5/0 Signed-off-by: Yang Wang <[email protected]> Reviewed-by: YiPeng Chai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19Revert "drm/amdgpu: change bank cache lock type to spinlock"Yang Wang2-6/+7
This reverts commit 258ed689bc3163f86204f75df6c23f92b59b3fad revert this patch to modify lock type back to 'mutex' to avoid kernel calltrace issue. [ 602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu] [ 602.668939] Call Trace: [ 602.668940] <TASK> [ 602.668941] dump_stack_lvl+0x4c/0x70 [ 602.668945] dump_stack+0x14/0x20 [ 602.668946] __schedule_bug+0x5a/0x70 [ 602.668950] __schedule+0x940/0xb30 [ 602.668952] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668955] ? hrtimer_reprogram+0x77/0xb0 [ 602.668957] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668959] ? hrtimer_start_range_ns+0x126/0x370 [ 602.668961] schedule+0x39/0xe0 [ 602.668962] schedule_hrtimeout_range_clock+0xb1/0x140 [ 602.668964] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 602.668966] schedule_hrtimeout_range+0x17/0x20 [ 602.668967] usleep_range_state+0x69/0x90 [ 602.668970] psp_cmd_submit_buf+0x132/0x570 [amdgpu] [ 602.669066] psp_ras_invoke+0x75/0x1a0 [amdgpu] [ 602.669156] psp_ras_query_address+0x9c/0x120 [amdgpu] [ 602.669245] umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu] [ 602.669337] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669339] ? stack_depot_save+0x12/0x20 [ 602.669342] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669343] ? set_track_prepare+0x52/0x70 [ 602.669346] ? kmemleak_alloc+0x4f/0x90 [ 602.669348] ? __kmalloc_node+0x34b/0x450 [ 602.669352] amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu] [ 602.669438] mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu] [ 602.669554] mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu] [ 602.669655] amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu] [ 602.669743] ? kmemleak_free+0x36/0x60 [ 602.669745] ? kvfree+0x32/0x40 [ 602.669747] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669749] ? kfree+0x15d/0x2a0 [ 602.669752] amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu] [ 602.669839] amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu] [ 602.669924] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669925] ? __call_rcu_common.constprop.0+0xa6/0x2b0 [ 602.669929] amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu] [ 602.670014] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670017] amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu] [ 602.670103] amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu] [ 602.670187] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670189] ? __schedule+0x37d/0xb30 [ 602.670191] process_one_work+0x176/0x350 [ 602.670194] worker_thread+0x2f7/0x420 [ 602.670197] ? Signed-off-by: Yang Wang <[email protected]> Reviewed-by: YiPeng Chai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19drm/amdgpu: remove amdgpu_mes_fence_wait_polling()Alex Deucher2-16/+0
No longer used so remove it. Reviewed-by: Mukul Joshi <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19drm/amdgpu: cleanup MES12 command submissionAlex Deucher1-28/+48
The approach of having a separate WB slot for each submission doesn't really work well and for example breaks GPU reset. Use a status query packet for the fence update instead since those should always succeed we can use the fence of the original packet to signal the state of the operation. While at it cleanup the coding style. Fixes: ade887c63394 ("drm/amdgpu/mes12: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <[email protected]> Suggested-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19drm/amdgpu: refine gfx10 firmware loadingYang Wang1-13/+12
refine gfx10 firmware loading Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19drm/amdgpu: refine gfx9 firmware loadingYang Wang2-30/+26
refine gfx9 firmware loading Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19drm/amdgpu: cleanup MES11 command submissionChristian König1-28/+48
The approach of having a separate WB slot for each submission doesn't really work well and for example breaks GPU reset. Use a status query packet for the fence update instead since those should always succeed we can use the fence of the original packet to signal the state of the operation. While at it cleanup the coding style. Fixes: eef016ba8986 ("drm/amdgpu/mes11: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <[email protected]> Signed-off-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19drm/amdgpu: fix using the reserved VMID with gang submitChristian König3-13/+45
We need to ensure that even when using a reserved VMID that the gang members can still run in parallel. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19drm/amdgpu: Do not wait for MP0_C2PMSG_33 IFWI init in SRIOVVictor Lu1-12/+14
SRIOV does not need to wait for IFWI init, and MP0_C2PMSG_33 is blocked for VF access. Signed-off-by: Victor Lu <[email protected]> Reviewed-by: Vignesh Chander <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-19Revert "drm/amdgpu: Add missing locking for MES API calls"Mukul Joshi1-12/+0
This reverts commit 3612702852acbded39233b1600c8d9f47e40139f. This is causing a BUG message during suspend. [ 61.603542] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283 [ 61.603550] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2028, name: kworker/u64:14 [ 61.603553] preempt_count: 1, expected: 0 [ 61.603555] RCU nest depth: 0, expected: 0 [ 61.603557] Preemption disabled at: [ 61.603559] [<ffffffffc08a3261>] amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu] [ 61.603789] CPU: 9 PID: 2028 Comm: kworker/u64:14 Tainted: G W 6.8.0+ #7 [ 61.603795] Workqueue: events_unbound async_run_entry_fn [ 61.603801] Call Trace: [ 61.603803] <TASK> [ 61.603806] dump_stack_lvl+0x37/0x50 [ 61.603811] ? amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu] [ 61.604007] dump_stack+0x10/0x20 [ 61.604010] __might_resched+0x16f/0x1d0 [ 61.604016] __might_sleep+0x43/0x70 [ 61.604020] mutex_lock+0x1f/0x60 [ 61.604024] amdgpu_mes_unmap_legacy_queue+0x6d/0x100 [amdgpu] [ 61.604226] gfx11_kiq_unmap_queues+0x3dc/0x430 [amdgpu] [ 61.604422] ? srso_alias_return_thunk+0x5/0xfbef5 [ 61.604429] amdgpu_gfx_disable_kgq+0x122/0x160 [amdgpu] [ 61.604621] gfx_v11_0_hw_fini+0xda/0x100 [amdgpu] [ 61.604814] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 61.605008] amdgpu_device_ip_suspend_phase2+0x135/0x1d0 [amdgpu] [ 61.605175] amdgpu_device_suspend+0xec/0x180 [amdgpu] Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: refine gfx8 firmware loadingYang Wang1-36/+33
refine gfx8 firmware loading Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: set RAS fed status for more casesTao Zhou2-0/+2
Indicate fatal error for each RAS block and NBIO. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: create amdgpu_ras_in_recovery to simplify codeTao Zhou4-35/+25
Reduce redundant code and user doesn't need to pay attention to RAS details. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: trigger mode1 reset for RAS RMA statusTao Zhou3-11/+29
Check RMA status in bad page retirement flow. v2: fix coding bugs in v1. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: refine gfx7 firmware loadingYang Wang1-14/+13
refine gfx7 firmware loading Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-06-14drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2Christian König3-51/+0
This reverts commit b8c415e3bf98 ("drm/amdgpu: take runtime pm reference when we attach a buffer") and commit 425285d39afd ("drm/amdgpu: add amdgpu runpm usage trace for separate funcs"). Taking a runtime pm reference for DMA-buf is actually completely unnecessary and even dangerous. The problem is that calling pm_runtime_get_sync() from the DMA-buf callbacks is illegal because we have the reservation locked here which is also taken during resume. So this would deadlock. When the buffer is in GTT it is still accessible even when the GPU is powered down and when it is in VRAM the buffer gets migrated to GTT before powering down. The only use case which would make it mandatory to keep the runtime pm reference would be if we pin the buffer into VRAM, and that's not something we currently do. v2: improve the commit message Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
2024-06-14drm/amdgpu: refine imu firmware loadingYang Wang2-12/+8
refine imu firmware loading Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>