aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
AgeCommit message (Collapse)AuthorFilesLines
2021-08-20drm/amdgpu: Cancel delayed work when GFXOFF is disabledMichel Dänzer1-6/+5
schedule_delayed_work does not push back the work if it was already scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms after the first time GFXOFF was disabled and re-enabled, even if GFXOFF was disabled and re-enabled again during those 100 ms. This resulted in frame drops / stutter with the upcoming mutter 41 release on Navi 14, due to constantly enabling GFXOFF in the HW and disabling it again (for getting the GPU clock counter). To fix this, call cancel_delayed_work_sync when the disable count transitions from 0 to 1, and only schedule the delayed work on the reverse transition, not if the disable count was already 0. This makes sure the delayed work doesn't run at unexpected times, and allows it to be lock-free. v2: * Use cancel_delayed_work_sync & mutex_trylock instead of mod_delayed_work. v3: * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König) v4: * Fix race condition between amdgpu_gfx_off_ctrl incrementing adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off checking for it to be 0 (Evan Quan) Cc: [email protected] Reviewed-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> # v3 Acked-by: Christian König <[email protected]> # v3 Signed-off-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amd/amdgpu:flush ttm delayed work before cancel_syncYuBiao Wang1-1/+3
[Why] In some cases when we unload driver, warning call trace will show up in vram_mgr_fini which claims that LRU is not empty, caused by the ttm bo inside delay deleted queue. [How] We should flush delayed work to make sure the delay deleting is done. Signed-off-by: YuBiao Wang <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/amdgpu embed hw_fence into amdgpu_jobJack Zhang1-1/+12
Why: Previously hw fence is alloced separately with job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime, and simplify the design of gpu-scheduler. How: We propose to embed hw_fence into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is embedded in a job. v3: remove redundant variable ring in amdgpu_job v4: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. v5 add missing handling in amdgpu_fence_enable_signaling Signed-off-by: Jingwen Chen <[email protected]> Signed-off-by: Jack Zhang <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Reviewed by: Monk Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-09drm/amd/amdgpu: skip locking delayed work if not initialized.YuBiao Wang1-1/+2
When init failed in early init stage, amdgpu_object has not been initialized, so hasn't the ttm delayed queue functions. Signed-off-by: YuBiao Wang <[email protected]> Reviewed-by: Emily.Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-05drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)Guchun Chen1-3/+2
In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop scheduler in s3 test, otherwise, fence related failure will arrive after resume. To fix this and for a better clean up, move drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of driver shutdown, and should never be called in hw_fini. v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, to keep sw_init and sw_fini paired. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1668 Fixes: 8d35a2596164c1 ("drm/amdgpu: adjust fence driver enable sequence") Suggested-by: Christian König <[email protected]> Tested-by: Mike Lothian <[email protected]> Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-30Merge tag 'amd-drm-next-5.15-2021-07-29' of ↵Dave Airlie1-44/+97
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.15-2021-07-29: amdgpu: - VCN/JPEG power down sequencing fixes - Various navi pcie link handling fixes - Clockgating fixes - Yellow Carp fixes - Beige Goby fixes - Misc code cleanups - S0ix fixes - SMU i2c bus rework - EEPROM handling rework - PSP ucode handling cleanup - SMU error handling rework - AMD HDMI freesync fixes - USB PD firmware update rework - MMIO based vram access rework - Misc display fixes - Backlight fixes - Add initial Cyan Skillfish support - Overclocking fixes suspend/resume amdkfd: - Sysfs leak fix - Add counters for vm faults and migration - GPUVM TLB optimizations radeon: - Misc fixes Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-07-28drm/amdgpu: adjust fence driver enable sequenceLikun Gao1-4/+6
Fence driver was enabled per ring when sw init on per IP block before. Change to enable all the fence driver at the same time after amdgpu_device_ip_init finished. Rename some function related to fence to make it reasonable for read. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-23drm/amdgpu: Clear doorbell interrupt status for Sienna CichlidChengzhe Liu1-0/+4
On Sienna Cichlid, in pass-through mode, if we unload the driver in BACO mode(RTPM), then the kernel would receive thousands of interrupts. That's because there is doorbell monitor interrupt on BIF, so KVM keeps injecting interrupts to the guest VM. So we should clear the doorbell interrupt status after BACO exit. v2: Modify coding style and commit message Signed-off-by: Chengzhe Liu <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-23drm/amdgpu: init family name for cyan_skillfishTao Zhou1-0/+1
Use FAMILY_NV for cyan_skillfish. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-23drm/amdgpu: add cyan_skillfish asic typeTao Zhou1-0/+5
Add cyan_skillfish asic family. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-23drm/amd/amdgpu: consider kernel job always not guiltyJingwen Chen1-3/+3
[Why] Currently all timedout job will be considered to be guilty. In SRIOV multi-vf use case, the vf flr happens first and then job time out is found. There can be several jobs timeout during a very small time slice. And if the innocent sdma job time out is found before the real bad job, then the innocent sdma job will be set to guilty. This will lead to a page fault after resubmitting job. [How] If the job is a kernel job, we will always consider it not guilty Reviewed-by: Christian König <[email protected]> Signed-off-by: Jingwen Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-23drm/amdgpu: Change the imprecise function nameRoy Sun1-1/+1
The callback functions are used for SRIOV read/write instead of just for rlcg read/write Signed-off-by: Roy Sun <[email protected]> Reviewed-by: Zhou pengju <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-23Merge tag 'drm-misc-next-2021-07-22' of ↵Dave Airlie1-5/+6
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.15-rc1: UAPI Changes: - Remove sysfs stats for dma-buf attachments, as it causes a performance regression. Previous merge is not in a rc kernel yet, so no userspace regression possible. Cross-subsystem Changes: - Sanitize user input in kyro's viewport ioctl. - Use refcount_t in fb_info->count - Assorted fixes to dma-buf. - Extend x86 efifb handling to all archs. - Fix neofb divide by 0. - Document corpro,gm7123 bridge dt bindings. Core Changes: - Slightly rework drm master handling. - Cleanup vgaarb handling. - Assorted fixes. Driver Changes: - Add support for ws2401 panel. - Assorted fixes to stm, ast, bochs. - Demidlayer ingenic irq. Signed-off-by: Dave Airlie <[email protected]> From: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-07-21vgaarb: don't pass a cookie to vga_client_registerChristoph Hellwig1-4/+5
The VGA arbitration is entirely based on pci_dev structures, so just pass that back to the set_vga_decode callback. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Christian König <[email protected]> Signed-off-by: Christian König <[email protected]>
2021-07-21vgaarb: remove the unused irq_set_state argument to vga_client_registerChristoph Hellwig1-1/+1
All callers pass NULL as the irq_set_state argument, so remove it and the ->irq_set_state member in struct vga_device. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Christian König <[email protected]> Signed-off-by: Christian König <[email protected]>
2021-07-21vgaarb: provide a vga_client_unregister wrapperChristoph Hellwig1-1/+1
Add a trivial wrapper for the unregister case that sets all fields to NULL. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Christian König <[email protected]> Signed-off-by: Christian König <[email protected]>
2021-07-16drm/amdgpu: split amdgpu_device_access_vram() into two small partsKevin Wang1-30/+75
split amdgpu_device_access_vram() 1. amdgpu_device_mm_access(): using MM_INDEX/MM_DATA to access vram 2. amdgpu_device_aper_access(): using vram aperature to access vram (option) Signed-off-by: Kevin Wang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-01drm/amdgpu: Fix resource leak on probe error pathJiri Kosina1-6/+2
This reverts commit 4192f7b5768912ceda82be2f83c87ea7181f9980. It is not true (as stated in the reverted commit changelog) that we never unmap the BAR on failure; it actually does happen properly on amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() -> amdgpu_device_fini() error path. What's worse, this commit actually completely breaks resource freeing on probe failure (like e.g. failure to load microcode), as amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too early, leaving all the resources that'd normally be freed in amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading to all sorts of oopses when someone tries to, for example, access the sysfs and procfs resources which are still around while the driver is gone. Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") Reported-by: Vojtech Pavlik <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-01drm/amdgpu: move apu flags initialization to the start of device initHuang Rui1-0/+36
In some asics, we need to adjust the behavior according to the apu flags at very early stage. Signed-off-by: Huang Rui <[email protected]> Reviewed-by: Aaron Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-06-30drm/amd/amdgpu: enable gpu recovery for beige_gobyChengming Gui1-0/+1
Enable gpu recovery for beige_goby. Signed-off-by: Chengming Gui <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-06-15drm/amdgpu: move shadow_list to amdgpu_bo_vmNirmoy Das1-2/+3
Move shadow_list to struct amdgpu_bo_vm as shadow BOs are part of PT/PD BOs. Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-06-10Merge tag 'amd-drm-next-5.14-2021-06-09' of ↵Dave Airlie1-2/+38
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.14-2021-06-09: amdgpu: - SR-IOV fixes - Smartshift updates - GPUVM TLB flush updates - 16bpc fixed point display fix for DCE11 - BACO cleanups and core refactoring - Aldebaran updates - Initial Yellow Carp support - RAS fixes - PM API cleanup - DC visual confirm updates - DC DP MST fixes - DC DML fixes - Misc code cleanups and bug fixes amdkfd: - Initial Yellow Carp support radeon: - memcpy_to/from_io fixes UAPI: - Add Yellow Carp chip family id Used internally in the kernel driver and by mesa Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-06-10Merge tag 'drm-misc-next-2021-06-09' of ↵Dave Airlie1-3/+3
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.14: UAPI Changes: * drm/panfrost: Export AFBC_FEATURES register to userspace Cross-subsystem Changes: * dma-buf: Fix debug printing; Rename dma_resv_*() functions + changes in callers; Cleanups Core Changes: * Add prefetching memcpy for WC * Avoid circular dependency on CONFIG_FB * Cleanups * Documentation fixes throughout DRM * ttm: Make struct ttm_resource the base of all managers + changes in all users of TTM; Add a generic memcpy for page-based iomem; Remove use of VM_MIXEDMAP; Cleanups Driver Changes: * drm/bridge: Add TI SN65DSI83 and SN65DSI84 + DT bindings * drm/hyperv: Add DRM driver for HyperV graphics output * drm/msm: Fix module dependencies * drm/panel: KD53T133: Support rotation * drm/pl111: Fix module dependencies * drm/qxl: Fixes * drm/stm: Cleanups * drm/sun4i: Be explicit about format modifiers * drm/vc4: Use struct gpio_desc; Cleanups * drm/vgem: Cleanups * drm/vmwgfx: Use ttm_bo_move_null() if there's nothing to copy * fbdev/mach64: Cleanups * fbdev/mb862xx: Use DEVICE_ATTR_RO Signed-off-by: Dave Airlie <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/YMBw3DF2b9udByfT@linux-uq9g
2021-06-04drm/amdgpu: Add DC support and display block for Yellow CarpNicholas Kazlauskas1-0/+1
To enable output on real display instead of virtual. Acked-by: Huang Rui <[email protected]> Signed-off-by: Nicholas Kazlauskas <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-06-04drm/amdgpu: add yellow carp support for gpu_info and ip block settingAaron Liu1-0/+7
This patch adds yellow carp support for gpu_info firmware and ip block setting. Signed-off-by: Aaron Liu <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-06-04drm/amdgpu: add yellow carp asic_type enumAaron Liu1-0/+1
This patch adds yellow carp to amd_asic_type enum and amdgpu_asic_name[]. Signed-off-by: Aaron Liu <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-06-04drm/amdgpu: Don't flush/invalidate HDP for APUs and A+AEric Huang1-2/+29
Integrate two generic functions to determine if HDP flush is needed for all Asics. Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-06-04Merge tag 'amd-drm-next-5.14-2021-06-02' of ↵Dave Airlie1-3/+29
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.14-2021-06-02: amdgpu: - GC/MM register access macro clean up for SR-IOV - Beige Goby updates - W=1 Fixes - Aldebaran fixes - Misc display fixes - ACPI ATCS/ATIF handling rework - SR-IOV fixes - RAS fixes - 16bpc fixed point format support - Initial smartshift support - RV/PCO power tuning fixes for suspend/resume - More buffer object subclassing work - Add new INFO query for additional vbios information - Add new placement for preemptable SG buffers amdkfd: - Misc fixes radeon: - W=1 Fixes - Misc cleanups UAPI: - Add new INFO query for additional vbios information Useful for debugging vbios related issues. Proposed umr patch: https://patchwork.freedesktop.org/patch/433297/ - 16bpc fixed point format support IGT test: https://lists.freedesktop.org/archives/igt-dev/2021-May/031507.html Proposed Vulkan patch: https://github.com/kleinerm/pal/commit/a25d4802074b13a8d5f7edc96ae45469ecbac3c4 - Add a new GEM flag which is only used internally in the kernel driver. Userspace is not allowed to set it. drm: - 16bpc fixed point format fourcc Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-06-02drm/ttm: rename bo->mem and make it a pointerChristian König1-3/+3
When we want to decouble resource management from buffer management we need to be able to handle resources separately. Add a resource pointer and rename bo->mem so that all code needs to change to access the pointer instead. No functional change. Signed-off-by: Christian König <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-06-01drm/amdgpu: enable smart shift on dGPU (v5)Sathishkumar S1-0/+24
enable smart shift on dGPU if it is part of HG system and the platform supports ATCS method to handle power shift. V2: avoid psc updates in baco enter and exit (Lijo) fix alignment (Shashank) V3: rebased on unified ATCS handling. (Alex) V4: check for return value and warn on failed update (Shashank) return 0 if device does not support smart shift. (Lizo) V5: rebased on ATPX/ATCS structures global (Alex) Signed-off-by: Sathishkumar S <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-27drm/amd/amdgpu/amdgpu_device: Make local function staticLee Jones1-1/+1
Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4624:6: warning: no previous prototype for ‘amdgpu_device_recheck_guilty_jobs’ [-Wmissing-prototypes] Cc: Alex Deucher <[email protected]> Cc: "Christian König" <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Sumit Semwal <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Lee Jones <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-26drm/amdgpu: Fix clang warning: unused label 'exit'Andrey Grodzovsky1-0/+2
Problem: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:332:1: warning: unused label 'exit' [-Wunused-label] exit: ^~~~~ Fix: Put #ifdef CONFIG_64BIT around exit Reported-by: kernel test robot <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-25drm/amdgpu: add judgement for dc supportAsher Song1-1/+3
Drop DC initialization when DCN is harvested in VBIOS. The way doesn't affect virtual display ip initialization. Signed-off-by: Likun Gao <[email protected]> Signed-off-by: Asher Song <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-25drm/amdgpu: Rename flag which prevents HW accessAndrey Grodzovsky1-3/+3
Make it's name not feature but function descriptive. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-22Merge drm/drm-next into drm-misc-nextThomas Zimmermann1-14/+29
Backmerging from drm/drm-next to the patches for AMD devices for v5.14. Signed-off-by: Thomas Zimmermann <[email protected]>
2021-05-21drm/amdgpu: Indirect register access for Navi12 sriovPeng Ju Zhou1-1/+1
This patch series are used for GC/MMHUB(part)/IH_RB_CNTL indirect access in the SRIOV environment. There are 4 bits, controlled by host, to control if GC/MMHUB(part)/IH_RB_CNTL indirect access enabled. (one bit is master bit controls other 3 bits) For GC registers, changing all the register access from MMIO to RLC and use RLC as the default access method in the full access time. For partial MMHUB registers, changing their access from MMIO to RLC in the full access time, the remaining registers keep the original access method. For IH_RB_CNTL register, changing it's access from MMIO to PSP. Signed-off-by: Peng Ju Zhou <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-19drm/amdgpu: Unmap all MMIO mappingsAndrey Grodzovsky1-3/+23
Access to those must be prevented post pci_remove v6: Drop BOs list, unampping VRAM BAR is enough. v8: Add condition of xgmi.connected_to_cpu to MTTR handling and remove MTTR handling from the old place. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-19drm/amdgpu: Guard against write accesses after device removalAndrey Grodzovsky1-1/+9
This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed. v5: Protect more places wher memcopy_to/form_io takes place Protect IB submissions v6: Switch to !drm_dev_enter instead of scoping entire code with brackets. v7: Drop guard of HW ring commands emission protection since they are in GART and not in MMIO. Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-19drm/amdgpu: Handle IOMMU enabled case.Andrey Grodzovsky1-2/+2
Problem: Handle all DMA IOMMU group related dependencies before the group is removed. Those manifest themself in that when IOMMU enabled DMA map/unmap is dependent on the presence of IOMMU group the device belongs to but, this group is released once the device is removed from PCI topology. Fix: Expedite all such unmap operations to pci remove driver callback. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate v6: Drop the BO unamp list v7: Drop amdgpu_gart_fini In amdgpu_ih_ring_fini do uncinditional check (!ih->ring) to avoid freeing uniniitalized rings. Call amdgpu_ih_ring_fini unconditionally. v8: Add deatiled explanation Signed-off-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-19drm/amdgpu: Add early fini callbackAndrey Grodzovsky1-19/+40
Use it to call disply code dependent on device->drv_data before it's set to NULL on device unplug v5: Move HW finilization into this callback to prevent MMIO accesses post cpi remove. v7: Split kfd suspend from device exit to expdite HW related stuff to amdgpu_pci_remove v8: Squash previous KFD commit into this commit to avoid compile break. Signed-off-by: Andrey Grodzovsky <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-19drm/amdgpu: Split amdgpu_device_fini into early and lateAndrey Grodzovsky1-8/+18
Some of the stuff in amdgpu_device_fini such as HW interrupts disable and pending fences finilization must be done right away on pci_remove while most of the stuff which relates to finilizing and releasing driver data structures can be kept until drm_driver.release hook is called, i.e. when the last device reference is dropped. v4: Change functions prefix early->hw and late->sw Signed-off-by: Andrey Grodzovsky <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-05-19drm/amd/amdgpu: fix a potential deadlock in gpu resetLang Yu1-1/+0
When amdgpu_ib_ring_tests failed, the reset logic called amdgpu_device_ip_suspend twice, then deadlock occurred. Deadlock log: [ 805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110). [ 806.290952] [drm] free PSP TMR buffer [ 806.319406] ============================================ [ 806.320315] WARNING: possible recursive locking detected [ 806.321225] 5.11.0-custom #1 Tainted: G W OEL [ 806.322135] -------------------------------------------- [ 806.323043] cat/2593 is trying to acquire lock: [ 806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.325668] but task is already holding lock: [ 806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.328430] other info that might help us debug this: [ 806.329539] Possible unsafe locking scenario: [ 806.330549] CPU0 [ 806.330983] ---- [ 806.331416] lock(&adev->dm.dc_lock); [ 806.332086] lock(&adev->dm.dc_lock); [ 806.332738] *** DEADLOCK *** [ 806.333747] May be due to missing lock nesting notation [ 806.334899] 3 locks held by cat/2593: [ 806.335537] #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110 [ 806.337009] #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu] [ 806.339018] #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.340869] stack backtrace: [ 806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G W OEL 5.11.0-custom #1 [ 806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020 [ 806.344413] Call Trace: [ 806.344849] dump_stack+0x93/0xbd [ 806.345435] __lock_acquire.cold+0x18a/0x2cf [ 806.346179] lock_acquire+0xca/0x390 [ 806.346807] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.347813] __mutex_lock+0x9b/0x930 [ 806.348454] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.349434] ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu] [ 806.350581] ? _raw_spin_unlock_irqrestore+0x47/0x50 [ 806.351437] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.352437] ? rcu_read_lock_sched_held+0x4f/0x80 [ 806.353252] ? rcu_read_lock_sched_held+0x4f/0x80 [ 806.354064] mutex_lock_nested+0x1b/0x20 [ 806.354747] ? mutex_lock_nested+0x1b/0x20 [ 806.355457] dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.356427] ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu] [ 806.357736] amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu] [ 806.360394] amdgpu_device_ip_suspend+0x21/0x70 [amdgpu] [ 806.362926] amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu] [ 806.365560] amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu] Signed-off-by: Lang Yu <[email protected]> Acked-by: Christian KÃnig <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-19drm/amd/amdgpu: Enable DCN IP init for Beige GobyAurabindo Pillai1-0/+1
[Why&How] Adds DCN IP block initialization for Beige Goby Signed-off-by: Aurabindo Pillai <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-19drm/amd/amdgpu: Use IP discovery table for beige gobyChengming Gui1-0/+1
Rather than gpu info firmware. Signed-off-by: Chengming Gui <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-19drm/amd/amdgpu: set asic family and ip blocks for beige_gobyChengming Gui1-0/+1
Same as navi series Signed-off-by: Chengming Gui <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-19drm/amd/amdgpu: add beige_goby asic typeChengming Gui1-0/+1
Add chip type for beige_goby v2: fix enum count (Alex) Signed-off-by: Chengming Gui <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-19drm/amdgpu: switch to cached fw flags for gpu virt capHawking Zhang1-1/+1
Check cached firmware_flags to determine if gpu virtualization is supported in vbios Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-19drm/amdgpu: add judgement when add ip blocks (v2)Likun GAO1-1/+14
Judgement whether to add an sw ip according to the harvest info. v2: fix indentation (Alex) Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-10drm/amdgpu: Rename to ras_*_enabledLuben Tuikov1-2/+2
Rename, ras_hw_supported --> ras_hw_enabled, and ras_features --> ras_enabled, to show that ras_enabled is a subset of ras_hw_enabled, which itself is a subset of the ASIC capability. Cc: Alexander Deucher <[email protected]> Cc: John Clements <[email protected]> Cc: Hawking Zhang <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-05-10drm/amdgpu: Remove redundant ras->supportedLuben Tuikov1-2/+4
Remove redundant ras->supported, as this value is also stored in adev->ras_features. Use adev->ras_features, as that supercedes "ras", since the latter is its member. The dependency goes like this: ras <== adev->ras_features <== hw_supported, and is read as "ras depends on ras_features, which depends on hw_supported." The arrows show the flow of information, i.e. the dependency update. "hw_supported" should also live in "adev". Cc: Alexander Deucher <[email protected]> Cc: John Clements <[email protected]> Cc: Hawking Zhang <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>