aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
AgeCommit message (Collapse)AuthorFilesLines
2021-07-23drm/amd/amdgpu: consider kernel job always not guiltyJingwen Chen1-3/+3
[Why] Currently all timedout job will be considered to be guilty. In SRIOV multi-vf use case, the vf flr happens first and then job time out is found. There can be several jobs timeout during a very small time slice. And if the innocent sdma job time out is found before the real bad job, then the innocent sdma job will be set to guilty. This will lead to a page fault after resubmitting job. [How] If the job is a kernel job, we will always consider it not guilty Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-23drm/amdgpu: Change the imprecise function nameRoy Sun1-1/+1
The callback functions are used for SRIOV read/write instead of just for rlcg read/write Signed-off-by: Roy Sun <Roy.Sun@amd.com> Reviewed-by: Zhou pengju <pengju.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-23Merge tag 'drm-misc-next-2021-07-22' of ↵Dave Airlie1-5/+6
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.15-rc1: UAPI Changes: - Remove sysfs stats for dma-buf attachments, as it causes a performance regression. Previous merge is not in a rc kernel yet, so no userspace regression possible. Cross-subsystem Changes: - Sanitize user input in kyro's viewport ioctl. - Use refcount_t in fb_info->count - Assorted fixes to dma-buf. - Extend x86 efifb handling to all archs. - Fix neofb divide by 0. - Document corpro,gm7123 bridge dt bindings. Core Changes: - Slightly rework drm master handling. - Cleanup vgaarb handling. - Assorted fixes. Driver Changes: - Add support for ws2401 panel. - Assorted fixes to stm, ast, bochs. - Demidlayer ingenic irq. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/2d0d2fe8-01fc-e216-c3fd-38db9e69944e@linux.intel.com
2021-07-21vgaarb: don't pass a cookie to vga_client_registerChristoph Hellwig1-4/+5
The VGA arbitration is entirely based on pci_dev structures, so just pass that back to the set_vga_decode callback. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210716061634.2446357-8-hch@lst.de Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>
2021-07-21vgaarb: remove the unused irq_set_state argument to vga_client_registerChristoph Hellwig1-1/+1
All callers pass NULL as the irq_set_state argument, so remove it and the ->irq_set_state member in struct vga_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210716061634.2446357-7-hch@lst.de Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>
2021-07-21vgaarb: provide a vga_client_unregister wrapperChristoph Hellwig1-1/+1
Add a trivial wrapper for the unregister case that sets all fields to NULL. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210716061634.2446357-6-hch@lst.de Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>
2021-07-16drm/amdgpu: split amdgpu_device_access_vram() into two small partsKevin Wang1-30/+75
split amdgpu_device_access_vram() 1. amdgpu_device_mm_access(): using MM_INDEX/MM_DATA to access vram 2. amdgpu_device_aper_access(): using vram aperature to access vram (option) Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: Fix resource leak on probe error pathJiri Kosina1-6/+2
This reverts commit 4192f7b5768912ceda82be2f83c87ea7181f9980. It is not true (as stated in the reverted commit changelog) that we never unmap the BAR on failure; it actually does happen properly on amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() -> amdgpu_device_fini() error path. What's worse, this commit actually completely breaks resource freeing on probe failure (like e.g. failure to load microcode), as amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too early, leaving all the resources that'd normally be freed in amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading to all sorts of oopses when someone tries to, for example, access the sysfs and procfs resources which are still around while the driver is gone. Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") Reported-by: Vojtech Pavlik <vojtech@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: move apu flags initialization to the start of device initHuang Rui1-0/+36
In some asics, we need to adjust the behavior according to the apu flags at very early stage. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-30drm/amd/amdgpu: enable gpu recovery for beige_gobyChengming Gui1-0/+1
Enable gpu recovery for beige_goby. Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-15drm/amdgpu: move shadow_list to amdgpu_bo_vmNirmoy Das1-2/+3
Move shadow_list to struct amdgpu_bo_vm as shadow BOs are part of PT/PD BOs. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-10Merge tag 'amd-drm-next-5.14-2021-06-09' of ↵Dave Airlie1-2/+38
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.14-2021-06-09: amdgpu: - SR-IOV fixes - Smartshift updates - GPUVM TLB flush updates - 16bpc fixed point display fix for DCE11 - BACO cleanups and core refactoring - Aldebaran updates - Initial Yellow Carp support - RAS fixes - PM API cleanup - DC visual confirm updates - DC DP MST fixes - DC DML fixes - Misc code cleanups and bug fixes amdkfd: - Initial Yellow Carp support radeon: - memcpy_to/from_io fixes UAPI: - Add Yellow Carp chip family id Used internally in the kernel driver and by mesa Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210610031649.4006-1-alexander.deucher@amd.com
2021-06-10Merge tag 'drm-misc-next-2021-06-09' of ↵Dave Airlie1-3/+3
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.14: UAPI Changes: * drm/panfrost: Export AFBC_FEATURES register to userspace Cross-subsystem Changes: * dma-buf: Fix debug printing; Rename dma_resv_*() functions + changes in callers; Cleanups Core Changes: * Add prefetching memcpy for WC * Avoid circular dependency on CONFIG_FB * Cleanups * Documentation fixes throughout DRM * ttm: Make struct ttm_resource the base of all managers + changes in all users of TTM; Add a generic memcpy for page-based iomem; Remove use of VM_MIXEDMAP; Cleanups Driver Changes: * drm/bridge: Add TI SN65DSI83 and SN65DSI84 + DT bindings * drm/hyperv: Add DRM driver for HyperV graphics output * drm/msm: Fix module dependencies * drm/panel: KD53T133: Support rotation * drm/pl111: Fix module dependencies * drm/qxl: Fixes * drm/stm: Cleanups * drm/sun4i: Be explicit about format modifiers * drm/vc4: Use struct gpio_desc; Cleanups * drm/vgem: Cleanups * drm/vmwgfx: Use ttm_bo_move_null() if there's nothing to copy * fbdev/mach64: Cleanups * fbdev/mb862xx: Use DEVICE_ATTR_RO Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YMBw3DF2b9udByfT@linux-uq9g
2021-06-04drm/amdgpu: Add DC support and display block for Yellow CarpNicholas Kazlauskas1-0/+1
To enable output on real display instead of virtual. Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-04drm/amdgpu: add yellow carp support for gpu_info and ip block settingAaron Liu1-0/+7
This patch adds yellow carp support for gpu_info firmware and ip block setting. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-04drm/amdgpu: add yellow carp asic_type enumAaron Liu1-0/+1
This patch adds yellow carp to amd_asic_type enum and amdgpu_asic_name[]. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-04drm/amdgpu: Don't flush/invalidate HDP for APUs and A+AEric Huang1-2/+29
Integrate two generic functions to determine if HDP flush is needed for all Asics. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-04Merge tag 'amd-drm-next-5.14-2021-06-02' of ↵Dave Airlie1-3/+29
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.14-2021-06-02: amdgpu: - GC/MM register access macro clean up for SR-IOV - Beige Goby updates - W=1 Fixes - Aldebaran fixes - Misc display fixes - ACPI ATCS/ATIF handling rework - SR-IOV fixes - RAS fixes - 16bpc fixed point format support - Initial smartshift support - RV/PCO power tuning fixes for suspend/resume - More buffer object subclassing work - Add new INFO query for additional vbios information - Add new placement for preemptable SG buffers amdkfd: - Misc fixes radeon: - W=1 Fixes - Misc cleanups UAPI: - Add new INFO query for additional vbios information Useful for debugging vbios related issues. Proposed umr patch: https://patchwork.freedesktop.org/patch/433297/ - 16bpc fixed point format support IGT test: https://lists.freedesktop.org/archives/igt-dev/2021-May/031507.html Proposed Vulkan patch: https://github.com/kleinerm/pal/commit/a25d4802074b13a8d5f7edc96ae45469ecbac3c4 - Add a new GEM flag which is only used internally in the kernel driver. Userspace is not allowed to set it. drm: - 16bpc fixed point format fourcc Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210602214009.4553-1-alexander.deucher@amd.com
2021-06-02drm/ttm: rename bo->mem and make it a pointerChristian König1-3/+3
When we want to decouble resource management from buffer management we need to be able to handle resources separately. Add a resource pointer and rename bo->mem so that all code needs to change to access the pointer instead. No functional change. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210430092508.60710-4-christian.koenig@amd.com
2021-06-01drm/amdgpu: enable smart shift on dGPU (v5)Sathishkumar S1-0/+24
enable smart shift on dGPU if it is part of HG system and the platform supports ATCS method to handle power shift. V2: avoid psc updates in baco enter and exit (Lijo) fix alignment (Shashank) V3: rebased on unified ATCS handling. (Alex) V4: check for return value and warn on failed update (Shashank) return 0 if device does not support smart shift. (Lizo) V5: rebased on ATPX/ATCS structures global (Alex) Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-27drm/amd/amdgpu/amdgpu_device: Make local function staticLee Jones1-1/+1
Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4624:6: warning: no previous prototype for ‘amdgpu_device_recheck_guilty_jobs’ [-Wmissing-prototypes] Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-26drm/amdgpu: Fix clang warning: unused label 'exit'Andrey Grodzovsky1-0/+2
Problem: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:332:1: warning: unused label 'exit' [-Wunused-label] exit: ^~~~~ Fix: Put #ifdef CONFIG_64BIT around exit Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210525184431.1170373-1-andrey.grodzovsky@amd.com
2021-05-25drm/amdgpu: add judgement for dc supportAsher Song1-1/+3
Drop DC initialization when DCN is harvested in VBIOS. The way doesn't affect virtual display ip initialization. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Asher Song <Asher.Song@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-25drm/amdgpu: Rename flag which prevents HW accessAndrey Grodzovsky1-3/+3
Make it's name not feature but function descriptive. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210521204122.762288-1-andrey.grodzovsky@amd.com
2021-05-22Merge drm/drm-next into drm-misc-nextThomas Zimmermann1-14/+29
Backmerging from drm/drm-next to the patches for AMD devices for v5.14. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2021-05-21drm/amdgpu: Indirect register access for Navi12 sriovPeng Ju Zhou1-1/+1
This patch series are used for GC/MMHUB(part)/IH_RB_CNTL indirect access in the SRIOV environment. There are 4 bits, controlled by host, to control if GC/MMHUB(part)/IH_RB_CNTL indirect access enabled. (one bit is master bit controls other 3 bits) For GC registers, changing all the register access from MMIO to RLC and use RLC as the default access method in the full access time. For partial MMHUB registers, changing their access from MMIO to RLC in the full access time, the remaining registers keep the original access method. For IH_RB_CNTL register, changing it's access from MMIO to PSP. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19drm/amdgpu: Unmap all MMIO mappingsAndrey Grodzovsky1-3/+23
Access to those must be prevented post pci_remove v6: Drop BOs list, unampping VRAM BAR is enough. v8: Add condition of xgmi.connected_to_cpu to MTTR handling and remove MTTR handling from the old place. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210517193105.491461-1-andrey.grodzovsky@amd.com
2021-05-19drm/amdgpu: Guard against write accesses after device removalAndrey Grodzovsky1-1/+9
This should prevent writing to memory or IO ranges possibly already allocated for other uses after our device is removed. v5: Protect more places wher memcopy_to/form_io takes place Protect IB submissions v6: Switch to !drm_dev_enter instead of scoping entire code with brackets. v7: Drop guard of HW ring commands emission protection since they are in GART and not in MMIO. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-10-andrey.grodzovsky@amd.com
2021-05-19drm/amdgpu: Handle IOMMU enabled case.Andrey Grodzovsky1-2/+2
Problem: Handle all DMA IOMMU group related dependencies before the group is removed. Those manifest themself in that when IOMMU enabled DMA map/unmap is dependent on the presence of IOMMU group the device belongs to but, this group is released once the device is removed from PCI topology. Fix: Expedite all such unmap operations to pci remove driver callback. v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate v6: Drop the BO unamp list v7: Drop amdgpu_gart_fini In amdgpu_ih_ring_fini do uncinditional check (!ih->ring) to avoid freeing uniniitalized rings. Call amdgpu_ih_ring_fini unconditionally. v8: Add deatiled explanation Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210517143851.475058-1-andrey.grodzovsky@amd.com
2021-05-19drm/amdgpu: Add early fini callbackAndrey Grodzovsky1-19/+40
Use it to call disply code dependent on device->drv_data before it's set to NULL on device unplug v5: Move HW finilization into this callback to prevent MMIO accesses post cpi remove. v7: Split kfd suspend from device exit to expdite HW related stuff to amdgpu_pci_remove v8: Squash previous KFD commit into this commit to avoid compile break. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210520032057.497334-1-andrey.grodzovsky@amd.com
2021-05-19drm/amdgpu: Split amdgpu_device_fini into early and lateAndrey Grodzovsky1-8/+18
Some of the stuff in amdgpu_device_fini such as HW interrupts disable and pending fences finilization must be done right away on pci_remove while most of the stuff which relates to finilizing and releasing driver data structures can be kept until drm_driver.release hook is called, i.e. when the last device reference is dropped. v4: Change functions prefix early->hw and late->sw Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-3-andrey.grodzovsky@amd.com
2021-05-19drm/amd/amdgpu: fix a potential deadlock in gpu resetLang Yu1-1/+0
When amdgpu_ib_ring_tests failed, the reset logic called amdgpu_device_ip_suspend twice, then deadlock occurred. Deadlock log: [ 805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110). [ 806.290952] [drm] free PSP TMR buffer [ 806.319406] ============================================ [ 806.320315] WARNING: possible recursive locking detected [ 806.321225] 5.11.0-custom #1 Tainted: G W OEL [ 806.322135] -------------------------------------------- [ 806.323043] cat/2593 is trying to acquire lock: [ 806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.325668] but task is already holding lock: [ 806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.328430] other info that might help us debug this: [ 806.329539] Possible unsafe locking scenario: [ 806.330549] CPU0 [ 806.330983] ---- [ 806.331416] lock(&adev->dm.dc_lock); [ 806.332086] lock(&adev->dm.dc_lock); [ 806.332738] *** DEADLOCK *** [ 806.333747] May be due to missing lock nesting notation [ 806.334899] 3 locks held by cat/2593: [ 806.335537] #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110 [ 806.337009] #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu] [ 806.339018] #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.340869] stack backtrace: [ 806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G W OEL 5.11.0-custom #1 [ 806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020 [ 806.344413] Call Trace: [ 806.344849] dump_stack+0x93/0xbd [ 806.345435] __lock_acquire.cold+0x18a/0x2cf [ 806.346179] lock_acquire+0xca/0x390 [ 806.346807] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.347813] __mutex_lock+0x9b/0x930 [ 806.348454] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.349434] ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu] [ 806.350581] ? _raw_spin_unlock_irqrestore+0x47/0x50 [ 806.351437] ? dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.352437] ? rcu_read_lock_sched_held+0x4f/0x80 [ 806.353252] ? rcu_read_lock_sched_held+0x4f/0x80 [ 806.354064] mutex_lock_nested+0x1b/0x20 [ 806.354747] ? mutex_lock_nested+0x1b/0x20 [ 806.355457] dm_suspend+0xb8/0x1d0 [amdgpu] [ 806.356427] ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu] [ 806.357736] amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu] [ 806.360394] amdgpu_device_ip_suspend+0x21/0x70 [amdgpu] [ 806.362926] amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu] [ 806.365560] amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu] Signed-off-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Christian KÃnig <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19drm/amd/amdgpu: Enable DCN IP init for Beige GobyAurabindo Pillai1-0/+1
[Why&How] Adds DCN IP block initialization for Beige Goby Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19drm/amd/amdgpu: Use IP discovery table for beige gobyChengming Gui1-0/+1
Rather than gpu info firmware. Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19drm/amd/amdgpu: set asic family and ip blocks for beige_gobyChengming Gui1-0/+1
Same as navi series Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19drm/amd/amdgpu: add beige_goby asic typeChengming Gui1-0/+1
Add chip type for beige_goby v2: fix enum count (Alex) Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19drm/amdgpu: switch to cached fw flags for gpu virt capHawking Zhang1-1/+1
Check cached firmware_flags to determine if gpu virtualization is supported in vbios Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19drm/amdgpu: add judgement when add ip blocks (v2)Likun GAO1-1/+14
Judgement whether to add an sw ip according to the harvest info. v2: fix indentation (Alex) Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10drm/amdgpu: Rename to ras_*_enabledLuben Tuikov1-2/+2
Rename, ras_hw_supported --> ras_hw_enabled, and ras_features --> ras_enabled, to show that ras_enabled is a subset of ras_hw_enabled, which itself is a subset of the ASIC capability. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10drm/amdgpu: Remove redundant ras->supportedLuben Tuikov1-2/+4
Remove redundant ras->supported, as this value is also stored in adev->ras_features. Use adev->ras_features, as that supercedes "ras", since the latter is its member. The dependency goes like this: ras <== adev->ras_features <== hw_supported, and is read as "ras depends on ras_features, which depends on hw_supported." The arrows show the flow of information, i.e. the dependency update. "hw_supported" should also live in "adev". Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10drm/amdgpu: change the default timeout for kernel compute queuesAlex Deucher1-5/+3
Change to 60s. This matches what we already do in virtualization. Infinite timeout can lead to deadlocks in the kernel. Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-10drm/amdgpu: Register VGA clients after init can no longer failKai-Heng Feng1-15/+13
When an amdgpu device fails to init, it makes another VGA device cause kernel splat: kernel: amdgpu 0000:08:00.0: amdgpu: amdgpu_device_ip_init failed kernel: amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init kernel: amdgpu: probe of 0000:08:00.0 failed with error -110 ... kernel: amdgpu 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 0 P4D 0 kernel: Oops: 0000 [#1] SMP NOPTI kernel: CPU: 6 PID: 1080 Comm: Xorg Tainted: G W 5.12.0-rc8+ #12 kernel: Hardware name: HP HP EliteDesk 805 G6/872B, BIOS S09 Ver. 02.02.00 12/30/2020 kernel: RIP: 0010:amdgpu_device_vga_set_decode+0x13/0x30 [amdgpu] kernel: Code: 06 31 c0 c3 b8 ea ff ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 8b 87 90 06 00 00 48 89 e5 53 89 f3 <48> 8b 40 18 40 0f b6 f6 e8 40 58 39 fd 80 fb 01 5b 5d 19 c0 83 e0 kernel: RSP: 0018:ffffae3c0246bd68 EFLAGS: 00010002 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 kernel: RDX: ffff8dd1af5a8560 RSI: 0000000000000000 RDI: ffff8dce8c160000 kernel: RBP: ffffae3c0246bd70 R08: ffff8dd1af5985c0 R09: ffffae3c0246ba38 kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000246 kernel: R13: 0000000000000000 R14: 0000000000000003 R15: ffff8dce81490000 kernel: FS: 00007f9303d8fa40(0000) GS:ffff8dd1af580000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000018 CR3: 0000000103cfa000 CR4: 0000000000350ee0 kernel: Call Trace: kernel: vga_arbiter_notify_clients.part.0+0x4a/0x80 kernel: vga_get+0x17f/0x1c0 kernel: vga_arb_write+0x121/0x6a0 kernel: ? apparmor_file_permission+0x1c/0x20 kernel: ? security_file_permission+0x30/0x180 kernel: vfs_write+0xca/0x280 kernel: ksys_write+0x67/0xe0 kernel: __x64_sys_write+0x1a/0x20 kernel: do_syscall_64+0x38/0x90 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae kernel: RIP: 0033:0x7f93041e02f7 kernel: Code: 75 05 48 83 c4 58 c3 e8 f7 33 ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 kernel: RSP: 002b:00007fff60e49b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 kernel: RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f93041e02f7 kernel: RDX: 000000000000000b RSI: 00007fff60e49b40 RDI: 000000000000000f kernel: RBP: 00007fff60e49b40 R08: 00000000ffffffff R09: 00007fff60e499d0 kernel: R10: 00007f93049350b5 R11: 0000000000000246 R12: 000056111d45e808 kernel: R13: 0000000000000000 R14: 000056111d45e7f8 R15: 000056111d46c980 kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq input_leds snd_seq_device snd_timer snd soundcore joydev kvm_amd serio_raw k10temp mac_hid hp_wmi ccp kvm sparse_keymap wmi_bmof ucsi_acpi efi_pstore typec_ucsi rapl typec video wmi sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crc32_pclmul fb_sys_fops ghash_clmulni_intel cec rc_core aesni_intel crypto_simd psmouse cryptd r8169 i2c_piix4 drm ahci xhci_pci realtek libahci xhci_pci_renesas gpio_amdpt gpio_generic kernel: CR2: 0000000000000018 kernel: ---[ end trace 76d04313d4214c51 ]--- Commit 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") makes amdgpu_driver_unload_kms() skips amdgpu_device_fini(), so the VGA clients remain registered. So when vga_arbiter_notify_clients() iterates over registered clients, it causes NULL pointer dereference. Since there's no reason to register VGA clients that early, so solve the issue by putting them after all the goto cleanups. v2: - Remove redundant vga_switcheroo cleanup in failed: label. Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-29drm/amdgpu: Register VGA clients after init can no longer failKai-Heng Feng1-15/+13
When an amdgpu device fails to init, it makes another VGA device cause kernel splat: kernel: amdgpu 0000:08:00.0: amdgpu: amdgpu_device_ip_init failed kernel: amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init kernel: amdgpu: probe of 0000:08:00.0 failed with error -110 ... kernel: amdgpu 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 0 P4D 0 kernel: Oops: 0000 [#1] SMP NOPTI kernel: CPU: 6 PID: 1080 Comm: Xorg Tainted: G W 5.12.0-rc8+ #12 kernel: Hardware name: HP HP EliteDesk 805 G6/872B, BIOS S09 Ver. 02.02.00 12/30/2020 kernel: RIP: 0010:amdgpu_device_vga_set_decode+0x13/0x30 [amdgpu] kernel: Code: 06 31 c0 c3 b8 ea ff ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 8b 87 90 06 00 00 48 89 e5 53 89 f3 <48> 8b 40 18 40 0f b6 f6 e8 40 58 39 fd 80 fb 01 5b 5d 19 c0 83 e0 kernel: RSP: 0018:ffffae3c0246bd68 EFLAGS: 00010002 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 kernel: RDX: ffff8dd1af5a8560 RSI: 0000000000000000 RDI: ffff8dce8c160000 kernel: RBP: ffffae3c0246bd70 R08: ffff8dd1af5985c0 R09: ffffae3c0246ba38 kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000246 kernel: R13: 0000000000000000 R14: 0000000000000003 R15: ffff8dce81490000 kernel: FS: 00007f9303d8fa40(0000) GS:ffff8dd1af580000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000018 CR3: 0000000103cfa000 CR4: 0000000000350ee0 kernel: Call Trace: kernel: vga_arbiter_notify_clients.part.0+0x4a/0x80 kernel: vga_get+0x17f/0x1c0 kernel: vga_arb_write+0x121/0x6a0 kernel: ? apparmor_file_permission+0x1c/0x20 kernel: ? security_file_permission+0x30/0x180 kernel: vfs_write+0xca/0x280 kernel: ksys_write+0x67/0xe0 kernel: __x64_sys_write+0x1a/0x20 kernel: do_syscall_64+0x38/0x90 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae kernel: RIP: 0033:0x7f93041e02f7 kernel: Code: 75 05 48 83 c4 58 c3 e8 f7 33 ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 kernel: RSP: 002b:00007fff60e49b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 kernel: RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f93041e02f7 kernel: RDX: 000000000000000b RSI: 00007fff60e49b40 RDI: 000000000000000f kernel: RBP: 00007fff60e49b40 R08: 00000000ffffffff R09: 00007fff60e499d0 kernel: R10: 00007f93049350b5 R11: 0000000000000246 R12: 000056111d45e808 kernel: R13: 0000000000000000 R14: 000056111d45e7f8 R15: 000056111d46c980 kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq input_leds snd_seq_device snd_timer snd soundcore joydev kvm_amd serio_raw k10temp mac_hid hp_wmi ccp kvm sparse_keymap wmi_bmof ucsi_acpi efi_pstore typec_ucsi rapl typec video wmi sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crc32_pclmul fb_sys_fops ghash_clmulni_intel cec rc_core aesni_intel crypto_simd psmouse cryptd r8169 i2c_piix4 drm ahci xhci_pci realtek libahci xhci_pci_renesas gpio_amdpt gpio_generic kernel: CR2: 0000000000000018 kernel: ---[ end trace 76d04313d4214c51 ]--- Commit 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") makes amdgpu_driver_unload_kms() skips amdgpu_device_fini(), so the VGA clients remain registered. So when vga_arbiter_notify_clients() iterates over registered clients, it causes NULL pointer dereference. Since there's no reason to register VGA clients that early, so solve the issue by putting them after all the goto cleanups. v2: - Remove redundant vga_switcheroo cleanup in failed: label. Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-28drm/amd/amdgpu/sriov disable all ip hw status by defaultJack Zhang1-1/+1
Disable all ip's hw status to false before any hw_init. Only set it to true until its hw_init is executed. The old 5.9 branch has this change but somehow the 5.11 kernrel does not have this fix. Without this change, sriov tdr have gfx IB test fail. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Review-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-20drm/amd/amdgpu/amdgpu_device: Remove unused variable 'r'Lee Jones1-3/+2
Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_suspend’: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3733:6: warning: variable ‘r’ set but not used [-Wunused-but-set-variable] Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: split mmhub callbacks into ras and non-ras onesHawking Zhang1-5/+6
mmhub ras is only avaiable in cerntain mmhub ip generation. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: indirect register access for nv12 sriovPeng Ju Zhou1-1/+1
1. expand rlcg interface for gc & mmhub indirect access 2. add rlcg interface for no kiq v2: squash in fix for gfx9 (Changfeng) Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Emily.Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: indirect register access for nv12 sriovPeng Ju Zhou1-0/+5
get pf2vf msg info at it's earliest time so that guest driver can use these info to decide whether register indirect access enabled. Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: Emily.Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Reset error code for 'no handler' caseLijo Lazar1-3/+8
If reset handler is not implemented, reset error before proceeding. Fixes issue with the following trace - [ 106.508592] amdgpu 0000:b1:00.0: amdgpu: ASIC reset failed with error, -38 for drm dev, 0000:b1:00.0 [ 106.508972] amdgpu 0000:b1:00.0: amdgpu: GPU reset succeeded, trying to resume [ 106.509116] [drm] PCIE GART of 512M enabled. [ 106.509120] [drm] PTB located at 0x0000008000000000 [ 106.509136] [drm] VRAM is lost due to GPU reset! [ 106.509332] [drm] PSP is resuming... Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Enable recovery on aldebaranLijo Lazar1-0/+1
Add aldebaran to devices which support recovery Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>