aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2021-10-28drm/amdgpu/discovery: add UVD/VCN IP instance info for soc15 partsAlex Deucher1-0/+3
Add secondary instance version info for vega20, arcturure, and aldebaran. Acked-by: Christian König <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-28drm/amdgpu: remove GPRs init for ALDEBARAN in gpu reset (v3)Tao Zhou1-3/+3
Remove GPRs init for ALDEBARAN in gpu reset temporarily, will add the init once the algorithm is stable. v2: Only remove GPRs init in gpu reset. v3: Suspend needs it, only skip it in gpu reset. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-28drm/amdgpu: skip GPRs init for some CU settings on ALDEBARANTao Zhou1-0/+5
Skip GPRs init in specific condition since current GPRs init algorithm only works for some CU settings. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-28drm/amdgpu: Update TA version output in driverCandice Li7-26/+26
TA version should only be displayed in firmware version column. Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-28drm/amdgpu: fix a potential memory leak in amdgpu_device_fini_sw()Lang Yu1-1/+1
amdgpu_fence_driver_sw_fini() should be executed before amdgpu_device_ip_fini(), otherwise fence driver resource won't be properly freed as adev->rings have been tore down. Fixes: 72c8c97b1522 ("drm/amdgpu: Split amdgpu_device_fini into early and late") Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-28drm/amdkfd: Separate pinned BOs destruction from general routineLang Yu2-0/+12
Currently, all kfd BOs use same destruction routine. But pinned BOs are not unpinned properly. Separate them from general routine. v2 (Felix): Add safeguard to prevent user space from freeing signal BO. Kunmap signal BO in the event of setting event page error. Just kunmap signal BO to avoid duplicating the code. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-28drm/amdkfd: restore userptr ignore bad address errorPhilip Yang2-10/+20
The userptr can be unmapped by application and still registered to driver, restore userptr work return user pages will get -EFAULT bad address error. Pretend this error as succeed. GPU access this userptr will have VM fault later, it is better than application soft hangs with stalled user mode queues. v2: squash in warning fix (Alex) Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-28drm/amdgpu: Add kernel parameter support for ignoring bad page thresholdKent Russell3-5/+13
When a GPU hits the bad_page_threshold, it will not be initialized by the amdgpu driver. This means that the table cannot be cleared, nor can information gathering be performed (getting serial number, BDF, etc). If the bad_page_threshold kernel parameter is set to -2, continue to initialize the GPU, while printing a warning to dmesg that this action has been done v2: squash in Luben's fix to restore RAS info reporting Cc: Luben Tuikov <[email protected]> Cc: Mukul Joshi <[email protected]> Signed-off-by: Kent Russell <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-28drm/amdgpu: Warn when bad pages approaches 90% thresholdKent Russell1-0/+7
dmesg doesn't warn when the number of bad pages approaches the threshold for page retirement. WARN when the number of bad pages is at 90% or greater for easier checks and planning, instead of waiting until the GPU is full of bad pages. Cc: Luben Tuikov <[email protected]> Cc: Mukul Joshi <[email protected]> Signed-off-by: Kent Russell <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-28BackMerge tag 'v5.15-rc7' into drm-nextDave Airlie1-0/+4
The msm next tree is based on rc3, so let's just backmerge rc7 before pulling it in. Signed-off-by: Dave Airlie <[email protected]>
2021-10-25dma-buf: move dma-buf symbols into the DMA_BUF module namespaceGreg Kroah-Hartman1-0/+3
In order to better track where in the kernel the dma-buf code is used, put the symbols in the namespace DMA_BUF and modify all users of the symbols to properly import the namespace to not break the build at the same time. Now the output of modinfo shows the use of these symbols, making it easier to watch for users over time: $ modinfo drivers/misc/fastrpc.ko | grep import import_ns: DMA_BUF Cc: "Pan, Xinhui" <[email protected]> Cc: David Airlie <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: [email protected] Acked-by: Daniel Vetter <[email protected]> Acked-by: Christian König <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Acked-by: Sumit Semwal <[email protected]> Acked-by: Alex Deucher <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2021-10-22drm/amdgpu/nbio7.4: use original HDP_FLUSH bitsAlex Deucher3-1/+20
The extended bits were not available for use on vega20 and presumably arcturus as well. Fixes: a0f9f854666834 ("drm/amdgpu/nbio7.4: don't use GPU_HDP_FLUSH bit 12") Reviewed-and-tested-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-21drm/amdgpu: Workaround harvesting info for some navy flounder boardsAlex Deucher1-0/+4
Some navy flounder boards do not properly mark harvested VCN instances. Fix that here. v2: use IP versions Fixes: 1b592d00b4ac83 ("drm/amdgpu/vcn: remove manual instance setting") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743 Reviewed-and-tested-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-21drm/amdgpu/vcn3.0: remove intermediate variableAlex Deucher1-9/+2
No need to use the id variable, just use the constant plus instance offset directly. Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-21drm/amdgpu/vcn2.0: remove intermediate variableAlex Deucher1-3/+2
No need to use the tmp variable, just use the constant directly. Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-21drm/amdgpu: Consolidate VCN firmware setup codeAlex Deucher6-50/+33
Roughly the same code was present in all VCN versions. Consolidate it into a single function. v2: use AMDGPU_UCODE_ID_VCN + i, check if num_inst >= 2 Signed-off-by: Alex Deucher <[email protected]> Reviewed-by: Leo Liu <[email protected]> Reviewed-by: James Zhu <[email protected]>
2021-10-21drm/amdgpu/vcn3.0: handle harvesting in firmware setupAlex Deucher1-8/+8
Only enable firmware for the instance that is enabled. v2: use AMDGPU_UCODE_ID_VCN + i Fixes: 1b592d00b4ac83 ("drm/amdgpu/vcn: remove manual instance setting") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743 Reviewed-by: James Zhu <[email protected]> Reviewed-by: Leo Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-21drm/amd/amdgpu: add dummy_page_addr to sriov msgJingwen Chen2-1/+3
Add dummy_page_addr to sriov msg for host driver to set GCVM_L2_PROTECTION_DEFAULT_ADDR* registers correctly. v2: should update vf2pf msg instead Signed-off-by: Jingwen Chen <[email protected]> Reviewed-by: Horace Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-21drm/amdgpu: remove grbm cam index/data operations for gfx v10Huang Rui1-22/+0
PSP firmware will be responsible for applying the GRBM CAM remapping in the production. And the GRBM_CAM_INDEX / GRBM_CAM_DATA registers will be protected by PSP under security policy. So remove it according to the new security policy. Signed-off-by: Huang Rui <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-20drm/amdgpu: support B0&B1 external revision id for yellow carpAaron Liu1-1/+1
B0 internal rev_id is 0x01, B1 internal rev_id is 0x02. The external rev_id for B0 and B1 is 0x20. The original expression is not suitable for B1. v2: squash in fix for display code (Alex) Signed-off-by: Aaron Liu <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-10-20drm/amdgpu: Clarify error when hitting bad page thresholdKent Russell1-1/+1
Change the error message when the bad_page_threshold is reached, explicitly stating that the GPU will not be initialized. Cc: Luben Tuikov <[email protected]> Cc: Mukul Joshi <[email protected]> Signed-off-by: Kent Russell <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-20drm/amdgpu: drop navi reg init functionsAlex Deucher10-433/+2
No longer used since IP enumeration is driven by the IP discovery table now. Acked-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-20drm/amdgpu: drop nv_set_ip_blocks()Alex Deucher2-294/+0
No longer used since IP enumeration is now driven by amdgpu IP discovery code. Acked-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-20drm/amdgpu: drop soc15_set_ip_blocks()Alex Deucher2-180/+0
No longer used since IP enumeration is now driven by amdgpu IP discovery code. Acked-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-20drm/amdgpu/gfx10: fix typo in gfx_v10_0_update_gfx_clock_gating()Alex Deucher1-2/+3
Check was incorrectly converted to IP version checking. Fixes: 4b0ad8425498ba ("drm/amdgpu/gfx10: convert to IP version checking") Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-20drm/amdgpu: replace snprintf in show functions with sysfs_emitQing Wang1-1/+1
show() must not use snprintf() when formatting the value to be returned to user space. Fix the following coccicheck warning: drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c:427: WARNING: use scnprintf or sprintf. Signed-off-by: Qing Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-20drm/amdgpu: support B0&B1 external revision id for yellow carpAaron Liu1-1/+1
B0 internal rev_id is 0x01, B1 internal rev_id is 0x02. The external rev_id for B0 and B1 is 0x20. The original expression is not suitable for B1. v2: squash in fix for display code (Alex) Signed-off-by: Aaron Liu <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-19drm/amdgpu/discovery: parse hw_id_name for SDMA instance 2 and 3Guchun Chen1-0/+2
Otherwise, hw_id_name string is NULL for SDMA 2 and 3 when dumping ip version from VBIOS. Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-19drm/amdgpu: output warning for unsupported ras error inject (v2)Tao Zhou2-1/+10
Output a warning message if RAS TA returns UNSUPPORTED_ERROR_INJ status. v2: implement it in psp_ras_ta_check_status function. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-19drm/amdgpu: centralize checking for RAS TA statusTao Zhou1-4/+20
Create new function to check status returned by RAS TA. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-19drm/amd/amdgpu: Do irq_fini_hw after ip_fini_earlyYuBiao Wang1-2/+2
[Why] drm_irq_uninstall is called in irq_fini_hw so that irq is disabled in sw stage. SMU (and maybe other IP blocks) fini_hw will call irq_put for cleanup and the whole cleanup process will be skipped because of drm->irq_enable = false. [How] Move ip_fini_early before irq_fini_hw. Signed-off-by: YuBiao Wang <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-19drm/amdgpu: load PSP RL in resume pathTao Zhou1-0/+6
Some registers' access will fail without PSP RL after resume. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-19drm/amdkfd: Fix an inappropriate error handling in allloc memory of gpuLang Yu1-1/+1
We should unreference a gem object instead of an amdgpu bo here. Fixes: fd9a9f8801de ("drm/amdgpu: Use GEM obj reference for KFD BOs") Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-13drm/amdgpu/psp: add some missing cases to ↵Alex Deucher1-1/+2
psp_check_pmfw_centralized_cstate_management Missed a few asics. v2: update comment Fixes: 82d05736c47b19 ("drm/amdgpu/amdgpu_psp: convert to IP version checking") Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-13drm/amdkfd: unregistered svm range not overlap with TTM rangePhilip Yang2-2/+4
When creating unregistered new svm range to recover retry fault, avoid new svm range to overlap with ranges or userptr ranges managed by TTM, otherwise svm migration will trigger TTM or userptr eviction, to evict user queues unexpectedly. Change helper amdgpu_ttm_tt_affect_userptr to return userptr which is inside the range. Add helper svm_range_check_vm_userptr to scan all userptr of the vm, and return overlap userptr bo start, last. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-13drm/amdkfd: fix boot failure when iommu is disabled in Picasso.Yifan Zhang1-4/+0
When IOMMU disabled in sbios and kfd in iommuv2 path, iommuv2 init will fail. But this failure should not block amdgpu driver init. Reported-by: youling <[email protected]> Tested-by: youling <[email protected]> Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-13drm/amdgpu: Fix RAS page retirement with mode2 reset on AldebaranMukul Joshi1-12/+21
During mode2 reset, the GPU is temporarily removed from the mgpu_info list. As a result, page retirement fails because it cannot find the GPU in the GPU list. To fix this, create our own list of GPUs that support MCE notifier based page retirement and use that list to check if the UMC error occurred on a GPU that supports MCE notifier based page retirement. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-13drm/amdgpu: Enable RAS error injection after mode2 reset on AldebaranMukul Joshi1-0/+2
Add the missing call to re-enable RAS error injections on the Aldebaran mode2 reset code path. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-13drm/amdgpu: enable display for cyan skillfishLang Yu1-2/+1
Display support for cyan skillfish is ready now. Enable it! Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Huang Rui <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-13drm/amdgpu/nbio2.3: don't use GPU_HDP_FLUSH bit 12Alex Deucher3-1/+36
It's used internally by firmware. Using it in the driver could conflict with firmware. v2: squash in fix for navi1x (Alex) Reviewed-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-11drm/amdgpu/nbio7.4: don't use GPU_HDP_FLUSH bit 12Alex Deucher1-9/+12
It's used internally by firmware. Using it in the driver could conflict with firmware. Reviewed-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-11Merge tag 'drm-misc-next-2021-10-06' of ↵Dave Airlie1-6/+19
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.16: UAPI Changes: - Allow empty drm leases for creating separate GEM namespaces. Cross-subsystem Changes: - Slightly rework dma_buf_poll. - Add dma_resv_for_each_fence_unlocked to iterate, and use it inside the lockless dma-resv functions. Core Changes: - Allow devm_drm_of_get_bridge to build without CONFIG_OF for compile testing. - Add more DP2 headers. - fix CONFIG_FB dependency in fb_helper. - Add DRM_FORMAT_R8 to drm_format_info, and helpers for RGB332 and RGB888. - Fix crash on a 0 or invalid EDID. Driver Changes: - Apply and revert DRM_MODESET_LOCK_ALL_BEGIN. - Add mode_valid to ti-sn65dsi86 bridge. - Support multiple syncobjs in v3d. - Add R8, RGB332 and RGB888 pixel formats to GUD. - Use devm_add_action_or_reset in dw-hdmi-cec. Signed-off-by: Dave Airlie <[email protected]> # gpg: Signature made Wed 06 Oct 2021 20:48:12 AEST # gpg: using RSA key B97BD6A80CAC4981091AE547FE558C72A67013C3 # gpg: Good signature from "Maarten Lankhorst <[email protected]>" [expired] # gpg: aka "Maarten Lankhorst <[email protected]>" [expired] # gpg: aka "Maarten Lankhorst <[email protected]>" [expired] # gpg: Note: This key has expired! # Primary key fingerprint: B97B D6A8 0CAC 4981 091A E547 FE55 8C72 A670 13C3 From: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-10-08drm/amdgpu: use adev_to_drm for consistency when accessing drm_deviceGuchun Chen15-24/+24
adev_to_drm is used everywhere, so improve recent changes when accessing drm_device pointer from amdgpu_device. Signed-off-by: Guchun Chen <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-08drm/amdgpu: add missing case for HDP for renoirAlex Deucher1-0/+1
Missing 4.1.2. Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-08drm/amdgpu/discovery: add missing case for SMU 11.0.5Alex Deucher1-0/+1
Was missed when converting the driver over to IP based initialization. Tested-by: Harry Wentland <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-07drm/amdgpu: unify BO evicting method in amdgpu_ttmNirmoy Das6-35/+58
Unify BO evicting functionality for possible memory types in amdgpu_ttm.c. Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-06drm/amdgpu: Register MCE notifier for Aldebaran RASMukul Joshi1-0/+141
On Aldebaran, GPU driver will handle bad page retirement for GPU memory even though UMC is host managed. As a result, register a bad page retirement handler on the mce notifier chain to retire bad pages on Aldebaran. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Yazen Ghannam <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-06drm/amdgpu: return early if debugfs is not initializedNirmoy Das1-0/+3
Check first if debugfs is initialized before creating amdgpu debugfs files. References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686 Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-06drm/amd/display: fix DCC settings for DCN3Marek Olšák1-1/+2
ind_block_64b_no_128bcl means INDEP_64B && INDEP_128B && MAX_COMPRESSED_BLOCK_SIZE == 64B. Only used by gfx10.3. ind_block_64b means INDEP_64B && !INDEP_128B && MAX_COMPRESSED_BLOCK_SIZE == 64B. Only used by gfx9 and gfx10. Signed-off-by: Marek Olšák <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-10-05drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resumeGuchun Chen2-0/+7
In current code, when a PCI error state pci_channel_io_normal is detectd, it will report PCI_ERS_RESULT_CAN_RECOVER status to PCI driver, and PCI driver will continue the execution of PCI resume callback report_resume by pci_walk_bridge, and the callback will go into amdgpu_pci_resume finally, where write lock is releasd unconditionally without acquiring such lock first. In this case, a deadlock will happen when other threads start to acquire the read lock. To fix this, add a member in amdgpu_device strucutre to cache pci_channel_state, and only continue the execution in amdgpu_pci_resume when it's pci_channel_io_frozen. Fixes: c9a6b82f45e2 ("drm/amdgpu: Implement DPC recovery") Suggested-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>