aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
AgeCommit message (Collapse)AuthorFilesLines
2023-01-09drm/amd: Delay removal of the firmware framebufferMario Limonciello1-6/+0
Removing the firmware framebuffer from the driver means that even if the driver doesn't support the IP blocks in a GPU it will no longer be functional after the driver fails to initialize. This change will ensure that unsupported IP blocks at least cause the driver to work with the EFI framebuffer. Cc: stable@vger.kernel.org Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-03Revert "drm/amd/display: Enable Freesync Video Mode by default"Michel Dänzer1-0/+27
This reverts commit de05abe6b9d0fe08f65d744f7f75a4cba4df27ad. The bug referenced below was bisected to this commit. There has been no activity toward fixing it in 3 months, so let's revert for now. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-03drm/amdgpu: allow zero as vram limitChristian König1-1/+1
This allows testing the driver without any VRAM. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: bump minor version number for DEV_INFO and SENSOR IOCTLs updateEvan Quan1-2/+4
Update AMDGPU_INFO_DEV_INFO IOCTL for minimum engine and memory clock. And update AMDGPU_INFO_SENSOR IOCTL for PEAK_PSTATE engine and memory clock. User applications can better utilize these IOCTLs to get needed informations. Increase the minor version number to indicate that the new flags are available. Proposed mesa patch: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/278 Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-09drm/amdgpu: handle polaris10/11 overlap asics (v2)Alex Deucher1-2/+11
Some special polaris 10 chips overlap with the polaris11 DID range. Handle this properly in the driver. v2: use local flags for other function calls. Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-12-01drm/amdgpu: improve GART and GTT documentationPeter Maucher1-4/+6
Document difference between amdgpu.gartsize and amdgpu.gttsize module parameters, as initially explained by Alex Deucher here: https://lists.freedesktop.org/archives/dri-devel/2022-October/375358.html v2: minor cleanups (Alex) Signed-off-by: Peter Maucher <bellosilicio@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29drm/amdgpu: use dev_dbg to print messages in runtime cycleGuchun Chen1-1/+1
Runtime PM can happen pretty frequently, as these printings may be annoyed, switch to dev_dbg. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29drm/amdgpu: add printing to indicate rpm completenessGuchun Chen1-0/+2
Add an explicit printing to tell when finishing rpm execution in amdgpu. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23drm/amdgpu: update documentation of parameter amdgpu_gtt_sizeZhenGuo Yin1-2/+1
Fixes: f7ba887f606b ("drm/amdgpu: Adjust logic around GTT size (v3)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-22Merge tag 'amd-drm-next-6.2-2022-11-18' of ↵Dave Airlie1-5/+2
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.2-2022-11-18: amdgpu: - SR-IOV fixes - Clean up DC checks - DCN 3.2.x fixes - DCN 3.1.x fixes - Don't enable degamma on asics which don't support it - IP discovery fixes - BACO fixes - Fix vbios allocation handling when vkms is enabled - Drop buggy tdr advanced mode GPU reset handling - Fix the build when DCN is not set in kconfig - MST DSC fixes - Userptr fixes - FRU and RAS EEPROM fixes - VCN 4.x RAS support - Aldrebaran CU occupancy reporting fix - PSP ring cleanup amdkfd: - Memory limit fix - Enable cooperative launch on gfx 10.3 amd-drm-next-6.2-2022-11-11: amdgpu: - SMU 13.x updates - GPUVM TLB race fix - DCN 3.1.4 updates - DCN 3.2.x updates - PSR fixes - Kerneldoc fix - Vega10 fan fix - GPUVM locking fixes in error pathes - BACO fix for Beige Goby - EEPROM I2C address cleanup - GFXOFF fix - Fix DC memory leak in error pathes - Flexible array updates - Mtype fix for GPUVM PTEs - Move Kconfig into amdgpu directory - SR-IOV updates - Fix possible memory leak in CS IOCTL error path amdkfd: - Fix possible memory overrun - CRIU fixes radeon: - ACPI ref count fix - HDA audio notifier support - Move Kconfig into radeon directory UAPI: - Add new GEM_CREATE flags to help to transition more KFD functionality to the DRM UAPI. These are used internally in the driver to align location based memory coherency requirements from memory allocated in the KFD with how we manage GPUVM PTEs. They are currently blocked in the GEM_CREATE IOCTL as we don't have a user right now. They are just used internally in the kernel driver for now for existing KFD memory allocations. So a change to the UAPI header, but no functional change in the UAPI. From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221118170807.6505-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-11-16Merge tag 'drm-misc-next-2022-11-10-1' of ↵Dave Airlie1-0/+1
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 6.2: UAPI Changes: Cross-subsystem Changes: Core Changes: - atomic-helper: Add begin_fb_access and end_fb_access hooks - fb-helper: Rework to move fb emulation into helpers - scheduler: rework entity flush, kill and fini - ttm: Optimize pool allocations Driver Changes: - amdgpu: scheduler rework - hdlcd: Switch to DRM-managed resources - ingenic: Fix registration error path - lcdif: FIFO threshold tuning - meson: Fix return type of cvbs' mode_valid - ofdrm: multiple fixes (kconfig, types, endianness) - sun4i: A100 and D1 support - panel: - New Panel: Jadard JD9365DA-H3 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20221110083612.g63eaocoaa554soh@houat
2022-11-15drm/amdgpu: revert "implement tdr advanced mode"Christian König1-1/+1
This reverts commit e6c6338f393b74ac0b303d567bb918b44ae7ad75. This feature basically re-submits one job after another to figure out which one was the one causing a hang. This is obviously incompatible with gang-submit which requires that multiple jobs run at the same time. It's also absolutely not helpful to crash the hardware multiple times if a clean recovery is desired. For testing and debugging environments we should rather disable recovery alltogether to be able to inspect the state with a hw debugger. Additional to that the sw implementation is clearly buggy and causes reference count issues for the hardware fence. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15drm/amdgpu: remove the DID of Vangogh from pciidlistPerry Yuan1-3/+0
change the vangogh family to use IP discovery path to initialize IP list, this needs to remove the DID from the PCI ID list to allow the IP discovery path to set all the IP versions correctly. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15drm/amdgpu: clarify DC checksAlex Deucher1-1/+1
There are several places where we don't want to check if a particular asic could support DC, but rather, if DC is enabled. Set a flag if DC is enabled and check for that rather than if a device supports DC or not. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-05drm/fb-helper: Move generic fbdev emulation into separate source fileThomas Zimmermann1-0/+1
Move the generic fbdev implementation into its own source and header file. Adapt drivers. No functional changes, but some of the internal helpers have been renamed to fit into the drm_fbdev_ naming scheme. v3: * rename drm_fbdev.{c,h} to drm_fbdev_generic.{c,h} * rebase onto vmwgfx changes * rebase onto xlnx changes * fix include statements in amdgpu Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221103151446.2638-22-tzimmermann@suse.de
2022-11-01drm/amdgpu: Disable GPU reset on SRIOV before remove pci.Gavin Wan1-1/+2
The recent change brought a bug on SRIOV envrionment. It caused unloading amdgpu failed on Guest VM. The reason is that the VF FLR was requested while unloading amdgpu driver, but the VF FLR of SRIOV sequence is wrong while removing PCI device. For SRIOV, the guest driver should not trigger the whole XGMI hive to do the reset. Host driver control how the device been reset. Fixes: f5c7e7797060 ("drm/amdgpu: Adjust removal control flow for smu v13_0_2") Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: Gavin Wan <Gavin.Wan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-07Merge tag 'driver-core-6.1-rc1' of ↵Linus Torvalds1-0/+14
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core and debug printk changes for 6.1-rc1. Included in here is: - dynamic debug updates for the core and the drm subsystem. The drm changes have all been acked by the relevant maintainers - kernfs fixes for syzbot reported problems - kernfs refactors and updates for cgroup requirements - magic number cleanups and removals from the kernel tree (they were not being used and they really did not actually do anything) - other tiny cleanups All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (74 commits) docs: filesystems: sysfs: Make text and code for ->show() consistent Documentation: NBD_REQUEST_MAGIC isn't a magic number a.out: restore CMAGIC device property: Add const qualifier to device_get_match_data() parameter drm_print: add _ddebug descriptor to drm_*dbg prototypes drm_print: prefer bare printk KERN_DEBUG on generic fn drm_print: optimize drm_debug_enabled for jump-label drm-print: add drm_dbg_driver to improve namespace symmetry drm-print.h: include dyndbg header drm_print: wrap drm_*_dbg in dyndbg descriptor factory macro drm_print: interpose drm_*dbg with forwarding macros drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers. drm_print: condense enum drm_debug_category debugfs: use DEFINE_SHOW_ATTRIBUTE to define debugfs_regset32_fops driver core: use IS_ERR_OR_NULL() helper in device_create_groups_vargs() Documentation: ENI155_MAGIC isn't a magic number Documentation: NBD_REPLY_MAGIC isn't a magic number nbd: remove define-only NBD_MAGIC, previously magic number Documentation: FW_HEADER_MAGIC isn't a magic number Documentation: EEPROM_MAGIC_VALUE isn't a magic number ...
2022-09-24drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers.Jim Cromie1-0/+14
Use DECLARE_DYNDBG_CLASSMAP across DRM: - in .c files, since macro defines/initializes a record - in drivers, $mod_{drv,drm,param}.c ie where param setup is done, since a classmap is param related - in drm/drm_print.c since existing __drm_debug param is defined there, and we ifdef it, and provide an elaborated alternative. - in drm_*_helper modules: dp/drm_dp - 1st item in makefile target drivers/gpu/drm/drm_crtc_helper.c - random pick iirc. Since these modules all use identical CLASSMAP declarations (ie: names and .class_id's) they will all respond together to "class DRM_UT_*" query-commands: :#> echo class DRM_UT_KMS +p > /proc/dynamic_debug/control NOTES: This changes __drm_debug from int to ulong, so BIT() is usable on it. DRM's enum drm_debug_category values need to sync with the index of their respective class-names here. Then .class_id == category, and dyndbg's class FOO mechanisms will enable drm_dbg(DRM_UT_KMS, ...). Though DRM needs consistent categories across all modules, thats not generally needed; modules X and Y could define FOO differently (ie a different NAME => class_id mapping), changes are made according to each module's private class-map. No callsites are actually selected by this patch, since none are class'd yet. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20220912052852.1123868-3-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-20drm/amdgpu: bump minor for gang submitChristian König1-1/+2
Since that has now landed bump the minor to let userspace know about it. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19drm/amdgpu: Fixed psp fence and memory issues when removing amdgpu deviceYiPeng Chai1-0/+1
V3: Fixed psp fence and memory issues for the asic using smu v13_0_2 when removing amdgpu device. [Why]: 1. psp_suspend->psp_free_shared_bufs-> psp_ta_free_shared_buf-> amdgpu_bo_free_kernel-> ...->amdgpu_bo_release_notify-> amdgpu_fill_buffer psp will free vram memory used by psp when psp_suspend is called. But for the asic using smu v13_0_2, because psp_suspend is called before adev->shutdown is set to true when removing the first hive device, amdgpu fill_buffer will be called, which will cause fence issues when evicting all vram resources in amdgpu vram mgr_fini. 2. Since psp_hw_fini is not called after calling psp_suspend and psp_suspend only calls psp_ring_stop, the psp ring memory will not be released when amdgpu device is removed. [How]: 1. Set shutdown to true before calling amdgpu_device_gpu_recover, then amdgpu_fill_buffer will not be called when psp_suspend is called. 2. Free psp ring memory in psp_sw_fini. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19drm/amdgpu: Adjust removal control flow for smu v13_0_2YiPeng Chai1-0/+30
Adjust removal control flow for smu v13_0_2: During amdgpu uninstallation, when removing the first device, the kernel needs to first send a mode1reset message to all gpu devices. Otherwise, smu initialization will fail the next time amdgpu is installed. V2: 1. Update commit comments. 2. Remove the global variable amdgpu_device_remove_cnt and add a variable to the structure amdgpu_hive_info. 3. Use hive to detect the first removed device instead of a global variable. V3: 1. Update commit comments. 2. Split a patch into multiple patches. 3. The current patch does: a. Add a work mode of AMDGPU_RESET_FOR_DEVICE_REMOVE into the existing gpu recover path, which make all devices in hive list only have HW reset but no resume (except the base IP). b. Call AMDGPU_RESET_FOR_DEVICE_REMOVE and AMDGPU_NEED_FULL_RESET mode of amdgpu_device_gpu_recover in amdgpu_pci_remove when removing the first device in hive list. c. When removing the first device, the IP blocks keyword function call sequence is as follows: .suspend->mode1reset->.resume(basic ip)->.hw_fini->.early_fini->.sw_fini. ^ | |-<----------<---------<----| The first three sequences are because of a call to amdgpu_device_gpu_recover. The three sequences will be executed in a loop until all devices in the hive list are iterated. The sequences starting from .hw_fini only apply to the first device. Since .suspend has been called before, except the resumed phase1 basic ip blocks, all other ip blocks .hw_fini of current device will do nothing. d. When removing other devices, the calling sequences is the same as legacy: .hw_fini -> .early_fini -> .sw_fini. Since .suspend has been called when removing the first device, except the resumed phase1 basic ip blocks, all of other ip blocks .hw_fini of current device will do nothing. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-07drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is ↵YiPeng Chai1-2/+2
uninstalled V1: The psp_cmd_submit_buf function is called by psp_hw_fini to send TA unload messages to psp to terminate ras, asd and tmr. But when amdgpu is uninstalled, drm_dev_unplug is called earlier than psp_hw_fini in amdgpu_pci_remove, the calling order as follows: static void amdgpu_pci_remove(struct pci_dev *pdev) { drm_dev_unplug ...... amdgpu_driver_unload_kms->amdgpu_device_fini_hw->... ->.hw_fini->psp_hw_fini->... ->psp_ta_unload->psp_cmd_submit_buf ...... } The program will return when calling drm_dev_enter in psp_cmd_submit_buf. So the call to drm_dev_enter in psp_cmd_submit_buf should be removed, so that the TA unload messages can be sent to the psp when amdgpu is uninstalled. V2: 1. Restore psp_cmd_submit_buf to its original code. 2. Move drm_dev_unplug call after amdgpu_driver_unload_kms in amdgpu_pci_remove. 3. Since amdgpu_device_fini_hw is called by amdgpu_driver_unload_kms, remove the unplug check to release device mmio resource in amdgpu_device_fini_hw before calling drm_dev_unplug. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: add missing pci_disable_device() in amdgpu_pmops_runtime_resume()Yang Yingliang1-1/+4
Add missing pci_disable_device() if amdgpu_device_resume() fails. Fixes: 8e4d5d43cc6c ("drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25drm/amd/display: Add visualconfirm module parameterLeo Li1-0/+4
[Why] Being able to configure visual confirm at boot or in cmdline is helpful when debugging. [How] Add a module parameter to configure DC visual confirm, which works the same way as the equivalent debugfs entry. Signed-off-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25drm/amdgpu: bump driver version for IP discovery info in HW INFOAlex Deucher1-2/+3
So userspace knows when it is available. Proposed mesa patch: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17411/diffs?commit_id=c8a63590dfd0d64e6e6a634dcfed993f135dd075 Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25drm/amdgpu: Fix comment typoJason Wang1-1/+1
The double `to' is duplicated in the comment, remove one. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18drm/amdgpu: drop runpm from amdgpu_device structureGuchun Chen1-5/+5
It's redundant, as now switching to rpm_mode to indicate runtime power management mode. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUsRamesh Errabolu1-0/+10
Add support for peer-to-peer communication among AMD GPUs over PCIe bus. Support REQUIRES enablement of config HSA_AMD_P2P. Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-26drm/amdgpu: bump minor version numberChristian König1-2/+3
Increase the minor version number to indicate that the new flags are available. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-26drm/amdgpu: add beige goby PCI IDAlex Deucher1-0/+1
Add a beige goby PCI ID. Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-05-18drm/amd: Don't reset dGPUs if the system is going to s2idleMario Limonciello1-1/+1
An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC reset for handling aborted suspend can't work with s2idle. This functionality was introduced in commit daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)"). A few other commits have gone on top of the ASIC reset, but this still doesn't work on the A+A configuration in s2idle. Avoid doing the reset on dGPUs specifically when using s2idle. Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-05Revert "drm/amdgpu: disable runpm if we are the primary adapter"Alex Deucher1-28/+0
This reverts commit b95dc06af3e683d6b7ddbbae178b2b2a21ee8b2b. This workaround is no longer necessary. We have a better workaround in commit f95af4a9236695 ("drm/amdgpu: don't runtime suspend if there are displays attached (v3)"). Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04drm/amd/amdgpu: add more fw load type to fit new ASICsChengming Gui1-1/+1
Align exported fw load types with internal used. Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04drm/amdgpu: add mes_kiq module parameter v2Jack Xiao1-0/+10
mes_kiq parameter is used to enable mes kiq pipe. This module parameter is unneccessary or enabled by default in final version. v2: reword commit message. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-21drm/amdgpu: don't runtime suspend if there are displays attached (v3)Alex Deucher1-35/+70
We normally runtime suspend when there are displays attached if they are in the DPMS off state, however, if something wakes the GPU we send a hotplug event on resume (in case any displays were connected while the GPU was in suspend) which can cause userspace to light up the displays again soon after they were turned off. Prior to commit 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's."), the driver took a runtime pm reference when the fbdev emulation was enabled because we didn't implement proper shadowing support for vram access when the device was off so the device never runtime suspended when there was a console bound. Once that commit landed, we now utilize the core fb helper implementation which properly handles the emulation, so runtime pm now suspends in cases where it did not before. Ultimately, we need to sort out why runtime suspend in not working in this case for some users, but this should restore similar behavior to before. v2: move check into runtime_suspend v3: wake ups -> wakeups in comment, retain pm_runtime behavior in runtime_idle callback Fixes: 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") Link: https://lore.kernel.org/r/20220403132322.51c90903@darkstar.example.org/ Tested-by: Michele Ballabio <ballabio.m@gmail.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-08drm/amdgpu: expand cg_flags from u32 to u64Evan Quan1-4/+4
With this, we can support more CG flags. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-07drm/amdgpu: Ensure HDA function is suspended before ASIC resetKai-Heng Feng1-6/+12
DP/HDMI audio on AMD PRO VII stops working after S3: [ 149.450391] amdgpu 0000:63:00.0: amdgpu: MODE1 reset [ 149.450395] amdgpu 0000:63:00.0: amdgpu: GPU mode1 reset [ 149.450494] amdgpu 0000:63:00.0: amdgpu: GPU psp mode1 reset [ 149.983693] snd_hda_intel 0000:63:00.1: refused to change power state from D0 to D3hot [ 150.003439] amdgpu 0000:63:00.0: refused to change power state from D0 to D3hot ... [ 155.432975] snd_hda_intel 0000:63:00.1: CORB reset timeout#2, CORBRP = 65535 The offending commit is daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)"). Commit 34452ac3038a7 ("drm/amdgpu: don't use BACO for reset in S3 ") doesn't help, so the issue is something different. Assuming that to make HDA resume to D0 fully realized, it needs to be successfully put to D3 first. And this guesswork proves working, by moving amdgpu_asic_reset() to noirq callback, so it's called after HDA function is in D3. Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-25drm/amdkfd: Fix Incorrect VMIDs passed to HWSTushar Patel1-1/+1
Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix this by passing correct number of VMIDs to HWS v2: squash in warning fix (Alex) Signed-off-by: Tushar Patel <tushar.patel@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu/vcn: Add vcn firmware logRuijing Dong1-0/+8
vcn fwlog is for debugging purpose only, by default, it is disabled. Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Bump minor version for hot plug tests enabling.Andrey Grodzovsky1-1/+2
This will allow to enable the tests only after latest fix after which the tests passed on my system. I tested on NV21 standalone and Vega 10 and Polaris as pair with DRI_PRIME. It's possible there might be still issues on ASICs i don't have at my posession but that that the point of enbling the tests finally - if other people during testing will encounter errors they will report and I will be able to fix. The releated merge request for enabling libdrm tests suite is in https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/227 Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-01Merge tag 'amd-drm-next-5.18-2022-02-25' of ↵Dave Airlie1-19/+11
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.18-2022-02-25: amdgpu: - Raven2 suspend/resume fix - SDMA 5.2.6 updates - VCN 3.1.2 updates - SMU 13.0.5 updates - DCN 3.1.5 updates - Virtual display fixes - SMU code cleanup - Harvest fixes - Expose benchmark tests via debugfs - Drop no longer relevant gart aperture tests - More RAS restructuring - W=1 fixes - PSR rework - DP/VGA adapter fixes - DP MST fixes - GPUVM eviction fix - GPU reset debugfs register dumping support - Misc display fixes - SR-IOV fix - Aldebaran mGPU fix - Add module parameter to disable XGMI for testing amdkfd: - IH ring overflow logging fixes - CRIU fixes - Misc fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220225183535.5907-1-alexander.deucher@amd.com
2022-02-28Backmerge tag 'v5.17-rc6' into drm-nextDave Airlie1-0/+3
This backmerges v5.17-rc6 so I can merge some amdgpu and some tegra changes on top. Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-02-24drm/amdgpu: Exclude PCI reset method for now.Andrey Grodzovsky1-2/+2
According to my investigation of the state of PCI reset recently it's not working. The reason is due to the fact the kernel PCI code rejects SBR when there are more then one PF under same bridge which we always have (at least AUDIO PF but usually more) and that because SBR will reset all the PFS and devices under the same bridge as you and you cannot assume they support SBR. Once we anble FLR support we can reenable this option as FLR is doable on single PF and doens't have this restriction. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-24drm/amdgpu: Add use_xgmi_p2p module parameterAlex Sierra1-0/+8
This parameter controls xGMI p2p communication, which is enabled by default. However, it can be disabled by setting it to 0. In case xGMI p2p is disabled in a dGPU, PCIe p2p interface will be used instead. This parameter is ignored in GPUs that do not support xGMI p2p configuration. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-23drm/amd: Check if ASPM is enabled from PCIe subsystemMario Limonciello1-0/+3
commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") enabled ASPM by default but a variety of hardware configurations it turns out that this caused a regression. * PPC64LE hardware does not support ASPM at a hardware level. CONFIG_PCIEASPM is often disabled on these architectures. * Some dGPUs on ALD platforms don't work with ASPM enabled and PCIe subsystem disables it Check with the PCIe subsystem to see that ASPM has been enabled or not. Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") Link: https://wiki.raptorcs.com/w/images/a/ad/P9_PHB_version1.0_27July2018_pub.pdf Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1723 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1739 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1907 Tested-by: koba.ko@canonical.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-02-23drm/amdgpu: drop testing module parameterAlex Deucher1-8/+0
This test is not particularly useful now that GTT and GART are decoupled in the driver. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-23drm/amdgpu: drop benchmark module parameterAlex Deucher1-8/+0
Now that we expose the benchmarks via debugfs, there is no longer a need for the module parameter. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-22drm/amdgpu: Fix typo in *whether* in commentPaul Menzel1-1/+1
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-17drm/amd: Refactor `amdgpu_aspm` to be evaluated per deviceMario Limonciello1-3/+0
Evaluating `pcie_aspm_enabled` as part of driver probe has the implication that if one PCIe bridge with an AMD GPU connected doesn't support ASPM then none of them do. This is an invalid assumption as the PCIe core will configure ASPM for individual PCIe bridges. Create a new helper function that can be called by individual dGPUs to react to the `amdgpu_aspm` module parameter without having negative results for other dGPUs on the PCIe bus. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-17drm/amd: Check if ASPM is enabled from PCIe subsystemMario Limonciello1-0/+3
commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") enabled ASPM by default but a variety of hardware configurations it turns out that this caused a regression. * PPC64LE hardware does not support ASPM at a hardware level. CONFIG_PCIEASPM is often disabled on these architectures. * Some dGPUs on ALD platforms don't work with ASPM enabled and PCIe subsystem disables it Check with the PCIe subsystem to see that ASPM has been enabled or not. Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") Link: https://wiki.raptorcs.com/w/images/a/ad/P9_PHB_version1.0_27July2018_pub.pdf Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1723 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1739 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1907 Tested-by: koba.ko@canonical.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>