aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)AuthorFilesLines
2021-08-24drm/amdkfd: CWSR with sw scheduler on Aldebaran and ArcturusMukul Joshi4-2/+6
Program trap handler settings to enable CWSR with software scheduler on Aldebaran and Arcturus. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Amber Lin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-24drm/amdgpu/OLAND: clip the ref divider max valueShashank Sharma3-9/+16
This patch limits the ref_div_max value to 100, during the calculation of PLL feedback reference divider. With current value (128), the produced fb_ref_div value generates unstable output at particular frequencies. Radeon driver limits this value at 100. On Oland, when we try to setup mode 2048x1280@60 (a bit weird, I know), it demands a clock of 221270 Khz. It's been observed that the PLL calculations using values 128 and 100 are vastly different, and look like this: +------------------------------------------+ |Parameter |AMDGPU |Radeon | | | | | +-------------+----------------------------+ |Clock feedback | | |divider max | 128 | 100 | |cap value | | | | | | | | | | | +------------------------------------------+ |ref_div_max | | | | | 42 | 20 | | | | | | | | | +------------------------------------------+ |ref_div | 42 | 20 | | | | | +------------------------------------------+ |fb_div | 10326 | 8195 | +------------------------------------------+ |fb_div | 1024 | 163 | +------------------------------------------+ |fb_dev_p | 4 | 9 | |frac fb_de^_p| | | +----------------------------+-------------+ With ref_div_max value clipped at 100, AMDGPU driver can also drive videmode 2048x1280@60 (221Mhz) and produce proper output without any blanking and distortion on the screen. PS: This value was changed from 128 to 100 in Radeon driver also, here: https://github.com/freedesktop/drm-tip/commit/4b21ce1b4b5d262e7d4656b8ececc891fc3cb806 V1: Got acks from: Acked-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]> V2: - Restricting the changes only for OLAND, just to avoid any regression for other cards. - Changed unsigned -> unsigned int to make checkpatch quiet. V3: Apply the change on SI family (not only oland) (Christian) Cc: Alex Deucher <[email protected]> Cc: Christian König <[email protected]> Cc: Eddy Qin <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Shashank Sharma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-24drm/amd/display: refactor riommu invalidation waEric Yang4-20/+1
[Why] A cleaner solution, only done once on boot. [How] Remove previous workaround and configure an extra vmid one time on boot Reviewed-by: Kazlauskas Nicholas <[email protected]> Acked-by: Solomon Chiu <[email protected]> Signed-off-by: Eric Yang <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-24drm/amdgpu: Fix build with missing pm_suspend_target_state module exportBorislav Petkov1-1/+1
Building a randconfig here triggered: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined! because the module export of that symbol happens in kernel/power/suspend.c which is enabled with CONFIG_SUSPEND. The ifdef guards in amdgpu_acpi_is_s0ix_supported(), however, test for CONFIG_PM_SLEEP which is defined like this: config PM_SLEEP def_bool y depends on SUSPEND || HIBERNATE_CALLBACKS and that randconfig has: # CONFIG_SUSPEND is not set CONFIG_HIBERNATE_CALLBACKS=y leading to the module export missing. Change the ifdeffery to depend directly on CONFIG_SUSPEND. Fixes: 5706cb3c910c ("drm/amdgpu: fix checking pmops when PM_SLEEP is not enabled") Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-08-20drm/amdgpu: Cancel delayed work when GFXOFF is disabledMichel Dänzer2-17/+30
schedule_delayed_work does not push back the work if it was already scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms after the first time GFXOFF was disabled and re-enabled, even if GFXOFF was disabled and re-enabled again during those 100 ms. This resulted in frame drops / stutter with the upcoming mutter 41 release on Navi 14, due to constantly enabling GFXOFF in the HW and disabling it again (for getting the GPU clock counter). To fix this, call cancel_delayed_work_sync when the disable count transitions from 0 to 1, and only schedule the delayed work on the reverse transition, not if the disable count was already 0. This makes sure the delayed work doesn't run at unexpected times, and allows it to be lock-free. v2: * Use cancel_delayed_work_sync & mutex_trylock instead of mod_delayed_work. v3: * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König) v4: * Fix race condition between amdgpu_gfx_off_ctrl incrementing adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off checking for it to be 0 (Evan Quan) Cc: [email protected] Reviewed-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> # v3 Acked-by: Christian König <[email protected]> # v3 Signed-off-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-20drm/amdgpu: use the preferred pin domain after the checkChristian König1-5/+5
For some reason we run into an use case where a BO is already pinned into GTT, but should be pinned into VRAM|GTT again. Handle that case gracefully as well. Reviewed-by: Shashank Sharma <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-08-20drm/amdgpu: Cancel delayed work when GFXOFF is disabledMichel Dänzer2-17/+30
schedule_delayed_work does not push back the work if it was already scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms after the first time GFXOFF was disabled and re-enabled, even if GFXOFF was disabled and re-enabled again during those 100 ms. This resulted in frame drops / stutter with the upcoming mutter 41 release on Navi 14, due to constantly enabling GFXOFF in the HW and disabling it again (for getting the GPU clock counter). To fix this, call cancel_delayed_work_sync when the disable count transitions from 0 to 1, and only schedule the delayed work on the reverse transition, not if the disable count was already 0. This makes sure the delayed work doesn't run at unexpected times, and allows it to be lock-free. v2: * Use cancel_delayed_work_sync & mutex_trylock instead of mod_delayed_work. v3: * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König) v4: * Fix race condition between amdgpu_gfx_off_ctrl incrementing adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off checking for it to be 0 (Evan Quan) Cc: [email protected] Reviewed-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> # v3 Acked-by: Christian König <[email protected]> # v3 Signed-off-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-20drm/amdgpu: use the preferred pin domain after the checkChristian König1-5/+5
For some reason we run into an use case where a BO is already pinned into GTT, but should be pinned into VRAM|GTT again. Handle that case gracefully as well. Reviewed-by: Shashank Sharma <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-08-20drm/amd/pm: a quick fix for "divided by zero" errorEvan Quan2-9/+20
Considering Arcturus is a dedicated ASIC for computing, it will be more proper to drop the support for fan speed reading and setting. That's on the TODO list. Signed-off-by: Evan Quan <[email protected]> Reported-by: Rui Teng <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-19isystem: ship and use stdarg.hAlexey Dobriyan1-1/+1
Ship minimal stdarg.h (1 type, 4 macros) as <linux/stdarg.h>. stdarg.h is the only userspace header commonly used in the kernel. GPL 2 version of <stdarg.h> can be extracted from http://archive.debian.org/debian/pool/main/g/gcc-4.2/gcc-4.2_4.2.4.orig.tar.gz Signed-off-by: Alexey Dobriyan <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2021-08-19isystem: trim/fixup stdarg.h and other headersAlexey Dobriyan1-1/+0
Delete/fixup few includes in anticipation of global -isystem compile option removal. Note: crypto/aegis128-neon-inner.c keeps <stddef.h> due to redefinition of uintptr_t error (one definition comes from <stddef.h>, another from <linux/types.h>). Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2021-08-18drm/amd/display: Use DCN30 watermark calc for DCN301Zhan Liu1-95/+1
[why] dcn301_calculate_wm_and_dl() causes flickering when external monitor is connected. This issue has been fixed before by commit 0e4c0ae59d7e ("drm/amdgpu/display: drop dcn301_calculate_wm_and_dl for now"), however part of the fix was gone after commit 2cbcb78c9ee5 ("Merge tag 'amd-drm-next-5.13-2021-03-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next"). [how] Use dcn30_calculate_wm_and_dlg() instead as in the original fix. Fixes: 2cbcb78c9ee5 ("Merge tag 'amd-drm-next-5.13-2021-03-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next") Signed-off-by: Nikola Cornij <[email protected]> Reviewed-by: Zhan Liu <[email protected]> Tested-by: Zhan Liu <[email protected]> Tested-by: Oliver Logush <[email protected]> Signed-off-by: Zhan Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amd/pm: Fix spelling mistake "firwmare" -> "firmware"Colin Ian King1-1/+1
There is a spelling mistake in a dev_err error message. Fix it. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amd/amdgpu:flush ttm delayed work before cancel_syncYuBiao Wang1-1/+3
[Why] In some cases when we unload driver, warning call trace will show up in vram_mgr_fini which claims that LRU is not empty, caused by the ttm bo inside delay deleted queue. [How] We should flush delayed work to make sure the delay deleting is done. Signed-off-by: YuBiao Wang <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amd: consolidate TA shared memory structuresCandice Li6-196/+168
Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amdgpu: increase max xgmi physical node for aldebaranHawking Zhang1-3/+2
aldebaran supports up to 16 xgmi physical nodes. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amdgpu: disable BACO support for 699F:C7 polaris12 SKU temporarilyEvan Quan1-1/+8
We have a S3 issue on that SKU with BACO enabled. Will bring back this when that root caused. Signed-off-by: Evan Quan <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amd/display: Use DCN30 watermark calc for DCN301Zhan Liu1-95/+1
[why] dcn301_calculate_wm_and_dl() causes flickering when external monitor is connected. This issue has been fixed before by commit 0e4c0ae59d7e ("drm/amdgpu/display: drop dcn301_calculate_wm_and_dl for now"), however part of the fix was gone after commit 2cbcb78c9ee5 ("Merge tag 'amd-drm-next-5.13-2021-03-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next"). [how] Use dcn30_calculate_wm_and_dlg() instead as in the original fix. Fixes: 2cbcb78c9ee5 ("Merge tag 'amd-drm-next-5.13-2021-03-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next") Signed-off-by: Nikola Cornij <[email protected]> Reviewed-by: Zhan Liu <[email protected]> Tested-by: Zhan Liu <[email protected]> Tested-by: Oliver Logush <[email protected]> Signed-off-by: Zhan Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amdgpu: correct MMSCH 1.0 versionZhigang Luo1-3/+1
MMSCH 1.0 doesn't have major/minor version, only verison. Signed-off-by: Zhigang Luo <[email protected]> Reviewed by Shaoyun.liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-18drm/amdgpu: get extended xgmi topology dataJonathan Kim4-14/+145
The TA has a limit to the amount of data that can be retrieved from GET_TOPOLOGY. For setups that exceed this limit, the xGMI topology needs to be re-initialized and data needs to be re-fetched from the extended link records by setting a flag in the shared command buffer. The number of hops and the number of links must be accumulated by the driver. Other data points are all fetched from the first request. Because the TA has already exceeded its link record limit, it cannot hold bidirectional information. Otherwise the driver would have to do more than two fetches so the driver has to reflect the topology information in the opposite direction. v2: squashed with internal reviewed fix Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/display: 3.2.149Aric Cyr1-1/+1
This version brings along following fixes: - Ensure DCN save init registers after VM setup - Fix multi-display support for idle opt workqueue - Use vblank control events for PSR enable/disable - Create default dc_sink when fail reading EDID under MST Acked-by: Wayne Lin <[email protected]> Signed-off-by: Aric Cyr <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/display: [FW Promotion] Release 0.0.79Anthony Koo1-2/+12
Acked-by: Wayne Lin <[email protected]> Signed-off-by: Anthony Koo <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/display: Guard vblank wq flush with DCN guardsNicholas Kazlauskas1-0/+4
[Why] Compilation of the workqueue fails if not building with the DCN config option set. [How] Guard calls to the flush with the DCN config option to fix the build. Reviewed-by: Roman Li <[email protected]> Acked-by: Wayne Lin <[email protected]> Signed-off-by: Nicholas Kazlauskas <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/display: Ensure DCN save after VM setupJake Wang8-0/+30
[Why] DM initializes VM context after DMCUB initialization. This results in loss of DCN_VM_CONTEXT registers after z10. [How] Notify DMCUB when VM setup is complete, and have DMCUB save init registers. v2: squash in CONFIG_DRM_AMD_DC_DCN3_1 fix Reviewed-by: Nicholas Kazlauskas <[email protected]> Acked-by: Wayne Lin <[email protected]> Signed-off-by: Jake Wang <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/display: Ensure DCN save after VM setupJake Wang8-0/+30
[Why] DM initializes VM context after DMCUB initialization. This results in loss of DCN_VM_CONTEXT registers after z10. [How] Notify DMCUB when VM setup is complete, and have DMCUB save init registers. v2: squash in CONFIG_DRM_AMD_DC_DCN3_1 fix Reviewed-by: Nicholas Kazlauskas <[email protected]> Acked-by: Wayne Lin <[email protected]> Signed-off-by: Jake Wang <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failureYifan Zhang1-0/+8
KFDSVMRangeTest.SetGetAttributesTest randomly fails in stress test. Note: Google Test filter = KFDSVMRangeTest.* [==========] Running 18 tests from 1 test case. [----------] Global test environment set-up. [----------] 18 tests from KFDSVMRangeTest [ RUN ] KFDSVMRangeTest.BasicSystemMemTest [ OK ] KFDSVMRangeTest.BasicSystemMemTest (30 ms) [ RUN ] KFDSVMRangeTest.SetGetAttributesTest [ ] Get default atrributes /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure Value of: expectedDefaultResults[i] Actual: 4294967295 Expected: outputAttributes[i].value Which is: 0 /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure Value of: expectedDefaultResults[i] Actual: 4294967295 Expected: outputAttributes[i].value Which is: 0 /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:152: Failure Value of: expectedDefaultResults[i] Actual: 4 Expected: outputAttributes[i].type Which is: 2 [ ] Setting/Getting atrributes [ FAILED ] the root cause is that svm work queue has not finished when svm_range_get_attr is called, thus some garbage svm interval tree data make svm_range_get_attr get wrong result. Flush work queue before iterate svm interval tree. Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: change the workload type for some cardsKenneth Feng1-1/+14
change the workload type for some cards as it is needed. Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16Revert "drm/amd/pm: fix workload mismatch on vega10"Kenneth Feng1-1/+1
This reverts commit 0979d43259e13846d86ba17e451e17fec185d240. Revert this because it does not apply to all the cards. Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/display: Use vblank control events for PSR enable/disableNicholas Kazlauskas3-8/+43
[Why] PSR can disable the HUBP along with the OTG when PSR is active. We'll hit a pageflip timeout when the OTG is disable because we're no longer updating the CRTC vblank counter and the pflip high IRQ will not fire on the flip. In order to flip the page flip timeout occur we should modify the enter/exit conditions to match DRM requirements. [How] Use our deferred handlers for DRM vblank control to notify DMCU(B) when it can enable or disable PSR based on whether vblank is disabled or enabled respectively. We'll need to pass along the stream with the notification now because we want to access the CRTC state while the CRTC is locked to get the stream state prior to the commit. Retain a reference to the stream so it remains safe to continue to access and release that reference once we're done with it. Enable/disable logic follows what we were previously doing in update_planes. The workqueue has to be flushed before programming streams or planes to ensure that we exit out of idle optimizations and PSR before these events occur if necessary. To keep the skip count logic the same to avoid FBCON PSR enablement requires copying the allow condition onto the DM IRQ parameters - a field that we can actually access from the worker. Reviewed-by: Roman Li <[email protected]> Acked-by: Wayne Lin <[email protected]> Signed-off-by: Nicholas Kazlauskas <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/display: Fix multi-display support for idle opt workqueueNicholas Kazlauskas2-47/+36
[Why] The current implementation for idle optimization support only has a single work item that gets reshuffled into the system workqueue whenever we receive an enable or disable event. We can have mismatched events if the work hasn't been processed or if we're getting control events from multiple displays at once. This fixes this issue and also makes the implementation usable for PSR control - which will be addressed in another patch. [How] We need to be able to flush remaining work out on demand for driver stop and psr disable so create a driver specific workqueue instead of using the system one. The workqueue will be single threaded to guarantee the ordering of enable/disable events. Refactor the queue to allocate the control work and deallocate it after processing it. Pass the acrtc directly to make it easier to handle psr enable/disable in a later patch. Rename things to indicate that it's not just MALL specific. Reviewed-by: Roman Li <[email protected]> Acked-by: Wayne Lin <[email protected]> Signed-off-by: Nicholas Kazlauskas <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/display: Create dc_sink when EDID failWayne Lin1-0/+23
[Why] While reading remote EDID via Startech 1-to-4 hub, occasionally we won't get response in time and won't light up corresponding monitor. Ideally, we can still add generic modes for userspace to choose to try to light up the monitor and which is done in drm_helper_probe_single_connector_modes(). So the main problem here is that we fail .mode_valid since we don't create remote dc_sink for this case. [How] Also add default dc_sink if we can't get the EDID. Reviewed-by: Nicholas Kazlauskas <[email protected]> Signed-off-by: Wayne Lin <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: correct the address of Arcturus fan related registersEvan Quan1-5/+133
These registers have different address from other SMU V11 ASICs. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: drop unnecessary manual mode checkEvan Quan1-12/+4
As the fan control was guarded under manual mode before fan speed RPM/PWM setting. Thus the extra check is totally redundant. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: drop the unnecessary intermediate percent-based transitionEvan Quan23-140/+112
Currently, the readout of fan speed pwm is transited into percent-based and then pwm-based. However, the transition into percent-based is totally unnecessary and make the final output less accurate. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: correct the fan speed RPM retrievingEvan Quan8-5/+106
The relationship "PWM = RPM / smu->fan_max_rpm" between fan speed PWM and RPM is not true for SMU11 ASICs. So, we need a new way to retrieving the fan speed RPM. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: correct the fan speed PWM retrievingEvan Quan8-81/+62
The relationship "PWM = RPM / smu->fan_max_rpm" between fan speed PWM and RPM is not true for SMU11 ASICs. So, we need a new way to retrieving the fan speed PWM. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: record the RPM and PWM based fan speed settingsEvan Quan3-6/+31
As the relationship "PWM = RPM / smu->fan_max_rpm" between fan speed PWM and RPM is not true for SMU11 ASICs. So, both the RPM and PWM settings need to be saved. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: correct the fan speed RPM settingEvan Quan7-4/+51
The relationship "PWM = RPM / smu->fan_max_rpm" between fan speed PWM and RPM is not true for SMU11 ASICs. So, we need a new way to perform the fan speed RPM setting. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/amdgpu: remove unnecessary RAS context fieldCandice Li10-14/+6
Delete ras_if->name in the RAS ctx structure and remove related lines. Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failureYifan Zhang1-0/+8
KFDSVMRangeTest.SetGetAttributesTest randomly fails in stress test. Note: Google Test filter = KFDSVMRangeTest.* [==========] Running 18 tests from 1 test case. [----------] Global test environment set-up. [----------] 18 tests from KFDSVMRangeTest [ RUN ] KFDSVMRangeTest.BasicSystemMemTest [ OK ] KFDSVMRangeTest.BasicSystemMemTest (30 ms) [ RUN ] KFDSVMRangeTest.SetGetAttributesTest [ ] Get default atrributes /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure Value of: expectedDefaultResults[i] Actual: 4294967295 Expected: outputAttributes[i].value Which is: 0 /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure Value of: expectedDefaultResults[i] Actual: 4294967295 Expected: outputAttributes[i].value Which is: 0 /home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:152: Failure Value of: expectedDefaultResults[i] Actual: 4 Expected: outputAttributes[i].type Which is: 2 [ ] Setting/Getting atrributes [ FAILED ] the root cause is that svm work queue has not finished when svm_range_get_attr is called, thus some garbage svm interval tree data make svm_range_get_attr get wrong result. Flush work queue before iterate svm interval tree. Signed-off-by: Yifan Zhang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: change the workload type for some cardsKenneth Feng1-1/+14
change the workload type for some cards as it is needed. Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16Revert "drm/amd/pm: fix workload mismatch on vega10"Kenneth Feng1-1/+1
This reverts commit 0979d43259e13846d86ba17e451e17fec185d240. Revert this because it does not apply to all the cards. Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/amdgpu: consolidate PSP TA contextCandice Li11-187/+158
Signed-off-by: Candice Li <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET response when VF get FLR notification.Jiange Zhao2-1/+4
When guest received FLR notification from host, it would lock adapter into reset state. There will be no more job submission and hardware access after that. Then it should send a response to host that it has prepared for host reset. Signed-off-by: Jiange Zhao <[email protected]> Signed-off-by: Peng Ju Zhou <[email protected]> Reviewed-by: Emily.Deng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: change pp_dpm_sclk/mclk/fclk attribute is RO for aldebaranKevin Wang1-2/+7
the following clock is only support voltage DPM, change attribute to RO: 1. pp_dpm_sclk 2. pp_dpm_mclk 3. pp_dpm_fclk Signed-off-by: Kevin Wang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: change smu msg's attribute to allow working under sriovKevin Wang1-2/+2
the following message is allowed in sriov mode: 1. GetEnabledSmuFeaturesLow 2. GetEnabledSmuFeaturesHigh Signed-off-by: Kevin Wang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: change return value in aldebaran_get_power_limit()Kevin Wang1-2/+13
v1: 1. change return value to avoid smu driver probe fails when FEATURE_PPT is not enabled. 2. if FEATURE_PPT is not enabled, set power limit value to 0. v2: instead dev_err with dev_warn Signed-off-by: Kevin Wang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: skip to load smu microcode on sriov for aldebaranKevin Wang1-32/+70
v1: 1. skip to load smu firmware in sriov mode for aldebaran chip 2. using vbios pptable if in sriov mode. v2: clean up smu driver code in sriov code path Signed-off-by: Kevin Wang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/pm: correct DPM_XGMI/VCN_DPM feature nameKevin Wang3-11/+10
the following feature is wrong, it will cause sysnode of pp_features show error: 1. DPM_XGMI 2. VCN_DPM Signed-off-by: Kevin Wang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/amdgpu embed hw_fence into amdgpu_jobJack Zhang9-37/+119
Why: Previously hw fence is alloced separately with job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime, and simplify the design of gpu-scheduler. How: We propose to embed hw_fence into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is embedded in a job. v3: remove redundant variable ring in amdgpu_job v4: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. v5 add missing handling in amdgpu_fence_enable_signaling Signed-off-by: Jingwen Chen <[email protected]> Signed-off-by: Jack Zhang <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Reviewed by: Monk Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>