aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-04-22drm/amd/powerplay: remove defined but not used variablesJason Yan1-23/+0
Fix the following gcc warning: drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/vega10_powertune.c:710:46: warning: ‘PSMGCEDCThresholdConfig_vega10’ defined but not used [-Wunused-const-variable=] static const struct vega10_didt_config_reg PSMGCEDCThresholdConfig_vega10[] = ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/vega10_powertune.c:654:46: warning: ‘PSMSEEDCThresholdConfig_Vega10’ defined but not used [-Wunused-const-variable=] static const struct vega10_didt_config_reg PSMSEEDCThresholdConfig_Vega10[] = ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu: fix race between pstate and remote buffer mapJonathan Kim6-53/+49
Vega20 arbitrates pstate at hive level and not device level. Last peer to remote buffer unmap could drop P-State while another process is still remote buffer mapped. With this fix, P-States still needs to be disabled for now as SMU bug was discovered on synchronous P2P transfers. This should be fixed in the next FW update. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu/display: give aux i2c buses more meaningful namesAlex Deucher3-4/+9
Mirror what we do for i2c display buses. Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu/display: fix aux registration (v2)Alex Deucher2-5/+14
We were registering the aux device in the MST late_register rather than the regular one. v2: handle eDP as well Fixes: 405a1f9090d1ac ("drm/amdgpu/display: split dp connector registration (v4)") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1100 Signed-off-by: Alex Deucher <[email protected]> Reviewed-by: Harry Wentland <[email protected]>
2020-04-22drm/amdgpu: Correctly initialize thermal controller for GPUs with Powerplay ↵Sandeep Raghuraman1-0/+26
table v0 (e.g Hawaii) Initialize thermal controller fields in the PowerPlay table for Hawaii GPUs, so that fan speeds are reported. Signed-off-by: Sandeep Raghuraman <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22Revert "drm/amdgpu: Disable gfx off if VCN is busy"James Zhu1-2/+0
This reverts commit 3fded222f4bf7f4c56ef4854872a39a4de08f7a8 This is work around for vcn1 only. Currently vcn1 has separate begin_use and idle work handle. Signed-off-by: James Zhu <[email protected]> Tested-by: changzhu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu: fix kernel page fault issue by ras recovery on sGPUGuchun Chen1-2/+3
When running ras uncorrectable error injection and triggering GPU reset on sGPU, below issue is observed. It's caused by the list uninitialized when accessing. [ 80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750 [ 80.047300] #PF: supervisor write access in kernel mode [ 80.047351] #PF: error_code(0x0003) - permissions violation [ 80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061 [ 80.047477] Oops: 0003 [#1] SMP PTI [ 80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G OE 5.4.0-rc7-guchchen #1 [ 80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018 [ 80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu] Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu: Disable FRU read on ArcturusKent Russell1-3/+4
Update the list with supported Arcturus chips, but disable for now until final list is confirmed. Ideally we can poll atombios for FRU support, instead of maintaining this list of chips, but this will enable serial number reading for supported ASICs for the time-being. Signed-off-by: Kent Russell <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amd/powerplay: fix resume failed as smu table initialize early exitPrike Liang1-1/+6
When the amdgpu in the suspend/resume loop need notify the dpm disabled, otherwise the smu table will be uninitialize and result in resume failed. Signed-off-by: Prike Liang <[email protected]> Tested-by: Mengbing Wang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu/gmc: Fix spelling mistake.Rajneesh Bhardwaj1-6/+6
Fixes a minor typo in the file. Reviewed-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu: cache smu fw version infoJohn Clements5-12/+29
reduce cmd submission to smu by caching version info Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"Kent Russell1-26/+0
This reverts commit c12b84d6e0d70f1185e6daddfd12afb671791b6e. The original patch causes a RAS event and subsequent kernel hard-hang when running the KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and Arcturus dmesg output at hang time: [drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected! amdgpu 0000:67:00.0: GPU reset begin! Evicting PASID 0x8000 queues Started evicting pasid 0x8000 qcm fence wait loop timeout expired The cp might be in an unrecoverable state due to an unsuccessful queues preemption Failed to evict process queues Failed to suspend process 0x8000 Finished evicting pasid 0x8000 Started restoring pasid 0x8000 Finished restoring pasid 0x8000 [drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT amdgpu: [powerplay] Failed to send message 0x26, response 0x0 amdgpu: [powerplay] Failed to set soft min gfxclk ! amdgpu: [powerplay] Failed to upload DPM Bootup Levels! amdgpu: [powerplay] Failed to send message 0x7, response 0x0 amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features! amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features! amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM! [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -5 Signed-off-by: Kent Russell <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu/gfx9: add gfxoff quirkAlex Deucher1-0/+2
Fix screen corruption with firefox. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207171 Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu: set mp1 state before reloadJohn Clements2-7/+10
Set MP1 state to prepare for unload before reloading SMU FW Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu: update psp fw loading sequenceJohn Clements1-49/+77
Added dedicated function to check if particular fw should be skipped from loading. Added dedicated function for SMU FW loading via PSP Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amd/powerplay: update Arcturus smu-driver if headerEvan Quan2-3/+14
To fit the latest PMFW. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amd/powerplay: properly set the dpm_enabled stateEvan Quan5-66/+216
On the ASIC powered down(in baco or system suspend), the dpm_enabled will be set as false. Then all access (e.g. df state setting issued on RAS error event) to SMU will be blocked. Signed-off-by: Evan Quan <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amd/powerplay: correct i2c eeprom init/fini sequenceEvan Quan3-15/+17
As data transfer may starts immediately after i2c eeprom init completed. Thus i2c eeprom should be initialized after SMU ready. And i2c data transfer should be prohibited when SMU down. That is the i2c eeprom fini sequence needs to be updated also. Signed-off-by: Evan Quan <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amd/powerplay: bump the NAVI10 smu-driver if versionEvan Quan1-1/+1
To fit the latest SMC firmware 42.53 and eliminate the warning on driver loading. Signed-off-by: Evan Quan <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amd/powerplay: revise the way to retrieve the board parametersEvan Quan2-71/+130
It can support different NV1x ASIC better. And this can guard no member got missing. Signed-off-by: Evan Quan <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amdgpu: fix the hw hang during perform system reboot and resetPrike Liang1-0/+2
The system reboot failed as some IP blocks enter power gate before perform hw resource destory. Meanwhile use unify interface to set device CGPG to ungate state can simplify the amdgpu poweroff or reset ungate guard. Fixes: 487eca11a321ef ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)") Signed-off-by: Prike Liang <[email protected]> Tested-by: Mengbing Wang <[email protected]> Tested-by: Paul Menzel <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-22drm/amd/display: remove redundant assignment to variable dp_ref_clk_khzColin Ian King1-1/+1
The variable dp_ref_clk_khz is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/radeon: remove defined but not used variables in ci_dpm.cJason Yan1-14/+0
Fix the following gcc warning: drivers/gpu/drm/radeon/ci_dpm.c:82:36: warning: ‘defaults_saturn_pro’ defined but not used [-Wunused-const-variable=] static const struct ci_pt_defaults defaults_saturn_pro = ^~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/radeon/ci_dpm.c:68:36: warning: ‘defaults_bonaire_pro’ defined but not used [-Wunused-const-variable=] static const struct ci_pt_defaults defaults_bonaire_pro = ^~~~~~~~~~~~~~~~~~~~ Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/radeon: remove defined but not used 'dte_data_tahiti_le'Jason Yan1-18/+0
Fix the following gcc warning: drivers/gpu/drm/radeon/si_dpm.c:255:33: warning: ‘dte_data_tahiti_le’ defined but not used [-Wunused-const-variable=] static const struct si_dte_data dte_data_tahiti_le = Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amdgpu: remove dead code in si_dpm.cJason Yan1-20/+0
This code is dead, let's remove it. Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amd/amdgpu: remove hardcoded module name in printsAurabindo Pillai6-9/+9
Let format prefixes take care of printing the module name through pr_fmt and dev_fmt definitions. Signed-off-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amd/amdgpu: add print prefix for dev_* variantsAurabindo Pillai1-0/+6
Define dev_fmt macro for informative print messages Signed-off-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amd/amdgpu: add prefix for pr_* printsAurabindo Pillai1-0/+6
amdgpu uses lots of pr_* calls for printing error messages. With this prefix, errors shall be more obvious to the end use regarding its origin, and may help debugging. Prefix format: [xxx.xxxxx] amdgpu: ... Signed-off-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amd/display: code clean up in dce80_hw_sequencer.cJason Yan1-28/+0
Fix the following gcc warning: drivers/gpu/drm/amd/amdgpu/../display/dc/dce80/dce80_hw_sequencer.c:43:46: warning: ‘reg_offsets’ defined but not used [-Wunused-const-variable=] static const struct dce80_hw_seq_reg_offsets reg_offsets[] = { ^~~~~~~~~~~ Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amdgpu/ring: simplify scheduler setup logicAlex Deucher1-3/+1
Set up a GPU scheduler based on the ring flag rather than the ring type. Reviewed-by: Christian König <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amdgpu/kiq: add no_scheduler flag to KIQAlex Deucher1-0/+1
We don't want a GPU scheduler for this ring. Reviewed-by: Christian König <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amdgpu/ring: add no_scheduler flagAlex Deucher2-1/+3
This allows IPs to flag whether a specific ring requires a GPU scheduler or not. E.g., sometimes instances of an IP are asymmetric and have different capabilities. Reviewed-by: Christian König <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amdgpu/powerplay: get SMC FW size to a flexible wayLikun Gao2-2/+3
Get SMC fw size before backdoor loading instead of giving an certain value, as it may different for different ASIC. Signed-off-by: Likun Gao <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Reviewed-by: Evan Quan <[email protected]> Reviewed-by: Kevin Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amdgpu: fix wrong vram lost counter increment V2Evan Quan5-14/+18
Vram lost counter is wrongly increased by two during baco reset. V2: assumed vram lost for mode1 reset on all ASICs Signed-off-by: Evan Quan <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amdgpu: replace DRM prefix with PCI device info for GFX RASGuchun Chen1-20/+27
Prefix RAS message printing in GFX IP with PCI device info, which assists the debug in multiple GPU case. Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amdgpu: resume kiq access debugfsYintian Tao2-3/+11
If there is no GPU hang, user still can access debugfs through kiq. Signed-off-by: Yintian Tao <[email protected]> Reviewed-by: Monk Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amdgpu: refine ras related message printGuchun Chen3-29/+48
Prefix ras related kernel message logging with PCI device info by replacing DRM_INFO/WARN/ERROR with dev_info/warn/err. This can clearly tell user about GPU device information where ras is. And add some other ras message printing to make it more clear and friendly as well. Suggested-by: Hawking Zhang <[email protected]> Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amdgpu: add uncorrectable error count print in UMC ecc irq cbGuchun Chen1-0/+3
Uncorrectable error count printing is missed when issuing UMC UE injection. When going to the error count log function in GPU recover work thread, there is no chance to get correct error count value by last error injection and print, because the error status register is automatically cleared after reading in UMC ecc irq callback. So add such message printing in UMC ecc irq cb to be consistent with other RAS error interrupt cases. Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amd/powerplay: force the trim of the mclk dpm_levels if OD is enabledSergei Lopatin1-1/+4
Should prevent flicker if PP_OVERDRIVE_MASK is set. bug: https://bugs.freedesktop.org/show_bug.cgi?id=102646 bug: https://bugs.freedesktop.org/show_bug.cgi?id=108941 bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1088 bug: https://gitlab.freedesktop.org/drm/amd/-/issues/628 Signed-off-by: Sergei Lopatin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amd/display: Change "error" to "dc_log" at amdgpu_dm dpcd reading stageZhan Liu1-1/+1
[Why] If reading dpcd happens ahead of hw initialization, then aconnector is NULL at this point. This is expected, so there is no need to output an error (which will spam dmesg.log) [How] Change type of message from "error" to "DC_LOG_DC". Signed-off-by: Zhan Liu <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Acked-by: Zhan Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-13drm/amdgpu: restrict debugfs register access under SR-IOVYintian Tao4-6/+106
Under bare metal, there is no more else to take care of the GPU register access through MMIO. Under Virtualization, to access GPU register is implemented through KIQ during run-time due to world-switch. Therefore, under SR-IOV user can only access debugfs to r/w GPU registers when meets all three conditions below. - amdgpu_gpu_recovery=0 - TDR happened - in_gpu_reset=0 v2: merge amdgpu_virt_can_access_debugfs() into amdgpu_virt_enable_access_debugfs() v3: drop ret variable in amdgpu_virt_enable_access_debugfs() and directly return result Signed-off-by: Yintian Tao <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-09drm/amdgpu: increased atom cmd timeoutJohn Clements1-2/+5
added macro to define timeout Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-09drm/amd/powerplay: unload mp1 for Arcturus RAS baco resetEvan Quan1-0/+6
This sequence is recommended by PMFW team for the baco reset with PMFW reloaded. And it seems able to address the random failure seen on Arcturus. Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Reviewed-by: John Clements <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-09amdgpu_kms: Remove unnecessary condition checkAurabindo Pillai1-6/+4
Execution will only reach here if the asserted condition is true. Hence there is no need for the additional check. Signed-off-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-09drm/amdgpu/display: fix warning when compiling without debugfsAlex Deucher1-1/+1
fixes unused variable warning. Reported-by: Eric Biggers <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Mikita Lipski <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-09drm/amdgpu: unify fw_write_wait for new gfx9 asicsAaron Liu1-0/+2
Make the fw_write_wait default case true since presumably all new gfx9 asics will have updated firmware. That is using unique WAIT_REG_MEM packet with opration=1. Signed-off-by: Aaron Liu <[email protected]> Tested-by: Aaron Liu <[email protected]> Tested-by: Yuxian Dai <[email protected]> Acked-by: Alex Deucher <[email protected]> Acked-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-09drm/amdgpu: support access regs outside of mmio barHawking Zhang3-39/+28
add indirect access support to registers outside of mmio bar. Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-09drm/amdgpu: retire AMDGPU_REGS_KIQ flagHawking Zhang2-5/+4
all the register access through kiq is redirected to amdgpu_kiq_rreg/amdgpu_kiq_wreg Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-09drm/amdgpu: retire RREG32_IDX/WREG32_IDXHawking Zhang2-6/+2
those are not needed anymore Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-09drm/amdgpu: retire indirect mmio reg support from cgsHawking Zhang2-5/+4
not needed anymore Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>