aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2021-12-14Merge v5.16-rc5 into drm-nextDaniel Vetter7-14/+25
Thomas Zimmermann requested a fixes backmerge, specifically also for 96c5f82ef0a1 ("drm/vc4: fix error code in vc4_create_object()") Just a bunch of adjacent changes conflicts, even the big pile of them in vc4. Signed-off-by: Daniel Vetter <[email protected]>
2021-12-13drm:amdgpu:remove unneeded variablechiminghao2-8/+3
return value form directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <[email protected]> Signed-off-by: chiminghao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: re-format file header commentsIsabella Basso1-1/+1
Fix the warning below: warning: Cannot understand * \file amdgpu_ioc32.c on line 2 - I thought it was a doc line Changes since v1: - As suggested by Alexander Deucher: 1. Reduce diff to minimum as this DOC section doesn't provide much value. Signed-off-by: Isabella Basso <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: fix amdgpu_ras_mca_query_error_status scopeIsabella Basso1-3/+3
This commit fixes the compile-time warning below: warning: no previous prototype for ‘amdgpu_ras_mca_query_error_status’ [-Wmissing-prototypes] Changes since v1: - As suggested by Alexander Deucher: 1. Make function static instead of adding prototype. Signed-off-by: Isabella Basso <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: Reduce SG bo memory usage for mGPUsPhilip Yang1-4/+6
For userptr bo, if adev is not in IOMMU isolation mode, RAM direct map to GPU, multiple GPUs use same system memory dma mapping address, they can share the original mem->bo in attachment to reduce dma address array memory usage. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: Detect if amdgpu in IOMMU direct map modePhilip Yang2-0/+21
If host and amdgpu IOMMU is not enabled or IOMMU is pass through mode, set adev->ram_is_direct_mapped flag which will be used to optimize memory usage for multi GPU mappings. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: add support for SMU debug optionLang Yu1-0/+3
SMU firmware expects the driver maintains error context and doesn't interact with SMU any more when SMU errors occurred. That will aid in debugging SMU firmware issues. Add SMU debug option support for this request, it can be enabled or disabled via amdgpu_smu_debug debugfs file. Use a 32-bit mask to indicate corresponding debug modes. Currently, only one mode(HALT_ON_ERROR) is supported. When enabled, it brings hardware to a kind of halt state so that no one can touch it any more in the envent of SMU errors. The dirver interacts with SMU via sending messages. And threre are three ways to sending messages to SMU in current implementation. Handle them respectively as following: 1, smu_cmn_send_smc_msg_with_param() for normal timeout cases Halt on any error. 2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response() for longer timeout cases Halt on errors apart from ETIME. Otherwise this way won't work. Let the user handle ETIME error in such a case. 3, smu_cmn_send_msg_without_waiting() for no waiting cases Halt on errors apart from ETIME. Otherwise second way won't work. == Command Guide == 1, enable HALT_ON_ERROR mode # echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug 2, disable HALT_ON_ERROR mode # echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug v5: - Use bit mask to allow more debug features.(Evan) - Use WRAN() instead of BUG().(Evan) v4: - Set to halt state instead of a simple hang.(Christian) v3: - Use debugfs_create_bool().(Christian) - Put variable into smu_context struct. - Don't resend command when timeout. v2: - Resend command when timeout.(Lijo) - Use debugfs file instead of module parameter. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: introduce a kind of halt state for amdgpu deviceLang Yu2-0/+41
It is useful to maintain error context when debugging SW/FW issues. Introduce amdgpu_device_halt() for this purpose. It will bring hardware to a kind of halt state, so that no one can touch it any more. Compare to a simple hang, the system will keep stable at least for SSH access. Then it should be trivial to inspect the hardware state and see what's going on. v2: - Set adev->no_hw_access earlier to avoid potential crashes.(Christian) Suggested-by: Christian Koenig <[email protected]> Suggested-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Christian Koenig <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: check df_funcs and its callback pointersHawking Zhang4-6/+38
in case they are not avaiable in early phase Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Le Ma <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: don't override default ECO_BITs settingHawking Zhang8-9/+0
Leave this bit as hardware default setting Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORELe Ma1-2/+2
should count on GC IP base address Signed-off-by: Le Ma <[email protected]> Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: read and authenticate ip discovery binaryHawking Zhang1-23/+23
read and authenticate ip discovery binary getting from vram first, if it is not valid, read and authenticate the one getting from file Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: add helper to verify ip discovery binary signatureHawking Zhang1-0/+8
To be used to check ip discovery binary signature Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: rename discovery_read_binary helperHawking Zhang1-2/+2
add _from_vram in the funciton name to diffrentiate the one used to read from file Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: add helper to load ip_discovery binary from fileHawking Zhang1-1/+30
To be used when ip_discovery binary is not carried by vbios Signed-off-by: Hawking Zhang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: fix incorrect VCN revision in SRIOVLeslie Shi4-34/+14
Guest OS will setup VCN instance 1 which is disabled as an enabled instance and execute initialization work on it, but this causes VCN ib ring test failure on the disabled VCN instance during modprobe: amdgpu 0000:00:08.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 5 on hub 1 amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_dec_0 (-110). amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc_0.0 (-110). [drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110). v2: drop amdgpu_discovery_get_vcn_version and rename sriov_config to vcn_config v3: modify VCN's revision in SR-IOV and bare-metal Fixes: baf3f8f3740625 ("drm/amdgpu: handle SRIOV VCN revision parsing") Signed-off-by: Leslie Shi <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: add modifiers in amdgpu_vkms_plane_init()Leslie Shi1-1/+2
Fix following warning in SRIOV during modprobe: amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu] Signed-off-by: Leslie Shi <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: only hw fini SMU fisrt for ASICs need thatLang Yu1-15/+32
We found some headaches on ASICs don't need that, so remove that for them. Suggested-by: Lijo Lazar <[email protected]> Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Kevin Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: Handle fault with same timestampPhilip Yang2-5/+2
Remove not unique timestamp WARNING as same timestamp interrupt happens on some chips, Drain fault need to wait for the processed_timestamp to be truly greater than the checkpoint or the ring to be empty to be sure no stale faults are handled. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1818 Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: fix location of prototype for amdgpu_kms_compat_ioctlIsabella Basso2-2/+3
This fixes the warning below by changing the prototype to a location that's actually included by the .c files that call amdgpu_kms_compat_ioctl: warning: no previous prototype for ‘amdgpu_kms_compat_ioctl’ [-Wmissing-prototypes] 37 | long amdgpu_kms_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) | ^~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Isabella Basso <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amd: append missing includesIsabella Basso1-0/+1
This fixes warnings caused by global functions lacking prototypes:, such as: warning: no previous prototype for 'dcn303_hw_sequencer_construct' [-Wmissing-prototypes] 12 | void dcn303_hw_sequencer_construct(struct dc *dc) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... warning: no previous prototype for ‘amdgpu_has_atpx’ [-Wmissing-prototypes] 76 | bool amdgpu_has_atpx(void) { | ^~~~~~~~~~~~~~~ Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Isabella Basso <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: fix function scopesIsabella Basso1-2/+2
This turns previously global functions into static, thus removing compile-time warnings such as: warning: no previous prototype for 'amdgpu_vkms_output_init' [-Wmissing-prototypes] 399 | int amdgpu_vkms_output_init(struct drm_device *dev, | ^~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Isabella Basso <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amd: fix improper docstring syntaxIsabella Basso10-16/+13
This fixes various warnings relating to erroneous docstring syntax, of which some are listed below: warning: Function parameter or member 'adev' not described in 'amdgpu_atomfirmware_ras_rom_addr' ... warning: expecting prototype for amdgpu_atpx_validate_functions(). Prototype was for amdgpu_atpx_validate() instead ... warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new' ... warning: Cannot understand * @kfd_get_cu_occupancy - Collect number of waves in-flight on this device ... warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Signed-off-by: Isabella Basso <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: extended waiting SRIOV VF reset completion timeout to 10sZhigang Luo1-1/+1
For the ASIC has big FB, it need more time to clear FB during reset. This change extended SRIOV VF waiting reset completion timeout from 5s to 10s. Signed-off-by: Zhigang Luo <[email protected]> Acked-by: Shaoyun Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: recover XGMI topology for SRIOV VF after resetZhigang Luo1-3/+14
For SRIOV VF, the XGMI topology was not recovered after reset. This change added code to SRIOV VF reset function to update XGMI topology for SRIOV VF after reset. Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Shaoyun Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: added PSP XGMI initialization for SRIOV VF during recoverZhigang Luo1-0/+12
For SRIOV VF, XGMI was not initialized in PSP during recover. This change added PSP XGMI initialization for SRIOV VF during recover. Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Shaoyun Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: skip reset other device in the same hive if it's SRIOV VFZhigang Luo1-3/+4
On SRIOV, host driver can support FLR(function level reset) on individual VF within the hive which might bring the individual device back to normal without the necessary to execute the hive reset. If the FLR failed , host driver will trigger the hive reset, each guest VF will get reset notification before the real hive reset been executed. The VF device can handle the reset request individually in it's reset work handler. This change updated gpu recover sequence to skip reset other device in the same hive for SRIOV VF. Signed-off-by: Zhigang Luo <[email protected]> Reviewed-by: Shaoyun Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-13drm/amdgpu: enable RAS poison flag when GPU is connected to CPUTao Zhou1-1/+5
The RAS poison mode is enabled by default on the platform. Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-10Merge tag 'amd-drm-next-5.17-2021-12-02' of ↵Dave Airlie42-1180/+961
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.17-2021-12-02: amdgpu: - Use generic drm fb helpers - PSR fixes - Rework DCN3.1 clkmgr - DPCD 1.3 fixes - Misc display fixes can cleanups - Clock query fixes for APUs - LTTPR fixes - DSC fixes - Misc PM fixes - RAS fixes - OLED backlight fix - SRIOV fixes - Add STB (Smart Trace Buffer) for supported dGPUs - IH rework - Enable seamless boot for DCN3.01 amdkfd: - Rework more stuff around IP discovery enumeration - Further clean up of interfaces with amdgpu - SVM fixes radeon: - Indentation fixes UAPI: - Add a new KFD header that defines some of the sysfs bitfields and enums that userspace has been using for a while The corresponding bit-fields and enums in user mode are defined in https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/include/hsakmttypes.h Signed-off-by: Dave Airlie <[email protected]> # Conflicts: # drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-12-09drm/amdgpu: don't skip runtime pm get on A+A configChristian König1-3/+0
The runtime PM get was incorrectly added after the check. Signed-off-by: Christian König <[email protected]> Acked-by: Evan Quan <[email protected]> Acked-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-12-09Merge tag 'drm-misc-next-2021-11-29' of ↵Daniel Vetter2-13/+1
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.17: UAPI Changes: Cross-subsystem Changes: * Move 'nomodeset' kernel boot option into DRM subsystem Core Changes: * Replace several DRM_*() logging macros with drm_*() equivalents * panel: Add quirk for Lenovo Yoga Book X91F/L * ttm: Documentation fixes Driver Changes: * Cleanup nomodeset handling in drivers * Fixes * bridge/anx7625: Fix reading EDID; Fix error code * bridge/megachips: Probe both bridges before registering * vboxvideo: Fix ERR_PTR usage Signed-off-by: Daniel Vetter <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-12-07drm/amdgpu: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmiClaudio Suarez4-12/+12
Once EDID is parsed, the monitor HDMI support information is available through drm_display_info.is_hdmi. The amdgpu driver still calls drm_detect_hdmi_monitor() to retrieve the same information, which is less efficient. Change to drm_display_info.is_hdmi This is a TODO task in Documentation/gpu/todo.rst Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Claudio Suarez <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-07drm/amdgpu: update drm_display_info correctly when the edid is readClaudio Suarez1-1/+4
drm_display_info is updated by drm_get_edid() or drm_connector_update_edid_property(). In the amdgpu driver it is almost always updated when the edid is read in amdgpu_connector_get_edid(), but not always. Change amdgpu_connector_get_edid() and amdgpu_connector_free_edid() to keep drm_display_info updated. Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Claudio Suarez <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-07drm/amdgpu: skip umc ras error count harvestStanley.Yang1-5/+10
remove in recovery stat check, skip umc ras err cnt harvest in amdgpu_ras_log_on_err_counter Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-07drm/amdgpu: free vkms_output after useFlora Cui1-7/+9
Signed-off-by: Flora Cui <[email protected]> Reviewed-by: Leslie Shi <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-07drm/amdgpu: drop the critial WARN_ON in amdgpu_vkmsFlora Cui1-1/+2
Signed-off-by: Flora Cui <[email protected]> Reviewed-by: Leslie Shi <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-07drm/amdgpu: only skip get ecc info for aldebaranStanley.Yang1-1/+2
skip get ecc info for aldebarn through check ip version do not affect other asic type Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-02drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode()Zhou Qingyang1-0/+6
In amdgpu_connector_lcd_native_mode(), the return value of drm_mode_duplicate() is assigned to mode, and there is a dereference of it in amdgpu_connector_lcd_native_mode(), which will lead to a NULL pointer dereference on failure of drm_mode_duplicate(). Fix this bug add a check of mode. This bug was found by a static analyzer. The analysis employs differential checking to identify inconsistent security operations (e.g., checks or kfrees) between two code paths and confirms that the inconsistent operations are not recovered in the current function or the callers, so they constitute bugs. Note that, as a bug found by static analysis, it can be a false positive or hard to trigger. Multiple researchers have cross-reviewed the bug. Builds with CONFIG_DRM_AMDGPU=m show no new warnings, and our static analyzer no longer warns about this code. Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)") Signed-off-by: Zhou Qingyang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-02drm/amdgpu: handle SRIOV VCN revision parsingAlex Deucher4-7/+15
For SR-IOV, the IP discovery revision number encodes additional information. Handle that case here. v2: drop additional IP versions Reviewed-by: Guchun Chen <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-02drm/amdgpu: skip query ecc info in gpu recoveryStanley.Yang1-0/+4
this is a workaround due to get ecc info failed during gpu recovery [ 700.236122] amdgpu 0000:09:00.0: amdgpu: Failed to export SMU ecc table! [ 700.236128] amdgpu 0000:09:00.0: amdgpu: GPU reset begin! [ 704.331171] amdgpu: qcm fence wait loop timeout expired [ 704.331194] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption [ 704.332445] amdgpu 0000:09:00.0: amdgpu: GPU reset begin! [ 704.332448] amdgpu 0000:09:00.0: amdgpu: Bailing on TDR for s_job:ffffffffffffffff, as another already in progress [ 704.332456] amdgpu: Pasid 0x8000 destroy queue 0 failed, ret -62 [ 710.360924] amdgpu 0000:09:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000013 SMN_C2PMSG_82:0x00000007 [ 710.360964] amdgpu 0000:09:00.0: amdgpu: Failed to disable smu features. [ 710.361002] amdgpu 0000:09:00.0: amdgpu: Fail to disable dpm features! [ 710.361014] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62 Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-01drm/amdgpu: adjust the kfd reset sequence in reset sriov functionshaoyunl1-4/+8
This change revert previous commits: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") 271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function") This change moves the amdgpu_amdkfd_pre_reset to an earlier place in amdgpu_device_reset_sriov, presumably to address the sequence issue that the first patch was originally meant to fix. Some register access(GRBM_GFX_CNTL) only be allowed on full access mode. Move kfd_pre_reset and kfd_post_reset back inside reset_sriov function. Fixes: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") Fixes: 271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function") Signed-off-by: shaoyunl <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-01drm/amdkfd: fix double free mem structurePhilip Yang1-3/+5
drm_gem_object_put calls release_notify callback to free the mem structure and unreserve_mem_limit, move it down after the last access of mem and make it conditional call. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-01drm/amdgpu: Don't halt RLC on GFX suspendLijo Lazar1-3/+4
On aldebaran, RLC also controls GFXCLK. Skip halting RLC during GFX IP suspend and keep it running till PMFW disables all DPMs. [ 578.019986] amdgpu 0000:23:00.0: amdgpu: GPU reset begin! [ 583.245566] amdgpu 0000:23:00.0: amdgpu: Failed to disable smu features. [ 583.245621] amdgpu 0000:23:00.0: amdgpu: Fail to disable dpm features! [ 583.245639] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62 [ 583.248504] [drm] free PSP TMR buffer Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-01drm/amdgpu: fix the missed handling for SDMA2 and SDMA3Guchun Chen1-0/+2
There is no base reg offset or ip_version set for SDMA2 and SDMA3 on SIENNA_CICHLID, so add them. Signed-off-by: Guchun Chen <[email protected]> Reviewed-by: Kevin Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-01drm/amdgpu: check atomic flag to differeniate with legacy pathFlora Cui1-2/+2
since vkms support atomic KMS interface Signed-off-by: Flora Cui <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-01drm/amdgpu: cancel the correct hrtimer on exitFlora Cui1-2/+2
Signed-off-by: Flora Cui <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-01drm/amdgpu/sriov/vcn: add new vcn ip revision check case for SIENNA_CICHLIDJane Jian3-0/+3
[WHY] for sriov odd# vf will modify vcn0 engine ip revision(due to multimedia bandwidth feature), which will be mismatched with original vcn0 revision [HOW] add new version check for vcn0 disabled revision(3, 0, 192), typically modified under sriov mode Signed-off-by: Jane Jian <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-01drm/amdgpu: update fw_load_type module parameter doc to match codeYann Dirson1-2/+5
amdgpu_ucode_get_load_type() does not interpret this parameter as documented. It is ignored for many ASIC types (which presumably only support one load_type), and when not ignored it is only used to force direct loading instead of PSP loading. SMU loading is only available for ASICs for which the parameter is ignored. Signed-off-by: Yann Dirson <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-01drm/amdkfd: err_pin_bo path leaks kfd_bo_listPhilip Yang1-8/+6
Refactor userptr and pin_bo path to make it less confusing, move err_pin_bo label up to remove mem from process_info kfd_bo_list. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-12-01drm/amdgpu: adjust the kfd reset sequence in reset sriov functionshaoyunl1-4/+8
This change revert previous commits: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") 271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function") This change moves the amdgpu_amdkfd_pre_reset to an earlier place in amdgpu_device_reset_sriov, presumably to address the sequence issue that the first patch was originally meant to fix. Some register access(GRBM_GFX_CNTL) only be allowed on full access mode. Move kfd_pre_reset and kfd_post_reset back inside reset_sriov function. Fixes: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") Fixes: 271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function") Signed-off-by: shaoyunl <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>