aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
AgeCommit message (Collapse)AuthorFilesLines
2022-01-19drm/amdgpu: modify a pair of functions for the pcie port wreg/rregXiaojian Du1-0/+33
This patch will modify a pair of functions for pcie port wreg/rreg. AMD GPU have had an independent NBIO block from SOC15 arch. If the dirver wants to read/write the address space of the pcie devices, it has to go through the NBIO block. This patch will move the pcie port wreg/rreg functions to "amdgpu_device.c", so that to reuse the functions on the future GPU ASICs. Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-18drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOVJingwen Chen1-1/+1
[Why] This fixes 892deb48269c ("drm/amdgpu: Separate vf2pf work item init from virt data exchange"). we should read pf2vf data based at mman.fw_vram_usage_va after gmc sw_init. commit 892deb48269c breaks this logic. [How] calling amdgpu_virt_exchange_data in amdgpu_virt_init_data_exchange to set the right base in the right sequence. v2: call amdgpu_virt_init_data_exchange after gmc sw_init to make data exchange workqueue run v3: clean up the code logic v4: add some comment and make the code more readable Fixes: 892deb48269c ("drm/amdgpu: Separate vf2pf work item init from virt data exchange") Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Reviewed-by: Horace Chen <horace.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14drm/amdgpu: invert the logic in amdgpu_device_should_recover_gpu()Alex Deucher1-27/+17
Rather than opting into GPU recovery support, default to on, and opt out if it's not working on a particular GPU. This avoids the need to add new asics to this list since this is a core feature. Reviewed-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14drm/amdgpu: Enable recovery on yellow carpCHANDAN VURDIGERE NATARAJ1-0/+1
Add yellow carp to devices which support recovery Signed-off-by: CHANDAN VURDIGERE NATARAJ <chandan.vurdigerenataraj@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14drm/amdgpu: clean up some inconsistent indentingYang Li1-1/+1
Eliminate the follow smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3504 amdgpu_device_init() warn: inconsistent indenting drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1716 amdgpu_ras_error_status_query() warn: if statement not indented drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1058 amdgpu_ras_error_inject() warn: inconsistent indenting Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14drm/amdgpu: Modify mmhub block to fit for the unified ras block data and opsyipechai1-6/+6
1.Modify mmhub block to fit for the unified ras block data and ops. 2.Change amdgpu_mmhub_ras_funcs to amdgpu_mmhub_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of mmhub ras variable so that mmhub ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register mmhub ras block into amdgpu device ras block link list. 5.Remove the redundant code about mmhub in amdgpu_ras.c after using the unified ras block. 5.Remove the redundant code about mmhub in amdgpu_ras.c after using the unified ras block. 6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of mmhub versions. If .ras_late_init and .ras_fini had been defined by the selected mmhub version, the defined functions will take effect; if not defined, default fill them with amdgpu_mmhub_ras_late_init and amdgpu_mmhub_ras_fini. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14drm/amdgpu: Unify ras block interface for each ras blockyipechai1-0/+2
1. Define unified ops interface for each block. 2. Add ras_block_match function pointer in ops interface, each ras block can customize specail match function to identify itself. 3. Add amdgpu_ras_block_match_default new function. If a ras block doesn't define .ras_block_match, default execute amdgpu_ras_block_match_default to identify this ras block. 4. Define unified basic ras block data for each ras block. 5. Create dedicated amdgpu device ras block link list to manage all of the ras blocks. 6. Add amdgpu_ras_register_ras_block new function interface for each ras block to register itself to ras controlling block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14drm/amd/pm: do not expose implementation details to other blocks out of powerEvan Quan1-3/+3
Those implementation details(whether swsmu supported, some ppt_funcs supported, accessing internal statistics ...)should be kept internally. It's not a good practice and even error prone to expose implementation details. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-11drm/amdgpu: not return error on the init_apu_flagsPrike Liang1-4/+2
In some APU project we needn't always assign flags to identify each other, so we may not need return an error. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-11drm/amd/amdgpu: Add pcie indirect support to amdgpu_mm_wreg_mmio_rlc()Tom St Denis1-1/+3
The function amdgpu_mm_wreg_mmio_rlc() is used by debugfs to write to MMIO registers. It didn't support registers beyond the BAR mapped MMIO space. This adds pcie indirect write support. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-11drm/amdgpu: recover gart table at resumeNirmoy Das1-11/+0
Get rid off pin/unpin of gart BO at resume/suspend and instead pin only once and try to recover gart content at resume time. This is much more stable in case there is OOM situation at 2nd call to amdgpu_device_evict_resources() while evicting GART table. v3: remove gart recovery from other places v2: pin gart at amdgpu_gart_table_vram_alloc() Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-11drm/amdgpu: do not pass ttm_resource_manager to gtt_mgrNirmoy Das1-2/+2
Do not allow exported amdgpu_gtt_mgr_*() to accept any ttm_resource_manager pointer. Also there is no need to force other module to call a ttm function just to eventually call gtt_mgr functions. v4: remove unused adev. v3: upcast mgr from ttm resopurce manager instead of getting it from adev. v2: pass adev's gtt_mgr instead of adev. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-11drm/amdgpu: Unmap MMIO mappings when device is not unpluggedLeslie Shi1-0/+11
Patch: 3efb17ae7e92 ("drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure") makes call to amdgpu_device_unmap_mmio() conditioned on device unplugged. This patch unmaps MMIO mappings even when device is not unplugged. v2: Add condition of drm_dev_enter() to deleted unmaps in patch "drm/amdgpu: Unmap all MMIO mappings" Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-07drm/amdgpu: explicitly check for s0ix when evicting resourcesMario Limonciello1-2/+2
This codepath should be running in both s0ix and s3, but only does currently because s3 and s0ix are both set in the s0ix case. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-31Merge tag 'amd-drm-next-5.17-2021-12-30' of ↵Dave Airlie1-17/+29
ssh://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.17-2021-12-30: amdgpu: - Suspend/resume fixes - Fence fix - Misc code cleanups - IP discovery fixes - SRIOV fixes - RAS fixes - GMC 8 VRAM detection fix - FRU fixes for Aldebaran - Display fixes amdkfd: - SVM fixes - IP discovery fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211230141032.613596-1-alexander.deucher@amd.com
2021-12-30drm/amdgpu: no DC support for headless chipsAlex Deucher1-0/+6
Chips with no display hardware should return false for DC support. v2: drop Arcturus and Aldebaran Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support") Reviewed-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reported-by: Tareque Md.Hanif <tarequemd.hanif@yahoo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30drm/amdgpu: Check the memory can be accesssed by ttm_device_clear_dma_mappings.Surbhi Kakarya1-1/+2
If the event guard is enabled and VF doesn't receive an ack from PF for full access, the guest driver load crashes. This is caused due to the call to ttm_device_clear_dma_mappings with non-initialized mman during driver tear down. This patch adds the necessary condition to check if the mman initialization passed or not and takes the path based on the condition output. Signed-off-by: Surbhi Kakarya <Surbhi.Kakarya@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to ↵Leslie Shi1-1/+3
prevent crash in GPU initialization failure [Why] In amdgpu_driver_load_kms, when amdgpu_device_init returns error during driver modprobe, it will start the error handle path immediately and call into amdgpu_device_unmap_mmio as well to release mapped VRAM. However, in the following release callback, driver stills visits the unmapped memory like vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs. [How] call amdgpu_device_unmap_mmio() if device is unplugged to prevent invalid memory address in vcn_v3_0_sw_fini() when GPU initialization failure. Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handlingsashank saye1-5/+4
For Aldebaran chip passthrough case we need to intimate SMU about special handling for SBR.On older chips we send LightSBR to SMU, enabling the same for Aldebaran. Slight difference, compared to previous chips, is on Aldebaran, SMU would do a heavy reset on SBR. Hence, the word Heavy instead of Light SBR is used for SMU to differentiate. Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: sashank saye <sashank.saye@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28drm/amdgpu: get xgmi info before ip_initVictor Skvortsov1-0/+7
Driver needs to call get_xgmi_info() before ip_init to determine whether it needs to handle a pending hive reset. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: David Nieto <david.nieto@amd.com> Reviewed by: shaoyun.liu <Shaoyun.lui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-23Merge tag 'amd-drm-next-5.17-2021-12-16' of ↵Dave Airlie1-27/+118
https://gitlab.freedesktop.org/agd5f/linux into drm-next amdgpu: - Add some display debugfs entries - RAS fixes - SR-IOV fixes - W=1 fixes - Documentation fixes - IH timestamp fix - Misc power fixes - IP discovery fixes - Large driver documentation updates - Multi-GPU memory use reductions - Misc display fixes and cleanups - Add new SMU debug option amdkfd: - SVM fixes radeon: - Fix typo in comment From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211216202731.5900-1-alexander.deucher@amd.com
2021-12-16drm/amdgpu: Separate vf2pf work item init from virt data exchangeVictor Skvortsov1-1/+5
We want to be able to call virt data exchange conditionally after gmc sw init to reserve bad pages as early as possible. Since this is a conditional call, we will need to call it again unconditionally later in the init sequence. Refactor the data exchange function so it can be called multiple times without re-initializing the work item. v2: Cleaned up the code. Kept the original call to init_exchange_data() inside early init to initialize the work item, afterwards call exchange_data() when needed. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed By: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fenceHuang Rui1-9/+2
The job embedded fence donesn't initialize the flags at dma_fence_init(). Then we will go a wrong way in amdgpu_fence_get_timeline_name callback and trigger a null pointer panic once we enabled the trace event here. So introduce new amdgpu_fence object to indicate the job embedded fence. [ 156.131790] BUG: kernel NULL pointer dereference, address: 00000000000002a0 [ 156.131804] #PF: supervisor read access in kernel mode [ 156.131811] #PF: error_code(0x0000) - not-present page [ 156.131817] PGD 0 P4D 0 [ 156.131824] Oops: 0000 [#1] PREEMPT SMP PTI [ 156.131832] CPU: 6 PID: 1404 Comm: sdma0 Tainted: G OE 5.16.0-rc1-custom #1 [ 156.131842] Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016 [ 156.131848] RIP: 0010:strlen+0x0/0x20 [ 156.131859] Code: 89 c0 c3 0f 1f 80 00 00 00 00 48 01 fe eb 0f 0f b6 07 38 d0 74 10 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 c3 48 89 f8 c3 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31 [ 156.131872] RSP: 0018:ffff9bd0018dbcf8 EFLAGS: 00010206 [ 156.131880] RAX: 00000000000002a0 RBX: ffff8d0305ef01b0 RCX: 000000000000000b [ 156.131888] RDX: ffff8d03772ab924 RSI: ffff8d0305ef01b0 RDI: 00000000000002a0 [ 156.131895] RBP: ffff9bd0018dbd60 R08: ffff8d03002094d0 R09: 0000000000000000 [ 156.131901] R10: 000000000000005e R11: 0000000000000065 R12: ffff8d03002094d0 [ 156.131907] R13: 000000000000001f R14: 0000000000070018 R15: 0000000000000007 [ 156.131914] FS: 0000000000000000(0000) GS:ffff8d062ed80000(0000) knlGS:0000000000000000 [ 156.131923] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 156.131929] CR2: 00000000000002a0 CR3: 000000001120a005 CR4: 00000000003706e0 [ 156.131937] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 156.131942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 156.131949] Call Trace: [ 156.131953] <TASK> [ 156.131957] ? trace_event_raw_event_dma_fence+0xcc/0x200 [ 156.131973] ? ring_buffer_unlock_commit+0x23/0x130 [ 156.131982] dma_fence_init+0x92/0xb0 [ 156.131993] amdgpu_fence_emit+0x10d/0x2b0 [amdgpu] [ 156.132302] amdgpu_ib_schedule+0x2f9/0x580 [amdgpu] [ 156.132586] amdgpu_job_run+0xed/0x220 [amdgpu] v2: fix mismatch warning between the prototype and function name (Ray, kernel test robot) Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14amdgpu: fix some kernel-doc markupYann Dirson1-3/+3
Those are not today pulled by the sphinx doc, but better be ready. Signed-off-by: Yann Dirson <ydirson@free.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14drm/amdgpu: use adev_to_drm to get drm_device pointerGuchun Chen1-1/+1
Updated for consistency when accessing drm_device from amdgpu driver. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14Merge v5.16-rc5 into drm-nextDaniel Vetter1-6/+9
Thomas Zimmermann requested a fixes backmerge, specifically also for 96c5f82ef0a1 ("drm/vc4: fix error code in vc4_create_object()") Just a bunch of adjacent changes conflicts, even the big pile of them in vc4. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-12-13drm/amdgpu: Detect if amdgpu in IOMMU direct map modePhilip Yang1-0/+19
If host and amdgpu IOMMU is not enabled or IOMMU is pass through mode, set adev->ram_is_direct_mapped flag which will be used to optimize memory usage for multi GPU mappings. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: introduce a kind of halt state for amdgpu deviceLang Yu1-0/+39
It is useful to maintain error context when debugging SW/FW issues. Introduce amdgpu_device_halt() for this purpose. It will bring hardware to a kind of halt state, so that no one can touch it any more. Compare to a simple hang, the system will keep stable at least for SSH access. Then it should be trivial to inspect the hardware state and see what's going on. v2: - Set adev->no_hw_access earlier to avoid potential crashes.(Christian) Suggested-by: Christian Koenig <christian.koenig@amd.com> Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Christian Koenig <christian.koenig@amd.co> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: only hw fini SMU fisrt for ASICs need thatLang Yu1-15/+32
We found some headaches on ASICs don't need that, so remove that for them. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Kevin Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amd: fix improper docstring syntaxIsabella Basso1-2/+2
This fixes various warnings relating to erroneous docstring syntax, of which some are listed below: warning: Function parameter or member 'adev' not described in 'amdgpu_atomfirmware_ras_rom_addr' ... warning: expecting prototype for amdgpu_atpx_validate_functions(). Prototype was for amdgpu_atpx_validate() instead ... warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new' ... warning: Cannot understand * @kfd_get_cu_occupancy - Collect number of waves in-flight on this device ... warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: recover XGMI topology for SRIOV VF after resetZhigang Luo1-3/+14
For SRIOV VF, the XGMI topology was not recovered after reset. This change added code to SRIOV VF reset function to update XGMI topology for SRIOV VF after reset. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: skip reset other device in the same hive if it's SRIOV VFZhigang Luo1-3/+4
On SRIOV, host driver can support FLR(function level reset) on individual VF within the hive which might bring the individual device back to normal without the necessary to execute the hive reset. If the FLR failed , host driver will trigger the hive reset, each guest VF will get reset notification before the real hive reset been executed. The VF device can handle the reset request individually in it's reset work handler. This change updated gpu recover sequence to skip reset other device in the same hive for SRIOV VF. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01drm/amdgpu: adjust the kfd reset sequence in reset sriov functionshaoyunl1-4/+8
This change revert previous commits: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") 271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function") This change moves the amdgpu_amdkfd_pre_reset to an earlier place in amdgpu_device_reset_sriov, presumably to address the sequence issue that the first patch was originally meant to fix. Some register access(GRBM_GFX_CNTL) only be allowed on full access mode. Move kfd_pre_reset and kfd_post_reset back inside reset_sriov function. Fixes: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") Fixes: 271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function") Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01drm/amdgpu: check atomic flag to differeniate with legacy pathFlora Cui1-2/+2
since vkms support atomic KMS interface Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Alex Deucher <aleander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01drm/amdgpu: adjust the kfd reset sequence in reset sriov functionshaoyunl1-4/+8
This change revert previous commits: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") 271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function") This change moves the amdgpu_amdkfd_pre_reset to an earlier place in amdgpu_device_reset_sriov, presumably to address the sequence issue that the first patch was originally meant to fix. Some register access(GRBM_GFX_CNTL) only be allowed on full access mode. Move kfd_pre_reset and kfd_post_reset back inside reset_sriov function. Fixes: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") Fixes: 271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function") Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01drm/amdgpu: fix disable ras feature failed when unload drvier v2Stanley.Yang1-2/+3
v2: still need call ras_disable_all_featrures to handle ras initilization failure case. Function amdgpu_device_fini_hw is called before amdgpu_device_fini_sw, so ras ta will unload before send ras disable command, ras dsiable operation must before hw fini. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01drm/amdgpu: check atomic flag to differeniate with legacy pathFlora Cui1-2/+2
since vkms support atomic KMS interface Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Alex Deucher <aleander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu: move kfd post_reset out of reset_sriov functionshaoyunl1-4/+3
Fixes: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") For sriov XGMI configuration, the host driver will handle the hive reset, so in guest side, the reset_sriov only be called once on one device. This will make kfd post_reset unblanced with kfd pre_reset since kfd pre_reset already been moved out of reset_sriov function. Move kfd post_reset out of reset_sriov function to make them balance . Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu: move kfd post_reset out of reset_sriov functionshaoyunl1-4/+3
Fixes: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") For sriov XGMI configuration, the host driver will handle the hive reset, so in guest side, the reset_sriov only be called once on one device. This will make kfd post_reset unblanced with kfd pre_reset since kfd pre_reset already been moved out of reset_sriov function. Move kfd post_reset out of reset_sriov function to make them balance . Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22drm/amd/pm: avoid duplicate powergate/ungate settingEvan Quan1-0/+3
Just bail out if the target IP block is already in the desired powergate/ungate state. This can avoid some duplicate settings which sometimes may cause unexpected issues. Link: https://lore.kernel.org/all/YV81vidWQLWvATMM@zn.tnic/ Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214921 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215025 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1789 Signed-off-by: Evan Quan <evan.quan@amd.com> Tested-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amd/pm: avoid duplicate powergate/ungate settingEvan Quan1-0/+3
Just bail out if the target IP block is already in the desired powergate/ungate state. This can avoid some duplicate settings which sometimes may cause unexpected issues. Link: https://lore.kernel.org/all/YV81vidWQLWvATMM@zn.tnic/ Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214921 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215025 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1789 Fixes: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend") Signed-off-by: Evan Quan <evan.quan@amd.com> Tested-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-11-17drm/amdgpu: use generic fb helpers instead of setting up AMD own's.Evan Quan1-8/+4
With the shadow buffer support from generic framebuffer emulation, it's possible now to have runpm kicked when no update for console. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-09drm/amd/amdgpu: fix the kfd pre_reset sequence in sriovshaoyunl1-4/+1
The KFD pre_reset should be called before reset been executed, it will hold the lock to prevent other rocm process to sent the packlage to hiq during host execute the real reset on the HW Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05drm/amdgpu: fix SI handling in amdgpu_device_asic_has_dc_support()Alex Deucher1-1/+11
Properly handle SI DC support when CONFIG_DRM_AMD_DC_SI is not set. Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support") Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-03drm/amdgpu: remove duplicated kfd_resume_iommuJames Zhu1-4/+0
Remove duplicated kfd_resume_iommu which already runs in mdgpu_amdkfd_device_init. Tested-By: Ken Moffat <zarniwhoop@ntlworld.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-03drm/amd/amdgpu: fix bad job hw_fence use after free in advance tdrJingwen Chen1-0/+4
[Why] In advance tdr mode, the real bad job will be resubmitted twice, while in drm_sched_resubmit_jobs_ext, there's a dma_fence_put, so the bad job is put one more time than other jobs. [How] Adding dma_fence_get before resbumit job in amdgpu_device_recheck_guilty_jobs and put the fence for normal jobs Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28drm/amdgpu: fix a potential memory leak in amdgpu_device_fini_sw()Lang Yu1-1/+1
amdgpu_fence_driver_sw_fini() should be executed before amdgpu_device_ip_fini(), otherwise fence driver resource won't be properly freed as adev->rings have been tore down. Fixes: 72c8c97b1522 ("drm/amdgpu: Split amdgpu_device_fini into early and late") Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28BackMerge tag 'v5.15-rc7' into drm-nextDave Airlie1-0/+4
The msm next tree is based on rc3, so let's just backmerge rc7 before pulling it in. Signed-off-by: Dave Airlie <airlied@redhat.com>
2021-10-19drm/amd/amdgpu: Do irq_fini_hw after ip_fini_earlyYuBiao Wang1-2/+2
[Why] drm_irq_uninstall is called in irq_fini_hw so that irq is disabled in sw stage. SMU (and maybe other IP blocks) fini_hw will call irq_put for cleanup and the whole cleanup process will be skipped because of drm->irq_enable = false. [How] Move ip_fini_early before irq_fini_hw. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-13drm/amdkfd: fix boot failure when iommu is disabled in Picasso.Yifan Zhang1-4/+0
When IOMMU disabled in sbios and kfd in iommuv2 path, iommuv2 init will fail. But this failure should not block amdgpu driver init. Reported-by: youling <youling257@gmail.com> Tested-by: youling <youling257@gmail.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>