aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
AgeCommit message (Collapse)AuthorFilesLines
2024-10-15drm/amdgpu: enable enforce_isolation sysfs node on VFsAlex Deucher1-7/+4
It should be enabled on both bare metal and VFs. Fixes: e189be9b2e38 ("drm/amdgpu: Add enforce_isolation sysfs attribute") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> (cherry picked from commit dc8847b054fd6679866ed4ee861e069e54c10799)
2024-09-06drm/amdgpu: Replace 'amdgpu_job_submit_direct' with 'drm_sched_entity' in ↵Srinivasan Shanmugam1-16/+19
cleaner shader This commit replaces the use of amdgpu_job_submit_direct which submits the job to the ring directly, with drm_sched_entity in the cleaner shader job submission process. The change allows the GPU scheduler to manage the cleaner shader job. - The job is then submitted to the GPU using the drm_sched_entity_push_job function, which allows the GPU scheduler to manage the job. This change improves the reliability of the cleaner shader job submission process by leveraging the capabilities of the GPU scheduler. Fixes: d361ad5d2fc0 ("drm/amdgpu: Add sysfs interface for running cleaner shader") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29drm/amdgpu/mes: add mes mapping legacy queue switchJack Xiao1-2/+2
For mes11 old firmware has issue to map legacy queue, add a flag to switch mes to map legacy queue. Fixes: f9d8c5c7855d ("drm/amdgpu/gfx: enable mes to map legacy queue support") Reported-by: Andrew Worsley <amworsley@gmail.com> Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serializationSrinivasan Shanmugam1-0/+167
This commit introduces the Enforce Isolation Handler designed to enforce shader isolation on AMD GPUs, which helps to prevent data leakage between different processes. The handler counts the number of emitted fences for each GFX and compute ring. If there are any fences, it schedules the `enforce_isolation_work` to be run after a delay of `GFX_SLICE_PERIOD`. If there are no fences, it signals the Kernel Fusion Driver (KFD) to resume the runqueue. The function is synchronized using the `enforce_isolation_mutex`. This commit also introduces a reference count mechanism (kfd_sch_req_count) to keep track of the number of requests to enable the KFD scheduler. When a request to enable the KFD scheduler is made, the reference count is decremented. When the reference count reaches zero, a delayed work is scheduled to enforce isolation after a delay of GFX_SLICE_PERIOD. When a request to disable the KFD scheduler is made, the function first checks if the reference count is zero. If it is, it cancels the delayed work for enforcing isolation and checks if the KFD scheduler is active. If the KFD scheduler is active, it sends a request to stop the KFD scheduler and sets the KFD scheduler state to inactive. Then, it increments the reference count. The function is synchronized using the kfd_sch_mutex to ensure that the KFD scheduler state and reference count are updated atomically. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20drm/amdgpu: Add sysfs interface for running cleaner shaderSrinivasan Shanmugam1-0/+134
This patch adds a new sysfs interface for running the cleaner shader on AMD GPUs. The cleaner shader is used to clear GPU memory before it's reused, which can help prevent data leakage between different processes. The new sysfs file is write-only and is named `run_cleaner_shader`. Write the number of the partition to this file to trigger the cleaner shader on that partition. There is only one partition on GPUs which do not support partitioning. Changes made in this patch: - Added `amdgpu_set_run_cleaner_shader` function to handle writes to the `run_cleaner_shader` sysfs file. - Added `run_cleaner_shader` to the list of device attributes in `amdgpu_device_attrs`. - Updated `default_attr_update` to handle `run_cleaner_shader`. - Added `AMDGPU_DEVICE_ATTR_WO` macro to create write-only device attributes. v2: fix error handling (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
2024-08-20drm/amdgpu: Add enforce_isolation sysfs attributeSrinivasan Shanmugam1-0/+101
This commit adds a new sysfs attribute 'enforce_isolation' to control the 'enforce_isolation' setting per GPU. The attribute can be read and written, and accepts values 0 (disabled) and 1 (enabled). When 'enforce_isolation' is enabled, reserved VMIDs are allocated for each ring. When it's disabled, the reserved VMIDs are freed. The set function locks a mutex before changing the 'enforce_isolation' flag and the VMIDs, and unlocks it afterwards. This ensures that these operations are atomic and prevents race conditions and other concurrency issues. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu: Add infrastructure for Cleaner Shader featureSrinivasan Shanmugam1-0/+35
The cleaner shader is used by the CP firmware to clean LDS and GPRs between processes on the CUs. This adds an internal API for GFX IP code to allocate and initialize the cleaner shader. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>
2024-08-13drm/amdgpu/mes12: fix suspend issueJack Xiao1-0/+22
Use mes pipe to unmap kcq and kgq. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13drm/amdgpu/mes: add multiple mes ring instances supportJack Xiao1-2/+2
Add multiple mes ring instances in mes structure to support multiple mes pipes. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06drm/amdgpu: fix unchecked return value warning for amdgpu_gfxTim Huang1-4/+14
This resolves the unchecded return value warning reported by Coverity. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: create amdgpu_ras_in_recovery to simplify codeTao Zhou1-12/+2
Reduce redundant code and user doesn't need to pay attention to RAS details. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amdgpu: Fix type mismatch in amdgpu_gfx_kiq_init_ringSrinivasan Shanmugam1-1/+2
This commit fixes a type mismatch in the amdgpu_gfx_kiq_init_ring function triggered by the snprintf function expecting unsigned char arguments due to the '%hhu' format specifier, but receiving int and u32 arguments. The issue occurred because the arguments xcc_id, ring->me, ring->pipe, and ring->queue were of type int and u32, not unsigned char. This led to a type mismatch when these arguments were passed to snprintf. To resolve this, the snprintf line was modified to cast these arguments to unsigned char. This ensures that the arguments are of the correct type for the '%hhu' format specifier and resolves the warning. Fixes the below: >> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:4: warning: format >> specifies type 'unsigned char' but the argument has type 'int' >> [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~ >> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:12: warning: format >> specifies type 'unsigned char' but the argument has type 'u32' (aka >> 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:22: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:34: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] xcc_id, ring->me, ring->pipe, ring->queue); ^~~~~~~~~~~ 4 warnings generated. Fixes: 0ea554455542 ("drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405250446.XeaWe66u-lkp@intel.com/ Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ringSrinivasan Shanmugam1-1/+1
This commit fixes a format truncation issue arosed by the snprintf function potentially writing more characters into the ring->name buffer than it can hold, in the amdgpu_gfx_kiq_init_ring function The issue occurred because the '%d' format specifier could write between 1 and 10 bytes into a region of size between 0 and 8, depending on the values of xcc_id, ring->me, ring->pipe, and ring->queue. The snprintf function could output between 12 and 41 bytes into a destination of size 16, leading to potential truncation. To resolve this, the snprintf line was modified to use the '%hhu' format specifier for xcc_id, ring->me, ring->pipe, and ring->queue. The '%hhu' specifier is used for unsigned char variables and ensures that these values are printed as unsigned decimal integers. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_kiq_init_ring’: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:61: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size between 0 and 8 [-Wformat-truncation=] 332 | snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", | ^~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:50: note: directive argument in the range [0, 2147483647] 332 | snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", | ^~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:9: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 16 332 | snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 333 | xcc_id, ring->me, ring->pipe, ring->queue); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 345a36c4f1ba ("drm/amdgpu: prefer snprintf over sprintf") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13drm/amdgpu/mes: fix mes12 to map legacy queueJack Xiao1-25/+46
Adjust mes12 initialization sequence to fix mapping legacy queue. v2: use dev_err. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02drm/amdgpu/gfx: enable mes to map legacy queue supportJack Xiao1-2/+2
Enable mes to map legacy queue support. v2: drop unused gfx_v12_0_kiq_enable_kgq() (Alex) Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02drm/amdgpu: Add gfx v9_4_4 ip blockHawking Zhang1-1/+2
Add gfx v9_4_4 ip block support Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30drm/amdgpu/gfx: enable mes to map legacy queue supportJack Xiao1-9/+41
Enable mes to map legacy queue support. v2: kiq_set_resources is required. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: fix uninitialized scalar variable warningTim Huang1-1/+2
Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20drm/amdgpu: correct the KGQ fallback messagePrike Liang1-1/+1
Fix the KGQ fallback function name, as this will help differentiate the failure in the KCQ enablement. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-08Merge tag 'amd-drm-next-6.9-2024-03-01' of ↵Dave Airlie1-3/+3
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-03-01: amdgpu: - GC 11.5.1 updates - Misc display cleanups - NBIO 7.9 updates - Backlight fixes - DMUB fixes - MPO fixes - atomfirmware table updates - SR-IOV fixes - VCN 4.x updates - use RMW accessors for pci config registers - PSR fixes - Suspend/resume fixes - RAS fixes - ABM fixes - Misc code cleanups - SI DPM fix - Revert freesync video amdkfd: - Misc cleanups - Error handling fixes radeon: - use RMW accessors for pci config registers From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240301204857.13960-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-02-22drm/amdgpu: Drop redundant parameter in amdgpu_gfx_kiq_init_ringMa Jun1-3/+3
Drop redundant parameters in function amdgpu_gfx_kiq_init_ring to simplify the code Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22Merge tag 'amd-drm-next-6.9-2024-02-19' of ↵Dave Airlie1-1/+8
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-02-19: amdgpu: - ATHUB 4.1 support - EEPROM support updates - RAS updates - LSDMA 7.0 support - JPEG DPG support - IH 7.0 support - HDP 7.0 support - VCN 5.0 support - Misc display fixes - Retimer fixes - DCN 3.5 fixes - VCN 4.x fixes - PSR fixes - PSP 14.0 support - VA_RESERVED cleanup - SMU 13.0.6 updates - NBIO 7.11 updates - SDMA 6.1 updates - MMHUB 3.3 updates - Suspend/resume fixes - DMUB updates amdkfd: - Trap handler enhancements - Fix cache size reporting - Relocate the trap handler radeon: - fix typo in print statement Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240219214810.4911-1-alexander.deucher@amd.com
2024-02-13Revert "drm/amd: flush any delayed gfxoff on suspend entry"Mario Limonciello1-1/+8
commit ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") caused GFXOFF control to be used more heavily and the codepath that was removed from commit 0dee72639533 ("drm/amd: flush any delayed gfxoff on suspend entry") now can be exercised at suspend again. Users report that by using GNOME to suspend the lockscreen trigger will cause SDMA traffic and the system can deadlock. This reverts commit 0dee726395333fea833eaaf838bc80962df886c8. Acked-by: Alex Deucher <alexander.deucher@amd.com> Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-13Merge tag 'amd-drm-next-6.9-2024-02-09' of ↵Dave Airlie1-2/+2
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.9-2024-02-09: amdgpu: - Validate DMABuf imports in compute VMs - Add RAS ACA framework - PSP 13 fixes - Misc code cleanups - Replay fixes - Atom interpretor PS, WS bounds checking - DML2 fixes - Audio fixes - DCN 3.5 Z state fixes - Remove deprecated ida_simple usage - UBSAN fixes - RAS fixes - Enable seq64 infrastructure - DC color block enablement - Documentation updates - DC documentation updates - DMCUB updates - S3 fixes - VCN 4.0.5 fixes - DP MST fixes - SR-IOV fixes amdkfd: - Validate DMABuf imports in compute VMs - SVM fixes - Trap handler updates radeon: - Atom interpretor PS, WS bounds checking - Misc code cleanups UAPI: - Bump KFD version so UMDs know that the fixes that enable the management of VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD are present - Add INFO query for input power. This matches the existing INFO query for average power. Used in gaming HUDs, etc. Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240209221459.5453-1-alexander.deucher@amd.com
2024-01-31drm/amdgpu: prefer snprintf over sprintfJani Nikula1-1/+2
This will trade the W=1 warning -Wformat-overflow to -Wformat-truncation. This lets us enable -Wformat-overflow subsystem wide. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Pan, Xinhui <Xinhui.Pan@amd.com> Cc: amd-gfx@lists.freedesktop.org Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/fea7a52924f98b1ac24f4a7e6ba21d7754422430.1704908087.git.jani.nikula@intel.com
2024-01-22drm/amdgpu: Cleanup inconsistent indenting in 'amdgpu_gfx_enable_kcq()'Srinivasan Shanmugam1-2/+2
Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:645 amdgpu_gfx_enable_kcq() warn: inconsistent indenting Cc: Le Ma <Le.Ma@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5)Victor Lu1-4/+4
amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0. Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter. Using amdgpu_sriov_runtime to determine whether to access via kiq or RLC is sufficient for now. v5: add condition in amdgpu_device_xcc_w/rreg, remove trace func call v4: avoid using amdgpu_sriov_w/rreg v3: use W/RREG32_XCC to handle non-kiq case v2: define amdgpu_device_xcc_wreg/rreg instead of changing parameters of amdgpu_device_wreg/rreg Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64Alex Deucher1-0/+2
Issues were reported with commit 1cfb4d612127 ("drm/amdgpu: put MQDs in VRAM") on an ADLINK Ampere Altra Developer Platform (AVA developer platform). Various ARM systems seem to have problems related to PCIe and MMIO access. In this case, I'm not sure if this is specific to the ADLINK platform or ARM in general. Seems to be some coherency issue with VRAM. For now, just don't put MQDs in VRAM on ARM. Link: https://lists.freedesktop.org/archives/amd-gfx/2023-October/100453.html Fixes: 1cfb4d612127 ("drm/amdgpu: put MQDs in VRAM") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: alexey.klimov@linaro.org
2023-10-19drm/amdgpu: Workaround to skip kiq ring test during ras gpu recoveryStanley.Yang1-0/+21
This is workaround, kiq ring test failed in suspend stage when do ras recovery. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20drm/amdgpu: Use function for IP version checkLijo Lazar1-2/+2
Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16drm/amd: flush any delayed gfxoff on suspend entryMario Limonciello1-8/+1
DCN 3.1.4 is reported to hang on s2idle entry if graphics activity is happening during entry. This is because GFXOFF was scheduled as delayed but RLC gets disabled in s2idle entry sequence which will hang GFX IP if not already in GFXOFF. To help this problem, flush any delayed work for GFXOFF early in s2idle entry sequence to ensure that it's off when RLC is changed. commit 4b31b92b143f ("drm/amdgpu: complete gfxoff allow signal during suspend without delay") modified power gating flow so that if called in s0ix that it ensured that GFXOFF wasn't put in work queue but instead processed immediately. This is dead code due to commit 10cb67eb8a1b ("drm/amdgpu: skip CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly called as part of the suspend entry code. Remove that dead code. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25drm/amdgpu: Add -ENOMEM error handling when there is no memorySrinivasan Shanmugam1-7/+10
Return -ENOMEM, when there is no sufficient dynamically allocated memory Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25drm/amdgpu: Return -ENOMEM when there is no memory in 'amdgpu_gfx_mqd_sw_init'Srinivasan Shanmugam1-3/+8
Return -ENOMEM, when there is no sufficient dynamically allocated memory to create MQD backup for ring Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Fix unused variable in amdgpu_gfx.cSrinivasan Shanmugam1-3/+3
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kcq’: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:497:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable] 497 | int j; | ^ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kgq’: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:528:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable] 528 | int j; | ^ drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_enable_kgq’: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:630:12: warning: variable ‘j’ set but not used [-Wunused-but-set-variable] 630 | int r, i, j; | Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix the memory override in kiq ring structShiwu Zhang1-2/+2
This is introduced by the code merge and will let the adev->gfx.kiq[0].ring struct being overrided Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix S3 issue if MQD in VRAMJack Xiao1-0/+4
1. Need flush HDP for MQD putting in vram 2. Zero out mes MQD Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add GFX RAS common functionTao Zhou1-0/+19
The common function can help reduce redundant code. v2: remove xcp operation, only need to do RAS operations for all instances. v3: remove check for GFX RAS support, will be checked in higher level. add amdgpu prefix for the function name. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Add compute mode descriptor functionLijo Lazar1-23/+1
Keep a helper function to get description of compute partition mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Move memory partition query to gmcLijo Lazar1-29/+1
GMC block handles memory related information, it makes more sense to keep memory partition functions in gmc block. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Add flags for partition mode queryLijo Lazar1-1/+2
It's not required to take lock on all cases while querying partition mode. Querying partition mode during KFD init process doesn't need to take a lock. Init process after a switch will already be happening under lock. Control the behaviour by adding flags to xcp_query_partition_mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Use unique doorbell range per xccLijo Lazar1-1/+4
Program different ranges in each XCC with MEC_DOORBELL_RANGE_LOWER/HIGHER. Keeping the same range causes CPF in other XCCs also to be busy when an IB packet is submitted to KCQ. Only the XCC which processes the packet comes back to idle afterwards and this causes other CPs not be idle. This in turn affects clockgating behavior as RLC doesn't get idle interrupt. LOWER/HIGHER covers only KIQ/KCQs which are per XCC queues. Assigning different ranges doesn't seem to have any side effect as user queue ranges are outside of this range. User queue tests - PM4 through KFD and AQL through rocr - have the same results after this change. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Switch to SOC partition funcsLijo Lazar1-27/+4
For GFXv9.4.3, use SOC level partition switch implementation rather than keeping them at GFX IP level. Change the exisiting implementation in GFX IP for keeping partition mode and restrict it to only GFX related switch. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: remove partition attributes sys file for gfx_v9_4_3Shiwu Zhang1-0/+7
For driver de-init like rmmod operations those partition specific attributes need to be removed accordingly. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix kcq mqd_backup buffer double free for multi-XCDShiwu Zhang1-1/+0
For gfx_v9_4_3 and beyond, struct kiq has its own mqd_backup pointer rather than using the last pointer from mec struct. Then the kfree operation on the pointer from the mec struct should be removed otherwise it will cause double free on the first kcq's mqd_backup buffer on XCD1. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: detect current GPU memory partition modeRajneesh Bhardwaj1-0/+25
- Add helpers to detect the current GPU memory partition. - Add current memory partition mode sysfs node. Tested-by: Ori Messinger <Ori.Messinger@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Fix failure when switching to DPX modeMukul Joshi1-1/+5
Fix the if condition which causes dynamic repartitioning to fail when trying to switch to DPX mode. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Add device repartition supportMukul Joshi1-1/+21
GFX9.4.3 will support dynamic repartitioning of the GPU through sysfs. Add device repartitioning support in KFD to repartition GPU from one mode to other. v2: squash in fix ("drm/amdkfd: Fix warning kgd2kfd_unlock_kfd defined but not used") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Change num_xcd to xcc_maskLijo Lazar1-10/+11
Instead of number of XCCs, keep a mask of XCCs for the exact XCCs available on the ASIC. XCC configuration could differ based on different ASIC configs. v2: Rename num_xcd to num_xcc (Hawking) Use smaller xcc_mask size, changed to u16 (Le) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: introduce new doorbell assignment table for GC 9.4.3Le Ma1-5/+1
Four basic reasons as below to do the change: 1. number of ring expand a lot on GC 9.4.3, and adjustment on old assignment cannot make each ring in a continuous doorbell space. 2. the SDMA doorbell index should not exceed 0x1FF on SDMA 4.2.2 due to regDOORBELLx_CTRL_ENTRY.BIF_DOORBELLx_RANGE_OFFSET_ENTRY field width. 3. re-design the doorbell assignment and unify the calculation as "start + ring/inst id" will make the code much concise. 4. only defining the START/END makes the table look simple v2: (Lijo) 1. replace name 2. use num_inst_per_aid/sdma_doorbell_range instead of hardcoding Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: convert the doorbell_index to 2 dwords offset for kiqLe Ma1-4/+3
KIQ doorbell_index is non-zero from XCC1, thus need to left-shift it like other rings. Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>