path: root/drivers/gpu/drm/amd/amdgpu
Age | Commit message | Author | Files | Lines
2024-07-23 | drm/amd/amdgpu: Fix uninitialized variable warnings | Ma Ke | 1 | -1/+1
Return 0 to avoid returning an uninitialized variable r. Cc: [email protected] Fixes: 230dd6bb6117 ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10") Signed-off-by: Ma Ke <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
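The pattern behind this fix can be sketched in isolation. The following is a minimal illustration, not the driver code: `fake_mode2_reset()` and its parameters are hypothetical names, and the only point is that a success path which may never assign `r` should return a literal 0.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a reset helper; not the actual amdgpu function. */
static int fake_mode2_reset(bool needs_hw_step, int hw_step_result)
{
	int r;

	if (needs_hw_step) {
		r = hw_step_result;	/* only this path assigns r */
		if (r)
			return r;	/* propagate the failure */
	}

	/*
	 * Returning r here would read an uninitialized value whenever
	 * needs_hw_step is false, so the success path returns 0 directly.
	 */
	return 0;
}
```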
2024-07-23 | drm/amdgpu: fix a possible null pointer dereference | Ma Ke | 1 | -0/+3
In amdgpu_connector_add_common_modes(), the return value of drm_cvt_mode() is assigned to mode, which will lead to a NULL pointer dereference on failure of drm_cvt_mode(). Add a check to avoid npd. Cc: [email protected] Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)") Signed-off-by: Ma Ke <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
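A rough sketch of the kind of check described above, using stand-ins rather than the DRM API: `fake_cvt_mode()` plays the role of drm_cvt_mode() and `fake_add_common_modes()` the role of the caller. Both names, the struct, and the choice to stop at the first failure are assumptions made for illustration; the message does not say whether the real fix returns or skips the entry.

```c
#include <stddef.h>
#include <stdlib.h>

struct fake_mode { int w, h; };

/* Hypothetical stand-in for drm_cvt_mode(): returns NULL on failure. */
static struct fake_mode *fake_cvt_mode(int w, int h)
{
	struct fake_mode *m;

	if (w <= 0 || h <= 0)
		return NULL;		/* simulate a failed mode creation */
	m = malloc(sizeof(*m));
	if (!m)
		return NULL;
	m->w = w;
	m->h = h;
	return m;
}

/* Adds modes until one cannot be created; returns how many were added. */
static int fake_add_common_modes(const int (*sizes)[2], size_t n)
{
	int added = 0;

	for (size_t i = 0; i < n; i++) {
		struct fake_mode *mode = fake_cvt_mode(sizes[i][0], sizes[i][1]);

		if (!mode)		/* the added check: bail out before any use */
			return added;
		if (mode->w > 0)	/* mode is dereferenced here, so it must be non-NULL */
			added++;
		free(mode);
	}
	return added;
}
```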
2024-07-23 | drm/amdgpu: Fix atomics on GFX12 | David Belanger | 6 | -1/+87
If PCIe supports atomics, configure register to prevent DF from breaking atomics into separate load/store operations. Signed-off-by: David Belanger <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: Add sdma_v7_0 ip dump for devcoredump | Sunil Khatri | 1 | -0/+91
Add ip dump for sdma_v7_0 for devcoredump for all instances of sdma. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell | Alex Deucher | 1 | -0/+12
We seem to have a case where SDMA will sometimes miss a doorbell if GFX is entering the powergating state when the doorbell comes in. To work around this, we can update the wptr via MMIO; however, this is only safe because we disallow gfxoff in begin_ring() for SDMA 5.2 and then allow it again in end_ring(). Enable this workaround while we are root causing the issue with the HW team. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440 Tested-by: Friedrich Vock <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: Remove unused code | YiPeng Chai | 4 | -124/+0
Remove unused code. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: optimize logging deferred error info | YiPeng Chai | 4 | -46/+40
1. Use pa_pfn as the radix-tree key index to log deferred error info. 2. Use local array to store a row of bad pages. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: optimize umc v12 address conversion function | YiPeng Chai | 1 | -39/+77
Split into 3 parts: 1. Convert soc physical address via ras ta. 2. Expand bad pages from soc physical address. 3. Dump bad address info. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: add print support for sdma_v_5_0 ip_dump | Sunil Khatri | 1 | -0/+22
Add support for ip dump for sdma_v_5_0 in devcoredump. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: Add sdma_v5_0 ip dump for devcoredump | Sunil Khatri | 1 | -0/+82
Add ip dump for sdma_v5_0 for devcoredump for all instances of sdma. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: add print support for sdma_v_6_0 ip_dump | Sunil Khatri | 1 | -0/+22
Add print support for ip dump for sdma_v_6_0 in devcoredump. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: Add sdma_v6_0 ip dump for devcoredump | Sunil Khatri | 1 | -0/+90
Add ip dump for sdma_v6_0 for devcoredump for all instances of sdma. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: fix the print message in devcoredump | Sunil Khatri | 1 | -1/+1
Fix the memory type logged for gtt memory size which is wrongly logged as visible vram size. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: fix the extra space between two functions | Sunil Khatri | 1 | -0/+1
fix extra line space between two functions. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: add print support for sdma_v_5_2 ip_dump | Sunil Khatri | 1 | -0/+21
Add support for ip dump for sdma_v_5_2 in devcoredump. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-23 | drm/amdgpu: Add sdma_v5_2 ip dump for devcoredump | Sunil Khatri | 2 | -0/+83
Add ip dump for sdma_v5_2 for devcoredump for all instances of sdma. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-16 | drm/amdgpu: add mutex to protect ras shared memory | YiPeng Chai | 3 | -40/+86
Add mutex to protect ras shared memory. v2: Add TA_RAS_COMMAND__TRIGGER_ERROR command call status check. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-16 | drm/amdgpu/vcn: not pause dpg for unified queue | Boyuan Zhang | 1 | -3/+11
For unified queue, DPG pause for encoding is done inside VCN firmware, so there is no need to pause dpg based on ring type in kernel. For VCN3 and below, pausing DPG for encoding in kernel is still needed. v2: add more comments v3: update commit message Signed-off-by: Boyuan Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Ruijing Dong <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-16 | drm/amdgpu/vcn: identify unified queue in sw init | Boyuan Zhang | 2 | -24/+16
Determine whether VCN is using unified queue in sw_init, instead of calling functions later on. v2: fix coding style Signed-off-by: Boyuan Zhang <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Ruijing Dong <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-15 | drm/amd: Bump KMS_DRIVER_MINOR version | Aurabindo Pillai | 1 | -1/+2
Increase the KMS minor version to indicate GFX12 DCC support since this contains a major change in how DCC is managed across IPs like GFX, DCN etc. This will be used mainly by userspace like Mesa to figure out DCC support on GFX12 hardware. v2: fix version number (Alex) Signed-off-by: Aurabindo Pillai <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-12 | drm/amdgpu/mes12: add missing opcode string | Alex Deucher | 1 | -0/+1
Fixes the indexing of the string array. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
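Why a single missing string breaks indexing can be shown with a small sketch. The opcode names and tables below are hypothetical (the real MES opcode list is not part of this log), and whether the driver uses designated initializers is not stated; they appear here only as one way to keep each name tied to its opcode.

```c
#include <stddef.h>

/* Hypothetical opcodes; not the real MES scheduler opcode list. */
enum fake_op {
	FAKE_OP_ADD_QUEUE,
	FAKE_OP_REMOVE_QUEUE,
	FAKE_OP_SUSPEND,
	FAKE_OP_MAX
};

/* Positional table with the middle entry forgotten: later names shift down. */
static const char *const fake_names_missing[] = { "ADD_QUEUE", "SUSPEND" };

/* Complete table; designated initializers bind each name to its opcode. */
static const char *const fake_names_fixed[FAKE_OP_MAX] = {
	[FAKE_OP_ADD_QUEUE]    = "ADD_QUEUE",
	[FAKE_OP_REMOVE_QUEUE] = "REMOVE_QUEUE",
	[FAKE_OP_SUSPEND]      = "SUSPEND",
};

/* Bounds-checked lookup; falls back to "UNKNOWN" for gaps or overruns. */
static const char *fake_op_name(const char *const *table, size_t len,
				enum fake_op op)
{
	return ((size_t)op < len && table[op]) ? table[op] : "UNKNOWN";
}
```

With the incomplete table, looking up `FAKE_OP_REMOVE_QUEUE` yields the name of the following opcode, which is the kind of mislabeling an added string entry corrects.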
2024-07-12 | drm/amdgpu/mes11: update opcode strings | Alex Deucher | 1 | -0/+3
Add new packet. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-10 | drm/amd: Add power_saving_policy drm property to eDP connectors | Mario Limonciello | 1 | -0/+4
When the `power_saving_policy` property is set to bit mask "Require color accuracy" ABM should be disabled immediately and any requests by sysfs to update will return an -EBUSY error. When the `power_saving_policy` property is set to bit mask "Require low latency" PSR should be disabled. When the property is restored to an empty bit mask ABM and PSR can be enabled again. Signed-off-by: Mario Limonciello <[email protected]> Reviewed-by: Leo Li <[email protected]> Signed-off-by: Hamza Mahfooz <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
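The policy rules in this message can be modeled as a small state sketch. Everything named `fake_*` or `FAKE_*` is hypothetical: the real property's bit values, the panel structure, and the meaning of ABM level 0 as "disabled" are assumptions made only to illustrate the described behavior.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit values; the real property encoding is not given in the log. */
#define FAKE_REQUIRE_COLOR_ACCURACY (1u << 0)
#define FAKE_REQUIRE_LOW_LATENCY    (1u << 1)

struct fake_panel {
	unsigned int policy;	/* current power_saving_policy bit mask */
	int abm_level;		/* assumed: 0 means ABM is off */
};

static bool fake_abm_allowed(const struct fake_panel *p)
{
	return !(p->policy & FAKE_REQUIRE_COLOR_ACCURACY);
}

static bool fake_psr_allowed(const struct fake_panel *p)
{
	return !(p->policy & FAKE_REQUIRE_LOW_LATENCY);
}

/* Setting the property disables ABM immediately if color accuracy is required. */
static void fake_set_policy(struct fake_panel *p, unsigned int policy)
{
	p->policy = policy;
	if (!fake_abm_allowed(p))
		p->abm_level = 0;
}

/* A sysfs-style ABM update is refused with -EBUSY while ABM is locked out. */
static int fake_sysfs_set_abm(struct fake_panel *p, int level)
{
	if (!fake_abm_allowed(p))
		return -EBUSY;
	p->abm_level = level;
	return 0;
}
```

Clearing the mask back to 0 makes both `fake_abm_allowed()` and `fake_psr_allowed()` return true again, matching the "restored to an empty bit mask" case.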
2024-07-10 | drm/amdgpu: remove exp hw support check for gfx12 | Alex Deucher | 1 | -2/+0
Enable it by default. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-10 | drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed | YiPeng Chai | 2 | -1/+23
The problem case is as follows: 1. GPU A triggers a gpu ras reset, and GPU A drives GPU B to also perform a gpu ras reset. 2. After gpu B ras reset started, gpu B queried a DE data. Since the DE data was queried in the ras reset thread instead of the page retirement thread, bad page retirement work would not be triggered. Then even if all gpu resets are completed, the bad pages will be cached in RAM until GPU B's bad page retirement work is triggered again and then saved to eeprom. This patch can save the bad pages to eeprom in time after gpu ras reset is completed. v2: 1. Add the above description to code comments. 2. Reuse existing function. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-10 | drm/amdgpu: flush all cached ras bad pages to eeprom | YiPeng Chai | 1 | -6/+29
Before uninstalling gpu driver, flush all cached ras bad pages to eeprom. v2: Put the same code into a function and reuse the function. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-10 | drm/amdgpu: select compute ME engines dynamically | Sunil Khatri | 1 | -1/+1
There is currently only one GFX ME, but this could change in future SoCs. Use the number of GFX MEs as the starting ME index for compute on GFX12. Signed-off-by: Sunil Khatri <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-10 | drm/amdgpu: select compute ME engines dynamically | Sunil Khatri | 1 | -1/+1
There is currently only one GFX ME, but this could change in future SoCs. Use the number of GFX MEs as the starting ME index for compute on GFX11. Signed-off-by: Sunil Khatri <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-10 | drm/amdgpu/job: Replace DRM_INFO/ERROR logging | Alex Deucher | 1 | -10/+11
Use the dev_info/err variants so we get per device logging in multi-GPU cases. Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-10 | drm/amdgpu: select compute ME engines dynamically | Sunil Khatri | 1 | -1/+1
There is currently only one GFX ME, but this could change in future SoCs. Use the number of GFX MEs as the starting ME index for compute on GFX10. Signed-off-by: Sunil Khatri <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-10 | drm/amdgpu: Initialize VF partition mode | Lijo Lazar | 4 | -12/+88
For SOCs with GFX v9.4.3, a VF may have multiple compute partitions. Fetch the partition information during init and initialize partition nodes. There is no support to switch partition mode in VF mode, hence disable the same. Signed-off-by: Lijo Lazar <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-10 | drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping. | Gavin Wan | 1 | -7/+13
sdma has 2 instances in SRIOV cpx mode. Odd numbered VFs have sdma0/sdma1 instances. Even numbered VFs have sdma2/sdma3. For even numbered VFs, sdma2 & sdma3 (irq source id CLIENTID_SDMA2 and CLIENTID_SDMA3) should map to irq seq 0 & 1. Signed-off-by: Gavin Wan <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
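The mapping described above can be sketched as a lookup. The enum values and the function name are hypothetical, and returning -1 for a client that the VF does not own is an assumption; the message only states which clients should map to seq 0 and 1.

```c
#include <stdbool.h>

/* Hypothetical client IDs; the real CLIENTID_SDMA* values are not given here. */
enum fake_client {
	FAKE_CLIENTID_SDMA0,
	FAKE_CLIENTID_SDMA1,
	FAKE_CLIENTID_SDMA2,
	FAKE_CLIENTID_SDMA3,
	FAKE_CLIENTID_OTHER
};

/*
 * Map an IRQ client ID to the per-VF SDMA sequence number (0 or 1).
 * Odd VFs own sdma0/sdma1, even VFs own sdma2/sdma3; anything else
 * yields -1 (assumed error value for this sketch).
 */
static int fake_sdma_client_to_seq(enum fake_client client, bool even_vf)
{
	switch (client) {
	case FAKE_CLIENTID_SDMA0:
		return even_vf ? -1 : 0;
	case FAKE_CLIENTID_SDMA1:
		return even_vf ? -1 : 1;
	case FAKE_CLIENTID_SDMA2:
		return even_vf ? 0 : -1;
	case FAKE_CLIENTID_SDMA3:
		return even_vf ? 1 : -1;
	default:
		return -1;
	}
}
```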
2024-07-09 | drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves | Thomas Hellström | 1 | -0/+4
To address the problem with hitches moving when bulk move sublists are lru-bumped, register the list cursors with the ttm_lru_bulk_move structure when traversing its list, and when lru-bumping the list, move the cursor hitch to the tail. This also means it's mandatory for drivers to call ttm_lru_bulk_move_init() and ttm_lru_bulk_move_fini() when initializing and finalizing the bulk move structure, so add those calls to the amdgpu- and xe driver. Compared to v1 this is slightly more code but less fragile and hopefully easier to understand.
Changes in previous series:
- Completely rework the functionality
- Avoid a NULL pointer dereference assigning manager->mem_type
- Remove some leftover code causing build problems
v2:
- For hitch bulk tail moves, store the mem_type in the cursor instead of with the manager.
v3:
- Remove leftover mem_type member from change in v2.
v6:
- Add some lockdep asserts (Matthew Brost)
- Avoid NULL pointer dereference (Matthew Brost)
- No need to check bo->resource before dereferencing bo->bulk_move (Matthew Brost)
Cc: Christian König <[email protected]> Cc: Somalapuram Amaranath <[email protected]> Cc: Matthew Brost <[email protected]> Cc: <[email protected]> Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Acked-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
2024-07-08 | drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 | Zhigang Luo | 1 | -0/+3
To avoid reading the wrong WPTR from the doorbell in an SR-IOV VF, set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 so that WPTR is read from the MQD. Signed-off-by: Zhigang Luo <[email protected]> Acked-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: add ras event state device attribute support | Yang Wang | 2 | -5/+59
add amdgpu ras 'event_state' sysfs device attribute support Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: add ras POSION_CONSUMPTION event id support | Yang Wang | 2 | -3/+14
add amdgpu ras POSION_CONSUMPTION event id support. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: add ras POSION_CREATION event id support | Yang Wang | 2 | -3/+15
add amdgpu ras POSION_CREATION event id support. Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: refine amdgpu ras event id core code | Yang Wang | 4 | -26/+104
v1:
- use unified event id to manage ras events
- add a new function amdgpu_ras_query_error_status_with_event() to accept event type as parameter.
v2: add a warn log to show the location of function failure when calling amdgpu_ras_mark_event(). (Tao Zhou)
v3: change RAS_EVENT_TYPE_ISR to RAS_EVENT_TYPE_FATAL.
v4: rename amdgpu_ras_get_recovery_event() to amdgpu_ras_get_fatal_error_event().
Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: reject gang submit on reserved VMIDs | Christian König | 3 | -1/+30
A gang submit won't work if the VMID is reserved and we can't flush out VM changes from multiple engines at the same time. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: enable dpg for vcn and jpeg on GC 11_5_2 | Saleemkhan Jamadar | 1 | -1/+3
DPG mode is enabled for vcn and jpeg on VCN v4_0_5 Signed-off-by: Saleemkhan Jamadar <[email protected]> Reviewed-by: Tim Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: remove redundant semicolons in RAS_EVENT_LOG | Yang Wang | 1 | -1/+1
remove redundant semicolons in RAS_EVENT_LOG to avoid code format check warning. Fixes: b712d7c20133 ("drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG()") Signed-off-by: Yang Wang <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: restore dcc bo tilling configs while moving | Frank Min | 3 | -5/+33
When moving a buffer that has a dcc tiling config, its original dcc tiling must be restored. 1. extend copy flag to cover tiling bits 2. add logic to restore original dcc tiling config Signed-off-by: Frank Min <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: add gfx queue support for gfx12 ipdump | Sunil Khatri | 1 | -0/+94
Add support of all the CP GFX queues for gfx12 ipdump to be used by devcoredump. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: add cp queue registers for gfx12 ipdump | Sunil Khatri | 1 | -2/+109
Add gfx12 support of CP queue registers for all queues to be used by devcoredump. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: enable redirection of irq's for IH v7.0 | Sunil Khatri | 1 | -0/+15
Enable redirection of irq for pagefaults for specific clients to avoid overflow without dropping interrupts. So here we redirect the interrupts to another IH ring, i.e. ring1, where only these interrupts are processed. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm:amdgpu: enable IH ring1 for IH v7.0 | Sunil Khatri | 1 | -2/+9
We need IH ring1 for handling the pagefault interrupts which overflow the default ring in specific use cases. Enabling ring1 allows software to redirect high interrupts to ring1 from the default IH ring. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: Set no_hw_access when VF request full GPU fails | Yifan Zha | 1 | -1/+3
[Why] If a VF requests full GPU access and the request fails, the VF driver can get stuck accessing registers for an extended period during the unload of KMS. [How] Set the no_hw_access flag when the VF request for full GPU access fails. This prevents further hardware access attempts, avoiding the prolonged stuck state. Signed-off-by: Yifan Zha <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: add print support for gfx12 ipdump | Sunil Khatri | 1 | -0/+16
Add support of gfx12 ipdump print so devcoredump could trigger it to dump the captured registers in devcoredump. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: add gfx12 register support in ipdump | Sunil Khatri | 1 | -0/+101
Add general registers of gfx12 in ipdump for devcoredump support. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Sunil Khatri <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-08 | drm/amdgpu: update gfxhub client id for gfx12 | Frank Min | 1 | -1/+21
update gfxhub client id for gfx12 Signed-off-by: Frank Min <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>