aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2024-04-26drm/amdgpu: prepare to handle pasid poison consumptionYiPeng Chai4-8/+29
Prepare to handle pasid poison consumption. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: retire bad pages for umc v12_0YiPeng Chai1-2/+57
Retire bad pages for umc v12_0. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: add condition check for amdgpu_umc_fill_error_recordYiPeng Chai3-4/+19
Add condition check for amdgpu_umc_fill_error_record. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: Add delay work to retire bad pagesYiPeng Chai4-2/+40
Add delay work to retire bad pages. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: umc v12_0 logs ecc errorsYiPeng Chai3-2/+113
1. umc v12_0 logs ecc errors. 2. Reserve newly detected ecc error pages. 3. Add tag for bad pages, so that they can be retired later. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: umc v12_0 converts error addressYiPeng Chai2-1/+105
Umc v12_0 converts error address. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: add interface to update umc v12_0 ecc statusYiPeng Chai5-0/+44
Add interface to update umc v12_0 ecc status. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: add poison creation handlerYiPeng Chai1-7/+69
Add poison creation handler. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: prepare for logging ecc errorsYiPeng Chai2-0/+55
Prepare for logging ecc errors. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: add message fifo to handle RAS poison eventsYiPeng Chai2-0/+53
Add message fifo to handle RAS poison events. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_relocJesse Zhang1-1/+2
Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x03000001. V2: To really improve the handling we would actually need to have a separate value of 0xffffffff.(Christian) Signed-off-by: Jesse Zhang <[email protected]> Suggested-by: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu/mes11: Use a separate fence per transactionAlex Deucher3-4/+33
We can't use a shared fence location because each transaction should be considered independently. Reviewed-by: Shaoyun.liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: add a spinlock to wb allocationAlex Deucher2-1/+11
As we use wb slots more dynamically, we need to lock access to avoid racing on allocation or free. Reviewed-by: Shaoyun.liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: update fw_share for VCN5Sonny Jiang3-8/+21
kmd_fw_shared changed in VCN5 Signed-off-by: Sonny Jiang <[email protected]> Reviewed-by: Ruijing Dong <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: Fix VRAM memory accountingMukul Joshi1-1/+1
Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: update jpeg max decode resolutionSathishkumar S3-7/+7
jpeg ip version v2.1 and higher supports 16kx16k resolution decode Signed-off-by: Sathishkumar S <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: add ip dump for each ip in devcoredumpSunil Khatri1-0/+14
Add ip dump for each ip of the asic in the devcoredump for all the ips where a callback is registered for register dump. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: dump ip state before reset for each ipSunil Khatri1-0/+7
Invoke the dump_ip_state function for each ip before the asic resets and save the register values for debugging via devcoredump. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: add support for gfx v10 printSunil Khatri1-1/+16
Add support to print ip information to be used to print registers in devcoredump buffer. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: add protype for print ip stateSunil Khatri59-0/+61
Add the protoype for print ip state to be used to print the registers in devcoredump during a gpu reset. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: add support of gfx10 register dumpSunil Khatri4-1/+143
Adding gfx10 gc registers to be used for register dump via devcoredump during a gpu reset. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: add prototype for ip dumpSunil Khatri59-0/+61
Add the prototype to dump ip registers for all ips of different asics and set them to NULL for now. Based on the requirement add a function pointer for each of them. Signed-off-by: Sunil Khatri <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: Add interface to reserve bad pageYiPeng Chai2-0/+23
Add interface to reserve bad page. Signed-off-by: YiPeng Chai <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Tao Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: Fix uninitialized variable warningsMa Jun2-2/+2
return 0 to avoid returning an uninitialized variable r Signed-off-by: Ma Jun <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu/mes: fix use-after-free issueJack Xiao1-0/+1
Delete fence fallback timer to fix the ramdom use-after-free issue. v2: move to amdgpu_mes.c Signed-off-by: Jack Xiao <[email protected]> Acked-by: Lijo Lazar <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3Alex Deucher1-11/+15
This avoids a potential conflict with firmwares with the newer HDP flush mechanism. Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: Update CGCG settings for GFXIP 9.4.3Rajneesh Bhardwaj1-4/+4
Tune coarse grain clock gating idle threshold and rlc idle timeout to achieve better kernel launch latency. Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: replace tmz flag into buffer flagFrank Min14-34/+40
Replace tmz flag into buffer flag to make it easier to understand and extend Signed-off-by: Likun Gao <[email protected]> Signed-off-by: Frank Min <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: init microcode chip name from ip versionsLe Ma1-4/+4
To adapt to different gc versions in gfx_v9_4_3.c file. Signed-off-by: Le Ma <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu: Fix the ring buffer size for queue VM flushPrike Liang3-6/+2
Here are the corrections needed for the queue ring buffer size calculation for the following cases: - Remove the KIQ VM flush ring usage. - Add the invalidate TLBs packet for gfx10 and gfx11 queue. - There's no VM flush and PFP sync, so remove the gfx9 real ring and compute ring buffer usage. Signed-off-by: Prike Liang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-26drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspendLang Yu1-0/+3
umsch test needs full GPU functionality(e.g., VM update, TLB flush, possibly buffer moving under memory pressure) which may be not ready under these states. Just skip it to avoid potential issues. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-25drm/amdgpu: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACYDamien Le Moal1-1/+1
Use the macro PCI_IRQ_INTX instead of the deprecated PCI_IRQ_LEGACY macro. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Alex Deucher <[email protected]>
2024-04-23drm/amdgpu/mes: fix use-after-free issueJack Xiao1-0/+1
Delete fence fallback timer to fix the ramdom use-after-free issue. v2: move to amdgpu_mes.c Signed-off-by: Jack Xiao <[email protected]> Acked-by: Lijo Lazar <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-23drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3Alex Deucher1-11/+15
This avoids a potential conflict with firmwares with the newer HDP flush mechanism. Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-23drm/amdgpu: Fix the ring buffer size for queue VM flushPrike Liang3-6/+2
Here are the corrections needed for the queue ring buffer size calculation for the following cases: - Remove the KIQ VM flush ring usage. - Add the invalidate TLBs packet for gfx10 and gfx11 queue. - There's no VM flush and PFP sync, so remove the gfx9 real ring and compute ring buffer usage. Signed-off-by: Prike Liang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-23drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspendLang Yu1-0/+3
umsch test needs full GPU functionality(e.g., VM update, TLB flush, possibly buffer moving under memory pressure) which may be not ready under these states. Just skip it to avoid potential issues. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Veerabadhran Gopalakrishnan <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-23drm/amdgpu: Update BO eviction prioritiesFelix Kuehling1-0/+2
Make SVM BOs more likely to get evicted than other BOs. These BOs opportunistically use available VRAM, but can fall back relatively seamlessly to system memory. It also avoids SVM migrations evicting other, more important BOs as they will evict other SVM allocations first. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Mukul Joshi <[email protected]> Tested-by: Mukul Joshi <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-23drm/amdgpu/vpe: fix vpe dpm setup failedPeyton Lee2-8/+8
The vpe dpm settings should be done before firmware is loaded. Otherwise, the frequency cannot be successfully raised. Signed-off-by: Peyton Lee <[email protected]> Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-23drm/amdgpu: Assign correct bits for SDMA HDP flushLijo Lazar1-1/+2
HDP Flush request bit can be kept unique per AID, and doesn't need to be unique SOC-wide. Assign only bits 10-13 for SDMA v4.4.2. Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-23drm/amdkfd: make sure VM is ready for updating operationsLang Yu1-14/+20
When page table BOs were evicted but not validated before updating page tables, VM is still in evicting state, amdgpu_vm_update_range returns -EBUSY and restore_process_worker runs into a dead loop. v2: Split the BO validation and page table update into two separate loops in amdgpu_amdkfd_restore_process_bos. (Felix) 1.Validate BOs 2.Validate VM (and DMABuf attachments) 3.Update page tables for the BOs validated above Fixes: 50661eb1a2c8 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Signed-off-by: Lang Yu <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-23drm/amdgpu: Fix leak when GPU memory allocation failsMukul Joshi1-0/+1
Free the sync object if the memory allocation fails for any reason. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2024-04-23drm/amdgpu: Update BO eviction prioritiesFelix Kuehling1-0/+2
Make SVM BOs more likely to get evicted than other BOs. These BOs opportunistically use available VRAM, but can fall back relatively seamlessly to system memory. It also avoids SVM migrations evicting other, more important BOs as they will evict other SVM allocations first. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Mukul Joshi <[email protected]> Tested-by: Mukul Joshi <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-23drm/amdgpu/vcn: fix unitialized variable warningsPierre-Eric Pelloux-Prayer4-0/+4
Avoid returning an uninitialized value if we never enter the loop. This case should never be hit in practice, but returning 0 doesn't hurt. The same fix is applied to the 4 places using the same pattern. v2: - fixed typos in commit message (Alex) - use "return 0;" before the done label instead of initializing r to 0 Signed-off-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-23drm/amdgpu/mes11: print MES opcodes rather than numbersAlex Deucher1-4/+74
Makes it easier to review the logs when there are MES errors. v2: use dbg for emitted, add helpers for fetching strings v3: fix missing commas (Harish) v4: drop command prefixes (Felix) v5: squash in bounds fix (Jun) Acked-by: Felix Kuehling <[email protected]> Reviewed by Shaoyun.liu <[email protected]> (v2) Signed-off-by: Alex Deucher <[email protected]>
2024-04-23drm/amdgpu/vpe: fix vpe dpm setup failedPeyton Lee2-8/+8
The vpe dpm settings should be done before firmware is loaded. Otherwise, the frequency cannot be successfully raised. Signed-off-by: Peyton Lee <[email protected]> Reviewed-by: Lang Yu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-23drm/amdgpu: Assign correct bits for SDMA HDP flushLijo Lazar1-1/+2
HDP Flush request bit can be kept unique per AID, and doesn't need to be unique SOC-wide. Assign only bits 10-13 for SDMA v4.4.2. Signed-off-by: Lijo Lazar <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-23drm/amdgpu: Support setting reset_method at runtimeStanley.Yang1-1/+1
In order to support more test cases, support user change reset_method at runtime. Signed-off-by: Stanley.Yang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-04-23Merge drm/drm-next into drm-misc-nextMaxime Ripard80-936/+1940
Maíra needs a backmerge to apply v3d patches, and Danilo for some nouveau patches. Signed-off-by: Maxime Ripard <[email protected]>
2024-04-22drm/amdgpu: Enable clear page functionalityArunpravin Paneer Selvam6-9/+116
Add clear page support in vram memory region. v1(Christian): - Dont handle clear page as TTM flag since when moving the BO back in from GTT again we don't need that. - Make a specialized version of amdgpu_fill_buffer() which only clears the VRAM areas which are not already cleared - Drop the TTM_PL_FLAG_WIPE_ON_RELEASE check in amdgpu_object.c v2: - Modify the function name amdgpu_ttm_* (Alex) - Drop the delayed parameter (Christian) - handle amdgpu_res_cleared(&cursor) just above the size calculation (Christian) - Use AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE for clearing the buffers in the free path to properly wait for fences etc.. (Christian) v3(Christian): - Remove buffer clear code in VRAM manager instead change the AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE handling to set the DRM_BUDDY_CLEARED flag. - Remove ! from amdgpu_res_cleared(&cursor) check. v4(Christian): - vres flag setting move to vram manager file - use dma_fence_get_stub in amdgpu_ttm_clear_buffer function - make fence a mandatory parameter and drop the if and the get/put dance Signed-off-by: Arunpravin Paneer Selvam <[email protected]> Suggested-by: Christian König <[email protected]> Acked-by: Felix Kuehling <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
2024-04-22drm/buddy: Implement tracking clear page featureArunpravin Paneer Selvam1-3/+3
- Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. - Add a function to support defragmentation. v1: - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list(Christian) - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)(Christian) - Defragment the memory beginning from min_order till the required memory space is available. v2: (Matthew) - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks operation within drm buddy. - Write a macro block_incompatible() to allocate the required blocks. - Update the xe driver for the drm_buddy_free_list change in arguments. - add a warning if the two blocks are incompatible on defragmentation - call full defragmentation in the fini() function - place a condition to test if min_order is equal to 0 - replace the list with safe_reverse() variant as we might remove the block from the list. v3: - fix Gitlab user reported lockup issue. - Keep DRM_BUDDY_HEADER_CLEAR define sorted(Matthew) - modify to pass the root order instead max_order in fini() function(Matthew) - change bool 1 to true(Matthew) - add check if min_block_size is power of 2(Matthew) - modify the min_block_size datatype to u64(Matthew) v4: - rename the function drm_buddy_defrag with __force_merge. - Include __force_merge directly in drm buddy file and remove the defrag use in amdgpu driver. - Remove list_empty() check(Matthew) - Remove unnecessary space, headers and placement of new variables(Matthew) - Add a unit test case(Matthew) v5: - remove force merge support to actual range allocation and not to bail out when contains && split(Matthew) - add range support to force merge function. v6: - modify the alloc_range() function clear page non merged blocks allocation(Matthew) - correct the list_insert function name(Matthew). Signed-off-by: Arunpravin Paneer Selvam <[email protected]> Signed-off-by: Matthew Auld <[email protected]> Suggested-by: Christian König <[email protected]> Suggested-by: Matthew Auld <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>