aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
AgeCommit message (Collapse)AuthorFilesLines
2023-06-15drm/amdgpu: add VM generation tokenChristian König1-1/+1
Instead of using the VRAM lost counter add a 64bit token which indicates if a context or job is still valid to use. Should the VRAM be lost or the page tables need re-creation the token will change indicating that userspace needs to act and re-create the contexts and re-submit the work. Signed-off-by: Christian König <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2023-04-24drm/amdgpu: add gfx shadow CS IOCTL supportChristian König1-0/+6
Add support for submitting the shadow update packet when submitting an IB. Needed for MCBP on GFX11. v2: update API for CSA (Alex) v3: fix ordering; SET_Q_PREEMPTION_MODE most come before COND_EXEC Add missing check for AMDGPU_CHUNK_ID_CP_GFX_SHADOW in amdgpu_cs_pass1() Only initialize shadow on first use (Alex) v4: Pass parameters rather than job to new ring callback (Alex) v5: squash in change to call SET_Q_PREEMPTION_MODE/COND_EXEC before RELEASE_MEM to complete the UMDs use of the shadow (Alex) Reviewed-by: Christian König <[email protected]> Signed-off-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-12-14drm/amdgpu: cleanup SPM support a bitChristian König1-0/+1
This should probably not access job->vm and also emit the SPM switch under the conditional execute. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-12-14drm/amdgpu: fix GDS/GWS/OA switch handlingChristian König1-0/+1
Bas pointed out that this isn't working as expected and could cause crashes. Fix the handling by storing the marker that a switch is needed inside the job instead. Reported-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-11-03drm/amdgpu: use scheduler dependencies for CSChristian König1-1/+0
Entirely remove the sync obj in the job. Signed-off-by: Christian König <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-11-03drm/amdgpu: move explicit sync check into the CSChristian König1-1/+1
This moves the memory allocation out of the critical code path. Signed-off-by: Christian König <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-11-03drm/amdgpu: cleanup scheduler job initialization v2Christian König1-6/+8
Init the DRM scheduler base class while allocating the job. This makes the whole handling much more cleaner. v2: fix coding style Signed-off-by: Christian König <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-09-20drm/amdgpu: add gang submit backend v2Christian König1-0/+3
Allows submitting jobs as gang which needs to run on multiple engines at the same time. Basic idea is that we have a global gang submit fence representing when the gang leader is finally pushed to run on the hardware last. Jobs submitted as gang are never re-submitted in case of a GPU reset since this won't work and will just deadlock the hardware immediately again. v2: fix logic inversion, improve documentation, fix rcu Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-09-19drm/amdgpu: move entity selection and job init earlier during CSChristian König1-0/+5
Initialize the entity for the CS and scheduler job much earlier. v2: fix job initialisation order and use correct scheduler instance Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-09-13drm/amdgpu: move setting the job resourcesChristian König1-0/+2
Move setting the job resources into amdgpu_job.c Signed-off-by: Christian König <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-07-18drm/amdgpu: Get rid of amdgpu_job->external_hw_fenceAndrey Grodzovsky1-1/+0
This is a follow-up cleanup to [1]. See bellow refcount balancing for calling amdgpu_job_submit_direct after this cleanup as far as I calculated. amdgpu_fence_emit dma_fence_init 1 dma_fence_get(fence) 2 rcu_assign_pointer(*ptr, dma_fence_get(fence) 3 ---> amdgpu_job_submit_direct completes before fence signaled amdgpu_sa_bo_free (*sa_bo)->fence = dma_fence_get(fence) 4 amdgpu_job_free dma_fence_put 3 amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 4 dma_fence_put(f); 3 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 2 amdgpu_fence_process dma_fence_put 1 amdgpu_sa_bo_remove_locked dma_fence_put 0 ---> amdgpu_job_submit_direct completes after fence signaled amdgpu_fence_process dma_fence_put 2 amdgpu_job_free dma_fence_put 1 amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 2 dma_fence_put(f); 1 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 0 [1] - https://patchwork.kernel.org/project/dri-devel/cover/[email protected]/ Signed-off-by: Andrey Grodzovsky <[email protected]> Suggested-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-03-04drm/amdgpu: properly embed the IBs into the jobChristian König1-2/+4
We now have standard macros for that. Signed-off-by: Christian König <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2022-03-04drm/amdgpu: header cleanupChristian König1-0/+3
No function change, just move a bunch of definitions from amdgpu.h into separate header files. Signed-off-by: Christian König <[email protected]> Acked-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-08-16drm/amd/amdgpu embed hw_fence into amdgpu_jobJack Zhang1-1/+5
Why: Previously hw fence is alloced separately with job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime, and simplify the design of gpu-scheduler. How: We propose to embed hw_fence into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is embedded in a job. v3: remove redundant variable ring in amdgpu_job v4: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. v5 add missing handling in amdgpu_fence_enable_signaling Signed-off-by: Jingwen Chen <[email protected]> Signed-off-by: Jack Zhang <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Reviewed by: Monk Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-28drm/amdgpu: Move to a per-IB secure flag (TMZ)Luben Tuikov1-3/+0
Move from a per-CS secure flag (TMZ) to a per-IB secure flag. Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-28drm/amdgpu: job is secure iff CS is secure (v5)Huang Rui1-0/+2
Mark a job as secure, if and only if the command submission flag has the secure flag set. v2: fix the null job pointer while in vmid 0 submission. v3: Context --> Command submission. v4: filling cs parser with cs->in.flags v5: move the job secure flag setting out of amdgpu_cs_submit() Signed-off-by: Huang Rui <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-04-01drm/amdgpu: implement more ib pools (v2)xinhui pan1-2/+2
We have three ib pools, they are normal, VM, direct pools. Any jobs which schedule IBs without dependence on gpu scheduler should use DIRECT pool. Any jobs schedule direct VM update IBs should use VM pool. Any other jobs use NORMAL pool. v2: squash in coding style fix Signed-off-by: xinhui pan <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-01-16drm/amdgpu: drop amdgpu_job.ownerChristian König1-1/+0
Entirely unused. Signed-off-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-09-13drm/amdgpu: Avoid HW GPU reset for RAS.Andrey Grodzovsky1-0/+3
Problem: Under certain conditions, when some IP bocks take a RAS error, we can get into a situation where a GPU reset is not possible due to issues in RAS in SMU/PSP. Temporary fix until proper solution in PSP/SMU is ready: When uncorrectable error happens the DF will unconditionally broadcast error event packets to all its clients/slave upon receiving fatal error event and freeze all its outbound queues, err_event_athub interrupt will be triggered. In such case and we use this interrupt to issue GPU reset. THe GPU reset code is modified for such case to avoid HW reset, only stops schedulers, deatches all in progress and not yet scheduled job's fences, set error code on them and signals. Also reject any new incoming job submissions from user space. All this is done to notify the applications of the problem. v2: Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c Remove print param from amdgpu_ras_query_error_count v3: Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset for other XGMI hive memebers. Signed-off-by: Andrey Grodzovsky <[email protected]> Acked-by: Felix Kuehling <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-06-21drm/amdgpu: add ib preemption status in amdgpu_job (v2)Jack Xiao1-0/+3
Add ib preemption status in amdgpu_job, so that ring level function can detect preemption and program for resuming it. v2: squash in fix to restore job->preamble_status back to status value (Jack) Acked-by: Alex Deucher <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Jack Xiao <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-11-05drm/amdgpu: Modify the argument of emit_ib interfaceRex Zhu1-0/+2
use the point of struct amdgpu_job as the function argument instand of vmid, so the other members of struct amdgpu_job can be visit in emit_ib function. v2: add a wrapper for getting the VMID add the job before the ib on the parameter list. v3: refine the wrapper name Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Rex Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-07-17drm/amdgpu: remove job->adev (v2)Christian König1-1/+0
We can get that from the ring. v2: squash in "drm/amdgpu: always initialize job->base.sched" (Alex) Signed-off-by: Christian König <[email protected]> Reviewed-by: Junwei Zhang <[email protected]> Acked-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-07-16drm/amdgpu: add amdgpu_job_submit_direct helperChristian König1-0/+4
Make sure that we properly initialize at least the sched member. Signed-off-by: Christian König <[email protected]> Reviewed-by: Junwei Zhang <[email protected]> Acked-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-07-16drm/amdgpu: remove job->ringChristian König1-1/+0
We can easily get that from the scheduler. Signed-off-by: Christian König <[email protected]> Reviewed-by: Junwei Zhang <[email protected]> Acked-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-07-16drm/amdgpu: remove ring parameter from amdgpu_job_submitChristian König1-3/+2
We know the ring through the entity anyway. Signed-off-by: Christian König <[email protected]> Reviewed-by: Junwei Zhang <[email protected]> Acked-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-07-16drm/amdgpu: remove fence context from the jobChristian König1-1/+0
Can be obtained directly from the fence as well. Signed-off-by: Christian König <[email protected]> Reviewed-by: Junwei Zhang <[email protected]> Acked-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-07-16drm/amdgpu: cleanup job headerChristian König1-0/+74
Move job related defines, structure and function declarations to amdgpu_job.h Signed-off-by: Christian König <[email protected]> Reviewed-by: Junwei Zhang <[email protected]> Acked-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>