path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
Age  Commit message  Author  Files  Lines
2017-05-24drm/amdgpu/SRIOV:implement guilty job TDR for(V2)Monk Liu1-0/+6
1. TDR kicks out a guilty job once its hang count exceeds the threshold given by the kernel parameter "job_hang_limit", so a bad command stream cannot cause GPU hangs indefinitely. By default this threshold is 1, so a job is kicked out after it hangs.
2. When a job times out, the TDR routine does not reset every sched/ring; it resets only the one indicated by @job of amdgpu_sriov_gpu_reset, so we don't need to reset and recover each sched/ring when we already know which job caused the GPU hang.
3. Unblock sriov_gpu_reset for the AI family.

V2:
1. Kick out the guilty job after the scheduler is parked.
2. Since parking the scheduler before the kickout already takes a while, we can do a last check on the job in question before doing hw_reset.

TODO:
1. When a job is considered guilty, we should mark a flag in its fence status flags so the UMD side is aware that the fence was signaled because the job hung, not because it completed.
2. If a GPU reset causes all video memory to be lost, we need to introduce a new policy to implement TDR, such as dropping all jobs not yet signaled and returning ERROR DEVICE_LOST from every IOCTL on this device. This will be implemented later.

Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
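The threshold rule in point 1 can be sketched in plain C. This is only an illustration: `demo_job`, `hang_count`, and `demo_job_record_hang` are hypothetical names, and the comparison (guilty once the count reaches the limit) is an assumption chosen so that a limit of 1 kicks a job out after its first hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scheduler job; not the driver's structure. */
struct demo_job {
    int hang_count;  /* how many times this job was blamed for a timeout */
    bool guilty;     /* set once the job should be dropped, not resubmitted */
};

/*
 * Record one timeout against the job and report whether it is now guilty.
 * Assumption: the job becomes guilty once hang_count reaches hang_limit,
 * so a limit of 1 drops the job after its first hang. The comparison used
 * by the real driver may differ.
 */
static bool demo_job_record_hang(struct demo_job *job, int hang_limit)
{
    assert(hang_limit > 0);
    job->hang_count++;
    if (job->hang_count >= hang_limit)
        job->guilty = true;
    return job->guilty;
}
```

With a limit of 1 the first call already returns true; with a limit of 2 the job survives one timeout and is dropped on the second.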
2017-05-24drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset (v2)Monk Liu1-0/+6
Because we don't want to do sriov-gpu-reset in certain cases, split those two functions and don't invoke the SR-IOV one from the bare-metal one. V2: remove the debugfs_gpu_reset routine in the SRIOV case. Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-11-11drm/amdgpu:no gpu scheduler for KIQTrigger Huang1-18/+21
KIQ is used for interaction between driver and CP, and not exposed to outside client, as such it doesn't need to be handled by GPU scheduler. Signed-off-by: Monk Liu <[email protected]> Signed-off-by: Xiangliang Yu <[email protected]> Signed-off-by: Trigger Huang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-11-07Backmerge tag 'v4.9-rc4' into drm-nextDave Airlie1-0/+1
Linux 4.9-rc4 This is needed for nouveau development.
2016-10-25dma-buf: Rename struct fence to dma_fenceChris Wilson1-29/+29
I plan to usurp the short name of struct fence for a core kernel struct, and so I need to rename the specialised fence/timeline for DMA operations to make room. A consensus was reached in https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html that making clear this fence applies to DMA operations was a good thing. Since then the patch has grown a bit as usage increases, so hopefully it remains a good thing!

(v2...: rebase, rerun spatch)
v3: Compile on msm, spotted a manual fixup that I broke.
v4: Try again for msm, sorry Daniel

coccinelle script:

@@
@@
- struct fence
+ struct dma_fence

@@
@@
- struct fence_ops
+ struct dma_fence_ops

@@
@@
- struct fence_cb
+ struct dma_fence_cb

@@
@@
- struct fence_array
+ struct dma_fence_array

@@
@@
- enum fence_flag_bits
+ enum dma_fence_flag_bits

@@
@@
(
- fence_init
+ dma_fence_init
|
- fence_release
+ dma_fence_release
|
- fence_free
+ dma_fence_free
|
- fence_get
+ dma_fence_get
|
- fence_get_rcu
+ dma_fence_get_rcu
|
- fence_put
+ dma_fence_put
|
- fence_signal
+ dma_fence_signal
|
- fence_signal_locked
+ dma_fence_signal_locked
|
- fence_default_wait
+ dma_fence_default_wait
|
- fence_add_callback
+ dma_fence_add_callback
|
- fence_remove_callback
+ dma_fence_remove_callback
|
- fence_enable_sw_signaling
+ dma_fence_enable_sw_signaling
|
- fence_is_signaled_locked
+ dma_fence_is_signaled_locked
|
- fence_is_signaled
+ dma_fence_is_signaled
|
- fence_is_later
+ dma_fence_is_later
|
- fence_later
+ dma_fence_later
|
- fence_wait_timeout
+ dma_fence_wait_timeout
|
- fence_wait_any_timeout
+ dma_fence_wait_any_timeout
|
- fence_wait
+ dma_fence_wait
|
- fence_context_alloc
+ dma_fence_context_alloc
|
- fence_array_create
+ dma_fence_array_create
|
- to_fence_array
+ to_dma_fence_array
|
- fence_is_array
+ dma_fence_is_array
|
- trace_fence_emit
+ trace_dma_fence_emit
|
- FENCE_TRACE
+ DMA_FENCE_TRACE
|
- FENCE_WARN
+ DMA_FENCE_WARN
|
- FENCE_ERR
+ DMA_FENCE_ERR
)
 (
	...
 )

Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Gustavo Padovan <[email protected]> Acked-by: Sumit Semwal <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2016-10-24drm/amdgpu: fix fence slab teardownGrazvydas Ignotas1-0/+1
To free fences, call_rcu() is used, which calls amdgpu_fence_free() after a grace period. During teardown, there is no guarantee all callbacks have finished, so amdgpu_fence_slab may be destroyed before all fences have been freed. If we are lucky, this results in some slab warnings, if not, we get a crash in one of rcu threads because callback is called after amdgpu has already been unloaded. Fix it with a rcu_barrier(). Fixes: b44135351a3a ("drm/amdgpu: RCU protected amdgpu_fence_release") Acked-by: Chunming Zhou <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Grazvydas Ignotas <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
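The ordering problem described here can be modelled in userspace C. The sketch below is not kernel code: `demo_free_deferred` stands in for call_rcu(), `demo_barrier` for rcu_barrier(), and `demo_pool` for the fence slab; all of these names are hypothetical. It only shows why pending deferred frees must be flushed before the pool is torn down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_PENDING 16

/* Hypothetical stand-in for the fence slab: counts objects not yet freed. */
struct demo_pool {
    int live;
};

/* One queued "free this object later" request. */
struct demo_deferred {
    struct demo_pool *pool;
    void *obj;
};

static struct demo_deferred demo_pending[DEMO_MAX_PENDING];
static size_t demo_npending;

static void *demo_pool_alloc(struct demo_pool *pool)
{
    void *obj = malloc(16);
    if (obj)
        pool->live++;
    return obj;
}

/* call_rcu() analogue: the free is queued, not performed immediately. */
static void demo_free_deferred(struct demo_pool *pool, void *obj)
{
    assert(demo_npending < DEMO_MAX_PENDING);
    demo_pending[demo_npending].pool = pool;
    demo_pending[demo_npending].obj = obj;
    demo_npending++;
}

/* rcu_barrier() analogue: run every queued callback before returning. */
static void demo_barrier(void)
{
    for (size_t i = 0; i < demo_npending; i++) {
        free(demo_pending[i].obj);
        demo_pending[i].pool->live--;
    }
    demo_npending = 0;
}

/* Returns 0 if the pool is safe to destroy, -1 if objects are outstanding. */
static int demo_pool_destroy(const struct demo_pool *pool)
{
    return pool->live == 0 ? 0 : -1;
}
```

Calling `demo_pool_destroy()` right after `demo_free_deferred()` reports outstanding objects; calling `demo_barrier()` first makes teardown safe, which is the role rcu_barrier() plays in the fix.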
2016-09-27drm/amdgpu: don't leave dangling pointers aroundGrazvydas Ignotas1-0/+1
Right now it's possible to trigger a fence_drv.fences[] dereference after the array has been freed. While the real problem is elsewhere, this still results in confusing errors that depend on how the freed memory was reused (I've seen "kernel tried to execute NX-protected page"). It's better to clear the pointers and get a NULL dereference so that it's obvious what's going wrong. Signed-off-by: Grazvydas Ignotas <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
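The pattern is the usual "clear the pointer after freeing it". A minimal userspace sketch, with hypothetical names (`demo_fence_drv`, `demo_fence_drv_init`, `demo_fence_drv_fini`) and malloc-family calls standing in for the kernel allocators:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-ring fence driver state. */
struct demo_fence_drv {
    void **fences;        /* array of fence slots */
    unsigned num_slots;
};

static int demo_fence_drv_init(struct demo_fence_drv *drv, unsigned num_slots)
{
    assert(num_slots > 0);
    drv->fences = calloc(num_slots, sizeof(*drv->fences));
    if (!drv->fences)
        return -1;
    drv->num_slots = num_slots;
    return 0;
}

static void demo_fence_drv_fini(struct demo_fence_drv *drv)
{
    free(drv->fences);
    /* A stale user now faults on NULL instead of reading reused memory. */
    drv->fences = NULL;
    drv->num_slots = 0;
}
```

Clearing the pointer also makes a second `demo_fence_drv_fini()` call harmless, since `free(NULL)` is a no-op.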
2016-07-14drm/amdgpu: always signal all fencesChristian König1-2/+5
A little fallout from "drm/amdgpu: sanitize fence numbers", we sometimes need to signal all fences in the ring. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-07-14drm/amdgpu: sanitize fence numbersChristian König1-2/+8
Looks like the VCE block sometimes still sends nonsense fence numbers on startup. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
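One way to make fence processing robust against a nonsense value is to reduce both sequence numbers to slot indices before walking the slot array, so the walk can never exceed the array size. The sketch below assumes that approach; the commit message does not spell out the fix, and `demo_fence_drv`, `demo_process`, and `DEMO_NUM_SLOTS` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_SLOTS 4u  /* must be a power of two */

struct demo_fence {
    uint32_t seq;
    bool signaled;
};

/* Hypothetical per-ring state: one slot per in-flight fence. */
struct demo_fence_drv {
    struct demo_fence *fences[DEMO_NUM_SLOTS];
    uint32_t last_seq;  /* last sequence number already processed */
};

/*
 * Signal every fence between the last processed value and hw_seq.
 * Both values are masked first, so even a garbage hw_seq such as
 * 0xdeadbeef walks at most DEMO_NUM_SLOTS - 1 slots.
 * Returns the number of fences signaled.
 */
static unsigned demo_process(struct demo_fence_drv *drv, uint32_t hw_seq)
{
    const uint32_t mask = DEMO_NUM_SLOTS - 1;
    uint32_t last = drv->last_seq & mask;
    uint32_t seq = hw_seq & mask;
    unsigned signaled = 0;

    assert((DEMO_NUM_SLOTS & mask) == 0);  /* power-of-two check */
    drv->last_seq = hw_seq;

    while (last != seq) {
        last = (last + 1) & mask;
        struct demo_fence *f = drv->fences[last];
        drv->fences[last] = NULL;
        if (f) {
            f->signaled = true;
            signaled++;
        }
    }
    return signaled;
}
```

Without the masking, a loop that counts from `last_seq` up to a bogus `hw_seq` could run for billions of iterations.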
2016-05-25drm/amdgpu: fix bug in fence driver finiMonk Liu1-1/+1
Using wrong counter for walking fences. Fixes a crash when unloading the driver. Signed-off-by: Monk Liu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-05-13drm/amdgpu: create fence slab once when amdgpu module init.Rex Zhu1-11/+14
This avoids problems with multiple GPUs. For example, if the first GPU failed before amdgpu_fence_init() was called, amdgpu_fence_slab_ref is still 0 and it will get decremented in amdgpu_fence_driver_fini(). This will lead to a crash during init of the second GPU since amdgpu_fence_slab_ref is not 0. v2: add functions for init/exit instead of moving the variables into the driver. Signed-off-by: Rex Zhu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-05-04drm/amdgpu: Replace rcu_assign_pointer() with RCU_INIT_POINTER()Muhammad Falak R Wani1-1/+1
rcu_assign_pointer() ensures that the initialization of a structure is carried out before a pointer to that structure is stored. It is always safe to use RCU_INIT_POINTER() instead of rcu_assign_pointer() to NULL a pointer. This results in slightly smaller/faster code. The following semantic patch was used:

<smpl>
@@
@@
- rcu_assign_pointer
+ RCU_INIT_POINTER
  (..., NULL)
</smpl>

Reviewed-by: Christian König <[email protected]> Signed-off-by: Muhammad Falak R Wani <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
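The distinction can be illustrated with C11 atomics in userspace. This is an analogy, not the kernel macros: the release store plays the role of rcu_assign_pointer(), the relaxed store of NULL plays the role of RCU_INIT_POINTER(), and the acquire load is a conservative stand-in for rcu_dereference(). All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_fence {
    unsigned seq;
};

/* Shared slot that readers may load concurrently. */
static _Atomic(struct demo_fence *) demo_slot;

/*
 * Publishing a new object needs ordering: the write to f->seq must be
 * visible before the pointer is, hence the release store.
 */
static void demo_publish(struct demo_fence *f, unsigned seq)
{
    assert(f != NULL);
    f->seq = seq;
    atomic_store_explicit(&demo_slot, f, memory_order_release);
}

/*
 * Storing NULL publishes no object contents, so no ordering is needed
 * and a relaxed store is sufficient.
 */
static void demo_clear(void)
{
    atomic_store_explicit(&demo_slot, (struct demo_fence *)NULL,
                          memory_order_relaxed);
}

/* Reader side: pairs with the release store in demo_publish(). */
static struct demo_fence *demo_read(void)
{
    return atomic_load_explicit(&demo_slot, memory_order_acquire);
}
```

The saving comes from skipping the barrier on the NULL store, which is why the commit describes the result as slightly smaller and faster code.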
2016-05-04drm/amdgpu: double fence slotChunming Zhou1-2/+2
We introduced the vmid fence, so one hw submission can produce two fences. Signed-off-by: Chunming Zhou <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-05-04drm/amdgpu: Mark all instances of struct drm_info_list as constNils Wallménius1-1/+1
All these are compile-time constants and the drm_debugfs_create/remove_files functions take a const pointer argument. Reviewed-by: Christian König <[email protected]> Signed-off-by: Nils Wallménius <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-04-01drm/amdgpu: fence wait old rcu slotChunming Zhou1-2/+6
The number of RCU slots is initialized to num_hw_submission; if a command submission doesn't go through the scheduler, as in the UVD test, this limit is not enforced. Signed-off-by: Chunming Zhou <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-03-16drm/amdgpu: switch back to 32bit hw fences v2Christian König1-30/+19
We don't need to extend them to 64bits any more, so avoid the extra overhead. v2: update commit message. Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
2016-03-16drm/amdgpu: remove amdgpu_fence_is_signaledChristian König1-25/+0
It's just overhead to check the fence value when we signal them directly anyway. Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
2016-03-16drm/amdgpu: drop the extra fence range check v2Christian König1-3/+0
Amdgpu doesn't support using scratch registers for fences any more. So we won't see values like 0xdeadbeef as fence value any more. v2: reschedule timer even if no change detected Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
2016-03-16drm/amdgpu: signal fences directly in amdgpu_fence_processChristian König1-67/+30
Because of the scheduler we need to signal all fences immediately anyway, so try to avoid the waitqueue overhead. Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
2016-03-16drm/amdgpu: cleanup amdgpu_fence_wait_empty v2Christian König1-54/+15
Just wait for last fence instead of waiting for the sequence manually. v2: don't use amdgpu_sched_jobs for the mask Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
2016-03-16drm/amdgpu: keep all fences in an RCU protected array v2Christian König1-3/+29
Just keep all HW fences in a RCU protected array as a first step to replace the wait queue. v2: update commit message, move fixes into separate patch. Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
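A slot array of this kind is typically indexed by the sequence number masked to a power-of-two size. The userspace sketch below shows only that indexing scheme and leaves out the RCU protection entirely; `demo_fence_drv`, `demo_emit`, `demo_lookup`, and `DEMO_NUM_SLOTS` are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_SLOTS 4u  /* must be a power of two */

struct demo_fence {
    uint32_t seq;
    bool signaled;
};

/* Hypothetical per-ring state: every in-flight fence lives in one slot. */
struct demo_fence_drv {
    struct demo_fence *fences[DEMO_NUM_SLOTS];
    uint32_t sync_seq;  /* last sequence number handed out */
};

/*
 * Assign the next sequence number to f and remember it in its slot.
 * Returns 0 on success, -1 if the slot is still occupied by an older
 * fence (the real driver would wait for that fence instead).
 */
static int demo_emit(struct demo_fence_drv *drv, struct demo_fence *f)
{
    uint32_t seq = drv->sync_seq + 1;
    uint32_t slot = seq & (DEMO_NUM_SLOTS - 1);

    assert(f != NULL);
    if (drv->fences[slot])
        return -1;

    f->seq = seq;
    f->signaled = false;
    drv->fences[slot] = f;
    drv->sync_seq = seq;
    return 0;
}

/* Find the in-flight fence for seq, or NULL if its slot holds another. */
static struct demo_fence *demo_lookup(struct demo_fence_drv *drv, uint32_t seq)
{
    struct demo_fence *f = drv->fences[seq & (DEMO_NUM_SLOTS - 1)];
    return (f && f->seq == seq) ? f : NULL;
}
```

Because any fence can be found from its sequence number alone, code that processes completions can signal fences directly from the array rather than waking waiters through a wait queue.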
2016-03-16drm/amdgpu: add number of hardware submissions to amdgpu_fence_driver_init_ringChristian König1-2/+8
Make this a parameter instead of using the global variable directly. Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
2016-03-16drm/amdgpu: RCU protected amdgpu_fence_releaseChristian König1-1/+22
Fences must be freed RCU protected, otherwise the reservation_object_*_rcu() functions can run into problems. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
2016-03-16drm/amdgpu: merge amdgpu_fence_process and _activityChristian König1-19/+5
No need to keep the two separate any more. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
2016-03-16drm/amdgpu: cleanup amdgpu_fence_activityChristian König1-32/+3
The comment about the loop counter was never valid, even when you have multiple threads this loop only runs as long as the sequence increases. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
2016-03-14drm/amdgpu: move fence structure into amdgpu_fence.cChristian König1-1/+25
No need to have that in the header file any more. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-03-14drm/amdgpu: remove amdgpu_fence_wait_nextChristian König1-20/+0
Not used any more. Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-03-08drm/amdgpu: return the common fence from amdgpu_fence_emitChristian König1-13/+14
Try to avoid using the hardware specific fences even more. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
2016-03-08drm/amdgpu: remove HW fence ownerChristian König1-5/+1
Not used any more since we now always use the scheduler. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
2016-02-16drm/amdgpu: remove fence reset detection leftoversChristian König1-10/+4
wait_event() never returns before the fence was signaled. Signed-off-by: Christian König <[email protected]> Acked-by: Alex Deucher <[email protected]>
2016-02-10drm/amdgpu: remove the ring lock v2Christian König1-6/+0
It's not needed any more because all access goes through the scheduler now. v2: Update commit message. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-02-10drm/amdgpu: add a debugfs property to trigger a GPU resetAlex Deucher1-1/+19
Ported from similar code in radeon. Reviewed-by: Junwei Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Ken Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-02-10drm/amdgpu: cleanup sync_seq handlingChristian König1-83/+11
Not used any more without semaphores Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2016-02-10drm/amdgpu: clean up non-scheduler code path (v2)Chunming Zhou1-20/+19
Non-scheduler code is no longer supported. v2: agd: rebased on upstream Signed-off-by: Chunming Zhou <[email protected]> Reviewed-by: Ken Wang <[email protected]> Reviewed-by: Monk Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2015-11-16drm/amdgpu: add kmem cache for amdgpu fenceChunming Zhou1-2/+20
Change-Id: I5ad8dd156ccf27a6f18004aa0a215a0925b6e67b Signed-off-by: Chunming Zhou <[email protected]> Reviewed-by: Christian König <[email protected]>
2015-11-16drm/amdgpu: use a timer for fence fallbackChristian König1-44/+34
Less overhead than a work item and also adds proper cleanup handling. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Acked-by: Alex Deucher <[email protected]>
2015-11-16drm/amdgpu: remove fence trace pointsChristian König1-1/+0
Mostly unused and replaced by the common trace points. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Acked-by: Alex Deucher <[email protected]>
2015-11-04drm/amdgpu: group together common fence implementationChristian König1-97/+109
And also add some missing function documentation. No functional change. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
2015-11-04drm/amdgpu: fix fence fallback checkChristian König1-0/+2
Interrupts are notoriously unreliable, so enable the fallback in a couple more places. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
2015-10-30drm/amdgpu: remove amdgpu_fence_ref/unrefChristian König1-30/+0
Just move the remaining users to fence_put/get. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
2015-10-30drm/amdgpu: switch to common fence_wait_any_timeout v2Christian König1-98/+0
No need to duplicate the functionality any more. v2: fix handling if no fence is available. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> (v1)
2015-10-30drm/amdgpu: remove unneeded fence functionsChristian König1-12/+1
amdgpu_fence_default_wait isn't needed any more because the default wait does the same thing, and amdgpu_test_signaled is dead as well. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
2015-10-21drm/amdgpu: remove the unnecessary parameter adev for amdgpu_fence_wait_any()Junwei Zhang1-7/+2
Signed-off-by: Junwei Zhang <[email protected]> Reviewed-by: Christian König <[email protected]>
2015-10-21drm/amdgpu: remove the exclusive lockChristian König1-16/+5
Finally getting rid of it. Signed-off-by: Christian König <[email protected]>
2015-10-21drm/amdgpu: remove old lockup detection infrastructureChristian König1-19/+1
It didn't work too well anyway. Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Reviewed-by: Junwei Zhang <[email protected]>
2015-10-14drm/amdgpu: add timer to fence to detect scheduler lockupJunwei Zhang1-1/+13
Change-Id: I67e987db0efdca28faa80b332b75571192130d33 Signed-off-by: Junwei Zhang <[email protected]> Reviewed-by: David Zhou <[email protected]> Reviewed-by: Christian König <[email protected]>
2015-09-23drm/amdgpu: more scheduler cleanups v2Christian König1-11/+12
Embed the scheduler into the ring structure instead of allocating it. Use the ring name directly instead of the id. v2: rebased, whitespace cleanup Signed-off-by: Christian König <[email protected]> Reviewed-by: Junwei Zhang <[email protected]> Reviewed-by: Chunming Zhou <[email protected]>
2015-09-23drm/amdgpu: cleanup fence queue init v2Christian König1-0/+2
Move the fence-related stuff into amdgpu_fence.c v2: rework commit message, because this is actually not a bug Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Reviewed-by: Junwei Zhang <[email protected]>
2015-09-23drm/amdgpu: rename fence->scheduler to sched v2Christian König1-7/+7
Just to be consistent with the other members. v2: rename the ring member as well. Signed-off-by: Christian König <[email protected]> Reviewed-by: Junwei Zhang <[email protected]> (v1) Reviewed-by: Chunming Zhou <[email protected]>
2015-09-02drm/amdgpu: partially revert "modify amdgpu_fence_wait_any() to amdgpu_fence_wait_multiple()" v2Christian König1-35/+9
That isn't used any more. v2: rebase Signed-off-by: Christian König <[email protected]> Reviewed-by: Chunming Zhou <[email protected]> Signed-off-by: Alex Deucher <[email protected]>