aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/i915_request.c
AgeCommit message (Collapse)AuthorFilesLines
2020-09-30drm/i915: Cancel outstanding work after disabling heartbeats on an engineChris Wilson1-0/+5
We only allow persistent requests to remain on the GPU past the closure of their containing context (and process) so long as they are continuously checked for hangs or allow other requests to preempt them, as we need to ensure forward progress of the system. If we allow persistent contexts to remain on the system after the the hangcheck mechanism is disabled, the system may grind to a halt. On disabling the mechanism, we sent a pulse along the engine to remove all executing contexts from the engine which would check for hung contexts -- but we did not prevent those contexts from being resubmitted if they survived the final hangcheck. Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs") Testcase: igt/gem_ctx_persistence/heartbeat-stop Signed-off-by: Chris Wilson <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: <[email protected]> # v5.7+ Reviewed-by: Tvrtko Ursulin <[email protected]> Acked-by: Joonas Lahtinen <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 7a991cd3e3da9a56d5616b62d425db000a3242f2) Signed-off-by: Rodrigo Vivi <[email protected]>
2020-09-30drm/i915: Redo "Remove i915_request.lock requirement for execution callbacks"Chris Wilson1-10/+2
The reordering and rebasing of commit 2e4c6c1a9db5 ("drm/i915: Remove i915_request.lock requirement for execution callbacks") caused it to revert an earlier correction. Let us restore commit 99f0a640d464 ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs") Fixes: 2e4c6c1a9db5 ("drm/i915: Remove i915_request.lock requirement for execution callbacks") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Joonas Lahtinen <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 35faeb7de9ef83da510a048f2016061f1e31d5fc) Signed-off-by: Rodrigo Vivi <[email protected]>
2020-09-07drm/i915: Remove i915_request.lock requirement for execution callbacksChris Wilson1-47/+58
To implement preempt-to-busy (and so efficient timeslicing and best utilization of the hardware submission ports) we let the GPU run asynchronously in respect to the ELSP submission queue. This created challenges in keeping and accessing the driver state mirroring the asynchronous GPU execution. Previous fix 1d9221e9d395 ("drm/i915: Skip signaling a signaled request") however did not correctly serialize request retirement with the execution callbacks. We were using the i915_request.lock to serialise adding an execution callback with __i915_request_submit. However, if we use an atomic llist_add to serialise multiple waiters and then check to see if the request is already executing, we can remove the irq-spinlock and fix serialization between retirement and execution callbacks in one go. v2: Avoid using the irq_work when outside of the irq-spinlocks, where we can execute the callbacks immediately. v3: Pay close attention to the order of setting ACTIVE on retirement, we need to ensure the request is signaled and breadcrumbs detached before we finish removing the request from the engine. v4: Expanded commit message. Fixes: 1d9221e9d395 ("drm/i915: Skip signaling a signaled request") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added expanded commit message from Tvrtko and Chris] Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Be wary of data races when reading the active execlistsChris Wilson1-2/+23
To implement preempt-to-busy (and so efficient timeslicing and best utilization of the hardware submission ports) we let the GPU run asynchronously in respect to the ELSP submission queue. This created challenges in keeping and accessing the driver state mirroring the asynchronous GPU execution. The latest occurence of this was spotted by KCSAN: [ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915] [ 1413.563221] [ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1: [ 1413.563548] __await_execution+0x217/0x370 [i915] [ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915] [ 1413.564235] i915_request_await_object+0x421/0x490 [i915] [ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915] [ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915] [ 1413.564998] drm_ioctl_kernel+0x156/0x1b0 [ 1413.565022] drm_ioctl+0x2ff/0x480 [ 1413.565046] __x64_sys_ioctl+0x87/0xd0 [ 1413.565069] do_syscall_64+0x4d/0x80 [ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9 To complicate matters, we have to both avoid the read tearing of *active and avoid any write tearing as perform the pending[] -> inflight[] promotion of the execlists. This is because we cannot rely on the memcpy doing u64 aligned copies on all kernels/platforms and so we opt to open-code it with explicit WRITE_ONCE annotations to satisfy KCSAN. v2: When in doubt, write the same comment again. v3: Expanded commit message. Fixes: b55230e5e800 ("drm/i915: Check for awaits on still currently executing requests") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added expanded commit message from Tvrtko and Chris] Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Hold context/request reference while breadcrumbs are activeChris Wilson1-5/+4
Currently we hold no actual reference to the request nor context while they are attached to a breadcrumb. To avoid freeing the request/context too early, we serialise with cancel-breadcrumbs by taking the irq spinlock in i915_request_retire(). The alternative is to take a reference for a new breadcrumb and release it upon signaling; removing the more frequently hit contention point in i915_request_retire(). Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> [Joonas: Rebased and reordered into drm-intel-gt-next branch] Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbsChris Wilson1-0/+1
On the virtual engines, we only use the intel_breadcrumbs for tracking signaling of stale breadcrumbs from the irq_workers. They do not have any associated interrupt handling, active requests are passed to a physical engine and associated breadcrumb interrupt handler. This causes issues for us as we need to ensure that we do not actually try and enable interrupts and the powermanagement required for them on the virtual engine, as they will never be disabled. Instead, let's specify the physical engine used for interrupt handler on a particular breadcrumb. v2: Drop b->irq_armed = true mocking for no interrupt HW Fixes: 4fe6abb8f513 ("drm/i915/gt: Ignore irq enabling on the virtual engines") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbsChris Wilson1-3/+2
After staring at the breadcrumb enabling/cancellation and coming to the conclusion that the cause of the mysterious stale breadcrumbs must the act of submitting a completed requests, we can then redirect those completed requests onto a dedicated signaled_list at the time of construction and so eliminate intel_engine_transfer_stale_breadcrumbs(). Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Remove requirement for holding i915_request.lock for breadcrumbsChris Wilson1-27/+39
Since the breadcrumb enabling/cancelling itself is serialised by the breadcrumbs.irq_lock, with a bit of care we can remove the outer serialisation with i915_request.lock for concurrent dma_fence_enable_signaling(). This has the important side-effect of eliminating the nested i915_request.lock within request submission. The challenge in serialisation is around the unsubmission where we take an active request that wants a breadcrumb on the signaling engine and put it to sleep. We do not want a concurrent dma_fence_enable_signaling() to attach a breadcrumb as we unsubmit, so we must mark the request as no longer active before serialising with the concurrent enable-signaling. On retire, we serialise with the concurrent enable-signaling, but instead of clearing ACTIVE, we mark it as SIGNALED. Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> [Joonas: Rebased and reordered into drm-intel-gt-next branch] Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gem: Remove disordered per-file request list for throttlingChris Wilson1-21/+0
I915_GEM_THROTTLE dates back to the time before contexts where there was just a single engine, and therefore a single timeline and request list globally. That request list was in execution/retirement order, and so walking it to find a particular aged request made sense and could be split per file. That is no more. We now have many timelines with a file, as many as the user wants to construct (essentially per-engine, per-context). Each of those run independently and so make the single list futile. Remove the disordered list, and iterate over all the timelines to find a request to wait on in each to satisfy the criteria that the CPU is no more than 20ms ahead of its oldest request. It should go without saying that the I915_GEM_THROTTLE ioctl is no longer used as the primary means of throttling, so it makes sense to push the complication into the ioctl where it only impacts upon its few irregular users, rather than the execbuf/retire where everybody has to pay the cost. Fortunately, the few users do not create vast amount of contexts, so the loops over contexts/engines should be concise. Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Soften the tasklet flush frequency before waitsChris Wilson1-2/+18
We include a tasklet flush before waiting on a request as a precaution against the HW being lax in event signaling. We now have a precautionary flush in the engine's heartbeat and so do not need to be quite so zealous on every request wait. If we focus on the request, the only tasklet flush that matters is if there is a delay in submitting this request to HW, so if the request is not ready to be executed, no advantage in reducing this wait can be gained by running the tasklet. And there is little point in doing busy work for no result. Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Reduce i915_request.lock contention for i915_request_waitChris Wilson1-10/+8
Currently, we use i915_request_completed() directly in i915_request_wait() and follow up with a manual invocation of dma_fence_signal(). This appears to cause a large number of contentions on i915_request.lock as when the process is woken up after the fence is signaled by an interrupt, we will then try and call dma_fence_signal() ourselves while the signaler is still holding the lock. dma_fence_is_signaled() has the benefit of checking the DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so avoids most of that contention. Signed-off-by: Chris Wilson <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-08-17drm/i915: Remove gen check before calling intel_rps_boostChris Wilson1-5/+2
It's been a while since gen6_rps_boost() [that only worked on gen6+] was replaced by intel_rps_boost() that understood itself when rps was active. Since the intel_rps_boost() is gen-agnostic, just call it. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]>
2020-07-13drm/i915: Skip signaling a signaled requestChris Wilson1-10/+13
Preempt-to-busy introduces various fascinating complications in that the requests may complete as we are unsubmitting them from HW. As they may then signal after unsubmission, we may find ourselves having to cleanup the signaling request from within the signaling callback. This causes us to recurse onto the same i915_request.lock. However, if the request is already signaled (as it will be before we enter the signal callbacks), we know we can skip the signaling of that request during submission, neatly evading the spinlock recursion. unsubmit(ve.rq0) # timeslice expiration or other preemption -> virtual_submit_request(ve.rq0) dma_fence_signal(ve.rq0) # request completed before preemption ack -> submit_notify(ve.rq1) -> virtual_submit_request(ve.rq1) # sees that we have completed ve.rq0 -> __i915_request_submit(ve.rq0) [ 264.210142] BUG: spinlock recursion on CPU#2, sample_multi_tr/2093 [ 264.210150] lock: 0xffff9efd6ac55080, .magic: dead4ead, .owner: sample_multi_tr/2093, .owner_cpu: 2 [ 264.210155] CPU: 2 PID: 2093 Comm: sample_multi_tr Tainted: G U [ 264.210158] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X212.B01.1909060036 09/06/2019 [ 264.210160] Call Trace: [ 264.210167] dump_stack+0x98/0xda [ 264.210174] spin_dump.cold+0x24/0x3c [ 264.210178] do_raw_spin_lock+0x9a/0xd0 [ 264.210184] _raw_spin_lock_nested+0x6a/0x70 [ 264.210314] __i915_request_submit+0x10a/0x3c0 [i915] [ 264.210415] virtual_submit_request+0x9b/0x380 [i915] [ 264.210516] submit_notify+0xaf/0x14c [i915] [ 264.210602] __i915_sw_fence_complete+0x8a/0x230 [i915] [ 264.210692] i915_sw_fence_complete+0x2d/0x40 [i915] [ 264.210762] __dma_i915_sw_fence_wake+0x19/0x30 [i915] [ 264.210767] dma_fence_signal_locked+0xb1/0x1c0 [ 264.210772] dma_fence_signal+0x29/0x50 [ 264.210871] i915_request_wait+0x5cb/0x830 [i915] [ 264.210876] ? dma_resv_get_fences_rcu+0x294/0x5d0 [ 264.210974] i915_gem_object_wait_fence+0x2f/0x40 [i915] [ 264.211084] i915_gem_object_wait+0xce/0x400 [i915] [ 264.211178] i915_gem_wait_ioctl+0xff/0x290 [i915] Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: "Nayana, Venkata Ramana" <[email protected]> Cc: <[email protected]> # v5.4+ Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-06-03drm/i915: Drop i915_request.i915 backpointerChris Wilson1-6/+6
We infrequently use the direct i915 backpointer from the i915_request, so do we really need to waste the space in the struct for it? 8 bytes from the most frequently allocated struct vs an 3 bytes and pointer chasing in using rq->engine->i915? Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Akeem G Abodunrin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-29drm/i915: Check for awaits on still currently executing requestsChris Wilson1-1/+48
With the advent of preempt-to-busy, a request may still be on the GPU as we unwind. And in the case of a unpreemptible [due to HW] request, that request will remain indefinitely on the GPU even though we have returned it back to our submission queue, and cleared the active bit. We only run the execution callbacks on transferring the request from our submission queue to the execution queue, but if this is a bonded request that the HW is waiting for, we will not submit it (as we wait for a fresh execution) even though it is still being executed. As we know that there are always preemption points between requests, we know that only the currently executing request may be still active even though we have cleared the flag. However, we do not precisely know which request is in ELSP[0] due to a delay in processing events, and furthermore we only store the last request in a context in our state tracker. Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Testcase: igt/gem_exec_balancer/bonded-dual Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-29drm/i915: Add a few asserts around handling of i915_request_is_active()Chris Wilson1-2/+3
Let's assert that we only call the execute callbacks on making the request active, and that we do not execute the request without calling the callbacks. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-27drm/i915/gt: Do not schedule normal requests immediately along virtualChris Wilson1-4/+21
When we push a virtual request onto the HW, we update the rq->engine to point to the physical engine. A request that is then submitted by the user that waits upon the virtual engine, but along the physical engine in use, will then see that it is due to be submitted to the same engine and take a shortcut (and be queued without waiting for the completion fence). However, the virtual request may be preempted (either by higher priority users, or by timeslicing) and removed from the physical engine to be migrated over to one of its siblings. The dependent normal request however is oblivious to the removal of the virtual request and remains queued to execute on HW, believing that once it reaches the head of its queue all of its predecessors will have completed executing! v2: Beware restriction of signal->execution_mask prior to submission. Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine") Testcase: igt/gem_exec_balancer/sliced Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: <[email protected]> # v5.3+ Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-27drm/i915: Reorder await_execution before await_requestChris Wilson1-132/+132
Reorder the code so that we can reuse the await_execution from a special case in await_request in the next patch. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-26drm/i915: Improve execute_cb struct packingChris Wilson1-9/+14
Reduce the irq_work llist for attaching the callbacks to the signal for both smaller structs (two fewer pointers!) and simpler [debug] code: Function old new delta irq_execute_cb 35 34 -1 __igt_breadcrumbs_smoketest 1684 1682 -2 i915_request_retire 2003 1996 -7 __i915_request_create 1047 1040 -7 __notify_execute_cb 135 126 -9 __i915_request_ctor 188 178 -10 __await_execution.part.constprop 451 440 -11 igt_wait_request 924 714 -210 One minor artifact is that the order of cb exection is reversed. No current use cases are affected by that change. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-21drm/i915: Avoid using rq->engine after free during i915_fence_releaseChris Wilson1-2/+33
In order to be valid to dereference during the i915_fence_release, after retiring the fence and releasing its refererences, we assume that rq->engine can only be a real engine (that stay intact until the device is shutdown after all fences have been flushed). However, due to a quirk of preempt-to-busy, we may retire a request that still belongs to a virtual engine and so eventually free it with rq->engine being invalid. To avoid dereferencing that invalid engine, we look at the execution_mask which if it indicates it may be executed on more than one engine, we know it originated on a virtual engine and may still be on one. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1906 Fixes: 43acd6516ca9 ("drm/i915: Keep a per-engine request pool") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-14drm/i915: Drop no-semaphore boostingChris Wilson1-36/+4
Now that we have fast timeslicing on semaphores, we no longer need to prioritise none-semaphore work as we will yield any work blocked on a semaphore to the next in the queue. Previously with no timeslicing, blocking on the semaphore caused extremely bad scheduling with multiple clients utilising multiple rings. Now, there is no impact and we can remove the complication. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-13drm/i915: Mark the addition of the initial-breadcrumb in the requestChris Wilson1-1/+6
The initial-breadcrumb is used to mark the end of the awaiting and the beginning of the user payload. We verify that we do not start the user payload before all signaler are completed, checking our semaphore setup by looking for the initial breadcrumb being written too early. We also want to ensure that we do not add semaphore waits after we have already closed the semaphore section, an issue for later deferred waits. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-09drm/i915: Replace the hardcoded I915_FENCE_TIMEOUTChris Wilson1-1/+2
Expose the hardcoded timeout for unsignaled foreign fences as a Kconfig option, primarily to allow brave systems to disable the timeout and solely rely on correct signaling. Signed-off-by: Chris Wilson <[email protected]> Cc: Joonas Lahtinen <[email protected]> Acked-by: Michael J. Ruhl <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-08drm/i915: Prevent using semaphores to chain up to external fencesChris Wilson1-0/+26
The downside of using semaphores is that we lose metadata passing along the signaling chain. This is particularly nasty when we need to pass along a fatal error such as EFAULT or EDEADLK. For fatal errors we want to scrub the request before it is executed, which means that we cannot preload the request onto HW and have it wait upon a semaphore. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-08drm/i915: Peel dma-fence-chains for awaitLionel Landwerlin1-1/+28
To allow faster engine to engine synchronization, peel the layer of dma-fence-chain to expose potential i915 fences so that the i915_request code can emit HW semaphore wait/signal operations in the ring which is faster than waking up the host to submit unblocked workloads after interrupt notification. This is similar to the peeling we do for e.g. dma_fence_array. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-08drm/i915: Pull waiting on an external dma-fence into its routineChris Wilson1-6/+10
As a means for a small code consolidation, but primarily to start thinking more carefully about internal-vs-external linkage, pull the pair of i915_sw_fence_await_dma_fence() calls into a common routine. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-08drm/i915: Ignore submit-fences on the same timelineChris Wilson1-0/+3
While we ordinarily do not skip submit-fences due to the accompanying hook that we want to callback on execution, a submit-fence on the same timeline is meaningless. Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-07drm/i915: Remove wait priority boostingChris Wilson1-3/+0
Upon waiting a request (when asked), we gave that request a small priority boost, not enough for it to cause preemption, but enough for it to be scheduled next before all equals. We also used that bit to give new clients a small priority boost, similar to FQ_CODEL, such that we favoured short interactive tasks ahead of long running streams. However, this is causing lots of complications with timeslicing where we both want to honour the boost and yet ignore it. Those complications cause unexpected user behaviour (tasks not being timesliced and run concurrently as epxected), and the easiest way to resolve that is to remove the boost. Hopefully, we can find a compromise again if we need to, but in theory timeslicing itself and future more advanced schedulers should give us the interactivity boost we seek. Testcase: igt/gem_exec_schedule/lateslice Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-07drm/i915: Mark concurrent submissions with a weak-dependencyChris Wilson1-2/+6
We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could correctly perform priority inheritance from the parallel branches to the common trunk. However, for the purpose of timeslicing and reset handling, the dependency is weak -- as we the pair of requests are allowed to run in parallel and not in strict succession. The real significance though is that this allows us to rearrange groups of WAIT_FOR_SUBMIT linked requests along the single engine, and so can resolve user level inter-batch scheduling dependencies from user semaphores. Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences") Testcase: igt/gem_exec_fence/submit Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: <[email protected]> # v5.6+ Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-06drm/i915: Propagate error from completed fencesChris Wilson1-1/+3
We need to preserve fatal errors from fences that are being terminated as we hook them up. Fixes: ef4688497512 ("drm/i915: Propagate fence errors") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Matthew Auld <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-04-03drm/i915: Keep a per-engine request poolChris Wilson1-5/+22
Add a tiny per-engine request mempool so that we should always have a request available for powermanagement allocations from tricky contexts. This reserve is expected to be only used for kernel contexts when barriers must be emitted [almost] without fail. The main consumer for this reserved request is expected to be engine-pm, for which we know that there will always be at least the previous pm request that we can reuse under mempressure (so there should always be a spare request for engine_park()). This is an alternative to using a comparatively bulky mempool, which requires custom handling for both our reserved allocation requirement and to protect our TYPESAFE_BY_RCU slab cache. The advantage of mempool would be that it would allow us to keep a larger per-engine request pool. However, converting over to mempool is straightforward should the need arise. Signed-off-by: Chris Wilson <[email protected]> Cc: Janusz Krzysztofik <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-and-tested-by: Janusz Krzysztofik <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-23drm/i915: Rely on direct submission to the queueChris Wilson1-2/+0
Drop the pretense of kicking the tasklet (used only for the defunct guc submission backend, it should just take ownership of the submit!) and so remove the bh-kicking from around submission. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-10drm/i915: Mark up racy read of active rq->engineChris Wilson1-3/+5
As a virtual engine may change the rq->engine to point to the active request in flight, we need to warn the compiler that an active request's engine is volatile. [ 95.017686] write (marked) to 0xffff8881e8386b10 of 8 bytes by interrupt on cpu 2: [ 95.018123] execlists_dequeue+0x762/0x2150 [i915] [ 95.018539] __execlists_submission_tasklet+0x48/0x60 [i915] [ 95.018955] execlists_submission_tasklet+0xd3/0x170 [i915] [ 95.018986] tasklet_action_common.isra.0+0x42/0xa0 [ 95.019016] __do_softirq+0xd7/0x2cd [ 95.019043] irq_exit+0xbe/0xe0 [ 95.019068] irq_work_interrupt+0xf/0x20 [ 95.019491] i915_request_retire+0x2c5/0x670 [i915] [ 95.019937] retire_requests+0xa1/0xf0 [i915] [ 95.020348] engine_retire+0xa1/0xe0 [i915] [ 95.020376] process_one_work+0x3b1/0x690 [ 95.020403] worker_thread+0x80/0x670 [ 95.020429] kthread+0x19a/0x1e0 [ 95.020454] ret_from_fork+0x1f/0x30 [ 95.020476] [ 95.020498] read to 0xffff8881e8386b10 of 8 bytes by task 8909 on cpu 3: [ 95.020918] __i915_request_commit+0x177/0x220 [i915] [ 95.021329] i915_gem_do_execbuffer+0x38c4/0x4e50 [i915] [ 95.021750] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 95.021784] drm_ioctl_kernel+0xe4/0x120 [ 95.021809] drm_ioctl+0x297/0x4c7 [ 95.021832] ksys_ioctl+0x89/0xb0 [ 95.021865] __x64_sys_ioctl+0x42/0x60 [ 95.021901] do_syscall_64+0x6e/0x2c0 [ 95.021927] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-10drm/i915: Defer semaphore priority bumping to a workqueueChris Wilson1-5/+17
Since the semaphore fence may be signaled from inside an interrupt handler from inside a request holding its request->lock, we cannot then enter into the engine->active.lock for processing the semaphore priority bump as we may traverse our call tree and end up on another held request. CPU 0: [ 2243.218864] _raw_spin_lock_irqsave+0x9a/0xb0 [ 2243.218867] i915_schedule_bump_priority+0x49/0x80 [i915] [ 2243.218869] semaphore_notify+0x6d/0x98 [i915] [ 2243.218871] __i915_sw_fence_complete+0x61/0x420 [i915] [ 2243.218874] ? kmem_cache_free+0x211/0x290 [ 2243.218876] i915_sw_fence_complete+0x58/0x80 [i915] [ 2243.218879] dma_i915_sw_fence_wake+0x3e/0x80 [i915] [ 2243.218881] signal_irq_work+0x571/0x690 [i915] [ 2243.218883] irq_work_run_list+0xd7/0x120 [ 2243.218885] irq_work_run+0x1d/0x50 [ 2243.218887] smp_irq_work_interrupt+0x21/0x30 [ 2243.218889] irq_work_interrupt+0xf/0x20 CPU 1: [ 2242.173107] _raw_spin_lock+0x8f/0xa0 [ 2242.173110] __i915_request_submit+0x64/0x4a0 [i915] [ 2242.173112] __execlists_submission_tasklet+0x8ee/0x2120 [i915] [ 2242.173114] ? i915_sched_lookup_priolist+0x1e3/0x2b0 [i915] [ 2242.173117] execlists_submit_request+0x2e8/0x2f0 [i915] [ 2242.173119] submit_notify+0x8f/0xc0 [i915] [ 2242.173121] __i915_sw_fence_complete+0x61/0x420 [i915] [ 2242.173124] ? _raw_spin_unlock_irqrestore+0x39/0x40 [ 2242.173137] i915_sw_fence_complete+0x58/0x80 [i915] [ 2242.173140] i915_sw_fence_commit+0x16/0x20 [i915] Closes: https://gitlab.freedesktop.org/drm/intel/issues/1318 Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: <[email protected]> # v5.2+ Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-10drm/i915: Improve the start alignment of bonded pairsChris Wilson1-5/+36
Always wait on the start of the signaler request to reduce the problem of dequeueing the bonded pair too early -- we want both payloads to start at the same time, with no latency, and yet still allow others to make full use of the slack in the system. This reduce the amount of time we spend waiting on the semaphore used to synchronise the start of the bonded payload. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-09drm/i915: Mark racy read of intel_engine_cs.saturatedChris Wilson1-1/+1
[ 3783.276728] BUG: KCSAN: data-race in __i915_request_submit [i915] / i915_request_await_dma_fence [i915] [ 3783.276766] [ 3783.276787] write to 0xffff8881f1bc60a0 of 1 bytes by interrupt on cpu 2: [ 3783.277187] __i915_request_submit+0x47e/0x4a0 [i915] [ 3783.277580] __execlists_submission_tasklet+0x997/0x2780 [i915] [ 3783.277973] execlists_submission_tasklet+0xd3/0x170 [i915] [ 3783.278006] tasklet_action_common.isra.0+0x42/0xa0 [ 3783.278035] __do_softirq+0xd7/0x2cd [ 3783.278063] irq_exit+0xbe/0xe0 [ 3783.278089] do_IRQ+0x51/0x100 [ 3783.278114] ret_from_intr+0x0/0x1c [ 3783.278140] finish_task_switch+0x72/0x260 [ 3783.278170] __schedule+0x1e5/0x510 [ 3783.278198] schedule+0x45/0xb0 [ 3783.278226] smpboot_thread_fn+0x23e/0x300 [ 3783.278256] kthread+0x19a/0x1e0 [ 3783.278283] ret_from_fork+0x1f/0x30 [ 3783.278305] [ 3783.278327] read to 0xffff8881f1bc60a0 of 1 bytes by task 19440 on cpu 3: [ 3783.278724] i915_request_await_dma_fence+0x2a6/0x530 [i915] [ 3783.279130] i915_request_await_object+0x2fe/0x470 [i915] [ 3783.279524] i915_gem_do_execbuffer+0x45dc/0x4c20 [i915] [ 3783.279908] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 3783.279940] drm_ioctl_kernel+0xe4/0x120 [ 3783.279968] drm_ioctl+0x297/0x4c7 [ 3783.279996] ksys_ioctl+0x89/0xb0 [ 3783.280021] __x64_sys_ioctl+0x42/0x60 [ 3783.280047] do_syscall_64+0x6e/0x2c0 [ 3783.280074] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-07drm/i915: Do not poison i915_request.link on removalChris Wilson1-1/+1
Do not poison the timeline link on the i915_request to allow both forward/backward list traversal under RCU. [ 9759.139229] RIP: 0010:active_request+0x2a/0x90 [i915] [ 9759.139240] Code: 41 56 41 55 41 54 55 48 89 fd 53 48 89 f3 48 83 c5 60 e8 49 de dc e0 48 8b 83 e8 01 00 00 48 39 c5 74 12 48 8d 90 20 fe ff ff <48> 8b 80 50 fe ff ff a8 01 74 11 e8 66 20 dd e0 48 89 d8 5b 5d 41 [ 9759.139251] RSP: 0018:ffffc9000014ce80 EFLAGS: 00010012 [ 9759.139260] RAX: dead000000000122 RBX: ffff888817cac040 RCX: 0000000000022000 [ 9759.139267] RDX: deacffffffffff42 RSI: ffff888817cac040 RDI: ffff888851fee900 [ 9759.139275] RBP: ffff888851fee960 R08: 000000000000001a R09: ffffffffa04702e0 [ 9759.139282] R10: ffffffff82187ea0 R11: 0000000000000002 R12: 0000000000000004 [ 9759.139289] R13: ffffffffa04d5179 R14: ffff8887f994ae40 R15: ffff888857b9a068 [ 9759.139296] FS: 0000000000000000(0000) GS:ffff88885ed80000(0000) knlGS:0000000000000000 [ 9759.139304] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9759.139311] CR2: 00007fff5bdec000 CR3: 00000008534fe001 CR4: 00000000001606e0 [ 9759.139318] Call Trace: [ 9759.139325] <IRQ> [ 9759.139389] execlists_reset+0x14d/0x310 [i915] [ 9759.139400] ? _raw_spin_unlock_irqrestore+0xf/0x30 [ 9759.139445] ? fwtable_read32+0x90/0x230 [i915] [ 9759.139499] execlists_submission_tasklet+0xf6/0x150 [i915] [ 9759.139508] tasklet_action_common.isra.17+0x32/0xa0 [ 9759.139516] __do_softirq+0x114/0x3dc [ 9759.139525] ? handle_irq_event_percpu+0x59/0x70 [ 9759.139533] irq_exit+0xa1/0xc0 [ 9759.139540] do_IRQ+0x76/0x150 [ 9759.139547] common_interrupt+0xf/0xf [ 9759.139554] </IRQ> Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-06drm/i915: Assert requests within a context are submitted in orderChris Wilson1-0/+11
Check the flow of requests into the hardware to verify that are submitted in order along their timeline. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-05drm/i915: Return early for await_start on same timelineChris Wilson1-2/+2
Requests within a timeline are ordered by that timeline, so awaiting for the start of a request within the timeline is a no-op. This used to work by falling out of the mutex_trylock() as the signaler and waiter had the same timeline and not returning an error. Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: <[email protected]> # v5.5+ Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-05drm/i915: Actually emit the await_startChris Wilson1-1/+1
Fix the inverted test to emit the wait on the end of the previous request if we /haven't/ already. Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: <[email protected]> # v5.5+ Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-04drm/i915: Apply i915_request_skip() on submissionChris Wilson1-20/+49
Trying to use i915_request_skip() prior to i915_request_add() causes us to try and fill the ring upto request->postfix, which has not yet been set, and so may cause us to memset() past the end of the ring. Instead of skipping the request immediately, just flag the error on the request (only accepting the first fatal error we see) and then clear the request upon submission. Signed-off-by: Chris Wilson <[email protected]> Cc: Matthew Auld <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-03drm/i915/gem: Check that the context wasn't closed during setupChris Wilson1-46/+8
As setup takes a long time, the user may close the context during the construction of the execbuf. In order to make sure we correctly track all outstanding work with non-persistent contexts, we need to serialise the submission with the context closure and mop up any leaks. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-02-28drm/i915/gt: Expose busywait duration to sysfsChris Wilson1-9/+10
We busywait on an inflight request (one that is currently executing on HW, and so might complete quickly) prior to setting up an interrupt and sleeping. The trade off is that we keep an expensive CPU core busy in order to avoid wake up latency: where that trade off should lie is best left to the sysadmin. The busywait mechanism can be compiled out with ./scripts/config --set-val DRM_I915_SPIN_REQUEST 0 The maximum busywait duration can be adjusted per-engine using, /sys/class/drm/card?/engine/*/ms_busywait_duration_ns Signed-off-by: Chris Wilson <[email protected]> Cc: Joonas Lahtinen <[email protected]> Reviewed-by: Steve Carbonari <[email protected]> Tested-by: Steve Carbonari <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-02-28drm/i915: Protect i915_request_await_start from early waitsChris Wilson1-13/+28
We need to be extremely careful inside i915_request_await_start() as it needs to walk the list of requests in the foreign timeline with very little protection. As we hold our own timeline mutex, we can not nest inside the signaler's timeline mutex, so all that remains is our RCU protection. However, to be safe we need to tell the compiler that we may be traversing the list only under RCU protection, and furthermore we need to start declaring requests as elements of the timeline from their construction. Fixes: 9ddc8ec027a3 ("drm/i915: Eliminate the trylock for awaiting an earlier request") Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating") Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-02-20drm/i915: remove the other slab_dependenciesMatthew Auld1-11/+0
The real one can be found in i915_scheduler.c. Signed-off-by: Matthew Auld <[email protected]> Cc: Chris Wilson <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-02-12drm/i915: Poison rings after useChris Wilson1-9/+17
On retiring the request, we should not re-use these elements in the ring (at least not until we fill the ringbuffer and knowingly reuse the space). Leave behind some poison to (hopefully) trap ourselves if we make a mistake. Suggested-by: Mika Kuoppala <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-02-11drm/i915: Disable use of hwsp_cacheline for kernel_contextChris Wilson1-5/+9
Currently on execlists, we use a local hwsp for the kernel_context, rather than the engine's HWSP, as this is the default for execlists. However, seqno wrap requires allocating a new HWSP cacheline, and may require pinning a new HWSP page in the GGTT. This operation requiring pinning in the GGTT is not allowed within the kernel_context timeline, as doing so may require re-entering the kernel_context in order to evict from the GGTT. As we want to avoid requiring a new HWSP for the kernel_context, we can use the permanently pinned engine's HWSP instead. However to do so we must prevent the use of semaphores reading the kernel_context's HWSP, as the use of semaphores do not support rollover onto the same cacheline. Fortunately, the kernel_context is mostly isolated, so unlikely to give benefit to semaphores. Reported-by: Maarten Lankhorst <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-02-05drm/i915: Flush execution tasklets before checking request statusChris Wilson1-1/+2
Rather than flushing the submission tasklets just before we sleep, flush before we check the request status. Ideally this gives us a moment to process the tasklets after sleeping just before we timeout. v2: Compromise by pushing the flush prior to the timeout, but after the check on completion so that we do not further delay the ready client. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-02-03drm/i915: Initialise basic fence before acquiring seqnoChris Wilson1-7/+14
Inside the intel_timeline_get_seqno(), we currently track the retirement of the old cachelines by listening to the new request. This requires that the new request is ready to be used and so requires a minimum bit of initialisation prior to getting the new seqno. Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Matthew Auld <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-01-22drm/i915: Mark the removal of the i915_request from the sched.linkChris Wilson1-0/+2
Keep the rq->fence.flags consistent with the status of the rq->sched.link, and clear the associated bits when decoupling the link on retirement (as we may wish to inspect those flags independent of other state). Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests") References: https://gitlab.freedesktop.org/drm/intel/issues/997 Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]