From 9981927cc9e10fb4dbc24b2a5f8c17ab133859a0 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed, 15 May 2019 14:00:50 +0100
Subject: drm/i915: Bump signaler priority on adding a waiter

The handling of the no-preemption priority level imposes the restriction
that we need to maintain the implied ordering even though preemption is
disabled. Otherwise we may end up with an AB-BA deadlock across multiple
engine due to a real preemption event reordering the no-preemption
WAITs. To resolve this issue we currently promote all requests to WAIT
on unsubmission, however this interferes with the timeslicing
requirement that we do not apply any implicit promotion that will defeat
the round-robin timeslice list. (If we automatically promote the active
request it will go back to the head of the queue and not the tail!)

So we need implicit promotion to prevent reordering around semaphores
where we are not allowed to preempt, and we must avoid implicit
promotion on unsubmission. So instead of at unsubmit, if we apply that
implicit promotion on adding the dependency, we avoid the semaphore
deadlock and we also reduce the gains made by the promotion for user
space waiting. Furthermore, by keeping the earlier dependencies at a
higher level, we reduce the search space for timeslicing without
altering runtime scheduling too badly (no dependencies at all will be
assigned a higher priority for rrul).

v2: Limit the bump to external edges (as originally intended) i.e.
between contexts and out to the user.

Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
(cherry picked from commit 6e7eb7a80769e7250e31652b96918cf7f3e0d285)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 9 ---------
 1 file changed, 9 deletions(-)

(limited to 'drivers/gpu/drm/i915/i915_request.c')

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f6c78c0fa74b..f258281bf3f3 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -502,15 +502,6 @@ void __i915_request_unsubmit(struct i915_request *request)
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
 
-	/*
-	 * As we do not allow WAIT to preempt inflight requests,
-	 * once we have executed a request, along with triggering
-	 * any execution callbacks, we must preserve its ordering
-	 * within the non-preemptible FIFO.
-	 */
-	BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
-	request->sched.attr.priority |= __NO_PREEMPTION;
-
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
 		i915_request_cancel_breadcrumb(request);
 
-- 
cgit 


From c80274bb5882ea1493551df7768977c5a43aa9d9 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed, 15 May 2019 14:00:51 +0100
Subject: drm/i915: Downgrade NEWCLIENT to non-preemptive

Commit 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") had the
intended consequence of not allowing a sequence of work that merely
crossed into a new engine the privilege to be promoted to NEWCLIENT
status. It also had the unintended consequence of actually making
NEWCLIENT effective on heavily oversubscribed transcode machines and
impacting upon their throughput.

If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of
those clients, using the NEWCLIENT boost that will be scheduled as

	rcsA x 30, (rcsB, vcs) x 30

where as before it would have been

	(rcsA, rcsB, vcs) x 30

That is with NEWCLIENT only boosting the first request of each client,
we would execute all rcsA requests prior to running on the vcs engines;
acruing a lot of dead time as compared to the previous case where the
vcs engine would be started in parallel to processing the second client.

The previous patch has the effect of delaying submission until it is
required by a third party (either the user with an explicit wait, or by
another client/engine). We reduce the NEWCLIENT bump to a mere WAIT,
which has the effect of removing its preemptive grant and reducing it to
the same level as any other user interaction -- that it will not be
promoted above the interengine dependencies, and so preventing NEWCLIENTS
from starving other engines. This a large nerf to the rrul properties of
the current NEWCLIENT, but it still does give prioritised submission to
new requests from light workloads.

References: b16c765122f9 ("drm/i915: Priority boost for new clients")
Fixes: 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") # customer impact
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk
(cherry picked from commit 68fc728b01fcc93b26d52f6e884e738962a49a66)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_priolist_types.h | 5 ++---
 drivers/gpu/drm/i915/i915_request.c        | 2 +-
 drivers/gpu/drm/i915/intel_lrc.c           | 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

(limited to 'drivers/gpu/drm/i915/i915_request.c')

diff --git a/drivers/gpu/drm/i915/i915_priolist_types.h b/drivers/gpu/drm/i915/i915_priolist_types.h
index cc44ebd3b553..49709de69875 100644
--- a/drivers/gpu/drm/i915/i915_priolist_types.h
+++ b/drivers/gpu/drm/i915/i915_priolist_types.h
@@ -20,15 +20,14 @@ enum {
 	I915_PRIORITY_INVALID = INT_MIN
 };
 
-#define I915_USER_PRIORITY_SHIFT 3
+#define I915_USER_PRIORITY_SHIFT 2
 #define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
 
 #define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
 #define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1)
 
 #define I915_PRIORITY_WAIT		((u8)BIT(0))
-#define I915_PRIORITY_NEWCLIENT		((u8)BIT(1))
-#define I915_PRIORITY_NOSEMAPHORE	((u8)BIT(2))
+#define I915_PRIORITY_NOSEMAPHORE	((u8)BIT(1))
 
 #define __NO_PREEMPTION (I915_PRIORITY_WAIT)
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f258281bf3f3..ede79886cf98 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1228,7 +1228,7 @@ void i915_request_add(struct i915_request *request)
 		 * the bulk clients. (FQ_CODEL)
 		 */
 		if (list_empty(&request->sched.signalers_list))
-			attr.priority |= I915_PRIORITY_NEWCLIENT;
+			attr.priority |= I915_PRIORITY_WAIT;
 
 		engine->schedule(request, &attr);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4e0a351bfbca..11e5a86610bf 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,7 +164,7 @@
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
 
-#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
+#define ACTIVE_PRIORITY (I915_PRIORITY_NOSEMAPHORE)
 
 static int execlists_context_deferred_alloc(struct intel_context *ce,
 					    struct intel_engine_cs *engine);
-- 
cgit 


From a491cc8e1597ea25803191cded49d3686702a406 Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed, 15 May 2019 14:00:49 +0100
Subject: drm/i915: Truly bump ready tasks ahead of busywaits

In commit b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of
busywaits"), I tried cutting a corner in order to not install a signal
for each of our dependencies, and only listened to requests on which we
were intending to busywait. The compromise that was made was that
instead of then being able to promote the request with a full
NOSEMAPHORE like its non-busywaiting brethren, as we had not ensured we
had cleared the semaphore chain, we settled for only using the NEWCLIENT
boost. With an over saturated system with multiple NEWCLIENTS in flight
at any time, this was found to be an inadequate promotion and left us
with a much poorer scheduling order than prior to using semaphores.

The outcome of this patch, is that all requests have NOSEMAPHORE
priority when they have no dependencies and are ready to run and not
busywait, restoring the pre-semaphore ordering on saturated systems.

We can demonstrate the effect of poor scheduling order by oversaturating
the system using gem_wsim on a system with multiple vcs engines
(i.e running the same workloads across more clients than required for
peak throughput, e.g. media_load_balance_17i7.wsim -c4 -b context):

x v5.1 (normalized)
+ tip
* fix
+------------------------------------------------------------------------+
|                                                                    x   |
|                                                                    x   |
|                                                                    x   |
|                                                                    x   |
|                                                                   %x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %%x   |
|                                                                  %#x   |
|                                                                  %#x   |
|                                                                  %#x   |
|                                                                  %#x   |
|                                                                  %#x   |
|         +                                                        %#xx  |
|         +                                                        %#xx  |
|         +                                                       %%#xx  |
|         +                                                       %%#xx  |
|         +                                                       %%#xx  |
|         +                                                       %%#xx  |
|         +                                                       %%##x  |
|         +++                                                     %%##x  |
|         +++                                                     %%##x  |
|         +++                                                     %%##x  |
|        ++++                                                     %%##x  |
|        ++++                                                     %%##x  |
|        ++++                                                     %%##xx |
|        ++++                                                     %###xx |
|        ++++                                                     %###xx |
|        ++++                                                     %###xx |
|        ++++                                                     %###xx |
|        ++++ +                                                   %#O#xx |
|        ++++ +                                                   %#O#xx |
|        ++++++ +                                                 %#O#xx |
|       ++++++++++                                                %OOOxxx|
|       ++++++++++       +                                       %#OOO#xx|
|     + ++++++++++++ ++ +++++    +                        ++    @@OOOO#xx|
|                                                                   |A_| |
||__________M_______A____________________|                               |
|                                                                 |A_|   |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 120       0.99456       1.00628      0.999985     1.0001545  0.0024387139
+ 120      0.873021       1.00037      0.884134    0.90148752   0.039190862
Difference at 99.5% confidence
	-0.098667 +/- 0.0110762
	-9.86517% +/- 1.10745%
	(Student's t, pooled s = 0.0277657)
% 120      0.990207       1.00165     0.9970265    0.99699748     0.0021024
Difference at 99.5% confidence
	-0.003157 +/- 0.000908245
	-0.315651% +/- 0.0908105%
	(Student's t, pooled s = 0.00227678)

Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-2-chris@chris-wilson.co.uk
(cherry picked from commit 17db337f5098d29415314c4a588b842fc684394b)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

(limited to 'drivers/gpu/drm/i915/i915_request.c')

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index ede79886cf98..c88e538b2ef4 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -573,18 +573,7 @@ semaphore_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 
 	switch (state) {
 	case FENCE_COMPLETE:
-		/*
-		 * We only check a small portion of our dependencies
-		 * and so cannot guarantee that there remains no
-		 * semaphore chain across all. Instead of opting
-		 * for the full NOSEMAPHORE boost, we go for the
-		 * smaller (but still preempting) boost of
-		 * NEWCLIENT. This will be enough to boost over
-		 * a busywaiting request (as that cannot be
-		 * NEWCLIENT) without accidentally boosting
-		 * a busywait over real work elsewhere.
-		 */
-		i915_schedule_bump_priority(request, I915_PRIORITY_NEWCLIENT);
+		i915_schedule_bump_priority(request, I915_PRIORITY_NOSEMAPHORE);
 		break;
 
 	case FENCE_FREE:
@@ -865,12 +854,6 @@ emit_semaphore_wait(struct i915_request *to,
 	if (err < 0)
 		return err;
 
-	err = i915_sw_fence_await_dma_fence(&to->semaphore,
-					    &from->fence, 0,
-					    I915_FENCE_GFP);
-	if (err < 0)
-		return err;
-
 	/* We need to pin the signaler's HWSP until we are finished reading. */
 	err = i915_timeline_read_hwsp(from, to, &hwsp_offset);
 	if (err)
@@ -936,8 +919,18 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 						    &from->fence, 0,
 						    I915_FENCE_GFP);
 	}
+	if (ret < 0)
+		return ret;
+
+	if (to->sched.flags & I915_SCHED_HAS_SEMAPHORE_CHAIN) {
+		ret = i915_sw_fence_await_dma_fence(&to->semaphore,
+						    &from->fence, 0,
+						    I915_FENCE_GFP);
+		if (ret < 0)
+			return ret;
+	}
 
-	return ret < 0 ? ret : 0;
+	return 0;
 }
 
 int
-- 
cgit