aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/intel_ringbuffer.c
AgeCommit message (Collapse)AuthorFilesLines
2017-11-23drm/i915: Move mi_set_context() into the legacy ringbuffer submissionChris Wilson1-1/+185
The legacy i915_switch_context() is only applicable to the legacy ringbuffer submission method, so move it from the general i915_gem_context.c to intel_ringbuffer.c (rename pending!). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171123152631.31385-2-chris@chris-wilson.co.uk
2017-11-23drm/i915: Unwind incomplete legacy context switchesChris Wilson1-0/+1
The legacy context switch for ringbuffer submission is multistaged, where each of those stages may fail. However, we were updating global state after some stages, and so we had to force the incomplete request to be submitted because we could not unwind. Save the global state before performing the switches, and so enable us to unwind back to the previous global state should any phase fail. We then must cancel the request instead of submitting it should the construction fail. v2: s/saved_ctx/from_ctx/; s/ctx/to_ctx/ etc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171123152631.31385-1-chris@chris-wilson.co.uk
2017-11-20drm/i915: Remove i915.semaphores modparamChris Wilson1-2/+2
Having disabled the broken semaphores on Sandybridge, there is no need for a modparam any more, so remove it in favour of a simple HAS_LEGACY_SEMAPHORES() guard. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-5-chris@chris-wilson.co.uk
2017-11-20drm/i915: Remove obsolete ringbuffer emission for gen8+Chris Wilson1-368/+63
Since removing the module parameter to force selection of ringbuffer emission for gen8, the code is defunct. Remove it. To put the difference into perspective, a couple of microbenchmarks (bdw i7-5557u, 20170324): ring execlists exec continuous nops on all rings: 1.491us 2.223us exec sequential nops on each ring: 12.508us 53.682us single nop + sync: 9.272us 30.291us vblank_mode=0 glxgears: ~11000fps ~9000fps Since the earlier submission, gen8 ringbuffer submission has fallen further and further behind in features. So while ringbuffer may hold the throughput crown, in terms of interactive latency, execlists is much better. Alas, we have no convenient metrics for such, other than demonstrating things we can do with execlists but can not using legacy ringbuffer submission. We have made a few improvements to lowlevel execlists throughput, and ringbuffer currently panics on boot! (bdw i7-5557u, 20171026): ring execlists exec continuous nops on all rings: n/a 1.921us exec sequential nops on each ring: n/a 44.621us single nop + sync: n/a 21.953us vblank_mode=0 glxgears: n/a ~18500fps References: https://bugs.freedesktop.org/show_bug.cgi?id=87725 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Once-upon-a-time-Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-2-chris@chris-wilson.co.uk
2017-11-20drm/i915: Automatic i915_switch_context for legacyChris Wilson1-0/+4
During request construction, after pinning the context we know whether or not we have to emit a context switch. So move this common operation from every caller into i915_gem_request_alloc() itself. v2: Always submit the request if we emitted some commands during request construction, as typically it also involves changes in global state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120102002.22254-2-chris@chris-wilson.co.uk
2017-11-15drm/i915: Make request's wait-for-space explicitChris Wilson1-20/+36
At the start of building a request, we would wait for roughly enough space to fit the average request (to reduce the likelihood of having to wait and abort partway through request construction). To achieve we would try to begin a 0-length command packet, this just adds extra confusion so make the wait-for-space explicit, as in the next patch we want to move it from the backend to the i915_gem_request_alloc() so it can ensure that the wait-for-space is the first operation in building a new request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171115151204.8105-2-chris@chris-wilson.co.uk
2017-11-10drm/i915: Stop caching the "golden" renderstateChris Wilson1-1/+4
As we now record the default HW state and so only emit the "golden" renderstate once to prepare the HW, there is no advantage in keeping the renderstate batch around as it will never be used again. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-8-chris@chris-wilson.co.uk
2017-11-10drm/i915: Record the default hw state after reset upon loadChris Wilson1-12/+33
Take a copy of the HW state after a reset upon module loading by executing a context switch from a blank context to the kernel context, thus saving the default hw state over the blank context image. We can then use the default hw state to initialise any future context, ensuring that each starts with the default view of hw state. v2: Unmap our default state from the GTT after stealing it from the context. This should stop us from accidentally overwriting it via the GTT (and frees up some precious GTT space). Testcase: igt/gem_ctx_isolation Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk
2017-11-10drm/i915: Mark the context state as dirty/writtenChris Wilson1-3/+3
In the next few patches, we will want to both copy out of the context image and write a valid image into a new context. To be completely safe, we should then couple in our domain tracking to ensure that we don't have any issues with stale data remaining in unwanted cachelines. Historically, we omitted the .write=true from the call to set-gtt-domain in i915_switch_context() in order to avoid a stall between every request as we would want to wait for the previous context write from the gpu. Since then, we limit the set-gtt-domain to only occur when we first bind the vma, so once in use we will never stall, and we are sure to flush the context following a load from swap. Equally we never applied the lessons learnt from ringbuffer submission to execlists; so time to apply the flush of the lrc after load as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-6-chris@chris-wilson.co.uk
2017-10-27drm/i915: Empty the ring before disablingChris Wilson1-1/+5
An interesting snippet from Sandybridge's prm: "Although a Ring Buffer can be enabled in the non-empty state, it must not be disabled unless it is empty. Attempting to disable a Ring Buffer in the non-empty state is UNDEFINED." Let's avoid the undefined behaviour as we disable the rings prior to reset and resume. v2: Tell HEAD to catch up to TAIL (empty ring) first, then reset both to 0 (supposedly while stopped). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171027094311.30380-1-chris@chris-wilson.co.uk
2017-10-25drm/i915: Add a hook for making the engines idle (parking) and unparkingChris Wilson1-1/+4
In the next patch, we will want to install a callback when the engines (GT as a whole) become idle and similarly when they first become busy. To enable that callback, first rename intel_engines_mark_idle() to intel_engines_park() and provide the companion intel_engines_unpark(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-2-chris@chris-wilson.co.uk
2017-10-16drm/i915: Remove walk over obj->vma_list for the shrinkerChris Wilson1-1/+7
In the next patch, we want to reduce the lock coverage within the shrinker, and one of the dangerous walks we have is over obj->vma_list. We are only walking the obj->vma_list in order to check whether it has been permanently pinned by HW access, typically via use on the scanout. But we have a couple of other long term pins, the context objects for which we currently have to check the individual vma pin_count. If we instead mark these using obj->pin_display, we can forgo the dangerous and sometimes slow list iteration. v2: Rearrange code to try and avoid confusion from false associations due to arrangement of whitespace along with rebasing on obj->pin_global. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-4-chris@chris-wilson.co.uk
2017-10-13drm/i915: Keep the rings stopped until they have been re-initializedChris Wilson1-5/+3
Before modifying the ring register (RING_START, HEAD, TAIL, CTL) we first make sure it is stopped (or else the hw may not resample the registers). However, we do not need to let the hw restart until after we have reprogrammed all the rings. This should help prevent situations where pending operations on the ring may resume (because we are trying to re-initialize following an unsuccessful GPU hang, i.e. from i915_gem_unset_wedged). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103260 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171013131218.18013-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-10-09drm/i915: Provide an assert for when we expect forcewake to be heldChris Wilson1-1/+10
Add assert_forcewakes_active() (the complementary function to assert_forcewakes_inactive) that documents the requirement of a function for its callers to be holding the forcewake ref (i.e. the function is part of a sequence over which RC6 must be prevented). One such example is during ringbuffer reset, where RC6 must be held across the whole reinitialisation sequence. v2: Include debug information in the WARN so we know which fw domain is missing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20171009110301.21705-5-chris@chris-wilson.co.uk
2017-09-22drm/i915: Rename global i915 to i915_modparamsMichal Wajdeczko1-4/+4
Our global struct with params is named exactly the same way as new preferred name for the drm_i915_private function parameter. To avoid such name reuse lets use different name for the global. v5: pure rename v6: fix Credits-to: Coccinelle @@ identifier n; @@ ( - i915.n + i915_modparams.n ) Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Ville Syrjala <ville.syrjala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com
2017-09-18drm/i915: Cancel all ready but queued requests when wedgingChris Wilson1-0/+20
When wedging the hw, we want to mark all in-flight requests as -EIO. This is made slightly more complex by execlists who store the ready but not yet submitted-to-hw requests on a private queue (an rbtree priolist). Call into execlists to cancel not only the ELSP tracking for the submitted requests, but also the queue of unsubmitted requests. v2: Move the majority of engine_set_wedged to the backends (both legacy ringbuffer and execlists handling their own lists). Reported-by: Michał Winiarski <michal.winiarski@intel.com> Testcase: igt/gem_eio/in-flight-contexts Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170915173100.26470-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-09-15drm/i915: Mask everything in ring HWSTAM on gen6+ in ringbuffer modeVille Syrjälä1-0/+3
The execlist code already masks everything in the ring HWSTAM, but the ringbuffer code doesn't. Let's go ahead and do that. Pre-gen6 platforms setup HWSTAM during irq setup already since there's just the one register, and it also contains bits for non-ring interrupts. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170818183705.27850-13-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-13drm/i915/lrc: allocate separate page for HWSPDaniele Ceraolo Spurio1-124/+1
On gen8+ we're currently using the PPHWSP of the kernel ctx as the global HWSP. However, when the kernel ctx gets submitted (e.g. from __intel_autoenable_gt_powersave) the HW will use that page as both HWSP and PPHWSP. This causes a conflict in the register arena of the HWSP, i.e. dword indices below 0x30. We don't current utilize this arena, but in the following patches we will take advantage of the cached register state for handling execlist's context status interrupt. To avoid the conflict, instead of re-using the PPHWSP of the kernel ctx we can allocate a separate page for the HWSP like what happens for pre-execlists platform. v2: Add a use-case for the register arena of the HWSP. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1499357440-34688-1-git-send-email-daniele.ceraolospurio@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-3-chris@chris-wilson.co.uk
2017-09-08drm/i915: Add a default case in gen7 hwsp switch-caseMichel Thierry1-5/+6
Gen7 won't get any new engines, and we already added VCS2 there to just silence gcc's not handled in switch warnings. Use a default case instead, otherwise we will need to keep adding extra cases if changes happen in the future. v2: Since reaching the default case is impossible, use GEM_BUG_ON (Chris). Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170830180115.907-1-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-07-27drm/i915: Enforce that CS packets are qword alignedChris Wilson1-0/+3
We require the caller to ensure that the packets they wish to emit into the CS ring are qword aligned (i.e. have an even number of dwords). Double check this. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721161101.1618-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-06-20drm/i915: Do not re-calculate num_rings locallyTvrtko Ursulin1-3/+2
Since bb8f0f5abdd7 ("drm/i915: Split intel_engine allocation and initialisation") intel_info->num_rings is set early in the load sequence and so available to be used direclty in the 2nd load phase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170616130339.23015-1-tvrtko.ursulin@linux.intel.com
2017-05-04drm/i915: Micro-optimise hotpath through intel_ring_begin()Chris Wilson1-31/+36
Typically, there is space available within the ring and if not we have to wait (by definition a slow path). Rearrange the code to reduce the number of branches and stack size for the hotpath, accomodating a slight growth for the wait. v2: Fix the new assert that packets are not larger than the actual ring. v3: Make the parameters unsigned as well to make usage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-3-chris@chris-wilson.co.uk
2017-05-04drm/i915: Report the ring->space from intel_ring_update_space()Chris Wilson1-4/+8
Some callers immediately want to know the current ring->space after calling intel_ring_update_space(), which we can freely provide via the return parameter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-2-chris@chris-wilson.co.uk
2017-05-04drm/i915: Avoid the branch in computing intel_ring_space()Chris Wilson1-11/+13
Exploit the power-of-two ring size to compute the space across the wraparound using a mask rather than a if. Convert to unsigned integers so the operation is well defined. References: https://bugs.freedesktop.org/show_bug.cgi?id=99671 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-1-chris@chris-wilson.co.uk
2017-05-04drm/i915: Use engine->context_pin() to report the intel_ringChris Wilson1-12/+13
Since unifying ringbuffer/execlist submission to use engine->pin_context, we ensure that the intel_ring is available before we start constructing the request. We can therefore move the assignment of the request->ring to the central i915_gem_request_alloc() and not require it in every engine->request_alloc() callback. Another small step towards simplification (of the core, but at a cost of handling error pointers in less important callers of engine->pin_context). v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code generation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk
2017-04-28drm/i915: Sanitize engine context sizesJoonas Lahtinen1-2/+2
Pre-calculate engine context size based on engine class and device generation and store it in the engine instance. v2: - Squash and get rid of hw_context_size (Chris) v3: - Move after MMIO init for probing on Gen7 and 8 (Chris) - Retained rounding (Tvrtko) v4: - Rebase for deferred legacy context allocation Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: intel-gvt-dev@lists.freedesktop.org Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-27drm/i915: Defer context state allocation for legacy ring submissionChris Wilson1-0/+50
Almost from the outset for execlists, we used deferred allocation of the logical context and rings. Then we ported the infrastructure for pinning contexts back to legacy, and so now we are able to also implement deferred allocation for context objects prior to first use on the legacy submission. v2: We still need to differentiate between legacy engines, Joonas is fixing that but I want this first ;) (Joonas) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170427104651.22394-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-25drm/i915: Poison the request before emitting commandsChris Wilson1-0/+1
If we poison the request before we emit commands, it should be easier to spot when we execute an uninitialised request. References: https://bugs.freedesktop.org/show_bug.cgi?id=100144 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170423170619.7156-2-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-04-25drm/i915: Differentiate between sw write location into ring and last hw readChris Wilson1-14/+27
We need to keep track of the last location we ask the hw to read up to (RING_TAIL) separately from our last write location into the ring, so that in the event of a GPU reset we do not tell the HW to proceed into a partially written request (which can happen if that request is waiting for an external signal before being executed). v2: Refactor intel_ring_reset() (Mika) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144 Testcase: igt/gem_exec_fence/await-hang Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests") Fixes: d55ac5bf97c6 ("drm/i915: Defer transfer onto execution timeline to actual hw submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-04-20drm/i915: Use discardable buffers for ringsChris Wilson1-1/+1
The contents of a ring are only valid between HEAD and TAIL, when the ring is idle (HEAD == TAIL) we can simply let the pages go under memory pressure if they are not pinned by an active context. Any new content will be written after HEAD and so the ring will again be valid between HEAD and TAIL, everything outside can be discarded. Note that we take care of ensuring that we do not reset the HEAD backwards following a GPU hang on an idle ring. The same precautions are what enable us to use stolen memory for rings. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170420101709.27250-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-11drm/i915: Use the same vfunc for BSD2 ring initOscar Mateo1-14/+0
If we needed to do something different for the init functions, we could always look at the engine instance to make the distinction. But, in any case, the two functions are virtually identical already (please notice that BSD2_RING is only used from gen8 onwards). With this, the init functions depends excusively on the engine class (a fact that we will use soon). v2: Commit message Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1491834873-9345-3-git-send-email-oscar.mateo@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-11drm/i915: Use safer intel_uncore_wait_for_register in ring-initChris Wilson1-3/+3
While we do hold the forcewake for legacy ringbuffer initialisation, we don't guard our access with the uncore.lock spinlock. In theory, we only initialise when no others should be accessing the same mmio cachelines, but in practice be safe as this is an infrequently used path and not worth risky micro-optimisations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170411101340.31994-5-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-04-11drm/i915: Stop sleeping from inside gen6_bsd_submit_request()Chris Wilson1-5/+5
submit_request() is called from an atomic context, it's not allowed to sleep. We have to be careful in our parameters to intel_uncore_wait_for_register() to limit ourselves to the atomic wait loop and not incur the wrath of our warnings. Fixes: 6976e74b5fa1 ("drm/i915: Don't allow overuse of __intel_wait_for_register_fw()") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170410143807.22725-1-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/20170411101340.31994-2-chris@chris-wilson.co.uk Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-04-03drm/i915: Onion unwind for intel_init_ring_common()Chris Wilson1-41/+36
Rather than call intel_engine_cleanup() with a partially constructed engine, unwind the error during intel_init_ring_common(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-03drm/i915: intel_ring.engine is unusedChris Wilson1-15/+13
Or rather it is used only by intel_ring_pin() to extract the drm_i915_private which we can easily pass in. As this is a relatively rare operation, save the space in the struct, and as such it is even break even in the extra code for passing around the parameter: add/remove: 0/0 grow/shrink: 2/3 up/down: 15/-15 (0) function old new delta intel_init_ring_buffer 906 918 +12 execlists_context_pin 1308 1311 +3 mock_engine 407 403 -4 intel_engine_create_ring 367 363 -4 intel_ring_pin 326 319 -7 Total: Before=1261794, After=1261794, chg +0.00% v2: Reorder intel_init_ring_buffer to keep the ring setup together: add/remove: 0/0 grow/shrink: 2/3 up/down: 9/-15 (-6) function old new delta intel_init_ring_buffer 906 912 +6 execlists_context_pin 1308 1311 +3 mock_engine 407 403 -4 intel_engine_create_ring 367 363 -4 intel_ring_pin 326 319 -7 Total: Before=1261794, After=1261788, chg -0.00% Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-1-chris@chris-wilson.co.uk
2017-03-27drm/i915: Refactor tests for validity of RING_TAILChris Wilson1-6/+3
Whilst I like having the assertions clearly visible in the code, they are quite repetitious! As we find new limits we want to incorporate into the set of assertions, it make sense to refactor them to a common routine. v2: Add a guc holdout. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170327131412.20293-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-27drm/i915: Assert that the request->tail fits within the ringChris Wilson1-0/+3
In addition to being qword-aligned, the RING_TAIL offset must be within the ring! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170327130009.4678-2-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-24drm/i915: Fix semaphore emission for BDW+ RCS ringbuffer emissionChris Wilson1-1/+1
The required number of dwords for semaphore emission on BDW RCS is 8, not 6 - leading to ring buffer corruption and immediate GPU hangs when using ringbuffer submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170324151724.32640-2-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-03-23drm/i915: Restore marking context objects as dirty on pinningChris Wilson1-0/+2
Commit e8a9c58fcd9a ("drm/i915: Unify active context tracking between legacy/execlists/guc") converted the legacy intel_ringbuffer submission to the same context pinning mechanism as execlists - that is to pin the context until the subsequent request is retired. Previously it used the vma retirement of the context object to keep itself pinned until the next request (after i915_vma_move_to_active()). In the conversion, I missed that the vma retirement was also responsible for marking the object as dirty. Mark the context object as dirty when pinning (equivalent to execlists) which ensures that if the context is swapped out due to mempressure or suspend/hibernation, when it is loaded back in it does so with the previous state (and not all zero). Fixes: e8a9c58fcd9a ("drm/i915: Unify active context tracking between legacy/execlists/guc") Reported-by: Dennis Gilmore <dennis@ausil.us> Reported-by: Mathieu Marquer <mathieu.marquer@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99993 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100181 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.11-rc1 Link: http://patchwork.freedesktop.org/patch/msgid/20170322205930.12762-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-03-21drm/i915: Remove intel_ring.last_retired_headChris Wilson1-17/+4
Storing the position of the breadcrumb of the last retired request as a separate last_retired_head is superfluous as we always copy that into head prior to recalculation of the intel_ring.space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170321102552.24357-1-chris@chris-wilson.co.uk
2017-03-16drm/i915: Assert that the context pin_counts do not overflowChris Wilson1-0/+1
This should be impossible, but let's assert that we do not pin a context 4 billion times before retiring! v2: Fix the assertion -- the patch had just one job to do! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316171628.3228-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-03-16drm/i915: Move engine->submit_request selection to a vfuncChris Wilson1-2/+13
It turns out that we may want to restore the original engine->submit_request (and engine->schedule) callbacks from more than just the guc <-> execlists transition. Move this to a vfunc so we can have a common interface. v2: Move initial selection to intel_engines_init_common(), repaint vfunc with engine->set_default_submission (and a similar colour for the helper). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-2-chris@chris-wilson.co.uk
2017-02-27drm/i915: Reduce context alignmentChris Wilson1-1/+2
No hardware was ever shipped that needed more than 4096 byte alignment and future hardware will not use this legacy path. So reduce the alignment to make it easier and quicker to launch workloads. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170227135913.8056-3-chris@chris-wilson.co.uk
2017-02-20drm/i915: Assert that the request->tail is always qword alignedChris Wilson1-0/+3
The hardware requires that the tail pointer only advance in qword units, so assert that the value we write is aligned to qwords, and similarly enforce this restriction onto the request->tail. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170217163833.731-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-02-17drm/i915: Consolidate gen8_emit_pipe_controlTvrtko Ursulin1-30/+15
We have a few open coded instances in the execlists code and an almost suitable helper in intel_ringbuf.c We can consolidate to a single helper if we change the existing helper to emit directly to ring buffer memory and move the space reservation outside it. v2: Drop memcpy for memset. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170216122325.31391-2-tvrtko.ursulin@linux.intel.com
2017-02-17drm/i915: Move common workaround code to intel_engine_csTvrtko Ursulin1-550/+0
It is used by all submission backends. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-02-17drm/i915: Make int __intel_ring_space staticTvrtko Ursulin1-1/+1
It is only used within intel_ringbuffer.c Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.oc.uk>
2017-02-15drm/i915: Remove duplicate intel_logical_ring_workarounds_emitTvrtko Ursulin1-1/+1
intel_ring_workarounds_emit is exactly the same code. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170214150017.16058-1-tvrtko.ursulin@linux.intel.com
2017-02-14drm/i915: fix for WaDisableDopClockGating:bdwRobert Bragg1-1/+5
This workaround for BDW was incomplete as it also requires EUTC clock gating to be disabled via UCGCTL1. v2: read modify write UCGTL1 in broadwell_init_clock_gating (Ville) Signed-off-by: Robert Bragg <robert@sixbynine.org> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170212133252.20990-1-robert@sixbynine.org Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-02-14drm/i915: Emit to ringbuffer directlyTvrtko Ursulin1-316/+276
This removes the usage of intel_ring_emit in favour of directly writing to the ring buffer. intel_ring_emit was preventing the compiler for optimising fetch and increment of the current ring buffer pointer and therefore generating very verbose code for every write. It had no useful purpose since all ringbuffer operations are started and ended with intel_ring_begin and intel_ring_advance respectively, with no bail out in the middle possible, so it is fine to increment the tail in intel_ring_begin and let the code manage the pointer itself. Useless instruction removal amounts to approximately two and half kilobytes of saved text on my build. Not sure if this has any measurable performance implications but executing a ton of useless instructions on fast paths cannot be good. v2: * Change return from intel_ring_begin to error pointer by popular demand. * Move tail increment to intel_ring_advance to enable some error checking. v3: * Move tail advance back into intel_ring_begin. * Rebase and tidy. v4: * Complete rebase after a few months since v3. v5: * Remove unecessary cast and fix !debug compile. (Chris Wilson) v6: * Make intel_ring_offset take request as well. * Fix recording of request postfix plus a sprinkle of asserts. (Chris Wilson) v7: * Use intel_ring_offset to get the postfix. (Chris Wilson) * Convert GVT code as well. v8: * Rename *out++ to *cs++. v9: * Fix GVT out to cs conversion in GVT. v10: * Rebase for new intel_ring_begin in selftests. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com