|
Running igt, I was encountering the invalid TLB bug on my 845g even
though it was using the CS workaround. Examining the w/a buffer in the
error state showed that the copy from the user batch into the
workaround itself was suffering from the invalid TLB bug (the first
cacheline was broken with the first two words reversed). Time to try a
fresh approach. This extends the workaround to write into each page of
our scratch buffer in order to overflow the TLB and evict the invalid
entries. This could be refined to only do so after we update the GTT,
but for simplicity, we do it before each batch.
I suspect this supersedes our current workaround, but for safety keep
doing both.
v2: The magic number shall be 2.
This doesn't conclusively prove that it is the mythical TLB bug we've
been trying to work around for so long. That it requires touching a
number of pages to prevent the corruption suggests to me that it is TLB
related, but the corruption (the reversed cacheline) is more subtle than
a plain TLB bug, where we would expect it to read the wrong page entirely.
Oh well, it prevents a reliable hang for me and so probably for others
as well.
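As a rough illustration of the approach (the helper and command layout below are hypothetical, not the exact upstream change), the idea is to store a dword into every page of the scratch object before the batch so the TLB overflows and stale entries are evicted:

/*
 * Hypothetical sketch only: touch one dword in each page of the scratch
 * bo so the TLB overflows and drops stale entries before the batch runs.
 * The store command layout is simplified; the real gen2/gen3 encoding
 * differs.
 */
static int i830_emit_tlb_scrub(struct intel_engine_cs *ring,
			       u32 scratch_offset, u32 scratch_size)
{
	int i, ret, num_pages = scratch_size >> PAGE_SHIFT;

	ret = intel_ring_begin(ring, num_pages * 4);
	if (ret)
		return ret;

	for (i = 0; i < num_pages; i++) {
		intel_ring_emit(ring, MI_STORE_DWORD_IMM); /* illustrative */
		intel_ring_emit(ring, 0);                  /* reserved */
		intel_ring_emit(ring, scratch_offset + i * PAGE_SIZE);
		intel_ring_emit(ring, 0);                  /* data dword */
	}

	intel_ring_advance(ring);
	return 0;
}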
Signed-off-by: Chris Wilson <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: [email protected]
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
|
|
Without this, ring initialization fails reliably during resume with
[drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head ffffff8804 tail 00000000 start 000e4000
This is not a complete fix, but it is verified to make the ring
initialization failures during resume much less likely.
We have not been able to root-cause this bug (likely HW-specific to Gen4
chips) yet. This is therefore meant as duct tape until the problem is
fully understood and a proper fix is created, so that people don't suffer
from completely unusable systems in the meantime.
The discussion and debugging is happening at
https://bugs.freedesktop.org/show_bug.cgi?id=76554
Signed-off-by: Jiri Kosina <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
|
|
On Broadwell, any PIPE_CONTROL with the "State Cache Invalidate" bit set
must be preceded by a PIPE_CONTROL with the "CS Stall" bit set.
Documented on the BSpec 3D workarounds page.
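A minimal sketch of the rule as it might look in the gen8 flush path, assuming a gen8_emit_pipe_control() helper that emits a single PIPE_CONTROL with the given flags (see the next entry in this log); the exact flag combination is simplified:

/*
 * Sketch: before any PIPE_CONTROL with the State Cache Invalidate bit,
 * emit a PIPE_CONTROL that only CS-stalls.
 */
static int gen8_flush_with_wa(struct intel_engine_cs *ring,
			      u32 flags, u32 scratch_addr)
{
	if (flags & PIPE_CONTROL_STATE_CACHE_INVALIDATE) {
		int ret = gen8_emit_pipe_control(ring,
						 PIPE_CONTROL_CS_STALL |
						 PIPE_CONTROL_STALL_AT_SCOREBOARD,
						 0);
		if (ret)
			return ret;
	}

	return gen8_emit_pipe_control(ring, flags, scratch_addr);
}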
Reviewed-by: Rafael Barbalho <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
[vsyrjala: add chv w/a note too]
Signed-off-by: Ville Syrjälä <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
We'll want to reuse this for a workaround.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jesse Barnes <[email protected]>
[danvet: Remove now unused int.]
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Traditionally we use genX_ for GT/render stuff and the codenames for
display stuff. But the gt and pm interrupt handling functions on
gen5/6+ stuck out as exceptions, so convert them.
Looking at the diff this nicely realigns our ducks since almost all
the callers are already platform-specific functions following the
genX_ pattern.
Spotted while reviewing some internal rps patches.
No function change in this patch.
Acked-by: Damien Lespiau <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
On g33, the documentation states
"HWS_PGA:
Format = Bits 28:12 of graphics memory address (bits 31:29 MBZ)."
which means that the address of the HWS must be below 256MiB,
which is conveniently the mappable aperture.
This also appears to be true (but not documented as so) for gen4 and
gen5. To generalise we force it into the low mappable region for all
non-LLC platforms. If we locate the HWS at the top of the GTT the
machine will hard hang during boot (fails on pnv, gm45, ilk and byt,
but works on snb, ivb, hsw).
v2: Add comments to explain why we use PIN_MAPPABLE even though we have
no intention of mapping the object. (Ville)
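A sketch of the pinning side of this, assuming the era's i915_gem_obj_ggtt_pin() interface (error handling trimmed, function name illustrative):

static int pin_status_page(struct intel_engine_cs *ring,
			   struct drm_i915_gem_object *obj)
{
	unsigned flags = 0;

	if (!HAS_LLC(ring->dev))
		/*
		 * HWS_PGA only holds the low address bits, so keep the
		 * status page in the low mappable range on non-LLC parts.
		 * PIN_MAPPABLE gives us that even though we never actually
		 * map the object through the aperture.
		 */
		flags |= PIN_MAPPABLE;

	return i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, flags);
}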
Signed-off-by: Chris Wilson <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
It's simple enough that it doesn't need to know anything about the
engine.
Trivial change.
Reviewed-by: Jesse Barnes <[email protected]>
Signed-off-by: Oscar Mateo <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
More prep work: with Execlists, we are going to start creating a lot
of extra ringbuffers soon, so these functions are handy.
No functional changes.
v2: rename allocate/destroy_ring_buffer to alloc/destroy_ringbuffer_obj
because the name is more meaningful and to mirror a similar function in
the context world: i915_gem_alloc_context_obj(). Change suggested by Brad
Volkin.
Reviewed-by: Jesse Barnes <[email protected]>
Signed-off-by: Oscar Mateo <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
As Ville points out, it's possible/probable we don't actually need this.
Potentially, this validates the letter of the spec, and not the spirit.
Ville:
> I discussed this on irc w/ Ben, and I was suggesting we don't need to
> poll. Polling apparently can be used as a workaround for certain
> hardware issues, but it looks like those issues shouldn't affect us,
> for the moment at least. So my suggestion was to try w/o polling
> first (since there could be some power cost to polling) and add the
> poll bit if problems arise.
Rodrigo: Spec suggests this as a W/A for GT3. However, semaphores didn't
work on my BDW GT2 in signal mode, so poll mode is definitely needed.
Reviewed-by: Rodrigo Vivi <[email protected]>
Tested-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Semaphore waits use a new instruction, MI_SEMAPHORE_WAIT. The seqno to
wait on is all well defined by the table in the previous patch. There is
nothing else different from previous GEN's semaphore synchronization
code.
v2: Update macros to not require the other ring's ring->id (Chris)
v3: Add missing VCS2 gen8_ring_wait init besides
s/ring_buffer/engine_cs (Rodrigo)
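The wait side then looks roughly like this (a sketch; GEN8_WAIT_OFFSET stands for the per-ring mailbox address from the layout defined in the previous patch, and the poll bit is the one discussed in the entry above):

static int gen8_ring_sync_sketch(struct intel_engine_cs *waiter,
				 struct intel_engine_cs *signaller,
				 u32 seqno)
{
	u64 offset = GEN8_WAIT_OFFSET(waiter, signaller->id);
	int ret;

	ret = intel_ring_begin(waiter, 4);
	if (ret)
		return ret;

	/* Wait until the mailbox seqno is >= the one we need. */
	intel_ring_emit(waiter, MI_SEMAPHORE_WAIT |
				MI_SEMAPHORE_GLOBAL_GTT |
				MI_SEMAPHORE_POLL |
				MI_SEMAPHORE_SAD_GTE_SDD);
	intel_ring_emit(waiter, seqno);
	intel_ring_emit(waiter, lower_32_bits(offset));
	intel_ring_emit(waiter, upper_32_bits(offset));
	intel_ring_advance(waiter);
	return 0;
}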
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Semaphore signalling works similarly to previous GENs with the exception
that the per ring mailboxes no longer exist. Instead you must define
your own space, somewhere in the GTT.
The comments in the code define the layout I've opted for, which should
be fairly future proof, i.e. I tried to define offsets in abstract terms
(NUM_RINGS, seqno size, etc).
NOTE: If one wanted to move this to the HWSP they could. I've decided
one 4k object would be easier to deal with, and provide potential wins
with cache locality, but that's all speculative.
v2: Update the macro to not need the other ring's ring->id (Chris)
Update the comment to use the correct formula (Chris)
v3: Move the macros the ringbuffer.h to prevent churn in next patch
(Ville)
v4: Fixed a compilation conflict after rebasing on
commit 1ec9e26ddab06459e89a890431b2de064c5d1056
Author: Daniel Vetter <[email protected]>
Date: Fri Feb 14 14:01:11 2014 +0100
drm/i915: Consolidate binding parameters into flags
v5: VCS2 rebase
Replace hweight_long with hweight32
v6 (Rodrigo): * Add missed VCS2 gen8 ring signal init
* fixing conflicts on rebase
* minor fixes on address table
* remove WARN_ON
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
[danvet: s/BUG_ON/WARN_ON/]
Signed-off-by: Daniel Vetter <[email protected]>
|
|
With the ring mask we now have an easy way to know the number of rings
in the system, and therefore can accurately predict the number of dwords
to emit for semaphore signalling. This was not possible (easily)
previously.
There should be no functional impact, simply fewer instructions emitted.
While we're here, simply round up to 2 instead of the fancier per-mbox
rounding we did before, i.e. 4. This also allows us to drop the
unnecessary MI_NOOP, so each mailbox really costs 3 dwords, not 4.
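The sizing calculation then becomes roughly the fragment below (dev and the per-mailbox dword constant are illustrative of the idea, not the exact upstream names):

/* Sketch: the ring mask gives the number of other rings that need a
 * mailbox update, so the emission can be sized exactly and rounded up
 * to an even number of dwords. */
#define MBOX_UPDATE_DWORDS 3

unsigned int num_mboxes = hweight32(INTEL_INFO(dev)->ring_mask) - 1;
unsigned int num_dwords = round_up(num_mboxes * MBOX_UPDATE_DWORDS, 2);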
v2: Use 3 dwords instead of 4 (Ville)
Do the proper calculation to get the number of dwords to emit (Ville)
Conditionally set .sync_to when semaphores are enabled (Ville)
v3: Rebased on VCS2
Replace hweight_long with hweight32 (Ville)
v4: Pull out the accidentally squashed hunk from the next patch after
rebase (Daniel).
v5: Fix conflict after rebase (Rodrigo)
Reviewed-by: Rodrigo Vivi <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]> (v1)
Signed-off-by: Ben Widawsky <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Gen8 has already had some differentiation with how it handles rings.
Semaphores bring yet more differences, and now is as good a time as any
to do the split.
Also, since gen8 doesn't actually use semaphores up until this point,
put the proper "NULL" values in for the mbox info.
v2: v1 had a stale commit message
v3: Move everything in the is_semaphore_enabled() check
v4: VCS2 rebase
Remove double assignment of signal in render ring (Ville)
v5: Adding missed VCS2 signal init on gen8+ (Rodrigo)
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
It just fixes a typo.
v2: remove the underscore to make this look like all the other ring names (Oscar)
Cc: Oscar Mateo <[email protected]>
Reviewed-by (v1): Ben Widawsky <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
This commit adds a check for the return value of init_ring_common() in
init_render_ring(). Now, when a failure is detected, the error code is
propagated to the caller instead of being ignored.
Signed-off-by: Konrad Zapalowicz <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
This adds support for a write-enable bit in the GTT entry.
This is handled via a read-only flag in the GEM buffer object, which
is then used to decide how to set the bit when writing the GTT entries.
Currently, by default, the batch buffer and ring buffers are marked as read only.
v2: Moved the pte override code for read-only bit to 'byt_pte_encode'. (Chris)
Fixed the issue of leaving 'gt_old_ro' as unused. (Chris)
v3: Removed the 'gt_old_ro' field, now setting RO bit only for Ring Buffers(Daniel).
v4: Added a new 'flags' parameter to all the pte(gen6) encode & insert_entries functions,
in lieu of overloading the cache_level enum (Daniel).
v5: Removed the superfluous VLV check & changed the definition location of PTE_READ_ONLY flag (Imre)
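For the Valleyview path mentioned in v2, the encode function ends up looking roughly like this (a sketch based on the description above; bit names are the driver's existing defines, PTE_READ_ONLY is the new flag):

static gen6_gtt_pte_t byt_pte_encode(dma_addr_t addr,
				     enum i915_cache_level level,
				     bool valid, u32 flags)
{
	gen6_gtt_pte_t pte = valid ? GEN6_PTE_VALID : 0;

	pte |= GEN6_PTE_ADDR_ENCODE(addr);

	/* Only grant write access when the object is not marked read-only. */
	if (!(flags & PTE_READ_ONLY))
		pte |= BYT_PTE_WRITEABLE;

	if (level != I915_CACHE_NONE)
		pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;

	return pte;
}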
Reviewed-by: Imre Deak <[email protected]>
Signed-off-by: Akash Goel <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
These do not exist anymore.
Spotted while reading through intel_ringbuffer.c
Signed-off-by: Oscar Mateo <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Gen2 doesn't have the ring idle/stop bits in the SCPD/MI_MODE register,
so don't go spewing warnings about the state of those bits.
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Manual cleanup after the previous Coccinelle script.
Yes, I could write another Coccinelle script to do this but I
don't want labor-replacing robots making an honest programmer's
work obsolete (also, I'm lazy).
Signed-off-by: Oscar Mateo <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
This refactoring has been performed using the following Coccinelle
semantic script:
@@
struct intel_engine_cs r;
@@
(
- (r).obj
+ r.buffer->obj
|
- (r).virtual_start
+ r.buffer->virtual_start
|
- (r).head
+ r.buffer->head
|
- (r).tail
+ r.buffer->tail
|
- (r).space
+ r.buffer->space
|
- (r).size
+ r.buffer->size
|
- (r).effective_size
+ r.buffer->effective_size
|
- (r).last_retired_head
+ r.buffer->last_retired_head
)
@@
struct intel_engine_cs *r;
@@
(
- (r)->obj
+ r->buffer->obj
|
- (r)->virtual_start
+ r->buffer->virtual_start
|
- (r)->head
+ r->buffer->head
|
- (r)->tail
+ r->buffer->tail
|
- (r)->space
+ r->buffer->space
|
- (r)->size
+ r->buffer->size
|
- (r)->effective_size
+ r->buffer->effective_size
|
- (r)->last_retired_head
+ r->buffer->last_retired_head
)
@@
expression E;
@@
(
- LP_RING(E)->obj
+ LP_RING(E)->buffer->obj
|
- LP_RING(E)->virtual_start
+ LP_RING(E)->buffer->virtual_start
|
- LP_RING(E)->head
+ LP_RING(E)->buffer->head
|
- LP_RING(E)->tail
+ LP_RING(E)->buffer->tail
|
- LP_RING(E)->space
+ LP_RING(E)->buffer->space
|
- LP_RING(E)->size
+ LP_RING(E)->buffer->size
|
- LP_RING(E)->effective_size
+ LP_RING(E)->buffer->effective_size
|
- LP_RING(E)->last_retired_head
+ LP_RING(E)->buffer->last_retired_head
)
Note: On top of this, the patch also removes the now unused ringbuffer
fields in intel_engine_cs.
Signed-off-by: Oscar Mateo <[email protected]>
[danvet: Add note about fixup patch included here.]
Signed-off-by: Daniel Vetter <[email protected]>
|
|
As advanced by the previous patch, the ringbuffers and the engine
command streamers belong in different structs. This is so because,
while they used to be tightly coupled together, the new Logical
Ring Contexts (LRC for short) have a ringbuffer each.
In legacy code, we will use the buffer* pointer inside each ring
to get to the pertaining ringbuffer (the actual switch will be
done in the next patch). In the new Execlists code, this pointer
will be NULL and we will use instead the one inside the context
instead.
Signed-off-by: Oscar Mateo <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
In the upcoming patches we plan to break the correlation between
engine command streamers (a.k.a. rings) and ringbuffers, so it
makes sense to refactor the code and make the change obvious.
No functional changes.
Signed-off-by: Oscar Mateo <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
We implement the following workarounds:
* WaDisableAsyncFlipPerfMode:chv
* WaProgramMiArbOnOffAroundMiSetContext:chv
v2: Drop WaDisableSemaphoreAndSyncFlipWait note
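As a rough sketch of what these look like in the code (register and bit names are the driver's existing defines; the fragments are simplified):

/* WaDisableAsyncFlipPerfMode:chv - masked-bit enable during ring init */
I915_WRITE(MI_MODE, _MASKED_BIT_ENABLE(ASYNC_FLIP_PERF_DISABLE));

/* WaProgramMiArbOnOffAroundMiSetContext:chv - wrap the context switch */
intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
/* ... MI_SET_CONTEXT and its flags dword go here ... */
intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);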
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Paulo Zanoni <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
If we don't have semaphores enabled, we allocate 4
dwords for signalling but end up emitting more regardless.
Fix this by bailing out early if semaphores are not enabled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78274
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78283
Signed-off-by: Mika Kuoppala <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
This is missing in:
commit 78325f2d270897c9ee0887125b7abb963eb8efea
Author: Ben Widawsky <[email protected]>
Date: Tue Apr 29 14:52:29 2014 -0700
drm/i915: Virtualize the ringbuffer signal func
Looks to me like a rebase side-effect...
Signed-off-by: Oscar Mateo <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
For clients that submit large batch buffers the command parser has
a substantial impact on performance. On my HSW ULT system performance
drops as much as ~20% on some tests. Most of the time is spent in the
command lookup code. Converting that from the current naive search to
a hash table lookup reduces the performance drop to ~10%.
The choice of value for I915_CMD_HASH_ORDER allows all commands
currently used in the parser tables to hash to their own bucket (except
for one collision on the render ring). The tradeoff is that it wastes
memory. Because the opcodes for the commands in the tables are not
particularly well distributed, reducing the order still leaves many
buckets empty. The increased collisions don't seem to have a huge
impact on the performance gain, but for now anyhow, the parser trades
memory for performance.
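A sketch of the bucketed lookup (the descriptor field names mirror the parser's tables; the structure, function name and hash key mask here are purely illustrative):

#include <linux/hashtable.h>

#define I915_CMD_HASH_ORDER 9

struct cmd_node {
	const struct drm_i915_cmd_descriptor *desc;
	struct hlist_node node;
};

static DECLARE_HASHTABLE(cmd_hash, I915_CMD_HASH_ORDER);

static const struct drm_i915_cmd_descriptor *find_cmd_sketch(u32 cmd_header)
{
	u32 key = cmd_header & CMD_HASH_MASK; /* illustrative key mask */
	struct cmd_node *n;

	/* Walk only the bucket that the opcode hashes to. */
	hash_for_each_possible(cmd_hash, n, node, key)
		if ((cmd_header & n->desc->cmd.mask) == n->desc->cmd.value)
			return n->desc;

	return NULL;
}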
NB: Ville noticed that the error paths through the ring init code
will leak memory. I've not addressed that here. We can do a follow
up pass to handle all of the leaks.
v2: improved comment describing selection of hash key mask (Damien)
replace a BUG_ON() with an error return (Tvrtko, Ville)
commit message improvements
Signed-off-by: Brad Volkin <[email protected]>
Reviewed-by: Damien Lespiau <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
During the review of
commit 1f70999f9052f5a1b0ce1a55aff3808f2ec9fe42
Author: Chris Wilson <[email protected]>
Date: Mon Jan 27 22:43:07 2014 +0000
drm/i915: Prevent recursion by retiring requests when the ring is full
Ville raised the point that our interaction with request->tail was
likely to foul up other uses elsewhere (such as hang check comparing
ACTHD against requests).
However, we also need to restore the implicit retire requests that certain
test cases depend upon (e.g. igt/gem_exec_lut_handle), this raises the
spectre that the ppgtt will randomly call i915_gpu_idle() and recurse
back into intel_ring_begin().
Signed-off-by: Chris Wilson <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023
Reviewed-by: Brad Volkin <[email protected]>
[danvet: Remove now unused 'tail' variable as spotted by Brad.]
Signed-off-by: Daniel Vetter <[email protected]>
|
|
A few improvements to the fallback method for waiting upon ring space:
1. Fix the start/end wait tracepoints to always be paired.
2. Increase responsiveness of checking
3. Mark the process as waiting upon io
4. Check for signal interruptions
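A simplified sketch of the resulting wait loop (helper names approximate; the io_schedule_timeout variant was dropped as noted below):

/* Sketch: poll for ring space, keeping the wait tracepoints paired and
 * allowing signals to interrupt the wait. Details simplified. */
trace_i915_ring_wait_begin(ring);
ret = 0;
do {
	if (ring_space(ring) >= n)	/* enough room freed by the GPU? */
		break;

	msleep(1);			/* check more frequently than before */

	if (signal_pending(current)) {	/* 4. honour pending signals */
		ret = -ERESTARTSYS;
		break;
	}
	if (time_after(jiffies, timeout)) {
		ret = -EBUSY;
		break;
	}
} while (1);
trace_i915_ring_wait_end(ring);		/* 1. always paired with _begin */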
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Brad Volkin <[email protected]>
[danvet: Drop the s/msleep/io_schedule_timeout/ change again since the
latter isn't exported.]
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Previously, our code only had a 32b offset value for where the
batchbuffer starts. With full PPGTT, and 64b canonical GPU address
space, that is an insufficient value. The code to expand is pretty
straight forward, and only one platform needs to do anything with the
extra bits.
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Reviewed-by: Rafael Barbalho <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Add_request has always contained both the semaphore mailbox updates as
well as the breadcrumb writes. Since the semaphore signal is the one
which actually knows about the number of dwords it needs to emit to the
ring, we move the ring_begin to that function. This allows us to remove
the hideously shared #define
On a related note, gen8 will use a different number of dwords for
semaphores, but not for add request.
v2: Make number of dwords an explicit part of signalling (via function
argument). (Chris)
v3: very slight comment change
Reviewed-by: Ville Syrjälä <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
This abstraction again is in preparation for gen8. Gen8 will bring new
semantics for doing this operation.
While here, make the writes of MI_NOOPs explicit for non-existent rings.
This should have been implicit before.
NOTE: This is going to be removed in a few patches.
Reviewed-by: Ville Syrjälä <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
This will be helpful in abstracting some of the code in preparation for
gen8 semaphores.
v2: Move mbox stuff to a separate struct
v3: Rebased over VCS2 work
Reviewed-by: Ville Syrjälä <[email protected]> (v1)
Signed-off-by: Ben Widawsky <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Gen7 doesn't have the second BSD ring, but without a case for it the
compiler emits a switch-check warning. So just add the case to silence
the warning.
V1->V2: Follow Daniel's comment to update the comment
Reviewed-by: Imre Deak <[email protected]>
Signed-off-by: Zhao Yakui <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Based on the hardware spec, the BDW GT3 machine has two independent
BSD rings that can be used to dispatch video commands.
So just initialize the second one.
V3->V4: Follow Imre's comment to do some minor updates. For example:
more comments are added to describe the semaphores between rings.
Reviewed-by: Imre Deak <[email protected]>
Signed-off-by: Zhao Yakui <[email protected]>
[danvet: Fix up checkpatch error.]
Signed-off-by: Daniel Vetter <[email protected]>
|
|
If we include the expected values for the failing ring register checks,
it makes it marginally easier to see which is the culprit.
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Oscar Mateo <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Tearing down the ring buffers across resume is overkill, risks
unnecessary failure and increases fragmentation.
After failure, since the device is still active we may end up trying to
write into the dangling iomapping and trigger an oops.
v2: stop_ringbuffers() was meant to call stop(ring) not
cleanup(ring) during resume!
Reported-by: Jae-hyeon Park <[email protected]>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=72351
References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Oscar Mateo <[email protected]>
[danvet: s/ring->obj == NULL/!intel_ring_initialized(ring)/ as
suggested by Oscar.]
Signed-off-by: Daniel Vetter <[email protected]>
|
|
For readability, and to guess at the meaning behind the constants.
v2: Claim only the meagerest connections with reality.
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Oscar Mateo <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
The Piglit runner and QA both scan dmesg for DRM_ERRORs while running
test cases. Add a flag to suppress those errors when they are expected
by the related test cases.
Also add a flag to control whether the context that introduced the hang
should be banned. Hangcheck is timer based, and preventing bans by
adding sleeps to testcases makes testing slower.
v2: intel_ring_stopped(), readable comment (Chris)
v3: keep compatibility (Daniel)
References: https://bugs.freedesktop.org/show_bug.cgi?id=75876
Signed-off-by: Mika Kuoppala <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
In commit a51435a3137ad8ae75c288c39bd2d8b2696bae8f
Author: Naresh Kumar Kachhi <[email protected]>
Date: Wed Mar 12 16:39:40 2014 +0530
drm/i915: disable rings before HW status page setup
we reordered stopping the rings to do so before we set the HWS register.
However, there is an extra workaround for g45 to reset the rings twice,
and for consistency we should apply that workaround before setting the
HWS to be sure that the rings are truly stopped.
Cc: Naresh Kumar Kachhi <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Jesse Barnes <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
We have been setting the bit which was originally BIOS dependent since:
commit f05bb0c7b624252a5e768287e340e8e45df96e42
Author: Chris Wilson <[email protected]>
Date: Sun Jan 20 16:33:32 2013 +0000
drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits
Therefore, we do not need to try to figure it out dynamically and we can
just always invalidate the TLBs.
It's a partial revert of:
commit 12b0286f49947a6cdc9285032d918466a8c3f5f9
Author: Ben Widawsky <[email protected]>
Date: Mon Jun 4 14:42:50 2012 -0700
drm/i915: possibly invalidate TLB before context switch
The original commit attempted to only invalidate when necessary
(very much a relic from the old days). Now, we can just always invalidate.
I guess the old TODO still exists. Since we seem to have abandoned ILK
contexts however, there isn't much point in even remembering.
Cc: Chris Wilson <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
This patch enables the TLB invalidate bit in the GFX Mode register
for Gen7.
According to the bspec, when enabled this bit limits TLB invalidation
to batch buffer boundaries, to pipe_control commands which have the
TLB invalidation bit set, and to sync flushes. If disabled, the TLB
caches are flushed on every full flush of the pipeline.
Tested only on vlv platform. Chris has tested on ivb and hsw
platforms.
v2: Adding the explicit enabling of this bit for all Gen7 platforms
instead of only vlv (Chris)
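The enable itself is a single masked write during render ring init, roughly as below (using the bit name after the rename noted earlier in this log):

if (IS_GEN7(dev))
	/* Limit TLB invalidation to batch boundaries, flagged
	 * PIPE_CONTROLs and sync flushes instead of every full
	 * pipeline flush. */
	I915_WRITE(GFX_MODE_GEN7,
		   _MASKED_BIT_ENABLE(GFX_TLB_INVALIDATE_EXPLICIT));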
Signed-off-by: Akash Goel <[email protected]>
Signed-off-by: Sourab Gupta <[email protected]>
Tested-by: Chris Wilson <[email protected]> #ivb, hsw -Chris
Reviewed-by: Ville Syrjälä <[email protected]>
[danvet: Add w/a markers as suggested by Ville.]
Signed-off-by: Daniel Vetter <[email protected]>
|
|
The documentation calls this GFX_MODE bit "Flush TLB invalidate Mode".
However, that is not a good name for an enable bit as it doesn't make it
clear what is enabled. An even worse name is GFX_TLB_INVALIDATE_ALWAYS
as enabling that bit actually prevents the TLB from being invalidated at
every flush. This leads to great confusion when reading code and
proposed patches. To get around this try to bake in what is enabled by
setting the bit and call it GFX_TLB_INVALIDATE_EXPLICIT.
Signed-off-by: Chris Wilson <[email protected]>
Cc: "Gupta, Sourab" <[email protected]>
Acked-by: "Gupta, Sourab" <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
No functional changes.
Signed-off-by: Jani Nikula <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
As Broadwell has an increased virtual address size, it requires more
than 32 bits to store offsets into its address space. This includes the
debug registers to track the current HEAD of the individual rings, which
may be anywhere within the per-process address spaces. In order to find
the full location, we need to read the high bits from a second register.
We then also need to expand our storage to keep track of the larger
address.
v2: Carefully read the two registers to catch wraparound between
the reads.
v3: Use a WARN_ON rather than loop indefinitely on an unstable
register read.
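A sketch of the two-register read with the wraparound handling from v2/v3 (the function name and upper-dword register name are assumptions here):

static u64 read_acthd64(struct drm_i915_private *dev_priv, u32 base)
{
	u32 upper, lower, tmp;
	int loop = 2;

	upper = I915_READ(RING_ACTHD_UDW(base));
	do {
		tmp = upper;
		lower = I915_READ(RING_ACTHD(base));
		/* Re-read the high dword to detect a carry between reads. */
		upper = I915_READ(RING_ACTHD_UDW(base));
	} while (upper != tmp && --loop);

	/* v3: warn instead of looping forever on an unstable register. */
	WARN_ON(upper != tmp);

	return (u64)upper << 32 | lower;
}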
Signed-off-by: Chris Wilson <[email protected]>
Cc: Ben Widawsky <[email protected]>
Cc: Timo Aaltonen <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
[danvet: Drop spurious hunk which conflicted.]
Signed-off-by: Daniel Vetter <[email protected]>
|
|
This patch removes the VS_TIMER_DISPATCH bit enable in the MI_MODE reg
for platforms > Gen6.
VS_TIMER_DISPATCH bit enable was earlier required as a part of
WA 'WaTimedSingleVertexDispatch', which is now applicable only to
platforms < Gen7.
v2: Enhancing the scope of the patch to full Gen7 (Chris)
v3: Modifying the WA condition to cover the applicable platforms,
and adding the WA name in comments. (Ville)
Signed-off-by: Akash Goel <[email protected]>
Signed-off-by: Sourab Gupta <[email protected]>
Tested-by: Chris Wilson <[email protected]> # ivb, hsw -Chris
Reviewed-by: Ville Syrjälä <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
While wandering in the spec, I noticed that BDW removes those 2 bits
from INSTPM. I couldn't find any direct way to invalidate the TLB (ie
without the ring working already). Maybe someone else will have more
luck. At least, we now know we may have a problem.
Signed-off-by: Damien Lespiau <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Based on Bspec the command parser must be stopped prior to
issuing sync flush. This should be done by the caller of
intel_ring_setup_status_page. Patch adds a warning if it is
not done.
v2: rebased based on new patch (wait for ring to become idle)
Signed-off-by: Naresh Kumar Kachhi <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Make sure we wait for rings to become idle once they are
disabled. In case of a timeout, print an error message.
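A sketch of the idle wait (macro names from intel_ringbuffer.h; the timeout value is illustrative):

/* Ask the ring to stop, then poll MI_MODE until the idle bit is set. */
I915_WRITE_MODE(ring, _MASKED_BIT_ENABLE(STOP_RING));
if (wait_for((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000))
	DRM_ERROR("%s: timed out waiting for the ring to become idle\n",
		  ring->name);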
Signed-off-by: Naresh Kumar Kachhi <[email protected]>
[danvet: Frob patch as suggested by Chris.]
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Rings should be idle before issuing sync_flush
(in intel_ring_setup_status_page). This patch moves the ring
disabling to before the HW status page setup.
Signed-off-by: Naresh Kumar Kachhi <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
Linux 3.14-rc6
I need the hdmi/dvi-dual link fixes in 3.14 to avoid ugly conflicts
when merging Ville's new hdmi cloning support into my -next tree.
Conflicts:
drivers/gpu/drm/i915/Makefile
drivers/gpu/drm/i915/intel_dp.c
The Makefile cleanup conflicts with an acpi build fix; intel_dp.c is
trivial.
Signed-off-by: Daniel Vetter <[email protected]>
|