path: root/drivers/gpu/drm/i915/i915_gem_execbuffer.c
Age | Commit message | Author | Files | Lines
2011-03-23 | drm/i915: Disable pagefaults along execbuffer relocation fast path | Chris Wilson | 1 | -4/+17
Along the fast path for relocation handling, we attempt to copy directly from the user data structures whilst holding our mutex. This causes lockdep to warn about circular lock dependencies if we need to pagefault the user pages. [Because handling a page fault on an mmapped bo requires acquiring the struct mutex whilst already holding the mm semaphore, it is verboten to acquire the mm semaphore when already holding the struct mutex. The likelihood of the user passing in relocations contained in a GTT-mmapped bo is low, but conceivable for extreme pathology.] In order to force the mm to return EFAULT rather than handle the pagefault, we therefore need to disable pagefaults across the relocation fast path.
Signed-off-by: Chris Wilson <[email protected]>
Cc: [email protected]
Reviewed-by: Daniel Vetter <[email protected]>
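A minimal userspace model of the fast-path/slow-path split, assuming a simulated "resident" flag in place of real page tables. In the kernel the fast path would bracket an atomic user copy with pagefault_disable() and pagefault_enable(); here copy_atomic() only imitates that by failing with -EFAULT instead of faulting. All names below are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical user buffer: "resident" stands in for whether touching
 * the pages would need a page fault. */
struct user_buf {
    const unsigned *data;
    size_t len;
    bool resident;
};

/* Stand-in for a copy with page faults disabled: instead of faulting
 * (which would need the mm semaphore while the struct mutex is held),
 * it fails immediately with -EFAULT. */
static int copy_atomic(unsigned *dst, const struct user_buf *src)
{
    if (!src->resident)
        return -EFAULT;
    memcpy(dst, src->data, src->len * sizeof(*dst));
    return 0;
}

/* Stand-in for the slow path: the mutex has been dropped, so faulting
 * is allowed and the pages get pulled in. */
static int copy_sleeping(unsigned *dst, struct user_buf *src)
{
    src->resident = true;
    memcpy(dst, src->data, src->len * sizeof(*dst));
    return 0;
}

static int slow_path_taken;

static int relocate(unsigned *dst, struct user_buf *src)
{
    /* Fast path: mutex held, faults disabled. */
    int ret = copy_atomic(dst, src);
    if (ret != -EFAULT)
        return ret;

    /* Slow path: drop the mutex, copy with faults allowed, retake it. */
    slow_path_taken++;
    return copy_sleeping(dst, src);
}
```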
2011-03-07 | Merge branch 'drm-intel-fixes' into drm-intel-next | Chris Wilson | 1 | -1/+2
Apply the trivial conflicting regression fixes, but keep GPU semaphores enabled.
Conflicts:
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_gem_execbuffer.c
2011-03-07 | drm/i915: Only wait on a pending flip if we intend to write to the buffer | Chris Wilson | 1 | -48/+44
... as if we are only reading from it, we can do that concurrently with the queued flip.
Signed-off-by: Chris Wilson <[email protected]>
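The rule reduces to a single predicate. A sketch, assuming a hypothetical object struct with a pending-flip count and a write domain:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified object state. */
struct bo {
    unsigned pending_flips;  /* flips queued against this buffer */
    unsigned write_domain;   /* non-zero if this batch will write it */
};

/* True if the caller has to wait for the flip to complete. Reads can
 * proceed concurrently with a queued flip, so only a write needs to be
 * serialised against it. */
static bool must_wait_for_flip(const struct bo *obj)
{
    return obj->pending_flips != 0 && obj->write_domain != 0;
}
```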
2011-03-07 | drm/i915: Disable GPU semaphores by default | Chris Wilson | 1 | -2/+2
Andi Kleen narrowed his GPU hangs on his Sugar Bay (SNB desktop) rev 09 down to the use of GPU semaphores, and we already know that they appear broken up to Huron River (mobile) rev 08. (I'm optimistic that disabling GPU semaphores is simply hiding another bug by the latency and side-effects of the additional device interaction it introduces...) However, use of semaphores is a massive performance improvement... Only as long as the system remains stable. Enable at your peril.
Reported-by: Andi Kleen <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33921
Signed-off-by: Chris Wilson <[email protected]>
2011-03-01 | drm/i915: Re-enable GPU semaphores for SandyBridge mobile | Chris Wilson | 1 | -2/+1
This seems to be running stably on my test laptop, so hopefully the reported hangs were just symptoms of other bugs.
Signed-off-by: Chris Wilson <[email protected]>
2011-03-01 | drm/i915: Allow relocation deltas outside of target bo | Chris Wilson | 1 | -10/+0
Userspace has a legitimate requirement to use a delta that points outside of the target bo, and so we need to enable this. (As this is an ABI break, albeit a relaxation of the current restrictions, mark the change with a new flag.)
Signed-off-by: Chris Wilson <[email protected]>
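To make the relaxation concrete: the relocated value is just the target's offset plus the delta, so the bounds check was purely a policy restriction. Both helpers are hypothetical; the exact form and error code of the removed check in i915 may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* The value patched into the batch: target GTT offset plus delta.
 * Nothing in this arithmetic needs the result to stay inside the
 * target object. */
static uint64_t reloc_value(uint64_t target_offset, uint32_t delta)
{
    return target_offset + delta;
}

/* The kind of restriction this change drops: a delta at or past the
 * end of the target object used to be rejected. */
static int old_delta_check(uint32_t delta, uint64_t target_size)
{
    return delta >= target_size ? -EINVAL : 0;
}
```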
2011-02-22 | drm/i915: Use a device flag for non-interruptible phases | Chris Wilson | 1 | -2/+2
The code paths for modesetting are growing in complexity as we may need to move the buffers around in order to fit the scanout in the aperture. Therefore we face a choice as to whether to thread the interruptible status through the entire pinning and unbinding code paths or to add a flag to the device when we may not be interrupted by a signal. This does the latter and so fixes a few instances of modesetting failures under stress.
Signed-off-by: Chris Wilson <[email protected]>
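A userspace sketch of the device-flag approach, assuming a single hypothetical interruptible field and modelling an interrupted wait as -EINTR (the kernel uses its own restart codes):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state: one flag replaces threading an
 * "interruptible" argument through every pin/unbind helper. */
struct device_state {
    bool interruptible;
};

/* Stand-in for a wait deep in the pinning/unbinding paths: it only
 * gives up on a pending signal while the device is interruptible. */
static int wait_for_gpu(const struct device_state *dev, bool signal_pending)
{
    if (dev->interruptible && signal_pending)
        return -EINTR;
    return 0;
}

/* Modesetting marks the device non-interruptible for its duration and
 * restores the previous setting afterwards. */
static int modeset(struct device_state *dev, bool signal_pending)
{
    bool was = dev->interruptible;
    int ret;

    dev->interruptible = false;
    ret = wait_for_gpu(dev, signal_pending);
    dev->interruptible = was;
    return ret;
}
```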
2011-02-22 | drm/i915: First try a normal large kmalloc for the temporary exec buffers | Chris Wilson | 1 | -1/+5
As we just need a temporary array whilst performing the relocations for the execbuffer, first attempt to allocate using kmalloc even if the allocation is larger than a single (order-0) page. This avoids the overhead of remapping the discontiguous array and so gives a moderate boost to execution throughput.
Signed-off-by: Chris Wilson <[email protected]>
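The two-tier allocation can be sketched with pluggable allocators so the fallback path is testable. This is a userspace analogue under stated assumptions: in the kernel the first tier would be a kmalloc() allowed to fail quietly (for instance with __GFP_NOWARN) and the second a discontiguous, vmalloc-style allocation. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator hook so both tiers can be exercised. */
typedef void *(*alloc_fn)(size_t size);

static void *alloc_temp_array(size_t count, size_t elem_size,
                              alloc_fn fast, alloc_fn fallback)
{
    void *p;

    /* Guard against count * elem_size overflowing. */
    if (elem_size != 0 && count > (size_t)-1 / elem_size)
        return NULL;

    p = fast(count * elem_size);      /* contiguous: no remapping cost */
    if (p == NULL)
        p = fallback(count * elem_size);
    return p;
}

/* Test helper that simulates the contiguous attempt failing. */
static void *always_fail(size_t size)
{
    (void)size;
    return NULL;
}
```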
2011-02-22 | drm/i915: Protect against drm_gem_object not being the first member | Chris Wilson | 1 | -2/+2
Dave Airlie spotted that we had a potential bug should we ever rearrange the drm_i915_gem_object so that the base drm_gem_object was not its first member. He noticed that we often convert the return of drm_gem_object_lookup() immediately into a drm_i915_gem_object and then check the result for nullity. This is only valid when the base object is the first member, so that the superobject has the same address. Play safe instead and use the compiler to convert back to the original return address for sanity testing.
Signed-off-by: Chris Wilson <[email protected]>
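The pitfall is easy to reproduce with a simplified container_of(). The structs below are hypothetical stand-ins. The commit tests &obj->base after converting; this sketch instead checks the base pointer before converting, which avoids pointer arithmetic on NULL in standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: recover the enclosing structure from a
 * pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct gem_object { int handle; };

/* Hypothetical superobject with the base deliberately NOT first. */
struct i915_object {
    int tiling;
    struct gem_object base;
};

/* If base is not the first member, converting NULL would not yield
 * NULL, so a failed lookup must be detected on the base pointer. */
static struct i915_object *to_i915(struct gem_object *base)
{
    return base ? container_of(base, struct i915_object, base) : NULL;
}
```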
2011-02-07 | drm/i915: Refine tracepoints | Chris Wilson | 1 | -45/+13
A lot of minor tweaks to fix the tracepoints, improve the outputting for ftrace, and to generally make the tracepoints useful again. It is a start and enough to begin identifying performance issues and gaps in our coverage.
Signed-off-by: Chris Wilson <[email protected]>
2011-01-24 | Merge branch 'drm-intel-fixes' into drm-intel-next | Chris Wilson | 1 | -1/+1
Merge important suspend and resume regression fixes and resolve the small conflict.
Conflicts:
drivers/gpu/drm/i915/i915_dma.c
2011-01-23 | drm/i915: Fix use of invalid array size for ring->sync_seqno | Chris Wilson | 1 | -1/+1
There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we were clearing I915_NUM_RINGS of them. Oops.
Reported-by: Jiri Slaby <[email protected]>
Tested-by: Jiri Slaby <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
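One way to keep such a clear in step with the array declaration is to size it with sizeof. The struct and the NUM_RINGS value below are illustrative, not the driver's definitions:

```c
#include <assert.h>
#include <string.h>

#define NUM_RINGS 3  /* illustrative stand-in for I915_NUM_RINGS */

struct ring {
    unsigned sync_seqno[NUM_RINGS - 1]; /* one counter per *other* ring */
    unsigned next_field;                /* whatever follows in memory */
};

/* Sizing the memset by the array itself ties it to the declaration;
 * clearing NUM_RINGS entries would run into next_field. */
static void clear_sync(struct ring *r)
{
    memset(r->sync_seqno, 0, sizeof(r->sync_seqno));
}
```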
2011-01-19 | drm/i915: Trivial sparse fixes | Chris Wilson | 1 | -2/+0
Move code around and invoke iomem annotation in a few more places in order to silence sparse. Still a few more iomem annotations to go...
Signed-off-by: Chris Wilson <[email protected]>
2011-01-14 | drm/i915: Disable GPU semaphores on SandyBridge mobile | Chris Wilson | 1 | -1/+2
Hopefully, this is a temporary measure whilst the root cause is understood. At the moment, we experience a hard hang whilst looping urbanterror that has been identified as a result of the use of semaphores, but so far only on SNB mobile.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32752
Tested-by: [email protected]
Signed-off-by: Chris Wilson <[email protected]>
2011-01-13 | drm/i915/execbuffer: Clear domains before beginning reloc processing | Chris Wilson | 1 | -4/+3
After reordering the sequence of relocating objects in commit 6fe4f1404, we can no longer rely on seeing all reloc targets prior to performing the relocation. As a result we were ignoring the need to flush objects from the render cache and invalidate the sampler caches, resulting in rendering glitches. So we need to clear the relocation domains earlier.
Reported-by: Linus Torvalds <[email protected]>
Tested-by: Linus Torvalds <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
2011-01-13 | drm/i915/execbuffer: Reorder relocations to match new object order | Chris Wilson | 1 | -6/+9
On the fault path, commit 6fe4f140 introduced a regression whereby it changed the sequence of the objects but continued to use the original ordering of relocation entries. The result was that incorrect GTT offsets were being fed into the execbuffer, causing lots of misrendering and potential hangs.
Reported-by: Linus Torvalds <[email protected]>
Tested-by: Linus Torvalds <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
2011-01-11 | drm/i915/execbuffer: Reorder binding of objects to favour restrictions | Chris Wilson | 1 | -26/+46
As the mappable portion of the aperture is always a small subset at the start of the GTT, it is allocated preferentially by drm_mm. This is useful in case we ever need to map an object later. However, if you have a large object that can consume the entire mappable region of the GTT, this prevents the batchbuffer from fitting and so causes an error. Instead, allocate all those that require a mapping up front in order to improve the likelihood of finding sufficient space to bind them.
Signed-off-by: Chris Wilson <[email protected]>
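The reordering amounts to a stable partition of the exec list. A sketch over a hypothetical array of objects; the driver works on linked lists, so this only illustrates the ordering rule:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct exec_obj {
    int id;
    bool needs_mappable; /* must land in the CPU-visible start of the GTT */
};

/* Stable partition: objects needing a mappable binding first, then the
 * rest, each group keeping its original relative order. Binding in this
 * order lets restricted objects claim the small mappable region before
 * unrestricted objects (which drm_mm also places there first) fill it.
 * "out" must have room for n entries. */
static void order_for_binding(const struct exec_obj *in, size_t n,
                              struct exec_obj *out)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++)
        if (in[i].needs_mappable)
            out[k++] = in[i];
    for (size_t i = 0; i < n; i++)
        if (!in[i].needs_mappable)
            out[k++] = in[i];
}
```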
2011-01-11 | drm/i915/execbuffer: Correctly clear the current object list upon EFAULT | Chris Wilson | 1 | -3/+1
Before releasing the lock in order to copy the relocation list from user pages, we need to drop all the object references, as another thread may usurp and execute another batchbuffer before we reacquire the lock. However, the code was buggy and failed to clear the list...
Signed-off-by: Chris Wilson <[email protected]>
Cc: [email protected]
2011-01-11 | drm/i915: Propagate error from flushing the ring | Chris Wilson | 1 | -10/+18
... in order to avoid a BUG() and potential unbounded waits.
Signed-off-by: Chris Wilson <[email protected]>
2011-01-11 | drm/i915: Handle ringbuffer stalls when flushing | Chris Wilson | 1 | -5/+7
Signed-off-by: Chris Wilson <[email protected]>
2011-01-11 | drm/i915: Enforce write ordering through the GTT | Chris Wilson | 1 | -0/+3
We need to ensure that writes through the GTT land before any modification to the MMIO registers and so must impose a mandatory write barrier when flushing the GTT domain. This was revealed by relaxing the write ordering by experimentally mapping the registers and the GATT as write-combining.
Signed-off-by: Chris Wilson <[email protected]>
2010-12-20 | drm/i915: Allow the application to choose the constant addressing mode | Chris Wilson | 1 | -1/+34
The relative-to-general state default is useless as it means having to rewrite the streaming kernels for each batch. Relative-to-surface is more useful, as that stream usually needs to be rewritten for each batch. And absolute addressing mode, vital if you start streaming state, is also only available by adjusting the register...
Signed-off-by: Chris Wilson <[email protected]>
2010-12-09 | drm/i915: Mark the user reloc error paths as unlikely | Chris Wilson | 1 | -9/+8
Signed-off-by: Chris Wilson <[email protected]>
2010-12-09 | drm/i915: Eliminate drm_gem_object_lookup during relocation | Chris Wilson | 1 | -27/+123
As we provide a list of all objects that will be accessed from the batchbuffer, we can build a lookup table of the handles associated with those objects for this invocation and use that to avoid the overhead of looking up those objects again for every relocation. The cost of building and searching a small hash table is much less than that of acquiring a spinlock, searching a radix tree and manipulating an atomic refcnt per relocation.
Signed-off-by: Chris Wilson <[email protected]>
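A sketch of a per-call handle lookup table, assuming 32-bit handles and open addressing with linear probing; the table layout in the actual commit may differ (for example, chained buckets). The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-execbuffer table: GEM handle -> index into the
 * exec object array. Built once per call; each relocation then costs
 * a hash probe instead of a locked global lookup plus refcount. */
struct handle_lut {
    uint32_t *handles;
    int *index;      /* -1 marks an empty slot */
    size_t mask;     /* capacity - 1; capacity is a power of two */
};

static int lut_init(struct handle_lut *lut, const uint32_t *handles, size_t n)
{
    size_t cap = 1;

    while (cap < 2 * n)              /* keep load factor <= 1/2 */
        cap <<= 1;
    lut->handles = malloc(cap * sizeof(*lut->handles));
    lut->index = malloc(cap * sizeof(*lut->index));
    if (!lut->handles || !lut->index) {
        free(lut->handles);
        free(lut->index);
        return -1;
    }
    lut->mask = cap - 1;
    for (size_t i = 0; i < cap; i++)
        lut->index[i] = -1;

    for (size_t i = 0; i < n; i++) {
        size_t slot = handles[i] & lut->mask;
        while (lut->index[slot] != -1)   /* linear probing */
            slot = (slot + 1) & lut->mask;
        lut->handles[slot] = handles[i];
        lut->index[slot] = (int)i;
    }
    return 0;
}

/* Returns the exec-list index for handle, or -1 if it was not listed. */
static int lut_find(const struct handle_lut *lut, uint32_t handle)
{
    size_t slot = handle & lut->mask;

    while (lut->index[slot] != -1) {
        if (lut->handles[slot] == handle)
            return lut->index[slot];
        slot = (slot + 1) & lut->mask;
    }
    return -1;
}

static void lut_fini(struct handle_lut *lut)
{
    free(lut->handles);
    free(lut->index);
}
```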
2010-12-05 | drm/i915: Ignore fenced commands for gpu access on gen4 | Chris Wilson | 1 | -11/+16
Userspace should not have been declaring that it needed fenced GPU access with gen4+ as those GPUs have no fenced commands, but to be on the safe side it is easier to ignore userspace in case they did.
Signed-off-by: Chris Wilson <[email protected]>
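The policy fits in one predicate. OBJ_NEEDS_FENCE is a hypothetical stand-in for the per-object flag userspace sets:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit for the per-object "needs fence" request. */
#define OBJ_NEEDS_FENCE (1u << 0)

/* gen4+ GPUs have no fenced commands, so a fence request from
 * userspace is ignored there rather than trusted. */
static bool need_gpu_fence(int gen, unsigned flags)
{
    return gen < 4 && (flags & OBJ_NEEDS_FENCE);
}
```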
2010-12-05 | drm/i915: Implement GPU semaphores for inter-ring synchronisation on SNB | Chris Wilson | 1 | -22/+72
The bulk of the change is to convert the growing list of rings into an array so that the relationship between the rings and the semaphore sync registers can be easily computed.
Signed-off-by: Chris Wilson <[email protected]>
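With the rings in an array, the semaphore slot for another ring follows from index arithmetic. This sketch assumes each ring keeps NUM_RINGS - 1 slots ordered by distance from itself; NUM_RINGS is illustrative and the driver's exact mapping may differ.

```c
#include <assert.h>

#define NUM_RINGS 3 /* illustrative */

/* Each ring has NUM_RINGS - 1 semaphore slots, one per other ring.
 * The slot for "other" as seen from "self" depends only on their
 * array indices. */
static int sync_index(int self, int other)
{
    int idx = other - self - 1;

    if (idx < 0)
        idx += NUM_RINGS;
    return idx; /* in [0, NUM_RINGS - 2] when other != self */
}
```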
2010-12-02 | drm/i915: Pipelined fencing [infrastructure] | Chris Wilson | 1 | -7/+13
With this change, every batchbuffer can use all available fences (save pinned and scanout, of course) without ever stalling the gpu! In theory. Currently the actual pipelined update of the register is disabled due to some stability issues. However, just the deferred update is a significant win.
Based on a series of patches by Daniel Vetter.
The premise is that before every access to a buffer through the GTT we have to declare whether we need a fence register or not. If the access is by the GPU, a pipelined update to the register is made via the ringbuffer, and we track the last seqno of the batches that access it. If by the CPU, we wait for the last GPU access and update the register (either to clear or to set it for the current buffer).
One advantage of being able to pipeline changes is that we can defer the actual updating of the fence register until we first need to access the object through the GTT, i.e. we can eliminate the stall on set_tiling. This is important as the userspace bo cache does not track the tiling status of active buffers, which generates frequent stalls on gen3 when enabling tiling for an already bound buffer.
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
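A toy model of the bookkeeping described above, assuming monotonically increasing seqnos with no wraparound and hypothetical fence_reg/gpu structs; the pipelined register write itself is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fence-register bookkeeping. */
struct fence_reg {
    int owner;            /* object id using the register, -1 if none */
    uint32_t last_seqno;  /* seqno of the last batch that used it */
};

struct gpu {
    uint32_t completed;   /* highest seqno the GPU has finished */
    unsigned cpu_stalls;  /* how often the CPU path had to wait */
};

/* GPU access: the register update would be queued in the ringbuffer,
 * so there is no stall; just record which batch depends on it. */
static void gpu_use_fence(struct fence_reg *f, int obj, uint32_t seqno)
{
    f->owner = obj;
    f->last_seqno = seqno;
}

/* CPU access: wait for the last GPU user, then update the register
 * directly for the current buffer. */
static void cpu_use_fence(struct gpu *g, struct fence_reg *f, int obj)
{
    if (g->completed < f->last_seqno) {
        g->cpu_stalls++;
        g->completed = f->last_seqno; /* stands in for waiting on the seqno */
    }
    f->owner = obj;
}
```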
2010-12-02 | drm/i915: Prevent stalling for a GTT read back from a read-only GPU target | Chris Wilson | 1 | -0/+1
Signed-off-by: Chris Wilson <[email protected]>
2010-11-30 | drm/i915/ringbuffer: Handle cliprects in the caller | Chris Wilson | 1 | -5/+25
This makes the various rings more consistent by removing the anomalous handling of the rendering ring execbuffer dispatch.
Signed-off-by: Chris Wilson <[email protected]>
2010-11-28 | drm/i915/execbuffer: On error, start unwinding from the previous object | Chris Wilson | 1 | -0/+3
As the error occurred on the current object, its state was not changed and so it should be excluded from the unwind.
Reported-by: Daniel Vetter <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
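The unwind rule maps onto the common while (i--) idiom. A sketch with hypothetical pin counts, where the failing object is already pinned elsewhere so that wrongly unwinding it would be visible:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object with a pin count; "fail_pin" is a test hook. */
struct obj {
    int pin_count;
    bool fail_pin;
};

static int pin(struct obj *o)
{
    if (o->fail_pin)
        return -1;      /* failure leaves the object untouched */
    o->pin_count++;
    return 0;
}

static void unpin(struct obj *o)
{
    o->pin_count--;
}

/* Pin every object. If pinning objs[i] fails, only objs[0..i-1] were
 * changed, so the unwind starts from the previous object. */
static int pin_all(struct obj *objs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (pin(&objs[i]) != 0)
            goto unwind;
    }
    return 0;

unwind:
    while (i--)         /* first iteration handles objs[i - 1] */
        unpin(&objs[i]);
    return -1;
}
```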
2010-11-25 | drm/i915: Avoid allocation for execbuffer object list | Chris Wilson | 1 | -214/+189
Besides the minimal improvement in reducing the execbuffer overhead, the real benefit is clarifying a few routines.
Signed-off-by: Chris Wilson <[email protected]>
2010-11-25 | drm/i915: Split i915_gem_execbuffer into its own file. | Chris Wilson | 1 | -0/+1155
A number of dragons have been seen lurking within the execbuffer code. The first step is then to isolate them from the rest and begin to scrutinise them in depth.
Suggested by Daniel Vetter.
Signed-off-by: Chris Wilson <[email protected]>