aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-06-21drm/i915: Retire the VMA's fence tracker before unbindingChris Wilson1-0/+5
Since we may track unfenced access (GPU access to the vma that explicitly requires no fence), vma->last_fence may be set without any attached fence (vma->fence) and so will not be flushed when we call i915_vma_put_fence(). Since we stopped doing a full retire of the activity trackers for unbind, we need to explicitly retire each tracker. Fixes: b0decaf75bd9 ("drm/i915: Track active vma requests") Signed-off-by: Chris Wilson <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Tvrtko Ursulin <[email protected]>
2017-06-21drm/i915: select CRC32Nicholas Piggin1-0/+1
kbuild test robot found a build failure when building with thin archives: http://marc.info/?l=linux-kbuild&m=149802285009737&w=2 Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Pass the right flags to i915_vma_move_to_active()Chris Wilson1-1/+1
i915_vma_move_to_active() takes the execobject flags and not a boolean! Instead of passing EXEC_OBJECT_WRITE we passed true [i.e. EXEC_OBJECT_NEEDS_FENCE] causing us to start tracking the vma->last_fence access and since we forgot to clear that on unbinding, we caused a use-after-free. [ 321.263854] BUG: KASAN: use-after-free in i915_gem_request_retire+0x1728/0x1740 [i915] [ 321.264001] Read of size 8 at addr ffff880100fc67d8 by task gem_exec_reloc/2868 [ 321.264181] CPU: 0 PID: 2868 Comm: gem_exec_reloc Not tainted 4.12.0-rc6-CI-Custom_2759+ #1 [ 321.264195] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015 [ 321.264208] Call Trace: [ 321.264234] dump_stack+0x67/0x99 [ 321.264260] print_address_description+0x77/0x290 [ 321.264437] ? i915_gem_request_retire+0x1728/0x1740 [i915] [ 321.264459] kasan_report+0x269/0x350 [ 321.264487] __asan_report_load8_noabort+0x14/0x20 [ 321.264660] i915_gem_request_retire+0x1728/0x1740 [i915] [ 321.264841] ? intel_ring_context_pin+0x131/0x690 [i915] [ 321.265021] i915_gem_request_alloc+0x2c6/0x1220 [i915] [ 321.265044] ? _raw_spin_unlock_irqrestore+0x3d/0x60 [ 321.265226] i915_gem_do_execbuffer+0xac0/0x2a20 [i915] [ 321.265250] ? __lock_acquire+0xceb/0x5450 [ 321.265269] ? entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 321.265291] ? kvmalloc_node+0x6b/0x80 [ 321.265310] ? kvmalloc_node+0x6b/0x80 [ 321.265489] ? eb_relocate_slow+0xbe0/0xbe0 [i915] [ 321.265520] ? ___slab_alloc.constprop.28+0x2ab/0x3d0 [ 321.265549] ? debug_check_no_locks_freed+0x280/0x280 [ 321.265591] ? __might_fault+0xc6/0x1b0 [ 321.265782] i915_gem_execbuffer2+0x14a/0x3f0 [i915] [ 321.265815] drm_ioctl+0x4ba/0xaa0 [ 321.265986] ? i915_gem_execbuffer+0xde0/0xde0 [i915] [ 321.266017] ? drm_getunique+0x270/0x270 [ 321.266068] do_vfs_ioctl+0x17f/0xfa0 [ 321.266091] ? __fget+0x1ba/0x330 [ 321.266112] ? lock_acquire+0x390/0x390 [ 321.266133] ? ioctl_preallocate+0x1d0/0x1d0 [ 321.266164] ? __fget+0x1db/0x330 [ 321.266194] ? __fget_light+0x79/0x1f0 [ 321.266219] SyS_ioctl+0x3c/0x70 [ 321.266247] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 321.266265] RIP: 0033:0x7fcede207357 [ 321.266279] RSP: 002b:00007ffef0effe58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 321.266307] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fcede207357 [ 321.266321] RDX: 00007ffef0effef0 RSI: 0000000040406469 RDI: 0000000000000004 [ 321.266335] RBP: ffffffff812097c6 R08: 0000000000000008 R09: 0000000000000000 [ 321.266349] R10: 0000000000000008 R11: 0000000000000246 R12: ffff880116bcff98 [ 321.266363] R13: ffffffff81cb7cb3 R14: ffff880116bcff70 R15: 0000000000000000 [ 321.266385] ? __this_cpu_preempt_check+0x13/0x20 [ 321.266406] ? trace_hardirqs_off_caller+0x1d6/0x2c0 [ 321.266487] Allocated by task 2868: [ 321.266568] save_stack_trace+0x16/0x20 [ 321.266586] kasan_kmalloc+0xee/0x180 [ 321.266602] kasan_slab_alloc+0x12/0x20 [ 321.266620] kmem_cache_alloc+0xc7/0x2e0 [ 321.266795] i915_vma_instance+0x28c/0x1540 [i915] [ 321.266964] eb_lookup_vmas+0x5a7/0x2250 [i915] [ 321.267130] i915_gem_do_execbuffer+0x69a/0x2a20 [i915] [ 321.267296] i915_gem_execbuffer2+0x14a/0x3f0 [i915] [ 321.267315] drm_ioctl+0x4ba/0xaa0 [ 321.267333] do_vfs_ioctl+0x17f/0xfa0 [ 321.267350] SyS_ioctl+0x3c/0x70 [ 321.267369] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 321.267428] Freed by task 177: [ 321.267502] save_stack_trace+0x16/0x20 [ 321.267521] kasan_slab_free+0xad/0x180 [ 321.267539] kmem_cache_free+0xc5/0x340 [ 321.267710] i915_vma_unbind+0x666/0x10a0 [i915] [ 321.267880] i915_vma_close+0x23a/0x2f0 [i915] [ 321.268048] __i915_gem_free_objects+0x17d/0xc70 [i915] [ 321.268215] __i915_gem_free_work+0x49/0x70 [i915] [ 321.268234] process_one_work+0x66f/0x1410 [ 321.268252] worker_thread+0xe1/0xe90 [ 321.268269] kthread+0x304/0x410 [ 321.268285] ret_from_fork+0x27/0x40 [ 321.268346] The buggy address belongs to the object at ffff880100fc6640 which belongs to the cache i915_vma of size 656 [ 321.268550] The buggy address is located 408 bytes inside of 656-byte region [ffff880100fc6640, ffff880100fc68d0) [ 321.268741] The buggy address belongs to the page: [ 321.268837] page:ffffea000403f000 count:1 mapcount:0 mapping: (null) index:0xffff880100fc5980 compound_mapcount: 0 [ 321.269045] flags: 0x8000000000008100(slab|head) [ 321.269147] raw: 8000000000008100 0000000000000000 ffff880100fc5980 00000001001e001d [ 321.269312] raw: ffffea0004038e20 ffff880116b46240 ffff88011646c640 0000000000000000 [ 321.269484] page dumped because: kasan: bad access detected [ 321.269665] Memory state around the buggy address: [ 321.269778] ffff880100fc6680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 321.269949] ffff880100fc6700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 321.270115] >ffff880100fc6780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 321.270279] ^ [ 321.270410] ffff880100fc6800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 321.270576] ffff880100fc6880: fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc [ 321.270740] ================================================================== [ 321.270903] Disabling lock debugging due to kernel taint Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101511 Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing") Signed-off-by: Chris Wilson <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Tvrtko Ursulin <[email protected]>
2017-06-20drm/i915: Enable Engine reset and recovery supportMichel Thierry1-2/+2
This feature is made available only from Gen8, for previous gen devices driver uses legacy full gpu reset. Cc: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Signed-off-by: Tomas Elf <[email protected]> Signed-off-by: Arun Siluvery <[email protected]> Signed-off-by: Michel Thierry <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915/selftests: reset engine self testsMichel Thierry1-0/+148
Check that we can reset specific engines, also check the fallback to full reset if something didn't work. v2: rebase. v3: use RESET_ENGINE_IN_PROGRESS flag. v4: use I915_RESET_ENGINE flag. Signed-off-by: Michel Thierry <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Export per-engine reset count info to debugfsMichel Thierry1-0/+21
A new variable is added to export the reset counts to debugfs, this includes full gpu reset and engine reset count. This is useful for tests where they are expected to trigger reset; these counts are checked before and after the test to ensure the same. v2: Include reset engine count in i915_engine_info too (Chris). Cc: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Signed-off-by: Arun Siluvery <[email protected]> Signed-off-by: Michel Thierry <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Add engine reset count to error stateMichel Thierry3-0/+16
Driver maintains count of how many times a given engine is reset, useful to capture this in error state also. It gives an idea of how engine is coping up with the workloads it is executing before this error state. A follow-up patch will provide this information in debugfs. v2: s/engine_reset/reset_engine/ (Chris) Define count as unsigned int (Tvrtko) Cc: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Signed-off-by: Arun Siluvery <[email protected]> Signed-off-by: Michel Thierry <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Add support for per engine reset recoveryMichel Thierry3-38/+110
This change implements support for per-engine reset as an initial, less intrusive hang recovery option to be attempted before falling back to the legacy full GPU reset recovery mode if necessary. This is only supported from Gen8 onwards. Hangchecker determines which engines are hung and invokes error handler to recover from it. Error handler schedules recovery for each of those engines that are hung. The recovery procedure is as follows, - identifies the request that caused the hang and it is dropped - force engine to idle: this is done by issuing a reset request - reset the engine - re-init the engine to resume submissions. If engine reset fails then we fall back to heavy weight full gpu reset which resets all engines and reinitiazes complete state of HW and SW. v2: Rebase. v3: s/*engine_reset*/*reset_engine*/; freeze engine and irqs before calling i915_gem_reset_engine (Chris). v4: Rebase, modify i915_gem_reset_prepare to use a ring mask and reuse the function for reset_engine. v5: intel_reset_engine_start/cancel instead of request/unrequest_reset. v6: Clean up reset_engine function to not require mutex, i.e. no need to call revoke/restore_fences and _retire_requests (Chris). v7: Remove leftovers from v5, i.e. no need to disable irq, hold forcewake or wakeup the handoff bit (Chris). v8: engine_retire_requests should be (and it was) static; explain that we have to re-init the engine after reset, which is why the init_hw call is needed; check reset-in-progress flag (Chris). v9: Rebase, include code to pass the active request to gem_reset_engine (as it is already done in full reset). Remove unnecessary intel_reset_engine_start/cancel, these are executed as part of the reset. v10: Rebase, use the right I915_RESET_ENGINE flag. v11: Fixup to call reset_finish_engine even on error. Cc: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Signed-off-by: Tomas Elf <[email protected]> Signed-off-by: Arun Siluvery <[email protected]> Signed-off-by: Michel Thierry <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Modify error handler for per engine hang recoveryMichel Thierry5-1/+78
This is a preparatory patch which modifies error handler to do per engine hang recovery. The actual patch which implements this sequence follows later in the series. The aim is to prepare existing recovery function to adapt to this new function where applicable (which fails at this point because core implementation is lacking) and continue recovery using legacy full gpu reset. A helper function is also added to query the availability of engine reset. A subsequent patch will add the capability to query which type of reset is present (engine -> full -> no-reset) via the get-param ioctl. It has been decided that the error events that are used to notify user of reset will only be sent in case if full chip reset. In case of just single (or multiple) engine resets, userspace won't be notified by these events. Note that this implementation of engine reset is for i915 directly submitting to the ELSP, where the driver manages the hang detection, recovery and resubmission. With GuC submission these tasks are shared between driver and firmware; i915 will still responsible for detecting a hang, and when it does it will have to request GuC to reset that Engine and remind the firmware about the outstanding submissions. This will be added in different patch. v2: rebase, advertise engine reset availability in platform definition, add note about GuC submission. v3: s/*engine_reset*/*reset_engine*/. (Chris) Handle reset as 2 level resets, by first going to engine only and fall backing to full/chip reset as needed, i.e. reset_engine will need the struct_mutex. v4: Pass the engine mask to i915_reset. (Chris) v5: Rebase, update selftests. v6: Rebase, prepare for mutex-less reset engine. v7: Pass reset_engine mask as a function parameter, and iterate over the engine mask for reset_engine. (Chris) v8: Use i915.reset >=2 in has_reset_engine; remove redundant reset logging; add a reset-engine-in-progress flag to prevent concurrent resets, and avoid dual purposing of reset-backoff. (Chris) v9: Support reset of different engines in parallel (Chris) v10: Handle reset-engine flag locking better (Chris) v11: Squash in reporting of per-engine-reset availability. Cc: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Signed-off-by: Ian Lister <[email protected]> Signed-off-by: Tomas Elf <[email protected]> Signed-off-by: Arun Siluvery <[email protected]> Signed-off-by: Michel Thierry <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Update i915.reset to handle engine resetsMichel Thierry2-4/+4
In preparation for engine reset work update this parameter to handle more than one type of reset. Default at the moment is still full gpu reset. Cc: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Signed-off-by: Arun Siluvery <[email protected]> Signed-off-by: Michel Thierry <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Look for active requests earlier in the reset pathMichel Thierry2-7/+9
And store the active request so that we only search for it once. v2: Check for request completion inside _prepare_engine, don't use ECANCELED, remove unnecessary null checks (Chris). v3: Capture active requests during reset_prepare and store it the engine hangcheck obj. v4: Rename commit, change i915_gem_reset_request to just confirm the active_request is still incomplete, instead of engine_stalled (Chris). v5: With style; pass the active request to gem_reset_engine, keep single return in reset_prepare_engine (Chris). v6: Moved before reset-engine code appears (Chris) Suggested-by: Chris Wilson <[email protected]> Reviewed-by: Chris Wilson <[email protected]> (v5) Signed-off-by: Michel Thierry <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Wait for concurrent global resets to completeChris Wilson2-12/+15
If we enter i915_handle_error() a second time and a global reset is already in progress, we can simply wait for completion of the first reset. Currently we exit early prior to the actual reset being performed -- the worst of both worlds! v2: Plug into the existing reset_queue, and remember that kselftests is playing games with I915_RESET_BACKOFF to prevent hangcheck from screwing up. v3: Rename to i915_reset_device to fit in better with i915_reset_engine Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Cc: Michel Thierry <[email protected]> Reviewed-by: Michel Thierry <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Enable rcu-only context lookupsChris Wilson4-55/+64
Whilst the contents of the context is still protected by the big struct_mutex, this is not much of an improvement. It is just one tiny step towards reducing our BKL. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Allow contexts to be unreferenced locklesslyChris Wilson10-39/+88
If we move the actual cleanup of the context to a worker, we can allow the final free to be called from any context and avoid undue latency in the caller. v2: Negotiate handling the delayed contexts free by flushing the workqueue before calling i915_gem_context_fini() and performing the final free of the kernel context directly v3: Flush deferred frees before new context allocations Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Group all the global context information togetherChris Wilson11-49/+58
Create a substruct to hold all the global context state under drm_i915_private. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Do not re-calculate num_rings locallyTvrtko Ursulin1-3/+2
Since bb8f0f5abdd7 ("drm/i915: Split intel_engine allocation and initialisation") intel_info->num_rings is set early in the load sequence and so available to be used direclty in the 2nd load phase. Signed-off-by: Tvrtko Ursulin <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-20drm/i915: Simplify intel_engines_initTvrtko Ursulin1-24/+12
We do not want to carry on over missing constructors and don't need a duplicated engine mask checking which is already done in the setup phase. Signed-off-by: Tvrtko Ursulin <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
2017-06-19drm/i915/cnl: Fix RMW on ddi vswing sequence.Rodrigo Vivi2-0/+16
Paulo noticed that we were missing few bits clear before writing values back to the register on these RMW MMIO operations. v2: Remove "POST_" from CURSOR_COEFF_MASK. (Paulo). v3: Remove unnecessary braces. (Jani). Fixes: cf54ca8bc567 ("drm/i915/cnl: Implement voltage swing sequence.") Cc: Paulo Zanoni <[email protected]> Cc: Manasi Navare <[email protected]> Cc: Jani Nikula <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Reviewed-by: Paulo Zanoni <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-19drm/i915: Make intel_digital_port_connected() work for any portVille Syrjälä1-13/+70
Add the missing port A handling to intel_digital_port_connected() and also separate SPT from the CPT/LPT code a bit. Cc: Manasi Navare <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Manasi Navare <[email protected]>
2017-06-19drm/i915: Update DRIVER_DATE to 20170619Daniel Vetter1-2/+2
Signed-off-by: Daniel Vetter <[email protected]>
2017-06-16drm/i915/cfl: Introduce Coffee Lake workarounds.Rodrigo Vivi2-23/+60
Coffee Lake inherit most of Kabylake production workarounds. v2: Fix typo on commit message and remove WaDisableKillLogic and GEN9_DISABLE_OCL_OOB_SUPPRESS_LOGIC, since as Mika pointed out they shouldn't be here for cfl according to BSpec. Cc: Dhinakaran Pandiyan <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-16drm/i915: Store 9 bits of PCI Device ID for platforms with a LP PCHDhinakaran Pandiyan1-2/+11
Although we use 9 bits of Device ID for identifying PCH, only 8 bits are stored in dev_priv->pch_id. This makes HAS_PCH_CNP_LP() and HAS_PCH_SPT_LP() incorrect. Fix this by storing all the 9 bits for the platforms with LP PCH. v2: Drop PCH_LPT_LP change (Imre) Cc: Rodrigo Vivi <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Imre Deak <[email protected]> Fixes: commit ec7e0bb35f8d ("drm/i915/cnp: Add PCI ID for Cannonpoint LP PCH") Reported-by: Imre Deak <[email protected]> Reviewed-by: Imre Deak <[email protected]> Signed-off-by: Dhinakaran Pandiyan <[email protected]> Signed-off-by: Imre Deak <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-16drm/i915: Stash a pointer to the obj's resv in the vmaChris Wilson3-14/+15
During execbuf, a mandatory step is that we add this request (this fence) to each object's reservation_object. Inside execbuf, we track the vma, and to add the fence to the reservation_object then means having to first chase the obj, incurring another cache miss. We can reduce the number of cache misses by stashing a pointer to the reservation_object in the vma itself. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-16drm/i915: Async GPU relocation processingChris Wilson2-8/+220
If the user requires patching of their batch or auxiliary buffers, we currently make the alterations on the cpu. If they are active on the GPU at the time, we wait under the struct_mutex for them to finish executing before we rewrite the contents. This happens if shared relocation trees are used between different contexts with separate address space (and the buffers then have different addresses in each), the 3D state will need to be adjusted between execution on each context. However, we don't need to use the CPU to do the relocation patching, as we could queue commands to the GPU to perform it and use fences to serialise the operation with the current activity and future - so the operation on the GPU appears just as atomic as performing it immediately. Performing the relocation rewrites on the GPU is not free, in terms of pure throughput, the number of relocations/s is about halved - but more importantly so is the time under the struct_mutex. v2: Break out the request/batch allocation for clearer error flow. v3: A few asserts to ensure rq ordering is maintained Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Allow execbuffer to use the first object as the batchChris Wilson3-3/+22
Currently, the last object in the execlist is the always the batch. However, when building the batch buffer we often know the batch object first and if we can use the first slot in the execlist we can emit relocation instructions relative to it immediately and avoid a separate pass to adjust the relocations to point to the last execlist slot. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Wait upon userptr get-user-pages within execbufferChris Wilson5-5/+31
This simply hides the EAGAIN caused by userptr when userspace causes resource contention. However, it is quite beneficial with highly contended userptr users as we avoid repeating the setup costs and kernel-user context switches. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Michał Winiarski <[email protected]>
2017-06-16drm/i915: First try the previous execbuffer locationChris Wilson3-4/+15
When choosing a slot for an execbuffer, we ideally want to use the same address as last time (so that we don't have to rebind it) and the same address as expected by the user (so that we don't have to fixup any relocations pointing to it). If we first try to bind the incoming execbuffer->offset from the user, or the currently bound offset that should hopefully achieve the goal of avoiding the rebind cost and the relocation penalty. However, if the object is not currently bound there we don't want to arbitrarily unbind an object in our chosen position and so choose to rebind/relocate the incoming object instead. After we report the new position back to the user, on the next pass the relocations should have settled down. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Store a persistent reference for an object in the execbuffer cacheChris Wilson3-10/+28
If we take a reference to the object/vma when it is first used in an execbuf, we can keep that reference until the object's file-local handle is closed. Thereby saving a frequent ref/unref pair. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Eliminate lots of iterations over the execobjects arrayChris Wilson7-916/+1239
The major scaling bottleneck in execbuffer is the processing of the execobjects. Creating an auxiliary list is inefficient when compared to using the execobject array we already have allocated. Reservation is then split into phases. As we lookup up the VMA, we try and bind it back into active location. Only if that fails, do we add it to the unbound list for phase 2. In phase 2, we try and add all those objects that could not fit into their previous location, with fallback to retrying all objects and evicting the VM in case of severe fragmentation. (This is the same as before, except that phase 1 is now done inline with looking up the VMA to avoid an iteration over the execobject array. In the ideal case, we eliminate the separate reservation phase). During the reservation phase, we only evict from the VM between passes (rather than currently as we try to fit every new VMA). In testing with Unreal Engine's Atlantis demo which stresses the eviction logic on gen7 class hardware, this speed up the framerate by a factor of 2. The second loop amalgamation is between move_to_gpu and move_to_active. As we always submit the request, even if incomplete, we can use the current request to track active VMA as we perform the flushes and synchronisation required. The next big advancement is to avoid copying back to the user any execobjects and relocations that are not changed. v2: Add a Theory of Operation spiel. v3: Fall back to slow relocations in preparation for flushing userptrs. v4: Document struct members, factor out eb_validate_vma(), add a few more comments to explain some magic and hide other magic behind macros. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocationsChris Wilson1-0/+10
If we write a relocation into the buffer, we require our own implicit synchronisation added after the start of the execbuf, outside of the user's control. As we may end up clflushing, or doing the patch itself on the GPU, asynchronously we need to look at the implicit serialisation on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this object. If the user does trigger a stall for relocations, we make sure the stall is complete enough so that the batch is not submitted before we complete those relocations. Fixes: 77ae9957897d ("drm/i915: Enable userspace to opt-out of implicit fencing") Signed-off-by: Chris Wilson <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Jason Ekstrand <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Pass vma to relocate entryChris Wilson1-60/+42
We can simplify our tracking of pending writes in an execbuf to the single bit in the vma->exec_entry->flags, but that requires the relocation function knowing the object's vma. Pass it along. Note we have only been using a single bit to track flushing since commit cc889e0f6ce6a63c62db17d702ecfed86d58083f Author: Daniel Vetter <[email protected]> Date: Wed Jun 13 20:45:19 2012 +0200 drm/i915: disable flushing_list/gpu_write_list unconditionally flushed all render caches before the breadcrumb and commit 6ac42f4148bc27e5ffd18a9ab0eac57f58822af4 Author: Daniel Vetter <[email protected]> Date: Sat Jul 21 12:25:01 2012 +0200 drm/i915: Replace the complex flushing logic with simple invalidate/flush all did away with the explicit GPU domain tracking. This was then codified into the ABI with NO_RELOC in commit ed5982e6ce5f106abcbf071f80730db344a6da42 Author: Daniel Vetter <[email protected]> # Oi! Patch stealer! Date: Thu Jan 17 22:23:36 2013 +0100 drm/i915: Allow userspace to hint that the relocations were known Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Store a direct lookup from object handle to vmaChris Wilson11-112/+322
The advent of full-ppgtt lead to an extra indirection between the object and its binding. That extra indirection has a noticeable impact on how fast we can convert from the user handles to our internal vma for execbuffer. In order to bypass the extra indirection, we use a resizable hashtable to jump from the object to the per-ctx vma. rhashtable was considered but we don't need the online resizing feature and the extra complexity proved to undermine its usefulness. Instead, we simply reallocate the hastable on demand in a background task and serialize it before iterating. In non-full-ppgtt modes, multiple files and multiple contexts can share the same vma. This leads to having multiple possible handle->vma links, so we only use the first to establish the fast path. The majority of buffers are not shared and so we should still be able to realise speedups with multiple clients. v2: Prettier names, more magic. v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2 Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Fix retrieval of hangcheck statsChris Wilson1-3/+0
The default context is always supported (as it contains the global hangcheck stats) and the contexts for hangcheck are not limited to any ring. This was dropped in 2013 because it was supposed to have been included with Ben's full-ppgtt patch set. It never landed and the bug remains. References: https://bugs.freedesktop.org/show_bug.cgi?id=65845 Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-16drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirtyChris Wilson8-12/+17
For ease of use (i.e. avoiding a few checks and function calls), store the object's cache coherency next to the cache is dirty bit. Specifically this patch aims to reduce the frequency of no-op calls to i915_gem_object_clflush() to counter-act the increase of such calls for GPU only objects in the previous patch. v2: Replace cache_dirty & ~cache_coherent with cache_dirty && !cache_coherent as gcc generates much better code for the latter (Tvrtko) Signed-off-by: Chris Wilson <[email protected]> Cc: Dongwon Kim <[email protected]> Cc: Matt Roper <[email protected]> Tested-by: Dongwon Kim <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Tvrtko Ursulin <[email protected]>
2017-06-16drm/i915: Mark CPU cache as dirty on every transition for CPU writesChris Wilson6-56/+67
Currently, we only mark the CPU cache as dirty if we skip a clflush. This leads to some confusion where we have to ask if the object is in the write domain or missed a clflush. If we always mark the cache as dirty, this becomes a much simply question to answer. The goal remains to do as few clflushes as required and to do them as late as possible, in the hope of deferring the work to a kthread and not block the caller (e.g. execbuf, flips). v2: Always call clflush before GPU execution when the cache_dirty flag is set. This may cause some extra work on llc systems that migrate dirty buffers back and forth - but we do try to limit that by only setting cache_dirty at the end of the gpu sequence. v3: Always mark the cache as dirty upon a level change, as we need to invalidate any stale cachelines due to external writes. Reported-by: Dongwon Kim <[email protected]> Fixes: a6a7cc4b7db6 ("drm/i915: Always flush the dirty CPU cache when pinning the scanout") Signed-off-by: Chris Wilson <[email protected]> Cc: Dongwon Kim <[email protected]> Cc: Matt Roper <[email protected]> Tested-by: Dongwon Kim <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Tvrtko Ursulin <[email protected]>
2017-06-16drm/i915: Make i915_vma_destroy() staticChris Wilson2-2/+1
i915_vma_destroy() is now not used outside of i915_vma.c so we can remove the export and make the function static. Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Tvrtko Ursulin <[email protected]>
2017-06-16drm/i915: Actually attach the tv_format property to the SDVO connectorVille Syrjälä1-1/+2
Attach the tv_format property to the SDVO connector instead of passing a '0' in place of the pointer to the property. This got broken when the SDVO connector properties were converted to atomic. We can thank sparse for catching this: drivers/gpu/drm/i915/intel_sdvo.c:2742:75: warning: Using plain integer as NULL pointer Cc: Maarten Lankhorst <[email protected]> Cc: Daniel Vetter <[email protected]> Fixes: 630d30a4ee27 ("drm/i915: Convert intel_sdvo connector properties to atomic.") Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Rodrigo Vivi <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]>
2017-06-16Merge tag 'gvt-next-2017-06-08' of https://github.com/01org/gvt-linux into ↵Jani Nikula19-440/+604
drm-intel-next-queued gvt-next-2017-06-08 First gvt-next pull for 4.13: - optimization for per-VM mmio save/restore (Changbin) - optimization for mmio hash table (Changbin) - scheduler optimization with event (Ping) - vGPU reset refinement (Fred) - other misc refactor and cleanups, etc. Signed-off-by: Jani Nikula <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-15Revert "drm/i915/skl: New ddb allocation algorithm"Rodrigo Vivi1-159/+98
This reverts commit bb9d85f6e9de8fef5236c076530eab67a2f2431b. New ddb allocation algorithm is a show stopper on my SKL system. Besides not be able to get external DP 4k@60 (through USB type C), It fully hang my screen when unplugging the USB type C. Bugzilla: https://patchwork.freedesktop.org/patch/161571/ Fixes: bb9d85f6e9de ("drm/i915/skl: New ddb allocation algorithm") Cc: Mahesh Kumar <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Matt Roper <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Reviewed-by: Matt Roper <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-15drm/i915/glk: Add cold boot sequence for GLK DSIMadhav Chauhan1-28/+51
As per BSEPC, if device ready bit is '0' in enable IO sequence then its a cold boot/reset scenario eg: S3/S4 resume. If cold boot scenario detected in enable IO, then prepare port immediately. In normal boot scenario, prepare port after glk_dsi_device_ready(). Without cold boot sequence enabled, features like S3/S4 doesn't work. Signed-off-by: Madhav Chauhan <[email protected]> Signed-off-by: Jani Nikula <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-15drm/i915/glk: Split GLK DSI device ready functionalityMadhav Chauhan1-7/+18
This patch divides glk_dsi_device_ready() function into two part. First part will program LP wake and MIPI DSI mode to MIPI_CTRL reg using newly defined function glk_dsi_enable_io(). glk_dsi_enable_io() will be called from intel_dsi_pre_enable. Second part will do remaining device ready activities using the existing function glk_dsi_device_ready(). Signed-off-by: Madhav Chauhan <[email protected]> Signed-off-by: Jani Nikula <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-15drm/i915: Don't enable backlight at setup time.Dhinakaran Pandiyan1-10/+0
Maarten and Ville noticed that we are enabling backlight via DP aux very early in the modeset_init path via the intel_dp_aux_setup_backlight() function, since commit e7156c833903 ("drm/i915: Add Backlight Control using DPCD for eDP connectors (v9)"). Looks like all we need to do during _setup_backlight() is read the current brightness state instead of modifying it. v2: Rewrote commit message. Cc: Ville Syrjala <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Yetunde Adebisi <[email protected]> Signed-off-by: Dhinakaran Pandiyan <[email protected]> Reviewed-by: Maarten Lankhorst <[email protected]> Acked-by: Jani Nikula <[email protected]> Tested-by: Puthikorn Voravootivat <[email protected]> Fixes: e7156c833903 ("drm/i915: Add Backlight Control using DPCD for eDP connectors (v9)") Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Ville Syrjälä <[email protected]>
2017-06-15drm/i915/cnl: make function cnl_ddi_dp_set_dpll_hw_state staticColin Ian King1-2/+3
The function cnl_ddi_dp_set_dpll_hw_state does not need to be in global scope, so make it static. Cleans up sparse warning: "symbol 'cnl_ddi_dp_set_dpll_hw_state' was not declared. Should it be static?" Signed-off-by: Colin Ian King <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Tvrtko Ursulin <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]>
2017-06-15drm/i915: Remove pipe A quirk remnantsVille Syrjälä3-85/+10
With 830 the only thing needing pipe quirks, we can just drop the quirk defines and replace the checks with IS_I830() checks. Signed-off-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Chris Wilson <[email protected]> Acked-by: Maarten Lankhorst <[email protected]>
2017-06-15drm/i915: Drop pipe A quirk for Thinkapd T60Ville Syrjälä1-3/+0
The pipe A force quirk shouldn't needed except on 830. So let's nuke it for the IBM Thinkpad T60 945 machines. This quirk pre-dates KMS so it's usefulness is doubtful at best now. The original bug report [1] describes the symptoms as "system hang on closing T60 panel lid", and we already dropped a similar quirk for another 945 machine in commit 736a69ca8c99 ("drm/i915: Drop PIPE-A quirk for 945GSE HP Mini") so I'm hopeful we can drop this one as well. The quirk was added into xf86-video-intel in commit 08903abe4dc0 ("Add pipe a force enable quirk for Lenovo T60") [1] https://bugs.freedesktop.org/show_bug.cgi?id=16494 Signed-off-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Chris Wilson <[email protected]> Acked-by: Maarten Lankhorst <[email protected]>
2017-06-15drm/i915: Drop pipe A quirk for Toshiba Protege R205-S209Ville Syrjälä1-3/+0
The pipe A force quirk shouldn't needed except on 830. So let's nuke it for the Toshiba Protege R-205/S-209 945 machines. This quirk pre-dates KMS so it's usefulness is doubtful at best now. Unfortunately the original bug report [1] isn't very helpful since it doesn't describe the symptoms. And the commit message in xf86-video-intel commit ecdb5963ef68 ("Add pipe A force enable quirk for Toshiba Portege R205-S209") is not much help either. However, if we assume the problem was the typical "closing the lid hangs the box" type of thing, we already nuked the quirk for another 945 machine in commit 736a69ca8c99 ("drm/i915: Drop PIPE-A quirk for 945GSE HP Mini") and so I hope we can drop this one as well. [1] https://bugs.freedesktop.org/show_bug.cgi?id=14944 Signed-off-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Chris Wilson <[email protected]> Acked-by: Maarten Lankhorst <[email protected]>
2017-06-15drm/i915: Add i830 "pipes power well"Ville Syrjälä3-1/+157
830 more or less requires both pipes and DPLLs to remain on as long as either pipe is needed. However, when neither pipe is actually needed, we can save a bit of power by turning everything off. To do that we add a new "power well" that turns both pipes and DPLLs on and off in the right order. Seems to save ~50mW on my Fujitsu-Siemens Lifebook S6010. This also avoids having to abuse the load detection to force pipe A on at init time. That was never very robust, and it only worked for one pipe, whereas 830 really needs both pipes enabled. As a bonus the 830 pipe quirk is now a bit more isolated from the rest of the mode setting infrastructure, which should mean that it's much less likely someone will accidentally break it in the future. The extra cost is of course slight code duplication, but that seems like a worthwile tradeoff here. v2; s/BIT/BIT_ULL/ Signed-off-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Acked-by: Maarten Lankhorst <[email protected]>
2017-06-15drm/i915: Use a loop for the "three times for luck" DPLL procedureVille Syrjälä1-9/+6
The magic "enable the DPLL three times" sequence feels like it deserves a loop. Signed-off-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Jani Nikula <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Reviewed-by: Maarten Lankhorst <[email protected]>
2017-06-15drm/i915: Plumb the correct acquire ctx into intel_crtc_disable_noatomic()Ville Syrjälä1-4/+5
If intel_crtc_disable_noatomic() were to ever get called during resume we'd end up deadlocking since resume has its own acqcuire_ctx but intel_crtc_disable_noatomic() still tries to use the mode_config.acquire_ctx. Pass down the correct acquire ctx from the top. Cc: [email protected] Cc: Maarten Lankhorst <[email protected]> Fixes: e2c8b8701e2d ("drm/i915: Use atomic helpers for suspend, v2.") Signed-off-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Maarten Lankhorst <[email protected]>
2017-06-15drm/i915: Fix deadlock witha the pipe A quirk during resumeVille Syrjälä1-9/+12
Pass down the correct acquire context to the pipe A quirk load detect hack during display resume. Avoids deadlocking the entire thing. Cc: [email protected] Cc: Maarten Lankhorst <[email protected]> Fixes: e2c8b8701e2d ("drm/i915: Use atomic helpers for suspend, v2.") Signed-off-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Maarten Lankhorst <[email protected]>