aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
2020-09-07drm/i915/gem: Reduce context termination list iteration guard to RCUChris Wilson1-13/+19
As we now protect the timeline list using RCU, we can drop the timeline->mutex for guarding the list iteration during context close, as we are searching for an inflight request. Any new request will see the context is banned and not be submitted. In doing so, pull the checks for a concurrent submission of the request (notably the i915_request_completed()) under the engine spinlock, to fully serialise with __i915_request_submit()). That is in the case of preempt-to-busy where the request may be completed during the __i915_request_submit(), we need to be careful that we sample the request status after serialising so that we don't miss the request the engine is actually submitting. Fixes: 4a3174152147 ("drm/i915/gem: Refine occupancy test in kill_context()") References: d22d2d073ef8 ("drm/i915: Protect i915_request_await_start from early waits") # rcu protection of timeline->requests References: https://gitlab.freedesktop.org/drm/intel/-/issues/1622 References: https://gitlab.freedesktop.org/drm/intel/-/issues/2158 Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/selftests: Prevent selecting 0 for our random width/alignChris Wilson1-3/+3
When igt_random_offset() is a given a range of [0, PAGE_SIZE], it is allowed to return 0. However, attempting to use a size of 0 for the igt_lmem_write_cpu() byte poking, leads to call igt_random_offset() with a range of [offset, offset + 0] and ask it to find a length of 4 within it. This triggers the bug on that the requested length should fit within the range! Signed-off-by: Chris Wilson <[email protected]> Cc: Matthew Auld <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Hold context/request reference while breadcrumbs are activeChris Wilson2-38/+74
Currently we hold no actual reference to the request nor context while they are attached to a breadcrumb. To avoid freeing the request/context too early, we serialise with cancel-breadcrumbs by taking the irq spinlock in i915_request_retire(). The alternative is to take a reference for a new breadcrumb and release it upon signaling; removing the more frequently hit contention point in i915_request_retire(). Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> [Joonas: Rebased and reordered into drm-intel-gt-next branch] Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Move intel_breadcrumbs_arm_irq earlierChris Wilson1-42/+42
Move the __intel_breadcrumbs_arm_irq earlier, next to the disarm_irq, so that we can make use of it in the following patch. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Shrink i915_page_directory's slab bucketChris Wilson4-28/+50
kmalloc uses power-of-two slab buckets for small allocations (up to a few pages). Since i915_page_directory is a page of pointers, plus a couple more, this is rounded up to 8K, and we waste nearly 50% of that allocation. Long terms this leads to poor memory utilisation, bloating the kernel footprint, but the problem is exacerbated by our conservative preallocation scheme for binding VMA. As we are required to allocate all levels for each vma just in case we need to insert them upon binding, this leads to a large multiplication factor for a single page vma. By halving the allocation we need for the page directory structure, we halve the impact of that factor, bringing workloads that once fitted into memory, hopefully back to fitting into memory. We maintain the split between i915_page_directory and i915_page_table as we only need half the allocation for the lowest, most populous, level. Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Cc: Matthew Auld <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Switch to object allocations for page directoriesChris Wilson18-420/+289
The GEM object is grossly overweight for the practicality of tracking large numbers of individual pages, yet it is currently our only abstraction for tracking DMA allocations. Since those allocations need to be reserved upfront before an operation, and that we need to break away from simple system memory, we need to ditch using plain struct page wrappers. In the process, we drop the WC mapping as we ended up clflushing everything anyway due to various issues across a wider range of platforms. Though in a future step, we need to drop the kmap_atomic approach which suggests we need to pre-map all the pages and keep them mapped. v2: Verify our large scratch page is suitably DMA aligned; and manually clear the scratch since we are allocating plain struct pages full of prior content. Signed-off-by: Chris Wilson <[email protected]> Cc: Matthew Auld <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Preallocate stashes for vma page-directoriesChris Wilson9-190/+237
We need to make the DMA allocations used for page directories to be performed up front so that we can include those allocations in our memory reservation pass. The downside is that we have to assume the worst case, even before we know the final layout, and always allocate enough page directories for this object, even when there will be overlap. This unfortunately can be quite expensive, especially as we have to clear/reset the page directories and DMA pages, but it should only be required during early phases of a workload when new objects are being discovered, or after memory/eviction pressure when we need to rebind. Once we reach steady state, the objects should not be moved and we no longer need to preallocating the pages tables. It should be noted that the lifetime for the page directories DMA is more or less decoupled from individual fences as they will be shared across objects across timelines. v2: Only allocate enough PD space for the PTE we may use, we do not need to allocate PD that will be left as scratch. v3: Store the shift unto the first PD level to encapsulate the different PTE counts for gen6/gen8. Signed-off-by: Chris Wilson <[email protected]> Cc: Matthew Auld <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbsChris Wilson16-95/+162
On the virtual engines, we only use the intel_breadcrumbs for tracking signaling of stale breadcrumbs from the irq_workers. They do not have any associated interrupt handling, active requests are passed to a physical engine and associated breadcrumb interrupt handler. This causes issues for us as we need to ensure that we do not actually try and enable interrupts and the powermanagement required for them on the virtual engine, as they will never be disabled. Instead, let's specify the physical engine used for interrupt handler on a particular breadcrumb. v2: Drop b->irq_armed = true mocking for no interrupt HW Fixes: 4fe6abb8f513 ("drm/i915/gt: Ignore irq enabling on the virtual engines") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Only transfer the virtual context to the new engine if activeChris Wilson1-25/+40
One more complication of preempt-to-busy with respect to the virtual engine is that we may have retired the last request along the virtual engine at the same time as preparing to submit the completed request to a new engine. That submit will be shortcircuited, but not before we have updated the context with the new register offsets and marked the virtual engine as bound to the new engine (by calling swap on ve->siblings[]). As we may have just retired the completed request, we may also be in the middle of calling virtual_context_exit() to turn off the power management associated with the virtual engine, and that in turn walks the ve->siblings[]. If we happen to call swap() on the array as we walk, we will call intel_engine_pm_put() twice on the same engine. In this patch, we prevent this by only updating the bound engine after a successful submission which weeds out the already completed requests. Alternatively, we could walk a non-volatile array for the pm, such as using the engine->mask. The small advantage to performing the update after the submit is that we then only have to do a swap for active requests. Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine" Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: "Nayana, Venkata Ramana" <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbsChris Wilson4-52/+21
After staring at the breadcrumb enabling/cancellation and coming to the conclusion that the cause of the mysterious stale breadcrumbs must the act of submitting a completed requests, we can then redirect those completed requests onto a dedicated signaled_list at the time of construction and so eliminate intel_engine_transfer_stale_breadcrumbs(). Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Remove requirement for holding i915_request.lock for breadcrumbsChris Wilson3-87/+123
Since the breadcrumb enabling/cancelling itself is serialised by the breadcrumbs.irq_lock, with a bit of care we can remove the outer serialisation with i915_request.lock for concurrent dma_fence_enable_signaling(). This has the important side-effect of eliminating the nested i915_request.lock within request submission. The challenge in serialisation is around the unsubmission where we take an active request that wants a breadcrumb on the signaling engine and put it to sleep. We do not want a concurrent dma_fence_enable_signaling() to attach a breadcrumb as we unsubmit, so we must mark the request as no longer active before serialising with the concurrent enable-signaling. On retire, we serialise with the concurrent enable-signaling, but instead of clearing ACTIVE, we mark it as SIGNALED. Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> [Joonas: Rebased and reordered into drm-intel-gt-next branch] Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Provide a fastpath for waiting on vma bindingsChris Wilson2-2/+22
Before we can execute a request, we must wait for all of its vma to be bound. This is a frequent operation for which we can optimise away a few atomic operations (notably a cmpxchg) in lieu of the RCU protection. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Reduce locking around i915_active_acquire_preallocate_barrier()Chris Wilson1-5/+5
As the conversion between idle-barrier and full i915_active_fence is already serialised by explicit memory barriers, we can reduce the spinlock in i915_active_acquire_preallocate_barrier() for finding an idle-barrier to reuse to an RCU read lock to ensure the fence remains valid, only taking the spinlock for the update of the rbtree itself. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Make the stale cached active node available for any timelineChris Wilson1-2/+28
Rather than require the next timeline after idling to match the MRU before idling, reset the index on the node and allow it to match the first request. However, this requires cmpxchg(u64) and so is not trivial on 32b, so for compatibility we just fallback to keeping the cached node pointing to the MRU timeline. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Keep the most recently used active-fence upon discardChris Wilson2-11/+20
Whenever an i915_active idles, we prune its tree of old fence slots to prevent a gradual leak should it be used to track many, many timelines. The downside is that we then have to frequently reallocate the rbtree. A compromise is that we keep the most recently used fence slot, and reuse that for the next active reference as that is the most likely timeline to be reused. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Export a preallocate variant of i915_active_acquire()Chris Wilson4-38/+130
Sometimes we have to be very careful not to allocate underneath a mutex (or spinlock) and yet still want to track activity. Enter i915_active_acquire_for_context(). This raises the activity counter on i915_active prior to use and ensures that the fence-tree contains a slot for the context. v2: Refactor active_lookup() so it can be called again before/after locking to resolve contention. Since we protect the rbtree until we idle, we can do a lockfree lookup, with the caveat that if another thread performs a concurrent insertion, the rotations from the insert may cause us to not find our target. A second pass holding the treelock will find the target if it exists, or the place to perform our insertion. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Skip taking acquire mutex for no ref->active callbackChris Wilson1-8/+16
If no active callback is defined for i915_active, we do not need to serialise its enabling with the mutex. We still do only want to call the debug activate once, and must still serialise with a concurrent retire. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/selftests: Drop stale timeline constructor assertChris Wilson1-1/+0
Since we pass around encoded parameters to the kernel context constructor using the ce->timeline pointer, we can no longer assert that it should be zero for mock timeline construction. Fixes: d1bf5dd8f6d5 ("drm/i915/gt: Support multiple pinned timelines") Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> [Joonas: Updated Fixes: link after rebasing and reordering into drm-intel-gt-next branch] Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Pull release of node->age under the spinlockChris Wilson1-1/+1
We need to ensure that the list is valid prior to marking the node as retrievable, otherwise we may see two threads compete over the same node in intel_gt_get_buffer_pool(). If the first thread acquires and releases the node in the same jiffie, the second thread may then acquire it (as the jiffie now again matches the expected value) and claim the node before it is put back into the list. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Support multiple pinned timelinesChris Wilson8-23/+61
We may need to allocate more than one pinned context/timeline for each engine which can utilise the per-engine HWSP, so we need to give each a different offset within it. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gem: Delay tracking the GEM context until it is registeredChris Wilson1-5/+11
Avoid exposing a partially constructed context by deferring the list_add() from the initial construction to the end of registration. Otherwise, if we peek into the list of contexts from inside debugfs, we may see the partially constructed context and chase down some dangling incomplete pointers. Reported-by: CQ Tang <[email protected]> Fixes: 3aa9945a528e ("drm/i915: Separate GEM context construction and registration to userspace") References: f6e8aa387171 ("drm/i915: Report the number of closed vma held by each context in debugfs") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: CQ Tang <[email protected]> Cc: <[email protected]> # v5.2+ Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Fix termination condition for freeing all buffer objectsChris Wilson1-5/+8
A last minute change, that unfortunately broke CI so badly it declared SUCCESS, was to refactor the debug free all buffer pool code to reuse the normal worker, inverted the termination condition so that it instead of discarding the nodes, they were all declared young enough and eligible for reuse. Fixes: 06b73c2d0b65 ("drm/i915/gt: Delay taking the spinlock for grabbing from the buffer pool") Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> [Joonas: Updating Fixes: link after rebasing and reordering into drm-intel-gt-next] Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/selftests: Flush the active barriers before assertingChris Wilson1-0/+2
Before we peek at the barrier status for an assert, first serialise with its callbacks so that we see a stable value. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Delay taking the spinlock for grabbing from the buffer poolChris Wilson2-42/+64
Some very low hanging fruit, but contention on the pool->lock is noticeable between intel_gt_get_buffer_pool() and pool_retire(), with the majority of the hold time due to the locked list iteration. If we make the node itself RCU protected, we can perform the search for an suitable node just under RCU, reserving taking the lock itself for claiming the node and manipulating the list. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gt: Disable preparser around xcs invalidations on tglChris Wilson1-2/+13
Unlike rcs where we have conclusive evidence from our selftesting that disabling the preparser before performing the TLB invalidate and relocations does impact upon the GPU execution, the evidence for the same requirement on xcs is much more circumstantial. Let's apply the preparser disable between batches as we invalidate the TLB as a dose of healthy paranoia, just in case. References: https://gitlab.freedesktop.org/drm/intel/-/issues/2169 Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/gem: Remove disordered per-file request list for throttlingChris Wilson8-85/+50
I915_GEM_THROTTLE dates back to the time before contexts where there was just a single engine, and therefore a single timeline and request list globally. That request list was in execution/retirement order, and so walking it to find a particular aged request made sense and could be split per file. That is no more. We now have many timelines with a file, as many as the user wants to construct (essentially per-engine, per-context). Each of those run independently and so make the single list futile. Remove the disordered list, and iterate over all the timelines to find a request to wait on in each to satisfy the criteria that the CPU is no more than 20ms ahead of its oldest request. It should go without saying that the I915_GEM_THROTTLE ioctl is no longer used as the primary means of throttling, so it makes sense to push the complication into the ioctl where it only impacts upon its few irregular users, rather than the execbuf/retire where everybody has to pay the cost. Fortunately, the few users do not create vast amount of contexts, so the loops over contexts/engines should be concise. Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Soften the tasklet flush frequency before waitsChris Wilson1-2/+18
We include a tasklet flush before waiting on a request as a precaution against the HW being lax in event signaling. We now have a precautionary flush in the engine's heartbeat and so do not need to be quite so zealous on every request wait. If we focus on the request, the only tasklet flush that matters is if there is a delay in submitting this request to HW, so if the request is not ready to be executed, no advantage in reducing this wait can be gained by running the tasklet. And there is little point in doing busy work for no result. Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915/selftests: Mock the status_page.vma for the kernel_contextChris Wilson1-0/+3
Since we assert that the kernel_context is using the perma-pinned HWSP, make it so. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2179 Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-07drm/i915: Reduce i915_request.lock contention for i915_request_waitChris Wilson1-10/+8
Currently, we use i915_request_completed() directly in i915_request_wait() and follow up with a manual invocation of dma_fence_signal(). This appears to cause a large number of contentions on i915_request.lock as when the process is woken up after the fence is signaled by an interrupt, we will then try and call dma_fence_signal() ourselves while the signaler is still holding the lock. dma_fence_is_signaled() has the benefit of checking the DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so avoids most of that contention. Signed-off-by: Chris Wilson <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]>
2020-09-06Merge tag 'for-linus-5.9-rc4-tag' of ↵Linus Torvalds1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "A small series for fixing a problem with Xen PVH guests when running as backends (e.g. as dom0). Mapping other guests' memory is now working via ZONE_DEVICE, thus not requiring to abuse the memory hotplug functionality for that purpose" * tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: add helpers to allocate unpopulated memory memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC xen/balloon: add header guard
2020-09-05drm: xlnx: dpsub: Fix DMADEVICES Kconfig dependencyLaurent Pinchart1-0/+1
The dpsub driver uses the DMA engine API, and thus selects DMA_ENGINE to provide that API. DMA_ENGINE depends on DMADEVICES, which can be deselected by the user, creating a possibly unmet indirect dependency: WARNING: unmet direct dependencies detected for DMA_ENGINE Depends on [n]: DMADEVICES [=n] Selected by [m]: - DRM_ZYNQMP_DPSUB [=m] && HAS_IOMEM [=y] && (ARCH_ZYNQMP || COMPILE_TEST [=y]) && COMMON_CLK [=y] && DRM [=m] && OF [=y] Add a dependency on DMADEVICES to fix this. Reported-by: Randy Dunlap <[email protected]> Signed-off-by: Laurent Pinchart <[email protected]> Acked-by: Randy Dunlap <[email protected]>
2020-09-04drm/msm: Disable the RPTR shadowJordan Crouse6-27/+43
Disable the RPTR shadow across all targets. It will be selectively re-enabled later for targets that need it. Cc: [email protected] Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2020-09-04drm/msm: Disable preemption on all 5xx targetsJordan Crouse1-1/+2
Temporarily disable preemption on a5xx targets pending some improvements to protect the RPTR shadow from being corrupted. Cc: [email protected] Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2020-09-04drm/msm: Enable expanded apriv support for a650Jordan Crouse4-4/+19
a650 supports expanded apriv support that allows us to map critical buffers (ringbuffer and memstore) as as privileged to protect them from corruption. Cc: [email protected] Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2020-09-04drm/msm: Split the a5xx preemption recordJordan Crouse2-5/+21
The main a5xx preemption record can be marked as privileged to protect it from user access but the counters storage needs to be remain unprivileged. Split the buffers and mark the critical memory as privileged. Cc: [email protected] Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2020-09-04Merge tag 'drm-fixes-2020-09-04' of git://anongit.freedesktop.org/drm/drmLinus Torvalds13-25/+81
Pull drm fixes from Dave Airlie: "Not much going on this week, nouveau has a display hw bug workaround, amdgpu has some PM fixes and CIK regression fixes, one single radeon PLL fix, and a couple of i915 display fixes. amdgpu: - Fix for 32bit systems - SW CTF fix - Update for Sienna Cichlid - CIK bug fixes radeon: - PLL fix i915: - Clang build warning fix - HDCP fixes nouveau: - display fixes" * tag 'drm-fixes-2020-09-04' of git://anongit.freedesktop.org/drm/drm: drm/nouveau/kms/nv50-gp1xx: add WAR for EVO push buffer HW bug drm/nouveau/kms/nv50-gp1xx: disable notifies again after core update drm/nouveau/kms/nv50-: add some whitespace before debug message drm/nouveau/kms/gv100-: Include correct push header in crcc37d.c drm/radeon: Prefer lower feedback dividers drm/amdgpu: Fix bug in reporting voltage for CIK drm/amdgpu: Specify get_argument function for ci_smu_funcs drm/amd/pm: enable MP0 DPM for sienna_cichlid drm/amd/pm: avoid false alarm due to confusing softwareshutdowntemp setting drm/amd/pm: fix is_dpm_running() run error on 32bit system drm/i915: Clear the repeater bit on HDCP disable drm/i915: Fix sha_text population code drm/i915/display: Ensure that ret is always initialized in icl_combo_phy_verify_state
2020-09-04Merge branch 'simplify-do_wp_page'Linus Torvalds1-8/+0
Merge emailed patches from Peter Xu: "This is a small series that I picked up from Linus's suggestion to simplify cow handling (and also make it more strict) by checking against page refcounts rather than mapcounts. This makes uffd-wp work again (verified by running upmapsort)" Note: this is horrendously bad timing, and making this kind of fundamental vm change after -rc3 is not at all how things should work. The saving grace is that it really is a a nice simplification: 8 files changed, 29 insertions(+), 120 deletions(-) The reason for the bad timing is that it turns out that commit 17839856fd58 ("gup: document and work around 'COW can break either way' issue" broke not just UFFD functionality (as Peter noticed), but Mikulas Patocka also reports that it caused issues for strace when running in a DAX environment with ext4 on a persistent memory setup. And we can't just revert that commit without re-introducing the original issue that is a potential security hole, so making COW stricter (and in the process much simpler) is a step to then undoing the forced COW that broke other uses. Link: https://lore.kernel.org/lkml/alpine.LRH.2.02.2009031328040.6929@file01.intranet.prod.int.rdu2.redhat.com/ * emailed patches from Peter Xu <[email protected]>: mm: Add PGREUSE counter mm/gup: Remove enfornced COW mechanism mm/ksm: Remove reuse_ksm_page() mm: do_wp_page() simplification
2020-09-04mm/gup: Remove enfornced COW mechanismPeter Xu1-8/+0
With the more strict (but greatly simplified) page reuse logic in do_wp_page(), we can safely go back to the world where cow is not enforced with writes. This essentially reverts commit 17839856fd58 ("gup: document and work around 'COW can break either way' issue"). There are some context differences due to some changes later on around it: 2170ecfa7688 ("drm/i915: convert get_user_pages() --> pin_user_pages()", 2020-06-03) 376a34efa4ee ("mm/gup: refactor and de-duplicate gup_fast() code", 2020-06-03) Some lines moved back and forth with those, but this revert patch should have striped out and covered all the enforced cow bits anyways. Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Peter Xu <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-09-04drm/virtio: drop virtio_gpu_output->enabledGerd Hoffmann3-6/+1
Not needed, already tracked by drm_crtc_state->active. Signed-off-by: Gerd Hoffmann <[email protected]> Acked-by: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 1174c8a0f33c1e5c442ac40381fe124248c08b3a)
2020-09-04drm/sun4i: backend: Disable alpha on the lowest plane on the A20Maxime Ripard1-1/+0
Unlike we previously thought, the per-pixel alpha is just as broken on the A20 as it is on the A10. Remove the quirk that says we can use it. Fixes: dcf496a6a608 ("drm/sun4i: sun4i: Introduce a quirk for lowest plane alpha support") Signed-off-by: Maxime Ripard <[email protected]> Reviewed-by: Chen-Yu Tsai <[email protected]> Cc: Paul Kocialkowski <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-09-04drm/sun4i: backend: Support alpha property on lowest planeMaxime Ripard1-2/+1
Unlike what we previously thought, only the per-pixel alpha is broken on the lowest plane and the per-plane alpha isn't. Remove the check on the alpha property being set on the lowest plane to reject a mode. Fixes: dcf496a6a608 ("drm/sun4i: sun4i: Introduce a quirk for lowest plane alpha support") Signed-off-by: Maxime Ripard <[email protected]> Reviewed-by: Chen-Yu Tsai <[email protected]> Cc: Paul Kocialkowski <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-09-04drm/sun4i: Fix DE2 YVU handlingJernej Skrabec1-1/+1
Function sun8i_vi_layer_get_csc_mode() is supposed to return CSC mode but due to inproper return type (bool instead of u32) it returns just 0 or 1. Colors are wrong for YVU formats because of that. Fixes: daab3d0e8e2b ("drm/sun4i: de2: csc_mode in de2 format struct is mostly redundant") Reported-by: Roman Stratiienko <[email protected]> Signed-off-by: Jernej Skrabec <[email protected]> Tested-by: Roman Stratiienko <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-09-04xen: add helpers to allocate unpopulated memoryRoger Pau Monne1-4/+5
To be used in order to create foreign mappings. This is based on the ZONE_DEVICE facility which is used by persistent memory devices in order to create struct pages and kernel virtual mappings for the IOMEM areas of such devices. Note that on kernels without support for ZONE_DEVICE Xen will fallback to use ballooned pages in order to create foreign mappings. The newly added helpers use the same parameters as the existing {alloc/free}_xenballooned_pages functions, which allows for in-place replacement of the callers. Once a memory region has been added to be used as scratch mapping space it will no longer be released, and pages returned are kept in a linked list. This allows to have a buffer of pages and prevents resorting to frequent additions and removals of regions. If enabled (because ZONE_DEVICE is supported) the usage of the new functionality untangles Xen balloon and RAM hotplug from the usage of unpopulated physical memory ranges to map foreign pages, which is the correct thing to do in order to avoid mappings of foreign pages depend on memory hotplug. Note the driver is currently not enabled on Arm platforms because it would interfere with the identity mapping required on some platforms. Signed-off-by: Roger Pau Monné <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2020-09-04Merge branch 'linux-5.9' of git://github.com/skeggsb/linux into drm-fixesDave Airlie4-3/+12
A couple of minor fixes to the display changes that went in for 5.9. The most important of which is a workaround for a HW bug that was exposed by better push buffer space management, leading to random(ish...) display engine hangs. Signed-off-by: Dave Airlie <[email protected]> From: Ben Skeggs <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/ <CACAvsv5QDxyMihrxbPk+-sORnaYtjR6_dbM68gEhb2wxht_G1w@mail.gmail.com
2020-09-04Merge tag 'drm-intel-fixes-2020-09-03' of ↵Dave Airlie2-8/+28
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.9-rc4: - Clang build warning fix - HDCP fixes Signed-off-by: Dave Airlie <[email protected]> From: Jani Nikula <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-09-03drm/tve200: Stabilize enable/disableLinus Walleij1-1/+21
The TVE200 will occasionally print a bunch of lost interrupts and similar dmesg messages, sometimes during boot and sometimes after disabling and coming back to enablement. This is probably because the hardware is left in an unknown state by the boot loader that displays a logo. This can be fixed by bringing the controller into a known state by resetting the controller while enabling it. We retry reset 5 times like the vendor driver does. We also put the controller into reset before de-clocking it and clear all interrupts before enabling the vblank IRQ. This makes the video enable/disable/enable cycle rock solid on the D-Link DIR-685. Tested extensively. Signed-off-by: Linus Walleij <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: [email protected] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-09-03drm/amdgpu/mmhub2.0: print client id string for mmhubAlex Deucher1-6/+82
Print the name of the client rather than the number. This makes it easier to debug what block is causing the fault. Reviewed-by: Huang Rui <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-09-03drm/amdgpu/gmc9: print client id string for mmhubAlex Deucher1-9/+230
Print the name of the client rather than the number. This makes it easier to debug what block is causing the fault. Reviewed-by: Huang Rui <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-09-03drm/amdgpu/gmc10: print client id string for gfxhubAlex Deucher2-6/+54
Print the name of the client rather than the number. This makes it easier to debug what block is causing the fault. Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-09-03drm/amdgpu/gmc9: print client id string for gfxhubAlex Deucher1-4/+26
Print the name of the client rather than the number. This makes it easier to debug what block is causing the fault. Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>