aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
2017-10-28drm/msm: dump a rd GPUADDR header for all buffers in the commandJordan Crouse1-15/+15
Currently the rd dump avoids any buffers marked as WRITE under the assumption that the contents are not interesting. While it is true that the contents are uninteresting we should still print the iova and size for all buffers so that any listening replay tools can correctly construct the submission. Print the header for all buffers but only dump the contents for buffers marked as READ. Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm: Removed unused struct_mutex_taskJordan Crouse2-8/+0
Recent changes to locking have rendered struct_mutex_task unused. Unused since 0e08270a1f01. Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm: Implement preemption for A5XX targetsJordan Crouse10-20/+599
Implement preemption for A5XX targets - this allows multiple ringbuffers for different priorities with automatic preemption of a lower priority ringbuffer if a higher one is ready. Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm: Make the value of RB_CNTL (almost) genericJordan Crouse2-5/+12
We use a global ringbuffer size and block size for all targets and at least for 5XX preemption we need to know the value the RB_CNTL in several locations so it makes sense to calculate it once and use it everywhere. The only monkey wrench is that we need to disable the RPTR shadow for A430 targets but that only needs to be done once and doesn't affect A5XX so we can or in the value at init time. Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm: Shadow current pointer in the ring until command is completeJordan Crouse3-6/+16
Add a shadow pointer to track the current command being written into the ring. Don't commit it as 'cur' until the command is submitted. Because 'cur' is used to construct the software copy of the wptr this ensures that somebody peeking in on the ring doesn't assume that a command is inflight while it is being written. This isn't a huge deal with a single ring (though technically the hangcheck could assume the system is prematurely busy when it isn't) but it will be rather important for preemption where the decision to preempt is based on a non-empty ringbuffer. Without a shadow an aggressive preemption scheme could assume that the ringbuffer is non empty and switch to it before the CPU is done writing the command and boom. Even though preemption won't be supported for all targets because of the way the code is organized it is simpler to make this generic for all targets. The extra load for non-preemption targets should be minimal. Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm: Add a parameter query for the number of ringbuffersJordan Crouse1-0/+3
In order to manage ringbuffer priority to its fullest userspace should know how many ringbuffers it has to work with. Add a parameter to return the number of active rings. Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm: Support multiple ringbuffersJordan Crouse18-210/+363
Add the infrastructure to support the idea of multiple ringbuffers. Assign each ringbuffer an id and use that as an index for the various ring specific operations. The biggest delta is to support legacy fences. Each fence gets its own sequence number but the legacy functions expect to use a unique integer. To handle this we return a unique identifier for each submission but map it to a specific ring/sequence under the covers. Newer users use a dma_fence pointer anyway so they don't care about the actual sequence ID or ring. The actual mechanics for multiple ringbuffers are very target specific so this code just allows for the possibility but still only defines one ringbuffer for each target family. Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm: Move memptrs to msm_gpuJordan Crouse7-72/+58
When we move to multiple ringbuffers we're going to store the data in the memptrs on a per-ring basis. In order to prepare for that move the current memptrs from the adreno namespace into msm_gpu. This is way cleaner and immediately lets us kill off some sub functions so there is much less cost later when we do move to per-ring structs. Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm: Add per-instance submit queuesJordan Crouse7-16/+228
Currently the behavior of a command stream is provided by the user application during submission and the application is expected to internally maintain the settings for each 'context' or 'rendering queue' and specify the correct ones. This works okay for simple cases but as applications become more complex we will want to set context specific flags and do various permission checks to allow certain contexts to enable additional privileges. Add kernel-side submit queues to be analogous to 'contexts' or 'rendering queues' on the application side. Each file descriptor instance will maintain its own list of queues. Queues cannot be shared between file descriptors. For backwards compatibility context id '0' is defined as a default context specifying no priority and no special flags. This is intended to be the usual configuration for 99% of applications so that a garden variety application can function correctly without creating a queue. Only those applications requiring the specific benefit of different queues need create one. Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm/mdp5: disable vblanks when crtc is offRob Clark1-0/+6
Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm/mdp4: disable vblanks when crtc is offRob Clark1-0/+7
Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm/hdmi: convert to msm_clk_get()Rob Clark5-11/+9
We already have, as a result of upstreaming the gpu bindings, msm_clk_get() which will try to get the clock both without and with a "_clk" suffix. Use this in HDMI code so we can drop the "_clk" suffix in bindings while maintaing backwards compatibility. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Sean Paul <[email protected]>
2017-10-28drm/msm/edp: convert to msm_clk_get()Rob Clark1-11/+11
We already have, as a result of upstreaming the gpu bindings, msm_clk_get() which will try to get the clock both without and with a "_clk" suffix. Use this in eDP code so we can drop the "_clk" suffix in bindings while maintaing backwards compatibility. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Sean Paul <[email protected]>
2017-10-28drm/msm/dsi: convert to msm_clk_get()Rob Clark3-20/+20
We already have, as a result of upstreaming the gpu bindings, msm_clk_get() which will try to get the clock both without and with a "_clk" suffix. Use this in DSI code so we can drop the "_clk" suffix in bindings while maintaing backwards compatibility. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Sean Paul <[email protected]>
2017-10-28drm/msm/mdp5: always print mdp5 versionRob Clark1-1/+1
This is useful to see in the log, without requiring drm.debug. Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm/adreno: deal with linux-firmware fw pathsRob Clark3-8/+103
When firmware was added to linux-firmware, it was put in a qcom sub- directory, unlike what we'd been using before. For a300_pfp.fw and a300_pm4.fw symlinks were created, but we'd prefer not to have to do this in the future. So add support to look in both places when loading firmware. Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm/adreno: split out helper to load fwRob Clark4-18/+34
Prep work for the next patch. Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm/adreno: load gpu at probe/bind timeRob Clark4-65/+73
Previously, in an effort to defer initializing the gpu until firmware was available (ie. rootfs mounted), the gpu was not loaded at when the subdevice was bound. Which resulted that clks/etc were requested in a place that devm couldn't really help unwind if something failed. Instead move request_firmware() to gpu->hw_init() and construct the gpu earlier in adreno_bind(). To avoid the rest of the driver needing to be aware of a gpu that hasn't managed to load firmware and hw_init() yet, stash the gpu ptr in the adreno device's drvdata, and don't set priv->gpu() until hw_init() succeeds. Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm/hdmi: Remove mmagic_iface_clk from the 8x96 PHY clocksArchit Taneja1-1/+0
This was used as a placeholder. It was never really input to the MDSS/HDMI clocks. Signed-off-by: Archit Taneja <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-10-28drm/msm: fix _NO_IMPLICIT fencing caseRob Clark2-17/+18
We need to call reservation_object_reserve_shared() in both cases, but this wasn't happening in the _NO_IMPLICIT submit case. Fixes: f0a42bb ("drm/msm: submit support for in-fences") Reported-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-10-27drm/amdkfd: use a high priority workqueue for IH workAndres Rodriguez3-2/+4
In systems under heavy load the IH work may experience significant scheduling delays. Under load + system workqueue: Max Latency: 7.023695 ms Avg Latency: 0.263994 ms Under load + high priority workqueue: Max Latency: 1.162568 ms Avg Latency: 0.163213 ms Further work is required to measure the impact of per-cpu settings on IH performance. Signed-off-by: Andres Rodriguez <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: wait only for IH work on IH exitAndres Rodriguez1-2/+2
We don't need to wait for all work to complete in the IH exit function. We only need to make sure the interrupt_work has finished executing to guarantee that ih_kfifo is no longer in use. Signed-off-by: Andres Rodriguez <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: increase IH num entries to 8192Andres Rodriguez1-1/+1
A larger buffer will let us accommodate applications with a large amount of semi-simultaneous event signals. Signed-off-by: Andres Rodriguez <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: use standard kernel kfifo for IHAndres Rodriguez2-57/+27
Replace our implementation of a lockless ring buffer with the standard linux kernel kfifo. We shouldn't maintain our own version of a standard data structure. Signed-off-by: Andres Rodriguez <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: Make event limit dependent on user mode mapping sizeFelix Kuehling2-6/+20
This allows increasing the KFD_SIGNAL_EVENT_LIMIT in kfd_ioctl.h without breaking processes built with older kfd_ioctl.h versions. Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: Use IH context ID for signal lookupFelix Kuehling2-16/+64
This speeds up signal lookup when the IH ring entry includes a valid context ID or partial context ID. Only if the context ID is found to be invalid, fall back to an exhaustive search of all signaled events. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: Simplify event ID and signal slot managementFelix Kuehling3-170/+80
Signal slots are identical to event IDs. Replace the used_slot_bitmap and events hash table with an IDR to allocate and lookup event IDs and signal slots more efficiently. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: Simplify events page allocatorFelix Kuehling3-132/+70
The first event page is always big enough to handle all events. Handling of multiple events pages is not supported by user mode, and not necessary. Signed-off-by: Yong Zhao <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: Use wait_queue_t to implement event waitingFelix Kuehling2-38/+24
Use standard wait queues for waiting and waking up waiting threads instead of inventing our own. We still have our own wait loop because the HSA event semantics require the ability to have one thread waiting on multiple wait queues (events) at the same time. Signed-off-by: Kent Russell <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: remove redundant kfd_event_waiter.input_indexFelix Kuehling1-6/+3
This always identical with the index of the event_waiter in the array. No need to store it in the waiter record. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: Fix event destruction with pending waitersFelix Kuehling1-26/+46
When an event with pending waiters is destroyed, those waiters may end up sleeping forever unless they are notified and woken up. Implement the notification by clearing the waiter->event pointer, which becomes invalid anyway, when the event is freed, and waking up the waiting tasks. Waiters on an event that's destroyed return failure. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: Clean up kfd_wait_on_eventsFelix Kuehling3-52/+32
Cleaned up the code while resolving some potential bugs and inconsistencies in the process. Clean-ups: * Remove enum kfd_event_wait_result, which duplicates KFD_IOC_EVENT_RESULT definitions * alloc_event_waiters can be called without holding p->event_mutex * Return an error code from copy_signaled_event_data instead of bool * Clean up error handling code paths to minimize duplication in kfd_wait_on_events Fixes: * Consistently return an error code from kfd_wait_on_events and set wait_result to KFD_IOC_WAIT_RESULT_FAIL in all failure cases. * Always call free_waiters while holding p->event_mutex * copy_signaled_event_data might sleep. Don't call it while the task state is TASK_INTERRUPTIBLE. Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: Fix scheduler race in kfd_wait_on_events sleep loopSean Keely1-1/+12
Signed-off-by: Sean Keely <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: Short cut for kfd_wait_on_events without waitingSean Keely1-4/+39
If kfd_wait_on_events can return immediately, we don't need to populate the wait list and don't need to enter the sleep-loop. Signed-off-by: Sean Keely <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Acked-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: Don't dereference kfd_process.mmFelix Kuehling3-6/+21
The kfd_process doesn't own a reference to the mm_struct, so it can disappear without warning even while the kfd_process still exists. Therefore, avoid dereferencing the kfd_process.mm pointer and make it opaque. Use get_task_mm to get a temporary reference to the mm when it's needed. v2: removed unnecessary WARN_ON Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amdkfd: Add SDMA trap src id to the KFD isr wanted listBesar Wicaksono2-1/+5
This enables SDMA signalling with event interrupt. Signed-off-by: Besar Wicaksono <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Oded Gabbay <[email protected]> Signed-off-by: Oded Gabbay <[email protected]>
2017-10-27drm/amd/powerplay: change ASIC temperature reading on Vega10Eric Huang1-2/+2
ASIC temperature reading from HOTSPOT to ASIC edge which makes things consistent with previous asics. Signed-off-by: Eric Huang <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-10-27drm/amd/display: check if modeset is required before adding planeShirish S1-0/+3
Adding affected planes without checking if modeset is requested from the user space causes performance regression in video p/b scenarios when full screen p/b is not composited. Hence add a check before adding a plane as affected. bug: https://bugs.freedesktop.org/show_bug.cgi?id=103408 Acked-by: Alex Deucher <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Shirish S <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-10-27drm/amd/display: fix high part address in dm_plane_helper_prepare_fb()Shirish S1-2/+8
The high part calculation of luma and chroma address' was missing in dm_plane_helper_prepare_fb(). This fix brings uniformity in the address' at atomic_check and atomic_commit for both RGB & YUV planes. Signed-off-by: Shirish S <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-10-27drm/amd/display : add high part address calculation for underlayShirish S1-2/+7
Currently the high part of the address structure is not populated in case of luma and chroma. This patch adds this calculation. Signed-off-by: Shirish S <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-10-27drm/amd/display: Fix no display on FijiJerry Zuo1-10/+9
Allocate memory for the second pipe allocate_mem_input() needs to be done prior to program pipe front end. It shows sensitive to Fiji. Failure to do so will cause error in allocate memory  allocate_mem_input() on the second connected display. Signed-off-by: Jerry Zuo <[email protected]> Signed-off-by: Yongqiang Sun <[email protected]> Reviewed-by: Tony Cheng <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-10-27Revert "drm/amd/display: Match actual state during S3 resume."Rex Zhu1-33/+0
This reverts commit 4f346e655d24140fb40b46f814506ba17ac34ea1. fix s3 hang issue. Acked-by: Alex Deucher <[email protected]> Signed-off-by: Rex Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-10-27Merge tag 'drm-intel-fixes-2017-10-26' of ↵Dave Airlie5-65/+15
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes One fix for stable: - fix perf enable/disable ioctls for 32bits (Lionel) Plus GVT fixes: - Fix per_ctx_bb check (Zhenyu) - Fix GPU hang of Linux guest (Xion) - Refine MMIO_RING_F to check for presence of VCS2 ring (Zhi) * tag 'drm-intel-fixes-2017-10-26' of git://anongit.freedesktop.org/drm/drm-intel: drm/i915/gvt: Adding ACTHD mmio read handler drm/i915/gvt: Extract mmio_read_from_hw() common function drm/i915/gvt: Refine MMIO_RING_F() drm/i915/gvt: properly check per_ctx bb valid state
2017-10-26drm/i915/gvt: Adding ACTHD mmio read handlerXiong Zhang1-2/+3
When a workload is too heavy to finish it in gpu hang check timer intervals(1.5), gpu hang check function will check ACTHD register value to decide whether gpu is real dead or not. On real hw, ACTHD is updated by HW when workload is running, then host kernel won't think it is gpu hang. while guest kernel always read a constant ACTHD value as GVT doesn't supply ACTHD emulate handler, then guest kernel detects a fake gpu hang. To remove such guest fake gpu hang, this patch supply ACTHD mmio read handler which read real HW ACTHD register directly. Signed-off-by: Xiong Zhang <[email protected]> Signed-off-by: Zhi Wang <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-10-27drm/i915/gvt: Extract mmio_read_from_hw() common functionXiong Zhang1-16/+5
The mmio read handler for ring timestmap / instdone register are same as reading hw value directly. Extract it as common function to reduce code duplications. Signed-off-by: Xiong Zhang <[email protected]> Signed-off-by: Zhi Wang <[email protected]>
2017-10-27drm/i915/gvt: Refine MMIO_RING_F()Zhi Wang2-45/+2
Inspect if the host has VCS2 ring by host i915 macro in MMIO_RING_F(). Also this helps on reducing some LOCs. Signed-off-by: Zhi Wang <[email protected]>
2017-10-27drm/i915/gvt: properly check per_ctx bb valid stateZhenyu Wang3-2/+5
Need to check valid state for per_ctx bb and bypass batch buffer combine for scan if necessary. Otherwise adding invalid MI batch buffer start cmd for per_ctx bb will cause scan failure, which is taken as -EFAULT now so vGPU would be put in failsafe. This trys to fix that by checking per_ctx bb valid state. Also remove old invalid WARNING that indirect ctx bb shouldn't depend on valid per_ctx bb. Signed-off-by: Zhenyu Wang <[email protected]> Signed-off-by: Zhi Wang <[email protected]>
2017-10-26Merge branch 'drm-next-4.15' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie11-82/+94
into drm-next Just a few fixes for 4.15. * 'drm-next-4.15' of git://people.freedesktop.org/~agd5f/linux: drm/amd/amdgpu: Remove workaround for suspend/resume in uvd7 drm/amdgpu: don't flush the TLB before initializing GART drm/amdgpu: minor cleanup for amdgpu_ttm_bind drm/amdgpu/psp: prevent page fault by checking write_frame address(v4) drm/amd/powerplay: retrieve the real-time coreClock values drm/amd/powerplay: fix performance drop on Vega10 drm/amd/powerplay: add one smc message for Vega10 drm/amd/powerplay: fix amd_powerplay_reset() amdgpu: add padding to the fence to handle ioctl. drm/amdgpu:fix wb_clear drm/amdgpu:fix vf_error_put drm/amdgpu/sriov:now must reinit psp drm/amdgpu: merge bios post checking functions
2017-10-25drm/amd/amdgpu: Remove workaround for suspend/resume in uvd7Tom St Denis1-11/+5
The workaround is not required anymor and would result in hangs during suspend/resume cycles if the uvd block were busy. Signed-off-by: Tom St Denis <[email protected]> Acked-by: Leo Liu <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-10-25drm/amdgpu: don't flush the TLB before initializing GARTChristian König1-6/+7
No point in doing this. Signed-off-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>