aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-07-03drm/i915: Disable MSI for all pre-gen5Ville Syrjälä1-3/+5
We have pretty clear evidence that MSIs are getting lost on g4x and somehow the interrupt logic doesn't seem to recover from that state even if we try hard to clear the IIR. Disabling IER around the normal IIR clearing in the irq handler isn't sufficient to avoid this, so the problem really seems to be further up the interrupt chain. This should guarantee that there's always an edge if any IIR bits are set after the interrupt handler is done, which should normally guarantee that the CPU interrupt is generated. That approach seems to work perfectly on VLV/CHV, but apparently not on g4x. MSI is documented to be broken on 965gm at least. The chipset spec says MSI is defeatured because interrupts can be delayed or lost, which fits well with what we're seeing on g4x. Previously we've already disabled GMBUS interrupts on g4x because somehow GMBUS manages to raise legacy interrupts even when MSI is enabled. Since there's such widespread MSI breakahge all over in the pre-gen5 land let's just give up on MSI on these platforms. Seqno reporting might be negatively affected by this since the legcy interrupts aren't guaranteed to be ordered with the seqno writes, whereas MSI interrupts may be? But an occasioanlly missed seqno seems like a small price to pay for generally working interrupts. Cc: [email protected] Cc: Diego Viola <[email protected]> Tested-by: Diego Viola <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101261 Signed-off-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Daniel Vetter <[email protected]> (cherry picked from commit e38c2da01f76cca82b59ca612529b81df82a7cc7) Signed-off-by: Jani Nikula <[email protected]>
2017-06-30Merge tag 'gvt-fixes-2017-06-29' of https://github.com/01org/gvt-linux into ↵Jani Nikula4-32/+99
drm-intel-next-fixes gvt-fixes-2017-06-29 - two race fixes for VFIO locks from Chuanxiao - virtual display fix for BDW from Xiong Signed-off-by: Jani Nikula <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-29drm/i915/gvt: Make function dpy_reg_mmio_readx safeChangbin Du1-16/+19
The dpy_reg_mmio_read_x functions directly copy 4 bytes data to the target address with considering the length. If may cause the target memory corrupted if the requested length less than 4 bytes. Fix it for safety even we already have some checking to avoid this happen. And for convince, the 3 functions are merged. Signed-off-by: Changbin Du <[email protected]> Signed-off-by: Zhenyu Wang <[email protected]>
2017-06-27drm/i915/gvt: Don't read ADPA_CRT_HOTPLUG_MONITOR from hostXiong Zhang2-1/+5
When host connects a crt screen, linux guest will detect two screens: crt and dp. This is wrong as linux guest has only one dp. In order to avoid guest get host crt screen, we should set ADPA_CRT_HOTPLUG_MONITOR to none. But MMIO_RO(PCH_ADPA) prevent from that. So MMIO_DH should be used instead of MMIO_RO. v2: Clear its staus to none at initialize, so guest don't get host crt.(Zhangyu) v3: SKL doesn't have this register, limit it to pre_skl.(xiong) Signed-off-by: Xiong Zhang <[email protected]> Signed-off-by: Zhenyu Wang <[email protected]>
2017-06-27drm/i915/gvt: Set initial PORT_CLK_SEL vreg for BDWXiong Zhang1-0/+18
On BDW, when host physical screen and guest virtual screen aren't on the same DDI port, guest i915 driver prints the following error and stop running. [ 6.775873] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068 [ 6.775928] IP: intel_ddi_clock_get+0x81/0x430 [i915] [ 6.776206] Call Trace: [ 6.776233] ? vgpu_read32+0x4f/0x100 [i915] [ 6.776264] intel_ddi_get_config+0x11c/0x230 [i915] [ 6.776298] intel_modeset_setup_hw_state+0x313/0xd40 [i915] [ 6.776334] intel_modeset_init+0xe49/0x18d0 [i915] [ 6.776368] ? vgpu_write32+0x53/0x100 [i915] [ 6.776731] ? intel_i2c_reset+0x42/0x50 [i915] [ 6.777085] ? intel_setup_gmbus+0x32a/0x350 [i915] [ 6.777427] i915_driver_load+0xabc/0x14d0 [i915] [ 6.777768] i915_pci_probe+0x4f/0x70 [i915] The null pointer is guest intel_crtc_state->shared_dpll which is setted in haswell_get_ddi_pll(). When guest and host screen are on different DDI port, host driver won't set PORT_CLK_SET(guest_port), so haswell_get_ddi_pll() will return null and don't set pipe_config->shared_dpll, once the following program refernce this structure, it will print the above error. This patch set the initial val of guest PORT_CLK_SEL(guest_port) to LCPLL_810. And guest i915 driver will reset this value according to guest screen mode. Signed-off-by: Xiong Zhang <[email protected]> Signed-off-by: Zhenyu Wang <[email protected]>
2017-06-26drm/i915: Clear execbuf's vma backpointer upon releaseChris Wilson1-0/+1
commit 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array") jiggled around the error handling and replace a test that we cleaned up properly after ourselves with an assertion. That assertion failed because in the release function (moments after the assertion) we were indeed forgetting to mark the vma as cleared. The consequence was when testing an invalid relocation address, we would try to release the vma twice (following the couple of attempts to verify the address) and on the second release notice that the first release was incomplete. Testcase: igt/gem_reloc_overflow/invalid-address Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array") Signed-off-by: Chris Wilson <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Tvrtko Ursulin <[email protected]> (cherry picked from commit 51d05e1b29676a0425749a1533b87e3ad3c6f176) Signed-off-by: Daniel Vetter <[email protected]>
2017-06-26drm/i915: Pass the right flags to i915_vma_move_to_active()Chris Wilson1-1/+1
i915_vma_move_to_active() takes the execobject flags and not a boolean! Instead of passing EXEC_OBJECT_WRITE we passed true [i.e. EXEC_OBJECT_NEEDS_FENCE] causing us to start tracking the vma->last_fence access and since we forgot to clear that on unbinding, we caused a use-after-free. [ 321.263854] BUG: KASAN: use-after-free in i915_gem_request_retire+0x1728/0x1740 [i915] [ 321.264001] Read of size 8 at addr ffff880100fc67d8 by task gem_exec_reloc/2868 [ 321.264181] CPU: 0 PID: 2868 Comm: gem_exec_reloc Not tainted 4.12.0-rc6-CI-Custom_2759+ #1 [ 321.264195] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015 [ 321.264208] Call Trace: [ 321.264234] dump_stack+0x67/0x99 [ 321.264260] print_address_description+0x77/0x290 [ 321.264437] ? i915_gem_request_retire+0x1728/0x1740 [i915] [ 321.264459] kasan_report+0x269/0x350 [ 321.264487] __asan_report_load8_noabort+0x14/0x20 [ 321.264660] i915_gem_request_retire+0x1728/0x1740 [i915] [ 321.264841] ? intel_ring_context_pin+0x131/0x690 [i915] [ 321.265021] i915_gem_request_alloc+0x2c6/0x1220 [i915] [ 321.265044] ? _raw_spin_unlock_irqrestore+0x3d/0x60 [ 321.265226] i915_gem_do_execbuffer+0xac0/0x2a20 [i915] [ 321.265250] ? __lock_acquire+0xceb/0x5450 [ 321.265269] ? entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 321.265291] ? kvmalloc_node+0x6b/0x80 [ 321.265310] ? kvmalloc_node+0x6b/0x80 [ 321.265489] ? eb_relocate_slow+0xbe0/0xbe0 [i915] [ 321.265520] ? ___slab_alloc.constprop.28+0x2ab/0x3d0 [ 321.265549] ? debug_check_no_locks_freed+0x280/0x280 [ 321.265591] ? __might_fault+0xc6/0x1b0 [ 321.265782] i915_gem_execbuffer2+0x14a/0x3f0 [i915] [ 321.265815] drm_ioctl+0x4ba/0xaa0 [ 321.265986] ? i915_gem_execbuffer+0xde0/0xde0 [i915] [ 321.266017] ? drm_getunique+0x270/0x270 [ 321.266068] do_vfs_ioctl+0x17f/0xfa0 [ 321.266091] ? __fget+0x1ba/0x330 [ 321.266112] ? lock_acquire+0x390/0x390 [ 321.266133] ? ioctl_preallocate+0x1d0/0x1d0 [ 321.266164] ? __fget+0x1db/0x330 [ 321.266194] ? __fget_light+0x79/0x1f0 [ 321.266219] SyS_ioctl+0x3c/0x70 [ 321.266247] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 321.266265] RIP: 0033:0x7fcede207357 [ 321.266279] RSP: 002b:00007ffef0effe58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 321.266307] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fcede207357 [ 321.266321] RDX: 00007ffef0effef0 RSI: 0000000040406469 RDI: 0000000000000004 [ 321.266335] RBP: ffffffff812097c6 R08: 0000000000000008 R09: 0000000000000000 [ 321.266349] R10: 0000000000000008 R11: 0000000000000246 R12: ffff880116bcff98 [ 321.266363] R13: ffffffff81cb7cb3 R14: ffff880116bcff70 R15: 0000000000000000 [ 321.266385] ? __this_cpu_preempt_check+0x13/0x20 [ 321.266406] ? trace_hardirqs_off_caller+0x1d6/0x2c0 [ 321.266487] Allocated by task 2868: [ 321.266568] save_stack_trace+0x16/0x20 [ 321.266586] kasan_kmalloc+0xee/0x180 [ 321.266602] kasan_slab_alloc+0x12/0x20 [ 321.266620] kmem_cache_alloc+0xc7/0x2e0 [ 321.266795] i915_vma_instance+0x28c/0x1540 [i915] [ 321.266964] eb_lookup_vmas+0x5a7/0x2250 [i915] [ 321.267130] i915_gem_do_execbuffer+0x69a/0x2a20 [i915] [ 321.267296] i915_gem_execbuffer2+0x14a/0x3f0 [i915] [ 321.267315] drm_ioctl+0x4ba/0xaa0 [ 321.267333] do_vfs_ioctl+0x17f/0xfa0 [ 321.267350] SyS_ioctl+0x3c/0x70 [ 321.267369] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 321.267428] Freed by task 177: [ 321.267502] save_stack_trace+0x16/0x20 [ 321.267521] kasan_slab_free+0xad/0x180 [ 321.267539] kmem_cache_free+0xc5/0x340 [ 321.267710] i915_vma_unbind+0x666/0x10a0 [i915] [ 321.267880] i915_vma_close+0x23a/0x2f0 [i915] [ 321.268048] __i915_gem_free_objects+0x17d/0xc70 [i915] [ 321.268215] __i915_gem_free_work+0x49/0x70 [i915] [ 321.268234] process_one_work+0x66f/0x1410 [ 321.268252] worker_thread+0xe1/0xe90 [ 321.268269] kthread+0x304/0x410 [ 321.268285] ret_from_fork+0x27/0x40 [ 321.268346] The buggy address belongs to the object at ffff880100fc6640 which belongs to the cache i915_vma of size 656 [ 321.268550] The buggy address is located 408 bytes inside of 656-byte region [ffff880100fc6640, ffff880100fc68d0) [ 321.268741] The buggy address belongs to the page: [ 321.268837] page:ffffea000403f000 count:1 mapcount:0 mapping: (null) index:0xffff880100fc5980 compound_mapcount: 0 [ 321.269045] flags: 0x8000000000008100(slab|head) [ 321.269147] raw: 8000000000008100 0000000000000000 ffff880100fc5980 00000001001e001d [ 321.269312] raw: ffffea0004038e20 ffff880116b46240 ffff88011646c640 0000000000000000 [ 321.269484] page dumped because: kasan: bad access detected [ 321.269665] Memory state around the buggy address: [ 321.269778] ffff880100fc6680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 321.269949] ffff880100fc6700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 321.270115] >ffff880100fc6780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 321.270279] ^ [ 321.270410] ffff880100fc6800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 321.270576] ffff880100fc6880: fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc [ 321.270740] ================================================================== [ 321.270903] Disabling lock debugging due to kernel taint Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101511 Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing") Signed-off-by: Chris Wilson <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Tvrtko Ursulin <[email protected]> (cherry picked from commit 25ffaa67459e988e73210543f7e05dfbf3f16163) Signed-off-by: Daniel Vetter <[email protected]>
2017-06-26drm/i915/cnl: Fix RMW on ddi vswing sequence.Rodrigo Vivi2-0/+16
Paulo noticed that we were missing few bits clear before writing values back to the register on these RMW MMIO operations. v2: Remove "POST_" from CURSOR_COEFF_MASK. (Paulo). v3: Remove unnecessary braces. (Jani). Fixes: cf54ca8bc567 ("drm/i915/cnl: Implement voltage swing sequence.") Cc: Paulo Zanoni <[email protected]> Cc: Manasi Navare <[email protected]> Cc: Jani Nikula <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Reviewed-by: Paulo Zanoni <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 1f588aeb60b4412019546ce596f179635abc2ac3) Signed-off-by: Daniel Vetter <[email protected]>
2017-06-26drm/i915/gvt: Fix inconsistent locks holding sequenceChuanxiao Dong1-5/+9
There are two kinds of locking sequence. One is in the thread which is started by vfio ioctl to do the iommu unmapping. The locking sequence is: down_read(&group_lock) ----> mutex_lock(&cached_lock) The other is in the vfio release thread which will unpin all the cached pages. The lock sequence is: mutex_lock(&cached_lock) ---> down_read(&group_lock) And, the cache_lock is used to protect the rb tree of the cache node and doing vfio unpin doesn't require this lock. Move the vfio unpin out of the cache_lock protected region. v2: - use for style instead of do{}while(1). (Zhenyu) Fixes: f30437c5e7bf ("drm/i915/gvt: add KVMGT support") Signed-off-by: Chuanxiao Dong <[email protected]> Cc: Zhenyu Wang <[email protected]> Cc: [email protected] # v4.10+ Signed-off-by: Zhenyu Wang <[email protected]>
2017-06-26drm/i915/gvt: Fix possible recursive locking issueChuanxiao Dong2-10/+48
vfio_unpin_pages will hold a read semaphore however it is already hold in the same thread by vfio ioctl. It will cause below warning: [ 5102.127454] ============================================ [ 5102.133379] WARNING: possible recursive locking detected [ 5102.139304] 4.12.0-rc4+ #3 Not tainted [ 5102.143483] -------------------------------------------- [ 5102.149407] qemu-system-x86/1620 is trying to acquire lock: [ 5102.155624] (&container->group_lock){++++++}, at: [<ffffffff817768c6>] vfio_unpin_pages+0x96/0xf0 [ 5102.165626] but task is already holding lock: [ 5102.172134] (&container->group_lock){++++++}, at: [<ffffffff8177728f>] vfio_fops_unl_ioctl+0x5f/0x280 [ 5102.182522] other info that might help us debug this: [ 5102.189806] Possible unsafe locking scenario: [ 5102.196411] CPU0 [ 5102.199136] ---- [ 5102.201861] lock(&container->group_lock); [ 5102.206527] lock(&container->group_lock); [ 5102.211191] *** DEADLOCK *** [ 5102.217796] May be due to missing lock nesting notation [ 5102.225370] 3 locks held by qemu-system-x86/1620: [ 5102.230618] #0: (&container->group_lock){++++++}, at: [<ffffffff8177728f>] vfio_fops_unl_ioctl+0x5f/0x280 [ 5102.241482] #1: (&(&iommu->notifier)->rwsem){++++..}, at: [<ffffffff810de775>] __blocking_notifier_call_chain+0x35/0x70 [ 5102.253713] #2: (&vgpu->vdev.cache_lock){+.+...}, at: [<ffffffff8157b007>] intel_vgpu_iommu_notifier+0x77/0x120 [ 5102.265163] stack backtrace: [ 5102.270022] CPU: 5 PID: 1620 Comm: qemu-system-x86 Not tainted 4.12.0-rc4+ #3 [ 5102.277991] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.01.APER.061220151418 06/12/2015 [ 5102.289445] Call Trace: [ 5102.292175] dump_stack+0x85/0xc7 [ 5102.295871] validate_chain.isra.21+0x9da/0xaf0 [ 5102.300925] __lock_acquire+0x405/0x820 [ 5102.305202] lock_acquire+0xc7/0x220 [ 5102.309191] ? vfio_unpin_pages+0x96/0xf0 [ 5102.313666] down_read+0x2b/0x50 [ 5102.317259] ? vfio_unpin_pages+0x96/0xf0 [ 5102.321732] vfio_unpin_pages+0x96/0xf0 [ 5102.326024] intel_vgpu_iommu_notifier+0xe5/0x120 [ 5102.331283] notifier_call_chain+0x4a/0x70 [ 5102.335851] __blocking_notifier_call_chain+0x4d/0x70 [ 5102.341490] blocking_notifier_call_chain+0x16/0x20 [ 5102.346935] vfio_iommu_type1_ioctl+0x87b/0x920 [ 5102.351994] vfio_fops_unl_ioctl+0x81/0x280 [ 5102.356660] ? __fget+0xf0/0x210 [ 5102.360261] do_vfs_ioctl+0x93/0x6a0 [ 5102.364247] ? __fget+0x111/0x210 [ 5102.367942] SyS_ioctl+0x41/0x70 [ 5102.371542] entry_SYSCALL_64_fastpath+0x1f/0xbe put the vfio_unpin_pages in a workqueue can fix this. v2: - use for style instead of do{}while(1). (Zhenyu) v3: - rename gvt_cache_mark to gvt_cache_mark_remove. (Zhenyu) Fixes: 659643f7d814 ("drm/i915/gvt/kvmgt: add vfio/mdev support to KVMGT") Signed-off-by: Chuanxiao Dong <[email protected]> Cc: Zhenyu Wang <[email protected]> Cc: [email protected] # v4.10+ Signed-off-by: Zhenyu Wang <[email protected]>
2017-06-21Merge tag 'drm-misc-next-2017-06-19_0' of ↵Dave Airlie8-32/+227
git://anongit.freedesktop.org/git/drm-misc into drm-next UAPI Changes: - vc4: Add get/set tiling format ioctls (Eric) Driver Changes: - vc4: Add tiling T-format support for scanout (Eric) - vc4: Use atomic helpers in commit (Boris) Cc: Boris Brezillon <[email protected]> Cc: Eric Anholt <[email protected]> * tag 'drm-misc-next-2017-06-19_0' of git://anongit.freedesktop.org/git/drm-misc: drm/vc4: Mimic drm_atomic_helper_commit() behavior drm/vc4: Add get/set tiling ioctls. drm/vc4: Add T-format scanout support.
2017-06-21drm/i915: remove rate_to_index, messed up merge.Dave Airlie1-11/+0
This was from a merge I did incorrectly. Signed-off-by: Dave Airlie <[email protected]>
2017-06-21Merge tag 'drm-intel-next-2017-06-19' of ↵Dave Airlie108-3297/+36835
git://anongit.freedesktop.org/git/drm-intel into drm-next Final pile of features for 4.13 New uabi: - batch bo in first slot, for faster execbuf assembly in userspace (Chris Wilson) - (sub)slice getparam, needed for mesa perf support (Robert Bragg) First pile of patches for cnl/cfl support, maintained by Rodrigo but with lots of contributions from others. Still incomplete since public review still ongoing. Features/refactoring: - Make execbuf faster (Chris Wilson), a pile of series to make execbuf buffer handling have fewer passes, use less list walking, postpone more work to async workers and shuffle buffers less, all to make the common case much faster (in some cases at least). - cold boot support for glk dsi (Madhav Chauhan) - Clean up pipe A quirk and related old platform hacks (Ville) - perf sampling support for kbl/glk (Lionel) - perf cleanups (Robert Bragg) - wire atomic state to backlight code, to avoid pipe lookup hacks (Maarten) - reduce request waiting latency/overhead to remove the spinning and associated cpu cycle wasting (Chris) - fix 90/270 rotation wm computation (Ville) - new ddb allocation algo for skl (Kumar Mahesh) - fix regression due to system suspend optimiazatino (Imre) - the usual pile of small cleanups and refactors all over GVT updates contained in this tag: - optimization for per-VM mmio save/restore (Changbin) - optimization for mmio hash table (Changbin) - scheduler optimization with event (Ping) - vGPU reset refinement (Fred) - other misc refactor and cleanups, etc. * tag 'drm-intel-next-2017-06-19' of git://anongit.freedesktop.org/git/drm-intel: (170 commits) drm/i915: Update DRIVER_DATE to 20170619 drm/i915/cfl: Introduce Coffee Lake workarounds. drm/i915: Store 9 bits of PCI Device ID for platforms with a LP PCH drm/i915: Stash a pointer to the obj's resv in the vma drm/i915: Async GPU relocation processing drm/i915: Allow execbuffer to use the first object as the batch drm/i915: Wait upon userptr get-user-pages within execbuffer drm/i915: First try the previous execbuffer location drm/i915: Store a persistent reference for an object in the execbuffer cache drm/i915: Eliminate lots of iterations over the execobjects array drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations drm/i915: Pass vma to relocate entry drm/i915: Store a direct lookup from object handle to vma drm/i915: Fix retrieval of hangcheck stats drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty drm/i915: Mark CPU cache as dirty on every transition for CPU writes drm/i915: Make i915_vma_destroy() static drm/i915: Actually attach the tv_format property to the SDVO connector Revert "drm/i915/skl: New ddb allocation algorithm" drm/i915/glk: Add cold boot sequence for GLK DSI ...
2017-06-21Merge tag 'drm-msm-next-2017-06-20' of ↵Dave Airlie52-678/+2822
git://people.freedesktop.org/~robclark/linux into drm-next This time around, the biggest thing is a bunch of GEM rework for more fine grained locking and prep work to handle multiple address spaces (ie. per-process pagetables). Also some HDMI fixes for 8x96 (snapdragon 820). One unrelated bus patch, for something that seems to get merged through whatever random tree (and has all the right ack's). * tag 'drm-msm-next-2017-06-20' of git://people.freedesktop.org/~robclark/linux: drm/msm: Fix potential buffer overflow issue bus: SIMPLE_PM_BUS does not depend on ARCH_RENESAS drm/msm: Separate locking of buffer resources from struct_mutex drm/msm/hdmi: Fix HDMI pink strip issue seen on 8x96 drm/msm/hdmi: 8996 PLL: Populate unprepare drm/msm/hdmi: Use bitwise operators when building register values drm/msm: update generated headers drm/msm: remove address-space id drm/msm: support for an arbitrary number of address spaces drm/msm: refactor how we handle vram carveout buffers drm/msm: pass address-space to _get_iova() and friends drm/msm/mdp4+5: move aspace/id to base class drm/msm/mdp5: kill pipe_lock drm/msm: fix locking inconsistency for gpu->hw_init() drm/msm: Remove memptrs->wptr drm/msm: Add a struct to pass configuration to msm_gpu_init() drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA drm/msm: Remove idle function hook drm/msm: Remove DRM_MSM_NUM_IOCTLS drm/msm: gpu: Enable zap shader for A5XX
2017-06-20Merge branch 'drm-next-4.13' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie6-77/+210
into drm-next A few more things for 4.13: - Semaphore support using sync objects - Drop fb location programming - Optimize bo list ioctl * 'drm-next-4.13' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: Optimize mutex usage (v4) drm/amdgpu: Optimization of AMDGPU_BO_LIST_OP_CREATE (v2) amdgpu: use drm sync objects for shared semaphores (v6) amdgpu/cs: split out fence dependency checking (v2) drm/amdgpu: don't check the default value for vm size
2017-06-20Merge branch 'for-upstream/mali-dp' of git://linux-arm.org/linux-ld into ↵Dave Airlie3-16/+26
drm-next Here are the Mali DP driver changes. They include the mali-dp specific changes from Jose Abreu on crtc->mode_valid() as well as a couple of patches for fixing the sharing of IRQ lines and use of DRM CMA helper for framebuffer physical address calculation. Please pull! * 'for-upstream/mali-dp' of git://linux-arm.org/linux-ld: drm/arm: mali-dp: Use CMA helper for plane buffer address calculation drm/mali-dp: Check PM status when sharing interrupt lines drm/arm: malidp: Use crtc->mode_valid() callback
2017-06-20Merge branch 'linux-4.13' of git://github.com/skeggsb/linux into drm-nextDave Airlie91-4018/+4531
- HDMI stereoscopic support - Rework of display code to properly support SOR pad macro routing on >=GM20x GPUs - Various other fixes/improvements. * 'linux-4.13' of git://github.com/skeggsb/linux: (73 commits) drm/nouveau/disp/nv50-: avoid creating ORs that aren't present on HW drm/nouveau: use proper prototype in nouveau_pmops_runtime() definition drm/nouveau: Skip vga_fini on non-PCI device drm/nouveau/tegra: Don't leave GPU in reset drm/nouveau/tegra: Skip manual unpowergating when not necessary drm/nouveau/hwmon: Change permissions to numeric drm/nouveau/hwmon: expose the auto_point and pwm_min/max attrs drm/nouveau/hwmon: Remove old code, add .write/.read operations drm/nouveau/hwmon: Add nouveau_hwmon_ops structure with .is_visible/.read_string drm/nouveau/hwmon: Add config for all sensors and their settings drm/nouveau/disp/gm200-: allow non-identity mapping of SOR <-> macro links drm/nouveau/disp/nv50-: implement a common supervisor 3.0 drm/nouveau/disp/nv50-: implement a common supervisor 2.2 drm/nouveau/disp/nv50-: implement a common supervisor 2.1 drm/nouveau/disp/nv50-: implement a common supervisor 2.0 drm/nouveau/disp/nv50-: implement a common supervisor 1.0 drm/nouveau/disp/nv50-gt21x: remove workaround for dp->tmds hotplug issues drm/nouveau/disp/dp: use new devinit script interpreter entry-point drm/nouveau/disp/dp: determine link bandwidth requirements from head state drm/nouveau/disp: introduce acquire/release display path methods ...
2017-06-20Merge tag 'drm/tegra/for-4.13-rc1' of ↵Dave Airlie22-201/+784
git://anongit.freedesktop.org/tegra/linux into drm-next drm/tegra: Changes for v4.13-rc1 This starts off with the addition of more documentation for the host1x and DRM drivers and finishes with a slew of fixes and enhancements for the staging IOCTLs as a result of the awesome work done by Dmitry and Erik on the grate reverse-engineering effort. * tag 'drm/tegra/for-4.13-rc1' of git://anongit.freedesktop.org/tegra/linux: gpu: host1x: At first try a non-blocking allocation for the gather copy gpu: host1x: Refactor channel allocation code gpu: host1x: Remove unused host1x_cdma_stop() definition gpu: host1x: Remove unused 'struct host1x_cmdbuf' gpu: host1x: Check waits in the firewall gpu: host1x: Correct swapped arguments in the is_addr_reg() definition gpu: host1x: Forbid unrelated SETCLASS opcode in the firewall gpu: host1x: Forbid RESTART opcode in the firewall gpu: host1x: Forbid relocation address shifting in the firewall gpu: host1x: Do not leak BO's phys address to userspace gpu: host1x: Correct host1x_job_pin() error handling gpu: host1x: Initialize firewall class to the job's one drm/tegra: dc: Disable plane if it is invisible drm/tegra: dc: Apply clipping to the plane drm/tegra: dc: Avoid reset asserts on Tegra20 drm/tegra: Check syncpoint ID in the 'submit' IOCTL drm/tegra: Correct copying of waitchecks and disable them in the 'submit' IOCTL drm/tegra: Check for malformed offsets and sizes in the 'submit' IOCTL drm/tegra: Add driver documentation gpu: host1x: Flesh out kerneldoc
2017-06-19drm/msm: Fix potential buffer overflow issueKasin Li1-3/+6
In function submit_create, if nr_cmds or nr_bos is assigned with negative value, the allocated buffer may be small than intended. Using this buffer will lead to buffer overflow issue. Signed-off-by: Kasin Li <[email protected]> Signed-off-by: Jordan Crouse <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-06-19drm/amdgpu: Optimize mutex usage (v4)Alex Xie2-12/+30
In original function amdgpu_bo_list_get, the waiting for result->lock can be quite long while mutex bo_list_lock was holding. It can make other tasks waiting for bo_list_lock for long period. Secondly, this patch allows several tasks(readers of idr) to proceed at the same time. v2: use rcu and kref (Dave Airlie and Christian König) v3: update v1 commit message (Michel Dänzer) v4: rebase on upstream (Alex Deucher) Signed-off-by: Alex Xie <[email protected]> Reviewed-by: Christian König <[email protected]>
2017-06-19drm/amdgpu: Optimization of AMDGPU_BO_LIST_OP_CREATE (v2)Alex Xie1-23/+30
v2: Remove duplication of zeroing of bo list (Christian König) Move idr_alloc function to end of ioctl (Christian König) Call kfree bo_list when amdgpu_bo_list_set return error. Combine the previous two patches into this patch. Add amdgpu_bo_list_set function prototype. Signed-off-by: Alex Xie <[email protected]> Reviewed-by: Christian König <[email protected]>
2017-06-19drm/i915: Update DRIVER_DATE to 20170619Daniel Vetter1-2/+2
Signed-off-by: Daniel Vetter <[email protected]>
2017-06-17bus: SIMPLE_PM_BUS does not depend on ARCH_RENESASRob Clark1-1/+0
In fact, it is needed for PCI to work on msm8996 (and probably other things). No idea why it was depending on renesas but that doesn't make any sense. So drop the dependency. Signed-off-by: Rob Clark <[email protected]> Acked-by: Bjorn Andersson <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Arnd Bergmann <[email protected]>
2017-06-17drm/msm: Separate locking of buffer resources from struct_mutexSushmita Susheelendra17-144/+238
Buffer object specific resources like pages, domains, sg list need not be protected with struct_mutex. They can be protected with a buffer object level lock. This simplifies locking and makes it easier to avoid potential recursive locking scenarios for SVM involving mmap_sem and struct_mutex. This also removes unnecessary serialization when creating buffer objects, and also between buffer object creation and GPU command submission. Signed-off-by: Sushmita Susheelendra <[email protected]> [robclark: squash in handling new locking for shrinker] Signed-off-by: Rob Clark <[email protected]>
2017-06-17drm/nouveau/disp/nv50-: avoid creating ORs that aren't present on HWBen Skeggs14-10/+39
Signed-off-by: Ben Skeggs <[email protected]>
2017-06-16drm/i915/cfl: Introduce Coffee Lake workarounds.Rodrigo Vivi2-23/+60
Coffee Lake inherit most of Kabylake production workarounds. v2: Fix typo on commit message and remove WaDisableKillLogic and GEN9_DISABLE_OCL_OOB_SUPPRESS_LOGIC, since as Mika pointed out they shouldn't be here for cfl according to BSpec. Cc: Dhinakaran Pandiyan <[email protected]> Signed-off-by: Rodrigo Vivi <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-16amdgpu: use drm sync objects for shared semaphores (v6)Dave Airlie4-2/+97
This creates a new command submission chunk for amdgpu to add in and out sync objects around the submission. Sync objects are managed via the drm syncobj ioctls. The command submission interface is enhanced with two new chunks, one for syncobj pre submission dependencies, and one for post submission sync obj signalling, and just takes a list of handles for each. This is based on work originally done by David Zhou at AMD, with input from Christian Konig on what things should look like. In theory VkFences could be backed with sync objects and just get passed into the cs as syncobj handles as well. NOTE: this interface addition needs a version bump to expose it to userspace. TODO: update to dep_sync when rebasing onto amdgpu master. (with this - r-b from Christian) v1.1: keep file reference on import. v2: move to using syncobjs v2.1: change some APIs to just use p pointer. v3: make more robust against CS failures, we now add the wait sems but only remove them once the CS job has been submitted. v4: rewrite names of API and base on new syncobj code. v5: move post deps earlier, rename some apis v6: lookup post deps earlier, and just replace fences in post deps stage (Christian) Reviewed-by: Christian König <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-06-16amdgpu/cs: split out fence dependency checking (v2)Dave Airlie1-42/+51
This just splits out the fence depenency checking into it's own function to make it easier to add semaphore dependencies. v2: rebase onto other changes. v1-Reviewed-by: Christian König <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-06-16drm/amdgpu: don't check the default value for vm sizeAlex Deucher1-0/+4
Avoids printing spurious messages like this: [ 3.102059] amdgpu 0000:01:00.0: VM size (-1) must be a power of 2 Reviewed-by: Christian König <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2017-06-16drm/i915: Store 9 bits of PCI Device ID for platforms with a LP PCHDhinakaran Pandiyan1-2/+11
Although we use 9 bits of Device ID for identifying PCH, only 8 bits are stored in dev_priv->pch_id. This makes HAS_PCH_CNP_LP() and HAS_PCH_SPT_LP() incorrect. Fix this by storing all the 9 bits for the platforms with LP PCH. v2: Drop PCH_LPT_LP change (Imre) Cc: Rodrigo Vivi <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Imre Deak <[email protected]> Fixes: commit ec7e0bb35f8d ("drm/i915/cnp: Add PCI ID for Cannonpoint LP PCH") Reported-by: Imre Deak <[email protected]> Reviewed-by: Imre Deak <[email protected]> Signed-off-by: Dhinakaran Pandiyan <[email protected]> Signed-off-by: Imre Deak <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-16drm/i915: Stash a pointer to the obj's resv in the vmaChris Wilson3-14/+15
During execbuf, a mandatory step is that we add this request (this fence) to each object's reservation_object. Inside execbuf, we track the vma, and to add the fence to the reservation_object then means having to first chase the obj, incurring another cache miss. We can reduce the number of cache misses by stashing a pointer to the reservation_object in the vma itself. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-16drm/i915: Async GPU relocation processingChris Wilson2-8/+220
If the user requires patching of their batch or auxiliary buffers, we currently make the alterations on the cpu. If they are active on the GPU at the time, we wait under the struct_mutex for them to finish executing before we rewrite the contents. This happens if shared relocation trees are used between different contexts with separate address space (and the buffers then have different addresses in each), the 3D state will need to be adjusted between execution on each context. However, we don't need to use the CPU to do the relocation patching, as we could queue commands to the GPU to perform it and use fences to serialise the operation with the current activity and future - so the operation on the GPU appears just as atomic as performing it immediately. Performing the relocation rewrites on the GPU is not free, in terms of pure throughput, the number of relocations/s is about halved - but more importantly so is the time under the struct_mutex. v2: Break out the request/batch allocation for clearer error flow. v3: A few asserts to ensure rq ordering is maintained Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Allow execbuffer to use the first object as the batchChris Wilson3-3/+22
Currently, the last object in the execlist is the always the batch. However, when building the batch buffer we often know the batch object first and if we can use the first slot in the execlist we can emit relocation instructions relative to it immediately and avoid a separate pass to adjust the relocations to point to the last execlist slot. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Wait upon userptr get-user-pages within execbufferChris Wilson5-5/+31
This simply hides the EAGAIN caused by userptr when userspace causes resource contention. However, it is quite beneficial with highly contended userptr users as we avoid repeating the setup costs and kernel-user context switches. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Michał Winiarski <[email protected]>
2017-06-16drm/i915: First try the previous execbuffer locationChris Wilson3-4/+15
When choosing a slot for an execbuffer, we ideally want to use the same address as last time (so that we don't have to rebind it) and the same address as expected by the user (so that we don't have to fixup any relocations pointing to it). If we first try to bind the incoming execbuffer->offset from the user, or the currently bound offset that should hopefully achieve the goal of avoiding the rebind cost and the relocation penalty. However, if the object is not currently bound there we don't want to arbitrarily unbind an object in our chosen position and so choose to rebind/relocate the incoming object instead. After we report the new position back to the user, on the next pass the relocations should have settled down. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Store a persistent reference for an object in the execbuffer cacheChris Wilson3-10/+28
If we take a reference to the object/vma when it is first used in an execbuf, we can keep that reference until the object's file-local handle is closed. Thereby saving a frequent ref/unref pair. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Eliminate lots of iterations over the execobjects arrayChris Wilson7-916/+1239
The major scaling bottleneck in execbuffer is the processing of the execobjects. Creating an auxiliary list is inefficient when compared to using the execobject array we already have allocated. Reservation is then split into phases. As we lookup up the VMA, we try and bind it back into active location. Only if that fails, do we add it to the unbound list for phase 2. In phase 2, we try and add all those objects that could not fit into their previous location, with fallback to retrying all objects and evicting the VM in case of severe fragmentation. (This is the same as before, except that phase 1 is now done inline with looking up the VMA to avoid an iteration over the execobject array. In the ideal case, we eliminate the separate reservation phase). During the reservation phase, we only evict from the VM between passes (rather than currently as we try to fit every new VMA). In testing with Unreal Engine's Atlantis demo which stresses the eviction logic on gen7 class hardware, this speed up the framerate by a factor of 2. The second loop amalgamation is between move_to_gpu and move_to_active. As we always submit the request, even if incomplete, we can use the current request to track active VMA as we perform the flushes and synchronisation required. The next big advancement is to avoid copying back to the user any execobjects and relocations that are not changed. v2: Add a Theory of Operation spiel. v3: Fall back to slow relocations in preparation for flushing userptrs. v4: Document struct members, factor out eb_validate_vma(), add a few more comments to explain some magic and hide other magic behind macros. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocationsChris Wilson1-0/+10
If we write a relocation into the buffer, we require our own implicit synchronisation added after the start of the execbuf, outside of the user's control. As we may end up clflushing, or doing the patch itself on the GPU, asynchronously we need to look at the implicit serialisation on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this object. If the user does trigger a stall for relocations, we make sure the stall is complete enough so that the batch is not submitted before we complete those relocations. Fixes: 77ae9957897d ("drm/i915: Enable userspace to opt-out of implicit fencing") Signed-off-by: Chris Wilson <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Jason Ekstrand <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Pass vma to relocate entryChris Wilson1-60/+42
We can simplify our tracking of pending writes in an execbuf to the single bit in the vma->exec_entry->flags, but that requires the relocation function knowing the object's vma. Pass it along. Note we have only been using a single bit to track flushing since commit cc889e0f6ce6a63c62db17d702ecfed86d58083f Author: Daniel Vetter <[email protected]> Date: Wed Jun 13 20:45:19 2012 +0200 drm/i915: disable flushing_list/gpu_write_list unconditionally flushed all render caches before the breadcrumb and commit 6ac42f4148bc27e5ffd18a9ab0eac57f58822af4 Author: Daniel Vetter <[email protected]> Date: Sat Jul 21 12:25:01 2012 +0200 drm/i915: Replace the complex flushing logic with simple invalidate/flush all did away with the explicit GPU domain tracking. This was then codified into the ABI with NO_RELOC in commit ed5982e6ce5f106abcbf071f80730db344a6da42 Author: Daniel Vetter <[email protected]> # Oi! Patch stealer! Date: Thu Jan 17 22:23:36 2013 +0100 drm/i915: Allow userspace to hint that the relocations were known Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Store a direct lookup from object handle to vmaChris Wilson11-112/+322
The advent of full-ppgtt lead to an extra indirection between the object and its binding. That extra indirection has a noticeable impact on how fast we can convert from the user handles to our internal vma for execbuffer. In order to bypass the extra indirection, we use a resizable hashtable to jump from the object to the per-ctx vma. rhashtable was considered but we don't need the online resizing feature and the extra complexity proved to undermine its usefulness. Instead, we simply reallocate the hastable on demand in a background task and serialize it before iterating. In non-full-ppgtt modes, multiple files and multiple contexts can share the same vma. This leads to having multiple possible handle->vma links, so we only use the first to establish the fast path. The majority of buffers are not shared and so we should still be able to realise speedups with multiple clients. v2: Prettier names, more magic. v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2 Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Joonas Lahtinen <[email protected]>
2017-06-16drm/i915: Fix retrieval of hangcheck statsChris Wilson1-3/+0
The default context is always supported (as it contains the global hangcheck stats) and the contexts for hangcheck are not limited to any ring. This was dropped in 2013 because it was supposed to have been included with Ben's full-ppgtt patch set. It never landed and the bug remains. References: https://bugs.freedesktop.org/show_bug.cgi?id=65845 Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Mika Kuoppala <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2017-06-16drm/msm/hdmi: Fix HDMI pink strip issue seen on 8x96Archit Taneja1-3/+67
A 2 pixel wide pink strip was observed on the left end of some HDMI monitors configured in a HDMI mode. It turned out that we were missing out on configuring AVI infoframes, and unlike APQ8064, the 8x96 HDMI H/W seems to be sensitive to that. Add configuration of AVI infoframes. While at it, make sure that hdmi_audio_update is only called when we've detected that the monitor supports HDMI. Signed-off-by: Archit Taneja <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-06-16drm/msm/hdmi: 8996 PLL: Populate unprepareArchit Taneja1-0/+5
Without doing anything in unprepare, the HDMI driver isn't able to switch modes successfully. Calling set_rate with a new rate results in an un-locked PLL. If we reset the PLL in unprepare, the PLL is able to lock with the new rate. Signed-off-by: Archit Taneja <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2017-06-16drm/msm/hdmi: Use bitwise operators when building register valuesLiviu Dudau1-3/+3
Commit c0c0d9eeeb8d ("drm/msm: hdmi audio support") uses logical OR operators to build up a value to be written in the REG_HDMI_AUDIO_INFO0 and REG_HDMI_AUDIO_INFO1 registers when it should have used bitwise operators. Signed-off-by: Liviu Dudau <[email protected]> Fixes: c0c0d9eeeb8d ("drm/msm: hdmi audio support") Signed-off-by: Rob Clark <[email protected]>
2017-06-16drm/msm: update generated headersRob Clark15-337/+2059
Signed-off-by: Rob Clark <[email protected]>
2017-06-16drm/msm: remove address-space idRob Clark8-46/+0
Now that the msm_gem supports an arbitrary number of vma's, we no longer need to assign an id (index) to each address space. So rip out the associated code. Signed-off-by: Rob Clark <[email protected]>
2017-06-16drm/msm: support for an arbitrary number of address spacesRob Clark2-43/+99
It means we have to do a list traversal where we once had an index into a table. But the list will normally have one or two entries. Signed-off-by: Rob Clark <[email protected]>
2017-06-16drm/msm: refactor how we handle vram carveout buffersRob Clark1-21/+27
Pull some of the logic out into msm_gem_new() (since we don't need to care about the imported-bo case), and don't defer allocating pages. The latter is generally a good idea, since if we are using VRAM carveout to allocate contiguous buffers (ie. no IOMMU), the allocation is more likely to fail. So failing at allocation time is a more sane option. Plus this simplifies things in the next patch. Signed-off-by: Rob Clark <[email protected]>
2017-06-16drm/msm: pass address-space to _get_iova() and friendsRob Clark17-58/+78
No functional change, that will come later. But this will make it easier to deal with dynamically created address spaces (ie. per- process pagetables for gpu). Signed-off-by: Rob Clark <[email protected]>
2017-06-16drm/msm/mdp4+5: move aspace/id to base classRob Clark11-40/+56
Before we can shift to passing the address-space object to _get_iova(), we need to fix a few places (dsi+fbdev) that were hard-coding the adress space id. That gets somewhat easier if we just move these to the kms base class. Prep work for next patch. Signed-off-by: Rob Clark <[email protected]>