Merge tag 'drm-for-v4.14' of git://people.freedesktop.org/~airlied/linux

Pull drm updates from Dave Airlie: "This is the main drm pull request for 4.14 merge window. I'm sending this early, as my continuing journey into fatherhood is occurring really soon now, I'm going to be mostly useless for the next couple of weeks, though I may be able to read email, I doubt I'll be doing much patch applications or git sending. If anything urgent pops up I've asked Daniel/Jani/Alex/Sean to try and direct stuff towards you. Outside drm changes: Some rcar-du updates that touch the V4L tree, all acks should be in place. It adds one export to the radix tree code for new i915 use case. There are some minor AGP cleanups (don't see that too often). Changes to the vbox driver in staging to avoid breaking compilation. Summary: core: - Atomic helper fixes - Atomic UAPI fixes - Add YCBCR 4:2:0 support - Drop set_busid hook - Refactor fb_helper locking - Remove a bunch of internal APIs - Add a bunch of better default handlers - Format modifier/blob plane property added - More internal header refactoring - Make more internal API names consistent - Enhanced syncobj APIs (wait/signal/reset/create signalled) bridge: - Add Synopsys Designware MIPI DSI host bridge driver tiny: - Add Pervasive Displays RePaper displays - Add support for LEGO MINDSTORMS EV3 LCD i915: - Lots of GEN10/CNL support patches - drm syncobj support - Skylake+ watermark refactoring - GVT vGPU 48-bit ppgtt support - GVT performance improvements - NOA change ioctl - CCS (color compression) scanout support - GPU reset improvements amdgpu: - Initial hugepage support - BO migration logic rework - Vega10 improvements - Powerplay fixes - Stop reprogramming the MC - Fixes for ACP audio on stoney - SR-IOV fixes/improvements - Command submission overhead improvements amdkfd: - Non-dGPU upstreaming patches - Scratch VA ioctl - Image tiling modes - Update PM4 headers for new firmware - Drop all BUG_ONs. nouveau: - GP108 modesetting support. - Disable MSI on big endian. vmwgfx: - Add fence fd support. msm: - Runtime PM improvements exynos: - NV12MT support - Refactor KMS drivers imx-drm: - Lock scanout channel to improve memory bw - Cleanups etnaviv: - GEM object population fixes tegra: - Prep work for Tegra186 support - PRIME mmap support sunxi: - HDMI support improvements - HDMI CEC support omapdrm: - HDMI hotplug IRQ support - Big driver cleanup - OMAP5 DSI support rcar-du: - vblank fixes - VSP1 updates arcgpu: - Minor fixes stm: - Add STM32 DSI controller driver dw_hdmi: - Add support for Rockchip RK3399 - HDMI CEC support atmel-hlcdc: - Add 8-bit color support vc4: - Atomic fixes - New ioctl to attach a label to a buffer object - HDMI CEC support - Allow userspace to dictate rendering order on submit ioctl" * tag 'drm-for-v4.14' of git://people.freedesktop.org/~airlied/linux: (1074 commits) drm/syncobj: Add a signal ioctl (v3) drm/syncobj: Add a reset ioctl (v3) drm/syncobj: Add a syncobj_array_find helper drm/syncobj: Allow wait for submit and signal behavior (v5) drm/syncobj: Add a CREATE_SIGNALED flag drm/syncobj: Add a callback mechanism for replace_fence (v3) drm/syncobj: add sync obj wait interface. (v8) i915: Use drm_syncobj_fence_get drm/syncobj: Add a race-free drm_syncobj_fence_get helper (v2) drm/syncobj: Rename fence_get to find_fence drm: kirin: Add mode_valid logic to avoid mode clocks we can't generate drm/vmwgfx: Bump the version for fence FD support drm/vmwgfx: Add export fence to file descriptor support drm/vmwgfx: Add support for imported Fence File Descriptor drm/vmwgfx: Prepare to support fence fd drm/vmwgfx: Fix incorrect command header offset at restart drm/vmwgfx: Support the NOP_ERROR command drm/vmwgfx: Restart command buffers after errors drm/vmwgfx: Move irq bottom half processing to threads drm/vmwgfx: Don't use drm_irq_[un]install ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2017-09-03 17:02:26 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2017-09-03 17:02:26 -0700
commit: 906dde0f355bd97c080c215811ae7db1137c4af8 (patch)
tree: 1e1b4ba58a59c4027f06d86e7430566ee0dcbb15 /drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
parent: 69c0067aa3f40d3e52ab78643aecb17d669d3389 (diff)
parent: 7846b12fe0b5feab5446d892f41b5140c1419109 (diff)
1 files changed, 294 insertions, 210 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 5795f81369f0..6b1343e5541d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -77,8 +77,6 @@ struct amdgpu_pte_update_params {
 	void (*func)(struct amdgpu_pte_update_params *params, uint64_t pe,
 		     uint64_t addr, unsigned count, uint32_t incr,
 		     uint64_t flags);
-	/* indicate update pt or its shadow */
-	bool shadow;
 	/* The next two are used during VM update by CPU
 	 *  DMA addresses to use for mapping
 	 *  Kernel pointer of PD/PT BO that needs to be updated
@@ -161,11 +159,26 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
  */
 static int amdgpu_vm_validate_level(struct amdgpu_vm_pt *parent,
 				    int (*validate)(void *, struct amdgpu_bo *),
-				    void *param)
+				    void *param, bool use_cpu_for_update,
+				    struct ttm_bo_global *glob)
 {
 	unsigned i;
 	int r;
 
+	if (parent->bo->shadow) {
+		struct amdgpu_bo *shadow = parent->bo->shadow;
+
+		r = amdgpu_ttm_bind(&shadow->tbo, &shadow->tbo.mem);
+		if (r)
+			return r;
+	}
+
+	if (use_cpu_for_update) {
+		r = amdgpu_bo_kmap(parent->bo, NULL);
+		if (r)
+			return r;
+	}
+
 	if (!parent->entries)
 		return 0;
 
@@ -179,11 +192,18 @@ static int amdgpu_vm_validate_level(struct amdgpu_vm_pt *parent,
 		if (r)
 			return r;
 
+		spin_lock(&glob->lru_lock);
+		ttm_bo_move_to_lru_tail(&entry->bo->tbo);
+		if (entry->bo->shadow)
+			ttm_bo_move_to_lru_tail(&entry->bo->shadow->tbo);
+		spin_unlock(&glob->lru_lock);
+
 		/*
 		 * Recurse into the sub directory. This is harmless because we
 		 * have only a maximum of 5 layers.
 		 */
-		r = amdgpu_vm_validate_level(entry, validate, param);
+		r = amdgpu_vm_validate_level(entry, validate, param,
+					     use_cpu_for_update, glob);
 		if (r)
 			return r;
 	}
@@ -214,54 +234,12 @@ int amdgpu_vm_validate_pt_bos(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (num_evictions == vm->last_eviction_counter)
 		return 0;
 
-	return amdgpu_vm_validate_level(&vm->root, validate, param);
-}
-
-/**
- * amdgpu_vm_move_level_in_lru - move one level of PT BOs to the LRU tail
- *
- * @adev: amdgpu device instance
- * @vm: vm providing the BOs
- *
- * Move the PT BOs to the tail of the LRU.
- */
-static void amdgpu_vm_move_level_in_lru(struct amdgpu_vm_pt *parent)
-{
-	unsigned i;
-
-	if (!parent->entries)
-		return;
-
-	for (i = 0; i <= parent->last_entry_used; ++i) {
-		struct amdgpu_vm_pt *entry = &parent->entries[i];
-
-		if (!entry->bo)
-			continue;
-
-		ttm_bo_move_to_lru_tail(&entry->bo->tbo);
-		amdgpu_vm_move_level_in_lru(entry);
-	}
+	return amdgpu_vm_validate_level(&vm->root, validate, param,
+					vm->use_cpu_for_update,
+					adev->mman.bdev.glob);
 }
 
 /**
- * amdgpu_vm_move_pt_bos_in_lru - move the PT BOs to the LRU tail
- *
- * @adev: amdgpu device instance
- * @vm: vm providing the BOs
- *
- * Move the PT BOs to the tail of the LRU.
- */
-void amdgpu_vm_move_pt_bos_in_lru(struct amdgpu_device *adev,
-				  struct amdgpu_vm *vm)
-{
-	struct ttm_bo_global *glob = adev->mman.bdev.glob;
-
-	spin_lock(&glob->lru_lock);
-	amdgpu_vm_move_level_in_lru(&vm->root);
-	spin_unlock(&glob->lru_lock);
-}
-
- /**
  * amdgpu_vm_alloc_levels - allocate the PD/PT levels
  *
  * @adev: amdgpu_device pointer
@@ -282,6 +260,7 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
 	unsigned pt_idx, from, to;
 	int r;
 	u64 flags;
+	uint64_t init_value = 0;
 
 	if (!parent->entries) {
 		unsigned num_entries = amdgpu_vm_num_entries(adev, level);
@@ -315,6 +294,12 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
 		flags |= (AMDGPU_GEM_CREATE_NO_CPU_ACCESS |
 				AMDGPU_GEM_CREATE_SHADOW);
 
+	if (vm->pte_support_ats) {
+		init_value = AMDGPU_PTE_SYSTEM;
+		if (level != adev->vm_manager.num_level - 1)
+			init_value |= AMDGPU_PDE_PTE;
+	}
+
 	/* walk over the address space and allocate the page tables */
 	for (pt_idx = from; pt_idx <= to; ++pt_idx) {
 		struct reservation_object *resv = vm->root.bo->tbo.resv;
@@ -327,10 +312,18 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
 					     AMDGPU_GPU_PAGE_SIZE, true,
 					     AMDGPU_GEM_DOMAIN_VRAM,
 					     flags,
-					     NULL, resv, &pt);
+					     NULL, resv, init_value, &pt);
 			if (r)
 				return r;
 
+			if (vm->use_cpu_for_update) {
+				r = amdgpu_bo_kmap(pt, NULL);
+				if (r) {
+					amdgpu_bo_unref(&pt);
+					return r;
+				}
+			}
+
 			/* Keep a reference to the root directory to avoid
 			* freeing them up in the wrong order.
 			*/
@@ -424,7 +417,7 @@ static int amdgpu_vm_grab_reserved_vmid_locked(struct amdgpu_vm *vm,
 	struct dma_fence *updates = sync->last_vm_update;
 	int r = 0;
 	struct dma_fence *flushed, *tmp;
-	bool needs_flush = false;
+	bool needs_flush = vm->use_cpu_for_update;
 
 	flushed  = id->flushed_updates;
 	if ((amdgpu_vm_had_gpu_reset(adev, id)) ||
@@ -545,11 +538,11 @@ int amdgpu_vm_grab_id(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
 	}
 	kfree(fences);
 
-	job->vm_needs_flush = false;
+	job->vm_needs_flush = vm->use_cpu_for_update;
 	/* Check if we can use a VMID already assigned to this VM */
 	list_for_each_entry_reverse(id, &id_mgr->ids_lru, list) {
 		struct dma_fence *flushed;
-		bool needs_flush = false;
+		bool needs_flush = vm->use_cpu_for_update;
 
 		/* Check all the prerequisites to using this VMID */
 		if (amdgpu_vm_had_gpu_reset(adev, id))
@@ -745,7 +738,7 @@ static bool amdgpu_vm_is_large_bar(struct amdgpu_device *adev)
  *
  * Emit a VM flush when it is necessary.
  */
-int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
+int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job, bool need_pipe_sync)
 {
 	struct amdgpu_device *adev = ring->adev;
 	unsigned vmhub = ring->funcs->vmhub;
@@ -767,12 +760,15 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
 		vm_flush_needed = true;
 	}
 
-	if (!vm_flush_needed && !gds_switch_needed)
+	if (!vm_flush_needed && !gds_switch_needed && !need_pipe_sync)
 		return 0;
 
 	if (ring->funcs->init_cond_exec)
 		patch_offset = amdgpu_ring_init_cond_exec(ring);
 
+	if (need_pipe_sync)
+		amdgpu_ring_emit_pipeline_sync(ring);
+
 	if (ring->funcs->emit_vm_flush && vm_flush_needed) {
 		struct dma_fence *fence;
 
@@ -874,8 +870,8 @@ struct amdgpu_bo_va *amdgpu_vm_bo_find(struct amdgpu_vm *vm,
 {
 	struct amdgpu_bo_va *bo_va;
 
-	list_for_each_entry(bo_va, &bo->va, bo_list) {
-		if (bo_va->vm == vm) {
+	list_for_each_entry(bo_va, &bo->va, base.bo_list) {
+		if (bo_va->base.vm == vm) {
 			return bo_va;
 		}
 	}
@@ -981,6 +977,8 @@ static void amdgpu_vm_cpu_set_ptes(struct amdgpu_pte_update_params *params,
 	unsigned int i;
 	uint64_t value;
 
+	trace_amdgpu_vm_set_ptes(pe, addr, count, incr, flags);
+
 	for (i = 0; i < count; i++) {
 		value = params->pages_addr ?
 			amdgpu_vm_map_gart(params->pages_addr, addr) :
@@ -989,19 +987,16 @@ static void amdgpu_vm_cpu_set_ptes(struct amdgpu_pte_update_params *params,
 					i, value, flags);
 		addr += incr;
 	}
-
-	/* Flush HDP */
-	mb();
-	amdgpu_gart_flush_gpu_tlb(params->adev, 0);
 }
 
-static int amdgpu_vm_bo_wait(struct amdgpu_device *adev, struct amdgpu_bo *bo)
+static int amdgpu_vm_wait_pd(struct amdgpu_device *adev, struct amdgpu_vm *vm,
+			     void *owner)
 {
 	struct amdgpu_sync sync;
 	int r;
 
 	amdgpu_sync_create(&sync);
-	amdgpu_sync_resv(adev, &sync, bo->tbo.resv, AMDGPU_FENCE_OWNER_VM);
+	amdgpu_sync_resv(adev, &sync, vm->root.bo->tbo.resv, owner);
 	r = amdgpu_sync_wait(&sync, true);
 	amdgpu_sync_free(&sync);
 
@@ -1042,23 +1037,14 @@ static int amdgpu_vm_update_level(struct amdgpu_device *adev,
 	params.adev = adev;
 	shadow = parent->bo->shadow;
 
-	WARN_ON(vm->use_cpu_for_update && shadow);
-	if (vm->use_cpu_for_update && !shadow) {
-		r = amdgpu_bo_kmap(parent->bo, (void **)&pd_addr);
-		if (r)
-			return r;
-		r = amdgpu_vm_bo_wait(adev, parent->bo);
-		if (unlikely(r)) {
-			amdgpu_bo_kunmap(parent->bo);
+	if (vm->use_cpu_for_update) {
+		pd_addr = (unsigned long)amdgpu_bo_kptr(parent->bo);
+		r = amdgpu_vm_wait_pd(adev, vm, AMDGPU_FENCE_OWNER_VM);
+		if (unlikely(r))
 			return r;
-		}
+
 		params.func = amdgpu_vm_cpu_set_ptes;
 	} else {
-		if (shadow) {
-			r = amdgpu_ttm_bind(&shadow->tbo, &shadow->tbo.mem);
-			if (r)
-				return r;
-		}
 		ring = container_of(vm->entity.sched, struct amdgpu_ring,
 				    sched);
 
@@ -1094,21 +1080,14 @@ static int amdgpu_vm_update_level(struct amdgpu_device *adev,
 		if (bo == NULL)
 			continue;
 
-		if (bo->shadow) {
-			struct amdgpu_bo *pt_shadow = bo->shadow;
-
-			r = amdgpu_ttm_bind(&pt_shadow->tbo,
-					    &pt_shadow->tbo.mem);
-			if (r)
-				return r;
-		}
-
 		pt = amdgpu_bo_gpu_offset(bo);
 		pt = amdgpu_gart_get_vm_pde(adev, pt);
-		if (parent->entries[pt_idx].addr == pt)
+		/* Don't update huge pages here */
+		if ((parent->entries[pt_idx].addr & AMDGPU_PDE_PTE) ||
+		    parent->entries[pt_idx].addr == (pt | AMDGPU_PTE_VALID))
 			continue;
 
-		parent->entries[pt_idx].addr = pt;
+		parent->entries[pt_idx].addr = pt | AMDGPU_PTE_VALID;
 
 		pde = pd_addr + pt_idx * 8;
 		if (((last_pde + 8 * count) != pde) ||
@@ -1146,28 +1125,29 @@ static int amdgpu_vm_update_level(struct amdgpu_device *adev,
 			    count, incr, AMDGPU_PTE_VALID);
 	}
 
-	if (params.func == amdgpu_vm_cpu_set_ptes)
-		amdgpu_bo_kunmap(parent->bo);
-	else if (params.ib->length_dw == 0) {
-		amdgpu_job_free(job);
-	} else {
-		amdgpu_ring_pad_ib(ring, params.ib);
-		amdgpu_sync_resv(adev, &job->sync, parent->bo->tbo.resv,
-				 AMDGPU_FENCE_OWNER_VM);
-		if (shadow)
-			amdgpu_sync_resv(adev, &job->sync, shadow->tbo.resv,
+	if (!vm->use_cpu_for_update) {
+		if (params.ib->length_dw == 0) {
+			amdgpu_job_free(job);
+		} else {
+			amdgpu_ring_pad_ib(ring, params.ib);
+			amdgpu_sync_resv(adev, &job->sync, parent->bo->tbo.resv,
 					 AMDGPU_FENCE_OWNER_VM);
+			if (shadow)
+				amdgpu_sync_resv(adev, &job->sync,
+						 shadow->tbo.resv,
+						 AMDGPU_FENCE_OWNER_VM);
+
+			WARN_ON(params.ib->length_dw > ndw);
+			r = amdgpu_job_submit(job, ring, &vm->entity,
+					AMDGPU_FENCE_OWNER_VM, &fence);
+			if (r)
+				goto error_free;
 
-		WARN_ON(params.ib->length_dw > ndw);
-		r = amdgpu_job_submit(job, ring, &vm->entity,
-				AMDGPU_FENCE_OWNER_VM, &fence);
-		if (r)
-			goto error_free;
-
-		amdgpu_bo_fence(parent->bo, fence, true);
-		dma_fence_put(vm->last_dir_update);
-		vm->last_dir_update = dma_fence_get(fence);
-		dma_fence_put(fence);
+			amdgpu_bo_fence(parent->bo, fence, true);
+			dma_fence_put(vm->last_dir_update);
+			vm->last_dir_update = dma_fence_get(fence);
+			dma_fence_put(fence);
+		}
 	}
 	/*
 	 * Recurse into the subdirectories. This recursion is harmless because
@@ -1235,33 +1215,98 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
 	if (r)
 		amdgpu_vm_invalidate_level(&vm->root);
 
+	if (vm->use_cpu_for_update) {
+		/* Flush HDP */
+		mb();
+		amdgpu_gart_flush_gpu_tlb(adev, 0);
+	}
+
 	return r;
 }
 
 /**
- * amdgpu_vm_find_pt - find the page table for an address
+ * amdgpu_vm_find_entry - find the entry for an address
  *
  * @p: see amdgpu_pte_update_params definition
  * @addr: virtual address in question
+ * @entry: resulting entry or NULL
+ * @parent: parent entry
  *
- * Find the page table BO for a virtual address, return NULL when none found.
+ * Find the vm_pt entry and it's parent for the given address.
  */
-static struct amdgpu_bo *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
-					  uint64_t addr)
+void amdgpu_vm_get_entry(struct amdgpu_pte_update_params *p, uint64_t addr,
+			 struct amdgpu_vm_pt **entry,
+			 struct amdgpu_vm_pt **parent)
 {
-	struct amdgpu_vm_pt *entry = &p->vm->root;
 	unsigned idx, level = p->adev->vm_manager.num_level;
 
-	while (entry->entries) {
+	*parent = NULL;
+	*entry = &p->vm->root;
+	while ((*entry)->entries) {
 		idx = addr >> (p->adev->vm_manager.block_size * level--);
-		idx %= amdgpu_bo_size(entry->bo) / 8;
-		entry = &entry->entries[idx];
+		idx %= amdgpu_bo_size((*entry)->bo) / 8;
+		*parent = *entry;
+		*entry = &(*entry)->entries[idx];
 	}
 
 	if (level)
-		return NULL;
+		*entry = NULL;
+}
+
+/**
+ * amdgpu_vm_handle_huge_pages - handle updating the PD with huge pages
+ *
+ * @p: see amdgpu_pte_update_params definition
+ * @entry: vm_pt entry to check
+ * @parent: parent entry
+ * @nptes: number of PTEs updated with this operation
+ * @dst: destination address where the PTEs should point to
+ * @flags: access flags fro the PTEs
+ *
+ * Check if we can update the PD with a huge page.
+ */
+static void amdgpu_vm_handle_huge_pages(struct amdgpu_pte_update_params *p,
+					struct amdgpu_vm_pt *entry,
+					struct amdgpu_vm_pt *parent,
+					unsigned nptes, uint64_t dst,
+					uint64_t flags)
+{
+	bool use_cpu_update = (p->func == amdgpu_vm_cpu_set_ptes);
+	uint64_t pd_addr, pde;
+
+	/* In the case of a mixed PT the PDE must point to it*/
+	if (p->adev->asic_type < CHIP_VEGA10 ||
+	    nptes != AMDGPU_VM_PTE_COUNT(p->adev) ||
+	    p->func == amdgpu_vm_do_copy_ptes ||
+	    !(flags & AMDGPU_PTE_VALID)) {
+
+		dst = amdgpu_bo_gpu_offset(entry->bo);
+		dst = amdgpu_gart_get_vm_pde(p->adev, dst);
+		flags = AMDGPU_PTE_VALID;
+	} else {
+		/* Set the huge page flag to stop scanning at this PDE */
+		flags |= AMDGPU_PDE_PTE;
+	}
+
+	if (entry->addr == (dst | flags))
+		return;
+
+	entry->addr = (dst | flags);
 
-	return entry->bo;
+	if (use_cpu_update) {
+		pd_addr = (unsigned long)amdgpu_bo_kptr(parent->bo);
+		pde = pd_addr + (entry - parent->entries) * 8;
+		amdgpu_vm_cpu_set_ptes(p, pde, dst, 1, 0, flags);
+	} else {
+		if (parent->bo->shadow) {
+			pd_addr = amdgpu_bo_gpu_offset(parent->bo->shadow);
+			pde = pd_addr + (entry - parent->entries) * 8;
+			amdgpu_vm_do_set_ptes(p, pde, dst, 1, 0, flags);
+		}
+		pd_addr = amdgpu_bo_gpu_offset(parent->bo);
+		pde = pd_addr + (entry - parent->entries) * 8;
+		amdgpu_vm_do_set_ptes(p, pde, dst, 1, 0, flags);
+	}
 }
 
 /**
@@ -1287,49 +1332,44 @@ static int amdgpu_vm_update_ptes(struct amdgpu_pte_update_params *params,
 	uint64_t addr, pe_start;
 	struct amdgpu_bo *pt;
 	unsigned nptes;
-	int r;
 	bool use_cpu_update = (params->func == amdgpu_vm_cpu_set_ptes);
 
-
 	/* walk over the address space and update the page tables */
-	for (addr = start; addr < end; addr += nptes) {
-		pt = amdgpu_vm_get_pt(params, addr);
-		if (!pt) {
-			pr_err("PT not found, aborting update_ptes\n");
-			return -EINVAL;
-		}
+	for (addr = start; addr < end; addr += nptes,
+	     dst += nptes * AMDGPU_GPU_PAGE_SIZE) {
+		struct amdgpu_vm_pt *entry, *parent;
 
-		if (params->shadow) {
-			if (WARN_ONCE(use_cpu_update,
-				"CPU VM update doesn't suuport shadow pages"))
-				return 0;
-
-			if (!pt->shadow)
-				return 0;
-			pt = pt->shadow;
-		}
+		amdgpu_vm_get_entry(params, addr, &entry, &parent);
+		if (!entry)
+			return -ENOENT;
 
 		if ((addr & ~mask) == (end & ~mask))
 			nptes = end - addr;
 		else
 			nptes = AMDGPU_VM_PTE_COUNT(adev) - (addr & mask);
 
+		amdgpu_vm_handle_huge_pages(params, entry, parent,
+					    nptes, dst, flags);
+		/* We don't need to update PTEs for huge pages */
+		if (entry->addr & AMDGPU_PDE_PTE)
+			continue;
+
+		pt = entry->bo;
 		if (use_cpu_update) {
-			r = amdgpu_bo_kmap(pt, (void *)&pe_start);
-			if (r)
-				return r;
-		} else
+			pe_start = (unsigned long)amdgpu_bo_kptr(pt);
+		} else {
+			if (pt->shadow) {
+				pe_start = amdgpu_bo_gpu_offset(pt->shadow);
+				pe_start += (addr & mask) * 8;
+				params->func(params, pe_start, dst, nptes,
+					     AMDGPU_GPU_PAGE_SIZE, flags);
+			}
 			pe_start = amdgpu_bo_gpu_offset(pt);
+		}
 
 		pe_start += (addr & mask) * 8;
-
 		params->func(params, pe_start, dst, nptes,
 			     AMDGPU_GPU_PAGE_SIZE, flags);
-
-		dst += nptes * AMDGPU_GPU_PAGE_SIZE;
-
-		if (use_cpu_update)
-			amdgpu_bo_kunmap(pt);
 	}
 
 	return 0;
@@ -1370,10 +1410,9 @@ static int amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params	*params,
 	 * Userspace can support this by aligning virtual base address and
 	 * allocation size to the fragment size.
 	 */
-
-	/* SI and newer are optimized for 64KB */
-	uint64_t frag_flags = AMDGPU_PTE_FRAG(AMDGPU_LOG2_PAGES_PER_FRAG);
-	uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
+	unsigned pages_per_frag = params->adev->vm_manager.fragment_size;
+	uint64_t frag_flags = AMDGPU_PTE_FRAG(pages_per_frag);
+	uint64_t frag_align = 1 << pages_per_frag;
 
 	uint64_t frag_start = ALIGN(start, frag_align);
 	uint64_t frag_end = end & ~(frag_align - 1);
@@ -1445,6 +1484,10 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	params.vm = vm;
 	params.src = src;
 
+	/* sync to everything on unmapping */
+	if (!(flags & AMDGPU_PTE_VALID))
+		owner = AMDGPU_FENCE_OWNER_UNDEFINED;
+
 	if (vm->use_cpu_for_update) {
 		/* params.src is used as flag to indicate system Memory */
 		if (pages_addr)
@@ -1453,23 +1496,18 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 		/* Wait for PT BOs to be free. PTs share the same resv. object
 		 * as the root PD BO
 		 */
-		r = amdgpu_vm_bo_wait(adev, vm->root.bo);
+		r = amdgpu_vm_wait_pd(adev, vm, owner);
 		if (unlikely(r))
 			return r;
 
 		params.func = amdgpu_vm_cpu_set_ptes;
 		params.pages_addr = pages_addr;
-		params.shadow = false;
 		return amdgpu_vm_frag_ptes(&params, start, last + 1,
 					   addr, flags);
 	}
 
 	ring = container_of(vm->entity.sched, struct amdgpu_ring, sched);
 
-	/* sync to everything on unmapping */
-	if (!(flags & AMDGPU_PTE_VALID))
-		owner = AMDGPU_FENCE_OWNER_UNDEFINED;
-
 	nptes = last - start + 1;
 
 	/*
@@ -1481,6 +1519,9 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	/* padding, etc. */
 	ndw = 64;
 
+	/* one PDE write for each huge page */
+	ndw += ((nptes >> adev->vm_manager.block_size) + 1) * 6;
+
 	if (src) {
 		/* only copy commands needed */
 		ndw += ncmds * 7;
@@ -1542,11 +1583,6 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	if (r)
 		goto error_free;
 
-	params.shadow = true;
-	r = amdgpu_vm_frag_ptes(&params, start, last + 1, addr, flags);
-	if (r)
-		goto error_free;
-	params.shadow = false;
 	r = amdgpu_vm_frag_ptes(&params, start, last + 1, addr, flags);
 	if (r)
 		goto error_free;
@@ -1565,6 +1601,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 
 error_free:
 	amdgpu_job_free(job);
+	amdgpu_vm_invalidate_level(&vm->root);
 	return r;
 }
 
@@ -1687,7 +1724,8 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev,
 			struct amdgpu_bo_va *bo_va,
 			bool clear)
 {
-	struct amdgpu_vm *vm = bo_va->vm;
+	struct amdgpu_bo *bo = bo_va->base.bo;
+	struct amdgpu_vm *vm = bo_va->base.vm;
 	struct amdgpu_bo_va_mapping *mapping;
 	dma_addr_t *pages_addr = NULL;
 	uint64_t gtt_flags, flags;
@@ -1696,27 +1734,27 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev,
 	struct dma_fence *exclusive;
 	int r;
 
-	if (clear || !bo_va->bo) {
+	if (clear || !bo_va->base.bo) {
 		mem = NULL;
 		nodes = NULL;
 		exclusive = NULL;
 	} else {
 		struct ttm_dma_tt *ttm;
 
-		mem = &bo_va->bo->tbo.mem;
+		mem = &bo_va->base.bo->tbo.mem;
 		nodes = mem->mm_node;
 		if (mem->mem_type == TTM_PL_TT) {
-			ttm = container_of(bo_va->bo->tbo.ttm, struct
-					   ttm_dma_tt, ttm);
+			ttm = container_of(bo_va->base.bo->tbo.ttm,
+					   struct ttm_dma_tt, ttm);
 			pages_addr = ttm->dma_address;
 		}
-		exclusive = reservation_object_get_excl(bo_va->bo->tbo.resv);
+		exclusive = reservation_object_get_excl(bo->tbo.resv);
 	}
 
-	if (bo_va->bo) {
-		flags = amdgpu_ttm_tt_pte_flags(adev, bo_va->bo->tbo.ttm, mem);
-		gtt_flags = (amdgpu_ttm_is_bound(bo_va->bo->tbo.ttm) &&
-			adev == amdgpu_ttm_adev(bo_va->bo->tbo.bdev)) ?
+	if (bo) {
+		flags = amdgpu_ttm_tt_pte_flags(adev, bo->tbo.ttm, mem);
+		gtt_flags = (amdgpu_ttm_is_bound(bo->tbo.ttm) &&
+			adev == amdgpu_ttm_adev(bo->tbo.bdev)) ?
 			flags : 0;
 	} else {
 		flags = 0x0;
@@ -1724,7 +1762,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev,
 	}
 
 	spin_lock(&vm->status_lock);
-	if (!list_empty(&bo_va->vm_status))
+	if (!list_empty(&bo_va->base.vm_status))
 		list_splice_init(&bo_va->valids, &bo_va->invalids);
 	spin_unlock(&vm->status_lock);
 
@@ -1747,11 +1785,17 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev,
 
 	spin_lock(&vm->status_lock);
 	list_splice_init(&bo_va->invalids, &bo_va->valids);
-	list_del_init(&bo_va->vm_status);
+	list_del_init(&bo_va->base.vm_status);
 	if (clear)
-		list_add(&bo_va->vm_status, &vm->cleared);
+		list_add(&bo_va->base.vm_status, &vm->cleared);
 	spin_unlock(&vm->status_lock);
 
+	if (vm->use_cpu_for_update) {
+		/* Flush HDP */
+		mb();
+		amdgpu_gart_flush_gpu_tlb(adev, 0);
+	}
+
 	return 0;
 }
 
@@ -1905,15 +1949,19 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
 	struct amdgpu_bo_va_mapping *mapping;
 	struct dma_fence *f = NULL;
 	int r;
+	uint64_t init_pte_value = 0;
 
 	while (!list_empty(&vm->freed)) {
 		mapping = list_first_entry(&vm->freed,
 			struct amdgpu_bo_va_mapping, list);
 		list_del(&mapping->list);
 
+		if (vm->pte_support_ats)
+			init_pte_value = AMDGPU_PTE_SYSTEM;
+
 		r = amdgpu_vm_bo_update_mapping(adev, NULL, 0, NULL, vm,
 						mapping->start, mapping->last,
-						0, 0, &f);
+						init_pte_value, 0, &f);
 		amdgpu_vm_free_mapping(adev, vm, mapping, f);
 		if (r) {
 			dma_fence_put(f);
@@ -1933,26 +1981,26 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
 }
 
 /**
- * amdgpu_vm_clear_invalids - clear invalidated BOs in the PT
+ * amdgpu_vm_clear_moved - clear moved BOs in the PT
  *
  * @adev: amdgpu_device pointer
  * @vm: requested vm
  *
- * Make sure all invalidated BOs are cleared in the PT.
+ * Make sure all moved BOs are cleared in the PT.
  * Returns 0 for success.
  *
  * PTs have to be reserved and mutex must be locked!
  */
-int amdgpu_vm_clear_invalids(struct amdgpu_device *adev,
-			     struct amdgpu_vm *vm, struct amdgpu_sync *sync)
+int amdgpu_vm_clear_moved(struct amdgpu_device *adev, struct amdgpu_vm *vm,
+			    struct amdgpu_sync *sync)
 {
 	struct amdgpu_bo_va *bo_va = NULL;
 	int r = 0;
 
 	spin_lock(&vm->status_lock);
-	while (!list_empty(&vm->invalidated)) {
-		bo_va = list_first_entry(&vm->invalidated,
-			struct amdgpu_bo_va, vm_status);
+	while (!list_empty(&vm->moved)) {
+		bo_va = list_first_entry(&vm->moved,
+			struct amdgpu_bo_va, base.vm_status);
 		spin_unlock(&vm->status_lock);
 
 		r = amdgpu_vm_bo_update(adev, bo_va, true);
@@ -1992,16 +2040,17 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct amdgpu_device *adev,
 	if (bo_va == NULL) {
 		return NULL;
 	}
-	bo_va->vm = vm;
-	bo_va->bo = bo;
+	bo_va->base.vm = vm;
+	bo_va->base.bo = bo;
+	INIT_LIST_HEAD(&bo_va->base.bo_list);
+	INIT_LIST_HEAD(&bo_va->base.vm_status);
+
 	bo_va->ref_count = 1;
-	INIT_LIST_HEAD(&bo_va->bo_list);
 	INIT_LIST_HEAD(&bo_va->valids);
 	INIT_LIST_HEAD(&bo_va->invalids);
-	INIT_LIST_HEAD(&bo_va->vm_status);
 
 	if (bo)
-		list_add_tail(&bo_va->bo_list, &bo->va);
+		list_add_tail(&bo_va->base.bo_list, &bo->va);
 
 	return bo_va;
 }
@@ -2026,7 +2075,8 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
 		     uint64_t size, uint64_t flags)
 {
 	struct amdgpu_bo_va_mapping *mapping, *tmp;
-	struct amdgpu_vm *vm = bo_va->vm;
+	struct amdgpu_bo *bo = bo_va->base.bo;
+	struct amdgpu_vm *vm = bo_va->base.vm;
 	uint64_t eaddr;
 
 	/* validate the parameters */
@@ -2037,7 +2087,7 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
 	/* make sure object fit at this offset */
 	eaddr = saddr + size - 1;
 	if (saddr >= eaddr ||
-	    (bo_va->bo && offset + size > amdgpu_bo_size(bo_va->bo)))
+	    (bo && offset + size > amdgpu_bo_size(bo)))
 		return -EINVAL;
 
 	saddr /= AMDGPU_GPU_PAGE_SIZE;
@@ -2047,7 +2097,7 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
 	if (tmp) {
 		/* bo and tmp overlap, invalid addr */
 		dev_err(adev->dev, "bo %p va 0x%010Lx-0x%010Lx conflict with "
-			"0x%010Lx-0x%010Lx\n", bo_va->bo, saddr, eaddr,
+			"0x%010Lx-0x%010Lx\n", bo, saddr, eaddr,
 			tmp->start, tmp->last + 1);
 		return -EINVAL;
 	}
@@ -2092,7 +2142,8 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
 			     uint64_t size, uint64_t flags)
 {
 	struct amdgpu_bo_va_mapping *mapping;
-	struct amdgpu_vm *vm = bo_va->vm;
+	struct amdgpu_bo *bo = bo_va->base.bo;
+	struct amdgpu_vm *vm = bo_va->base.vm;
 	uint64_t eaddr;
 	int r;
 
@@ -2104,7 +2155,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
 	/* make sure object fit at this offset */
 	eaddr = saddr + size - 1;
 	if (saddr >= eaddr ||
-	    (bo_va->bo && offset + size > amdgpu_bo_size(bo_va->bo)))
+	    (bo && offset + size > amdgpu_bo_size(bo)))
 		return -EINVAL;
 
 	/* Allocate all the needed memory */
@@ -2112,7 +2163,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
 	if (!mapping)
 		return -ENOMEM;
 
-	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->vm, saddr, size);
+	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size);
 	if (r) {
 		kfree(mapping);
 		return r;
@@ -2152,7 +2203,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
 		       uint64_t saddr)
 {
 	struct amdgpu_bo_va_mapping *mapping;
-	struct amdgpu_vm *vm = bo_va->vm;
+	struct amdgpu_vm *vm = bo_va->base.vm;
 	bool valid = true;
 
 	saddr /= AMDGPU_GPU_PAGE_SIZE;
@@ -2300,12 +2351,12 @@ void amdgpu_vm_bo_rmv(struct amdgpu_device *adev,
 		      struct amdgpu_bo_va *bo_va)
 {
 	struct amdgpu_bo_va_mapping *mapping, *next;
-	struct amdgpu_vm *vm = bo_va->vm;
+	struct amdgpu_vm *vm = bo_va->base.vm;
 
-	list_del(&bo_va->bo_list);
+	list_del(&bo_va->base.bo_list);
 
 	spin_lock(&vm->status_lock);
-	list_del(&bo_va->vm_status);
+	list_del(&bo_va->base.vm_status);
 	spin_unlock(&vm->status_lock);
 
 	list_for_each_entry_safe(mapping, next, &bo_va->valids, list) {
@@ -2337,13 +2388,14 @@ void amdgpu_vm_bo_rmv(struct amdgpu_device *adev,
 void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
 			     struct amdgpu_bo *bo)
 {
-	struct amdgpu_bo_va *bo_va;
+	struct amdgpu_vm_bo_base *bo_base;
 
-	list_for_each_entry(bo_va, &bo->va, bo_list) {
-		spin_lock(&bo_va->vm->status_lock);
-		if (list_empty(&bo_va->vm_status))
-			list_add(&bo_va->vm_status, &bo_va->vm->invalidated);
-		spin_unlock(&bo_va->vm->status_lock);
+	list_for_each_entry(bo_base, &bo->va, bo_list) {
+		spin_lock(&bo_base->vm->status_lock);
+		if (list_empty(&bo_base->vm_status))
+			list_add(&bo_base->vm_status,
+				 &bo_base->vm->moved);
+		spin_unlock(&bo_base->vm->status_lock);
 	}
 }
 
@@ -2361,12 +2413,26 @@ static uint32_t amdgpu_vm_get_block_size(uint64_t vm_size)
 }
 
 /**
- * amdgpu_vm_adjust_size - adjust vm size and block size
+ * amdgpu_vm_set_fragment_size - adjust fragment size in PTE
+ *
+ * @adev: amdgpu_device pointer
+ * @fragment_size_default: the default fragment size if it's set auto
+ */
+void amdgpu_vm_set_fragment_size(struct amdgpu_device *adev, uint32_t fragment_size_default)
+{
+	if (amdgpu_vm_fragment_size == -1)
+		adev->vm_manager.fragment_size = fragment_size_default;
+	else
+		adev->vm_manager.fragment_size = amdgpu_vm_fragment_size;
+}
+
+/**
+ * amdgpu_vm_adjust_size - adjust vm size, block size and fragment size
  *
  * @adev: amdgpu_device pointer
  * @vm_size: the default vm size if it's set auto
  */
-void amdgpu_vm_adjust_size(struct amdgpu_device *adev, uint64_t vm_size)
+void amdgpu_vm_adjust_size(struct amdgpu_device *adev, uint64_t vm_size, uint32_t fragment_size_default)
 {
 	/* adjust vm size firstly */
 	if (amdgpu_vm_size == -1)
@@ -2381,8 +2447,11 @@ void amdgpu_vm_adjust_size(struct amdgpu_device *adev, uint64_t vm_size)
 	else
 		adev->vm_manager.block_size = amdgpu_vm_block_size;
 
-	DRM_INFO("vm size is %llu GB, block size is %u-bit\n",
-		adev->vm_manager.vm_size, adev->vm_manager.block_size);
+	amdgpu_vm_set_fragment_size(adev, fragment_size_default);
+
+	DRM_INFO("vm size is %llu GB, block size is %u-bit, fragment size is %u-bit\n",
+		adev->vm_manager.vm_size, adev->vm_manager.block_size,
+		adev->vm_manager.fragment_size);
 }
 
 /**
@@ -2404,13 +2473,14 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	struct amd_sched_rq *rq;
 	int r, i;
 	u64 flags;
+	uint64_t init_pde_value = 0;
 
 	vm->va = RB_ROOT;
 	vm->client_id = atomic64_inc_return(&adev->vm_manager.client_counter);
 	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
 		vm->reserved_vmid[i] = NULL;
 	spin_lock_init(&vm->status_lock);
-	INIT_LIST_HEAD(&vm->invalidated);
+	INIT_LIST_HEAD(&vm->moved);
 	INIT_LIST_HEAD(&vm->cleared);
 	INIT_LIST_HEAD(&vm->freed);
 
@@ -2425,10 +2495,17 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (r)
 		return r;
 
-	if (vm_context == AMDGPU_VM_CONTEXT_COMPUTE)
+	vm->pte_support_ats = false;
+
+	if (vm_context == AMDGPU_VM_CONTEXT_COMPUTE) {
 		vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
 						AMDGPU_VM_USE_CPU_FOR_COMPUTE);
-	else
+
+		if (adev->asic_type == CHIP_RAVEN) {
+			vm->pte_support_ats = true;
+			init_pde_value = AMDGPU_PTE_SYSTEM | AMDGPU_PDE_PTE;
+		}
+	} else
 		vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
 						AMDGPU_VM_USE_CPU_FOR_GFX);
 	DRM_DEBUG_DRIVER("VM update mode is %s\n",
@@ -2448,7 +2525,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	r = amdgpu_bo_create(adev, amdgpu_vm_bo_size(adev, 0), align, true,
 			     AMDGPU_GEM_DOMAIN_VRAM,
 			     flags,
-			     NULL, NULL, &vm->root.bo);
+			     NULL, NULL, init_pde_value, &vm->root.bo);
 	if (r)
 		goto error_free_sched_entity;
 
@@ -2457,6 +2534,13 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		goto error_free_root;
 
 	vm->last_eviction_counter = atomic64_read(&adev->num_evictions);
+
+	if (vm->use_cpu_for_update) {
+		r = amdgpu_bo_kmap(vm->root.bo, NULL);
+		if (r)
+			goto error_free_root;
+	}
+
 	amdgpu_bo_unreserve(vm->root.bo);
 
 	return 0;
author	Linus Torvalds <torvalds@linux-foundation.org>	2017-09-03 17:02:26 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2017-09-03 17:02:26 -0700
commit	906dde0f355bd97c080c215811ae7db1137c4af8 (patch)
tree	1e1b4ba58a59c4027f06d86e7430566ee0dcbb15 /drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
parent	69c0067aa3f40d3e52ab78643aecb17d669d3389 (diff)
parent	7846b12fe0b5feab5446d892f41b5140c1419109 (diff)