Age | Commit message (Collapse) | Author | Files | Lines |
|
KIQ doesn't really use the GPU scheduler. The base
drivers generally use the KIQ ring directly rather than
submitting IBs. However, amdgpu_sched_hw_submission
(which defaults to 2) limits the number of outstanding
fences to 2. KFD uses the KIQ for TLB flushes and the
2 fence limit hurts performance when there are several KFD
processes running.
v2: move some expressions to one line
change KIQ sched_hw_submission to at least 16
v3: bump to 256
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move the asic specific code into the IP modules.
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Be more explicit and add comments explaining each case.
Also s/gart/GART/ in the parameter string as per Felix'
suggestion.
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set the shadow flag on the shadow and not the parent, always bind shadow BOs
during allocation instead of manually, use the reservation_object wrappers
to grab the lock.
This fixes a couple of issues with binding the shadow BOs as well as correctly
evicting them when memory becomes tight.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We need a larger gart for asics that do not support GPUVM on all
engines (e.g., MM) to make sure we have enough space for all
gtt buffers in physical mode. Change the default size based on
the asic type.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
immediately
For virtual display, it uses software timer to emulate the vsync interrupt,
it doesn't have high precision, so doesn't support disable vblank immediately.
BUG: SWDEV-129274
Signed-off-by: Emily Deng <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Correctly detect system memory mappings when using CPU and don't use
huge pages for them.
Avoid incorrectly translating a physical page table GPU address when
splitting a huge page while mapping system memory.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The function has far more in common with drm_syncobj_find than with
any in the get/put functions.
Signed-off-by: Jason Ekstrand <[email protected]>
Acked-by: Christian König <[email protected]> (v1)
Signed-off-by: Dave Airlie <[email protected]>
|
|
Remove a redundant identical return statement, it has no use.
Detected by CoverityScan, CID#1454586 ("Structurally dead code")
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Check memory allocation failure and return -ENOMEM in such a case.
'num_post_dep_syncobjs' still has to be set to 0 before the test in order
to have it initialized if 'amdgpu_cs_parser_fini()' is called to free
resources.
The calling graph would be, in such a case!
failure in amdgpu_cs_process_syncobj_out_dep()
---> error code returned by amdgpu_cs_dependencies()
--> amdgpu_cs_parser_fini() is called
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
BANK_SELECT should always be FRAGMENT_SIZE + 3 due to 8-entry (2^3)
per cache line in L2 TLB for Vega10.
v2: agd: fix warning
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The function is called only once and doesn't do anything special.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use ttm_bo_mem_space instead of manually allocating GART space.
This allows us to evict BOs when there isn't enought GART space any more.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This isn't used since we don't map evicted BOs to GART any more.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
KIQ doesn't really use the GPU scheduler. The base
drivers generally use the KIQ ring directly rather than
submitting IBs. However, amdgpu_sched_hw_submission
(which defaults to 2) limits the number of outstanding
fences to 2. KFD uses the KIQ for TLB flushes and the
2 fence limit hurts performance when there are several KFD
processes running.
v2: move some expressions to one line
change KIQ sched_hw_submission to at least 16
v3: bump to 256
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move the asic specific code into the IP modules.
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Be more explicit and add comments explaining each case.
Also s/gart/GART/ in the parameter string as per Felix'
suggestion.
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set the shadow flag on the shadow and not the parent, always bind shadow BOs
during allocation instead of manually, use the reservation_object wrappers
to grab the lock.
This fixes a couple of issues with binding the shadow BOs as well as correctly
evicting them when memory becomes tight.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We need a larger gart for asics that do not support GPUVM on all
engines (e.g., MM) to make sure we have enough space for all
gtt buffers in physical mode. Change the default size based on
the asic type.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
immediately
For virtual display, it uses software timer to emulate the vsync interrupt,
it doesn't have high precision, so doesn't support disable vblank immediately.
BUG: SWDEV-129274
Signed-off-by: Emily Deng <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Correctly detect system memory mappings when using CPU and don't use
huge pages for them.
Avoid incorrectly translating a physical page table GPU address when
splitting a huge page while mapping system memory.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
git://people.freedesktop.org/~gabbayo/linux into drm-next
This is the amdkfd pull request for 4.14 merge window.
AMD has started cleaning the pipe and sending patches from their internal
development to the upstream community.
The plan as I understand it is to first get all the non-dGPU patches to
upstream and then move to upstream dGPU support.
The patches here are relevant only for Kaveri and Carrizo.
The following is a summary of the changes:
- Add new IOCTL to set a Scratch memory VA
- Update PM4 headers for new firmware that support scratch memory
- Support image tiling mode
- Remove all uses of BUG_ON
- Various Bug fixes and coding style fixes
* tag 'drm-amdkfd-next-2017-08-18' of git://people.freedesktop.org/~gabbayo/linux: (24 commits)
drm/amdkfd: Implement image tiling mode support v2
drm/amdgpu: Add kgd kfd interface get_tile_config() v2
drm/amdkfd: Adding new IOCTL for scratch memory v2
drm/amdgpu: Add kgd/kfd interface to support scratch memory v2
drm/amdgpu: Program SH_STATIC_MEM_CONFIG globally, not per-VMID
drm/amd: Update MEC HQD loading code for KFD
drm/amdgpu: Disable GFX PG on CZ
drm/amdkfd: Update PM4 packet headers
drm/amdkfd: Clamp EOP queue size correctly on Gfx8
drm/amdkfd: Add more error printing to help bringup v2
drm/amdkfd: Handle remaining BUG_ONs more gracefully v2
drm/amdkfd: Allocate gtt_sa_bitmap in long units
drm/amdkfd: Fix doorbell initialization and finalization
drm/amdkfd: Remove BUG_ONs for NULL pointer arguments
drm/amdkfd: Remove usage of alloc(sizeof(struct...
drm/amdkfd: Fix goto usage v2
drm/amdkfd: Change x==NULL/false references to !x
drm/amdkfd: Consolidate and clean up log commands
drm/amdkfd: Clean up KFD style errors and warnings v2
drm/amdgpu: Remove hard-coded assumptions about compute pipes
...
|
|
mmVGT_INDEX_TYPE has no default value, need to make sure
it's initialized when gfx is initialized.
Signed-off-by: Ken Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Allow overrides on the command line.
v2: agd: sqaush in spelling fix and bogus default value warning
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
adds fragment_size in the vm_manager structure and
implements hardware setup for it.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
That better describes what happens here with the BO.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Split that into vm_bo_base and bo_va to allow other uses as well.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Just add the flags to the addr field as well.
v2: add some more comments that the flag is for huge pages.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We now properly kmap all BOs after validation.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move the CSA bo_va from the VM to the fpriv structure.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The shadow handling isn't implemented completely for userspace BOs and
the kernel sets the VRAM_CONTIGUOUS as necessary.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
update the list first to avoid redundant checks.
Signed-off-by: Chunming Zhou <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Looks like a better place for this.
v2: use atomic64_t members instead
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It doesn't make much sense to count those numbers twice.
v2: use and atomic64_t instead
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of the separate switch/case in the calling function.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The BO manager has its own lock and doesn't use the lru_lock.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Provide the drm printer directly instead of just the callback.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This helps map DMA addresses back to physical addresses.
Signed-off-by: Tom St Denis <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(v2): Added tracepoints for USERPTR, SG mappings, and
SWIOTBL mappings. Reformatted trace call perform
PCI decoding internal to the trace.
(v3): Add unmap tracepoints as well
(v4): Move traces into separate functions
|
|
Those values weren't correct. This should result in quite some speedup.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No need to do this on every CS.
v2: remove all other bind, reorder code
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This should save us a bunch of command submission overhead.
v2: move the LRU move to the right place to avoid the move for the root BO
and handle the shadow BOs as well. This turned out to be a bug fix because
the move needs to happen before the kmap.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
into drm-next
More features for 4.14. Nothing too major here. I have a few more additional
patches for large page support in vega10 among other things, but they require
some resevation object patches from drm-misc-next, so I'll send that request
once you've pulled the latest drm-misc-next. Highlights:
- Fixes for ACP audio on stoney
- SR-IOV fixes for vega10
- various powerplay fixes
- lots of code clean up
* 'drm-next-4.14' of git://people.freedesktop.org/~agd5f/linux: (62 commits)
drm/amdgpu/gfx7: fix function name
drm/amd/amdgpu: Disabling Power Gating for Stoney platform
drm/amd/amdgpu: Added a quirk for Stoney platform
drm/amdgpu: jt_size was wrongly counted twice
drm/amdgpu: fix missing endian-safe guard
drm/amdgpu: ignore digest_size when loading sdma fw for raven
drm/amdgpu: Uninitialized variable in amdgpu_ttm_backend_bind()
drm/amd/powerplay: fix coding style in hwmgr.c
drm/amd/powerplay: refine dmesg info under powerplay.
drm/amdgpu: don't finish the ring if not initialized
drm/radeon: Fix preferred typo
drm/amdgpu: Fix preferred typo
drm/radeon: Fix stolen typo
drm/amdgpu: Fix stolen typo
drm/amd/powerplay: fix coccinelle warnings in vega10_hwmgr.c
drm/amdgpu: set gfx_v9_0_ip_funcs as static
drm/radeon: switch to drm_*{get,put} helpers
drm/amdgpu: switch to drm_*{get,put} helpers
drm/amd/powerplay: add CZ profile support
drm/amd/powerplay: fix PSI not enabled by kmd
...
|
|
v2:
* Removed amdgpu_amdkfd prefix from static functions
* Documented get_tile_config in kgd_kfd_interface.h
Signed-off-by: Yong Zhao <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
v2:
* Shortened headline
* Removed write_config_static_mem, it gets initialized by gfx_v?_0_gpu_init
* Renamed alloc_memory_of_scratch to set_scratch_backing_va
* Made set_scratch_backing_va a void function
* Documented set_scratch_backing in kgd_kfd_interface.h
Signed-off-by: Moses Reuben <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
This register only has a single instance in the hardware. Its value
applies to all VMIDS.
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Various bug fixes and improvements that accumulated over the last two
years.
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|