Age | Commit message (Collapse) | Author | Files | Lines |
|
Add the IOCTL interface so that applications can allocate per VM BOs.
Still WIP since not all corner cases are tested yet, but this reduces average
CS overhead for 10K BOs from 21ms down to 48us.
v2: add some extra checks, remove the WIP tag
v3: rename new flag to AMDGPU_GEM_CREATE_VM_ALWAYS_VALID
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Per VM BOs are handled like VM PDs and PTs. They are always valid and don't
need to be specified in the BO lists.
v2: validate PDs/PTs first
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Don't allow them to be GEM imported into another process.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We need to refer to the parent instead of the root BO for multi
level page tables on Vega10. Also don't set the PDE_PTE bit.
v2: Don't set the PDE_PTE bit either.
Signed-off-by: Christian König <[email protected]>
Reviewed-and-Tested-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This way we can safely call it on SI as well.
v2: fix type in commit message
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The src isn't used any more after GART hack removal.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Keep track off relocated PDs/PTs instead of walking and checking all PDs.
v2: fix root PD handling
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]> (v1)
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Kfree on NULL pointer is a no-op and therefore checking is redundant.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Himanshu Jha <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of validating all page tables when one was evicted,
track which one needs a validation.
v2: simplify amdgpu_vm_ready as well
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]> (v1)
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Except for the reference count all other members are protected
by the VM PD being reserved.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We changed this to use an extra list a while back, but for the next
series I need a separate flag again.
v2: reorder to avoid unlocked list access
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of using the vm_state use a separate flag to note
that the BO was moved.
v2: reorder patches to avoid temporary lockless access
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Allows writing data to vram via debugfs.
Signed-off-by: Tom St Denis <[email protected]>
Reviewed-by: Christian König <[email protected]>
(v2): Call get_user before holding spinlock.
Signed-off-by: Alex Deucher <[email protected]>
|
|
To allocate additional space for the dynamic cu masks.
Confirmed with the hw team that we only need 1 dword
for the mask. The mask is the same for each SE so
you only need 1 dword.
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Confirmed with the hw team. It's the same for all asics.
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Those are certainly not kernel allocations, instead set the NO_CPU_ACCESS flag.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Stop checking the mapped BO itself, cause that one is
certainly not a page table.
Additional to that move the code into amdgpu_vm.c
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
That somehow got lost.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
sysfs is more stable, and doesn't require root to access
Signed-off-by: Kent Russell <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add 2 debugfs files, one that contains the VBIOS version, and one that
contains the VBIOS itself. These won't change after initialization,
so we can add the VBIOS version when we parse the atombios information.
This ensures that we can find out the VBIOS version, even when the dmesg
buffer fills up, and makes it easier to associate which VBIOS version is
for which GPU on mGPU configurations. Set the size to 20 characters in
case of some weird VBIOS version that exceeds the expected 17 character
format (3-8-3\0). The VBIOS dump also allows for easy debugging
v2: Move to debugfs, clarify commit message, add VBIOS dump file
Signed-off-by: Kent Russell <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Tom St Denis <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Switches the AMDGPU driver over to the TTM tracepoint and removes
our old one. Now you can enable traces before loading the module
and trace all mappings.
Signed-off-by: Tom St Denis <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(v2): Use struct device instead of pci in trace.
|
|
Newer versions of the CP firmware require changes in how the driver
initializes the hw block.
Change the firmware name for new firmware to maintain compatibility with
older kernels.
Acked-by: Christian König <[email protected]>
Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Remove a redundant identical return statement, it has no use.
Detected by CoverityScan, CID#1454586 ("Structurally dead code")
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Check memory allocation failure and return -ENOMEM in such a case.
'num_post_dep_syncobjs' still has to be set to 0 before the test in order
to have it initialized if 'amdgpu_cs_parser_fini()' is called to free
resources.
The calling graph would be, in such a case!
failure in amdgpu_cs_process_syncobj_out_dep()
---> error code returned by amdgpu_cs_dependencies()
--> amdgpu_cs_parser_fini() is called
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
BANK_SELECT should always be FRAGMENT_SIZE + 3 due to 8-entry (2^3)
per cache line in L2 TLB for Vega10.
v2: agd: fix warning
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The function is called only once and doesn't do anything special.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use ttm_bo_mem_space instead of manually allocating GART space.
This allows us to evict BOs when there isn't enought GART space any more.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This isn't used since we don't map evicted BOs to GART any more.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
KIQ doesn't really use the GPU scheduler. The base
drivers generally use the KIQ ring directly rather than
submitting IBs. However, amdgpu_sched_hw_submission
(which defaults to 2) limits the number of outstanding
fences to 2. KFD uses the KIQ for TLB flushes and the
2 fence limit hurts performance when there are several KFD
processes running.
v2: move some expressions to one line
change KIQ sched_hw_submission to at least 16
v3: bump to 256
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move the asic specific code into the IP modules.
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Be more explicit and add comments explaining each case.
Also s/gart/GART/ in the parameter string as per Felix'
suggestion.
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set the shadow flag on the shadow and not the parent, always bind shadow BOs
during allocation instead of manually, use the reservation_object wrappers
to grab the lock.
This fixes a couple of issues with binding the shadow BOs as well as correctly
evicting them when memory becomes tight.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We need a larger gart for asics that do not support GPUVM on all
engines (e.g., MM) to make sure we have enough space for all
gtt buffers in physical mode. Change the default size based on
the asic type.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
immediately
For virtual display, it uses software timer to emulate the vsync interrupt,
it doesn't have high precision, so doesn't support disable vblank immediately.
BUG: SWDEV-129274
Signed-off-by: Emily Deng <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Correctly detect system memory mappings when using CPU and don't use
huge pages for them.
Avoid incorrectly translating a physical page table GPU address when
splitting a huge page while mapping system memory.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The function has far more in common with drm_syncobj_find than with
any in the get/put functions.
Signed-off-by: Jason Ekstrand <[email protected]>
Acked-by: Christian König <[email protected]> (v1)
Signed-off-by: Dave Airlie <[email protected]>
|
|
Remove a redundant identical return statement, it has no use.
Detected by CoverityScan, CID#1454586 ("Structurally dead code")
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Check memory allocation failure and return -ENOMEM in such a case.
'num_post_dep_syncobjs' still has to be set to 0 before the test in order
to have it initialized if 'amdgpu_cs_parser_fini()' is called to free
resources.
The calling graph would be, in such a case!
failure in amdgpu_cs_process_syncobj_out_dep()
---> error code returned by amdgpu_cs_dependencies()
--> amdgpu_cs_parser_fini() is called
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
BANK_SELECT should always be FRAGMENT_SIZE + 3 due to 8-entry (2^3)
per cache line in L2 TLB for Vega10.
v2: agd: fix warning
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The function is called only once and doesn't do anything special.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use ttm_bo_mem_space instead of manually allocating GART space.
This allows us to evict BOs when there isn't enought GART space any more.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This isn't used since we don't map evicted BOs to GART any more.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
KIQ doesn't really use the GPU scheduler. The base
drivers generally use the KIQ ring directly rather than
submitting IBs. However, amdgpu_sched_hw_submission
(which defaults to 2) limits the number of outstanding
fences to 2. KFD uses the KIQ for TLB flushes and the
2 fence limit hurts performance when there are several KFD
processes running.
v2: move some expressions to one line
change KIQ sched_hw_submission to at least 16
v3: bump to 256
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Move the asic specific code into the IP modules.
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Be more explicit and add comments explaining each case.
Also s/gart/GART/ in the parameter string as per Felix'
suggestion.
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set the shadow flag on the shadow and not the parent, always bind shadow BOs
during allocation instead of manually, use the reservation_object wrappers
to grab the lock.
This fixes a couple of issues with binding the shadow BOs as well as correctly
evicting them when memory becomes tight.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We need a larger gart for asics that do not support GPUVM on all
engines (e.g., MM) to make sure we have enough space for all
gtt buffers in physical mode. Change the default size based on
the asic type.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|