Age | Commit message (Collapse) | Author | Files | Lines |
|
We unmapped imported DMA-bufs when the GEM handle was dropped, not when the
hardware was done with the buffere.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
CC: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
|
|
They don't work 100% correctly at the moment.
Reviewed-and-Tested-by: Michel Dänzer <[email protected]>
Signed-off-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It's the only place it's used.
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
enable eviction of other per VM BOs during allocation and allows
reaping of deleted BOs during CS.
Reviewed-by: Chunming Zhou <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Roger He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of the global statistics use the per context bytes moved counter.
v2: rebased
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Tested-by: Michel Dänzer <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of specifying if sleeping should be interruptible.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Tested-by: Michel Dänzer <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Give moving a BO into place an operation context to work with.
v2: rebased
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
Tested-by: Michel Dänzer <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This can be triggered by userspace, e.g. trying to allocate too large a
BO, so it shouldn't log anything by default.
Callers need to handle failure anyway.
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Validates BO size against each requested domain's total memory.
v2:
Make GTT size check a MUST to allow fall back to GTT.
Rmove redundant NULL check.
Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We actually don't bind here, but rather allocate GART space if necessary.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Rename amdgpu_gtt_mgr_is_allocated() to amdgpu_gtt_mgr_has_gart_addr() and use
that instead.
v2: rename the function as well.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We always use the BO mem now.
v2: minor rebase
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We need to test if any domain fits, not all of them.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We always need to bind pinned BOs, not just when the caller requested the
address.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This allows us to specify multiple possible placements again.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When this VF stays in exclusive mode for long, other VFs will be
impacted.
The redundant messages causes exclusive mode timeout when they're
redirected. That is a normal use case for cloud service to redirect
guest log to virtual serial port.
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: pding <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The bo structure is freed up in case of an error, so we can't do any
accounting if that happens.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
CC: [email protected]
Signed-off-by: Alex Deucher <[email protected]>
|
|
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Just set the CPU access required flag when we pin it.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We adjusted the BO flags for USWC handling, but those never took effect
because the placement was passed in instead of generated inside this
function.
v2: better commit message
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of validating all page tables when one was evicted,
track which one needs a validation.
v2: simplify amdgpu_vm_ready as well
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]> (v1)
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Set the shadow flag on the shadow and not the parent, always bind shadow BOs
during allocation instead of manually, use the reservation_object wrappers
to grab the lock.
This fixes a couple of issues with binding the shadow BOs as well as correctly
evicting them when memory becomes tight.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Looks like a better place for this.
v2: use atomic64_t members instead
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It doesn't make much sense to count those numbers twice.
v2: use and atomic64_t instead
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Change "prefered" to "preferred"
Signed-off-by: Kent Russell <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The parameter init_value contains the value to which we initialized
VRAM bo when AMDGPU_GEM_CREATE_VRAM_CLEARED flag is set.
Signed-off-by: Yong Zhao <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Same as amdgpu_bo_create_kernel, but keeps the BO reserved.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Make allocating the new BO optional.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Don't keep around the same pointer twice.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Chunming Zhou <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When a BO is moved or destroyed it shouldn't be kmapped any more.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There is no need for page faults to force BOs into visible VRAM if it's
full, and the time it takes to do so is great enough to cause noticeable
stuttering. Add GTT as a possible placement so that if visible VRAM is
full, page faults move BOs to GTT instead of evicting other BOs from VRAM.
Suggested-by: Michel Dänzer <[email protected]>
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When a BO is moved to VRAM, clear AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED.
This allows it to potentially later move to invisible VRAM if the CPU
does not access it again.
Setting the CPU_ACCESS flag in amdgpu_bo_fault_reserve_notify() also means
that we can remove the loop to restrict lpfn to the end of visible VRAM,
because amdgpu_ttm_placement_init() will do it for us.
v3 [Michel Dänzer]
* Use AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED instead of a new flag
(Christian König)
* Clear flag in amdgpu_bo_move instead of amdgpu_move_ram_vram
(Christian)
* Explicitly mention amdgpu_bo_fault_reserve_notify in amdgpu_bo_move
* Also clear flag in amdgpu_bo_create_restricted
Suggested-by: Michel Dänzer <[email protected]>
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The BO move throttling code is designed to allow VRAM to fill quickly if it
is relatively empty. However, this does not take into account situations
where the visible VRAM is smaller than total VRAM, and total VRAM may not
be close to full but the visible VRAM segment is under pressure. In such
situations, visible VRAM would experience unrestricted swapping and
performance would drop.
Add a separate counter specifically for moves involving visible VRAM, and
check it before moving BOs there.
v2: Only perform calculations for separate counter if visible VRAM is
smaller than total VRAM. (Michel Dänzer)
v3: [Michel Dänzer]
* Use BO's location rather than the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
flag to determine whether to account a move for visible VRAM in most
cases.
* Use a single
if (adev->mc.visible_vram_size < adev->mc.real_vram_size) {
block in amdgpu_cs_get_threshold_for_moves.
Fixes: 95844d20ae02 (drm/amdgpu: throttle buffer migrations at CS using a fixed MBps limit (v2))
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
That is already fixed.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
This allows us to flush the system VM here.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
|
|
v2: bump the DRM version
Signed-off-by: Marek Olšák <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Reviewed-by: Chunming Zhou <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Roger.He <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
1. The signal interrupt can affect the expected behaviour.
2. There is no good mechanism to handle the corresponding error.
Signed-off-by: Alex Xie <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
(v2)
Migration to VRAM will break the sharing, resulting in rendering on the exporting GPU never becoming
visible on the importing GPU.
v2: Don't pin BOs to GTT. Instead, refuse to migrate them out of GTT.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Christopher James Halse Rogers <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This avoids merging them together on page fault.
v2: squash in 64-bit division fix
Signed-off-by: Christian König <[email protected]>
Acked-by: Michel Dänzer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Implement AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS using TTM_PL_FLAG_CONTIGUOUS
instead of a placement limit. That allows us to better handle CPU
accessible placements.
v2: prevent virtual BO start address from overflowing
Signed-off-by: Christian König <[email protected]>
Acked-by: Michel Dänzer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When VRAM pressue and trigger huge evictions there is performance drop,
this patch fix it.
Signed-off-by: Roger.He <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Marek Olšák <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
By using ttm_bo_init_reserved instead of the manual initialization of
the reservation object, the reservation lock will be properly unlocked
and destroyed when the TTM BO initialization fails.
Actual deadlocks caused by the missing unlock should have been fixed
by "drm/ttm: never add BO that failed to validate to the LRU list",
superseding the flawed fix in commit 38fc4856ad98 ("drm/amdgpu: fix
a potential deadlock in amdgpu_bo_create_restricted()").
This change fixes remaining recursive locking errors that can be seen
with lock debugging enabled, and avoids the error of freeing a locked
mutex.
As an additional minor bonus, buffers created with resv == NULL and
the AMDGPU_GEM_CREATE_VRAM_CLEARED flag are now only added to the
global LRU list after the fill commands have been issued.
v2: use amdgpu_bo_unreserve instead of ttm_bo_unreserve
Fixes: 12a852219583 ("drm/amdgpu: improve AMDGPU_GEM_CREATE_VRAM_CLEARED handling (v2)")
Signed-off-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts commit 38fc4856ad98f230bc91da0421dec69e4aee40f8, which
introduces a use-after-free.
The underlying bug should be properly fixed with "drm/ttm: never add BO
that failed to validate to the LRU list".
Cc: zhoucm1 <[email protected]>
Signed-off-by: Nicolai Hähnle <[email protected]>
Tested-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Like ttm_bo_validate(), ttm_bo_init() might need to move BO and
the number of bytes moved by TTM should be reported. This can help
the throttle buffer migration mechanism to make a better decision.
v2: fix computation
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When ttm_bo_init() fails, the reservation mutex should be unlocked.
In debug build, the kernel reported "possible recursive locking
detected" in this codepath. For debugging purposes, I also added
a "WARN_ON(ww_mutex_is_locked())" when ttm_bo_init() fails and the
mutex was locked as expected.
This should fix (random) GPU hangs. The easy way to reproduce the
issue is to change the "Super Sampling" option from 1.0 to 2.0 in
Hitman. It will create a huge buffer, evict a bunch of buffers
(around ~5k) and deadlock.
This regression has been introduced pretty recently.
v2: only release the mutex if resv is NULL
Fixes: 12a852219583 ("drm/amdgpu: improve AMDGPU_GEM_CREATE_VRAM_CLEARED handling (v2)")
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
My randconfig tests on linux-next showed a newly introduced warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c: In function 'amdgpu_bo_create_restricted':
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:377:2: error: #warning Please enable CONFIG_MTRR and CONFIG_X86_PAT for better performance thanks to write-combining [-Werror=cpp]
Generally speaking, warnings about bad kernel configuration are not particularly
helpful. We could enforce the selection of X86_PAT through Kconfig, so the driver
cannot even be used unless it is enabled, or we could just rely on the runtime
warning that is also there.
In this version, I'm making the warning conditional on CONFIG_COMPILE_TEST, which
shuts it up for me, but not people that may actually want to run the kernel
as a compromize.
Fixes: a2e2f29970aa ("drm/amdgpu: Bring bo creation in line with radeon driver (v2)")
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add the bo creation changes that have been done to the radeon driver in
recent times, e.g. disable GTT WC on 32 bit because it is broken there,
and also disable it generally (and print a warning message) when
CONFIG_X86_PAT is not set.
v2: agd: fix warning in defined(CONFIG_X86) && !defined(CONFIG_X86_PAT)
case
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Nils Holland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|