Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch moves invalidation into gart enable function from hw_init.
Because we would like align the sequence calling between init and resume.
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc-plugin prepwork from Kees Cook:
"Use designated initializers for mtk-vcodec, powerplay, amdgpu, and
sgi-xp. Use ERR_CAST() to avoid cross-structure cast in ocf2, ntfs,
and NFS.
Christoph Hellwig recommended that I send these fixes now, rather than
waiting for the v4.13 merge window. These are all initializer and cast
fixes needed for the future randstruct plugin that haven't been picked
up by the respective maintainers"
* tag 'gcc-plugins-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
mtk-vcodec: Use designated initializers
drm/amd/powerplay: Use designated initializers
drm/amdgpu: Use designated initializers
sgi-xp: Use designated initializers
ocfs2: Use ERR_CAST() to avoid cross-structure cast
ntfs: Use ERR_CAST() to avoid cross-structure cast
NFS: Use ERR_CAST() to avoid cross-structure cast
|
|
We are using PSP to resume firmware after suspend, and it is
resumed at where it got suspended, so we'd better save the
the context.
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
To simplify vce bo create
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
In review, Christian would like to keep the logic
inside amdgpu_vm.c with a cost of slightly slower.
The loop is still optimized out with this patch.
v2: remove the if statement. Now it is not slower.
Signed-off-by: Alex Xie <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Rex Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Rex Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use the vbios to look up the default frequencies
for socclk and dcefclk.
Signed-off-by: Rex Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
We accidentally return ERR_PTR(0) which is NULL. The caller is not
expecting that and it leads to an Oops.
Fixes: dd59239a9862 ("amdkfd: init aperture once per process")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Spreading the load across multiple SDMA engines can increase memory
transfer performance.
Signed-off-by: Andres Rodriguez <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Depending on usage patterns, the current LRU policy may create a
non-injective mapping between userspace ring ids and kernel rings.
This behaviour is undesired as apps that attempt to fill all HW blocks
would be unable to reach some of them.
This change forces the LRU policy to create bijective mappings only.
v2: compress ring_blacklist
v3: simplify amdgpu_ring_is_blacklisted() logic
Signed-off-by: Andres Rodriguez <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use an LRU policy to map usermode rings to HW compute queues.
Most compute clients use one queue, and usually the first queue
available. This results in poor pipe/queue work distribution when
multiple compute apps are running. In most cases pipe 0 queue 0 is
the only queue that gets used.
In order to better distribute work across multiple HW queues, we adopt
a policy to map the usermode ring ids to the LRU HW queue.
This fixes a large majority of multi-app compute workloads sharing the
same HW queue, even though 7 other queues are available.
v2: use ring->funcs->type instead of ring->hw_ip
v3: remove amdgpu_queue_mapper_funcs
v4: change ring_lru_list_lock to spinlock, grab only once in lru_get()
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add amdgpu_queue_mgr, a mechanism that allows disjointing usermode's
ring ids from the kernel's ring ids.
The queue manager maintains a per-file descriptor map of user ring ids
to amdgpu_ring pointers. Once a map is created it is permanent (this is
required to maintain FIFO execution guarantees for a context's ring).
Different queue map policies can be configured for each HW IP.
Currently all HW IPs use the identity mapper, i.e. kernel ring id is
equal to the user ring id.
The purpose of this mechanism is to distribute the load across multiple
queues more effectively for HW IPs that support multiple rings.
Userspace clients are unable to check whether a specific resource is in
use by a different client. Therefore, it is up to the kernel driver to
make the optimal choice.
v2: remove amdgpu_queue_mapper_funcs
v3: made amdgpu_queue_mgr per context instead of per-fd
v4: add context_put on error paths
v5: rebase and include new IPs UVD_ENC & VCN_*
v6: drop unused amdgpu_ring_is_valid_index (Alex)
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Tonga based asics may experience hangs when an HQD's EOP parameters
are modified.
Workaround this HW issue by avoiding writes to these registers for
tonga asics.
Based on the following ROCm commit:
2a0fb8 - drm/amdgpu: Synchronize KFD HQD load protocol with CP scheduler
From the ROCm git repository:
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver.git
CC: Jay Cornwall <[email protected]>
Suggested-by: Felix Kuehling <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The MQD structure matches the reg layout. Take advantage of this to
simplify HQD programming.
Note that the ACTIVE field still needs to be programmed last.
Suggested-by: Felix Kuehling <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of taking the first pipe and giving the rest to kfd, take the
first 2 queues of each pipe.
Effectively, amdgpu and amdkfd own the same number of queues. But
because the queues are spread over multiple pipes the hardware will be
able to better handle concurrent compute workloads.
amdgpu goes from 1 pipe to 4 pipes, i.e. from 1 compute threads to 4
amdkfd goes from 3 pipe to 4 pipes, i.e. from 3 compute threads to 4
v2: fix policy comment
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Instead of picking an arbitrary queue for KIQ, search for one according
to policy. The queue must be unused.
Also report the KIQ as an unavailable resource to KFD.
In testing I ran into KCQ initialization issues when using pipes 2/3 of
MEC2 for the KIQ. Therefore the policy disallows grabbing one of these.
v2: fix (ring.me + 1) to (ring.me -1) in amdgpu_amdkfd_device_init
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The assumption that we are only using the first pipe no longer holds.
Instead, calculate the queue_mask from the queue_bitmap.
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Pipes provide better concurrency than queues, therefore we want to make
sure that apps use queues from different pipes whenever possible.
Optimize for the trivial case where an app will consume rings in order,
therefore we don't want adjacent rings to belong to the same pipe.
Reviewed-by: Edward O'Callaghan <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This information is already available in adev.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update the KGD to KFD interface to allow sharing pipes with queue
granularity instead of pipe granularity.
This allows for more interesting pipe/queue splits.
v2: fix overflow check for res.queue_mask
v3: fix shift overflow when setting res.queue_mask
v4: fix comment in is_pipeline_enabled()
v5: clamp res.queue_mask to the first MEC only
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The current implementation is hardcoded to enable ME1/PIPE0 interrupts
only.
This patch allows amdgpu to enable interrupts for any pipe of ME1.
v2: added gfx9 support
v3: use soc15_grbm_select for gfx9
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Previously the queue/pipe split with kfd operated with pipe
granularity. This patch allows amdgpu to take ownership of an arbitrary
set of queues.
It also consolidates the last few magic numbers in the compute
initialization process into mec_init.
v2: support for gfx9
v3: renamed AMDGPU_MAX_QUEUES to AMDGPU_MAX_COMPUTE_QUEUES
v4: fix off-by-one in num_mec checks in *_compute_queue_acquire
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Make amdgpu the owner of all per-pipe state of the HQDs.
This change will allow us to split the queues between kfd and amdgpu
with a queue granularity instead of pipe granularity.
This patch fixes kfd allocating an HDP_EOP region for its 3 pipes which
goes unused.
v2: support for gfx9
v3: fix gfx7 HPD intitialization
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Rename straggler instances of r(adeon)dev to a(mdgpu)dev
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The return value from copy_form_user is 0 for the success case.
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use the same gfx_*_mqd_commit function for kfd and amdgpu codepaths.
This removes the last duplicates of this programming sequence.
v2: fix cp_hqd_pq_wptr value
Reviewed-by: Edward O'Callaghan <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The gfxv7 contains a slightly different version of cik_mqd called
bonaire_mqd. This can introduce subtle bugs if fixes are not applied in
both places.
Reviewed-by: Edward O'Callaghan <[email protected]>
Acked-by: Christian König <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Handle HQD deactivation timeouts instead of ignoring them.
Reviewed-by: Edward O'Callaghan <[email protected]>
Acked-by: Christian König <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The MQD programming sequence currently exists in 3 different places.
Refactor it to absorb all the duplicates.
The success path remains mostly identical except for a slightly
different order in the non-kiq case. This shouldn't matter if the HQD
is disabled.
The error handling paths have been updated to deal with the new code
structure.
v2: the non-kiq path for gfxv8 was dropped in the rebase
v3: split MEC_HPD_SIZE rename, dropped doorbell changes
Reviewed-by: Edward O'Callaghan <[email protected]>
Acked-by: Christian König <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Rename MEC_HPD_SIZE to GFXN_MEC_HPD_SIZE to clarify it is specific to a
gfx generation.
Signed-off-by: Andres Rodriguez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Rex Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts commit f8fdaa0e7b81698ba2ad8c2d20c7f9a44c75e0c6.
firmware add support for this feature, so still ctrl by vbios.
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Rex Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|