Age | Commit message (Collapse) | Author | Files | Lines |
|
This is for MES to limit only one process for the user queues
Signed-off-by: Shaoyun Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
MES internal scratch data is useful for mes debug, it can only located
in VRAM, change the allocation type and increase size for mes 11
Signed-off-by: shaoyunl <[email protected]>
Acked-by: Feifei Xu <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
mes.
The currect code use the address "adev->mes.read_val_ptr" to
store the value read from register via mes.
So when multiple threads read register,
multiple threads have to share the one address,
and overwrite the value each other.
Assign an address by "amdgpu_device_wb_get" to store register value.
each thread will has an address to store register value.
Signed-off-by: chongli2 <[email protected]>
Reviewed-by: Emily Deng <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We need this prior to the firmware being loaded so fetch
from the header.
v2: fetch directly from the firmware
v3: store both fw versions
Reviewed-by: Srinivasan Shanmugam <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add me/pipe/queue parameters for queue reset input.
v2: fix build (Alex)
Acked-by: Vitaly Prosyak <[email protected]>
Signed-off-by: Jiadong Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For mes11 old firmware has issue to map legacy queue,
add a flag to switch mes to map legacy queue.
Fixes: f9d8c5c7855d ("drm/amdgpu/gfx: enable mes to map legacy queue support")
Reported-by: Andrew Worsley <[email protected]>
Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add implementation for MES Suspend and Resume APIs to unmap/map
all queues for GFX11. Support for GFX12 will be added when the
corresponding firmware support is in place.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add API for resetting user queues.
Acked-by: Vitaly Prosyak <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Configure two pipes with different hardware resources.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add multiple mes ring instances in mes structure to support
multiple mes pipes.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add API for resetting kernel queues.
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
MES firmware requires larger log buffer for gfx12. Allocate
proper buffer respectively for gfx11 and gfx12.
Signed-off-by: Michael Chen <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No longer used so remove it.
Reviewed-by: Mukul Joshi <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add mes mapping legacy queue framework support.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
We can't use a shared fence location because each transaction
should be considered independently.
Reviewed-by: Shaoyun.liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
support MES command SET_HW_RESOURCE1 in sriov
Signed-off-by: chongli2 <[email protected]>
Reviewed-by: Jingwen Chen <[email protected]>
Acked-by: Jingwen Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
From MES version 0x54, the log entry increased and require the log buffer
size to be increased. The 16k is maximum size agreed
Signed-off-by: shaoyunl <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
MES provides the driver a call to explicitly flush stale process memory
within the MES to avoid a race condition that results in a fatal
memory violation.
When SET_SHADER_DEBUGGER is called, the driver passes a memory address
that represents a process context address MES uses to keep track of
future per-process calls.
Normally, MES will purge its process context list when the last queue
has been removed. The driver, however, can call SET_SHADER_DEBUGGER
regardless of whether a queue has been added or not.
If SET_SHADER_DEBUGGER has been called with no queues as the last call
prior to process termination, the passed process context address will
still reside within MES.
On a new process call to SET_SHADER_DEBUGGER, the driver may end up
passing an identical process context address value (based on per-process
gpu memory address) to MES but is now pointing to a new allocated buffer
object during KFD process creation. Since the MES is unaware of this,
access of the passed address points to the stale object within MES and
triggers a fatal memory violation.
The solution is for KFD to explicitly flush the process context address
from MES on process termination.
Note that the flush call and the MES debugger calls use the same MES
interface but are separated as KFD calls to avoid conflicting with each
other.
Signed-off-by: Jonathan Kim <[email protected]>
Tested-by: Alice Wong <[email protected]>
Reviewed-by: Eric Huang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This is the generic SW part, prepare the event log buffer and dump it through debugfs
Signed-off-by: shaoyunl <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
MES allocates process level doorbells, but there is no userspace
client to consume it. It was only being used for the MES ring
tests (in kernel), and was written by kernel doorbell write.
The previous patch of this series has changed the MES ring test code to
use kernel level MES doorbells. This patch now cleans up the process level
doorbell allocation code which is not required.
Cc: Alex Deucher <[email protected]>
Cc: Christian Koenig <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Shashank Sharma <[email protected]>
Signed-off-by: Arvind Yadav <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This patch:
- Removes the existing doorbell management code, and its variables
from the doorbell_init function, it will be done in doorbell
manager now.
- uses the doorbell page created for MES kernel level needs (doorbells
for MES self tests)
- current MES code was allocating MES doorbells in MES process context,
but those were getting written using kernel doorbell calls. This patch
instead allocates a MES kernel doorbell for this (in add_hw_queue).
V2: Create an extra page of doorbells for MES during kernel doorbell
creation (Alex)
V4: Move MES doorbell size and page offset objects in this patch from
patch 6.
Cc: Alex Deucher <[email protected]>
Cc: Christian Koenig <[email protected]>
Reviewed-by: Christian Koenig <[email protected]>
Signed-off-by: Shashank Sharma <[email protected]>
Signed-off-by: Arvind Yadav <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
MES can concurrently schedule queues on the device that require
exclusive device access if marked exclusively_scheduled without the
requirement of GWS. Similar to the F32 HWS, MES will manage
quality of service for these queues.
Use this for cooperative groups since cooperative groups are device
occupancy limited.
Since some GFX11 devices can only be debugged with partial CUs, do not
allow the debugging of cooperative groups on these devices as the CU
occupancy limit will change on attach.
In addition, zero initialize the MES add queue submission vector for MES
initialization tests as we do not want these to be cooperative
dispatches.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
There are a couple of fixes required to enable gfx11 debugging.
First, ADD_QUEUE.trap_en is an inappropriate place to toggle
a per-process register so move it to SET_SHADER_DEBUGGER.trap_en.
When ADD_QUEUE.skip_process_ctx_clear is set, MES will prioritize
the SET_SHADER_DEBUGGER.trap_en setting.
Second, to preserve correct save/restore priviledged wave states
in coordination with the trap enablement setting, resume suspended
waves early in the disable call.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Due to a HW bug, waves in only half the shader arrays can enter trap.
When starting a debug session, relocate all waves to the first shader
array of each shader engine and mask off the 2nd shader array as
unavailable.
When ending a debug session, re-enable the 2nd shader array per
shader engine.
User CU masking per queue cannot be guaranteed to remain functional
if requested during debugging (e.g. user cu mask requests only 2nd shader
array as an available resource leading to zero HW resources available)
nor can runtime be alerted of any of these changes during execution.
Make user CU masking and debugging mutual exclusive with respect to
availability.
If the debugger tries to attach to a process with a user cu masked
queue, return the runtime status as enabled but busy.
If the debugger tries to attach and fails to reallocate queue waves to
the first shader array of each shader engine, return the runtime status
as enabled but with an error.
In addition, like any other mutli-process debug supported devices,
disable trap temporary setup per-process to avoid performance impact from
setup overhead.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Similar to the F32 HWS, the RS64 HWS for GFX11 now supports a multi-process
debug API.
The skip_process_ctx_clear ADD_QUEUE requirement is to prevent the MES
from clearing the process context when the first queue is added to the
scheduler in order to maintain debug mode settings during queue preemption
and restore. The MES clears the process context in this case due to an
unresolved FW caching bug during normal mode operations.
During debug mode, the KFD will hold a reference to the target process
so the process context should never go stale and MES can afford to skip
this requirement.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add an early_init phase to MES for fetching and validating microcode
from the filesystem.
If MES microcode is required but not available during early init, the
firmware framebuffer will have already been released and the screen will
freeze.
Move the request for MES microcode into the early_init phase
so that if it's not available, early_init will fail.
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
MES scheduler and kiq versions are stored in mes.sched_version and
mes.kiq_version, respectively, which are read from a register after
their queues are initialized. Remove mes.ucode_fw_version and
mes.data_fw_version which tried to read this versioning info from the
firmware headers (which don't contain this information).
Signed-off-by: Graham Sider <[email protected]>
Reviewed-by: Jack Xiao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update mes_v11_api_def.h add_queue API with is_aql_queue parameter. Also
re-use gds_size for the queue size (unused for KFD). MES requires the
queue size in order to compute the actual wptr offset within the queue
RB since it increases monotonically for AQL queues.
v2: Make is_aql_queue assign clearer
Signed-off-by: Graham Sider <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Ring aggregated doorbel to make unmapped queue scheduled in mes firmware.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Jack Xiao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Allocate and enable aggregated doorbell.
Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Jack Xiao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Need reserve buffers before unmap mes ctx bo va.
v2: fix removal of dma_resv_excl_fence() (Alex)
v3: fix dma_resv_usage (Alex)
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For some cases (accessing registers, unmap legacy queue), it needs
access mes in atomic context. Use spinlock to protect agaist mes
ring buffer race condition.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add common interface for mes misc op, including accessing register
interface.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
MES requires mc wptr address for usermode queues.
Export bo gart address for mc wptr address.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update MES API to support oversubscription without aggregated doorbell
for usermode queues.
v2: Change oversubscription_no_aggregated_en to is_kfd_process (align
with MES)
Signed-off-by: Graham Sider <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Jack Xiao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Starting with GFX11, MES requires wptr BOs to be GTT allocated/mapped to
GART for usermode queues in order to support oversubscription. In the
case that work is submitted to an unmapped queue, MES must have a GART
wptr address to determine whether the queue should be mapped.
This change is accompanied with changes in MES and is applicable for
MES_API_VERSION >= 2.
v3:
- Use amdgpu_vm_bo_lookup_mapping for wptr_bo mapping lookup
- Move wptr_bo refcount increment to amdgpu_amdkfd_map_gtt_bo_to_gart
- Remove list_del_init from amdgpu_amdkfd_map_gtt_bo_to_gart
- Cleanup/fix create_queue wptr_bo error handling
v4:
- Add MES version shift/mask defines to amdgpu_mes.h
- Change version check from MES_VERSION to MES_API_VERSION
- Add check in kfd_ioctl_create_queue before wptr bo pin/GART map to
ensure bo is a single page.
Signed-off-by: Graham Sider <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Store MES scheduler and MES KIQ version numbers in amdgpu_mes for GFX11.
Signed-off-by: Graham Sider <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Jack Xiao <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
For mes kiq has been taken over by mes sched, drv can't directly
use mes kiq to unmap queues. drv has to use mes sched api to
unmap legacy queue.
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Update the function signatures for process doorbell allocations
with MES enabled to make them more generic. KFD would need to
access these functions to allocate/free doorbells when MES is
enabled.
Signed-off-by: Mukul Joshi <[email protected]>
Acked-by: Oak Zeng <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Jack Xiao <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Need reserve VM buffers before update VM csa.
v2: rebase fixes
Signed-off-by: Jack Xiao <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add mes self test to verify its fundamental functionality by
running ring test and ib test of mes kernel queue.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add the helper functions to allocate/free context metadata.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Remove the mes ring and its resources.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Use ring as the front end for kernel queue submission.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add the helper function to get the corresponding ctx meta data offset.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Remove the MES queue from MES scheduling and free its resources.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Allocate related resources for the queue and add it to mes
for scheduling.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Implement resuming all gangs.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Implement suspending all gangs.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Free the mes gang and its resources.
Signed-off-by: Jack Xiao <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|