| Age | Commit message (Collapse) | Author | Files | Lines |
|
Document how to use SMI system management interface to enable and
receive SVM events. Document SVM event triggers.
Define SVM events message string format macro that could be used by user
mode for sscanf to parse the event. Add it to uAPI header file to make
it obvious that is changing uAPI in future.
No functional changes.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Certain GPUs have better copy performance over xGMI on specific
SDMA engines depending on the source and destination GPU.
Allow users to create SDMA queues on these recommended engines.
Close to 2x overall performance has been observed with this
optimization.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
GFX12 debugging requires setting up precise ALU operation for catching
ALU exceptions.
Signed-off-by: Jonathan Kim <[email protected]>
Tested-by: Lancelot Six <[email protected]>
Reviewed-by: Eric Huang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Bump the kfd ioctl minor version to delcare the contiguous VRAM
allocation flag support.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
RDMA device with limited scatter-gather ability requires contiguous VRAM
buffer allocation for RDMA peer direct support.
Add a new KFD alloc memory flag and store as bo alloc flag
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA
peerdirect access, this will set TTM_PL_FLAG_CONTIFUOUS flag, and ask
VRAM buddy allocator to get contiguous VRAM.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Due to a CP interrupt bug, bad packet garbage exception codes are raised.
Do a range check so that the debugger and runtime do not receive garbage
codes.
Update the user api to guard exception code type checking as well.
Signed-off-by: Jonathan Kim <[email protected]>
Tested-by: Jesse Zhang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This is not strictly a change in the IOCTL API. This version bump is meant
to indicate to user mode the presence of a number of changes and fixes
that enable the management of VA mappings in compute VMs using the GEM_VA
ioctl for DMABufs exported from KFD.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Xiaogang Chen<[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
These flags (for GEM and SVM allocations) allocate
memory that allows for system-scope atomic semantics.
On GFX943 these flags cause caches to be avoided on
non-local memory.
On all other ASICs they are identical in functionality to the
equivalent COHERENT flags.
Corresponding Thunk patch is at
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88
Reviewed-by: David Yat Sin <[email protected]>
Signed-off-by: David Francis <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Bump the minor version to declare event age tracking feature is now
available.
In kernel amdgpu driver, kfd_wait_on_events is used to support user
space signal event wait function. For multiple threads waiting on same
event scenery, race condition could occur since some threads after
checking signal condition, before calling kfd_wait_on_events, the
event interrupt could be fired and wake up other thread which are
sleeping on this event. Then those threads could fall into sleep
without waking up again. Adding event age tracking in both kernel and
user mode, will help avoiding this race condition.
Proposed ROCT-Thunk-Interface:
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/commit/efdbf6cfbc026bd68ac3c35d00dacf84370eb81e
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/commit/1820ae0a2db85b6f584611dc0cde1a00e7c22915
Proposed ROCR-Runtime:
https://github.com/RadeonOpenCompute/ROCR-Runtime/compare/master...zhums:ROCR-Runtime:new_event_wait_review
https://github.com/RadeonOpenCompute/ROCR-Runtime/commit/e1f5bdb88eb882ac798aeca2c00ea3fbb2dba459
https://github.com/RadeonOpenCompute/ROCR-Runtime/commit/7d26afd14107b5c2a754c1a3f415d89f3aabb503
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add event age tracking
Signed-off-by: James Zhu <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Bump the minor version to declare debugging capability is now
available.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Introduce the GPU debug operations interface.
For ROCm-GDB to extend the GNU Debugger's ability to inspect the AMD GPU
instruction set, provide the necessary interface to allow the debugger
to HW debug-mode set and query exceptions per HSA queue, process or
device.
The runtime_enable interface coordinates exception handling with the
HSA runtime.
Usage is available in the kern docs at uapi/linux/kfd_ioctl.h.
Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Was leftover from GC 9.4.3 bring up and is currently
unused. Drop it for now.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Recently introduced commit "drm/amdgpu: Set cache coherency
for GC 9.4.3" did not update the settings applicable for svm ranges.
Add the coherence settings for svm ranges for GFX IP 9.4.3.
Reviewed-by: Amber Lin <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Exports a DMA buf fd of a given KFD buffer handle. This is intended for
being able to import KFD BOs into GEM contexts to leverage the
amdgpu_bo_va API for more flexible virtual address mappings. It will
also be used for the new upstreamable RDMA solution coming to UCX and
RCCL.
The corresponding user mode change (Thunk API and kfdtest) is here:
https://github.com/fxkamd/ROCT-Thunk-Interface/commits/fxkamd/dmabuf
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Xiaogang Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
To expose unified memory for ctx save/resotre area feature
availablity to libhsakmt.
Proposed userspace:
https://patchwork.freedesktop.org/series/106218/
Signed-off-by: Eric Huang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
It is to add new option for always keeping gpu mapping.
Signed-off-by: Eric Huang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Indicate SMI profiling events available.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Define new system management interface event IDs for migration, GPU
recoverable page fault, user queues eviction, restore and unmap from
GPU events and corresponding event triggers, those will be implemented
in the following patches.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add a new KFD ioctl to return the largest possible memory size that
can be allocated as a buffer object using
kfd_ioctl_alloc_memory_of_gpu. It attempts to use exactly the same
accept/reject criteria as that function so that allocating a new
buffer object of the size returned by this new ioctl is guaranteed to
succeed, barring races with other allocating tasks.
This IOCTL will be used by libhsakmt:
https://www.mail-archive.com/[email protected]/msg75743.html
Signed-off-by: Daniel Phillips <[email protected]>
Signed-off-by: David Yat Sin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Export dmabuf handles for GTT BOs so that their contents can be accessed
using SDMA during checkpoint/restore.
v2: Squash in fix from David to set dmabuf handle to invalid for BOs
that cannot be accessed using SDMA during checkpoint/restore.
Signed-off-by: David Yat Sin <[email protected]>
Reviewed-by : Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
To remove duplicate code, unify event message format and simplify new
event add in the following patches.
Use KFD_SMI_EVENT_MSG_SIZE to define msg size, the same size will be
used in user space to alloc the msg receive buffer.
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Reference:
https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
CC: Changcheng Deng <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Philip Yang <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Cleanup the kfd code by removing the unused old debugger
implementation.
The address watch was only ever implemented in the upstream
driver for GFXv7 (Kaveri). The user mode tools runtime using
this API was never open-sourced. Work on the old debugger
prototype that used this API has been discontinued years ago.
Only a small piece of resetting wavefronts is kept and
is moved to kfd_device_queue_manager.c.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
- Change KFD minor version to 7 for CRIU
Proposed userspace changes:
https://github.com/RadeonOpenCompute/criu
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Checkpoint-Restore in userspace (CRIU) is a powerful tool that can
snapshot a running process and later restore it on same or a remote
machine but expects the processes that have a device file (e.g. GPU)
associated with them, provide necessary driver support to assist CRIU
and its extensible plugin interface. Thus, In order to support the
Checkpoint-Restore of any ROCm process, the AMD Radeon Open Compute
Kernel driver, needs to provide a set of new APIs that provide
necessary VRAM metadata and its contents to a userspace component
(CRIU plugin) that can store it in form of image files.
This introduces some new ioctls which will be used to checkpoint-Restore
any KFD bound user process. KFD only allows ioctl calls from the same
process that opened the KFD file descriptor. Since these ioctls are
expected to be called from a KFD criu plugin which has elevated ptrace
attached privileges and CAP_CHECKPOINT_RESTORE capabilities attached with
the file descriptors so modify KFD to allow such calls.
(API redesigned by David Yat Sin)
Suggested-by: Felix Kuehling <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: David Yat Sin <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Currently the SVM get_attr call allows querying, which flags are set
in the entire address range. Add the opposite query, which flags are
clear in the entire address range. Both queries can be combined in a
single get_attr call, which allows answering questions such as, "is
this address range coherent, non-coherent, or a mix of both"?
Proposed userspace for UAPI:
https://github.com/RadeonOpenCompute/ROCR-Runtime/tree/memory_model_queries
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Philip Yand <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Xnack retries are used for page fault recovery. Some AMD chip
families support continuously retry while page table entries are invalid.
The driver must handle the page fault interrupt and fill in a valid entry
for the GPU to continue.
This ioctl allows to enable/disable XNACK retries per KFD process.
Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add svm (shared virtual memory) ioctl data structure and API definition.
The svm ioctl API is designed to be extensible in the future. All
operations are provided by a single IOCTL to preserve ioctl number
space. The arguments structure ends with a variable size array of
attributes that can be used to set or get one or multiple attributes.
Signed-off-by: Philip Yang <[email protected]>
Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Indicate the availability reliable SRAM EDC state in the new bit
in the device properties.
Proposed userspace changes:
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/commit/7cdd63475c36bb9f49bb960f90f9a8cdb7e80a21
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Kent Russell <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The macro is for memory mapped by GPU as uncached.
Signed-off-by: Eric Huang <[email protected]>
Reviewed-by: Oak Zeng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Replace spaces with Tabs to fix indentation in kfd_smi_event
enum.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add support for reporting GPU reset events through SMI. KFD
would report both pre and post GPU reset events.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Event bitmask is a 64-bit mask with only 1 bit set. Sending this
event bitmask in KFD SMI event message is both wasteful of memory
and potentially limiting to only 64 events. Instead send event
index in SMI event message.
Please note this change does not break the ABI for the two event
types defined so far. The new index is identical to the mask used
before.
Signed-off-by: Mukul Joshi <[email protected]>
Suggested-by: Felix Kuehling <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add support for reporting thermal throttling events through SMI.
Also, add a counter to count the number of throttling interrupts
observed and report the count in the SMI event message.
Signed-off-by: Mukul Joshi <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Bump KFD ioctl after adding SMI events support
Signed-off-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When the compute is malfunctioning or performance drops, the system admin
will use SMI (System Management Interface) tool to monitor/diagnostic what
went wrong. This patch provides an event watch interface for the user
space to register devices and subscribe events they are interested. After
registered, the user can use annoymous file descriptor's poll function
with wait-time specified and wait for events to happen. Once an event
happens, the user can use read() to retrieve information related to the
event.
VM fault event is done in this patch.
v2: - remove UNREGISTER and add event ENABLE/DISABLE
- correct kfifo usage
- move event message API to kfd_ioctl.h
v3: send the event msg in text than in binary
v4: support multiple clients
v5: move events enablement from ioctl to fd write
v6: sparse fix
Signed-off-by: Amber Lin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Corrected two function names. Added a missing space.
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Kent Russell <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add a new kfd ioctl to allocate queue GWS. Queue
GWS is released on queue destroy.
v2: re-introduce this API with the following fixes squashed in:
- drm/amdkfd: fix null pointer dereference on dev
- drm/amdkfd: Return proper error code for gws alloc API
- drm/amdkfd: Remove GPU ID in GWS queue creation
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts commit 1a058c3376765ee31d65e28cbbb9d4ff15120056.
This interface is still in too much flux. Revert until
it's sorted out.
Acked-by: Oak Zeng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Add a new kfd ioctl to allocate queue GWS. Queue
GWS is released on queue destroy.
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Existing QUEUE_TYPE_SDMA means PCIe optimized SDMA queues.
Introduce a new QUEUE_TYPE_SDMA_XGMI, which is optimized
for non-PCIe transfer such as XGMI.
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Introduce a new memory type (KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP) and
expose mmio page of HDP registers to user space through this new
memory type.
v2: moved remapped hdp regs to adev struct
v3: rename the new memory type to ALLOC_MEM_FLAGS_MMIO_REMAP
v4: use more generic function name
v5: Fail remapped mmio allocation for asics before gfx9
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Remap HDP_MEM_COHERENCY_FLUSH_CNTL and HDP_REG_COHERENCY_FLUSH_CNTL
to an empty page in mmio space. We will later map this page to process
space so application can flush hdp. This can't be done properly at
those registers' original location because it will expose more than
desired registers to process space.
v2: Use explicit register hole location
v3: Moved remapped hdp registers into adev struct
v4: Use more generic name for remapped page
Expose register offset in kfd_ioctl.h
v5: Move hdp register remap function to nbio ip function
v6: Fixed operator precedence issue and other bugs
Signed-off-by: Oak Zeng <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
RAS ECC event will combine with GPU reset event, due to
ECC interrupts are caused by uncorrectable error that triggers
GPU reset.
v2: Fix misleading-indentation warning
v3: fix build with CONFIG_HSA_AMD disabled
Signed-off-by: Eric Huang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This is used for interoperability between ROCm compute and graphics
APIs. It allows importing graphics driver BOs into the ROCm SVM
address space for zero-copy GPU access.
The API is split into two steps (query and import) to allow user mode
to manage the virtual address space allocation for the imported buffer.
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Consistently use types provided by <linux/types.h> via <drm/drm.h>
to fix struct kfd_ioctl_get_queue_wave_state_args userspace compilation errors.
Fixes: 5df099e8bc83f ("drm/amdkfd: Add wavefront context save state retrieval ioctl")
Signed-off-by: Dmitry V. Levin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Consistently use types provided by <linux/types.h> via <drm/drm.h>
to fix the following linux/kfd_ioctl.h userspace compilation errors:
/usr/include/linux/kfd_ioctl.h:250:2: error: unknown type name 'uint32_t'
uint32_t reset_type;
/usr/include/linux/kfd_ioctl.h:251:2: error: unknown type name 'uint32_t'
uint32_t reset_cause;
/usr/include/linux/kfd_ioctl.h:252:2: error: unknown type name 'uint32_t'
uint32_t memory_lost;
/usr/include/linux/kfd_ioctl.h:253:2: error: unknown type name 'uint32_t'
uint32_t gpu_id;
Fixes: 0c119abad7f0d ("drm/amd: Add kfd ioctl defines for hw_exception event")
Cc: <[email protected]> # v4.19
Signed-off-by: Dmitry V. Levin <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Wavefront context save data is of interest to userspace clients for
debugging static wavefront state. The MQD contains two parameters
required to parse the control stack and the control stack itself
is kept in the MQD from gfx9 onwards.
Add an ioctl to fetch the context save area and control stack offsets
and to copy the control stack to a userspace address if it is kept in
the MQD.
Signed-off-by: Jay Cornwall <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Signed-off-by: Felix Kuehling <[email protected]>
Acked-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|