diff options
author | Dave Airlie <airlied@redhat.com> | 2022-11-30 11:45:43 +1000 |
---|---|---|
committer | Dave Airlie <airlied@redhat.com> | 2022-11-30 12:01:27 +1000 |
commit | 795bd9bb21c694ebcee38e8026ebeac4a63929bf (patch) | |
tree | 1df9c5af1e81c891b8ba9e62fa777f336870c015 /Documentation | |
parent | 2847b6681547022e5c8f1f6dc104c9256a120c0a (diff) | |
parent | 8c5577a5ccc632685e65168fc6890b72a779f93a (diff) |
Merge tag 'drm-accel-2022-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel into drm-next
This tag contains the patches that add the new compute acceleration
subsystem, which is part of the DRM subsystem.
The patches:
- Add a new directory at drivers/accel.
- Add a new major (261) for compute accelerators.
- Add a new DRM minor type for compute accelerators.
- Integrate the accel core code with DRM core code.
- Add documentation for the accel subsystem.
Signed-off-by: Dave Airlie <airlied@redhat.com>
some acks from the list (some are in the patch series):
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Sonal Santan <sonal.santan@amd.com>
Acked-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Tested-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
From: Oded Gabbay <ogabbay@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221122112222.GA352082@ogabbay-vm-u20.habana-labs.com
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/accel/index.rst | 17 | ||||
-rw-r--r-- | Documentation/accel/introduction.rst | 110 | ||||
-rw-r--r-- | Documentation/admin-guide/devices.txt | 5 | ||||
-rw-r--r-- | Documentation/subsystem-apis.rst | 1 |
4 files changed, 133 insertions, 0 deletions
diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst new file mode 100644 index 000000000000..2b43c9a7f67b --- /dev/null +++ b/Documentation/accel/index.rst @@ -0,0 +1,17 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================== +Compute Accelerators +==================== + +.. toctree:: + :maxdepth: 1 + + introduction + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/accel/introduction.rst b/Documentation/accel/introduction.rst new file mode 100644 index 000000000000..6f31af14b1fc --- /dev/null +++ b/Documentation/accel/introduction.rst @@ -0,0 +1,110 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============ +Introduction +============ + +The Linux compute accelerators subsystem is designed to expose compute +accelerators in a common way to user-space and provide a common set of +functionality. + +These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU. +Although these devices are typically designed to accelerate +Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer +is not limited to handling these types of accelerators. + +Typically, a compute accelerator will belong to one of the following +categories: + +- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA, + or an IP inside a SoC (e.g. laptop web camera). These devices + are typically configured using registers and can work with or without DMA. + +- Inference data-center - single/multi user devices in a large server. This + type of device can be stand-alone or an IP inside a SoC or a GPU. It will + have on-board DRAM (to hold the DL topology), DMA engines and + command submission queues (either kernel or user-space queues). + It might also have an MMU to manage multiple users and might also enable + virtualization (SR-IOV) to support multiple VMs on the same device. In + addition, these devices will usually have some tools, such as profiler and + debugger. + +- Training data-center - Similar to Inference data-center cards, but typically + have more computational power and memory b/w (e.g. HBM) and will likely have + a method of scaling-up/out, i.e. connecting to other training cards inside + the server or in other servers, respectively. + +All these devices typically have different runtime user-space software stacks, +that are tailored-made to their h/w. In addition, they will also probably +include a compiler to generate programs to their custom-made computational +engines. Typically, the common layer in user-space will be the DL frameworks, +such as PyTorch and TensorFlow. + +Sharing code with DRM +===================== + +Because this type of devices can be an IP inside GPUs or have similar +characteristics as those of GPUs, the accel subsystem will use the +DRM subsystem's code and functionality. i.e. the accel core code will +be part of the DRM subsystem and an accel device will be a new type of DRM +device. + +This will allow us to leverage the extensive DRM code-base and +collaborate with DRM developers that have experience with this type of +devices. In addition, new features that will be added for the accelerator +drivers can be of use to GPU drivers as well. + +Differentiation from GPUs +========================= + +Because we want to prevent the extensive user-space graphic software stack +from trying to use an accelerator as a GPU, the compute accelerators will be +differentiated from GPUs by using a new major number and new device char files. + +Furthermore, the drivers will be located in a separate place in the kernel +tree - drivers/accel/. + +The accelerator devices will be exposed to the user space with the dedicated +261 major number and will have the following convention: + +- device char files - /dev/accel/accel* +- sysfs - /sys/class/accel/accel*/ +- debugfs - /sys/kernel/debug/accel/accel*/ + +Getting Started +=============== + +First, read the DRM documentation at Documentation/gpu/index.rst. +Not only it will explain how to write a new DRM driver but it will also +contain all the information on how to contribute, the Code Of Conduct and +what is the coding style/documentation. All of that is the same for the +accel subsystem. + +Second, make sure the kernel is configured with CONFIG_DRM_ACCEL. + +To expose your device as an accelerator, two changes are needed to +be done in your driver (as opposed to a standard DRM driver): + +- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's + driver_features field. It is important to note that this driver feature is + mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want + to expose both graphics and compute device char files should be handled by + two drivers that are connected using the auxiliary bus framework. + +- Change the open callback in your driver fops structure to accel_open(). + Alternatively, your driver can use DEFINE_DRM_ACCEL_FOPS macro to easily + set the correct function operations pointers structure. + +External References +=================== + +email threads +------------- + +* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022) +* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022) + +Conference talks +---------------- + +* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022) diff --git a/Documentation/admin-guide/devices.txt b/Documentation/admin-guide/devices.txt index 9764d6edb189..06c525e01ea5 100644 --- a/Documentation/admin-guide/devices.txt +++ b/Documentation/admin-guide/devices.txt @@ -3080,6 +3080,11 @@ ... 255 = /dev/osd255 256th OSD Device + 261 char Compute Acceleration Devices + 0 = /dev/accel/accel0 First acceleration device + 1 = /dev/accel/accel1 Second acceleration device + ... + 384-511 char RESERVED FOR DYNAMIC ASSIGNMENT Character devices that request a dynamic allocation of major number will take numbers starting from 511 and downward, diff --git a/Documentation/subsystem-apis.rst b/Documentation/subsystem-apis.rst index af65004a80aa..b51f38527e14 100644 --- a/Documentation/subsystem-apis.rst +++ b/Documentation/subsystem-apis.rst @@ -43,6 +43,7 @@ needed). input/index hwmon/index gpu/index + accel/index security/index sound/index crypto/index |