docs: vfio: Add vfio device cdev description

This gives notes for userspace applications on device cdev usage.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-27-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This commit is contained in:
Yi Liu 2023-07-18 06:55:51 -07:00 committed by Alex Williamson
parent c1cce6d079
commit 094671300f

View file

@ -239,6 +239,137 @@ group and can access them as follows::
/* Gratuitous device reset and go... */
ioctl(device, VFIO_DEVICE_RESET);
IOMMUFD and vfio_iommu_type1
----------------------------
IOMMUFD is the new user API to manage I/O page tables from userspace.
It intends to be the portal of delivering advanced userspace DMA
features (nested translation [5]_, PASID [6]_, etc.) while also providing
a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
cases. Eventually the vfio_iommu_type1 driver, as well as the legacy
vfio container and group model is intended to be deprecated.
The IOMMUFD backwards compatibility interface can be enabled two ways.
In the first method, the kernel can be configured with
CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
transparently provides the entire infrastructure for the VFIO
container and IOMMU backend interfaces. The compatibility mode can
also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
simply symlink'd to /dev/iommu. Note that at the time of writing, the
compatibility mode is not entirely feature complete relative to
VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface. Therefore
it is not generally advisable at this time to switch from native VFIO
implementations to the IOMMUFD compatibility interfaces.
Long term, VFIO users should migrate to device access through the cdev
interface described below, and native access through the IOMMUFD
provided interfaces.
VFIO Device cdev
----------------
Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
in a VFIO group.
With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
by directly opening a character device /dev/vfio/devices/vfioX where
"X" is the number allocated uniquely by VFIO for registered devices.
cdev interface does not support noiommu devices, so user should use
the legacy group interface if noiommu is wanted.
The cdev only works with IOMMUFD. Both VFIO drivers and applications
must adapt to the new cdev security model which requires using
VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
actually use the device. Once BIND succeeds then a VFIO device can
be fully accessed by the user.
VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
Hence those modules can be fully compiled out in an environment
where no legacy VFIO application exists.
So far SPAPR does not support IOMMUFD yet. So it cannot support device
cdev either.
vfio device cdev access is still bound by IOMMU group semantics, ie. there
can be only one DMA owner for the group. Devices belonging to the same
group can not be bound to multiple iommufd_ctx or shared between native
kernel and vfio bus driver or other driver supporting the driver_managed_dma
flag. A violation of this ownership requirement will fail at the
VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access.
Device cdev Example
-------------------
Assume user wants to access PCI device 0000:6a:01.0::
$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
vfio0
This device is therefore represented as vfio0. The user can verify
its existence::
$ ls -l /dev/vfio/devices/vfio0
crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
511:0
$ ls -l /dev/char/511\:0
lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
Then provide the user with access to the device if unprivileged
operation is desired::
$ chown user:user /dev/vfio/devices/vfio0
Finally the user could get cdev fd by::
cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
An opened cdev_fd doesn't give the user any permission of accessing
the device except binding the cdev_fd to an iommufd. After that point
then the device is fully accessible including attaching it to an
IOMMUFD IOAS/HWPT to enable userspace DMA::
struct vfio_device_bind_iommufd bind = {
.argsz = sizeof(bind),
.flags = 0,
};
struct iommu_ioas_alloc alloc_data = {
.size = sizeof(alloc_data),
.flags = 0,
};
struct vfio_device_attach_iommufd_pt attach_data = {
.argsz = sizeof(attach_data),
.flags = 0,
};
struct iommu_ioas_map map = {
.size = sizeof(map),
.flags = IOMMU_IOAS_MAP_READABLE |
IOMMU_IOAS_MAP_WRITEABLE |
IOMMU_IOAS_MAP_FIXED_IOVA,
.__reserved = 0,
};
iommufd = open("/dev/iommu", O_RDWR);
bind.iommufd = iommufd;
ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
attach_data.pt_id = alloc_data.out_ioas_id;
ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
/* Allocate some space and setup a DMA mapping */
map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
map.iova = 0; /* 1MB starting at 0x0 from device view */
map.length = 1024 * 1024;
map.ioas_id = alloc_data.out_ioas_id;;
ioctl(iommufd, IOMMU_IOAS_MAP, &map);
/* Other device operations as stated in "VFIO Usage Example" */
VFIO User API
-------------------------------------------------------------------------------
@ -566,3 +697,11 @@ This implementation has some specifics:
\-0d.1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
.. [5] Nested translation is an IOMMU feature which supports two stage
address translations. This improves the address translation efficiency
in IOMMU virtualization.
.. [6] PASID stands for Process Address Space ID, introduced by PCI
Express. It is a prerequisite for Shared Virtual Addressing (SVA)
and Scalable I/O Virtualization (Scalable IOV).