blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-12-04	vfio/migration: Add debugfs to live migration driver	Longfang Liu	1	-0/+1
	There are multiple devices, software and operational steps involved in the process of live migration. An error occurred on any node may cause the live migration operation to fail. This complex process makes it very difficult to locate and analyze the cause when the function fails. In order to quickly locate the cause of the problem when the live migration fails, I added a set of debugfs to the vfio live migration driver. +-------------------------------------------+ \| \| \| \| \| QEMU \| \| \| \| \| +---+----------------------------+----------+ \| ^ \| ^ \| \| \| \| \| \| \| \| v \| v \| +---------+--+ +---------+--+ \|src vfio_dev\| \|dst vfio_dev\| +--+---------+ +--+---------+ \| ^ \| ^ \| \| \| \| v \| \| \| +-----------+----+ +-----------+----+ \|src dev debugfs \| \|dst dev debugfs \| +----------------+ +----------------+ The entire debugfs directory will be based on the definition of the CONFIG_DEBUG_FS macro. If this macro is not enabled, the interfaces in vfio.h will be empty definitions, and the creation and initialization of the debugfs directory will not be executed. vfio \| +---<dev_name1> \| +---migration \| +--state \| +---<dev_name2> +---migration +--state debugfs will create a public root directory "vfio" file. then create a dev_name() file for each live migration device. First, create a unified state acquisition file of "migration" in this device directory. Then, create a public live migration state lookup file "state". Signed-off-by: Longfang Liu <[email protected]> Reviewed-by: Cédric Le Goater <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-09-28	vfio: use __aligned_u64 in struct vfio_device_ioeventfd	Stefan Hajnoczi	1	-2/+3
	The memory layout of struct vfio_device_ioeventfd is architecture-dependent due to a u64 field and a struct size that is not a multiple of 8 bytes: - On x86_64 the struct size is padded to a multiple of 8 bytes. - On x32 the struct size is only a multiple of 4 bytes, not 8. - Other architectures may vary. Use __aligned_u64 to make memory layout consistent. This reduces the chance that 32-bit userspace on a 64-bit kernel breakage. This patch increases the struct size on x32 but this is safe because of the struct's argsz field. The kernel may grow the struct as long as it still supports smaller argsz values from userspace (e.g. applications compiled against older kernel headers). The code that uses struct vfio_device_ioeventfd already works correctly when the struct size grows, so only the struct definition needs to be changed. Suggested-by: Jason Gunthorpe <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Stefan Hajnoczi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-09-28	vfio: use __aligned_u64 in struct vfio_device_gfx_plane_info	Stefan Hajnoczi	1	-1/+2
	The memory layout of struct vfio_device_gfx_plane_info is architecture-dependent due to a u64 field and a struct size that is not a multiple of 8 bytes: - On x86_64 the struct size is padded to a multiple of 8 bytes. - On x32 the struct size is only a multiple of 4 bytes, not 8. - Other architectures may vary. Use __aligned_u64 to make memory layout consistent. This reduces the chance of 32-bit userspace on a 64-bit kernel breakage. This patch increases the struct size on x32 but this is safe because of the struct's argsz field. The kernel may grow the struct as long as it still supports smaller argsz values from userspace (e.g. applications compiled against older kernel headers). Suggested-by: Jason Gunthorpe <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Stefan Hajnoczi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-09-28	vfio: trivially use __aligned_u64 for ioctl structs	Stefan Hajnoczi	1	-9/+9
	u64 alignment behaves differently depending on the architecture and so <uapi/linux/types.h> offers __aligned_u64 to achieve consistent behavior in kernel<->userspace ABIs. There are structs in <uapi/linux/vfio.h> that can trivially be updated to __aligned_u64 because the struct sizes are multiples of 8 bytes. There is no change in memory layout on any CPU architecture and therefore this change is safe. The commits that follow this one handle the trickier cases where explanation about ABI breakage is necessary. Suggested-by: Jason Gunthorpe <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Stefan Hajnoczi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-09-28	vfio: add bus master feature to device feature ioctl	Nipun Gupta	1	-0/+21
	add bus mastering control to VFIO_DEVICE_FEATURE IOCTL. The VFIO user can use this feature to enable or disable the Bus Mastering of a device bound to VFIO. Co-developed-by: Shubham Rohila <[email protected]> Signed-off-by: Shubham Rohila <[email protected]> Signed-off-by: Nipun Gupta <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-08-30	Merge tag 'for-linus-iommufd' of ↵	Linus Torvalds	1	-0/+6
	git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "On top of the vfio updates is built some new iommufd functionality: - IOMMU_HWPT_ALLOC allows userspace to directly create the low level IO Page table objects and affiliate them with IOAS objects that hold the translation mapping. This is the basic functionality for the normal IOMMU_DOMAIN_PAGING domains. - VFIO_DEVICE_ATTACH_IOMMUFD_PT can be used to replace the current translation. This is wired up to through all the layers down to the driver so the driver has the ability to implement a hitless replacement. This is necessary to fully support guest behaviors when emulating HW (eg guest atomic change of translation) - IOMMU_GET_HW_INFO returns information about the IOMMU driver HW that owns a VFIO device. This includes support for the Intel iommu, and patches have been posted for all the other server IOMMU. Along the way are a number of internal items: - New iommufd kernel APIs: iommufd_ctx_has_group(), iommufd_device_to_ictx(), iommufd_device_to_id(), iommufd_access_detach(), iommufd_ctx_from_fd(), iommufd_device_replace() - iommufd now internally tracks iommu_groups as it needs some per-group data - Reorganize how the internal hwpt allocation flows to have more robust locking - Improve the access interfaces to support detach and replace of an IOAS from an access - New selftests and a rework of how the selftests creates a mock iommu driver to be more like a real iommu driver" Link: https://lore.kernel.org/lkml/ZO%[email protected]/ * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (34 commits) iommufd/selftest: Don't leak the platform device memory when unloading the module iommu/vt-d: Implement hw_info for iommu capability query iommufd/selftest: Add coverage for IOMMU_GET_HW_INFO ioctl iommufd: Add IOMMU_GET_HW_INFO iommu: Add new iommu op to get iommu hardware information iommu: Move dev_iommu_ops() to private header iommufd: Remove iommufd_ref_to_users() iommufd/selftest: Make the mock iommu driver into a real driver vfio: Support IO page table replacement iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage iommufd: Add iommufd_access_replace() API iommufd: Use iommufd_access_change_ioas in iommufd_access_destroy_object iommufd: Add iommufd_access_change_ioas(_id) helpers iommufd: Allow passing in iopt_access_list_id to iopt_remove_access() vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages() iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC iommufd/selftest: Return the real idev id from selftest mock_domain iommufd: Add IOMMU_HWPT_ALLOC iommufd/selftest: Test iommufd_device_replace() iommufd: Make destroy_rwsem use a lock class per object type ...
2023-08-17	vfio: align capability structures	Stefan Hajnoczi	1	-0/+2
	The VFIO_DEVICE_GET_INFO, VFIO_DEVICE_GET_REGION_INFO, and VFIO_IOMMU_GET_INFO ioctls fill in an info struct followed by capability structs: +------+---------+---------+-----+ \| info \| caps[0] \| caps[1] \| ... \| +------+---------+---------+-----+ Both the info and capability struct sizes are not always multiples of sizeof(u64), leaving u64 fields in later capability structs misaligned. Userspace applications currently need to handle misalignment manually in order to support CPU architectures and programming languages with strict alignment requirements. Make life easier for userspace by ensuring alignment in the kernel. This is done by padding info struct definitions and by copying out zeroes after capability structs that are not aligned. The new layout is as follows: +------+---------+---+---------+-----+ \| info \| caps[0] \| 0 \| caps[1] \| ... \| +------+---------+---+---------+-----+ In this example caps[0] has a size that is not multiples of sizeof(u64), so zero padding is added to align the subsequent structure. Adding zero padding between structs does not break the uapi. The memory layout is specified by the info.cap_offset and caps[i].next fields filled in by the kernel. Applications use these field values to locate structs and are therefore unaffected by the addition of zero padding. Note that code that copies out info structs with padding is updated to always zero the struct and copy out as many bytes as userspace requested. This makes the code shorter and avoids potential information leaks by ensuring padding is initialized. Originally-by: Alex Williamson <[email protected]> Signed-off-by: Stefan Hajnoczi <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-07-28	vfio: Support IO page table replacement	Nicolin Chen	1	-0/+6
	Now both the physical path and the emulated path can support an IO page table replacement. Call iommufd_device_replace/iommufd_access_replace(), when vdev->iommufd_attached is true. Also update the VFIO_DEVICE_ATTACH_IOMMUFD_PT kdoc in the uAPI header. Link: https://lore.kernel.org/r/b5f01956ff161f76aa52c95b0fa1ad6eaca95c4a.1690523699.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Alex Williamson <[email protected]> Signed-off-by: Nicolin Chen <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-07-25	vfio: Add VFIO_DEVICE_[AT\|DE]TACH_IOMMUFD_PT	Yi Liu	1	-0/+44
	This adds ioctl for userspace to attach device cdev fd to and detach from IOAS/hw_pagetable managed by iommufd. VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS or hw_pagetable managed by iommufd. Attach can be undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close. VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached IOAS or hw_pagetable managed by iommufd. Reviewed-by: Jason Gunthorpe <[email protected]> Tested-by: Nicolin Chen <[email protected]> Tested-by: Matthew Rosato <[email protected]> Tested-by: Yanting Jiang <[email protected]> Tested-by: Shameer Kolothum <[email protected]> Tested-by: Terrence Xu <[email protected]> Tested-by: Zhenzhong Duan <[email protected]> Signed-off-by: Yi Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-07-25	vfio: Add VFIO_DEVICE_BIND_IOMMUFD	Yi Liu	1	-0/+27
	This adds ioctl for userspace to bind device cdev fd to iommufd. VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA control provided by the iommufd. open_device op is called after bind_iommufd op. Tested-by: Nicolin Chen <[email protected]> Tested-by: Matthew Rosato <[email protected]> Tested-by: Yanting Jiang <[email protected]> Tested-by: Shameer Kolothum <[email protected]> Tested-by: Terrence Xu <[email protected]> Tested-by: Zhenzhong Duan <[email protected]> Signed-off-by: Yi Liu <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-07-25	vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET	Yi Liu	1	-0/+21
	This is the way user to invoke hot-reset for the devices opened by cdev interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing hot-reset for cdev devices. Suggested-by: Jason Gunthorpe <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Tested-by: Yanting Jiang <[email protected]> Tested-by: Zhenzhong Duan <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Yi Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-07-25	vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev	Yi Liu	1	-1/+49
	This allows VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl use the iommufd_ctx of the cdev device to check the ownership of the other affected devices. When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate the values returned are IOMMUFD devids rather than group IDs as used when accessing vfio devices through the conventional vfio group interface. Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported in this mode if all of the devices affected by the hot-reset are owned by either virtue of being directly bound to the same iommufd context as the calling device, or implicitly owned via a shared IOMMU group. Suggested-by: Jason Gunthorpe <[email protected]> Suggested-by: Alex Williamson <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Tested-by: Yanting Jiang <[email protected]> Tested-by: Zhenzhong Duan <[email protected]> Signed-off-by: Yi Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-06-30	Merge tag 'vfio-v6.5-rc1' of https://github.com/awilliam/linux-vfio	Linus Torvalds	1	-0/+18
	Pull VFIO updates from Alex Williamson: - Adjust log levels for common messages (Oleksandr Natalenko, Alex Williamson) - Support for dynamic MSI-X allocation (Reinette Chatre) - Enable and report PCIe AtomicOp Completer capabilities (Alex Williamson) - Cleanup Kconfigs for vfio bus drivers (Alex Williamson) - Add support for CDX bus based devices (Nipun Gupta) - Fix race with concurrent mdev initialization (Eric Farman) * tag 'vfio-v6.5-rc1' of https://github.com/awilliam/linux-vfio: vfio/mdev: Move the compat_class initialization to module init vfio/cdx: add support for CDX bus vfio/fsl: Create Kconfig sub-menu vfio/platform: Cleanup Kconfig vfio/pci: Cleanup Kconfig vfio/pci-core: Add capability for AtomicOp completer support vfio/pci: Also demote hiding standard cap messages vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X vfio/pci: Support dynamic MSI-X vfio/pci: Probe and store ability to support dynamic MSI-X vfio/pci: Use bitfield for struct vfio_pci_core_device flags vfio/pci: Update stale comment vfio/pci: Remove interrupt context counter vfio/pci: Use xarray for interrupt context storage vfio/pci: Move to single error path vfio/pci: Prepare for dynamic interrupt context storage vfio/pci: Remove negative check on unsigned vector vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable vfio/pci: demote hiding ecap messages to debug level
2023-06-16	vfio/cdx: add support for CDX bus	Nipun Gupta	1	-0/+1
	vfio-cdx driver enables IOCTLs for user space to query MMIO regions for CDX devices and mmap them. This change also adds support for reset of CDX devices. With VFIO enabled on CDX devices, user-space applications can also exercise DMA securely via IOMMU on these devices. This change adds the VFIO CDX driver and enables the following ioctls for CDX devices: - VFIO_DEVICE_GET_INFO: - VFIO_DEVICE_GET_REGION_INFO - VFIO_DEVICE_RESET Signed-off-by: Nipun Gupta <[email protected]> Reviewed-by: Pieter Jansen van Vuuren <[email protected]> Tested-by: Nikhil Agarwal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-06-16	vfio/pci-core: Add capability for AtomicOp completer support	Alex Williamson	1	-0/+14
	Test and enable PCIe AtomicOp completer support of various widths and report via device-info capability to userspace. Reviewed-by: Cédric Le Goater <[email protected]> Reviewed-by: Robin Voetter <[email protected]> Tested-by: Robin Voetter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2023-06-06	s390/vfio-ap: realize the VFIO_DEVICE_GET_IRQ_INFO ioctl	Tony Krowiak	1	-0/+9
	Realize the VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve the information for the VFIO device request IRQ. Signed-off-by: Tony Krowiak <[email protected]> Acked-by: Alex Williamson <[email protected]> Reviewed-by: Cédric Le Goater <[email protected]> Reviewed-by: Matthew Rosato <[email protected]> Tested-by: Matthew Rosato <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexander Gordeev <[email protected]>
2023-05-23	vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X	Reinette Chatre	1	-0/+3
	Dynamic MSI-X is supported. Clear VFIO_IRQ_INFO_NORESIZE to provide guidance to user space. Signed-off-by: Reinette Chatre <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/fd1ef2bf6ae972da8e2805bc95d5155af5a8fb0a.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <[email protected]>
2023-02-09	vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR	Steve Sistare	1	-6/+9
	Disable the VFIO_UPDATE_VADDR capability if mediated devices are present. Their kernel threads could be blocked indefinitely by a misbehaving userland while trying to pin/unpin pages while vaddrs are being updated. Do not allow groups to be added to the container while vaddr's are invalid, so we never need to block user threads from pinning, and can delete the vaddr-waiting code in a subsequent patch. Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") Cc: [email protected] Signed-off-by: Steve Sistare <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2022-12-06	vfio: Extend the device migration protocol with PRE_COPY	Jason Gunthorpe	1	-4/+119
	The optional PRE_COPY states open the saving data transfer FD before reaching STOP_COPY and allows the device to dirty track internal state changes with the general idea to reduce the volume of data transferred in the STOP_COPY stage. While in PRE_COPY the device remains RUNNING, but the saving FD is open. Only if the device also supports RUNNING_P2P can it support PRE_COPY_P2P, which halts P2P transfers while continuing the saving FD. PRE_COPY, with P2P support, requires the driver to implement 7 new arcs and exists as an optional FSM branch between RUNNING and STOP_COPY: RUNNING -> PRE_COPY -> PRE_COPY_P2P -> STOP_COPY A new ioctl VFIO_MIG_GET_PRECOPY_INFO is provided to allow userspace to query the progress of the precopy operation in the driver with the idea it will judge to move to STOP_COPY at least once the initial data set is transferred, and possibly after the dirty size has shrunk appropriately. This ioctl is valid only in PRE_COPY states and kernel driver should return -EINVAL from any other migration state. Compared to the v1 clarification, STOP_COPY -> PRE_COPY is blocked and to be defined in future. We also split the pending_bytes report into the initial and sustaining values, e.g.: initial_bytes and dirty_bytes. initial_bytes: Amount of initial precopy data. dirty_bytes: Device state changes relative to data previously retrieved. These fields are not required to have any bearing to STOP_COPY phase. It is recommended to leave PRE_COPY for STOP_COPY only after the initial_bytes field reaches zero. Leaving PRE_COPY earlier might make things slower. Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Shameer Kolothum <[email protected]> Signed-off-by: Yishai Hadas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2022-11-14	vfio: Add an option to get migration data size	Yishai Hadas	1	-0/+13
	Add an option to get migration data size by introducing a new migration feature named VFIO_DEVICE_FEATURE_MIG_DATA_SIZE. Upon VFIO_DEVICE_FEATURE_GET the estimated data length that will be required to complete STOP_COPY is returned. This option may better enable user space to consider before moving to STOP_COPY whether it can meet the downtime SLA based on the returned data. The patch also includes the implementation for mlx5 and hisi for this new option to make it feature complete for the existing drivers in this area. Signed-off-by: Yishai Hadas <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Longfang Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2022-09-08	vfio: Introduce DMA logging uAPIs	Yishai Hadas	1	-0/+86
	DMA logging allows a device to internally record what DMAs the device is initiating and report them back to userspace. It is part of the VFIO migration infrastructure that allows implementing dirty page tracking during the pre copy phase of live migration. Only DMA WRITEs are logged, and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE. This patch introduces the DMA logging involved uAPIs. It uses the FEATURE ioctl with its GET/SET/PROBE options as of below. It exposes a PROBE option to detect if the device supports DMA logging. It exposes a SET option to start device DMA logging in given IOVAs ranges. It exposes a SET option to stop device DMA logging that was previously started. It exposes a GET option to read back and clear the device DMA log. Extra details exist as part of vfio.h per a specific option. Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2022-09-01	vfio: Add the device features for the low power entry and exit	Abhishek Sahu	1	-0/+56
	This patch adds the following new device features for the low power entry and exit in the header file. The implementation for the same will be added in the subsequent patches. - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP - VFIO_DEVICE_FEATURE_LOW_POWER_EXIT For vfio-pci based devices, with the standard PCI PM registers, all power states cannot be achieved. The platform-based power management needs to be involved to go into the lowest power state. For doing low power entry and exit with platform-based power management, these device features can be used. The entry device feature has two variants. These two variants are mainly to support the different behaviour for the low power entry. If there is any access for the VFIO device on the host side, then the device will be moved out of the low power state without the user's guest driver involvement. Some devices (for example NVIDIA VGA or 3D controller) require the user's guest driver involvement for each low-power entry. In the first variant, the host can return the device to low power automatically. The device will continue to attempt to reach low power until the low power exit feature is called. In the second variant, if the device exits low power due to an access, the host kernel will signal the user via the provided eventfd and will not return the device to low power without a subsequent call to one of the low power entry features. A call to the low power exit feature is optional if the user provided eventfd is signaled. These device features only support VFIO_DEVICE_FEATURE_SET and VFIO_DEVICE_FEATURE_PROBE operations. Signed-off-by: Abhishek Sahu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2022-05-16	include/uapi/linux/vfio.h: Fix trivial typo - _IORW should be _IOWR instead	Thomas Huth	1	-2/+2
	There is no macro called _IORW, so use _IOWR in the comment instead. Signed-off-by: Thomas Huth <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]>
2022-03-03	vfio: Remove migration protocol v1 documentation	Jason Gunthorpe	1	-198/+2
	v1 was never implemented and is replaced by v2. The old uAPI documentation is removed from the header file. The old uAPI definitions are still kept in the header file to ease transition for userspace copying these headers. They will be fully removed down the road. Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Jason Gunthorpe <[email protected]> Tested-by: Shameer Kolothum <[email protected]> Reviewed-by: Alex Williamson <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2022-03-03	vfio: Extend the device migration protocol with RUNNING_P2P	Jason Gunthorpe	1	-2/+34
	The RUNNING_P2P state is designed to support multiple devices in the same VM that are doing P2P transactions between themselves. When in RUNNING_P2P the device must be able to accept incoming P2P transactions but should not generate outgoing P2P transactions. As an optional extension to the mandatory states it is defined as in between STOP and RUNNING: STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP For drivers that are unable to support RUNNING_P2P the core code silently merges RUNNING_P2P and RUNNING together. Unless driver support is present, the new state cannot be used in SET_STATE. Drivers that support this will be required to implement 4 FSM arcs beyond the basic FSM. 2 of the basic FSM arcs become combination transitions. Compared to the v1 clarification, NDMA is redefined into FSM states and is described in terms of the desired P2P quiescent behavior, noting that halting all DMA is an acceptable implementation. Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Jason Gunthorpe <[email protected]> Tested-by: Shameer Kolothum <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Alex Williamson <[email protected]> Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2022-03-03	vfio: Define device migration protocol v2	Jason Gunthorpe	1	-13/+161
	Replace the existing region based migration protocol with an ioctl based protocol. The two protocols have the same general semantic behaviors, but the way the data is transported is changed. This is the STOP_COPY portion of the new protocol, it defines the 5 states for basic stop and copy migration and the protocol to move the migration data in/out of the kernel. Compared to the clarification of the v1 protocol Alex proposed: https://lore.kernel.org/r/163909282574.728533.7460416142511440919.stgit@omen This has a few deliberate functional differences: - ERROR arcs allow the device function to remain unchanged. - The protocol is not required to return to the original state on transition failure. Instead userspace can execute an unwind back to the original state, reset, or do something else without needing kernel support. This simplifies the kernel design and should userspace choose a policy like always reset, avoids doing useless work in the kernel on error handling paths. - PRE_COPY is made optional, userspace must discover it before using it. This reflects the fact that the majority of drivers we are aware of right now will not implement PRE_COPY. - segmentation is not part of the data stream protocol, the receiver does not have to reproduce the framing boundaries. The hybrid FSM for the device_state is described as a Mealy machine by documenting each of the arcs the driver is required to implement. Defining the remaining set of old/new device_state transitions as 'combination transitions' which are naturally defined as taking multiple FSM arcs along the shortest path within the FSM's digraph allows a complete matrix of transitions. A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is defined to replace writing to the device_state field in the region. This allows returning a brand new FD whenever the requested transition opens a data transfer session. The VFIO core code implements the new feature and provides a helper function to the driver. Using the helper the driver only has to implement 6 of the FSM arcs and the other combination transitions are elaborated consistently from those arcs. A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIGRATION is defined to report the capability for migration and indicate which set of states and arcs are supported by the device. The FSM provides a lot of flexibility to make backwards compatible extensions but the VFIO_DEVICE_FEATURE also allows for future breaking extensions for scenarios that cannot support even the basic STOP_COPY requirements. The VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE with the GET option (i.e. VFIO_DEVICE_FEATURE_GET) can be used to read the current migration state of the VFIO device. Data transfer sessions are now carried over a file descriptor, instead of the region. The FD functions for the lifetime of the data transfer session. read() and write() transfer the data with normal Linux stream FD semantics. This design allows future expansion to support poll(), io_uring, and other performance optimizations. The complicated mmap mode for data transfer is discarded as current qemu doesn't take meaningful advantage of it, and the new qemu implementation avoids substantially all the performance penalty of using a read() on the region. Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Jason Gunthorpe <[email protected]> Tested-by: Shameer Kolothum <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Alex Williamson <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Yishai Hadas <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2021-05-05	vfio/pci: Revert nvlink removal uAPI breakage	Alex Williamson	1	-4/+42
	Revert the uAPI changes from the below commit with notice that these regions and capabilities are no longer provided. Fixes: b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2") Reported-by: Greg Kurz <[email protected]> Signed-off-by: Alex Williamson <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: Greg Kurz <[email protected]> Tested-by: Greg Kurz <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Message-Id: <162014341432.3807030.11054087109120670135.stgit@omen>
2021-04-06	vfio/pci: remove vfio_pci_nvlink2	Christoph Hellwig	1	-34/+4
	This driver never had any open userspace (which for VFIO would include VM kernel drivers) that use it, and thus should never have been added by our normal userspace ABI rules. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Message-Id: <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2021-02-01	vfio: interfaces to update vaddr	Steve Sistare	1	-0/+19
	Define interfaces that allow the underlying memory object of an iova range to be mapped to a new host virtual address in the host process: - VFIO_DMA_UNMAP_FLAG_VADDR for VFIO_IOMMU_UNMAP_DMA - VFIO_DMA_MAP_FLAG_VADDR flag for VFIO_IOMMU_MAP_DMA - VFIO_UPDATE_VADDR extension for VFIO_CHECK_EXTENSION Unmap vaddr invalidates the host virtual address in an iova range, and blocks vfio translation of host virtual addresses. DMA to already-mapped pages continues. Map vaddr updates the base VA and resumes translation. See comments in uapi/linux/vfio.h for more details. Signed-off-by: Steve Sistare <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2021-02-01	vfio: option to unmap all	Steve Sistare	1	-0/+8
	For the UNMAP_DMA ioctl, delete all mappings if VFIO_DMA_UNMAP_FLAG_ALL is set. Define the corresponding VFIO_UNMAP_ALL extension. Signed-off-by: Steve Sistare <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-12-03	vfio-ccw: Wire in the request callback	Eric Farman	1	-0/+1
	The device is being unplugged, so pass the request to userspace to ask for a graceful cleanup. This should free up the thread that would otherwise loop waiting for the device to be fully released. Signed-off-by: Eric Farman <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-10-12	Merge branches 'v5.10/vfio/fsl-mc-v6' and 'v5.10/vfio/zpci-info-v3' into ↵	Alex Williamson	1	-0/+12
	v5.10/vfio/next
2020-10-07	vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFO	Matthew Rosato	1	-0/+11
	Allow the VFIO_DEVICE_GET_INFO ioctl to include a capability chain. Add a flag indicating capability chain support, and introduce the definitions for the first set of capabilities which are specified to s390 zPCI devices. Signed-off-by: Matthew Rosato <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-10-07	vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices	Bharat Bhushan	1	-0/+1
	DPAA2 (Data Path Acceleration Architecture) consists in mechanisms for processing Ethernet packets, queue management, accelerators, etc. The Management Complex (mc) is a hardware entity that manages the DPAA2 hardware resources. It provides an object-based abstraction for software drivers to use the DPAA2 hardware. The MC mediates operations such as create, discover, destroy of DPAA2 objects. The MC provides memory-mapped I/O command interfaces (MC portals) which DPAA2 software drivers use to operate on DPAA2 objects. A DPRC is a container object that holds other types of DPAA2 objects. Each object in the DPRC is a Linux device and bound to a driver. The MC-bus driver is a platform driver (different from PCI or platform bus). The DPRC driver does runtime management of a bus instance. It performs the initial scan of the DPRC and handles changes in the DPRC configuration (adding/removing objects). All objects inside a container share the same hardware isolation context, meaning that only an entire DPRC can be assigned to a virtual machine. When a container is assigned to a virtual machine, all the objects within that container are assigned to that virtual machine. The DPRC container assigned to the virtual machine is not allowed to change contents (add/remove objects) by the guest. The restriction is set by the host and enforced by the mc hardware. The DPAA2 objects can be directly assigned to the guest. However the MC portals (the memory mapped command interface to the MC) need to be emulated because there are commands that configure the interrupts and the isolation IDs which are virtual in the guest. Example: echo vfio-fsl-mc > /sys/bus/fsl-mc/devices/dprc.2/driver_override echo dprc.2 > /sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind The dprc.2 is bound to the VFIO driver and all the objects within dprc.2 are going to be bound to the VFIO driver. This patch adds the infrastructure for VFIO support for fsl-mc devices. Subsequent patches will add support for binding and secure assigning these devices using VFIO. More details about the DPAA2 objects can be found here: Documentation/networking/device_drivers/freescale/dpaa2/overview.rst Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Diana Craciun <[email protected]> Reviewed-by: Eric Auger <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-09-22	Merge branches 'v5.10/vfio/bardirty', 'v5.10/vfio/dma_avail', ↵	Alex Williamson	1	-1/+16
	'v5.10/vfio/misc', 'v5.10/vfio/no-cmd-mem' and 'v5.10/vfio/yan_zhao_fixes' into v5.10/vfio/next
2020-09-21	vfio iommu: Add dma available capability	Matthew Rosato	1	-0/+15
	Commit 492855939bdb ("vfio/type1: Limit DMA mappings per container") added the ability to limit the number of memory backed DMA mappings. However on s390x, when lazy mapping is in use, we use a very large number of concurrent mappings. Let's provide the current allowable number of DMA mappings to userspace via the IOMMU info chain so that userspace can take appropriate mitigation. Signed-off-by: Matthew Rosato <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-09-21	vfio: Fix typo of the device_state	Zenghui Yu	1	-1/+1
	A typo fix ("_RUNNNG" => "_RUNNING") in comment block of the uapi header. Signed-off-by: Zenghui Yu <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: Kirti Wankhede <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-06-18	vfio/type1: Fix migration info capability ID	Alex Williamson	1	-1/+1
	ID 1 is already used by the IOVA range capability, use ID 2. Reported-by: Liu Yi L <[email protected]> Fixes: ad721705d09c ("vfio iommu: Add migration capability to report supported features") Reviewed-by: Kirti Wankhede <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-06-08	Merge tag 's390-5.8-1' of ↵	Linus Torvalds	1	-0/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add support for multi-function devices in pci code. - Enable PF-VF linking for architectures using the pdev->no_vf_scan flag (currently just s390). - Add reipl from NVMe support. - Get rid of critical section cleanup in entry.S. - Refactor PNSO CHSC (perform network subchannel operation) in cio and qeth. - QDIO interrupts and error handling fixes and improvements, more refactoring changes. - Align ioremap() with generic code. - Accept requests without the prefetch bit set in vfio-ccw. - Enable path handling via two new regions in vfio-ccw. - Other small fixes and improvements all over the code. * tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits) vfio-ccw: make vfio_ccw_regops variables declarations static vfio-ccw: Add trace for CRW event vfio-ccw: Wire up the CRW irq and CRW region vfio-ccw: Introduce a new CRW region vfio-ccw: Refactor IRQ handlers vfio-ccw: Introduce a new schib region vfio-ccw: Refactor the unregister of the async regions vfio-ccw: Register a chp_event callback for vfio-ccw vfio-ccw: Introduce new helper functions to free/destroy regions vfio-ccw: document possible errors vfio-ccw: Enable transparent CCW IPL from DASD s390/pci: Log new handle in clp_disable_fh() s390/cio, s390/qeth: cleanup PNSO CHSC s390/qdio: remove q->first_to_kick s390/qdio: fix up qdio_start_irq() kerneldoc s390: remove critical section cleanup from entry.S s390: add machine check SIGP s390/pci: ioremap() align with generic code s390/ap: introduce new ap function ap_get_qdev() Documentation/s390: Update / remove developerWorks web links ...
2020-06-03	vfio-ccw: Introduce a new CRW region	Farhan Ali	1	-0/+2
	This region provides a mechanism to pass a Channel Report Word that affect vfio-ccw devices, and needs to be passed to the guest for its awareness and/or processing. The base driver (see crw_collect_info()) provides space for two CRWs, as a subchannel event may have two CRWs chained together (one for the ssid, one for the subchannel). As vfio-ccw will deal with everything at the subchannel level, provide space for a single CRW to be transferred in one shot. Signed-off-by: Farhan Ali <[email protected]> Signed-off-by: Eric Farman <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Message-Id: <[email protected]> [CH: added padding to ccw_crw_region] Signed-off-by: Cornelia Huck <[email protected]>
2020-06-02	vfio-ccw: Introduce a new schib region	Farhan Ali	1	-0/+1
	The schib region can be used by userspace to get the subchannel- information block (SCHIB) for the passthrough subchannel. This can be useful to get information such as channel path information via the SCHIB.PMCW fields. Signed-off-by: Farhan Ali <[email protected]> Signed-off-by: Eric Farman <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Message-Id: <[email protected]> Signed-off-by: Cornelia Huck <[email protected]>
2020-05-28	vfio iommu: Add migration capability to report supported features	Kirti Wankhede	1	-0/+23
	Added migration capability in IOMMU info chain. User application should check IOMMU info chain for migration capability to use dirty page tracking feature provided by kernel module. User application must check page sizes supported and maximum dirty bitmap size returned by this capability structure for ioctls used to get dirty bitmap. Signed-off-by: Kirti Wankhede <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: Yan Zhao <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-05-28	vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap	Kirti Wankhede	1	-0/+11
	DMA mapped pages, including those pinned by mdev vendor drivers, might get unpinned and unmapped while migration is active and device is still running. For example, in pre-copy phase while guest driver could access those pages, host device or vendor driver can dirty these mapped pages. Such pages should be marked dirty so as to maintain memory consistency for a user making use of dirty page tracking. To get bitmap during unmap, user should allocate memory for bitmap, set it all zeros, set size of allocated memory, set page size to be considered for bitmap and set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP. Signed-off-by: Kirti Wankhede <[email protected]> Reviewed-by: Neo Jia <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: Yan Zhao <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-05-28	vfio iommu: Add ioctl definition for dirty pages tracking	Kirti Wankhede	1	-0/+57
	IOMMU container maintains a list of all pages pinned by vfio_pin_pages API. All pages pinned by vendor driver through this API should be considered as dirty during migration. When container consists of IOMMU capable device and all pages are pinned and mapped, then all pages are marked dirty. Added support to start/stop dirtied pages tracking and to get bitmap of all dirtied pages for requested IO virtual address range. Signed-off-by: Kirti Wankhede <[email protected]> Reviewed-by: Neo Jia <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: Yan Zhao <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-05-28	vfio: UAPI for migration interface for device state	Kirti Wankhede	1	-0/+228
	- Defined MIGRATION region type and sub-type. - Defined vfio_device_migration_info structure which will be placed at the 0th offset of migration region to get/set VFIO device related information. Defined members of structure and usage on read/write access. - Defined device states and state transition details. - Defined sequence to be followed while saving and resuming VFIO device. Signed-off-by: Kirti Wankhede <[email protected]> Reviewed-by: Neo Jia <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: Yan Zhao <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-03-24	vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user	Alex Williamson	1	-0/+37
	The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device agnostic ioctl for setting, retrieving, and probing device features. This implementation provides a 16-bit field for specifying a feature index, where the data porition of the ioctl is determined by the semantics for the given feature. Additional flag bits indicate the direction and nature of the operation; SET indicates user data is provided into the device feature, GET indicates the device feature is written out into user data. The PROBE flag augments determining whether the given feature is supported, and if provided, whether the given operation on the feature is supported. The first user of this ioctl is for setting the vfio-pci VF token, where the user provides a shared secret key (UUID) on a SR-IOV PF device, which users must provide when opening associated VF devices. Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2019-08-23	Merge branches 'v5.4/vfio/alexey-tce-memory-free-v1', ↵	Alex Williamson	1	-20/+51
	'v5.4/vfio/connie-re-arrange-v2', 'v5.4/vfio/hexin-pci-reset-v3', 'v5.4/vfio/parav-mtty-uuid-v2' and 'v5.4/vfio/shameer-iova-list-v8' into v5.4/vfio/next
2019-08-19	vfio/type1: Add IOVA range capability support	Shameer Kolothum	1	-1/+25
	This allows the user-space to retrieve the supported IOVA range(s), excluding any non-relaxable reserved regions. The implementation is based on capability chains, added to VFIO_IOMMU_GET_INFO ioctl. Signed-off-by: Shameer Kolothum <[email protected]> Reviewed-by: Eric Auger <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2019-08-19	vfio: re-arrange vfio region definitions	Cornelia Huck	1	-19/+26
	It is easy to miss already defined region types. Let's re-arrange the definitions a bit and add more comments to make it hopefully a bit clearer. No functional change. Signed-off-by: Cornelia Huck <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2019-04-24	vfio-ccw: add handling for async channel instructions	Cornelia Huck	1	-0/+2
	Add a region to the vfio-ccw device that can be used to submit asynchronous I/O instructions. ssch continues to be handled by the existing I/O region; the new region handles hsch and csch. Interrupt status continues to be reported through the same channels as for ssch. Acked-by: Eric Farman <[email protected]> Reviewed-by: Farhan Ali <[email protected]> Signed-off-by: Cornelia Huck <[email protected]>