.. SPDX-License-Identifier: GPL-2.0

===============================
Software Guard eXtensions (SGX)
===============================

Overview
========

Software Guard eXtensions (SGX) hardware enables user space applications
to set aside private memory regions of code and data:

* Privileged (ring-0) ENCLS functions orchestrate the construction of the
  regions.
* Unprivileged (ring-3) ENCLU functions allow an application to enter and
  execute inside the regions.

These memory regions are called enclaves. An enclave can only be entered at a
fixed set of entry points. Each entry point can hold a single hardware thread
at a time. Although the enclave is loaded from a regular binary file by using
ENCLS functions, only the threads inside the enclave can access its memory. The
CPU denies access to the region from outside, and the region's contents are
encrypted before they leave the last-level cache (LLC).

Support can be determined by checking for the ``sgx`` CPU flag:

    ``grep sgx /proc/cpuinfo``

SGX must be both supported by the processor and enabled by the BIOS. If SGX
appears to be unsupported on a system which has hardware support, ensure
support is enabled in the BIOS. If a BIOS presents a choice between "Enabled"
and "Software Enabled" modes for SGX, choose "Enabled".

Enclave Page Cache
==================

SGX utilizes an *Enclave Page Cache (EPC)* to store pages that are associated
with an enclave. It is contained in a BIOS-reserved region of physical memory.
Unlike pages used for regular memory, EPC pages can only be accessed from
outside the enclave during enclave construction, and then only with special,
limited SGX instructions.

Only a CPU executing inside an enclave can directly access enclave memory.
However, a CPU executing inside an enclave may access normal memory outside the
enclave.
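
The ``grep sgx /proc/cpuinfo`` check in the overview matches any flag that
contains ``sgx``, such as ``sgx_lc``. A program that wants an exact match can
instead look for the ``sgx`` token in the x86 ``flags`` line. The following is
a minimal sketch under that assumption; the helper names ``flags_have()`` and
``cpu_has_sgx()`` are hypothetical and not part of any kernel or library API.

.. code-block:: c

   #include <stdbool.h>
   #include <stdio.h>
   #include <string.h>

   /* Return true if the whitespace-separated list contains exactly `flag`. */
   static bool flags_have(const char *flags, const char *flag)
   {
       size_t len = strlen(flag);
       const char *p = flags;

       while (*p) {
           /* Skip the separators before the next token. */
           while (*p == ' ' || *p == '\t' || *p == '\n')
               p++;
           const char *start = p;

           /* Advance to the end of the token. */
           while (*p && *p != ' ' && *p != '\t' && *p != '\n')
               p++;
           if ((size_t)(p - start) == len && strncmp(start, flag, len) == 0)
               return true;
       }
       return false;
   }

   /* Hypothetical helper: look for "sgx" in a "flags" line of /proc/cpuinfo. */
   static bool cpu_has_sgx(void)
   {
       char line[8192];   /* assumed large enough for one flags line */
       bool found = false;
       FILE *f = fopen("/proc/cpuinfo", "r");

       if (!f)
           return false;
       while (!found && fgets(line, sizeof(line), f)) {
           const char *colon = strchr(line, ':');

           if (strncmp(line, "flags", 5) == 0 && colon)
               found = flags_have(colon + 1, "sgx");
       }
       fclose(f);
       return found;
   }

As with the ``grep`` check, a missing flag on hardware that supports SGX can
also mean that SGX is disabled in the BIOS.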

The kernel manages enclave memory much as it treats device memory.

Enclave Page Types
------------------

**SGX Enclave Control Structure (SECS)**
    The enclave's address range, attributes and other global data are
    defined by this structure.

**Regular (REG)**
    Regular EPC pages contain the code and data of an enclave.

**Thread Control Structure (TCS)**
    Thread Control Structure pages define the entry points to an enclave and
    track the execution state of an enclave thread.

**Version Array (VA)**
    Version Array pages contain 512 slots, each of which can contain a version
    number for a page evicted from the EPC.

Enclave Page Cache Map
----------------------

The processor tracks EPC pages in a hardware metadata structure called the
*Enclave Page Cache Map (EPCM)*. The EPCM contains an entry for each EPC page
which describes the owning enclave, access rights and page type, among other
things.

EPCM permissions are separate from the normal page tables. This prevents the
kernel from, for instance, allowing writes to data which an enclave wishes to
remain read-only. EPCM permissions may only impose additional restrictions on
top of normal x86 page permissions.

For all intents and purposes, the SGX architecture allows the processor to
invalidate all EPCM entries at will. This requires that software be prepared to
handle an EPCM fault at any time. In practice, this can happen on events like
power transitions when the ephemeral key that encrypts enclave memory is lost.

Application interface
=====================

Enclave build functions
-----------------------

In addition to the traditional compiler and linker build process, SGX has a
separate enclave “build” process. Enclaves must be built before they can be
executed (entered). The first step in building an enclave is opening the
**/dev/sgx_enclave** device.
Since enclave memory is protected from direct access, special privileged
instructions are then used to copy data into enclave pages and establish
enclave page permissions.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_create
               sgx_ioc_enclave_add_pages
               sgx_ioc_enclave_init
               sgx_ioc_enclave_provision

Enclave runtime management
--------------------------

Systems supporting SGX2 additionally support changes to initialized
enclaves: modifying enclave page permissions and type, and dynamically
adding and removing enclave pages. When an enclave accesses an address
within its address range that does not have a backing page, a new
regular page is dynamically added to the enclave. The enclave is
still required to run EACCEPT on the new page before it can be used.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_restrict_permissions
               sgx_ioc_enclave_modify_types
               sgx_ioc_enclave_remove_pages

Enclave vDSO
------------

Entering an enclave can only be done through the SGX-specific EENTER and
ERESUME functions, and is a non-trivial process. Because of the complexity of
transitioning to and from an enclave, enclaves typically utilize a library to
handle the actual transitions. This is roughly analogous to how most
applications use glibc to wrap system calls.

Another crucial characteristic of enclaves is that they can generate exceptions
as part of their normal operation that need to be handled in the enclave or are
unique to SGX.

Instead of the traditional signal mechanism to handle these exceptions, SGX
can leverage special exception fixup provided by the vDSO. The kernel-provided
vDSO function wraps low-level transitions to/from the enclave like EENTER and
ERESUME. The vDSO function intercepts exceptions that would otherwise generate
a signal and returns the fault information directly to its caller.
This avoids the need to juggle signal handlers.

.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h
   :functions: vdso_sgx_enter_enclave_t

ksgxd
=====

SGX support includes a kernel thread called *ksgxd*.

EPC sanitization
----------------

ksgxd is started when SGX initializes. Enclave memory is typically ready
for use when the processor powers on or resets. However, if SGX has been in
use since the reset, enclave pages may be in an inconsistent state. This might
occur after a crash and kexec() cycle, for instance. At boot, ksgxd
reinitializes all enclave pages so that they can be allocated and re-used.

The sanitization is done by walking the EPC address space and applying the
EREMOVE function to each physical page. Some enclave pages, like SECS pages,
have hardware dependencies on other pages which prevent EREMOVE from succeeding
while those pages still exist. Executing two EREMOVE passes removes the
dependencies: the first pass removes the pages that nothing depends on, and
the second pass removes the pages, such as SECS pages, that failed in the first.

Page reclaimer
--------------

Similar to the core kswapd, ksgxd is responsible for managing the
overcommitment of enclave memory. If the system runs out of enclave memory,
*ksgxd* “swaps” enclave memory to normal memory.

Launch Control
==============

SGX provides a launch control mechanism. After all enclave pages have been
copied, the kernel executes the EINIT function, which initializes the enclave.
Only after this can the CPU execute inside the enclave.

The EINIT function takes an RSA-3072 signature of the enclave measurement. The
function checks that the measurement is correct and that the signature was made
with the key whose SHA256 hash is held in the four
**IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs.

Those MSRs can be configured by the BIOS to be either readable or writable.
Linux supports only the writable configuration, in order to give the kernel
full control over launch control policy. Before calling the EINIT function, the
driver sets the MSRs to match the enclave's signing key.
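
The four MSRs together hold the 256-bit SHA256 digest, 64 bits each. The
sketch below shows one way to split a 32-byte digest into the four MSR values,
assuming little-endian packing (the layout a ``u64[4]`` view of the digest has
on x86). The function name is hypothetical; hashing the signer's key and
writing the MSRs, which requires ring 0, are out of scope.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /*
    * Hypothetical helper: split a 32-byte SHA256 digest into the four values
    * written to IA32_SGXLEPUBKEYHASH0..3. Assumes little-endian packing, so
    * digest[8 * i] becomes the least significant byte of MSR i.
    */
   static void lepubkeyhash_from_digest(const uint8_t digest[32], uint64_t msr[4])
   {
       for (size_t i = 0; i < 4; i++) {
           uint64_t v = 0;

           for (size_t b = 0; b < 8; b++)
               v |= (uint64_t)digest[8 * i + b] << (8 * b);
           msr[i] = v;
       }
   }

For example, a digest whose bytes are 0x00, 0x01, ..., 0x1f yields
0x0706050403020100 for IA32_SGXLEPUBKEYHASH0 and 0x1f1e1d1c1b1a1918 for
IA32_SGXLEPUBKEYHASH3.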

Encryption engines
==================

In order to conceal enclave data while it is outside the CPU package, the
memory controller has an encryption engine that transparently encrypts and
decrypts enclave memory.

In CPUs prior to Ice Lake, the Memory Encryption Engine (MEE) is used to
encrypt pages leaving the CPU caches. MEE uses an n-ary Merkle tree with its
root in SRAM to maintain the integrity of the encrypted data. This provides
integrity and anti-replay protection but does not scale to large memory sizes,
because the time required to update the Merkle tree grows logarithmically with
the memory size.

CPUs starting with Ice Lake use Total Memory Encryption (TME) in place of
MEE. TME-based SGX implementations do not have an integrity Merkle tree, which
means integrity and replay attacks are not mitigated. However, they include
additional changes to prevent ciphertext from being returned and software
memory aliases from being created.

DMA to enclave memory is blocked by range registers on both MEE and TME systems
(SDM section 41.10).

Usage Models
============

Shared Library
--------------

Sensitive data and the code that acts on it are partitioned from the
application into a separate library. The library is then linked as a DSO which
can be loaded into an enclave. The application can then make individual
function calls into the enclave through special SGX instructions. A run-time
within the enclave is configured to marshal function parameters into and out
of the enclave and to call the correct library function.

Application Container
---------------------

An application may be loaded into a container enclave which is specially
configured with a library OS and run-time which permits the application to run.
The enclave run-time and library OS work together to execute the application
when a thread enters the enclave.
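
The marshalling run-time of the Shared Library model can be sketched as a
table that maps a function index to a library function, plus a dispatcher
that copies the arguments out of application-controlled memory before calling
it. Everything here (``struct call_args``, ``call_table``, ``dispatch()``) is
a hypothetical illustration, not the interface of any particular SGX SDK, and
it runs as ordinary C rather than inside an enclave.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical argument block placed in memory the application controls. */
   struct call_args {
       int64_t a;
       int64_t b;
       int64_t result;
   };

   typedef void (*lib_fn)(struct call_args *args);

   /* Stand-ins for functions of the library loaded into the enclave. */
   static void lib_add(struct call_args *args) { args->result = args->a + args->b; }
   static void lib_mul(struct call_args *args) { args->result = args->a * args->b; }

   static const lib_fn call_table[] = { lib_add, lib_mul };

   /*
    * Hypothetical dispatcher of the in-enclave run-time. The index selects
    * the library function; the arguments are copied into private memory
    * before use, and only the result is copied back out.
    * Returns 0 on success or -1 for an unknown index.
    */
   static int dispatch(size_t index, struct call_args *untrusted)
   {
       struct call_args priv;

       if (index >= sizeof(call_table) / sizeof(call_table[0]))
           return -1;
       memcpy(&priv, untrusted, sizeof(priv));  /* marshal in */
       call_table[index](&priv);
       untrusted->result = priv.result;         /* marshal out */
       return 0;
   }

Copying the arguments before use means the application cannot change them
while the library function is running; in a real enclave the dispatcher would
run after EENTER.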

Impact of Potential Kernel SGX Bugs
===================================

EPC leaks
---------

When EPC page leaks happen, a WARNING like this is shown in dmesg:

"EREMOVE returned ... and an EPC page was leaked. SGX may become unusable..."

This is effectively a kernel use-after-free of an EPC page, and due
to the way SGX works, the bug is detected when the page is freed. Rather
than adding the page back to the pool of available EPC pages, the kernel
intentionally leaks the page to avoid additional errors in the future.

When this happens, the kernel will likely soon leak more EPC pages, and
SGX will likely become unusable because the memory available to SGX is
limited. However, while this may be fatal to SGX, the rest of the kernel
is unlikely to be impacted and should continue to work.

As a result, when this happens, the user should stop running any new
SGX workloads (or any new workloads at all) and migrate all valuable
workloads. Although a machine reboot can recover all EPC memory, the bug
should be reported to Linux developers.


Virtual EPC
===========

The implementation also has a virtual EPC driver to support SGX enclaves
in guests. Unlike the SGX driver, an EPC page allocated by the virtual
EPC driver doesn't have a specific enclave associated with it. This is
because KVM doesn't track how a guest uses EPC pages.

As a result, the SGX core page reclaimer doesn't support reclaiming EPC
pages allocated to KVM guests through the virtual EPC driver. If the
user wants to deploy SGX applications both on the host and in guests
on the same machine, the user should make sure that the EPC left for host
SGX applications (the physical EPC size minus the total virtual EPC size
of all SGX VMs) is large enough for them to run with acceptable performance.

Architectural behavior is to restore all EPC pages to an uninitialized
state after a guest reboot as well.
Because this state can be reached only
through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction
on all pages in the virtual EPC.

``EREMOVE`` can fail for three reasons. Userspace must pay attention
to expected failures and handle them as follows:

1. Page removal will always fail when any thread is running in the
   enclave to which the page belongs. In this case the ioctl will
   return ``EBUSY`` independent of whether it has successfully removed
   some pages; userspace can avoid these failures by preventing execution
   of any vcpu which maps the virtual EPC.

2. Page removal will cause a general protection fault if two calls to
   ``EREMOVE`` happen concurrently for pages that refer to the same
   "SECS" metadata pages. This can happen if there are concurrent
   invocations of ``SGX_IOC_VEPC_REMOVE_ALL``, or if a ``/dev/sgx_vepc``
   file descriptor in the guest is closed at the same time as
   ``SGX_IOC_VEPC_REMOVE_ALL``; it will also be reported as ``EBUSY``.
   This can be avoided in userspace by serializing calls to the ioctl()
   and to close(), but in general it should not be a problem.

3. Finally, page removal will fail for SECS metadata pages which still
   have child pages. Child pages can be removed by executing
   ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors
   mapped into the guest. This means that the ioctl() must be called
   twice: an initial set of calls to remove child pages and a subsequent
   set of calls to remove SECS pages. The second set of calls is only
   required for those mappings that returned a nonzero value from the
   first call. If any call in the second round of
   ``SGX_IOC_VEPC_REMOVE_ALL`` returns a value other than 0, this
   indicates a bug in the kernel or the userspace client.
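
The two rounds described in item 3 can be sketched as follows. To keep the
example testable without SGX hardware, the ioctl is hidden behind a callback;
``remove_all_fn`` and ``vepc_remove_all_twice()`` are hypothetical names, and
the callback is assumed to return the number of pages left over, or -1 with
``errno`` set, as described above. The sketch also assumes the caller has
already stopped the vcpus and serialized against close() (items 1 and 2).

.. code-block:: c

   #include <errno.h>
   #include <stddef.h>
   #include <stdlib.h>

   /*
    * Hypothetical stand-in for issuing SGX_IOC_VEPC_REMOVE_ALL on fd.
    * Assumed to return the number of pages left (SECS pages with children),
    * or -1 with errno set (e.g. EBUSY).
    */
   typedef int (*remove_all_fn)(int fd);

   /*
    * Hypothetical helper: run both rounds of removal over every virtual EPC
    * file descriptor of a guest. Returns 0 on success, or -1 with errno set;
    * EIO is used here to flag a nonzero second-round result.
    */
   static int vepc_remove_all_twice(const int *fds, size_t nfds,
                                    remove_all_fn remove_all)
   {
       int *left = calloc(nfds ? nfds : 1, sizeof(*left));
       int ret = 0;
       int saved_errno;

       if (!left)
           return -1;

       /* Round 1: every mapping, so child pages go before their SECS. */
       for (size_t i = 0; i < nfds && ret == 0; i++) {
           left[i] = remove_all(fds[i]);
           if (left[i] < 0)
               ret = -1;  /* errno set by the call, e.g. EBUSY */
       }

       /* Round 2: only mappings that still had pages after round 1. */
       for (size_t i = 0; i < nfds && ret == 0; i++) {
           if (left[i] == 0)
               continue;
           int r = remove_all(fds[i]);

           if (r != 0) {
               if (r > 0)
                   errno = EIO;  /* kernel or userspace bug */
               ret = -1;
           }
       }

       saved_errno = errno;  /* keep errno intact across free() */
       free(left);
       errno = saved_errno;
       return ret;
   }

Real code would presumably pass a wrapper that calls
``ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL)``; the callback form only exists so the
logic can be exercised with a mock.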