blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2022-07-09	x86/speculation: Disable RRSBA behavior	Pawan Gupta	2	-0/+27
	Some Intel processors may use alternate predictors for RETs on RSB-underflow. This condition may be vulnerable to Branch History Injection (BHI) and intramode-BTI. Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines, eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against such attacks. However, on RSB-underflow, RET target prediction may fallback to alternate predictors. As a result, RET's predicted target may get influenced by branch history. A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback behavior when in kernel mode. When set, RETs will not take predictions from alternate predictors, hence mitigating RETs as well. Support for this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2). For spectre v2 mitigation, when a user selects a mitigation that protects indirect CALLs and JMPs against BHI and intramode-BTI, set RRSBA_DIS_S also to protect RETs for RSB-underflow case. Signed-off-by: Pawan Gupta <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-07-08	x86/sgx: Drop 'page_index' from sgx_backing	Sean Christopherson	2	-2/+0
	Storing the 'page_index' value in the sgx_backing struct is dead code and no longer needed. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Kristen Carlson Accardi <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2022-07-08	x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported	Thadeu Lima de Souza Cascardo	1	-2/+5
	There are some VM configurations which have Skylake model but do not support IBPB. In those cases, when using retbleed=ibpb, userspace is going to be killed and kernel is going to panic. If the CPU does not support IBPB, warn and proceed with the auto option. Also, do not fallback to IBPB on AMD/Hygon systems if it is not supported. Fixes: 3ebc17006888 ("x86/bugs: Add retbleed=ibpb") Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-07-07	x86/sgx: Free up EPC pages directly to support large page ranges	Reinette Chatre	3	-0/+18
	The page reclaimer ensures availability of EPC pages across all enclaves. In support of this it runs independently from the individual enclaves in order to take locks from the different enclaves as it writes pages to swap. When needing to load a page from swap an EPC page needs to be available for its contents to be loaded into. Loading an existing enclave page from swap does not reclaim EPC pages directly if none are available, instead the reclaimer is woken when the available EPC pages are found to be below a watermark. When iterating over a large number of pages in an oversubscribed environment there is a race between the reclaimer woken up and EPC pages reclaimed fast enough for the page operations to proceed. Ensure there are EPC pages available before attempting to load a page that may potentially be pulled from swap into an available EPC page. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/a0d8f037c4a075d56bf79f432438412985f7ff7a.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Support complete page removal	Reinette Chatre	1	-0/+145
	The SGX2 page removal flow was introduced in previous patch and is as follows: 1) Change the type of the pages to be removed to SGX_PAGE_TYPE_TRIM using the ioctl() SGX_IOC_ENCLAVE_MODIFY_TYPES introduced in previous patch. 2) Approve the page removal by running ENCLU[EACCEPT] from within the enclave. 3) Initiate actual page removal using the ioctl() SGX_IOC_ENCLAVE_REMOVE_PAGES introduced here. Support the final step of the SGX2 page removal flow with ioctl() SGX_IOC_ENCLAVE_REMOVE_PAGES. With this ioctl() the user specifies a page range that should be removed. All pages in the provided range should have the SGX_PAGE_TYPE_TRIM page type and the request will fail with EPERM (Operation not permitted) if a page that does not have the correct type is encountered. Page removal can fail on any page within the provided range. Support partial success by returning the number of pages that were successfully removed. Since actual page removal will succeed even if ENCLU[EACCEPT] was not run from within the enclave the ENCLU[EMODPR] instruction with RWX permissions is used as a no-op mechanism to ensure ENCLU[EACCEPT] was successfully run from within the enclave before the enclave page is removed. If the user omits running SGX_IOC_ENCLAVE_REMOVE_PAGES the pages will still be removed when the enclave is unloaded. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Tested-by: Haitao Huang <[email protected]> Tested-by: Vijay Dhanraj <[email protected]> Tested-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/b75ee93e96774e38bb44a24b8e9bbfb67b08b51b.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Support modifying SGX page type	Reinette Chatre	1	-0/+202
	Every enclave contains one or more Thread Control Structures (TCS). The TCS contains meta-data used by the hardware to save and restore thread specific information when entering/exiting the enclave. With SGX1 an enclave needs to be created with enough TCSs to support the largest number of threads expecting to use the enclave and enough enclave pages to meet all its anticipated memory demands. In SGX1 all pages remain in the enclave until the enclave is unloaded. SGX2 introduces a new function, ENCLS[EMODT], that is used to change the type of an enclave page from a regular (SGX_PAGE_TYPE_REG) enclave page to a TCS (SGX_PAGE_TYPE_TCS) page or change the type from a regular (SGX_PAGE_TYPE_REG) or TCS (SGX_PAGE_TYPE_TCS) page to a trimmed (SGX_PAGE_TYPE_TRIM) page (setting it up for later removal). With the existing support of dynamically adding regular enclave pages to an initialized enclave and changing the page type to TCS it is possible to dynamically increase the number of threads supported by an enclave. Changing the enclave page type to SGX_PAGE_TYPE_TRIM is the first step of dynamically removing pages from an initialized enclave. The complete page removal flow is: 1) Change the type of the pages to be removed to SGX_PAGE_TYPE_TRIM using the SGX_IOC_ENCLAVE_MODIFY_TYPES ioctl() introduced here. 2) Approve the page removal by running ENCLU[EACCEPT] from within the enclave. 3) Initiate actual page removal using the ioctl() introduced in the following patch. Add ioctl() SGX_IOC_ENCLAVE_MODIFY_TYPES to support changing SGX enclave page types within an initialized enclave. With SGX_IOC_ENCLAVE_MODIFY_TYPES the user specifies a page range and the enclave page type to be applied to all pages in the provided range. The ioctl() itself can return an error code based on failures encountered by the kernel. It is also possible for SGX specific failures to be encountered. Add a result output parameter to communicate the SGX return code. It is possible for the enclave page type change request to fail on any page within the provided range. Support partial success by returning the number of pages that were successfully changed. After the page type is changed the page continues to be accessible from the kernel perspective with page table entries and internal state. The page may be moved to swap. Any access until ENCLU[EACCEPT] will encounter a page fault with SGX flag set in error code. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Tested-by: Jarkko Sakkinen <[email protected]> Tested-by: Haitao Huang <[email protected]> Tested-by: Vijay Dhanraj <[email protected]> Link: https://lkml.kernel.org/r/babe39318c5bf16fc65fbfb38896cdee72161575.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Tighten accessible memory range after enclave initialization	Reinette Chatre	1	-0/+5
	Before an enclave is initialized the enclave's memory range is unknown. The enclave's memory range is learned at the time it is created via the SGX_IOC_ENCLAVE_CREATE ioctl() where the provided memory range is obtained from an earlier mmap() of /dev/sgx_enclave. After an enclave is initialized its memory can be mapped into user space (mmap()) from where it can be entered at its defined entry points. With the enclave's memory range known after it is initialized there is no reason why it should be possible to map memory outside this range. Lock down access to the initialized enclave's memory range by denying any attempt to map memory outside its memory range. Locking down the memory range also makes adding pages to an initialized enclave more efficient. Pages are added to an initialized enclave by accessing memory that belongs to the enclave's memory range but not yet backed by an enclave page. If it is possible for user space to map memory that does not form part of the enclave then an access to this memory would eventually fail. Failures range from a prompt general protection fault if the access was an ENCLU[EACCEPT] from within the enclave, or a page fault via the vDSO if it was another access from within the enclave, or a SIGBUS (also resulting from a page fault) if the access was from outside the enclave. Disallowing invalid memory to be mapped in the first place avoids preventable failures. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/6391460d75ae79cea2e81eef0f6ffc03c6e9cfe7.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Support adding of pages to an initialized enclave	Reinette Chatre	1	-0/+117
	With SGX1 an enclave needs to be created with its maximum memory demands allocated. Pages cannot be added to an enclave after it is initialized. SGX2 introduces a new function, ENCLS[EAUG], that can be used to add pages to an initialized enclave. With SGX2 the enclave still needs to set aside address space for its maximum memory demands during enclave creation, but all pages need not be added before enclave initialization. Pages can be added during enclave runtime. Add support for dynamically adding pages to an initialized enclave, architecturally limited to RW permission at creation but allowed to obtain RWX permissions after trusted enclave runs EMODPE. Add pages via the page fault handler at the time an enclave address without a backing enclave page is accessed, potentially directly reclaiming pages if no free pages are available. The enclave is still required to run ENCLU[EACCEPT] on the page before it can be used. A useful flow is for the enclave to run ENCLU[EACCEPT] on an uninitialized address. This will trigger the page fault handler that will add the enclave page and return execution to the enclave to repeat the ENCLU[EACCEPT] instruction, this time successful. If the enclave accesses an uninitialized address in another way, for example by expanding the enclave stack to a page that has not yet been added, then the page fault handler would add the page on the first write but upon returning to the enclave the instruction that triggered the page fault would be repeated and since ENCLU[EACCEPT] was not run yet it would trigger a second page fault, this time with the SGX flag set in the page fault error code. This can only be recovered by entering the enclave again and directly running the ENCLU[EACCEPT] instruction on the now initialized address. Accessing an uninitialized address from outside the enclave also triggers this flow but the page will remain inaccessible (access will result in #PF) until accepted from within the enclave via ENCLU[EACCEPT]. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Tested-by: Jarkko Sakkinen <[email protected]> Tested-by: Haitao Huang <[email protected]> Tested-by: Vijay Dhanraj <[email protected]> Link: https://lkml.kernel.org/r/a254a58eabea053803277449b24b6e4963a3883b.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Support restricting of enclave page permissions	Reinette Chatre	1	-0/+216
	In the initial (SGX1) version of SGX, pages in an enclave need to be created with permissions that support all usages of the pages, from the time the enclave is initialized until it is unloaded. For example, pages used by a JIT compiler or when code needs to otherwise be relocated need to always have RWX permissions. SGX2 includes a new function ENCLS[EMODPR] that is run from the kernel and can be used to restrict the EPCM permissions of regular enclave pages within an initialized enclave. Introduce ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS to support restricting EPCM permissions. With this ioctl() the user specifies a page range and the EPCM permissions to be applied to all pages in the provided range. ENCLS[EMODPR] is run to restrict the EPCM permissions followed by the ENCLS[ETRACK] flow that will ensure no cached linear-to-physical address mappings to the changed pages remain. It is possible for the permission change request to fail on any page within the provided range, either with an error encountered by the kernel or by the SGX hardware while running ENCLS[EMODPR]. To support partial success the ioctl() returns an error code based on failures encountered by the kernel as well as two result output parameters: one for the number of pages that were successfully changed and one for the SGX return code. The page table entry permissions are not impacted by the EPCM permission changes. VMAs and PTEs will continue to allow the maximum vetted permissions determined at the time the pages are added to the enclave. The SGX error code in a page fault will indicate if it was an EPCM permission check that prevented an access attempt. No checking is done to ensure that the permissions are actually being restricted. This is because the enclave may have relaxed the EPCM permissions from within the enclave without the kernel knowing. An attempt to relax permissions using this call will be ignored by the hardware. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Tested-by: Jarkko Sakkinen <[email protected]> Tested-by: Haitao Huang <[email protected]> Tested-by: Vijay Dhanraj <[email protected]> Link: https://lkml.kernel.org/r/082cee986f3c1a2f4fdbf49501d7a8c5a98446f8.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Support VA page allocation without reclaiming	Reinette Chatre	3	-8/+10
	struct sgx_encl should be protected with the mutex sgx_encl->lock. One exception is sgx_encl->page_cnt that is incremented (in sgx_encl_grow()) when an enclave page is added to the enclave. The reason the mutex is not held is to allow the reclaimer to be called directly if there are no EPC pages (in support of a new VA page) available at the time. Incrementing sgx_encl->page_cnt without sgc_encl->lock held is currently (before SGX2) safe from concurrent updates because all paths in which sgx_encl_grow() is called occur before enclave initialization and are protected with an atomic operation on SGX_ENCL_IOCTL. SGX2 includes support for dynamically adding pages after enclave initialization where the protection of SGX_ENCL_IOCTL is not available. Make direct reclaim of EPC pages optional when new VA pages are added to the enclave. Essentially the existing "reclaim" flag used when regular EPC pages are added to an enclave becomes available to the caller when used to allocate VA pages instead of always being "true". When adding pages without invoking the reclaimer it is possible to do so with sgx_encl->lock held, gaining its protection against concurrent updates to sgx_encl->page_cnt after enclave initialization. No functional change. Reported-by: Haitao Huang <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/42c5934c229982ee67982bb97c6ab34bde758620.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Export sgx_encl_page_alloc()	Jarkko Sakkinen	3	-32/+35
	Move sgx_encl_page_alloc() to encl.c and export it so that it can be used in the implementation for support of adding pages to initialized enclaves, which requires to allocate new enclave pages. Signed-off-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Link: https://lkml.kernel.org/r/57ae71b4ea17998467670232e12d6617b95c6811.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Export sgx_encl_{grow,shrink}()	Reinette Chatre	2	-2/+4
	In order to use sgx_encl_{grow,shrink}() in the page augmentation code located in encl.c, export these functions. Suggested-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/d51730acf54b6565710b2261b3099517b38c2ec4.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Keep record of SGX page type	Reinette Chatre	2	-1/+4
	SGX2 functions are not allowed on all page types. For example, ENCLS[EMODPR] is only allowed on regular SGX enclave pages and ENCLS[EMODPT] is only allowed on TCS and regular pages. If these functions are attempted on another type of page the hardware would trigger a fault. Keep a record of the SGX page type so that there is more certainty whether an SGX2 instruction can succeed and faults can be treated as real failures. The page type is a property of struct sgx_encl_page and thus does not cover the VA page type. VA pages are maintained in separate structures and their type can be determined in a different way. The SGX2 instructions needing the page type do not operate on VA pages and this is thus not a scenario needing to be covered at this time. struct sgx_encl_page hosting this information is maintained for each enclave page so the space consumed by the struct is important. The existing sgx_encl_page->vm_max_prot_bits is already unsigned long while only using three bits. Transition to a bitfield for the two members to support the additional information without increasing the space consumed by the struct. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/a0a6939eefe7ba26514f6c49723521cde372de64.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Create utility to validate user provided offset and length	Reinette Chatre	1	-6/+22
	User provided offset and length is validated when parsing the parameters of the SGX_IOC_ENCLAVE_ADD_PAGES ioctl(). Extract this validation (with consistent use of IS_ALIGNED) into a utility that can be used by the SGX2 ioctl()s that will also provide these values. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/767147bc100047abed47fe27c592901adfbb93a2.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Make sgx_ipi_cb() available internally	Reinette Chatre	2	-1/+3
	The ETRACK function followed by an IPI to all CPUs within an enclave is a common pattern with more frequent use in support of SGX2. Make the (empty) IPI callback function available internally in preparation for usage by SGX2. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/1179ed4a9c3c1c2abf49d51bfcf2c30b493181cc.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Move PTE zap code to new sgx_zap_enclave_ptes()	Reinette Chatre	3	-31/+47
	The SGX reclaimer removes page table entries pointing to pages that are moved to swap. SGX2 enables changes to pages belonging to an initialized enclave, thus enclave pages may have their permission or type changed while the page is being accessed by an enclave. Supporting SGX2 requires page table entries to be removed so that any cached mappings to changed pages are removed. For example, with the ability to change enclave page types a regular enclave page may be changed to a Thread Control Structure (TCS) page that may not be accessed by an enclave. Factor out the code removing page table entries to a separate function sgx_zap_enclave_ptes(), fixing accuracy of comments in the process, and make it available to the upcoming SGX2 code. Place sgx_zap_enclave_ptes() with the rest of the enclave code in encl.c interacting with the page table since this code is no longer unique to the reclaimer. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/b010cdf01d7ce55dd0f00e883b7ccbd9db57160a.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Rename sgx_encl_ewb_cpumask() as sgx_encl_cpumask()	Reinette Chatre	3	-5/+5
	sgx_encl_ewb_cpumask() is no longer unique to the reclaimer where it is used during the EWB ENCLS leaf function when EPC pages are written out to main memory and sgx_encl_ewb_cpumask() is used to learn which CPUs might have executed the enclave to ensure that TLBs are cleared. Upcoming SGX2 enabling will use sgx_encl_ewb_cpumask() during the EMODPR and EMODT ENCLS leaf functions that make changes to enclave pages. The function is needed for the same reason it is used now: to learn which CPUs might have executed the enclave to ensure that TLBs no longer point to the changed pages. Rename sgx_encl_ewb_cpumask() to sgx_encl_cpumask() to reflect the broader usage. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/d4d08c449450a13d8dd3bb6c2b1af03895586d4f.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Export sgx_encl_ewb_cpumask()	Reinette Chatre	3	-29/+68
	Using sgx_encl_ewb_cpumask() to learn which CPUs might have executed an enclave is useful to ensure that TLBs are cleared when changes are made to enclave pages. sgx_encl_ewb_cpumask() is used within the reclaimer when an enclave page is evicted. The upcoming SGX2 support enables changes to be made to enclave pages and will require TLBs to not refer to the changed pages and thus will be needing sgx_encl_ewb_cpumask(). Relocate sgx_encl_ewb_cpumask() to be with the rest of the enclave code in encl.c now that it is no longer unique to the reclaimer. Take care to ensure that any future usage maintains the current context requirement that ETRACK has been called first. Expand the existing comments to highlight this while moving them to a more prominent location before the function. No functional change. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/05b60747fd45130cf9fc6edb1c373a69a18a22c5.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Support loading enclave page without VMA permissions check	Reinette Chatre	2	-19/+40
	sgx_encl_load_page() is used to find and load an enclave page into enclave (EPC) memory, potentially loading it from the backing storage. Both usages of sgx_encl_load_page() are during an access to the enclave page from a VMA and thus the permissions of the VMA are considered before the enclave page is loaded. SGX2 functions operating on enclave pages belonging to an initialized enclave requiring the page to be in EPC. It is thus required to support loading enclave pages into the EPC independent from a VMA. Split the current sgx_encl_load_page() to support the two usages: A new call, sgx_encl_load_page_in_vma(), behaves exactly like the current sgx_encl_load_page() that takes VMA permissions into account, while sgx_encl_load_page() just loads an enclave page into EPC. VMA, PTE, and EPCM permissions continue to dictate whether the pages can be accessed from within an enclave. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/d4393513c1f18987c14a490bcf133bfb71a5dc43.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Add wrapper for SGX2 EAUG function	Reinette Chatre	1	-0/+6
	Add a wrapper for the EAUG ENCLS leaf function used to add a page to an initialized enclave. EAUG: 1) Stores all properties of the new enclave page in the SGX hardware's Enclave Page Cache Map (EPCM). 2) Sets the PENDING bit in the EPCM entry of the enclave page. This bit is cleared by the enclave by invoking ENCLU leaf function EACCEPT or EACCEPTCOPY. Access from within the enclave to the new enclave page is not possible until the PENDING bit is cleared. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/97a46754fe4764e908651df63694fb760f783d6e.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Add wrapper for SGX2 EMODT function	Reinette Chatre	1	-0/+6
	Add a wrapper for the EMODT ENCLS leaf function used to change the type of an enclave page as maintained in the SGX hardware's Enclave Page Cache Map (EPCM). EMODT: 1) Updates the EPCM page type of the enclave page. 2) Sets the MODIFIED bit in the EPCM entry of the enclave page. This bit is reset by the enclave by invoking ENCLU leaf function EACCEPT or EACCEPTCOPY. Access from within the enclave to the enclave page is not possible while the MODIFIED bit is set. After changing the enclave page type by issuing EMODT the kernel needs to collaborate with the hardware to ensure that no logical processor continues to hold a reference to the changed page. This is required to ensure no required security checks are circumvented and is required for the enclave's EACCEPT/EACCEPTCOPY to succeed. Ensuring that no references to the changed page remain is accomplished with the ETRACK flow. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/dba63a8c0db1d510b940beee1ba2a8207efeb1f1.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Add wrapper for SGX2 EMODPR function	Reinette Chatre	1	-0/+6
	Add a wrapper for the EMODPR ENCLS leaf function used to restrict enclave page permissions as maintained in the SGX hardware's Enclave Page Cache Map (EPCM). EMODPR: 1) Updates the EPCM permissions of an enclave page by treating the new permissions as a mask. Supplying a value that attempts to relax EPCM permissions has no effect on EPCM permissions (PR bit, see below, is changed). 2) Sets the PR bit in the EPCM entry of the enclave page to indicate that permission restriction is in progress. The bit is reset by the enclave by invoking ENCLU leaf function EACCEPT or EACCEPTCOPY. The enclave may access the page throughout the entire process if conforming to the EPCM permissions for the enclave page. After performing the permission restriction by issuing EMODPR the kernel needs to collaborate with the hardware to ensure that all logical processors sees the new restricted permissions. This is required for the enclave's EACCEPT/EACCEPTCOPY to succeed and is accomplished with the ETRACK flow. Expand enum sgx_return_code with the possible EMODPR return values. Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/d15e7a769e13e4ca671fa2d0a0d3e3aec5aedbd4.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/sgx: Add short descriptions to ENCLS wrappers	Reinette Chatre	1	-0/+15
	The SGX ENCLS instruction uses EAX to specify an SGX function and may require additional registers, depending on the SGX function. ENCLS invokes the specified privileged SGX function for managing and debugging enclaves. Macros are used to wrap the ENCLS functionality and several wrappers are used to wrap the macros to make the different SGX functions accessible in the code. The wrappers of the supported SGX functions are cryptic. Add short descriptions of each as a comment. Suggested-by: Dave Hansen <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Link: https://lkml.kernel.org/r/5e78a1126711cbd692d5b8132e0683873398f69e.1652137848.git.reinette.chatre@intel.com
2022-07-07	x86/bugs: Add Cannon lake to RETBleed affected CPU list	Pawan Gupta	1	-0/+1
	Cannon lake is also affected by RETBleed, add it to the list. Fixes: 6ad0ad2bf8a6 ("x86/bugs: Report Intel retbleed vulnerability") Signed-off-by: Pawan Gupta <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-29	x86/retbleed: Add fine grained Kconfig knobs	Peter Zijlstra	2	-15/+29
	Do fine-grained Kconfig for all the various retbleed parts. NOTE: if your compiler doesn't support return thunks this will silently 'upgrade' your mitigation to IBPB, you might not like this. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-28	x86/mce: Check whether writes to MCA_STATUS are getting ignored	Smita Koralahalli	2	-1/+48
	The platform can sometimes - depending on its settings - cause writes to MCA_STATUS MSRs to get ignored, regardless of HWCR[McStatusWrEn]'s value. For further info see PPR for AMD Family 19h, Model 01h, Revision B1 Processors, doc ID 55898 at https://bugzilla.kernel.org/show_bug.cgi?id=206537. Therefore, probe for ignored writes to MCA_STATUS to determine if hardware error injection is at all possible. [ bp: Heavily massage commit message and patch. ] Signed-off-by: Smita Koralahalli <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-27	x86/cpu/amd: Enumerate BTC_NO	Andrew Cooper	2	-8/+19
	BTC_NO indicates that hardware is not susceptible to Branch Type Confusion. Zen3 CPUs don't suffer BTC. Hypervisors are expected to synthesise BTC_NO when it is appropriate given the migration pool, to prevent kernels using heuristics. [ bp: Massage. ] Signed-off-by: Andrew Cooper <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/common: Stamp out the stepping madness	Peter Zijlstra	1	-21/+16
	The whole MMIO/RETBLEED enumeration went overboard on steppings. Get rid of all that and simply use ANY. If a future stepping of these models would not be affected, it had better set the relevant ARCH_CAP_$FOO_NO bit in IA32_ARCH_CAPABILITIES. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Dave Hansen <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	KVM: VMX: Prevent RSB underflow before vmenter	Josh Poimboeuf	1	-2/+2
	On VMX, there are some balanced returns between the time the guest's SPEC_CTRL value is written, and the vmenter. Balanced returns (matched by a preceding call) are usually ok, but it's at least theoretically possible an NMI with a deep call stack could empty the RSB before one of the returns. For maximum paranoia, don't allow any returns (balanced or otherwise) between the SPEC_CTRL write and the vmenter. [ bp: Fix 32-bit build. ] Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/speculation: Fill RSB on vmexit for IBRS	Josh Poimboeuf	1	-5/+58
	Prevent RSB underflow/poisoning attacks with RSB. While at it, add a bunch of comments to attempt to document the current state of tribal knowledge about RSB attacks and what exactly is being mitigated. Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS	Josh Poimboeuf	1	-0/+4
	On eIBRS systems, the returns in the vmexit return path from __vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks. Fix that by moving the post-vmexit spec_ctrl handling to immediately after the vmexit. Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/speculation: Remove x86_spec_ctrl_mask	Josh Poimboeuf	1	-30/+1
	This mask has been made redundant by kvm_spec_ctrl_test_value(). And it doesn't even work when MSR interception is disabled, as the guest can just write to SPEC_CTRL directly. Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Paolo Bonzini <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit	Josh Poimboeuf	1	-11/+1
	There's no need to recalculate the host value for every entry/exit. Just use the cached value in spec_ctrl_current(). Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/speculation: Fix SPEC_CTRL write on SMT state change	Josh Poimboeuf	1	-1/+2
	If the SMT state changes, SSBD might get accidentally disabled. Fix that. Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/cpu/amd: Add Spectral Chicken	Peter Zijlstra	3	-1/+30
	Zen2 uarchs have an undocumented, unnamed, MSR that contains a chicken bit for some speculation behaviour. It needs setting. Note: very belatedly AMD released naming; it's now officially called MSR_AMD64_DE_CFG2 and MSR_AMD64_DE_CFG2_SUPPRESS_NOBR_PRED_BIT but shall remain the SPECTRAL CHICKEN. Suggested-by: Andrew Cooper <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/bugs: Do IBPB fallback check only once	Josh Poimboeuf	1	-10/+5
	When booting with retbleed=auto, if the kernel wasn't built with CONFIG_CC_HAS_RETURN_THUNK, the mitigation falls back to IBPB. Make sure a warning is printed in that case. The IBPB fallback check is done twice, but it really only needs to be done once. Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/bugs: Add retbleed=ibpb	Peter Zijlstra	1	-9/+34
	jmp2ret mitigates the easy-to-attack case at relatively low overhead. It mitigates the long speculation windows after a mispredicted RET, but it does not mitigate the short speculation window from arbitrary instruction boundaries. On Zen2, there is a chicken bit which needs setting, which mitigates "arbitrary instruction boundaries" down to just "basic block boundaries". But there is no fix for the short speculation window on basic block boundaries, other than to flush the entire BTB to evict all attacker predictions. On the spectrum of "fast & blurry" -> "safe", there is (on top of STIBP or no-SMT): 1) Nothing System wide open 2) jmp2ret May stop a script kiddy 3) jmp2ret+chickenbit Raises the bar rather further 4) IBPB Only thing which can count as "safe". Tentative numbers put IBPB-on-entry at a 2.5x hit on Zen2, and a 10x hit on Zen1 according to lmbench. [ bp: Fixup feature bit comments, document option, 32-bit build fix. ] Suggested-by: Andrew Cooper <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	intel_idle: Disable IBRS during long idle	Peter Zijlstra	1	-0/+6
	Having IBRS enabled while the SMT sibling is idle unnecessarily slows down the running sibling. OTOH, disabling IBRS around idle takes two MSR writes, which will increase the idle latency. Therefore, only disable IBRS around deeper idle states. Shallow idle states are bounded by the tick in duration, since NOHZ is not allowed for them by virtue of their short target residency. Only do this for mwait-driven idle, since that keeps interrupts disabled across idle, which makes disabling IBRS vs IRQ-entry a non-issue. Note: C6 is a random threshold, most importantly C1 probably shouldn't disable IBRS, benchmarking needed. Suggested-by: Tim Chen <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/bugs: Report Intel retbleed vulnerability	Peter Zijlstra	2	-18/+45
	Skylake suffers from RSB underflow speculation issues; report this vulnerability and it's mitigation (spectre_v2=ibrs). [jpoimboe: cleanups, eibrs] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/bugs: Split spectre_v2_select_mitigation() and ↵	Peter Zijlstra	1	-8/+17
	spectre_v2_user_select_mitigation() retbleed will depend on spectre_v2, while spectre_v2_user depends on retbleed. Break this cycle. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS	Pawan Gupta	1	-14/+52
	Extend spectre_v2= boot option with Kernel IBRS. [jpoimboe: no STIBP with IBRS] Signed-off-by: Pawan Gupta <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/bugs: Optimize SPEC_CTRL MSR writes	Peter Zijlstra	1	-6/+12
	When changing SPEC_CTRL for user control, the WRMSR can be delayed until return-to-user when KERNEL_IBRS has been enabled. This avoids an MSR write during context switch. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value	Peter Zijlstra	1	-5/+23
	Due to TIF_SSBD and TIF_SPEC_IB the actual IA32_SPEC_CTRL value can differ from x86_spec_ctrl_base. As such, keep a per-CPU value reflecting the current task's MSR content. [jpoimboe: rename] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/bugs: Enable STIBP for JMP2RET	Kim Phillips	1	-12/+46
	For untrained return thunks to be fully effective, STIBP must be enabled or SMT disabled. Co-developed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Kim Phillips <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/bugs: Add AMD retbleed= boot parameter	Alexandre Chartre	1	-1/+107
	Add the "retbleed=<value>" boot parameter to select a mitigation for RETBleed. Possible values are "off", "auto" and "unret" (JMP2RET mitigation). The default value is "auto". Currently, "retbleed=auto" will select the unret mitigation on AMD and Hygon and no mitigation on Intel (JMP2RET is not effective on Intel). [peterz: rebase; add hygon] [jpoimboe: cleanups] Signed-off-by: Alexandre Chartre <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-27	x86/bugs: Report AMD retbleed vulnerability	Alexandre Chartre	2	-0/+32
	Report that AMD x86 CPUs are vulnerable to the RETBleed (Arbitrary Speculative Code Execution with Return Instructions) attack. [peterz: add hygon] [kim: invert parity; fam15h] Co-developed-by: Kim Phillips <[email protected]> Signed-off-by: Kim Phillips <[email protected]> Signed-off-by: Alexandre Chartre <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Borislav Petkov <[email protected]>
2022-06-22	x86/vmware: Use BIT() macro for shifting	Shreenidhi Shedi	1	-2/+2
	VMWARE_CMD_VCPU_RESERVED is bit 31 and that would mean undefined behavior when shifting an int but the kernel is built with -fno-strict-overflow which will wrap around using two's complement. Use the BIT() macro to improve readability and avoid any potential overflow confusion because it uses an unsigned long. [ bp: Clarify commit message. ] Signed-off-by: Shreenidhi Shedi <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Srivatsa S. Bhat (VMware) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-14	Merge tag 'x86-bugs-2022-06-01' of ↵	Linus Torvalds	2	-39/+248
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MMIO stale data fixes from Thomas Gleixner: "Yet another hw vulnerability with a software mitigation: Processor MMIO Stale Data. They are a class of MMIO-related weaknesses which can expose stale data by propagating it into core fill buffers. Data which can then be leaked using the usual speculative execution methods. Mitigations include this set along with microcode updates and are similar to MDS and TAA vulnerabilities: VERW now clears those buffers too" * tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation/mmio: Print SMT warning KVM: x86/speculation: Disable Fill buffer clear within guests x86/speculation/mmio: Reuse SRBDS mitigation for SBDS x86/speculation/srbds: Update SRBDS mitigation selection x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data x86/speculation/mmio: Enable CPU Fill buffer clearing on idle x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data x86/speculation: Add a common function for MD_CLEAR mitigation update x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug Documentation: Add documentation for Processor MMIO Stale Data
2022-06-08	x86/cpu: Add new VMX feature, Tertiary VM-Execution control	Robert Hoo	1	-1/+8
	A new 64-bit control field "tertiary processor-based VM-execution controls", is defined [1]. It's controlled by bit 17 of the primary processor-based VM-execution controls. Different from its brother VM-execution fields, this tertiary VM- execution controls field is 64 bit. So it occupies 2 vmx_feature_leafs, TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH. Its companion VMX capability reporting MSR,MSR_IA32_VMX_PROCBASED_CTLS3 (0x492), is also semantically different from its brothers, whose 64 bits consist of all allow-1, rather than 32-bit allow-0 and 32-bit allow-1 [1][2]. Therefore, its init_vmx_capabilities() is a little different from others. [1] ISE 6.2 "VMCS Changes" https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html [2] SDM Vol3. Appendix A.3 Reviewed-by: Sean Christopherson <[email protected]> Reviewed-by: Maxim Levitsky <[email protected]> Signed-off-by: Robert Hoo <[email protected]> Signed-off-by: Zeng Guang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-05	Merge tag 'x86-urgent-2022-06-05' of ↵	Linus Torvalds	3	-6/+115
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SGX fix from Thomas Gleixner: "A single fix for x86/SGX to prevent that memory which is allocated for an SGX enclave is accounted to the wrong memory control group" * tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Set active memcg prior to shmem allocation