aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu
AgeCommit message (Collapse)AuthorFilesLines
2024-01-08Merge tag 'x86_cpu_for_v6.8' of ↵Linus Torvalds4-135/+145
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu feature updates from Borislav Petkov: - Add synthetic X86_FEATURE flags for the different AMD Zen generations and use them everywhere instead of ad-hoc family/model checks. Drop an ancient AMD errata checking facility as a result - Fix a fragile initcall ordering in intel_epb - Do not issue the MFENCE+LFENCE barrier for the TSC deadline and X2APIC MSRs on AMD as it is not needed there * tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/CPU/AMD: Add X86_FEATURE_ZEN1 x86/CPU/AMD: Drop now unused CPU erratum checking function x86/CPU/AMD: Get rid of amd_erratum_1485[] x86/CPU/AMD: Get rid of amd_erratum_400[] x86/CPU/AMD: Get rid of amd_erratum_383[] x86/CPU/AMD: Get rid of amd_erratum_1054[] x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function x86/CPU/AMD: Move Zenbleed check to the Zen2 init function x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common() x86/CPU/AMD: Call the spectral chicken in the Zen2 init function x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init function x86/CPU/AMD: Carve out the erratum 1386 fix x86/CPU/AMD: Add ZenX generations flags x86/cpu/intel_epb: Don't rely on link order x86/barrier: Do not serialize MSR accesses on AMD
2024-01-08Merge tag 'x86_microcode_for_v6.8' of ↵Linus Torvalds1-13/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode updates from Borislav Petkov: - Correct minor issues after the microcode revision reporting sanitization * tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/intel: Set new revision only after a successful update x86/microcode/intel: Remove redundant microcode late updated message
2024-01-03arch/x86: Fix typosBjorn Helgaas1-1/+1
Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes. Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-20x86/asm: Always set A (accessed) flag in GDT descriptorsVegard Nossum1-6/+6
We have no known use for having the CPU track whether GDT descriptors have been accessed or not. Simplify the code by adding the flag to the common flags and removing it everywhere else. Signed-off-by: Vegard Nossum <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-20x86/asm: Replace magic numbers in GDT descriptors, script-generated changeVegard Nossum1-20/+20
Actually replace the numeric values by the new symbolic values. I used this to find all the existing users of the GDT_ENTRY*() macros: $ git grep -P 'GDT_ENTRY(_INIT)?\(' Some of the lines will exceed 80 characters, but some of them will be shorter again in the next couple of patches. Signed-off-by: Vegard Nossum <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-20x86/asm: Replace magic numbers in GDT descriptors, preparationsVegard Nossum1-8/+0
We'd like to replace all the magic numbers in various GDT descriptors with new, semantically meaningful, symbolic values. In order to be able to verify that the change doesn't cause any actual changes to the compiled binary code, I've split the change into two patches: - Part 1 (this commit): everything _but_ actually replacing the numbers - Part 2 (the following commit): _only_ replacing the numbers The reason we need this split for verification is that including new headers causes some spurious changes to the object files, mostly line number changes in the debug info but occasionally other subtle codegen changes. Signed-off-by: Vegard Nossum <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-15x86/mce: Handle Intel threshold interrupt stormsTony Luck3-50/+160
Add an Intel specific hook into machine_check_poll() to keep track of per-CPU, per-bank corrected error logs (with a stub for the CONFIG_MCE_INTEL=n case). When a storm is observed the rate of interrupts is reduced by setting a large threshold value for this bank in IA32_MCi_CTL2. This bank is added to the bitmap of banks for this CPU to poll. The polling rate is increased to once per second. When a storm ends reset the threshold in IA32_MCi_CTL2 back to 1, remove the bank from the bitmap for polling, and change the polling rate back to the default. If a CPU with banks in storm mode is taken offline, the new CPU that inherits ownership of those banks takes over management of storm(s) in the inherited bank(s). The cmci_discover() function was already very large. These changes pushed it well over the top. Refactor with three helper functions to bring it back under control. Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-15x86/mce: Add per-bank CMCI storm mitigationTony Luck3-9/+194
This is the core functionality to track CMCI storms at the machine check bank granularity. Subsequent patches will add the vendor specific hooks to supply input to the storm detection and take actions on the start/end of a storm. machine_check_poll() is called both by the CMCI interrupt code, and for periodic polls from a timer. Add a hook in this routine to maintain a bitmap history for each bank showing whether the bank logged an corrected error or not each time it is polled. In normal operation the interval between polls of these banks determines how far to shift the history. The 64 bit width corresponds to about one second. When a storm is observed a CPU vendor specific action is taken to reduce or stop CMCI from the bank that is the source of the storm. The bank is added to the bitmap of banks for this CPU to poll. The polling rate is increased to once per second. During a storm each bit in the history indicates the status of the bank each time it is polled. Thus the history covers just over a minute. Declare a storm for that bank if the number of corrected interrupts seen in that history is above some threshold (defined as 5 in this series, could be tuned later if there is data to suggest a better value). A storm on a bank ends if enough consecutive polls of the bank show no corrected errors (defined as 30, may also change). That calls the CPU vendor specific function to revert to normal operational mode, and changes the polling rate back to the default. [ bp: Massage. ] Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-15x86/mce: Remove old CMCI storm mitigation codeTony Luck3-170/+1
When a "storm" of corrected machine check interrupts (CMCI) is detected this code mitigates by disabling CMCI interrupt signalling from all of the banks owned by the CPU that saw the storm. There are problems with this approach: 1) It is very coarse grained. In all likelihood only one of the banks was generating the interrupts, but CMCI is disabled for all. This means Linux may delay seeing and processing errors logged from other banks. 2) Although CMCI stands for Corrected Machine Check Interrupt, it is also used to signal when an uncorrected error is logged. This is a problem because these errors should be handled in a timely manner. Delete all this code in preparation for a finer grained solution. Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Yazen Ghannam <[email protected]> Tested-by: Yazen Ghannam <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-12x86/mce: Differentiate real hardware #MCs from TDX erratum onesKai Huang1-0/+15
The first few generations of TDX hardware have an erratum. Triggering it in Linux requires some kind of kernel bug involving relatively exotic memory writes to TDX private memory and will manifest via spurious-looking machine checks when reading the affected memory. Make an effort to detect these TDX-induced machine checks and spit out a new blurb to dmesg so folks do not think their hardware is failing. == Background == Virtually all kernel memory accesses operations happen in full cachelines. In practice, writing a "byte" of memory usually reads a 64 byte cacheline of memory, modifies it, then writes the whole line back. Those operations do not trigger this problem. This problem is triggered by "partial" writes where a write transaction of less than cacheline lands at the memory controller. The CPU does these via non-temporal write instructions (like MOVNTI), or through UC/WC memory mappings. The issue can also be triggered away from the CPU by devices doing partial writes via DMA. == Problem == A partial write to a TDX private memory cacheline will silently "poison" the line. Subsequent reads will consume the poison and generate a machine check. According to the TDX hardware spec, neither of these things should have happened. To add insult to injury, the Linux machine code will present these as a literal "Hardware error" when they were, in fact, a software-triggered issue. == Solution == In the end, this issue is hard to trigger. Rather than do something rash (and incomplete) like unmap TDX private memory from the direct map, improve the machine check handler. Currently, the #MC handler doesn't distinguish whether the memory is TDX private memory or not but just dump, for instance, below message: [...] mce: [Hardware Error]: CPU 147: Machine Check Exception: f Bank 1: bd80000000100134 [...] mce: [Hardware Error]: RIP 10:<ffffffffadb69870> {__tlb_remove_page_size+0x10/0xa0} ... [...] mce: [Hardware Error]: Run the above through 'mcelog --ascii' [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel [...] Kernel panic - not syncing: Fatal local machine check Which says "Hardware Error" and "Data load in unrecoverable area of kernel". Ideally, it's better for the log to say "software bug around TDX private memory" instead of "Hardware Error". But in reality the real hardware memory error can happen, and sadly such software-triggered #MC cannot be distinguished from the real hardware error. Also, the error message is used by userspace tool 'mcelog' to parse, so changing the output may break userspace. So keep the "Hardware Error". The "Data load in unrecoverable area of kernel" is also helpful, so keep it too. Instead of modifying above error log, improve the error log by printing additional TDX related message to make the log like: ... [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel [...] mce: [Hardware Error]: Machine Check: TDX private memory error. Possible kernel bug. Adding this additional message requires determination of whether the memory page is TDX private memory. There is no existing infrastructure to do that. Add an interface to query the TDX module to fill this gap. == Impact == This issue requires some kind of kernel bug to trigger. TDX private memory should never be mapped UC/WC. A partial write originating from these mappings would require *two* bugs, first mapping the wrong page, then writing the wrong memory. It would also be detectable using traditional memory corruption techniques like DEBUG_PAGEALLOC. MOVNTI (and friends) could cause this issue with something like a simple buffer overrun or use-after-free on the direct map. It should also be detectable with normal debug techniques. The one place where this might get nasty would be if the CPU read data then wrote back the same data. That would trigger this problem but would not, for instance, set off mechanisms like slab redzoning because it doesn't actually corrupt data. With an IOMMU at least, the DMA exposure is similar to the UC/WC issue. TDX private memory would first need to be incorrectly mapped into the I/O space and then a later DMA to that mapping would actually cause the poisoning event. [ dhansen: changelog tweaks ] Signed-off-by: Kai Huang <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Yuan Yao <[email protected]> Reviewed-by: Dave Hansen <[email protected]> Reviewed-by: Tony Luck <[email protected]> Link: https://lore.kernel.org/all/20231208170740.53979-18-dave.hansen%40intel.com
2023-12-12x86/CPU/AMD: Add X86_FEATURE_ZEN1Borislav Petkov (AMD)1-5/+6
Add a synthetic feature flag specifically for first generation Zen machines. There's need to have a generic flag for all Zen generations so make X86_FEATURE_ZEN be that flag. Fixes: 30fa92832f40 ("x86/CPU/AMD: Add ZenX generations flags") Suggested-by: Brian Gerst <[email protected]> Suggested-by: Tom Lendacky <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-08x86/virt/tdx: Detect TDX during kernel bootKai Huang1-0/+2
Intel Trust Domain Extensions (TDX) protects guest VMs from malicious host and certain physical attacks. A CPU-attested software module called 'the TDX module' runs inside a new isolated memory range as a trusted hypervisor to manage and run protected VMs. Pre-TDX Intel hardware has support for a memory encryption architecture called MKTME. The memory encryption hardware underpinning MKTME is also used for Intel TDX. TDX ends up "stealing" some of the physical address space from the MKTME architecture for crypto-protection to VMs. The BIOS is responsible for partitioning the "KeyID" space between legacy MKTME and TDX. The KeyIDs reserved for TDX are called 'TDX private KeyIDs' or 'TDX KeyIDs' for short. During machine boot, TDX microcode verifies that the BIOS programmed TDX private KeyIDs consistently and correctly programmed across all CPU packages. The MSRs are locked in this state after verification. This is why MSR_IA32_MKTME_KEYID_PARTITIONING gets used for TDX enumeration: it indicates not just that the hardware supports TDX, but that all the boot-time security checks passed. The TDX module is expected to be loaded by the BIOS when it enables TDX, but the kernel needs to properly initialize it before it can be used to create and run any TDX guests. The TDX module will be initialized by the KVM subsystem when KVM wants to use TDX. Detect platform TDX support by detecting TDX private KeyIDs. The TDX module itself requires one TDX KeyID as the 'TDX global KeyID' to protect its metadata. Each TDX guest also needs a TDX KeyID for its own protection. Just use the first TDX KeyID as the global KeyID and leave the rest for TDX guests. If no TDX KeyID is left for TDX guests, disable TDX as initializing the TDX module alone is useless. [ dhansen: add X86_FEATURE, replace helper function ] Signed-off-by: Kai Huang <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Isaku Yamahata <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Dave Hansen <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]> Link: https://lore.kernel.org/all/20231208170740.53979-1-dave.hansen%40intel.com
2023-12-03x86/microcode/intel: Set new revision only after a successful updateBorislav Petkov (AMD)1-7/+7
This was meant to be done only when early microcode got updated successfully. Move it into the if-branch. Also, make sure the current revision is read unconditionally and only once. Fixes: 080990aa3344 ("x86/microcode: Rework early revisions reporting") Reported-by: Ashok Raj <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Ashok Raj <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-02x86/CPU/AMD: Check vendor in the AMD microcode callbackBorislav Petkov (AMD)1-0/+3
Commit in Fixes added an AMD-specific microcode callback. However, it didn't check the CPU vendor the kernel runs on explicitly. The only reason the Zenbleed check in it didn't run on other x86 vendors hardware was pure coincidental luck: if (!cpu_has_amd_erratum(c, amd_zenbleed)) return; gives true on other vendors because they don't have those families and models. However, with the removal of the cpu_has_amd_erratum() in 05f5f73936fa ("x86/CPU/AMD: Drop now unused CPU erratum checking function") that coincidental condition is gone, leading to the zenbleed check getting executed on other vendors too. Add the explicit vendor check for the whole callback as it should've been done in the first place. Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix") Cc: <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-01x86/microcode/intel: Remove redundant microcode late updated messageAshok Raj1-6/+0
After successful update, the late loading routine prints an update summary similar to: microcode: load: updated on 128 primary CPUs with 128 siblings microcode: revision: 0x21000170 -> 0x21000190 Remove the redundant message in the Intel side of the driver. [ bp: Massage commit message. ] Signed-off-by: Ashok Raj <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Drop now unused CPU erratum checking functionBorislav Petkov (AMD)1-56/+0
Bye bye. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Get rid of amd_erratum_1485[]Borislav Petkov (AMD)1-8/+3
No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Get rid of amd_erratum_400[]Borislav Petkov (AMD)1-13/+20
Setting X86_BUG_AMD_E400 in init_amd() is early enough. No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Get rid of amd_erratum_383[]Borislav Petkov (AMD)1-5/+1
Set it in init_amd_gh() unconditionally as that is the F10h init function. No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Get rid of amd_erratum_1054[]Borislav Petkov (AMD)1-5/+1
No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init functionBorislav Petkov (AMD)1-10/+3
No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Move Zenbleed check to the Zen2 init functionBorislav Petkov (AMD)1-13/+3
Prefix it properly so that it is clear which generation it is dealing with. No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common()Borislav Petkov (AMD)1-4/+7
Call it from all Zen init functions. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Call the spectral chicken in the Zen2 init functionBorislav Petkov (AMD)1-4/+3
No functional change. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init functionBorislav Petkov (AMD)1-5/+5
No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init functionBorislav Petkov (AMD)1-8/+9
No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Carve out the erratum 1386 fixBorislav Petkov (AMD)1-9/+15
Call it on the affected CPU generations. No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Add ZenX generations flagsBorislav Petkov (AMD)1-2/+68
Add X86_FEATURE flags for each Zen generation. They should be used from now on instead of checking f/m/s. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-28x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank typesMuralidhara M K1-0/+6
Add HWID and McaType values for new SMCA bank types. Signed-off-by: Muralidhara M K <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-27x86/mce/amd, EDAC/mce_amd: Move long names to decoder moduleYazen Ghannam1-44/+30
The long names of the SMCA banks are only used by the MCE decoder module. Move them out of the arch code and into the decoder module. [ bp: Name the long names array "smca_long_names", drop local ptr in decode_smca_error(), constify arrays. ] Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-26Merge tag 'x86-urgent-2023-11-26' of ↵Linus Torvalds4-48/+37
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode fixes from Ingo Molnar: "Fix/enhance x86 microcode version reporting: fix the bootup log spam, and remove the driver version announcement to avoid version confusion when distros backport fixes" * tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Rework early revisions reporting x86/microcode: Remove the driver announcement and version
2023-11-24x86/cpu/intel_epb: Don't rely on link orderJames Morse1-1/+1
intel_epb_init() is called as a subsys_initcall() to register cpuhp callbacks. The callbacks make use of get_cpu_device() which will return NULL unless register_cpu() has been called. register_cpu() is called from topology_init(), which is also a subsys_initcall(). This is fragile. Moving the register_cpu() to a different subsys_initcall() leads to a NULL dereference during boot. Make intel_epb_init() a late_initcall(), user-space can't provide a policy before this point anyway. Signed-off-by: James Morse <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Gavin Shan <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]>
2023-11-22x86/mce/inject: Clear test status valueYazen Ghannam1-0/+1
AMD systems generally allow MCA "simulation" where MCA registers can be written with valid data and the full MCA handling flow can be tested by software. However, the platform on Scalable MCA systems, can prevent software from writing data to the MCA registers. There is no architectural way to determine this configuration. Therefore, the MCE injection module will check for this behavior by writing and reading back a test status value. This is done during module init, and the check can run on any CPU with any valid MCA bank. If MCA_STATUS writes are ignored by the platform, then there are no side effects on the hardware state. If the writes are not ignored, then the test status value will remain in the hardware MCA_STATUS register. It is likely that the value will not be overwritten by hardware or software, since the tested CPU and bank are arbitrary. Therefore, the user may see a spurious, synthetic MCA error reported whenever MCA is polled for this CPU. Clear the test value immediately after writing it. It is very unlikely that a valid MCA error is logged by hardware during the test. Errors that cause an #MC won't be affected. Fixes: 891e465a1bd8 ("x86/mce: Check whether writes to MCA_STATUS are getting ignored") Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-22Merge tag 'hyperv-fixes-signed-20231121' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - One fix for the KVP daemon (Ani Sinha) - Fix for the detection of E820_TYPE_PRAM in a Gen2 VM (Saurabh Sengar) - Micro-optimization for hv_nmi_unknown() (Uros Bizjak) * tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown() x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM hv/hv_kvp_daemon: Some small fixes for handling NM keyfiles
2023-11-22x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()Uros Bizjak1-1/+4
Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old in hv_nmi_unknown(). On x86 the CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after CMPXCHG. The generated asm code improves from: 3e: 65 8b 15 00 00 00 00 mov %gs:0x0(%rip),%edx 45: b8 ff ff ff ff mov $0xffffffff,%eax 4a: f0 0f b1 15 00 00 00 lock cmpxchg %edx,0x0(%rip) 51: 00 52: 83 f8 ff cmp $0xffffffff,%eax 55: 0f 95 c0 setne %al to: 3e: 65 8b 15 00 00 00 00 mov %gs:0x0(%rip),%edx 45: b8 ff ff ff ff mov $0xffffffff,%eax 4a: f0 0f b1 15 00 00 00 lock cmpxchg %edx,0x0(%rip) 51: 00 52: 0f 95 c0 setne %al No functional change intended. Cc: "K. Y. Srinivasan" <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Wei Liu <[email protected]> Cc: Dexuan Cui <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Uros Bizjak <[email protected]> Reviewed-by: Michael Kelley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]> Message-ID: <[email protected]>
2023-11-21x86/microcode: Rework early revisions reportingBorislav Petkov (AMD)4-44/+37
The AMD side of the loader issues the microcode revision for each logical thread on the system, which can become really noisy on huge machines. And doing that doesn't make a whole lot of sense - the microcode revision is already in /proc/cpuinfo. So in case one is interested in the theoretical support of mixed silicon steppings on AMD, one can check there. What is also missing on the AMD side - something which people have requested before - is showing the microcode revision the CPU had *before* the early update. So abstract that up in the main code and have the BSP on each vendor provide those revision numbers. Then, dump them only once on driver init. On Intel, do not dump the patch date - it is not needed. Reported-by: Linus Torvalds <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/CAHk-=wg=%[email protected]
2023-11-21x86/microcode: Remove the driver announcement and versionBorislav Petkov (AMD)1-4/+0
First of all, the print is useless. The driver will either load and say which microcode revision the machine has or issue an error. Then, the version number is meaningless and actively confusing, as Yazen mentioned recently: when a subset of patches are backported to a distro kernel, one can't assume the driver version is the same as the upstream one. And besides, the version number of the loader hasn't been used and incremented for a long time. So drop it. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-20x86/mtrr: Document missing function parameters in kernel-docBorislav Petkov (AMD)1-4/+10
Add text explaining what they do. No functional changes. Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reported-by: kernel test robot <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-15x86/mce: Remove redundant check from mce_device_create()Nikolay Borisov1-3/+0
mce_device_create() is called only from mce_cpu_online() which in turn will be called iff MCA support is available. That is, at the time of mce_device_create() call it's guaranteed that MCA support is available. No need to duplicate this check so remove it. [ bp: Massage commit message. ] Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-13x86/barrier: Do not serialize MSR accesses on AMDBorislav Petkov (AMD)3-0/+13
AMD does not have the requirement for a synchronization barrier when acccessing a certain group of MSRs. Do not incur that unnecessary penalty there. There will be a CPUID bit which explicitly states that a MFENCE is not needed. Once that bit is added to the APM, this will be extended with it. While at it, move to processor.h to avoid include hell. Untangling that file properly is a matter for another day. Some notes on the performance aspect of why this is relevant, courtesy of Kishon VijayAbraham <[email protected]>: On a AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM shows x2AVIC IPI rate is 3% to 4% lower than AVIC IPI rate. The ipi-bench is modified so that the IPIs are sent between two vCPUs in the same CCX. This also requires to pin the vCPU to a physical core to prevent any latencies. This simulates the use case of pinning vCPUs to the thread of a single CCX to avoid interrupt IPI latency. In order to avoid run-to-run variance (for both x2AVIC and AVIC), the below configurations are done: 1) Disable Power States in BIOS (to prevent the system from going to lower power state) 2) Run the system at fixed frequency 2500MHz (to prevent the system from increasing the frequency when the load is more) With the above configuration: *) Performance measured using ipi-bench for AVIC: Average Latency: 1124.98ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously] *) Performance measured using ipi-bench for x2AVIC: Average Latency: 1172.42ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously] From above, x2AVIC latency is ~4% more than AVIC. However, the expectation is x2AVIC performance to be better or equivalent to AVIC. Upon analyzing the perf captures, it is observed significant time is spent in weak_wrmsr_fence() invoked by x2apic_send_IPI(). With the fix to skip weak_wrmsr_fence() *) Performance measured using ipi-bench for x2AVIC: Average Latency: 1117.44ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously] Comparing the performance of x2AVIC with and without the fix, it can be seen the performance improves by ~4%. Performance captured using an unmodified ipi-bench using the 'mesh-ipi' option with and without weak_wrmsr_fence() on a Zen4 system also showed significant performance improvement without weak_wrmsr_fence(). The 'mesh-ipi' option ignores CCX or CCD and just picks random vCPU. Average throughput (10 iterations) with weak_wrmsr_fence(), Cumulative throughput: 4933374 IPI/s Average throughput (10 iterations) without weak_wrmsr_fence(), Cumulative throughput: 6355156 IPI/s [1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-13x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernelZhiquan Li1-0/+16
Memory errors don't happen very often, especially fatal ones. However, in large-scale scenarios such as data centers, that probability increases with the amount of machines present. When a fatal machine check happens, mce_panic() is called based on the severity grading of that error. The page containing the error is not marked as poison. However, when kexec is enabled, tools like makedumpfile understand when pages are marked as poison and do not touch them so as not to cause a fatal machine check exception again while dumping the previous kernel's memory. Therefore, mark the page containing the error as poisoned so that the kexec'ed kernel can avoid accessing the page. [ bp: Rewrite commit message and comment. ] Co-developed-by: Youquan Song <[email protected]> Signed-off-by: Youquan Song <[email protected]> Signed-off-by: Zhiquan Li <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-04Merge tag 'x86_microcode_for_v6.7_rc1' of ↵Linus Torvalds5-828/+734
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading updates from Borislac Petkov: "Major microcode loader restructuring, cleanup and improvements by Thomas Gleixner: - Restructure the code needed for it and add a temporary initrd mapping on 32-bit so that the loader can access the microcode blobs. This in itself is a preparation for the next major improvement: - Do not load microcode on 32-bit before paging has been enabled. Handling this has caused an endless stream of headaches, issues, ugly code and unnecessary hacks in the past. And there really wasn't any sensible reason to do that in the first place. So switch the 32-bit loading to happen after paging has been enabled and turn the loader code "real purrty" again - Drop mixed microcode steppings loading on Intel - there, a single patch loaded on the whole system is sufficient - Rework late loading to track which CPUs have updated microcode successfully and which haven't, act accordingly - Move late microcode loading on Intel in NMI context in order to guarantee concurrent loading on all threads - Make the late loading CPU-hotplug-safe and have the offlined threads be woken up for the purpose of the update - Add support for a minimum revision which determines whether late microcode loading is safe on a machine and the microcode does not change software visible features which the machine cannot use anyway since feature detection has happened already. Roughly, the minimum revision is the smallest revision number which must be loaded currently on the system so that late updates can be allowed - Other nice leanups, fixess, etc all over the place" * tag 'x86_microcode_for_v6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) x86/microcode/intel: Add a minimum required revision for late loading x86/microcode: Prepare for minimal revision check x86/microcode: Handle "offline" CPUs correctly x86/apic: Provide apic_force_nmi_on_cpu() x86/microcode: Protect against instrumentation x86/microcode: Rendezvous and load in NMI x86/microcode: Replace the all-in-one rendevous handler x86/microcode: Provide new control functions x86/microcode: Add per CPU control field x86/microcode: Add per CPU result state x86/microcode: Sanitize __wait_for_cpus() x86/microcode: Clarify the late load logic x86/microcode: Handle "nosmt" correctly x86/microcode: Clean up mc_cpu_down_prep() x86/microcode: Get rid of the schedule work indirection x86/microcode: Mop up early loading leftovers x86/microcode/amd: Use cached microcode for AP load x86/microcode/amd: Cache builtin/initrd microcode early x86/microcode/amd: Cache builtin microcode too x86/microcode/amd: Use correct per CPU ucode_cpu_info ...
2023-11-01Merge tag 'sysctl-6.7-rc1' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. On the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove the sentinel. Both arch and driver changes have been on linux-next for a bit less than a month. It is worth re-iterating the value: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files For v6.8-rc1 expect removal of all the sentinels and also then the unneeded check for procname == NULL. The last two patches are fixes recently merged by Krister Johansen which allow us again to use softlockup_panic early on boot. This used to work but the alias work broke it. This is useful for folks who want to detect softlockups super early rather than wait and spend money on cloud solutions with nothing but an eventual hung kernel. Although this hadn't gone through linux-next it's also a stable fix, so we might as well roll through the fixes now" * tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits) watchdog: move softlockup_panic back to early_param proc: sysctl: prevent aliased sysctls from getting passed to init intel drm: Remove now superfluous sentinel element from ctl_table array Drivers: hv: Remove now superfluous sentinel element from ctl_table array raid: Remove now superfluous sentinel element from ctl_table array fw loader: Remove the now superfluous sentinel element from ctl_table array sgi-xp: Remove the now superfluous sentinel element from ctl_table array vrf: Remove the now superfluous sentinel element from ctl_table array char-misc: Remove the now superfluous sentinel element from ctl_table array infiniband: Remove the now superfluous sentinel element from ctl_table array macintosh: Remove the now superfluous sentinel element from ctl_table array parport: Remove the now superfluous sentinel element from ctl_table array scsi: Remove now superfluous sentinel element from ctl_table array tty: Remove now superfluous sentinel element from ctl_table array xen: Remove now superfluous sentinel element from ctl_table array hpet: Remove now superfluous sentinel element from ctl_table array c-sky: Remove now superfluous sentinel element from ctl_talbe array powerpc: Remove now superfluous sentinel element from ctl_table arrays riscv: Remove now superfluous sentinel element from ctl_table array x86/vdso: Remove now superfluous sentinel element from ctl_table array ...
2023-10-30Merge tag 'x86-core-2023-10-29-v2' of ↵Linus Torvalds13-136/+149
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Thomas Gleixner: - Limit the hardcoded topology quirk for Hygon CPUs to those which have a model ID less than 4. The newer models have the topology CPUID leaf 0xB correctly implemented and are not affected. - Make SMT control more robust against enumeration failures SMT control was added to allow controlling SMT at boottime or runtime. The primary purpose was to provide a simple mechanism to disable SMT in the light of speculation attack vectors. It turned out that the code is sensible to enumeration failures and worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration which means the primary thread mask is not set up correctly. By chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes the hotplug control stay at its default value "enabled". So the mask is never evaluated. The ongoing rework of the topology evaluation caused XEN/PV to end up with smp_num_siblings == 1, which sets the SMT control to "not supported" and the empty primary thread mask causes the hotplug core to deny the bringup of the APS. Make the decision logic more robust and take 'not supported' and 'not implemented' into account for the decision whether a CPU should be booted or not. - Fake primary thread mask for XEN/PV Pretend that all XEN/PV vCPUs are primary threads, which makes the usage of the primary thread mask valid on XEN/PV. That is consistent with because all of the topology information on XEN/PV is fake or even non-existent. - Encapsulate topology information in cpuinfo_x86 Move the randomly scattered topology data into a separate data structure for readability and as a preparatory step for the topology evaluation overhaul. - Consolidate APIC ID data type to u32 It's fixed width hardware data and not randomly u16, int, unsigned long or whatever developers decided to use. - Cure the abuse of cpuinfo for persisting logical IDs. Per CPU cpuinfo is used to persist the logical package and die IDs. That's really not the right place simply because cpuinfo is subject to be reinitialized when a CPU goes through an offline/online cycle. Use separate per CPU data for the persisting to enable the further topology management rework. It will be removed once the new topology management is in place. - Provide a debug interface for inspecting topology information Useful in general and extremly helpful for validating the topology management rework in terms of correctness or "bug" compatibility. * tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too x86/cpu: Provide debug interface x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids x86/apic: Use u32 for wakeup_secondary_cpu[_64]() x86/apic: Use u32 for [gs]et_apic_id() x86/apic: Use u32 for phys_pkg_id() x86/apic: Use u32 for cpu_present_to_apicid() x86/apic: Use u32 for check_apicid_used() x86/apic: Use u32 for APIC IDs in global data x86/apic: Use BAD_APICID consistently x86/cpu: Move cpu_l[l2]c_id into topology info x86/cpu: Move logical package and die IDs into topology info x86/cpu: Remove pointless evaluation of x86_coreid_bits x86/cpu: Move cu_id into topology info x86/cpu: Move cpu_core_id into topology info hwmon: (fam15h_power) Use topology_core_id() scsi: lpfc: Use topology_core_id() x86/cpu: Move cpu_die_id into topology info x86/cpu: Move phys_proc_id into topology info x86/cpu: Encapsulate topology information in cpuinfo_x86 ...
2023-10-30Merge tag 'x86-mm-2023-10-28' of ↵Linus Torvalds1-17/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm handling updates from Ingo Molnar: - Add new NX-stack self-test - Improve NUMA partial-CFMWS handling - Fix #VC handler bugs resulting in SEV-SNP boot failures - Drop the 4MB memory size restriction on minimal NUMA nodes - Reorganize headers a bit, in preparation to header dependency reduction efforts - Misc cleanups & fixes * tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size selftests/x86/lam: Zero out buffer for readlink() x86/sev: Drop unneeded #include x86/sev: Move sev_setup_arch() to mem_encrypt.c x86/tdx: Replace deprecated strncpy() with strtomem_pad() selftests/x86/mm: Add new test that userspace stack is in fact NX x86/sev: Make boot_ghcb_page[] static x86/boot: Move x86_cache_alignment initialization to correct spot x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach x86/sev-es: Allow copy_from_kernel_nofault() in earlier boot x86_64: Show CR4.PSE on auxiliaries like on BSP x86/iommu/docs: Update AMD IOMMU specification document URL x86/sev/docs: Update document URL in amd-memory-encryption.rst x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions from <asm/processor.h> to <asm/pgtable.h> ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window x86/numa: Introduce numa_fill_memblks()
2023-10-30Merge tag 'x86-entry-2023-10-28' of ↵Linus Torvalds1-18/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 entry updates from Ingo Molnar: - Make IA32_EMULATION boot time configurable with the new ia32_emulation=<bool> boot option - Clean up fast syscall return validation code: convert it to C and refactor the code - As part of this, optimize the canonical RIP test code * tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry/32: Clean up syscall fast exit tests x86/entry/64: Use TASK_SIZE_MAX for canonical RIP test x86/entry/64: Convert SYSRET validation tests to C x86/entry/32: Remove SEP test for SYSEXIT x86/entry/32: Convert do_fast_syscall_32() to bool return type x86/entry/compat: Combine return value test from syscall handler x86/entry/64: Remove obsolete comment on tracing vs. SYSRET x86: Make IA32_EMULATION boot time configurable x86/entry: Make IA32 syscalls' availability depend on ia32_enabled() x86/elf: Make loading of 32bit processes depend on ia32_enabled() x86/entry: Compile entry_SYSCALL32_ignore() unconditionally x86/entry: Rename ignore_sysret() x86: Introduce ia32_enabled()
2023-10-30Merge tag 'x86_cpu_for_6.7_rc1' of ↵Linus Torvalds2-1/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: - Make sure the "svm" feature flag is cleared from /proc/cpuinfo when virtualization support is disabled in the BIOS on AMD and Hygon platforms - A minor cleanup * tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Remove redundant 'break' statement x86/cpu: Clear SVM feature if disabled by BIOS
2023-10-30Merge tag 'x86_cache_for_6.7_rc1' of ↵Linus Torvalds4-95/+242
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Add support for non-contiguous capacity bitmasks being added to Intel's CAT implementation - Other improvements to resctrl code: better configuration, simplifications, debugging support, fixes * tag 'x86_cache_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Display RMID of resource group x86/resctrl: Add support for the files of MON groups only x86/resctrl: Display CLOSID for resource group x86/resctrl: Introduce "-o debug" mount option x86/resctrl: Move default group file creation to mount x86/resctrl: Unwind properly from rdt_enable_ctx() x86/resctrl: Rename rftype flags for consistency x86/resctrl: Simplify rftype flag definitions x86/resctrl: Add multiple tasks to the resctrl group at once Documentation/x86: Document resctrl's new sparse_masks x86/resctrl: Add sparse_masks file in info x86/resctrl: Enable non-contiguous CBMs in Intel CAT x86/resctrl: Rename arch_has_sparse_bitmaps x86/resctrl: Fix remaining kernel-doc warnings
2023-10-30Merge tag 'x86_bugs_for_6.7_rc1' of ↵Linus Torvalds1-45/+50
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 hw mitigation updates from Borislav Petkov: - A bunch of improvements, cleanups and fixlets to the SRSO mitigation machinery and other, general cleanups to the hw mitigations code, by Josh Poimboeuf - Improve the return thunk detection by objtool as it is absolutely important that the default return thunk is not used after returns have been patched. Future work to detect and report this better is pending - Other misc cleanups and fixes * tag 'x86_bugs_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/retpoline: Document some thunk handling aspects x86/retpoline: Make sure there are no unconverted return thunks due to KCSAN x86/callthunks: Delete unused "struct thunk_desc" x86/vdso: Run objtool on vdso32-setup.o objtool: Fix return thunk patching in retpolines x86/srso: Remove unnecessary semicolon x86/pti: Fix kernel warnings for pti= and nopti cmdline options x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk() x86/nospec: Refactor UNTRAIN_RET[_*] x86/rethunk: Use SYM_CODE_START[_LOCAL]_NOALIGN macros x86/srso: Disentangle rethunk-dependent options x86/srso: Move retbleed IBPB check into existing 'has_microcode' code block x86/bugs: Remove default case for fully switched enums x86/srso: Remove 'pred_cmd' label x86/srso: Unexport untraining functions x86/srso: Improve i-cache locality for alias mitigation x86/srso: Fix unret validation dependencies x86/srso: Fix vulnerability reporting for missing microcode x86/srso: Print mitigation for retbleed IBPB case x86/srso: Print actual mitigation if requested mitigation isn't possible ...
2023-10-24x86/microcode/intel: Add a minimum required revision for late loadingAshok Raj1-4/+33
In general users, don't have the necessary information to determine whether late loading of a new microcode version is safe and does not modify anything which the currently running kernel uses already, e.g. removal of CPUID bits or behavioural changes of MSRs. To address this issue, Intel has added a "minimum required version" field to a previously reserved field in the microcode header. Microcode updates should only be applied if the current microcode version is equal to, or greater than this minimum required version. Thomas made some suggestions on how meta-data in the microcode file could provide Linux with information to decide if the new microcode is suitable candidate for late loading. But even the "simpler" option requires a lot of metadata and corresponding kernel code to parse it, so the final suggestion was to add the 'minimum required version' field in the header. When microcode changes visible features, microcode will set the minimum required version to its own revision which prevents late loading. Old microcode blobs have the minimum revision field always set to 0, which indicates that there is no information and the kernel considers it unsafe. This is a pure OS software mechanism. The hardware/firmware ignores this header field. For early loading there is no restriction because OS visible features are enumerated after the early load and therefore a change has no effect. The check is always enabled, but by default not enforced. It can be enforced via Kconfig or kernel command line. If enforced, the kernel refuses to late load microcode with a minimum required version field which is zero or when the currently loaded microcode revision is smaller than the minimum required revision. If not enforced the load happens independent of the revision check to stay compatible with the existing behaviour, but it influences the decision whether the kernel is tainted or not. If the check signals that the late load is safe, then the kernel is not tainted. Early loading is not affected by this. [ tglx: Massaged changelog and fixed up the implementation ] Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Ashok Raj <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]