path: root/arch/x86/kernel/cpu
Age | Commit message | Author | Files | Lines
2024-01-25 | x86/resctrl: Remove redundant variable in mbm_config_write_domain() | Babu Moger | 1 | -11/+4
The kernel test robot reported the following warning after commit 54e35eb8611c ("x86/resctrl: Read supported bandwidth sources from CPUID"), even though the issue was already present in the original commit 92bd5a139033 ("x86/resctrl: Add interface to write mbm_total_bytes_config"), which added this function. The reported warning is:

  $ make C=1 CHECK=scripts/coccicheck arch/x86/kernel/cpu/resctrl/rdtgroup.o
  ...
  arch/x86/kernel/cpu/resctrl/rdtgroup.c:1621:5-8: Unneeded variable: "ret". Return "0" on line 1655

Remove the local variable 'ret'.

[ bp: Massage commit message, make mbm_config_write_domain() void. ]

Fixes: 92bd5a139033 ("x86/resctrl: Add interface to write mbm_total_bytes_config")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2024-01-24 | x86/resctrl: Implement new mba_MBps throttling heuristic | Tony Luck | 2 | -36/+10
The mba_MBps feedback loop increases throttling when a group is using more bandwidth than the target set by the user in the schemata file, and decreases throttling when below target. To avoid possibly stepping throttling up and down on every poll a flag "delta_comp" is set whenever throttling is changed to indicate that the actual change in bandwidth should be recorded on the next poll in "delta_bw". Throttling is only reduced if the current bandwidth plus delta_bw is below the user target. This algorithm works well if the workload has steady bandwidth needs. But it can go badly wrong if the workload moves to a different phase just as the throttling level changed. E.g. if the workload becomes essentially idle right as throttling level is increased, the value calculated for delta_bw will be more or less the old bandwidth level. If the workload then resumes, Linux may never reduce throttling because current bandwidth plus delta_bw is above the target set by the user. Implement a simpler heuristic by assuming that in the worst case the currently measured bandwidth is being controlled by the current level of throttling. Compute how much it may increase if throttling is relaxed to the next higher level. If that is still below the user target, then it is ok to reduce the amount of throttling. Fixes: ba0f26d8529c ("x86/intel_rdt/mba_sc: Prepare for feedback loop") Reported-by: Xiaochen Shen <[email protected]> Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Xiaochen Shen <[email protected]> Link: https://lore.kernel.org/r/[email protected]
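The simpler heuristic described above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel's code: the struct, the function name and the linear scaling model (bandwidth grows at most in proportion to the allowed percentage) are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-group state; not the kernel's real structures. */
struct mba_state {
	uint32_t cur_pct;   /* current allowed-bandwidth percentage (higher = less throttling) */
	uint32_t min_pct;   /* most aggressive throttling level available */
	uint32_t max_pct;   /* level that means "no throttling" */
	uint32_t step_pct;  /* size of one throttling step */
};

/* Return the percentage to program after one poll. */
uint32_t mba_next_pct(const struct mba_state *s, uint64_t cur_bw, uint64_t user_bw)
{
	assert(s->step_pct > 0 && s->cur_pct > 0);

	if (cur_bw > user_bw && s->cur_pct > s->min_pct) {
		/* Above the user target: throttle one step harder, clamped to min_pct. */
		if (s->cur_pct - s->min_pct < s->step_pct)
			return s->min_pct;
		return s->cur_pct - s->step_pct;
	}

	if (cur_bw <= user_bw && s->cur_pct < s->max_pct) {
		uint32_t next = s->cur_pct + s->step_pct;

		if (next > s->max_pct)
			next = s->max_pct;

		/*
		 * Assumption: in the worst case the measured bandwidth is limited
		 * only by throttling, so it scales with next / cur_pct.
		 */
		uint64_t projected = cur_bw * next / s->cur_pct;

		/* Relax throttling only if the projection stays below target. */
		if (projected < user_bw)
			return next;
	}

	return s->cur_pct;
}
```

Because the decision uses only the current measurement, an idle phase right after a throttling change cannot leave behind a stale delta_bw that blocks later relaxation.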
2024-01-23 | x86/resctrl: Read supported bandwidth sources from CPUID | Babu Moger | 3 | -6/+17
If the BMEC (Bandwidth Monitoring Event Configuration) feature is supported, the bandwidth events can be configured. The maximum supported bandwidth bitmask can be read from CPUID:

  CPUID_Fn80000020_ECX_x03 [Platform QoS Monitoring Bandwidth Event Configuration]
  Bits    Description
  31:7    Reserved
  6:0     Identifies the bandwidth sources that can be tracked.

While at it, move the mask checking to mon_config_write() before iterating over all the domains. Also, print the valid bitmask when the user tries to configure an invalid event configuration value.

The CPUID details are documented in the Processor Programming Reference (PPR) Vol 1.1 for AMD Family 19h Model 11h B1 - 55901 Rev 0.25 in the Link tag.

Fixes: dc2a3e857981 ("x86/resctrl: Add interface to read mbm_total_bytes_config")
Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: https://lore.kernel.org/r/669896fa512c7451319fa5ca2fdb6f7e015b5635.1705359148.git.babu.moger@amd.com
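As a rough illustration of the mask handling described in this commit, the sketch below extracts bits 6:0 from an ECX value and rejects a requested configuration that sets unsupported bits. The macro and function names are hypothetical, and the ECX value is passed in as a plain integer rather than read with a real CPUID instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 6:0 of CPUID_Fn80000020_ECX_x03; the macro name is hypothetical. */
#define BMEC_SOURCES_MASK 0x7Fu

/* Keep only the bits that identify trackable bandwidth sources. */
uint32_t bmec_supported_sources(uint32_t ecx)
{
	return ecx & BMEC_SOURCES_MASK;
}

/* A requested event configuration is valid only if it is a subset of the supported mask. */
bool bmec_config_valid(uint32_t supported, uint32_t requested)
{
	assert((supported & ~BMEC_SOURCES_MASK) == 0);
	return (requested & ~supported) == 0;
}
```

Doing this check once, before walking the domains, mirrors the commit's point that an invalid value should be rejected up front and the valid bitmask reported to the user.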
2024-01-23 | x86/resctrl: Remove hard-coded memory bandwidth limit | Babu Moger | 2 | -7/+4
The QOS Memory Bandwidth Enforcement Limit is reported by CPUID_Fn80000020_EAX_x01 and CPUID_Fn80000020_EAX_x02:

  Bits    Description
  31:0    BW_LEN: Size of the QOS Memory Bandwidth Enforcement Limit.

Newer processors can support a higher bandwidth limit than the current hard-coded value. Remove the latter and detect the limit using CPUID instead. Also, update the register variables eax and edx to match the AMD CPUID definition.

The CPUID details are documented in the Processor Programming Reference (PPR) Vol 1.1 for AMD Family 19h Model 11h B1 - 55901 Rev 0.25 in the Link tag below.

Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: https://lore.kernel.org/r/c26a8ca79d399ed076cf8bf2e9fbc58048808289.1705359148.git.babu.moger@amd.com
2024-01-23 | x86/CPU/AMD: Add X86_FEATURE_ZEN5 | Borislav Petkov (AMD) | 1 | -4/+21
Add a synthetic feature flag for Zen5. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-01-22 | x86/resctrl: Fix unused variable warning in cache_alloc_hsw_probe() | Tony Luck | 1 | -4/+4
In a "W=1" build gcc throws a warning: arch/x86/kernel/cpu/resctrl/core.c: In function ‘cache_alloc_hsw_probe’: arch/x86/kernel/cpu/resctrl/core.c:139:16: warning: variable ‘h’ set but not used Switch from wrmsr_safe() to wrmsrl_safe(), and from rdmsr() to rdmsrl() using a single u64 argument for the MSR value instead of the pair of u32 for the high and low halves. Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Babu Moger <[email protected]> Acked-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/r/ZULCd/TGJL9Dmncf@agluck-desk3
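The difference between the two calling conventions can be illustrated in plain C. The helpers below are hypothetical and never touch a real MSR. They only show how a 64-bit value relates to the low/high u32 pair, and why a caller that needs only the low half is left with an unused high variable.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the low/high halves used by the two-u32 convention into one u64. */
uint64_t msr_join(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 32) | low;
}

/* Split a u64 MSR value back into halves; both pointers must be valid. */
void msr_split(uint64_t val, uint32_t *low, uint32_t *high)
{
	assert(low && high);
	*low = (uint32_t)val;
	*high = (uint32_t)(val >> 32);
}
```

With the single-u64 form there is no separate high half to declare, so the "set but not used" warning cannot arise.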
2024-01-18 | Merge tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -1/+18
Pull x86 TDX updates from Dave Hansen:

"This contains the initial support for host-side TDX support so that KVM can run TDX-protected guests. This does not include the actual KVM-side support which will come from the KVM folks. The TDX host interactions with kexec also needs to be ironed out before this is ready for prime time, so this code is currently Kconfig'd off when kexec is on.

The majority of the code here is the kernel telling the TDX module which memory to protect and handing some additional memory over to it to use to store TDX module metadata. That sounds pretty simple, but the TDX architecture is rather flexible and it takes quite a bit of back-and-forth to say, "just protect all memory, please."

There is also some code tacked on near the end of the series to handle a hardware erratum. The erratum can make software bugs such as a kernel write to TDX-protected memory cause a machine check and masquerade as a real hardware failure. The erratum handling watches out for these and tries to provide nicer user errors"

* tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/virt/tdx: Make TDX host depend on X86_MCE
  x86/virt/tdx: Disable TDX host support when kexec is enabled
  Documentation/x86: Add documentation for TDX host support
  x86/mce: Differentiate real hardware #MCs from TDX erratum ones
  x86/cpu: Detect TDX partial write machine check erratum
  x86/virt/tdx: Handle TDX interaction with sleep and hibernation
  x86/virt/tdx: Initialize all TDMRs
  x86/virt/tdx: Configure global KeyID on all packages
  x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID
  x86/virt/tdx: Designate reserved areas for all TDMRs
  x86/virt/tdx: Allocate and set up PAMTs for TDMRs
  x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions
  x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions
  x86/virt/tdx: Get module global metadata for module initialization
  x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory
  x86/virt/tdx: Add skeleton to enable TDX on demand
  x86/virt/tdx: Add SEAMCALL error printing for module initialization
  x86/virt/tdx: Handle SEAMCALL no entropy error in common code
  x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC
  x86/virt/tdx: Define TDX supported page sizes as macros
  ...
2024-01-10 | x86/bugs: Rename CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO | Breno Leitao | 1 | -4/+4
Step 9/10 of the namespace unification of CPU mitigations related Kconfig options. Suggested-by: Josh Poimboeuf <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-01-10 | x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY => CONFIG_MITIGATION_IBRS_ENTRY | Breno Leitao | 1 | -2/+2
Step 8/10 of the namespace unification of CPU mitigations related Kconfig options. Suggested-by: Josh Poimboeuf <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-01-10 | x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY | Breno Leitao | 2 | -4/+4
Step 7/10 of the namespace unification of CPU mitigations related Kconfig options. Suggested-by: Josh Poimboeuf <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-01-10 | x86/bugs: Rename CONFIG_RETPOLINE => CONFIG_MITIGATION_RETPOLINE | Breno Leitao | 1 | -3/+3
Step 5/10 of the namespace unification of CPU mitigations related Kconfig options. [ mingo: Converted a few more uses in comments/messages as well. ] Suggested-by: Josh Poimboeuf <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Ariel Miculas <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-01-10 | x86/bugs: Rename CONFIG_CALL_DEPTH_TRACKING => CONFIG_MITIGATION_CALL_DEPTH_TRACKING | Breno Leitao | 1 | -3/+3
Step 3/10 of the namespace unification of CPU mitigations related Kconfig options. Suggested-by: Josh Poimboeuf <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-01-10 | x86/bugs: Rename CONFIG_CPU_IBPB_ENTRY => CONFIG_MITIGATION_IBPB_ENTRY | Breno Leitao | 1 | -5/+6
Step 2/10 of the namespace unification of CPU mitigations related Kconfig options. Suggested-by: Josh Poimboeuf <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-01-10 | x86/bugs: Rename CONFIG_GDS_FORCE_MITIGATION => CONFIG_MITIGATION_GDS_FORCE | Breno Leitao | 1 | -1/+1
The CPU mitigations Kconfig entries (there are 10 by now) are named in a historically idiosyncratic and hence rather inconsistent fashion, and have become hard to relate to each other over the years:

  https://lore.kernel.org/lkml/20231011044252.42bplzjsam3qsasz@treble/

When they were introduced we never expected that we'd eventually have about a dozen of them, and that more organization would be useful, especially for Linux distributions that want to enable them in an informed fashion, and want to make sure all mitigations are configured as expected.

For example, the current CONFIG_SPECULATION_MITIGATIONS namespace is only halfway populated: some mitigations have entries in Kconfig and can be modified, while other mitigations do not have Kconfig entries and cannot be controlled at build time.

Fine-grained control over these Kconfig entries can help in a number of ways:

 1) Users can pick and choose only the mitigations that are important for their workloads.
 2) Users and developers can choose to disable mitigations that mangle the assembly code generation, making it hard to read.
 3) Separate Kconfigs for just source code readability, so that we see *which* butt-ugly piece of crap code is for what reason...

In most cases, if a mitigation is disabled at compilation time, it can still be enabled at runtime using kernel command line arguments.

This is the first patch of an initial series that renames various mitigation related Kconfig options, unifying them under a single CONFIG_MITIGATION_* namespace:

  CONFIG_GDS_FORCE_MITIGATION => CONFIG_MITIGATION_GDS_FORCE
  CONFIG_CPU_IBPB_ENTRY       => CONFIG_MITIGATION_IBPB_ENTRY
  CONFIG_CALL_DEPTH_TRACKING  => CONFIG_MITIGATION_CALL_DEPTH_TRACKING
  CONFIG_PAGE_TABLE_ISOLATION => CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
  CONFIG_RETPOLINE            => CONFIG_MITIGATION_RETPOLINE
  CONFIG_SLS                  => CONFIG_MITIGATION_SLS
  CONFIG_CPU_UNRET_ENTRY      => CONFIG_MITIGATION_UNRET_ENTRY
  CONFIG_CPU_IBRS_ENTRY       => CONFIG_MITIGATION_IBRS_ENTRY
  CONFIG_CPU_SRSO             => CONFIG_MITIGATION_SRSO
  CONFIG_RETHUNK              => CONFIG_MITIGATION_RETHUNK

Implement step 1/10 of the namespace unification of CPU mitigations related Kconfig options and rename CONFIG_GDS_FORCE_MITIGATION to CONFIG_MITIGATION_GDS_FORCE.

[ mingo: Rewrote changelog for clarity. ]

Suggested-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2024-01-08 | Merge tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -5/+11
Pull x86 cleanups from Ingo Molnar:

 - Change global variables to local
 - Add missing kernel-doc function parameter descriptions
 - Remove unused parameter from a macro
 - Remove obsolete Kconfig entry
 - Fix comments
 - Fix typos, mostly scripted, manually reviewed

and a micro-optimization got misplaced as a cleanup:

 - Micro-optimize the asm code in secondary_startup_64_no_verify()

* tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch/x86: Fix typos
  x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify()
  x86/docs: Remove reference to syscall trampoline in PTI
  x86/Kconfig: Remove obsolete config X86_32_SMP
  x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro
  x86/mtrr: Document missing function parameters in kernel-doc
  x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
2024-01-08 | Merge tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -29/+21
Pull x86 asm updates from Ingo Molnar:

"Replace magic numbers in GDT descriptor definitions & handling:

 - Introduce symbolic names via macros for descriptor types/fields/flags, and then use these symbolic names.
 - Clean up definitions a bit, such as GDT_ENTRY_INIT()
 - Fix/clean up details that became visibly inconsistent after the symbol-based code was introduced:
    - Unify accessed flag handling
    - Set the D/B size flag consistently & according to the HW specification"

* tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Add DB flag to 32-bit percpu GDT entry
  x86/asm: Always set A (accessed) flag in GDT descriptors
  x86/asm: Replace magic numbers in GDT descriptors, script-generated change
  x86/asm: Replace magic numbers in GDT descriptors, preparations
  x86/asm: Provide new infrastructure for GDT descriptors
2024-01-08 | Merge tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 6 | -253/+385
Pull x86 RAS updates from Borislav Petkov:

 - Convert the hw error storm handling into a finer-grained, per-bank solution which allows for more timely detection and reporting of errors
 - Start a documentation section which will hold down relevant RAS features description and how they should be used
 - Add new AMD error bank types
 - Slim down and remove error type descriptions from the kernel side of error decoding to rasdaemon which can be used from now on to decode hw errors on AMD
 - Mark pages containing uncorrectable errors as poison so that kdump can avoid them and thus not cause another panic
 - The usual cleanups and fixlets

* tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Handle Intel threshold interrupt storms
  x86/mce: Add per-bank CMCI storm mitigation
  x86/mce: Remove old CMCI storm mitigation code
  Documentation: Begin a RAS section
  x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types
  EDAC/mce_amd: Remove SMCA Extended Error code descriptions
  x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
  x86/mce/inject: Clear test status value
  x86/mce: Remove redundant check from mce_device_create()
  x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
2024-01-08 | Merge tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 4 | -135/+145
Pull x86 cpu feature updates from Borislav Petkov:

 - Add synthetic X86_FEATURE flags for the different AMD Zen generations and use them everywhere instead of ad-hoc family/model checks. Drop an ancient AMD errata checking facility as a result
 - Fix a fragile initcall ordering in intel_epb
 - Do not issue the MFENCE+LFENCE barrier for the TSC deadline and X2APIC MSRs on AMD as it is not needed there

* tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Add X86_FEATURE_ZEN1
  x86/CPU/AMD: Drop now unused CPU erratum checking function
  x86/CPU/AMD: Get rid of amd_erratum_1485[]
  x86/CPU/AMD: Get rid of amd_erratum_400[]
  x86/CPU/AMD: Get rid of amd_erratum_383[]
  x86/CPU/AMD: Get rid of amd_erratum_1054[]
  x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function
  x86/CPU/AMD: Move Zenbleed check to the Zen2 init function
  x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common()
  x86/CPU/AMD: Call the spectral chicken in the Zen2 init function
  x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function
  x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init function
  x86/CPU/AMD: Carve out the erratum 1386 fix
  x86/CPU/AMD: Add ZenX generations flags
  x86/cpu/intel_epb: Don't rely on link order
  x86/barrier: Do not serialize MSR accesses on AMD
2024-01-08 | Merge tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -13/+7
Pull x86 microcode updates from Borislav Petkov:

 - Correct minor issues after the microcode revision reporting sanitization

* tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Set new revision only after a successful update
  x86/microcode/intel: Remove redundant microcode late updated message
2024-01-03 | arch/x86: Fix typos | Bjorn Helgaas | 1 | -1/+1
Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes. Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-20 | x86/asm: Always set A (accessed) flag in GDT descriptors | Vegard Nossum | 1 | -6/+6
We have no known use for having the CPU track whether GDT descriptors have been accessed or not. Simplify the code by adding the flag to the common flags and removing it everywhere else. Signed-off-by: Vegard Nossum <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
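To make the "common flags" idea concrete, here is a small user-space sketch of how the access byte of a code or data segment descriptor can be composed from symbolic flags, with the accessed bit folded into the shared set. The bit positions follow the x86 descriptor layout. The macro and function names are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Access-byte bits of a code/data segment descriptor (hypothetical names). */
#define SEG_ACCESSED    0x01u  /* A: set by the CPU on first use unless preset */
#define SEG_RW          0x02u  /* writable (data) or readable (code) */
#define SEG_EXECUTABLE  0x08u  /* code segment */
#define SEG_S           0x10u  /* code/data (non-system) descriptor */
#define SEG_DPL(dpl)    (((dpl) & 3u) << 5)
#define SEG_PRESENT     0x80u

/* Flags shared by every entry; A is always included. */
#define SEG_COMMON      (SEG_PRESENT | SEG_S | SEG_ACCESSED)

uint8_t seg_data_access(unsigned int dpl)
{
	assert(dpl <= 3);
	return (uint8_t)(SEG_COMMON | SEG_RW | SEG_DPL(dpl));
}

uint8_t seg_code_access(unsigned int dpl)
{
	assert(dpl <= 3);
	return (uint8_t)(SEG_COMMON | SEG_EXECUTABLE | SEG_RW | SEG_DPL(dpl));
}
```

With A preset in the common flags, a ring-0 data segment comes out as 0x93 rather than 0x92, and no individual entry has to mention the bit.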
2023-12-20 | x86/asm: Replace magic numbers in GDT descriptors, script-generated change | Vegard Nossum | 1 | -20/+20
Actually replace the numeric values by the new symbolic values. I used this to find all the existing users of the GDT_ENTRY*() macros: $ git grep -P 'GDT_ENTRY(_INIT)?\(' Some of the lines will exceed 80 characters, but some of them will be shorter again in the next couple of patches. Signed-off-by: Vegard Nossum <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-20 | x86/asm: Replace magic numbers in GDT descriptors, preparations | Vegard Nossum | 1 | -8/+0
We'd like to replace all the magic numbers in various GDT descriptors with new, semantically meaningful, symbolic values. In order to be able to verify that the change doesn't cause any actual changes to the compiled binary code, I've split the change into two patches: - Part 1 (this commit): everything _but_ actually replacing the numbers - Part 2 (the following commit): _only_ replacing the numbers The reason we need this split for verification is that including new headers causes some spurious changes to the object files, mostly line number changes in the debug info but occasionally other subtle codegen changes. Signed-off-by: Vegard Nossum <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-15 | x86/mce: Handle Intel threshold interrupt storms | Tony Luck | 3 | -50/+160
Add an Intel specific hook into machine_check_poll() to keep track of per-CPU, per-bank corrected error logs (with a stub for the CONFIG_MCE_INTEL=n case). When a storm is observed the rate of interrupts is reduced by setting a large threshold value for this bank in IA32_MCi_CTL2. This bank is added to the bitmap of banks for this CPU to poll. The polling rate is increased to once per second. When a storm ends reset the threshold in IA32_MCi_CTL2 back to 1, remove the bank from the bitmap for polling, and change the polling rate back to the default. If a CPU with banks in storm mode is taken offline, the new CPU that inherits ownership of those banks takes over management of storm(s) in the inherited bank(s). The cmci_discover() function was already very large. These changes pushed it well over the top. Refactor with three helper functions to bring it back under control. Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-15 | x86/mce: Add per-bank CMCI storm mitigation | Tony Luck | 3 | -9/+194
This is the core functionality to track CMCI storms at the machine check bank granularity. Subsequent patches will add the vendor specific hooks to supply input to the storm detection and take actions on the start/end of a storm.

machine_check_poll() is called both by the CMCI interrupt code, and for periodic polls from a timer. Add a hook in this routine to maintain a bitmap history for each bank showing whether the bank logged a corrected error or not each time it is polled.

In normal operation the interval between polls of these banks determines how far to shift the history. The 64 bit width corresponds to about one second.

When a storm is observed a CPU vendor specific action is taken to reduce or stop CMCI from the bank that is the source of the storm. The bank is added to the bitmap of banks for this CPU to poll. The polling rate is increased to once per second. During a storm each bit in the history indicates the status of the bank each time it is polled. Thus the history covers just over a minute.

Declare a storm for that bank if the number of corrected interrupts seen in that history is above some threshold (defined as 5 in this series, could be tuned later if there is data to suggest a better value).

A storm on a bank ends if enough consecutive polls of the bank show no corrected errors (defined as 30, may also change). That calls the CPU vendor specific function to revert to normal operational mode, and changes the polling rate back to the default.

[ bp: Massage. ]

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2023-12-15 | x86/mce: Remove old CMCI storm mitigation code | Tony Luck | 3 | -170/+1
When a "storm" of corrected machine check interrupts (CMCI) is detected this code mitigates by disabling CMCI interrupt signalling from all of the banks owned by the CPU that saw the storm. There are problems with this approach: 1) It is very coarse grained. In all likelihood only one of the banks was generating the interrupts, but CMCI is disabled for all. This means Linux may delay seeing and processing errors logged from other banks. 2) Although CMCI stands for Corrected Machine Check Interrupt, it is also used to signal when an uncorrected error is logged. This is a problem because these errors should be handled in a timely manner. Delete all this code in preparation for a finer grained solution. Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Yazen Ghannam <[email protected]> Tested-by: Yazen Ghannam <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-12 | x86/mce: Differentiate real hardware #MCs from TDX erratum ones | Kai Huang | 1 | -0/+15
The first few generations of TDX hardware have an erratum. Triggering it in Linux requires some kind of kernel bug involving relatively exotic memory writes to TDX private memory and will manifest via spurious-looking machine checks when reading the affected memory.

Make an effort to detect these TDX-induced machine checks and spit out a new blurb to dmesg so folks do not think their hardware is failing.

== Background ==

Virtually all kernel memory access operations happen in full cachelines. In practice, writing a "byte" of memory usually reads a 64 byte cacheline of memory, modifies it, then writes the whole line back. Those operations do not trigger this problem.

This problem is triggered by "partial" writes where a write transaction of less than a cacheline lands at the memory controller. The CPU does these via non-temporal write instructions (like MOVNTI), or through UC/WC memory mappings. The issue can also be triggered away from the CPU by devices doing partial writes via DMA.

== Problem ==

A partial write to a TDX private memory cacheline will silently "poison" the line. Subsequent reads will consume the poison and generate a machine check. According to the TDX hardware spec, neither of these things should have happened.

To add insult to injury, the Linux machine check code will present these as a literal "Hardware error" when they were, in fact, a software-triggered issue.

== Solution ==

In the end, this issue is hard to trigger. Rather than do something rash (and incomplete) like unmap TDX private memory from the direct map, improve the machine check handler.

Currently, the #MC handler doesn't distinguish whether the memory is TDX private memory or not, but just dumps, for instance, the message below:

  [...] mce: [Hardware Error]: CPU 147: Machine Check Exception: f Bank 1: bd80000000100134
  [...] mce: [Hardware Error]: RIP 10:<ffffffffadb69870> {__tlb_remove_page_size+0x10/0xa0}
  ...
  [...] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
  [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
  [...] Kernel panic - not syncing: Fatal local machine check

Which says "Hardware Error" and "Data load in unrecoverable area of kernel".

Ideally, it's better for the log to say "software bug around TDX private memory" instead of "Hardware Error". But in reality a real hardware memory error can happen, and sadly such a software-triggered #MC cannot be distinguished from a real hardware error. Also, the error message is parsed by the userspace tool 'mcelog', so changing the output may break userspace.

So keep the "Hardware Error". The "Data load in unrecoverable area of kernel" is also helpful, so keep it too.

Instead of modifying the above error log, improve it by printing an additional TDX related message to make the log look like:

  ...
  [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
  [...] mce: [Hardware Error]: Machine Check: TDX private memory error. Possible kernel bug.

Adding this additional message requires determining whether the memory page is TDX private memory. There is no existing infrastructure to do that. Add an interface to query the TDX module to fill this gap.

== Impact ==

This issue requires some kind of kernel bug to trigger.

TDX private memory should never be mapped UC/WC. A partial write originating from these mappings would require *two* bugs, first mapping the wrong page, then writing the wrong memory. It would also be detectable using traditional memory corruption techniques like DEBUG_PAGEALLOC.

MOVNTI (and friends) could cause this issue with something like a simple buffer overrun or use-after-free on the direct map. It should also be detectable with normal debug techniques.

The one place where this might get nasty would be if the CPU read data then wrote back the same data. That would trigger this problem but would not, for instance, set off mechanisms like slab redzoning because it doesn't actually corrupt data.

With an IOMMU at least, the DMA exposure is similar to the UC/WC issue. TDX private memory would first need to be incorrectly mapped into the I/O space and then a later DMA to that mapping would actually cause the poisoning event.

[ dhansen: changelog tweaks ]

Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Yuan Yao <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Link: https://lore.kernel.org/all/20231208170740.53979-18-dave.hansen%40intel.com
2023-12-12 | x86/CPU/AMD: Add X86_FEATURE_ZEN1 | Borislav Petkov (AMD) | 1 | -5/+6
Add a synthetic feature flag specifically for first generation Zen machines. There's a need to have a generic flag for all Zen generations, so make X86_FEATURE_ZEN be that flag. Fixes: 30fa92832f40 ("x86/CPU/AMD: Add ZenX generations flags") Suggested-by: Brian Gerst <[email protected]> Suggested-by: Tom Lendacky <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-08 | x86/virt/tdx: Detect TDX during kernel boot | Kai Huang | 1 | -0/+2
Intel Trust Domain Extensions (TDX) protects guest VMs from a malicious host and certain physical attacks. A CPU-attested software module called 'the TDX module' runs inside a new isolated memory range as a trusted hypervisor to manage and run protected VMs. Pre-TDX Intel hardware has support for a memory encryption architecture called MKTME. The memory encryption hardware underpinning MKTME is also used for Intel TDX. TDX ends up "stealing" some of the physical address space from the MKTME architecture for crypto-protection to VMs. The BIOS is responsible for partitioning the "KeyID" space between legacy MKTME and TDX. The KeyIDs reserved for TDX are called 'TDX private KeyIDs' or 'TDX KeyIDs' for short. During machine boot, TDX microcode verifies that the BIOS programmed TDX private KeyIDs consistently and correctly across all CPU packages. The MSRs are locked in this state after verification. This is why MSR_IA32_MKTME_KEYID_PARTITIONING gets used for TDX enumeration: it indicates not just that the hardware supports TDX, but that all the boot-time security checks passed. The TDX module is expected to be loaded by the BIOS when it enables TDX, but the kernel needs to properly initialize it before it can be used to create and run any TDX guests. The TDX module will be initialized by the KVM subsystem when KVM wants to use TDX. Detect platform TDX support by detecting TDX private KeyIDs. The TDX module itself requires one TDX KeyID as the 'TDX global KeyID' to protect its metadata. Each TDX guest also needs a TDX KeyID for its own protection. Just use the first TDX KeyID as the global KeyID and leave the rest for TDX guests. If no TDX KeyID is left for TDX guests, disable TDX as initializing the TDX module alone is useless. [ dhansen: add X86_FEATURE, replace helper function ] Signed-off-by: Kai Huang <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Isaku Yamahata <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Dave Hansen <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]> Link: https://lore.kernel.org/all/20231208170740.53979-1-dave.hansen%40intel.com
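The KeyID split described in this commit message can be illustrated with a small user-space sketch. It is not the kernel's code: the struct and function names are hypothetical, and it assumes that the low 32 bits of the MSR_IA32_MKTME_KEYID_PARTITIONING value hold the number of MKTME KeyIDs, the high 32 bits hold the number of TDX private KeyIDs, and TDX KeyIDs start right after the last MKTME KeyID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a kernel structure. */
struct tdx_keyid_layout {
	uint32_t global_keyid;      /* KeyID protecting TDX module metadata */
	uint32_t guest_keyid_start; /* first KeyID left for TDX guests */
	uint32_t nr_guest_keyids;   /* how many KeyIDs guests can use */
};

/*
 * Split a raw partitioning MSR value into global and guest KeyIDs.
 * Assumed layout: bits 31:0 = number of MKTME KeyIDs,
 * bits 63:32 = number of TDX private KeyIDs.
 * Returns false when TDX should stay disabled.
 */
static bool tdx_partition_keyids(uint64_t msr_val, struct tdx_keyid_layout *out)
{
	uint32_t nr_mktme = (uint32_t)msr_val;
	uint32_t nr_tdx = (uint32_t)(msr_val >> 32);
	uint32_t tdx_start;

	/* One KeyID goes to the module; at least one more is needed for guests. */
	if (nr_tdx < 2)
		return false;

	/* Assumption: TDX KeyIDs follow directly after the MKTME KeyIDs. */
	tdx_start = nr_mktme + 1;

	out->global_keyid = tdx_start;
	out->guest_keyid_start = tdx_start + 1;
	out->nr_guest_keyids = nr_tdx - 1;
	return true;
}
```

With 31 MKTME KeyIDs and 32 TDX KeyIDs, this sketch would pick KeyID 32 as the global KeyID and leave 31 KeyIDs, starting at 33, for guests.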
2023-12-03x86/microcode/intel: Set new revision only after a successful updateBorislav Petkov (AMD)1-7/+7
This was meant to be done only when early microcode got updated successfully. Move it into the if-branch. Also, make sure the current revision is read unconditionally and only once. Fixes: 080990aa3344 ("x86/microcode: Rework early revisions reporting") Reported-by: Ashok Raj <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Ashok Raj <[email protected]> Link: https://lore.kernel.org/r/[email protected]
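A minimal sketch of the ordering this fix describes, using a hypothetical record type and helper rather than the kernel's own structures: the current revision is captured once, unconditionally, and the new revision is filled in only inside the success branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the early revision record. */
struct early_revs_sketch {
	uint32_t old_rev; /* revision found on the CPU before any update */
	uint32_t new_rev; /* stays 0 unless an update actually succeeded */
};

/*
 * cur_rev:   revision read from the CPU (read once by the caller).
 * update_ok: whether the early update was applied successfully.
 * patch_rev: revision carried by the applied patch.
 */
static void record_early_revs(struct early_revs_sketch *ed, uint32_t cur_rev,
			      bool update_ok, uint32_t patch_rev)
{
	/* Always record the revision the CPU started with. */
	ed->old_rev = cur_rev;
	ed->new_rev = 0;

	/* Only a successful update may set the new revision. */
	if (update_ok)
		ed->new_rev = patch_rev;
}
```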
2023-12-02x86/CPU/AMD: Check vendor in the AMD microcode callbackBorislav Petkov (AMD)1-0/+3
Commit in Fixes added an AMD-specific microcode callback. However, it didn't explicitly check the CPU vendor the kernel runs on. The only reason the Zenbleed check in it didn't run on other x86 vendors' hardware was pure coincidental luck: if (!cpu_has_amd_erratum(c, amd_zenbleed)) return; evaluates to true on other vendors because they don't have those families and models. However, with the removal of cpu_has_amd_erratum() in 05f5f73936fa ("x86/CPU/AMD: Drop now unused CPU erratum checking function") that coincidental condition is gone, leading to the Zenbleed check getting executed on other vendors too. Add the explicit vendor check for the whole callback as it should've been done in the first place. Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix") Cc: <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
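The shape of the fix can be sketched in user-space C as follows. All types and names here are hypothetical stand-ins (the kernel uses its own CPU data structures and vendor constants); the point is only that the vendor test guards the whole callback instead of relying on an erratum lookup that happens to fail elsewhere.

```c
/* Hypothetical vendor IDs for this sketch. */
enum vendor_sketch { VENDOR_INTEL_SKETCH, VENDOR_AMD_SKETCH };

/* Hypothetical CPU descriptor; counts how often the check ran. */
struct cpu_sketch {
	enum vendor_sketch vendor;
	int zenbleed_checks_run;
};

/* Stand-in for the AMD-only Zenbleed check. */
static void zenbleed_check_sketch(struct cpu_sketch *c)
{
	c->zenbleed_checks_run++;
}

/* Microcode callback: bail out early on anything that is not AMD. */
static void amd_check_microcode_sketch(struct cpu_sketch *c)
{
	if (c->vendor != VENDOR_AMD_SKETCH)
		return;

	zenbleed_check_sketch(c);
}
```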
2023-12-01x86/microcode/intel: Remove redundant microcode late updated messageAshok Raj1-6/+0
After successful update, the late loading routine prints an update summary similar to: microcode: load: updated on 128 primary CPUs with 128 siblings microcode: revision: 0x21000170 -> 0x21000190 Remove the redundant message in the Intel side of the driver. [ bp: Massage commit message. ] Signed-off-by: Ashok Raj <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Drop now unused CPU erratum checking functionBorislav Petkov (AMD)1-56/+0
Bye bye. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Get rid of amd_erratum_1485[]Borislav Petkov (AMD)1-8/+3
No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Get rid of amd_erratum_400[]Borislav Petkov (AMD)1-13/+20
Setting X86_BUG_AMD_E400 in init_amd() is early enough. No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Get rid of amd_erratum_383[]Borislav Petkov (AMD)1-5/+1
Set it in init_amd_gh() unconditionally as that is the F10h init function. No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Get rid of amd_erratum_1054[]Borislav Petkov (AMD)1-5/+1
No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init functionBorislav Petkov (AMD)1-10/+3
No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Move Zenbleed check to the Zen2 init functionBorislav Petkov (AMD)1-13/+3
Prefix it properly so that it is clear which generation it is dealing with. No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common()Borislav Petkov (AMD)1-4/+7
Call it from all Zen init functions. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Call the spectral chicken in the Zen2 init functionBorislav Petkov (AMD)1-4/+3
No functional change. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init functionBorislav Petkov (AMD)1-5/+5
No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init functionBorislav Petkov (AMD)1-8/+9
No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Carve out the erratum 1386 fixBorislav Petkov (AMD)1-9/+15
Call it on the affected CPU generations. No functional changes. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: http://lore.kernel.org/r/[email protected]
2023-11-29x86/CPU/AMD: Add ZenX generations flagsBorislav Petkov (AMD)1-2/+68
Add X86_FEATURE flags for each Zen generation. They should be used from now on instead of checking f/m/s. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Link: http://lore.kernel.org/r/[email protected]
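The idea of classifying the CPU once and testing a flag afterwards can be sketched like this. Everything below is hypothetical: the enum, the table type and the helpers are not kernel APIs, and the family/model ranges are deliberately supplied by the caller because this sketch does not reproduce the kernel's actual table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generation IDs; they double as bit positions below. */
enum zen_gen_sketch { ZEN_NONE = 0, ZEN_GEN1, ZEN_GEN2, ZEN_GEN3, ZEN_GEN4 };

/* One caller-supplied family/model range mapped to a generation. */
struct zen_range_sketch {
	uint8_t family;
	uint8_t model_lo;
	uint8_t model_hi;
	enum zen_gen_sketch gen;
};

/* Hypothetical capability word standing in for feature flags. */
struct cpu_caps_sketch {
	uint32_t bits;
};

/* Done once at init time: the only place that looks at family/model. */
static enum zen_gen_sketch classify_zen(const struct zen_range_sketch *tbl,
					size_t n, uint8_t family, uint8_t model)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].family == family &&
		    model >= tbl[i].model_lo && model <= tbl[i].model_hi)
			return tbl[i].gen;
	}
	return ZEN_NONE;
}

static void set_zen_cap(struct cpu_caps_sketch *c, enum zen_gen_sketch g)
{
	if (g != ZEN_NONE)
		c->bits |= 1u << g;
}

/* Everything else tests the flag instead of repeating range checks. */
static int has_zen_cap(const struct cpu_caps_sketch *c, enum zen_gen_sketch g)
{
	return (g != ZEN_NONE) && (((c->bits >> g) & 1u) != 0);
}
```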
2023-11-28x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank typesMuralidhara M K1-0/+6
Add HWID and McaType values for new SMCA bank types. Signed-off-by: Muralidhara M K <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-27x86/mce/amd, EDAC/mce_amd: Move long names to decoder moduleYazen Ghannam1-44/+30
The long names of the SMCA banks are only used by the MCE decoder module. Move them out of the arch code and into the decoder module. [ bp: Name the long names array "smca_long_names", drop local ptr in decode_smca_error(), constify arrays. ] Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-26Merge tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-48/+37
Pull x86 microcode fixes from Ingo Molnar: "Fix/enhance x86 microcode version reporting: fix the bootup log spam, and remove the driver version announcement to avoid version confusion when distros backport fixes" * tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Rework early revisions reporting x86/microcode: Remove the driver announcement and version
2023-11-24x86/cpu/intel_epb: Don't rely on link orderJames Morse1-1/+1
intel_epb_init() is called as a subsys_initcall() to register cpuhp callbacks. The callbacks make use of get_cpu_device() which will return NULL unless register_cpu() has been called. register_cpu() is called from topology_init(), which is also a subsys_initcall(). This is fragile. Moving the register_cpu() to a different subsys_initcall() leads to a NULL dereference during boot. Make intel_epb_init() a late_initcall(); user-space can't provide a policy before this point anyway. Signed-off-by: James Morse <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Gavin Shan <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]>
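The fragility can be reproduced with a toy model of initcall ordering. This is only a simulation with hypothetical names: it assumes initcalls run by ascending level (subsys is level 4, late is level 7) and that calls within one level run in table order, which stands in for link order.

```c
#include <stddef.h>

/* Assumed level numbers for subsys_initcall() and late_initcall(). */
enum initcall_level_sketch { LEVEL_SUBSYS = 4, LEVEL_LATE = 7 };

struct initcall_sketch {
	int level;
	void (*fn)(void);
};

static int cpu_registered; /* set by the topology init stand-in */
static int epb_saw_device; /* did the EPB init find a CPU device? */

/* Stand-in for topology_init() calling register_cpu(). */
static void topology_init_sketch(void)
{
	cpu_registered = 1;
}

/* Stand-in for intel_epb_init() relying on get_cpu_device(). */
static void intel_epb_init_sketch(void)
{
	epb_saw_device = cpu_registered;
}

/* Run all calls level by level; same-level calls keep table order. */
static void run_initcalls(const struct initcall_sketch *calls, size_t n)
{
	for (int lvl = 0; lvl <= LEVEL_LATE; lvl++)
		for (size_t i = 0; i < n; i++)
			if (calls[i].level == lvl)
				calls[i].fn();
}

/*
 * Worst-case link order: the EPB init is listed before topology init.
 * Returns whether the EPB init saw a registered CPU device.
 */
static int epb_sees_device(int epb_level)
{
	struct initcall_sketch calls[] = {
		{ epb_level, intel_epb_init_sketch },
		{ LEVEL_SUBSYS, topology_init_sketch },
	};

	cpu_registered = 0;
	epb_saw_device = 0;
	run_initcalls(calls, sizeof(calls) / sizeof(calls[0]));
	return epb_saw_device;
}
```

In this model, `epb_sees_device(LEVEL_SUBSYS)` depends entirely on table order, while `epb_sees_device(LEVEL_LATE)` finds the device regardless of it.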
2023-11-22x86/mce/inject: Clear test status valueYazen Ghannam1-0/+1
AMD systems generally allow MCA "simulation" where MCA registers can be written with valid data and the full MCA handling flow can be tested by software. However, the platform on Scalable MCA systems can prevent software from writing data to the MCA registers. There is no architectural way to determine this configuration. Therefore, the MCE injection module will check for this behavior by writing and reading back a test status value. This is done during module init, and the check can run on any CPU with any valid MCA bank. If MCA_STATUS writes are ignored by the platform, then there are no side effects on the hardware state. If the writes are not ignored, then the test status value will remain in the hardware MCA_STATUS register. It is likely that the value will not be overwritten by hardware or software, since the tested CPU and bank are arbitrary. Therefore, the user may see a spurious, synthetic MCA error reported whenever MCA is polled for this CPU. Clear the test value immediately after writing it. It is very unlikely that a valid MCA error is logged by hardware during the test. Errors that cause an #MC won't be affected. Fixes: 891e465a1bd8 ("x86/mce: Check whether writes to MCA_STATUS are getting ignored") Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
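The write, read back, clear sequence can be sketched against a simulated MCA bank. The bank model and all names below are hypothetical, and the test value is assumed to be the status "valid" bit (bit 63); the real module talks to MSRs rather than a struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed test pattern: only the "valid" bit of the status register. */
#define STATUS_VAL_SKETCH (1ULL << 63)

/* Simulated MCA bank; writes_ignored models a platform blocking writes. */
struct mca_bank_sketch {
	uint64_t status;
	bool writes_ignored;
};

static void bank_write_status(struct mca_bank_sketch *b, uint64_t v)
{
	if (!b->writes_ignored)
		b->status = v;
}

static uint64_t bank_read_status(const struct mca_bank_sketch *b)
{
	return b->status;
}

/*
 * Probe whether status writes stick, then clear the test value right
 * away so no synthetic error is left behind for a later poll.
 */
static bool hw_injection_possible_sketch(struct mca_bank_sketch *b)
{
	uint64_t seen;

	bank_write_status(b, STATUS_VAL_SKETCH);
	seen = bank_read_status(b);
	bank_write_status(b, 0); /* the step this fix adds */

	return seen != 0;
}
```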