path: root/arch/x86/kernel/cpu
2020-01-13  x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_*
Author: Sean Christopherson  (3 files, -6/+29)

Add support for generating VMX feature names in capflags.c and use the resulting x86_vmx_flags to print the VMX flags in /proc/cpuinfo. Don't print VMX flags if no bits are set in word 0, which holds Pin Controls; Pin Control's INTR and NMI exiting are fundamental pillars of VMX, so if they are not supported then the CPU is broken, does not actually support VMX, or the kernel wasn't built with support for the target CPU.

Print the features in a dedicated "vmx flags" line to avoid polluting the common "flags" line and to avoid having to prefix all flags with "vmx_", which would result in horrendously long names.

Keep synthetic VMX flags in cpufeatures to preserve the /proc/cpuinfo ABI for those flags. This means that "flags" and "vmx flags" will have duplicate entries for tpr_shadow (virtual_tpr), vnmi, ept, flexpriority, vpid and ept_ad, but it caps the pollution of "flags" at those six VMX features. The vendor-specific code that populates the synthetic flags will be consolidated in a future patch to further minimize the lasting damage.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
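A rough sketch of the emission this describes, for illustration only (the x86_vmx_flags array is named in the commit; the vmx_capability field, the NVMXINTS bound, and the loop shape are assumptions for this sketch):

    /* Sketch: emit a dedicated "vmx flags" line, gated on word 0 (Pin Controls) */
    if (cpu_has(c, X86_FEATURE_VMX) && c->vmx_capability[0]) {
            seq_puts(m, "\nvmx flags\t:");
            for (i = 0; i < 32 * NVMXINTS; i++) {
                    if (test_bit(i, (unsigned long *)c->vmx_capability) &&
                        x86_vmx_flags[i] != NULL)
                            seq_printf(m, " %s", x86_vmx_flags[i]);
            }
    }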
2020-01-13  x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
Author: Sean Christopherson  (2 files, -0/+77)

Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill the capabilities during IA32_FEAT_CTL MSR initialization.

Make the VMX capabilities dependent on IA32_FEAT_CTL and X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't possibly support VMX, or when /proc/cpuinfo is not available.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2020-01-13  x86/cpu: Clear VMX feature flag if VMX is not fully enabled
Author: Sean Christopherson  (1 file, -3/+20)

Now that IA32_FEAT_CTL is always configured and locked for CPUs that are known to support VMX[*], clear the VMX capability flag if the MSR is unsupported or BIOS disabled VMX, i.e. locked IA32_FEAT_CTL and didn't set the appropriate VMX enable bit.

[*] Because init_ia32_feat_ctl() is called from the vendor ->c_init() callbacks, it's still possible for IA32_FEAT_CTL to be left unlocked when VMX is supported by the CPU. This is not fatal, and will be addressed in a future patch.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
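A minimal sketch of the clearing logic described above, using the FEAT_CTL bit names introduced in this series; the exact placement and surrounding checks in the real patch may differ:

    u64 msr;

    /* MSR not implemented at all: the CPU cannot support VMX */
    if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr)) {
            clear_cpu_cap(c, X86_FEATURE_VMX);
            return;
    }

    /* Locked by BIOS without the VMX enable bit set: VMX is disabled */
    if ((msr & FEAT_CTL_LOCKED) &&
        !(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX))
            clear_cpu_cap(c, X86_FEATURE_VMX);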
2020-01-13  x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization
Author: Sean Christopherson  (1 file, -0/+2)

Use the recently added IA32_FEAT_CTL MSR initialization sequence to opportunistically enable VMX support when running on a Zhaoxin CPU.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2020-01-13  x86/centaur: Use common IA32_FEAT_CTL MSR initialization
Author: Sean Christopherson  (1 file, -0/+2)

Use the recently added IA32_FEAT_CTL MSR initialization sequence to opportunistically enable VMX support when running on a Centaur CPU.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2020-01-13  x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked
Author: Sean Christopherson  (1 file, -5/+6)

WARN if the IA32_FEAT_CTL MSR is somehow left unlocked, now that CPU initialization unconditionally locks the MSR.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2020-01-13  x86/intel: Initialize IA32_FEAT_CTL MSR at boot
Author: Sean Christopherson  (4 files, -0/+44)

Opportunistically initialize IA32_FEAT_CTL to enable VMX when the MSR is left unlocked by BIOS. Configuring feature control at boot time paves the way for similar enabling of other features, e.g. Software Guard Extensions (SGX).

Temporarily leave equivalent KVM code in place in order to avoid introducing a regression on Centaur and Zhaoxin CPUs, e.g. removing KVM's code would leave the MSR unlocked on those CPUs and would break existing functionality if people are loading kvm_intel on Centaur and/or Zhaoxin. Defer enablement of the boot-time configuration on Centaur and Zhaoxin to future patches to aid bisection.

Note, Local Machine Check Exceptions (LMCE) are also supported by the kernel and enabled via feature control, but the kernel currently uses LMCE if and only if the feature is explicitly enabled by BIOS. Keep the current behavior to avoid introducing bugs; future patches can opt in to opportunistic enabling if it's deemed desirable to do so.

Always lock IA32_FEAT_CTL if it exists, even if the CPU doesn't support VMX, so that other existing and future kernel code that queries the MSR can assume it's locked.

Start from a clean slate when constructing the value to write to IA32_FEAT_CTL, i.e. ignore whatever value BIOS left in the MSR so as not to enable random features or fault on the WRMSR.

Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
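A condensed sketch of what such an opportunistic boot-time configuration looks like; this is a simplification of the flow the commit describes (tboot/SMX handling omitted), not the verbatim patch:

    void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
    {
            u64 msr;

            if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr))
                    return;

            /* BIOS already locked the MSR: honor its configuration */
            if (msr & FEAT_CTL_LOCKED)
                    return;

            /* Clean slate: ignore whatever unlocked value BIOS left behind */
            msr = FEAT_CTL_LOCKED;
            if (cpu_has(c, X86_FEATURE_VMX))
                    msr |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;

            wrmsrl(MSR_IA32_FEAT_CTL, msr);
    }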
2020-01-13  x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR
Author: Sean Christopherson  (1 file, -5/+5)

As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL are quite a mouthful, especially the VMX bits which must differentiate between enabling VMX inside and outside SMX (TXT) operation. Rename the MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to make them a little friendlier on the eyes.

Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name to match Intel's SDM, but a future patch will add a dedicated Kconfig, file and functions for the MSR. Using the full name for those assets is rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its nomenclature is consistent throughout the kernel.

Opportunistically, fix a few other annoyances with the defines:

 - Relocate the bit defines so that they immediately follow the MSR define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
 - Add whitespace around the block of feature control defines to make it clear they're all related.
 - Use BIT() instead of manually encoding the bit shift.
 - Use "VMX" instead of "VMXON" to match the SDM.
 - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to be consistent with the kernel's verbiage used for all other feature control bits. Note, the SDM refers to the LMCE bit as LMCE_ON, likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN. Ignore the (literal) one-off usage of _ON, the SDM is simply "wrong".

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
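For reference, the renamed defines end up looking roughly like this (bit positions per Intel's SDM; BIT() comes from <linux/bits.h>; the exact whitespace and ordering in msr-index.h may differ):

    #define MSR_IA32_FEAT_CTL                    0x0000003a
    #define FEAT_CTL_LOCKED                      BIT(0)
    #define FEAT_CTL_VMX_ENABLED_INSIDE_SMX      BIT(1)
    #define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX     BIT(2)
    #define FEAT_CTL_LMCE_ENABLED                BIT(20)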
2020-01-13  x86/resctrl: Do not reconfigure exiting tasks
Author: Xiaochen Shen  (1 file, -0/+4)

When writing a pid to the "tasks" file, a callback function move_myself() is queued to that task, to be called when the task returns from kernel mode or exits. The purpose of move_myself() is to activate the newly assigned closid and/or rmid associated with the task. This activation is done by calling resctrl_sched_in() from move_myself(), the same function that is called when switching to the task.

If this work is successfully queued but the task then enters PF_EXITING status (e.g., after receiving SIGKILL or SIGTERM) before the callback runs, move_myself() still calls resctrl_sched_in(), because the task status is currently not considered.

When a task is exiting, its data structure will be freed soon. Calling resctrl_sched_in() to write the register that controls the task's resources is unnecessary and adds performance overhead.

Add a check of the task status in move_myself() and return immediately if the task is PF_EXITING.

[ bp: Massage. ]

Signed-off-by: Xiaochen Shen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
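A condensed sketch of the guarded callback; the refcount handling and freeing of the callback container that the real function performs are omitted here:

    static void move_myself(struct callback_head *head)
    {
            /*
             * An exiting task's task_struct is about to be freed;
             * writing the resource-control MSR for it is pure overhead.
             */
            if (!(current->flags & PF_EXITING))
                    resctrl_sched_in();

            /* closid/rmid refcount drop and kfree of the callback omitted */
    }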
2020-01-13  x86/mce: Fix use of uninitialized MCE message string
Author: Jan H. Schönherr  (1 file, -2/+2)

The function mce_severity() is not required to update its msg argument. In fact, mce_severity_amd() does not, which makes mce_no_way_out() return uninitialized data that may later be used for printing.

Assuming that implementations of mce_severity() either always or never update the msg argument (which is currently the case), it is sufficient to initialize the temporary variable in mce_no_way_out(). While at it, avoid printing a useless "Unknown".

Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
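The fix amounts to seeding the temporary with the caller-provided buffer, so an mce_severity() implementation that never writes it still leaves valid data behind. Sketched below; the mce_severity() argument list is as of this kernel series and treated as an assumption:

    /* Inside mce_no_way_out() (sketch) */
    char *tmp = *msg;       /* was uninitialized before this fix */

    if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
            *msg = tmp;     /* only forward data an implementation wrote */
            return 1;
    }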
2020-01-13  x86/mce: Fix mce=nobootlog
Author: Jan H. Schönherr  (1 file, -13/+9)

Since commit 8b38937b7ab5 ("x86/mce: Do not enter deferred errors into the generic pool twice") the mce=nobootlog option has become mostly ineffective (after being only slightly ineffective before), as the code takes actions on MCEs left over from boot when they have a usable address.

Move the check for MCP_DONTLOG a bit outward to make it effective again.

Also, since commit 011d82611172 ("RAS: Add a Corrected Errors Collector") the two branches of the remaining "if" at the bottom of machine_check_poll() do the same. Unify them.

Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2020-01-13  x86/mce: Take action on UCNA/Deferred errors again
Author: Jan H. Schönherr  (1 file, -15/+16)

Commit fa92c5869426 ("x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll") added handling of UCNA and Deferred errors by adding them to the ring for SRAO errors.

Later, commit fd4cf79fcc4b ("x86/mce: Remove the MCE ring for Action Optional errors") switched storage from the SRAO ring to the unified pool that is still in use today. In order to only act on the intended errors, a filter for MCE_AO_SEVERITY is used -- effectively removing handling of UCNA/Deferred errors again.

Extend the severity filter to include UCNA/Deferred errors again. Also, generalize the naming of the notifier from SRAO to UC to capture the extended scope.

Note that this change may cause a message like the following to appear, as the same address may be reported as SRAO and as UCNA:

  Memory failure: 0x5fe3284: already hardware poisoned

Technically, this is a return to previous behavior.

Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Tony Luck <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2020-01-10  Merge branch 'x86/mm' into efi/core, to pick up dependencies
Author: Ingo Molnar  (5 files, -5/+5)

Signed-off-by: Ingo Molnar <[email protected]>
2020-01-09  x86/cpu: Add a missing prototype for arch_smt_update()
Author: Benjamin Thiel  (1 file, -0/+1)

... in order to fix a -Wmissing-prototype warning.

No functional change.

Signed-off-by: Benjamin Thiel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2020-01-02  x86/resctrl: Fix potential memory leak
Author: Shakeel Butt  (1 file, -3/+3)

set_cache_qos_cfg() is leaking memory when the given level is not RDT_RESOURCE_L3 or RDT_RESOURCE_L2. At the moment, this function is called only with valid levels, but move the allocation after the valid-level checks in order to make it more robust and future proof.

[ bp: Massage commit message. ]

Fixes: 99adde9b370de ("x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG")
Signed-off-by: Shakeel Butt <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Reinette Chatre <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2019-12-30  x86/resctrl: Fix an imbalance in domain_remove_cpu()
Author: Qian Cai  (1 file, -1/+1)

A system that supports resource monitoring may have multiple resources, while not all of these resources are capable of monitoring. Monitoring-related state is initialized only for resources that are capable of monitoring, and correspondingly this state should subsequently only be removed from those resources.

domain_add_cpu() calls domain_setup_mon_state() only when r->mon_capable is true, where it will initialize d->mbm_over. However, domain_remove_cpu() calls cancel_delayed_work(&d->mbm_over) without checking r->mon_capable, resulting in an attempt to cancel d->mbm_over on all resources, even those that never initialized d->mbm_over because they are not capable of monitoring. Hence, it triggers a debugobjects warning when offlining CPUs, because those timer debugobjects are never initialized:

  ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0
  WARNING: CPU: 143 PID: 789 at lib/debugobjects.c:484 debug_print_object
  Hardware name: HP Synergy 680 Gen9/Synergy 680 Gen9 Compute Module, BIOS I40 05/23/2018
  RIP: 0010:debug_print_object
  Call Trace:
   debug_object_assert_init
   del_timer
   try_to_grab_pending
   cancel_delayed_work
   resctrl_offline_cpu
   cpuhp_invoke_callback
   cpuhp_thread_fun
   smpboot_thread_fn
   kthread
   ret_from_fork

Fixes: e33026831bdb ("x86/intel_rdt/mbm: Handle counter overflow")
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Reinette Chatre <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Tony Luck <[email protected]>
Cc: Vikas Shivappa <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2019-12-30  Merge tag 'v5.5-rc4' into locking/kcsan, to resolve conflicts
Author: Ingo Molnar  (13 files, -184/+489)

Conflicts:
	init/main.c
	lib/Kconfig.debug

Signed-off-by: Ingo Molnar <[email protected]>
2019-12-17  x86/mce: Remove mce_inject_log() in favor of mce_log()
Author: Jan H. Schönherr  (3 files, -13/+2)

The mutex in mce_inject_log() became unnecessary with commit 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver"), though the original reason for its presence only vanished with commit 7298f08ea887 ("x86/mcelog: Get rid of RCU remnants").

Drop the mutex. And as that makes mce_inject_log() identical to mce_log(), get rid of the former in favor of the latter.

Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2019-12-17  x86/mce: Pass MCE message to mce_panic() on failed kernel recovery
Author: Jan H. Schönherr  (1 file, -1/+1)

In commit b2f9d678e28c ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries") another call to mce_panic() was introduced. Pass the message of the handled MCE to that instance of mce_panic() as well, as there doesn't seem to be a reason not to.

Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2019-12-17  x86/mce/therm_throt: Mark throttle_active_work() as __maybe_unused
Author: Arnd Bergmann  (1 file, -1/+1)

throttle_active_work() is only called if CONFIG_SYSFS is set; otherwise we get a harmless warning:

  arch/x86/kernel/cpu/mce/therm_throt.c:238:13: error: 'throttle_active_work' \
    defined but not used [-Werror=unused-function]

Mark the function as __maybe_unused to avoid the warning.

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Srinivas Pandruvada <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
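The annotation itself is a one-liner; sketched below (the parameter list matches a standard workqueue callback, the body is elided here):

    /*
     * Only referenced when CONFIG_SYSFS is enabled; __maybe_unused
     * silences -Wunused-function on sysfs-less configurations.
     */
    static void __maybe_unused throttle_active_work(struct work_struct *work)
    {
            /* thermal-throttle notification work, unchanged by this patch */
    }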
2019-12-17  x86/mce: Fix possibly incorrect severity calculation on AMD
Author: Jan H. Schönherr  (1 file, -1/+1)

The function mce_severity_amd_smca() requires m->bank to be initialized for correct operation. Fix the one case where mce_severity() is called without doing so.

Fixes: 6bda529ec42e ("x86/mce: Grade uncorrected errors for SMCA-enabled systems")
Fixes: d28af26faa0b ("x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()")
Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Cc: Yazen Ghannam <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2019-12-17  x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]
Author: Yazen Ghannam  (1 file, -1/+1)

Each logical CPU in Scalable MCA systems controls a unique set of MCA banks in the system. These banks are not shared between CPUs. The bank types and ordering will be the same across CPUs on currently available systems.

However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while other CPUs do not. In this case, the bank seen as Reserved on one CPU is assumed to be the same type as the bank seen as a known type on another CPU.

In general, this occurs when the hardware represented by the MCA bank is disabled, e.g. disabled memory controllers on certain models, etc. The MCA bank is disabled in the hardware, so there is no possibility of getting an MCA/MCE from it even if it is assumed to have a known type.

For example:

Full system:

  Bank | Type seen on CPU0 | Type seen on CPU1
  ------------------------------------------------
   0   | LS                | LS
   1   | UMC               | UMC
   2   | CS                | CS

System with hardware disabled:

  Bank | Type seen on CPU0 | Type seen on CPU1
  ------------------------------------------------
   0   | LS                | LS
   1   | UMC               | RAZ
   2   | CS                | CS

For this reason, there is a single, global struct smca_banks[] that is initialized at boot time. This array is initialized on each CPU as it comes online. However, the array will not be updated if an entry already exists.

This works as expected when the first CPU (usually CPU0) has all possible MCA banks enabled. But if the first CPU has a subset, then it will save a "Reserved" type in smca_banks[]. Successive CPUs will then not be able to update smca_banks[] even if they encounter a known bank type.

This may result in unexpected behavior. Depending on the system configuration, a user may observe issues enumerating the MCA thresholding sysfs interface. The issues may be as trivial as sysfs entries not being available, or as severe as system hangs.

For example:

  Bank | Type seen on CPU0 | Type seen on CPU1
  ------------------------------------------------
   0   | LS                | LS
   1   | RAZ               | UMC
   2   | CS                | CS

Extend the smca_banks[] entry check to return if the entry is a non-reserved type. Otherwise, continue so that CPUs that encounter a known bank type can update smca_banks[].

Fixes: 68627a697c19 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type")
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
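The extended check from the last paragraph, roughly (field names follow the SMCA code; treat this as a sketch rather than the literal hunk):

    /*
     * Keep an existing entry only if it records a real, known type.
     * A Reserved placeholder may still be overwritten later by a CPU
     * that sees the same bank as a known type.
     */
    if (smca_banks[bank].hwid &&
        smca_banks[bank].hwid->bank_type != SMCA_RESERVED)
            return;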
2019-12-17  x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure()
Author: Konstantin Khlebnikov  (1 file, -1/+1)

... because interrupts are disabled that early and sending IPIs can deadlock:

  BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
  no locks held by swapper/1/0.
  irq event stamp: 0
  hardirqs last enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
  softirqs last enabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  Preemption disabled at:
  [<ffffffff8104703b>] start_secondary+0x3b/0x190
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-rc2+ #1
  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
  Call Trace:
   dump_stack
   ___might_sleep.cold.92
   wait_for_completion
   ? generic_exec_single
   rdmsr_safe_on_cpu
   ? wrmsr_on_cpus
   mce_amd_feature_init
   mcheck_cpu_init
   identify_cpu
   identify_secondary_cpu
   smp_store_cpu_info
   start_secondary
   secondary_startup_64

The function smca_configure() is called only on the current CPU anyway; therefore, replace rdmsr_safe_on_cpu() with the atomic rdmsr_safe() and avoid the IPI.

[ bp: Update commit message. ]

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/157252708836.3876.4604398213417262402.stgit@buzz
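A sketch of the substitution in diff form; smca_configure() only ever runs on the CPU it configures, so the local, non-IPI accessor suffices (variable and MSR names here are assumptions for illustration):

    -	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
    +	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
     		return;

rdmsr_safe_on_cpu() dispatches the read to a given CPU via an IPI and sleeps on a completion, which is illegal with interrupts disabled; rdmsr_safe() reads the MSR directly on the executing CPU and is safe in that context.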
2019-12-15  x86/cpu/tsx: Define pr_fmt()
Author: Borislav Petkov  (1 file, -1/+4)

... so that all current and future pr_* statements in this file have the proper prefix.

No functional changes.

Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
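The idiom looks like this; pr_fmt() must be defined before printk.h (or any header that pulls it in) to take effect, and the prefix string here is an assumption for illustration:

    /* Prefix every pr_*() call in this file; must precede any printk.h include */
    #define pr_fmt(fmt) "tsx: " fmt

    #include <linux/cpufeature.h>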
2019-12-14  x86/bugs: Move enum taa_mitigations to bugs.c
Author: Borislav Petkov  (1 file, -0/+7)

... because it is used only there.

No functional changes.

Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
2019-12-10  x86/mm/pat: Rename <asm/pat.h> => <asm/memtype.h>
Author: Ingo Molnar  (5 files, -5/+5)

pat.h is a file whose main purpose is to provide the memtype_*() APIs.

PAT is the low-level hardware mechanism - but the high-level abstraction is memtype. So name the header <memtype.h> as well - this goes hand in hand with memtype.c and memtype_interval.c.

Signed-off-by: Ingo Molnar <[email protected]>
2019-12-09  x86/mtrr: Require CAP_SYS_ADMIN for all access
Author: Kees Cook  (1 file, -19/+2)

Zhang Xiaoxu noted that physical address locations for MTRR were visible to non-root users, which could be considered an information leak.

In discussing[1] the options for solving this, it sounded like just moving the capable check into open() was the first step. If this breaks userspace, then we will have a test case for the more conservative approaches discussed in the thread.

In summary:

 - MTRR should check capabilities at open time (or retain the checks on the opener's permissions for later checks).
 - Changing the DAC permissions might break something that expects to open mtrr when not uid 0.
 - If we leave the DAC permissions alone and just move the capable check to the opener, we should get the desired protection (i.e. check against CAP_SYS_ADMIN, not just the wider uid 0).
 - If that still breaks things, as in userspace expects to be able to read other parts of the file as non-uid-0 and non-CAP_SYS_ADMIN, then we need to censor the contents using the opener's permissions. For example, as done in other /proc cases, like commit 51d7b120418e ("/proc/iomem: only expose physical resource addresses to privileged users").

[1] https://lore.kernel.org/lkml/201911110934.AC5BA313@keescook/

Reported-by: Zhang Xiaoxu <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: James Morris <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: Matthew Garrett <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/201911181308.63F06502A1@keescook
2019-12-09  x86/mtrr: Get rid of mtrr_seq_show() forward declaration
Author: Borislav Petkov  (1 file, -22/+20)

... by moving the function up in the file.

No functional changes.

Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
2019-12-01  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Author: Linus Torvalds  (1 file, -10/+2)

Pull x86 fixes from Ingo Molnar:
 "Various fixes:

  - Fix the PAT performance regression that downgraded write-combining device memory regions to uncached.
  - There's been a number of bugs in 32-bit double fault handling - hopefully all fixed now.
  - Fix an LDT crash
  - Fix an FPU over-optimization that broke with GCC9 code optimizations.
  - Misc cleanups"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/pat: Fix off-by-one bugs in interval tree search
  x86/ioperm: Save an indentation level in tss_update_io_bitmap()
  x86/fpu: Don't cache access to fpu_fpregs_owner_ctx
  x86/entry/32: Remove unused 'restore_all_notrace' local label
  x86/ptrace: Document FSBASE and GSBASE ABI oddities
  x86/ptrace: Remove set_segment_reg() implementations for current
  x86/traps: die() instead of panicking on a double fault
  x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit
  x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area
  x86/doublefault/32: Rename doublefault.c to doublefault_32.c
  x86/traps: Disentangle the 32-bit and 64-bit doublefault code
  lkdtm: Add a DOUBLE_FAULT crash type on x86
  selftests/x86/single_step_syscall: Check SYSENTER directly
  x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()
2019-11-29  x86/mce/therm_throt: Mask out read-only and reserved MSR bits
Author: Srinivas Pandruvada  (1 file, -5/+12)

While writing to MSR IA32_THERM_STATUS/IA32_PKG_THERM_STATUS, avoid writing 1 to read-only and reserved fields, because updating some fields generates an exception.

[ bp: Vertically align for better readability. ]

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Reported-by: Dominik Brodowski <[email protected]>
Tested-by: Dominik Brodowski <[email protected]>
Signed-off-by: Srinivas Pandruvada <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
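Shape of the fix, sketched; the mask name below is illustrative (not the real define), the point being a read-modify-write that clears only the software-writable sticky log bits and writes back the read-only/reserved bits exactly as read:

    u64 msr_val;

    rdmsrl(msr, msr_val);
    /* Keep RO/reserved bits as read; clear only the sticky log bits */
    msr_val &= THERM_STATUS_CLEAR_LOG_MASK;    /* illustrative mask name */
    wrmsrl(msr, msr_val);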
2019-11-26  x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area
Author: Andy Lutomirski  (1 file, -10/+2)

There are three problems with the current layout of the doublefault stack and TSS.

First, the TSS is only cacheline-aligned, which is not enough -- if the hardware portion of the TSS (struct x86_hw_tss) crosses a page boundary, horrible things happen [0].

Second, the stack and TSS are global, so simultaneous double faults on different CPUs will cause massive corruption.

Third, the whole mechanism won't work if user CR3 is loaded, resulting in a triple fault [1].

Let the doublefault stack and TSS share a page (which prevents the TSS from spanning a page boundary), make it percpu, and move it into cpu_entry_area. Teach the stack dump code about the doublefault stack.

[0] Real hardware will read past the end of the page onto the next *physical* page if a task switch happens. Virtual machines may have any number of bugs, and I would consider it reasonable for a VM to summarily kill the guest if it tries to task-switch to a page-spanning TSS.

[1] Real hardware triple faults. At least some VMs seem to hang. I'm not sure what's going on.

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
2019-11-26  Merge branch 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Author: Linus Torvalds  (1 file, -108/+80)

Pull x86 iopl updates from Ingo Molnar:
 "This implements a nice simplification of the iopl and ioperm code that Thomas Gleixner discovered: we can implement the IO privilege features of the iopl system call by using the IO permission bitmap in permissive mode, while trapping CLI/STI/POPF/PUSHF uses in user-space if they change the interrupt flag.

  This implements that feature, with testing facilities and related cleanups"

[ "Simplification" may be an over-statement. The main goal is to avoid the cli/sti of iopl by effectively implementing the IO port access parts of iopl in terms of ioperm. This may end up not working well in case people actually depend on cli/sti being available, or if there are mixed uses of iopl and ioperm. We will see.. - Linus ]

* 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  x86/ioperm: Fix use of deprecated config option
  x86/entry/32: Clarify register saving in __switch_to_asm()
  selftests/x86/iopl: Extend test to cover IOPL emulation
  x86/ioperm: Extend IOPL config to control ioperm() as well
  x86/iopl: Remove legacy IOPL option
  x86/iopl: Restrict iopl() permission scope
  x86/iopl: Fixup misleading comment
  selftests/x86/ioperm: Extend testing so the shared bitmap is exercised
  x86/ioperm: Share I/O bitmap if identical
  x86/ioperm: Remove bitmap if all permissions dropped
  x86/ioperm: Move TSS bitmap update to exit to user work
  x86/ioperm: Add bitmap sequence number
  x86/ioperm: Move iobitmap data into a struct
  x86/tss: Move I/O bitmap data into a separate struct
  x86/io: Speedup schedule out of I/O bitmap user
  x86/ioperm: Avoid bitmap allocation if no permissions are set
  x86/ioperm: Simplify first ioperm() invocation logic
  x86/iopl: Cleanup include maze
  x86/tss: Fix and move VMX BUILD_BUG_ON()
  x86/cpu: Unify cpu_init()
  ...
2019-11-26  Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Author: Linus Torvalds  (1 file, -2/+28)

Pull x86 PTI updates from Ingo Molnar:
 "Fix reporting bugs of the MDS and TAA mitigation status, if one or both are set via a boot option.

  No change to mitigation behavior intended"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Fix redundant MDS mitigation message
  x86/speculation: Fix incorrect MDS/TAA mitigation status
2019-11-26  Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Author: Linus Torvalds  (1 file, -4/+4)

Pull x86 mm updates from Ingo Molnar:
 "The main changes in this cycle were:

  - A PAT series from Davidlohr Bueso, which simplifies the memtype rbtree by using the interval tree helpers. (There's more cleanups in this area queued up, but they didn't make the merge window.)

  - Also flip over CONFIG_X86_5LEVEL to default-y. This might draw in a few more testers, as all the major distros are going to have 5-level paging enabled by default in their next iterations.

  - Misc cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/pat: Rename pat_rbtree.c to pat_interval.c
  x86/mm/pat: Drop the rbt_ prefix from external memtype calls
  x86/mm/pat: Do not pass 'rb_root' down the memtype tree helper functions
  x86/mm/pat: Convert the PAT tree to a generic interval tree
  x86/mm: Clean up the pmd_read_atomic() comments
  x86/mm: Fix function name typo in pmd_read_atomic() comment
  x86/cpu: Clean up intel_tlb_table[]
  x86/mm: Enable 5-level paging support by default
2019-11-26  Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Author: Linus Torvalds  (1 file, -1/+6)

Pull x86 hyperv updates from Ingo Molnar:
 "Misc updates to the hyperv guest code:

  - Rework clockevents initialization to better support hibernation
  - Allow guests to enable InvariantTSC
  - Micro-optimize send_ipi_one"

* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hyperv: Initialize clockevents earlier in CPU onlining
  x86/hyperv: Allow guests to enable InvariantTSC
  x86/hyperv: Micro-optimize send_ipi_one()
2019-11-26  Merge branches 'x86-cpu-for-linus' and 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Author: Linus Torvalds  (2 files, -3/+24)

Pull x86 cpu and fpu updates from Ingo Molnar:

 - math-emu fixes
 - CPUID updates
 - sanity-check RDRAND output to see whether the CPU at least pretends to produce random data
 - various unaligned-access-across-cachelines fixes in preparation of hardware-level split-lock detection
 - fix MAXSMP constraints to not allow !CPUMASK_OFFSTACK kernels with larger than 512 NR_CPUS
 - misc FPU related cleanups

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Align the x86_capability array to size of unsigned long
  x86/cpu: Align cpu_caps_cleared and cpu_caps_set to unsigned long
  x86/umip: Make the comments vendor-agnostic
  x86/Kconfig: Rename UMIP config parameter
  x86/Kconfig: Enforce limit of 512 CPUs with MAXSMP and no CPUMASK_OFFSTACK
  x86/cpufeatures: Add feature bit RDPRU on AMD
  x86/math-emu: Limit MATH_EMULATION to 486SX compatibles
  x86/math-emu: Check __copy_from_user() result
  x86/rdrand: Sanity-check RDRAND output

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Use XFEATURE_FP/SSE enum values instead of hardcoded numbers
  x86/fpu: Shrink space allocated for xstate_comp_offsets
  x86/fpu: Update stale variable name in comment
2019-11-25  Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Author: Linus Torvalds  (5 files, -44/+319)

Pull RAS updates from Borislav Petkov:

 - Fully reworked thermal throttling notifications; there should be no more spamming of dmesg (Srinivas Pandruvada and Benjamin Berg)
 - More enablement for the Intel-compatible CPUs Zhaoxin (Tony W Wang-oc)
 - PPIN support for Icelake (Tony Luck)

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/therm_throt: Optimize notifications of thermal throttle
  x86/mce: Add Xeon Icelake to list of CPUs that support PPIN
  x86/mce: Lower throttling MCE messages' priority to warning
  x86/mce: Add Zhaoxin LMCE support
  x86/mce: Add Zhaoxin CMCI support
  x86/mce: Add Zhaoxin MCE support
  x86/mce/amd: Make disable_err_thresholding() static
2019-11-25  Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Author: Linus Torvalds  (3 files, -19/+26)

Pull x86 microcode updates from Borislav Petkov:
 "This converts the late loading method to load the microcode in parallel (vs sequentially currently). The patch remained in linux-next for the maximum amount of time so that any potential and hard-to-debug fallout be minimized.

  Now cloud folks have their milliseconds back but all the normal people should use early loading anyway :-)"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Issue the revision updated message only on the BSP
  x86/microcode: Update late microcode in parallel
  x86/microcode/amd: Fix two -Wunused-but-set-variable warnings
2019-11-19  Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/kcsan
Author: Ingo Molnar  (1 file, -0/+3)

Pull the KCSAN subsystem from Paul E. McKenney:
 "This pull request contains base kernel concurrency sanitizer (KCSAN) enablement for x86, courtesy of Marco Elver. KCSAN is a sampling watchpoint-based data-race detector, and is documented in Documentation/dev-tools/kcsan.rst. KCSAN was announced in September, and much feedback has since been incorporated:

    http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com

  The data races located thus far have resulted in a number of fixes:

    https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan

  Additional information may be found here:

    https://lore.kernel.org/lkml/[email protected]/ "

Signed-off-by: Ingo Molnar <[email protected]>
2019-11-19  Merge tag 'v5.4-rc8' into WIP.x86/mm, to pick up fixes
Author: Ingo Molnar  (16 files, -173/+585)

Signed-off-by: Ingo Molnar <[email protected]>
2019-11-16  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Author: Linus Torvalds  (1 file, -4/+0)

Pull x86 fixes from Ingo Molnar:
 "Two fixes: disable unreliable HPET on Intel Coffee Lake platforms, and fix a lockdep splat in the resctrl code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix potential lockdep warning
  x86/quirks: Disable HPET on Intel Coffee Lake platforms
2019-11-16  x86, kcsan: Enable KCSAN for x86
Author: Marco Elver  (1 file, -0/+3)

This patch enables KCSAN for x86, with updates to build rules to not use KCSAN for several incompatible compilation units.

Signed-off-by: Marco Elver <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2019-11-16  x86/speculation: Fix redundant MDS mitigation message
Author: Waiman Long  (1 file, -0/+13)

Since the MDS and TAA mitigations are inter-related for processors that are affected by both vulnerabilities, the following confusing messages can be printed in the kernel log:

  MDS: Vulnerable
  MDS: Mitigation: Clear CPU buffers

To avoid the first, incorrect message, defer the printing of the MDS mitigation until after the TAA mitigation selection has been done. However, that has the side effect of printing the TAA mitigation first, before the MDS mitigation.

[ bp: Check box is affected/mitigations are disabled first before printing and massage. ]

Suggested-by: Pawan Gupta <[email protected]>
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Mark Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2019-11-16  x86/speculation: Fix incorrect MDS/TAA mitigation status
Author: Waiman Long  (1 file, -2/+15)

For MDS-vulnerable processors with TSX support, enabling either the MDS or TAA mitigation will enable the use of VERW to flush internal processor buffers at the right code path. IOW, they are either both mitigated or both not. However, if the command line options are inconsistent, the vulnerabilities sysfs files may not report the mitigation status correctly.

For example, with only the "mds=off" option:

  vulnerabilities/mds:Vulnerable; SMT vulnerable
  vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable

The mds vulnerabilities file has the wrong status in this case. Similarly, the taa vulnerability file will be wrong with the mds mitigation on, but taa off.

Change taa_select_mitigation() to sync up the two mitigation statuses and have them turned off if both "mds=off" and "tsx_async_abort=off" are present.

Update the documentation to emphasize the fact that both "mds=off" and "tsx_async_abort=off" have to be specified together for processors that are affected by both TAA and MDS for the options to be effective.

[ bp: Massage and add kernel-parameters.txt change too. ]

Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort")
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: [email protected]
Cc: Mark Gross <[email protected]>
Cc: <[email protected]>
Cc: Pawan Gupta <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2019-11-16  x86/ioperm: Extend IOPL config to control ioperm() as well
Author: Thomas Gleixner  (1 file, -9/+17)

If iopl() is disabled, then providing ioperm() does not make much sense. Rename the config option and disable/enable both syscalls with it. Guard the code with #ifdefs where appropriate.

Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
2019-11-16  x86/iopl: Restrict iopl() permission scope
Author: Thomas Gleixner  (1 file, -0/+5)

Access to the full I/O port range can also be provided by the TSS I/O bitmap, but that would require copying 8k of data when scheduling in the task. As shown with the sched-out optimization, TSS.io_bitmap_base can be used to switch the incoming task to a preallocated I/O bitmap which has all bits zero, i.e. allows access to all I/O ports.

Implementing this makes it possible to provide an iopl() emulation mode which restricts the IOPL level 3 permissions to I/O port access but removes the STI/CLI permission which comes with the hardware IOPL mechanism.

Provide a config option to switch IOPL to emulation mode, make it the default, and while at it also provide an option to disable IOPL completely.

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
2019-11-16  x86/ioperm: Add bitmap sequence number
Author: Thomas Gleixner  (1 file, -0/+1)

Add a globally unique sequence number which is incremented when ioperm() changes the I/O bitmap of a task. Store the new sequence number in the io_bitmap structure and compare it with the sequence number of the I/O bitmap which was last loaded on a CPU. Only update the bitmap if the sequence is different.

That should further reduce the overhead of I/O bitmap scheduling when there are only a few I/O bitmap users on the system.

The 64-bit sequence counter is sufficient. A wraparound of the sequence counter, assuming an ioperm() call every nanosecond, would require about 584 years of uptime.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
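The bookkeeping reduces to one u64 per bitmap plus a record of the sequence last loaded into the TSS; a sketch with assumed field and helper names (prev_sequence and tss_copy_io_bitmap() are illustrative, not confirmed by the commit text):

    struct io_bitmap {
            u64             sequence;       /* bumped on every ioperm() change */
            refcount_t      refcnt;
            unsigned int    max;            /* highest byte offset in use */
            unsigned long   bitmap[IO_BITMAP_LONGS];
    };

    /* On schedule-in (sketch): copy only if the bitmap actually changed */
    if (tss->io_bitmap.prev_sequence != iobm->sequence)
            tss_copy_io_bitmap(tss, iobm);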
2019-11-16  x86/tss: Move I/O bitmap data into a separate struct
Author: Thomas Gleixner  (1 file, -2/+2)

Move the non-hardware portion of the I/O bitmap data into a separate struct for readability's sake.

Originally-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
2019-11-16  x86/io: Speedup schedule out of I/O bitmap user
Author: Thomas Gleixner  (1 file, -1/+2)

There is no requirement to update the TSS I/O bitmap when a thread using it is scheduled out and the incoming thread does not use it.

For the permission check based on the TSS I/O bitmap, the CPU calculates the memory location of the I/O bitmap from the address of the TSS and the io_bitmap_base member of the tss_struct. The easiest way to invalidate the I/O bitmap is to switch the offset to an address outside of the TSS limit. If an I/O instruction is then issued from user space, the TSS limit causes #GP to be raised in the same way as a valid I/O bitmap with all bits set to 1 would.

This removes the extra work when an I/O bitmap using task is scheduled out, and puts the burden on the rare I/O bitmap users when they are scheduled in.

Signed-off-by: Thomas Gleixner <[email protected]>
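The invalidation is just an offset write; sketched below (the constant name is an assumption for illustration; any value beyond the TSS limit works):

    /*
     * Point io_bitmap_base outside the TSS limit. Any I/O port access
     * from user space then raises #GP exactly as a bitmap of all ones
     * would, and no 8k copy is needed when a non-ioperm() task is
     * scheduled in.
     */
    tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;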
2019-11-16  x86/cpu: Unify cpu_init()
Author: Thomas Gleixner  (1 file, -108/+65)

Similar to copy_thread_tls(), the 32-bit and 64-bit implementations of cpu_init() are very similar, and unification avoids duplicate changes in the future.

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>