Age | Commit message (Collapse) | Author | Files | Lines |
|
Add support for generating VMX feature names in capflags.c and use the
resulting x86_vmx_flags to print the VMX flags in /proc/cpuinfo. Don't
print VMX flags if no bits are set in word 0, which holds Pin Controls.
Pin Control's INTR and NMI exiting are fundamental pillars of VMX, if
they are not supported then the CPU is broken, it does not actually
support VMX, or the kernel wasn't built with support for the target CPU.
Print the features in a dedicated "vmx flags" line to avoid polluting
the common "flags" and to avoid having to prefix all flags with "vmx_",
which results in horrendously long names.
Keep synthetic VMX flags in cpufeatures to preserve /proc/cpuinfo's ABI
for those flags. This means that "flags" and "vmx flags" will have
duplicate entries for tpr_shadow (virtual_tpr), vnmi, ept, flexpriority,
vpid and ept_ad, but caps the pollution of "flags" at those six VMX
features. The vendor-specific code that populates the synthetic flags
will be consolidated in a future patch to further minimize the lasting
damage.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill
the capabilities during IA32_FEAT_CTL MSR initialization.
Make the VMX capabilities dependent on IA32_FEAT_CTL and
X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't
possibly support VMX, or when /proc/cpuinfo is not available.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Now that IA32_FEAT_CTL is always configured and locked for CPUs that are
known to support VMX[*], clear the VMX capability flag if the MSR is
unsupported or BIOS disabled VMX, i.e. locked IA32_FEAT_CTL and didn't
set the appropriate VMX enable bit.
[*] Because init_ia32_feat_ctl() is called from vendors ->c_init(), it's
still possible for IA32_FEAT_CTL to be left unlocked when VMX is
supported by the CPU. This is not fatal, and will be addressed in a
future patch.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Use the recently added IA32_FEAT_CTL MSR initialization sequence to
opportunistically enable VMX support when running on a Zhaoxin CPU.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Use the recently added IA32_FEAT_CTL MSR initialization sequence to
opportunistically enable VMX support when running on a Centaur CPU.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
WARN if the IA32_FEAT_CTL MSR is somehow left unlocked now that CPU
initialization unconditionally locks the MSR.
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Opportunistically initialize IA32_FEAT_CTL to enable VMX when the MSR is
left unlocked by BIOS. Configuring feature control at boot time paves
the way for similar enabling of other features, e.g. Software Guard
Extensions (SGX).
Temporarily leave equivalent KVM code in place in order to avoid
introducing a regression on Centaur and Zhaoxin CPUs, e.g. removing
KVM's code would leave the MSR unlocked on those CPUs and would break
existing functionality if people are loading kvm_intel on Centaur and/or
Zhaoxin. Defer enablement of the boot-time configuration on Centaur and
Zhaoxin to future patches to aid bisection.
Note, Local Machine Check Exceptions (LMCE) are also supported by the
kernel and enabled via feature control, but the kernel currently uses
LMCE if and only if the feature is explicitly enabled by BIOS. Keep
the current behavior to avoid introducing bugs, future patches can opt
in to opportunistic enabling if it's deemed desirable to do so.
Always lock IA32_FEAT_CTL if it exists, even if the CPU doesn't support
VMX, so that other existing and future kernel code that queries the MSR
can assume it's locked.
Start from a clean slate when constructing the value to write to
IA32_FEAT_CTL, i.e. ignore whatever value BIOS left in the MSR so as not
to enable random features or fault on the WRMSR.
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation. Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.
Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.
Opportunistically, fix a few other annoyances with the defines:
- Relocate the bit defines so that they immediately follow the MSR
define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
- Add whitespace around the block of feature control defines to make
it clear they're all related.
- Use BIT() instead of manually encoding the bit shift.
- Use "VMX" instead of "VMXON" to match the SDM.
- Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
be consistent with the kernel's verbiage used for all other feature
control bits. Note, the SDM refers to the LMCE bit as LMCE_ON,
likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN. Ignore
the (literal) one-off usage of _ON, the SDM is simply "wrong".
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
When writing a pid to file "tasks", a callback function move_myself() is
queued to this task to be called when the task returns from kernel mode
or exits. The purpose of move_myself() is to activate the newly assigned
closid and/or rmid associated with this task. This activation is done by
calling resctrl_sched_in() from move_myself(), the same function that is
called when switching to this task.
If this work is successfully queued but then the task enters PF_EXITING
status (e.g., receiving signal SIGKILL, SIGTERM) prior to the
execution of the callback move_myself(), move_myself() still calls
resctrl_sched_in() since the task status is not currently considered.
When a task is exiting, the data structure of the task itself will
be freed soon. Calling resctrl_sched_in() to write the register that
controls the task's resources is unnecessary and it implies extra
performance overhead.
Add check on task status in move_myself() and return immediately if the
task is PF_EXITING.
[ bp: Massage. ]
Signed-off-by: Xiaochen Shen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The function mce_severity() is not required to update its msg argument.
In fact, mce_severity_amd() does not, which makes mce_no_way_out()
return uninitialized data, which may be used later for printing.
Assuming that implementations of mce_severity() either always or never
update the msg argument (which is currently the case), it is sufficient
to initialize the temporary variable in mce_no_way_out().
While at it, avoid printing a useless "Unknown".
Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Since commit
8b38937b7ab5 ("x86/mce: Do not enter deferred errors into the generic
pool twice")
the mce=nobootlog option has become mostly ineffective (after being only
slightly ineffective before), as the code is taking actions on MCEs left
over from boot when they have a usable address.
Move the check for MCP_DONTLOG a bit outward to make it effective again.
Also, since commit
011d82611172 ("RAS: Add a Corrected Errors Collector")
the two branches of the remaining "if" at the bottom of machine_check_poll()
do same. Unify them.
Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Commit
fa92c5869426 ("x86, mce: Support memory error recovery for both UCNA
and Deferred error in machine_check_poll")
added handling of UCNA and Deferred errors by adding them to the ring
for SRAO errors.
Later, commit
fd4cf79fcc4b ("x86/mce: Remove the MCE ring for Action Optional errors")
switched storage from the SRAO ring to the unified pool that is still
in use today. In order to only act on the intended errors, a filter
for MCE_AO_SEVERITY is used -- effectively removing handling of
UCNA/Deferred errors again.
Extend the severity filter to include UCNA/Deferred errors again.
Also, generalize the naming of the notifier from SRAO to UC to capture
the extended scope.
Note, that this change may cause a message like the following to appear,
as the same address may be reported as SRAO and as UCNA:
Memory failure: 0x5fe3284: already hardware poisoned
Technically, this is a return to previous behavior.
Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Tony Luck <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
.. in order to fix a -Wmissing-prototype warning.
No functional change.
Signed-off-by: Benjamin Thiel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
set_cache_qos_cfg() is leaking memory when the given level is not
RDT_RESOURCE_L3 or RDT_RESOURCE_L2. At the moment, this function is
called with only valid levels but move the allocation after the valid
level checks in order to make it more robust and future proof.
[ bp: Massage commit message. ]
Fixes: 99adde9b370de ("x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG")
Signed-off-by: Shakeel Butt <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Reinette Chatre <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
A system that supports resource monitoring may have multiple resources
while not all of these resources are capable of monitoring. Monitoring
related state is initialized only for resources that are capable of
monitoring and correspondingly this state should subsequently only be
removed from these resources that are capable of monitoring.
domain_add_cpu() calls domain_setup_mon_state() only when r->mon_capable
is true where it will initialize d->mbm_over. However,
domain_remove_cpu() calls cancel_delayed_work(&d->mbm_over) without
checking r->mon_capable resulting in an attempt to cancel d->mbm_over on
all resources, even those that never initialized d->mbm_over because
they are not capable of monitoring. Hence, it triggers a debugobjects
warning when offlining CPUs because those timer debugobjects are never
initialized:
ODEBUG: assert_init not available (active state 0) object type:
timer_list hint: 0x0
WARNING: CPU: 143 PID: 789 at lib/debugobjects.c:484
debug_print_object
Hardware name: HP Synergy 680 Gen9/Synergy 680 Gen9 Compute Module, BIOS I40 05/23/2018
RIP: 0010:debug_print_object
Call Trace:
debug_object_assert_init
del_timer
try_to_grab_pending
cancel_delayed_work
resctrl_offline_cpu
cpuhp_invoke_callback
cpuhp_thread_fun
smpboot_thread_fn
kthread
ret_from_fork
Fixes: e33026831bdb ("x86/intel_rdt/mbm: Handle counter overflow")
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Reinette Chatre <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Tony Luck <[email protected]>
Cc: Vikas Shivappa <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Conflicts:
init/main.c
lib/Kconfig.debug
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The mutex in mce_inject_log() became unnecessary with commit
5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver"),
though the original reason for its presence only vanished with commit
7298f08ea887 ("x86/mcelog: Get rid of RCU remnants").
Drop the mutex. And as that makes mce_inject_log() identical to mce_log(),
get rid of the former in favor of the latter.
Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
In commit
b2f9d678e28c ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries")
another call to mce_panic() was introduced. Pass the message of the
handled MCE to that instance of mce_panic() as well, as there doesn't
seem to be a reason not to.
Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
throttle_active_work() is only called if CONFIG_SYSFS is set, otherwise
we get a harmless warning:
arch/x86/kernel/cpu/mce/therm_throt.c:238:13: error: 'throttle_active_work' \
defined but not used [-Werror=unused-function]
Mark the function as __maybe_unused to avoid the warning.
Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Srinivas Pandruvada <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The function mce_severity_amd_smca() requires m->bank to be initialized
for correct operation. Fix the one case, where mce_severity() is called
without doing so.
Fixes: 6bda529ec42e ("x86/mce: Grade uncorrected errors for SMCA-enabled systems")
Fixes: d28af26faa0b ("x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()")
Signed-off-by: Jan H. Schönherr <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Cc: Yazen Ghannam <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Each logical CPU in Scalable MCA systems controls a unique set of MCA
banks in the system. These banks are not shared between CPUs. The bank
types and ordering will be the same across CPUs on currently available
systems.
However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while
other CPUs do not. In this case, the bank seen as Reserved on one CPU is
assumed to be the same type as the bank seen as a known type on another
CPU.
In general, this occurs when the hardware represented by the MCA bank
is disabled, e.g. disabled memory controllers on certain models, etc.
The MCA bank is disabled in the hardware, so there is no possibility of
getting an MCA/MCE from it even if it is assumed to have a known type.
For example:
Full system:
Bank | Type seen on CPU0 | Type seen on CPU1
------------------------------------------------
0 | LS | LS
1 | UMC | UMC
2 | CS | CS
System with hardware disabled:
Bank | Type seen on CPU0 | Type seen on CPU1
------------------------------------------------
0 | LS | LS
1 | UMC | RAZ
2 | CS | CS
For this reason, there is a single, global struct smca_banks[] that is
initialized at boot time. This array is initialized on each CPU as it
comes online. However, the array will not be updated if an entry already
exists.
This works as expected when the first CPU (usually CPU0) has all
possible MCA banks enabled. But if the first CPU has a subset, then it
will save a "Reserved" type in smca_banks[]. Successive CPUs will then
not be able to update smca_banks[] even if they encounter a known bank
type.
This may result in unexpected behavior. Depending on the system
configuration, a user may observe issues enumerating the MCA
thresholding sysfs interface. The issues may be as trivial as sysfs
entries not being available, or as severe as system hangs.
For example:
Bank | Type seen on CPU0 | Type seen on CPU1
------------------------------------------------
0 | LS | LS
1 | RAZ | UMC
2 | CS | CS
Extend the smca_banks[] entry check to return if the entry is a
non-reserved type. Otherwise, continue so that CPUs that encounter a
known bank type can update smca_banks[].
Fixes: 68627a697c19 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type")
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
... because interrupts are disabled that early and sending IPIs can
deadlock:
BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
no locks held by swapper/1/0.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
softirqs last enabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<ffffffff8104703b>] start_secondary+0x3b/0x190
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-rc2+ #1
Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
Call Trace:
dump_stack
___might_sleep.cold.92
wait_for_completion
? generic_exec_single
rdmsr_safe_on_cpu
? wrmsr_on_cpus
mce_amd_feature_init
mcheck_cpu_init
identify_cpu
identify_secondary_cpu
smp_store_cpu_info
start_secondary
secondary_startup_64
The function smca_configure() is called only on the current CPU anyway,
therefore replace rdmsr_safe_on_cpu() with atomic rdmsr_safe() and avoid
the IPI.
[ bp: Update commit message. ]
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/157252708836.3876.4604398213417262402.stgit@buzz
|
|
... so that all current and future pr_* statements in this file have the
proper prefix.
No functional changes.
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
... because it is used only there.
No functional changes.
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
pat.h is a file whose main purpose is to provide the memtype_*() APIs.
PAT is the low level hardware mechanism - but the high level abstraction
is memtype.
So name the header <memtype.h> as well - this goes hand in hand with memtype.c
and memtype_interval.c.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Zhang Xiaoxu noted that physical address locations for MTRR were visible
to non-root users, which could be considered an information leak.
In discussing[1] the options for solving this, it sounded like just
moving the capable check into open() was the first step.
If this breaks userspace, then we will have a test case for the more
conservative approaches discussed in the thread. In summary:
- MTRR should check capabilities at open time (or retain the
checks on the opener's permissions for later checks).
- changing the DAC permissions might break something that expects to
open mtrr when not uid 0.
- if we leave the DAC permissions alone and just move the capable check
to the opener, we should get the desired protection. (i.e. check
against CAP_SYS_ADMIN not just the wider uid 0.)
- if that still breaks things, as in userspace expects to be able to
read other parts of the file as non-uid-0 and non-CAP_SYS_ADMIN, then
we need to censor the contents using the opener's permissions. For
example, as done in other /proc cases, like commit
51d7b120418e ("/proc/iomem: only expose physical resource addresses to privileged users").
[1] https://lore.kernel.org/lkml/201911110934.AC5BA313@keescook/
Reported-by: Zhang Xiaoxu <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: James Morris <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: Matthew Garrett <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: x86-ml <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/201911181308.63F06502A1@keescook
|
|
... by moving the function up in the file.
No functional changes.
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Various fixes:
- Fix the PAT performance regression that downgraded write-combining
device memory regions to uncached.
- There's been a number of bugs in 32-bit double fault handling -
hopefully all fixed now.
- Fix an LDT crash
- Fix an FPU over-optimization that broke with GCC9 code
optimizations.
- Misc cleanups"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/pat: Fix off-by-one bugs in interval tree search
x86/ioperm: Save an indentation level in tss_update_io_bitmap()
x86/fpu: Don't cache access to fpu_fpregs_owner_ctx
x86/entry/32: Remove unused 'restore_all_notrace' local label
x86/ptrace: Document FSBASE and GSBASE ABI oddities
x86/ptrace: Remove set_segment_reg() implementations for current
x86/traps: die() instead of panicking on a double fault
x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit
x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area
x86/doublefault/32: Rename doublefault.c to doublefault_32.c
x86/traps: Disentangle the 32-bit and 64-bit doublefault code
lkdtm: Add a DOUBLE_FAULT crash type on x86
selftests/x86/single_step_syscall: Check SYSENTER directly
x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()
|
|
While writing to MSR IA32_THERM_STATUS/IA32_PKG_THERM_STATUS, avoid
writing 1 to read only and reserved fields because updating some fields
generates exception.
[ bp: Vertically align for better readability. ]
Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Reported-by: Dominik Brodowski <[email protected]>
Tested-by: Dominik Brodowski <[email protected]>
Signed-off-by: Srinivas Pandruvada <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: linux-edac <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
There are three problems with the current layout of the doublefault
stack and TSS. First, the TSS is only cacheline-aligned, which is
not enough -- if the hardware portion of the TSS (struct x86_hw_tss)
crosses a page boundary, horrible things happen [0]. Second, the
stack and TSS are global, so simultaneous double faults on different
CPUs will cause massive corruption. Third, the whole mechanism
won't work if user CR3 is loaded, resulting in a triple fault [1].
Let the doublefault stack and TSS share a page (which prevents the
TSS from spanning a page boundary), make it percpu, and move it into
cpu_entry_area. Teach the stack dump code about the doublefault
stack.
[0] Real hardware will read past the end of the page onto the next
*physical* page if a task switch happens. Virtual machines may
have any number of bugs, and I would consider it reasonable for
a VM to summarily kill the guest if it tries to task-switch to
a page-spanning TSS.
[1] Real hardware triple faults. At least some VMs seem to hang.
I'm not sure what's going on.
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 iopl updates from Ingo Molnar:
"This implements a nice simplification of the iopl and ioperm code that
Thomas Gleixner discovered: we can implement the IO privilege features
of the iopl system call by using the IO permission bitmap in
permissive mode, while trapping CLI/STI/POPF/PUSHF uses in user-space
if they change the interrupt flag.
This implements that feature, with testing facilities and related
cleanups"
[ "Simplification" may be an over-statement. The main goal is to avoid
the cli/sti of iopl by effectively implementing the IO port access
parts of iopl in terms of ioperm.
This may end up not workign well in case people actually depend on
cli/sti being available, or if there are mixed uses of iopl and
ioperm. We will see.. - Linus ]
* 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/ioperm: Fix use of deprecated config option
x86/entry/32: Clarify register saving in __switch_to_asm()
selftests/x86/iopl: Extend test to cover IOPL emulation
x86/ioperm: Extend IOPL config to control ioperm() as well
x86/iopl: Remove legacy IOPL option
x86/iopl: Restrict iopl() permission scope
x86/iopl: Fixup misleading comment
selftests/x86/ioperm: Extend testing so the shared bitmap is exercised
x86/ioperm: Share I/O bitmap if identical
x86/ioperm: Remove bitmap if all permissions dropped
x86/ioperm: Move TSS bitmap update to exit to user work
x86/ioperm: Add bitmap sequence number
x86/ioperm: Move iobitmap data into a struct
x86/tss: Move I/O bitmap data into a seperate struct
x86/io: Speedup schedule out of I/O bitmap user
x86/ioperm: Avoid bitmap allocation if no permissions are set
x86/ioperm: Simplify first ioperm() invocation logic
x86/iopl: Cleanup include maze
x86/tss: Fix and move VMX BUILD_BUG_ON()
x86/cpu: Unify cpu_init()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PTI updates from Ingo Molnar:
"Fix reporting bugs of the MDS and TAA mitigation status, if one or
both are set via a boot option.
No change to mitigation behavior intended"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Fix redundant MDS mitigation message
x86/speculation: Fix incorrect MDS/TAA mitigation status
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- A PAT series from Davidlohr Bueso, which simplifies the memtype
rbtree by using the interval tree helpers. (There's more cleanups
in this area queued up, but they didn't make the merge window.)
- Also flip over CONFIG_X86_5LEVEL to default-y. This might draw in a
few more testers, as all the major distros are going to have
5-level paging enabled by default in their next iterations.
- Misc cleanups"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/pat: Rename pat_rbtree.c to pat_interval.c
x86/mm/pat: Drop the rbt_ prefix from external memtype calls
x86/mm/pat: Do not pass 'rb_root' down the memtype tree helper functions
x86/mm/pat: Convert the PAT tree to a generic interval tree
x86/mm: Clean up the pmd_read_atomic() comments
x86/mm: Fix function name typo in pmd_read_atomic() comment
x86/cpu: Clean up intel_tlb_table[]
x86/mm: Enable 5-level paging support by default
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 hyperv updates from Ingo Molnar:
"Misc updates to the hyperv guest code:
- Rework clockevents initialization to better support hibernation
- Allow guests to enable InvariantTSC
- Micro-optimize send_ipi_one"
* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Initialize clockevents earlier in CPU onlining
x86/hyperv: Allow guests to enable InvariantTSC
x86/hyperv: Micro-optimize send_ipi_one()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu and fpu updates from Ingo Molnar:
- math-emu fixes
- CPUID updates
- sanity-check RDRAND output to see whether the CPU at least pretends
to produce random data
- various unaligned-access across cachelines fixes in preparation of
hardware level split-lock detection
- fix MAXSMP constraints to not allow !CPUMASK_OFFSTACK kernels with
larger than 512 NR_CPUS
- misc FPU related cleanups
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Align the x86_capability array to size of unsigned long
x86/cpu: Align cpu_caps_cleared and cpu_caps_set to unsigned long
x86/umip: Make the comments vendor-agnostic
x86/Kconfig: Rename UMIP config parameter
x86/Kconfig: Enforce limit of 512 CPUs with MAXSMP and no CPUMASK_OFFSTACK
x86/cpufeatures: Add feature bit RDPRU on AMD
x86/math-emu: Limit MATH_EMULATION to 486SX compatibles
x86/math-emu: Check __copy_from_user() result
x86/rdrand: Sanity-check RDRAND output
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Use XFEATURE_FP/SSE enum values instead of hardcoded numbers
x86/fpu: Shrink space allocated for xstate_comp_offsets
x86/fpu: Update stale variable name in comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
- Fully reworked thermal throttling notifications, there should be no
more spamming of dmesg (Srinivas Pandruvada and Benjamin Berg)
- More enablement for the Intel-compatible CPUs Zhaoxin (Tony W
Wang-oc)
- PPIN support for Icelake (Tony Luck)
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/therm_throt: Optimize notifications of thermal throttle
x86/mce: Add Xeon Icelake to list of CPUs that support PPIN
x86/mce: Lower throttling MCE messages' priority to warning
x86/mce: Add Zhaoxin LMCE support
x86/mce: Add Zhaoxin CMCI support
x86/mce: Add Zhaoxin MCE support
x86/mce/amd: Make disable_err_thresholding() static
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Borislav Petkov:
"This converts the late loading method to load the microcode in
parallel (vs sequentially currently). The patch remained in linux-next
for the maximum amount of time so that any potential and hard to debug
fallout be minimized.
Now cloud folks have their milliseconds back but all the normal people
should use early loading anyway :-)"
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/intel: Issue the revision updated message only on the BSP
x86/microcode: Update late microcode in parallel
x86/microcode/amd: Fix two -Wunused-but-set-variable warnings
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/kcsan
Pull the KCSAN subsystem from Paul E. McKenney:
"This pull request contains base kernel concurrency sanitizer
(KCSAN) enablement for x86, courtesy of Marco Elver. KCSAN is a
sampling watchpoint-based data-race detector, and is documented in
Documentation/dev-tools/kcsan.rst. KCSAN was announced in September,
and much feedback has since been incorporated:
http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com
The data races located thus far have resulted in a number of fixes:
https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan
Additional information may be found here:
https://lore.kernel.org/lkml/[email protected]/
"
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two fixes: disable unreliable HPET on Intel Coffe Lake platforms, and
fix a lockdep splat in the resctrl code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix potential lockdep warning
x86/quirks: Disable HPET on Intel Coffe Lake platforms
|
|
This patch enables KCSAN for x86, with updates to build rules to not use
KCSAN for several incompatible compilation units.
Signed-off-by: Marco Elver <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
Since MDS and TAA mitigations are inter-related for processors that are
affected by both vulnerabilities, the followiing confusing messages can
be printed in the kernel log:
MDS: Vulnerable
MDS: Mitigation: Clear CPU buffers
To avoid the first incorrect message, defer the printing of MDS
mitigation after the TAA mitigation selection has been done. However,
that has the side effect of printing TAA mitigation first before MDS
mitigation.
[ bp: Check box is affected/mitigations are disabled first before
printing and massage. ]
Suggested-by: Pawan Gupta <[email protected]>
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Mark Gross <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
For MDS vulnerable processors with TSX support, enabling either MDS or
TAA mitigations will enable the use of VERW to flush internal processor
buffers at the right code path. IOW, they are either both mitigated
or both not. However, if the command line options are inconsistent,
the vulnerabilites sysfs files may not report the mitigation status
correctly.
For example, with only the "mds=off" option:
vulnerabilities/mds:Vulnerable; SMT vulnerable
vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable
The mds vulnerabilities file has wrong status in this case. Similarly,
the taa vulnerability file will be wrong with mds mitigation on, but
taa off.
Change taa_select_mitigation() to sync up the two mitigation status
and have them turned off if both "mds=off" and "tsx_async_abort=off"
are present.
Update documentation to emphasize the fact that both "mds=off" and
"tsx_async_abort=off" have to be specified together for processors that
are affected by both TAA and MDS to be effective.
[ bp: Massage and add kernel-parameters.txt change too. ]
Fixes: 1b42f017415b ("x86/speculation/taa: Add mitigation for TSX Async Abort")
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: [email protected]
Cc: Mark Gross <[email protected]>
Cc: <[email protected]>
Cc: Pawan Gupta <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
If iopl() is disabled, then providing ioperm() does not make much sense.
Rename the config option and disable/enable both syscalls with it. Guard
the code with #ifdefs where appropriate.
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The access to the full I/O port range can be also provided by the TSS I/O
bitmap, but that would require to copy 8k of data on scheduling in the
task. As shown with the sched out optimization TSS.io_bitmap_base can be
used to switch the incoming task to a preallocated I/O bitmap which has all
bits zero, i.e. allows access to all I/O ports.
Implementing this allows to provide an iopl() emulation mode which restricts
the IOPL level 3 permissions to I/O port access but removes the STI/CLI
permission which is coming with the hardware IOPL mechansim.
Provide a config option to switch IOPL to emulation mode, make it the
default and while at it also provide an option to disable IOPL completely.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
|
|
Add a globally unique sequence number which is incremented when ioperm() is
changing the I/O bitmap of a task. Store the new sequence number in the
io_bitmap structure and compare it with the sequence number of the I/O
bitmap which was last loaded on a CPU. Only update the bitmap if the
sequence is different.
That should further reduce the overhead of I/O bitmap scheduling when there
are only a few I/O bitmap users on the system.
The 64bit sequence counter is sufficient. A wraparound of the sequence
counter assuming an ioperm() call every nanosecond would require about 584
years of uptime.
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Move the non hardware portion of I/O bitmap data into a seperate struct for
readability sake.
Originally-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
There is no requirement to update the TSS I/O bitmap when a thread using it is
scheduled out and the incoming thread does not use it.
For the permission check based on the TSS I/O bitmap the CPU calculates the memory
location of the I/O bitmap by the address of the TSS and the io_bitmap_base member
of the tss_struct. The easiest way to invalidate the I/O bitmap is to switch the
offset to an address outside of the TSS limit.
If an I/O instruction is issued from user space the TSS limit causes #GP to be
raised in the same was as valid I/O bitmap with all bits set to 1 would do.
This removes the extra work when an I/O bitmap using task is scheduled out
and puts the burden on the rare I/O bitmap users when they are scheduled
in.
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Similar to copy_thread_tls() the 32bit and 64bit implementations of
cpu_init() are very similar and unification avoids duplicate changes in the
future.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
|