Age | Commit message (Collapse) | Author | Files | Lines |
|
The rftype flags are bitmaps used for adding files under the resctrl
filesystem. Some of these bitmap defines have one extra level of
indirection which is not necessary.
Drop the RF_* defines and simplify the macros.
[ bp: Massage commit message. ]
Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Peter Newman <[email protected]>
Reviewed-by: Tan Shaopeng <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Ilpo Järvinen <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Tan Shaopeng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The resctrl task assignment for monitor or control group needs to be
done one at a time. For example:
$mount -t resctrl resctrl /sys/fs/resctrl/
$mkdir /sys/fs/resctrl/ctrl_grp1
$echo 123 > /sys/fs/resctrl/ctrl_grp1/tasks
$echo 456 > /sys/fs/resctrl/ctrl_grp1/tasks
$echo 789 > /sys/fs/resctrl/ctrl_grp1/tasks
This is not user-friendly when dealing with hundreds of tasks.
Support multiple task assignment in one command with tasks ids separated
by commas. For example:
$echo 123,456,789 > /sys/fs/resctrl/ctrl_grp1/tasks
Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Peter Newman <[email protected]>
Reviewed-by: Tan Shaopeng <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Ilpo Järvinen <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Tan Shaopeng <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Move Intel-specific checks into a helper function.
Explicitly use "bool" for return type.
No functional change intended.
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Currently, all valid MCA_ADDR values are assumed to be usable on AMD
systems. However, this is not correct in most cases. Notifiers expecting
usable addresses may then operate on inappropriate values.
Define a helper function to do AMD-specific checks for a usable memory
address. List out all known cases.
[ bp: Tone down the capitalized words. ]
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Define helper functions for legacy and SMCA systems in order to reuse
individual checks in later changes.
Describe what each function is checking for, and correct the XEC bitmask
for SMCA.
No functional change intended.
[ bp: Use "else in amd_mce_is_memory_error() to make the conditional
balanced, for readability. ]
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shuai Xue <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add the interface in resctrl FS to show if sparse cache allocation
bit masks are supported on the platform. Reading the file returns
either a "1" if non-contiguous 1s are supported and "0" otherwise.
The file path is /sys/fs/resctrl/info/{resource}/sparse_masks, where
{resource} can be either "L2" or "L3".
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Maciej Wieczor-Retman <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Ilpo Järvinen <[email protected]>
Reviewed-by: Peter Newman <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Peter Newman <[email protected]>
Link: https://lore.kernel.org/r/7300535160beba41fd8aa073749ec1ee29b4621f.1696934091.git.maciej.wieczor-retman@intel.com
|
|
The setting for non-contiguous 1s support in Intel CAT is
hardcoded to false. On these systems, writing non-contiguous
1s into the schemata file will fail before resctrl passes
the value to the hardware.
In Intel CAT CPUID.0x10.1:ECX[3] and CPUID.0x10.2:ECX[3] stopped
being reserved and now carry information about non-contiguous 1s
value support for L3 and L2 cache respectively. The CAT
capacity bitmask (CBM) supports a non-contiguous 1s value if
the bit is set.
The exception are Haswell systems where non-contiguous 1s value
support needs to stay disabled since they can't make use of CPUID
for Cache allocation.
Originally-by: Fenghua Yu <[email protected]>
Signed-off-by: Maciej Wieczor-Retman <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Ilpo Järvinen <[email protected]>
Reviewed-by: Peter Newman <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Peter Newman <[email protected]>
Link: https://lore.kernel.org/r/1849b487256fe4de40b30f88450cba3d9abc9171.1696934091.git.maciej.wieczor-retman@intel.com
|
|
Rename arch_has_sparse_bitmaps to arch_has_sparse_bitmasks to ensure
consistent terminology throughout resctrl.
Suggested-by: Reinette Chatre <[email protected]>
Signed-off-by: Maciej Wieczor-Retman <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Ilpo Järvinen <[email protected]>
Reviewed-by: Peter Newman <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Peter Newman <[email protected]>
Link: https://lore.kernel.org/r/e330fcdae873ef1a831e707025a4b70fa346666e.1696934091.git.maciej.wieczor-retman@intel.com
|
|
Fix erratum #1485 on Zen4 parts where running with STIBP disabled can
cause an #UD exception. The performance impact of the fix is negligible.
Reported-by: René Rebe <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: René Rebe <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The kernel test robot reported kernel-doc warnings here:
arch/x86/kernel/cpu/resctrl/rdtgroup.c:915: warning: Function parameter or member 'of' not described in 'rdt_bit_usage_show'
arch/x86/kernel/cpu/resctrl/rdtgroup.c:915: warning: Function parameter or member 'seq' not described in 'rdt_bit_usage_show'
arch/x86/kernel/cpu/resctrl/rdtgroup.c:915: warning: Function parameter or member 'v' not described in 'rdt_bit_usage_show'
arch/x86/kernel/cpu/resctrl/rdtgroup.c:1144: warning: Function parameter or member 'type' not described in '__rdtgroup_cbm_overlaps'
arch/x86/kernel/cpu/resctrl/rdtgroup.c:1224: warning: Function parameter or member 'rdtgrp' not described in 'rdtgroup_mode_test_exclusive'
arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'of' not described in 'rdtgroup_mode_write'
arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'buf' not described in 'rdtgroup_mode_write'
arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'nbytes' not described in 'rdtgroup_mode_write'
arch/x86/kernel/cpu/resctrl/rdtgroup.c:1261: warning: Function parameter or member 'off' not described in 'rdtgroup_mode_write'
arch/x86/kernel/cpu/resctrl/rdtgroup.c:1370: warning: Function parameter or member 'of' not described in 'rdtgroup_size_show'
arch/x86/kernel/cpu/resctrl/rdtgroup.c:1370: warning: Function parameter or member 's' not described in 'rdtgroup_size_show'
arch/x86/kernel/cpu/resctrl/rdtgroup.c:1370: warning: Function parameter or member 'v' not described in 'rdtgroup_size_show'
The first two functions are missing an argument description while the
other three are file callbacks and don't require a kernel-doc comment.
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Maciej Wieczor-Retman <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Peter Newman <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/)
Remove sentinel element from sld_sysctl and itmt_kern_table. This
removal is safe because register_sysctl_init and register_sysctl
implicitly use the array size in addition to checking for the sentinel.
Reviewed-by: Ingo Molnar <[email protected]>
Acked-by: Dave Hansen <[email protected]> # for x86
Signed-off-by: Joel Granados <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
Provide debug files which dump the topology related information of
cpuinfo_x86. This is useful to validate the upcoming conversion of the
topology evaluation for correctness or bug compatibility.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.
Make it all consistently use u32 because that reflects the hardware
register width and fixup a few related usage sites for consistency sake.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Reviewed-by: Arjan van de Ven <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The topology IDs which identify the LLC and L2 domains clearly belong to
the per CPU topology information.
Move them into cpuinfo_x86::cpuinfo_topo and get rid of the extra per CPU
data and the related exports.
This also paves the way to do proper topology evaluation during early boot
because it removes the only per CPU dependency for that.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Reviewed-by: Arjan van de Ven <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Yet another topology related data pair. Rename logical_proc_id to
logical_pkg_id so it fits the common naming conventions.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
cpuinfo_x86::x86_coreid_bits is only used by the AMD numa topology code. No
point in evaluating it on non AMD systems.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Reviewed-by: Arjan van de Ven <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Rename it to core_id and stick it to the other ID fields.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Move the next member.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Rename it to pkg_id which is the terminology used in the kernel.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The topology related information is randomly scattered across cpuinfo_x86.
Create a new structure cpuinfo_topo and move in a first step initial_apicid
and apicid into it.
Aside of being better readable this is in preparation for replacing the
horribly fragile CPU topology evaluation code further down the road.
Consolidate APIC ID fields to u32 as that represents the hardware type.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Tested-by: Sohil Mehta <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Hygon processors with a model ID > 3 have CPUID leaf 0xB correctly
populated and don't need the fixed package ID shift workaround. The fixup
is also incorrect when running in a guest.
Fixes: e0ceeae708ce ("x86/CPU/hygon: Fix phys_proc_id calculation logic for multi-die processors")
Signed-off-by: Pu Wen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
The kernel test robot reported kernel-doc warnings here:
monitor.c:34: warning: Cannot understand * @rmid_free_lru A least recently used list of free RMIDs on line 34 - I thought it was a doc line
monitor.c:41: warning: Cannot understand * @rmid_limbo_count count of currently unused but (potentially) on line 41 - I thought it was a doc line
monitor.c:50: warning: Cannot understand * @rmid_entry - The entry in the limbo and free lists. on line 50 - I thought it was a doc line
We don't have a syntax for documenting individual data items via
kernel-doc, so remove the "/**" kernel-doc markers and add a hyphen
for consistency.
Fixes: 6a445edce657 ("x86/intel_rdt/cqm: Add RDT monitoring initialization")
Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
c->x86_cache_alignment is initialized from c->x86_clflush_size.
However, commit fbf6449f84bf moved c->x86_clflush_size initialization
to later in boot without moving the c->x86_cache_alignment assignment:
fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
This presumably left c->x86_cache_alignment set to zero for longer
than it should be.
The result was an oops on 32-bit kernels while accessing a pointer
at 0x20. The 0x20 came from accessing a structure member at offset
0x10 (buffer->cpumask) from a ZERO_SIZE_PTR=0x10. kmalloc() can
evidently return ZERO_SIZE_PTR when it's given 0 as its alignment
requirement.
Move the c->x86_cache_alignment initialization to be after
c->x86_clflush_size has an actual value.
Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This break is after the return statement, so it is redundant & confusing,
and should be deleted.
Signed-off-by: Baolin Liu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC page for an
enclave and set secs.epc_page to NULL. The SECS page is used for EAUG
and ELDU in the SGX page fault handler. However, the NULL check for
secs.epc_page is only done for ELDU, not EAUG before being used.
Fix this by doing the same NULL check and reloading of the SECS page as
needed for both EAUG and ELDU.
The SECS page holds global enclave metadata. It can only be reclaimed
when there are no other enclave pages remaining. At that point,
virtually nothing can be done with the enclave until the SECS page is
paged back in.
An enclave can not run nor generate page faults without a resident SECS
page. But it is still possible for a #PF for a non-SECS page to race
with paging out the SECS page: when the last resident non-SECS page A
triggers a #PF in a non-resident page B, and then page A and the SECS
both are paged out before the #PF on B is handled.
Hitting this bug requires that race triggered with a #PF for EAUG.
Following is a trace when it happens.
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:sgx_encl_eaug_page+0xc7/0x210
Call Trace:
? __kmem_cache_alloc_node+0x16a/0x440
? xa_load+0x6e/0xa0
sgx_vma_fault+0x119/0x230
__do_fault+0x36/0x140
do_fault+0x12f/0x400
__handle_mm_fault+0x728/0x1110
handle_mm_fault+0x105/0x310
do_user_addr_fault+0x1ee/0x750
? __this_cpu_preempt_check+0x13/0x20
exc_page_fault+0x76/0x180
asm_exc_page_fault+0x27/0x30
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Signed-off-by: Haitao Huang <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
Acked-by: Reinette Chatre <[email protected]>
Cc:[email protected]
Link: https://lore.kernel.org/all/20230728051024.33063-1-haitao.huang%40linux.intel.com
|
|
a two-phase approach
Instead of setting x86_virt_bits to a possibly-correct value and then
correcting it later, do all the necessary checks before setting it.
At this point, the #VC handler references boot_cpu_data.x86_virt_bits,
and in the previous version, it would be triggered by the CPUIDs between
the point at which it is set to 48 and when it is set to the correct
value.
Suggested-by: Dave Hansen <[email protected]>
Signed-off-by: Adam Dunlap <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Jacob Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add mitigation for the speculative return stack overflow vulnerability
which exists on Hygon processors too.
Signed-off-by: Pu Wen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Borislav Petkov (AMD) <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When SVM is disabled by BIOS, one cannot use KVM but the
SVM feature is still shown in the output of /proc/cpuinfo.
On Intel machines, VMX is cleared by init_ia32_feat_ctl(),
so do the same on AMD and Hygon processors.
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
If the user has requested no SRSO mitigation, other mitigations can use
the lighter-weight SBPB instead of IBPB.
Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/b20820c3cfd1003171135ec8d762a0b957348497.1693889988.git.jpoimboe@kernel.org
|
|
To support live migration, the hypervisor sets the "lowest common
denominator" of features. Probing the microcode isn't allowed because
any detected features might go away after a migration.
As Andy Cooper states:
"Linux must not probe microcode when virtualised. What it may see
instantaneously on boot (owing to MSR_PRED_CMD being fully passed
through) is not accurate for the lifetime of the VM."
Rely on the hypervisor to set the needed IBPB_BRTYPE and SBPB bits.
Fixes: 1b5277c0ea0b ("x86/srso: Add SRSO_NO support")
Suggested-by: Andrew Cooper <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Andrew Cooper <[email protected]>
Acked-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/3938a7209606c045a3f50305d201d840e8c834c7.1693889988.git.jpoimboe@kernel.org
|
|
Booting with mitigations=off incorrectly prevents the
X86_FEATURE_{IBPB_BRTYPE,SBPB} CPUID bits from getting set.
Also, future CPUs without X86_BUG_SRSO might still have IBPB with branch
type prediction flushing, in which case SBPB should be used instead of
IBPB. The current code doesn't allow for that.
Also, cpu_has_ibpb_brtype_microcode() has some surprising side effects
and the setting of these feature bits really doesn't belong in the
mitigation code anyway. Move it to earlier.
Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/869a1709abfe13b673bdd10c2f4332ca253a40bc.1693889988.git.jpoimboe@kernel.org
|
|
Reading the 'spec_rstack_overflow' sysfs file can trigger an unnecessary
MSR write, and possibly even a (handled) exception if the microcode
hasn't been updated.
Avoid all that by just checking X86_FEATURE_IBPB_BRTYPE instead, which
gets set by srso_select_mitigation() if the updated microcode exists.
Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Acked-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/27d128899cb8aee9eb2b57ddc996742b0c1d776b.1693889988.git.jpoimboe@kernel.org
|
|
Another major aspect of supporting running of 32bit processes is the
ability to access 32bit syscalls. Such syscalls can be invoked by
using the legacy int 0x80 handler and sysenter/syscall instructions.
If IA32 emulation is disabled ensure that each of those 3 distinct
mechanisms are also disabled. For int 0x80 a #GP exception would be
generated since the respective descriptor is not going to be loaded at
all. Invoking sysenter will also result in a #GP since IA32_SYSENTER_CS
contains an invalid segment. Finally, syscall instruction cannot really
be disabled so it's configured to execute a minimal handler.
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The SYSCALL instruction cannot really be disabled in compatibility mode.
The best that can be done is to configure the CSTAR msr to point to a
minimal handler. Currently this handler has a rather misleading name -
ignore_sysret() as it's not really doing anything with sysret.
Give it a more descriptive name.
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix preemption delays in the SGX code, remove unnecessarily
UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and
make the x86 SMP init code a bit more conservative to fix kexec()
lockups"
* tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()
x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI
x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld
x86/smp: Don't send INIT to non-present and non-booted CPUs
|
|
On large enclaves we hit the softlockup warning with following call trace:
xa_erase()
sgx_vepc_release()
__fput()
task_work_run()
do_exit()
The latency issue is similar to the one fixed in:
8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves")
The test system has 64GB of enclave memory, and all is assigned to a single VM.
Release of 'vepc' takes a longer time and causes long latencies, which triggers
the softlockup warning.
Add cond_resched() to give other tasks a chance to run and reduce
latencies, which also avoids the softlockup detector.
[ mingo: Rewrote the changelog. ]
Fixes: 540745ddbc70 ("x86/sgx: Introduce virtual EPC for use by KVM guests")
Reported-by: Yu Zhang <[email protected]>
Signed-off-by: Jack Wang <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Yu Zhang <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
Acked-by: Haitao Huang <[email protected]>
Cc: [email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Support for SEV-SNP guests on Hyper-V (Tianyu Lan)
- Support for TDX guests on Hyper-V (Dexuan Cui)
- Use SBRM API in Hyper-V balloon driver (Mitchell Levy)
- Avoid dereferencing ACPI root object handle in VMBus driver (Maciej
Szmigiero)
- A few misecllaneous fixes (Jiapeng Chong, Nathan Chancellor, Saurabh
Sengar)
* tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits)
x86/hyperv: Remove duplicate include
x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef's
x86/hyperv: Remove hv_isolation_type_en_snp
x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor
Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor
x86/hyperv: Introduce a global variable hyperv_paravisor_present
Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests
Drivers: hv: vmbus: Support fully enlightened TDX guests
x86/hyperv: Support hypercalls for fully enlightened TDX guests
x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
x86/hyperv: Fix undefined reference to isolation_type_en_snp without CONFIG_HYPERV
x86/hyperv: Add missing 'inline' to hv_snp_boot_ap() stub
hv: hyperv.h: Replace one-element array with flexible-array member
Drivers: hv: vmbus: Don't dereference ACPI root object handle
x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES
x86/hyperv: Add smp support for SEV-SNP guest
clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest
x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest
drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP enlightened guest
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Dave Hansen:
"The most important fix here adds a missing CPU model to the recent
Gather Data Sampling (GDS) mitigation list to ensure that mitigations
are available on that CPU.
There are also a pair of warning fixes, and closure of a covert
channel that pops up when protection keys are disabled.
Summary:
- Mark all Skylake CPUs as vulnerable to GDS
- Fix PKRU covert channel
- Fix -Wmissing-variable-declarations warning for ia32_xyz_class
- Fix kernel-doc annotation warning"
* tag 'x86-urgent-2023-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu/xstate: Fix PKRU covert channel
x86/irq/i8259: Fix kernel-doc annotation warning
x86/speculation: Mark all Skylake CPUs as vulnerable to GDS
x86/audit: Fix -Wmissing-variable-declarations warning for ia32_xyz_class
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is a small set of driver core updates and additions for 6.6-rc1.
Included in here are:
- stable kernel documentation updates
- class structure const work from Ivan on various subsystems
- kernfs tweaks
- driver core tests!
- kobject sanity cleanups
- kobject structure reordering to save space
- driver core error code handling fixups
- other minor driver core cleanups
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
driver core: Call in reversed order in device_platform_notify_remove()
driver core: Return proper error code when dev_set_name() fails
kobject: Remove redundant checks for whether ktype is NULL
kobject: Add sanity check for kset->kobj.ktype in kset_register()
drivers: base: test: Add missing MODULE_* macros to root device tests
drivers: base: test: Add missing MODULE_* macros for platform devices tests
drivers: base: Free devm resources when unregistering a device
drivers: base: Add basic devm tests for platform devices
drivers: base: Add basic devm tests for root devices
kernfs: fix missing kernfs_iattr_rwsem locking
docs: stable-kernel-rules: mention that regressions must be prevented
docs: stable-kernel-rules: fine-tune various details
docs: stable-kernel-rules: make the examples for option 1 a proper list
docs: stable-kernel-rules: move text around to improve flow
docs: stable-kernel-rules: improve structure by changing headlines
base/node: Remove duplicated include
kernfs: attach uuid for every kernfs and report it in fsid
kernfs: add stub helper for kernfs_generic_poll()
x86/resctrl: make pseudo_lock_class a static const structure
x86/MSR: make msr_class a static const structure
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 shadow stack support from Dave Hansen:
"This is the long awaited x86 shadow stack support, part of Intel's
Control-flow Enforcement Technology (CET).
CET consists of two related security features: shadow stacks and
indirect branch tracking. This series implements just the shadow stack
part of this feature, and just for userspace.
The main use case for shadow stack is providing protection against
return oriented programming attacks. It works by maintaining a
secondary (shadow) stack using a special memory type that has
protections against modification. When executing a CALL instruction,
the processor pushes the return address to both the normal stack and
to the special permission shadow stack. Upon RET, the processor pops
the shadow stack copy and compares it to the normal stack copy.
For more information, refer to the links below for the earlier
versions of this patch set"
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/[email protected]/
* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
x86/shstk: Change order of __user in type
x86/ibt: Convert IBT selftest to asm
x86/shstk: Don't retry vm_munmap() on -EINTR
x86/kbuild: Fix Documentation/ reference
x86/shstk: Move arch detail comment out of core mm
x86/shstk: Add ARCH_SHSTK_STATUS
x86/shstk: Add ARCH_SHSTK_UNLOCK
x86: Add PTRACE interface for shadow stack
selftests/x86: Add shadow stack test
x86/cpufeatures: Enable CET CR4 bit for shadow stack
x86/shstk: Wire in shadow stack interface
x86: Expose thread features in /proc/$PID/status
x86/shstk: Support WRSS for userspace
x86/shstk: Introduce map_shadow_stack syscall
x86/shstk: Check that signal frame is shadow stack mem
x86/shstk: Check that SSP is aligned on sigreturn
x86/shstk: Handle signals for shadow stack
x86/shstk: Introduce routines modifying shstk
x86/shstk: Handle thread shadow stack
x86/shstk: Add user-mode shadow stack support
...
|
|
The Gather Data Sampling (GDS) vulnerability is common to all Skylake
processors. However, the "client" Skylakes* are now in this list:
https://www.intel.com/content/www/us/en/support/articles/000022396/processors.html
which means they are no longer included for new vulnerabilities here:
https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
or in other GDS documentation. Thus, they were not included in the
original GDS mitigation patches.
Mark SKYLAKE and SKYLAKE_L as vulnerable to GDS to match all the
other Skylake CPUs (which include Kaby Lake). Also group the CPUs
so that the ones that share the exact same vulnerabilities are next
to each other.
Last, move SRBDS to the end of each line. This makes it clear at a
glance that SKYLAKE_X is unique. Of the five Skylakes, it is the
only "server" CPU and has a different implementation from the
clients of the "special register" hardware, making it immune to SRBDS.
This makes the diff much harder to read, but the resulting table is
worth it.
I very much appreciate the report from Michael Zhivich about this
issue. Despite what level of support a hardware vendor is providing,
the kernel very much needs an accurate and up-to-date list of
vulnerable CPUs. More reports like this are very welcome.
* Client Skylakes are CPUID 406E3/506E3 which is family 6, models
0x4E and 0x5E, aka INTEL_FAM6_SKYLAKE and INTEL_FAM6_SKYLAKE_L.
Reported-by: Michael Zhivich <[email protected]>
Fixes: 8974eb588283 ("x86/speculation: Add Gather Data Sampling mitigation")
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Daniel Sneddon <[email protected]>
Cc: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Dave Hansen:
"This includes a very thorough rework of the 'struct apic' handlers.
Quite a variety of them popped up over the years, especially in the
32-bit days when odd apics were much more in vogue.
The end result speaks for itself, which is a removal of a ton of code
and static calls to replace indirect calls.
If there's any breakage here, it's likely to be around the 32-bit
museum pieces that get light to no testing these days.
Summary:
- Rework apic callbacks, getting rid of unnecessary ones and
coalescing lots of silly duplicates.
- Use static_calls() instead of indirect calls for apic->foo()
- Tons of cleanups an crap removal along the way"
* tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
x86/apic: Turn on static calls
x86/apic: Provide static call infrastructure for APIC callbacks
x86/apic: Wrap IPI calls into helper functions
x86/apic: Mark all hotpath APIC callback wrappers __always_inline
x86/xen/apic: Mark apic __ro_after_init
x86/apic: Convert other overrides to apic_update_callback()
x86/apic: Replace acpi_wake_cpu_handler_update() and apic_set_eoi_cb()
x86/apic: Provide apic_update_callback()
x86/xen/apic: Use standard apic driver mechanism for Xen PV
x86/apic: Provide common init infrastructure
x86/apic: Wrap apic->native_eoi() into a helper
x86/apic: Nuke ack_APIC_irq()
x86/apic: Remove pointless arguments from [native_]eoi_write()
x86/apic/noop: Tidy up the code
x86/apic: Remove pointless NULL initializations
x86/apic: Sanitize APIC ID range validation
x86/apic: Prepare x2APIC for using apic::max_apic_id
x86/apic: Simplify X2APIC ID validation
x86/apic: Add max_apic_id member
x86/apic: Wrap APIC ID validation into an inline
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event updates from Ingo Molnar:
- AMD IBS improvements
- Intel PMU driver updates
- Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events
- Micro-optimize software events and the ring-buffer code
- Misc cleanups & fixes
* tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
perf/x86/intel: Add Crestmont PMU
x86/cpu: Update Hybrids
x86/cpu: Fix Crestmont uarch
x86/cpu: Fix Gracemont uarch
perf: Remove unused extern declaration arch_perf_get_page_size()
perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability
perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
perf/mem: Introduce PERF_MEM_LVLNUM_UNC
perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin
locking/arch: Avoid variable shadowing in local_try_cmpxchg()
perf/core: Use local64_try_cmpxchg in perf_swevent_set_period
perf/x86: Use local64_try_cmpxchg
perf/amd: Prevent grouping of IBS events
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Borislav Petkov:
"The first, cleanup part of the microcode loader reorg tglx has been
working on. The other part wasn't fully ready in time so it will
follow on later.
This part makes the loader core code as it is practically enabled on
pretty much every baremetal machine so there's no need to have the
Kconfig items.
In addition, there are cleanups which prepare for future feature
enablement"
* tag 'x86_microcode_for_v6.6_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Remove remaining references to CONFIG_MICROCODE_AMD
x86/microcode/intel: Remove pointless mutex
x86/microcode/intel: Remove debug code
x86/microcode: Move core specific defines to local header
x86/microcode/intel: Rename get_datasize() since its used externally
x86/microcode: Make reload_early_microcode() static
x86/microcode: Include vendor headers into microcode.h
x86/microcode/intel: Move microcode functions out of cpu/intel.c
x86/microcode: Hide the config knob
x86/mm: Remove unused microcode.h include
x86/microcode: Remove microcode_mutex
x86/microcode/AMD: Rip out static buffers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- Add a quirk for AMD Zen machines where Instruction Fetch unit poison
consumption MCEs are not delivered synchronously but still within the
same context, which can lead to erroneously increased error severity
and unneeded kernel panics
- Do not log errors caught by polling shared MCA banks as they
materialize as duplicated error records otherwise
* tag 'ras_core_for_v6.6_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE: Always save CS register on AMD Zen IF Poison errors
x86/mce: Prevent duplicate error records
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug updates from Thomas Gleixner:
"Updates for the CPU hotplug core:
- Support partial SMT enablement.
So far the sysfs SMT control only allows to toggle between SMT on
and off. That's sufficient for x86 which usually has at max two
threads except for the Xeon PHI platform which has four threads per
core
Though PowerPC has up to 16 threads per core and so far it's only
possible to control the number of enabled threads per core via a
command line option. There is some way to control this at runtime,
but that lacks enforcement and the usability is awkward
This update expands the sysfs interface and the core infrastructure
to accept numerical values so PowerPC can build SMT runtime control
for partial SMT enablement on top
The core support has also been provided to the PowerPC maintainers
who added the PowerPC related changes on top
- Minor cleanups and documentation updates"
* tag 'smp-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation: core-api/cpuhotplug: Fix state names
cpu/hotplug: Remove unused function declaration cpu_set_state_online()
cpu/SMT: Fix cpu_smt_possible() comment
cpu/SMT: Allow enabling partial SMT states via sysfs
cpu/SMT: Create topology_smt_thread_allowed()
cpu/SMT: Remove topology_smt_supported()
cpu/SMT: Store the current/max number of threads
cpu/SMT: Move smt/control simple exit cases earlier
cpu/SMT: Move SMT prototypes into cpu_smt.h
cpu/hotplug: Remove dependancy against cpu_primary_thread_mask
|
|
Commit e6bcfdd75d53 ("x86/microcode: Hide the config knob") removed the
MICROCODE_AMD config, but left some references in defconfigs and comments,
that have no effect on any kernel build around.
Clean up those remaining config references. No functional change.
Signed-off-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In ms_hyperv_init_platform(), do not distinguish between a SNP VM with
the paravisor and a SNP VM without the paravisor.
Replace hv_isolation_type_en_snp() with
!ms_hyperv.paravisor_present && hv_isolation_type_snp().
The hv_isolation_type_en_snp() in drivers/hv/hv.c and
drivers/hv/hv_common.c can be changed to hv_isolation_type_snp() since
we know !ms_hyperv.paravisor_present is true there.
Signed-off-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Reviewed-by: Tianyu Lan <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|