Age | Commit message (Collapse) | Author | Files | Lines |
|
The per-resource MSR update functions cat_wrmsr(), mba_wrmsr_intel(),
and mba_wrmsr_amd() all take three arguments:
(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
struct msr_param contains pointers to both struct rdt_resource and struct
rdt_domain, thus only struct msr_param is necessary.
Pass struct msr_param as a single parameter. Clean up formatting and
fix some fir tree declaration ordering.
No functional change.
Suggested-by: Reinette Chatre <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Maciej Wieczor-Retman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
to call rdt_ctrl_update() on potentially one CPU from each domain.
But this means rdt_ctrl_update() needs to figure out which domain to
apply changes to. Doing so requires a search of all domains in a resource,
which can only be done safely if cpus_lock is held. Both callers do hold
this lock, but there isn't a way for a function called on another CPU
via IPI to verify this.
Commit
c0d848fcb09d ("x86/resctrl: Remove lockdep annotation that triggers
false positive")
removed the incorrect assertions.
Add the target domain to the msr_param structure and call
rdt_ctrl_update() for each domain separately using
smp_call_function_single(). This means that rdt_ctrl_update() doesn't
need to search for the domain and get_domain_from_cpu() can safely
assert that the cpus_lock is held since the remaining callers do not use
IPI.
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: James Morse <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Maciej Wieczor-Retman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()")
introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. Kexec tool uses
this flag to indicate to the kernel that it is safe to modify the
elfcorehdr of the kdump image loaded using the kexec_load system call.
However, it is possible that architectures may need to update kexec
segments other then elfcorehdr. For example, FDT (Flatten Device Tree)
on PowerPC. Introducing a new kexec flag for every new kexec segment
may not be a good solution. Hence, a generic kexec flag bit,
`KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory
hotplug support intent between the kexec tool and the kernel for the
kexec_load system call.
Now we have two kexec flags that enables crash hotplug support for
kexec_load system call. First is KEXEC_UPDATE_ELFCOREHDR (only used in
x86), and second is KEXEC_CRASH_HOTPLUG_SUPPORT (for all architectures).
To simplify the process of finding and reporting the crash hotplug
support the following changes are introduced.
1. Define arch specific function to process the kexec flags and
determine crash hotplug support
2. Rename the @update_elfcorehdr member of struct kimage to
@hotplug_support and populate it for both kexec_load and
kexec_file_load syscalls, because architecture can update more than
one kexec segment
3. Let generic function crash_check_hotplug_support report hotplug
support for loaded kdump image based on value of @hotplug_support
To bring the x86 crash hotplug support in line with the above points,
the following changes have been made:
- Introduce the arch_crash_hotplug_support function to process kexec
flags and determine crash hotplug support
- Remove the arch_crash_hotplug_[cpu|memory]_support functions
Signed-off-by: Sourabh Jain <[email protected]>
Acked-by: Baoquan He <[email protected]>
Acked-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
In the event of memory hotplug or online/offline events, the crash
memory hotplug notifier `crash_memhp_notifier()` receives a
`memory_notify` object but doesn't forward that object to the
generic and architecture-specific crash hotplug handler.
The `memory_notify` object contains the starting PFN (Page Frame Number)
and the number of pages in the hot-removed memory. This information is
necessary for architectures like PowerPC to update/recreate the kdump
image, specifically `elfcorehdr`.
So update the function signature of `crash_handle_hotplug_event()` and
`arch_crash_handle_hotplug_event()` to accept the `memory_notify` object
as an argument from crash memory hotplug notifier.
Since no such object is available in the case of CPU hotplug event, the
crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the
crash hotplug handler.
Signed-off-by: Sourabh Jain <[email protected]>
Acked-by: Baoquan He <[email protected]>
Acked-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
The MWAITX and MONITORX instructions generate the same #VC error code as
the MWAIT and MONITOR instructions, respectively. Update the #VC handler
opcode checking to also support the MWAITX and MONITORX opcodes.
Fixes: e3ef461af35a ("x86/sev: Harden #VC instruction emulation somewhat")
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/453d5a7cfb4b9fe818b6fb67f93ae25468bc9e23.1713793161.git.thomas.lendacky@amd.com
|
|
New CPU #defines encode vendor and family as well as model.
Update the example usage comment in arch/x86/kernel/cpu/match.c
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Pull fix for SEV-SNP late disable bugs.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Fix cpuid_deps[] to list the correct dependencies for GFNI, VAES, and
VPCLMULQDQ. These features don't depend on AVX512, and there exist CPUs
that support these features but not AVX512. GFNI actually doesn't even
depend on AVX.
This prevents GFNI from being unnecessarily disabled if AVX is disabled
to mitigate the GDS vulnerability.
This also prevents all three features from being unnecessarily disabled
if AVX512VL (or its dependency AVX512F) were to be disabled, but it
looks like there isn't any case where this happens anyway.
Fixes: c128dbfa0f87 ("x86/cpufeatures: Enable new SSE/AVX/AVX512 CPU features")
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Confusingly, X86_FEATURE_RETPOLINE doesn't mean retpolines are enabled,
as it also includes the original "AMD retpoline" which isn't a retpoline
at all.
Also replace cpu_feature_enabled() with boot_cpu_has() because this is
before alternatives are patched and cpu_feature_enabled()'s fallback
path is slower than plain old boot_cpu_has().
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Pawan Gupta <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/ad3807424a3953f0323c011a643405619f2a4927.1712944776.git.jpoimboe@kernel.org
|
|
per-CPU SEV data is dominantly accessed from their own local CPUs,
so allocate them node-local to improve performance.
Signed-off-by: Li RongQing <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Nikunj A Dadhania <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
There's a new conflict between this commit pending in x86/cpu:
63edbaa48a57 x86/cpu/topology: Add support for the AMD 0x80000026 leaf
And these fixes in x86/urgent:
c064b536a8f9 x86/cpu/amd: Make the NODEID_MSR union actually work
1b3108f6898e x86/cpu/amd: Make the CPUID 0x80000008 parser correct
Resolve them.
Conflicts:
arch/x86/kernel/cpu/topology_amd.c
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The topology rework missed that early_init_amd() tries to re-enable the
Topology Extensions when the BIOS disabled them.
The new parser is invoked before early_init_amd() so the re-enable attempt
happens too late.
Move it into the AMD specific topology parser code where it belongs.
Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/878r1j260l.ffs@tglx
|
|
A system with NODEID_MSR was reported to crash during early boot without
any output.
The reason is that the union which is used for accessing the bitfields in
the MSR is written wrongly and the resulting executable code accesses the
wrong part of the MSR data.
As a consequence a later division by that value results in 0 and that
result is used for another division as divisor, which obviously does not
work well.
The magic world of C, unions and bitfields:
union {
u64 bita : 3,
bitb : 3;
u64 all;
} x;
x.all = foo();
a = x.bita;
b = x.bitb;
results in the effective executable code of:
a = b = x.bita;
because bita and bitb are treated as union members and therefore both end
up at bit offset 0.
Wrapping the bitfield into an anonymous struct:
union {
struct {
u64 bita : 3,
bitb : 3;
};
u64 all;
} x;
works like expected.
Rework the NODEID_MSR union in exactly that way to cure the problem.
Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Reported-by: "kernelci.org bot" <[email protected]>
Reported-by: Laura Nao <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Laura Nao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Closes: https://lore.kernel.org/all/[email protected]/
|
|
CPUID 0x80000008 ECX.cpu_nthreads describes the number of threads in the
package. The parser uses this value to initialize the SMT domain level.
That's wrong because cpu_nthreads does not describe the number of threads
per physical core. So this needs to set the CORE domain level and let the
later parsers set the SMT shift if available.
Preset the SMT domain level with the assumption of one thread per core,
which is correct ifrt here are no other CPUID leafs to parse, and propagate
cpu_nthreads and the core level APIC bitwidth into the CORE domain.
Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Reported-by: "kernelci.org bot" <[email protected]>
Reported-by: Laura Nao <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Laura Nao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
For consistency with the other CONFIG_MITIGATION_* options, replace the
CONFIG_SPECTRE_BHI_{ON,OFF} options with a single
CONFIG_MITIGATION_SPECTRE_BHI option.
[ mingo: Fix ]
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Link: https://lore.kernel.org/r/3833812ea63e7fdbe36bf8b932e63f70d18e2a2a.1712813475.git.jpoimboe@kernel.org
|
|
Unlike most other mitigations' "auto" options, spectre_bhi=auto only
mitigates newer systems, which is confusing and not particularly useful.
Remove it.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
|
|
SEV-ES allows passing custom contents for x87, SSE and AVX state into the VMSA.
Allow userspace to do that with the usual KVM_SET_XSAVE API and only mark
FPU contents as confidential after it has been copied and encrypted into
the VMSA.
Since the XSAVE state for AVX is the first, it does not need the
compacted-state handling of get_xsave_addr(). However, there are other
parts of XSAVE state in the VMSA that currently are not handled, and
the validation logic of get_xsave_addr() is pointless to duplicate
in KVM, so move get_xsave_addr() to public FPU API; it is really just
a facility to operate on XSAVE state and does not expose any internal
details of arch/x86/kernel/fpu.
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
changes
Signed-off-by: Ingo Molnar <[email protected]>
|
|
While syscall hardening helps prevent some BHI attacks, there's still
other low-hanging fruit remaining. Don't classify it as a mitigation
and make it clear that the system may still be vulnerable if it doesn't
have a HW or SW mitigation enabled.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
|
|
The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been
disabled by the Spectre v2 mitigation (or can otherwise be disabled by
the BHI mitigation itself if needed). In that case retpolines are fine.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/6f56f13da34a0834b69163467449be7f58f253dc.1712813475.git.jpoimboe@kernel.org
|
|
So we are using the 'ia32_cap' value in a number of places,
which got its name from MSR_IA32_ARCH_CAPABILITIES MSR register.
But there's very little 'IA32' about it - this isn't 32-bit only
code, nor does it originate from there, it's just a historic
quirk that many Intel MSR names are prefixed with IA32_.
This is already clear from the helper method around the MSR:
x86_read_arch_cap_msr(), which doesn't have the IA32 prefix.
So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with
its role and with the naming of the helper function.
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
|
|
There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and
over. It's even read in the BHI sysfs function which is a big no-no.
Just read it once and cache it.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
|
|
topo_set_cpuids() updates cpu_present_map and cpu_possible map. It is
invoked during enumeration and "physical hotplug" operations. In the
latter case this results in a kernel crash because cpu_possible_map is
marked read only after init completes.
There is no reason to update cpu_possible_map in that function. During
enumeration cpu_possible_map is not relevant and gets fully initialized
after enumeration completed. On "physical hotplug" the bit is already set
because the kernel allows only CPUs to be plugged which have been
enumerated and associated to a CPU number during early boot.
Remove the bogus update of cpu_possible_map.
Fixes: 0e53e7b656cf ("x86/cpu/topology: Sanitize the APIC admission logic")
Reported-by: Jonathan Cameron <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/87ttkc6kwx.ffs@tglx
|
|
The definition of spectre_bhi_state() incorrectly returns a const char
* const. This causes the a compiler warning when building with W=1:
warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
2812 | static const char * const spectre_bhi_state(void)
Remove the const qualifier from the pointer.
Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Reported-by: Sean Christopherson <[email protected]>
Signed-off-by: Daniel Sneddon <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In smp_prepare_cpus_common() and x2apic_prepare_cpu():
- use 'cpu' instead of 'i'
- use 'node' instead of 'n'
- use vertical alignment to improve readability
- better structure basic blocks
- reduce col80 checkpatch damage
Signed-off-by: Ingo Molnar <[email protected]>
Cc: [email protected]
|
|
per-CPU cpumasks are dominantly accessed from their own local CPUs,
so allocate them node-local to improve performance.
[ mingo: Rewrote the changelog. ]
Signed-off-by: Li RongQing <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In a reverse x-mas tree.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Return early if NOPs have already been optimized.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
There's no need to carve out bits of the NOP optimization functionality
and look at JMP opcodes - simply do one more NOPs optimization pass
at the end of patching.
A lot simpler code.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Instead of optimizing NOPs in-place, use a temporary buffer like the
usual alternatives patching flow does. This obviates the need to grab
locks when patching, see
6778977590da ("x86/alternatives: Disable interrupts and sync when optimizing NOPs in place")
While at it, add nomenclature definitions clarifying and simplifying the
naming of function-local variables in the alternatives code.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
After alternatives have been patched, changes to the X86_FEATURE flags
won't take effect and could potentially even be wrong.
Warn about it.
This is something which has been long overdue.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Srikanth Aithal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Machine check SMIs (MSMI) signaled during SEAM operation (typically
inside TDX guests), on a system with Intel eMCA enabled, might eventually
be reported to the kernel #MC handler with the saved RIP on the stack
pointing to the instruction in kernel code after the SEAMCALL instruction
that entered the SEAM operation. Linux currently says that is a fatal
error and shuts down.
There is a new bit in IA32_MCG_STATUS that, when set to 1, indicates
that the machine check didn't originally occur at that saved RIP, but
during SEAM non-root operation.
Add new entries to the severity table to detect this for both data load
and instruction fetch that set the severity to "AR" (action required).
Increase the width of the mcgmask/mcgres fields in "struct severity"
from unsigned char to unsigned short since the new bit is in position 12.
Action required for these errors is just mark the page as poisoned and
return from the machine check handler.
HW ABI notes:
=============
The SEAM_NR bit in IA32_MCG_STATUS hasn't yet made it into the Intel
Software Developers' Manual. But it is described in section 16.5.2
of "Intel(R) Trust Domain Extensions (Intel(R) TDX) Module Base
Architecture Specification" downloadable from:
https://cdrdv2.intel.com/v1/dl/getContent/733575
Backport notes:
===============
Little value in backporting this patch to stable or LTS kernels as
this is only relevant with support for TDX, which I assume won't be
backported. But for anyone taking this to v6.1 or older, you also
need commit:
a51cbd0d86d3 ("x86/mce: Use severity table to handle uncorrected errors in kernel")
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Pull x86 mitigations from Thomas Gleixner:
"Mitigations for the native BHI hardware vulnerabilty:
Branch History Injection (BHI) attacks may allow a malicious
application to influence indirect branch prediction in kernel by
poisoning the branch history. eIBRS isolates indirect branch targets
in ring0. The BHB can still influence the choice of indirect branch
predictor entry, and although branch predictor entries are isolated
between modes when eIBRS is enabled, the BHB itself is not isolated
between modes.
Add mitigations against it either with the help of microcode or with
software sequences for the affected CPUs"
[ This also ends up enabling the full mitigation by default despite the
system call hardening, because apparently there are other indirect
calls that are still sufficiently reachable, and the 'auto' case just
isn't hardened enough.
We'll have some more inevitable tweaking in the future - Linus ]
* tag 'nativebhi' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
KVM: x86: Add BHI_NO
x86/bhi: Mitigate KVM by default
x86/bhi: Add BHI mitigation knob
x86/bhi: Enumerate Branch History Injection (BHI) bug
x86/bhi: Define SPEC_CTRL_BHI_DIS_S
x86/bhi: Add support for clearing branch history at syscall entry
x86/syscall: Don't force use of indirect calls for system calls
x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
|
|
BHI mitigation mode spectre_bhi=auto does not deploy the software
mitigation by default. In a cloud environment, it is a likely scenario
where userspace is trusted but the guests are not trusted. Deploying
system wide mitigation in such cases is not desirable.
Update the auto mode to unconditionally mitigate against malicious
guests. Deploy the software sequence at VMexit in auto mode also, when
hardware mitigation is not available. Unlike the force =on mode,
software sequence is not deployed at syscalls in auto mode.
Suggested-by: Alexandre Chartre <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Daniel Sneddon <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
|
|
Branch history clearing software sequences and hardware control
BHI_DIS_S were defined to mitigate Branch History Injection (BHI).
Add cmdline spectre_bhi={on|off|auto} to control BHI mitigation:
auto - Deploy the hardware mitigation BHI_DIS_S, if available.
on - Deploy the hardware mitigation BHI_DIS_S, if available,
otherwise deploy the software sequence at syscall entry and
VMexit.
off - Turn off BHI mitigation.
The default is auto mode which does not deploy the software sequence
mitigation. This is because of the hardening done in the syscall
dispatch path, which is the likely target of BHI.
Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Daniel Sneddon <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
|
|
Mitigation for BHI is selected based on the bug enumeration. Add bits
needed to enumerate BHI bug.
Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Daniel Sneddon <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
|
|
Newer processors supports a hardware control BHI_DIS_S to mitigate
Branch History Injection (BHI). Setting BHI_DIS_S protects the kernel
from userspace BHI attacks without having to manually overwrite the
branch history.
Define MSR_SPEC_CTRL bit BHI_DIS_S and its enumeration CPUID.BHI_CTRL.
Mitigation is enabled later.
Signed-off-by: Daniel Sneddon <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Daniel Sneddon <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
|
|
Change the format of the 'spectre_v2' vulnerabilities sysfs file
slightly by converting the commas to semicolons, so that mitigations for
future variants can be grouped together and separated by commas.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Daniel Sneddon <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
We want to fix:
0e110732473e ("x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO")
So merge in Linus's latest into x86/urgent to have it available.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Orphaned after
05e91e721138 ("x86/microcode/AMD: Rip out static buffers")
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
|
|
Older versions of clang show a warning for amd.c after a fix for a gcc
warning:
arch/x86/kernel/cpu/microcode/amd.c:478:47: error: format specifies type \
'unsigned char' but the argument has type 'u16' (aka 'unsigned short') [-Werror,-Wformat]
"amd-ucode/microcode_amd_fam%02hhxh.bin", family);
~~~~~~ ^~~~~~
%02hx
In clang-16 and higher, this warning is disabled by default, but clang-15 is
still supported, and it's trivial to avoid by adapting the types according
to the range of the passed data and the format string.
[ bp: Massage commit message. ]
Fixes: 2e9064faccd1 ("x86/microcode/amd: Fix snprintf() format string warning in W=1 build")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, bluetooth and bpf.
Fairly usual collection of driver and core fixes. The large selftest
accompanying one of the fixes is also becoming a common occurrence.
Current release - regressions:
- ipv6: fix infinite recursion in fib6_dump_done()
- net/rds: fix possible null-deref in newly added error path
Current release - new code bugs:
- net: do not consume a full cacheline for system_page_pool
- bpf: fix bpf_arena-related file descriptor leaks in the verifier
- drv: ice: fix freeing uninitialized pointers, fixing misuse of the
newfangled __free() auto-cleanup
Previous releases - regressions:
- x86/bpf: fixes the BPF JIT with retbleed=stuff
- xen-netfront: add missing skb_mark_for_recycle, fix page pool
accounting leaks, revealed by recently added explicit warning
- tcp: fix bind() regression for v6-only wildcard and v4-mapped-v6
non-wildcard addresses
- Bluetooth:
- replace "hci_qca: Set BDA quirk bit if fwnode exists in DT" with
better workarounds to un-break some buggy Qualcomm devices
- set conn encrypted before conn establishes, fix re-connecting to
some headsets which use slightly unusual sequence of msgs
- mptcp:
- prevent BPF accessing lowat from a subflow socket
- don't account accept() of non-MPC client as fallback to TCP
- drv: mana: fix Rx DMA datasize and skb_over_panic
- drv: i40e: fix VF MAC filter removal
Previous releases - always broken:
- gro: various fixes related to UDP tunnels - netns crossing
problems, incorrect checksum conversions, and incorrect packet
transformations which may lead to panics
- bpf: support deferring bpf_link dealloc to after RCU grace period
- nf_tables:
- release batch on table validation from abort path
- release mutex after nft_gc_seq_end from abort path
- flush pending destroy work before exit_net release
- drv: r8169: skip DASH fw status checks when DASH is disabled"
* tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
netfilter: validate user input for expected length
net/sched: act_skbmod: prevent kernel-infoleak
net: usb: ax88179_178a: avoid the interface always configured as random address
net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()
net: ravb: Always update error counters
net: ravb: Always process TX descriptor ring
netfilter: nf_tables: discard table flag update with pending basechain deletion
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
netfilter: nf_tables: reject new basechain after table flag update
netfilter: nf_tables: flush pending destroy work before exit_net release
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
netfilter: nf_tables: release batch on table validation from abort path
Revert "tg3: Remove residual error handling in tg3_suspend"
tg3: Remove residual error handling in tg3_suspend
net: mana: Fix Rx DMA datasize and skb_over_panic
net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
net: stmmac: fix rx queue priority assignment
net: txgbe: fix i2c dev name cannot match clkdev
net: fec: Set mac_managed_pm during probe
...
|
|
Modifying a MCA bank's MCA_CTL bits which control which error types to
be reported is done over
/sys/devices/system/machinecheck/
├── machinecheck0
│ ├── bank0
│ ├── bank1
│ ├── bank10
│ ├── bank11
...
sysfs nodes by writing the new bit mask of events to enable.
When the write is accepted, the kernel deletes all current timers and
reinits all banks.
Doing that in parallel can lead to initializing a timer which is already
armed and in the timer wheel, i.e., in use already:
ODEBUG: init active (active state 0) object: ffff888063a28000 object
type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642
WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514
debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514
Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code
does.
Reported by: Yue Sun <[email protected]>
Reported by: xingwei lee <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%[email protected]
|
|
After
034ff37d3407 ("x86: rewrite '__copy_user_nocache' function")
rewrote __copy_user_nocache() to use EX_TYPE_UACCESS instead of the
EX_TYPE_COPY exception type, there are no more EX_TYPE_COPY users, so
remove it.
[ bp: Massage commit message. ]
Signed-off-by: Tong Tiangen <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The host SNP worthiness can determined later, after alternatives have
been patched, in snp_rmptable_init() depending on cmdline options like
iommu=pt which is incompatible with SNP, for example.
Which means that one cannot use X86_FEATURE_SEV_SNP and will need to
have a special flag for that control.
Use that newly added CC_ATTR_HOST_SEV_SNP in the appropriate places.
Move kdump_sev_callback() to its rightful place, while at it.
Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support")
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Tested-by: Srikanth Aithal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
There are few uses of CoCo that don't rely on working cryptography and
hence a working RNG. Unfortunately, the CoCo threat model means that the
VM host cannot be trusted and may actively work against guests to
extract secrets or manipulate computation. Since a malicious host can
modify or observe nearly all inputs to guests, the only remaining source
of entropy for CoCo guests is RDRAND.
If RDRAND is broken -- due to CPU hardware fault -- the RNG as a whole
is meant to gracefully continue on gathering entropy from other sources,
but since there aren't other sources on CoCo, this is catastrophic.
This is mostly a concern at boot time when initially seeding the RNG, as
after that the consequences of a broken RDRAND are much more
theoretical.
So, try at boot to seed the RNG using 256 bits of RDRAND output. If this
fails, panic(). This will also trigger if the system is booted without
RDRAND, as RDRAND is essential for a safe CoCo boot.
Add this deliberately to be "just a CoCo x86 driver feature" and not
part of the RNG itself. Many device drivers and platforms have some
desire to contribute something to the RNG, and add_device_randomness()
is specifically meant for this purpose.
Any driver can call it with seed data of any quality, or even garbage
quality, and it can only possibly make the quality of the RNG better or
have no effect, but can never make it worse.
Rather than trying to build something into the core of the RNG, consider
the particular CoCo issue just a CoCo issue, and therefore separate it
all out into driver (well, arch/platform) code.
[ bp: Massage commit message. ]
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Elena Reshetova <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
The following commit:
d69c1382e1b7 ("x86/kvm: Convert FPU handling to a single swap buffer")
reworked KVM FPU handling, but forgot to update the comments
in xstate_op_valid(): fpu_swap_kvm_fpu() doesn't exist anymore,
fpu_swap_kvm_fpstate() is used instead.
Update the comments accordingly.
[ mingo: Improved the changelog. ]
Signed-off-by: Li RongQing <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Pull KVM fixes from Paolo Bonzini:
"ARM:
- Ensure perf events programmed to count during guest execution are
actually enabled before entering the guest in the nVHE
configuration
- Restore out-of-range handler for stage-2 translation faults
- Several fixes to stage-2 TLB invalidations to avoid stale
translations, possibly including partial walk caches
- Fix early handling of architectural VHE-only systems to ensure E2H
is appropriately set
- Correct a format specifier warning in the arch_timer selftest
- Make the KVM banner message correctly handle all of the possible
configurations
RISC-V:
- Remove redundant semicolon in num_isa_ext_regs()
- Fix APLIC setipnum_le/be write emulation
- Fix APLIC in_clrip[x] read emulation
x86:
- Fix a bug in KVM_SET_CPUID{2,} where KVM looks at the wrong CPUID
entries (old vs. new) and ultimately neglects to clear PV_UNHALT
from vCPUs with HLT-exiting disabled
- Documentation fixes for SEV
- Fix compat ABI for KVM_MEMORY_ENCRYPT_OP
- Fix a 14-year-old goof in a declaration shared by host and guest;
the enabled field used by Linux when running as a guest pushes the
size of "struct kvm_vcpu_pv_apf_data" from 64 to 68 bytes. This is
really unconsequential because KVM never consumes anything beyond
the first 64 bytes, but the resulting struct does not match the
documentation
Selftests:
- Fix spelling mistake in arch_timer selftest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
KVM: arm64: Rationalise KVM banner output
arm64: Fix early handling of FEAT_E2H0 not being implemented
KVM: arm64: Ensure target address is granule-aligned for range TLBI
KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range()
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
KVM: arm64: Don't defer TLB invalidation when zapping table entries
KVM: selftests: Fix __GUEST_ASSERT() format warnings in ARM's arch timer test
KVM: arm64: Fix out-of-IPA space translation fault handling
KVM: arm64: Fix host-programmed guest events in nVHE
RISC-V: KVM: Fix APLIC in_clrip[x] read emulation
RISC-V: KVM: Fix APLIC setipnum_le/be write emulation
RISC-V: KVM: Remove second semicolon
KVM: selftests: Fix spelling mistake "trigged" -> "triggered"
Documentation: kvm/sev: clarify usage of KVM_MEMORY_ENCRYPT_OP
Documentation: kvm/sev: separate description of firmware
KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP
KVM: selftests: Check that PV_UNHALT is cleared when HLT exiting is disabled
KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT
KVM: x86: Introduce __kvm_get_hypervisor_cpuid() helper
KVM: SVM: Return -EINVAL instead of -EBUSY on attempt to re-init SEV/SEV-ES
...
|