Age | Commit message | Author | Files | Lines |
|
Merge tag 'x86-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix the 8-byte get_user() logic on x86-32
- Fix build bug that creates weird & mistaken target directory under
arch/x86/
* tag 'x86-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Don't add the EFI stub to targets, again
x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking
|
|
The logic in __efi_memmap_init() is shared between two different
execution flows:
- mapping the EFI memory map early or late into the kernel VA space, so
that its entries can be accessed;
- the x86 specific cloning of the EFI memory map in order to insert new
entries that are created as a result of making a memory reservation
via a call to efi_mem_reserve().
In the former case, the underlying memory containing the kernel's view
of the EFI memory map (which may be heavily modified by the kernel
itself on x86) is not modified at all, and the only thing that changes
is the virtual mapping of this memory, which is different between early
and late boot.
In the latter case, an entirely new allocation is created that carries a
new, updated version of the kernel's view of the EFI memory map. When
installing this new version, the old version will no longer be
referenced, and if the memory was allocated by the kernel, it will leak
unless it gets freed.
The logic that implements this freeing currently lives on the code path
that is shared between these two use cases, but it should only apply to
the latter. So move it to the correct spot.
While at it, drop the dummy definition for non-x86 architectures, as
that is no longer needed.
Cc: <[email protected]>
Fixes: f0ef6523475f ("efi: Fix efi_memmap_alloc() leaks")
Tested-by: Ashish Kalra <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
Avoid needlessly reacquiring the RCU read lock if the TDP MMU fails to
allocate a shadow page during eager page splitting. Opportunistically
drop the local variable ret as well now that it's no longer necessary.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Move the implementation of tdp_mmu_alloc_sp_for_split() to its one and
only caller to reduce unnecessary nesting and make it clearer why the
eager split loop continues after allocating a new SP.
Opportunistically drop the double-underscores from
__tdp_mmu_alloc_sp_for_split() now that its parent is gone.
No functional change intended.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Now that the GFP_NOWAIT case is gone, hard code GFP_KERNEL_ACCOUNT when
allocating shadow pages during eager page splitting in the TDP MMU.
Opportunistically replace the use of __GFP_ZERO with allocation helpers
that zero, to improve readability.
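As a minimal sketch of the kind of change this describes (the cache and
variable names here are illustrative, not KVM's actual identifiers):

    /* Before: zeroing requested via an explicit GFP flag. */
    sp  = kmem_cache_alloc(cache, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
    spt = (void *)__get_free_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);

    /* After: allocators that zero by definition read more clearly. */
    sp  = kmem_cache_zalloc(cache, GFP_KERNEL_ACCOUNT);
    spt = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);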
No functional change intended.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Always drop mmu_lock to allocate shadow pages in the TDP MMU when doing
eager page splitting. Dropping mmu_lock during eager page splitting is
cheap since KVM does not have to flush remote TLBs, and avoids stalling
vCPU threads that are taking page faults while KVM is eagerly splitting
huge pages with mmu_lock held for write.
This change reduces 20%+ dips in MySQL throughput during live migration
in a 160 vCPU VM while userspace is issuing CLEAR_DIRTY_LOG ioctls
(tested with 1GiB and 8GiB CLEARs). Userspace could issue finer-grained
CLEARs, which would also reduce contention on mmu_lock, but doing so
will increase the rate of remote TLB flushing, since KVM must flush TLBs
before returning from CLEAR_DIRTY_LOG.
When there isn't contention on mmu_lock[1], this change does not regress
the time it takes to perform eager page splitting (the cost of releasing
and re-acquiring an uncontended lock is minimal on x86).
[1] Tested with dirty_log_perf_test, which does not run vCPUs during
eager page splitting, and with a 16 vCPU VM Live Migration with
manual-protect disabled (where mmu_lock is held for read).
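As a rough sketch of the pattern being applied (the helper and parameter
names are illustrative, not KVM's actual code), the allocation now drops
whichever mode of mmu_lock is held and reacquires it afterwards:

    /* Sketch only: drop mmu_lock around the allocation so faulting vCPUs
     * are not stalled, then reacquire it before continuing the split. */
    static struct page *alloc_for_split(struct kvm *kvm, bool shared)
    {
            struct page *page;

            if (shared)
                    read_unlock(&kvm->mmu_lock);
            else
                    write_unlock(&kvm->mmu_lock);

            /* May now sleep without blocking vCPUs taking page faults. */
            page = alloc_page(GFP_KERNEL_ACCOUNT);

            if (shared)
                    read_lock(&kvm->mmu_lock);
            else
                    write_lock(&kvm->mmu_lock);

            return page;
    }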
Cc: Bibo Mao <[email protected]>
Cc: Sean Christopherson <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Reword the BUILD_BUG_ON() comment in the legacy #PF handler to explicitly
describe how asserting that synthetic PFERR flags are limited to bits 31:0
protects KVM against inadvertently passing a synthetic flag to the common
page fault handler.
No functional change intended.
Suggested-by: Xiaoyao Li <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
__fortify_panic() is only used when the kernel is compiled with FORTIFY_SOURCE
and when the kernel proper string.h header is used. The decompressor code
doesn't use the kernel proper header but has a local copy which doesn't contain
any fortified implementations of the various string functions. As such,
__fortify_panic() can never be called from the decompressor, so remove it.
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Merge tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fixes from Mike Rapoport:
"Fix validation of NUMA coverage.
memblock_validate_numa_coverage() was checking for an unset node ID
using NUMA_NO_NODE, but x86 used MAX_NUMNODES when no node ID was
specified by buggy firmware.
Update memblock to substitute MAX_NUMNODES with NUMA_NO_NODE in
memblock_set_node() and use NUMA_NO_NODE in x86::numa_init()"
* tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node()
memblock: make memblock_set_node() also warn about use of MAX_NUMNODES
|
|
The routine is used on syscall exit, and on non-AMD CPUs it is guaranteed
to be empty.
It probably does not need to be a function call even on CPUs which do need the
mitigation.
[ bp: Make sure it is always inlined so that noinstr marking works. ]
Signed-off-by: Mateusz Guzik <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This is a re-commit of
da05b143a308 ("x86/boot: Don't add the EFI stub to targets")
after the tagged patch incorrectly reverted it.
vmlinux-objs-y is added to targets, with an assumption that they are all
relative to $(obj); adding a $(objtree)/drivers/... path causes the
build to incorrectly create a useless
arch/x86/boot/compressed/drivers/... directory tree.
Fix this just by using a different make variable for the EFI stub.
Fixes: cb8bda8ad443 ("x86/boot/compressed: Rename efi_thunk_64.S to efi-mixed.S")
Signed-off-by: Ben Segall <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Cc: [email protected] # v6.1+
Link: https://lore.kernel.org/r/[email protected]
|
|
AMD Zen-based systems use a System Management Network (SMN) that
provides access to implementation-specific registers.
SMN accesses are done indirectly through an index/data pair in PCI
config space. The accesses can fail for a variety of reasons.
Include code comments to describe some possible scenarios.
Require error checking for callers of amd_smn_read() and amd_smn_write().
This is needed because many error conditions cannot be checked by these
functions.
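A caller-side sketch of what this requirement implies, assuming the
int-returning prototype int amd_smn_read(u16 node, u32 address, u32 *value),
with node and address provided by the surrounding code:

    u32 value;
    int err;

    /* Do not assume success: many error conditions cannot be checked
     * inside the helper itself, so the caller must act on the result. */
    err = amd_smn_read(node, address, &value);
    if (err) {
            pr_warn("SMN read of 0x%x failed: %d\n", address, err);
            return err;
    }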
[ bp: Touchup comment. ]
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
syscall*.tbl files were added to make it easier to check which system
calls are supported on each architecture and to check for their numbers.
The Arm and x86 files lack the Linux-syscall-note license exception that
is present in the files for all other architectures.
Signed-off-by: Marcin Juszkiewicz <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
Add a uretprobe syscall instead of a trap to speed up return probes.
At the moment the uretprobe setup/path is:
- install entry uprobe
- when the uprobe is hit, it overwrites the probed function's return address
on the stack with the address of a trampoline that contains a breakpoint
instruction
- the breakpoint trap code handles execution of the uretprobe consumers and
jumps back to the original return address
This patch replaces the above trampoline's breakpoint instruction with the
new uretprobe syscall. This syscall does exactly the same job as the trap,
with some extra work:
- the syscall trampoline must save the original values of the rax/r11/rcx
registers on the stack - rax is set to the syscall number, and r11/rcx are
clobbered by the syscall instruction
- the syscall code reads the original values of those registers and
restores them in the task's pt_regs area
- only calls from the trampoline exposed in '[uprobes]' are allowed;
the process receives a SIGILL signal otherwise
Even with some extra work, using the uretprobes syscall shows speed
improvement (compared to using standard breakpoint):
On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)
current:
uretprobe-nop : 1.498 ± 0.000M/s
uretprobe-push : 1.448 ± 0.001M/s
uretprobe-ret : 0.816 ± 0.001M/s
with the fix:
uretprobe-nop : 1.969 ± 0.002M/s < 31% speed up
uretprobe-push : 1.910 ± 0.000M/s < 31% speed up
uretprobe-ret : 0.934 ± 0.000M/s < 14% speed up
On Amd (AMD Ryzen 7 5700U)
current:
uretprobe-nop : 0.778 ± 0.001M/s
uretprobe-push : 0.744 ± 0.001M/s
uretprobe-ret : 0.540 ± 0.001M/s
with the fix:
uretprobe-nop : 0.860 ± 0.001M/s < 10% speed up
uretprobe-push : 0.818 ± 0.001M/s < 10% speed up
uretprobe-ret : 0.578 ± 0.000M/s < 7% speed up
The performance test spawns a thread that runs a loop which triggers a
uprobe with an attached BPF program that increments the counter printed
in the results above.
The uprobe (and uretprobe) kind is determined by which instruction is
being patched with a breakpoint instruction. That's also important for
uretprobes, because a uprobe is installed for each uretprobe.
The performance test is part of bpf selftests:
tools/testing/selftests/bpf/run_bench_uprobes.sh
Note that at the moment the uretprobe syscall is supported only for native
64-bit processes; compat processes still use the standard breakpoint.
Note that when shadow stack is enabled the uretprobe syscall returns via
iret, which is slower than returning via sysret, but won't cause a shadow
stack violation.
Link: https://lore.kernel.org/all/[email protected]/
Suggested-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
|
|
Wire up the uretprobe system call, which comes in the following changes.
We need to do the wiring first, because the uretprobe implementation
needs the syscall number.
Note that at the moment the uretprobe syscall is supported only for native
64-bit processes.
Link: https://lore.kernel.org/all/[email protected]/
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
|
|
Currently an application with shadow stack enabled will crash if it sets
up a return uprobe. The reason is that the uretprobe kernel code changes
the user space task's stack, but does not update the shadow stack
accordingly.
Add new functions to update values on the shadow stack and use them in
the uprobe code to keep the shadow stack in sync with the uretprobe
changes to the user stack.
Link: https://lore.kernel.org/all/[email protected]/
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: Rick Edgecombe <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Fixes: 488af8ea7131 ("x86/shstk: Wire in shadow stack interface")
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
|
|
When reworking the range checking for get_user(), the get_user_8() case
on 32-bit wasn't zeroing the high register. (The jump to bad_get_user_8
was accidentally dropped.) Restore the correct error handling
destination (and rename the jump target to use the expected ".L" prefix).
While here, switch to using a named argument ("size") for the call
template ("%c4" to "%c[size]") as already used in the other call
templates in this file.
Found after moving the usercopy selftests to KUnit:
# usercopy_test_invalid: EXPECTATION FAILED at
lib/usercopy_kunit.c:278
Expected val_u64 == 0, but
val_u64 == -60129542144 (0xfffffff200000000)
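The contract being restored is that get_user() zeroes its destination on
failure; a caller sketch (illustrative, not the KUnit test itself, with
bad_ptr assumed to be an invalid user pointer):

    u64 val;

    if (get_user(val, (u64 __user *)bad_ptr)) {
            /* On -EFAULT, val must already be 0 -- including the high
             * 32 bits on x86-32, which this fix takes care of. */
    }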
Closes: https://lore.kernel.org/all/CABVgOSn=tb=Lj9SxHuT4_9MTjjKVxsq-ikdXC4kGHO4CfKVmGQ@mail.gmail.com
Fixes: b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()")
Reported-by: David Gow <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Qiuxu Zhuo <[email protected]>
Tested-by: David Gow <[email protected]>
Link: https://lore.kernel.org/all/20240610210213.work.143-kees%40kernel.org
|
|
Now that KVM unconditionally sets l1tf_flush_l1d in kvm_arch_vcpu_load(),
drop the redundant store from vcpu_run(). The flag is cleared only when
VM-Enter is imminent, deep below vcpu_run(), i.e. barring a KVM bug, it's
impossible for l1tf_flush_l1d to be cleared between loading the vCPU and
calling vcpu_run().
Acked-by: Kai Huang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Always set l1tf_flush_l1d during kvm_arch_vcpu_load() instead of setting
it only when the vCPU is being scheduled back in. The flag is processed
only when VM-Enter is imminent, and KVM obviously needs to load the vCPU
before VM-Enter, so attempting to precisely set l1tf_flush_l1d provides no
meaningful value. I.e. the flag _will_ be set either way, it's simply a
matter of when.
Acked-by: Kai Huang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Delete kvm_arch_sched_in() now that all implementations are nops.
Reviewed-by: Bibo Mao <[email protected]>
Acked-by: Kai Huang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Fold the guts of kvm_arch_sched_in() into kvm_arch_vcpu_load(), keying
off the recently added kvm_vcpu.scheduled_out as appropriate.
Note, there is a very slight functional change, as PLE shrink updates will
now happen after blasting WBINVD, but that is quite uninteresting as the
two operations do not interact in any way.
Acked-by: Kai Huang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Move VMX's {grow,shrink}_ple_window() above vmx_vcpu_load() in preparation
of moving the sched_in logic, which handles shrinking the PLE window, into
vmx_vcpu_load().
No functional change intended.
Acked-by: Kai Huang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Now that KVM sets up empty IRQ routing during VM creation, don't recreate
empty routing during KVM_CAP_SPLIT_IRQCHIP. Setting IRQ routes during
KVM_CAP_SPLIT_IRQCHIP can result in 20+ milliseconds of delay due to the
synchronize_srcu_expedited() call in kvm_set_irq_routing().
Note, the empty routing is guaranteed to be intact as KVM x86 only allows
changing the IRQ routing after an in-kernel IRQCHIP has been created, and
KVM_CAP_SPLIT_IRQCHIP is disallowed after creating an IRQCHIP.
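For reference, userspace enables the capability in question roughly like
this (a sketch; vm_fd and the number of reserved IOAPIC routes are
illustrative, and the usual <linux/kvm.h>/<sys/ioctl.h> includes are assumed):

    struct kvm_enable_cap cap = {
            .cap     = KVM_CAP_SPLIT_IRQCHIP,
            .args[0] = 24,  /* IOAPIC routes reserved for userspace */
    };

    /* With this change, KVM no longer rebuilds the (empty) IRQ routing
     * here, avoiding the synchronize_srcu_expedited() delay noted above. */
    if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
            perror("KVM_ENABLE_CAP(KVM_CAP_SPLIT_IRQCHIP)");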
Signed-off-by: Yi Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[sean: massage changelog, remove unused empty_routing array]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Some AMD Zen 4 processors support a new feature FAST CPPC which
allows for a faster CPPC loop due to internal architectural
enhancements. The goal of this faster loop is higher performance
at the same power consumption.
Reference:
See page 99 of the PPR for AMD Family 19h Model 61h rev. B1, docID 56713.
Signed-off-by: Perry Yuan <[email protected]>
Signed-off-by: Xiaojian Du <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
|
|
Add a helper, intel_pmu_enable_fixed_counter_bits(), to dedup code that
enables fixed counter bits, i.e. when KVM clears bits in the reserved mask
used to detect invalid MSR_CORE_PERF_FIXED_CTR_CTRL values.
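A sketch of what such a helper might look like (the field names and the
4-bits-per-counter layout of MSR_CORE_PERF_FIXED_CTR_CTRL are assumptions
drawn from the description, not necessarily KVM's exact code):

    static void intel_pmu_enable_fixed_counter_bits(struct kvm_pmu *pmu, u64 bits)
    {
            int i;

            /* Clearing a bit in the reserved mask marks it as a legal
             * value for the fixed counter control MSR. */
            for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
                    pmu->fixed_ctr_ctrl_rsvd &= ~(bits << (i * 4));
    }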
No functional change intended.
Cc: Dapeng Mi <[email protected]>
Reviewed-by: Dapeng Mi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Now that the new macros have been gradually put in place, replace the
old ones. Leave the new label numbers starting at 7xx as a hint that the
new nested alternatives are being used now.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
When a vCPU is interrupted by a signal while running a nested guest,
KVM will exit to userspace with L2 state. However, userspace has no
way to know whether it sees L1 or L2 state (besides calling
KVM_GET_STATS_FD, which does not have a stable ABI).
This causes multiple problems:
The simplest one is L2 state corruption when userspace marks the sregs
as dirty. See this mailing list thread [1] for a complete discussion.
Another problem is that if userspace decides to continue by emulating
instructions, it will unknowingly emulate with L2 state as if L1
doesn't exist, which can be considered a weird guest escape.
Introduce a new flag, KVM_RUN_X86_GUEST_MODE, in the kvm_run data
structure, which is set when the vCPU exited while running a nested
guest. Also introduce a new capability, KVM_CAP_X86_GUEST_MODE, to
advertise the functionality to userspace.
[1] https://lore.kernel.org/kvm/[email protected]/T/#m280aadcb2e10ae02c191a7dc4ed4b711a74b1f55
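A hypothetical userspace consumer of the new capability and flag (the
placement of the flag in run->flags is an assumption based on the
description; kvm_fd and run are the usual /dev/kvm fd and mmap'ed kvm_run):

    /* Advertised via KVM_CHECK_EXTENSION. */
    bool have_guest_mode = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
                                 KVM_CAP_X86_GUEST_MODE) > 0;

    /* After KVM_RUN returns, userspace can tell L1 state from L2 state. */
    if (have_guest_mode && (run->flags & KVM_RUN_X86_GUEST_MODE))
            fprintf(stderr, "vCPU exited while running a nested (L2) guest\n");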
Signed-off-by: Thomas Prescher <[email protected]>
Signed-off-by: Julian Stecklina <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Zap the hack of using an ALTERNATIVE_3() internal label, as suggested by
bgerst:
https://lore.kernel.org/r/CAMzpN2i4oJ-Dv0qO46Fd-DxNv5z9=x%2BvO%[email protected]
in favor of a label local to this macro only, as it should be done.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The C macro.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
All architectures that implement function graph tracing also implement
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR. Remove it, as it is no longer a
differentiator.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Split the conversion deliberately into minimal pieces to ease bisection,
because debugging alternatives is a nightmare.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Instead of making increasingly complicated ALTERNATIVE_n()
implementations, use a nested alternative expression.
The only difference between:
ALTERNATIVE_2(oldinst, newinst1, flag1, newinst2, flag2)
and
ALTERNATIVE(ALTERNATIVE(oldinst, newinst1, flag1),
newinst2, flag2)
is that the outer alternative can add additional padding when the inner
alternative is the shorter one, which then results in
alt_instr::instrlen being inconsistent.
However, this is easily remedied since the alt_instr entries will be
consecutive and it is trivial to compute the max(alt_instr::instrlen) at
runtime while patching.
Specifically, after this the ALTERNATIVE_2 macro, after CPP expansion
(and manual layout), looks like this:
.macro ALTERNATIVE_2 oldinstr, newinstr1, ft_flags1, newinstr2, ft_flags2
740:
740: \oldinstr ;
741: .skip -(((744f-743f)-(741b-740b)) > 0) * ((744f-743f)-(741b-740b)),0x90 ;
742: .pushsection .altinstructions,"a" ;
altinstr_entry 740b,743f,\ft_flags1,742b-740b,744f-743f ;
.popsection ;
.pushsection .altinstr_replacement,"ax" ;
743: \newinstr1 ;
744: .popsection ; ;
741: .skip -(((744f-743f)-(741b-740b)) > 0) * ((744f-743f)-(741b-740b)),0x90 ;
742: .pushsection .altinstructions,"a" ;
altinstr_entry 740b,743f,\ft_flags2,742b-740b,744f-743f ;
.popsection ;
.pushsection .altinstr_replacement,"ax" ;
743: \newinstr2 ;
744: .popsection ;
.endm
The only label that is ambiguous is 740, however they all reference the
same spot, so that doesn't matter.
NOTE: obviously only @oldinstr may be an alternative; making @newinstr
an alternative would mean patching .altinstr_replacement which very
likely isn't what is intended, also the labels will be confused in that
case.
[ bp: Debug an issue where it would match the wrong two insns and
consider them nested due to the same signed offsets in the
.alternative section and use instr_va() to compare the full virtual
addresses instead.
- Use new labels to denote that the new, nested
alternatives are being used when staring at preprocessed output.
- Use the %c constraint everywhere instead of %P and document the
difference for future reference. ]
Signed-off-by: Peter Zijlstra <[email protected]>
Co-developed-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Unused.
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Yue and Xingwei reported a jump label failure. It's caused by the lack of
serialization in set_attr_rdpmc():
CPU0                                     CPU1

Assume: x86_pmu.attr_rdpmc == 0

if (val != x86_pmu.attr_rdpmc) {
  if (val == 0)
    ...
  else if (x86_pmu.attr_rdpmc == 0)
    static_branch_dec(&rdpmc_never_available_key);

                                         if (val != x86_pmu.attr_rdpmc) {
                                           if (val == 0)
                                             ...
                                           else if (x86_pmu.attr_rdpmc == 0)
                 FAIL, due to imbalance ---> static_branch_dec(&rdpmc_never_available_key);
The reported BUG() is a consequence of the above and of another bug in the
jump label core code. The core code needs a separate fix, but that cannot
prevent the imbalance problem caused by set_attr_rdpmc().
Prevent this by serializing set_attr_rdpmc() locally.
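One way to do that, as a sketch (the mutex name and the use of the guard()
helper are illustrative and may differ from the actual patch):

    static DEFINE_MUTEX(rdpmc_mutex);

    /* At the top of set_attr_rdpmc(): serialize the check of
     * x86_pmu.attr_rdpmc against the static_branch_{inc,dec}() calls so
     * two concurrent writers cannot both observe the old value and
     * unbalance the key. */
    guard(mutex)(&rdpmc_mutex);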
Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks")
Closes: https://lore.kernel.org/r/CAEkJfYNzfW1vG=ZTMdz_Weoo=RXY1NDunbxnDaLyj8R4kEoE_w@mail.gmail.com
Reported-by: Yue Sun <[email protected]>
Reported-by: Xingwei Lee <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The SVSM Calling Area (CA) is used to communicate between Linux and the
SVSM. Since the firmware supplied CA for the BSP is likely to be in
reserved memory, switch off that CA to a kernel provided CA so that access
and use of the CA is available during boot. The CA switch is done using
the SVSM core protocol SVSM_CORE_REMAP_CA call.
An SVSM call is executed by filling out the SVSM CA and setting the proper
register state as documented by the SVSM protocol. The SVSM is invoked by
requesting the hypervisor to run VMPL0.
Once it is safe to allocate/reserve memory, allocate a CA for each CPU.
After allocating the new CAs, the BSP will switch from the boot CA to the
per-CPU CA. The CA for an AP is identified to the SVSM when creating the
VMSA in preparation for booting the AP.
[ bp: Heavily simplify svsm_issue_call() asm, other touchups. ]
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/fa8021130bcc3bcf14d722a25548cb0cdf325456.1717600736.git.thomas.lendacky@amd.com
|
|
During early boot phases, check for the presence of an SVSM when running
as an SEV-SNP guest.
An SVSM is present if not running at VMPL0 and the 64-bit value at offset
0x148 into the secrets page is non-zero. If an SVSM is present, save the
SVSM Calling Area address (CAA), located at offset 0x150 into the secrets
page, and set the VMPL level of the guest, which should be non-zero, to
indicate the presence of an SVSM.
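A simplified sketch of that detection (the variable names are illustrative
and the offsets are the ones given above; this is not the kernel's actual
structure layout):

    u64 svsm_field = *(u64 *)(secrets_page + 0x148);

    if (!running_at_vmpl0 && svsm_field) {
            /* SVSM present: record its Calling Area address and the
             * (non-zero) VMPL this guest is running at. */
            svsm_caa_pa = *(u64 *)(secrets_page + 0x150);
            snp_vmpl    = current_vmpl;
    }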
[ bp: Touchups. ]
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/9d3fe161be93d4ea60f43c2a3f2c311fe708b63b.1717600736.git.thomas.lendacky@amd.com
|
|
Functions that need to disable IRQs, but are common to both early boot and
post-boot execution, are unable to deal with paravirt support associated
with local_irq_save() and local_irq_restore().
Create native versions of these for use in these situations.
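A sketch of what such native helpers can look like, built on the existing
native_save_fl()/native_irq_disable()/native_irq_enable() primitives (the
wrapper names are assumptions based on the description):

    static __always_inline unsigned long native_local_irq_save(void)
    {
            unsigned long flags = native_save_fl();   /* read EFLAGS            */

            native_irq_disable();                     /* plain cli, no paravirt */
            return flags;
    }

    static __always_inline void native_local_irq_restore(unsigned long flags)
    {
            if (flags & X86_EFLAGS_IF)                /* re-enable IRQs only if */
                    native_irq_enable();              /* they were enabled      */
    }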
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/c4c33c0d07200164a3dd8cfd6da0344f57732648.1717600736.git.thomas.lendacky@amd.com
|
|
Move guest_cpuid_is_amd_or_hygon() into cpuid.c now that, except for one
Intel quirk in the emulator, KVM checks for AMD vs. Intel *compatible*
vCPUs, not exact vendors, i.e. now that there should not be any reason for
KVM at-large to care about the exact vendor.
Opportunistically refactor the guts of the helper to use "entry" instead
of "best", and short circuit the !entry path to make the common case more
readable.
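A sketch of the refactored helper as described (the CPUID lookup and the
vendor-check helpers are assumptions about KVM internals, used here only
for illustration):

    static bool guest_cpuid_is_amd_or_hygon(struct kvm_vcpu *vcpu)
    {
            struct kvm_cpuid_entry2 *entry;

            /* Short circuit the !entry path so the common case reads cleanly. */
            entry = kvm_find_cpuid_entry(vcpu, 0);
            if (!entry)
                    return false;

            return is_guest_vendor_amd(entry->ebx, entry->ecx, entry->edx) ||
                   is_guest_vendor_hygon(entry->ebx, entry->ecx, entry->edx);
    }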
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Open code the is_guest_vendor_intel() check in string_registers_quirk() to
discourage making exact vendor==Intel checks in the emulator, and to
remove the rather awful #ifdeffery.
The string quirk is literally the only Intel specific, *non-architectural*
behavior that KVM emulates. All Intel specific behavior that is
architecturally defined applies to all vendors that are compatible with
Intel's architecture, i.e. should use guest_cpuid_is_intel_compatible().
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Emulate SYSENTER in Compatibility Mode for all vCPU models that are
compatible with Intel's architecture, as the behavior of SYSENTER is
architecturally defined in Intel's SDM, i.e. should be followed by any
CPU that implements Intel's architecture.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Emulate bits 63:32 of the SYSENTER_R{I,S}P MSRs for all vCPUs that are
compatible with Intel's architecture, not just strictly vCPUs that have
vendor==Intel. The behavior of bits 63:32 is architecturally defined in
the SDM, i.e. not some uarch specific quirk of Intel CPUs.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Use guest_cpuid_is_intel_compatible() to determine whether SYSCALL in
32-bit Protected Mode (including Compatibility Mode) should #UD or succeed.
The existing code already does the exact equivalent of
guest_cpuid_is_intel_compatible(), just in a rather roundabout way.
No functional change intended.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|