aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2024-06-15Merge tag 'x86-urgent-2024-06-15' of ↵Linus Torvalds3-5/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix the 8 bytes get_user() logic on x86-32 - Fix build bug that creates weird & mistaken target directory under arch/x86/ * tag 'x86-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Don't add the EFI stub to targets, again x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking
2024-06-15efi/x86: Free EFI memory map only when installing a new one.Ard Biesheuvel2-2/+11
The logic in __efi_memmap_init() is shared between two different execution flows: - mapping the EFI memory map early or late into the kernel VA space, so that its entries can be accessed; - the x86 specific cloning of the EFI memory map in order to insert new entries that are created as a result of making a memory reservation via a call to efi_mem_reserve(). In the former case, the underlying memory containing the kernel's view of the EFI memory map (which may be heavily modified by the kernel itself on x86) is not modified at all, and the only thing that changes is the virtual mapping of this memory, which is different between early and late boot. In the latter case, an entirely new allocation is created that carries a new, updated version of the kernel's view of the EFI memory map. When installing this new version, the old version will no longer be referenced, and if the memory was allocated by the kernel, it will leak unless it gets freed. The logic that implements this freeing currently lives on the code path that is shared between these two use cases, but it should only apply to the latter. So move it to the correct spot. While at it, drop the dummy definition for non-x86 architectures, as that is no longer needed. Cc: <[email protected]> Fixes: f0ef6523475f ("efi: Fix efi_memmap_alloc() leaks") Tested-by: Ashish Kalra <[email protected]> Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Ard Biesheuvel <[email protected]>
2024-06-14KVM: x86/mmu: Avoid reacquiring RCU if TDP MMU fails to allocate an SPDavid Matlack1-7/+5
Avoid needlessly reacquiring the RCU read lock if the TDP MMU fails to allocate a shadow page during eager page splitting. Opportunistically drop the local variable ret as well now that it's no longer necessary. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: David Matlack <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-14KVM: x86/mmu: Unnest TDP MMU helpers that allocate SPs for eager splittingDavid Matlack1-30/+18
Move the implementation of tdp_mmu_alloc_sp_for_split() to its one and only caller to reduce unnecessary nesting and make it more clear why the eager split loop continues after allocating a new SP. Opportunistically drop the double-underscores from __tdp_mmu_alloc_sp_for_split() now that its parent is gone. No functional change intended. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: David Matlack <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-14KVM: x86/mmu: Hard code GFP flags for TDP MMU eager split allocationsDavid Matlack1-6/+4
Now that the GFP_NOWAIT case is gone, hard code GFP_KERNEL_ACCOUNT when allocating shadow pages during eager page splitting in the TDP MMU. Opportunistically replace use of __GFP_ZERO with allocations that zero to improve readability. No functional change intended. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: David Matlack <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-14KVM: x86/mmu: Always drop mmu_lock to allocate TDP MMU SPs for eager splittingDavid Matlack1-15/+1
Always drop mmu_lock to allocate shadow pages in the TDP MMU when doing eager page splitting. Dropping mmu_lock during eager page splitting is cheap since KVM does not have to flush remote TLBs, and avoids stalling vCPU threads that are taking page faults while KVM is eager splitting under mmu_lock held for write. This change reduces 20%+ dips in MySQL throughput during live migration in a 160 vCPU VM while userspace is issuing CLEAR_DIRTY_LOG ioctls (tested with 1GiB and 8GiB CLEARs). Userspace could issue finer-grained CLEARs, which would also reduce contention on mmu_lock, but doing so will increase the rate of remote TLB flushing, since KVM must flush TLBs before returning from CLEAR_DITY_LOG. When there isn't contention on mmu_lock[1], this change does not regress the time it takes to perform eager page splitting (the cost of releasing and re-acquiring an uncontended lock is minimal on x86). [1] Tested with dirty_log_perf_test, which does not run vCPUs during eager page splitting, and with a 16 vCPU VM Live Migration with manual-protect disabled (where mmu_lock is held for read). Cc: Bibo Mao <[email protected]> Cc: Sean Christopherson <[email protected]> Signed-off-by: David Matlack <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-14KVM: x86/mmu: Rephrase comment about synthetic PFERR flags in #PF handlerSean Christopherson1-1/+4
Reword the BUILD_BUG_ON() comment in the legacy #PF handler to explicitly describe how asserting that synthetic PFERR flags are limited to bits 31:0 protects KVM against inadvertently passing a synthetic flag to the common page fault handler. No functional change intended. Suggested-by: Xiaoyao Li <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-14x86/boot: Remove unused function __fortify_panic()Nikolay Borisov1-5/+0
That function is only used when the kernel is compiled with FORTIFY_SOURCE and when the kernel proper string.h header is used. The decompressor code doesn't use the kernel proper header but has local copy which doesn't contain any fortified implementations of the various string functions. As such __fortify_panic() can never be called from the decompressor so remove it. Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-13Merge tag 'fixes-2024-06-13' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fixes from Mike Rapoport: "Fix validation of NUMA coverage. memblock_validate_numa_coverage() was checking for a unset node ID using NUMA_NO_NODE, but x86 used MAX_NUMNODES when no node ID was specified by buggy firmware. Update memblock to substitute MAX_NUMNODES with NUMA_NO_NODE in memblock_set_node() and use NUMA_NO_NODE in x86::numa_init()" * tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node() memblock: make memblock_set_node() also warn about use of MAX_NUMNODES
2024-06-13x86/CPU/AMD: Always inline amd_clear_divider()Mateusz Guzik2-12/+11
The routine is used on syscall exit and on non-AMD CPUs is guaranteed to be empty. It probably does not need to be a function call even on CPUs which do need the mitigation. [ bp: Make sure it is always inlined so that noinstr marking works. ] Signed-off-by: Mateusz Guzik <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-13x86/boot: Don't add the EFI stub to targets, againBenjamin Segall1-2/+2
This is a re-commit of da05b143a308 ("x86/boot: Don't add the EFI stub to targets") after the tagged patch incorrectly reverted it. vmlinux-objs-y is added to targets, with an assumption that they are all relative to $(obj); adding a $(objtree)/drivers/... path causes the build to incorrectly create a useless arch/x86/boot/compressed/drivers/... directory tree. Fix this just by using a different make variable for the EFI stub. Fixes: cb8bda8ad443 ("x86/boot/compressed: Rename efi_thunk_64.S to efi-mixed.S") Signed-off-by: Ben Segall <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Cc: [email protected] # v6.1+ Link: https://lore.kernel.org/r/[email protected]
2024-06-12x86/amd_nb: Enhance SMN access error checkingYazen Ghannam2-7/+41
AMD Zen-based systems use a System Management Network (SMN) that provides access to implementation-specific registers. SMN accesses are done indirectly through an index/data pair in PCI config space. The accesses can fail for a variety of reasons. Include code comments to describe some possible scenarios. Require error checking for callers of amd_smn_read() and amd_smn_write(). This is needed because many error conditions cannot be checked by these functions. [ bp: Touchup comment. ] Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-12x86, arm: Add missing license tag to syscall tables filesMarcin Juszkiewicz2-0/+2
syscall*.tbl files were added to make it easier to check which system calls are supported on each architecture and to check for their numbers. Arm and x86 files lack Linux-syscall-note license exception present in files for all other architectures. Signed-off-by: Marcin Juszkiewicz <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2024-06-12uprobe: Add uretprobe syscall to speed up return probeJiri Olsa3-0/+124
Adding uretprobe syscall instead of trap to speed up return probe. At the moment the uretprobe setup/path is: - install entry uprobe - when the uprobe is hit, it overwrites probed function's return address on stack with address of the trampoline that contains breakpoint instruction - the breakpoint trap code handles the uretprobe consumers execution and jumps back to original return address This patch replaces the above trampoline's breakpoint instruction with new ureprobe syscall call. This syscall does exactly the same job as the trap with some more extra work: - syscall trampoline must save original value for rax/r11/rcx registers on stack - rax is set to syscall number and r11/rcx are changed and used by syscall instruction - the syscall code reads the original values of those registers and restore those values in task's pt_regs area - only caller from trampoline exposed in '[uprobes]' is allowed, the process will receive SIGILL signal otherwise Even with some extra work, using the uretprobes syscall shows speed improvement (compared to using standard breakpoint): On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz) current: uretprobe-nop : 1.498 ± 0.000M/s uretprobe-push : 1.448 ± 0.001M/s uretprobe-ret : 0.816 ± 0.001M/s with the fix: uretprobe-nop : 1.969 ± 0.002M/s < 31% speed up uretprobe-push : 1.910 ± 0.000M/s < 31% speed up uretprobe-ret : 0.934 ± 0.000M/s < 14% speed up On Amd (AMD Ryzen 7 5700U) current: uretprobe-nop : 0.778 ± 0.001M/s uretprobe-push : 0.744 ± 0.001M/s uretprobe-ret : 0.540 ± 0.001M/s with the fix: uretprobe-nop : 0.860 ± 0.001M/s < 10% speed up uretprobe-push : 0.818 ± 0.001M/s < 10% speed up uretprobe-ret : 0.578 ± 0.000M/s < 7% speed up The performance test spawns a thread that runs loop which triggers uprobe with attached bpf program that increments the counter that gets printed in results above. The uprobe (and uretprobe) kind is determined by which instruction is being patched with breakpoint instruction. That's also important for uretprobes, because uprobe is installed for each uretprobe. The performance test is part of bpf selftests: tools/testing/selftests/bpf/run_bench_uprobes.sh Note at the moment uretprobe syscall is supported only for native 64-bit process, compat process still uses standard breakpoint. Note that when shadow stack is enabled the uretprobe syscall returns via iret, which is slower than return via sysret, but won't cause the shadow stack violation. Link: https://lore.kernel.org/all/[email protected]/ Suggested-by: Andrii Nakryiko <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2024-06-12uprobe: Wire up uretprobe system callJiri Olsa1-0/+1
Wiring up uretprobe system call, which comes in following changes. We need to do the wiring before, because the uretprobe implementation needs the syscall number. Note at the moment uretprobe syscall is supported only for native 64-bit process. Link: https://lore.kernel.org/all/[email protected]/ Reviewed-by: Oleg Nesterov <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2024-06-12x86/shstk: Make return uprobe work with shadow stackJiri Olsa3-1/+19
Currently the application with enabled shadow stack will crash if it sets up return uprobe. The reason is the uretprobe kernel code changes the user space task's stack, but does not update shadow stack accordingly. Adding new functions to update values on shadow stack and using them in uprobe code to keep shadow stack in sync with uretprobe changes to user stack. Link: https://lore.kernel.org/all/[email protected]/ Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Rick Edgecombe <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Fixes: 488af8ea7131 ("x86/shstk: Wire in shadow stack interface") Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2024-06-11x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checkingKees Cook2-3/+7
When reworking the range checking for get_user(), the get_user_8() case on 32-bit wasn't zeroing the high register. (The jump to bad_get_user_8 was accidentally dropped.) Restore the correct error handling destination (and rename the jump to using the expected ".L" prefix). While here, switch to using a named argument ("size") for the call template ("%c4" to "%c[size]") as already used in the other call templates in this file. Found after moving the usercopy selftests to KUnit: # usercopy_test_invalid: EXPECTATION FAILED at lib/usercopy_kunit.c:278 Expected val_u64 == 0, but val_u64 == -60129542144 (0xfffffff200000000) Closes: https://lore.kernel.org/all/CABVgOSn=tb=Lj9SxHuT4_9MTjjKVxsq-ikdXC4kGHO4CfKVmGQ@mail.gmail.com Fixes: b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()") Reported-by: David Gow <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Qiuxu Zhuo <[email protected]> Tested-by: David Gow <[email protected]> Link: https://lore.kernel.org/all/20240610210213.work.143-kees%40kernel.org
2024-06-11KVM: x86: Drop now-superflous setting of l1tf_flush_l1d in vcpu_run()Sean Christopherson2-4/+4
Now that KVM unconditionally sets l1tf_flush_l1d in kvm_arch_vcpu_load(), drop the redundant store from vcpu_run(). The flag is cleared only when VM-Enter is imminent, deep below vcpu_run(), i.e. barring a KVM bug, it's impossible for l1tf_flush_l1d to be cleared between loading the vCPU and calling vcpu_run(). Acked-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-11KVM: x86: Unconditionally set l1tf_flush_l1d during vCPU loadSean Christopherson1-6/+5
Always set l1tf_flush_l1d during kvm_arch_vcpu_load() instead of setting it only when the vCPU is being scheduled back in. The flag is processed only when VM-Enter is imminent, and KVM obviously needs to load the vCPU before VM-Enter, so attempting to precisely set l1tf_flush_l1d provides no meaningful value. I.e. the flag _will_ be set either way, it's simply a matter of when. Acked-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-11KVM: Delete the now unused kvm_arch_sched_in()Sean Christopherson2-8/+3
Delete kvm_arch_sched_in() now that all implementations are nops. Reviewed-by: Bibo Mao <[email protected]> Acked-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-11KVM: x86: Fold kvm_arch_sched_in() into kvm_arch_vcpu_load()Sean Christopherson7-27/+16
Fold the guts of kvm_arch_sched_in() into kvm_arch_vcpu_load(), keying off the recently added kvm_vcpu.scheduled_out as appropriate. Note, there is a very slight functional change, as PLE shrink updates will now happen after blasting WBINVD, but that is quite uninteresting as the two operations do not interact in any way. Acked-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-11KVM: VMX: Move PLE grow/shrink helpers above vmx_vcpu_load()Sean Christopherson1-32/+32
Move VMX's {grow,shrink}_ple_window() above vmx_vcpu_load() in preparation of moving the sched_in logic, which handles shrinking the PLE window, into vmx_vcpu_load(). No functional change intended. Acked-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-11KVM: x86: Don't re-setup empty IRQ routing when KVM_CAP_SPLIT_IRQCHIPYi Wang3-11/+0
Now that KVM sets up empty IRQ routing during VM creation, don't recreate empty routing during KVM_CAP_SPLIT_IRQCHIP. Setting IRQ routes during KVM_CAP_SPLIT_IRQCHIP can result in 20+ milliseconds of delay due to the synchronize_srcu_expedited() call in kvm_set_irq_routing(). Note, the empty routing is guaranteed to be intact as KVM x86 only allows changing the IRQ routing after an in-kernel IRQCHIP has been created, and KVM_CAP_SPLIT_IRQCHIP is disallowed after creating an IRQCHIP. Signed-off-by: Yi Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: massage changelog, remove unused empty_routing array] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-11x86/cpufeatures: Add AMD FAST CPPC feature flagPerry Yuan2-0/+2
Some AMD Zen 4 processors support a new feature FAST CPPC which allows for a faster CPPC loop due to internal architectural enhancements. The goal of this faster loop is higher performance at the same power consumption. Reference: See the page 99 of PPR for AMD Family 19h Model 61h rev.B1, docID 56713 Signed-off-by: Perry Yuan <[email protected]> Signed-off-by: Xiaojian Du <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]>
2024-06-11KVM: x86/pmu: Add a helper to enable bits in FIXED_CTR_CTRLSean Christopherson1-10/+12
Add a helper, intel_pmu_enable_fixed_counter_bits(), to dedup code that enables fixed counter bits, i.e. when KVM clears bits in the reserved mask used to detect invalid MSR_CORE_PERF_FIXED_CTR_CTRL values. No functional change intended. Cc: Dapeng Mi <[email protected]> Reviewed-by: Dapeng Mi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-11x86/alternative: Replace the old macrosBorislav Petkov (AMD)1-121/+32
Now that the new macros have been gradually put in place, replace the old ones. Leave the new label numbers starting at 7xx as a hint that the new nested alternatives are being used now. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11x86/alternative: Convert the asm ALTERNATIVE_3() macroBorislav Petkov (AMD)1-25/+0
Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11x86/alternative: Convert the asm ALTERNATIVE_2() macroBorislav Petkov (AMD)1-22/+0
Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11x86/alternative: Convert the asm ALTERNATIVE() macroBorislav Petkov (AMD)1-21/+1
Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11KVM: x86: Add KVM_RUN_X86_GUEST_MODE kvm_run flagThomas Prescher2-0/+4
When a vCPU is interrupted by a signal while running a nested guest, KVM will exit to userspace with L2 state. However, userspace has no way to know whether it sees L1 or L2 state (besides calling KVM_GET_STATS_FD, which does not have a stable ABI). This causes multiple problems: The simplest one is L2 state corruption when userspace marks the sregs as dirty. See this mailing list thread [1] for a complete discussion. Another problem is that if userspace decides to continue by emulating instructions, it will unknowingly emulate with L2 state as if L1 doesn't exist, which can be considered a weird guest escape. Introduce a new flag, KVM_RUN_X86_GUEST_MODE, in the kvm_run data structure, which is set when the vCPU exited while running a nested guest. Also introduce a new capability, KVM_CAP_X86_GUEST_MODE, to advertise the functionality to userspace. [1] https://lore.kernel.org/kvm/[email protected]/T/#m280aadcb2e10ae02c191a7dc4ed4b711a74b1f55 Signed-off-by: Thomas Prescher <[email protected]> Signed-off-by: Julian Stecklina <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-11x86/alternative: Convert ALTERNATIVE_3()Borislav Petkov (AMD)2-29/+9
Zap the hack of using an ALTERNATIVE_3() internal label, as suggested by bgerst: https://lore.kernel.org/r/CAMzpN2i4oJ-Dv0qO46Fd-DxNv5z9=x%2BvO%[email protected] in favor of a label local to this macro only, as it should be done. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11x86/alternative: Convert ALTERNATIVE_TERNARY()Borislav Petkov (AMD)1-6/+0
The C macro. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11x86/alternative: Convert alternative_call_2()Borislav Petkov (AMD)1-15/+6
Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11x86/alternative: Convert alternative_call()Borislav Petkov (AMD)1-5/+1
Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11x86/alternative: Convert alternative_io()Borislav Petkov (AMD)1-5/+1
Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11x86/alternative: Convert alternative_input()Borislav Petkov (AMD)1-5/+1
Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11x86/alternative: Convert alternative_2()Borislav Petkov (AMD)1-3/+0
Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11function_graph: Everyone uses HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, remove itSteven Rostedt (Google)1-2/+0
All architectures that implement function graph also implements HAVE_FUNCTION_GRAPH_RET_ADDR_PTR. Remove it, as it is no longer a differentiator. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Guo Ren <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-06-11x86/alternative: Convert alternative()Borislav Petkov (AMD)1-4/+1
Split conversion deliberately into minimal pieces to ease bisection because debugging alternatives is a nightmare. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11x86/alternatives: Add nested alternatives macrosPeter Zijlstra2-4/+136
Instead of making increasingly complicated ALTERNATIVE_n() implementations, use a nested alternative expression. The only difference between: ALTERNATIVE_2(oldinst, newinst1, flag1, newinst2, flag2) and ALTERNATIVE(ALTERNATIVE(oldinst, newinst1, flag1), newinst2, flag2) is that the outer alternative can add additional padding when the inner alternative is the shorter one, which then results in alt_instr::instrlen being inconsistent. However, this is easily remedied since the alt_instr entries will be consecutive and it is trivial to compute the max(alt_instr::instrlen) at runtime while patching. Specifically, after this the ALTERNATIVE_2 macro, after CPP expansion (and manual layout), looks like this: .macro ALTERNATIVE_2 oldinstr, newinstr1, ft_flags1, newinstr2, ft_flags2 740: 740: \oldinstr ; 741: .skip -(((744f-743f)-(741b-740b)) > 0) * ((744f-743f)-(741b-740b)),0x90 ; 742: .pushsection .altinstructions,"a" ; altinstr_entry 740b,743f,\ft_flags1,742b-740b,744f-743f ; .popsection ; .pushsection .altinstr_replacement,"ax" ; 743: \newinstr1 ; 744: .popsection ; ; 741: .skip -(((744f-743f)-(741b-740b)) > 0) * ((744f-743f)-(741b-740b)),0x90 ; 742: .pushsection .altinstructions,"a" ; altinstr_entry 740b,743f,\ft_flags2,742b-740b,744f-743f ; .popsection ; .pushsection .altinstr_replacement,"ax" ; 743: \newinstr2 ; 744: .popsection ; .endm The only label that is ambiguous is 740, however they all reference the same spot, so that doesn't matter. NOTE: obviously only @oldinstr may be an alternative; making @newinstr an alternative would mean patching .altinstr_replacement which very likely isn't what is intended, also the labels will be confused in that case. [ bp: Debug an issue where it would match the wrong two insns and and consider them nested due to the same signed offsets in the .alternative section and use instr_va() to compare the full virtual addresses instead. - Use new labels to denote that the new, nested alternatives are being used when staring at preprocessed output. - Use the %c constraint everywhere instead of %P and document the difference for future reference. ] Signed-off-by: Peter Zijlstra <[email protected]> Co-developed-by: Borislav Petkov (AMD) <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-06-11x86/alternative: Zap alternative_ternary()Borislav Petkov (AMD)1-3/+0
Unused. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-11perf/x86: Serialize set_attr_rdpmc()Thomas Gleixner1-0/+3
Yue and Xingwei reported a jump label failure. It's caused by the lack of serialization in set_attr_rdpmc(): CPU0 CPU1 Assume: x86_pmu.attr_rdpmc == 0 if (val != x86_pmu.attr_rdpmc) { if (val == 0) ... else if (x86_pmu.attr_rdpmc == 0) static_branch_dec(&rdpmc_never_available_key); if (val != x86_pmu.attr_rdpmc) { if (val == 0) ... else if (x86_pmu.attr_rdpmc == 0) FAIL, due to imbalance ---> static_branch_dec(&rdpmc_never_available_key); The reported BUG() is a consequence of the above and of another bug in the jump label core code. The core code needs a separate fix, but that cannot prevent the imbalance problem caused by set_attr_rdpmc(). Prevent this by serializing set_attr_rdpmc() locally. Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks") Closes: https://lore.kernel.org/r/CAEkJfYNzfW1vG=ZTMdz_Weoo=RXY1NDunbxnDaLyj8R4kEoE_w@mail.gmail.com Reported-by: Yue Sun <[email protected]> Reported-by: Xingwei Lee <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-06-11x86/sev: Use kernel provided SVSM Calling AreasTom Lendacky6-39/+362
The SVSM Calling Area (CA) is used to communicate between Linux and the SVSM. Since the firmware supplied CA for the BSP is likely to be in reserved memory, switch off that CA to a kernel provided CA so that access and use of the CA is available during boot. The CA switch is done using the SVSM core protocol SVSM_CORE_REMAP_CA call. An SVSM call is executed by filling out the SVSM CA and setting the proper register state as documented by the SVSM protocol. The SVSM is invoked by by requesting the hypervisor to run VMPL0. Once it is safe to allocate/reserve memory, allocate a CA for each CPU. After allocating the new CAs, the BSP will switch from the boot CA to the per-CPU CA. The CA for an AP is identified to the SVSM when creating the VMSA in preparation for booting the AP. [ bp: Heavily simplify svsm_issue_call() asm, other touchups. ] Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/fa8021130bcc3bcf14d722a25548cb0cdf325456.1717600736.git.thomas.lendacky@amd.com
2024-06-11x86/sev: Check for the presence of an SVSM in the SNP secrets pageTom Lendacky5-9/+132
During early boot phases, check for the presence of an SVSM when running as an SEV-SNP guest. An SVSM is present if not running at VMPL0 and the 64-bit value at offset 0x148 into the secrets page is non-zero. If an SVSM is present, save the SVSM Calling Area address (CAA), located at offset 0x150 into the secrets page, and set the VMPL level of the guest, which should be non-zero, to indicate the presence of an SVSM. [ bp: Touchups. ] Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/9d3fe161be93d4ea60f43c2a3f2c311fe708b63b.1717600736.git.thomas.lendacky@amd.com
2024-06-11x86/irqflags: Provide native versions of the local_irq_save()/restore()Tom Lendacky1-0/+20
Functions that need to disable IRQs, but are common to both early boot and post-boot execution, are unable to deal with paravirt support associated with local_irq_save() and local_irq_restore(). Create native versions of these for use in these situations. Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/c4c33c0d07200164a3dd8cfd6da0344f57732648.1717600736.git.thomas.lendacky@amd.com
2024-06-10KVM: x86: Bury guest_cpuid_is_amd_or_hygon() in cpuid.cSean Christopherson2-10/+12
Move guest_cpuid_is_amd_or_hygon() into cpuid.c now that, except for one Intel quirk in the emulator, KVM checks for AMD vs. Intel *compatible* vCPUs, not exact vendors, i.e. now that there should not be any reason for KVM at-large to care about the exact vendor. Opportunistically refactor the guts of the helper to use "entry" instead of "best", and short circuit the !entry path to make the common case more readable. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-10KVM: x86: Open code vendor_intel() in string_registers_quirk()Sean Christopherson1-12/+8
Open code the is_guest_vendor_intel() check in string_registers_quirk() to discourage makiking exact vendor==Intel checks in the emulator, and to remove the rather awful #ifdeffery. The string quirk is literally the only Intel specific, *non-architectural* behavior that KVM emulates. All Intel specific behavior that is architecturally defined applies to all vendors that are compatible with Intel's architecture, i.e. should use guest_cpuid_is_intel_compatible(). Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-10KVM: x86: Allow SYSENTER in Compatibility Mode for all Intel compat vCPUsSean Christopherson1-4/+6
Emulate SYSENTER in Compatibility Mode for all vCPUs models that are compatible with Intel's architecture, as the behavior if SYSENTER is architecturally defined in Intel's SDM, i.e. should be followed by any CPU that implements Intel's architecture. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-10KVM: SVM: Emulate SYSENTER RIP/RSP behavior for all Intel compat vCPUsSean Christopherson2-15/+7
Emulate bits 63:32 of the SYSENTER_R{I,S}P MSRs for all vCPUs that are compatible with Intel's architecture, not just strictly vCPUs that have vendor==Intel. The behavior of bits 63:32 is architecturally defined in the SDM, i.e. not some uarch specific quirk of Intel CPUs. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-10KVM: x86: Use "is Intel compatible" helper to emulate SYSCALL in !64-bitSean Christopherson3-36/+16
Use guest_cpuid_is_intel_compatible() to determine whether SYSCALL in 32-bit Protected Mode (including Compatibility Mode) should #UD or succeed. The existing code already does the exact equivalent of guest_cpuid_is_intel_compatible(), just in a rather roundabout way. No functional change intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>