path: root/arch/x86/include/asm/processor.h
Age | Commit message | Author | Files | Lines
2024-09-17Merge tag 'x86-fred-2024-09-17' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FRED updates from Thomas Gleixner: - Enable FRED right after init_mem_mapping() because at that point the early IDT fault handler is replaced by the real fault handler. The real fault handler retrieves the faulting address from the stack frame and not from CR2 when the FRED feature is set. But that obviously only works when FRED is enabled in the CPU as well. - Set SS to __KERNEL_DS when enabling FRED to prevent a corner case where ERETS can observe a SS mismatch and raises a #GP. * tag 'x86-fred-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry: Set FRED RSP0 on return to userspace instead of context switch x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanism x86/entry: Test ti_work for zero before processing individual bits x86/fred: Set SS to __KERNEL_DS when enabling FRED x86/fred: Enable FRED right after init_mem_mapping() x86/fred: Move FRED RSP initialization into a separate function x86/fred: Parse cmdline param "fred=" in cpu_parse_early_param()
2024-09-11x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()Mario Limonciello1-3/+0
The function name is ambiguous because it returns an intermediate value for calculating maximum frequency rather than the CPPC 'Highest Perf' register. Rename the function to clarify its use and allow the function to return errors. Adjust the consumer in acpi-cpufreq to catch errors. Reviewed-by: Gautham R. Shenoy <[email protected]> Signed-off-by: Mario Limonciello <[email protected]>
2024-08-13x86/fred: Enable FRED right after init_mem_mapping()Xin Li (Intel)1-1/+2
On 64-bit init_mem_mapping() relies on the minimal page fault handler provided by the early IDT mechanism. The real page fault handler is installed right afterwards into the IDT. This is problematic on CPUs which have X86_FEATURE_FRED set because the real page fault handler retrieves the faulting address from the FRED exception stack frame and not from CR2, but that does obviously not work when FRED is not yet enabled in the CPU. To prevent this enable FRED right after init_mem_mapping() without interrupt stacks. Those are enabled later in trap_init() after the CPU entry area is set up. [ tglx: Encapsulate the FRED details ] Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code") Reported-by: Hou Wenlong <[email protected]> Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-06-13x86/CPU/AMD: Always inline amd_clear_divider()Mateusz Guzik1-1/+11
The routine is used on syscall exit and on non-AMD CPUs is guaranteed to be empty. It probably does not need to be a function call even on CPUs which do need the mitigation. [ bp: Make sure it is always inlined so that noinstr marking works. ] Signed-off-by: Mateusz Guzik <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-05-13Merge tag 'x86-cpu-2024-05-13' of ↵Linus Torvalds1-3/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: - Rework the x86 CPU vendor/family/model code: introduce the 'VFM' value that is an 8+8+8 bit concatenation of the vendor/family/model value, and add macros that work on VFM values. This simplifies the addition of new Intel models & families, and simplifies existing enumeration & quirk code. - Add support for the AMD 0x80000026 leaf, to better parse topology information - Optimize the NUMA allocation layout of more per-CPU data structures - Improve the workaround for AMD erratum 1386 - Clear TME from /proc/cpuinfo as well, when disabled by the firmware - Improve x86 self-tests - Extend the mce_record tracepoint with the ::ppin and ::microcode fields - Implement recovery for MCE errors in TDX/SEAM non-root mode - Misc cleanups and fixes * tag 'x86-cpu-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/mm: Switch to new Intel CPU model defines x86/tsc_msr: Switch to new Intel CPU model defines x86/tsc: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines x86/resctrl: Switch to new Intel CPU model defines x86/microcode/intel: Switch to new Intel CPU model defines x86/mce: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines x86/cpu/intel_epb: Switch to new Intel CPU model defines x86/aperfmperf: Switch to new Intel CPU model defines x86/apic: Switch to new Intel CPU model defines perf/x86/msr: Switch to new Intel CPU model defines perf/x86/intel/uncore: Switch to new Intel CPU model defines perf/x86/intel/pt: Switch to new Intel CPU model defines perf/x86/lbr: Switch to new Intel CPU model defines perf/x86/intel/cstate: Switch to new Intel CPU model defines x86/bugs: Switch to new Intel CPU model defines x86/bugs: Switch to new Intel CPU model defines x86/cpu/vfm: Update arch/x86/include/asm/intel-family.h x86/cpu/vfm: Add new macros to work with (vendor/family/model) values ...
2024-05-13Merge tag 'x86-boot-2024-05-13' of ↵Linus Torvalds1-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: - Move the kernel cmdline setup earlier in the boot process (again), to address a split_lock_detect= boot parameter bug - Ignore relocations in .notes sections - Simplify boot stack setup - Re-introduce a bootloader quirk wrt CR4 handling - Miscellaneous cleanups & fixes * tag 'x86-boot-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/64: Clear most of CR4 in startup_64(), except PAE, MCE and LA57 x86/boot: Move kernel cmdline setup earlier in the boot process (again) x86/build: Clean up arch/x86/tools/relocs.c a bit x86/boot: Ignore relocations in .notes sections in walk_relocs() too x86: Rename __{start,end}_init_task to __{start,end}_init_stack x86/boot: Simplify boot stack setup
2024-05-13Merge tag 'x86-asm-2024-05-13' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: - Clean up & fix asm() operand modifiers & constraints - Misc cleanups * tag 'x86-asm-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives: Remove a superfluous newline in _static_cpu_has() x86/asm/64: Clean up memset16(), memset32(), memset64() assembly constraints in <asm/string_64.h> x86/asm: Use "m" operand constraint in WRUSSQ asm template x86/asm: Use %a instead of %P operand modifier in asm templates x86/asm: Use %c/%n instead of %P operand modifier in asm templates x86/asm: Remove %P operand modifier from altinstr asm templates
2024-05-01x86/mm: Remove broken vsyscall emulation code from the page fault codeLinus Torvalds1-1/+0
The syzbot-reported stack trace from hell in this discussion thread actually has three nested page faults: https://lore.kernel.org/r/[email protected] ... and I think that's actually the important thing here: - the first page fault is from user space, and triggers the vsyscall emulation. - the second page fault is from __do_sys_gettimeofday(), and that should just have caused the exception that then sets the return value to -EFAULT - the third nested page fault is due to _raw_spin_unlock_irqrestore() -> preempt_schedule() -> trace_sched_switch(), which then causes a BPF trace program to run, which does that bpf_probe_read_compat(), which causes that page fault under pagefault_disable(). It's quite the nasty backtrace, and there's a lot going on. The problem is literally the vsyscall emulation, which sets current->thread.sig_on_uaccess_err = 1; and that causes the fixup_exception() code to send the signal *despite* the exception being caught. And I think that is in fact completely bogus. It's completely bogus exactly because it sends that signal even when it *shouldn't* be sent - like for the BPF user mode trace gathering. In other words, I think the whole "sig_on_uaccess_err" thing is entirely broken, because it makes any nested page-faults do all the wrong things. Now, arguably, I don't think anybody should enable vsyscall emulation any more, but this test case clearly does. I think we should just make the "send SIGSEGV" be something that the vsyscall emulation does on its own, not this broken per-thread state for something that isn't actually per thread. The x86 page fault code actually tried to deal with the "incorrect nesting" by having that: if (in_interrupt()) return; which ignores the sig_on_uaccess_err case when it happens in interrupts, but as shown by this example, these nested page faults do not need to be about interrupts at all. IOW, I think the only right thing is to remove that horrendously broken code. 
The attached patch looks like the ObviouslyCorrect(tm) thing to do. NOTE! This broken code goes back to this commit in 2011: 4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults") ... and back then the reason was to get all the siginfo details right. Honestly, I do not for a moment believe that it's worth getting the siginfo details right here, but part of the commit says: This fixes issues with UML when vsyscall=emulate. ... and so my patch to remove this garbage will probably break UML in this situation. I do not believe that anybody should be running with vsyscall=emulate in 2024 in the first place, much less if you are doing things like UML. But let's see if somebody screams. Reported-and-tested-by: [email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Tested-by: Jiri Olsa <[email protected]> Acked-by: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com
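The failure mode described in this commit message can be modelled in a few lines of user-space C. This is a toy simulation under stated assumptions, not kernel code: `struct thread`, `fixup_exception_old()`, `fixup_exception_new()` and `emulate_vsyscall_with_nested_fault()` are hypothetical stand-ins, and the "signal" is only a counter.

```c
#include <stdbool.h>

/* Toy model of per-thread state; not the kernel's thread structure. */
struct thread {
    bool sig_on_uaccess_err;  /* set by vsyscall emulation in the old code */
    int signals_sent;         /* counts signals "delivered" to the thread */
};

/* Old behaviour: a caught fault still raises a signal if the flag is set. */
static void fixup_exception_old(struct thread *t)
{
    if (t->sig_on_uaccess_err)
        t->signals_sent++;
}

/* New behaviour: a caught fault is fixed up quietly; no per-thread flag. */
static void fixup_exception_new(struct thread *t)
{
    (void)t;
}

/*
 * Simulate vsyscall emulation during which an unrelated nested access
 * (for example a tracing probe) faults and is caught by the fixup code.
 * Returns the number of signals the thread received.
 */
static int emulate_vsyscall_with_nested_fault(bool old_code)
{
    struct thread t = { false, 0 };

    if (old_code)
        t.sig_on_uaccess_err = true;    /* emulation begins */

    /* Nested, unrelated fault that is expected to fail quietly. */
    if (old_code)
        fixup_exception_old(&t);
    else
        fixup_exception_new(&t);

    t.sig_on_uaccess_err = false;       /* emulation ends */
    return t.signals_sent;
}
```

In this model the old path delivers one spurious signal for the nested fault, while the new path delivers none, which is the behaviour the commit argues for.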
2024-04-22x86/cpu/vfm: Add/initialize x86_vfm field to struct cpuinfo_x86Tony Luck1-3/+17
Refactor struct cpuinfo_x86 so that the vendor, family, and model fields are overlaid in a union with a 32-bit field that combines all three (together with a one byte reserved field in the upper byte). This will make it easy, cheap, and reliable to check all three values at once. See https://lore.kernel.org/r/Zgr6kT8oULbnmEXx@agluck-desk3 for why the ordering is (low-to-high bits): (vendor, family, model) [ bp: Move comments over the line, add the backstory about the particular order of the fields. ] Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
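A minimal sketch of the overlay idea, assuming a stand-alone `struct cpu_id` rather than the real `struct cpuinfo_x86`. The type, the field names and the helpers `make_cpu_id()` and `same_cpu()` are hypothetical, and the order of the byte fields inside the struct is an assumption of this sketch; the equality check shown does not depend on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the relevant part of the CPU info structure. */
struct cpu_id {
    union {
        struct {
            uint8_t model;
            uint8_t family;
            uint8_t vendor;
            uint8_t reserved;   /* kept zero so comparisons are exact */
        };
        uint32_t vfm;           /* all three identifiers in one 32-bit word */
    };
};

static struct cpu_id make_cpu_id(uint8_t vendor, uint8_t family, uint8_t model)
{
    struct cpu_id id;

    id.vfm = 0;                 /* clears the reserved byte as well */
    id.vendor = vendor;
    id.family = family;
    id.model = model;
    return id;
}

/* One integer comparison replaces three separate field comparisons. */
static bool same_cpu(struct cpu_id a, struct cpu_id b)
{
    return a.vfm == b.vfm;
}
```

The single comparison in `same_cpu()` is the "check all three values at once" property the commit message describes.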
2024-03-21x86/boot: Simplify boot stack setupBrian Gerst1-4/+2
Define the symbol __top_init_kernel_stack instead of duplicating the offset from __end_init_task in multiple places. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Kees Cook <[email protected]> Cc: Uros Bizjak <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andy Lutomirski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-19x86/asm: Remove %P operand modifier from altinstr asm templatesUros Bizjak1-3/+3
The "P" asm operand modifier is a x86 target-specific modifier. For x86_64, when used with a symbol reference, the "%P" modifier emits "sym" instead of "sym(%rip)". This property is currently used to prevent %RIP-relative addressing in .altinstr sections. %RIP-relative addresses are nowadays correctly handled in .altinstr sections, so remove %P operand modifier from altinstr asm templates. Also note that unlike GCC, clang emits %rip-relative symbol reference with "P" asm operand modifier, so the patch also unifies symbol handling with both compilers. No functional changes intended. Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-11Merge tag 'x86-core-2024-03-11' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: - The biggest change is the rework of the percpu code, to support the 'Named Address Spaces' GCC feature, by Uros Bizjak: - This allows C code to access GS and FS segment relative memory via variables declared with such attributes, which allows the compiler to better optimize those accesses than the previous inline assembly code. - The series also includes a number of micro-optimizations for various percpu access methods, plus a number of cleanups of %gs accesses in assembly code. - These changes have been exposed to linux-next testing for the last ~5 months, with no known regressions in this area. - Fix/clean up __switch_to()'s broken but accidentally working handling of FPU switching - which also generates better code - Propagate more RIP-relative addressing in assembly code, to generate slightly better code - Rework the CPU mitigations Kconfig space to be less idiosyncratic, to make it easier for distros to follow & maintain these options - Rework the x86 idle code to cure RCU violations and to clean up the logic - Clean up the vDSO Makefile logic - Misc cleanups and fixes * tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) x86/idle: Select idle routine only once x86/idle: Let prefer_mwait_c1_over_halt() return bool x86/idle: Cleanup idle_setup() x86/idle: Clean up idle selection x86/idle: Sanitize X86_BUG_AMD_E400 handling sched/idle: Conditionally handle tick broadcast in default_idle_call() x86: Increase brk randomness entropy for 64-bit systems x86/vdso: Move vDSO to mmap region x86/vdso/kbuild: Group non-standard build attributes and primary object file rules together x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o x86/retpoline: Ensure default return thunk isn't used at runtime x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32 x86/vdso: Use $(addprefix ) instead of $(foreach ) x86/vdso: Simplify obj-y addition 
x86/vdso: Consolidate targets and clean-files x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK x86/bugs: Rename CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY => CONFIG_MITIGATION_IBRS_ENTRY x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY x86/bugs: Rename CONFIG_SLS => CONFIG_MITIGATION_SLS ...
2024-03-11Merge tag 'x86-cleanups-2024-03-11' of ↵Linus Torvalds1-28/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups, including a large series from Thomas Gleixner to cure sparse warnings" * tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/nmi: Drop unused declaration of proc_nmi_enabled() x86/callthunks: Use EXPORT_PER_CPU_SYMBOL_GPL() for per CPU variables x86/cpu: Provide a declaration for itlb_multihit_kvm_mitigation x86/cpu: Use EXPORT_PER_CPU_SYMBOL_GPL() for x86_spec_ctrl_current x86/uaccess: Add missing __force to casts in __access_ok() and valid_user_address() x86/percpu: Cure per CPU madness on UP smp: Consolidate smp_prepare_boot_cpu() x86/msr: Add missing __percpu annotations x86/msr: Prepare for including <linux/percpu.h> into <asm/msr.h> perf/x86/amd/uncore: Fix __percpu annotation x86/nmi: Remove an unnecessary IS_ENABLED(CONFIG_SMP) x86/apm_32: Remove dead function apm_get_battery_status() x86/insn-eval: Fix function param name in get_eff_addr_sib()
2024-03-11Merge tag 'x86-fred-2024-03-10' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 FRED support from Thomas Gleixner:

"Support for x86 Fast Return and Event Delivery (FRED).

FRED is a replacement for IDT event delivery on x86 and addresses most of the technical nightmares which IDT exposes:

 1) Exception cause registers like CR2 need to be manually preserved in nested exception scenarios.

 2) Hardware interrupt stack switching is suboptimal for nested exceptions as the interrupt stack mechanism rewinds the stack on each entry which requires a massive effort in the low level entry of #NMI code to handle this.

 3) No hardware distinction between entry from kernel or from user which makes establishing kernel context more complex than it needs to be especially for unconditionally nestable exceptions like NMI.

 4) NMI nesting caused by IRET unconditionally reenabling NMIs, which is a problem when the perf NMI takes a fault when collecting a stack trace.

 5) Partial restore of ESP when returning to a 16-bit segment

 6) Limitation of the vector space which can cause vector exhaustion on large systems.

 7) Inability to differentiate NMI sources

FRED addresses these shortcomings by:

 1) An extended exception stack frame which the CPU uses to save exception cause registers. This ensures that the meta information for each exception is preserved on stack and avoids the extra complexity of preserving it in software.

 2) Hardware interrupt stack switching is non-rewinding if a nested exception uses the current interrupt stack.

 3) The entry points for kernel and user context are separate and GS BASE handling which is required to establish kernel context for per CPU variable access is done in hardware.

 4) NMIs are now nesting protected. They are only reenabled on the return from NMI.

 5) FRED guarantees full restore of ESP

 6) FRED does not put a limitation on the vector space by design because it uses central entry points for kernel and user space and the CPU stores the entry type (exception, trap, interrupt, syscall) on the entry stack along with the vector number. The entry code has to demultiplex this information, but this removes the vector space restriction. The first hardware implementations will still have the current restricted vector space because lifting this limitation requires further changes to the local APIC.

 7) FRED stores the vector number and meta information on stack which allows having more than one NMI vector in future hardware when the required local APIC changes are in place.

The series implements the initial FRED support by:

 - Reworking the existing entry and IDT handling infrastructure to accommodate the alternative entry mechanism.

 - Expanding the stack frame to accommodate the extra 16 bytes FRED requires to store context and meta information

 - Providing FRED specific C entry points for events which have information pushed to the extended stack frame, e.g. #PF and #DB.

 - Providing FRED specific C entry points for #NMI and #MCE

 - Implementing the FRED specific ASM entry points and the C code to demultiplex the events

 - Providing detection and initialization mechanisms and the necessary tweaks in context switching, GS BASE handling etc.

The FRED integration aims for maximum code reuse vs the existing IDT implementation to the extent possible and the deviation in hot paths like context switching are handled with alternatives to minimize the impact. The low level entry and exit paths are separate due to the extended stack frame and the hardware based GS BASE switching and therefore have no impact on IDT based systems.

It has been extensively tested on existing systems and on the FRED simulation and as of now there are no outstanding problems"

* tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/fred: Fix init_task thread stack pointer initialization
  MAINTAINERS: Add a maintainer entry for FRED
  x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly
  x86/fred: Invoke FRED initialization code to enable FRED
  x86/fred: Add FRED initialization functions
  x86/syscall: Split IDT syscall setup code into idt_syscall_init()
  KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling
  x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI
  x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code
  x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user
  x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled
  x86/traps: Add sysvec_install() to install a system interrupt handler
  x86/fred: FRED entry/exit and dispatch code
  x86/fred: Add a machine check entry stub for FRED
  x86/fred: Add a NMI entry stub for FRED
  x86/fred: Add a debug fault entry stub for FRED
  x86/idtentry: Incorporate definitions/declarations of the FRED entries
  x86/fred: Make exc_page_fault() work for FRED
  x86/fred: Allow single-step trap and NMI when starting a new task
  x86/fred: No ESPFIX needed when FRED is enabled
  ...
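The demultiplexing step mentioned in the pull message (a central entry point that dispatches on the event type saved by the CPU) might look roughly like the sketch below. All names here (`enum event_type`, `struct event_info`, `enum handler`, `demux_event()`) are hypothetical, and the real encoding of the saved frame is defined by the FRED specification, not by this example.

```c
#include <stdint.h>

/* Hypothetical event classes, following the four named in the text. */
enum event_type { EV_EXCEPTION, EV_TRAP, EV_INTERRUPT, EV_SYSCALL };

/* Hypothetical slice of the meta information saved on the entry stack. */
struct event_info {
    enum event_type type;
    uint32_t vector;    /* wider than 8 bits: no 256-vector limit here */
};

enum handler { H_EXCEPTION, H_TRAP, H_IRQ, H_SYSCALL, H_UNKNOWN };

/* Single entry point: software picks the handler from the saved type. */
static enum handler demux_event(const struct event_info *ev)
{
    switch (ev->type) {
    case EV_EXCEPTION:
        return H_EXCEPTION;
    case EV_TRAP:
        return H_TRAP;
    case EV_INTERRUPT:
        return H_IRQ;       /* ev->vector selects the interrupt source */
    case EV_SYSCALL:
        return H_SYSCALL;
    }
    return H_UNKNOWN;       /* corrupted or unexpected type value */
}
```

Because the type and vector travel in the saved frame instead of being implied by which IDT slot fired, the vector field in this sketch can exceed 255 without any change to the dispatch logic.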
2024-03-07x86/fred: Fix init_task thread stack pointer initializationXin Li (Intel)1-2/+4
As TOP_OF_KERNEL_STACK_PADDING was defined as 0 on x86_64, it went unnoticed that the initialization of the .sp field in INIT_THREAD and some calculations in the low level startup code do not take the padding into account. FRED enabled kernels require a 16 byte padding, which means that the init task initialization and the low level startup code use the wrong stack offset. Subtract TOP_OF_KERNEL_STACK_PADDING in all affected places to adjust for this. Fixes: 65c9cc9e2c14 ("x86/fred: Reserve space for the FRED stack frame") Fixes: 3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit") Reported-by: kernel test robot <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Link: https://lore.kernel.org/r/[email protected]
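The offset bug can be illustrated with plain arithmetic. This is a sketch under stated assumptions: `THREAD_SIZE_BYTES` and `initial_sp()` are hypothetical names, and the 16 KiB stack size is chosen only for the example.

```c
#include <stdint.h>

#define THREAD_SIZE_BYTES 16384u    /* assumed stack size for this example */

/* Initial stack pointer: top of the stack area minus the reserved padding. */
static uintptr_t initial_sp(uintptr_t stack_base, unsigned int padding)
{
    return stack_base + THREAD_SIZE_BYTES - padding;
}
```

With a padding of 0 the subtraction is invisible, which is why the missing term went unnoticed; with the 16 bytes that FRED requires, a computation that omits it points 16 bytes too high.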
2024-03-04x86/idle: Select idle routine only onceThomas Gleixner1-1/+1
The idle routine selection is done on every CPU bringup operation and has a guard in place which is effective after the first invocation, which is a pointless exercise. Invoke it once on the boot CPU and mark the related functions __init. The guard check has to stay as xen_set_default_idle() runs early. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/87edcu6vaq.ffs@tglx
2024-03-04x86/percpu: Cure per CPU madness on UPThomas Gleixner1-5/+0
On UP builds Sparse complains rightfully about accesses to cpu_info with per CPU accessors: cacheinfo.c:282:30: sparse: warning: incorrect type in initializer (different address spaces) cacheinfo.c:282:30: sparse: expected void const [noderef] __percpu *__vpp_verify cacheinfo.c:282:30: sparse: got unsigned int * The reason is that on UP builds cpu_info which is a per CPU variable on SMP is mapped to boot_cpu_info which is a regular variable. There is a hideous accessor cpu_data() which tries to hide this, but it's not sufficient as some places require raw accessors and it generates worse code than the regular per CPU accessors. Waste sizeof(struct x86_cpuinfo) memory on UP and provide the per CPU cpu_info unconditionally. This requires to update the CPU info on the boot CPU as SMP does. (Ab)use the weakly defined smp_prepare_boot_cpu() function and implement exactly that. This allows to use regular per CPU accessors unconditionally and paves the way to remove the cpu_data() hackery. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-04x86/msr: Add missing __percpu annotationsThomas Gleixner1-1/+0
Sparse rightfully complains about using a plain pointer for per CPU accessors: msr-smp.c:15:23: sparse: warning: incorrect type in initializer (different address spaces) msr-smp.c:15:23: sparse: expected void const [noderef] __percpu *__vpp_verify msr-smp.c:15:23: sparse: got struct msr * Add __percpu annotations to the related datastructure and function arguments to cure this. This also cures the related sparse warnings at the callsites in drivers/edac/amd64_edac.c. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-04x86/msr: Prepare for including <linux/percpu.h> into <asm/msr.h>Thomas Gleixner1-22/+0
To clean up the per CPU insanity of UP which causes sparse to be rightfully unhappy and prevents the usage of the generic per CPU accessors on cpu_info it is necessary to include <linux/percpu.h> into <asm/msr.h>. Including <linux/percpu.h> into <asm/msr.h> is impossible because it ends up in header dependency hell. The problem is that <asm/processor.h> includes <asm/msr.h>. The inclusion of <linux/percpu.h> results in a compile fail where the compiler can no longer handle an include in <asm/cpufeature.h> which references boot_cpu_data which is defined in <asm/processor.h>. The only reason why <asm/msr.h> is included in <asm/processor.h> are the set/get_debugctlmsr() inlines. They are defined there because <asm/processor.h> is such a nice dumping ground for everything. In fact they belong obviously into <asm/debugreg.h>. Move them to <asm/debugreg.h> and fix up the resulting damage which is just exposing the reliance on random include chains. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-16x86/cpu/topology: Get rid of cpuinfo::x86_max_coresThomas Gleixner1-2/+0
Now that __num_cores_per_package and __num_threads_per_package are available, cpuinfo::x86_max_cores and the related math all over the place can be replaced with the ready to consume data. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15x86/cpu: Remove x86_coreid_bitsThomas Gleixner1-2/+0
No more users. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Zhang Rui <[email protected]> Tested-by: Wang Wendy <[email protected]> Tested-by: K Prateek Nayak <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15x86/cpu: Use common topology code for AMDThomas Gleixner1-2/+0
Switch it over to the new topology evaluation mechanism and remove the random bits and pieces which are sprinkled all over the place. No functional change intended. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Zhang Rui <[email protected]> Tested-by: Wang Wendy <[email protected]> Tested-by: K Prateek Nayak <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-15x86/cpu/amd: Provide a separate accessor for Node IDThomas Gleixner1-0/+3
AMD (ab)uses topology_die_id() to store the Node ID information and topology_max_dies_per_pkg to store the number of nodes per package. This collides with the proper processor die level enumeration which is coming on AMD with CPUID 8000_0026, unless there is a correlation between the two. There is zero documentation about that. So provide new storage and new accessors which for now still access die_id and topology_max_die_per_pkg(). Will be mopped up after AMD and HYGON are converted over. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Zhang Rui <[email protected]> Tested-by: Wang Wendy <[email protected]> Tested-by: K Prateek Nayak <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-14Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the ↵Ingo Molnar1-26/+57
branch Conflicts: arch/x86/include/asm/percpu.h arch/x86/include/asm/text-patching.h Signed-off-by: Ingo Molnar <[email protected]>
2023-12-11x86/percpu: Fix "const_pcpu_hot" version generation failureUros Bizjak1-1/+1
Version generation for "const_pcpu_hot" symbol failed because genksyms doesn't know the __seg_gs keyword. Effectively revert commit 4604c052b84d ("x86/percpu: Declare const_pcpu_hot as extern const variable") and use this_cpu_read_const() instead to avoid "sparse: dereference of noderef expression" warning when reading const_pcpu_hot. Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-11-13x86/barrier: Do not serialize MSR accesses on AMDBorislav Petkov (AMD)1-0/+18
AMD does not have the requirement for a synchronization barrier when accessing a certain group of MSRs. Do not incur that unnecessary penalty there.

There will be a CPUID bit which explicitly states that a MFENCE is not needed. Once that bit is added to the APM, this will be extended with it.

While at it, move to processor.h to avoid include hell. Untangling that file properly is a matter for another day.

Some notes on the performance aspect of why this is relevant, courtesy of Kishon VijayAbraham <[email protected]>:

On an AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM shows x2AVIC IPI rate is 3% to 4% lower than AVIC IPI rate. The ipi-bench is modified so that the IPIs are sent between two vCPUs in the same CCX. This also requires to pin the vCPU to a physical core to prevent any latencies. This simulates the use case of pinning vCPUs to the thread of a single CCX to avoid interrupt IPI latency.

In order to avoid run-to-run variance (for both x2AVIC and AVIC), the below configurations are done:

 1) Disable Power States in BIOS (to prevent the system from going to lower power state)

 2) Run the system at fixed frequency 2500MHz (to prevent the system from increasing the frequency when the load is more)

With the above configuration:

*) Performance measured using ipi-bench for AVIC:
   Average Latency: 1124.98ns [Time to send IPI from one vCPU to another vCPU]
   Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]

*) Performance measured using ipi-bench for x2AVIC:
   Average Latency: 1172.42ns [Time to send IPI from one vCPU to another vCPU]
   Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]

From above, x2AVIC latency is ~4% more than AVIC. However, the expectation is x2AVIC performance to be better or equivalent to AVIC. Upon analyzing the perf captures, it is observed significant time is spent in weak_wrmsr_fence() invoked by x2apic_send_IPI().

With the fix to skip weak_wrmsr_fence()

*) Performance measured using ipi-bench for x2AVIC:
   Average Latency: 1117.44ns [Time to send IPI from one vCPU to another vCPU]
   Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously]

Comparing the performance of x2AVIC with and without the fix, it can be seen the performance improves by ~4%.

Performance captured using an unmodified ipi-bench using the 'mesh-ipi' option with and without weak_wrmsr_fence() on a Zen4 system also showed significant performance improvement without weak_wrmsr_fence(). The 'mesh-ipi' option ignores CCX or CCD and just picks random vCPU.

   Average throughput (10 iterations) with weak_wrmsr_fence(), Cumulative throughput: 4933374 IPI/s
   Average throughput (10 iterations) without weak_wrmsr_fence(), Cumulative throughput: 6355156 IPI/s

[1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench

Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
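The percentages quoted in the benchmark notes can be reproduced from the raw figures with a one-line helper. `percent_change()` is a hypothetical name introduced for this sketch; the inputs are the numbers reported in the commit message.

```c
/* Relative change of `after` versus `before`, in percent. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Applied to the reported values: `percent_change(1124.98, 1172.42)` is about +4.2 (x2AVIC latency versus AVIC before the fix), `percent_change(1172.42, 1117.44)` is about -4.7 (x2AVIC latency with the fix versus without), and `percent_change(4933374, 6355156)` is about +28.8 (mesh-ipi throughput without the fence versus with it).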
2023-10-30Merge tag 'x86-core-2023-10-29-v2' of ↵Linus Torvalds1-15/+38
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Thomas Gleixner: - Limit the hardcoded topology quirk for Hygon CPUs to those which have a model ID less than 4. The newer models have the topology CPUID leaf 0xB correctly implemented and are not affected. - Make SMT control more robust against enumeration failures SMT control was added to allow controlling SMT at boottime or runtime. The primary purpose was to provide a simple mechanism to disable SMT in the light of speculation attack vectors. It turned out that the code is sensible to enumeration failures and worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration which means the primary thread mask is not set up correctly. By chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes the hotplug control stay at its default value "enabled". So the mask is never evaluated. The ongoing rework of the topology evaluation caused XEN/PV to end up with smp_num_siblings == 1, which sets the SMT control to "not supported" and the empty primary thread mask causes the hotplug core to deny the bringup of the APS. Make the decision logic more robust and take 'not supported' and 'not implemented' into account for the decision whether a CPU should be booted or not. - Fake primary thread mask for XEN/PV Pretend that all XEN/PV vCPUs are primary threads, which makes the usage of the primary thread mask valid on XEN/PV. That is consistent with because all of the topology information on XEN/PV is fake or even non-existent. - Encapsulate topology information in cpuinfo_x86 Move the randomly scattered topology data into a separate data structure for readability and as a preparatory step for the topology evaluation overhaul. - Consolidate APIC ID data type to u32 It's fixed width hardware data and not randomly u16, int, unsigned long or whatever developers decided to use. - Cure the abuse of cpuinfo for persisting logical IDs. 
Per CPU cpuinfo is used to persist the logical package and die IDs. That's really not the right place simply because cpuinfo is subject to be reinitialized when a CPU goes through an offline/online cycle. Use separate per CPU data for the persisting to enable the further topology management rework. It will be removed once the new topology management is in place. - Provide a debug interface for inspecting topology information: Useful in general and extremely helpful for validating the topology management rework in terms of correctness or "bug" compatibility. * tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too x86/cpu: Provide debug interface x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids x86/apic: Use u32 for wakeup_secondary_cpu[_64]() x86/apic: Use u32 for [gs]et_apic_id() x86/apic: Use u32 for phys_pkg_id() x86/apic: Use u32 for cpu_present_to_apicid() x86/apic: Use u32 for check_apicid_used() x86/apic: Use u32 for APIC IDs in global data x86/apic: Use BAD_APICID consistently x86/cpu: Move cpu_l[l2]c_id into topology info x86/cpu: Move logical package and die IDs into topology info x86/cpu: Remove pointless evaluation of x86_coreid_bits x86/cpu: Move cu_id into topology info x86/cpu: Move cpu_core_id into topology info hwmon: (fam15h_power) Use topology_core_id() scsi: lpfc: Use topology_core_id() x86/cpu: Move cpu_die_id into topology info x86/cpu: Move phys_proc_id into topology info x86/cpu: Encapsulate topology information in cpuinfo_x86 ...
2023-10-30Merge tag 'x86-mm-2023-10-28' of ↵Linus Torvalds1-8/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm handling updates from Ingo Molnar: - Add new NX-stack self-test - Improve NUMA partial-CFMWS handling - Fix #VC handler bugs resulting in SEV-SNP boot failures - Drop the 4MB memory size restriction on minimal NUMA nodes - Reorganize headers a bit, in preparation to header dependency reduction efforts - Misc cleanups & fixes * tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size selftests/x86/lam: Zero out buffer for readlink() x86/sev: Drop unneeded #include x86/sev: Move sev_setup_arch() to mem_encrypt.c x86/tdx: Replace deprecated strncpy() with strtomem_pad() selftests/x86/mm: Add new test that userspace stack is in fact NX x86/sev: Make boot_ghcb_page[] static x86/boot: Move x86_cache_alignment initialization to correct spot x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach x86/sev-es: Allow copy_from_kernel_nofault() in earlier boot x86_64: Show CR4.PSE on auxiliaries like on BSP x86/iommu/docs: Update AMD IOMMU specification document URL x86/sev/docs: Update document URL in amd-memory-encryption.rst x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions from <asm/processor.h> to <asm/pgtable.h> ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window x86/numa: Introduce numa_fill_memblks()
2023-10-24x86/percpu: Return correct variable from current_top_of_stack()Uros Bizjak1-1/+1
current_top_of_stack() should return the variable from the _seg_gs qualified named address space when CONFIG_USE_X86_SEG_SUPPORT=y is enabled. Fixes: ed2f752e0e0a ("x86/percpu: Introduce const-qualified const_pcpu_hot to micro-optimize code generation") Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: Andy Lutomirski <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Sean Christopherson <[email protected]>
2023-10-23x86/percpu: Introduce const-qualified const_pcpu_hot to micro-optimize code ↵Uros Bizjak1-0/+3
generation Some variables in pcpu_hot, currently current_task and top_of_stack are actually per-thread variables implemented as per-CPU variables and thus stable for the duration of the respective task. There is already an attempt to eliminate redundant reads from these variables using this_cpu_read_stable() asm macro, which hides the dependency on the read memory address. However, the compiler has limited ability to eliminate asm common subexpressions, so this approach results in a limited success. The solution is to allow more aggressive elimination by aliasing pcpu_hot into a const-qualified const_pcpu_hot, and to read stable per-CPU variables from this constant copy. The current per-CPU infrastructure does not support reads from const-qualified variables. However, when the compiler supports segment qualifiers, it is possible to declare the const-aliased variable in the relevant named address space. The compiler considers access to the variable, declared in this way, as a read from a constant location, and will optimize reads from the variable accordingly. By implementing constant-qualified const_pcpu_hot, the compiler can eliminate redundant reads from the constant variables, reducing the number of loads from current_task from 3766 to 3217 on a test build, a -14.6% reduction. The reduction of loads translates to the following code savings: text data bss dec hex filename 25,477,353 4389456 808452 30675261 1d4113d vmlinux-old.o 25,476,074 4389440 808452 30673966 1d40c2e vmlinux-new.o representing a code size reduction of -1279 bytes. [ mingo: Updated the changelog, EXPORT(const_pcpu_hot). ] Co-developed-by: Nadav Amit <[email protected]> Signed-off-by: Nadav Amit <[email protected]> Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
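The figures quoted in this changelog can be re-derived with a few lines of C. This is only a sanity check of the arithmetic, not kernel code; the helper names `percent_change()` and `size_delta()` are hypothetical and exist purely for illustration.

```c
/* Numbers copied from the changelog above. */
enum {
    LOADS_OLD = 3766,      /* loads from current_task before the change */
    LOADS_NEW = 3217,      /* loads from current_task after the change */
    TEXT_OLD  = 25477353,  /* text size of vmlinux-old.o in bytes */
    TEXT_NEW  = 25476074,  /* text size of vmlinux-new.o in bytes */
};

/* Hypothetical helper: relative change in percent (negative means reduction). */
static double percent_change(long before, long after)
{
    return 100.0 * (double)(after - before) / (double)before;
}

/* Hypothetical helper: absolute size delta in bytes (negative means smaller). */
static long size_delta(long before, long after)
{
    return after - before;
}
```

With these inputs, `percent_change(LOADS_OLD, LOADS_NEW)` rounds to the quoted -14.6%, and `size_delta(TEXT_OLD, TEXT_NEW)` gives the quoted -1279 bytes of text.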
2023-10-10x86/apic: Use u32 for APIC IDs in global dataThomas Gleixner1-2/+2
APIC IDs are used with random data types u16, u32, int, unsigned int, unsigned long. Make it all consistently use u32 because that reflects the hardware register width and fixup the most obvious usage sites of that. The APIC callbacks will be addressed separately. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Reviewed-by: Arjan van de Ven <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
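As an illustration of why a single fixed-width type helps, the sketch below shows one hazard of mixing widths: a 32-bit ID stored in a narrower variable is silently truncated. The changelog itself only cites the hardware register width as the reason for the change; the example value and the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical example value: an ID that does not fit into 16 bits. */
#define EXAMPLE_APIC_ID UINT32_C(0x00010001)

/* Storing the ID in a 16-bit variable drops the upper half. */
static uint16_t store_as_u16(uint32_t apic_id)
{
    return (uint16_t)apic_id;
}

/* Storing the ID in a 32-bit variable preserves every bit. */
static uint32_t store_as_u32(uint32_t apic_id)
{
    return apic_id;
}
```

Here the 16-bit copy of `EXAMPLE_APIC_ID` collapses to 1, which would alias a different CPU, while the `u32` copy keeps the full value.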
2023-10-10x86/cpu: Move cpu_l[l2]c_id into topology infoThomas Gleixner1-1/+13
The topology IDs which identify the LLC and L2 domains clearly belong to the per CPU topology information. Move them into cpuinfo_x86::cpuinfo_topo and get rid of the extra per CPU data and the related exports. This also paves the way to do proper topology evaluation during early boot because it removes the only per CPU dependency for that. No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Reviewed-by: Arjan van de Ven <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-10x86/cpu: Move logical package and die IDs into topology infoThomas Gleixner1-4/+4
Yet another topology related data pair. Rename logical_proc_id to logical_pkg_id so it fits the common naming conventions. No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-10x86/cpu: Move cu_id into topology infoThomas Gleixner1-1/+3
No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-10x86/cpu: Move cpu_core_id into topology infoThomas Gleixner1-1/+3
Rename it to core_id and stick it to the other ID fields. No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-10x86/cpu: Move cpu_die_id into topology infoThomas Gleixner1-1/+3
Move the next member. No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-10x86/cpu: Move phys_proc_id into topology infoThomas Gleixner1-2/+3
Rename it to pkg_id which is the terminology used in the kernel. No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-10x86/cpu: Encapsulate topology information in cpuinfo_x86Thomas Gleixner1-5/+9
The topology related information is randomly scattered across cpuinfo_x86. Create a new structure cpuinfo_topo and move in a first step initial_apicid and apicid into it. Aside from being more readable, this is in preparation for replacing the horribly fragile CPU topology evaluation code further down the road. Consolidate APIC ID fields to u32 as that represents the hardware type. No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Zhang Rui <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
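A rough sketch of where this commit and the follow-up moves in this series end up may help when reading the log. It is a simplified, hypothetical layout and not the kernel's actual definition: the struct names carry a `_sketch` suffix, the embedded member name `topo`, the field order, and the choice of `u32` for every field are assumptions, and `family` merely stands in for the non-topology members.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Simplified stand-in for the topology sub-structure described above. */
struct cpuinfo_topo_sketch {
    u32 apicid;          /* moved in by this commit */
    u32 initial_apicid;  /* moved in by this commit */
    u32 pkg_id;          /* formerly phys_proc_id */
    u32 die_id;          /* formerly cpu_die_id */
    u32 cu_id;
    u32 core_id;         /* formerly cpu_core_id */
    u32 logical_pkg_id;  /* formerly logical_proc_id */
    u32 logical_die_id;
    u32 llc_id;          /* formerly separate per CPU data */
    u32 l2c_id;          /* formerly separate per CPU data */
};

/* Simplified stand-in for cpuinfo_x86 with the topology data embedded. */
struct cpuinfo_sketch {
    unsigned char family;             /* placeholder for unrelated members */
    struct cpuinfo_topo_sketch topo;  /* member name is an assumption */
};

/* Accessor in the spirit of topology_core_id(); hypothetical helper. */
static u32 sketch_core_id(const struct cpuinfo_sketch *c)
{
    return c->topo.core_id;
}
```

Grouping the IDs this way means callers go through one sub-structure or an accessor instead of reaching for individually named fields spread across the larger structure.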
2023-10-05Merge tag 'v6.6-rc4' into x86/entry, to pick up fixesIngo Molnar1-2/+0
Signed-off-by: Ingo Molnar <[email protected]>
2023-09-22x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions ↵Ingo Molnar1-8/+0
from <asm/processor.h> to <asm/pgtable.h> <linux/mm.h> relies on these definitions being included first, which is true currently due to historic header spaghetti, but in the future <asm/processor.h> will not be guaranteed to be included by the MM code. Move these definitions over into a suitable MM header. This is a preparatory patch for x86 header dependency simplifications and reductions. Signed-off-by: Ingo Molnar <[email protected]> Cc: [email protected]
2023-09-19x86/srso: Set CPUID feature bits independently of bug or mitigation statusJosh Poimboeuf1-2/+0
Booting with mitigations=off incorrectly prevents the X86_FEATURE_{IBPB_BRTYPE,SBPB} CPUID bits from getting set. Also, future CPUs without X86_BUG_SRSO might still have IBPB with branch type prediction flushing, in which case SBPB should be used instead of IBPB. The current code doesn't allow for that. Also, cpu_has_ibpb_brtype_microcode() has some surprising side effects and the setting of these feature bits really doesn't belong in the mitigation code anyway. Move it to earlier. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/869a1709abfe13b673bdd10c2f4332ca253a40bc.1693889988.git.jpoimboe@kernel.org
2023-09-14x86/entry: Rename ignore_sysret()Nikolay Borisov1-1/+1
The SYSCALL instruction cannot really be disabled in compatibility mode. The best that can be done is to configure the CSTAR MSR to point to a minimal handler. Currently this handler has a rather misleading name, ignore_sysret(), as it's not really doing anything with sysret. Give it a more descriptive name. Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-31Merge tag 'x86_shstk_for_6.6-rc1' of ↵Linus Torvalds1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 shadow stack support from Dave Hansen: "This is the long awaited x86 shadow stack support, part of Intel's Control-flow Enforcement Technology (CET). CET consists of two related security features: shadow stacks and indirect branch tracking. This series implements just the shadow stack part of this feature, and just for userspace. The main use case for shadow stack is providing protection against return oriented programming attacks. It works by maintaining a secondary (shadow) stack using a special memory type that has protections against modification. When executing a CALL instruction, the processor pushes the return address to both the normal stack and to the special permission shadow stack. Upon RET, the processor pops the shadow stack copy and compares it to the normal stack copy. For more information, refer to the links below for the earlier versions of this patch set" Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/lkml/[email protected]/ * tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) x86/shstk: Change order of __user in type x86/ibt: Convert IBT selftest to asm x86/shstk: Don't retry vm_munmap() on -EINTR x86/kbuild: Fix Documentation/ reference x86/shstk: Move arch detail comment out of core mm x86/shstk: Add ARCH_SHSTK_STATUS x86/shstk: Add ARCH_SHSTK_UNLOCK x86: Add PTRACE interface for shadow stack selftests/x86: Add shadow stack test x86/cpufeatures: Enable CET CR4 bit for shadow stack x86/shstk: Wire in shadow stack interface x86: Expose thread features in /proc/$PID/status x86/shstk: Support WRSS for userspace x86/shstk: Introduce map_shadow_stack syscall x86/shstk: Check that signal frame is shadow stack mem x86/shstk: Check that SSP is aligned on sigreturn x86/shstk: Handle signals for shadow stack x86/shstk: Introduce routines modifying shstk x86/shstk: Handle thread shadow stack x86/shstk: Add 
user-mode shadow stack support ...
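The CALL/RET behaviour described in the pull message above can be modelled in plain C. This is a toy simulation under stated assumptions, not how the hardware or the kernel implements it: `struct toy_cpu`, `toy_call()` and `toy_ret()` are hypothetical names, the stacks are small fixed arrays, and a `false` return from `toy_ret()` stands in for the fault a real CPU would raise on a mismatch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DEPTH 16

struct toy_cpu {
    uintptr_t stack[TOY_DEPTH];   /* normal stack, writable by the program */
    uintptr_t shadow[TOY_DEPTH];  /* shadow stack, protected on real hardware */
    size_t sp;                    /* number of entries on the normal stack */
    size_t ssp;                   /* number of entries on the shadow stack */
};

/* CALL: push the return address onto both stacks. False if a stack is full. */
static bool toy_call(struct toy_cpu *cpu, uintptr_t ret_addr)
{
    if (cpu->sp >= TOY_DEPTH || cpu->ssp >= TOY_DEPTH)
        return false;
    cpu->stack[cpu->sp++] = ret_addr;
    cpu->shadow[cpu->ssp++] = ret_addr;
    return true;
}

/* RET: pop both copies and compare them. False models the resulting fault. */
static bool toy_ret(struct toy_cpu *cpu, uintptr_t *ret_addr)
{
    uintptr_t normal, shadow;

    if (cpu->sp == 0 || cpu->ssp == 0)
        return false;
    normal = cpu->stack[--cpu->sp];
    shadow = cpu->shadow[--cpu->ssp];
    if (normal != shadow)
        return false;  /* return address was tampered with */
    *ret_addr = normal;
    return true;
}
```

An attacker who overwrites `stack[]` (the classic return oriented programming setup) cannot also rewrite `shadow[]` on real hardware, so the comparison in `toy_ret()` fails and the corrupted return address is never used.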
2023-08-30Merge tag 'x86_apic_for_6.6-rc1' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Dave Hansen: "This includes a very thorough rework of the 'struct apic' handlers. Quite a variety of them popped up over the years, especially in the 32-bit days when odd apics were much more in vogue. The end result speaks for itself, which is a removal of a ton of code and static calls to replace indirect calls. If there's any breakage here, it's likely to be around the 32-bit museum pieces that get light to no testing these days. Summary: - Rework apic callbacks, getting rid of unnecessary ones and coalescing lots of silly duplicates. - Use static_calls() instead of indirect calls for apic->foo() - Tons of cleanups and crap removal along the way" * tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) x86/apic: Turn on static calls x86/apic: Provide static call infrastructure for APIC callbacks x86/apic: Wrap IPI calls into helper functions x86/apic: Mark all hotpath APIC callback wrappers __always_inline x86/xen/apic: Mark apic __ro_after_init x86/apic: Convert other overrides to apic_update_callback() x86/apic: Replace acpi_wake_cpu_handler_update() and apic_set_eoi_cb() x86/apic: Provide apic_update_callback() x86/xen/apic: Use standard apic driver mechanism for Xen PV x86/apic: Provide common init infrastructure x86/apic: Wrap apic->native_eoi() into a helper x86/apic: Nuke ack_APIC_irq() x86/apic: Remove pointless arguments from [native_]eoi_write() x86/apic/noop: Tidy up the code x86/apic: Remove pointless NULL initializations x86/apic: Sanitize APIC ID range validation x86/apic: Prepare x2APIC for using apic::max_apic_id x86/apic: Simplify X2APIC ID validation x86/apic: Add max_apic_id member x86/apic: Wrap APIC ID validation into an inline ...
2023-08-28Merge tag 'x86-cleanups-2023-08-28' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 cleanups from Ingo Molnar: "The following commit deserves special mention: 22dc02f81cddd Revert "sched/fair: Move unused stub functions to header" This is in x86/cleanups, because the revert is a re-application of a number of cleanups that got removed inadvertently" [ This also effectively undoes the amd_check_microcode() microcode declaration change I had done in my microcode loader merge in commit 42a7f6e3ffe0 ("Merge tag 'x86_microcode_for_v6.6_rc1' [...]"). I picked the declaration change by Arnd from this branch instead, which put it in <asm/processor.h> instead of <asm/microcode.h> like I had done in my merge resolution - Linus ] * tag 'x86-cleanups-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/uv: Refactor code using deprecated strncpy() interface to use strscpy() x86/hpet: Refactor code using deprecated strncpy() interface to use strscpy() x86/platform/uv: Refactor code using deprecated strcpy()/strncpy() interfaces to use strscpy() x86/qspinlock-paravirt: Fix missing-prototype warning x86/paravirt: Silence unused native_pv_lock_init() function warning x86/alternative: Add a __alt_reloc_selftest() prototype x86/purgatory: Include header for warn() declaration x86/asm: Avoid unneeded __div64_32 function definition Revert "sched/fair: Move unused stub functions to header" x86/apic: Hide unused safe_smp_processor_id() on 32-bit UP x86/cpu: Fix amd_check_microcode() declaration
2023-08-12locking: remove spin_lock_prefetchMateusz Guzik1-6/+0
The only remaining consumer is new_inode, where it showed up in 2001 as commit c37fa164f793 ("v2.4.9.9 -> v2.4.9.10") in a historical repo [1] with a changelog which does not mention it. Since then the line got only touched up to keep compiling. While it may have been of benefit back in the day, it is guaranteed to at best not get in the way in the multicore setting -- as the code performs *a lot* of work between the prefetch and actual lock acquire, any contention means the cacheline is already invalid by the time the routine calls spin_lock(). It adds spurious traffic, for short. On top of it prefetch is notoriously tricky to use for single-threaded purposes, making it questionable from the get go. As such, remove it. I admit upfront I did not see value in benchmarking this change, but I can do it if that is deemed appropriate. Removal from new_inode and of the entire thing are in the same patch as requested by Linus, so whatever weird looks can be directed at that guy. Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/fs/inode.c?id=c37fa164f793735b32aa3f53154ff1a7659e6442 [1] Signed-off-by: Mateusz Guzik <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2023-08-10x86: Move gds_ucode_mitigated() declaration to headerArnd Bergmann1-0/+2
The declaration got placed in the .c file of the caller, but that causes a warning for the definition: arch/x86/kernel/cpu/bugs.c:682:6: error: no previous prototype for 'gds_ucode_mitigated' [-Werror=missing-prototypes] Move it to a header where both sides can observe it instead. Fixes: 81ac7e5d74174 ("KVM: Add GDS_NO support to KVM") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Tested-by: Daniel Sneddon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/20230809130530.1913368-2-arnd%40kernel.org
2023-08-09x86/cpu: Make identify_boot_cpu() staticThomas Gleixner1-1/+0
It's no longer used outside the source file. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Michael Kelley <[email protected]> Tested-by: Sohil Mehta <[email protected]> Tested-by: Juergen Gross <[email protected]> # Xen PV (dom0 and unpriv. guest)
2023-08-09x86/CPU/AMD: Do not leak quotient data after a division by 0Borislav Petkov (AMD)1-0/+2
Under certain circumstances, an integer division by 0 which faults, can leave stale quotient data from a previous division operation on Zen1 microarchitectures. Do a dummy division 0/1 before returning from the #DE exception handler in order to avoid any leaks of potentially sensitive data. Signed-off-by: Borislav Petkov (AMD) <[email protected]> Cc: <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
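The idea of the dummy division can be sketched in portable C. This is an illustration only, under the assumption that a plain runtime division is enough to show the principle; the function name `scrub_divider_sketch()` is hypothetical, and the kernel's actual fix is expected to use inline assembly and to apply only on affected CPUs.

```c
#include <stdint.h>

/*
 * Perform a harmless 0/1 division so that the quotient left behind by the
 * divider is 0 rather than data from an earlier, unrelated division.
 * The operands are volatile so the compiler cannot fold the division away
 * at compile time and has to compute it at run time.
 */
static uint64_t scrub_divider_sketch(void)
{
    volatile uint64_t dividend = 0;
    volatile uint64_t divisor = 1;

    return dividend / divisor;
}
```

Calling such a helper at the end of the #DE handler is what the changelog describes: the last quotient the hardware computed is then a known, non-sensitive zero.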
2023-08-02x86/shstk: Add user-mode shadow stack supportRick Edgecombe1-0/+2
Introduce basic shadow stack enabling/disabling/allocation routines. A task's shadow stack is allocated from memory with VM_SHADOW_STACK flag and has a fixed size of min(RLIMIT_STACK, 4GB). Keep the task's shadow stack address and size in thread_struct. This will be copied when cloning new threads, but needs to be cleared during exec, so add a function to do this. 32 bit shadow stack is not expected to have many users and it will complicate the signal implementation. So do not support IA32 emulation or x32. Co-developed-by: Yu-cheng Yu <[email protected]> Signed-off-by: Yu-cheng Yu <[email protected]> Signed-off-by: Rick Edgecombe <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Tested-by: Pengfei Xu <[email protected]> Tested-by: John Allen <[email protected]> Tested-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/all/20230613001108.3040476-29-rick.p.edgecombe%40intel.com
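The sizing rule min(RLIMIT_STACK, 4GB) from this changelog can be written as a small pure function. This is a minimal sketch, assuming the limit is passed in as a byte count; `SZ_4G_SKETCH` and `default_shstk_size()` are hypothetical names, and any additional rounding the real code may apply (for example to page size) is left out.

```c
#include <stdint.h>

/* 4 GiB expressed in bytes; hypothetical constant for this sketch. */
#define SZ_4G_SKETCH (UINT64_C(4) << 30)

/*
 * Default shadow stack size: the task's stack rlimit, capped at 4 GiB.
 * An "unlimited" rlimit is represented here by a very large value and
 * therefore ends up at the cap.
 */
static uint64_t default_shstk_size(uint64_t rlimit_stack)
{
    return rlimit_stack < SZ_4G_SKETCH ? rlimit_stack : SZ_4G_SKETCH;
}
```

For a typical 8 MiB stack limit the function returns 8 MiB; for an unlimited stack it returns 4 GiB, so the shadow stack never grows past the cap.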