aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/include/asm
AgeCommit message (Collapse)AuthorFilesLines
2024-07-16Merge tag 'x86_cache_for_v6.11_rc1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Enable Sub-NUMA clustering to work with resource control on Intel by teaching resctrl to handle scopes due to the clustering which partitions the L3 cache into sets. Modify and extend the subsystem to handle such scopes properly * tag 'x86_cache_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Update documentation with Sub-NUMA cluster changes x86/resctrl: Detect Sub-NUMA Cluster (SNC) mode x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems x86/resctrl: Make __mon_event_count() handle sum domains x86/resctrl: Fill out rmid_read structure for smp_call*() to read a counter x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC) mode x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files x86/resctrl: Allocate a new field in union mon_data_bits x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function x86/resctrl: Initialize on-stack struct rmid_read instances x86/resctrl: Add a new field to struct rmid_read for summation of domains x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster (SNC) systems x86/resctrl: Introduce snc_nodes_per_l3_cache x86/resctrl: Add node-scope to the options for feature scope x86/resctrl: Split the rdt_domain and rdt_hw_domain structures x86/resctrl: Prepare for different scope for control/monitor operations x86/resctrl: Prepare to split rdt_domain structure x86/resctrl: Prepare for new domain scope
2024-07-16KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_opsWei Wang1-0/+1
Similar to kvm_x86_call(), kvm_pmu_call() is added to streamline the usage of static calls of kvm_pmu_ops, which improves code readability. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Wei Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-16KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_opsWei Wang1-4/+6
Introduces kvm_x86_call(), to streamline the usage of static calls of kvm_x86_ops. The current implementation of these calls is verbose and could lead to alignment challenges. This makes the code susceptible to exceeding the "80 columns per single line of code" limit as defined in the coding-style document. Another issue with the existing implementation is that the addition of kvm_x86_ prefix to hooks at the static_call sites hinders code readability and navigation. kvm_x86_call() is added to improve code readability and maintainability, while adhering to the coding style guidelines. Signed-off-by: Wei Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-16KVM: x86: Replace static_call_cond() with static_call()Wei Wang3-6/+4
The use of static_call_cond() is essentially the same as static_call() on x86 (e.g. static_call() now handles a NULL pointer as a NOP), so replace it with static_call() to simplify the code. Link: https://lore.kernel.org/all/3916caa1dcd114301a49beafa5030eca396745c1.1679456900.git.jpoimboe@kernel.org/ Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Wei Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-16Merge branch 'kvm-6.11-sev-attestation' into HEADPaolo Bonzini1-0/+48
The GHCB 2.0 specification defines 2 GHCB request types to allow SNP guests to send encrypted messages/requests to firmware: SNP Guest Requests and SNP Extended Guest Requests. These encrypted messages are used for things like servicing attestation requests issued by the guest. Implementing support for these is required to be fully GHCB-compliant. For the most part, KVM only needs to handle forwarding these requests to firmware (to be issued via the SNP_GUEST_REQUEST firmware command defined in the SEV-SNP Firmware ABI), and then forwarding the encrypted response to the guest. However, in the case of SNP Extended Guest Requests, the host is also able to provide the certificate data corresponding to the endorsement key used by firmware to sign attestation report requests. This certificate data is provided by userspace because: 1) It allows for different keys/key types to be used for each particular guest with requiring any sort of KVM API to configure the certificate table in advance on a per-guest basis. 2) It provides additional flexibility with how attestation requests might be handled during live migration where the certificate data for source/dest might be different. 3) It allows all synchronization between certificates and firmware/signing key updates to be handled purely by userspace rather than requiring some in-kernel mechanism to facilitate it. [1] To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data, but is still enough to provide compliance with the GHCB 2.0 spec.
2024-07-16x86/sev: Move sev_guest.h into common SEV headerMichael Roth1-0/+48
sev_guest.h currently contains various definitions relating to the format of SNP_GUEST_REQUEST commands to SNP firmware. Currently only the sev-guest driver makes use of them, but when the KVM side of this is implemented there's a need to parse the SNP_GUEST_REQUEST header to determine whether additional information needs to be provided to the guest. Prepare for this by moving those definitions to a common header that's shared by host/guest code so that KVM can also make use of them. Reviewed-by: Tom Lendacky <[email protected]> Reviewed-by: Liam Merwick <[email protected]> Signed-off-by: Michael Roth <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-07-16Merge tag 'kvm-x86-vmx-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini2-3/+1
KVM VMX changes for 6.11 - Remove an unnecessary EPT TLB flush when enabling hardware. - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1). - Misc cleanups
2024-07-16Merge tag 'kvm-x86-pmu-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-13/+17
KVM x86/pmu changes for 6.11 - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored. - Update to the newfangled Intel CPU FMS infrastructure. - Use macros instead of open-coded literals to clean up KVM's manipulation of FIXED_CTR_CTRL MSRs.
2024-07-16Merge tag 'kvm-x86-mtrrs-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-11/+4
KVM x86 MTRR virtualization removal Remove support for virtualizing MTRRs on Intel CPUs, along with a nasty CR0.CD hack, and instead always honor guest PAT on CPUs that support self-snoop.
2024-07-16Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-3/+21
KVM x86 misc changes for 6.11 - Add a global struct to consolidate tracking of host values, e.g. EFER, and move "shadow_phys_bits" into the structure as "maxphyaddr". - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX. - Print the name of the APICv/AVIC inhibits in the relevant tracepoint. - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor. - Misc cleanups
2024-07-16Merge tag 'kvm-x86-generic-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini2-3/+0
KVM generic changes for 6.11 - Enable halt poll shrinking by default, as Intel found it to be a clear win. - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86. - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out(). - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs. - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout. - A few minor cleanups
2024-07-15Merge tag 'x86_cpu_for_v6.11_rc1' of ↵Linus Torvalds5-525/+470
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu model updates from Borislav Petkov: - Flip the logic to add feature names to /proc/cpuinfo to having to explicitly specify the flag if there's a valid reason to show it in /proc/cpuinfo - Switch a bunch of Intel x86 model checking code to the new CPU model defines - Fixes and cleanups * tag 'x86_cpu_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/intel: Drop stray FAM6 check with new Intel CPU model defines x86/cpufeatures: Flip the /proc/cpuinfo appearance logic x86/CPU/AMD: Always inline amd_clear_divider() x86/mce/inject: Add missing MODULE_DESCRIPTION() line perf/x86/rapl: Switch to new Intel CPU model defines x86/boot: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines perf/x86/intel: Switch to new Intel CPU model defines x86/virt/tdx: Switch to new Intel CPU model defines x86/PCI: Switch to new Intel CPU model defines x86/cpu/intel: Switch to new Intel CPU model defines x86/platform/intel-mid: Switch to new Intel CPU model defines x86/pconfig: Remove unused MKTME pconfig code x86/cpu: Remove useless work in detect_tme_early()
2024-07-15Merge tag 'x86_vmware_for_v6.11_rc1' of ↵Linus Torvalds1-33/+303
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vmware updates from Borislav Petkov: - Add a unified VMware hypercall API layer which should be used by all callers instead of them doing homegrown solutions. This will provide for adding API support for confidential computing solutions like TDX * tag 'x86_vmware_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vmware: Add TDX hypercall support x86/vmware: Remove legacy VMWARE_HYPERCALL* macros x86/vmware: Correct macro names x86/vmware: Use VMware hypercall API drm/vmwgfx: Use VMware hypercall API input/vmmouse: Use VMware hypercall API ptp/vmware: Use VMware hypercall API x86/vmware: Introduce VMware hypercall API
2024-07-15Merge tag 'x86_misc_for_v6.11_rc1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Borislav Petkov: - Make error checking of AMD SMN accesses more robust in the callers as they're the only ones who can interpret the results properly - The usual cleanups and fixes, left and right * tag 'x86_misc_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kmsan: Fix hook for unaligned accesses x86/platform/iosf_mbi: Convert PCIBIOS_* return codes to errnos x86/pci/xen: Fix PCIBIOS_* return code handling x86/pci/intel_mid_pci: Fix PCIBIOS_* return code handling x86/of: Return consistent error type from x86_of_pci_irq_enable() hwmon: (k10temp) Rename _data variable hwmon: (k10temp) Remove unused HAVE_TDIE() macro hwmon: (k10temp) Reduce k10temp_get_ccd_support() parameters hwmon: (k10temp) Define a helper function to read CCD temperature x86/amd_nb: Enhance SMN access error checking hwmon: (k10temp) Check return value of amd_smn_read() EDAC/amd64: Check return value of amd_smn_read() EDAC/amd64: Remove unused register accesses tools/x86/kcpuid: Add missing dir via Makefile x86, arm: Add missing license tag to syscall tables files
2024-07-15Merge tag 'x86_cc_for_v6.11_rc1' of ↵Linus Torvalds7-2/+32
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 confidential computing updates from Borislav Petkov: "Unrelated x86/cc changes queued here to avoid ugly cross-merges and conflicts: - Carve out CPU hotplug function declarations into a separate header with the goal to be able to use the lockdep assertions in a more flexible manner - As a result, refactor cacheinfo code after carving out a function to return the cache ID associated with a given cache level - Cleanups Add support to be able to kexec TDX guests: - Expand ACPI MADT CPU offlining support - Add machinery to prepare CoCo guests memory before kexec-ing into a new kernel - Cleanup, readjust and massage related code" * tag 'x86_cc_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed x86/acpi: Add support for CPU offlining for ACPI MADT wakeup method x86/mm: Introduce kernel_ident_mapping_free() x86/smp: Add smp_ops.stop_this_cpu() callback x86/acpi: Do not attempt to bring up secondary CPUs in the kexec case x86/acpi: Rename fields in the acpi_madt_multiproc_wakeup structure x86/mm: Do not zap page table entries mapping unaccepted memory table during kdump x86/mm: Make e820__end_ram_pfn() cover E820_TYPE_ACPI ranges x86/tdx: Convert shared memory back to private on kexec x86/mm: Add callbacks to prepare encrypted memory for kexec x86/tdx: Account shared memory x86/mm: Return correct level from lookup_address() if pte is none x86/mm: Make x86_platform.guest.enc_status_change_*() return an error x86/kexec: Keep CR4.MCE set during kexec for TDX guest x86/relocate_kernel: Use named labels for less confusion cpu/hotplug, x86/acpi: Disable CPU offlining for ACPI MADT wakeup cpu/hotplug: Add support for declaring CPU offlining not supported x86/apic: Mark acpi_mp_wake_* variables as __ro_after_init x86/acpi: Extract ACPI MADT wakeup code into a separate file x86/kexec: Remove spurious unconditional JMP from from identity_mapped() ...
2024-07-15Merge tag 'x86_boot_for_v6.11_rc1' of ↵Linus Torvalds1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Borislav Petkov: - Add a check to warn when cmdline parsing happens before the final cmdline string has been built and thus arguments can get lost - Code cleanups and simplifications * tag 'x86_boot_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/setup: Warn when option parsing is done too early x86/boot: Clean up the arch/x86/boot/main.c code a bit x86/boot: Use current_stack_pointer to avoid asm() in init_heap()
2024-07-15Merge tag 'x86_alternatives_for_v6.11_rc1' of ↵Linus Torvalds2-166/+77
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 alternatives updates from Borislav Petkov: "This is basically PeterZ's idea to nest the alternative macros to avoid the need to "spell out" the number of alternates in an ALTERNATIVE_n() macro and thus have an ever-increasing complexity in those definitions. For ease of bisection, the old macros are converted to the new, nested variants in a step-by-step manner so that in case an issue is encountered during testing, one can pinpoint the place where it fails easier. Because debugging alternatives is a serious pain" * tag 'x86_alternatives_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives, kvm: Fix a couple of CALLs without a frame pointer x86/alternative: Replace the old macros x86/alternative: Convert the asm ALTERNATIVE_3() macro x86/alternative: Convert the asm ALTERNATIVE_2() macro x86/alternative: Convert the asm ALTERNATIVE() macro x86/alternative: Convert ALTERNATIVE_3() x86/alternative: Convert ALTERNATIVE_TERNARY() x86/alternative: Convert alternative_call_2() x86/alternative: Convert alternative_call() x86/alternative: Convert alternative_io() x86/alternative: Convert alternative_input() x86/alternative: Convert alternative_2() x86/alternative: Convert alternative() x86/alternatives: Add nested alternatives macros x86/alternative: Zap alternative_ternary()
2024-07-15Merge tag 'ras_core_for_v6.11_rc1' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - A cleanup and a correction to the error injection driver to inject a MCA_MISC value only when one has actually been supplied by the user * tag 'ras_core_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Remove unused variable and return value in machine_check_poll() x86/mce/inject: Only write MCA_MISC when a value has been supplied
2024-07-15Merge tag 'timers-core-2024-07-14' of ↵Linus Torvalds4-12/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timers, timekeeping and related functionality: Core: - Make the takeover of a hrtimer based broadcast timer reliable during CPU hot-unplug. The current implementation suffers from a race which can lead to broadcast timer starvation in the worst case. - VDSO related cleanups and simplifications - Small cleanups and enhancements all over the place PTP: - Replace the architecture specific base clock to clocksource, e.g. ART to TSC, conversion function with generic functionality to avoid exposing such internals to drivers and convert all existing drivers over. This also allows to provide functionality which converts the other way round in the core code based on the same parameter set. - Provide a function to convert CLOCK_REALTIME to the base clock to support the upcoming PPS output driver on Intel platforms. Drivers: - A set of Device Tree bindings for new hardware - Cleanups and enhancements all over the place" * tag 'timers-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) clocksource/drivers/realtek: Add timer driver for rtl-otto platforms dt-bindings: timer: Add schema for realtek,otto-timer dt-bindings: timer: Add SOPHGO SG2002 clint dt-bindings: timer: renesas,tmu: Add R-Car Gen2 support dt-bindings: timer: renesas,tmu: Add RZ/G1 support dt-bindings: timer: renesas,tmu: Add R-Mobile APE6 support clocksource/drivers/mips-gic-timer: Correct sched_clock width clocksource/drivers/mips-gic-timer: Refine rating computation clocksource/drivers/sh_cmt: Address race condition for clock events clocksource/driver/arm_global_timer: Remove unnecessary ‘0’ values from err clocksource/drivers/arm_arch_timer: Remove unnecessary ‘0’ values from irq tick/broadcast: Make takeover of broadcast hrtimer reliable tick/sched: Combine WARN_ON_ONCE and print_once x86/vdso: Remove unused include x86/vgtod: Remove unused typedef gtod_long_t x86/vdso: Fix function reference in comment vdso: Add comment about reason for vdso struct ordering vdso/gettimeofday: Clarify comment about open coded function timekeeping: Add missing kernel-doc function comments tick: Remove unnused tick_nohz_get_idle_calls() ...
2024-07-15Merge branch 'word-at-a-time'Linus Torvalds1-34/+23
Merge minor word-at-a-time instruction choice improvements for x86 and arm64. This is the second of four branches that came out of me looking at the code generation for path lookup on arm64. The word-at-a-time infrastructure is used to do string operations in chunks of one word both when copying the pathname from user space (in strncpy_from_user()), and when parsing and hashing the individual path components (in link_path_walk()). In particular, the "find the first zero byte" uses various bit tricks to figure out the end of the string or path component, and get the length without having to do things one byte at a time. Both x86-64 and arm64 had less than optimal code choices for that. The commit message for the arm64 change in particular tries to explain the exact code flow for the zero byte finding for people who care. It's made a bit more complicated by the fact that we support big-endian hardware too, and so we have some extra abstraction layers to allow different models for finding the zero byte, quite apart from the issue of picking specialized instructions. * word-at-a-time: arm64: word-at-a-time: improve byte count calculations for LE x86-64: word-at-a-time: improve byte count calculations
2024-07-15Merge branch 'runtime-constants'Linus Torvalds1-0/+61
Merge runtime constants infrastructure with implementations for x86 and arm64. This is one of four branches that came out of me looking at profiles of my kernel build filesystem load on my 128-core Altra arm64 system, where pathname walking and the user copies (particularly strncpy_from_user() for fetching the pathname from user space) is very hot. This is a very specialized "instruction alternatives" model where the dentry hash pointer and hash count will be constants for the lifetime of the kernel, but the allocation are not static but done early during the kernel boot. In order to avoid the pointer load and dynamic shift, we just rewrite the constants in the instructions in place. We can't use the "generic" alternative instructions infrastructure, because different architectures do it very differently, and it's actually simpler to just have very specific helpers, with a fallback to the generic ("old") model of just using variables for architectures that do not implement the runtime constant patching infrastructure. Link: https://lore.kernel.org/all/CAHk-=widPe38fUNjUOmX11ByDckaeEo9tN4Eiyke9u1SAtu9sA@mail.gmail.com/ * runtime-constants: arm64: add 'runtime constant' support runtime constants: add x86 architecture support runtime constants: add default dummy infrastructure vfs: dcache: move hashlen_hash() from callers into d_hash()
2024-07-13Merge tag 'timers-v6.11-rc1' of ↵Thomas Gleixner6-20/+15
https://git.linaro.org/people/daniel.lezcano/linux into timers/core Pull clocksource/event driver updates from Daniel Lezcano: - Remove unnecessary local variables initialization as they will be initialized in the code path anyway right after on the ARM arch timer and the ARM global timer (Li kunyu) - Fix a race condition in the interrupt leading to a deadlock on the SH CMT driver. Note that this fix was not tested on the platform using this timer but the fix seems reasonable enough to be picked confidently (Niklas Söderlund) - Increase the rating of the gic-timer and use the configured width clocksource register on the MIPS architecture (Jiaxun Yang) - Add the DT bindings for the TMU on the Renesas platforms (Geert Uytterhoeven) - Add the DT bindings for the SOPHGO SG2002 clint on RiscV (Thomas Bonnefille) - Add the rtl-otto timer driver along with the DT bindings for the Realtek platform (Chris Packham) Link: https://lore.kernel.org/all/[email protected]
2024-07-12Merge tag 'loongarch-kvm-6.11' of ↵Paolo Bonzini4-19/+13
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.11 1. Add ParaVirt steal time support. 2. Add some VM migration enhancement. 3. Add perf kvm-stat support for loongarch.
2024-07-10clone3: drop __ARCH_WANT_SYS_CLONE3 macroArnd Bergmann1-1/+0
When clone3() was introduced, it was not obvious how each architecture deals with setting up the stack and keeping the register contents in a fork()-like system call, so this was left for the architecture maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those that already implement it. Five years later, we still have a few architectures left that are missing clone3(), and the macro keeps getting in the way as it's fundamentally different from all the other __ARCH_WANT_SYS_* macros that are meant to provide backwards-compatibility with applications using older syscalls that are no longer provided by default. Address this by reversing the polarity of the macro, adding an __ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3 from all the other ones. Acked-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2024-07-10Merge back cpufreq material for 6.11.Rafael J. Wysocki2-0/+4
2024-07-08x86/efistub: Enable SMBIOS protocol handling for x86Ard Biesheuvel1-1/+6
The smbios.c source file is not currently included in the x86 build, and before we can do so, it needs some tweaks to build correctly in combination with the EFI mixed mode support. Reviewed-by: Lukas Wunner <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
2024-07-04perf/x86/intel: Support Perfmon MSRs aliasingKan Liang1-0/+6
The architectural performance monitoring V6 supports a new range of counters' MSRs in the 19xxH address range. They include all the GP counter MSRs, the GP control MSRs, and the fixed counter MSRs. The step between each sibling counter is 4. Add intel_pmu_addr_offset() to calculate the correct offset. Add fixedctr in struct x86_pmu to store the address of the fixed counter 0. It can be used to calculate the rest of the fixed counters. The MSR address of the fixed counter control is not changed. Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-07-04perf/x86/intel: Support PERFEVTSEL extensionKan Liang1-0/+4
Two new fields (the unit mask2, and the equal flag) are added in the IA32_PERFEVTSELx MSRs. They can be enumerated by the CPUID.23H.0.EBX. Update the config_mask in x86_pmu and x86_hybrid_pmu for the true layout of the PERFEVTSEL. Expose the new formats into sysfs if they are available. The umask extension reuses the same format attr name "umask" as the previous umask. Add umask2_show to determine/display the correct format for the current machine. Co-developed-by: Dapeng Mi <[email protected]> Signed-off-by: Dapeng Mi <[email protected]> Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-07-04perf/x86: Add Lunar Lake and Arrow Lake supportKan Liang1-0/+4
From PMU's perspective, Lunar Lake and Arrow Lake are similar to the previous generation Meteor Lake. Both are hybrid platforms, with e-core and p-core. The key differences include: - The e-core supports 3 new fixed counters - The p-core supports an updated PEBS Data Source format - More GP counters (Updated event constraint table) - New Architectural performance monitoring V6 (New Perfmon MSRs aliasing, umask2, eq). - New PEBS format V6 (Counters Snapshotting group) - New RDPMC metrics clear mode The legacy features, the 3 new fixed counters and updated event constraint table are enabled in this patch. The new PEBS data source format, the architectural performance monitoring V6, the PEBS format V6, and the new RDPMC metrics clear mode are supported in the following patches. Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-07-04perf/x86/intel: Support the PEBS event maskKan Liang1-0/+1
The current perf assumes that the counters that support PEBS are contiguous. But it's not guaranteed with the new leaf 0x23 introduced. The counters are enumerated with a counter mask. There may be holes in the counter mask for future platforms or in a virtualization environment. Store the PEBS event mask rather than the maximum number of PEBS counters in the x86 PMU structures. Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-07-03x86/vdso: Remove unused includeAnna-Maria Behnsen1-1/+0
Including hrtimer.h is not required and is probably a historical leftover. Remove it. Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-07-03x86/vgtod: Remove unused typedef gtod_long_tAnna-Maria Behnsen1-5/+0
The typedef gtod_long_t is not used anymore so remove it. The header file contains then only includes dependent on CONFIG_GENERIC_GETTIMEOFDAY to not break ARCH=um. Nevertheless, keep the header file only with those includes to prevent spreading ifdeffery all over the place. Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-07-03x86/vdso: Fix function reference in commentAnna-Maria Behnsen1-3/+2
Replace the reference to the non-existent function arch_vdso_cycles_valid() by the proper function arch_vdso_cycles_ok(). Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-07-02x86/mm: Fix LAM inconsistency during context switchYosry Ahmed2-6/+11
LAM can only be enabled when a process is single-threaded. But _kernel_ threads can temporarily use a single-threaded process's mm. That means that a context-switching kernel thread can race and observe the mm's LAM metadata (mm->context.lam_cr3_mask) change. The context switch code does two logical things with that metadata: populate CR3 and populate 'cpu_tlbstate.lam'. If it hits this race, 'cpu_tlbstate.lam' and CR3 can end up out of sync. This de-synchronization is currently harmless. But it is confusing and might lead to warnings or real bugs. Update set_tlbstate_lam_mode() to take in the LAM mask and untag mask instead of an mm_struct pointer, and while we are at it, rename it to cpu_tlbstate_update_lam(). This should also make it clearer that we are updating cpu_tlbstate. In switch_mm_irqs_off(), read the LAM mask once and use it for both the cpu_tlbstate update and the CR3 update. Signed-off-by: Yosry Ahmed <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Link: https://lore.kernel.org/all/20240702132139.3332013-3-yosryahmed%40google.com
2024-07-02x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systemsTony Luck1-0/+1
Hardware has two RMID configuration options for SNC systems. The default mode divides RMID counters between SNC nodes. E.g. with 200 RMIDs and two SNC nodes per L3 cache RMIDs 0..99 are used on node 0, and 100..199 on node 1. This isn't compatible with Linux resctrl usage. On this example system a process using RMID 5 would only update monitor counters while running on SNC node 0. The other mode is "RMID Sharing Mode". This is enabled by clearing bit 0 of the RMID_SNC_CONFIG (0xCA0) model specific register. In this mode the number of logical RMIDs is the number of physical RMIDs (from CPUID leaf 0xF) divided by the number of SNC nodes per L3 cache instance. A process can use the same RMID across different SNC nodes. See the "Intel Resource Director Technology Architecture Specification" for additional details. When SNC is enabled, update the MSR when a monitor domain is marked online. Technically this is overkill. It only needs to be done once per L3 cache instance rather than per SNC domain. But there is no harm in doing it more than once, and this is not in a critical path. Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-07-02x86/efi: Drop support for fake EFI memory mapsArd Biesheuvel1-15/+0
Between kexec and confidential VM support, handling the EFI memory maps correctly on x86 is already proving to be rather difficult (as opposed to other EFI architectures which manage to never modify the EFI memory map to begin with) EFI fake memory map support is essentially a development hack (for testing new support for the 'special purpose' and 'more reliable' EFI memory attributes) that leaked into production code. The regions marked in this manner are not actually recognized as such by the firmware itself or the EFI stub (and never have), and marking memory as 'more reliable' seems rather futile if the underlying memory is just ordinary RAM. Marking memory as 'special purpose' in this way is also dubious, but may be in use in production code nonetheless. However, the same should be achievable by using the memmap= command line option with the ! operator. EFI fake memmap support is not enabled by any of the major distros (Debian, Fedora, SUSE, Ubuntu) and does not exist on other architectures, so let's drop support for it. Acked-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Dan Williams <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
2024-07-01x86/alternatives, kvm: Fix a couple of CALLs without a frame pointerBorislav Petkov (AMD)2-5/+8
objtool complains: arch/x86/kvm/kvm.o: warning: objtool: .altinstr_replacement+0xc5: call without frame pointer save/setup vmlinux.o: warning: objtool: .altinstr_replacement+0x2eb: call without frame pointer save/setup Make sure %rSP is an output operand to the respective asm() statements. The test_cc() hunk and ALT_OUTPUT_SP() courtesy of peterz. Also from him add some helpful debugging info to the documentation. Now on to the explanations: tl;dr: The alternatives macros are pretty fragile. If I do ALT_OUTPUT_SP(output) in order to be able to package in a %rsp reference for objtool so that a stack frame gets properly generated, the inline asm input operand with positional argument 0 in clear_page(): "0" (page) gets "renumbered" due to the added : "+r" (current_stack_pointer), "=D" (page) and then gcc says: ./arch/x86/include/asm/page_64.h:53:9: error: inconsistent operand constraints in an ‘asm’ The fix is to use an explicit "D" constraint which points to a singleton register class (gcc terminology) which ends up doing what is expected here: the page pointer - input and output - should be in the same %rdi register. Other register classes have more than one register in them - example: "r" and "=r" or "A": ‘A’ The ‘a’ and ‘d’ registers. This class is used for instructions that return double word results in the ‘ax:dx’ register pair. Single word values will be allocated either in ‘ax’ or ‘dx’. so using "D" and "=D" just works in this particular case. And yes, one would say, sure, why don't you do "+D" but then: : "+r" (current_stack_pointer), "+D" (page) : [old] "i" (clear_page_orig), [new1] "i" (clear_page_rep), [new2] "i" (clear_page_erms), : "cc", "memory", "rax", "rcx") now find the Waldo^Wcomma which throws a wrench into all this. Because that silly macro has an "input..." consume-all last macro arg and in it, one is supposed to supply input *and* clobbers, leading to silly syntax snafus. Yap, they need to be cleaned up, one fine day... Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reported-by: kernel test robot <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Sean Christopherson <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/20240625112056.GDZnqoGDXgYuWBDUwu@fat_crate.local
2024-06-30x86-32: fix cmpxchg8b_emu build error with clangLinus Torvalds1-7/+5
The kernel test robot reported that clang no longer compiles the 32-bit x86 kernel in some configurations due to commit 95ece48165c1 ("locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions"). The build fails with arch/x86/include/asm/cmpxchg_32.h:149:9: error: inline assembly requires more registers than available and the reason seems to be that not only does the cmpxchg8b instruction need four fixed registers (EDX:EAX and ECX:EBX), with the emulation fallback the inline asm also wants a fifth fixed register for the address (it uses %esi for that, but that's just a software convention with cmpxchg8b_emu). Avoiding using another pointer input to the asm (and just forcing it to use the "0(%esi)" addressing that we end up requiring for the sw fallback) seems to fix the issue. Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: 95ece48165c1 ("locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions") Link: https://lore.kernel.org/all/[email protected]/ Suggested-by: Uros Bizjak <[email protected]> Reviewed-and-Tested-by: Uros Bizjak <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2024-06-28Merge tag 'hardening-v6.10-rc6' of ↵Linus Torvalds1-9/+6
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Remove invalid tty __counted_by annotation (Nathan Chancellor) - Add missing MODULE_DESCRIPTION()s for KUnit string tests (Jeff Johnson) - Remove non-functional per-arch kstack entropy filtering * tag 'hardening-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: tty: mxser: Remove __counted_by from mxser_board.ports[] randomize_kstack: Remove non-functional per-arch entropy filtering string: kunit: add missing MODULE_DESCRIPTION() macros
2024-06-28x86/cpufeatures: Add HWP highest perf change feature flagSrinivas Pandruvada1-0/+1
When CPUID[6].EAX[15] is set to 1, this CPU supports notification for HWP (Hardware P-states) highest performance change. Add a feature flag to check if the CPU supports HWP highest performance change. Signed-off-by: Srinivas Pandruvada <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Rafael J. Wysocki <[email protected]>
2024-06-28KVM: x86/pmu: Introduce distinct macros for GP/fixed counter max numberDapeng Mi1-8/+12
Refine the macros which define maximum General Purpose (GP) and fixed counter numbers. Currently the macro KVM_INTEL_PMC_MAX_GENERIC is used to represent the maximum supported General Purpose (GP) counter number ambiguously across Intel and AMD platforms. This would cause issues if AMD begins to support more GP counters than Intel. Thus a bunch of new macros including vendor specific and vendor independent are introduced to replace the old macros. The vendor independent macros are used in x86 common code to hide vendor difference and eliminate the ambiguity. No logic changes are introduced in this patch. Signed-off-by: Dapeng Mi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Co-developed-by: Sean Christopherson <[email protected]> Signed-off-by: Sean Christopherson <[email protected]>
2024-06-28KVM: nVMX: Fold requested virtual interrupt check into has_nested_events()Sean Christopherson2-2/+0
Check for a Requested Virtual Interrupt, i.e. a virtual interrupt that is pending delivery, in vmx_has_nested_events() and drop the one-off kvm_x86_ops.guest_apic_has_interrupt() hook. In addition to dropping a superfluous hook, this fixes a bug where KVM would incorrectly treat virtual interrupts _for L2_ as always enabled due to kvm_arch_interrupt_allowed(), by way of vmx_interrupt_blocked(), treating IRQs as enabled if L2 is active and vmcs12 is configured to exit on IRQs, i.e. KVM would treat a virtual interrupt for L2 as a valid wake event based on L1's IRQ blocking status. Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-28KVM: nVMX: Request immediate exit iff pending nested event needs injectionSean Christopherson1-1/+1
When requesting an immediate exit from L2 in order to inject a pending event, do so only if the pending event actually requires manual injection, i.e. if and only if KVM actually needs to regain control in order to deliver the event. Avoiding the "immediate exit" isn't simply an optimization, it's necessary to make forward progress, as the "already expired" VMX preemption timer trick that KVM uses to force a VM-Exit has higher priority than events that aren't directly injected. At present time, this is a glorified nop as all events processed by vmx_has_nested_events() require injection, but that will not hold true in the future, e.g. if there's a pending virtual interrupt in vmcs02.RVI. I.e. if KVM is trying to deliver a virtual interrupt to L2, the expired VMX preemption timer will trigger VM-Exit before the virtual interrupt is delivered, and KVM will effectively hang the vCPU in an endless loop of forced immediate VM-Exits (because the pending virtual interrupt never goes away). Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2024-06-28randomize_kstack: Remove non-functional per-arch entropy filteringKees Cook1-9/+6
An unintended consequence of commit 9c573cd31343 ("randomize_kstack: Improve entropy diffusion") was that the per-architecture entropy size filtering reduced how many bits were being added to the mix, rather than how many bits were being used during the offsetting. All architectures fell back to the existing default of 0x3FF (10 bits), which will consume at most 1KiB of stack space. It seems that this is working just fine, so let's avoid the confusion and update everything to use the default. The prior intent of the per-architecture limits were: arm64: capped at 0x1FF (9 bits), 5 bits effective powerpc: uncapped (10 bits), 6 or 7 bits effective riscv: uncapped (10 bits), 6 bits effective x86: capped at 0xFF (8 bits), 5 (x86_64) or 6 (ia32) bits effective s390: capped at 0xFF (8 bits), undocumented effective entropy Current discussion has led to just dropping the original per-architecture filters. The additional entropy appears to be safe for arm64, x86, and s390. Quoting Arnd, "There is no point pretending that 15.75KB is somehow safe to use while 15.00KB is not." Co-developed-by: Yuntao Liu <[email protected]> Signed-off-by: Yuntao Liu <[email protected]> Fixes: 9c573cd31343 ("randomize_kstack: Improve entropy diffusion") Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Arnd Bergmann <[email protected]> Acked-by: Mark Rutland <[email protected]> Acked-by: Heiko Carstens <[email protected]> # s390 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-06-27Merge tag 'amd-pstate-v6.11-2024-06-26' of ↵Rafael J. Wysocki1-0/+2
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux Merge more amd-pstate driver updates for 6.11 from Mario Limonciello: "Add support for amd-pstate core performance boost support which allows controlling which CPU cores can operate above nominal frequencies for short periods of time." * tag 'amd-pstate-v6.11-2024-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux: Documentation: cpufreq: amd-pstate: update doc for Per CPU boost control method cpufreq: amd-pstate: Cap the CPPC.max_perf to nominal_perf if CPB is off cpufreq: amd-pstate: initialize core precision boost state cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.h
2024-06-27Merge back cpufreq material for v6.11.Rafael J. Wysocki1-0/+1
2024-06-26cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.hPerry Yuan1-0/+2
There are some other drivers also need to use the MSR_K7_HWCR_CPB_DIS_BIT for CPB control bit, so it makes sense to move the definition to a common header file to allow other driver to use it. No intentional functional impact. Suggested-by: Gautham Ranjal Shenoy <[email protected]> Signed-off-by: Perry Yuan <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Acked-by: Huang Rui <[email protected]> Link: https://lore.kernel.org/r/78b6c75e6cffddce3e950dd543af6ae9f8eeccc3.1718988436.git.perry.yuan@amd.com Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mario Limonciello <[email protected]>
2024-06-25x86/vmware: Add TDX hypercall supportAlexey Makhalov1-0/+45
VMware hypercalls use I/O port, VMCALL or VMMCALL instructions. Add a call to __tdx_hypercall() in order to support TDX guests. No change in high bandwidth hypercalls, as only low bandwidth ones are supported for TDX guests. [ bp: Massage, clear on-stack struct tdx_module_args variable. ] Co-developed-by: Tim Merrifield <[email protected]> Signed-off-by: Tim Merrifield <[email protected]> Signed-off-by: Alexey Makhalov <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-25x86/vmware: Remove legacy VMWARE_HYPERCALL* macrosAlexey Makhalov1-26/+0
No more direct use of these macros should be allowed. The vmware_hypercallX API still uses the new implementation of VMWARE_HYPERCALL macro internally, but it is not exposed outside of the vmware.h. Signed-off-by: Alexey Makhalov <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-25x86/vmware: Introduce VMware hypercall APIAlexey Makhalov1-14/+265
Introduce a vmware_hypercall family of functions. It is a common implementation to be used by the VMware guest code and virtual device drivers in architecture independent manner. The API consists of vmware_hypercallX and vmware_hypercall_hb_{out,in} set of functions analogous to KVM's hypercall API. Architecture-specific implementation is hidden inside. It will simplify future enhancements in VMware hypercalls such as SEV-ES and TDX related changes without needs to modify a caller in device drivers code. Current implementation extends an idea from bac7b4e84323 ("x86/vmware: Update platform detection code for VMCALL/VMMCALL hypercalls") to have a slow, but safe path vmware_hypercall_slow() earlier during the boot when alternatives are not yet applied. The code inherits VMWARE_CMD logic from the commit mentioned above. Move common macros from vmware.c to vmware.h. [ bp: Fold in a fix: https://lore.kernel.org/r/[email protected] ] Signed-off-by: Alexey Makhalov <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]