path: root/arch/x86
Age | Commit message | Author | Files | Lines
2024-08-13 | KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page() | Yue Haibing | 1 | -1/+0
There is no caller in tree since introduction in commit b4f69df0f65e ("KVM: x86: Make Hyper-V emulation optional") Signed-off-by: Yue Haibing <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-08-13 | x86/apic: Make x2apic_disable() work correctly | Yuntao Wang | 1 | -5/+6
x2apic_disable() clears x2apic_state and x2apic_mode unconditionally, even when the state is X2APIC_ON_LOCKED, which prevents the kernel from disabling it and thereby creates inconsistent state. Due to the early state check for X2APIC_ON, the code path which warns about a locked X2APIC cannot be reached. Test for state < X2APIC_ON instead and move the clearing of the state and mode variables to the place which actually disables X2APIC. [ tglx: Massaged change log. Added Fixes tag. Moved clearing so it's at the right place for back ports ] Fixes: a57e456a7b28 ("x86/apic: Fix fallout from x2apic cleanup") Signed-off-by: Yuntao Wang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/[email protected]
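The corrected flow looks roughly like the following sketch; x2apic_hw_locked() and __x2apic_disable() are assumed helper names and the details may not match the final diff:

  static void x2apic_disable(void)
  {
  	u32 x2apic_id;

  	/* Nothing to do unless X2APIC is at least enabled; covers LOCKED too */
  	if (x2apic_state < X2APIC_ON)
  		return;

  	x2apic_id = read_apic_id();
  	if (x2apic_id >= 255)
  		panic("Cannot disable x2APIC, id: %08x\n", x2apic_id);

  	if (x2apic_hw_locked()) {		/* assumed helper for the locked case */
  		pr_warn("Cannot disable locked x2APIC\n");
  		return;
  	}

  	__x2apic_disable();			/* assumed low-level disable helper */

  	/* Clear state/mode only where X2APIC is actually disabled */
  	x2apic_mode = 0;
  	x2apic_state = X2APIC_DISABLED;
  }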
2024-08-13 | KVM: SVM: Fix an error code in sev_gmem_post_populate() | Dan Carpenter | 1 | -2/+3
The copy_from_user() function returns the number of bytes which it was not able to copy. Return -EFAULT instead. Fixes: dee5a47cc7a4 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command") Signed-off-by: Dan Carpenter <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
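The convention being fixed, as a minimal sketch (dst, src and len are placeholders, not the driver's actual variables):

  	if (copy_from_user(dst, src, len)) {
  		ret = -EFAULT;		/* not the leftover byte count */
  		goto out;
  	}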
2024-08-13 | KVM: SVM: Fix uninitialized variable bug | Dan Carpenter | 1 | -1/+1
If snp_lookup_rmpentry() fails then "assigned" is printed in the error message but it was never initialized. Initialize it to false. Fixes: dee5a47cc7a4 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command") Signed-off-by: Dan Carpenter <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-08-13 | x86/irq: Fix comment on IRQ vector layout | Sohil Mehta | 1 | -2/+2
commit f5a3562ec9dd ("x86/irq: Reserve a per CPU IDT vector for posted MSIs") changed the first system vector from LOCAL_TIMER_VECTOR to POSTED_MSI_NOTIFICATION_VECTOR. Reflect this change in the vector layout comment as well. However, instead of pointing to the specific vector, use the FIRST_SYSTEM_VECTOR indirection which essentially refers to the same. This avoids unnecessary modifications to the same comment whenever additional system vectors get added. Signed-off-by: Sohil Mehta <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Jacob Pan <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-13 | x86/apic: Remove unused extern declarations | Yue Haibing | 1 | -6/+0
The removal of get_physical_broadcast(), generic_bigsmp_probe() and default_apic_id_valid() left the stale declarations around. Remove them. Signed-off-by: Yue Haibing <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-12 | introduce fd_file(), convert all accessors to it. | Al Viro | 2 | -10/+10
For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to the new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in the commits that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
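The conversion pattern, as a minimal sketch (the vfs_fsync() caller is illustrative, not a line from the patch):

  	struct fd f = fdget(fd);

  	if (!fd_file(f))			/* was: if (!f.file) */
  		return -EBADF;
  	ret = vfs_fsync(fd_file(f), 0);		/* was: vfs_fsync(f.file, 0) */
  	fdput(f);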
2024-08-12 | platform/x86/intel/ifs: Add SBAF test image loading support | Jithu Joseph | 1 | -0/+2
Structural Based Functional Test at Field (SBAF) is a new type of testing that provides comprehensive core test coverage complementing existing IFS tests like Scan at Field (SAF) or ArrayBist. SBAF device will appear as a new device instance (intel_ifs_2) under /sys/devices/virtual/misc. The user interaction necessary to load the test image and test a particular core is the same as the existing scan test (intel_ifs_0). During the loading stage, the driver will look for a file named ff-mm-ss-<batch02x>.sbft in the /lib/firmware/intel/ifs_2 directory. The hardware interaction needed for loading the image is similar to SAF, with the only difference being the MSR addresses used. Reuse the SAF image loading code, passing the SBAF-specific MSR addresses via struct ifs_test_msrs in the driver device data. Unlike SAF, the SBAF test image chunks are further divided into smaller logical entities called bundles. Since the SBAF test is initiated per bundle, cache the maximum number of bundles in the current image, which is used for iterating through bundles during SBAF test execution. Reviewed-by: Ashok Raj <[email protected]> Reviewed-by: Tony Luck <[email protected]> Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Jithu Joseph <[email protected]> Co-developed-by: Kuppuswamy Sathyanarayanan <[email protected]> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]> Link: https://lore.kernel.org/r/20240801051814.1935149-3-sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Hans de Goede <[email protected]>
2024-08-10 | x86/mm: Remove unused CR3_HW_ASID_BITS | Yosry Ahmed | 1 | -3/+0
Commit 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches") removed the last usage of CR3_HW_ASID_BITS and opted to use X86_CR3_PCID_BITS instead. Remove CR3_HW_ASID_BITS. Signed-off-by: Yosry Ahmed <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-10 | crypto: x86/aes-gcm - fix PREEMPT_RT issue in gcm_crypt() | Eric Biggers | 1 | -31/+28
On PREEMPT_RT, kfree() takes sleeping locks and must not be called with preemption disabled. Therefore, on PREEMPT_RT skcipher_walk_done() must not be called from within a kernel_fpu_{begin,end}() pair, even when it's the last call which is guaranteed to not allocate memory. Therefore, move the last skcipher_walk_done() in gcm_crypt() to the end of the function so that it goes after the kernel_fpu_end(). To make this work cleanly, rework the data processing loop to handle only non-last data segments. Fixes: b06affb1cb58 ("crypto: x86/aes-gcm - add VAES and AVX512 / AVX10 optimized AES-GCM") Reported-by: Sebastian Andrzej Siewior <[email protected]> Closes: https://lore.kernel.org/linux-crypto/[email protected] Signed-off-by: Eric Biggers <[email protected]> Tested-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2024-08-09 | x86/apic: Remove logical destination mode for 64-bit | Thomas Gleixner | 2 | -120/+7
Logical destination mode of the local APIC is used for systems with up to 8 CPUs. It has an advantage over physical destination mode as it allows targeting multiple CPUs at once with IPIs. That advantage was definitely worth it when systems with up to 8 CPUs were state of the art for servers and workstations, but that's history. Aside from that, there are systems which fail to work with logical destination mode, as the ACPI/DMI quirks show, and there are AMD Zen1 systems out there which fail when interrupt remapping is enabled, as reported by Rob and Christian. The latter problem can be cured by firmware updates, but not all OEMs distribute the required changes. Physical destination mode is guaranteed to work because it is the only way to get a CPU up and running via the INIT/INIT/STARTUP sequence. As the number of CPUs keeps increasing, logical destination mode becomes a less used code path, so there is no real good reason to keep it around. Therefore remove logical destination mode support for 64-bit and default to physical destination mode. Reported-by: Rob Newcater <[email protected]> Reported-by: Christian Heusel <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Rob Newcater <[email protected]> Link: https://lore.kernel.org/all/877cd5u671.ffs@tglx
2024-08-08 | x86: Ignore stack unwinding in KCOV | Dmitry Vyukov | 1 | -0/+8
Stack unwinding produces large amounts of uninteresting coverage. It's called from KASAN kmalloc/kfree hooks, fault injection, etc. It's not particularly useful and is not a function of system call args. Ignore that code. Signed-off-by: Dmitry Vyukov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Reviewed-by: Marco Elver <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/all/eaf54b8634970b73552dcd38bf9be6ef55238c10.1718092070.git.dvyukov@google.com
2024-08-08 | x86/entry: Remove unwanted instrumentation in common_interrupt() | Dmitry Vyukov | 2 | -5/+9
common_interrupt() and related variants call kvm_set_cpu_l1tf_flush_l1d(), which is neither marked noinstr nor __always_inline. So the compiler puts it out of line and adds instrumentation to it. Since the call is inside of instrumentation_begin/end(), objtool does not warn about it. The manifestation is that KCOV produces spurious coverage in kvm_set_cpu_l1tf_flush_l1d() in random places because the call happens when the preempt count is not yet updated to say that the kernel is in an interrupt. Mark kvm_set_cpu_l1tf_flush_l1d() as __always_inline and move it out of the instrumentation_begin/end() section. It only calls __this_cpu_write() which is already safe to call in noinstr contexts. Fixes: 6368558c3710 ("x86/entry: Provide IDTENTRY_SYSVEC") Signed-off-by: Dmitry Vyukov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/3f9a1de9e415fcb53d07dc9e19fa8481bb021b1b.1718092070.git.dvyukov@google.com
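The result looks roughly like this sketch; the per-CPU variable name is an assumption:

  static __always_inline void kvm_set_cpu_l1tf_flush_l1d(void)
  {
  	/* Plain per-CPU write: safe outside instrumentation_begin()/end() */
  	__this_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 1);	/* field name assumed */
  }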
2024-08-08 | x86/mm: Don't print out SRAT table information | Li RongQing | 1 | -4/+2
This per CPU log is becoming longer with more and more CPUs in the system, which slows down the boot process due to the serializing nature of printk(). The value of this information is dubious and it can be retrieved by lscpu from user space if required. Downgrade the printk() to pr_debug() so it is still accessible for debug purposes. [ tglx: Massaged changelog ] Signed-off-by: Li RongQing <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-08 | x86/mtrr: Check if fixed MTRRs exist before saving them | Andi Kleen | 1 | -1/+1
MTRRs have an obsolete fixed variant for fine grained caching control of the 640K-1MB region that uses separate MSRs. This fixed variant has a separate capability bit in the MTRR capability MSR. So far all x86 CPUs which support MTRR have this separate bit set, so it went unnoticed that mtrr_save_state() does not check the capability bit before accessing the fixed MTRR MSRs. However, on a CPU that does not support the fixed MTRR capability this results in a #GP. The #GP itself is harmless because the RDMSR fault is handled gracefully, but it results in a WARN_ON(). Add the missing capability check to prevent this. Fixes: 2b1f6278d77c ("[PATCH] x86: Save the MTRRs of the BSP before booting an AP") Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/[email protected]
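A sketch of the missing guard, assuming the fixed-range capability lives in bit 8 of MSR_MTRRcap and that the existing save helper is reused; both names are unverified against the actual hunk:

  	u64 mtrr_cap;

  	rdmsrl(MSR_MTRRcap, mtrr_cap);
  	if (mtrr_cap & BIT(8))		/* fixed-range MTRRs implemented? (bit assumed) */
  		save_fixed_ranges(mtrr_state.fixed_ranges);	/* helper name assumed */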
2024-08-07 | x86/paravirt: Fix incorrect virt spinlock setting on bare metal | Chen Yu | 2 | -9/+10
The kernel can change spinlock behavior when running as a guest. But this guest-friendly behavior causes performance problems on bare metal. The kernel uses a static key to switch between the two modes. In theory, the static key is enabled by default (run in guest mode) and should be disabled for bare metal (and in some guests that want native behavior or paravirt spinlock). A performance drop is reported when running encode/decode workloads and the BenchSEE cache sub-workload. Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS is disabled the virt_spin_lock_key is incorrectly set to true on bare metal. The qspinlock degenerates to a test-and-set spinlock, which decreases the performance on bare metal. Set the default value of virt_spin_lock_key to false. If booting in a VM, enable this key. Later during the VM initialization, if another more efficient spinlock implementation is preferred (e.g. paravirt-spinlock), or the user wants the native qspinlock (via the nopvspin boot command line option), the virt_spin_lock_key is disabled accordingly. This results in the following decision matrix:

  X86_FEATURE_HYPERVISOR       Y     Y     Y     N
  CONFIG_PARAVIRT_SPINLOCKS    Y     Y     N     Y/N
  PV spinlock                  Y     N     N     Y/N
  virt_spin_lock_key           N     Y/N   Y     N

Fixes: ce0a1b608bfc ("x86/paravirt: Silence unused native_pv_lock_init() function warning") Reported-by: Prem Nath Dey <[email protected]> Reported-by: Xiaoping Zhou <[email protected]> Suggested-by: Dave Hansen <[email protected]> Suggested-by: Qiuxu Zhuo <[email protected]> Suggested-by: Nikolay Borisov <[email protected]> Signed-off-by: Chen Yu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/[email protected]
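A sketch of the corrected default, using the key and init function named in the changelog; the actual diff may differ in detail:

  DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);	/* default off: bare metal */

  void __init native_pv_lock_init(void)
  {
  	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
  		static_branch_enable(&virt_spin_lock_key);	/* running as a guest */
  }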
2024-08-07 | x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailbox | Zhiquan Li | 1 | -1/+1
On a platform using the "Multiprocessor Wakeup Structure"[1] to start up secondary CPUs, the control processor needs to memremap() the physical address of the MP Wakeup Structure mailbox to the variable acpi_mp_wake_mailbox, which holds the virtual address of the mailbox. To wake up the AP, the control processor writes the APIC ID of the AP, the wakeup vector and the ACPI_MP_WAKE_COMMAND_WAKEUP command into the mailbox. The current implementation marks acpi_mp_wake_mailbox read-only after init and therefore does not handle the case where boot time CPU bringup is restricted to one CPU with the kernel parameter "maxcpus=1" and the other CPUs are brought online later from user space. So when an attempt is made to bring the first AP online after init, the write to the variable results in a kernel panic. The memremap() call that initializes the variable cannot be moved into acpi_parse_mp_wake() because memremap() is not functional at that point in the boot process. Also, as the APs might never be brought up, keep the memremap() call in acpi_wakeup_cpu() so that the operation only takes place when needed. Fixes: 24dd05da8c79 ("x86/apic: Mark acpi_mp_wake_* variables as __ro_after_init") Signed-off-by: Zhiquan Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Link: https://lore.kernel.org/all/[email protected]
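A sketch of the resulting lazy mapping in the wakeup path; the function signature and the _paddr variable name are assumptions based on the changelog:

  static int acpi_wakeup_cpu(u32 apicid, unsigned long start_ip)
  {
  	if (!acpi_mp_wake_mailbox) {
  		acpi_mp_wake_mailbox = memremap(acpi_mp_wake_mailbox_paddr,
  						sizeof(*acpi_mp_wake_mailbox),
  						MEMREMAP_WB);
  		if (!acpi_mp_wake_mailbox)
  			return -ENOMEM;
  	}

  	/*
  	 * Write the AP's APIC ID, the wakeup vector and the
  	 * ACPI_MP_WAKE_COMMAND_WAKEUP command into the mailbox ...
  	 */
  	return 0;
  }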
2024-08-07 | x86/apic: Remove unused inline function apic_set_eoi_cb() | Yue Haibing | 1 | -1/+0
Commit 2744a7ce34a7 ("x86/apic: Replace acpi_wake_cpu_handler_update() and apic_set_eoi_cb()") left it unused. Signed-off-by: Yue Haibing <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/ioapic: Cleanup remaining coding style issues | Thomas Gleixner | 1 | -11/+12
Add missing new lines and reorder variable definitions. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/ioapic: Cleanup line breaks | Thomas Gleixner | 1 | -37/+18
80 character limit is history. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/ioapic: Cleanup bracket usage | Thomas Gleixner | 1 | -16/+18
Add brackets around if/for constructs as required by coding style or remove pointless line breaks to make it true single line statements which do not require brackets. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/ioapic: Cleanup comments | Thomas Gleixner | 1 | -49/+37
Use proper comment styles and shrink comments to their scope where applicable. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/ioapic: Move replace_pin_at_irq_node() to the call site | Thomas Gleixner | 1 | -22/+18
It's only used by check_timer(). Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/mpparse: Cleanup apic_printk()s | Thomas Gleixner | 1 | -7/+6
Use the new apic_pr_verbose() helper. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/ioapic: Cleanup guarded debug printk()s | Thomas Gleixner | 1 | -38/+29
Cleanup the APIC printk()s which are inside of a apic verbosity guarded region by using apic_dbg() for the KERN_DEBUG level prints. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/ioapic: Cleanup apic_printk()s | Thomas Gleixner | 1 | -103/+73
Replace apic_printk($LEVEL) with the corresponding apic_pr_*() helpers and use pr_info() for APIC_QUIET as that is always printed so the indirection is pointless noise. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/apic: Cleanup apic_printk()s | Thomas Gleixner | 1 | -46/+35
Use the new apic_pr_*() helpers and cleanup the apic_printk() maze. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/apic: Provide apic_printk() helpers | Thomas Gleixner | 1 | -14/+19
apic_printk() requires the APIC verbosity level and printk level which is tedious and horrible to read. Provide helpers to simplify all of that. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
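A plausible shape for these helpers; only apic_pr_verbose() and apic_dbg() are named in the surrounding changelogs, so the exact definitions are an assumption:

  #define apic_pr_verbose(fmt, args...)	apic_printk(APIC_VERBOSE, KERN_INFO, fmt, ##args)
  #define apic_pr_debug(fmt, args...)	apic_printk(APIC_DEBUG, KERN_INFO, fmt, ##args)
  #define apic_dbg(fmt, args...)	apic_printk(APIC_DEBUG, KERN_DEBUG, fmt, ##args)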
2024-08-07 | x86/ioapic: Use guard() for locking where applicable | Thomas Gleixner | 1 | -128/+64
KISS rules! Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
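The conversion pattern looks roughly like this sketch; guard() comes from linux/cleanup.h and the guarded helper name is a placeholder:

  	unsigned long flags;

  	/* Before */
  	raw_spin_lock_irqsave(&ioapic_lock, flags);
  	__ioapic_read_or_modify_entry(apic, pin);	/* placeholder for the guarded work */
  	raw_spin_unlock_irqrestore(&ioapic_lock, flags);

  	/* After: the lock is dropped automatically when the scope ends */
  	guard(raw_spinlock_irqsave)(&ioapic_lock);
  	__ioapic_read_or_modify_entry(apic, pin);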
2024-08-07 | x86/ioapic: Cleanup structs | Thomas Gleixner | 1 | -18/+14
Make them conforming to the TIP coding style guide. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/ioapic: Mark mp_alloc_timer_irq() __init | Thomas Gleixner | 1 | -2/+2
Only invoked from check_timer() which is __init too. Cleanup the variable declaration while at it. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Tested-by: Breno Leitao <[email protected]> Reviewed-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-07 | x86/ioapic: Handle allocation failures gracefully | Thomas Gleixner | 1 | -24/+22
Breno observed panics when using failslab under certain conditions during runtime:

  can not alloc irq_pin_list (-1,0,20)
  Kernel panic - not syncing: IO-APIC: failed to add irq-pin. Can not proceed
  panic+0x4e9/0x590
  mp_irqdomain_alloc+0x9ab/0xa80
  irq_domain_alloc_irqs_locked+0x25d/0x8d0
  __irq_domain_alloc_irqs+0x80/0x110
  mp_map_pin_to_irq+0x645/0x890
  acpi_register_gsi_ioapic+0xe6/0x150
  hpet_open+0x313/0x480

That's a pointless panic, which is a leftover of the historic IO/APIC code that panicked during early boot when the interrupt allocation failed. The only place which might justify a panic is the PIT/HPET timer_check() code which tries to figure out whether the timer interrupt is delivered through the IO/APIC. But that code does not need to handle interrupt allocation failures: if the interrupt cannot be allocated, then timer delivery fails and it either panics due to that or falls back to legacy mode. Cure this by removing the panic wrapper around __add_pin_to_irq_node() and making mp_irqdomain_alloc() aware of the failure condition, handling it gracefully like any other failure in this function. Reported-by: Breno Leitao <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Breno Leitao <[email protected]> Tested-by: Qiuxu Zhuo <[email protected]> Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/all/[email protected]
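A sketch of the graceful error path; __add_pin_to_irq_node() is the helper named above, and the return convention and unwind details are assumptions:

  	ret = __add_pin_to_irq_node(data, node, ioapic, pin);
  	if (ret) {			/* assuming 0 on success, negative errno on failure */
  		/* Unwind whatever mp_irqdomain_alloc() set up so far (details assumed) */
  		kfree(data);
  		return ret;		/* propagate the failure instead of panicking */
  	}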
2024-08-07 | x86/mm: Fix PTI for i386 some more | Thomas Gleixner | 1 | -16/+29
So it turns out that we have to do two passes of pti_clone_entry_text(), once before initcalls, such that device and late initcalls can use user-mode-helper / modprobe and once after free_initmem() / mark_readonly(). Now obviously mark_readonly() can cause PMD splits, and pti_clone_pgtable() doesn't like that much. Allow the late clone to split PMDs so that pagetables stay in sync. [peterz: Changelog and comments] Reported-by: Guenter Roeck <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Guenter Roeck <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-08-06 | x86/traps: Enable UBSAN traps on x86 | Gatlin Newhouse | 2 | -5/+66
Currently ARM64 extracts which specific sanitizer has caused a trap via encoded data in the trap instruction. Clang on x86 currently encodes the same data in the UD1 instruction but x86 handle_bug() and is_valid_bugaddr() currently only look at UD2. Bring x86 to parity with ARM64, similar to commit 25b84002afb9 ("arm64: Support Clang UBSAN trap codes for better reporting"). See the llvm links for information about the code generation. Enable the reporting of UBSAN sanitizer details on x86 compiled with clang when CONFIG_UBSAN_TRAP=y by analysing UD1 and retrieving the type immediate which is encoded by the compiler after the UD1. [ tglx: Simplified it by moving the printk() into handle_bug() ] Signed-off-by: Gatlin Newhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Kees Cook <[email protected]> Link: https://lore.kernel.org/all/[email protected] Link: https://github.com/llvm/llvm-project/commit/c5978f42ec8e9#diff-bb68d7cd885f41cfc35843998b0f9f534adb60b415f647109e597ce448e92d9f Link: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86InstrSystem.td#L27
2024-08-05 | perf/x86/intel/bts: Fix comment about default perf_event_paranoid setting | James Clark | 1 | -3/+0
The default paranoid setting was updated in commit 0161028b7c8a ("perf/core: Change the default paranoia level to 2") so this comment is no longer true. Signed-off-by: James Clark <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-08-05 | perf/x86/intel/uncore: Use D0:F0 as a default device | Zhenyu Wang | 1 | -0/+4
Some uncore PMON registers are located in the MMIO space of the Host Bridge and DRAM Controller device, which is located at D0:F0 for Tiger Lake and later client generation. Use D0:F0 as a default device. So it doesn't need to keep adding the complete Device ID list for each generation anymore. Signed-off-by: Zhenyu Wang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Kan Liang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-08-05 | perf/x86/intel/uncore: Add LNL uncore iMC freerunning support | Zhenyu Wang | 1 | -0/+1
The LNL uncore iMC freerunning counters are the same as on previous hardware. Signed-off-by: Zhenyu Wang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Kan Liang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-08-05 | perf/x86/intel/uncore: Add Lunar Lake support | Kan Liang | 3 | -0/+141
The uncore subsystem for Lunar Lake is similar to the previous Meteor Lake. The uncore PerfMon registers are located in both MSR and MMIO space. The ARB and iMC are kept; there is no difference from Meteor Lake. Move the global control initialization to the first box of the CBOX. The sNCU is moved to the MMIO space. The HBO is newly added and can only be accessed from the MMIO space. Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-08-05 | perf/x86/intel/uncore: Factor out common MMIO init and ops functions | Kan Liang | 1 | -17/+30
Some uncore PMON registers are located in the MMIO space. For client machines, the MMIO space is usually located at D0:F0 but in a different BAR. For example, on Lunar Lake some uncore PMON registers are located in the SAF BAR, not the MCHBAR. The current __uncore_imc_init_box() hard-codes the BAR information. Factor out uncore_get_box_mmio_addr(), which takes the BAR information as a parameter. The only change is the error output message: the hardcoded name 'MCHBAR' is replaced by the offset of the BAR. Add a new macro, MMIO_UNCORE_COMMON_OPS(), since the MMIO ops functions are usually the same among different generations. Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-08-05 | perf/x86/intel/uncore: Add Arrow Lake support | Kan Liang | 1 | -0/+3
From the perspective of the uncore PMU, the Arrow Lake is the same as the previous Meteor Lake. The only difference is the event list, which will be supported in the perf tool later. Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-08-05 | x86/mm/ident_map: Use gbpages only where full GB page should be mapped. | Steve Wahl | 1 | -5/+18
When ident_pud_init() uses only GB pages to create identity maps, large ranges of addresses not actually requested can be included in the resulting table; a 4K request will map a full GB. This can include a lot of extra address space past that requested, including areas marked reserved by the BIOS. That allows processor speculation into reserved regions, that on UV systems can cause system halts. Only use GB pages when map creation requests include the full GB page of space. Fall back to using smaller 2M pages when only portions of a GB page are included in the request. No attempt is made to coalesce mapping requests. If a request requires a map entry at the 2M (pmd) level, subsequent mapping requests within the same 1G region will also be at the pmd level, even if adjacent or overlapping such requests could have been combined to map a full GB page. Existing usage starts with larger regions and then adds smaller regions, so this should not have any great consequence. Signed-off-by: Steve Wahl <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Pavin Joseph <[email protected]> Tested-by: Sarah Brofeldt <[email protected]> Tested-by: Eric Hagberg <[email protected]> Link: https://lore.kernel.org/all/[email protected]
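The core of the change is a stricter page-size decision, roughly like this sketch; field names follow struct x86_mapping_info, but the literal hunk may differ:

  	/* Use a 1G page only if the request covers the full, aligned GB;
  	 * otherwise drop to 2M (pmd) mappings for this slot. */
  	bool use_gbpage = info->direct_gbpages &&
  			  IS_ALIGNED(addr, PUD_SIZE) &&
  			  addr + PUD_SIZE <= end;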
2024-08-05 | x86/kexec: Add EFI config table identity mapping for kexec kernel | Tao Liu | 1 | -0/+27
A kexec kernel boot failure is sometimes observed on AMD CPUs due to an unmapped EFI config table array. This can be seen when "nogbpages" is on the kernel command line, and has been observed as a full BIOS reboot rather than a successful kexec. This was also the cause of reported regressions attributed to Commit 7143c5f4cf20 ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.") which was subsequently reverted. To avoid this page fault, explicitly include the EFI config table array in the kexec identity map. Further explanation: The following 2 commits caused the EFI config table array to be accessed when enabling sev at kernel startup. commit ec1c66af3a30 ("x86/compressed/64: Detect/setup SEV/SME features earlier during boot") commit c01fce9cef84 ("x86/compressed: Add SEV-SNP feature detection/setup") This is in the code that examines whether SEV should be enabled or not, so it can even affect systems that are not SEV capable. This may result in a page fault if the EFI config table array's address is unmapped. Since the page fault occurs before the new kernel establishes its own identity map and page fault routines, it is unrecoverable and kexec fails. Most often, this problem is not seen because the EFI config table array gets included in the map by the luck of being placed at a memory address close enough to other memory areas that *are* included in the map created by kexec. Both the "nogbpages" command line option and the "use gpbages only where full GB page should be mapped" change greatly reduce the chance of being included in the map by luck, which is why the problem appears. Signed-off-by: Tao Liu <[email protected]> Signed-off-by: Steve Wahl <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Pavin Joseph <[email protected]> Tested-by: Sarah Brofeldt <[email protected]> Tested-by: Eric Hagberg <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-04 | Merge tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 8 | -17/+36
Pull x86 fixes from Thomas Gleixner:

 - Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver. A recent change in the ACPI code which consolidated code paths moved the invocation of init_freq_invariance_cppc() to a CPU hotplug handler. The first invocation on AMD CPUs ends up enabling a static branch which deadlocks because the static branch enable tries to acquire cpu_hotplug_lock, but that lock is already held write by the hotplug machinery. Use static_branch_enable_cpuslocked() instead and take the hotplug lock read for the Intel code path, which is invoked from the architecture code outside of the CPU hotplug operations.

 - Fix the number of reserved bits in the sev_config structure bit field so that the bitfield does not exceed 64 bit.

 - Add missing Zen5 model numbers.

 - Fix the alignment assumptions of pti_clone_pgtable() and clone_entry_text() on 32-bit: The code assumes PMD aligned code sections, but on 32-bit the kernel entry text is not PMD aligned. So depending on the code size and location, which is configuration and compiler dependent, entry text can cross a PMD boundary. As the start is not PMD aligned, adding the PMD size to the start address yields an address larger than the end address, which results in partially mapped entry code for user space. That causes endless recursion on the first entry from userspace (usually #PF). Cure this by aligning the start address in the addition so it ends up at the next PMD start address. clone_entry_text() enforces PMD mapping, but on 32-bit the tail might eventually be PTE mapped, which causes a mapping failure because the PMD for the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for the clone() invocation, which resolves to PTE on 32-bit and PMD on 64-bit.

 - Zero the 8-byte case for get_user() on range check failure on 32-bit. The recent consolidation of the 8-byte get_user() case broke the zeroing in the failure case again. Establish it by clearing ECX before the range check and not afterwards, as that obviously can't be reached when the range check fails.

* tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
  x86/mm: Fix pti_clone_entry_text() for i386
  x86/mm: Fix pti_clone_pgtable() alignment assumption
  x86/setup: Parse the builtin command line before merging
  x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
  x86/sev: Fix __reserved field in sev_config
  x86/aperfmperf: Fix deadlock on cpu_hotplug_lock
2024-08-04 | Merge tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -12/+15
Pull x86 perf fixes from Thomas Gleixner:

 - Move the smp_processor_id() invocation back into the non-preemptible region, so that the result is valid to use.

 - Add the missing package C2 residency counters for Sierra Forest CPUs to make the newly added support actually useful.

* tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix smp_processor_id()-in-preemptible warnings
  perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest
2024-08-02 | x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency | Michael Kelley | 1 | -0/+1
A Linux guest on Hyper-V gets the TSC frequency from a synthetic MSR, if available. In this case, set X86_FEATURE_TSC_KNOWN_FREQ so that Linux doesn't unnecessarily do refined TSC calibration when setting up the TSC clocksource. With this change, a message such as this is no longer output during boot when the TSC is used as the clocksource: [ 1.115141] tsc: Refined TSC clocksource calibration: 2918.408 MHz Furthermore, the guest and host will have exactly the same view of the TSC frequency, which is important for features such as the TSC deadline timer that are emulated by the Hyper-V host. Signed-off-by: Michael Kelley <[email protected]> Reviewed-by: Roman Kisel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]> Message-ID: <[email protected]>
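A sketch of the idea, assuming the Hyper-V feature-bit name; the actual frequency readout via the synthetic MSR is elided:

  	/* TSC frequency comes from a synthetic MSR: the calibration result is exact */
  	if (ms_hyperv.features & HV_ACCESS_FREQUENCY_MSRS)	/* feature bit assumed */
  		setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);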
2024-08-02 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 7 | -20/+24
Pull kvm updates from Paolo Bonzini:
 "The bulk of the changes here is a largish change to guest_memfd, delaying the clearing and encryption of guest-private pages until they are actually added to guest page tables. This started as "let's make it impossible to misuse the API" for SEV-SNP; but then it ballooned a bit. The new logic is generally simpler and more ready for hugepage support in guest_memfd.

  Summary:

   - fix latent bug in how usage of large pages is determined for confidential VMs

   - fix "underline too short" in docs

   - eliminate log spam from limited APIC timer periods

   - disallow pre-faulting of memory before SEV-SNP VMs are initialized

   - delay clearing and encrypting private memory until it is added to guest page tables

   - this change also enables another small cleanup: the checks in SNP_LAUNCH_UPDATE that limit it to non-populated, private pages can now be moved in the common kvm_gmem_populate() function

   - fix compilation error that the RISC-V merge introduced in selftests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: fix determination of max NPT mapping level for private pages
  KVM: riscv: selftests: Fix compile error
  KVM: guest_memfd: abstract how prepared folios are recorded
  KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns
  KVM: extend kvm_range_has_memory_attributes() to check subset of attributes
  KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes()
  KVM: guest_memfd: move check for already-populated page to common code
  KVM: remove kvm_arch_gmem_prepare_needed()
  KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm
  KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest
  KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn
  KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_*
  KVM: guest_memfd: do not go through struct page
  KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation
  KVM: guest_memfd: return folio from __kvm_gmem_get_pfn()
  KVM: x86: disallow pre-fault for SNP VMs before initialization
  KVM: Documentation: Fix title underline too short warning
  KVM: x86: Eliminate log spam from limited APIC timer periods
2024-08-02 | x86/tsc: Check for sockets instead of CPUs to make code match comment | Paul E. McKenney | 1 | -1/+1
The unsynchronized_tsc() eventually checks num_possible_cpus(), and if the system is non-Intel and the number of possible CPUs is greater than one, assumes that TSCs are unsynchronized. This despite the comment saying "assume multi socket systems are not synchronized", that is, socket rather than CPU. This behavior was preserved by commit 8fbbc4b45ce3 ("x86: merge tsc_init and clocksource code") and by the previous relevant commit 7e69f2b1ead2 ("clocksource: Remove the update callback"). The clocksource drivers were added by commit 5d0cf410e94b ("Time: i386 Clocksource Drivers") back in 2006, and the comment still said "socket" rather than "CPU". Therefore, bravely (and perhaps foolishly) make the code match the comment. Note that it is possible to bypass both code and comment by booting with tsc=reliable, but this also disables the clocksource watchdog, which is undesirable when trust in the TSC is strictly limited. Reported-by: Zhengxu Chen <[email protected]> Reported-by: Danielle Costantino <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-02 | Merge branch 'kvm-fixes' into HEAD | Paolo Bonzini | 7 | -20/+24
* fix latent bug in how usage of large pages is determined for confidential VMs
* fix "underline too short" in docs
* eliminate log spam from limited APIC timer periods
* disallow pre-faulting of memory before SEV-SNP VMs are initialized
* delay clearing and encrypting private memory until it is added to guest page tables
* this change also enables another small cleanup: the checks in SNP_LAUNCH_UPDATE that limit it to non-populated, private pages can now be moved in the common kvm_gmem_populate() function
2024-08-02 | clockevents/drivers/i8253: Fix stop sequence for timer 0 | David Woodhouse | 1 | -11/+0
According to the data sheet, writing the MODE register should stop the counter (and thus the interrupts). This appears to work on real hardware, at least modern Intel and AMD systems. It should also work on Hyper-V. However, on some buggy virtual machines the mode change doesn't have any effect until the counter is subsequently loaded (or perhaps when the IRQ next fires). So, set MODE 0 and then load the counter, to ensure that those buggy VMs do the right thing and the interrupts stop. And then write MODE 0 *again* to stop the counter on compliant implementations too. Apparently, Hyper-V keeps firing the IRQ *repeatedly* even in mode zero when it should only happen once, but the second MODE write stops that too. Userspace test program (mostly written by tglx):
=====
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/io.h>

#define BUILDIO(bwl, bw, type)						\
static __always_inline void __out##bwl(type value, uint16_t port)	\
{									\
	asm volatile("out" #bwl " %" #bw "0, %w1"			\
		     : : "a"(value), "Nd"(port));			\
}									\
									\
static __always_inline type __in##bwl(uint16_t port)			\
{									\
	type value;							\
	asm volatile("in" #bwl " %w1, %" #bw "0"			\
		     : "=a"(value) : "Nd"(port));			\
	return value;							\
}

BUILDIO(b, b, uint8_t)

#define inb __inb
#define outb __outb

#define PIT_MODE	0x43
#define PIT_CH0		0x40
#define PIT_CH2		0x42

static int is8254;

static void dump_pit(void)
{
	if (is8254) {
		// Latch and output counter and status
		outb(0xC2, PIT_MODE);
		printf("%02x %02x %02x\n", inb(PIT_CH0), inb(PIT_CH0), inb(PIT_CH0));
	} else {
		// Latch and output counter
		outb(0x0, PIT_MODE);
		printf("%02x %02x\n", inb(PIT_CH0), inb(PIT_CH0));
	}
}

int main(int argc, char* argv[])
{
	int nr_counts = 2;

	if (argc > 1)
		nr_counts = atoi(argv[1]);

	if (argc > 2)
		is8254 = 1;

	if (ioperm(0x40, 4, 1) != 0)
		return 1;

	dump_pit();

	printf("Set oneshot\n");
	outb(0x38, PIT_MODE);
	outb(0x00, PIT_CH0);
	outb(0x0F, PIT_CH0);
	dump_pit();
	usleep(1000);
	dump_pit();

	printf("Set periodic\n");
	outb(0x34, PIT_MODE);
	outb(0x00, PIT_CH0);
	outb(0x0F, PIT_CH0);
	dump_pit();
	usleep(1000);
	dump_pit();
	dump_pit();
	usleep(100000);
	dump_pit();
	usleep(100000);
	dump_pit();

	printf("Set stop (%d counter writes)\n", nr_counts);
	outb(0x30, PIT_MODE);
	while (nr_counts--)
		outb(0xFF, PIT_CH0);
	dump_pit();
	usleep(100000);
	dump_pit();
	usleep(100000);
	dump_pit();

	printf("Set MODE 0\n");
	outb(0x30, PIT_MODE);
	dump_pit();
	usleep(100000);
	dump_pit();
	usleep(100000);
	dump_pit();

	return 0;
}
=====
Suggested-by: Sean Christopherson <[email protected]> Co-developed-by: Li RongQing <[email protected]> Signed-off-by: Li RongQing <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Link: https://lore.kernel.org/all/[email protected]
2024-08-02 | x86/i8253: Disable PIT timer 0 when not in use | David Woodhouse | 1 | -2/+9
Leaving the PIT interrupt running can cause noticeable steal time for virtual guests. The VMM generally has a timer which toggles the IRQ input to the PIC and I/O APIC, which takes CPU time away from the guest. Even on real hardware, running the counter may use power needlessly (albeit not much). Make sure it's turned off if it isn't going to be used. Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michael Kelley <[email protected]> Link: https://lore.kernel.org/all/[email protected]
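Combined with the i8253 change above, the stop sequence described there amounts to roughly this sketch; the function name is illustrative and PIT_MODE/PIT_CH0 are the standard i8253 port definitions:

  static void pit_timer0_stop(void)
  {
  	outb_p(0x30, PIT_MODE);	/* channel 0, lobyte/hibyte, mode 0 */
  	outb_p(0, PIT_CH0);	/* load the counter so buggy VMMs latch mode 0 */
  	outb_p(0, PIT_CH0);
  	outb_p(0x30, PIT_MODE);	/* stop compliant implementations too */
  }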