aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2022-12-12Merge branches 'acpi-scan', 'acpi-bus', 'acpi-tables' and 'acpi-sysfs'Rafael J. Wysocki1-2/+1
Merge ACPI changes related to device enumeration, device object managenet, operation region handling, table parsing and sysfs interface: - Use ZERO_PAGE(0) instead of empty_zero_page in the ACPI device enumeration code (Giulio Benetti). - Change the return type of the ACPI driver remove callback to void and update its users accordingly (Dawei Li). - Add general support for FFH address space type and implement the low- level part of it for ARM64 (Sudeep Holla). - Fix stale comments in the ACPI tables parsing code and make it print more messages related to MADT (Hanjun Guo, Huacai Chen). - Replace invocations of generic library functions with more kernel- specific counterparts in the ACPI sysfs interface (Christophe JAILLET, Xu Panda). * acpi-scan: ACPI: scan: substitute empty_zero_page with helper ZERO_PAGE(0) * acpi-bus: ACPI: FFH: Silence missing prototype warnings ACPI: make remove callback of ACPI driver void ACPI: bus: Fix the _OSC capability check for FFH OpRegion arm64: Add architecture specific ACPI FFH Opregion callbacks ACPI: Implement a generic FFH Opregion handler * acpi-tables: ACPI: tables: Fix the stale comments for acpi_locate_initial_tables() ACPI: tables: Print CORE_PIC information when MADT is parsed * acpi-sysfs: ACPI: sysfs: use sysfs_emit() to instead of scnprintf() ACPI: sysfs: Use kstrtobool() instead of strtobool()
2022-12-11mm/sparse-vmemmap: generalise vmemmap_populate_hugepages()Feiyang Chen1-60/+32
Generalise vmemmap_populate_hugepages() so ARM64 & X86 & LoongArch can share its implementation. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Feiyang Chen <[email protected]> Signed-off-by: Huacai Chen <[email protected]> Acked-by: Will Deacon <[email protected]> Acked-by: Dave Hansen <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Guo Ren <[email protected]> Cc: Jiaxun Yang <[email protected]> Cc: Min Zhou <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Philippe Mathieu-Daudé <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Xuefeng Li <[email protected]> Cc: Xuerui Wang <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-12-10x86/mm/kmmio: Use rcu_read_lock_sched_notrace()Steven Rostedt1-3/+3
The mmiotrace tracer is "special". The purpose is to help reverse engineer binary drivers by removing the memory allocated by the driver and when the driver goes to access it, a fault occurs, the mmiotracer will record what the driver was doing and then do the work on its behalf by single stepping through the process. But to achieve this ability, it must do some special things. One is to take the rcu_read_lock() when the fault occurs, and then release it in the breakpoint that is single stepping. This makes lockdep unhappy, as it changes the state of RCU from within an exception that is not contained in that exception, and we get a nasty splat from lockdep. Instead, switch to rcu_read_lock_sched_notrace() as the RCU sched variant has the same grace period as normal RCU. This is basically the same as rcu_read_lock() but does not make lockdep complain about it. Note, the preempt_disable() is still needed as it uses preempt_enable_no_resched(). Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Karol Herbst <[email protected]> Cc: Pekka Paalanen <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-12-10x86/PCI: Use pr_info() when possibleBjorn Helgaas1-21/+18
Use pr_info() and similar when possible. No functional change intended. Link: https://lore.kernel.org/r/20221209205131.GA1726524@bhelgaas Suggested-by: Andy Shevchenko <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]>
2022-12-10x86/PCI: Fix log message typoBjorn Helgaas1-1/+1
Add missing word in the log message: - ... so future kernels can this automatically + ... so future kernels can do this automatically Suggested-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Hans de Goede <[email protected]>
2022-12-10x86/PCI: Tidy E820 removal messagesBjorn Helgaas1-2/+10
These messages: clipped [mem size 0x00000000 64bit] to [mem size 0xfffffffffffa0000 64bit] for e820 entry [mem 0x0009f000-0x000fffff] aren't as useful as they could be because (a) the resource is often IORESOURCE_UNSET, so we print the size instead of the start/end and (b) we print the available resource even if it is empty after removing the E820 entry. Print the available space by hand to avoid the IORESOURCE_UNSET problem and only if it's non-empty. No functional change intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Hans de Goede <[email protected]>
2022-12-10efi/x86: Remove EfiMemoryMappedIO from E820 mapBjorn Helgaas1-0/+46
Firmware can use EfiMemoryMappedIO to request that MMIO regions be mapped by the OS so they can be accessed by EFI runtime services, but should have no other significance to the OS (UEFI r2.10, sec 7.2). However, most bootloaders and EFI stubs convert EfiMemoryMappedIO regions to E820_TYPE_RESERVED entries, which prevent Linux from allocating space from them (see remove_e820_regions()). Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and PCI host bridge windows, which means Linux can't allocate BAR space for hot-added devices. Remove large EfiMemoryMappedIO regions from the E820 map to avoid this problem. Leave small (< 256KB) EfiMemoryMappedIO regions alone because on some platforms, these describe non-window space that's included in host bridge _CRS. If we assign that space to PCI devices, they don't work. On the Lenovo X1 Carbon, this leads to suspend/resume failures. The previous solution to the problem of allocating BARs in these regions was to add pci_crs_quirks[] entries to disable E820 checking for these machines (see d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks")): Acer DMI_PRODUCT_NAME Spin SP513-54N Clevo DMI_BOARD_NAME X170KM-G Lenovo DMI_PRODUCT_VERSION *IIL* Florent reported the BAR allocation issue on the Clevo NL4XLU. We could add another quirk for the NL4XLU, but I hope this generic change can solve it for many machines without having to add quirks. This change has been tested on Clevo X170KM-G (Konrad) and Lenovo Ideapad Slim 3 (Matt) and solves the problem even when overriding the existing quirks by booting with "pci=use_e820". Link: https://bugzilla.kernel.org/show_bug.cgi?id=216565 Clevo NL4XLU Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459#c78 Clevo X170KM-G Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Ideapad Slim 3 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2029207 X1 Carbon Link: https://lore.kernel.org/r/[email protected] Reported-by: Florent DELAHAYE <[email protected]> Tested-by: Konrad J Hambrick <[email protected]> Tested-by: Matt Hansen <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Hans de Goede <[email protected]>
2022-12-09x86/mm/kmmio: Switch to arch_spin_lock()Steven Rostedt1-9/+22
The mmiotrace tracer is "special". The purpose is to help reverse engineer binary drivers by removing the memory allocated by the driver and when the driver goes to access it, a fault occurs, the mmiotracer will record what the driver was doing and then do the work on its behalf by single stepping through the process. But to achieve this ability, it must do some special things. One is it needs to grab a lock while in the breakpoint handler. This is considered an NMI state, and then lockdep warns that the lock is being held in both an NMI state (really a breakpoint handler) and also in normal context. As the breakpoint/NMI state only happens when the driver is accessing memory, there's no concern of a race condition against the setup and tear-down of mmiotracer. To make lockdep and mmiotrace work together, convert the locks used in the breakpoint handler into arch_spin_lock(). Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/lkml/[email protected]/ Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Karol Herbst <[email protected]> Cc: Pekka Paalanen <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-12-09ftrace/x86: Add back ftrace_expected for ftrace bug reportsSteven Rostedt (Google)1-0/+2
After someone reported a bug report with a failed modification due to the expected value not matching what was found, it came to my attention that the ftrace_expected is no longer set when that happens. This makes for debugging the issue a bit more difficult. Set ftrace_expected to the expected code before calling ftrace_bug, so that it shows what was expected and why it failed. Link: https://lore.kernel.org/all/CA+wXwBQ-VhK+hpBtYtyZP-NiX4g8fqRRWithFOHQW-0coQ3vLg@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "[email protected]" <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Fixes: 768ae4406a5c ("x86/ftrace: Use text_poke()") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-12-09x86/vdso: Conditionally export __vdso_sgx_enter_enclave()Nathan Chancellor1-0/+2
Recently, ld.lld moved from '--undefined-version' to '--no-undefined-version' as the default, which breaks building the vDSO when CONFIG_X86_SGX is not set: ld.lld: error: version script assignment of 'LINUX_2.6' to symbol '__vdso_sgx_enter_enclave' failed: symbol not defined __vdso_sgx_enter_enclave is only included in the vDSO when CONFIG_X86_SGX is set. Only export it if it will be present in the final object, which clears up the error. Fixes: 8466436952017 ("x86/vdso: Implement a vDSO for Intel SGX enclave call") Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Link: https://github.com/ClangBuiltLinux/linux/issues/1756 Link: https://lore.kernel.org/r/[email protected]
2022-12-09Merge tag 'kvmarm-6.2' of ↵Paolo Bonzini2-11/+6
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.2 - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on. - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Add/Enable/Fix a bunch of selftests covering memslots, breakpoints, stage-2 faults and access tracking. You name it, we got it, we probably broke it. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. As a side effect, this tag also drags: - The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring series - A shared branch with the arm64 tree that repaints all the system registers to match the ARM ARM's naming, and resulting in interesting conflicts
2022-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-10/+26
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07Merge tag 'v6.1-rc8' into efi/nextArd Biesheuvel28-144/+194
Linux 6.1-rc8
2022-12-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+1
Pull kvm fixes from Paolo Bonzini: "Unless anything comes from the ARM side, this should be the last pull request for this release - and it's mostly documentation: - Document the interaction between KVM_CAP_HALT_POLL and halt_poll_ns - s390: fix multi-epoch extension in nested guests - x86: fix uninitialized variable on nested triple fault" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Document the interaction between KVM_CAP_HALT_POLL and halt_poll_ns KVM: Move halt-polling documentation into common directory KVM: x86: fix uninitialized variable use on KVM_REQ_TRIPLE_FAULT KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field
2022-12-05x86/apic/msi: Enable PCI/IMSThomas Gleixner1-0/+5
Enable IMS in the domain init and allocation mapping code, but do not enable it on the vector domain as discussed in various threads on LKML. The interrupt remap domains can expand this setting like they do with PCI multi MSI. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-05x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYNThomas Gleixner1-1/+1
x86 MSI irqdomains can handle MSI-X allocation post MSI-X enable just out of the box - on the vector domain and on the remapping domains, Add the feature flag to the supported feature list Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-05x86/apic/msi: Remove arch_create_remap_msi_irq_domain()Thomas Gleixner2-45/+1
and related code which is not longer required now that the interrupt remap code has been converted to MSI parent domains. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-05iommu/amd: Switch to MSI base domainsThomas Gleixner1-0/+1
Remove the global PCI/MSI irqdomain implementation and provide the required MSI parent ops so the PCI/MSI code can detect the new parent and setup per device domains. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-05iommu/vt-d: Switch to MSI parent domainsThomas Gleixner1-0/+2
Remove the global PCI/MSI irqdomain implementation and provide the required MSI parent ops so the PCI/MSI code can detect the new parent and setup per device domains. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-05x86/apic/vector: Provide MSI parent domainThomas Gleixner3-47/+136
Enable MSI parent domain support in the x86 vector domain and fixup the checks in the iommu implementations to check whether device::msi::domain is the default MSI parent domain. That keeps the existing logic to protect e.g. devices behind VMD working. The interrupt remap PCI/MSI code still works because the underlying vector domain still provides the same functionality. None of the other x86 PCI/MSI, e.g. XEN and HyperV, implementations are affected either. They still work the same way both at the low level and the PCI/MSI implementations they provide. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-05x86/microcode/intel: Do not retry microcode reloading on the APsAshok Raj1-7/+1
The retries in load_ucode_intel_ap() were in place to support systems with mixed steppings. Mixed steppings are no longer supported and there is only one microcode image at a time. Any retries will simply reattempt to apply the same image over and over without making progress. [ bp: Zap the circumstantial reasoning from the commit message. ] Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading") Signed-off-by: Ashok Raj <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2022-12-05genirq/msi: Move IRQ_DOMAIN_MSI_NOMASK_QUIRK to MSI flagsThomas Gleixner1-3/+2
It's truly a MSI only flag and for the upcoming per device MSI domains this must be in the MSI flags so it can be set during domain setup without exposing this quirk outside of x86. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-05Merge branch kvm-arm64/dirty-ring into kvmarm-master/nextMarc Zyngier2-11/+6
* kvm-arm64/dirty-ring: : . : Add support for the "per-vcpu dirty-ring tracking with a bitmap : and sprinkles on top", courtesy of Gavin Shan. : : This branch drags the kvmarm-fixes-6.1-3 tag which was already : merged in 6.1-rc4 so that the branch is in a working state. : . KVM: Push dirty information unconditionally to backup bitmap KVM: selftests: Automate choosing dirty ring size in dirty_log_test KVM: selftests: Clear dirty ring states between two modes in dirty_log_test KVM: selftests: Use host page size to map ring buffer in dirty_log_test KVM: arm64: Enable ring-based dirty memory tracking KVM: Support dirty ring in conjunction with bitmap KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL Signed-off-by: Marc Zyngier <[email protected]>
2022-12-05x86/xen: Fix memory leak in xen_init_lock_cpu()Xiu Jianfeng1-3/+3
In xen_init_lock_cpu(), the @name has allocated new string by kasprintf(), if bind_ipi_to_irqhandler() fails, it should be freed, otherwise may lead to a memory leak issue, fix it. Fixes: 2d9e1e2f58b5 ("xen: implement Xen-specific spinlocks") Signed-off-by: Xiu Jianfeng <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2022-12-05x86/xen: Fix memory leak in xen_smp_intr_init{_pv}()Xiu Jianfeng2-18/+18
These local variables @{resched|pmu|callfunc...}_name saves the new string allocated by kasprintf(), and when bind_{v}ipi_to_irqhandler() fails, it goes to the @fail tag, and calls xen_smp_intr_free{_pv}() to free resource, however the new string is not saved, which cause a memory leak issue. fix it. Fixes: 9702785a747a ("i386: move xen") Signed-off-by: Xiu Jianfeng <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2022-12-05uprobes/x86: Allow to probe a NOP instruction with 0x66 prefixOleg Nesterov1-1/+3
Intel ICC -hotpatch inserts 2-byte "0x66 0x90" NOP at the start of each function to reserve extra space for hot-patching, and currently it is not possible to probe these functions because branch_setup_xol_ops() wrongly rejects NOP with REP prefix as it treats them like word-sized branch instructions. Fixes: 250bbd12c2fe ("uprobes/x86: Refuse to attach uprobe to "word-sized" branch insns") Reported-by: Seiji Nishikawa <[email protected]> Suggested-by: Denys Vlasenko <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-05x86/mtrr: Make message for disabled MTRRs more descriptiveJuergen Gross1-1/+3
Instead of just saying "Disabled" when MTRRs are disabled for any reason, tell what is disabled and why. Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-05x86/pat: Handle TDX guest PAT initializationJuergen Gross1-1/+6
With the decoupling of PAT and MTRR initialization, PAT will be used even with MTRRs disabled. This seems to break booting up as TDX guest, as the recommended sequence to set the PAT MSR across CPUs can't work in TDX guests due to disabling caches via setting CR0.CD isn't allowed in TDX mode. This is an inconsistency in the Intel documentation between the SDM and the TDX specification. For now handle TDX mode the same way as Xen PV guest mode by just accepting the current PAT MSR setting without trying to modify it. [ bp: Align conditions for better readability. ] Signed-off-by: Juergen Gross <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-05efi: Put Linux specific magic number in the DOS headerArd Biesheuvel1-1/+2
GRUB currently relies on the magic number in the image header of ARM and arm64 EFI kernel images to decide whether or not the image in question is a bootable kernel. However, the purpose of the magic number is to identify the image as one that implements the bare metal boot protocol, and so GRUB, which only does EFI boot, is limited unnecessarily to booting images that could potentially be booted in a non-EFI manner as well. This is problematic for the new zboot decompressor image format, as it can only boot in EFI mode, and must therefore not use the bare metal boot magic number in its header. For this reason, the strict magic number was dropped from GRUB, to permit essentially any kind of EFI executable to be booted via the 'linux' command, blurring the line between the linux loader and the chainloader. So let's use the same field in the DOS header that RISC-V and arm64 already use for their 'bare metal' magic numbers to store a 'generic Linux kernel' magic number, which can be used to identify bootable kernel images in PE format which don't necessarily implement a bare metal boot protocol in the same binary. Note that, in the context of EFI, the MS-DOS header is only described in terms of the fields that it shares with the hybrid PE/COFF image format, (i.e., the MS-DOS EXE magic number at offset #0 and the PE header offset at byte offset #0x3c). Since we aim for compatibility with EFI only, and not with MS-DOS or MS-Windows, we can use the remaining space in the MS-DOS header however we want. Let's set the generic magic number for x86 images as well: existing bootloaders already have their own methods to identify x86 Linux images that can be booted in a non-EFI manner, and having the magic number in place there will ease any future transitions in loader implementations to merge the x86 and non-x86 EFI boot paths. Note that 32-bit ARM already uses the same location in the header for a different purpose, but the ARM support is already widely implemented and the EFI zboot decompressor is not available on ARM anyway, so we just disregard it here. Acked-by: Leif Lindholm <[email protected]> Reviewed-by: Daniel Kiper <[email protected]> Signed-off-by: Ard Biesheuvel <[email protected]>
2022-12-03x86/microcode/intel: Do not print microcode revision and processor flagsAshok Raj1-8/+0
collect_cpu_info() is used to collect the current microcode revision and processor flags on every CPU. It had a weird mechanism to try to mimick a "once" functionality in the sense that, that information should be issued only when it is differing from the previous CPU. However (1): the new calling sequence started doing that in parallel: microcode_init() |-> schedule_on_each_cpu(setup_online_cpu) |-> collect_cpu_info() resulting in multiple redundant prints: microcode: sig=0x50654, pf=0x80, revision=0x2006e05 microcode: sig=0x50654, pf=0x80, revision=0x2006e05 microcode: sig=0x50654, pf=0x80, revision=0x2006e05 However (2): dumping this here is not that important because the kernel does not support mixed silicon steppings microcode. Finally! Besides, there is already a pr_info() in microcode_reload_late() that shows both the old and new revisions. What is more, the CPU signature (sig=0x50654) and Processor Flags (pf=0x80) above aren't that useful to the end user, they are available via /proc/cpuinfo and they don't change anyway. Remove the redundant pr_info(). [ bp: Heavily massage. ] Fixes: b6f86689d5b7 ("x86/microcode: Rip out the subsys interface gunk") Reported-by: Tony Luck <[email protected]> Signed-off-by: Ashok Raj <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-02x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3Pawan Gupta3-9/+16
The "force" argument to write_spec_ctrl_current() is currently ambiguous as it does not guarantee the MSR write. This is due to the optimization that writes to the MSR happen only when the new value differs from the cached value. This is fine in most cases, but breaks for S3 resume when the cached MSR value gets out of sync with the hardware MSR value due to S3 resetting it. When x86_spec_ctrl_current is same as x86_spec_ctrl_base, the MSR write is skipped. Which results in SPEC_CTRL mitigations not getting restored. Move the MSR write from write_spec_ctrl_current() to a new function that unconditionally writes to the MSR. Update the callers accordingly and rename functions. [ bp: Rework a bit. ] Fixes: caa0ff24d5d0 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value") Suggested-by: Borislav Petkov <[email protected]> Signed-off-by: Pawan Gupta <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/806d39b0bfec2fe8f50dc5446dff20f5bb24a959.1669821572.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Linus Torvalds <[email protected]>
2022-12-02Merge tag 'mm-hotfixes-stable-2022-12-02' of ↵Linus Torvalds1-0/+9
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "15 hotfixes, 11 marked cc:stable. Only three or four of the latter address post-6.0 issues, which is hopefully a sign that things are converging" * tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible" Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths mm/khugepaged: fix GUP-fast interaction by sending IPI mm/khugepaged: take the right locks for page table retraction mm: migrate: fix THP's mapcount on isolation mm: introduce arch_has_hw_nonleaf_pmd_young() mm: add dummy pmd_young() for architectures not having it mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes() tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep" nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry() hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing madvise: use zap_page_range_single for madvise dontneed mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
2022-12-02Merge branch 'gpc-fixes' of git://git.infradead.org/users/dwmw2/linux into HEADPaolo Bonzini3-66/+85
Pull Xen-for-KVM changes from David Woodhouse: * add support for 32-bit guests in SCHEDOP_poll * the rest of the gfn-to-pfn cache API cleanup "I still haven't reinstated the last of those patches to make gpc->len immutable." Signed-off-by: Paolo Bonzini <[email protected]>
2022-12-02KVM: x86: Advertise that the SMM_CTL MSR is not supportedJim Mattson1-0/+4
CPUID.80000021H:EAX[bit 9] indicates that the SMM_CTL MSR (0xc0010116) is not supported. This defeature can be advertised by KVM_GET_SUPPORTED_CPUID regardless of whether or not the host enumerates it; currently it will be included only if the host enumerates at least leaf 8000001DH, due to a preexisting bug in QEMU that KVM has to work around (commit f751d8eac176, "KVM: x86: work around QEMU issue with synthetic CPUID leaves", 2022-04-29). Signed-off-by: Jim Mattson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-12-02KVM: x86: remove unnecessary exportsPaolo Bonzini4-14/+0
Several symbols are not used by vendor modules but still exported. Removing them ensures that new coupling between kvm.ko and kvm-*.ko is noticed and reviewed. Co-developed-by: Sean Christopherson <[email protected]> Co-developed-by: Like Xu <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Like Xu <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-12-02KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itselfYuan ZhaoXiong1-2/+3
When a VM reboots itself, the reset process will result in an ioctl(KVM_SET_LAPIC, ...) to disable x2APIC mode and set the xAPIC id of the vCPU to its default value, which is the vCPU id. That will be handled in KVM as follows: kvm_vcpu_ioctl_set_lapic kvm_apic_set_state kvm_lapic_set_base => disable X2APIC mode kvm_apic_state_fixup kvm_lapic_xapic_id_updated kvm_xapic_id(apic) != apic->vcpu->vcpu_id kvm_set_apicv_inhibit(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s)) => update APIC_ID When kvm_apic_set_state invokes kvm_lapic_set_base to disable x2APIC mode, the old 32-bit x2APIC id is still present rather than the 8-bit xAPIC id. kvm_lapic_xapic_id_updated will set the APICV_INHIBIT_REASON_APIC_ID_MODIFIED bit and disable APICv/x2AVIC. Instead, kvm_lapic_xapic_id_updated must be called after APIC_ID is changed. In fact, this fixes another small issue in the code in that potential changes to a vCPU's xAPIC ID need not be tracked for KVM_GET_LAPIC. Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") Signed-off-by: Yuan ZhaoXiong <[email protected]> Message-Id: <[email protected]> Cc: [email protected] Reported-by: Alejandro Jimenez <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-12-02Merge tag 'kvm-x86-fixes-6.2-1' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini10-51/+161
Misc KVM x86 fixes and cleanups for 6.2: - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. - Clean up the MSR filter docs. - Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. - Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. - Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency.
2022-12-02KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctlJavier Martinez Canillas1-8/+0
The documentation says that the ioctl has been deprecated, but it has been actually removed and the remaining references are just left overs. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Javier Martinez Canillas <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-12-02x86/sgx: Replace kmap/kunmap_atomic() callsKristen Carlson Accardi3-12/+12
kmap_local_page() is the preferred way to create temporary mappings when it is feasible, because the mappings are thread-local and CPU-local. kmap_local_page() uses per-task maps rather than per-CPU maps. This in effect removes the need to disable preemption on the local CPU while the mapping is active, and thus vastly reduces overall system latency. It is also valid to take pagefaults within the mapped region. The use of kmap_atomic() in the SGX code was not an explicit design choice to disable page faults or preemption, and there is no compelling design reason to using kmap_atomic() vs. kmap_local_page(). Signed-off-by: Kristen Carlson Accardi <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Reviewed-by: Fabio M. De Francesco <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Link: https://lore.kernel.org/linux-sgx/Y0biN3%[email protected]/ Link: https://lore.kernel.org/r/[email protected]
2022-12-02x86/of: Add support for boot time interrupt delivery mode configurationRahul Tanwar1-1/+8
Presently, init/boot time interrupt delivery mode is enumerated only for ACPI enabled systems by parsing MADT table or for older systems by parsing MP table. But for OF based x86 systems, it is assumed & hardcoded to be legacy PIC mode. This causes a boot time crash for platforms which do not provide a 8259 compliant legacy PIC. Add support for configuration of init time interrupt delivery mode for x86 OF based systems by introducing a new optional boolean property 'intel,virtual-wire-mode' for the local APIC interrupt-controller node. This property emulates IMCRP Bit 7 of MP feature info byte 2 of MP floating pointer structure. Defaults to legacy PIC mode if absent. Configures it to virtual wire compatibility mode if present. Signed-off-by: Rahul Tanwar <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-02x86/of: Replace printk(KERN_LVL) with pr_lvl()Rahul Tanwar1-2/+2
Use pr_lvl() instead of the deprecated printk(KERN_LVL). Just a upgrade of print utilities usage. no functional changes. Suggested-by: Andy Shevchenko <[email protected]> Signed-off-by: Rahul Tanwar <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-02x86/of: Remove unused early_init_dt_add_memory_arch()Andy Shevchenko1-5/+0
Recently objtool started complaining about dead code in the object files, in particular vmlinux.o: warning: objtool: early_init_dt_scan_memory+0x191: unreachable instruction when CONFIG_OF=y. Indeed, early_init_dt_scan() is not used on x86 and making it compile (with help of CONFIG_OF) will abrupt the code flow since in the middle of it there is a BUG() instruction. Remove the pointless function. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-02x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOSMateusz Jończyk3-9/+11
A kernel that was compiled without CONFIG_X86_X2APIC was unable to boot on platforms that have x2APIC already enabled in the BIOS before starting the kernel. The kernel was supposed to panic with an approprite error message in validate_x2apic() due to the missing X2APIC support. However, validate_x2apic() was run too late in the boot cycle, and the kernel tried to initialize the APIC nonetheless. This resulted in an earlier panic in setup_local_APIC() because the APIC was not registered. In my experiments, a panic message in setup_local_APIC() was not visible in the graphical console, which resulted in a hang with no indication what has gone wrong. Instead of calling panic(), disable the APIC, which results in a somewhat working system with the PIC only (and no SMP). This way the user is able to diagnose the problem more easily. Disabling X2APIC mode is not an option because it's impossible on systems with locked x2APIC. The proper place to disable the APIC in this case is in check_x2apic(), which is called early from setup_arch(). Doing this in __apic_intr_mode_select() is too late. Make check_x2apic() unconditionally available and remove the empty stub. Reported-by: Paul Menzel <[email protected]> Reported-by: Robert Elliott (Servers) <[email protected]> Signed-off-by: Mateusz Jończyk <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Link: https://lore.kernel.org/lkml/[email protected] Link: https://lore.kernel.org/all/[email protected]
2022-12-02x86/asm/32: Remove setup_once()Brian Gerst1-22/+0
After the removal of the stack canary segment setup code, this function does nothing. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-02x86/alternative: Remove noinline from __ibt_endbr_seal[_end]() stubsMiaohe Lin1-1/+1
Due to the explicit 'noinline' GCC-7.3 is not able to optimize away the argument setup of: apply_ibt_endbr(__ibt_endbr_seal, __ibt_enbr_seal_end); even when X86_KERNEL_IBT=n and the function is an empty stub, which leads to link errors due to missing __ibt_endbr_seal* symbols: ld: arch/x86/kernel/alternative.o: in function `alternative_instructions': alternative.c:(.init.text+0x15d): undefined reference to `__ibt_endbr_seal_end' ld: alternative.c:(.init.text+0x164): undefined reference to `__ibt_endbr_seal' Remove the explicit 'noinline' to help gcc optimize them away. Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-02crypto: Prepare to move crypto_tfm_ctxHerbert Xu1-1/+1
The helper crypto_tfm_ctx is only used by the Crypto API algorithm code and should really be in algapi.h. However, for historical reasons many files relied on it to be in crypto.h. This patch changes those files to use algapi.h instead in prepartion for a move. Signed-off-by: Herbert Xu <[email protected]>
2022-12-02crypto: x86/curve25519 - disable gcovJoe Fradley1-0/+3
curve25519-x86_64.c fails to build when CONFIG_GCOV_KERNEL is enabled. The error is "inline assembly requires more registers than available" thrown from the `fsqr()` function. Therefore, excluding this file from GCOV profiling until this issue is resolved. Thereby allowing CONFIG_GCOV_PROFILE_ALL to be enabled for x86. Signed-off-by: Joe Fradley <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2022-12-01mm/pgtable: Fix multiple -Wstringop-overflow warningsGustavo A. R. Silva1-9/+13
The actual size of the following arrays at run-time depends on CONFIG_X86_PAE. 427 pmd_t *u_pmds[MAX_PREALLOCATED_USER_PMDS]; 428 pmd_t *pmds[MAX_PREALLOCATED_PMDS]; If CONFIG_X86_PAE is not enabled, their final size will be zero (which is technically not a legal storage size in C, but remains "valid" via the GNU extension). In that case, the compiler complains about trying to access objects of size zero when calling functions where these objects are passed as arguments. Fix this by sanity-checking the size of those arrays just before the function calls. Also, the following warnings are fixed by these changes when building with GCC 11+ and -Wstringop-overflow enabled: arch/x86/mm/pgtable.c:437:13: warning: ‘preallocate_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=] arch/x86/mm/pgtable.c:440:13: warning: ‘preallocate_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=] arch/x86/mm/pgtable.c:462:9: warning: ‘free_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=] arch/x86/mm/pgtable.c:455:9: warning: ‘pgd_prepopulate_user_pmd’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=] arch/x86/mm/pgtable.c:464:9: warning: ‘free_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=] This is one of the last cases in the ongoing effort to globally enable -Wstringop-overflow. The alternative to this is to make the originally suggested change: make the pmds argument from an array pointer to a pointer pointer. That situation is considered "legal" for C in the sense that it does not have a way to reason about the storage. i.e.: -static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[]) +static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t **pmds) With the above change, there's no difference in binary output, and the compiler warning is silenced. However, with this patch, the compiler can actually figure out that it isn't using the code at all, and it gets dropped: text data bss dec hex filename 8218 718 32 8968 2308 arch/x86/mm/pgtable.o.before 7765 694 32 8491 212b arch/x86/mm/pgtable.o.after So this case (fixing a warning and reducing image size) is a clear win. Additionally drops an old work-around for GCC in the same code. Link: https://github.com/KSPP/linux/issues/203 Link: https://github.com/KSPP/linux/issues/181 Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/Yytb67xvrnctxnEe@work
2022-12-01vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copyJann Horn1-23/+0
find_timens_vvar_page() is not architecture-specific, as can be seen from how all five per-architecture versions of it are the same. (arm64, powerpc and riscv are exactly the same; x86 and s390 have two characters difference inside a comment, less blank lines, and mark the !CONFIG_TIME_NS version as inline.) Refactor the five copies into a central copy in kernel/time/namespace.c. Signed-off-by: Jann Horn <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-30KVM: x86: Use current rather than snapshotted TSC frequency if it is constantAnton Romanov1-11/+33
Don't snapshot tsc_khz into per-cpu cpu_tsc_khz if the host TSC is constant, in which case the actual TSC frequency will never change and thus capturing TSC during initialization is unnecessary, KVM can simply use tsc_khz. This value is snapshotted from kvm_timer_init->kvmclock_cpu_online->tsc_khz_changed(NULL) On CPUs with constant TSC, but not a hardware-specified TSC frequency, snapshotting cpu_tsc_khz and using that to set a VM's target TSC frequency can lead to VM to think its TSC frequency is not what it actually is if refining the TSC completes after KVM snapshots tsc_khz. The actual frequency never changes, only the kernel's calculation of what that frequency is changes. Ideally, KVM would not be able to race with TSC refinement, or would have a hook into tsc_refine_calibration_work() to get an alert when refinement is complete. Avoiding the race altogether isn't practical as refinement takes a relative eternity; it's deliberately put on a work queue outside of the normal boot sequence to avoid unnecessarily delaying boot. Adding a hook is doable, but somewhat gross due to KVM's ability to be built as a module. And if the TSC is constant, which is likely the case for every VMX/SVM-capable CPU produced in the last decade, the race can be hit if and only if userspace is able to create a VM before TSC refinement completes; refinement is slow, but not that slow. For now, punt on a proper fix, as not taking a snapshot can help some uses cases and not taking a snapshot is arguably correct irrespective of the race with refinement. Signed-off-by: Anton Romanov <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>