Age | Commit message (Collapse) | Author | Files | Lines |
|
Merge ACPI changes related to device enumeration, device object
managenet, operation region handling, table parsing and sysfs
interface:
- Use ZERO_PAGE(0) instead of empty_zero_page in the ACPI device
enumeration code (Giulio Benetti).
- Change the return type of the ACPI driver remove callback to void and
update its users accordingly (Dawei Li).
- Add general support for FFH address space type and implement the low-
level part of it for ARM64 (Sudeep Holla).
- Fix stale comments in the ACPI tables parsing code and make it print
more messages related to MADT (Hanjun Guo, Huacai Chen).
- Replace invocations of generic library functions with more kernel-
specific counterparts in the ACPI sysfs interface (Christophe JAILLET,
Xu Panda).
* acpi-scan:
ACPI: scan: substitute empty_zero_page with helper ZERO_PAGE(0)
* acpi-bus:
ACPI: FFH: Silence missing prototype warnings
ACPI: make remove callback of ACPI driver void
ACPI: bus: Fix the _OSC capability check for FFH OpRegion
arm64: Add architecture specific ACPI FFH Opregion callbacks
ACPI: Implement a generic FFH Opregion handler
* acpi-tables:
ACPI: tables: Fix the stale comments for acpi_locate_initial_tables()
ACPI: tables: Print CORE_PIC information when MADT is parsed
* acpi-sysfs:
ACPI: sysfs: use sysfs_emit() to instead of scnprintf()
ACPI: sysfs: Use kstrtobool() instead of strtobool()
|
|
Generalise vmemmap_populate_hugepages() so ARM64 & X86 & LoongArch can
share its implementation.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Feiyang Chen <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dinh Nguyen <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: Min Zhou <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Philippe Mathieu-Daudé <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Xuefeng Li <[email protected]>
Cc: Xuerui Wang <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The mmiotrace tracer is "special". The purpose is to help reverse engineer
binary drivers by removing the memory allocated by the driver and when the
driver goes to access it, a fault occurs, the mmiotracer will record what
the driver was doing and then do the work on its behalf by single stepping
through the process.
But to achieve this ability, it must do some special things. One is to
take the rcu_read_lock() when the fault occurs, and then release it in the
breakpoint that is single stepping. This makes lockdep unhappy, as it
changes the state of RCU from within an exception that is not contained in
that exception, and we get a nasty splat from lockdep.
Instead, switch to rcu_read_lock_sched_notrace() as the RCU sched variant
has the same grace period as normal RCU. This is basically the same as
rcu_read_lock() but does not make lockdep complain about it.
Note, the preempt_disable() is still needed as it uses preempt_enable_no_resched().
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Karol Herbst <[email protected]>
Cc: Pekka Paalanen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Use pr_info() and similar when possible. No functional change intended.
Link: https://lore.kernel.org/r/20221209205131.GA1726524@bhelgaas
Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
|
|
Add missing word in the log message:
- ... so future kernels can this automatically
+ ... so future kernels can do this automatically
Suggested-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Hans de Goede <[email protected]>
|
|
These messages:
clipped [mem size 0x00000000 64bit] to [mem size 0xfffffffffffa0000 64bit] for e820 entry [mem 0x0009f000-0x000fffff]
aren't as useful as they could be because (a) the resource is often
IORESOURCE_UNSET, so we print the size instead of the start/end and (b) we
print the available resource even if it is empty after removing the E820
entry.
Print the available space by hand to avoid the IORESOURCE_UNSET problem and
only if it's non-empty. No functional change intended.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Hans de Goede <[email protected]>
|
|
Firmware can use EfiMemoryMappedIO to request that MMIO regions be mapped
by the OS so they can be accessed by EFI runtime services, but should have
no other significance to the OS (UEFI r2.10, sec 7.2). However, most
bootloaders and EFI stubs convert EfiMemoryMappedIO regions to
E820_TYPE_RESERVED entries, which prevent Linux from allocating space from
them (see remove_e820_regions()).
Some platforms use EfiMemoryMappedIO entries for PCI MMCONFIG space and PCI
host bridge windows, which means Linux can't allocate BAR space for
hot-added devices.
Remove large EfiMemoryMappedIO regions from the E820 map to avoid this
problem.
Leave small (< 256KB) EfiMemoryMappedIO regions alone because on some
platforms, these describe non-window space that's included in host bridge
_CRS. If we assign that space to PCI devices, they don't work. On the
Lenovo X1 Carbon, this leads to suspend/resume failures.
The previous solution to the problem of allocating BARs in these regions
was to add pci_crs_quirks[] entries to disable E820 checking for these
machines (see d341838d776a ("x86/PCI: Disable E820 reserved region clipping
via quirks")):
Acer DMI_PRODUCT_NAME Spin SP513-54N
Clevo DMI_BOARD_NAME X170KM-G
Lenovo DMI_PRODUCT_VERSION *IIL*
Florent reported the BAR allocation issue on the Clevo NL4XLU. We could
add another quirk for the NL4XLU, but I hope this generic change can solve
it for many machines without having to add quirks.
This change has been tested on Clevo X170KM-G (Konrad) and Lenovo Ideapad
Slim 3 (Matt) and solves the problem even when overriding the existing
quirks by booting with "pci=use_e820".
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216565 Clevo NL4XLU
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206459#c78 Clevo X170KM-G
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1868899 Ideapad Slim 3
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2029207 X1 Carbon
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Florent DELAHAYE <[email protected]>
Tested-by: Konrad J Hambrick <[email protected]>
Tested-by: Matt Hansen <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Hans de Goede <[email protected]>
|
|
The mmiotrace tracer is "special". The purpose is to help reverse engineer
binary drivers by removing the memory allocated by the driver and when the
driver goes to access it, a fault occurs, the mmiotracer will record what
the driver was doing and then do the work on its behalf by single stepping
through the process.
But to achieve this ability, it must do some special things. One is it
needs to grab a lock while in the breakpoint handler. This is considered
an NMI state, and then lockdep warns that the lock is being held in both
an NMI state (really a breakpoint handler) and also in normal context.
As the breakpoint/NMI state only happens when the driver is accessing
memory, there's no concern of a race condition against the setup and
tear-down of mmiotracer.
To make lockdep and mmiotrace work together, convert the locks used in the
breakpoint handler into arch_spin_lock().
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]/
Cc: Masami Hiramatsu <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Karol Herbst <[email protected]>
Cc: Pekka Paalanen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
After someone reported a bug report with a failed modification due to the
expected value not matching what was found, it came to my attention that
the ftrace_expected is no longer set when that happens. This makes for
debugging the issue a bit more difficult.
Set ftrace_expected to the expected code before calling ftrace_bug, so
that it shows what was expected and why it failed.
Link: https://lore.kernel.org/all/CA+wXwBQ-VhK+hpBtYtyZP-NiX4g8fqRRWithFOHQW-0coQ3vLg@mail.gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Fixes: 768ae4406a5c ("x86/ftrace: Use text_poke()")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Recently, ld.lld moved from '--undefined-version' to
'--no-undefined-version' as the default, which breaks building the vDSO
when CONFIG_X86_SGX is not set:
ld.lld: error: version script assignment of 'LINUX_2.6' to symbol '__vdso_sgx_enter_enclave' failed: symbol not defined
__vdso_sgx_enter_enclave is only included in the vDSO when
CONFIG_X86_SGX is set. Only export it if it will be present in the final
object, which clears up the error.
Fixes: 8466436952017 ("x86/vdso: Implement a vDSO for Intel SGX enclave call")
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Link: https://github.com/ClangBuiltLinux/linux/issues/1756
Link: https://lore.kernel.org/r/[email protected]
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.2
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on.
- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
to have its own view of a vcpu, keeping that state private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
only) as a prefix of the oncoming support for 4kB and 16kB pages.
- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
stage-2 faults and access tracking. You name it, we got it, we
probably broke it.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
As a side effect, this tag also drags:
- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
series
- A shared branch with the arm64 tree that repaints all the system
registers to match the ARM ARM's naming, and resulting in
interesting conflicts
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Linux 6.1-rc8
|
|
Pull kvm fixes from Paolo Bonzini:
"Unless anything comes from the ARM side, this should be the last pull
request for this release - and it's mostly documentation:
- Document the interaction between KVM_CAP_HALT_POLL and halt_poll_ns
- s390: fix multi-epoch extension in nested guests
- x86: fix uninitialized variable on nested triple fault"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Document the interaction between KVM_CAP_HALT_POLL and halt_poll_ns
KVM: Move halt-polling documentation into common directory
KVM: x86: fix uninitialized variable use on KVM_REQ_TRIPLE_FAULT
KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field
|
|
Enable IMS in the domain init and allocation mapping code, but do not
enable it on the vector domain as discussed in various threads on LKML.
The interrupt remap domains can expand this setting like they do with
PCI multi MSI.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
x86 MSI irqdomains can handle MSI-X allocation post MSI-X enable just out
of the box - on the vector domain and on the remapping domains,
Add the feature flag to the supported feature list
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
and related code which is not longer required now that the interrupt remap
code has been converted to MSI parent domains.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and setup per
device domains.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and setup per
device domains.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Enable MSI parent domain support in the x86 vector domain and fixup the
checks in the iommu implementations to check whether device::msi::domain is
the default MSI parent domain. That keeps the existing logic to protect
e.g. devices behind VMD working.
The interrupt remap PCI/MSI code still works because the underlying vector
domain still provides the same functionality.
None of the other x86 PCI/MSI, e.g. XEN and HyperV, implementations are
affected either. They still work the same way both at the low level and the
PCI/MSI implementations they provide.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The retries in load_ucode_intel_ap() were in place to support systems
with mixed steppings. Mixed steppings are no longer supported and there is
only one microcode image at a time. Any retries will simply reattempt to
apply the same image over and over without making progress.
[ bp: Zap the circumstantial reasoning from the commit message. ]
Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
It's truly a MSI only flag and for the upcoming per device MSI domains this
must be in the MSI flags so it can be set during domain setup without
exposing this quirk outside of x86.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
* kvm-arm64/dirty-ring:
: .
: Add support for the "per-vcpu dirty-ring tracking with a bitmap
: and sprinkles on top", courtesy of Gavin Shan.
:
: This branch drags the kvmarm-fixes-6.1-3 tag which was already
: merged in 6.1-rc4 so that the branch is in a working state.
: .
KVM: Push dirty information unconditionally to backup bitmap
KVM: selftests: Automate choosing dirty ring size in dirty_log_test
KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
KVM: selftests: Use host page size to map ring buffer in dirty_log_test
KVM: arm64: Enable ring-based dirty memory tracking
KVM: Support dirty ring in conjunction with bitmap
KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
Signed-off-by: Marc Zyngier <[email protected]>
|
|
In xen_init_lock_cpu(), the @name has allocated new string by kasprintf(),
if bind_ipi_to_irqhandler() fails, it should be freed, otherwise may lead
to a memory leak issue, fix it.
Fixes: 2d9e1e2f58b5 ("xen: implement Xen-specific spinlocks")
Signed-off-by: Xiu Jianfeng <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
These local variables @{resched|pmu|callfunc...}_name saves the new
string allocated by kasprintf(), and when bind_{v}ipi_to_irqhandler()
fails, it goes to the @fail tag, and calls xen_smp_intr_free{_pv}() to
free resource, however the new string is not saved, which cause a memory
leak issue. fix it.
Fixes: 9702785a747a ("i386: move xen")
Signed-off-by: Xiu Jianfeng <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
Intel ICC -hotpatch inserts 2-byte "0x66 0x90" NOP at the start of each
function to reserve extra space for hot-patching, and currently it is not
possible to probe these functions because branch_setup_xol_ops() wrongly
rejects NOP with REP prefix as it treats them like word-sized branch
instructions.
Fixes: 250bbd12c2fe ("uprobes/x86: Refuse to attach uprobe to "word-sized" branch insns")
Reported-by: Seiji Nishikawa <[email protected]>
Suggested-by: Denys Vlasenko <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Instead of just saying "Disabled" when MTRRs are disabled for any
reason, tell what is disabled and why.
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
With the decoupling of PAT and MTRR initialization, PAT will be used
even with MTRRs disabled. This seems to break booting up as TDX guest,
as the recommended sequence to set the PAT MSR across CPUs can't work
in TDX guests due to disabling caches via setting CR0.CD isn't allowed
in TDX mode.
This is an inconsistency in the Intel documentation between the SDM
and the TDX specification. For now handle TDX mode the same way as Xen
PV guest mode by just accepting the current PAT MSR setting without
trying to modify it.
[ bp: Align conditions for better readability. ]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
GRUB currently relies on the magic number in the image header of ARM and
arm64 EFI kernel images to decide whether or not the image in question
is a bootable kernel.
However, the purpose of the magic number is to identify the image as one
that implements the bare metal boot protocol, and so GRUB, which only
does EFI boot, is limited unnecessarily to booting images that could
potentially be booted in a non-EFI manner as well.
This is problematic for the new zboot decompressor image format, as it
can only boot in EFI mode, and must therefore not use the bare metal
boot magic number in its header.
For this reason, the strict magic number was dropped from GRUB, to
permit essentially any kind of EFI executable to be booted via the
'linux' command, blurring the line between the linux loader and the
chainloader.
So let's use the same field in the DOS header that RISC-V and arm64
already use for their 'bare metal' magic numbers to store a 'generic
Linux kernel' magic number, which can be used to identify bootable
kernel images in PE format which don't necessarily implement a bare
metal boot protocol in the same binary. Note that, in the context of
EFI, the MS-DOS header is only described in terms of the fields that it
shares with the hybrid PE/COFF image format, (i.e., the MS-DOS EXE magic
number at offset #0 and the PE header offset at byte offset #0x3c).
Since we aim for compatibility with EFI only, and not with MS-DOS or
MS-Windows, we can use the remaining space in the MS-DOS header however
we want.
Let's set the generic magic number for x86 images as well: existing
bootloaders already have their own methods to identify x86 Linux images
that can be booted in a non-EFI manner, and having the magic number in
place there will ease any future transitions in loader implementations
to merge the x86 and non-x86 EFI boot paths.
Note that 32-bit ARM already uses the same location in the header for a
different purpose, but the ARM support is already widely implemented and
the EFI zboot decompressor is not available on ARM anyway, so we just
disregard it here.
Acked-by: Leif Lindholm <[email protected]>
Reviewed-by: Daniel Kiper <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
collect_cpu_info() is used to collect the current microcode revision and
processor flags on every CPU.
It had a weird mechanism to try to mimick a "once" functionality in the
sense that, that information should be issued only when it is differing
from the previous CPU.
However (1):
the new calling sequence started doing that in parallel:
microcode_init()
|-> schedule_on_each_cpu(setup_online_cpu)
|-> collect_cpu_info()
resulting in multiple redundant prints:
microcode: sig=0x50654, pf=0x80, revision=0x2006e05
microcode: sig=0x50654, pf=0x80, revision=0x2006e05
microcode: sig=0x50654, pf=0x80, revision=0x2006e05
However (2):
dumping this here is not that important because the kernel does not
support mixed silicon steppings microcode. Finally!
Besides, there is already a pr_info() in microcode_reload_late() that
shows both the old and new revisions.
What is more, the CPU signature (sig=0x50654) and Processor Flags
(pf=0x80) above aren't that useful to the end user, they are available
via /proc/cpuinfo and they don't change anyway.
Remove the redundant pr_info().
[ bp: Heavily massage. ]
Fixes: b6f86689d5b7 ("x86/microcode: Rip out the subsys interface gunk")
Reported-by: Tony Luck <[email protected]>
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The "force" argument to write_spec_ctrl_current() is currently ambiguous
as it does not guarantee the MSR write. This is due to the optimization
that writes to the MSR happen only when the new value differs from the
cached value.
This is fine in most cases, but breaks for S3 resume when the cached MSR
value gets out of sync with the hardware MSR value due to S3 resetting
it.
When x86_spec_ctrl_current is same as x86_spec_ctrl_base, the MSR write
is skipped. Which results in SPEC_CTRL mitigations not getting restored.
Move the MSR write from write_spec_ctrl_current() to a new function that
unconditionally writes to the MSR. Update the callers accordingly and
rename functions.
[ bp: Rework a bit. ]
Fixes: caa0ff24d5d0 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value")
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/806d39b0bfec2fe8f50dc5446dff20f5bb24a959.1669821572.git.pawan.kumar.gupta@linux.intel.com
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"15 hotfixes, 11 marked cc:stable.
Only three or four of the latter address post-6.0 issues, which is
hopefully a sign that things are converging"
* tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"
Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
mm/khugepaged: fix GUP-fast interaction by sending IPI
mm/khugepaged: take the right locks for page table retraction
mm: migrate: fix THP's mapcount on isolation
mm: introduce arch_has_hw_nonleaf_pmd_young()
mm: add dummy pmd_young() for architectures not having it
mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
madvise: use zap_page_range_single for madvise dontneed
mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
|
|
Pull Xen-for-KVM changes from David Woodhouse:
* add support for 32-bit guests in SCHEDOP_poll
* the rest of the gfn-to-pfn cache API cleanup
"I still haven't reinstated the last of those patches to make gpc->len
immutable."
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
CPUID.80000021H:EAX[bit 9] indicates that the SMM_CTL MSR (0xc0010116) is
not supported. This defeature can be advertised by KVM_GET_SUPPORTED_CPUID
regardless of whether or not the host enumerates it; currently it will be
included only if the host enumerates at least leaf 8000001DH, due to a
preexisting bug in QEMU that KVM has to work around (commit f751d8eac176,
"KVM: x86: work around QEMU issue with synthetic CPUID leaves", 2022-04-29).
Signed-off-by: Jim Mattson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Several symbols are not used by vendor modules but still exported.
Removing them ensures that new coupling between kvm.ko and kvm-*.ko
is noticed and reviewed.
Co-developed-by: Sean Christopherson <[email protected]>
Co-developed-by: Like Xu <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When a VM reboots itself, the reset process will result in
an ioctl(KVM_SET_LAPIC, ...) to disable x2APIC mode and set
the xAPIC id of the vCPU to its default value, which is the
vCPU id.
That will be handled in KVM as follows:
kvm_vcpu_ioctl_set_lapic
kvm_apic_set_state
kvm_lapic_set_base => disable X2APIC mode
kvm_apic_state_fixup
kvm_lapic_xapic_id_updated
kvm_xapic_id(apic) != apic->vcpu->vcpu_id
kvm_set_apicv_inhibit(APICV_INHIBIT_REASON_APIC_ID_MODIFIED)
memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s)) => update APIC_ID
When kvm_apic_set_state invokes kvm_lapic_set_base to disable
x2APIC mode, the old 32-bit x2APIC id is still present rather
than the 8-bit xAPIC id. kvm_lapic_xapic_id_updated will set the
APICV_INHIBIT_REASON_APIC_ID_MODIFIED bit and disable APICv/x2AVIC.
Instead, kvm_lapic_xapic_id_updated must be called after APIC_ID is
changed.
In fact, this fixes another small issue in the code in that
potential changes to a vCPU's xAPIC ID need not be tracked for
KVM_GET_LAPIC.
Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
Signed-off-by: Yuan ZhaoXiong <[email protected]>
Message-Id: <[email protected]>
Cc: [email protected]
Reported-by: Alejandro Jimenez <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Misc KVM x86 fixes and cleanups for 6.2:
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
- Clean up the MSR filter docs.
- Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
|
|
The documentation says that the ioctl has been deprecated, but it has been
actually removed and the remaining references are just left overs.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Javier Martinez Canillas <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
kmap_local_page() is the preferred way to create temporary mappings when it
is feasible, because the mappings are thread-local and CPU-local.
kmap_local_page() uses per-task maps rather than per-CPU maps. This in
effect removes the need to disable preemption on the local CPU while the
mapping is active, and thus vastly reduces overall system latency. It is
also valid to take pagefaults within the mapped region.
The use of kmap_atomic() in the SGX code was not an explicit design choice
to disable page faults or preemption, and there is no compelling design
reason to using kmap_atomic() vs. kmap_local_page().
Signed-off-by: Kristen Carlson Accardi <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Reviewed-by: Fabio M. De Francesco <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Link: https://lore.kernel.org/linux-sgx/Y0biN3%[email protected]/
Link: https://lore.kernel.org/r/[email protected]
|
|
Presently, init/boot time interrupt delivery mode is enumerated only for
ACPI enabled systems by parsing MADT table or for older systems by parsing
MP table. But for OF based x86 systems, it is assumed & hardcoded to be
legacy PIC mode. This causes a boot time crash for platforms which do not
provide a 8259 compliant legacy PIC.
Add support for configuration of init time interrupt delivery mode for x86
OF based systems by introducing a new optional boolean property
'intel,virtual-wire-mode' for the local APIC interrupt-controller
node. This property emulates IMCRP Bit 7 of MP feature info byte 2 of MP
floating pointer structure.
Defaults to legacy PIC mode if absent. Configures it to virtual wire
compatibility mode if present.
Signed-off-by: Rahul Tanwar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Use pr_lvl() instead of the deprecated printk(KERN_LVL).
Just a upgrade of print utilities usage. no functional changes.
Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Rahul Tanwar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Recently objtool started complaining about dead code in the object files,
in particular
vmlinux.o: warning: objtool: early_init_dt_scan_memory+0x191: unreachable instruction
when CONFIG_OF=y.
Indeed, early_init_dt_scan() is not used on x86 and making it compile (with
help of CONFIG_OF) will abrupt the code flow since in the middle of it
there is a BUG() instruction.
Remove the pointless function.
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
A kernel that was compiled without CONFIG_X86_X2APIC was unable to boot on
platforms that have x2APIC already enabled in the BIOS before starting the
kernel.
The kernel was supposed to panic with an approprite error message in
validate_x2apic() due to the missing X2APIC support.
However, validate_x2apic() was run too late in the boot cycle, and the
kernel tried to initialize the APIC nonetheless. This resulted in an
earlier panic in setup_local_APIC() because the APIC was not registered.
In my experiments, a panic message in setup_local_APIC() was not visible
in the graphical console, which resulted in a hang with no indication
what has gone wrong.
Instead of calling panic(), disable the APIC, which results in a somewhat
working system with the PIC only (and no SMP). This way the user is able to
diagnose the problem more easily.
Disabling X2APIC mode is not an option because it's impossible on systems
with locked x2APIC.
The proper place to disable the APIC in this case is in check_x2apic(),
which is called early from setup_arch(). Doing this in
__apic_intr_mode_select() is too late.
Make check_x2apic() unconditionally available and remove the empty stub.
Reported-by: Paul Menzel <[email protected]>
Reported-by: Robert Elliott (Servers) <[email protected]>
Signed-off-by: Mateusz Jończyk <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]
Link: https://lore.kernel.org/all/[email protected]
|
|
After the removal of the stack canary segment setup code, this function
does nothing.
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Due to the explicit 'noinline' GCC-7.3 is not able to optimize away the
argument setup of:
apply_ibt_endbr(__ibt_endbr_seal, __ibt_enbr_seal_end);
even when X86_KERNEL_IBT=n and the function is an empty stub, which leads
to link errors due to missing __ibt_endbr_seal* symbols:
ld: arch/x86/kernel/alternative.o: in function `alternative_instructions':
alternative.c:(.init.text+0x15d): undefined reference to `__ibt_endbr_seal_end'
ld: alternative.c:(.init.text+0x164): undefined reference to `__ibt_endbr_seal'
Remove the explicit 'noinline' to help gcc optimize them away.
Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The helper crypto_tfm_ctx is only used by the Crypto API algorithm
code and should really be in algapi.h. However, for historical
reasons many files relied on it to be in crypto.h. This patch
changes those files to use algapi.h instead in prepartion for a
move.
Signed-off-by: Herbert Xu <[email protected]>
|
|
curve25519-x86_64.c fails to build when CONFIG_GCOV_KERNEL is enabled.
The error is "inline assembly requires more registers than available"
thrown from the `fsqr()` function. Therefore, excluding this file from
GCOV profiling until this issue is resolved. Thereby allowing
CONFIG_GCOV_PROFILE_ALL to be enabled for x86.
Signed-off-by: Joe Fradley <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
The actual size of the following arrays at run-time depends on
CONFIG_X86_PAE.
427 pmd_t *u_pmds[MAX_PREALLOCATED_USER_PMDS];
428 pmd_t *pmds[MAX_PREALLOCATED_PMDS];
If CONFIG_X86_PAE is not enabled, their final size will be zero (which
is technically not a legal storage size in C, but remains "valid" via
the GNU extension). In that case, the compiler complains about trying to
access objects of size zero when calling functions where these objects
are passed as arguments.
Fix this by sanity-checking the size of those arrays just before the
function calls. Also, the following warnings are fixed by these changes
when building with GCC 11+ and -Wstringop-overflow enabled:
arch/x86/mm/pgtable.c:437:13: warning: ‘preallocate_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
arch/x86/mm/pgtable.c:440:13: warning: ‘preallocate_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
arch/x86/mm/pgtable.c:462:9: warning: ‘free_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
arch/x86/mm/pgtable.c:455:9: warning: ‘pgd_prepopulate_user_pmd’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
arch/x86/mm/pgtable.c:464:9: warning: ‘free_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
This is one of the last cases in the ongoing effort to globally enable
-Wstringop-overflow.
The alternative to this is to make the originally suggested change:
make the pmds argument from an array pointer to a pointer pointer. That
situation is considered "legal" for C in the sense that it does not have
a way to reason about the storage. i.e.:
-static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
+static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t **pmds)
With the above change, there's no difference in binary output, and the
compiler warning is silenced.
However, with this patch, the compiler can actually figure out that it
isn't using the code at all, and it gets dropped:
text data bss dec hex filename
8218 718 32 8968 2308 arch/x86/mm/pgtable.o.before
7765 694 32 8491 212b arch/x86/mm/pgtable.o.after
So this case (fixing a warning and reducing image size) is a clear win.
Additionally drops an old work-around for GCC in the same code.
Link: https://github.com/KSPP/linux/issues/203
Link: https://github.com/KSPP/linux/issues/181
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/Yytb67xvrnctxnEe@work
|
|
find_timens_vvar_page() is not architecture-specific, as can be seen from
how all five per-architecture versions of it are the same.
(arm64, powerpc and riscv are exactly the same; x86 and s390 have two
characters difference inside a comment, less blank lines, and mark the
!CONFIG_TIME_NS version as inline.)
Refactor the five copies into a central copy in kernel/time/namespace.c.
Signed-off-by: Jann Horn <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Don't snapshot tsc_khz into per-cpu cpu_tsc_khz if the host TSC is
constant, in which case the actual TSC frequency will never change and thus
capturing TSC during initialization is unnecessary, KVM can simply use
tsc_khz. This value is snapshotted from
kvm_timer_init->kvmclock_cpu_online->tsc_khz_changed(NULL)
On CPUs with constant TSC, but not a hardware-specified TSC frequency,
snapshotting cpu_tsc_khz and using that to set a VM's target TSC frequency
can lead to VM to think its TSC frequency is not what it actually is if
refining the TSC completes after KVM snapshots tsc_khz. The actual
frequency never changes, only the kernel's calculation of what that
frequency is changes.
Ideally, KVM would not be able to race with TSC refinement, or would have
a hook into tsc_refine_calibration_work() to get an alert when refinement
is complete. Avoiding the race altogether isn't practical as refinement
takes a relative eternity; it's deliberately put on a work queue outside of
the normal boot sequence to avoid unnecessarily delaying boot.
Adding a hook is doable, but somewhat gross due to KVM's ability to be
built as a module. And if the TSC is constant, which is likely the case
for every VMX/SVM-capable CPU produced in the last decade, the race can be
hit if and only if userspace is able to create a VM before TSC refinement
completes; refinement is slow, but not that slow.
For now, punt on a proper fix, as not taking a snapshot can help some uses
cases and not taking a snapshot is arguably correct irrespective of the
race with refinement.
Signed-off-by: Anton Romanov <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|