Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"Most of this is part of my ongoing work to clean up the system call
tables. In this bit, all of the newer architectures are converted to
use the machine readable syscall.tbl format instead in place of
complex macros in include/uapi/asm-generic/unistd.h.
This follows an earlier series that fixed various API mismatches and
in turn is used as the base for planned simplifications.
The other two patches are dead code removal and a warning fix"
* tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
vmlinux.lds.h: catch .bss..L* sections into BSS")
fixmap: Remove unused set_fixmap_offset_io()
riscv: convert to generic syscall table
openrisc: convert to generic syscall table
nios2: convert to generic syscall table
loongarch: convert to generic syscall table
hexagon: use new system call table
csky: convert to generic syscall table
arm64: rework compat syscall macros
arm64: generate 64-bit syscall.tbl
arm64: convert unistd_32.h to syscall.tbl format
arc: convert to generic syscall table
clone3: drop __ARCH_WANT_SYS_CLONE3 macro
kbuild: add syscall table generation to scripts/Makefile.asm-headers
kbuild: verify asm-generic header list
loongarch: avoid generating extra header files
um: don't generate asm/bpf_perf_event.h
csky: drop asm/gpio.h wrapper
syscalls: add generic scripts/syscall.tbl
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Add support for running the kernel in a SEV-SNP guest, over a Secure
VM Service Module (SVSM).
When running over a SVSM, different services can run at different
protection levels, apart from the guest OS but still within the
secure SNP environment. They can provide services to the guest, like
a vTPM, for example.
This series adds the required facilities to interface with such a
SVSM module.
- The usual fixlets, refactoring and cleanups
[ And as always: "SEV" is AMD's "Secure Encrypted Virtualization".
I can't be the only one who gets all the newer x86 TLA's confused,
can I?
- Linus ]
* tag 'x86_sev_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/ABI/configfs-tsm: Fix an unexpected indentation silly
x86/sev: Do RMP memory coverage check after max_pfn has been set
x86/sev: Move SEV compilation units
virt: sev-guest: Mark driver struct with __refdata to prevent section mismatch
x86/sev: Allow non-VMPL0 execution when an SVSM is present
x86/sev: Extend the config-fs attestation support for an SVSM
x86/sev: Take advantage of configfs visibility support in TSM
fs/configfs: Add a callback to determine attribute visibility
sev-guest: configfs-tsm: Allow the privlevel_floor attribute to be updated
virt: sev-guest: Choose the VMPCK key based on executing VMPL
x86/sev: Provide guest VMPL level to userspace
x86/sev: Provide SVSM discovery support
x86/sev: Use the SVSM to create a vCPU when not in VMPL0
x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0
x86/sev: Use kernel provided SVSM Calling Areas
x86/sev: Check for the presence of an SVSM in the SNP secrets page
x86/irqflags: Provide native versions of the local_irq_save()/restore()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Enable Sub-NUMA clustering to work with resource control on Intel by
teaching resctrl to handle scopes due to the clustering which
partitions the L3 cache into sets. Modify and extend the subsystem to
handle such scopes properly
* tag 'x86_cache_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Update documentation with Sub-NUMA cluster changes
x86/resctrl: Detect Sub-NUMA Cluster (SNC) mode
x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems
x86/resctrl: Make __mon_event_count() handle sum domains
x86/resctrl: Fill out rmid_read structure for smp_call*() to read a counter
x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC) mode
x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files
x86/resctrl: Allocate a new field in union mon_data_bits
x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function
x86/resctrl: Initialize on-stack struct rmid_read instances
x86/resctrl: Add a new field to struct rmid_read for summation of domains
x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files
x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster (SNC) systems
x86/resctrl: Introduce snc_nodes_per_l3_cache
x86/resctrl: Add node-scope to the options for feature scope
x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
x86/resctrl: Prepare for different scope for control/monitor operations
x86/resctrl: Prepare to split rdt_domain structure
x86/resctrl: Prepare for new domain scope
|
|
Similar to kvm_x86_call(), kvm_pmu_call() is added to streamline the usage
of static calls of kvm_pmu_ops, which improves code readability.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Introduces kvm_x86_call(), to streamline the usage of static calls of
kvm_x86_ops. The current implementation of these calls is verbose and
could lead to alignment challenges. This makes the code susceptible to
exceeding the "80 columns per single line of code" limit as defined in
the coding-style document. Another issue with the existing implementation
is that the addition of kvm_x86_ prefix to hooks at the static_call sites
hinders code readability and navigation. kvm_x86_call() is added to
improve code readability and maintainability, while adhering to the coding
style guidelines.
Signed-off-by: Wei Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The use of static_call_cond() is essentially the same as static_call() on
x86 (e.g. static_call() now handles a NULL pointer as a NOP), so replace
it with static_call() to simplify the code.
Link: https://lore.kernel.org/all/3916caa1dcd114301a49beafa5030eca396745c1.1679456900.git.jpoimboe@kernel.org/
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The GHCB 2.0 specification defines 2 GHCB request types to allow SNP guests
to send encrypted messages/requests to firmware: SNP Guest Requests and SNP
Extended Guest Requests. These encrypted messages are used for things like
servicing attestation requests issued by the guest. Implementing support for
these is required to be fully GHCB-compliant.
For the most part, KVM only needs to handle forwarding these requests to
firmware (to be issued via the SNP_GUEST_REQUEST firmware command defined
in the SEV-SNP Firmware ABI), and then forwarding the encrypted response to
the guest.
However, in the case of SNP Extended Guest Requests, the host is also
able to provide the certificate data corresponding to the endorsement key
used by firmware to sign attestation report requests. This certificate data
is provided by userspace because:
1) It allows for different keys/key types to be used for each particular
guest with requiring any sort of KVM API to configure the certificate
table in advance on a per-guest basis.
2) It provides additional flexibility with how attestation requests might
be handled during live migration where the certificate data for
source/dest might be different.
3) It allows all synchronization between certificates and firmware/signing
key updates to be handled purely by userspace rather than requiring
some in-kernel mechanism to facilitate it. [1]
To support fetching certificate data from userspace, a new KVM exit type will
be needed to handle fetching the certificate from userspace. An attempt to
define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle this
was introduced in v1 of this patchset, but is still being discussed by
community, so for now this patchset only implements a stub version of SNP
Extended Guest Requests that does not provide certificate data, but is still
enough to provide compliance with the GHCB 2.0 spec.
|
|
sev_guest.h currently contains various definitions relating to the
format of SNP_GUEST_REQUEST commands to SNP firmware. Currently only the
sev-guest driver makes use of them, but when the KVM side of this is
implemented there's a need to parse the SNP_GUEST_REQUEST header to
determine whether additional information needs to be provided to the
guest. Prepare for this by moving those definitions to a common header
that's shared by host/guest code so that KVM can also make use of them.
Reviewed-by: Tom Lendacky <[email protected]>
Reviewed-by: Liam Merwick <[email protected]>
Signed-off-by: Michael Roth <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
KVM VMX changes for 6.11
- Remove an unnecessary EPT TLB flush when enabling hardware.
- Fix a series of bugs that cause KVM to fail to detect nested pending posted
interrupts as valid wake eents for a vCPU executing HLT in L2 (with
HLT-exiting disable by L1).
- Misc cleanups
|
|
KVM x86/pmu changes for 6.11
- Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads
'0' and writes from userspace are ignored.
- Update to the newfangled Intel CPU FMS infrastructure.
- Use macros instead of open-coded literals to clean up KVM's manipulation of
FIXED_CTR_CTRL MSRs.
|
|
KVM x86 MTRR virtualization removal
Remove support for virtualizing MTRRs on Intel CPUs, along with a nasty CR0.CD
hack, and instead always honor guest PAT on CPUs that support self-snoop.
|
|
KVM x86 misc changes for 6.11
- Add a global struct to consolidate tracking of host values, e.g. EFER, and
move "shadow_phys_bits" into the structure as "maxphyaddr".
- Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
bus frequency, because TDX.
- Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
- Clean up KVM's handling of vendor specific emulation to consistently act on
"compatible with Intel/AMD", versus checking for a specific vendor.
- Misc cleanups
|
|
KVM generic changes for 6.11
- Enable halt poll shrinking by default, as Intel found it to be a clear win.
- Setup empty IRQ routing when creating a VM to avoid having to synchronize
SRCU when creating a split IRQCHIP on x86.
- Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
that arch code can use for hooking both sched_in() and sched_out().
- Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
truncating a bogus value from userspace, e.g. to help userspace detect bugs.
- Mark a vCPU as preempted if and only if it's scheduled out while in the
KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
memory when retrieving guest state during live migration blackout.
- A few minor cleanups
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu model updates from Borislav Petkov:
- Flip the logic to add feature names to /proc/cpuinfo to having to
explicitly specify the flag if there's a valid reason to show it in
/proc/cpuinfo
- Switch a bunch of Intel x86 model checking code to the new CPU model
defines
- Fixes and cleanups
* tag 'x86_cpu_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/intel: Drop stray FAM6 check with new Intel CPU model defines
x86/cpufeatures: Flip the /proc/cpuinfo appearance logic
x86/CPU/AMD: Always inline amd_clear_divider()
x86/mce/inject: Add missing MODULE_DESCRIPTION() line
perf/x86/rapl: Switch to new Intel CPU model defines
x86/boot: Switch to new Intel CPU model defines
x86/cpu: Switch to new Intel CPU model defines
perf/x86/intel: Switch to new Intel CPU model defines
x86/virt/tdx: Switch to new Intel CPU model defines
x86/PCI: Switch to new Intel CPU model defines
x86/cpu/intel: Switch to new Intel CPU model defines
x86/platform/intel-mid: Switch to new Intel CPU model defines
x86/pconfig: Remove unused MKTME pconfig code
x86/cpu: Remove useless work in detect_tme_early()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vmware updates from Borislav Petkov:
- Add a unified VMware hypercall API layer which should be used by all
callers instead of them doing homegrown solutions. This will provide
for adding API support for confidential computing solutions like TDX
* tag 'x86_vmware_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vmware: Add TDX hypercall support
x86/vmware: Remove legacy VMWARE_HYPERCALL* macros
x86/vmware: Correct macro names
x86/vmware: Use VMware hypercall API
drm/vmwgfx: Use VMware hypercall API
input/vmmouse: Use VMware hypercall API
ptp/vmware: Use VMware hypercall API
x86/vmware: Introduce VMware hypercall API
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
- Make error checking of AMD SMN accesses more robust in the callers as
they're the only ones who can interpret the results properly
- The usual cleanups and fixes, left and right
* tag 'x86_misc_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kmsan: Fix hook for unaligned accesses
x86/platform/iosf_mbi: Convert PCIBIOS_* return codes to errnos
x86/pci/xen: Fix PCIBIOS_* return code handling
x86/pci/intel_mid_pci: Fix PCIBIOS_* return code handling
x86/of: Return consistent error type from x86_of_pci_irq_enable()
hwmon: (k10temp) Rename _data variable
hwmon: (k10temp) Remove unused HAVE_TDIE() macro
hwmon: (k10temp) Reduce k10temp_get_ccd_support() parameters
hwmon: (k10temp) Define a helper function to read CCD temperature
x86/amd_nb: Enhance SMN access error checking
hwmon: (k10temp) Check return value of amd_smn_read()
EDAC/amd64: Check return value of amd_smn_read()
EDAC/amd64: Remove unused register accesses
tools/x86/kcpuid: Add missing dir via Makefile
x86, arm: Add missing license tag to syscall tables files
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 confidential computing updates from Borislav Petkov:
"Unrelated x86/cc changes queued here to avoid ugly cross-merges and
conflicts:
- Carve out CPU hotplug function declarations into a separate header
with the goal to be able to use the lockdep assertions in a more
flexible manner
- As a result, refactor cacheinfo code after carving out a function
to return the cache ID associated with a given cache level
- Cleanups
Add support to be able to kexec TDX guests:
- Expand ACPI MADT CPU offlining support
- Add machinery to prepare CoCo guests memory before kexec-ing into a
new kernel
- Cleanup, readjust and massage related code"
* tag 'x86_cc_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed
x86/acpi: Add support for CPU offlining for ACPI MADT wakeup method
x86/mm: Introduce kernel_ident_mapping_free()
x86/smp: Add smp_ops.stop_this_cpu() callback
x86/acpi: Do not attempt to bring up secondary CPUs in the kexec case
x86/acpi: Rename fields in the acpi_madt_multiproc_wakeup structure
x86/mm: Do not zap page table entries mapping unaccepted memory table during kdump
x86/mm: Make e820__end_ram_pfn() cover E820_TYPE_ACPI ranges
x86/tdx: Convert shared memory back to private on kexec
x86/mm: Add callbacks to prepare encrypted memory for kexec
x86/tdx: Account shared memory
x86/mm: Return correct level from lookup_address() if pte is none
x86/mm: Make x86_platform.guest.enc_status_change_*() return an error
x86/kexec: Keep CR4.MCE set during kexec for TDX guest
x86/relocate_kernel: Use named labels for less confusion
cpu/hotplug, x86/acpi: Disable CPU offlining for ACPI MADT wakeup
cpu/hotplug: Add support for declaring CPU offlining not supported
x86/apic: Mark acpi_mp_wake_* variables as __ro_after_init
x86/acpi: Extract ACPI MADT wakeup code into a separate file
x86/kexec: Remove spurious unconditional JMP from from identity_mapped()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Borislav Petkov:
- Add a check to warn when cmdline parsing happens before the final
cmdline string has been built and thus arguments can get lost
- Code cleanups and simplifications
* tag 'x86_boot_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/setup: Warn when option parsing is done too early
x86/boot: Clean up the arch/x86/boot/main.c code a bit
x86/boot: Use current_stack_pointer to avoid asm() in init_heap()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 alternatives updates from Borislav Petkov:
"This is basically PeterZ's idea to nest the alternative macros to
avoid the need to "spell out" the number of alternates in an
ALTERNATIVE_n() macro and thus have an ever-increasing complexity in
those definitions.
For ease of bisection, the old macros are converted to the new, nested
variants in a step-by-step manner so that in case an issue is
encountered during testing, one can pinpoint the place where it fails
easier.
Because debugging alternatives is a serious pain"
* tag 'x86_alternatives_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives, kvm: Fix a couple of CALLs without a frame pointer
x86/alternative: Replace the old macros
x86/alternative: Convert the asm ALTERNATIVE_3() macro
x86/alternative: Convert the asm ALTERNATIVE_2() macro
x86/alternative: Convert the asm ALTERNATIVE() macro
x86/alternative: Convert ALTERNATIVE_3()
x86/alternative: Convert ALTERNATIVE_TERNARY()
x86/alternative: Convert alternative_call_2()
x86/alternative: Convert alternative_call()
x86/alternative: Convert alternative_io()
x86/alternative: Convert alternative_input()
x86/alternative: Convert alternative_2()
x86/alternative: Convert alternative()
x86/alternatives: Add nested alternatives macros
x86/alternative: Zap alternative_ternary()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
- A cleanup and a correction to the error injection driver to inject a
MCA_MISC value only when one has actually been supplied by the user
* tag 'ras_core_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Remove unused variable and return value in machine_check_poll()
x86/mce/inject: Only write MCA_MISC when a value has been supplied
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for timers, timekeeping and related functionality:
Core:
- Make the takeover of a hrtimer based broadcast timer reliable
during CPU hot-unplug. The current implementation suffers from a
race which can lead to broadcast timer starvation in the worst
case.
- VDSO related cleanups and simplifications
- Small cleanups and enhancements all over the place
PTP:
- Replace the architecture specific base clock to clocksource, e.g.
ART to TSC, conversion function with generic functionality to avoid
exposing such internals to drivers and convert all existing drivers
over. This also allows to provide functionality which converts the
other way round in the core code based on the same parameter set.
- Provide a function to convert CLOCK_REALTIME to the base clock to
support the upcoming PPS output driver on Intel platforms.
Drivers:
- A set of Device Tree bindings for new hardware
- Cleanups and enhancements all over the place"
* tag 'timers-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
clocksource/drivers/realtek: Add timer driver for rtl-otto platforms
dt-bindings: timer: Add schema for realtek,otto-timer
dt-bindings: timer: Add SOPHGO SG2002 clint
dt-bindings: timer: renesas,tmu: Add R-Car Gen2 support
dt-bindings: timer: renesas,tmu: Add RZ/G1 support
dt-bindings: timer: renesas,tmu: Add R-Mobile APE6 support
clocksource/drivers/mips-gic-timer: Correct sched_clock width
clocksource/drivers/mips-gic-timer: Refine rating computation
clocksource/drivers/sh_cmt: Address race condition for clock events
clocksource/driver/arm_global_timer: Remove unnecessary ‘0’ values from err
clocksource/drivers/arm_arch_timer: Remove unnecessary ‘0’ values from irq
tick/broadcast: Make takeover of broadcast hrtimer reliable
tick/sched: Combine WARN_ON_ONCE and print_once
x86/vdso: Remove unused include
x86/vgtod: Remove unused typedef gtod_long_t
x86/vdso: Fix function reference in comment
vdso: Add comment about reason for vdso struct ordering
vdso/gettimeofday: Clarify comment about open coded function
timekeeping: Add missing kernel-doc function comments
tick: Remove unnused tick_nohz_get_idle_calls()
...
|
|
Merge minor word-at-a-time instruction choice improvements for x86 and
arm64.
This is the second of four branches that came out of me looking at the
code generation for path lookup on arm64.
The word-at-a-time infrastructure is used to do string operations in
chunks of one word both when copying the pathname from user space (in
strncpy_from_user()), and when parsing and hashing the individual path
components (in link_path_walk()).
In particular, the "find the first zero byte" uses various bit tricks to
figure out the end of the string or path component, and get the length
without having to do things one byte at a time. Both x86-64 and arm64
had less than optimal code choices for that.
The commit message for the arm64 change in particular tries to explain
the exact code flow for the zero byte finding for people who care. It's
made a bit more complicated by the fact that we support big-endian
hardware too, and so we have some extra abstraction layers to allow
different models for finding the zero byte, quite apart from the issue
of picking specialized instructions.
* word-at-a-time:
arm64: word-at-a-time: improve byte count calculations for LE
x86-64: word-at-a-time: improve byte count calculations
|
|
Merge runtime constants infrastructure with implementations for x86 and
arm64.
This is one of four branches that came out of me looking at profiles of
my kernel build filesystem load on my 128-core Altra arm64 system, where
pathname walking and the user copies (particularly strncpy_from_user()
for fetching the pathname from user space) is very hot.
This is a very specialized "instruction alternatives" model where the
dentry hash pointer and hash count will be constants for the lifetime of
the kernel, but the allocation are not static but done early during the
kernel boot. In order to avoid the pointer load and dynamic shift, we
just rewrite the constants in the instructions in place.
We can't use the "generic" alternative instructions infrastructure,
because different architectures do it very differently, and it's
actually simpler to just have very specific helpers, with a fallback to
the generic ("old") model of just using variables for architectures that
do not implement the runtime constant patching infrastructure.
Link: https://lore.kernel.org/all/CAHk-=widPe38fUNjUOmX11ByDckaeEo9tN4Eiyke9u1SAtu9sA@mail.gmail.com/
* runtime-constants:
arm64: add 'runtime constant' support
runtime constants: add x86 architecture support
runtime constants: add default dummy infrastructure
vfs: dcache: move hashlen_hash() from callers into d_hash()
|
|
https://git.linaro.org/people/daniel.lezcano/linux into timers/core
Pull clocksource/event driver updates from Daniel Lezcano:
- Remove unnecessary local variables initialization as they will be
initialized in the code path anyway right after on the ARM arch
timer and the ARM global timer (Li kunyu)
- Fix a race condition in the interrupt leading to a deadlock on the
SH CMT driver. Note that this fix was not tested on the platform
using this timer but the fix seems reasonable enough to be picked
confidently (Niklas Söderlund)
- Increase the rating of the gic-timer and use the configured width
clocksource register on the MIPS architecture (Jiaxun Yang)
- Add the DT bindings for the TMU on the Renesas platforms (Geert
Uytterhoeven)
- Add the DT bindings for the SOPHGO SG2002 clint on RiscV (Thomas
Bonnefille)
- Add the rtl-otto timer driver along with the DT bindings for the
Realtek platform (Chris Packham)
Link: https://lore.kernel.org/all/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.11
1. Add ParaVirt steal time support.
2. Add some VM migration enhancement.
3. Add perf kvm-stat support for loongarch.
|
|
When clone3() was introduced, it was not obvious how each architecture
deals with setting up the stack and keeping the register contents in
a fork()-like system call, so this was left for the architecture
maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those
that already implement it.
Five years later, we still have a few architectures left that are missing
clone3(), and the macro keeps getting in the way as it's fundamentally
different from all the other __ARCH_WANT_SYS_* macros that are meant
to provide backwards-compatibility with applications using older
syscalls that are no longer provided by default.
Address this by reversing the polarity of the macro, adding an
__ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't
already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3
from all the other ones.
Acked-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
|
|
The smbios.c source file is not currently included in the x86 build, and
before we can do so, it needs some tweaks to build correctly in
combination with the EFI mixed mode support.
Reviewed-by: Lukas Wunner <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
The architectural performance monitoring V6 supports a new range of
counters' MSRs in the 19xxH address range. They include all the GP
counter MSRs, the GP control MSRs, and the fixed counter MSRs.
The step between each sibling counter is 4. Add intel_pmu_addr_offset()
to calculate the correct offset.
Add fixedctr in struct x86_pmu to store the address of the fixed counter
0. It can be used to calculate the rest of the fixed counters.
The MSR address of the fixed counter control is not changed.
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Two new fields (the unit mask2, and the equal flag) are added in the
IA32_PERFEVTSELx MSRs. They can be enumerated by the CPUID.23H.0.EBX.
Update the config_mask in x86_pmu and x86_hybrid_pmu for the true layout
of the PERFEVTSEL.
Expose the new formats into sysfs if they are available. The umask
extension reuses the same format attr name "umask" as the previous
umask. Add umask2_show to determine/display the correct format
for the current machine.
Co-developed-by: Dapeng Mi <[email protected]>
Signed-off-by: Dapeng Mi <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
From PMU's perspective, Lunar Lake and Arrow Lake are similar to the
previous generation Meteor Lake. Both are hybrid platforms, with e-core
and p-core.
The key differences include:
- The e-core supports 3 new fixed counters
- The p-core supports an updated PEBS Data Source format
- More GP counters (Updated event constraint table)
- New Architectural performance monitoring V6
(New Perfmon MSRs aliasing, umask2, eq).
- New PEBS format V6 (Counters Snapshotting group)
- New RDPMC metrics clear mode
The legacy features, the 3 new fixed counters and updated event
constraint table are enabled in this patch.
The new PEBS data source format, the architectural performance
monitoring V6, the PEBS format V6, and the new RDPMC metrics clear mode
are supported in the following patches.
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The current perf assumes that the counters that support PEBS are
contiguous. But it's not guaranteed with the new leaf 0x23 introduced.
The counters are enumerated with a counter mask. There may be holes in
the counter mask for future platforms or in a virtualization
environment.
Store the PEBS event mask rather than the maximum number of PEBS
counters in the x86 PMU structures.
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Including hrtimer.h is not required and is probably a historical
leftover. Remove it.
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The typedef gtod_long_t is not used anymore so remove it.
The header file contains then only includes dependent on
CONFIG_GENERIC_GETTIMEOFDAY to not break ARCH=um. Nevertheless, keep the
header file only with those includes to prevent spreading ifdeffery all
over the place.
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Replace the reference to the non-existent function arch_vdso_cycles_valid()
by the proper function arch_vdso_cycles_ok().
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
LAM can only be enabled when a process is single-threaded. But _kernel_
threads can temporarily use a single-threaded process's mm. That means
that a context-switching kernel thread can race and observe the mm's LAM
metadata (mm->context.lam_cr3_mask) change.
The context switch code does two logical things with that metadata:
populate CR3 and populate 'cpu_tlbstate.lam'. If it hits this race,
'cpu_tlbstate.lam' and CR3 can end up out of sync.
This de-synchronization is currently harmless. But it is confusing and
might lead to warnings or real bugs.
Update set_tlbstate_lam_mode() to take in the LAM mask and untag mask
instead of an mm_struct pointer, and while we are at it, rename it to
cpu_tlbstate_update_lam(). This should also make it clearer that we are
updating cpu_tlbstate. In switch_mm_irqs_off(), read the LAM mask once
and use it for both the cpu_tlbstate update and the CR3 update.
Signed-off-by: Yosry Ahmed <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/all/20240702132139.3332013-3-yosryahmed%40google.com
|
|
Hardware has two RMID configuration options for SNC systems. The default
mode divides RMID counters between SNC nodes. E.g. with 200 RMIDs and
two SNC nodes per L3 cache RMIDs 0..99 are used on node 0, and 100..199
on node 1. This isn't compatible with Linux resctrl usage. On this
example system a process using RMID 5 would only update monitor counters
while running on SNC node 0.
The other mode is "RMID Sharing Mode". This is enabled by clearing bit
0 of the RMID_SNC_CONFIG (0xCA0) model specific register. In this mode
the number of logical RMIDs is the number of physical RMIDs (from CPUID
leaf 0xF) divided by the number of SNC nodes per L3 cache instance. A
process can use the same RMID across different SNC nodes.
See the "Intel Resource Director Technology Architecture Specification"
for additional details.
When SNC is enabled, update the MSR when a monitor domain is marked
online. Technically this is overkill. It only needs to be done once
per L3 cache instance rather than per SNC domain. But there is no harm
in doing it more than once, and this is not in a critical path.
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Between kexec and confidential VM support, handling the EFI memory maps
correctly on x86 is already proving to be rather difficult (as opposed
to other EFI architectures which manage to never modify the EFI memory
map to begin with)
EFI fake memory map support is essentially a development hack (for
testing new support for the 'special purpose' and 'more reliable' EFI
memory attributes) that leaked into production code. The regions marked
in this manner are not actually recognized as such by the firmware
itself or the EFI stub (and never have), and marking memory as 'more
reliable' seems rather futile if the underlying memory is just ordinary
RAM.
Marking memory as 'special purpose' in this way is also dubious, but may
be in use in production code nonetheless. However, the same should be
achievable by using the memmap= command line option with the ! operator.
EFI fake memmap support is not enabled by any of the major distros
(Debian, Fedora, SUSE, Ubuntu) and does not exist on other
architectures, so let's drop support for it.
Acked-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Dan Williams <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
objtool complains:
arch/x86/kvm/kvm.o: warning: objtool: .altinstr_replacement+0xc5: call without frame pointer save/setup
vmlinux.o: warning: objtool: .altinstr_replacement+0x2eb: call without frame pointer save/setup
Make sure %rSP is an output operand to the respective asm() statements.
The test_cc() hunk and ALT_OUTPUT_SP() courtesy of peterz. Also from him
add some helpful debugging info to the documentation.
Now on to the explanations:
tl;dr: The alternatives macros are pretty fragile.
If I do ALT_OUTPUT_SP(output) in order to be able to package in a %rsp
reference for objtool so that a stack frame gets properly generated, the
inline asm input operand with positional argument 0 in clear_page():
"0" (page)
gets "renumbered" due to the added
: "+r" (current_stack_pointer), "=D" (page)
and then gcc says:
./arch/x86/include/asm/page_64.h:53:9: error: inconsistent operand constraints in an ‘asm’
The fix is to use an explicit "D" constraint which points to a singleton
register class (gcc terminology) which ends up doing what is expected
here: the page pointer - input and output - should be in the same %rdi
register.
Other register classes have more than one register in them - example:
"r" and "=r" or "A":
‘A’
The ‘a’ and ‘d’ registers. This class is used for
instructions that return double word results in the ‘ax:dx’
register pair. Single word values will be allocated either in
‘ax’ or ‘dx’.
so using "D" and "=D" just works in this particular case.
And yes, one would say, sure, why don't you do "+D" but then:
: "+r" (current_stack_pointer), "+D" (page)
: [old] "i" (clear_page_orig), [new1] "i" (clear_page_rep), [new2] "i" (clear_page_erms),
: "cc", "memory", "rax", "rcx")
now find the Waldo^Wcomma which throws a wrench into all this.
Because that silly macro has an "input..." consume-all last macro arg
and in it, one is supposed to supply input *and* clobbers, leading to
silly syntax snafus.
Yap, they need to be cleaned up, one fine day...
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Sean Christopherson <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/20240625112056.GDZnqoGDXgYuWBDUwu@fat_crate.local
|
|
The kernel test robot reported that clang no longer compiles the 32-bit
x86 kernel in some configurations due to commit 95ece48165c1
("locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}()
functions").
The build fails with
arch/x86/include/asm/cmpxchg_32.h:149:9: error: inline assembly requires more registers than available
and the reason seems to be that not only does the cmpxchg8b instruction
need four fixed registers (EDX:EAX and ECX:EBX), with the emulation
fallback the inline asm also wants a fifth fixed register for the
address (it uses %esi for that, but that's just a software convention
with cmpxchg8b_emu).
Avoiding using another pointer input to the asm (and just forcing it to
use the "0(%esi)" addressing that we end up requiring for the sw
fallback) seems to fix the issue.
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Fixes: 95ece48165c1 ("locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions")
Link: https://lore.kernel.org/all/[email protected]/
Suggested-by: Uros Bizjak <[email protected]>
Reviewed-and-Tested-by: Uros Bizjak <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Remove invalid tty __counted_by annotation (Nathan Chancellor)
- Add missing MODULE_DESCRIPTION()s for KUnit string tests (Jeff
Johnson)
- Remove non-functional per-arch kstack entropy filtering
* tag 'hardening-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
tty: mxser: Remove __counted_by from mxser_board.ports[]
randomize_kstack: Remove non-functional per-arch entropy filtering
string: kunit: add missing MODULE_DESCRIPTION() macros
|
|
When CPUID[6].EAX[15] is set to 1, this CPU supports notification for
HWP (Hardware P-states) highest performance change.
Add a feature flag to check if the CPU supports HWP highest performance
change.
Signed-off-by: Srinivas Pandruvada <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Refine the macros which define maximum General Purpose (GP) and fixed
counter numbers.
Currently the macro KVM_INTEL_PMC_MAX_GENERIC is used to represent the
maximum supported General Purpose (GP) counter number ambiguously across
Intel and AMD platforms. This would cause issues if AMD begins to support
more GP counters than Intel.
Thus a bunch of new macros including vendor specific and vendor
independent are introduced to replace the old macros. The vendor
independent macros are used in x86 common code to hide vendor difference
and eliminate the ambiguity.
No logic changes are introduced in this patch.
Signed-off-by: Dapeng Mi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
|
|
Check for a Requested Virtual Interrupt, i.e. a virtual interrupt that is
pending delivery, in vmx_has_nested_events() and drop the one-off
kvm_x86_ops.guest_apic_has_interrupt() hook.
In addition to dropping a superfluous hook, this fixes a bug where KVM
would incorrectly treat virtual interrupts _for L2_ as always enabled due
to kvm_arch_interrupt_allowed(), by way of vmx_interrupt_blocked(),
treating IRQs as enabled if L2 is active and vmcs12 is configured to exit
on IRQs, i.e. KVM would treat a virtual interrupt for L2 as a valid wake
event based on L1's IRQ blocking status.
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
When requesting an immediate exit from L2 in order to inject a pending
event, do so only if the pending event actually requires manual injection,
i.e. if and only if KVM actually needs to regain control in order to
deliver the event.
Avoiding the "immediate exit" isn't simply an optimization, it's necessary
to make forward progress, as the "already expired" VMX preemption timer
trick that KVM uses to force a VM-Exit has higher priority than events
that aren't directly injected.
At present time, this is a glorified nop as all events processed by
vmx_has_nested_events() require injection, but that will not hold true in
the future, e.g. if there's a pending virtual interrupt in vmcs02.RVI.
I.e. if KVM is trying to deliver a virtual interrupt to L2, the expired
VMX preemption timer will trigger VM-Exit before the virtual interrupt is
delivered, and KVM will effectively hang the vCPU in an endless loop of
forced immediate VM-Exits (because the pending virtual interrupt never
goes away).
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
|
|
An unintended consequence of commit 9c573cd31343 ("randomize_kstack:
Improve entropy diffusion") was that the per-architecture entropy size
filtering reduced how many bits were being added to the mix, rather than
how many bits were being used during the offsetting. All architectures
fell back to the existing default of 0x3FF (10 bits), which will consume
at most 1KiB of stack space. It seems that this is working just fine,
so let's avoid the confusion and update everything to use the default.
The prior intent of the per-architecture limits were:
arm64: capped at 0x1FF (9 bits), 5 bits effective
powerpc: uncapped (10 bits), 6 or 7 bits effective
riscv: uncapped (10 bits), 6 bits effective
x86: capped at 0xFF (8 bits), 5 (x86_64) or 6 (ia32) bits effective
s390: capped at 0xFF (8 bits), undocumented effective entropy
Current discussion has led to just dropping the original per-architecture
filters. The additional entropy appears to be safe for arm64, x86,
and s390. Quoting Arnd, "There is no point pretending that 15.75KB is
somehow safe to use while 15.00KB is not."
Co-developed-by: Yuntao Liu <[email protected]>
Signed-off-by: Yuntao Liu <[email protected]>
Fixes: 9c573cd31343 ("randomize_kstack: Improve entropy diffusion")
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Arnd Bergmann <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Acked-by: Heiko Carstens <[email protected]> # s390
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Merge more amd-pstate driver updates for 6.11 from Mario Limonciello:
"Add support for amd-pstate core performance boost support which
allows controlling which CPU cores can operate above nominal
frequencies for short periods of time."
* tag 'amd-pstate-v6.11-2024-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
Documentation: cpufreq: amd-pstate: update doc for Per CPU boost control method
cpufreq: amd-pstate: Cap the CPPC.max_perf to nominal_perf if CPB is off
cpufreq: amd-pstate: initialize core precision boost state
cpufreq: acpi: move MSR_K7_HWCR_CPB_DIS_BIT into msr-index.h
|
|
|
|
There are some other drivers also need to use the
MSR_K7_HWCR_CPB_DIS_BIT for CPB control bit, so it makes sense to move
the definition to a common header file to allow other driver to use it.
No intentional functional impact.
Suggested-by: Gautham Ranjal Shenoy <[email protected]>
Signed-off-by: Perry Yuan <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Acked-by: Huang Rui <[email protected]>
Link: https://lore.kernel.org/r/78b6c75e6cffddce3e950dd543af6ae9f8eeccc3.1718988436.git.perry.yuan@amd.com
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mario Limonciello <[email protected]>
|
|
VMware hypercalls use I/O port, VMCALL or VMMCALL instructions. Add a call to
__tdx_hypercall() in order to support TDX guests.
No change in high bandwidth hypercalls, as only low bandwidth ones are supported
for TDX guests.
[ bp: Massage, clear on-stack struct tdx_module_args variable. ]
Co-developed-by: Tim Merrifield <[email protected]>
Signed-off-by: Tim Merrifield <[email protected]>
Signed-off-by: Alexey Makhalov <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|