Age | Commit message (Collapse) | Author | Files | Lines |
|
Merge user access fast validation using address masking.
This allows architectures to optionally use a data dependent address
masking model instead of a conditional branch for validating user
accesses. That avoids the Spectre-v1 speculation barriers.
Right now only x86-64 takes advantage of this, and not all architectures
will be able to do it. It requires a guard region between the user and
kernel address spaces (so that you can't overflow from one to the
other), and an easy way to generate a guaranteed-to-fault address for
invalid user pointers.
Also note that this currently assumes that there is no difference
between user read and write accesses. If extended to architectures like
powerpc, we'll also need to separate out the user read-vs-write cases.
* address-masking:
x86: make the masked_user_access_begin() macro use its argument only once
x86: do the user address masking outside the user access area
x86: support user address masking instead of non-speculative conditional
|
|
This doesn't actually matter for any of the current users, but before
merging it mainline, make sure we don't have any surprising semantics.
We don't actually want to use an inline function here, because we want
to allow - but not require - const pointer arguments, and return them as
such. But we already had a local auto-type variable, so let's just use
it to avoid any possible double evaluation.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Introduce '__attribute__((bpf_fastcall))' for helpers and kfuncs with
corresponding support in LLVM.
It is similar to existing 'no_caller_saved_registers' attribute in
GCC/LLVM with a provision for backward compatibility. It allows
compilers generate more efficient BPF code assuming the verifier or
JITs will inline or partially inline a helper/kfunc with such
attribute. bpf_cast_to_kern_ctx, bpf_rdonly_cast,
bpf_get_smp_processor_id are the first set of such helpers.
- Harden and extend ELF build ID parsing logic.
When called from sleepable context the relevants parts of ELF file
will be read to find and fetch .note.gnu.build-id information. Also
harden the logic to avoid TOCTOU, overflow, out-of-bounds problems.
- Improvements and fixes for sched-ext:
- Allow passing BPF iterators as kfunc arguments
- Make the pointer returned from iter_next method trusted
- Fix x86 JIT convergence issue due to growing/shrinking conditional
jumps in variable length encoding
- BPF_LSM related:
- Introduce few VFS kfuncs and consolidate them in
fs/bpf_fs_kfuncs.c
- Enforce correct range of return values from certain LSM hooks
- Disallow attaching to other LSM hooks
- Prerequisite work for upcoming Qdisc in BPF:
- Allow kptrs in program provided structs
- Support for gen_epilogue in verifier_ops
- Important fixes:
- Fix uprobe multi pid filter check
- Fix bpf_strtol and bpf_strtoul helpers
- Track equal scalars history on per-instruction level
- Fix tailcall hierarchy on x86 and arm64
- Fix signed division overflow to prevent INT_MIN/-1 trap on x86
- Fix get kernel stack in BPF progs attached to tracepoint:syscall
- Selftests:
- Add uprobe bench/stress tool
- Generate file dependencies to drastically improve re-build time
- Match JIT-ed and BPF asm with __xlated/__jited keywords
- Convert older tests to test_progs framework
- Add support for RISC-V
- Few fixes when BPF programs are compiled with GCC-BPF backend
(support for GCC-BPF in BPF CI is ongoing in parallel)
- Add traffic monitor
- Enable cross compile and musl libc
* tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (260 commits)
btf: require pahole 1.21+ for DEBUG_INFO_BTF with default DWARF version
btf: move pahole check in scripts/link-vmlinux.sh to lib/Kconfig.debug
btf: remove redundant CONFIG_BPF test in scripts/link-vmlinux.sh
bpf: Call the missed kfree() when there is no special field in btf
bpf: Call the missed btf_record_free() when map creation fails
selftests/bpf: Add a test case to write mtu result into .rodata
selftests/bpf: Add a test case to write strtol result into .rodata
selftests/bpf: Rename ARG_PTR_TO_LONG test description
selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test
bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error
bpf: Improve check_raw_mode_ok test for MEM_UNINIT-tagged types
bpf: Fix helper writes to read-only maps
bpf: Remove truncation test in bpf_strtol and bpf_strtoul helpers
bpf: Fix bpf_strtol and bpf_strtoul helpers for 32bit
selftests/bpf: Add tests for sdiv/smod overflow cases
bpf: Fix a sdiv overflow issue
libbpf: Add bpf_object__token_fd accessor
docs/bpf: Add missing BPF program types to docs
docs/bpf: Add constant values for linkages
bpf: Use fake pt_regs when doing bpf syscall tracepoint tracing
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Many singleton patches - please see the various changelogs for
details.
Quite a lot of nilfs2 work this time around.
Notable patch series in this pull request are:
- "mul_u64_u64_div_u64: new implementation" by Nicolas Pitre, with
assistance from Uwe Kleine-König. Reimplement mul_u64_u64_div_u64()
to provide (much) more accurate results. The current implementation
was causing Uwe some issues in the PWM drivers.
- "xz: Updates to license, filters, and compression options" from
Lasse Collin. Miscellaneous maintenance and kinor feature work to
the xz decompressor.
- "Fix some GDB command error and add some GDB commands" from
Kuan-Ying Lee. Fixes and enhancements to the gdb scripts.
- "treewide: add missing MODULE_DESCRIPTION() macros" from Jeff
Johnson. Adds lots of MODULE_DESCRIPTIONs, thus fixing lots of
warnings about this.
- "nilfs2: add support for some common ioctls" from Ryusuke Konishi.
Adds various commonly-available ioctls to nilfs2.
- "This series fixes a number of formatting issues in kernel doc
comments" from Ryusuke Konishi does that.
- "nilfs2: prevent unexpected ENOENT propagation" from Ryusuke
Konishi. Fix issues where -ENOENT was being unintentionally and
inappropriately returned to userspace.
- "nilfs2: assorted cleanups" from Huang Xiaojia.
- "nilfs2: fix potential issues with empty b-tree nodes" from Ryusuke
Konishi fixes some issues which can occur on corrupted nilfs2
filesystems.
- "scripts/decode_stacktrace.sh: improve error reporting and
usability" from Luca Ceresoli does those things"
* tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (103 commits)
list: test: increase coverage of list_test_list_replace*()
list: test: fix tests for list_cut_position()
proc: use __auto_type more
treewide: correct the typo 'retun'
ocfs2: cleanup return value and mlog in ocfs2_global_read_info()
nilfs2: remove duplicate 'unlikely()' usage
nilfs2: fix potential oob read in nilfs_btree_check_delete()
nilfs2: determine empty node blocks as corrupted
nilfs2: fix potential null-ptr-deref in nilfs_btree_insert()
user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation
tools/mm: rm thp_swap_allocator_test when make clean
squashfs: fix percpu address space issues in decompressor_multi_percpu.c
lib: glob.c: added null check for character class
nilfs2: refactor nilfs_segctor_thread()
nilfs2: use kthread_create and kthread_stop for the log writer thread
nilfs2: remove sc_timer_task
nilfs2: do not repair reserved inode bitmap in nilfs_new_inode()
nilfs2: eliminate the shared counter and spinlock for i_generation
nilfs2: separate inode type information from i_state field
nilfs2: use the BITS_PER_LONG macro
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Along with the usual shower of singleton patches, notable patch series
in this pull request are:
- "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
consistency to the APIs and behaviour of these two core allocation
functions. This also simplifies/enables Rustification.
- "Some cleanups for shmem" from Baolin Wang. No functional changes -
mode code reuse, better function naming, logic simplifications.
- "mm: some small page fault cleanups" from Josef Bacik. No
functional changes - code cleanups only.
- "Various memory tiering fixes" from Zi Yan. A small fix and a
little cleanup.
- "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
simplifications and .text shrinkage.
- "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
Butt. This is a feature, it adds new feilds to /proc/vmstat such as
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
which tells us that 11391 processes used 4k of stack while none at
all used 16k. Useful for some system tuning things, but
partivularly useful for "the dynamic kernel stack project".
- "kmemleak: support for percpu memory leak detect" from Pavel
Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.
- "mm: memcg: page counters optimizations" from Roman Gushchin. "3
independent small optimizations of page counters".
- "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
David Hildenbrand. Improves PTE/PMD splitlock detection, makes
powerpc/8xx work correctly by design rather than by accident.
- "mm: remove arch_make_page_accessible()" from David Hildenbrand.
Some folio conversions which make arch_make_page_accessible()
unneeded.
- "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
Finkel. Cleans up and fixes our handling of the resetting of the
cgroup/process peak-memory-use detector.
- "Make core VMA operations internal and testable" from Lorenzo
Stoakes. Rationalizaion and encapsulation of the VMA manipulation
APIs. With a view to better enable testing of the VMA functions,
even from a userspace-only harness.
- "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
issues in the zswap global shrinker, resulting in improved
performance.
- "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
in some missing info in /proc/zoneinfo.
- "mm: replace follow_page() by folio_walk" from David Hildenbrand.
Code cleanups and rationalizations (conversion to folio_walk())
resulting in the removal of follow_page().
- "improving dynamic zswap shrinker protection scheme" from Nhat
Pham. Some tuning to improve zswap's dynamic shrinker. Significant
reductions in swapin and improvements in performance are shown.
- "mm: Fix several issues with unaccepted memory" from Kirill
Shutemov. Improvements to the new unaccepted memory feature,
- "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
DAX PUDs. This was missing, although nobody seems to have notied
yet.
- "Introduce a store type enum for the Maple tree" from Sidhartha
Kumar. Cleanups and modest performance improvements for the maple
tree library code.
- "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
more cgroup v1 remnants away from the v2 memcg code.
- "memcg: initiate deprecation of v1 features" from Shakeel Butt.
Adds various warnings telling users that memcg v1 features are
deprecated.
- "mm: swap: mTHP swap allocator base on swap cluster order" from
Chris Li. Greatly improves the success rate of the mTHP swap
allocation.
- "mm: introduce numa_memblks" from Mike Rapoport. Moves various
disparate per-arch implementations of numa_memblk code into generic
code.
- "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
improves the performance of munmap() of swap-filled ptes.
- "support large folio swap-out and swap-in for shmem" from Baolin
Wang. With this series we no longer split shmem large folios into
simgle-page folios when swapping out shmem.
- "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
performance improvements and code reductions for gigantic folios.
- "support shmem mTHP collapse" from Baolin Wang. Adds support for
khugepaged's collapsing of shmem mTHP folios.
- "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
performance regression due to the addition of mseal().
- "Increase the number of bits available in page_type" from Matthew
Wilcox. Increases the number of bits available in page_type!
- "Simplify the page flags a little" from Matthew Wilcox. Many legacy
page flags are now folio flags, so the page-based flags and their
accessors/mutators can be removed.
- "mm: store zero pages to be swapped out in a bitmap" from Usama
Arif. An optimization which permits us to avoid writing/reading
zero-filled zswap pages to backing store.
- "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
window which occurs when a MAP_FIXED operqtion is occurring during
an unrelated vma tree walk.
- "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
the vma_merge() functionality, making ot cleaner, more testable and
better tested.
- "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
Minor fixups of DAMON selftests and kunit tests.
- "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
Code cleanups and folio conversions.
- "Shmem mTHP controls and stats improvements" from Ryan Roberts.
Cleanups for shmem controls and stats.
- "mm: count the number of anonymous THPs per size" from Barry Song.
Expose additional anon THP stats to userspace for improved tuning.
- "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
folio conversions and removal of now-unused page-based APIs.
- "replace per-quota region priorities histogram buffer with
per-context one" from SeongJae Park. DAMON histogram
rationalization.
- "Docs/damon: update GitHub repo URLs and maintainer-profile" from
SeongJae Park. DAMON documentation updates.
- "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
improve related doc and warn" from Jason Wang: fixes usage of page
allocator __GFP_NOFAIL and GFP_ATOMIC flags.
- "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
This was overprovisioning THPs in sparsely accessed memory areas.
- "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
Add support for zram run-time compression algorithm tuning.
- "mm: Care about shadow stack guard gap when getting an unmapped
area" from Mark Brown. Fix up the various arch_get_unmapped_area()
implementations to better respect guard areas.
- "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
of mem_cgroup_iter() and various code cleanups.
- "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
pfnmap support.
- "resource: Fix region_intersects() vs add_memory_driver_managed()"
from Huang Ying. Fix a bug in region_intersects() for systems with
CXL memory.
- "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
a couple more code paths to correctly recover from the encountering
of poisoned memry.
- "mm: enable large folios swap-in support" from Barry Song. Support
the swapin of mTHP memory into appropriately-sized folios, rather
than into single-page folios"
* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
zram: free secondary algorithms names
uprobes: turn xol_area->pages[2] into xol_area->page
uprobes: introduce the global struct vm_special_mapping xol_mapping
Revert "uprobes: use vm_special_mapping close() functionality"
mm: support large folios swap-in for sync io devices
mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
set_memory: add __must_check to generic stubs
mm/vma: return the exact errno in vms_gather_munmap_vmas()
memcg: cleanup with !CONFIG_MEMCG_V1
mm/show_mem.c: report alloc tags in human readable units
mm: support poison recovery from copy_present_page()
mm: support poison recovery from do_cow_fault()
resource, kunit: add test case for region_intersects()
resource: make alloc_free_mem_region() works for iomem_resource
mm: z3fold: deprecate CONFIG_Z3FOLD
vfio/pci: implement huge_fault support
mm/arm64: support large pfn mappings
mm/x86: support large pfn mappings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RT enablement from Thomas Gleixner:
"Enable PREEMPT_RT on supported architectures:
After twenty years of development we finally reached the point to
enable PREEMPT_RT support in the mainline kernel.
All prerequisites are merged, so enable it on the supported
architectures ARM64, RISCV and X86(32/64-bit)"
* tag 'sched-rt-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
riscv: Allow to enable PREEMPT_RT.
arm64: Allow to enable PREEMPT_RT.
x86: Allow to enable PREEMPT_RT.
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- support DMA zones for arm64 systems where memory starts at > 4GB
(Baruch Siach, Catalin Marinas)
- support direct calls into dma-iommu and thus obsolete dma_map_ops for
many common configurations (Leon Romanovsky)
- add DMA-API tracing (Sean Anderson)
- remove the not very useful return value from various dma_set_* APIs
(Christoph Hellwig)
- misc cleanups and minor optimizations (Chen Y, Yosry Ahmed, Christoph
Hellwig)
* tag 'dma-mapping-6.12-2024-09-19' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: reflow dma_supported
dma-mapping: reliably inform about DMA support for IOMMU
dma-mapping: add tracing for dma-mapping API calls
dma-mapping: use IOMMU DMA calls for common alloc/free page calls
dma-direct: optimize page freeing when it is not addressable
dma-mapping: clearly mark DMA ops as an architecture feature
vdpa_sim: don't select DMA_OPS
arm64: mm: keep low RAM dma zone
dma-mapping: don't return errors from dma_set_max_seg_size
dma-mapping: don't return errors from dma_set_seg_boundary
dma-mapping: don't return errors from dma_set_min_align_mask
scsi: check that busses support the DMA API before setting dma parameters
arm64: mm: fix DMA zone when dma-ranges is missing
dma-mapping: direct calls for dma-iommu
dma-mapping: call ->unmap_page and ->unmap_sg unconditionally
arm64: support DMA zone above 4GB
dma-mapping: replace zone_dma_bits by zone_dma_limit
dma-mapping: use bit masking to check VM_DMA_COHERENT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform drivers updates from Hans de Goede:
- asus-wmi: Add support for vivobook fan profiles
- dell-laptop: Add knobs to change battery charge settings
- lg-laptop: Add operation region support
- intel-uncore-freq: Add support for efficiency latency control
- intel/ifs: Add SBAF test support
- intel/pmc: Ignore all LTRs during suspend
- platform/surface: Support for arm64 based Surface devices
- wmi: Pass event data directly to legacy notify handlers
- x86/platform/geode: switch GPIO buttons and LEDs to software
properties
- bunch of small cleanups, fixes, hw-id additions, etc.
* tag 'platform-drivers-x86-v6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (65 commits)
MAINTAINERS: adjust file entry in INTEL MID PLATFORM
platform/x86: x86-android-tablets: Adjust Xiaomi Pad 2 bottom bezel touch buttons LED
platform/mellanox: mlxbf-pmc: fix lockdep warning
platform/x86/amd: pmf: Add quirk for TUF Gaming A14
platform/x86: touchscreen_dmi: add nanote-next quirk
platform/x86: asus-wmi: don't fail if platform_profile already registered
platform/x86: asus-wmi: add debug print in more key places
platform/x86: intel_scu_wdt: Move intel_scu_wdt.h to x86 subfolder
platform/x86: intel_scu_ipc: Move intel_scu_ipc.h out of arch/x86/include/asm
MAINTAINERS: Add Intel MID section
platform/x86: panasonic-laptop: Add support for programmable buttons
platform/olpc: Remove redundant null pointer checks in olpc_ec_setup_debugfs()
platform/x86: intel/pmc: Ignore all LTRs during suspend
platform/x86: wmi: Call both legacy and WMI driver notify handlers
platform/x86: wmi: Merge get_event_data() with wmi_get_notify_data()
platform/x86: wmi: Remove wmi_get_event_data()
platform/x86: wmi: Pass event data directly to legacy notify handlers
platform/x86: thinkpad_acpi: Fix uninitialized symbol 's' warning
platform/x86: x86-android-tablets: Fix spelling in the comments
platform/x86: ideapad-laptop: Make the scope_guard() clear of its scope
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- fix a boot problem as a Xen dom0 on some AMD systems
- fix Xen PVH boot problems with KASAN enabled
- fix for a build warning
- fixes to swiotlb-xen
* tag 'for-linus-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/swiotlb: fix allocated size
xen/swiotlb: add alignment check for dma buffers
xen/pci: Avoid -Wflex-array-member-not-at-end warning
xen/xenbus: Convert to use ERR_CAST()
xen, pvh: fix unbootable VMs by inlining memset() in xen_prepare_pvh()
x86/cpu: fix unbootable VMs by inlining memcmp() in hypervisor_cpuid_base()
xen, pvh: fix unbootable VMs (PVH + KASAN - AMD_MEM_ENCRYPT)
xen: tolerate ACPI NVS memory overlapping with Xen allocated memory
xen: allow mapping ACPI data using a different physical address
xen: add capability to remap non-RAM pages to different PFNs
xen: move max_pfn in xen_memory_setup() out of function scope
xen: move checks for e820 conflicts further up
xen: introduce generic helper checking for memory map conflicts
xen: use correct end address of kernel for conflict checking
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V updates from Wei Liu:
- Optimize boot time by concurrent execution of hv_synic_init()
(Saurabh Sengar)
- Use helpers to read control registers in hv_snp_boot_ap() (Yosry
Ahmed)
- Add memory allocation check in hv_fcopy_start (Zhu Jun)
* tag 'hyperv-next-signed-20240916' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
tools/hv: Add memory allocation check in hv_fcopy_start
x86/hyperv: use helpers to read control registers in hv_snp_boot_ap()
Drivers: hv: vmbus: Optimize boot time by concurrent execution of hv_synic_init()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"Originally I'd planned on sending each of the vDSO getrandom()
architecture ports to their respective arch trees. But as we started
to work on this, we found lots of interesting issues in the shared
code and infrastructure, the fixes for which the various archs needed
to base their work.
So in the end, this turned into a nice collaborative effort fixing up
issues and porting to 5 new architectures -- arm64, powerpc64,
powerpc32, s390x, and loongarch64 -- with everybody pitching in and
commenting on each other's code. It was a fun development cycle.
This contains:
- Numerous fixups to the vDSO selftest infrastructure, getting it
running successfully on more platforms, and fixing bugs in it.
- Additions to the vDSO getrandom & chacha selftests. Basically every
time manual review unearthed a bug in a revision of an arch patch,
or an ambiguity, the tests were augmented.
By the time the last arch was submitted for review, s390x, v1 of
the series was essentially fine right out of the gate.
- Fixes to the the generic C implementation of vDSO getrandom, to
build and run successfully on all archs, decoupling it from
assumptions we had (unintentionally) made on x86_64 that didn't
carry through to the other architectures.
- Port of vDSO getrandom to LoongArch64, from Xi Ruoyao and acked by
Huacai Chen.
- Port of vDSO getrandom to ARM64, from Adhemerval Zanella and acked
by Will Deacon.
- Port of vDSO getrandom to PowerPC, in both 32-bit and 64-bit
varieties, from Christophe Leroy and acked by Michael Ellerman.
- Port of vDSO getrandom to S390X from Heiko Carstens, the arch
maintainer.
While it'd be natural for there to be things to fix up over the course
of the development cycle, these patches got a decent amount of review
from a fairly diverse crew of folks on the mailing lists, and, for the
most part, they've been cooking in linux-next, which has been helpful
for ironing out build issues.
In terms of architectures, I think that mostly takes care of the
important 64-bit archs with hardware still being produced and running
production loads in settings where vDSO getrandom is likely to help.
Arguably there's still RISC-V left, and we'll see for 6.13 whether
they find it useful and submit a port"
* tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (47 commits)
selftests: vDSO: check cpu caps before running chacha test
s390/vdso: Wire up getrandom() vdso implementation
s390/vdso: Move vdso symbol handling to separate header file
s390/vdso: Allow alternatives in vdso code
s390/module: Provide find_section() helper
s390/facility: Let test_facility() generate static branch if possible
s390/alternatives: Remove ALT_FACILITY_EARLY
s390/facility: Disable compile time optimization for decompressor code
selftests: vDSO: fix vdso_config for s390
selftests: vDSO: fix ELF hash table entry size for s390x
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32
powerpc/vdso: Refactor CFLAGS for CVDSO build
powerpc/vdso32: Add crtsavres
mm: Define VM_DROPPABLE for powerpc/32
powerpc/vdso: Fix VDSO data access when running in a non-root time namespace
selftests: vDSO: don't include generated headers for chacha test
arm64: vDSO: Wire up getrandom() vDSO implementation
arm64: alternative: make alternative_has_cap_likely() VDSO compatible
selftests: vDSO: also test counter in vdso_test_chacha
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Implement per-PMU context rescheduling to significantly improve
single-PMU performance, and related cleanups/fixes (Peter Zijlstra
and Namhyung Kim)
- Fix ancient bug resulting in a lot of events being dropped
erroneously at higher sampling frequencies (Luo Gengkun)
- uprobes enhancements:
- Implement RCU-protected hot path optimizations for better
performance:
"For baseline vs SRCU, peak througput increased from 3.7 M/s
(million uprobe triggerings per second) up to about 8 M/s. For
uretprobes it's a bit more modest with bump from 2.4 M/s to
5 M/s.
For SRCU vs RCU Tasks Trace, peak throughput for uprobes
increases further from 8 M/s to 10.3 M/s (+28%!), and for
uretprobes from 5.3 M/s to 5.8 M/s (+11%), as we have more
work to do on uretprobes side.
Even single-thread (no contention) performance is slightly
better: 3.276 M/s to 3.396 M/s (+3.5%) for uprobes, and 2.055
M/s to 2.174 M/s (+5.8%) for uretprobes."
(Andrii Nakryiko et al)
- Document mmap_lock, don't abuse get_user_pages_remote() (Oleg
Nesterov)
- Cleanups & fixes to prepare for future work:
- Remove uprobe_register_refctr()
- Simplify error handling for alloc_uprobe()
- Make uprobe_register() return struct uprobe *
- Fold __uprobe_unregister() into uprobe_unregister()
- Shift put_uprobe() from delete_uprobe() to uprobe_unregister()
- BPF: Fix use-after-free in bpf_uprobe_multi_link_attach()
(Oleg Nesterov)
- New feature & ABI extension: allow events to use PERF_SAMPLE READ
with inheritance, enabling sample based profiling of a group of
counters over a hierarchy of processes or threads (Ben Gainey)
- Intel uncore & power events updates:
- Add Arrow Lake and Lunar Lake support
- Add PERF_EV_CAP_READ_SCOPE
- Clean up and enhance cpumask and hotplug support
(Kan Liang)
- Add LNL uncore iMC freerunning support
- Use D0:F0 as a default device
(Zhenyu Wang)
- Intel PT: fix AUX snapshot handling race (Adrian Hunter)
- Misc fixes and cleanups (James Clark, Jiri Olsa, Oleg Nesterov and
Peter Zijlstra)
* tag 'perf-core-2024-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
dmaengine: idxd: Clean up cpumask and hotplug for perfmon
iommu/vt-d: Clean up cpumask and hotplug for perfmon
perf/x86/intel/cstate: Clean up cpumask and hotplug
perf: Add PERF_EV_CAP_READ_SCOPE
perf: Generic hotplug support for a PMU with a scope
uprobes: perform lockless SRCU-protected uprobes_tree lookup
rbtree: provide rb_find_rcu() / rb_find_add_rcu()
perf/uprobe: split uprobe_unregister()
uprobes: travers uprobe's consumer list locklessly under SRCU protection
uprobes: get rid of enum uprobe_filter_ctx in uprobe filter callbacks
uprobes: protected uprobe lifetime with SRCU
uprobes: revamp uprobe refcounting and lifetime management
bpf: Fix use-after-free in bpf_uprobe_multi_link_attach()
perf/core: Fix small negative period being ignored
perf: Really fix event_function_call() locking
perf: Optimize __pmu_ctx_sched_out()
perf: Add context time freeze
perf: Fix event_function_call() locking
perf: Extract a few helpers
perf: Optimize context reschedule for single PMU cases
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
"Left over from the v6.11 cycle:
- Fix energy-pkg event enumeration on certain AMD CPUs
- Set up the LBR branch stack for BPF counting events too"
* tag 'perf-urgent-2024-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Allow to setup LBR for counting event for BPF
perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Neeraj Upadhyay:
"Context tracking:
- rename context tracking state related symbols and remove references
to "dynticks" in various context tracking state variables and
related helpers
- force context_tracking_enabled_this_cpu() to be inlined to avoid
leaving a noinstr section
CSD lock:
- enhance CSD-lock diagnostic reports
- add an API to provide an indication of ongoing CSD-lock stall
nocb:
- update and simplify RCU nocb code to handle (de-)offloading of
callbacks only for offline CPUs
- fix RT throttling hrtimer being armed from offline CPU
rcutorture:
- remove redundant rcu_torture_ops get_gp_completed fields
- add SRCU ->same_gp_state and ->get_comp_state functions
- add generic test for NUM_ACTIVE_*RCU_POLL* for testing RCU and SRCU
polled grace periods
- add CFcommon.arch for arch-specific Kconfig options
- print number of update types in rcu_torture_write_types()
- add rcutree.nohz_full_patience_delay testing to the TREE07 scenario
- add a stall_cpu_repeat module parameter to test repeated CPU stalls
- add argument to limit number of CPUs a guest OS can use in
torture.sh
rcustall:
- abbreviate RCU CPU stall warnings during CSD-lock stalls
- Allow dump_cpu_task() to be called without disabling preemption
- defer printing stall-warning backtrace when holding rcu_node lock
srcu:
- make SRCU gp seq wrap-around faster
- add KCSAN checks for concurrent updates to ->srcu_n_exp_nodelay and
->reschedule_count which are used in heuristics governing
auto-expediting of normal SRCU grace periods and
grace-period-state-machine delays
- mark idle SRCU-barrier callbacks to help identify stuck
SRCU-barrier callback
rcu tasks:
- remove RCU Tasks Rude asynchronous APIs as they are no longer used
- stop testing RCU Tasks Rude asynchronous APIs
- fix access to non-existent percpu regions
- check processor-ID assumptions during chosen CPU calculation for
callback enqueuing
- update description of rtp->tasks_gp_seq grace-period sequence
number
- add rcu_barrier_cb_is_done() to identify whether a given
rcu_barrier callback is stuck
- mark idle Tasks-RCU-barrier callbacks
- add *torture_stats_print() functions to print detailed diagnostics
for Tasks-RCU variants
- capture start time of rcu_barrier_tasks*() operation to help
distinguish a hung barrier operation from a long series of barrier
operations
refscale:
- add a TINY scenario to support tests of Tiny RCU and Tiny
SRCU
- optimize process_durations() operation
rcuscale:
- dump stacks of stalled rcu_scale_writer() instances and
grace-period statistics when rcu_scale_writer() stalls
- mark idle RCU-barrier callbacks to identify stuck RCU-barrier
callbacks
- print detailed grace-period and barrier diagnostics on
rcu_scale_writer() hangs for Tasks-RCU variants
- warn if async module parameter is specified for RCU implementations
that do not have async primitives such as RCU Tasks Rude
- make all writer tasks report upon hang
- tolerate repeated GFP_KERNEL failure in rcu_scale_writer()
- use special allocator for rcu_scale_writer()
- NULL out top-level pointers to heap memory to avoid double-free
bugs on modprobe failures
- maintain per-task instead of per-CPU callbacks count to avoid any
issues with migration of either tasks or callbacks
- constify struct ref_scale_ops
Fixes:
- use system_unbound_wq for kfree_rcu work to avoid disturbing
isolated CPUs
Misc:
- warn on unexpected rcu_state.srs_done_tail state
- better define "atomic" for list_replace_rcu() and
hlist_replace_rcu() routines
- annotate struct kvfree_rcu_bulk_data with __counted_by()"
* tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (90 commits)
rcu: Defer printing stall-warning backtrace when holding rcu_node lock
rcu/nocb: Remove superfluous memory barrier after bypass enqueue
rcu/nocb: Conditionally wake up rcuo if not already waiting on GP
rcu/nocb: Fix RT throttling hrtimer armed from offline CPU
rcu/nocb: Simplify (de-)offloading state machine
context_tracking: Tag context_tracking_enabled_this_cpu() __always_inline
context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching
rcu: Update stray documentation references to rcu_dynticks_eqs_{enter, exit}()
rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()
rcu: Rename rcu_implicit_dynticks_qs() into rcu_watching_snap_recheck()
rcu: Rename dyntick_save_progress_counter() into rcu_watching_snap_save()
rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snap
rcu: Rename struct rcu_data .dynticks_snap into .watching_snap
rcu: Rename rcu_dynticks_zero_in_eqs() into rcu_watching_zero_in_eqs()
rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since()
rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs()
rcu: Rename rcu_dynticks_eqs_online() into rcu_watching_online()
context_tracking, rcu: Rename rcu_dynticks_curr_cpu_in_eqs() into rcu_is_watching_curr_cpu()
context_tracking, rcu: Rename rcu_dynticks_task*() into rcu_task*()
refscale: Constify struct ref_scale_ops
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Thomas Gleixner:
- Use the topology information of number of packages for making the
decision about TSC trust instead of using the number of online nodes
which is not reflecting the real topology.
- Stop the PIT timer 0 when its not in use as to stop pointless
emulation in the VMM.
- Fix the PIT timer stop sequence for timer 0 so it truly stops both
real hardware and buggy VMM emulations.
* tag 'x86-timers-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Check for sockets instead of CPUs to make code match comment
clockevents/drivers/i8253: Fix stop sequence for timer 0
x86/i8253: Disable PIT timer 0 when not in use
x86/tsc: Use topology_max_packages() to get package number
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Thomas Gleixner:
- Rework kcpuid to handle the the autogenerated CSV file correctly and
update the CSV file to cover the whole zoo of CPUID.
- Avoid memcpy() for ia32 syscall_get_arguments() and use direct
assignments as fortified memcpy() is unhappy about writing/reading
beyond the end of the addresses destination/source struct member
- A few new PCI IDs for AMD
- Update MAINTAINERS to cover x86 specific selftests
* tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add selftests/x86 entry
x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h-70h
x86/syscall: Avoid memcpy() for ia32 syscall_get_arguments()
MAINTAINERS: Add x86 cpuid database entry
tools/x86/kcpuid: Introduce a complete cpuid bitfields CSV file
tools/x86/kcpuid: Parse subleaf ranges if provided
tools/x86/kcpuid: Recognize all leaves with subleaves
tools/x86/kcpuid: Strip bitfield names leading/trailing whitespace
tools/x86/kcpuid: Protect against faulty "max subleaf" values
tools/x86/kcpuid: Set max possible subleaves count to 64
tools/x86/kcpuid: Properly align long-description columns
tools/x86/kcpuid: Remove unused variable
x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform update from Thomas Gleixner:
"Remove a stale declaration from the UV platform code"
* tag 'x86-platform-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/uv: Remove unused declaration uv_irq_2_mmr_info()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 memory management updates from Thomas Gleixner:
- Make LAM enablement safe vs. kernel threads using a process mm
temporarily as switching back to the process would not update CR3 and
therefore not enable LAM causing faults in user space when using
tagged pointers. Cure it by synchronizing LAM enablement via IPIs to
all CPUs which use the related mm.
- Cure a LAM harmless inconsistency between CR3 and the state during
context switch. It's both confusing and prone to lead to real bugs
- Handle alt stack handling for threads which run with a non-zero
protection key. The non-zero key prevents the kernel to access the
alternate stack. Cure it by temporarily enabling all protection keys
for the alternate stack setup/restore operations.
- Provide a EFI config table identity mapping for kexec kernel to
prevent kexec fails because the new kernel cannot access the config
table array
- Use GB pages only when a full GB is mapped in the identity map as
otherwise the CPU can speculate into reserved areas after the end of
memory which causes malfunction on UV systems.
- Remove the noisy and pointless SRAT table dump during boot
- Use is_ioremap_addr() for iounmap() address range checks instead of
high_memory. is_ioremap_addr() is more precise.
* tag 'x86-mm-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioremap: Improve iounmap() address range checks
x86/mm: Remove duplicate check from build_cr3()
x86/mm: Remove unused NX related declarations
x86/mm: Remove unused CR3_HW_ASID_BITS
x86/mm: Don't print out SRAT table information
x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
x86/kexec: Add EFI config table identity mapping for kexec kernel
selftests/mm: Add new testcases for pkeys
x86/pkeys: Restore altstack access in sigreturn()
x86/pkeys: Update PKRU to enable all pkeys before XSAVE
x86/pkeys: Add helper functions to update PKRU on the sigframe
x86/pkeys: Add PKRU as a parameter in signal handling functions
x86/mm: Cleanup prctl_enable_tagged_addr() nr_bits error checking
x86/mm: Fix LAM inconsistency during context switch
x86/mm: Use IPIs to synchronize LAM enablement
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FRED updates from Thomas Gleixner:
- Enable FRED right after init_mem_mapping() because at that point the
early IDT fault handler is replaced by the real fault handler. The
real fault handler retrieves the faulting address from the stack
frame and not from CR2 when the FRED feature is set. But that
obviously only works when FRED is enabled in the CPU as well.
- Set SS to __KERNEL_DS when enabling FRED to prevent a corner case
where ERETS can observe a SS mismatch and raises a #GP.
* tag 'x86-fred-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry: Set FRED RSP0 on return to userspace instead of context switch
x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanism
x86/entry: Test ti_work for zero before processing individual bits
x86/fred: Set SS to __KERNEL_DS when enabling FRED
x86/fred: Enable FRED right after init_mem_mapping()
x86/fred: Move FRED RSP initialization into a separate function
x86/fred: Parse cmdline param "fred=" in cpu_parse_early_param()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Thomas Gleixner:
"Provide FPU buffer layout in core dumps:
Debuggers have guess the FPU buffer layout in core dumps, which is
error prone. This is because AMD and Intel layouts differ.
To avoid buggy heuristics add a ELF section which describes the buffer
layout which can be retrieved by tools"
* tag 'x86-fpu-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/elf: Add a new FPU buffer layout info to x86 core files
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core update from Thomas Gleixner:
"Enable UBSAN traps for x86, which provides better reporting through
metadata encodeded into UD1"
* tag 'x86-core-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/traps: Enable UBSAN traps on x86
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:
- Handle an allocation failure in the IO/APIC code gracefully instead
of crashing the machine.
- Remove support for APIC local destination mode on 64bit
Logical destination mode of the local APIC is used for systems with
up to 8 CPUs. It has an advantage over physical destination mode as
it allows to target multiple CPUs at once with IPIs. That advantage
was definitely worth it when systems with up to 8 CPUs were state of
the art for servers and workstations, but that's history.
In the recent past there were quite some reports of new laptops
failing to boot with logical destination mode, but they work fine
with physical destination mode. That's not a suprise because physical
destination mode is guaranteed to work as it's the only way to get a
CPU up and running via the INIT/INIT/STARTUP sequence. Some of the
affected systems were cured by BIOS updates, but not all OEMs provide
them.
As the number of CPUs keep increasing, logical destination mode
becomes less used and the benefit for small systems, like laptops, is
not really worth the trouble. So just remove logical destination mode
support for 64bit and be done with it.
- Code and comment cleanups in the APIC area.
* tag 'x86-apic-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Fix comment on IRQ vector layout
x86/apic: Remove unused extern declarations
x86/apic: Remove logical destination mode for 64-bit
x86/apic: Remove unused inline function apic_set_eoi_cb()
x86/ioapic: Cleanup remaining coding style issues
x86/ioapic: Cleanup line breaks
x86/ioapic: Cleanup bracket usage
x86/ioapic: Cleanup comments
x86/ioapic: Move replace_pin_at_irq_node() to the call site
iommu/vt-d: Cleanup apic_printk()
x86/mpparse: Cleanup apic_printk()s
x86/ioapic: Cleanup guarded debug printk()s
x86/ioapic: Cleanup apic_printk()s
x86/apic: Cleanup apic_printk()s
x86/apic: Provide apic_printk() helpers
x86/ioapic: Use guard() for locking where applicable
x86/ioapic: Cleanup structs
x86/ioapic: Mark mp_alloc_timer_irq() __init
x86/ioapic: Handle allocation failures gracefully
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Thomas Gleixner:
"A set of cleanups across x86:
- Use memremap() for the EISA probe instead of ioremap(). EISA is
strictly memory and not MMIO
- Cleanups and enhancement all over the place"
* tag 'x86-cleanups-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/EISA: Dereference memory directly instead of using readl()
x86/extable: Remove unused declaration fixup_bug()
x86/boot/64: Strip percpu address space when setting up GDT descriptors
x86/cpu: Clarify the error message when BIOS does not support SGX
x86/kexec: Add comments around swap_pages() assembly to improve readability
x86/kexec: Fix a comment of swap_pages() assembly
x86/sgx: Fix a W=1 build warning in function comment
x86/EISA: Use memremap() to probe for the EISA BIOS signature
x86/mtrr: Remove obsolete declaration for mtrr_bp_restore()
x86/cpu_entry_area: Annotate percpu_setup_exception_stacks() as __init
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Thomas Gleixner:
"Updates for KCOV instrumentation on x86:
- Prevent spurious KCOV coverage in common_interrupt()
- Fixup the KCOV Makefile directive which got stale due to a source
file rename
- Exclude stack unwinding from KCOV as it creates large amounts of
uninteresting coverage
- Provide a self test to validate that KCOV coverage of the interrupt
handling code starts not before preempt count got updated"
* tag 'x86-build-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Ignore stack unwinding in KCOV
module: Fix KCOV-ignored file name
kcov: Add interrupt handling self test
x86/entry: Remove unwanted instrumentation in common_interrupt()
|
|
It is really time.
x86 has all the required architecture related changes, that have been
identified over time, in order to enable PREEMPT_RT. With the recent
printk changes, the last known road block has been addressed.
Allow to enable PREEMPT_RT on x86.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240906111841.562402-2-bigeasy@linutronix.de
|
|
Helpers to install and detect special pmd/pud entries. In short, bit 9 on
x86 is not used for pmd/pud, so we can directly define them the same as
the pte level. One note is that it's also used in _PAGE_BIT_CPA_TEST but
that is only used in the debug test, and shouldn't conflict in this case.
One note is that pxx_set|clear_flags() for pmd/pud will need to be moved
upper so that they can be referenced by the new special bit helpers.
There's no change in the code that was moved.
Link: https://lkml.kernel.org/r/20240826204353.2228736-18-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use the new API that can understand huge pfn mappings.
Link: https://lkml.kernel.org/r/20240826204353.2228736-13-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Core:
- Overhaul of posix-timers in preparation of removing the workaround
for periodic timers which have signal delivery ignored.
- Remove the historical extra jiffie in msleep()
msleep() adds an extra jiffie to the timeout value to ensure
minimal sleep time. The timer wheel ensures minimal sleep time
since the large rewrite to a non-cascading wheel, but the extra
jiffie in msleep() remained unnoticed. Remove it.
- Make the timer slack handling correct for realtime tasks.
The procfs interface is inconsistent and does neither reflect
reality nor conforms to the man page. Show the correct 0 slack for
real time tasks and enforce it at the core level instead of having
inconsistent individual checks in various timer setup functions.
- The usual set of updates and enhancements all over the place.
Drivers:
- Allow the ACPI PM timer to be turned off during suspend
- No new drivers
- The usual updates and enhancements in various drivers"
* tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
ntp: Make sure RTC is synchronized when time goes backwards
treewide: Fix wrong singular form of jiffies in comments
cpu: Use already existing usleep_range()
timers: Rename next_expiry_recalc() to be unique
platform/x86:intel/pmc: Fix comment for the pmc_core_acpi_pm_timer_suspend_resume function
clocksource/drivers/jcore: Use request_percpu_irq()
clocksource/drivers/cadence-ttc: Add missing clk_disable_unprepare in ttc_setup_clockevent
clocksource/drivers/asm9260: Add missing clk_disable_unprepare in asm9260_timer_init
clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
clocksource/drivers/ingenic: Use devm_clk_get_enabled() helpers
platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended
clocksource: acpi_pm: Add external callback for suspend/resume
clocksource/drivers/arm_arch_timer: Using for_each_available_child_of_node_scoped()
dt-bindings: timer: rockchip: Add rk3576 compatible
timers: Annotate possible non critical data race of next_expiry
timers: Remove historical extra jiffie for timeout in msleep()
hrtimer: Use and report correct timerslack values for realtime tasks
hrtimer: Annotate hrtimer_cpu_base_.*_expiry() for sparse.
timers: Add sparse annotation for timer_sync_wait_running().
signal: Replace BUG_ON()s
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"By the number of new lines of code, the most visible change here is
the addition of hybrid CPU capacity scaling support to the
intel_pstate driver. Next are the amd-pstate driver changes related to
the calculation of the AMD boost numerator and preferred core
detection.
As far as new hardware support is concerned, the intel_idle driver
will now handle Granite Rapids Xeon processors natively, the
intel_rapl power capping driver will recognize family 1Ah of AMD
processors and Intel ArrowLake-U chipos, and intel_pstate will handle
Granite Rapids and Sierra Forest chips in the out-of-band (OOB) mode.
Apart from the above, there is a usual collection of assorted fixes
and code cleanups in many places and there are tooling updates.
Specifics:
- Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef)
- Add support for Granite Rapids and Sierra Forest in OOB mode to the
intel_pstate cpufreq driver (Srinivas Pandruvada)
- Add basic support for CPU capacity scaling on x86 and make the
intel_pstate driver set asymmetric CPU capacity on hybrid systems
without SMT (Rafael Wysocki)
- Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
driver (Jeff Johnson)
- Several OF related cleanups in cpufreq drivers (Rob Herring)
- Enable COMPILE_TEST for ARM drivers (Rob Herrring)
- Introduce quirks for syscon failures and use socinfo to get
revision for TI cpufreq driver (Dhruva Gole, Nishanth Menon)
- Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
Ugwekar)
- Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
(Danila Tikhonov, Huacai Chen, and Liu Jing)
- Make amd-pstate validate return of any attempt to update EPP
limits, which fixes the masking hardware problems (Mario
Limonciello)
- Move the calculation of the AMD boost numerator outside of
amd-pstate, correcting acpi-cpufreq on systems with preferred cores
(Mario Limonciello)
- Harden preferred core detection in amd-pstate to avoid potential
false positives (Mario Limonciello)
- Add extra unit test coverage for mode state machine (Mario
Limonciello)
- Fix an "Uninitialized variables" issue in amd-pstste (Qianqiang
Liu)
- Add Granite Rapids Xeon support to intel_idle (Artem Bityutskiy)
- Disable promotion to C1E on Jasper Lake and Elkhart Lake in
intel_idle (Kai-Heng Feng)
- Use scoped device node handling to fix missing of_node_put() and
simplify walking OF children in the riscv-sbi cpuidle driver
(Krzysztof Kozlowski)
- Remove dead code from cpuidle_enter_state() (Dhruva Gole)
- Change an error pointer to NULL to fix error handling in the
intel_rapl power capping driver (Dan Carpenter)
- Fix off by one in get_rpi() in the intel_rapl power capping driver
(Dan Carpenter)
- Add support for ArrowLake-U to the intel_rapl power capping driver
(Sumeet Pawnikar)
- Fix the energy-pkg event for AMD CPUs in the intel_rapl power
capping driver (Dhananjay Ugwekar)
- Add support for AMD family 1Ah processors to the intel_rapl power
capping driver (Dhananjay Ugwekar)
- Remove unused stub for saveable_highmem_page() and remove
deprecated macros from power management documentation (Andy
Shevchenko)
- Use ysfs_emit() and sysfs_emit_at() in "show" functions in the PM
sysfs interface (Xueqin Luo)
- Update the maintainers information for the
operating-points-v2-ti-cpu DT binding (Dhruva Gole)
- Drop unnecessary of_match_ptr() from ti-opp-supply (Rob Herring)
- Add missing MODULE_DESCRIPTION() macros to devfreq governors (Jeff
Johnson)
- Use devm_clk_get_enabled() in the exynos-bus devfreq driver (Anand
Moon)
- Use of_property_present() instead of of_get_property() in the
imx-bus devfreq driver (Rob Herring)
- Update directory handling and installation process in the pm-graph
Makefile and add .gitignore to ignore sleepgraph.py artifacts to
pm-graph (Amit Vadhavana, Yo-Jung Lin)
- Make cpupower display residency value in idle-info (Aboorva
Devarajan)
- Add missing powercap_set_enabled() stub function to cpupower (John
B. Wyatt IV)
- Add SWIG support to cpupower (John B. Wyatt IV)"
* tag 'pm-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (62 commits)
cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
cpufreq/amd-pstate-ut: Add test case for mode switches
cpufreq/amd-pstate: Export symbols for changing modes
amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
x86/amd: Move amd_get_highest_perf() out of amd-pstate
ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
ACPI: CPPC: Drop check for non zero perf ratio
x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
PM: hibernate: Remove unused stub for saveable_highmem_page()
pm:cpupower: Add error warning when SWIG is not installed
MAINTAINERS: Add Maintainers for SWIG Python bindings
pm:cpupower: Include test_raw_pylibcpupower.py
pm:cpupower: Add SWIG bindings files for libcpupower
pm:cpupower: Add missing powercap_set_enabled() stub function
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The highlights are support for Arm's "Permission Overlay Extension"
using memory protection keys, support for running as a protected guest
on Android as well as perf support for a bunch of new interconnect
PMUs.
Summary:
ACPI:
- Enable PMCG erratum workaround for HiSilicon HIP10 and 11
platforms.
- Ensure arm64-specific IORT header is covered by MAINTAINERS.
CPU Errata:
- Enable workaround for hardware access/dirty issue on Ampere-1A
cores.
Memory management:
- Define PHYSMEM_END to fix a crash in the amdgpu driver.
- Avoid tripping over invalid kernel mappings on the kexec() path.
- Userspace support for the Permission Overlay Extension (POE) using
protection keys.
Perf and PMUs:
- Add support for the "fixed instruction counter" extension in the
CPU PMU architecture.
- Extend and fix the event encodings for Apple's M1 CPU PMU.
- Allow LSM hooks to decide on SPE permissions for physical
profiling.
- Add support for the CMN S3 and NI-700 PMUs.
Confidential Computing:
- Add support for booting an arm64 kernel as a protected guest under
Android's "Protected KVM" (pKVM) hypervisor.
Selftests:
- Fix vector length issues in the SVE/SME sigreturn tests
- Fix build warning in the ptrace tests.
Timers:
- Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
non-determinism arising from the architected counter.
Miscellaneous:
- Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
don't succeed.
- Minor fixes and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
perf: arm-ni: Fix an NULL vs IS_ERR() bug
arm64: hibernate: Fix warning for cast from restricted gfp_t
arm64: esr: Define ESR_ELx_EC_* constants as UL
arm64: pkeys: remove redundant WARN
perf: arm_pmuv3: Use BR_RETIRED for HW branch event if enabled
MAINTAINERS: List Arm interconnect PMUs as supported
perf: Add driver for Arm NI-700 interconnect PMU
dt-bindings/perf: Add Arm NI-700 PMU
perf/arm-cmn: Improve format attr printing
perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check
arm64/mm: use lm_alias() with addresses passed to memblock_free()
mm: arm64: document why pte is not advanced in contpte_ptep_set_access_flags()
arm64: Expose the end of the linear map in PHYSMEM_END
arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec()
arm64/mm: Delete __init region from memblock.reserved
perf/arm-cmn: Support CMN S3
dt-bindings: perf: arm-cmn: Add CMN S3
perf/arm-cmn: Refactor DTC PMU register access
perf/arm-cmn: Make cycle counts less surprising
perf/arm-cmn: Improve build-time assertion
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX updates from Dave Hansen:
"These fix a deadlock in the SGX NUMA allocator.
It's probably only triggerable today on servers with buggy BIOSes, but
it's theoretically possible it can happen on less goofy systems"
* tag 'x86_sgx_for_6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Log information when a node lacks an EPC section
x86/sgx: Fix deadlock in SGX NUMA node search
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 hw mitigation updates from Borislav Petkov:
- Add CONFIG_ option for every hw CPU mitigation. The intent is to
support configurations and scenarios where the mitigations code is
irrelevant
- Other small fixlets and improvements
* tag 'x86_bugs_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Fix handling when SRSO mitigation is disabled
x86/bugs: Add missing NO_SSB flag
Documentation/srso: Document a method for checking safe RET operates properly
x86/bugs: Add a separate config for GDS
x86/bugs: Remove GDS Force Kconfig option
x86/bugs: Add a separate config for SSB
x86/bugs: Add a separate config for Spectre V2
x86/bugs: Add a separate config for SRBDS
x86/bugs: Add a separate config for Spectre v1
x86/bugs: Add a separate config for RETBLEED
x86/bugs: Add a separate config for L1TF
x86/bugs: Add a separate config for MMIO Stable Data
x86/bugs: Add a separate config for TAA
x86/bugs: Add a separate config for MDS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Add the final conversions to the new Intel VFM CPU model matching
macros which include the vendor and finally drop the old ones which
hardcode family 6
* tag 'x86_cpu_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/vfm: Delete all the *_FAM6_ CPU #defines
x86/cpu/vfm: Delete X86_MATCH_INTEL_FAM6_MODEL[_STEPPING]() macros
extcon: axp288: Switch to new Intel CPU model defines
x86/cpu/intel: Replace PAT erratum model/family magic numbers with symbolic IFM references
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- A bunch of cleanups to the sev-guest driver. All in preparation for
future SEV work
* tag 'x86_sev_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
virt: sev-guest: Ensure the SNP guest messages do not exceed a page
virt: sev-guest: Fix user-visible strings
virt: sev-guest: Rename local guest message variables
virt: sev-guest: Replace dev_dbg() with pr_debug()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- Reorganize the struct mce populating functions so that MCA errors
reported through BIOS' BERT method can report the correct CPU number
the error has been detected on
* tag 'ras_core_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Use mce_prep_record() helpers for apei_smca_report_x86_error()
x86/mce: Define mce_prep_record() helpers for common and per-CPU fields
x86/mce: Rename mce_setup() to mce_prep_record()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Borislav Petkov:
- Simplify microcode patches loading on AMD Zen and newer by using the
family, model and stepping encoded in the patch revision number
- Fix a silly clang warning
* tag 'x86_microcode_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Fix a -Wsometimes-uninitialized clang false positive
x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu"
"API:
- Make self-test asynchronous
Algorithms:
- Remove MPI functions added for SM3
- Add allocation error checks to remaining MPI functions (introduced
for SM3)
- Set default Jitter RNG OSR to 3
Drivers:
- Add hwrng driver for Rockchip RK3568 SoC
- Allow disabling SR-IOV VFs through sysfs in qat
- Fix device reset bugs in hisilicon
- Fix authenc key parsing by using generic helper in octeontx*
Others:
- Fix xor benchmarking on parisc"
* tag 'v6.12-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (96 commits)
crypto: n2 - Set err to EINVAL if snprintf fails for hmac
crypto: camm/qi - Use ERR_CAST() to return error-valued pointer
crypto: mips/crc32 - Clean up useless assignment operations
crypto: qcom-rng - rename *_of_data to *_match_data
crypto: qcom-rng - fix support for ACPI-based systems
dt-bindings: crypto: qcom,prng: document support for SA8255p
crypto: aegis128 - Fix indentation issue in crypto_aegis128_process_crypt()
crypto: octeontx* - Select CRYPTO_AUTHENC
crypto: testmgr - Hide ENOENT errors
crypto: qat - Remove trailing space after \n newline
crypto: hisilicon/sec - Remove trailing space after \n newline
crypto: algboss - Pass instance creation error up
crypto: api - Fix generic algorithm self-test races
crypto: hisilicon/qm - inject error before stopping queue
crypto: hisilicon/hpre - mask cluster timeout error
crypto: hisilicon/qm - reset device before enabling it
crypto: hisilicon/trng - modifying the order of header files
crypto: hisilicon - add a lock for the qp send operation
crypto: hisilicon - fix missed error branch
crypto: ccp - do not request interrupt on cmd completion when irqs disabled
...
|
|
Pull kvm fix from Paolo Bonzini:
"Do not always honor guest PAT on CPUs that support self-snoop.
This triggers an issue in the bochsdrm driver, which used ioremap()
instead of ioremap_wc() to map the video RAM.
The revert lets video RAM use the WB memory type instead of the slower
UC memory type"
* tag 'for-linus-6.11' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"
|
|
This reverts commit 377b2f359d1f71c75f8cc352b5c81f2210312d83.
This caused a regression with the bochsdrm driver, which used ioremap()
instead of ioremap_wc() to map the video RAM. After the commit, the
WB memory type is used without the IGNORE_PAT, resulting in the slower
UC memory type. In fact, UC is slow enough to basically cause guests
to not boot... but only on new processors such as Sapphire Rapids and
Cascade Lake. Coffee Lake for example works properly, though that might
also be an effect of being on a larger, more NUMA system.
The driver has been fixed but that does not help older guests. Until we
figure out whether Cascade Lake and newer processors are working as
intended, revert the commit. Long term we might add a quirk, but the
details depend on whether the processors are working as intended: for
example if they are, the quirk might reference bochs-compatible devices,
e.g. in the name and documentation, so that userspace can disable the
quirk by default and only leave it enabled if such a device is being
exposed to the guest.
If instead this is actually a bug in CLX+, then the actions we need to
take are different and depend on the actual cause of the bug.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Without a prototype, we'll have to add a prototype for each architecture
implementing vDSO getrandom. As most architectures will likely have the
vDSO getrandom implemented in a near future, and we'd like to keep the
declarations compatible everywhere (to ease the libc implementor work),
we should really just have one copy of the prototype.
This also is what's already done inside of include/vdso/gettime.h for
those vDSO functions, so this continues that convention.
Suggested-by: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Acked-by: Huacai Chen <chenhuacai@kernel.org>
[Jason: rewrite docbook comment for prototype.]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Depending on the architecture, building a 32-bit vDSO on a 64-bit kernel
is problematic when some system headers are included.
Minimise the amount of headers by moving needed items, such as
__{get,put}_unaligned_t, into dedicated common headers and in general
use more specific headers, similar to what was done in commit
8165b57bca21 ("linux/const.h: Extract common header for vDSO") and
commit 8c59ab839f52 ("lib/vdso: Enable common headers").
On some architectures this results in missing PAGE_SIZE, as was
described by commit 8b3843ae3634 ("vdso/datapage: Quick fix - use
asm/page-def.h for ARM64"), so define this if necessary, in the same way
as done prior by commit cffaefd15a8f ("vdso: Use CONFIG_PAGE_SHIFT in
vdso/datapage.h").
Removing linux/time64.h leads to missing 'struct timespec64' in
x86's asm/pvclock.h. Add a forward declaration of that struct in
that file.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
_vdso_data is specific to x86 and __arch_get_k_vdso_data() is provided
so that all architectures can provide the requested pointer.
Do the same with _vdso_rng_data, provide __arch_get_k_vdso_rng_data()
and don't use x86 _vdso_rng_data directly.
Until now vdso/vsyscall.h was only included by time/vsyscall.c but now
it will also be included in char/random.c, leading to a duplicate
declaration of _vdso_data and _vdso_rng_data.
To fix this issue, move the declaration in a C file. vma.c looks like
the most appropriate candidate. We don't need to replace the definitions
in vsyscall.h by declarations as declarations are already in asm/vvar.h.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Having the prototype for __arch_chacha20_blocks_nostack in
arch/x86/include/asm/vdso/getrandom.h meant that the prototype and large
doc comment were cloned by every architecture, which has been causing
unnecessary churn. Instead move it into include/vdso/getrandom.h, where
it can be shared by all archs implementing it.
As a side bonus, this then lets us use that prototype in the
vdso_test_chacha self test, to ensure that it matches the source, and
indeed doing so turned up some inconsistencies, which are rectified
here.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
If this memset() is not inlined than PVH early boot code can call
into KASAN-instrumented memset() which results in unbootable VMs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240802154253.482658-3-adobriyan@gmail.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
If this memcmp() is not inlined then PVH early boot code can call
into KASAN-instrumented memcmp() which results in unbootable VMs:
pvh_start_xen
xen_prepare_pvh
xen_cpuid_base
hypervisor_cpuid_base
memcmp
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240802154253.482658-2-adobriyan@gmail.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Uninstrument arch/x86/platform/pvh/enlighten.c: KASAN has not been setup
_this_ early in the boot process.
Steps to reproduce:
make allnoconfig
make sure CONFIG_AMD_MEM_ENCRYPT is disabled
AMD_MEM_ENCRYPT independently uninstruments lib/string.o
so PVH boot code calls into uninstrumented memset() and
memcmp() which can make the bug disappear depending on
the compiler.
enable CONFIG_PVH
enable CONFIG_KASAN
enable serial console
this is fun exercise if you never done it from nothing :^)
make
qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-smp cpus=1 \
-m 4096 \
-serial stdio \
-kernel vmlinux \
-append 'console=ttyS0 ignore_loglevel'
Messages on serial console will easily tell OK kernel from unbootable
kernel. In bad case qemu hangs in an infinite loop stroboscoping
"SeaBIOS" message.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Juergen Gross <jgross@suse.com>
Message-ID: <20240802154253.482658-1-adobriyan@gmail.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
In order to minimize required special handling for running as Xen PV
dom0, the memory layout is modified to match that of the host. This
requires to have only RAM at the locations where Xen allocated memory
is living. Unfortunately there seem to be some machines, where ACPI
NVS is located at 64 MB, resulting in a conflict with the loaded
kernel or the initial page tables built by Xen.
Avoid this conflict by swapping the ACPI NVS area in the memory map
with unused RAM. This is possible via modification of the dom0 P2M map.
Accesses to the ACPI NVS area are done either for saving and restoring
it across suspend operations (this will work the same way as before),
or by ACPI code when NVS memory is referenced from other ACPI tables.
The latter case is handled by a Xen specific indirection of
acpi_os_ioremap().
While the E820 map can (and should) be modified right away, the P2M
map can be updated only after memory allocation is working, as the P2M
map might need to be extended.
Fixes: 808fdb71936c ("xen: check for kernel memory conflicting with memory layout")
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
When running as a Xen PV dom0 the system needs to map ACPI data of the
host using host physical addresses, while those addresses can conflict
with the guest physical addresses of the loaded linux kernel. The same
problem might apply in case a PV guest is configured to use the host
memory map.
This conflict can be solved by mapping the ACPI data to a different
guest physical address, but mapping the data via acpi_os_ioremap()
must still be possible using the host physical address, as this
address might be generated by AML when referencing some of the ACPI
data.
When configured to support running as a Xen PV domain, have an
implementation of acpi_os_ioremap() being aware of the possibility to
need above mentioned translation of a host physical address to the
guest physical address.
This modification requires to #include linux/acpi.h in some sources
which need to include asm/acpi.h directly.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Merge cpufreq updates for 6.12-rc1:
- Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef).
- Add support for Granite Rapids and Sierra Forest in OOB mode to the
intel_pstate cpufreq driver (Srinivas Pandruvada).
- Add basic support for CPU capacity scaling on x86 and make the
intel_pstate driver set asymmetric CPU capacity on hybrid systems
without SMT (Rafael Wysocki).
- Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
driver (Jeff Johnson).
- Several OF related cleanups in cpufreq drivers (Rob Herring).
- Enable COMPILE_TEST for ARM drivers (Rob Herrring).
- Introduce quirks for syscon failures and use socinfo to get revision
for TI cpufreq driver (Dhruva Gole, Nishanth Menon).
- Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
Ugwekar).
- Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
(Danila Tikhonov, Huacai Chen, and Liu Jing).
- Make amd-pstate validate return of any attempt to update EPP limits,
which fixes the masking hardware problems (Mario Limonciello).
- Move the calculation of the AMD boost numerator outside of amd-pstate,
correcting acpi-cpufreq on systems with preferred cores (Mario
Limonciello).
- Harden preferred core detection in amd-pstate to avoid potential
false positives (Mario Limonciello).
- Add extra unit test coverage for mode state machine (Mario
Limonciello).
- Fix an "Uninitialized variables" issue in amd-pstste (Qianqiang Liu).
* pm-cpufreq: (35 commits)
cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
cpufreq/amd-pstate-ut: Add test case for mode switches
cpufreq/amd-pstate: Export symbols for changing modes
amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
x86/amd: Move amd_get_highest_perf() out of amd-pstate
ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
ACPI: CPPC: Drop check for non zero perf ratio
x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
cpufreq/amd-pstate: Catch failures for amd_pstate_epp_update_limit()
cpufreq: ti-cpufreq: Use socinfo to get revision in AM62 family
cpufreq: Fix the cacography in powernv-cpufreq.c
cpufreq: ti-cpufreq: Introduce quirks to handle syscon fails appropriately
cpufreq: loongson3: Use raw_smp_processor_id() in do_service_request()
cpufreq: amd-pstate: add check for cpufreq_cpu_get's return value
...
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Merge the second round of amd-pstate changes for 6.12 from Mario
Limonciello:
"* Move the calculation of the AMD boost numerator outside of
amd-pstate, correcting acpi-cpufreq on systems with preferred cores
* Harden preferred core detection to avoid potential false positives
* Add extra unit test coverage for mode state machine"
* tag 'amd-pstate-v6.12-2024-09-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
cpufreq/amd-pstate-ut: Add test case for mode switches
cpufreq/amd-pstate: Export symbols for changing modes
amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
x86/amd: Move amd_get_highest_perf() out of amd-pstate
ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
ACPI: CPPC: Drop check for non zero perf ratio
x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
|