aboutsummaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2024-03-11Revert "ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512"Catalin Marinas1-2/+1
This reverts commit 0499a78369adacec1af29340b71ff8dd375b4697. Enabling CPUMASK_OFFSTACK on arm64 triggers a warning in the dev_pm_opp_set_config() function followed by a failure to set the regulators and cpufreq-dt probing error. There is no apparent reason why this happens, so revert this commit until further investigation. Signed-off-by: Catalin Marinas <[email protected]> Reported-by: Marek Szyprowski <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-11Merge tag 'linux_kselftest-next-6.9-rc1' of ↵Linus Torvalds2-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest update from Shuah Khan: - livepatch restructuring to move the module out of lib to be built as a out-of-tree modules during kselftest build. This makes it easier change, debug and rebuild the tests by running make on the selftests/livepatch directory, which is not currently possible since the modules on lib/livepatch are build and installed using the main makefile modules target. - livepatch restructuring fixes for problems found by kernel test robot. The change skips the test if kernel-devel isn't installed (default value of KDIR), or if KDIR variable passed doesn't exists. - resctrl test restructuring and new non-contiguous CBMs CAT test - new ktap_helpers to print diagnostic messages, pass/fail tests based on exit code, abort test, and finish the test. - a new test verify power supply properties. - a new ftrace to exercise function tracer across cpu hotplug. - timeout increase for mqueue test to allow the test to run on i3.metal AWS instances. - minor spelling corrections in several tests. - missing gitignore files and changes to existing gitignore files. * tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (57 commits) kselftest: Add basic test for probing the rust sample modules selftests: lib.mk: Do not process TEST_GEN_MODS_DIR selftests: livepatch: Avoid running the tests if kernel-devel is missing selftests: livepatch: Add initial .gitignore selftests/resctrl: Add non-contiguous CBMs CAT test selftests/resctrl: Add resource_info_file_exists() selftests/resctrl: Split validate_resctrl_feature_request() selftests/resctrl: Add a helper for the non-contiguous test selftests/resctrl: Add test groups and name L3 CAT test L3_CAT selftests: sched: Fix spelling mistake "hiearchy" -> "hierarchy" selftests/mqueue: Set timeout to 180 seconds selftests/ftrace: Add test to exercize function tracer across cpu hotplug selftest: ftrace: fix minor typo in log selftests: thermal: intel: workload_hint: add missing gitignore selftests: thermal: intel: power_floor: add missing gitignore selftests: uevent: add missing gitignore selftests: Add test to verify power supply properties selftests: ktap_helpers: Add a helper to finish the test selftests: ktap_helpers: Add a helper to abort the test selftests: ktap_helpers: Add helper to pass/fail test based on exit code ...
2024-03-11ARM: 9359/1: flush: check if the folio is reserved for no-mapping addressesYongqiang Liu1-0/+3
Since commit a4d5613c4dc6 ("arm: extend pfn_valid to take into account freed memory map alignment") changes the semantics of pfn_valid() to check presence of the memory map for a PFN. A valid page for an address which is reserved but not mapped by the kernel[1], the system crashed during some uio test with the following memory layout: node 0: [mem 0x00000000c0a00000-0x00000000cc8fffff] node 0: [mem 0x00000000d0000000-0x00000000da1fffff] the uio layout is:0xc0900000, 0x100000 the crash backtrace like: Unable to handle kernel paging request at virtual address bff00000 [...] CPU: 1 PID: 465 Comm: startapp.bin Tainted: G O 5.10.0 #1 Hardware name: Generic DT based system PC is at b15_flush_kern_dcache_area+0x24/0x3c LR is at __sync_icache_dcache+0x6c/0x98 [...] (b15_flush_kern_dcache_area) from (__sync_icache_dcache+0x6c/0x98) (__sync_icache_dcache) from (set_pte_at+0x28/0x54) (set_pte_at) from (remap_pfn_range+0x1a0/0x274) (remap_pfn_range) from (uio_mmap+0x184/0x1b8 [uio]) (uio_mmap [uio]) from (__mmap_region+0x264/0x5f4) (__mmap_region) from (__do_mmap_mm+0x3ec/0x440) (__do_mmap_mm) from (do_mmap+0x50/0x58) (do_mmap) from (vm_mmap_pgoff+0xfc/0x188) (vm_mmap_pgoff) from (ksys_mmap_pgoff+0xac/0xc4) (ksys_mmap_pgoff) from (ret_fast_syscall+0x0/0x5c) Code: e0801001 e2423001 e1c00003 f57ff04f (ee070f3e) ---[ end trace 09cf0734c3805d52 ]--- Kernel panic - not syncing: Fatal exception So check if PG_reserved was set to solve this issue. [1]: https://lore.kernel.org/lkml/[email protected]/ Fixes: a4d5613c4dc6 ("arm: extend pfn_valid to take into account freed memory map alignment") Suggested-by: Mike Rapoport <[email protected]> Signed-off-by: Yongqiang Liu <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2024-03-11mm: Introduce vmap_page_range() to map pages in PCI address spaceAlexei Starovoitov4-8/+8
ioremap_page_range() should be used for ranges within vmalloc range only. The vmalloc ranges are allocated by get_vm_area(). PCI has "resource" allocator that manages PCI_IOBASE, IO_SPACE_LIMIT address range, hence introduce vmap_page_range() to be used exclusively to map pages in PCI address space. Fixes: 3e49a866c9dc ("mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.") Reported-by: Miguel Ojeda <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: Miguel Ojeda <[email protected]> Link: https://lore.kernel.org/bpf/CANiq72ka4rir+RTN2FQoT=Vvprp_Ao-CvoYEkSNqtSY+RZj+AA@mail.gmail.com
2024-03-11ARM: 9354/1: ptrace: Use bitfield helpersGeert Uytterhoeven1-2/+3
The isa_mode() macro extracts two fields, and recombines them into a single value. Make this more obvious by using the FIELD_GET() helper, and shifting the result into its final resting place. Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2024-03-11Merge branch 'pm-cpuidle'Rafael J. Wysocki1-2/+2
Merge cpuidle updates for 6.9-rc1: - Prevent the haltpoll cpuidle governor from shrinking guest poll_limit_ns below grow_start (Parshuram Sangle). - Avoid potential overflow in integer multiplication when computing cpuidle state parameters (C Cheng). - Adjust MWAIT hint target C-state computation in the ACPI cpuidle driver and in intel_idle to return a correct value for C0 (He Rongguang). * pm-cpuidle: cpuidle: ACPI/intel: fix MWAIT hint target C-state computation cpuidle: Avoid potential overflow in integer multiplication cpuidle: haltpoll: do not shrink guest poll_limit_ns below grow_start
2024-03-11Merge tag 'kvm-x86-xen-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini11-119/+323
KVM Xen and pfncache changes for 6.9: - Rip out the half-baked support for using gfn_to_pfn caches to manage pages that are "mapped" into guests via physical addresses. - Add support for using gfn_to_pfn caches with only a host virtual address, i.e. to bypass the "gfn" stage of the cache. The primary use case is overlay pages, where the guest may change the gfn used to reference the overlay page, but the backing hva+pfn remains the same. - Add an ioctl() to allow mapping Xen's shared_info page using an hva instead of a gpa, so that userspace doesn't need to reconfigure and invalidate the cache/mapping if the guest changes the gpa (but userspace keeps the resolved hva the same). - When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation. - Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior). - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs. - Extend gfn_to_pfn_cache's mutex to cover (de)activation (in addition to refresh), and drop a now-redundant acquisition of xen_lock (that was protecting the shared_info cache) to fix a deadlock due to recursively acquiring xen_lock.
2024-03-11Merge tag 'kvm-x86-pmu-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini11-237/+267
KVM x86 PMU changes for 6.9: - Fix several bugs where KVM speciously prevents the guest from utilizing fixed counters and architectural event encodings based on whether or not guest CPUID reports support for the _architectural_ encoding. - Fix a variety of bugs in KVM's emulation of RDPMC, e.g. for "fast" reads, priority of VMX interception vs #GP, PMC types in architectural PMUs, etc. - Add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID, i.e. are difficult to validate via KVM-Unit-Tests. - Zero out PMU metadata on AMD if the virtual PMU is disabled to avoid wasting cycles, e.g. when checking if a PMC event needs to be synthesized when skipping an instruction. - Optimize triggering of emulated events, e.g. for "count instructions" events when skipping an instruction, which yields a ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest. - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit.
2024-03-11Merge tag 'kvm-x86-vmx-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini3-41/+34
KVM VMX changes for 6.9: - Fix a bug where KVM would report stale/bogus exit qualification information when exiting to userspace due to an unexpected VM-Exit while the CPU was vectoring an exception. - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support. - Clean up the logic for massaging the passthrough MSR bitmaps when userspace changes its MSR filter.
2024-03-11Merge tag 'kvm-x86-mmu-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini6-76/+203
KVM x86 MMU changes for 6.9: - Clean up code related to unprotecting shadow pages when retrying a guest instruction after failed #PF-induced emulation. - Zap TDP MMU roots at 4KiB granularity to minimize the delay in yielding if a reschedule is needed, e.g. if a high priority task needs to run. Because KVM doesn't support yielding in the middle of processing a zapped non-leaf SPTE, zapping at 1GiB granularity can result in multi-millisecond lag when attempting to schedule in a high priority. - Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for read, e.g. to avoid serializing vCPUs when userspace deletes a memslot. - Allocate write-tracking metadata on-demand to avoid the memory overhead when running kernels built with KVMGT support (external write-tracking enabled), but for workloads that don't use nested virtualization (shadow paging) or KVMGT.
2024-03-11Merge tag 'kvm-x86-misc-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini13-176/+163
KVM x86 misc changes for 6.9: - Explicitly initialize a variety of on-stack variables in the emulator that triggered KMSAN false positives (though in fairness in KMSAN, it's comically difficult to see that the uninitialized memory is never truly consumed). - Fix the deubgregs ABI for 32-bit KVM, and clean up code related to reading DR6 and DR7. - Rework the "force immediate exit" code so that vendor code ultimately decides how and when to force the exit. This allows VMX to further optimize handling preemption timer exits, and allows SVM to avoid sending a duplicate IPI (SVM also has a need to force an exit). - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if vCPU creation ultimately failed, and add WARN to guard against similar bugs. - Provide a dedicated arch hook for checking if a different vCPU was in-kernel (for directed yield), and simplify the logic for checking if the currently loaded vCPU is in-kernel. - Misc cleanups and fixes.
2024-03-11LoongArch: Add kernel livepatching supportJinyang He3-0/+46
The arch-specified function ftrace_regs_set_instruction_pointer() has been implemented in arch/loongarch/include/asm/ftrace.h, so here only implement arch_stack_walk_reliable() function. Here are the test logs: [root@linux fedora]# cat /proc/cmdline BOOT_IMAGE=/vmlinuz-6.8.0-rc2 root=/dev/sda3 [root@linux fedora]# modprobe livepatch-sample [root@linux fedora]# cat /proc/cmdline this has been live patched [root@linux fedora]# echo 0 > /sys/kernel/livepatch/livepatch_sample/enabled [root@linux fedora]# rmmod livepatch_sample [root@linux fedora]# cat /proc/cmdline BOOT_IMAGE=/vmlinuz-6.8.0-rc2 root=/dev/sda3 [root@linux fedora]# dmesg -t | tail -5 livepatch: enabling patch 'livepatch_sample' livepatch: 'livepatch_sample': starting patching transition livepatch: 'livepatch_sample': patching complete livepatch: 'livepatch_sample': starting unpatching transition livepatch: 'livepatch_sample': unpatching complete Signed-off-by: Jinyang He <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-03-11LoongArch: Add ORC stack unwinder supportTiezhu Yang35-40/+861
The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is similar in concept to a DWARF unwinder. The difference is that the format of the ORC data is much simpler than DWARF, which in turn allows the ORC unwinder to be much simpler and faster. The ORC data consists of unwind tables which are generated by objtool. After analyzing all the code paths of a .o file, it determines information about the stack state at each instruction address in the file and outputs that information to the .orc_unwind and .orc_unwind_ip sections. The per-object ORC sections are combined at link time and are sorted and post-processed at boot time. The unwinder uses the resulting data to correlate instruction addresses with their stack states at run time. Most of the logic are similar with x86, in order to get ra info before ra is saved into stack, add ra_reg and ra_offset into orc_entry. At the same time, modify some arch-specific code to silence the objtool warnings. Co-developed-by: Jinyang He <[email protected]> Signed-off-by: Jinyang He <[email protected]> Co-developed-by: Youling Tang <[email protected]> Signed-off-by: Youling Tang <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2024-03-11Merge tag 'kvm-riscv-6.9-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini3-0/+19
KVM/riscv changes for 6.9 - Exception and interrupt handling for selftests - Sstc (aka arch_timer) selftest - Forward seed CSR access to KVM userspace - Ztso extension support for Guest/VM - Zacas extension support for Guest/VM
2024-03-11Merge back cpufreq material for 6.9-rc1.Rafael J. Wysocki1-2/+3
2024-03-11Merge tag 'kvmarm-6.9' of ↵Paolo Bonzini46-329/+1346
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.9 - Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it - Conversion of KVM's representation of LPIs to an xarray, utilized to address serialization some of the serialization on the LPI injection path - Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register - Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests
2024-03-11Merge tag 'loongarch-kvm-6.9' of ↵Paolo Bonzini164-647/+976
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.9 * Set reserved bits as zero in CPUCFG. * Start SW timer only when vcpu is blocking. * Do not restart SW timer when it is expired. * Remove unnecessary CSR register saving during enter guest.
2024-03-11mips: cm: Convert __mips_cm_phys_base() to weak functionSerge Semin2-8/+4
Based on the design pattern utilized in the CM GCR base address getter implementation, the platform-specific code is capable to re-define the getter and re-use the weakly defined initial version. But since the pattern hasn't been used for over 10 years and another similar case (CM L2-sync only base address getter) has just been fixed, let's unify the interface and convert it to a more traditional single weakly defined method: mips_cm_phys_base() (see the link below for the discussion around this). Link: https://lore.kernel.org/linux-mips/[email protected] Signed-off-by: Serge Semin <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2024-03-11mips: cm: Convert __mips_cm_l2sync_phys_base() to weak functionSerge Semin2-4/+14
The __mips_cm_l2sync_phys_base() and mips_cm_l2sync_phys_base() couple was introduced in commit 9f98f3dd0c51 ("MIPS: Add generic CM probe & access code") where the former method was a weak implementation of the later function. Such design pattern permitted to re-define the original method and to use the weak implementation in the new function. A similar approach was introduced in the framework of another arch-specific programmable interface: mips_cm_phys_base() and __mips_cm_phys_base(). The only difference is that the underscored method of the later couple was declared in the "asm/mips-cm.h" header file, but it wasn't done for the CM L2-sync methods in the subject. Due to the missing global function declaration the "missing prototype" warning was spotted in the framework of the commit 9a2036724cd6 ("mips: mark local function static if possible") and fixed just be re-qualifying the weak method as static. Doing that broke what was originally implied by having the weak implementation globally defined. Let's fix the broken CM2 L2-sync arch-interface by dropping the static qualifier and, seeing the implemented pattern hasn't been used for over 10 years but will be required soon (see the link for the discussion around it), converting it to a single weakly defined method: mips_cm_l2sync_phys_base(). Fixes: 9a2036724cd6 ("mips: mark local function static if possible") Link: https://lore.kernel.org/linux-mips/[email protected] Signed-off-by: Serge Semin <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2024-03-11mips: dts: ralink: mt7621: add cell count properties to usbJustin Swartz1-0/+3
Add default #address-cells and #size-cells properties to the usb node, which should be suitable for hubs and devices without explicitly declared interface nodes, as: "#address-cells": description: should be 1 for hub nodes with device nodes, should be 2 for device nodes with interface nodes. enum: [1, 2] "#size-cells": const: 0 -- from Documentation/devicetree/bindings/usb/usb-device.yaml Acked-by: Sergio Paracuellos <[email protected]> Signed-off-by: Justin Swartz <[email protected]> Reviewed-by: Arınç ÜNAL <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2024-03-11mips: dts: ralink: mt7621: add serial1 and serial2 nodesJustin Swartz1-0/+40
Add serial1 and serial2 nodes to define the existence of the MT7621's second and third UARTs. Acked-by: Sergio Paracuellos <[email protected]> Signed-off-by: Justin Swartz <[email protected]> Reviewed-by: Arınç ÜNAL <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2024-03-11mips: dts: ralink: mt7621: reorder serial0 propertiesJustin Swartz1-2/+3
Reorder serial0 properties according to the guidelines laid out in Documentation/devicetree/bindings/dts-coding-style.rst Acked-by: Sergio Paracuellos <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Signed-off-by: Justin Swartz <[email protected]> Reviewed-by: Arınç ÜNAL <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2024-03-11mips: dts: ralink: mt7621: associate uart1_pins with serial0Justin Swartz1-0/+3
Add missing pinctrl-name and pinctrl-0 properties to declare that the uart1_pins group is associated with serial0. Acked-by: Sergio Paracuellos <[email protected]> Signed-off-by: Justin Swartz <[email protected]> Reviewed-by: Arınç ÜNAL <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2024-03-11MIPS: ralink: Don't use "proxy" headersAndy Shevchenko1-2/+8
Update header inclusions to follow IWYU (Include What You Use) principle. Fixes: 5804be061848 ("MIPS: ralink: Remove unused of_gpio.h") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Sergio Paracuellos <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2024-03-11Merge tag 'riscv-dt-fixes-for-v6.8-final' of ↵Arnd Bergmann9-38/+22
https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/dt RISC-V Devicetree fixes for v6.8-final Starfive: The previous cleanup broke boot on the jh7100 as the driver depended on the fallback clock name created based on the node-name when clock-output-names is not present. Add clock-output-names to restore working order. Generic: BUILTIN_DTB has been broken for ages on any platform other than the nommu Canaan k210 SoC as the first dtb built (in alphanumerical order), would get built into the image. This didn't get fixed for ages because nobody actually cared about running it other than the k210 enough to fix it. The folks doing Sophgo SG2042 development have come along and fixed it, as they want to use builtin dtbs. linux-boot on that platform reuses the dtb it was provided by OpenSBI when booting linux proper, which is unfortunately not possible to boot a mainline kernel with. Signed-off-by: Conor Dooley <[email protected]> * tag 'riscv-dt-fixes-for-v6.8-final' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: riscv: dts: Move BUILTIN_DTB_SOURCE to common Kconfig riscv: dts: starfive: jh7100: fix root clock names Link: https://lore.kernel.org/r/20240306-waltz-facial-9e4e1b792053@spud Signed-off-by: Arnd Bergmann <[email protected]>
2024-03-11Merge tag 'arm-soc/for-6.9/soc' of https://github.com/Broadcom/stblinux into ↵Arnd Bergmann2-4/+5
soc/late This pull request contains Broadcom ARM-based SoCs updates for 6.9, please pull the following: - Arnd removes a select CONFIG_TICK_ONESHOT done by the ARCH_BCM_MOBILE architecture which could cause compilation failures - Florian adds a debug UART entry for BCM74165 * tag 'arm-soc/for-6.9/soc' of https://github.com/Broadcom/stblinux: ARM: bcm: stop selecing CONFIG_TICK_ONESHOT ARM: brcmstb: Add debug UART entry for 74165 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2024-03-11Merge tag 'arm-soc/for-6.9/devicetree-arm64' of ↵Arnd Bergmann2-9/+7
https://github.com/Broadcom/stblinux into soc/late This pull request contains Broadcom ARM64-based SoCs changes for 6.9, please pull the following: - Rafal defines a proper NVMEM layout for the Asus GT-AC5300 router and removes some invalid Device Tree properties pertaining to the Ethernet switch on bcm4908 * tag 'arm-soc/for-6.9/devicetree-arm64' of https://github.com/Broadcom/stblinux: arm64: dts: broadcom: bcmbca: bcm4908: drop invalid switch cells arm64: dts: broadcom: bcmbca: bcm4908: use NVMEM layout for Asus GT-AC5300 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2024-03-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-13/+71
Pull kvm fixes from Paolo Bonzini: "KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not writable from userspace, so there would be no way to write to a read-only guest_memfd). - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely for development and testing. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD dirty logging test that caused false passes. x86 fixes: - Fix missing marking of a guest page as dirty when emulating an atomic access. - Check for mmu_notifier invalidation events before faulting in the pfn, and before acquiring mmu_lock, to avoid unnecessary work and lock contention with preemptible kernels (including CONFIG_PREEMPT_DYNAMIC in non-preemptible mode). - Disable AMD DebugSwap by default, it breaks VMSA signing and will be re-enabled with a better VM creation API in 6.10. - Do the cache flush of converted pages in svm_register_enc_region() before dropping kvm->lock, to avoid a race with unregistering of the same region and the consequent use-after-free issue" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: SEV: disable SEV-ES DebugSwap by default KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region() KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-03-10openrisc: Use asm-generic's version of fix_to_virt() & virt_to_fix()Dawei Li1-30/+1
Openrisc's implementation of fix_to_virt() & virt_to_fix() share same functionality with ones of asm generic. Plus, generic version of fix_to_virt() can trap invalid index at compile time. Thus, Replace the arch-specific implementations with asm generic's ones. Signed-off-by: Dawei Li <[email protected]> Signed-off-by: Stafford Horne <[email protected]>
2024-03-10openrisc: Call setup_memory() earlier in the init sequenceOreoluwa Babatunde1-3/+3
The unflatten_and_copy_device_tree() function contains a call to memblock_alloc(). This means that memblock is allocating memory before any of the reserved memory regions are set aside in the setup_memory() function which calls early_init_fdt_scan_reserved_mem(). Therefore, there is a possibility for memblock to allocate from any of the reserved memory regions. Hence, move the call to setup_memory() to be earlier in the init sequence so that the reserved memory regions are set aside before any allocations are done using memblock. Signed-off-by: Oreoluwa Babatunde <[email protected]> Signed-off-by: Stafford Horne <[email protected]>
2024-03-09Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of ↵Paolo Bonzini52-156/+246
https://github.com/kvm-x86/linux into HEAD KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating ABI that KVM can't sanely support. - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely a development and testing vehicle, and come with zero guarantees. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
2024-03-09SEV: disable SEV-ES DebugSwap by defaultPaolo Bonzini1-2/+5
The DebugSwap feature of SEV-ES provides a way for confidential guests to use data breakpoints. However, because the status of the DebugSwap feature is recorded in the VMSA, enabling it by default invalidates the attestation signatures. In 6.10 we will introduce a new API to create SEV VMs that will allow enabling DebugSwap based on what the user tells KVM to do. Contextually, we will change the legacy KVM_SEV_ES_INIT API to never enable DebugSwap. For compatibility with kernels that pre-date the introduction of DebugSwap, as well as with those where KVM_SEV_ES_INIT will never enable it, do not enable the feature by default. If anybody wants to use it, for now they can enable the sev_es_debug_swap_enabled module parameter, but this will result in a warning. Fixes: d1f85fbe836e ("KVM: SEV: Enable data breakpoints in SEV-ES") Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2024-03-09Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of ↵Paolo Bonzini2-4/+5
https://github.com/kvm-x86/linux into HEAD KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating ABI that KVM can't sanely support. - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely a development and testing vehicle, and come with zero guarantees. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
2024-03-09Merge tag 'kvm-x86-fixes-6.8-2' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini2-0/+52
KVM x86 fixes for 6.8, round 2: - When emulating an atomic access, mark the gfn as dirty in the memslot to fix a bug where KVM could fail to mark the slot as dirty during live migration, ultimately resulting in guest data corruption due to a dirty page not being re-copied from the source to the target. - Check for mmu_notifier invalidation events before faulting in the pfn, and before acquiring mmu_lock, to avoid unnecessary work and lock contention. Contending mmu_lock is especially problematic on preemptible kernels, as KVM may yield mmu_lock in response to the contention, which severely degrades overall performance due to vCPUs making it difficult for the task that triggered invalidation to make forward progress. Note, due to another kernel bug, this fix isn't limited to preemtible kernels, as any kernel built with CONFIG_PREEMPT_DYNAMIC=y will yield contended rwlocks and spinlocks. https://lore.kernel.org/all/[email protected]
2024-03-09arm64, bpf: Use bpf_prog_pack for arm64 bpf trampolinePuranjay Mohan1-9/+46
We used bpf_prog_pack to aggregate bpf programs into huge page to relieve the iTLB pressure on the system. This was merged for ARM64[1] We can apply it to bpf trampoline as well. This would increase the preformance of fentry and struct_ops programs. [1] https://lore.kernel.org/bpf/[email protected]/ Signed-off-by: Puranjay Mohan <[email protected]> Reviewed-by: Pu Lehui <[email protected]> Message-ID: <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2024-03-09x86/efistub: Remap kernel text read-only before dropping NX attributeArd Biesheuvel3-1/+3
Currently, the EFI stub invokes the EFI memory attributes protocol to strip any NX restrictions from the entire loaded kernel, resulting in all code and data being mapped read-write-execute. The point of the EFI memory attributes protocol is to remove the need for all memory allocations to be mapped with both write and execute permissions by default, and make it the OS loader's responsibility to transition data mappings to code mappings where appropriate. Even though the UEFI specification does not appear to leave room for denying memory attribute changes based on security policy, let's be cautious and avoid relying on the ability to create read-write-execute mappings. This is trivially achievable, given that the amount of kernel code executing via the firmware's 1:1 mapping is rather small and limited to the .head.text region. So let's drop the NX restrictions only on that subregion, but not before remapping it as read-only first. Signed-off-by: Ard Biesheuvel <[email protected]>
2024-03-08x86/hyperv: Use per cpu initial stack for vtl contextSaurabh Sengar1-4/+15
Currently, the secondary CPUs in Hyper-V VTL context lack support for parallel startup. Therefore, relying on the single initial_stack fetched from the current task structure suffices for all vCPUs. However, common initial_stack risks stack corruption when parallel startup is enabled. In order to facilitate parallel startup, use the initial_stack from the per CPU idle thread instead of the current task. Fixes: 3be1bc2fe9d2 ("x86/hyperv: VTL support for Hyper-V") Signed-off-by: Saurabh Sengar <[email protected]> Reviewed-by: Michael Kelley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]> Message-ID: <[email protected]>
2024-03-08sparc32: Fix section mismatch in leon_pci_grpciSam Ravnborg2-2/+2
Passing a datastructre marked _initconst to platform_driver_register() is wrong. Drop the __initconst notation. This fixes the following warnings: WARNING: modpost: vmlinux: section mismatch in reference: grpci1_of_driver+0x30 (section: .data) -> grpci1_of_match (section: .init.rodata) WARNING: modpost: vmlinux: section mismatch in reference: grpci2_of_driver+0x30 (section: .data) -> grpci2_of_match (section: .init.rodata) Signed-off-by: Sam Ravnborg <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Fixes: 4154bb821f0b ("sparc: leon: grpci1: constify of_device_id") Fixes: 03949b1cb9f1 ("sparc: leon: grpci2: constify of_device_id") Tested-by: Randy Dunlap <[email protected]> # build-tested Reviewed-by: Andreas Larsson <[email protected]> Tested-by: Andreas Larsson <[email protected]> Signed-off-by: Andreas Larsson <[email protected]> Link: https://lore.kernel.org/r/20240224-sam-fix-sparc32-all-builds-v2-7-1f186603c5c4@ravnborg.org
2024-03-08sparc32: Fix parport build with sparc32Sam Ravnborg2-252/+263
include/asm/parport.h is sparc64 specific. Rename it to parport_64.h and use the generic version for sparc32. This fixed all{mod,yes}config build errors like: parport_pc.c:(.text):undefined-reference-to-ebus_dma_enable parport_pc.c:(.text):undefined-reference-to-ebus_dma_irq_enable parport_pc.c:(.text):undefined-reference-to-ebus_dma_register The errors occur as the sparc32 build references sparc64 symbols. Signed-off-by: Sam Ravnborg <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Maciej W. Rozycki <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Fixes: 66bcd06099bb ("parport_pc: Also enable driver for PCI systems") Cc: [email protected] # v5.18+ Tested-by: Randy Dunlap <[email protected]> # build-tested Reviewed-by: Andreas Larsson <[email protected]> Signed-off-by: Andreas Larsson <[email protected]> Link: https://lore.kernel.org/r/20240224-sam-fix-sparc32-all-builds-v2-6-1f186603c5c4@ravnborg.org
2024-03-08sparc32: Do not select GENERIC_ISA_DMASam Ravnborg1-4/+0
sparc32 do not support generic isa dma, so do not select the symbol. This fixes the following warnings: dma.c:70:5: error: no previous prototype for 'request_dma' [-Werror=missing-prototypes] dma.c:88:6: error: no previous prototype for 'free_dma' [-Werror=missing-prototypes] Signed-off-by: Sam Ravnborg <[email protected]> Fixes: 0fcb70851fbf ("Makefile.extrawarn: turn on missing-prototypes globally") Acked-by: Randy Dunlap <[email protected]> Tested-by: Randy Dunlap <[email protected]> # build-tested Cc: Andreas Larsson <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Maciej W. Rozycki <[email protected]> Reviewed-by: Andreas Larsson <[email protected]> Signed-off-by: Andreas Larsson <[email protected]> Link: https://lore.kernel.org/r/20240224-sam-fix-sparc32-all-builds-v2-5-1f186603c5c4@ravnborg.org
2024-03-08sparc32: Fix build with trapbaseSam Ravnborg5-14/+14
Fix the following build errors: irq_32.c:258:7: error: array subscript [16, 79] is outside array bounds of 'struct tt_entry[1] irq_32.c:271:14: error: assignment to 'struct tt_entry *' from incompatible pointer type 'struct tt_entry (*)[] trapbase is a pointer to an array of tt_entry, but the code declared it as a pointer so the compiler see a single entry and not an array. Fix this by modifyinf the declaration to be an array, and modify all users to take the address of the first member. Signed-off-by: Sam Ravnborg <[email protected]> Acked-by: Randy Dunlap <[email protected]> Tested-by: Randy Dunlap <[email protected]> # build-tested Cc: Andreas Larsson <[email protected]> Cc: "David S. Miller" <[email protected]> Reviewed-by: Andreas Larsson <[email protected]> Tested-by: Andreas Larsson <[email protected]> Signed-off-by: Andreas Larsson <[email protected]> Link: https://lore.kernel.org/r/20240224-sam-fix-sparc32-all-builds-v2-2-1f186603c5c4@ravnborg.org
2024-03-08sparc32: Use generic cmpdi2/ucmpdi2 variantsSam Ravnborg4-50/+4
Use the generic variants - the implementation is the same. As a nice side-effect fix the following warnings: cmpdi2.c: warning: no previous prototype for '__cmpdi2' [-Wmissing-prototypes] ucmpdi2.c: warning: no previous prototype for '__ucmpdi2' [-Wmissing-prototypes] Signed-off-by: Sam Ravnborg <[email protected]> Fixes: 0fcb70851fbf ("Makefile.extrawarn: turn on missing-prototypes globally") Reviewed-by: Randy Dunlap <[email protected]> Tested-by: Randy Dunlap <[email protected]> # build-tested Reviewed-by: Maciej W. Rozycki <[email protected]> Tested-by: Maciej W. Rozycki <[email protected]> # build-tested Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Reviewed-by: Andreas Larsson <[email protected]> Tested-by: Andreas Larsson <[email protected]> Signed-off-by: Andreas Larsson <[email protected]> Link: https://lore.kernel.org/r/20240224-sam-fix-sparc32-all-builds-v2-1-1f186603c5c4@ravnborg.org
2024-03-08x86/of: Unconditionally call unflatten_and_copy_device_tree()Stephen Boyd1-12/+14
Call this function unconditionally so that we can populate an empty DTB on platforms that don't boot with a firmware provided or builtin DTB. There's no harm in calling unflatten_device_tree() unconditionally here. If there isn't a non-NULL 'initial_boot_params' pointer then unflatten_device_tree() returns early. Cc: Rob Herring <[email protected]> Cc: Frank Rowand <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: [email protected] Cc: H. Peter Anvin <[email protected]> Tested-by: Saurabh Sengar <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2024-03-08um: Unconditionally call unflatten_device_tree()Stephen Boyd1-8/+8
Call this function unconditionally so that we can populate an empty DTB on platforms that don't boot with a command line provided DTB. There's no harm in calling unflatten_device_tree() unconditionally. If there isn't a valid initial_boot_params dtb then unflatten_device_tree() returns early. Cc: Rob Herring <[email protected]> Cc: Frank Rowand <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Johannes Berg <[email protected]> Cc: [email protected] Signed-off-by: Stephen Boyd <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2024-03-08x86/sev: Disable KMSAN for memory encryption TUsChangbin Du2-0/+2
Instrumenting sev.c and mem_encrypt_identity.c with KMSAN will result in a triple-faulting kernel. Some of the code is invoked too early during boot, before KMSAN is ready. Disable KMSAN instrumentation for the two translation units. [ bp: Massage commit message. ] Signed-off-by: Changbin Du <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-07platform: goldfish: move the separate 'default' propery for CONFIG_GOLDFISHMasahiro Yamada1-4/+0
Currently, there are two entries for CONFIG_GOLDFISH. In arch/x86/Kconfig: config GOLDFISH def_bool y depends on X86_GOLDFISH In drivers/platform/goldfish/Kconfig: menuconfig GOLDFISH bool "Platform support for Goldfish virtual devices" depends on HAS_IOMEM && HAS_DMA While Kconfig allows multiple entries, it generally leads to tricky code. Prior to commit bd2f348db503 ("goldfish: refactor goldfish platform configs"), CONFIG_GOLDFISH was an alias of CONFIG_X86_GOLDFISH. After the mentioned commit added the second entry with a user prompt, the former provides the 'default' property that is effective only when X86_GOLDFISH=y. Merge them tegether to clarify how it has worked in the past 8 years. Cc: Greg Hackmann <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-03-07Merge branch 'for-next/stage1-lpa2' into for-next/coreCatalin Marinas53-1124/+1931
* for-next/stage1-lpa2: (48 commits) : Add support for LPA2 and WXN and stage 1 arm64/mm: Avoid ID mapping of kpti flag if it is no longer needed arm64/mm: Use generic __pud_free() helper in pud_free() implementation arm64: gitignore: ignore relacheck arm64: Use Signed/Unsigned enums for TGRAN{4,16,64} and VARange arm64: mm: Make PUD folding check in set_pud() a runtime check arm64: mm: add support for WXN memory translation attribute mm: add arch hook to validate mmap() prot flags arm64: defconfig: Enable LPA2 support arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels arm64: ptdump: Deal with translation levels folded at runtime arm64: ptdump: Disregard unaddressable VA space arm64: mm: Add support for folding PUDs at runtime arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging arm64: mm: Add 5 level paging support to fixmap and swapper handling arm64: Enable LPA2 at boot if supported by the system arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion arm64: mm: Add definitions to support 5 levels of paging arm64: mm: Add LPA2 support to phys<->pte conversion routines arm64: mm: Wire up TCR.DS bit to PTE shareability fields ...
2024-03-07Merge branches 'for-next/reorg-va-space', 'for-next/rust-for-arm64', ↵Catalin Marinas42-220/+466
'for-next/misc', 'for-next/daif-cleanup', 'for-next/kselftest', 'for-next/documentation', 'for-next/sysreg' and 'for-next/dpisa', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: (39 commits) docs: perf: Fix build warning of hisi-pcie-pmu.rst perf: starfive: Only allow COMPILE_TEST for 64-bit architectures MAINTAINERS: Add entry for StarFive StarLink PMU docs: perf: Add description for StarFive's StarLink PMU dt-bindings: perf: starfive: Add JH8100 StarLink PMU perf: starfive: Add StarLink PMU support docs: perf: Update usage for target filter of hisi-pcie-pmu drivers/perf: hisi_pcie: Merge find_related_event() and get_event_idx() drivers/perf: hisi_pcie: Relax the check on related events drivers/perf: hisi_pcie: Check the target filter properly drivers/perf: hisi_pcie: Add more events for counting TLP bandwidth drivers/perf: hisi_pcie: Fix incorrect counting under metric mode drivers/perf: hisi_pcie: Introduce hisi_pcie_pmu_get_event_ctrl_val() drivers/perf: hisi_pcie: Rename hisi_pcie_pmu_{config,clear}_filter() drivers/perf: hisi: Enable HiSilicon Erratum 162700402 quirk for HIP09 perf/arm_cspmu: Add devicetree support dt-bindings/perf: Add Arm CoreSight PMU perf/arm_cspmu: Simplify counter reset perf/arm_cspmu: Simplify attribute groups perf/arm_cspmu: Simplify initialisation ... * for-next/reorg-va-space: : Reorganise the arm64 kernel VA space in preparation for LPA2 support : (52-bit VA/PA). arm64: kaslr: Adjust randomization range dynamically arm64: mm: Reclaim unused vmemmap region for vmalloc use arm64: vmemmap: Avoid base2 order of struct page size to dimension region arm64: ptdump: Discover start of vmemmap region at runtime arm64: ptdump: Allow all region boundaries to be defined at boot time arm64: mm: Move fixmap region above vmemmap region arm64: mm: Move PCI I/O emulation region above the vmemmap region * for-next/rust-for-arm64: : Enable Rust support for arm64 arm64: rust: Enable Rust support for AArch64 rust: Refactor the build target to allow the use of builtin targets * for-next/misc: : Miscellaneous arm64 patches ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 arm64: Remove enable_daif macro arm64/hw_breakpoint: Directly use ESR_ELx_WNR for an watchpoint exception arm64: cpufeatures: Clean up temporary variable to simplify code arm64: Update setup_arch() comment on interrupt masking arm64: remove unnecessary ifdefs around is_compat_task() arm64: ftrace: Don't forbid CALL_OPS+CC_OPTIMIZE_FOR_SIZE with Clang arm64/sme: Ensure that all fields in SMCR_EL1 are set to known values arm64/sve: Ensure that all fields in ZCR_EL1 are set to known values arm64/sve: Document that __SVE_VQ_MAX is much larger than needed arm64: make member of struct pt_regs and it's offset macro in the same order arm64: remove unneeded BUILD_BUG_ON assertion arm64: kretprobes: acquire the regs via a BRK exception arm64: io: permit offset addressing arm64: errata: Don't enable workarounds for "rare" errata by default * for-next/daif-cleanup: : Clean up DAIF handling for EL0 returns arm64: Unmask Debug + SError in do_notify_resume() arm64: Move do_notify_resume() to entry-common.c arm64: Simplify do_notify_resume() DAIF masking * for-next/kselftest: : Miscellaneous arm64 kselftest patches kselftest/arm64: Test that ptrace takes effect in the target process * for-next/documentation: : arm64 documentation patches arm64/sme: Remove spurious 'is' in SME documentation arm64/fp: Clarify effect of setting an unsupported system VL arm64/sme: Fix cut'n'paste in ABI document arm64/sve: Remove bitrotted comment about syscall behaviour * for-next/sysreg: : sysreg updates arm64/sysreg: Update ID_AA64DFR0_EL1 register arm64/sysreg: Update ID_DFR0_EL1 register fields arm64/sysreg: Add register fields for ID_AA64DFR1_EL1 * for-next/dpisa: : Support for 2023 dpISA extensions kselftest/arm64: Add 2023 DPISA hwcap test coverage kselftest/arm64: Add basic FPMR test kselftest/arm64: Handle FPMR context in generic signal frame parser arm64/hwcap: Define hwcaps for 2023 DPISA features arm64/ptrace: Expose FPMR via ptrace arm64/signal: Add FPMR signal handling arm64/fpsimd: Support FEAT_FPMR arm64/fpsimd: Enable host kernel access to FPMR arm64/cpufeature: Hook new identification registers up to cpufeature
2024-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski38-298/+437
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-07ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512Christoph Lameter (Ampere)1-1/+2
Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere Computing) are planning to ship systems with 512 CPUs. So that all CPUs on these systems can be used with defconfig, we'd like to bump NR_CPUS to 512. Therefore this patch increases the default NR_CPUS from 256 to 512. As increasing NR_CPUS will increase the size of cpumasks, there's a fear that this might have a significant impact on stack usage due to code which places cpumasks on the stack. To mitigate that concern, we can select CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with NR_CPUS=256, we only select this when NR_CPUS > 256. CPUMASK_OFFSTACK configures the cpumasks in the kernel to be dynamically allocated. This was used in the X86 architecture in the past to enable support for larger CPU configurations up to 8k cpus. With that is becomes possible to dynamically size the allocation of the cpu bitmaps depending on the quantity of processors detected on bootup. Memory used for cpumasks will increase if the kernel is run on a machine with more cores. Further increases may be needed if ARM processor vendors start supporting more processors. Given the current inflationary trends in core counts from multiple processor manufacturers this may occur. There are minor regressions for hackbench. The kernel data size for 512 cpus is smaller with offstack than with onstack. Benchmark results using hackbench average over 10 runs of hackbench -s 512 -l 2000 -g 15 -f 25 -P on Altra 80 Core Support for 256 CPUs on stack. Baseline 7.8564 sec Support for 512 CUs on stack. 7.8713 sec + 0.18% 512 CPUS offstack 7.8916 sec + 0.44% Kernel size comparison: text data filename Difference to onstack256 baseline 25755648 9589248 vmlinuz-6.8.0-rc4-onstack256 25755648 9607680 vmlinuz-6.8.0-rc4-onstack512 +0.19% 25755648 9603584 vmlinuz-6.8.0-rc4-offstack512 +0.14% Tested-by: Eric Mackay <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Signed-off-by: Christoph Lameter (Ampere) <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] [[email protected]: use 'select' instead of duplicating 'config CPUMASK_OFFSTACK'] Signed-off-by: Catalin Marinas <[email protected]>