aboutsummaryrefslogtreecommitdiff
path: root/arch/s390/kernel
AgeCommit message (Collapse)AuthorFilesLines
2024-05-14s390/ipl: Introduce macros for (re)ipl sysfs attribute 'scp_data'Alexander Egorenkov1-163/+83
This is a refactoring change to reduce code duplication and improve code reuse. Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Egorenkov <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/ipl: Fix incorrect initialization of nvme dump blockAlexander Egorenkov1-3/+3
Initialize the correct fields of the nvme dump block. This bug had not been detected before because first, the fcp and nvme fields of struct ipl_parameter_block are part of the same union and, therefore, overlap in memory and second, they are identical in structure and size. Fixes: d70e38cb1dee ("s390: nvme dump support") Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Egorenkov <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/ipl: Fix incorrect initialization of len fields in nvme reipl blockAlexander Egorenkov1-2/+2
Use correct symbolic constants IPL_BP_NVME_LEN and IPL_BP0_NVME_LEN to initialize nvme reipl block when 'scp_data' sysfs attribute is being updated. This bug had not been detected before because the corresponding fcp and nvme symbolic constants are equal. Fixes: 23a457b8d57d ("s390: nvme reipl") Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Egorenkov <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/ipl: Do not accept z/VM CP diag X'008' cmds longer than max lengthAlexander Egorenkov1-1/+5
The old implementation of vmcmd sysfs string attributes truncated passed z/VM CP diagnose X'008' commands which were longer than the max allowed number of characters but the reported number of written characters was still equal to the entire length of a given string. This can result in silent failures of some s390-tools (e.g. dumpconf) which can be very hard to detect. Therefore, this commit makes a write attempt to a vmcmd sysfs attribute * fail with E2BIG error if a given string is longer than the maximum allowed one * never destroy the old data in the vmcmd sysfs attribute if the new data doesn't fit into it entirely * return the actual number of written characters if it succeeds Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Egorenkov <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/ipl: Fix size of vmcmd buffers for sending z/VM CP diag X'008' cmdsAlexander Egorenkov1-13/+15
z/VM CP diagnose X'008' accepts commands of max 240 characters. Using a smaller value as a buffer size makes kernel send truncated CP commands which are longer than the old buffer size. This can result in invalid CP commands passed to z/VM. Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Egorenkov <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/alternatives: Convert runtime sanity check into compile time checkHeiko Carstens1-7/+0
__apply_alternatives() contains a runtime check which verifies that the size of the to be patched code area is even. Convert this to a compile time check using a similar ".org" trick, which is already used to verify that old and new code areas have the same size. Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/irq: Set CIF_NOHZ_DELAY in do_io_irq()Sven Schnelle1-0/+1
Both do_airq_interrupt() and do_io_interrupt() set CIF_NOHZ_DELAY. Move it to do_io_irq() to simplify the code. Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14Makefile: remove redundant tool coverage variablesMasahiro Yamada2-16/+0
Now Kbuild provides reasonable defaults for objtool, sanitizers, and profilers. Remove redundant variables. Note: This commit changes the coverage for some objects: - include arch/mips/vdso/vdso-image.o into UBSAN, GCOV, KCOV - include arch/sparc/vdso/vdso-image-*.o into UBSAN - include arch/sparc/vdso/vma.o into UBSAN - include arch/x86/entry/vdso/extable.o into KASAN, KCSAN, UBSAN, GCOV, KCOV - include arch/x86/entry/vdso/vdso-image-*.o into KASAN, KCSAN, UBSAN, GCOV, KCOV - include arch/x86/entry/vdso/vdso32-setup.o into KASAN, KCSAN, UBSAN, GCOV, KCOV - include arch/x86/entry/vdso/vma.o into GCOV, KCOV - include arch/x86/um/vdso/vma.o into KASAN, GCOV, KCOV I believe these are positive effects because all of them are kernel space objects. Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Kees Cook <[email protected]> Tested-by: Roberto Sassu <[email protected]>
2024-05-14s390/fpu: Remove comment about TIF_FPUThomas Huth1-5/+0
It has been removed in commit 2c6b96762fbd ("s390/fpu: remove TIF_FPU"), so we should not mention TIF_FPU in the comment here anymore. Since the remaining parts of the comment just document the obvious fact that save_user_fpu_regs() saves the FPU state, simply remove the comment now completely. Signed-off-by: Thomas Huth <[email protected]> Acked-by: Heiko Carstens <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/vtime: Use get_cpu_timer()Sven Schnelle1-9/+1
Instead of implementing get_vtimer() use get_cpu_timer() which does the same. Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/idle: Rewrite psw_idle() in CSven Schnelle3-32/+7
To ease maintenance and further enhancements, convert the psw_idle() function to C. Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/stackstrace: Detect vdso stack framesHeiko Carstens4-8/+35
Clear the backchain of the extra stack frame added by the vdso user wrapper code. This allows the user stack walker to detect and skip the non-standard stack frame. Without this an incorrect instruction pointer would be added to stack traces, and stack frame walking would be continued with a more or less random back chain. Fixes: aa44433ac4ee ("s390: add USER_STACKTRACE support") Reviewed-by: Jens Remus <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/vdso: Introduce and use struct stack_frame_vdso_wrapperHeiko Carstens2-10/+12
Introduce and use struct stack_frame_vdso_wrapper within vdso user wrapper code. With this structure it is possible to automatically generate an asm-offset define which can be used to save and restore the return address of the calling function. Also use STACK_FRAME_USER_OVERHEAD instead of STACK_FRAME_OVERHEAD to document that the code works with user space stack frames with the standard stack frame layout. Fixes: aa44433ac4ee ("s390: add USER_STACKTRACE support") Reviewed-by: Jens Remus <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/stacktrace: Improve detection of invalid instruction pointersHeiko Carstens1-4/+22
Add basic checks to identify invalid instruction pointers when walking stack frames: Instruction pointers must - have even addresses - be larger than mmap_min_addr - lower than the asce_limit of the process Alternatively it would also be possible to walk page tables similar to fast GUP and verify that the mapping of the corresponding page is executable, however that seems to be overkill. Fixes: aa44433ac4ee ("s390: add USER_STACKTRACE support") Reviewed-by: Jens Remus <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/stacktrace: Skip first user stack frameHeiko Carstens1-4/+4
When walking user stack frames the first stack frame (where the stack pointer points to) should be skipped: the return address of the current function is saved in the previous stack frame, not the current stack frame, which is allocated for to be called functions. Fixes: aa44433ac4ee ("s390: add USER_STACKTRACE support") Reviewed-by: Jens Remus <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/stacktrace: Merge perf_callchain_user() and arch_stack_walk_user()Heiko Carstens2-38/+29
The two functions perf_callchain_user() and arch_stack_walk_user() are nearly identical. Reduce code duplication and add a common helper which can be called by both functions. Fixes: aa44433ac4ee ("s390: add USER_STACKTRACE support") Reviewed-by: Jens Remus <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/vdso: Use standard stack frame layoutHeiko Carstens2-0/+2
By default user space is compiled with standard stack frame layout and not with the packed stack layout. The vdso code however inherited the -mpacked-stack compiler option from the kernel. Remove this option to make sure the vdso is compiled with standard stack frame layout. This makes sure that the stack frame backchain location for vdso generated stack frames is the same like for calling code (if compiled with default options). This allows to manually walk stack frames without DWARF information, like the kernel is doing it e.g. with arch_stack_walk_user(). Fixes: 4bff8cb54502 ("s390: convert to GENERIC_VDSO") Reviewed-by: Jens Remus <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14s390/vdso: Generate unwind information for C modulesJens Remus2-2/+4
GDB fails to unwind vDSO functions with error message "PC not saved", for instance when stepping through gettimeofday(). Add -fasynchronous-unwind-tables to CFLAGS to generate .eh_frame DWARF unwind information for the vDSO C modules. Fixes: 4bff8cb54502 ("s390: convert to GENERIC_VDSO") Signed-off-by: Jens Remus <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-05-14arch: make execmem setup available regardless of CONFIG_MODULESMike Rapoport (IBM)1-27/+0
execmem does not depend on modules, on the contrary modules use execmem. To make execmem available when CONFIG_MODULES=n, for instance for kprobes, split execmem_params initialization out from arch/*/kernel/module.c and compile it when CONFIG_EXECMEM=y Signed-off-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2024-05-14mm/execmem, arch: convert remaining overrides of module_alloc to execmemMike Rapoport (IBM)1-32/+22
Extend execmem parameters to accommodate more complex overrides of module_alloc() by architectures. This includes specification of a fallback range required by arm, arm64 and powerpc, EXECMEM_MODULE_DATA type required by powerpc, support for allocation of KASAN shadow required by s390 and x86 and support for late initialization of execmem required by arm64. The core implementation of execmem_alloc() takes care of suppressing warnings when the initial allocation fails but there is a fallback range defined. Signed-off-by: Mike Rapoport (IBM) <[email protected]> Acked-by: Will Deacon <[email protected]> Acked-by: Song Liu <[email protected]> Tested-by: Liviu Dudau <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2024-05-14mm: introduce execmem_alloc() and execmem_free()Mike Rapoport (IBM)3-6/+7
module_alloc() is used everywhere as a mean to allocate memory for code. Beside being semantically wrong, this unnecessarily ties all subsystems that need to allocate code, such as ftrace, kprobes and BPF to modules and puts the burden of code allocation to the modules code. Several architectures override module_alloc() because of various constraints where the executable memory can be located and this causes additional obstacles for improvements of code allocation. Start splitting code allocation from modules by introducing execmem_alloc() and execmem_free() APIs. Initially, execmem_alloc() is a wrapper for module_alloc() and execmem_free() is a replacement of module_memfree() to allow updating all call sites to use the new APIs. Since architectures define different restrictions on placement, permissions, alignment and other parameters for memory that can be used by different subsystems that allocate executable memory, execmem_alloc() takes a type argument, that will be used to identify the calling subsystem and to allow architectures define parameters for ranges suitable for that subsystem. No functional changes. Signed-off-by: Mike Rapoport (IBM) <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2024-05-13Merge tag 'sched-core-2024-05-13' of ↵Linus Torvalds2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Add cpufreq pressure feedback for the scheduler - Rework misfit load-balancing wrt affinity restrictions - Clean up and simplify the code around ::overutilized and ::overload access. - Simplify sched_balance_newidle() - Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES handling that changed the output. - Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch() - Reorganize, clean up and unify most of the higher level scheduler balancing function names around the sched_balance_*() prefix - Simplify the balancing flag code (sched_balance_running) - Miscellaneous cleanups & fixes * tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) sched/pelt: Remove shift of thermal clock sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure() thermal/cpufreq: Remove arch_update_thermal_pressure() sched/cpufreq: Take cpufreq feedback into account cpufreq: Add a cpufreq pressure feedback for the scheduler sched/fair: Fix update of rd->sg_overutilized sched/vtime: Do not include <asm/vtime.h> header s390/irq,nmi: Include <asm/vtime.h> header directly s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover sched/vtime: Get rid of generic vtime_task_switch() implementation sched/vtime: Remove confusing arch_vtime_task_switch() declaration sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized() sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded() sched/fair: Rename root_domain::overload to ::overloaded sched/fair: Use helper functions to access root_domain::overload sched/fair: Check root_domain::overload value before update sched/fair: Combine EAS check with root_domain::overutilized access sched/fair: Simplify the continue_balancing logic in sched_balance_newidle() ...
2024-05-13Merge tag 's390-6.10-1' of ↵Linus Torvalds12-84/+127
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Store AP Query Configuration Information in a static buffer - Rework the AP initialization and add missing cleanups to the error path - Swap IRQ and AP bus/device registration to avoid race conditions - Export prot_virt_guest symbol - Introduce AP configuration changes notifier interface to facilitate modularization of the AP bus - Add CONFIG_AP kernel configuration option to allow modularization of the AP bus - Rework CONFIG_ZCRYPT_DEBUG kernel configuration option description and dependency and rename it to CONFIG_AP_DEBUG - Convert sprintf() and snprintf() to sysfs_emit() in CIO code - Adjust indentation of RELOCS command build step - Make crypto performance counters upward compatible - Convert make_page_secure() and gmap_make_secure() to use folio - Rework channel-utilization-block (CUB) handling in preparation of introducing additional CUBs - Use attribute groups to simplify registration, removal and extension of measurement-related channel-path sysfs attributes - Add a per-channel-path binary "ext_measurement" sysfs attribute that provides access to extended channel-path measurement data - Export measurement data for all channel-measurement-groups (CMG), not only for a specific ones. This enables support of new CMG data formats in userspace without the need for kernel changes - Add a per-channel-path sysfs attribute "speed_bps" that provides the operating speed in bits per second or 0 if the operating speed is not available - The CIO tracepoint subchannel-type field "st" is incorrectly set to the value of subchannel-enabled SCHIB "ena" field. Fix that - Do not forcefully limit vmemmap starting address to MAX_PHYSMEM_BITS - Consider the maximum physical address available to a DCSS segment (512GB) when memory layout is set up - Simplify the virtual memory layout setup by reducing the size of identity mapping vs vmemmap overlap - Swap vmalloc and Lowcore/Real Memory Copy areas in virtual memory. This will allow to place the kernel image next to kernel modules - Move everyting KASLR related from <asm/setup.h> to <asm/page.h> - Put virtual memory layout information into a structure to improve code generation - Currently __kaslr_offset is the kernel offset in both physical and virtual memory spaces. Uncouple these offsets to allow uncoupling of the addresses spaces - Currently the identity mapping base address is implicit and is always set to zero. Make it explicit by putting into __identity_base persistent boot variable and use it in proper context - Introduce .amode31 section start and end macros AMODE31_START and AMODE31_END - Introduce OS_INFO entries that do not reference any data in memory, but rather provide only values - Store virtual memory layout in OS_INFO. It is read out by makedumpfile, crash and other tools - Store virtual memory layout in VMCORE_INFO. It is read out by crash and other tools when /proc/kcore device is used - Create additional PT_LOAD ELF program header that covers kernel image only, so that vmcore tools could locate kernel text and data when virtual and physical memory spaces are uncoupled - Uncouple physical and virtual address spaces - Map kernel at fixed location when KASLR mode is disabled. The location is defined by CONFIG_KERNEL_IMAGE_BASE kernel configuration value. - Rework deployment of kernel image for both compressed and uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel configuration value - Move .vmlinux.relocs section in front of the compressed kernel. The interim section rescue step is avoided as result - Correct modules thunk offset calculation when branch target is more than 2GB away - Kernel modules contain their own set of expoline thunks. Now that the kernel modules area is less than 4GB away from kernel expoline thunks, make modules use kernel expolines. Also make EXPOLINE_EXTERN the default if the compiler supports it - userfaultfd can insert shared zeropages into processes running VMs, but that is not allowed for s390. Fallback to allocating a fresh zeroed anonymous folio and insert that instead - Re-enable shared zeropages for non-PV and non-skeys KVM guests - Rename hex2bitmap() to ap_hex2bitmap() and export it for external use - Add ap_config sysfs attribute to provide the means for setting or displaying adapters, domains and control domains assigned to a vfio-ap mediated device in a single operation - Make vfio_ap_mdev_link_queue() ignore duplicate link requests - Add write support to ap_config sysfs attribute to allow atomic update a vfio-ap mediated device state - Document ap_config sysfs attribute - Function os_info_old_init() is expected to be called only from a regular kdump kernel. Enable it to be called from a stand-alone dump kernel - Address gcc -Warray-bounds warning and fix array size in struct os_info - s390 does not support SMBIOS, so drop unneeded CONFIG_DMI checks - Use unwinder instead of __builtin_return_address() with ftrace to prevent returning of undefined values - Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD kernel is enabled. Drop these for the case CONFIG_PIE_BUILD is disabled - Compile kernel with -fPIC and link with -no-pie to allow kpatch feature always succeed and drop the whole CONFIG_PIE_BUILD option-enabled code - Add missing virt_to_phys() converter for VSIE facility and crypto control blocks * tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits) Revert "s390: Relocate vmlinux ELF data to virtual address space" KVM: s390: vsie: Use virt_to_phys for crypto control block s390: Relocate vmlinux ELF data to virtual address space s390: Compile kernel with -fPIC and link with -no-pie s390: vmlinux.lds.S: Drop .hash and .gnu.hash for !CONFIG_PIE_BUILD s390/ftrace: Use unwinder instead of __builtin_return_address() s390/pci: Drop unneeded reference to CONFIG_DMI s390/os_info: Fix array size in struct os_info s390/os_info: Initialize old os_info in standalone dump kernel docs: Update s390 vfio-ap doc for ap_config sysfs attribute s390/vfio-ap: Add write support to sysfs attr ap_config s390/vfio-ap: Ignore duplicate link requests in vfio_ap_mdev_link_queue s390/vfio-ap: Add sysfs attr, ap_config, to export mdev state s390/ap: Externalize AP bus specific bitmap reading function s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests mm/userfaultfd: Do not place zeropages when zeropages are disallowed s390/expoline: Make modules use kernel expolines s390/nospec: Correct modules thunk offset calculation s390/boot: Do not rescue .vmlinux.relocs section s390/boot: Rework deployment of the kernel image ...
2024-05-10kbuild: use $(src) instead of $(srctree)/$(src) for source directoryMasahiro Yamada3-4/+4
Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for checked-in source files. It is merely a convention without any functional difference. In fact, $(obj) and $(src) are exactly the same, as defined in scripts/Makefile.build: src := $(obj) When the kernel is built in a separate output directory, $(src) does not accurately reflect the source directory location. While Kbuild resolves this discrepancy by specifying VPATH=$(srctree) to search for source files, it does not cover all cases. For example, when adding a header search path for local headers, -I$(srctree)/$(src) is typically passed to the compiler. This introduces inconsistency between upstream and downstream Makefiles because $(src) is used instead of $(srctree)/$(src) for the latter. To address this inconsistency, this commit changes the semantics of $(src) so that it always points to the directory in the source tree. Going forward, the variables used in Makefiles will have the following meanings: $(obj) - directory in the object tree $(src) - directory in the source tree (changed by this commit) $(objtree) - the top of the kernel object tree $(srctree) - the top of the kernel source tree Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced with $(src). Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Nicolas Schier <[email protected]>
2024-05-02arch: use $(obj)/ instead of $(src)/ for preprocessed linker scriptsMasahiro Yamada2-2/+2
These are generated files. Prefix them with $(obj)/ instead of $(src)/. Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Helge Deller <[email protected]> Reviewed-by: Nicolas Schier <[email protected]>
2024-04-29s390: Compile kernel with -fPIC and link with -no-pieSumanth Korikkar1-33/+0
When the kernel is built with CONFIG_PIE_BUILD option enabled it uses dynamic symbols, for which the linker does not allow more than 64K number of entries. This can break features like kpatch. Hence, whenever possible the kernel is built with CONFIG_PIE_BUILD option disabled. For that support of unaligned symbols generated by linker scripts in the compiler is necessary. However, older compilers might lack such support. In that case the build process resorts to CONFIG_PIE_BUILD option-enabled build. Compile object files with -fPIC option and then link the kernel binary with -no-pie linker option. As result, the dynamic symbols are not generated and not only kpatch feature succeeds, but also the whole CONFIG_PIE_BUILD option-enabled code could be dropped. [ agordeev: Reworded the commit message ] Suggested-by: Ulrich Weigand <[email protected]> Signed-off-by: Sumanth Korikkar <[email protected]> Reviewed-by: Alexander Gordeev <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-29s390: vmlinux.lds.S: Drop .hash and .gnu.hash for !CONFIG_PIE_BUILDSumanth Korikkar1-1/+1
Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD option is enabled. Drop these for the case CONFIG_PIE_BUILD is disabled. [ agordeev: Reworded the commit message ] Fixes: 778666df60f0 ("s390: compile relocatable kernel without -fPIE") Suggested-by: Alexander Gordeev <[email protected]> Signed-off-by: Sumanth Korikkar <[email protected]> Reviewed-by: Alexander Gordeev <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-29s390/ftrace: Use unwinder instead of __builtin_return_address()Sven Schnelle2-0/+21
Using __builtin_return_address(n) might return undefined values when used with values of n outside of the stack. This was noticed when __builtin_return_address() was called in ftrace on top level functions like the interrupt handlers. As this behaviour cannot be fixed, use the s390 stack unwinder and remove the ftrace compilation flags for unwind_bc.c and stacktrace.c to prevent the unwinding function polluting function traces. Another advantage is that this also works with clang. Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-29s390/os_info: Fix array size in struct os_infoSven Schnelle1-0/+1
gcc's -Warray-bounds warned about an out-of-bounds access to the entry array contained in struct os_info. This doesn't trigger a bug right now because there's a large reserved space after the array. Nevertheless fix this, and also add a BUILD_BUG_ON to make sure struct os_info is always exactly on page in size. Fixes: f4cac27dc0d6 ("s390/crash: Use old os_info to create PT_LOAD headers") Reviewed-by: Alexander Gordeev <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-29s390/os_info: Initialize old os_info in standalone dump kernelAlexander Egorenkov1-1/+2
The commit be42660d0c13 ("s390/crash: use old os_info to create PT_LOAD headers") introduced use of the old os_info into standalone dump kernel. Before this change os_info_old_init() expected to be called only from a regular kdump kernel although the function itself is able to work in standalone dump kernels as well (because copy_oldmem_kernel() is able to handle both use cases). Therefore, fix the expectation of os_info_old_init() and enable it to be called from a standalone dump kernel. Fixes: f4cac27dc0d6 ("s390/crash: Use old os_info to create PT_LOAD headers") Acked-by: Alexander Gordeev <[email protected]> Signed-off-by: Alexander Egorenkov <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-26s390/vdso: Add CFI for RA register to asm macro vdso_funcJens Remus1-0/+2
The return-address (RA) register r14 is specified as volatile in the s390x ELF ABI [1]. Nevertheless proper CFI directives must be provided for an unwinder to restore the return address, if the RA register value is changed from its value at function entry, as it is the case. [1]: s390x ELF ABI, https://github.com/IBM/s390x-abi/releases Fixes: 4bff8cb54502 ("s390: convert to GENERIC_VDSO") Signed-off-by: Jens Remus <[email protected]> Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-25fix missing vmalloc.h includesKent Overstreet2-0/+2
Patch series "Memory allocation profiling", v6. Overview: Low overhead [1] per-callsite memory allocation profiling. Not just for debug kernels, overhead low enough to be deployed in production. Example output: root@moria-kvm:~# sort -rn /proc/allocinfo 127664128 31168 mm/page_ext.c:270 func:alloc_page_ext 56373248 4737 mm/slub.c:2259 func:alloc_slab_page 14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded 14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash 13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs 11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio 9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node 4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable 4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start 3940352 962 mm/memory.c:4214 func:alloc_anon_folio 2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node ... Usage: kconfig options: - CONFIG_MEM_ALLOC_PROFILING - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT - CONFIG_MEM_ALLOC_PROFILING_DEBUG adds warnings for allocations that weren't accounted because of a missing annotation sysctl: /proc/sys/vm/mem_profiling Runtime info: /proc/allocinfo Notes: [1]: Overhead To measure the overhead we are comparing the following configurations: (1) Baseline with CONFIG_MEMCG_KMEM=n (2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) (3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) (4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1) (5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT (6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y (7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y Performance overhead: To evaluate performance we implemented an in-kernel test executing multiple get_free_page/free_page and kmalloc/kfree calls with allocation sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU affinity set to a specific CPU to minimize the noise. Below are results from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on 56 core Intel Xeon: kmalloc pgalloc (1 baseline) 6.764s 16.902s (2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%) (3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%) (4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%) (5 memcg) 13.388s (+97.94%) 48.460s (+186.71%) (6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%) (7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%) Memory overhead: Kernel size: text data bss dec diff (1) 26515311 18890222 17018880 62424413 (2) 26524728 19423818 16740352 62688898 264485 (3) 26524724 19423818 16740352 62688894 264481 (4) 26524728 19423818 16740352 62688898 264485 (5) 26541782 18964374 16957440 62463596 39183 Memory consumption on a 56 core Intel CPU with 125GB of memory: Code tags: 192 kB PageExts: 262144 kB (256MB) SlabExts: 9876 kB (9.6MB) PcpuExts: 512 kB (0.5MB) Total overhead is 0.2% of total memory. Benchmarks: Hackbench tests run 100 times: hackbench -s 512 -l 200 -g 15 -f 25 -P baseline disabled profiling enabled profiling avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023) stdev 0.0137 0.0188 0.0077 hackbench -l 10000 baseline disabled profiling enabled profiling avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859) stdev 0.0933 0.0286 0.0489 stress-ng tests: stress-ng --class memory --seq 4 -t 60 stress-ng --class cpu --seq 4 -t 60 Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/ [2] https://lore.kernel.org/all/[email protected]/ This patch (of 37): The next patch drops vmalloc.h from a system header in order to fix a circular dependency; this adds it to all the files that were pulling it in implicitly. [[email protected]: fix arch/alpha/lib/memcpy.c] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix arch/x86/mm/numa_32.c] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: a few places were depending on sizes.h] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix mm/kasan/hw_tags.c] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix arc build] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Pasha Tatashin <[email protected]> Tested-by: Kees Cook <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alex Gaynor <[email protected]> Cc: Alice Ryhl <[email protected]> Cc: Andreas Hindborg <[email protected]> Cc: Benno Lossin <[email protected]> Cc: "Björn Roy Baron" <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Gary Guo <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Wedson Almeida Filho <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-04-17s390/mm: Fix NULL pointer dereferenceSven Schnelle1-1/+2
The recently added check to figure out if a fault happened on gmap ASCE dereferences the gmap pointer in lowcore without checking that it is not NULL. For all non-KVM processes the pointer is NULL, so that some value from lowcore will be read. With the current layouts of struct gmap and struct lowcore the read value (aka ASCE) is zero, so that this doesn't lead to any observable bug; at least currently. Fix this by adding the missing NULL pointer check. Fixes: 64c3431808bd ("s390/entry: compare gmap asce to determine guest/host fault") Signed-off-by: Sven Schnelle <[email protected]> Reviewed-by: Claudio Imbrenda <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-17s390/nospec: Correct modules thunk offset calculationVasily Gorbik1-2/+2
Fix offset calculation when branch target is more then 2Gb away. Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-17s390/boot: Rework deployment of the kernel imageAlexander Gordeev1-3/+2
Rework deployment of kernel image for both compressed and uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel configuration variable. In case CONFIG_KERNEL_UNCOMPRESSED is disabled avoid uncompressing the kernel to a temporary buffer and copying it to the target address. Instead, uncompress it directly to the target destination. In case CONFIG_KERNEL_UNCOMPRESSED is enabled avoid moving the kernel to default 0x100000 location when KASLR is disabled or failed. Instead, use the uncompressed kernel image directly. In case KASLR is disabled or failed .amode31 section location in memory is not randomized and precedes the kernel image. In case CONFIG_KERNEL_UNCOMPRESSED is disabled that location overlaps the area used by the decompression algorithm. That is fine, since that area is not used after the decompression finished and the size of .amode31 section is not expected to exceed BOOT_HEAP_SIZE ever. There is no decompression in case CONFIG_KERNEL_UNCOMPRESSED is enabled. Therefore, rename decompress_kernel() to deploy_kernel(), which better describes both uncompressed and compressed cases. Introduce AMODE31_SIZE macro to avoid immediate value of 0x3000 (the size of .amode31 section) in the decompressor linker script. Modify the vmlinux linker script to force the size of .amode31 section to AMODE31_SIZE (the value of (_eamode31 - _samode31) could otherwise differ as result of compiler options used). Introduce __START_KERNEL macro that defines the kernel ELF image entry point and set it to the currrent value of 0x100000. Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-17s390/crash: Use old os_info to create PT_LOAD headersAlexander Gordeev2-5/+39
This is a preparatory rework to allow uncoupling virtual and physical addresses spaces. The vmcore ELF program headers describe virtual memory regions of a crashed kernel. User level tools use that information for the kernel text and data analysis (e.g vmcore-dmesg extracts the kernel log). Currently the kernel image is covered by program headers describing the identity mapping regions. But in the future the kernel image will be mapped into separate region outside of the identity mapping. Create the additional ELF program header that covers kernel image only, so that vmcore tools could locate kernel text and data. Further, the identity mapping in crashed and capture kernels will have different base address. Due to that __va() macro can not be used in the capture kernel. Instead, read crashed kernel identity mapping base address from os_info and use it for PT_LOAD type program headers creation. Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-17s390/vmcoreinfo: Store virtual memory layoutAlexander Gordeev1-0/+2
This is a preparatory rework to allow uncoupling virtual and physical addresses spaces. The virtual memory layout is needed for address translation by crash tool when /proc/kcore device is used as the memory image. Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-17s390/os_info: Store virtual memory layoutAlexander Gordeev1-0/+7
This is a preparatory rework to allow uncoupling virtual and physical addresses spaces. The virtual memory layout will be read out by makedumpfile, crash and other user tools for virtual address translation. Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-17s390/os_info: Introduce value entriesAlexander Gordeev2-5/+16
Introduce entries that do not reference any data in memory, but rather provide values. Set the size of such entries to zero and do not compute checksum for them, since there is no data which integrity needs to be checked. The integrity of the value entries itself is still covered by the os_info checksum. Reserve the lowest unused entry index OS_INFO_RESERVED for future use - presumably for the number of entries present. That could later be used by user level tools. The existing tools would not notice any difference. Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-17s390/boot: Make .amode31 section address range explicitAlexander Gordeev1-1/+1
This is a preparatory rework to allow uncoupling virtual and physical addresses spaces. Introduce .amode31 section address range AMODE31_START and AMODE31_END macros for later use. Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-17s390/boot: Make identity mapping base address explicitAlexander Gordeev1-0/+1
This is a preparatory rework to allow uncoupling virtual and physical addresses spaces. Currently the identity mapping base address is implicit and is always set to zero. Make it explicit by putting into __identity_base persistent boot variable and use it in proper context - which is the value of PAGE_OFFSET. Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-17s390/mm: Create virtual memory layout structureAlexander Gordeev1-2/+1
This is a preparatory rework to allow uncoupling virtual and physical addresses spaces. Put virtual memory layout information into a structure to improve code generation when accessing the structure members, which are currently only ident_map_size and __kaslr_offset. Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-17s390/irq,nmi: Include <asm/vtime.h> header directlyAlexander Gordeev2-0/+2
update_timer_sys() and update_timer_mcck() are inlines used for CPU time accounting from the interrupt and machine-check handlers. These routines are specific to s390 architecture, but included via <linux/vtime.h> header implicitly. Avoid the extra loop and include <asm/vtime.h> header directly. Signed-off-by: Alexander Gordeev <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Heiko Carstens <[email protected]> Link: https://lore.kernel.org/r/3fb696637c0eb7e9d6ffd6cbf9e647d7c5986b3d.1712760275.git.agordeev@linux.ibm.com
2024-04-09s390/mm: Convert gmap_make_secure to use a folioMatthew Wilcox (Oracle)1-13/+14
Remove uses of deprecated page APIs, and move the check for large folios to here to avoid taking the folio lock if the folio is too large. We could do better here by attempting to split the large folio, but I'll leave that improvement for someone who can test it. Acked-by: Claudio Imbrenda <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-09s390/mm: Convert make_page_secure to use a folioMatthew Wilcox (Oracle)1-13/+16
These page APIs are deprecated, so convert the incoming page to a folio and use the folio APIs instead. The ultravisor API cannot handle large folios, so return -EINVAL if one has slipped through. Acked-by: Claudio Imbrenda <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-09s390/cpum_cf: make crypto counters upward compatible across machine typesThomas Richter2-9/+4
The CPU Measurement facility crypto counter set functionality is defined by the Second Counter Version Number. This number varies between machine types, but is upward compatible. Lessen the checks to reflect this behavior. Signed-off-by: Thomas Richter <[email protected]> Acked-by: Sumanth Korikkar <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-09s390/uv: export prot_virt_guest symbol in uvHolger Dengler1-0/+1
The inline function is_prot_virt_guest() in asm/uv.h makes use of the prot_virt_guest symbol. As this inline function can be called by other parts of the kernel (modules and built-in), the symbol should be exported, similar to the prot_virt_host symbol. One consumer of is_prot_virt_guest() will be the ap bus code. Cc: Janosch Frank <[email protected]> Cc: Claudio Imbrenda <[email protected]> Signed-off-by: Holger Dengler <[email protected]> Reviewed-by: Harald Freudenberger <[email protected]> Acked-by: Claudio Imbrenda <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-04-03s390/entry: align system call table on 8 bytesSumanth Korikkar1-0/+1
Align system call table on 8 bytes. With sys_call_table entry size of 8 bytes that eliminates the possibility of a system call pointer crossing cache line boundary. Cc: [email protected] Suggested-by: Ulrich Weigand <[email protected]> Reviewed-by: Alexander Gordeev <[email protected]> Signed-off-by: Sumanth Korikkar <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2024-04-03s390/pai: fix sampling event removal for PMU device driverThomas Richter2-6/+14
In case of a sampling event, the PAI PMU device drivers need a reference to this event. Currently to PMU device driver reference is removed when a sampling event is destroyed. This may lead to situations where the reference of the PMU device driver is removed while being used by a different sampling event. Reset the event reference pointer of the PMU device driver when a sampling event is deleted and before the next one might be added. Fixes: 39d62336f5c1 ("s390/pai: add support for cryptography counters") Signed-off-by: Thomas Richter <[email protected]> Acked-by: Sumanth Korikkar <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2024-03-19Merge tag 's390-6.9-2' of ↵Linus Torvalds3-45/+34
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: - Various virtual vs physical address usage fixes - Add new bitwise types and helper functions and use them in s390 specific drivers and code to make it easier to find virtual vs physical address usage bugs. Right now virtual and physical addresses are identical for s390, except for module, vmalloc, and similar areas. This will be changed, hopefully with the next merge window, so that e.g. the kernel image and modules will be located close to each other, allowing for direct branches and also for some other simplifications. As a prerequisite this requires to fix all misuses of virtual and physical addresses. As it turned out people are so used to the concept that virtual and physical addresses are the same, that new bugs got added to code which was already fixed. In order to avoid that even more code gets merged which adds such bugs add and use new bitwise types, so that sparse can be used to find such usage bugs. Most likely the new types can go away again after some time - Provide a simple ARCH_HAS_DEBUG_VIRTUAL implementation - Fix kprobe branch handling: if an out-of-line single stepped relative branch instruction has a target address within a certain address area in the entry code, the program check handler may incorrectly execute cleanup code as if KVM code was executed, leading to crashes - Fix reference counting of zcrypt card objects - Various other small fixes and cleanups * tag 's390-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits) s390/entry: compare gmap asce to determine guest/host fault s390/entry: remove OUTSIDE macro s390/entry: add CIF_SIE flag and remove sie64a() address check s390/cio: use while (i--) pattern to clean up s390/raw3270: make class3270 constant s390/raw3270: improve raw3270_init() readability s390/tape: make tape_class constant s390/vmlogrdr: make vmlogrdr_class constant s390/vmur: make vmur_class constant s390/zcrypt: make zcrypt_class constant s390/mm: provide simple ARCH_HAS_DEBUG_VIRTUAL support s390/vfio_ccw_cp: use new address translation helpers s390/iucv: use new address translation helpers s390/ctcm: use new address translation helpers s390/lcs: use new address translation helpers s390/qeth: use new address translation helpers s390/zfcp: use new address translation helpers s390/tape: fix virtual vs physical address confusion s390/3270: use new address translation helpers s390/3215: use new address translation helpers ...