aboutsummaryrefslogtreecommitdiff
path: root/arch/arm64/kernel
AgeCommit message (Collapse)AuthorFilesLines
2022-05-17arm64: mte: Ensure the cleared tags are visible before setting the PTECatalin Marinas1-0/+3
As an optimisation, only pages mapped with PROT_MTE in user space have the MTE tags zeroed. This is done lazily at the set_pte_at() time via mte_sync_tags(). However, this function is missing a barrier and another CPU may see the PTE updated before the zeroed tags are visible. Add an smp_wmb() barrier if the mapping is Normal Tagged. Signed-off-by: Catalin Marinas <[email protected]> Fixes: 34bfeea4a9e9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE") Cc: <[email protected]> # 5.10.x Reported-by: Vladimir Murzin <[email protected]> Cc: Will Deacon <[email protected]> Reviewed-by: Steven Price <[email protected]> Tested-by: Vladimir Murzin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2022-05-17arm64: kexec: load from kimage prior to clobberingMark Rutland1-7/+15
In arm64_relocate_new_kernel() we load some fields out of the kimage structure after relocation has occurred. As the kimage structure isn't allocated to be relocation-safe, it may be clobbered during relocation, and we may load junk values out of the structure. Due to this, kexec may fail when the kimage allocation happens to fall within a PA range that an object will be relocated to. This has been observed to occur for regular kexec on a QEMU TCG 'virt' machine with 2GiB of RAM, where the PA range of the new kernel image overlaps the kimage structure. Avoid this by ensuring we load all values from the kimage structure prior to relocation. I've tested this atop v5.16 and v5.18-rc6. Fixes: 878fdbd70486 ("arm64: kexec: pass kimage as the only argument to relocation function") Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: James Morse <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Will Deacon <[email protected]> Reviewed-by: Pasha Tatashin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2022-05-17arm64: paravirt: Use RCU read locks to guard stolen_timePrakruthi Deepak Heragu1-8/+21
During hotplug, the stolen time data structure is unmapped and memset. There is a possibility of the timer IRQ being triggered before memset and stolen time is getting updated as part of this timer IRQ handler. This causes the below crash in timer handler - [ 3457.473139][ C5] Unable to handle kernel paging request at virtual address ffffffc03df05148 ... [ 3458.154398][ C5] Call trace: [ 3458.157648][ C5] para_steal_clock+0x30/0x50 [ 3458.162319][ C5] irqtime_account_process_tick+0x30/0x194 [ 3458.168148][ C5] account_process_tick+0x3c/0x280 [ 3458.173274][ C5] update_process_times+0x5c/0xf4 [ 3458.178311][ C5] tick_sched_timer+0x180/0x384 [ 3458.183164][ C5] __run_hrtimer+0x160/0x57c [ 3458.187744][ C5] hrtimer_interrupt+0x258/0x684 [ 3458.192698][ C5] arch_timer_handler_virt+0x5c/0xa0 [ 3458.198002][ C5] handle_percpu_devid_irq+0xdc/0x414 [ 3458.203385][ C5] handle_domain_irq+0xa8/0x168 [ 3458.208241][ C5] gic_handle_irq.34493+0x54/0x244 [ 3458.213359][ C5] call_on_irq_stack+0x40/0x70 [ 3458.218125][ C5] do_interrupt_handler+0x60/0x9c [ 3458.223156][ C5] el1_interrupt+0x34/0x64 [ 3458.227560][ C5] el1h_64_irq_handler+0x1c/0x2c [ 3458.232503][ C5] el1h_64_irq+0x7c/0x80 [ 3458.236736][ C5] free_vmap_area_noflush+0x108/0x39c [ 3458.242126][ C5] remove_vm_area+0xbc/0x118 [ 3458.246714][ C5] vm_remove_mappings+0x48/0x2a4 [ 3458.251656][ C5] __vunmap+0x154/0x278 [ 3458.255796][ C5] stolen_time_cpu_down_prepare+0xc0/0xd8 [ 3458.261542][ C5] cpuhp_invoke_callback+0x248/0xc34 [ 3458.266842][ C5] cpuhp_thread_fun+0x1c4/0x248 [ 3458.271696][ C5] smpboot_thread_fn+0x1b0/0x400 [ 3458.276638][ C5] kthread+0x17c/0x1e0 [ 3458.280691][ C5] ret_from_fork+0x10/0x20 As a fix, introduce rcu lock to update stolen time structure. Fixes: 75df529bec91 ("arm64: paravirt: Initialize steal time when cpu is online") Cc: [email protected] Suggested-by: Will Deacon <[email protected]> Signed-off-by: Prakruthi Deepak Heragu <[email protected]> Signed-off-by: Elliot Berman <[email protected]> Reviewed-by: Srivatsa S. Bhat (VMware) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2022-05-17arm64: lds: move special code sections out of kernel exec segmentArd Biesheuvel1-9/+12
There are a few code sections that are emitted into the kernel's executable .text segment simply because they contain code, but are actually never executed via this mapping, so they can happily live in a region that gets mapped without executable permissions, reducing the risk of being gadgetized. Note that the kexec and hibernate region contents are always copied into a fresh page, and so there is no need to align them as long as the overall size of each is below 4 KiB. Signed-off-by: Ard Biesheuvel <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-16arm64/sme: Remove _EL0 from name of SVCR - FIXME sysreg.hMark Brown4-26/+26
The defines for SVCR call it SVCR_EL0 however the architecture calls the register SVCR with no _EL0 suffix. In preparation for generating the sysreg definitions rename to match the architecture, no functional change. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-16arm64/sme: Standardise bitfield names for SVCRMark Brown1-3/+3
The bitfield definitions for SVCR have a SYS_ added to the names of the constant which will be a problem for automatic generation. Remove the prefixes, no functional change. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-16arm64/fp: Rename SVE and SME LEN field name to _WIDTHMark Brown1-2/+2
The SVE and SVE length configuration field LEN have constants specifying their width called _SIZE rather than the more normal _WIDTH, in preparation for automatic generation rename to _WIDTH. No functional change. Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-16Merge branch 'for-next/sme' into for-next/sysreg-genCatalin Marinas10-112/+1299
* for-next/sme: (29 commits) : Scalable Matrix Extensions support. arm64/sve: Make kernel FPU protection RT friendly arm64/sve: Delay freeing memory in fpsimd_flush_thread() arm64/sme: More sensibly define the size for the ZA register set arm64/sme: Fix NULL check after kzalloc arm64/sme: Add ID_AA64SMFR0_EL1 to __read_sysreg_by_encoding() arm64/sme: Provide Kconfig for SME KVM: arm64: Handle SME host state when running guests KVM: arm64: Trap SME usage in guest KVM: arm64: Hide SME system registers from guests arm64/sme: Save and restore streaming mode over EFI runtime calls arm64/sme: Disable streaming mode and ZA when flushing CPU state arm64/sme: Add ptrace support for ZA arm64/sme: Implement ptrace support for streaming mode SVE registers arm64/sme: Implement ZA signal handling arm64/sme: Implement streaming SVE signal handling arm64/sme: Disable ZA and streaming mode when handling signals arm64/sme: Implement traps and syscall handling for SME arm64/sme: Implement ZA context switching arm64/sme: Implement streaming SVE context switching arm64/sme: Implement SVCR context switching ...
2022-05-16arm64/sve: Make kernel FPU protection RT friendlySebastian Andrzej Siewior1-2/+14
Non RT kernels need to protect FPU against preemption and bottom half processing. This is achieved by disabling bottom halves via local_bh_disable() which implictly disables preemption. On RT kernels this protection mechanism is not sufficient because local_bh_disable() does not disable preemption. It serializes bottom half related processing via a CPU local lock. As bottom halves are running always in thread context on RT kernels disabling preemption is the proper choice as it implicitly prevents bottom half processing. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Acked-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-16arm64/sve: Delay freeing memory in fpsimd_flush_thread()Sebastian Andrzej Siewior1-2/+15
fpsimd_flush_thread() invokes kfree() via sve_free()+sme_free() within a preempt disabled section which is not working on -RT. Delay freeing of memory until preemption is enabled again. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-12arm64: Enable repeat tlbi workaround on KRYO4XX gold CPUsShreyas K K1-0/+2
Add KRYO4XX gold/big cores to the list of CPUs that need the repeat TLBI workaround. Apply this to the affected KRYO4XX cores (rcpe to rfpe). The variant and revision bits are implementation defined and are different from the their Cortex CPU counterparts on which they are based on, i.e., (r0p0 to r3p0) is equivalent to (rcpe to rfpe). Signed-off-by: Shreyas K K <[email protected]> Reviewed-by: Sai Prakash Ranjan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2022-05-12arm64: cpufeature: remove duplicate ID_AA64ISAR2_EL1 entryKristina Martsenko1-2/+1
The ID register table should have one entry per ID register but currently has two entries for ID_AA64ISAR2_EL1. Only one entry has an override, and get_arm64_ftr_reg() can end up choosing the other, causing the override to be ignored. Fix this by removing the duplicate entry. While here, also make the check in sort_ftr_regs() more strict so that duplicate entries can't be added in the future. Fixes: def8c222f054 ("arm64: Add support of PAuth QARMA3 architected algorithm") Signed-off-by: Kristina Martsenko <[email protected]> Reviewed-by: Vladimir Murzin <[email protected]> Reviewed-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2022-05-11Merge branch 'v5.18-rc5'Peter Zijlstra8-31/+37
Obtain the new INTEL_FAM6 stuff required. Signed-off-by: Peter Zijlstra <[email protected]>
2022-05-11arm64: Declare non global symbols as staticLinu Cherian1-1/+1
Fix below sparse warnings introduced while adding errata. arch/arm64/kernel/cpu_errata.c:218:25: sparse: warning: symbol 'cavium_erratum_23154_cpus' was not declared. Should it be static? Reported-by: kernel test robot <[email protected]> Signed-off-by: Linu Cherian <[email protected]> Acked-by: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-10arm64: vdso: fix makefile dependency on vdso.soJoey Gouly3-6/+4
There is currently no dependency for vdso*-wrap.S on vdso*.so, which means that you can get a build that uses a stale vdso*-wrap.o. In commit a5b8ca97fbf8, the file that includes the vdso.so was moved and renamed from arch/arm64/kernel/vdso/vdso.S to arch/arm64/kernel/vdso-wrap.S, when this happened the Makefile was not updated to force the dependcy on vdso.so. Fixes: a5b8ca97fbf8 ("arm64: do not descend to vdso directories twice") Signed-off-by: Joey Gouly <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2022-05-08arm64: entry: use stackleak_erase_on_task_stack()Mark Rutland1-1/+1
On arm64 we always call stackleak_erase() on a task stack, and never call it on another stack. We can avoid some redundant work by using stackleak_erase_on_task_stack(), telling the stackleak code that it's being called on a task stack. Signed-off-by: Mark Rutland <[email protected]> Cc: Alexander Popov <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Kees Cook <[email protected]> Cc: Will Deacon <[email protected]> Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-08randstruct: Split randstruct Makefile and CFLAGSKees Cook1-1/+2
To enable the new Clang randstruct implementation[1], move randstruct into its own Makefile and split the CFLAGS from GCC_PLUGINS_CFLAGS into RANDSTRUCT_CFLAGS. [1] https://reviews.llvm.org/D121556 Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-07arm64: kdump: Reimplement crashkernel=XChen Zhou2-4/+17
There are following issues in arm64 kdump: 1. We use crashkernel=X to reserve crashkernel in DMA zone, which will fail when there is not enough low memory. 2. If reserving crashkernel above DMA zone, in this case, crash dump kernel will fail to boot because there is no low memory available for allocation. To solve these issues, introduce crashkernel=X,[high,low]. The "crashkernel=X,high" is used to select a region above DMA zone, and the "crashkernel=Y,low" is used to allocate specified size low memory. Signed-off-by: Chen Zhou <[email protected]> Co-developed-by: Zhen Lei <[email protected]> Signed-off-by: Zhen Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-07arm64: Use insert_resource() to simplify codeZhen Lei1-14/+3
insert_resource() traverses the subtree layer by layer from the root node until a proper location is found. Compared with request_resource(), the parent node does not need to be determined in advance. In addition, move the insertion of node 'crashk_res' into function reserve_crashkernel() to make the associated code close together. Signed-off-by: Zhen Lei <[email protected]> Acked-by: John Donnelly <[email protected]> Acked-by: Baoquan He <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-07fork: Generalize PF_IO_WORKER handlingEric W. Biederman1-4/+3
Add fn and fn_arg members into struct kernel_clone_args and test for them in copy_thread (instead of testing for PF_KTHREAD | PF_IO_WORKER). This allows any task that wants to be a user space task that only runs in kernel mode to use this functionality. The code on x86 is an exception and still retains a PF_KTHREAD test because x86 unlikely everything else handles kthreads slightly differently than user space tasks that start with a function. The functions that created tasks that start with a function have been updated to set ".fn" and ".fn_arg" instead of ".stack" and ".stack_size". These functions are fork_idle(), create_io_thread(), kernel_thread(), and user_mode_thread(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Eric W. Biederman" <[email protected]>
2022-05-07fork: Pass struct kernel_clone_args into copy_threadEric W. Biederman1-2/+5
With io_uring we have started supporting tasks that are for most purposes user space tasks that exclusively run code in kernel mode. The kernel task that exec's init and tasks that exec user mode helpers are also user mode tasks that just run kernel code until they call kernel execve. Pass kernel_clone_args into copy_thread so these oddball tasks can be supported more cleanly and easily. v2: Fix spelling of kenrel_clone_args on h8300 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Eric W. Biederman" <[email protected]>
2022-05-06cpufreq: CPPC: Add per_cpu efficiency_classPierre Gondois1-0/+1
In ACPI, describing power efficiency of CPUs can be done through the following arm specific field: ACPI 6.4, s5.2.12.14 'GIC CPU Interface (GICC) Structure', 'Processor Power Efficiency Class field': Describes the relative power efficiency of the associated pro- cessor. Lower efficiency class numbers are more efficient than higher ones (e.g. efficiency class 0 should be treated as more efficient than efficiency class 1). However, absolute values of this number have no meaning: 2 isn’t necessarily half as efficient as 1. The efficiency_class field is stored in the GicC structure of the ACPI MADT table and it's currently supported in Linux for arm64 only. Thus, this new functionality is introduced for arm64 only. To allow the cppc_cpufreq driver to know and preprocess the efficiency_class values of all the CPUs, add a per_cpu efficiency_class variable to store them. At least 2 different efficiency classes must be present, otherwise there is no use in creating an Energy Model. The efficiency_class values are squeezed in [0:#efficiency_class-1] while conserving the order. For instance, efficiency classes of: [111, 212, 250] will be mapped to: [0 (was 111), 1 (was 212), 2 (was 250)]. Each policy being independently registered in the driver, populating the per_cpu efficiency_class is done only once at the driver initialization. This prevents from having each policy re-searching the efficiency_class values of other CPUs. The EM will be registered in a following patch. The patch also exports acpi_cpu_get_madt_gicc() to fetch the GicC structure of the ACPI MADT table for each CPU. Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Pierre Gondois <[email protected]> Acked-by: Viresh Kumar <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-06arm64/sme: More sensibly define the size for the ZA register setMark Brown1-2/+10
Since the vector length configuration mechanism is identical between SVE and SME we share large elements of the code including the definition for the maximum vector length. Unfortunately when we were defining the ABI for SVE we included not only the actual maximum vector length of 2048 bits but also the value possible if all the bits reserved in the architecture for expansion of the LEN field were used, 16384 bits. This starts creating problems if we try to allocate anything for the ZA matrix based on the maximum possible vector length, as we do for the regset used with ptrace during the process of generating a core dump. While the maximum potential size for ZA with the current architecture is a reasonably managable 64K with the higher reserved limit ZA would be 64M which leads to entirely reasonable complaints from the memory management code when we try to allocate a buffer of that size. Avoid these issues by defining the actual maximum vector length for the architecture and using it for the SME regsets. Also use the full ZA_PT_SIZE() with the header rather than just the actual register payload when specifying the size, fixing support for the largest vector lengths now that we have this new, lower define. With the SVE maximum this did not cause problems due to the extra headroom we had. While we're at it add a comment clarifying why even though ZA is a single register we tell the regset code that it is a multi-register regset. Reported-by: Qian Cai <[email protected]> Signed-off-by: Mark Brown <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-04arm64/sysreg: Standardise ID_AA64ISAR0_EL1 macro namesMark Brown1-35/+35
The macros for accessing fields in ID_AA64ISAR0_EL1 omit the _EL1 from the name of the register. In preparation for converting this register to be automatically generated update the names to include an _EL1, there should be no functional change. Signed-off-by: Mark Brown <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-04arm64: Update name of ID_AA64ISAR0_EL1_ATOMIC to reflect ARMMark Brown1-3/+3
The architecture reference manual refers to the field in bits 23:20 of ID_AA64ISAR0_EL1 with the name "atomic" but the kernel defines for this bitfield use the name "atomics". Bring the two into sync to make it easier to cross reference with the specification. Signed-off-by: Mark Brown <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-04arm64/mte: Make TCF field values and naming more standardMark Brown1-4/+5
In preparation for automatic generation of the defines for system registers make the values used for the enumeration in SCTLR_ELx.TCF suitable for use with the newly defined SYS_FIELD_PREP_ENUM helper, removing the shift from the define and using the helper to generate it on use instead. Since we only ever interact with this field in EL1 and in preparation for generation of the defines also rename from SCTLR_ELx to SCTLR_EL1. SCTLR_EL2 is not quite the same as SCTLR_EL1 so the conversion does not share the field definitions. There should be no functional change from this patch. Signed-off-by: Mark Brown <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-04arm64/mte: Make TCF0 naming and field values more standardMark Brown1-3/+3
In preparation for automatic generation of SCTLR_EL1 register definitions make the macros used to define SCTLR_EL1.TCF0 and the enumeration values it has more standard so they can be used with FIELD_PREP() via the newly defined SYS_FIELD_PREP_ helpers. Since the field also exists in SCTLR_EL2 with the same values also rename the macros to SCTLR_ELx rather than SCTLR_EL1. There should be no functional change as a result of this patch. Signed-off-by: Mark Brown <[email protected]> Acked-by: Mark Rutland <[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-05-04Merge branch kvm-arm64/wfxt into kvmarm-master/nextMarc Zyngier2-0/+14
* kvm-arm64/wfxt: : . : Add support for the WFET/WFIT instructions that provide the same : service as WFE/WFI, only with a timeout. : . KVM: arm64: Expose the WFXT feature to guests KVM: arm64: Offer early resume for non-blocking WFxT instructions KVM: arm64: Handle blocking WFIT instruction KVM: arm64: Introduce kvm_counter_compute_delta() helper KVM: arm64: Simplify kvm_cpu_has_pending_timer() arm64: Use WFxT for __delay() when possible arm64: Add wfet()/wfit() helpers arm64: Add HWCAP advertising FEAT_WFXT arm64: Add RV and RN fields for ESR_ELx_WFx_ISS arm64: Expand ESR_ELx_WFx_ISS_TI to match its ARMv8.7 definition Signed-off-by: Marc Zyngier <[email protected]>
2022-05-04Merge remote-tracking branch 'arm64/for-next/sme' into kvmarm-master/nextMarc Zyngier10-109/+1263
Merge arm64's SME branch to resolve conflicts with the WFxT branch. Signed-off-by: Marc Zyngier <[email protected]>
2022-04-29vmcore: convert copy_oldmem_page() to take an iov_iterMatthew Wilcox (Oracle)1-25/+4
Patch series "Convert vmcore to use an iov_iter", v5. For some reason several people have been sending bad patches to fix compiler warnings in vmcore recently. Here's how it should be done. Compile-tested only on x86. As noted in the first patch, s390 should take this conversion a bit further, but I'm not inclined to do that work myself. This patch (of 3): Instead of passing in a 'buf' and 'userbuf' argument, pass in an iov_iter. s390 needs more work to pass the iov_iter down further, or refactor, but I'd be more comfortable if someone who can test on s390 did that work. It's more convenient to convert the whole of read_from_oldmem() to take an iov_iter at the same time, so rename it to read_from_oldmem_iter() and add a temporary read_from_oldmem() wrapper that creates an iov_iter. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Heiko Carstens <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-04-29arm64: Treat ESR_ELx as a 64-bit registerAlexandru Elisei8-54/+54
In the initial release of the ARM Architecture Reference Manual for ARMv8-A, the ESR_ELx registers were defined as 32-bit registers. This changed in 2018 with version D.a (ARM DDI 0487D.a) of the architecture, when they became 64-bit registers, with bits [63:32] defined as RES0. In version G.a, a new field was added to ESR_ELx, ISS2, which covers bits [36:32]. This field is used when the Armv8.7 extension FEAT_LS64 is implemented. As a result of the evolution of the register width, Linux stores it as both a 64-bit value and a 32-bit value, which hasn't affected correctness so far as Linux only uses the lower 32 bits of the register. Make the register type consistent and always treat it as 64-bit wide. The register is redefined as an "unsigned long", which is an unsigned double-word (64-bit quantity) for the LP64 machine (aapcs64 [1], Table 1, page 14). The type was chosen because "unsigned int" is the most frequent type for ESR_ELx and because FAR_ELx, which is used together with ESR_ELx in exception handling, is also declared as "unsigned long". The 64-bit type also makes adding support for architectural features that use fields above bit 31 easier in the future. The KVM hypervisor will receive a similar update in a subsequent patch. [1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf Signed-off-by: Alexandru Elisei <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-29arm64: compat: Do not treat syscall number as ESR_ELx for a bad syscallAlexandru Elisei1-1/+1
If a compat process tries to execute an unknown system call above the __ARM_NR_COMPAT_END number, the kernel sends a SIGILL signal to the offending process. Information about the error is printed to dmesg in compat_arm_syscall() -> arm64_notify_die() -> arm64_force_sig_fault() -> arm64_show_signal(). arm64_show_signal() interprets a non-zero value for current->thread.fault_code as an exception syndrome and displays the message associated with the ESR_ELx.EC field (bits 31:26). current->thread.fault_code is set in compat_arm_syscall() -> arm64_notify_die() with the bad syscall number instead of a valid ESR_ELx value. This means that the ESR_ELx.EC field has the value that the user set for the syscall number and the kernel can end up printing bogus exception messages*. For example, for the syscall number 0x68000000, which evaluates to ESR_ELx.EC value of 0x1A (ESR_ELx_EC_FPAC) the kernel prints this error: [ 18.349161] syscall[300]: unhandled exception: ERET/ERETAA/ERETAB, ESR 0x68000000, Oops - bad compat syscall(2) in syscall[10000+50000] [ 18.350639] CPU: 2 PID: 300 Comm: syscall Not tainted 5.18.0-rc1 #79 [ 18.351249] Hardware name: Pine64 RockPro64 v2.0 (DT) [..] which is misleading, as the bad compat syscall has nothing to do with pointer authentication. Stop arm64_show_signal() from printing exception syndrome information by having compat_arm_syscall() set the ESR_ELx value to 0, as it has no meaning for an invalid system call number. The example above now becomes: [ 19.935275] syscall[301]: unhandled exception: Oops - bad compat syscall(2) in syscall[10000+50000] [ 19.936124] CPU: 1 PID: 301 Comm: syscall Not tainted 5.18.0-rc1-00005-g7e08006d4102 #80 [ 19.936894] Hardware name: Pine64 RockPro64 v2.0 (DT) [..] which although shows less information because the syscall number, wrongfully advertised as the ESR value, is missing, it is better than showing plainly wrong information. The syscall number can be easily obtained with strace. *A 32-bit value above or equal to 0x8000_0000 is interpreted as a negative integer in compat_arm_syscal() and the condition scno < __ARM_NR_COMPAT_END evaluates to true; the syscall will exit to userspace in this case with the ENOSYS error code instead of arm64_notify_die() being called. Signed-off-by: Alexandru Elisei <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-29arm64/ftrace: Make function graph use ftrace directlyChengming Zhou2-17/+17
As we do in commit 0c0593b45c9b ("x86/ftrace: Make function graph use ftrace directly"), we don't need special hook for graph tracer, but instead we use graph_ops:func function to install return_hooker. Since commit 3b23e4991fb6 ("arm64: implement ftrace with regs") add implementation for FTRACE_WITH_REGS on arm64, we can easily adopt the same cleanup on arm64. And this cleanup only changes the FTRACE_WITH_REGS implementation, so the mcount-based implementation is unaffected. While in theory it would be possible to make a similar cleanup for !FTRACE_WITH_REGS, this will require rework of the core code, and so for now we only change the FTRACE_WITH_REGS implementation. Tested-by: Mark Rutland <[email protected]> Reviewed-by: Mark Rutland <[email protected]> Signed-off-by: Chengming Zhou <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-29arm64/sme: Fix NULL check after kzallocWan Jiabing1-1/+1
Fix following coccicheck error: ./arch/arm64/kernel/process.c:322:2-23: alloc with no test, possible model on line 326 Here should be dst->thread.sve_state. Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME") Signed-off-by: Wan Jiabing <[email protected]> Reviwed-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-28elf: Fix the arm64 MTE ELF segment name and valueCatalin Marinas1-1/+1
Unfortunately, the name/value choice for the MTE ELF segment type (PT_ARM_MEMTAG_MTE) was pretty poor: LOPROC+1 is already in use by PT_AARCH64_UNWIND, as defined in the AArch64 ELF ABI (https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst). Update the ELF segment type value to LOPROC+2 and also change the define to PT_AARCH64_MEMTAG_MTE to match the AArch64 ELF ABI namespace. The AArch64 ELF ABI document is updating accordingly (segment type not previously mentioned in the document). Signed-off-by: Catalin Marinas <[email protected]> Fixes: 761b9b366cec ("elf: Introduce the ARM MTE ELF segment type") Cc: Will Deacon <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Kees Cook <[email protected]> Cc: Luis Machado <[email protected]> Cc: Richard Earnshaw <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2022-04-27arm64/sme: Add ID_AA64SMFR0_EL1 to __read_sysreg_by_encoding()Mark Brown1-0/+1
We need to explicitly enumerate all the ID registers which we rely on for CPU capabilities in __read_sysreg_by_encoding(), ID_AA64SMFR0_EL1 was missed from this list so we trip a BUG() in paths which rely on that function such as CPU hotplug. Add the register. Reported-by: Marek Szyprowski <[email protected]> Signed-off-by: Mark Brown <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-25arm64: Add support for user sub-page fault probingCatalin Marinas1-0/+30
With MTE, even if the pte allows an access, a mismatched tag somewhere within a page can still cause a fault. Select ARCH_HAS_SUBPAGE_FAULTS if MTE is enabled and implement the probe_subpage_writeable() function. Note that get_user() is sufficient for the writeable MTE check since the same tag mismatch fault would be triggered by a read. The caller of probe_subpage_writeable() will need to check the pte permissions (put_user, GUP). Signed-off-by: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Save and restore streaming mode over EFI runtime callsMark Brown1-6/+42
When saving and restoring the floating point state over an EFI runtime call ensure that we handle streaming mode, only handling FFR if we are not in streaming mode and ensuring that we are in normal mode over the call into runtime services. We currently assume that ZA will not be modified by runtime services, the specification is not yet finalised so this may need updating if that changes. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Disable streaming mode and ZA when flushing CPU stateMark Brown1-0/+9
Both streaming mode and ZA may increase power consumption when they are enabled and streaming mode makes many FPSIMD and SVE instructions undefined which will cause problems for any kernel mode floating point so disable both when we flush the CPU state. This covers both kernel_neon_begin() and idle and after flushing the state a reload is always required anyway. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Add ptrace support for ZAMark Brown1-0/+144
The ZA array can be read and written with the NT_ARM_ZA. Similarly to our interface for the SVE vector registers the regset consists of a header with information on the current vector length followed by an optional register data payload, represented as for signals as a series of horizontal vectors from 0 to VL/8 in the endianness independent format used for vectors. On get if ZA is enabled then register data will be provided, otherwise it will be omitted. On set if register data is provided then ZA is enabled and initialized using the provided data, otherwise it is disabled. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Implement ptrace support for streaming mode SVE registersMark Brown2-54/+191
The streaming mode SVE registers are represented using the same data structures as for SVE but since the vector lengths supported and in use may not be the same as SVE we represent them with a new type NT_ARM_SSVE. Unfortunately we only have a single 16 bit reserved field available in the header so there is no space to fit the current and maximum vector length for both standard and streaming SVE mode without redefining the structure in a way the creates a complicatd and fragile ABI. Since FFR is not present in streaming mode it is read and written as zero. Setting NT_ARM_SSVE registers will put the task into streaming mode, similarly setting NT_ARM_SVE registers will exit it. Reads that do not correspond to the current mode of the task will return the header with no register data. For compatibility reasons on write setting no flag for the register type will be interpreted as setting SVE registers, though users can provide no register data as an alternative mechanism for doing so. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Implement ZA signal handlingMark Brown2-3/+139
Implement support for ZA in signal handling in a very similar way to how we implement support for SVE registers, using a signal context structure with optional register state after it. Where present this register state stores the ZA matrix as a series of horizontal vectors numbered from 0 to VL/8 in the endinanness independent format used for vectors. As with SVE we do not allow changes in the vector length during signal return but we do allow ZA to be enabled or disabled. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Implement streaming SVE signal handlingMark Brown1-10/+32
When in streaming mode we have the same set of SVE registers as we do in regular SVE mode with the exception of FFR and the use of the SME vector length. Provide signal handling for these registers by taking one of the reserved words in the SVE signal context as a flags field and defining a flag which is set for streaming mode. When the flag is set the vector length is set to the streaming mode vector length and we save and restore streaming mode data. We support entering or leaving streaming mode based on the value of the flag but do not support changing the vector length, this is not currently supported SVE signal handling. We could instead allocate a separate record in the signal frame for the streaming mode SVE context but this inflates the size of the maximal signal frame required and adds complication when validating signal frames from userspace, especially given the current structure of the code. Any implementation of support for streaming mode vectors in signals will have some potential for causing issues for applications that attempt to handle SVE vectors in signals, use streaming mode but do not understand streaming mode in their signal handling code, it is hard to identify a case that is clearly better than any other - they all have cases where they could cause unexpected register corruption or faults. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Disable ZA and streaming mode when handling signalsMark Brown1-0/+7
The ABI requires that streaming mode and ZA are disabled when invoking signal handlers, do this in setup_return() when we prepare the task state for the signal handler. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Implement traps and syscall handling for SMEMark Brown4-23/+214
By default all SME operations in userspace will trap. When this happens we allocate storage space for the SME register state, set up the SVE registers and disable traps. We do not need to initialize ZA since the architecture guarantees that it will be zeroed when enabled and when we trap ZA is disabled. On syscall we exit streaming mode if we were previously in it and ensure that all but the lower 128 bits of the registers are zeroed while preserving the state of ZA. This follows the aarch64 PCS for SME, ZA state is preserved over a function call and streaming mode is exited. Since the traps for SME do not distinguish between streaming mode SVE and ZA usage if ZA is in use rather than reenabling traps we instead zero the parts of the SVE registers not shared with FPSIMD and leave SME enabled, this simplifies handling SME traps. If ZA is not in use then we reenable SME traps and fall through to normal handling of SVE. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Implement ZA context switchingMark Brown2-7/+35
Allocate space for storing ZA on first access to SME and use that to save and restore ZA state when context switching. We do this by using the vector form of the LDR and STR ZA instructions, these do not require streaming mode and have implementation recommendations that they avoid contention issues in shared SMCU implementations. Since ZA is architecturally guaranteed to be zeroed when enabled we do not need to explicitly zero ZA, either we will be restoring from a saved copy or trapping on first use of SME so we know that ZA must be disabled. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Implement streaming SVE context switchingMark Brown2-20/+94
When in streaming mode we need to save and restore the streaming mode SVE register state rather than the regular SVE register state. This uses the streaming mode vector length and omits FFR but is otherwise identical, if TIF_SVE is enabled when we are in streaming mode then streaming mode takes precedence. This does not handle use of streaming SVE state with KVM, ptrace or signals. This will be updated in further patches. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Implement SVCR context switchingMark Brown2-1/+19
In SME the use of both streaming SVE mode and ZA are tracked through PSTATE.SM and PSTATE.ZA, visible through the system register SVCR. In order to context switch the floating point state for SME we need to context switch the contents of this register as part of context switching the floating point state. Since changing the vector length exits streaming SVE mode and disables ZA we also make sure we update SVCR appropriately when setting vector length, and similarly ensure that new threads have streaming SVE mode and ZA disabled. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Implement support for TPIDR2Mark Brown2-2/+16
The Scalable Matrix Extension introduces support for a new thread specific data register TPIDR2 intended for use by libc. The kernel must save the value of TPIDR2 on context switch and should ensure that all new threads start off with a default value of 0. Add a field to the thread_struct to store TPIDR2 and context switch it with the other thread specific data. In case there are future extensions which also use TPIDR2 we introduce system_supports_tpidr2() and use that rather than system_supports_sme() for TPIDR2 handling. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2022-04-22arm64/sme: Implement vector length configuration prctl()sMark Brown1-0/+32
As for SVE provide a prctl() interface which allows processes to configure their SME vector length. Signed-off-by: Mark Brown <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>