aboutsummaryrefslogtreecommitdiff
path: root/arch/riscv/kernel
AgeCommit message (Collapse)AuthorFilesLines
2023-02-15riscv: ftrace: Reduce the detour code size to halfGuo Ren2-72/+35
Use a temporary register to reduce the size of detour code from 16 bytes to 8 bytes. The previous implementation is from 'commit afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")'. Before the patch: <func_prolog>: 0: REG_S ra, -SZREG(sp) 4: auipc ra, ? 8: jalr ?(ra) 12: REG_L ra, -SZREG(sp) (func_boddy) After the patch: <func_prolog>: 0: auipc t0, ? 4: jalr t0, ?(t0) (func_boddy) This patch not just reduces the size of detour code, but also fixes an important issue: An Ftrace callback registered with FTRACE_OPS_FL_IPMODIFY flag can actually change the instruction pointer, e.g. to "replace" the given kernel function with a new one, which is needed for livepatching, etc. In this case, the trampoline (ftrace_regs_caller) would not return to <func_prolog+12> but would rather jump to the new function. So, "REG_L ra, -SZREG(sp)" would not run and the original return address would not be restored. The kernel is likely to hang or crash as a result. This can be easily demonstrated if one tries to "replace", say, cmdline_proc_show() with a new function with the same signature using instruction_pointer_set(&fregs->regs, new_func_addr) in the Ftrace callback. Link: https://lore.kernel.org/linux-riscv/20221122075440.1165172-1-suagrfillet@gmail.com/ Link: https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/ Co-developed-by: Song Shuai <suagrfillet@gmail.com> Signed-off-by: Song Shuai <suagrfillet@gmail.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Cc: Evgenii Shatokhin <e.shatokhin@yadro.com> Reviewed-by: Evgenii Shatokhin <e.shatokhin@yadro.com> Link: https://lore.kernel.org/r/20230112090603.1295340-4-guoren@kernel.org Cc: stable@vger.kernel.org Fixes: 10626c32e382 ("riscv/ftrace: Add basic support") Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+3
Pull RISC-V kvm updates from Paolo Bonzini: - Allow unloading KVM module - Allow KVM user-space to set mvendorid, marchid, and mimpid - Several fixes and cleanups * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: RISC-V: KVM: Add ONE_REG interface for mvendorid, marchid, and mimpid RISC-V: KVM: Save mvendorid, marchid, and mimpid when creating VCPU RISC-V: Export sbi_get_mvendorid() and friends RISC-V: KVM: Move sbi related struct and functions to kvm_vcpu_sbi.h RISC-V: KVM: Use switch-case in kvm_riscv_vcpu_set/get_reg() RISC-V: KVM: Remove redundant includes of asm/csr.h RISC-V: KVM: Remove redundant includes of asm/kvm_vcpu_timer.h RISC-V: KVM: Fix reg_val check in kvm_riscv_vcpu_set_reg_config() RISC-V: KVM: Simplify kvm_arch_prepare_memory_region() RISC-V: KVM: Exit run-loop immediately if xfer_to_guest fails RISC-V: KVM: use vma_lookup() instead of find_vma_intersection() RISC-V: KVM: Add exit logic to main.c
2022-12-14Merge tag 'riscv-for-linus-6.2-mw1' of ↵Linus Torvalds15-87/+210
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the T-Head PMU via the perf subsystem - ftrace support for rv32 - Support for non-volatile memory devices - Various fixes and cleanups * tag 'riscv-for-linus-6.2-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (52 commits) Documentation: RISC-V: patch-acceptance: s/implementor/implementer Documentation: RISC-V: Mention the UEFI Standards Documentation: RISC-V: Allow patches for non-standard behavior Documentation: RISC-V: Fix a typo in patch-acceptance riscv: Fixup compile error with !MMU riscv: Fix P4D_SHIFT definition for 3-level page table mode riscv: Apply a static assert to riscv_isa_ext_id RISC-V: Add some comments about the shadow and overflow stacks RISC-V: Align the shadow stack RISC-V: Ensure Zicbom has a valid block size RISC-V: Introduce riscv_isa_extension_check RISC-V: Improve use of isa2hwcap[] riscv: Don't duplicate _ALTERNATIVE_CFG* macros riscv: alternatives: Drop the underscores from the assembly macro names riscv: alternatives: Don't name unused macro parameters riscv: Don't duplicate __ALTERNATIVE_CFG in __ALTERNATIVE_CFG_2 riscv: mm: call best_map_size many times during linear-mapping riscv: Move cast inside kernel_mapping_[pv]a_to_[vp]a riscv: Fix crash during early errata patching riscv: boot: add zstd support ...
2022-12-13Merge tag 'efi-next-for-v6.2' of ↵Linus Torvalds1-6/+0
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: "Another fairly sizable pull request, by EFI subsystem standards. Most of the work was done by me, some of it in collaboration with the distro and bootloader folks (GRUB, systemd-boot), where the main focus has been on removing pointless per-arch differences in the way EFI boots a Linux kernel. - Refactor the zboot code so that it incorporates all the EFI stub logic, rather than calling the decompressed kernel as a EFI app. - Add support for initrd= command line option to x86 mixed mode. - Allow initrd= to be used with arbitrary EFI accessible file systems instead of just the one the kernel itself was loaded from. - Move some x86-only handling and manipulation of the EFI memory map into arch/x86, as it is not used anywhere else. - More flexible handling of any random seeds provided by the boot environment (i.e., systemd-boot) so that it becomes available much earlier during the boot. - Allow improved arch-agnostic EFI support in loaders, by setting a uniform baseline of supported features, and adding a generic magic number to the DOS/PE header. This should allow loaders such as GRUB or systemd-boot to reduce the amount of arch-specific handling substantially. - (arm64) Run EFI runtime services from a dedicated stack, and use it to recover from synchronous exceptions that might occur in the firmware code. - (arm64) Ensure that we don't allocate memory outside of the 48-bit addressable physical range. - Make EFI pstore record size configurable - Add support for decoding CXL specific CPER records" * tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits) arm64: efi: Recover from synchronous exceptions occurring in firmware arm64: efi: Execute runtime services from a dedicated stack arm64: efi: Limit allocations to 48-bit addressable physical region efi: Put Linux specific magic number in the DOS header efi: libstub: Always enable initrd command line loader and bump version efi: stub: use random seed from EFI variable efi: vars: prohibit reading random seed variables efi: random: combine bootloader provided RNG seed with RNG protocol output efi/cper, cxl: Decode CXL Error Log efi/cper, cxl: Decode CXL Protocol Error Section efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment efi: x86: Move EFI runtime map sysfs code to arch/x86 efi: runtime-maps: Clarify purpose and enable by default for kexec efi: pstore: Add module parameter for setting the record size efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures efi: memmap: Move manipulation routines into x86 arch tree efi: memmap: Move EFI fake memmap support into x86 arch tree efi: libstub: Undeprecate the command line initrd loader efi: libstub: Add mixed mode support to command line initrd loader efi: libstub: Permit mixed mode return types other than efi_status_t ...
2022-12-12Merge tag 'timers-core-2022-12-10' of ↵Linus Torvalds1-22/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timers, timekeeping and drivers: Core: - The timer_shutdown[_sync]() infrastructure: Tearing down timers can be tedious when there are circular dependencies to other things which need to be torn down. A prime example is timer and workqueue where the timer schedules work and the work arms the timer. What needs to prevented is that pending work which is drained via destroy_workqueue() does not rearm the previously shutdown timer. Nothing in that shutdown sequence relies on the timer being functional. The conclusion was that the semantics of timer_shutdown_sync() should be: - timer is not enqueued - timer callback is not running - timer cannot be rearmed Preventing the rearming of shutdown timers is done by discarding rearm attempts silently. A warning for the case that a rearm attempt of a shutdown timer is detected would not be really helpful because it's entirely unclear how it should be acted upon. The only way to address such a case is to add 'if (in_shutdown)' conditionals all over the place. This is error prone and in most cases of teardown not required all. - The real fix for the bluetooth HCI teardown based on timer_shutdown_sync(). A larger scale conversion to timer_shutdown_sync() is work in progress. - Consolidation of VDSO time namespace helper functions - Small fixes for timer and timerqueue Drivers: - Prevent integer overflow on the XGene-1 TVAL register which causes an never ending interrupt storm. - The usual set of new device tree bindings - Small fixes and improvements all over the place" * tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) dt-bindings: timer: renesas,cmt: Add r8a779g0 CMT support dt-bindings: timer: renesas,tmu: Add r8a779g0 support clocksource/drivers/arm_arch_timer: Use kstrtobool() instead of strtobool() clocksource/drivers/timer-ti-dm: Fix missing clk_disable_unprepare in dmtimer_systimer_init_clock() clocksource/drivers/timer-ti-dm: Clear settings on probe and free clocksource/drivers/timer-ti-dm: Make timer_get_irq static clocksource/drivers/timer-ti-dm: Fix warning for omap_timer_match clocksource/drivers/arm_arch_timer: Fix XGene-1 TVAL register math error clocksource/drivers/timer-npcm7xx: Enable timer 1 clock before use dt-bindings: timer: nuvoton,npcm7xx-timer: Allow specifying all clocks dt-bindings: timer: rockchip: Add rockchip,rk3128-timer clockevents: Repair kernel-doc for clockevent_delta2ns() clocksource/drivers/ingenic-ost: Define pm functions properly in platform_driver struct clocksource/drivers/sh_cmt: Access registers according to spec vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy Bluetooth: hci_qca: Fix the teardown problem for real timers: Update the documentation to reflect on the new timer_shutdown() API timers: Provide timer_shutdown[_sync]() timers: Add shutdown mechanism to the internal functions timers: Split [try_to_]del_timer[_sync]() to prepare for shutdown mode ...
2022-12-12Merge patch series "RISC-V: Align the shadow stack"Palmer Dabbelt2-3/+40
Palmer Dabbelt <palmer@rivosinc.com> says: This contains a pair of cleanups that depend on a fix that has already landed upstream. * b4-shazam-merge: RISC-V: Add some comments about the shadow and overflow stacks RISC-V: Align the shadow stack riscv: fix race when vmap stack overflow Link: https://lore.kernel.org/r/20221130023515.20217-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-12RISC-V: Add some comments about the shadow and overflow stacksPalmer Dabbelt1-7/+13
It took me a while to page all this back in when trying to review the recent spin_shadow_stack, so I figured I'd just write up some comments. Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/r/20221130023515.20217-2-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-12RISC-V: Align the shadow stackPalmer Dabbelt1-1/+1
The standard RISC-V ABIs all require 16-byte stack alignment. We're only calling that one function on the shadow stack so I doubt it'd result in a real issue, but might as well keep this lined up. Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection") Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/r/20221130023515.20217-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-09Merge patch series "RISC-V: Ensure Zicbom has a valid block size"Palmer Dabbelt1-10/+33
Andrew Jones <ajones@ventanamicro.com> says: When a DT puts zicbom in the isa string, but does not provide a block size, ALT_CMO_OP() will attempt to do cache operations on address zero since the start address will be ANDed with zero. We can't simply BUG() in riscv_init_cbom_blocksize() when we fail to find a block size because the failure will happen before logging works, leaving users to scratch their heads as to why the boot hung. Instead, ensure Zicbom is disabled and output an error which will hopefully alert people that the DT needs to be fixed. While at it, add a check that the block size is a power-of-2 too. * b4-shazam-merge: RISC-V: Ensure Zicbom has a valid block size RISC-V: Introduce riscv_isa_extension_check RISC-V: Improve use of isa2hwcap[] Link: https://lore.kernel.org/r/20221129143447.49714-1-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-09RISC-V: Ensure Zicbom has a valid block sizeAndrew Jones1-0/+13
When a DT puts zicbom in the isa string, but does not provide a block size, ALT_CMO_OP() will attempt to do cache operations on address zero since the start address will be ANDed with zero. We can't simply BUG() in riscv_init_cbom_blocksize() when we fail to find a block size because the failure will happen before logging works, leaving users to scratch their heads as to why the boot hung. Instead, ensure Zicbom is disabled and output an error which will hopefully alert people that the DT needs to be fixed. While at it, add a check that the block size is a power-of-2 too. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20221129143447.49714-4-ajones@ventanamicro.com [Palmer: base on 5c20a3a9df19 ("RISC-V: Fix compilation without RISCV_ISA_ZICBOM"] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-09RISC-V: Introduce riscv_isa_extension_checkAndrew Jones1-3/+11
Currently any isa extension found in the isa string is set in the isa bitmap. An isa extension set in the bitmap indicates that the extension is present and may be used (a.k.a is enabled). However, when an extension cannot be used due to missing dependencies or errata it should not be added to the bitmap. Introduce a function where additional checks may be placed in order to determine if an extension should be enabled or not. Note, the checks may simply indicate an issue with the DT, but, since extensions may be used in early boot, it's not always possible to simply produce an error at the point the issue is determined. It's best to keep the extension disabled and produce an error. No functional change intended, as the function is only introduced and always returns true. A later patch will provide checks for an isa extension. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20221129143447.49714-3-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-09RISC-V: Improve use of isa2hwcap[]Andrew Jones1-9/+11
Improve isa2hwcap[] by removing it from static storage, as riscv_fill_hwcap() is only called once, and by reducing its size from 256 bytes to 26. The latter improvement is possible because isa2hwcap[] will never be indexed with capital letters and we can precompute the offsets from 'a'. No functional change intended. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20221129143447.49714-2-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08Merge patch "RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path"Palmer Dabbelt2-26/+26
I'm merging this in as a single patch to make it easier to handle the backports. * b4-shazam-merge: RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08RISC-V: Fix unannoted hardirqs-on in return to userspace slow-pathAndrew Bresticker2-26/+26
The return to userspace path in entry.S may enable interrupts without the corresponding lockdep annotation, producing a splat[0] when DEBUG_LOCKDEP is enabled. Simply calling __trace_hardirqs_on() here gets a bit messy due to the use of RA to point back to ret_from_exception, so just move the whole slow-path loop into C. It's more readable and it lets us use local_irq_{enable,disable}(), avoiding the need for manual annotations altogether. [0]: ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()) WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5512 check_flags+0x10a/0x1e0 Modules linked in: CPU: 2 PID: 1 Comm: init Not tainted 6.1.0-rc4-00160-gb56b6e2b4f31 #53 Hardware name: riscv-virtio,qemu (DT) epc : check_flags+0x10a/0x1e0 ra : check_flags+0x10a/0x1e0 <snip> status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [<ffffffff808edb90>] lock_is_held_type+0x78/0x14e [<ffffffff8003dae2>] __might_resched+0x26/0x22c [<ffffffff8003dd24>] __might_sleep+0x3c/0x66 [<ffffffff80022c60>] get_signal+0x9e/0xa70 [<ffffffff800054a2>] do_notify_resume+0x6e/0x422 [<ffffffff80003c68>] ret_from_exception+0x0/0x10 irq event stamp: 44512 hardirqs last enabled at (44511): [<ffffffff808f901c>] _raw_spin_unlock_irqrestore+0x54/0x62 hardirqs last disabled at (44512): [<ffffffff80008200>] __trace_hardirqs_off+0xc/0x14 softirqs last enabled at (44472): [<ffffffff808f9fbe>] __do_softirq+0x3de/0x51e softirqs last disabled at (44467): [<ffffffff80017760>] irq_exit+0xd6/0x104 ---[ end trace 0000000000000000 ]--- possible reason: unannotated irqs-on. Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> Fixes: 3c4697982982 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT") Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-07Merge tag 'v6.1-rc8' into efi/nextArd Biesheuvel6-17/+167
Linux 6.1-rc8
2022-12-07RISC-V: Export sbi_get_mvendorid() and friendsAnup Patel1-0/+3
The sbi_get_mvendorid(), sbi_get_marchid(), and sbi_get_mimpid() can be used by KVM module so let us export these functions. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-12-05riscv: stacktrace: Make walk_stackframe cross pt_regs frameGuo Ren2-1/+11
The current walk_stackframe with FRAME_POINTER would stop unwinding at ret_from_exception: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192 Call Trace: [<ffffffe0002038c8>] walk_stackframe+0x0/0xee [<ffffffe000aecf48>] show_stack+0x32/0x4a [<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e [<ffffffe000af1648>] dump_stack+0x14/0x1c [<ffffffe000239ad2>] ___might_sleep+0x12e/0x138 [<ffffffe000239aec>] __might_sleep+0x10/0x18 [<ffffffe000afe3fe>] down_read+0x22/0xa4 [<ffffffe000207588>] do_page_fault+0xb0/0x2fe [<ffffffe000201b80>] ret_from_exception+0x0/0xc The optimization would help walk_stackframe cross the pt_regs frame and get more backtrace of debug info: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192 Call Trace: [<ffffffe0002038c8>] walk_stackframe+0x0/0xee [<ffffffe000aecf48>] show_stack+0x32/0x4a [<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e [<ffffffe000af1648>] dump_stack+0x14/0x1c [<ffffffe000239ad2>] ___might_sleep+0x12e/0x138 [<ffffffe000239aec>] __might_sleep+0x10/0x18 [<ffffffe000afe3fe>] down_read+0x22/0xa4 [<ffffffe000207588>] do_page_fault+0xb0/0x2fe [<ffffffe000201b80>] ret_from_exception+0x0/0xc [<ffffffe000613c06>] riscv_intc_irq+0x1a/0x72 [<ffffffe000201b80>] ret_from_exception+0x0/0xc [<ffffffe00033f44a>] vma_link+0x54/0x160 [<ffffffe000341d7a>] mmap_region+0x2cc/0x4d0 [<ffffffe000342256>] do_mmap+0x2d8/0x3ac [<ffffffe000326318>] vm_mmap_pgoff+0x70/0xb8 [<ffffffe00032638a>] vm_mmap+0x2a/0x36 [<ffffffe0003cfdde>] elf_map+0x72/0x84 [<ffffffe0003d05f8>] load_elf_binary+0x69a/0xec8 [<ffffffe000376240>] bprm_execve+0x246/0x53a [<ffffffe00037786c>] kernel_execve+0xe8/0x124 [<ffffffe000aecdf2>] run_init_process+0xfa/0x10c [<ffffffe000aece16>] try_to_run_init_process+0x12/0x3c [<ffffffe000afa920>] kernel_init+0xb4/0xf8 [<ffffffe000201b80>] ret_from_exception+0x0/0xc Here is the error injection test code for the above output: drivers/irqchip/irq-riscv-intc.c: static asmlinkage void riscv_intc_irq(struct pt_regs *regs) { unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG; + u32 tmp; __get_user(tmp, (u32 *)0); Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20221109064937.3643993-3-guoren@kernel.org [Palmer: use SYM_CODE_*] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-05riscv: stacktrace: Fixup ftrace_graph_ret_addr retp argumentGuo Ren1-1/+1
The 'retp' is a pointer to the return address on the stack, so we must pass the current return address pointer as the 'retp' argument to ftrace_push_return_trace(). Not parent function's return address on the stack. Fixes: b785ec129bd9 ("riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20221109064937.3643993-2-guoren@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-05RISC-V: kexec: Fix memory leak of elf header bufferLi Huafei1-0/+4
This is reported by kmemleak detector: unreferenced object 0xff2000000403d000 (size 4096): comm "kexec", pid 146, jiffies 4294900633 (age 64.792s) hex dump (first 32 bytes): 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............ 04 00 f3 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000566ca97c>] kmemleak_vmalloc+0x3c/0xbe [<00000000979283d8>] __vmalloc_node_range+0x3ac/0x560 [<00000000b4b3712a>] __vmalloc_node+0x56/0x62 [<00000000854f75e2>] vzalloc+0x2c/0x34 [<00000000e9a00db9>] crash_prepare_elf64_headers+0x80/0x30c [<0000000067e8bf48>] elf_kexec_load+0x3e8/0x4ec [<0000000036548e09>] kexec_image_load_default+0x40/0x4c [<0000000079fbe1b4>] sys_kexec_file_load+0x1c4/0x322 [<0000000040c62c03>] ret_from_syscall+0x0/0x2 In elf_kexec_load(), a buffer is allocated via vzalloc() to store elf headers. While it's not freed back to system when kdump kernel is reloaded or unloaded, or when image->elf_header is successfully set and then fails to load kdump kernel for some reason. Fix it by freeing the buffer in arch_kimage_file_post_load_cleanup(). Fixes: 8acea455fafa ("RISC-V: Support for kexec_file on panic") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221104095658.141222-2-lihuafei1@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-05RISC-V: kexec: Fix memory leak of fdt bufferLi Huafei1-0/+10
This is reported by kmemleak detector: unreferenced object 0xff60000082864000 (size 9588): comm "kexec", pid 146, jiffies 4294900634 (age 64.788s) hex dump (first 32 bytes): d0 0d fe ed 00 00 12 ed 00 00 00 48 00 00 11 40 ...........H...@ 00 00 00 28 00 00 00 11 00 00 00 02 00 00 00 00 ...(............ backtrace: [<00000000f95b17c4>] kmemleak_alloc+0x34/0x3e [<00000000b9ec8e3e>] kmalloc_order+0x9c/0xc4 [<00000000a95cf02e>] kmalloc_order_trace+0x34/0xb6 [<00000000f01e68b4>] __kmalloc+0x5c2/0x62a [<000000002bd497b2>] kvmalloc_node+0x66/0xd6 [<00000000906542fa>] of_kexec_alloc_and_setup_fdt+0xa6/0x6ea [<00000000e1166bde>] elf_kexec_load+0x206/0x4ec [<0000000036548e09>] kexec_image_load_default+0x40/0x4c [<0000000079fbe1b4>] sys_kexec_file_load+0x1c4/0x322 [<0000000040c62c03>] ret_from_syscall+0x0/0x2 In elf_kexec_load(), a buffer is allocated via kvmalloc() to store fdt. While it's not freed back to system when kexec kernel is reloaded or unloaded. Then memory leak is caused. Fix it by introducing riscv specific function arch_kimage_file_post_load_cleanup(), and freeing the buffer there. Fixes: 6261586e0c91 ("RISC-V: Add kexec_file support") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Liao Chang <liaochang1@huawei.com> Link: https://lore.kernel.org/r/20221104095658.141222-1-lihuafei1@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02Merge patch series "Support VMCOREINFO export for RISCV64"Palmer Dabbelt2-0/+22
Add arch_crash_save_vmcoreinfo(), which exports VM layout(MODULES, VMALLOC, VMEMMAP ranges and KERNEL_LINK_ADDR), va bits and ram base for vmcore. * b4-shazam-merge: Documentation: kdump: describe VMCOREINFO export for RISCV64 RISC-V: Add arch_crash_save_vmcoreinfo support Link: https://lore.kernel.org/r/20221026144208.373504-1-xianting.tian@linux.alibaba.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02RISC-V: Add arch_crash_save_vmcoreinfo supportXianting Tian2-0/+22
Add arch_crash_save_vmcoreinfo(), which exports VM layout(MODULES, VMALLOC, VMEMMAP ranges and KERNEL_LINK_ADDR), va bits and ram base for vmcore. Default pagetable levels and PAGE_OFFSET aren't same for different kernel version as below. For pagetable levels, it sets sv57 by default and falls back to setting sv48 at boot time if sv57 is not supported by the hardware. For ram base, the default value is 0x80200000 for qemu riscv64 env and, for example, is 0x200000 on the XuanTie 910 CPU. * Linux Kernel 5.18 ~ * PGTABLE_LEVELS = 5 * PAGE_OFFSET = 0xff60000000000000 * Linux Kernel 5.17 ~ * PGTABLE_LEVELS = 4 * PAGE_OFFSET = 0xffffaf8000000000 * Linux Kernel 4.19 ~ * PGTABLE_LEVELS = 3 * PAGE_OFFSET = 0xffffffe000000000 Since these configurations change from time to time and version to version, it is preferable to export them via vmcoreinfo than to change the crash's code frequently, it can simplify the development of crash tool. Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Tested-by: Deepak Gupta <debug@rivosinc.com> Tested-by: Guo Ren <guoren@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20221026144208.373504-2-xianting.tian@linux.alibaba.com [Palmer: wrap commit text] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02riscv: add riscv rethook implementationBinglei Wang5-17/+39
Implement the kretprobes on riscv arch by using rethook machenism which abstracts general kretprobe info into a struct rethook_node to be embedded in the struct kretprobe_instance. Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Binglei Wang <l3b2w1@gmail.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221025151831.1097417-1-conor@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02Merge patch series "RISC-V: Dynamic ftrace support for RV32I"Palmer Dabbelt1-21/+23
Jamie Iles <jamie@jamieiles.com> says: This series enables dynamic ftrace support for RV32I bringing it to parity with RV64I. Most of the work is already there, this is largely just assembly fixes to handle register sizes, correct handling of the psABI calling convention and Kconfig change. Validated with all ftrace boot time self test with qemu for RV32I and RV64I in addition to real tracing on an RV32I FPGA design. * b4-shazam-merge: RISC-V: enable dynamic ftrace for RV32I RISC-V: preserve a1 in mcount RISC-V: reduce mcount save space on RV32 RISC-V: use REG_S/REG_L for mcount Link: https://lore.kernel.org/r/20221115200832.706370-1-jamie@jamieiles.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02RISC-V: preserve a1 in mcountJamie Iles1-2/+4
The RISC-V ELF psABI states that both a0 and a1 are used for return values so we should preserve them both in return_to_handler. This is especially important for RV32 for functions returning a 64-bit quantity otherwise the return value can be corrupted and undefined behaviour results. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Jamie Iles <jamie@jamieiles.com> Link: https://lore.kernel.org/r/20221115200832.706370-4-jamie@jamieiles.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02RISC-V: reduce mcount save space on RV32Jamie Iles1-16/+16
For RV32 we can reduce the size of the ABI save+restore state by using SZREG so that register stores are packed rather than on an 8 byte boundary. Signed-off-by: Jamie Iles <jamie@jamieiles.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20221115200832.706370-3-jamie@jamieiles.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02RISC-V: use REG_S/REG_L for mcountJamie Iles1-15/+15
In preparation for rv32i ftrace support, convert mcount routines to use native sized loads/stores. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Jamie Iles <jamie@jamieiles.com> Link: https://lore.kernel.org/r/20221115200832.706370-2-jamie@jamieiles.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-01RISC-V: Fix a race condition during kernel stack overflowPalmer Dabbelt2-0/+31
This fixes a concrete bug but is also the basis for some cleanup work, so I'm merging it based on the offending commit in order to minimize future conflicts. * commit '7e1864332fbc1b993659eab7974da9fe8bf8c128': riscv: fix race when vmap stack overflow
2022-12-01vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copyJann Horn1-22/+0
find_timens_vvar_page() is not architecture-specific, as can be seen from how all five per-architecture versions of it are the same. (arm64, powerpc and riscv are exactly the same; x86 and s390 have two characters difference inside a comment, less blank lines, and mark the !CONFIG_TIME_NS version as inline.) Refactor the five copies into a central copy in kernel/time/namespace.c. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221130115320.2918447-1-jannh@google.com
2022-11-29Merge patch series "riscv: kexec: Fxiup crash_save percpu and ↵Palmer Dabbelt2-13/+130
machine_kexec_mask_interrupts" guoren@kernel.org <guoren@kernel.org> says: From: Guo Ren <guoren@linux.alibaba.com> Current riscv kexec can't crash_save percpu states and disable interrupts properly. The patch series fix them, make kexec work correct. * b4-shazam-merge: riscv: kexec: Fixup crash_smp_send_stop without multi cores riscv: kexec: Fixup irq controller broken in kexec crash path Link: https://lore.kernel.org/r/20221020141603.2856206-1-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-11-29riscv: kexec: Fixup crash_smp_send_stop without multi coresGuo Ren2-18/+100
Current crash_smp_send_stop is the same as the generic one in kernel/panic and misses crash_save_cpu in percpu. This patch is inspired by 78fd584cdec0 ("arm64: kdump: implement machine_crash_shutdown()") and adds the same mechanism for riscv. Before this patch, test result: crash> help -r CPU 0: [OFFLINE] CPU 1: epc : ffffffff80009ff0 ra : ffffffff800b789a sp : ff2000001098bb40 gp : ffffffff815fca60 tp : ff60000004680000 t0 : 6666666666663c5b t1 : 0000000000000000 t2 : 666666666666663c s0 : ff2000001098bc90 s1 : ffffffff81600798 a0 : ff2000001098bb48 a1 : 0000000000000000 a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000000 a5 : ff60000004690800 a6 : 0000000000000000 a7 : 0000000000000000 s2 : ff2000001098bb48 s3 : ffffffff81093ec8 s4 : ffffffff816004ac s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80e7f720 s8 : 00fffffffffff3f0 s9 : 0000000000000007 s10: 00aaaaaaaab98700 s11: 0000000000000001 t3 : ffffffff819a8097 t4 : ffffffff819a8097 t5 : ffffffff819a8098 t6 : ff2000001098b9a8 CPU 2: [OFFLINE] CPU 3: [OFFLINE] After this patch, test result: crash> help -r CPU 0: epc : ffffffff80003f34 ra : ffffffff808caa7c sp : ffffffff81403eb0 gp : ffffffff815fcb48 tp : ffffffff81413400 t0 : 0000000000000000 t1 : 0000000000000000 t2 : 0000000000000000 s0 : ffffffff81403ec0 s1 : 0000000000000000 a0 : 0000000000000000 a1 : 0000000000000000 a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000 s2 : ffffffff816001c8 s3 : ffffffff81600370 s4 : ffffffff80c32e18 s5 : ffffffff819d3018 s6 : ffffffff810e2110 s7 : 0000000000000000 s8 : 0000000000000000 s9 : 0000000080039eac s10: 0000000000000000 s11: 0000000000000000 t3 : 0000000000000000 t4 : 0000000000000000 t5 : 0000000000000000 t6 : 0000000000000000 CPU 1: epc : ffffffff80003f34 ra : ffffffff808caa7c sp : ff2000000068bf30 gp : ffffffff815fcb48 tp : ff6000000240d400 t0 : 0000000000000000 t1 : 0000000000000000 t2 : 0000000000000000 s0 : ff2000000068bf40 s1 : 0000000000000001 a0 : 0000000000000000 a1 : 0000000000000000 a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000 s2 : ffffffff816001c8 s3 : ffffffff81600370 s4 : ffffffff80c32e18 s5 : ffffffff819d3018 s6 : ffffffff810e2110 s7 : 0000000000000000 s8 : 0000000000000000 s9 : 0000000080039ea8 s10: 0000000000000000 s11: 0000000000000000 t3 : 0000000000000000 t4 : 0000000000000000 t5 : 0000000000000000 t6 : 0000000000000000 CPU 2: epc : ffffffff80003f34 ra : ffffffff808caa7c sp : ff20000000693f30 gp : ffffffff815fcb48 tp : ff6000000240e900 t0 : 0000000000000000 t1 : 0000000000000000 t2 : 0000000000000000 s0 : ff20000000693f40 s1 : 0000000000000002 a0 : 0000000000000000 a1 : 0000000000000000 a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000 s2 : ffffffff816001c8 s3 : ffffffff81600370 s4 : ffffffff80c32e18 s5 : ffffffff819d3018 s6 : ffffffff810e2110 s7 : 0000000000000000 s8 : 0000000000000000 s9 : 0000000080039eb0 s10: 0000000000000000 s11: 0000000000000000 t3 : 0000000000000000 t4 : 0000000000000000 t5 : 0000000000000000 t6 : 0000000000000000 CPU 3: epc : ffffffff8000a1e4 ra : ffffffff800b7bba sp : ff200000109bbb40 gp : ffffffff815fcb48 tp : ff6000000373aa00 t0 : 6666666666663c5b t1 : 0000000000000000 t2 : 666666666666663c s0 : ff200000109bbc90 s1 : ffffffff816007a0 a0 : ff200000109bbb48 a1 : 0000000000000000 a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000000 a5 : ff60000002c61c00 a6 : 0000000000000000 a7 : 0000000000000000 s2 : ff200000109bbb48 s3 : ffffffff810941a8 s4 : ffffffff816004b4 s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80e7f7a0 s8 : 00fffffffffff3f0 s9 : 0000000000000007 s10: 00aaaaaaaab98700 s11: 0000000000000001 t3 : ffffffff819a8097 t4 : ffffffff819a8097 t5 : ffffffff819a8098 t6 : ff200000109bb9a8 Fixes: ad943893d5f1 ("RISC-V: Fixup schedule out issue in machine_crash_shutdown()") Reviewed-by: Xianting Tian <xianting.tian@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Cc: Nick Kossifidis <mick@ics.forth.gr> Link: https://lore.kernel.org/r/20221020141603.2856206-3-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-11-29riscv: kexec: Fixup irq controller broken in kexec crash pathGuo Ren1-0/+35
If a crash happens on cpu3 and all interrupts are binding on cpu0, the bad irq routing will cause a crash kernel which can't receive any irq. Because crash kernel won't clean up all harts' PLIC enable bits in enable registers. This patch is similar to 9141a003a491 ("ARM: 7316/1: kexec: EOI active and mask all interrupts in kexec crash path") and 78fd584cdec0 ("arm64: kdump: implement machine_crash_shutdown()"), and PowerPC also has the same mechanism. Fixes: fba8a8674f68 ("RISC-V: Add kexec support") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Xianting Tian <xianting.tian@linux.alibaba.com> Cc: Nick Kossifidis <mick@ics.forth.gr> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20221020141603.2856206-2-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-11-29riscv: mm: Proper page permissions after initmem freeBjörn Töpel1-4/+5
64-bit RISC-V kernels have the kernel image mapped separately to alias the linear map. The linear map and the kernel image map are documented as "direct mapping" and "kernel" respectively in [1]. At image load time, the linear map corresponding to the kernel image is set to PAGE_READ permission, and the kernel image map is set to PAGE_READ|PAGE_EXEC. When the initmem is freed, the pages in the linear map should be restored to PAGE_READ|PAGE_WRITE, whereas the corresponding pages in the kernel image map should be restored to PAGE_READ, by removing the PAGE_EXEC permission. This is not the case. For 64-bit kernels, only the linear map is restored to its proper page permissions at initmem free, and not the kernel image map. In practise this results in that the kernel can potentially jump to dead __init code, and start executing invalid instructions, without getting an exception. Restore the freed initmem properly, by setting both the kernel image map to the correct permissions. [1] Documentation/riscv/vm-layout.rst Fixes: e5c35fa04019 ("riscv: Map the kernel with correct permissions the first time") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Alexandre Ghiti <alex@ghiti.fr> Tested-by: Alexandre Ghiti <alex@ghiti.fr> Link: https://lore.kernel.org/r/20221115090641.258476-1-bjorn@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-11-29riscv: vdso: fix section overlapping under some conditionsJisheng Zhang1-0/+1
lkp reported a build error, I tried the config and can reproduce build error as below: VDSOLD arch/riscv/kernel/vdso/vdso.so.dbg ld.lld: error: section .note file range overlaps with .text >>> .note range is [0x7C8, 0x803] >>> .text range is [0x800, 0x1993] ld.lld: error: section .text file range overlaps with .dynamic >>> .text range is [0x800, 0x1993] >>> .dynamic range is [0x808, 0x937] ld.lld: error: section .note virtual address range overlaps with .text >>> .note range is [0x7C8, 0x803] >>> .text range is [0x800, 0x1993] Fix it by setting DISABLE_BRANCH_PROFILING which will disable branch tracing for vdso, thus avoid useless _ftrace_annotated_branch section and _ftrace_branch section. Although we can also fix it by removing the hardcoded .text begin address, but I think that's another story and should be put into another patch. Link: https://lore.kernel.org/lkml/202210122123.Cc4FPShJ-lkp@intel.com/#r Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/r/20221102170254.1925-1-jszhang@kernel.org Fixes: ad5d1122b82f ("riscv: use vDSO common flow to reduce the latency of the time-related functions") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-11-29riscv: fix race when vmap stack overflowJisheng Zhang2-0/+31
Currently, when detecting vmap stack overflow, riscv firstly switches to the so called shadow stack, then use this shadow stack to call the get_overflow_stack() to get the overflow stack. However, there's a race here if two or more harts use the same shadow stack at the same time. To solve this race, we introduce spin_shadow_stack atomic var, which will be swap between its own address and 0 in atomic way, when the var is set, it means the shadow_stack is being used; when the var is cleared, it means the shadow_stack isn't being used. Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Suggested-by: Guo Ren <guoren@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20221030124517.2370-1-jszhang@kernel.org [Palmer: Add AQ to the swap, and also some comments.] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-11-18Merge tag 'efi-zboot-direct-for-v6.2' into efi/nextArd Biesheuvel1-6/+0
2022-11-10RISC-V: vdso: Do not add missing symbols to version section in linker scriptNathan Chancellor2-0/+5
Recently, ld.lld moved from '--undefined-version' to '--no-undefined-version' as the default, which breaks the compat vDSO build: ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_gettimeofday' failed: symbol not defined ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_clock_gettime' failed: symbol not defined ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_clock_getres' failed: symbol not defined These symbols are not present in the compat vDSO or the regular vDSO for 32-bit but they are unconditionally included in the version section of the linker script, which is prohibited with '--no-undefined-version'. Fix this issue by only including the symbols that are actually exported in the version section of the linker script. Link: https://github.com/ClangBuiltLinux/linux/issues/1756 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221108171324.3377226-1-nathan@kernel.org/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-11-10riscv: fix reserved memory setupConor Dooley1-0/+1
Currently, RISC-V sets up reserved memory using the "early" copy of the device tree. As a result, when trying to get a reserved memory region using of_reserved_mem_lookup(), the pointer to reserved memory regions is using the early, pre-virtual-memory address which causes a kernel panic when trying to use the buffer's name: Unable to handle kernel paging request at virtual address 00000000401c31ac Oops [#1] Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1 Hardware name: Microchip PolarFire-SoC Icicle Kit (DT) epc : string+0x4a/0xea ra : vsnprintf+0x1e4/0x336 epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0 gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000 t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20 s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000 a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008 s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00 s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002 s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617 t5 : ffffffff812f3618 t6 : ffffffff81203d08 status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d [<ffffffff80338936>] vsnprintf+0x1e4/0x336 [<ffffffff80055ae2>] vprintk_store+0xf6/0x344 [<ffffffff80055d86>] vprintk_emit+0x56/0x192 [<ffffffff80055ed8>] vprintk_default+0x16/0x1e [<ffffffff800563d2>] vprintk+0x72/0x80 [<ffffffff806813b2>] _printk+0x36/0x50 [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24 [<ffffffff808057ec>] paging_init+0x528/0x5bc [<ffffffff808031ae>] setup_arch+0xd0/0x592 [<ffffffff8080070e>] start_kernel+0x82/0x73c early_init_fdt_scan_reserved_mem() takes no arguments as it operates on initial_boot_params, which is populated by early_init_dt_verify(). On RISC-V, early_init_dt_verify() is called twice. Once, directly, in setup_arch() if CONFIG_BUILTIN_DTB is not enabled and once indirectly, very early in the boot process, by parse_dtb() when it calls early_init_dt_scan_nodes(). This first call uses dtb_early_va to set initial_boot_params, which is not usable later in the boot process when early_init_fdt_scan_reserved_mem() is called. On arm64 for example, the corresponding call to early_init_dt_scan_nodes() uses fixmap addresses and doesn't suffer the same fate. Move early_init_fdt_scan_reserved_mem() further along the boot sequence, after the direct call to early_init_dt_verify() in setup_arch() so that the names use the correct virtual memory addresses. The above supposed that CONFIG_BUILTIN_DTB was not set, but should work equally in the case where it is - unflatted_and_copy_device_tree() also updates initial_boot_params. Reported-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com> Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com> Link: https://lore.kernel.org/linux-riscv/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/ Fixes: 922b0375fc93 ("riscv: Fix memblock reservation for device tree blob") Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com> Link: https://lore.kernel.org/r/20221107151524.3941467-1-conor.dooley@microchip.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-11-10riscv: vdso: fix build with llvmJisheng Zhang1-1/+1
Even after commit 89fd4a1df829 ("riscv: jump_label: mark arguments as const to satisfy asm constraints"), building with CC_OPTIMIZE_FOR_SIZE + LLVM=1 can reproduce below build error: CC arch/riscv/kernel/vdso/vgettimeofday.o In file included from <built-in>:4: In file included from lib/vdso/gettimeofday.c:5: In file included from include/vdso/datapage.h:17: In file included from include/vdso/processor.h:10: In file included from arch/riscv/include/asm/vdso/processor.h:7: In file included from include/linux/jump_label.h:112: arch/riscv/include/asm/jump_label.h:42:3: error: invalid operand for inline asm constraint 'i' " .option push \n\t" ^ 1 error generated. I think the problem is when "-Os" is passed as CFLAGS, it's removed by "CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os" which is introduced in commit e05d57dcb8c7 ("riscv: Fixup __vdso_gettimeofday broke dynamic ftrace"), thus no optimization at all for vgettimeofday.c arm64 does remove "-Os" as well, but it forces "-O2" after removing "-Os". I compared the generated vgettimeofday.o with "-O2" and "-Os", I think no big performance difference. So let's tell the kbuild not to remove "-Os" rather than follow arm64 style. vdso related performance can be improved a lot when building kernel with CC_OPTIMIZE_FOR_SIZE after this commit, ("-Os" VS no optimization) Fixes: e05d57dcb8c7 ("riscv: Fixup __vdso_gettimeofday broke dynamic ftrace") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Tested-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221031182943.2453-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-11-10riscv: process: fix kernel info leakageJisheng Zhang1-0/+2
thread_struct's s[12] may contain random kernel memory content, which may be finally leaked to userspace. This is a security hole. Fix it by clearing the s[12] array in thread_struct when fork. As for kthread case, it's better to clear the s[12] array as well. Fixes: 7db91e57a0ac ("RISC-V: Task implementation") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Tested-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20221029113450.4027-1-jszhang@kernel.org Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/CAJF2gTSdVyAaM12T%2B7kXAdRPGS4VyuO08X1c7paE-n4Fr8OtRA@mail.gmail.com/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-11-09efi: libstub: Provide local implementations of strrchr() and memchr()Ard Biesheuvel1-2/+0
Clone the implementations of strrchr() and memchr() in lib/string.c so we can use them in the standalone zboot decompressor app. These routines are used by the FDT handling code. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-09efi: libstub: Enable efi_printk() in zboot decompressorArd Biesheuvel1-2/+0
Split the efi_printk() routine into its own source file, and provide local implementations of strlen() and strnlen() so that the standalone zboot app can efi_err and efi_info etc. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-09efi: libstub: Clone memcmp() into the stubArd Biesheuvel1-1/+0
We will no longer be able to call into the kernel image once we merge the decompressor with the EFI stub, so we need our own implementation of memcmp(). Let's add the one from lib/string.c and simplify it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-09efi: libstub: Use local strncmp() implementation unconditionallyArd Biesheuvel1-1/+0
In preparation for moving the EFI stub functionality into the zboot decompressor, switch to the stub's implementation of strncmp() unconditionally. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-10-27RISC-V: Fix /proc/cpuinfo cpumask warningAndrew Jones1-0/+3
Commit 78e5a3399421 ("cpumask: fix checking valid cpu range") has started issuing warnings[*] when cpu indices equal to nr_cpu_ids - 1 are passed to cpumask_next* functions. seq_read_iter() and cpuinfo's start and next seq operations implement a pattern like n = cpumask_next(n - 1, mask); show(n); while (1) { ++n; n = cpumask_next(n - 1, mask); if (n >= nr_cpu_ids) break; show(n); } which will issue the warning when reading /proc/cpuinfo. Ensure no warning is generated by validating the cpu index before calling cpumask_next(). [*] Warnings will only appear with DEBUG_PER_CPU_MAPS enabled. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20221014155845.1986223-2-ajones@ventanamicro.com/ Fixes: 78e5a3399421 ("cpumask: fix checking valid cpu range") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-27RISC-V: Cache SBI vendor valuesHeiko Stuebner1-3/+27
sbi_get_mvendorid(), sbi_get_marchid() and sbi_get_mimpid() might get called multiple times, though the values of these CSRs should not change during the runtime of a specific machine. Though the values can be different depending on which hart of the system they get called. So hook into the newly introduced cpuinfo struct to allow retrieving these cached values via new functions. Also use arch_initcall for the cpuinfo setup instead, as that now clearly is "architecture specific initialization" and also makes these information available slightly earlier. [caching vendor ids] Suggested-by: Atish Patra <atishp@atishpatra.org> [using cpuinfo struct as cache] Suggested-by: Anup Patel <apatel@ventanamicro.com> Link: https://lore.kernel.org/all/20221011231841.2951264-2-heiko@sntech.de/ Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-14Merge tag 'riscv-for-linus-6.1-mw2' of ↵Linus Torvalds6-34/+85
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - DT updates for the PolarFire SOC - a fix to correct the handling of write-only mappings - m{vetndor,arcd,imp}id is now in /proc/cpuinfo - the SiFive L2 cache controller support has been refactored to also support L3 caches - misc fixes, cleanups and improvements throughout the tree * tag 'riscv-for-linus-6.1-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits) MAINTAINERS: add RISC-V's patchwork RISC-V: Make port I/O string accessors actually work riscv: enable software resend of irqs RISC-V: Re-enable counter access from userspace riscv: vdso: fix NULL deference in vdso_join_timens() when vfork riscv: Add cache information in AUX vector soc: sifive: ccache: define the macro for the register shifts soc: sifive: ccache: use pr_fmt() to remove CCACHE: prefixes soc: sifive: ccache: reduce printing on init soc: sifive: ccache: determine the cache level from dts soc: sifive: ccache: Rename SiFive L2 cache to Composable cache. dt-bindings: sifive-ccache: change Sifive L2 cache to Composable cache riscv: check for kernel config option in t-head memory types errata riscv: use BIT() marco for cpufeature probing riscv: use BIT() macros in t-head errata init riscv: drop some idefs from CMO initialization riscv: cleanup svpbmt cpufeature probing riscv: Pass -mno-relax only on lld < 15.0.0 RISC-V: Avoid dereferening NULL regs in die() dt-bindings: riscv: add new riscv,isa strings for emulators ...
2022-10-13RISC-V: Add mvendorid, marchid, and mimpid to /proc/cpuinfo outputPalmer Dabbelt1-0/+51
I'm merging this in as a single commit as it's a dependency for some other work. * commit '3baca1a4d490484fcd555413f1fec85b2e071912': RISC-V: Add mvendorid, marchid, and mimpid to /proc/cpuinfo output
2022-10-13RISC-V: Make mmap() with PROT_WRITE imply PROT_READPalmer Dabbelt1-3/+0
Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid") made mmap() reject mappings with only PROT_WRITE set in an attempt to fix an observed inconsistency in behavior when attempting to read from a PROT_WRITE-only mapping. The root cause of this behavior was actually that while RISC-V's protection_map maps VM_WRITE to readable PTE permissions (since write-only PTEs are considered reserved by the privileged spec), the page fault handler considered loads from VM_WRITE-only VMAs illegal accesses. Fix the underlying cause by handling faults in VM_WRITE-only VMAs (patch 1) and then re-enable use of mmap(PROT_WRITE) (patch 2), making RISC-V's behavior consistent with all other architectures that don't support write-only PTEs. * remotes/palmer/riscv-wonly: riscv: Allow PROT_WRITE-only mmap() riscv: Make VM_WRITE imply VM_READ Link: https://lore.kernel.org/r/20220915193702.2201018-1-abrestic@rivosinc.com/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-13riscv: vdso: fix NULL deference in vdso_join_timens() when vforkJisheng Zhang1-3/+10
Testing tools/testing/selftests/timens/vfork_exec.c got below kernel log: [ 6.838454] Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000020 [ 6.842255] Oops [#1] [ 6.842871] Modules linked in: [ 6.844249] CPU: 1 PID: 64 Comm: vfork_exec Not tainted 6.0.0-rc3-rt15+ #8 [ 6.845861] Hardware name: riscv-virtio,qemu (DT) [ 6.848009] epc : vdso_join_timens+0xd2/0x110 [ 6.850097] ra : vdso_join_timens+0xd2/0x110 [ 6.851164] epc : ffffffff8000635c ra : ffffffff8000635c sp : ff6000000181fbf0 [ 6.852562] gp : ffffffff80cff648 tp : ff60000000fdb700 t0 : 3030303030303030 [ 6.853852] t1 : 0000000000000030 t2 : 3030303030303030 s0 : ff6000000181fc40 [ 6.854984] s1 : ff60000001e6c000 a0 : 0000000000000010 a1 : ffffffff8005654c [ 6.856221] a2 : 00000000ffffefff a3 : 0000000000000000 a4 : 0000000000000000 [ 6.858114] a5 : 0000000000000000 a6 : 0000000000000008 a7 : 0000000000000038 [ 6.859484] s2 : ff60000001e6c068 s3 : ff6000000108abb0 s4 : 0000000000000000 [ 6.860751] s5 : 0000000000001000 s6 : ffffffff8089dc40 s7 : ffffffff8089dc38 [ 6.862029] s8 : ffffffff8089dc30 s9 : ff60000000fdbe38 s10: 000000000000005e [ 6.863304] s11: ffffffff80cc3510 t3 : ffffffff80d1112f t4 : ffffffff80d1112f [ 6.864565] t5 : ffffffff80d11130 t6 : ff6000000181fa00 [ 6.865561] status: 0000000000000120 badaddr: 0000000000000020 cause: 000000000000000d [ 6.868046] [<ffffffff8008dc94>] timens_commit+0x38/0x11a [ 6.869089] [<ffffffff8008dde8>] timens_on_fork+0x72/0xb4 [ 6.870055] [<ffffffff80190096>] begin_new_exec+0x3c6/0x9f0 [ 6.871231] [<ffffffff801d826c>] load_elf_binary+0x628/0x1214 [ 6.872304] [<ffffffff8018ee7a>] bprm_execve+0x1f2/0x4e4 [ 6.873243] [<ffffffff8018f90c>] do_execveat_common+0x16e/0x1ee [ 6.874258] [<ffffffff8018f9c8>] sys_execve+0x3c/0x48 [ 6.875162] [<ffffffff80003556>] ret_from_syscall+0x0/0x2 [ 6.877484] ---[ end trace 0000000000000000 ]--- This is because the mm->context.vdso_info is NULL in vfork case. From another side, mm->context.vdso_info either points to vdso info for RV64 or vdso info for compat, there's no need to bloat riscv's mm_context_t, we can handle the difference when setup the additional page for vdso. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Suggested-by: Palmer Dabbelt <palmer@rivosinc.com> Fixes: 3092eb456375 ("riscv: compat: vdso: Add setup additional pages implementation") Link: https://lore.kernel.org/r/20220924070737.3048-1-jszhang@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>