aboutsummaryrefslogtreecommitdiff
path: root/arch/riscv/kernel/probes
AgeCommit message (Collapse)AuthorFilesLines
2024-07-24trace: riscv: Remove deprecated kprobe on ftrace supportJinjie Ruan2-66/+0
Since commit 7caa9765465f60 ("ftrace: riscv: move from REGS to ARGS"), kprobe on ftrace is not supported by riscv, because riscv's support for FTRACE_WITH_REGS has been replaced with support for FTRACE_WITH_ARGS, and KPROBES_ON_FTRACE will be supplanted by FPROBES. So remove the deprecated kprobe on ftrace support, which is misunderstood. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240613111347.1745379-1-ruanjinjie@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: Pass patch_text() the length in bytesSamuel Holland1-8/+10
patch_text_nosync() already handles an arbitrary length of code, so this removes a superfluous loop and reduces the number of icache flushes. Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240327160520.791322-6-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-26riscv: kprobes: Use patch_text_nosync() for insn slotsSamuel Holland1-3/+2
These instructions are not yet visible to the rest of the system, so there is no need to do the whole stop_machine() dance. Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240327160520.791322-4-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-19Merge tag 'mm-stable-2024-05-17-19-19' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ...
2024-05-17Merge tag 'probes-v6.10' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: - tracing/probes: Add new pseudo-types %pd and %pD support for dumping dentry name from 'struct dentry *' and file name from 'struct file *' - uprobes performance optimizations: - Speed up the BPF uprobe event by delaying the fetching of the uprobe event arguments that are not used in BPF - Avoid locking by speculatively checking whether uprobe event is valid - Reduce lock contention by using read/write_lock instead of spinlock for uprobe list operation. This improved BPF uprobe benchmark result 43% on average - rethook: Remove non-fatal warning messages when tracing stack from BPF and skip rcu_is_watching() validation in rethook if possible - objpool: Optimize objpool (which is used by kretprobes and fprobe as rethook backend storage) by inlining functions and avoid caching nr_cpu_ids because it is a const value - fprobe: Add entry/exit callbacks types (code cleanup) - kprobes: Check ftrace was killed in kprobes if it uses ftrace * tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: kprobe/ftrace: bail out if ftrace was killed selftests/ftrace: Fix required features for VFS type test case objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids objpool: enable inlining objpool_push() and objpool_pop() operations rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get() ftrace: make extra rcu_is_watching() validation check optional uprobes: reduce contention on uprobes_tree access rethook: Remove warning messages printed for finding return address of a frame. fprobe: Add entry/exit callbacks types selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD" selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD" Documentation: tracing: add new type '%pd' and '%pD' for kprobe tracing/probes: support '%pD' type for print struct file's name tracing/probes: support '%pd' type for print struct dentry's name uprobes: add speculative lockless system-wide uprobe filter check uprobes: prepare uprobe args buffer lazily uprobes: encapsulate preparation of uprobe args buffer
2024-05-16kprobe/ftrace: bail out if ftrace was killedStephen Brennan1-0/+3
If an error happens in ftrace, ftrace_kill() will prevent disarming kprobes. Eventually, the ftrace_ops associated with the kprobes will be freed, yet the kprobes will still be active, and when triggered, they will use the freed memory, likely resulting in a page fault and panic. This behavior can be reproduced quite easily, by creating a kprobe and then triggering a ftrace_kill(). For simplicity, we can simulate an ftrace error with a kernel module like [1]: [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer sudo perf probe --add commit_creds sudo perf trace -e probe:commit_creds # In another terminal make sudo insmod ftrace_killer.ko # calls ftrace_kill(), simulating bug # Back to perf terminal # ctrl-c sudo perf probe --del commit_creds After a short period, a page fault and panic would occur as the kprobe continues to execute and uses the freed ftrace_ops. While ftrace_kill() is supposed to be used only in extreme circumstances, it is invoked in FTRACE_WARN_ON() and so there are many places where an unexpected bug could be triggered, yet the system may continue operating, possibly without the administrator noticing. If ftrace_kill() does not panic the system, then we should do everything we can to continue operating, rather than leave a ticking time bomb. Link: https://lore.kernel.org/all/20240501162956.229427-1-stephen.s.brennan@oracle.com/ Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-14riscv: extend execmem_params for generated code allocationsMike Rapoport (IBM)1-10/+0
The memory allocations for kprobes and BPF on RISC-V are not placed in the modules area and these custom allocations are implemented with overrides of alloc_insn_page() and bpf_jit_alloc_exec(). Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END for 32 bit and slightly reorder execmem_params initialization to support both 32 and 64 bit variants, define EXECMEM_KPROBES and EXECMEM_BPF ranges in riscv::execmem_params and drop overrides of alloc_insn_page() and bpf_jit_alloc_exec(). Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-04-25fix missing vmalloc.h includesKent Overstreet1-0/+1
Patch series "Memory allocation profiling", v6. Overview: Low overhead [1] per-callsite memory allocation profiling. Not just for debug kernels, overhead low enough to be deployed in production. Example output: root@moria-kvm:~# sort -rn /proc/allocinfo 127664128 31168 mm/page_ext.c:270 func:alloc_page_ext 56373248 4737 mm/slub.c:2259 func:alloc_slab_page 14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded 14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash 13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs 11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio 9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node 4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable 4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start 3940352 962 mm/memory.c:4214 func:alloc_anon_folio 2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node ... Usage: kconfig options: - CONFIG_MEM_ALLOC_PROFILING - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT - CONFIG_MEM_ALLOC_PROFILING_DEBUG adds warnings for allocations that weren't accounted because of a missing annotation sysctl: /proc/sys/vm/mem_profiling Runtime info: /proc/allocinfo Notes: [1]: Overhead To measure the overhead we are comparing the following configurations: (1) Baseline with CONFIG_MEMCG_KMEM=n (2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) (3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) (4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1) (5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT (6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y (7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y Performance overhead: To evaluate performance we implemented an in-kernel test executing multiple get_free_page/free_page and kmalloc/kfree calls with allocation sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU affinity set to a specific CPU to minimize the noise. Below are results from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on 56 core Intel Xeon: kmalloc pgalloc (1 baseline) 6.764s 16.902s (2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%) (3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%) (4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%) (5 memcg) 13.388s (+97.94%) 48.460s (+186.71%) (6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%) (7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%) Memory overhead: Kernel size: text data bss dec diff (1) 26515311 18890222 17018880 62424413 (2) 26524728 19423818 16740352 62688898 264485 (3) 26524724 19423818 16740352 62688894 264481 (4) 26524728 19423818 16740352 62688898 264485 (5) 26541782 18964374 16957440 62463596 39183 Memory consumption on a 56 core Intel CPU with 125GB of memory: Code tags: 192 kB PageExts: 262144 kB (256MB) SlabExts: 9876 kB (9.6MB) PcpuExts: 512 kB (0.5MB) Total overhead is 0.2% of total memory. Benchmarks: Hackbench tests run 100 times: hackbench -s 512 -l 200 -g 15 -f 25 -P baseline disabled profiling enabled profiling avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023) stdev 0.0137 0.0188 0.0077 hackbench -l 10000 baseline disabled profiling enabled profiling avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859) stdev 0.0933 0.0286 0.0489 stress-ng tests: stress-ng --class memory --seq 4 -t 60 stress-ng --class cpu --seq 4 -t 60 Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/ [2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/ This patch (of 37): The next patch drops vmalloc.h from a system header in order to fix a circular dependency; this adds it to all the files that were pulling it in implicitly. [kent.overstreet@linux.dev: fix arch/alpha/lib/memcpy.c] Link: https://lkml.kernel.org/r/20240327002152.3339937-1-kent.overstreet@linux.dev [surenb@google.com: fix arch/x86/mm/numa_32.c] Link: https://lkml.kernel.org/r/20240402180933.1663992-1-surenb@google.com [kent.overstreet@linux.dev: a few places were depending on sizes.h] Link: https://lkml.kernel.org/r/20240404034744.1664840-1-kent.overstreet@linux.dev [arnd@arndb.de: fix mm/kasan/hw_tags.c] Link: https://lkml.kernel.org/r/20240404124435.3121534-1-arnd@kernel.org [surenb@google.com: fix arc build] Link: https://lkml.kernel.org/r/20240405225115.431056-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-2-surenb@google.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-11-06riscv: Use SYM_*() assembly macros instead of deprecated onesClément Léger1-2/+2
ENTRY()/END()/WEAK() macros are deprecated and we should make use of the new SYM_*() macros [1] for better annotation of symbols. Replace the deprecated ones with the new ones and fix wrong usage of END()/ENDPROC() to correctly describe the symbols. [1] https://docs.kernel.org/core-api/asm-annotations.html Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231024132655.730417-3-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-05riscv: kprobes: allow writing to x0Nam Cao1-1/+1
Instructions can write to x0, so we should simulate these instructions normally. Currently, the kernel hangs if an instruction who writes to x0 is simulated. Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Cc: stable@vger.kernel.org Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230829182500.61875-1-namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-05riscv: provide riscv-specific is_trap_insn()Nam Cao1-0/+6
uprobes expects is_trap_insn() to return true for any trap instructions, not just the one used for installing uprobe. The current default implementation only returns true for 16-bit c.ebreak if C extension is enabled. This can confuse uprobes if a 32-bit ebreak generates a trap exception from userspace: uprobes asks is_trap_insn() who says there is no trap, so uprobes assume a probe was there before but has been removed, and return to the trap instruction. This causes an infinite loop of entering and exiting trap handler. Instead of using the default implementation, implement this function speficially for riscv with checks for both ebreak and c.ebreak. Fixes: 74784081aac8 ("riscv: Add uprobes supported") Signed-off-by: Nam Cao <namcaov@gmail.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230829083614.117748-1-namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16riscv: kprobes: simulate c.beqz and c.bnezNam Cao3-2/+48
kprobes currently rejects instruction c.beqz and c.bnez. Implement them. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/1d879dba4e4ee9a82e27625d6483b5c9cfed684f.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16riscv: kprobes: simulate c.jr and c.jalr instructionsNam Cao3-2/+41
kprobes currently rejects c.jr and c.jalr instructions. Implement them. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/db8b7787e9208654cca50484f68334f412be2ea9.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16riscv: kprobes: simulate c.j instructionNam Cao3-1/+27
kprobes currently rejects c.j instruction. Implement it. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/6ef76cd9984b8015826649d13f870f8ac45a2d0d.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-30Merge tag 'riscv-for-linus-6.5-mw1' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for ACPI - Various cleanups to the ISA string parsing, including making them case-insensitive - Support for the vector extension - Support for independent irq/softirq stacks - Our CPU DT binding now has "unevaluatedProperties: false" * tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (78 commits) riscv: hibernate: remove WARN_ON in save_processor_state dt-bindings: riscv: cpus: switch to unevaluatedProperties: false dt-bindings: riscv: cpus: add a ref the common cpu schema riscv: stack: Add config of thread stack size riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK RISC-V: always report presence of extensions formerly part of the base ISA dt-bindings: riscv: explicitly mention assumption of Zicntr & Zihpm support RISC-V: remove decrement/increment dance in ISA string parser RISC-V: rework comments in ISA string parser RISC-V: validate riscv,isa at boot, not during ISA string parsing RISC-V: split early & late of_node to hartid mapping RISC-V: simplify register width check in ISA string parsing perf: RISC-V: Limit the number of counters returned from SBI riscv: replace deprecated scall with ecall riscv: uprobes: Restore thread.bad_cause riscv: mm: try VMA lock-based page fault handling first riscv: mm: Pre-allocate PGD entries for vmalloc/modules area RISC-V: hwprobe: Expose Zba, Zbb, and Zbs RISC-V: Track ISA extensions per hart ...
2023-06-20riscv: uprobes: Restore thread.bad_causeTiezhu Yang1-0/+2
thread.bad_cause is saved in arch_uprobe_pre_xol(), it should be restored in arch_uprobe_{post,abort}_xol() accordingly, otherwise the save operation is meaningless, this change is similar with x86 and powerpc. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Guo Ren <guoren@kernel.org> Fixes: 74784081aac8 ("riscv: Add uprobes supported") Link: https://lore.kernel.org/r/1682214146-3756-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-05-18rethook, fprobe: do not trace rethook related functionsZe Gao1-0/+2
These functions are already marked as NOKPROBE to prevent recursion and we have the same reason to blacklist them if rethook is used with fprobe, since they are beyond the recursion-free region ftrace can guard. Link: https://lore.kernel.org/all/20230517034510.15639-5-zegao@tencent.com/ Fixes: f3a112c0c40d ("x86,rethook,kprobes: Replace kretprobe with rethook on x86") Signed-off-by: Ze Gao <zegao@tencent.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-25Merge tag 'riscv-for-linus-6.3-mw1' of ↵Linus Torvalds2-37/+11
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: "There's a bunch of fixes/cleanups throughout the tree as usual, but we also have a handful of new features: - Various improvements to the extension detection and alternative patching infrastructure - Zbb-optimized string routines - Support for cpu-capacity in the RISC-V DT bindings - Zicbom no longer depends on toolchain support - Some performance and code size improvements to ftrace - Support for ARCH_WANT_LD_ORPHAN_WARN - Oops now contain the faulting instruction" * tag 'riscv-for-linus-6.3-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (67 commits) RISC-V: add a spin_shadow_stack declaration riscv: mm: hugetlb: Enable ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP riscv: Add header include guards to insn.h riscv: alternative: proceed one more instruction for auipc/jalr pair riscv: Avoid enabling interrupts in die() riscv, mm: Perform BPF exhandler fixup on page fault RISC-V: take text_mutex during alternative patching riscv: hwcap: Don't alphabetize ISA extension IDs RISC-V: fix ordering of Zbb extension riscv: jump_label: Fixup unaligned arch_static_branch function RISC-V: Only provide the single-letter extensions in HWCAP riscv: mm: fix regression due to update_mmu_cache change scripts/decodecode: Add support for RISC-V riscv: Add instruction dump to RISC-V splats riscv: select ARCH_WANT_LD_ORPHAN_WARN for !XIP_KERNEL riscv: vmlinux.lds.S: explicitly catch .init.bss sections from EFI stub riscv: vmlinux.lds.S: explicitly catch .riscv.attributes sections riscv: vmlinux.lds.S: explicitly catch .rela.dyn symbols riscv: lds: define RUNTIME_DISCARD_EXIT RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes ...
2023-02-20Merge tag 'for-netdev' of ↵Jakub Kicinski1-7/+8
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-02-17 We've added 64 non-merge commits during the last 7 day(s) which contain a total of 158 files changed, 4190 insertions(+), 988 deletions(-). The main changes are: 1) Add a rbtree data structure following the "next-gen data structure" precedent set by recently-added linked-list, that is, by using kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky. 2) Add a new benchmark for hashmap lookups to BPF selftests, from Anton Protopopov. 3) Fix bpf_fib_lookup to only return valid neighbors and add an option to skip the neigh table lookup, from Martin KaFai Lau. 4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accouting for container environments, from Yafang Shao. 5) Batch of ice multi-buffer and driver performance fixes, from Alexander Lobakin. 6) Fix a bug in determining whether global subprog's argument is PTR_TO_CTX, which is based on type names which breaks kprobe progs, from Andrii Nakryiko. 7) Prep work for future -mcpu=v4 LLVM option which includes usage of BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier, from Eduard Zingerman. 8) More prep work for later building selftests with Memory Sanitizer in order to detect usages of undefined memory, from Ilya Leoshkevich. 9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer dereference via sendmsg(), from Maciej Fijalkowski. 10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui. 11) Fix BPF memory allocator in combination with BPF hashtab where it could corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao. 12) Fix LoongArch BPF JIT to always use 4 instructions for function address so that instruction sequences don't change between passes, from Hengqi Chen. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits) selftests/bpf: Add bpf_fib_lookup test bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup riscv, bpf: Add bpf trampoline support for RV64 riscv, bpf: Add bpf_arch_text_poke support for RV64 riscv, bpf: Factor out emit_call for kernel and bpf context riscv: Extend patch_text for multiple instructions Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" selftests/bpf: Add global subprog context passing tests selftests/bpf: Convert test_global_funcs test to test_loader framework bpf: Fix global subprog context argument resolution logic LoongArch, bpf: Use 4 instructions for function address in JIT bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state bpf: Disable bh in bpf_test_run for xdp and tc prog xsk: check IFF_UP earlier in Tx path Fix typos in selftest/bpf files selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd() ... ==================== Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-17riscv: Extend patch_text for multiple instructionsPu Lehui1-7/+8
Extend patch_text for multiple instructions. This is the preparaiton for multiple instructions text patching in riscv BPF trampoline, and may be useful for other scenario. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/bpf/20230215135205.1411105-2-pulehui@huaweicloud.com
2023-02-15RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobesHeiko Stuebner1-3/+0
The __RISCV_INSN_FUNCS originally declared riscv_insn_is_* functions inside the kprobes implementation. This got moved into a central header in commit ec5f90877516 ("RISC-V: Move riscv_insn_is_* macros into a common header"). Though it looks like I overlooked two of them, so fix that. FENCE itself is an instruction defined directly by its own opcode, while the created riscv_isn_is_system function covers all instructions defined under the SYSTEM opcode. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Link: https://lore.kernel.org/r/20230113211955.3534431-1-heiko@sntech.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-14Merge patch series "dt-bindings: Add a cpu-capacity property for RISC-V"Palmer Dabbelt1-2/+2
Conor Dooley <conor@kernel.org> says: From: Conor Dooley <conor.dooley@microchip.com> Ever since RISC-V starting using generic arch topology code, the code paths for cpu-capacity have been there but there's no binding defined to actually convey the information. Defining the same property as used on arm seems to be the only logical thing to do, so do it. [Palmer: This is on top of the fix required to make it work, which itself wasn't merged until late in the 6.2 cycle and thus pulls in various other fixes.] * b4-shazam-merge: dt-bindings: riscv: add a capacity-dmips-mhz cpu property dt-bindings: arm: move cpu-capacity to a shared loation riscv: Move call to init_cpu_topology() to later initialization stage riscv/kprobe: Fix instruction simulation of JALR riscv: fix -Wundef warning for CONFIG_RISCV_BOOT_SPINWAIT MAINTAINERS: add an IRC entry for RISC-V RISC-V: fix compile error from deduplicated __ALTERNATIVE_CFG_2 dt-bindings: riscv: fix single letter canonical order dt-bindings: riscv: fix underscore requirement for multi-letter extensions riscv: uaccess: fix type of 0 variable on error in get_user() riscv, kprobes: Stricter c.jr/c.jalr decoding Link: https://lore.kernel.org/r/20230104180513.1379453-1-conor@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-09riscv: kprobe: Fixup misaligned load textGuo Ren1-3/+5
The current kprobe would cause a misaligned load for the probe point. This patch fixup it with two half-word loads instead. Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/linux-riscv/878rhig9zj.fsf@all.your.base.are.belong.to.us/ Reported-by: Bjorn Topel <bjorn.topel@gmail.com> Reviewed-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20230204063531.740220-1-guoren@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-01riscv: kprobe: Fixup kernel panic when probing an illegal positionGuo Ren1-0/+18
The kernel would panic when probed for an illegal position. eg: (CONFIG_RISCV_ISA_C=n) echo 'p:hello kernel_clone+0x16 a0=%a0' >> kprobe_events echo 1 > events/kprobes/hello/enable cat trace Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __do_sys_newfstatat+0xb8/0xb8 CPU: 0 PID: 111 Comm: sh Not tainted 6.2.0-rc1-00027-g2d398fe49a4d #490 Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff80007268>] dump_backtrace+0x38/0x48 [<ffffffff80c5e83c>] show_stack+0x50/0x68 [<ffffffff80c6da28>] dump_stack_lvl+0x60/0x84 [<ffffffff80c6da6c>] dump_stack+0x20/0x30 [<ffffffff80c5ecf4>] panic+0x160/0x374 [<ffffffff80c6db94>] generic_handle_arch_irq+0x0/0xa8 [<ffffffff802deeb0>] sys_newstat+0x0/0x30 [<ffffffff800158c0>] sys_clone+0x20/0x30 [<ffffffff800039e8>] ret_from_syscall+0x0/0x4 ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __do_sys_newfstatat+0xb8/0xb8 ]--- That is because the kprobe's ebreak instruction broke the kernel's original code. The user should guarantee the correction of the probe position, but it couldn't make the kernel panic. This patch adds arch_check_kprobe in arch_prepare_kprobe to prevent an illegal position (Such as the middle of an instruction). Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20230201040604.3390509-1-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-24riscv/kprobe: Fix instruction simulation of JALRLiao Chang1-2/+2
Set kprobe at 'jalr 1140(ra)' of vfs_write results in the following crash: [ 32.092235] Unable to handle kernel access to user memory without uaccess routines at virtual address 00aaaaaad77b1170 [ 32.093115] Oops [#1] [ 32.093251] Modules linked in: [ 32.093626] CPU: 0 PID: 135 Comm: ftracetest Not tainted 6.2.0-rc2-00013-gb0aa5e5df0cb-dirty #16 [ 32.093985] Hardware name: riscv-virtio,qemu (DT) [ 32.094280] epc : ksys_read+0x88/0xd6 [ 32.094855] ra : ksys_read+0xc0/0xd6 [ 32.095016] epc : ffffffff801cda80 ra : ffffffff801cdab8 sp : ff20000000d7bdc0 [ 32.095227] gp : ffffffff80f14000 tp : ff60000080f9cb40 t0 : ffffffff80f13e80 [ 32.095500] t1 : ffffffff8000c29c t2 : ffffffff800dbc54 s0 : ff20000000d7be60 [ 32.095716] s1 : 0000000000000000 a0 : ffffffff805a64ae a1 : ffffffff80a83708 [ 32.095921] a2 : ffffffff80f160a0 a3 : 0000000000000000 a4 : f229b0afdb165300 [ 32.096171] a5 : f229b0afdb165300 a6 : ffffffff80eeebd0 a7 : 00000000000003ff [ 32.096411] s2 : ff6000007ff76800 s3 : fffffffffffffff7 s4 : 00aaaaaad77b1170 [ 32.096638] s5 : ffffffff80f160a0 s6 : ff6000007ff76800 s7 : 0000000000000030 [ 32.096865] s8 : 00ffffffc3d97be0 s9 : 0000000000000007 s10: 00aaaaaad77c9410 [ 32.097092] s11: 0000000000000000 t3 : ffffffff80f13e48 t4 : ffffffff8000c29c [ 32.097317] t5 : ffffffff8000c29c t6 : ffffffff800dbc54 [ 32.097505] status: 0000000200000120 badaddr: 00aaaaaad77b1170 cause: 000000000000000d [ 32.098011] [<ffffffff801cdb72>] ksys_write+0x6c/0xd6 [ 32.098222] [<ffffffff801cdc06>] sys_write+0x2a/0x38 [ 32.098405] [<ffffffff80003c76>] ret_from_syscall+0x0/0x2 Since the rs1 and rd might be the same one, such as 'jalr 1140(ra)', hence it requires obtaining the target address from rs1 followed by updating rd. Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Signed-off-by: Liao Chang <liaochang1@huawei.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230116064342.2092136-1-liaochang1@huawei.com [Palmer: Pick Guo's cleanup] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-05riscv, kprobes: Stricter c.jr/c.jalr decodingBjörn Töpel1-2/+2
In the compressed instruction extension, c.jr, c.jalr, c.mv, and c.add is encoded the following way (each instruction is 16b): ---+-+-----------+-----------+-- 100 0 rs1[4:0]!=0 00000 10 : c.jr 100 1 rs1[4:0]!=0 00000 10 : c.jalr 100 0 rd[4:0]!=0 rs2[4:0]!=0 10 : c.mv 100 1 rd[4:0]!=0 rs2[4:0]!=0 10 : c.add The following logic is used to decode c.jr and c.jalr: insn & 0xf007 == 0x8002 => instruction is an c.jr insn & 0xf007 == 0x9002 => instruction is an c.jalr When 0xf007 is used to mask the instruction, c.mv can be incorrectly decoded as c.jr, and c.add as c.jalr. Correct the decoding by changing the mask from 0xf007 to 0xf07f. Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230102160748.1307289-1-bjorn@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-29RISC-V: kprobes: use central defined funct3 constantsHeiko Stuebner1-13/+6
Don't redefine values that are already available in the central header asm/insn.h . Use the values from there instead. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Link: https://lore.kernel.org/r/20221223221332.4127602-9-heiko@sntech.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-29RISC-V: rename parse_asm.h to insn.hHeiko Stuebner1-1/+1
The current parse_asm header should become a more centralized place for everything concerning parsing and constructing instructions. We already have a header insn-def.h similar to aarch64, so rename parse_asm.h to insn.h (again similar to aarch64) to show that it's meant for more than simple instruction parsing. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Link: https://lore.kernel.org/r/20221223221332.4127602-8-heiko@sntech.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-29RISC-V: Move riscv_insn_is_* macros into a common headerHeiko Stuebner1-21/+5
Right now the riscv kernel has (at least) two independent sets of functions to check if an encoded instruction is of a specific type. One in kgdb and one kprobes simulate-insn code. More parts of the kernel will probably need this in the future, so instead of allowing this duplication to go on further, move macros that do the function declaration in a common header, similar to at least aarch64. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Link: https://lore.kernel.org/r/20221223221332.4127602-7-heiko@sntech.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02riscv: add riscv rethook implementationBinglei Wang5-17/+39
Implement the kretprobes on riscv arch by using rethook machenism which abstracts general kretprobe info into a struct rethook_node to be embedded in the struct kretprobe_instance. Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Binglei Wang <l3b2w1@gmail.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20221025151831.1097417-1-conor@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-08-11riscv:uprobe fix SR_SPIE set/clear handlingYipeng Zou1-6/+0
In riscv the process of uprobe going to clear spie before exec the origin insn,and set spie after that.But When access the page which origin insn has been placed a page fault may happen and irq was disabled in arch_uprobe_pre_xol function,It cause a WARN as follows. There is no need to clear/set spie in arch_uprobe_pre/post/abort_xol. We can just remove it. [ 31.684157] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1488 [ 31.684677] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 76, name: work [ 31.684929] preempt_count: 0, expected: 0 [ 31.685969] CPU: 2 PID: 76 Comm: work Tainted: G [ 31.686542] Hardware name: riscv-virtio,qemu (DT) [ 31.686797] Call Trace: [ 31.687053] [<ffffffff80006442>] dump_backtrace+0x30/0x38 [ 31.687699] [<ffffffff80812118>] show_stack+0x40/0x4c [ 31.688141] [<ffffffff8081817a>] dump_stack_lvl+0x44/0x5c [ 31.688396] [<ffffffff808181aa>] dump_stack+0x18/0x20 [ 31.688653] [<ffffffff8003e454>] __might_resched+0x114/0x122 [ 31.688948] [<ffffffff8003e4b2>] __might_sleep+0x50/0x7a [ 31.689435] [<ffffffff80822676>] down_read+0x30/0x130 [ 31.689728] [<ffffffff8000b650>] do_page_fault+0x166/x446 [ 31.689997] [<ffffffff80003c0c>] ret_from_exception+0x0/0xc Fixes: 74784081aac8 ("riscv: Add uprobes supported") Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Reviewed-by: Guo Ren <guoren@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220721065820.245755-1-zouyipeng@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2021-10-27ftrace: disable preemption when recursion locked王贇1-2/+0
As the documentation explained, ftrace_test_recursion_trylock() and ftrace_test_recursion_unlock() were supposed to disable and enable preemption properly, however currently this work is done outside of the function, which could be missing by mistake. And since the internal using of trace_test_and_set_recursion() and trace_clear_recursion() also require preemption disabled, we can just merge the logical. This patch will make sure the preemption has been disabled when trace_test_and_set_recursion() return bit >= 0, and trace_clear_recursion() will enable the preemption if previously enabled. Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com CC: Petr Mladek <pmladek@suse.com> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: Miroslav Benes <mbenes@suse.cz> Reported-by: Abaci <abaci@linux.alibaba.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> [ Removed extra line in comment - SDR ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: treewide: Make it harder to refer kretprobe_trampoline directlyMasami Hiramatsu2-3/+3
Since now there is kretprobe_trampoline_addr() for referring the address of kretprobe trampoline code, we don't need to access kretprobe_trampoline directly. Make it harder to refer by renaming it to __kretprobe_trampoline(). Link: https://lkml.kernel.org/r/163163045446.489837.14510577516938803097.stgit@devnote2 Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()Masami Hiramatsu1-1/+1
The __kretprobe_trampoline_handler() callback, called from low level arch kprobes methods, has the 'trampoline_address' parameter, which is entirely superfluous as it basically just replicates: dereference_kernel_function_descriptor(kretprobe_trampoline) In fact we had bugs in arch code where it wasn't replicated correctly. So remove this superfluous parameter and use kretprobe_trampoline_addr() instead. Link: https://lkml.kernel.org/r/163163044546.489837.13505751885476015002.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: treewide: Cleanup the error messages for kprobesMasami Hiramatsu1-6/+5
This clean up the error/notification messages in kprobes related code. Basically this defines 'pr_fmt()' macros for each files and update the messages which describes - what happened, - what is the kernel going to do or not do, - is the kernel fine, - what can the user do about it. Also, if the message is not needed (e.g. the function returns unique error code, or other error message is already shown.) remove it, and replace the message with WARN_*() macros if suitable. Link: https://lkml.kernel.org/r/163163036568.489837.14085396178727185469.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-07-21riscv: kprobes: implement the branch instructionsChen Lifu2-2/+79
This has been tested by probing a module that contains each of the flavors of branches we have. Signed-off-by: Chen Lifu <chenlifu@huawei.com> [Palmer: commit message, fix kconfig errors] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-07-21riscv: kprobes: implement the auipc instructionChen Lifu2-1/+35
This has been tested by probing a module that contains an auipc instruction. Signed-off-by: Chen Lifu <chenlifu@huawei.com> [Palmer: commit message] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-07-09Merge tag 'riscv-for-linus-5.14-mw0' of ↵Linus Torvalds1-31/+9
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: "We have a handful of new features for 5.14: - Support for transparent huge pages. - Support for generic PCI resources mapping. - Support for the mem= kernel parameter. - Support for KFENCE. - A handful of fixes to avoid W+X mappings in the kernel. - Support for VMAP_STACK based overflow detection. - An optimized copy_{to,from}_user" * tag 'riscv-for-linus-5.14-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (37 commits) riscv: xip: Fix duplicate included asm/pgtable.h riscv: Fix PTDUMP output now BPF region moved back to module region riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall riscv: add VMAP_STACK overflow detection riscv: ptrace: add argn syntax riscv: mm: fix build errors caused by mk_pmd() riscv: Introduce structure that group all variables regarding kernel mapping riscv: Map the kernel with correct permissions the first time riscv: Introduce set_kernel_memory helper riscv: Enable KFENCE for riscv64 RISC-V: Use asm-generic for {in,out}{bwlq} riscv: add ASID-based tlbflushing methods riscv: pass the mm_struct to __sbi_tlb_flush_range riscv: Add mem kernel parameter support riscv: Simplify xip and !xip kernel address conversion macros riscv: Remove CONFIG_PHYS_RAM_BASE_FIXED riscv: Only initialize swiotlb when necessary riscv: fix typo in init.c riscv: Cleanup unused functions riscv: mm: Use better bitmap_zalloc() ...
2021-06-28Merge tag 'perf-core-2021-06-28' of ↵Linus Torvalds1-17/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: - Platform PMU driver updates: - x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers - Fix RDPMC support - Fix [extended-]PEBS-via-PT support - Fix Sapphire Rapids event constraints - Fix :ppp support on Sapphire Rapids - Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU - Other heterogenous-PMU fixes - Kprobes: - Remove the unused and misguided kprobe::fault_handler callbacks. - Warn about kprobes taking a page fault. - Fix the 'nmissed' stat counter. - Misc cleanups and fixes. * tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix task context PMU for Hetero perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids perf/x86/intel: Fix fixed counter check warning for some Alder Lake perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task kprobes: Do not increment probe miss count in the fault handler x86,kprobes: WARN if kprobes tries to handle a fault kprobes: Remove kprobe::fault_handler uprobes: Update uprobe_write_opcode() kernel-doc comment perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint perf/core: Fix DocBook warnings perf/core: Make local function perf_pmu_snapshot_aux() static perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of()
2021-06-03kprobes: Do not increment probe miss count in the fault handlerNaveen N. Rao1-7/+0
Kprobes has a counter 'nmissed', that is used to count the number of times a probe handler was not called. This generally happens when we hit a kprobe while handling another kprobe. However, if one of the probe handlers causes a fault, we are currently incrementing 'nmissed'. The comment in fault handler indicates that this can be used to account faults taken by the probe handlers. But, this has never been the intention as is evident from the comment above 'nmissed' in 'struct kprobe': /*count the number of times this probe was temporarily disarmed */ unsigned long nmissed; Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20210601120150.672652-1-naveen.n.rao@linux.vnet.ibm.com
2021-06-01kprobes: Remove kprobe::fault_handlerPeter Zijlstra1-10/+0
The reason for kprobe::fault_handler(), as given by their comment: * We come here because instructions in the pre/post * handler caused the page_fault, this could happen * if handler tries to access user space by * copy_from_user(), get_user() etc. Let the * user-specified handler try to fix it first. Is just plain bad. Those other handlers are ran from non-preemptible context and had better use _nofault() functions. Also, there is no upstream usage of this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210525073213.561116662@infradead.org
2021-05-29riscv: kprobes: Remove redundant kprobe_step_ctxJisheng Zhang1-31/+9
Inspired by commit ba090f9cafd5 ("arm64: kprobes: Remove redundant kprobe_step_ctx"), the ss_pending and match_addr of kprobe_step_ctx are redundant because those can be replaced by KPROBE_HIT_SS and &cur_kprobe->ainsn.api.insn[0] + GET_INSN_LENGTH(cur->opcode) respectively. Remove the kprobe_step_ctx to simplify the code. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-22riscv: kprobes: Fix build error when MMU=nJisheng Zhang1-0/+2
lkp reported a randconfig failure: arch/riscv/kernel/probes/kprobes.c:90:22: error: use of undeclared identifier 'PAGE_KERNEL_READ_EXEC' We implemented the alloc_insn_page() to allocate PAGE_KERNEL_READ_EXEC page for kprobes insn page for STRICT_MODULE_RWX. But if MMU=n, we should fall back to the generic weak alloc_insn_page() by generic kprobe subsystem. Fixes: cdd1b2bd358f ("riscv: kprobes: Implement alloc_insn_page()") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-06Merge tag 'riscv-for-linus-5.13-mw0' of ↵Linus Torvalds1-1/+11
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the memtest= kernel command-line argument. - Support for building the kernel with FORTIFY_SOURCE. - Support for generic clockevent broadcasts. - Support for the buildtar build target. - Some build system cleanups to pass more LLVM-friendly arguments. - Support for kprobes. - A rearranged kernel memory map, the first part of supporting sv48 systems. - Improvements to kexec, along with support for kdump and crash kernels. - An alternatives-based errata framework, along with support for handling a pair of errata that manifest on some SiFive designs (including the HiFive Unmatched). - Support for XIP. - A device tree for the Microchip PolarFire ICICLE SoC and associated dev board. ... along with a bunch of cleanups. There are already a handful of fixes on the list so there will likely be a part 2. * tag 'riscv-for-linus-5.13-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (45 commits) RISC-V: Always define XIP_FIXUP riscv: Remove 32b kernel mapping from page table dump riscv: Fix 32b kernel build with CONFIG_DEBUG_VIRTUAL=y RISC-V: Fix error code returned by riscv_hartid_to_cpuid() RISC-V: Enable Microchip PolarFire ICICLE SoC RISC-V: Initial DTS for Microchip ICICLE board dt-bindings: riscv: microchip: Add YAML documentation for the PolarFire SoC RISC-V: Add Microchip PolarFire SoC kconfig option RISC-V: enable XIP RISC-V: Add crash kernel support RISC-V: Add kdump support RISC-V: Improve init_resources() RISC-V: Add kexec support RISC-V: Add EM_RISCV to kexec UAPI header riscv: vdso: fix and clean-up Makefile riscv/mm: Use BUG_ON instead of if condition followed by BUG. riscv/kprobe: fix kernel panic when invoking sys_read traced by kprobe riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU riscv: module: Create module allocations without exec permissions riscv: bpf: Avoid breaking W^X ...
2021-04-26riscv/kprobe: fix kernel panic when invoking sys_read traced by kprobeLiao Chang1-1/+3
The execution of sys_read end up hitting a BUG_ON() in __find_get_block after installing kprobe at sys_read, the BUG message like the following: [ 65.708663] ------------[ cut here ]------------ [ 65.709987] kernel BUG at fs/buffer.c:1251! [ 65.711283] Kernel BUG [#1] [ 65.712032] Modules linked in: [ 65.712925] CPU: 0 PID: 51 Comm: sh Not tainted 5.12.0-rc4 #1 [ 65.714407] Hardware name: riscv-virtio,qemu (DT) [ 65.715696] epc : __find_get_block+0x218/0x2c8 [ 65.716835] ra : __getblk_gfp+0x1c/0x4a [ 65.717831] epc : ffffffe00019f11e ra : ffffffe00019f56a sp : ffffffe002437930 [ 65.719553] gp : ffffffe000f06030 tp : ffffffe0015abc00 t0 : ffffffe00191e038 [ 65.721290] t1 : ffffffe00191e038 t2 : 000000000000000a s0 : ffffffe002437960 [ 65.723051] s1 : ffffffe00160ad00 a0 : ffffffe00160ad00 a1 : 000000000000012a [ 65.724772] a2 : 0000000000000400 a3 : 0000000000000008 a4 : 0000000000000040 [ 65.726545] a5 : 0000000000000000 a6 : ffffffe00191e000 a7 : 0000000000000000 [ 65.728308] s2 : 000000000000012a s3 : 0000000000000400 s4 : 0000000000000008 [ 65.730049] s5 : 000000000000006c s6 : ffffffe00240f800 s7 : ffffffe000f080a8 [ 65.731802] s8 : 0000000000000001 s9 : 000000000000012a s10: 0000000000000008 [ 65.733516] s11: 0000000000000008 t3 : 00000000000003ff t4 : 000000000000000f [ 65.734434] t5 : 00000000000003ff t6 : 0000000000040000 [ 65.734613] status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 65.734901] Call Trace: [ 65.735076] [<ffffffe00019f11e>] __find_get_block+0x218/0x2c8 [ 65.735417] [<ffffffe00020017a>] __ext4_get_inode_loc+0xb2/0x2f6 [ 65.735618] [<ffffffe000201b6c>] ext4_get_inode_loc+0x3a/0x8a [ 65.735802] [<ffffffe000203380>] ext4_reserve_inode_write+0x2e/0x8c [ 65.735999] [<ffffffe00020357a>] __ext4_mark_inode_dirty+0x4c/0x18e [ 65.736208] [<ffffffe000206bb0>] ext4_dirty_inode+0x46/0x66 [ 65.736387] [<ffffffe000192914>] __mark_inode_dirty+0x12c/0x3da [ 65.736576] [<ffffffe000180dd2>] touch_atime+0x146/0x150 [ 65.736748] [<ffffffe00010d762>] filemap_read+0x234/0x246 [ 65.736920] [<ffffffe00010d834>] generic_file_read_iter+0xc0/0x114 [ 65.737114] [<ffffffe0001f5d7a>] ext4_file_read_iter+0x42/0xea [ 65.737310] [<ffffffe000163f2c>] new_sync_read+0xe2/0x15a [ 65.737483] [<ffffffe000165814>] vfs_read+0xca/0xf2 [ 65.737641] [<ffffffe000165bae>] ksys_read+0x5e/0xc8 [ 65.737816] [<ffffffe000165c26>] sys_read+0xe/0x16 [ 65.737973] [<ffffffe000003972>] ret_from_syscall+0x0/0x2 [ 65.738858] ---[ end trace fe93f985456c935d ]--- A simple reproducer looks like: echo 'p:myprobe sys_read fd=%a0 buf=%a1 count=%a2' > /sys/kernel/debug/tracing/kprobe_events echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable cat /sys/kernel/debug/tracing/trace Here's what happens to hit that BUG_ON(): 1) After installing kprobe at entry of sys_read, the first instruction is replaced by 'ebreak' instruction on riscv64 platform. 2) Once kernel reach the 'ebreak' instruction at the entry of sys_read, it trap into the riscv breakpoint handler, where it do something to setup for coming single-step of origin instruction, including backup the 'sstatus' in pt_regs, followed by disable interrupt during single stepping via clear 'SIE' bit of 'sstatus' in pt_regs. 3) Then kernel restore to the instruction slot contains two instructions, one is original instruction at entry of sys_read, the other is 'ebreak'. Here it trigger a 'Instruction page fault' exception (value at 'scause' is '0xc'), if PF is not filled into PageTabe for that slot yet. 4) Again kernel trap into page fault exception handler, where it choose different policy according to the state of running kprobe. Because afte 2) the state is KPROBE_HIT_SS, so kernel reset the current kprobe and 'pc' points back to the probe address. 5) Because 'epc' point back to 'ebreak' instrution at sys_read probe, kernel trap into breakpoint handler again, and repeat the operations at 2), however 'sstatus' without 'SIE' is keep at 4), it cause the real 'sstatus' saved at 2) is overwritten by the one withou 'SIE'. 6) When kernel cross the probe the 'sstatus' CSR restore with value without 'SIE', and reach __find_get_block where it requires the interrupt must be enabled. Fix this is very trivial, just restore the value of 'sstatus' in pt_regs with backup one at 2) when the instruction being single stepped cause a page fault. Fixes: c22b0bcb1dd02 ("riscv: Add kprobes supported") Signed-off-by: Liao Chang <liaochang1@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26riscv: kprobes: Implement alloc_insn_page()Jisheng Zhang1-0/+8
Allocate PAGE_KERNEL_READ_EXEC(read only, executable) page for kprobes insn page. This is to prepare for STRICT_MODULE_RWX. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-15riscv: kprobes/ftrace: Add recursion protection to the ftrace callbackJisheng Zhang1-1/+10
Currently, the riscv's kprobes(powerred by ftrace) handler is preemptible. Futher check indicates we miss something similar as the commit c536aa1c5b17 ("kprobes/ftrace: Add recursion protection to the ftrace callback"), so do similar modifications as the commit does. Fixes: 829adda597fe ("riscv: Add KPROBES_ON_FTRACE supported") Cc: stable@vger.kernel.org Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-03-16ftrace: Fix spelling mistake "disabed" -> "disabled"Colin Ian King1-1/+1
There is a spelling mistake in a comment, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-03-16riscv: fix bugon.cocci warningskernel test robot1-2/+1
Use BUG_ON instead of a if condition followed by BUG. Generated by: scripts/coccinelle/misc/bugon.cocci Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") CC: Guo Ren <guoren@linux.alibaba.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Julia Lawall <julia.lawall@inria.fr> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-03-09riscv: ftrace: Use ftrace_get_regs helperNanyong Sun1-7/+9
Use ftrace_get_regs() helper call to get pt_regs from ftrace_regs struct, this makes the code simpler. Signed-off-by: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>