Age | Commit message (Collapse) | Author | Files | Lines |
|
Anup Patel <[email protected]> says:
The Linux NVDIMM PEM drivers require arch support to map and access the
persistent memory device. This series adds RISC-V PMEM support using
recently added Svpbmt and Zicbom support.
* b4-shazam-merge:
RISC-V: Enable PMEM drivers
RISC-V: Implement arch specific PMEM APIs
RISC-V: Fix MEMREMAP_WB for systems with Svpbmt
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
We now have PMEM arch support available in RISC-V kernel so let us
enable relevant drivers in defconfig.
Signed-off-by: Anup Patel <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The NVDIMM PMEM driver expects arch specific APIs for cache maintenance
and if arch does not provide these APIs then NVDIMM PMEM driver will
always use MEMREMAP_WT to map persistent memory which in-turn maps as
UC memory type defined by the RISC-V Svpbmt specification.
Now that the Svpbmt and Zicbom support is available in RISC-V kernel,
we implement PMEM APIs using ALT_CMO_OP() macros so that the NVDIMM
PMEM driver can use MEMREMAP_WB to map persistent memory.
Co-developed-by: Mayuresh Chitale <[email protected]>
Signed-off-by: Mayuresh Chitale <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Currently, the memremap() called with MEMREMAP_WB maps memory using
the generic ioremap() function which breaks on system with Svpbmt
because memory mapped using _PAGE_IOREMAP page attributes is treated
as strongly-ordered non-cacheable IO memory.
To address this, we implement RISC-V specific arch_memremap_wb()
which maps memory using _PAGE_KERNEL page attributes resulting in
write-back cacheable mapping on systems with Svpbmt.
Fixes: ff689fd21cb1 ("riscv: add RISC-V Svpbmt extension support")
Co-developed-by: Mayuresh Chitale <[email protected]>
Signed-off-by: Mayuresh Chitale <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
I'm merging this in as a single patch to make it easier to handle the
backports.
* b4-shazam-merge:
RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The return to userspace path in entry.S may enable interrupts without the
corresponding lockdep annotation, producing a splat[0] when DEBUG_LOCKDEP
is enabled. Simply calling __trace_hardirqs_on() here gets a bit messy
due to the use of RA to point back to ret_from_exception, so just move
the whole slow-path loop into C. It's more readable and it lets us use
local_irq_{enable,disable}(), avoiding the need for manual annotations
altogether.
[0]:
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled())
WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5512 check_flags+0x10a/0x1e0
Modules linked in:
CPU: 2 PID: 1 Comm: init Not tainted 6.1.0-rc4-00160-gb56b6e2b4f31 #53
Hardware name: riscv-virtio,qemu (DT)
epc : check_flags+0x10a/0x1e0
ra : check_flags+0x10a/0x1e0
<snip>
status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff808edb90>] lock_is_held_type+0x78/0x14e
[<ffffffff8003dae2>] __might_resched+0x26/0x22c
[<ffffffff8003dd24>] __might_sleep+0x3c/0x66
[<ffffffff80022c60>] get_signal+0x9e/0xa70
[<ffffffff800054a2>] do_notify_resume+0x6e/0x422
[<ffffffff80003c68>] ret_from_exception+0x0/0x10
irq event stamp: 44512
hardirqs last enabled at (44511): [<ffffffff808f901c>] _raw_spin_unlock_irqrestore+0x54/0x62
hardirqs last disabled at (44512): [<ffffffff80008200>] __trace_hardirqs_off+0xc/0x14
softirqs last enabled at (44472): [<ffffffff808f9fbe>] __do_softirq+0x3de/0x51e
softirqs last disabled at (44467): [<ffffffff80017760>] irq_exit+0xd6/0x104
---[ end trace 0000000000000000 ]---
possible reason: unannotated irqs-on.
Signed-off-by: Andrew Bresticker <[email protected]>
Fixes: 3c4697982982 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Current implementation of update_mmu_cache function performs local TLB
flush. It does not take into account ASID information. Besides, it does
not take into account other harts currently running the same mm context
or possible migration of the running context to other harts. Meanwhile
TLB flush is not performed for every context switch if ASID support
is enabled.
Patch [1] proposed to add ASID support to update_mmu_cache to avoid
flushing local TLB entirely. This patch takes into account other
harts currently running the same mm context as well as possible
migration of this context to other harts.
For this purpose the approach from flush_icache_mm is reused. Remote
harts currently running the same mm context are informed via SBI calls
that they need to flush their local TLBs. All the other harts are marked
as needing a deferred TLB flush when this mm context runs on them.
[1] https://lore.kernel.org/linux-riscv/[email protected]/
Signed-off-by: Sergey Matyukevich <[email protected]>
Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator")
Cc: [email protected]
Link: https://lore.kernel.org/linux-riscv/[email protected]/#t
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The current walk_stackframe with FRAME_POINTER would stop unwinding at
ret_from_exception:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init
CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192
Call Trace:
[<ffffffe0002038c8>] walk_stackframe+0x0/0xee
[<ffffffe000aecf48>] show_stack+0x32/0x4a
[<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e
[<ffffffe000af1648>] dump_stack+0x14/0x1c
[<ffffffe000239ad2>] ___might_sleep+0x12e/0x138
[<ffffffe000239aec>] __might_sleep+0x10/0x18
[<ffffffe000afe3fe>] down_read+0x22/0xa4
[<ffffffe000207588>] do_page_fault+0xb0/0x2fe
[<ffffffe000201b80>] ret_from_exception+0x0/0xc
The optimization would help walk_stackframe cross the pt_regs frame and
get more backtrace of debug info:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init
CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192
Call Trace:
[<ffffffe0002038c8>] walk_stackframe+0x0/0xee
[<ffffffe000aecf48>] show_stack+0x32/0x4a
[<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e
[<ffffffe000af1648>] dump_stack+0x14/0x1c
[<ffffffe000239ad2>] ___might_sleep+0x12e/0x138
[<ffffffe000239aec>] __might_sleep+0x10/0x18
[<ffffffe000afe3fe>] down_read+0x22/0xa4
[<ffffffe000207588>] do_page_fault+0xb0/0x2fe
[<ffffffe000201b80>] ret_from_exception+0x0/0xc
[<ffffffe000613c06>] riscv_intc_irq+0x1a/0x72
[<ffffffe000201b80>] ret_from_exception+0x0/0xc
[<ffffffe00033f44a>] vma_link+0x54/0x160
[<ffffffe000341d7a>] mmap_region+0x2cc/0x4d0
[<ffffffe000342256>] do_mmap+0x2d8/0x3ac
[<ffffffe000326318>] vm_mmap_pgoff+0x70/0xb8
[<ffffffe00032638a>] vm_mmap+0x2a/0x36
[<ffffffe0003cfdde>] elf_map+0x72/0x84
[<ffffffe0003d05f8>] load_elf_binary+0x69a/0xec8
[<ffffffe000376240>] bprm_execve+0x246/0x53a
[<ffffffe00037786c>] kernel_execve+0xe8/0x124
[<ffffffe000aecdf2>] run_init_process+0xfa/0x10c
[<ffffffe000aece16>] try_to_run_init_process+0x12/0x3c
[<ffffffe000afa920>] kernel_init+0xb4/0xf8
[<ffffffe000201b80>] ret_from_exception+0x0/0xc
Here is the error injection test code for the above output:
drivers/irqchip/irq-riscv-intc.c:
static asmlinkage void riscv_intc_irq(struct pt_regs *regs)
{
unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG;
+ u32 tmp; __get_user(tmp, (u32 *)0);
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[Palmer: use SYM_CODE_*]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The 'retp' is a pointer to the return address on the stack, so we
must pass the current return address pointer as the 'retp'
argument to ftrace_push_return_trace(). Not parent function's
return address on the stack.
Fixes: b785ec129bd9 ("riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support")
Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
This is reported by kmemleak detector:
unreferenced object 0xff2000000403d000 (size 4096):
comm "kexec", pid 146, jiffies 4294900633 (age 64.792s)
hex dump (first 32 bytes):
7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............
04 00 f3 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000566ca97c>] kmemleak_vmalloc+0x3c/0xbe
[<00000000979283d8>] __vmalloc_node_range+0x3ac/0x560
[<00000000b4b3712a>] __vmalloc_node+0x56/0x62
[<00000000854f75e2>] vzalloc+0x2c/0x34
[<00000000e9a00db9>] crash_prepare_elf64_headers+0x80/0x30c
[<0000000067e8bf48>] elf_kexec_load+0x3e8/0x4ec
[<0000000036548e09>] kexec_image_load_default+0x40/0x4c
[<0000000079fbe1b4>] sys_kexec_file_load+0x1c4/0x322
[<0000000040c62c03>] ret_from_syscall+0x0/0x2
In elf_kexec_load(), a buffer is allocated via vzalloc() to store elf
headers. While it's not freed back to system when kdump kernel is
reloaded or unloaded, or when image->elf_header is successfully set and
then fails to load kdump kernel for some reason. Fix it by freeing the
buffer in arch_kimage_file_post_load_cleanup().
Fixes: 8acea455fafa ("RISC-V: Support for kexec_file on panic")
Signed-off-by: Li Huafei <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
This is reported by kmemleak detector:
unreferenced object 0xff60000082864000 (size 9588):
comm "kexec", pid 146, jiffies 4294900634 (age 64.788s)
hex dump (first 32 bytes):
d0 0d fe ed 00 00 12 ed 00 00 00 48 00 00 11 40 ...........H...@
00 00 00 28 00 00 00 11 00 00 00 02 00 00 00 00 ...(............
backtrace:
[<00000000f95b17c4>] kmemleak_alloc+0x34/0x3e
[<00000000b9ec8e3e>] kmalloc_order+0x9c/0xc4
[<00000000a95cf02e>] kmalloc_order_trace+0x34/0xb6
[<00000000f01e68b4>] __kmalloc+0x5c2/0x62a
[<000000002bd497b2>] kvmalloc_node+0x66/0xd6
[<00000000906542fa>] of_kexec_alloc_and_setup_fdt+0xa6/0x6ea
[<00000000e1166bde>] elf_kexec_load+0x206/0x4ec
[<0000000036548e09>] kexec_image_load_default+0x40/0x4c
[<0000000079fbe1b4>] sys_kexec_file_load+0x1c4/0x322
[<0000000040c62c03>] ret_from_syscall+0x0/0x2
In elf_kexec_load(), a buffer is allocated via kvmalloc() to store fdt.
While it's not freed back to system when kexec kernel is reloaded or
unloaded. Then memory leak is caused. Fix it by introducing riscv
specific function arch_kimage_file_post_load_cleanup(), and freeing the
buffer there.
Fixes: 6261586e0c91 ("RISC-V: Add kexec_file support")
Signed-off-by: Li Huafei <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Reviewed-by: Liao Chang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Add arch_crash_save_vmcoreinfo(), which exports VM layout(MODULES,
VMALLOC, VMEMMAP ranges and KERNEL_LINK_ADDR), va bits and ram base for
vmcore.
* b4-shazam-merge:
Documentation: kdump: describe VMCOREINFO export for RISCV64
RISC-V: Add arch_crash_save_vmcoreinfo support
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The following interrelated definitions and ranges are needed by the
kdump crash tool, which are exported by
"arch/riscv/kernel/crash_core.c":
VA_BITS,
PAGE_OFFSET,
phys_ram_base,
KERNEL_LINK_ADDR,
MODULES_VADDR ~ MODULES_END,
VMALLOC_START ~ VMALLOC_END,
VMEMMAP_START ~ VMEMMAP_END,
Document these RISCV64 exports above.
Reviewed-by: Bagas Sanjaya <[email protected]>
Signed-off-by: Xianting Tian <[email protected]>
Acked-by: Baoquan He <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[Palmer: wrap commit text]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Add arch_crash_save_vmcoreinfo(), which exports VM layout(MODULES,
VMALLOC, VMEMMAP ranges and KERNEL_LINK_ADDR), va bits and ram base for
vmcore.
Default pagetable levels and PAGE_OFFSET aren't same for different
kernel version as below. For pagetable levels, it sets sv57 by default
and falls back to setting sv48 at boot time if sv57 is not supported by
the hardware.
For ram base, the default value is 0x80200000 for qemu riscv64 env and,
for example, is 0x200000 on the XuanTie 910 CPU.
* Linux Kernel 5.18 ~
* PGTABLE_LEVELS = 5
* PAGE_OFFSET = 0xff60000000000000
* Linux Kernel 5.17 ~
* PGTABLE_LEVELS = 4
* PAGE_OFFSET = 0xffffaf8000000000
* Linux Kernel 4.19 ~
* PGTABLE_LEVELS = 3
* PAGE_OFFSET = 0xffffffe000000000
Since these configurations change from time to time and version to
version, it is preferable to export them via vmcoreinfo than to change
the crash's code frequently, it can simplify the development of crash
tool.
Signed-off-by: Xianting Tian <[email protected]>
Tested-by: Deepak Gupta <[email protected]>
Tested-by: Guo Ren <[email protected]>
Acked-by: Baoquan He <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[Palmer: wrap commit text]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Implement the kretprobes on riscv arch by using rethook machenism
which abstracts general kretprobe info into a struct rethook_node
to be embedded in the struct kretprobe_instance.
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Binglei Wang <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
With the PG_arch_1 we keep track if the page's data cache is clean,
architecture rely on this property to treat new pages as dirty with
respect to the data cache and perform the flushing before mapping the pages
into userspace.
This patch adds a new architecture hook, arch_clear_hugepage_flags,so that
architectures which rely on the page flags being in a particular state for
fresh allocations can adjust the flags accordingly when a page is freed
into the pool.
Fixes: 9e953cda5cdf ("riscv: Introduce huge page support for 32/64bit kernel")
Signed-off-by: Tong Tiangen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
HugeTLB pages are always fully mapped, so only setting head page's
PG_dcache_clean flag is enough.
Signed-off-by: Tong Tiangen <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Add CONFIG_SERIAL_8250_DW=y, which is a necessary option for
StarFive JH7110 and JH7100 SoCs to boot with serial ports.
Reviewed-by: Conor Dooley <[email protected]>
Signed-off-by: Hal Feng <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Jamie Iles <[email protected]> says:
This series enables dynamic ftrace support for RV32I bringing it to
parity with RV64I. Most of the work is already there, this is largely
just assembly fixes to handle register sizes, correct handling of the
psABI calling convention and Kconfig change.
Validated with all ftrace boot time self test with qemu for RV32I and
RV64I in addition to real tracing on an RV32I FPGA design.
* b4-shazam-merge:
RISC-V: enable dynamic ftrace for RV32I
RISC-V: preserve a1 in mcount
RISC-V: reduce mcount save space on RV32
RISC-V: use REG_S/REG_L for mcount
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The RISC-V mcount function is now capable of supporting RV32I so make it
available in the kernel config.
Signed-off-by: Jamie Iles <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The RISC-V ELF psABI states that both a0 and a1 are used for return
values so we should preserve them both in return_to_handler. This is
especially important for RV32 for functions returning a 64-bit quantity
otherwise the return value can be corrupted and undefined behaviour
results.
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Jamie Iles <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
For RV32 we can reduce the size of the ABI save+restore state by using
SZREG so that register stores are packed rather than on an 8 byte
boundary.
Signed-off-by: Jamie Iles <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
In preparation for rv32i ftrace support, convert mcount routines to use
native sized loads/stores.
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Jamie Iles <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
on an arch level, RISC-V defaults to FLATMEM. On PolarFire SoC, the
memory layout is almost always sparse, with a maximum of 1 GiB at
0x8000_0000 & a possible 16 GiB range at 0x10_0000_0000. The Icicle kit,
for example, has 2 GiB of DDR - so there's a big hole in the memory map
between the two gigs. Prior to v6.1-rc1, boot times from defconfig
builds were pretty bad on Icicle but enabling sparsemem would fix those
issues. As of v6.1-rc1, the Icicle kit no longer boots from defconfig
builds with the in-kernel devicetree. A change to the memory map
resulted in a futher "sparse-ification", producing a splat on boot:
OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
Machine model: Microchip PolarFire-SoC Icicle Kit
earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8')
printk: bootconsole [ns16550a0] enabled
printk: debug: skip boot console de-registration.
efi: UEFI not found.
Zone ranges:
DMA32 [mem 0x0000000080200000-0x00000000ffffffff]
Normal [mem 0x0000000100000000-0x000000107fffffff]
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000080200000-0x00000000bfbfffff]
node 0: [mem 0x00000000bfc00000-0x00000000bfffffff]
node 0: [mem 0x0000001040000000-0x000000107fffffff]
Initmem setup node 0 [mem 0x0000000080200000-0x000000107fffffff]
Kernel panic - not syncing: Failed to allocate 1073741824 bytes for node 0 memory map
CPU: 0 PID: 0 Comm: swapper Not tainted 5.19.0-dirty #1
Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
Call Trace:
[<ffffffff800057f0>] show_stack+0x30/0x3c
[<ffffffff807d5802>] dump_stack_lvl+0x4a/0x66
[<ffffffff807d5836>] dump_stack+0x18/0x20
[<ffffffff807d1ae8>] panic+0x124/0x2c6
[<ffffffff80814064>] free_area_init_core+0x0/0x11e
[<ffffffff80813720>] free_area_init_node+0xc2/0xf6
[<ffffffff8081331e>] free_area_init+0x222/0x260
[<ffffffff808064d6>] misc_mem_init+0x62/0x9a
[<ffffffff80803cb2>] setup_arch+0xb0/0xea
[<ffffffff8080039a>] start_kernel+0x88/0x4ee
---[ end Kernel panic - not syncing: Failed to allocate 1073741824 bytes for node 0 memory map ]---
With the aim of keeping defconfig builds booting on icicle, enable
SPARSEMEM_MANUAL.
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Enable cpufreq kconfig menu for RISC-V.
Signed-off-by: Lad Prabhakar <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
After we support HAVE_ARCH_HUGE_VMAP, we can now enable
HAVE_ARCH_HUGE_VMALLOC too. This feature has been used in kvmalloc and
alloc_large_system_hash for now. This feature can be disabled by kernel
parameters "nohugevmalloc".
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: Björn Töpel <[email protected]>
Tested-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[Palmer: minor formatting]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required page
table functions. With this feature, ioremap area will be mapped with
huge page granularity according to its actual size. This feature can be
disabled by kernel parameter "nohugeiomap".
Signed-off-by: Liu Shixin <[email protected]>
Reviewed-by: Björn Töpel <[email protected]>
Tested-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[Palmer: minor formatting]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Change the two comments in ucontext.h by getting them up to
the coding style proposed by torvalds.
Signed-off-by: Cleo John <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/20221010182848.GA28029@watet-ms7b87
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Add macro definition to support update_mmu_tlb() for riscv,
this function is from commit:7df676974359 ("mm/memory.c:Update
local TLB if PTE entry exists").
update_mmu_tlb() is used when a thread notice that other cpu thread
has handled the fault and changed the PTE. For MIPS, it's worth to
do that,this cpu thread will trap in tlb fault again otherwise.
For RISCV, it's also better to flush local tlb than do nothing in
update_mmu_tlb(). There are two kinds of page fault that have
update_mmu_tlb() inside:
1.page fault which PTE is NOT none, only protection check error,
like write protection fault. If updata_mmu_tlb() is empty, after
finsh page fault this time and re-execute, cpu will find address
but protection checked error in tlb again. So this will cause
another page fault. PTE in memory is good now,so update_mmu_cache()
in handle_pte_fault() will be executed. If updata_mmu_tlb() is not
empty flush local tlb, cpu won't find this address in tlb next time,
and get entry in physical memory, so it won't cause another page
fault.
2.page fault which PTE is none or swapped.
For this case, this cpu thread won't cause another page fault,cpu
will have tlb miss when re-execute, and get entry in memory
directly. But "set pte in phycial memory and flush local tlb" is
pratice in Linux, it's better to flush local tlb if it find entry
in phycial memory has changed.
Maybe it's same for other ARCH which can't detect PTE changed and
update it in local tlb automatically.
Signed-off-by: Jinyu Tang <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
arch/riscv/kernel/head.o does not need any special treatment - the only
requirement is the ".head.text" section must be placed before the
normal ".text" section.
The linker script does the right thing to do. The build system does
not need to manipulate the link order of head.o.
Signed-off-by: Jisheng Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The PMU on T-Head C9xx cores is quite similar to the SSCOFPMF extension
but not completely identical, so this series adds a T-Head PMU errata
that handlen the differences.
* 'riscv-pmu' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
drivers/perf: riscv_pmu_sbi: add support for PMU variant on T-Head C9xx cores
RISC-V: Cache SBI vendor values
|
|
With the T-HEAD C9XX cores being designed before or during the ratification
to the SSCOFPMF extension, it implements functionality very similar but
not equal to it.
It implements overflow handling and also some privilege-mode filtering.
While SSCOFPMF supports this for all modes, the C9XX only implements the
filtering for M-mode and S-mode but not user-mode.
So add some adaptions to allow the C9XX to still handle
its PMU through the regular SBI PMU interface instead of defining new
interfaces or drivers.
To work properly, this requires a matching change in SBI, though the actual
interface between kernel and SBI does not change.
The main differences are a the overflow CSR and irq number.
As the reading of the overflow-csr is in the hot-path during irq handling,
use an errata and alternatives to not introduce new conditionals there.
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
sbi_get_mvendorid(), sbi_get_marchid() and sbi_get_mimpid() might get
called multiple times, though the values of these CSRs should not change
during the runtime of a specific machine.
Though the values can be different depending on which hart of the system
they get called. So hook into the newly introduced cpuinfo struct to allow
retrieving these cached values via new functions.
Also use arch_initcall for the cpuinfo setup instead, as that now clearly
is "architecture specific initialization" and also makes these information
available slightly earlier.
[caching vendor ids]
Suggested-by: Atish Patra <[email protected]>
[using cpuinfo struct as cache]
Suggested-by: Anup Patel <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Heiko Stuebner <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The adverbs 'therefor' and 'therefore' have different meaning.
As the meaning here is 'consequently' the spelling should be 'therefore'.
Signed-off-by: Heinrich Schuchardt <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
"This time with some large scale treewide cleanups.
The intent of this pull is to clean up the way callers fetch random
integers. The current rules for doing this right are:
- If you want a secure or an insecure random u64, use get_random_u64()
- If you want a secure or an insecure random u32, use get_random_u32()
The old function prandom_u32() has been deprecated for a while
now and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16()
- If you want a secure or an insecure random u8, use get_random_u8()
- If you want secure or insecure random bytes, use get_random_bytes().
The old function prandom_bytes() has been deprecated for a while
now and has long been a wrapper around get_random_bytes()
- If you want a non-uniform random u32, u16, or u8 bounded by a
certain open interval maximum, use prandom_u32_max()
I say "non-uniform", because it doesn't do any rejection sampling
or divisions. Hence, it stays within the prandom_*() namespace, not
the get_random_*() namespace.
I'm currently investigating a "uniform" function for 6.2. We'll see
what comes of that.
By applying these rules uniformly, we get several benefits:
- By using prandom_u32_max() with an upper-bound that the compiler
can prove at compile-time is ≤65536 or ≤256, internally
get_random_u16() or get_random_u8() is used, which wastes fewer
batched random bytes, and hence has higher throughput.
- By using prandom_u32_max() instead of %, when the upper-bound is
not a constant, division is still avoided, because
prandom_u32_max() uses a faster multiplication-based trick instead.
- By using get_random_u16() or get_random_u8() in cases where the
return value is intended to indeed be a u16 or a u8, we waste fewer
batched random bytes, and hence have higher throughput.
This series was originally done by hand while I was on an airplane
without Internet. Later, Kees and I worked on retroactively figuring
out what could be done with Coccinelle and what had to be done
manually, and then we split things up based on that.
So while this touches a lot of files, the actual amount of code that's
hand fiddled is comfortably small"
* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: remove unused functions
treewide: use get_random_bytes() when possible
treewide: use get_random_u32() when possible
treewide: use get_random_{u8,u16}() when possible, part 2
treewide: use get_random_{u8,u16}() when possible, part 1
treewide: use prandom_u32_max() when possible, part 2
treewide: use prandom_u32_max() when possible, part 1
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tools updates from Arnaldo Carvalho de Melo:
- Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels
when using bperf (perf BPF based counters) with cgroups.
- Support HiSilicon PCIe Performance Monitoring Unit (PMU), that
monitors bandwidth, latency, bus utilization and buffer occupancy.
Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst.
- User space tasks can migrate between CPUs, so when tracing selected
CPUs, system-wide sideband is still needed, fix it in the setup of
Intel PT on hybrid systems.
- Fix metricgroups title message in 'perf list', it should state that
the metrics groups are to be used with the '-M' option, not '-e'.
- Sync the msr-index.h copy with the kernel sources, adding support for
using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as well
as decoding it when printing the MSR tracepoint arguments.
- Fix program header size and alignment when generating a JIT ELF in
'perf inject'.
- Add multiple new Intel PT 'perf test' entries, including a jitdump
one.
- Fix the 'perf test' entries for 'perf stat' CSV and JSON output when
running on PowerPC due to an invalid topology number in that arch.
- Fix the 'perf test' for arm_coresight failures on the ARM Juno
system.
- Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this
option to the or expression expected in the intercepted
perf_event_open() syscall.
- Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the
'perf annotate' asm parser.
- Fix 'perf mem record -C' option processing, it was being chopped up
when preparing the underlying 'perf record -e mem-events' and thus
being ignored, requiring using '-- -C CPUs' as a workaround.
- Improvements and tidy ups for 'perf test' shell infra.
- Fix Intel PT information printing segfault in uClibc, where a NULL
format was being passed to fprintf.
* tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits)
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet
perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver
perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()
perf tests stat+json_output: Include sanity check for topology
perf tests stat+csv_output: Include sanity check for topology
perf intel-pt: Fix system_wide dummy event for hybrid
perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
perf test: Fix attr tests for PERF_FORMAT_LOST
perf test: test_intel_pt.sh: Add 9 tests
perf inject: Fix GEN_ELF_TEXT_OFFSET for jit
perf test: test_intel_pt.sh: Add jitdump test
perf test: test_intel_pt.sh: Tidy some alignment
perf test: test_intel_pt.sh: Print a message when skipping kernel tracing
perf test: test_intel_pt.sh: Tidy some perf record options
perf test: test_intel_pt.sh: Fix return checking again
perf: Skip and warn on unknown format 'configN' attrs
perf list: Fix metricgroups title message
perf mem: Fix -C option behavior for perf mem record
perf annotate: Add missing condition flags for arm64
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the
combination of Clang >= 14 and GAS <= 2.35.
- Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
the package size.
- Fix modpost error under build environments using musl.
- Make *.ll files keep value names for easier debugging
- Fix single directory build
- Prevent RISC-V from selecting the broken DWARF5 support when Clang
and GAS are used together.
* tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
kbuild: fix single directory build
kbuild: add -fno-discard-value-names to cmd_cc_ll_c
scripts/clang-tools: Convert clang-tidy args to list
modpost: put modpost options before argument
kbuild: Stop including vmlinux.bz2 in the rpm's
Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull more clk updates from Stephen Boyd:
"This is the final part of the clk patches for this merge window.
The clk rate range series needed another week to fully bake. Maxime
fixed the bug that broke clk notifiers and prevented this from being
included in the first pull request. He also added a unit test on top
to make sure it doesn't break so easily again. The majority of the
series fixes up how the clk_set_rate_*() APIs work, particularly
around when the rate constraints are dropped and how they move around
when reparenting clks. Overall it's a much needed improvement to the
clk rate range APIs that used to be pretty broken if you looked
sideways.
Beyond the core changes there are a few driver fixes for a compilation
issue or improper data causing clks to fail to register or have the
wrong parents. These are good to get in before the first -rc so that
the system actually boots on the affected devices"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (31 commits)
clk: tegra: Fix Tegra PWM parent clock
clk: at91: fix the build with binutils 2.27
clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks
clk: mediatek: clk-mux: Add .determine_rate() callback
clk: tests: Add tests for notifiers
clk: Update req_rate on __clk_recalc_rates()
clk: tests: Add missing test case for ranges
clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d
clk: Introduce the clk_hw_get_rate_range function
clk: Zero the clk_rate_request structure
clk: Stop forwarding clk_rate_requests to the parent
clk: Constify clk_has_parent()
clk: Introduce clk_core_has_parent()
clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock
clk: Add our request boundaries in clk_core_init_rate_req
clk: Introduce clk_hw_init_rate_request()
clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller
clk: Change clk_core_init_rate_req prototype
clk: Set req_rate on reparenting
clk: Take into account uncached clocks in clk_set_rate_range()
...
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull more cifs updates from Steve French:
- fix a regression in guest mounts to old servers
- improvements to directory leasing (caching directory entries safely
beyond the root directory)
- symlink improvement (reducing roundtrips needed to process symlinks)
- an lseek fix (to problem where some dir entries could be skipped)
- improved ioctl for returning more detailed information on directory
change notifications
- clarify multichannel interface query warning
- cleanup fix (for better aligning buffers using ALIGN and round_up)
- a compounding fix
- fix some uninitialized variable bugs found by Coverity and the kernel
test robot
* tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
smb3: improve SMB3 change notification support
cifs: lease key is uninitialized in two additional functions when smb1
cifs: lease key is uninitialized in smb1 paths
smb3: must initialize two ACL struct fields to zero
cifs: fix double-fault crash during ntlmssp
cifs: fix static checker warning
cifs: use ALIGN() and round_up() macros
cifs: find and use the dentry for cached non-root directories also
cifs: enable caching of directories for which a lease is held
cifs: prevent copying past input buffer boundaries
cifs: fix uninitialised var in smb2_compound_op()
cifs: improve symlink handling for smb2+
smb3: clarify multichannel warning
cifs: fix regression in very old smb1 mounts
cifs: fix skipping to incorrect offset in emit_cached_dirents
|
|
This reverts commit 78e5a3399421 ("cpumask: fix checking valid cpu range").
syzbot is hitting WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning at
cpu_max_bits_warn() [1], for commit 78e5a3399421 ("cpumask: fix checking
valid cpu range") is broken. Obviously that patch hits WARN_ON_ONCE()
when e.g. reading /proc/cpuinfo because passing "cpu + 1" instead of
"cpu" will trivially hit cpu == nr_cpumask_bits condition.
Although syzbot found this problem in linux-next.git on 2022/09/27 [2],
this problem was not fixed immediately. As a result, that patch was
sent to linux.git before the patch author recognizes this problem, and
syzbot started failing to test changes in linux.git since 2022/10/10
[3].
Andrew Jones proposed a fix for x86 and riscv architectures [4]. But
[2] and [5] indicate that affected locations are not limited to arch
code. More delay before we find and fix affected locations, less tested
kernel (and more difficult to bisect and fix) before release.
We should have inspected and fixed basically all cpumask users before
applying that patch. We should not crash kernels in order to ask
existing cpumask users to update their code, even if limited to
CONFIG_DEBUG_PER_CPU_MAPS=y case.
Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd [1]
Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150 [2]
Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734 [3]
Link: https://lkml.kernel.org/r/[email protected] [4]
Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060 [5]
Reported-by: Andrew Jones <[email protected]>
Reported-by: [email protected]
Signed-off-by: Tetsuo Handa <[email protected]>
Cc: Yury Norov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When building with a RISC-V kernel with DWARF5 debug info using clang
and the GNU assembler, several instances of the following error appear:
/tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported
Dumping the .s file reveals these .uleb128 directives come from
.debug_loc and .debug_ranges:
.Ldebug_loc0:
.byte 4 # DW_LLE_offset_pair
.uleb128 .Lfunc_begin0-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp1-.Lfunc_begin0 # ending offset
.byte 1 # Loc expr size
.byte 90 # DW_OP_reg10
.byte 0 # DW_LLE_end_of_list
.Ldebug_ranges0:
.byte 4 # DW_RLE_offset_pair
.uleb128 .Ltmp6-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp27-.Lfunc_begin0 # ending offset
.byte 4 # DW_RLE_offset_pair
.uleb128 .Ltmp28-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp30-.Lfunc_begin0 # ending offset
.byte 0 # DW_RLE_end_of_list
There is an outstanding binutils issue to support a non-constant operand
to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to
be any movement on it, due to concerns over how it would work with
linker relaxation.
To avoid these build errors, prevent DWARF5 from being selected when
using clang and an assembler that does not have support for these symbol
deltas, which can be easily checked in Kconfig with as-instr plus the
small test program from the dwz test suite from the binutils issue.
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215
Link: https://github.com/ClangBuiltLinux/linux/issues/1719
Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
Commit f110e5a250e3 ("kbuild: refactor single builds of *.ko") was wrong.
KBUILD_MODULES _is_ needed for single builds.
Otherwise, "make foo/bar/baz/" does not build module objects at all.
Fixes: f110e5a250e3 ("kbuild: refactor single builds of *.ko")
Reported-by: David Sterba <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Tested-by: David Sterba <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab hotfix from Vlastimil Babka:
"A single fix for the common-kmalloc series, for warnings on mips and
sparc64 reported by Guenter Roeck"
* tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
|
|
Pull OpenRISC updates from Stafford Horne:
"I have relocated to London so not much work from me while I get
settled.
Still, OpenRISC picked up two patches in this window:
- Fix for kernel page table walking from Jann Horn
- MAINTAINER entry cleanup from Palmer Dabbelt"
* tag 'for-linus' of https://github.com/openrisc/linux:
MAINTAINERS: git://github -> https://github.com for openrisc
openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
"Revert the attempt to distribute spare resources to unconfigured
hotplug bridges at boot time.
This fixed some dock hot-add scenarios, but Jonathan Cameron reported
that it broke a topology with a multi-function device where one
function was a Switch Upstream Port and the other was an Endpoint"
* tag 'pci-v6.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI: Distribute available resources for root buses, too"
|
|
After commit d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than
order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2)
requests to buddy like SLUB does.
SLAB has been using kmalloc caches to allocate freelist_idx_t array for
off slab caches. But after the commit, freelist_size can be bigger than
KMALLOC_MAX_CACHE_SIZE.
Instead of using pointer to kmalloc cache, use kmalloc_node() and only
check if the kmalloc cache is off slab during calculate_slab_order().
If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition happens
as it allocates freelist_idx_t array directly from buddy.
Link: https://lore.kernel.org/all/[email protected]/
Reported-and-tested-by: Guenter Roeck <[email protected]>
Fixes: d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator")
Signed-off-by: Hyeonggon Yoo <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
Github deprecated the git:// links about a year ago, so let's move to
the https:// URLs instead.
Reported-by: Conor Dooley <[email protected]>
Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Stafford Horne <[email protected]>
|
|
Change notification is a commonly supported feature by most servers,
but the current ioctl to request notification when a directory is
changed does not return the information about what changed
(even though it is returned by the server in the SMB3 change
notify response), it simply returns when there is a change.
This ioctl improves upon CIFS_IOC_NOTIFY by returning the notify
information structure which includes the name of the file(s) that
changed and why. See MS-SMB2 2.2.35 for details on the individual
filter flags and the file_notify_information structure returned.
To use this simply pass in the following (with enough space
to fit at least one file_notify_information structure)
struct __attribute__((__packed__)) smb3_notify {
uint32_t completion_filter;
bool watch_tree;
uint32_t data_len;
uint8_t data[];
} __packed;
using CIFS_IOC_NOTIFY_INFO 0xc009cf0b
or equivalently _IOWR(CIFS_IOCTL_MAGIC, 11, struct smb3_notify_info)
The ioctl will block until the server detects a change to that
directory or its subdirectories (if watch_tree is set).
Acked-by: Paulo Alcantara (SUSE) <[email protected]>
Acked-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
cifs_open and _cifsFileInfo_put also end up with lease_key uninitialized
in smb1 mounts. It is cleaner to set lease key to zero in these
places where leases are not supported (smb1 can not return lease keys
so the field was uninitialized).
Addresses-Coverity: 1514207 ("Uninitialized scalar variable")
Addresses-Coverity: 1514331 ("Uninitialized scalar variable")
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|