path: root/arch
Age | Commit message | Author | Files | Lines
2024-09-16 | Merge tag 'net-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next | Linus Torvalds | 5 | -2/+29

Pull networking updates from Jakub Kicinski:

"The zero-copy changes are relatively significant, but regression risk should be contained. The feature needs to be used to cause trouble.

Also it feels like we got an order of magnitude more semi-automated "refactoring" chaff than usual, I wonder if it's just us.

Core & protocols:

 - Support Device Memory TCP, ability to zero-copy receive TCP payloads to a DMABUF region of memory while packet headers land separately in normal kernel buffers, and TCP processes them as usual.
 - The ability to read the PTP PHC (Physical Hardware Clock) alongside MONOTONIC_RAW timestamps with PTP_SYS_OFFSET_EXTENDED. Previously only CLOCK_REALTIME was supported.
 - Allow matching on all bits of IP DSCP for routing decisions. Previously we only supported matching on TOS bits in IPv4, which is a narrower interpretation of the same header field.
 - Increase the range of weights used for multi-path routing from 8 bits to 16 bits.
 - Add support for IPv6 PIO p flag in the Prefix Information Option per draft-ietf-6man-pio-pflag.
 - IPv6 IOAM6 support for new tunsrc encap mode for better performance.
 - Detect destinations which blackhole MPTCP traffic and avoid initiating MPTCP connections to them for a certain period of time, 1h by default.
 - Improve IPsec control path performance by removing the inexact policies list.
 - AF_VSOCK: add support for SIOCOUTQ ioctl.
 - Add enum for reasons TCP reset was sent for easier tracing.
 - Add SMC ringbufs usage statistics.

Drivers:

 - Handle netconsole setup failures more gracefully, don't fail loading, retain the specified target as disabled.
 - Extend bonding's IPsec offload pass thru capabilities (ESN, stats).

Filtering:

 - Add TCP_BPF_SOCK_OPS_CB_FLAGS to bpf_*sockopt() to address the case when long-lived sockets miss a chance to set additional callbacks if a sockops program was not attached early in their lifetime.
 - Support using BPF skb helpers in tracepoints.
 - Conntrack Netlink: support CTA_FILTER for flush.
 - Improve SCTP support in nfnetlink_queue.
 - Improve performance of large nftables flush transactions.

Things we sprinkled into general kernel code:

 - selftests: support setting an "interpreter" for script files; make it easy to run as separate cases tests where one "interpreter" is fed various test descriptions (in our case packet sequences).

Driver API:

 - Extend core and ethtool APIs to support many PHYs connected to a single interface (PHY topologies).
 - Extend cable diagnostics to specify whether Time Domain Reflectometry (TDR) or Active Link Cable Diagnostic (ALCD) was used.
 - Add library for implementing MAC-PHY Ethernet drivers for SPI devices compatible with Open Alliance 10BASE-T1x MAC-PHY Serial Interface (TC6) standard.
 - Add helpers to the PHY framework, for PHYs following the Open Alliance standards:
    - 1000BaseT1 link settings
    - cable test and diagnostics
 - Support listing / dumping all allocated RSS contexts.
 - Add configuration for frequency Embedded SYNC in DPLL, which magically embeds sync pulses into Ethernet signaling.

Device drivers:

 - Ethernet high-speed NICs:
    - Broadcom (bnxt):
       - use better FW APIs for queue reset
       - support QOS and TPID settings for the SR-IOV VLAN
       - support dynamic MSI-X allocation
    - Intel (100G, ice, idpf):
       - ice: support PCIe subfunctions
       - iavf: add support for TC U32 filters on VFs
       - ice: support Embedded SYNC in DPLL
    - nVidia/Mellanox (mlx5):
       - support HW managed steering tables
       - support PCIe PTM cross timestamping
    - AMD/Pensando:
       - ionic: use page_pool to increase Rx performance
    - Cisco (enic):
       - report per-queue statistics
 - Ethernet virtual:
    - Microsoft vNIC:
       - mana: support configuring ring length
       - netvsc: enable more channels on systems with many CPUs
    - IBM veth:
       - optimize polling to improve TCP_RR performance
       - optimize performance of Tx handling
    - VirtIO net:
       - synchronize the operstate with the admin state to allow a lower virtio-net to propagate the link status to an upper device like macvlan
 - Ethernet NICs consumer, and embedded:
    - Add driver for Realtek automotive PCIe devices (RTL9054, RTL9068, RTL9072, RTL9075, RTL9068, RTL9071)
    - Add driver for Microchip LAN8650/1 10BASE-T1S MAC-PHY.
    - Microchip:
       - lan743x: use phylink
       - support WOL, EEE, pause, link settings
       - add Wake-on-LAN support for KSZ87xx family
       - add KSZ8895/KSZ8864 switch support
       - factor out FDMA code and use it in sparx5 and lan966x (including DCB support in both)
    - Synopsys (stmmac):
       - support frame preemption (configured using TC and ethtool)
       - support Loongson DWMAC (GMAC v3.73)
       - support RockChips RK3576 DWMAC
    - TI:
       - am65-cpsw: add multi queue RX support
       - icssg-prueth: HSR offload support
    - Cadence (macb):
       - enable software (hrtimer based) IRQ coalescing by default
    - Xilinx (axinet):
       - expose HW statistics
       - improve multicast filtering
       - relax Rx checksum offload constraints
    - MediaTek:
       - mt7530: add EN7581 support
    - Aspeed (ftgmac100):
       - report link speed and duplex
    - Intel:
       - igc: add mqprio offload
       - igc: report EEE configuration
    - RealTek (r8169):
       - add support for RTL8126A rev.b
    - Vitesse (vsc73xx):
       - implement FDB add/del/dump operations
    - Freescale (fs_enet):
       - use phylink
 - Ethernet PHYs:
    - vitesse: implement downshift and MDI-X in vsc73xx PHYs
    - microchip: support LAN887x, supporting IEEE 802.3bw (100BASE-T1) and IEEE 802.3bp (1000BASE-T1) specifications
    - add Applied Micro QT2025 PHY driver (in Rust)
    - add Motorcomm yt8821 2.5G Ethernet PHY driver
 - CAN:
    - add driver for Rockchip RK3568 CAN-FD controller
    - flexcan: add wakeup support for imx95
    - kvaser_usb: set hardware timestamp on transmitted packets
 - WiFi:
    - mac80211/cfg80211:
       - EHT rate support in AQL airtime fairness
       - handle DFS (radar detection) per link in Multi-Link Operation
    - RealTek (rtw89):
       - support RTL8852BT and 8852BE-VT (WiFi 6)
       - support hardware rfkill
       - support HW encryption in unicast management frames
       - support Wake-on-WLAN with supported network detection
    - RealTek (rtw88):
       - improve Rx performance by using USB frame aggregation
       - support USB 3 with RTL8822CU/RTL8822BU
    - Intel (iwlwifi/mvm):
       - offload RLC/SMPS functionality to firmware
    - Marvell (mwifiex):
       - add host based MLME to enable WPA3
 - Bluetooth:
    - add support for Amlogic HCI UART protocol
    - add support for ISO data/packets to Intel and NXP drivers"

* tag 'net-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1303 commits)
  net/mlx5: HWS, check the correct variable in hws_send_ring_alloc_sq()
  netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()
  ice: Fix a NULL vs IS_ERR() check in probe()
  ice: Fix a couple NULL vs IS_ERR() bugs
  net: ethernet: fs_enet: Make the per clock optional
  net: ti: icssg-prueth: Add multicast filtering support in HSR mode
  net: ti: icssg-prueth: Enable HSR Tx duplication, Tx Tag and Rx Tag offload
  net: ti: icssg-prueth: Add support for HSR frame forward offload
  net: ti: icssg-prueth: Stop hardcoding def_inc
  net: ti: icss-iep: Move icss_iep structure
  net: ibm: emac: get rid of wol_irq
  net: ibm: emac: remove all waiting code
  net: ibm: emac: replace of_get_property
  net: ibm: emac: use netdev's phydev directly
  net: ibm: emac: use devm for register_netdev
  net: ibm: emac: remove mii_bus with devm
  net: ibm: emac: use devm for of_iomap
  net: ibm: emac: manage emac_irq with devm
  net: ibm: emac: use devm for alloc_etherdev
  octeontx2-af: debugfs: Add Channel info to RPM map
  ...
2024-09-15 | riscv: avoid Imbalance in RAS | Jisheng Zhang | 1 | -2/+2

Inspired by [1], remove the code that modifies ra, to avoid an imbalanced RAS (return address stack), which may lead to incorrect predictions on return.

Link: https://lore.kernel.org/linux-riscv/[email protected]/ [1]
Signed-off-by: Jisheng Zhang <[email protected]>
Reviewed-by: Cyril Bur <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-15 | Merge patch series "Svvptc extension to remove preventive sfence.vma" | Palmer Dabbelt | 9 | -1/+145

Alexandre Ghiti <[email protected]> says:

In RISC-V, after a new mapping is established, a sfence.vma needs to be emitted for different reasons:

- if the uarch caches invalid entries, we need to invalidate it otherwise we would trap on this invalid entry,
- if the uarch does not cache invalid entries, a reordered access could fail to see the new mapping and then trap (sfence.vma acts as a fence).

We can actually avoid emitting those (mostly) useless and costly sfence.vma by handling the traps instead:

- for new kernel mappings: only vmalloc mappings need to be taken care of, other new mappings are rare and already emit the required sfence.vma if needed. That must be achieved very early in the exception path as explained in patch 3, and this also fixes our fragile way of dealing with vmalloc faults.
- for new user mappings: Svvptc makes update_mmu_cache() a no-op but we can take some gratuitous page faults (which are very unlikely though).

Patch 1 and 2 introduce Svvptc extension probing.

On our uarch that does not cache invalid entries and a 6.5 kernel, the gains are measurable:

* Kernel boot: 6%
* ltp - mmapstress01: 8%
* lmbench - lat_pagefault: 20%
* lmbench - lat_mmap: 5%

Here are the corresponding numbers of sfence.vma emitted:

* Ubuntu boot to login:
  Before: ~630k sfence.vma
  After: ~200k sfence.vma
* ltp - mmapstress01
  Before: ~45k
  After: ~6.3k
* lmbench - lat_pagefault
  Before: ~665k
  After: 832 (!)
* lmbench - lat_mmap
  Before: ~546k
  After: 718 (!)

Thanks to Ved and Matt Evans for triggering the discussion that led to this patchset!

* b4-shazam-merge:
  riscv: Stop emitting preventive sfence.vma for new userspace mappings with Svvptc
  riscv: Stop emitting preventive sfence.vma for new vmalloc mappings
  dt-bindings: riscv: Add Svvptc ISA extension description
  riscv: Add ISA extension parsing for Svvptc

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-15 | riscv: cacheinfo: Add back init_cache_level() function | Steffen Persvold | 1 | -0/+5
commit 5944ce092b97 (arch_topology: Build cacheinfo from primary CPU) removed the init_cache_level() function from arch/riscv/kernel/cacheinfo.c and relies on the init_cpu_topology() function in drivers/base/arch_topology.c to call fetch_cache_info() which in turn calls init_of_cache_level() to populate the cache hierarchy information. However, init_cpu_topology() is only called from smpboot.c:smp_prepare_cpus() and thus only available when CONFIG_SMP is defined. To support non-SMP enabled kernels to still detect cache hierarchy, we add back the init_cache_level() function. The init_level_allocate_ci() function handles this gracefully on SMP-enabled kernels anyway where fetch_cache_info() is called from init_cpu_topology() earlier in the boot phase. Signed-off-by: Steffen Persvold <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-15 | riscv: Remove unused _TIF_WORK_MASK | Jinjie Ruan | 1 | -4/+0
Since commit f0bddf50586d ("riscv: entry: Convert to generic entry"), _TIF_WORK_MASK is no longer used, so remove it. Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry") Signed-off-by: Jinjie Ruan <[email protected]> Reviewed-by: Guo Ren <[email protected]> Reviewed-by: Andy Chiu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-15 | Merge patch series "riscv: select ARCH_USE_SYM_ANNOTATIONS" | Palmer Dabbelt | 2 | -4/+5

Jisheng Zhang <[email protected]> says:

Since commit 76329c693924 ("riscv: Use SYM_*() assembly macros instead of deprecated ones"), most of riscv has been converted to the new style SYM_ assembler annotations. The remaining one is sifive's errata_cip_453.S, so convert it to new style SYM_ annotations as well. After that, select ARCH_USE_SYM_ANNOTATIONS.

* b4-shazam-merge:
  riscv: select ARCH_USE_SYM_ANNOTATIONS
  riscv: errata: sifive: Use SYM_*() assembly macros

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-15 | Merge patch series "riscv: stacktrace: Add USER_STACKTRACE support" | Palmer Dabbelt | 3 | -43/+47
Jinjie Ruan <[email protected]> says: Add RISC-V USER_STACKTRACE support, and fix the fp alignment bug in perf_callchain_user() by the way as Björn pointed out. * b4-shazam-merge: riscv: stacktrace: Add USER_STACKTRACE support riscv: Fix fp alignment bug in perf_callchain_user() Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-15 | riscv: define ILLEGAL_POINTER_VALUE for 64bit | Jisheng Zhang | 1 | -0/+5

This is used in poison.h for the poison pointer offset. Based on the current SV39, SV48 and SV57 vm layouts, 0xdead000000000000 is a proper value that is not mappable; this avoids potentially turning an oops into an exploit.

Signed-off-by: Jisheng Zhang <[email protected]>
Fixes: fbe934d69eb7 ("RISC-V: Build Infrastructure")
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
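Why the value is "not mappable" can be illustrated with a small check. Under SV39, SV48 and SV57, a virtual address is only valid when its upper bits are a sign extension of bit 38, 47 or 56 respectively. The sketch below is an illustration only: `is_canonical()` is a hypothetical helper name, not a kernel function, and the constant is copied from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in the commit message above. */
#define ILLEGAL_POINTER_VALUE 0xdead000000000000ULL

/*
 * Hypothetical helper (not kernel code): returns true when 'addr' is a
 * canonical virtual address for a paging mode with 'va_bits' address bits,
 * i.e. bits 63..va_bits-1 are all zero or all one.
 */
static bool is_canonical(uint64_t addr, unsigned int va_bits)
{
	uint64_t upper = addr >> (va_bits - 1);          /* bits 63..va_bits-1 */
	uint64_t all_ones = UINT64_MAX >> (va_bits - 1); /* same width, all set */

	return upper == 0 || upper == all_ones;
}
```

Calling `is_canonical(ILLEGAL_POINTER_VALUE, 39)`, and likewise with 48 and 57, returns false in each case, so a dereference of a poisoned pointer faults instead of landing in a mappable range.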
2024-09-15 | Merge tag 'for-linus-6.11' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 2 | -11/+7
Pull kvm fix from Paolo Bonzini: "Do not always honor guest PAT on CPUs that support self-snoop. This triggers an issue in the bochsdrm driver, which used ioremap() instead of ioremap_wc() to map the video RAM. The revert lets video RAM use the WB memory type instead of the slower UC memory type" * tag 'for-linus-6.11' of git://git.kernel.org/pub/scm/virt/kvm/kvm: Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"
2024-09-15 | riscv: Stop emitting preventive sfence.vma for new userspace mappings with Svvptc | Alexandre Ghiti | 2 | -1/+28

The preventive sfence.vma were emitted because new mappings must be made visible to the page table walker, but Svvptc guarantees that this will happen within a bounded timeframe. There is therefore no need to emit sfence.vma on the uarchs that implement this extension; we will then take gratuitous (but very unlikely) page faults, similarly to x86 and arm64.

This drastically reduces the number of sfence.vma emitted:

* Ubuntu boot to login:
  Before: ~630k sfence.vma
  After: ~200k sfence.vma
* ltp - mmapstress01
  Before: ~45k
  After: ~6.3k
* lmbench - lat_pagefault
  Before: ~665k
  After: 832 (!)
* lmbench - lat_mmap
  Before: ~546k
  After: 718 (!)

Signed-off-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-15 | riscv: Stop emitting preventive sfence.vma for new vmalloc mappings | Alexandre Ghiti | 5 | -1/+120

In 6.5, we removed the vmalloc fault path because that can't work (see [1] [2]). Then in order to make sure that new page table entries were seen by the page table walker, we had to preventively emit a sfence.vma on all harts [3], but this solution is very costly since it relies on IPI.

And even there, we could end up in a loop of vmalloc faults if a vmalloc allocation is done in the IPI path (for example if it is traced, see [4]), which could result in a kernel stack overflow.

Those preventive sfence.vma needed to be emitted because:

- if the uarch caches invalid entries, the new mapping may not be observed by the page table walker and an invalidation may be needed.
- if the uarch does not cache invalid entries, a reordered access could "miss" the new mapping and trap: in that case, we would actually only need to retry the access, no sfence.vma is required.

So this patch removes those preventive sfence.vma and actually handles the possible (and unlikely) exceptions. And since the kernel stack mappings lie in the vmalloc area, this handling must be done very early when the trap is taken, at the very beginning of handle_exception: this also rules out vmalloc allocations in the fault path.

Link: https://lore.kernel.org/linux-riscv/[email protected]/ [1]
Link: https://lore.kernel.org/linux-riscv/[email protected] [2]
Link: https://lore.kernel.org/linux-riscv/[email protected]/ [3]
Link: https://lore.kernel.org/lkml/[email protected]/ [4]
Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Yunhui Cui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-15 | riscv: Add ISA extension parsing for Svvptc | Alexandre Ghiti | 2 | -0/+2
Add support to parse the Svvptc string in the riscv,isa string. Signed-off-by: Alexandre Ghiti <[email protected]> Reviewed-by: Conor Dooley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-15 | riscv: select ARCH_USE_SYM_ANNOTATIONS | Jisheng Zhang | 1 | -0/+1
Now, riscv has been converted to the new style SYM_ assembler annotations. So select ARCH_USE_SYM_ANNOTATIONS to ensure the deprecated macros such as ENTRY(), END(), WEAK() and so on are not available and we don't regress. Signed-off-by: Jisheng Zhang <[email protected]> Reviewed-By: Clément Léger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-15 | riscv: errata: sifive: Use SYM_*() assembly macros | Jisheng Zhang | 1 | -4/+4
ENTRY()/END() macros are deprecated and we should make use of the new SYM_*() macros [1] for better annotation of symbols. Replace the deprecated ones with the new ones. [1] https://docs.kernel.org/core-api/asm-annotations.html Signed-off-by: Jisheng Zhang <[email protected]> Reviewed-By: Clément Léger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-14 | riscv: stacktrace: Add USER_STACKTRACE support | Jinjie Ruan | 3 | -43/+47

Currently, userstacktrace is unsupported for riscv. So use the perf_callchain_user() code as a blueprint to implement arch_stack_walk_user(), which adds userstacktrace support on riscv. Meanwhile, we can use arch_stack_walk_user() to simplify the implementation of perf_callchain_user().

A ftrace test case is shown below:

  # cd /sys/kernel/debug/tracing
  # echo 1 > options/userstacktrace
  # echo 1 > options/sym-userobj
  # echo 1 > events/sched/sched_process_fork/enable
  # cat trace
  ......
  bash-178 [000] ...1. 97.968395: sched_process_fork: comm=bash pid=178 child_comm=bash child_pid=231
  bash-178 [000] ...1. 97.970075: <user stack trace>
   => /lib/libc.so.6[+0xb5090]

A simple perf test also works, as shown below:

  # perf record -e cpu-clock --call-graph fp top
  # perf report --call-graph
  .....
      66.54%     0.00%  top      [kernel.kallsyms]  [k] ret_from_exception
              |
              ---ret_from_exception
                 |
                 |--58.97%--do_trap_ecall_u
                 |          |
                 |          |--17.34%--__riscv_sys_read
                 |          |          ksys_read
                 |          |          |
                 |          |           --16.88%--vfs_read
                 |          |                     |
                 |          |                     |--10.90%--seq_read

Signed-off-by: Jinjie Ruan <[email protected]>
Tested-by: Jinjie Ruan <[email protected]>
Cc: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-14 | riscv: Fix fp alignment bug in perf_callchain_user() | Jinjie Ruan | 1 | -1/+1

The standard RISC-V calling convention says: "The stack grows downward and the stack pointer is always kept 16-byte aligned". So perf_callchain_user() should check whether fp is 16-byte aligned.

Link: https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf
Fixes: dbeb90b0c1eb ("riscv: Add perf callchain support")
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
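The alignment rule quoted above can be expressed as a mask test. This is only a sketch of a 16-byte alignment check as the message describes it. The one-line diff itself is not shown in this log, so the exact mask used in perf_callchain_user() is not confirmed here, and `fp_is_aligned()` and `DEMO_STACK_ALIGN` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Alignment stated by the calling convention text quoted above. */
#define DEMO_STACK_ALIGN 16UL

/*
 * Hypothetical helper: a frame pointer is worth following only if it is
 * non-NULL and a multiple of the stack alignment.
 */
static bool fp_is_aligned(uintptr_t fp)
{
	return fp != 0 && (fp & (DEMO_STACK_ALIGN - 1)) == 0;
}
```

A user stack walker would typically stop unwinding as soon as such a check fails, since a misaligned fp suggests the frame chain is corrupt or not a frame chain at all.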
2024-09-15 | Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop" | Paolo Bonzini | 2 | -11/+7
This reverts commit 377b2f359d1f71c75f8cc352b5c81f2210312d83. This caused a regression with the bochsdrm driver, which used ioremap() instead of ioremap_wc() to map the video RAM. After the commit, the WB memory type is used without the IGNORE_PAT, resulting in the slower UC memory type. In fact, UC is slow enough to basically cause guests to not boot... but only on new processors such as Sapphire Rapids and Cascade Lake. Coffee Lake for example works properly, though that might also be an effect of being on a larger, more NUMA system. The driver has been fixed but that does not help older guests. Until we figure out whether Cascade Lake and newer processors are working as intended, revert the commit. Long term we might add a quirk, but the details depend on whether the processors are working as intended: for example if they are, the quirk might reference bochs-compatible devices, e.g. in the name and documentation, so that userspace can disable the quirk by default and only leave it enabled if such a device is being exposed to the guest. If instead this is actually a bug in CLX+, then the actions we need to take are different and depend on the actual cause of the bug. Signed-off-by: Paolo Bonzini <[email protected]>
2024-09-15 | Merge tag 'kvm-riscv-6.12-1' of https://github.com/kvm-riscv/linux into HEAD | Paolo Bonzini | 3 | -18/+21

KVM/riscv changes for 6.12

- Fix sbiret init before forwarding to userspace
- Don't zero-out PMU snapshot area before freeing data
- Allow legacy PMU access from guest
- Fix to allow hpmcounter31 from the guest
2024-09-15 | Merge tag 'loongarch-kvm-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD | Paolo Bonzini | 84 | -311/+1118

LoongArch KVM changes for v6.12

1. Revert qspinlock to test-and-set simple lock on VM.
2. Add Loongson Binary Translation extension support.
3. Add PMU support for guest.
4. Enable paravirt feature control from VMM.
5. Implement function kvm_para_has_feature().
2024-09-14 | riscv: Remove redundant restriction on memory size | Stuart Menefy | 1 | -7/+1
The original reason for reserving the top 4GiB of the direct map (space for modules/BPF/kernel) hasn't applied since the address map was reworked for KASAN. Signed-off-by: Stuart Menefy <[email protected]> Reviewed-by: Alexandre Ghiti <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-14 | riscv: vdso: do not strip debugging info for vdso.so.dbg | Changbin Du | 1 | -1/+1
The vdso.so.dbg is a debug version of vdso and could be used for debugging purpose. For example, perf-annotate requires debugging info to show source lines. So let's keep its debugging info. Signed-off-by: Changbin Du <[email protected]> Reviewed-by: Cyril Bur <[email protected]> Reviewed-by: Alexandre Ghiti <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-09-13 | MIPS: Remove the obsoleted code for include/linux/mv643xx.h | Gaosheng Cui | 1 | -2/+5

Most of the drivers which used this header have been deleted, and most of this code is obsolete. Move the only defines that are actually used into arch/powerpc/platforms/chrp/pegasos_eth.c and delete the file completely.

Signed-off-by: Gaosheng Cui <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-09-13 | s390/vdso: Wire up getrandom() vdso implementation | Heiko Carstens | 10 | -8/+289
Provide the s390 specific vdso getrandom() architecture backend. _vdso_rng_data required data is placed within the _vdso_data vvar page, by using a hardcoded offset larger than vdso_data. As required the chacha20 implementation does not write to the stack. The implementation follows more or less the arm64 implementations and makes use of vector instructions. It has a fallback to the getrandom() system call for machines where the vector facility is not installed. The check if the vector facility is installed, as well as an optimization for machines with the vector-enhancements facility 2, is implemented with alternatives, avoiding runtime checks. Note that __kernel_getrandom() is implemented without the vdso user wrapper which would setup a stack frame for odd cases (aka very old glibc variants) where the caller has not done that. All callers of __kernel_getrandom() are required to setup a stack frame, like the C ABI requires it. The vdso testcases vdso_test_getrandom and vdso_test_chacha pass. Benchmark on a z16: $ ./vdso_test_getrandom bench-single vdso: 25000000 times in 0.493703559 seconds syscall: 25000000 times in 6.584025337 seconds Signed-off-by: Heiko Carstens <[email protected]> Reviewed-by: Harald Freudenberger <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13 | arch/sparc: remove unused varible paddrbase in function leon_swprobe() | Alex Shi | 1 | -7/+1

commit f22ed71cd602 ("sparc32,leon: SRMMU MMU Table probe fix") changed the return value from paddrbase to 'pte', but left the variable in place. That causes a build warning for this variable, so remove it.

  make --keep-going CROSS_COMPILE=/home/alexs/0day/gcc-14.1.0-nolibc/sparc-linux/bin/sparc-linux- --jobs=16 KCFLAGS= -Wtautological-compare -Wno-error=return-type -Wreturn-type -Wcast-function-type -funsigned-char -Wundef -fstrict-flex-arrays=3 -Wformat-overflow -Wformat-truncation -Wrestrict -Wenum-conversion W=1 O=sparc ARCH=sparc defconfig SHELL=/bin/bash arch/sparc/mm/ mm/ -s
  <stdin>:1519:2: warning: #warning syscall clone3 not implemented [-Wcpp]
  ../arch/sparc/mm/leon_mm.c: In function 'leon_swprobe':
  ../arch/sparc/mm/leon_mm.c:42:32: warning: variable 'paddrbase' set but not used [-Wunused-but-set-variable]
     42 |         unsigned int lvl, pte, paddrbase;
        |                                ^~~~~~~~~

Signed-off-by: Alex Shi <[email protected]>
To: [email protected]
To: [email protected]
To: Christian Brauner <[email protected]>
To: Andreas Larsson <[email protected]>
To: David S. Miller <[email protected]>
Reviewed-by: Andreas Larsson <[email protected]>
Tested-by: Andreas Larsson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Andreas Larsson <[email protected]>
2024-09-13 | s390/vdso: Move vdso symbol handling to separate header file | Heiko Carstens | 4 | -14/+19
The vdso.h header file, which is included at many places, includes generated header files. This can easily lead to recursive header file inclusions if the vdso code is changed. Therefore move the vdso symbol code, which requires the generated header files, to a separate header file, and include it at the two locations which require it. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13 | s390/vdso: Allow alternatives in vdso code | Heiko Carstens | 2 | -0/+23
Implement the infrastructure required to allow alternatives in vdso code. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13 | s390/module: Provide find_section() helper | Heiko Carstens | 1 | -0/+14
Provide find_section() helper function which can be used to find a section by name, similar to other architectures. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
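As a rough picture of what "find a section by name" involves: walk the section headers and compare each name, looked up in the section-name string table, with the requested one. The sketch below uses a simplified stand-in structure instead of the real ELF section header type, and every name in it (`demo_shdr`, `demo_find_section`) is hypothetical. It is not the s390 helper itself.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an ELF section header (assumption for illustration). */
struct demo_shdr {
	size_t name_off; /* offset of the section name in the string table */
};

/*
 * Hypothetical lookup: return the first section whose name equals 'name',
 * or NULL when no section matches.
 */
static const struct demo_shdr *demo_find_section(const struct demo_shdr *shdrs,
						 size_t nr,
						 const char *strtab,
						 const char *name)
{
	for (size_t i = 0; i < nr; i++) {
		if (strcmp(strtab + shdrs[i].name_off, name) == 0)
			return &shdrs[i];
	}
	return NULL;
}
```

Returning NULL for a missing section lets a caller treat optional sections (for example one holding alternatives) as simply absent rather than as an error.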
2024-09-13 | s390/facility: Let test_facility() generate static branch if possible | Heiko Carstens | 1 | -8/+29
Let test_facility() generate a branch instruction if the tested facility is a constant, and where the result cannot be evaluated during compile time. The branch instruction defaults to "false" and is patched to nop (branch not taken) if the tested facility is available. This avoids runtime checks and is similar to x86's static_cpu_has() and arm64's alternative_has_cap_likely(). Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13 | s390/alternatives: Remove ALT_FACILITY_EARLY | Heiko Carstens | 2 | -6/+2
Patch all alternatives which depend on facilities from the decompressor. There is no technical reason which enforces to split patching of such alternatives to the decompressor and the kernel. This simplifies alternative handling a bit, since one alternative type is removed. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13 | s390/facility: Disable compile time optimization for decompressor code | Heiko Carstens | 1 | -2/+4
Disable compile time optimizations of test_facility() for the decompressor. The decompressor should not contain any optimized code depending on the architecture level set the kernel image is compiled for to avoid unexpected operation exceptions. Add a __DECOMPRESSOR check to test_facility() to enforce that facilities are always checked during runtime for the decompressor. Reviewed-by: Sven Schnelle <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
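For orientation, a runtime facility check boils down to testing one bit in a facility bitmap. The sketch below assumes that bits are numbered from the most significant bit of each byte, which is an assumption made for this illustration. `demo_test_facility()` is a hypothetical name and the function is not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical runtime check: facility 'nr' is available when its bit is
 * set in the bitmap. Bit 0 is assumed to be the most significant bit of
 * byte 0 (MSB-first numbering).
 */
static bool demo_test_facility(const uint8_t *facilities, unsigned int nr)
{
	return (facilities[nr >> 3] & (0x80u >> (nr & 7))) != 0;
}
```

A check of this kind always reads the bitmap at run time, which is the behaviour the commit wants for the decompressor, as opposed to a result folded in at compile time from the architecture level the image was built for.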
2024-09-13 | powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64 | Christophe Leroy | 5 | -3/+69
Extend getrandom() vDSO implementation to VDSO64. Tested on QEMU on both ppc64_defconfig and ppc64le_defconfig. Results from a Power9 (PowerNV): ~ # ./vdso_test_getrandom bench-single    vdso: 25000000 times in 0.787943615 seconds    libc: 25000000 times in 14.101887252 seconds    syscall: 25000000 times in 14.047475082 seconds Signed-off-by: Christophe Leroy <[email protected]> Tested-by: Madhavan Srinivasan <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13 | powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32 | Christophe Leroy | 11 | -4/+453

To be consistent with other VDSO functions, the function is called __kernel_getrandom().

The __arch_chacha20_blocks_nostack() function is implemented basically with 32-bit operations. It performs 4 QUARTERROUND operations in parallel. There are enough registers to avoid using the stack:

On input:
	r3: output bytes
	r4: 32-byte key input
	r5: 8-byte counter input/output
	r6: number of 64-byte blocks to write to output

During operation:
	stack: pointer to counter (r5) and non-volatile registers (r14-r31)
	r0: counter of blocks (initialised with r6)
	r4: Value '4' after key has been read, used for indexing
	r5-r12: key
	r14-r15: block counter
	r16-r31: chacha state

At the end:
	r0, r6-r12: Zeroised
	r5, r14-r31: Restored

Performance on powerpc 885 (using kernel selftest):
	~# ./vdso_test_getrandom bench-single
	   vdso: 25000000 times in 62.938002291 seconds
	   libc: 25000000 times in 535.581916866 seconds
	syscall: 25000000 times in 531.525042806 seconds

Performance on powerpc 8321 (using kernel selftest):
	~# ./vdso_test_getrandom bench-single
	   vdso: 25000000 times in 16.899318858 seconds
	   libc: 25000000 times in 131.050596522 seconds
	syscall: 25000000 times in 129.794790389 seconds

This first patch adds support for VDSO32. As selftests cannot easily be generated only for VDSO32, and because the following patch brings support for VDSO64 anyway, this patch opts out all code in __arch_chacha20_blocks_nostack() so that vdso_test_chacha will not fail to compile and will not crash on PPC64/PPC64LE, although the selftest itself will fail.

Signed-off-by: Christophe Leroy <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
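For readers unfamiliar with the QUARTERROUND operation mentioned above, here is the standard ChaCha quarter round written with plain 32-bit operations in C. This is an illustration of the primitive only, not the powerpc assembly from the patch. `rotl32()` and `demo_quarterround()` are names chosen for this sketch.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/*
 * One ChaCha quarter round on four 32-bit state words: additions,
 * XORs and rotations by 16, 12, 8 and 7 bits.
 */
static void demo_quarterround(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
	*a += *b; *d ^= *a; *d = rotl32(*d, 16);
	*c += *d; *b ^= *c; *b = rotl32(*b, 12);
	*a += *b; *d ^= *a; *d = rotl32(*d, 8);
	*c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

The 16-word ChaCha state is a 4x4 matrix, and a round applies this operation to its four columns or its four diagonals. Those four applications are independent of each other, which is why an implementation with enough registers can interleave them, as the message describes.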
2024-09-13 | powerpc/vdso: Refactor CFLAGS for CVDSO build | Christophe Leroy | 1 | -19/+13

In order to avoid too much duplication when we add new VDSO functionalities in C like getrandom, refactor common CFLAGS.

Signed-off-by: Christophe Leroy <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13 | powerpc/vdso32: Add crtsavres | Christophe Leroy | 2 | -14/+4
Commit 08c18b63d965 ("powerpc/vdso32: Add missing _restgpr_31_x to fix build failure") added _restgpr_31_x to the vdso for gettimeofday, but the work on getrandom shows that we will need more of those functions. Remove _restgpr_31_x and link in crtsavres.o so that we get all save/restore functions when optimising the kernel for size. Signed-off-by: Christophe Leroy <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13powerpc/vdso: Fix VDSO data access when running in a non-root time namespaceChristophe Leroy4-3/+20
When running in a non-root time namespace, the global VDSO data page is replaced by a dedicated namespace data page and the global data page is mapped next to it. Detailed explanations can be found at commit 660fd04f9317 ("lib/vdso: Prepare for time namespace support"). When it happens, __kernel_get_syscall_map and __kernel_get_tbfreq and __kernel_sync_dicache don't work anymore because they read 0 instead of the data they need. To address that, clock_mode has to be read. When it is set to VDSO_CLOCKMODE_TIMENS, it means it is a dedicated namespace data page and the global data is located on the following page. Add a macro called get_realdatapage which reads clock_mode and add PAGE_SIZE to the pointer provided by get_datapage macro when clock_mode is equal to VDSO_CLOCKMODE_TIMENS. Use this new macro instead of get_datapage macro except for time functions as they handle it internally. Fixes: 74205b3fc2ef ("powerpc/vdso: Add support for time namespaces") Reported-by: Jason A. Donenfeld <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Christophe Leroy <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
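The page selection described above can be sketched in C. The real get_realdatapage is an assembler macro; the struct layout, the 4096-byte page size, the tb_freq field and every sketch_ name below are assumptions made for illustration, and SKETCH_CLOCKMODE_TIMENS is only a stand-in for VDSO_CLOCKMODE_TIMENS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u               /* assumed page size */
#define SKETCH_CLOCKMODE_TIMENS 0x7fffffff   /* stand-in marker value */

/* Hypothetical data page: a clock_mode word followed by payload. */
struct sketch_page {
    int32_t clock_mode;
    uint32_t tb_freq;                        /* hypothetical payload field */
    unsigned char pad[SKETCH_PAGE_SIZE - 8];
};

static_assert(sizeof(struct sketch_page) == SKETCH_PAGE_SIZE,
              "the sketch assumes one struct per page");

/* Return the page holding the global data: if the mapped page is a
 * time-namespace page, the global data lives on the following page. */
const struct sketch_page *sketch_get_realdatapage(const struct sketch_page *mapped)
{
    assert(mapped != NULL);
    if (mapped->clock_mode == SKETCH_CLOCKMODE_TIMENS)
        return (const struct sketch_page *)
               ((const unsigned char *)mapped + SKETCH_PAGE_SIZE);
    return mapped;
}
```

In the root time namespace the first page is already the global one, so the helper returns its argument unchanged.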
2024-09-13arm64: vDSO: Wire up getrandom() vDSO implementationAdhemerval Zanella9-15/+279
Hook up the generic vDSO implementation to the aarch64 vDSO data page. The _vdso_rng_data required data is placed within the _vdso_data vvar page, by using an offset larger than the vdso_data. The vDSO function requires a ChaCha20 implementation that does not write to the stack, and that can do an entire ChaCha20 permutation. The one provided uses NEON on the permute operation, with a fallback to the syscall for chips that do not support AdvSIMD. This also passes the vdso_test_chacha test along with vdso_test_getrandom. The vdso_test_getrandom bench-single result on Neoverse-N1 shows: vdso: 25000000 times in 0.783884250 seconds libc: 25000000 times in 8.780275399 seconds syscall: 25000000 times in 8.786581518 seconds A small fixup to arch/arm64/include/asm/mman.h was required to avoid pulling kernel code into the vDSO, similar to what's already done in arch/arm64/include/asm/rwonce.h. Signed-off-by: Adhemerval Zanella <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13arm64: alternative: make alternative_has_cap_likely() VDSO compatibleMark Rutland1-0/+4
Currently alternative_has_cap_unlikely() can be used in VDSO code, but alternative_has_cap_likely() cannot as it references alt_cb_patch_nops, which is not available when linking the VDSO. This is unfortunate as it would be useful to have alternative_has_cap_likely() available in VDSO code. The use of alt_cb_patch_nops was added in commit: d926079f17bf8aa4 ("arm64: alternatives: add shared NOP callback") ... as removing duplicate NOPs within the kernel Image saved a reasonable amount of space. Given the VDSO code will have nowhere near as many alternative branches as the main kernel image, this isn't much of a concern, and a few extra nops isn't a massive problem. Change alternative_has_cap_likely() to only use alt_cb_patch_nops for the main kernel image, and allow duplicate NOPs in VDSO code. Signed-off-by: Mark Rutland <[email protected]> Signed-off-by: Adhemerval Zanella <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13LoongArch: vDSO: Wire up getrandom() vDSO implementationXi Ruoyao9-1/+314
Hook up the generic vDSO implementation to the LoongArch vDSO data page by providing the required __arch_chacha20_blocks_nostack, __arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also wire up the selftests. Signed-off-by: Xi Ruoyao <[email protected]> Acked-by: Huacai Chen <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13random: vDSO: add a __vdso_getrandom prototype for all architecturesXi Ruoyao1-2/+0
Without a prototype, we'll have to add a prototype for each architecture implementing vDSO getrandom. As most architectures will likely have the vDSO getrandom implemented in the near future, and we'd like to keep the declarations compatible everywhere (to ease the libc implementor work), we should really just have one copy of the prototype. This also is what's already done inside of include/vdso/gettime.h for those vDSO functions, so this continues that convention. Suggested-by: Huacai Chen <[email protected]> Signed-off-by: Xi Ruoyao <[email protected]> Acked-by: Huacai Chen <[email protected]> [Jason: rewrite docbook comment for prototype.] Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13random: vDSO: minimize and simplify header includesChristophe Leroy1-0/+1
Depending on the architecture, building a 32-bit vDSO on a 64-bit kernel is problematic when some system headers are included. Minimise the amount of headers by moving needed items, such as __{get,put}_unaligned_t, into dedicated common headers and in general use more specific headers, similar to what was done in commit 8165b57bca21 ("linux/const.h: Extract common header for vDSO") and commit 8c59ab839f52 ("lib/vdso: Enable common headers"). On some architectures this results in missing PAGE_SIZE, as was described by commit 8b3843ae3634 ("vdso/datapage: Quick fix - use asm/page-def.h for ARM64"), so define this if necessary, in the same way as done prior by commit cffaefd15a8f ("vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.h"). Removing linux/time64.h leads to missing 'struct timespec64' in x86's asm/pvclock.h. Add a forward declaration of that struct in that file. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13random: vDSO: add __arch_get_k_vdso_rng_data() helper for data page accessChristophe Leroy2-3/+10
_vdso_data is specific to x86 and __arch_get_k_vdso_data() is provided so that all architectures can provide the requested pointer. Do the same with _vdso_rng_data, provide __arch_get_k_vdso_rng_data() and don't use x86 _vdso_rng_data directly. Until now vdso/vsyscall.h was only included by time/vsyscall.c but now it will also be included in char/random.c, leading to a duplicate declaration of _vdso_data and _vdso_rng_data. To fix this issue, move the declaration into a C file. vma.c looks like the most appropriate candidate. We don't need to replace the definitions in vsyscall.h by declarations as declarations are already in asm/vvar.h. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13random: vDSO: move prototype of arch chacha function to vdso/getrandom.hJason A. Donenfeld1-13/+0
Having the prototype for __arch_chacha20_blocks_nostack in arch/x86/include/asm/vdso/getrandom.h meant that the prototype and large doc comment were cloned by every architecture, which has been causing unnecessary churn. Instead move it into include/vdso/getrandom.h, where it can be shared by all archs implementing it. As a side bonus, this then lets us use that prototype in the vdso_test_chacha self test, to ensure that it matches the source, and indeed doing so turned up some inconsistencies, which are rectified here. Suggested-by: Christophe Leroy <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-09-13xtensa: Emulate one-byte cmpxchgPaul E. McKenney2-0/+3
Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa. [ paulmck: Apply kernel test robot feedback. ] [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ] [ Apply Geert Uytterhoeven feedback. ] Signed-off-by: Paul E. McKenney <[email protected]> Tested-by: Yujie Liu <[email protected]> Cc: Andi Shyti <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]>
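The commit message does not show how a one-byte cmpxchg can be built from a wider one. One common approach is to loop on a 32-bit compare-and-exchange of the aligned word containing the byte. The following is a userspace sketch of that idea using C11 atomics; it is not the kernel's cmpxchg_emu_u8() code, and sketch_cmpxchg_u8 and its word-plus-index interface are assumptions made to keep the example well-defined.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Compare-and-exchange byte idx (in memory order) of *word.
 * Returns the byte value observed; the store happened only if the
 * returned value equals old. */
uint8_t sketch_cmpxchg_u8(_Atomic uint32_t *word, unsigned int idx,
                          uint8_t old, uint8_t new_val)
{
    uint32_t cur, next;
    uint8_t bytes[4];

    assert(idx < 4);
    cur = atomic_load(word);
    do {
        /* Split the current word into bytes, independent of endianness. */
        memcpy(bytes, &cur, sizeof(bytes));
        if (bytes[idx] != old)
            return bytes[idx];      /* mismatch: report what was seen */
        bytes[idx] = new_val;
        memcpy(&next, bytes, sizeof(next));
        /* On failure cur is refreshed with the latest word and the
         * loop retries, so concurrent updates to the three
         * neighbouring bytes are preserved. */
    } while (!atomic_compare_exchange_weak(word, &cur, next));

    return old;
}
```

The retry loop is what makes the emulation safe against writers that change the other three bytes of the word between the load and the exchange.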
2024-09-13sh: Emulate one-byte cmpxchgPaul E. McKenney2-0/+4
Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on sh. [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ] [ paulmck: Apply feedback from Naresh Kamboju. ] [ Apply Geert Uytterhoeven feedback. ] Signed-off-by: Paul E. McKenney <[email protected]> Cc: Andi Shyti <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: <[email protected]> Acked-by: John Paul Adrian Glaubitz <[email protected]>
2024-09-13ARC: Emulate one-byte cmpxchgPaul E. McKenney2-2/+5
Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on arc. [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ] [ paulmck: Apply feedback from Naresh Kamboju. ] [ paulmck: Apply kernel test robot feedback. ] [ paulmck: Apply feedback from Vineet Gupta. ] Signed-off-by: Paul E. McKenney <[email protected]> Cc: Andi Shyti <[email protected]> Cc: Andrzej Hajda <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: <[email protected]> Acked-by: Vineet Gupta <[email protected]>
2024-09-13crypto: mips/crc32 - Clean up useless assignment operationsWangYuli1-33/+37
When entering the "len & sizeof(u32)" branch, len must be less than 8. So after one operation, len must be less than 4. At this time, "len -= sizeof(u32)" is not necessary for 64-bit CPUs. While at it, replace `while' loops with equivalent `for' loops to improve the code structure a little. Suggested-by: Maciej W. Rozycki <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Suggested-by: Herbert Xu <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Guan Wentao <[email protected]> Signed-off-by: WangYuli <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
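The reasoning about the tail can be sketched in plain C. After the 8-byte loop the remaining length is below 8, so its low three bits directly say which 4-, 2- and 1-byte chunks are left, and len only needs to be tested, not decremented. This is a software stand-in, not the MIPS driver: sketch_crc_bytes replaces the hardware CRC instructions with a bitwise CRC-32, and all sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 update over n bytes (polynomial 0xEDB88320).
 * Stands in for a hardware CRC step of width n. */
uint32_t sketch_crc_bytes(uint32_t crc, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc;
}

uint32_t sketch_crc32(uint32_t crc, const uint8_t *p, size_t len)
{
    assert(p != NULL || len == 0);

    /* Main loop: consume 8 bytes at a time. */
    for (; len >= 8; len -= 8, p += 8)
        crc = sketch_crc_bytes(crc, p, 8);

    /* Here len < 8. Bits 2, 1 and 0 of len select the remaining
     * chunks, so len is tested but never decremented. */
    if (len & 4) {
        crc = sketch_crc_bytes(crc, p, 4);
        p += 4;
    }
    if (len & 2) {
        crc = sketch_crc_bytes(crc, p, 2);
        p += 2;
    }
    if (len & 1)
        crc = sketch_crc_bytes(crc, p, 1);

    return crc;
}
```

Calling sketch_crc32 with an initial value of 0xFFFFFFFF and inverting the result gives the conventional CRC-32 of the buffer.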
2024-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski30-137/+208
Cross-merge networking fixes after downstream PR. No conflicts (sort of) and no adjacent changes. This merge reverts commit b3c9e65eb227 ("net: hsr: remove seqnr_lock") from net, as it was superseded by commit 430d67bdcb04 ("net: hsr: Use the seqnr lock for frames received via interlink port.") in net-next. Signed-off-by: Jakub Kicinski <[email protected]>
2024-09-13cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERSAlice Ryhl1-0/+16
Introduce a Kconfig option for enabling the experimental option to normalize integer types. This ensures that integer types of the same size and signedness are considered compatible by the Control Flow Integrity sanitizer. The security impact of this flag is minimal. When Sami Tolvanen looked into it, he found that integer normalization reduced the number of unique type hashes in the kernel by ~1%, which is acceptable. This option exists for compatibility with Rust, as C and Rust do not have the same set of integer types. There are cases where C has two different integer types of the same size and signedness, but Rust only has one integer type of that size and signedness. When Rust calls into C functions using such types in their signature, this results in CFI failures. One example is 'unsigned long long' and 'unsigned long' which are both 64-bit on LP64 targets, so on those targets this flag will give both types the same CFI tag. This flag changes the ABI heavily. It is not applied automatically when CONFIG_RUST is turned on to make sure that the CONFIG_RUST option does not change the ABI of C code. For example, some builds may need to make other changes atomically with toggling this flag. Having it be a separate option makes it possible to first turn on normalized integer tags, and then later turn on CONFIG_RUST. Similarly, when turning on CONFIG_RUST in a build, you may need a few attempts where the RUST=y commit gets reverted a few times. It is inconvenient if reverting RUST=y also requires reverting the changes you made to support normalized integer tags. To avoid having this flag impact builds that don't care about this, the next patch in this series will make CONFIG_RUST turn on this option using `select` rather than `depends on`.
Signed-off-by: Alice Ryhl <[email protected]> Reviewed-by: Sami Tolvanen <[email protected]> Tested-by: Gatlin Newhouse <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Miguel Ojeda <[email protected]>
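The 'unsigned long' versus 'unsigned long long' example from this commit can be made concrete with a small C sketch. It only shows that C treats the two as distinct types even when they share size and signedness; it does not exercise CFI itself, and the SKETCH_ and sketch_ names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Yields 1 for unsigned long and 2 for unsigned long long: the C type
 * system keeps them apart regardless of their size. */
#define SKETCH_TYPE_ID(x) \
    _Generic((x), unsigned long: 1, unsigned long long: 2, default: 0)

static_assert(SKETCH_TYPE_ID(0UL) != SKETCH_TYPE_ID(0ULL),
              "unsigned long and unsigned long long are distinct C types");

/* Two handler types whose signatures differ only in that integer type.
 * Without integer normalization, a type-based indirect-call check
 * would treat them as different function types. */
typedef void (*sketch_handler_ul)(unsigned long);
typedef void (*sketch_handler_ull)(unsigned long long);

/* Nonzero when both types have the same size and range on this
 * target, which is the case on LP64. */
int sketch_same_size_and_signedness(void)
{
    return sizeof(unsigned long) == sizeof(unsigned long long) &&
           ULONG_MAX == ULLONG_MAX;
}
```

On an LP64 target sketch_same_size_and_signedness() returns nonzero while the two type IDs still differ, which is the situation the commit describes for a Rust caller that has a single 64-bit unsigned type.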
2024-09-13rust: support for shadow call stack sanitizerAlice Ryhl3-2/+24
Add all of the flags that are needed to support the shadow call stack (SCS) sanitizer with Rust, and update Kconfig to allow only configurations that work. The -Zfixed-x18 flag is required to use SCS on arm64, and requires rustc version 1.80.0 or greater. This restriction is reflected in Kconfig. When CONFIG_DYNAMIC_SCS is enabled, the build will be configured to include unwind tables in the build artifacts. Dynamic SCS uses the unwind tables at boot to find all places that need to be patched. The -Cforce-unwind-tables=y flag ensures that unwind tables are available for Rust code. In non-dynamic mode, the -Zsanitizer=shadow-call-stack flag is what enables the SCS sanitizer. Using this flag requires rustc version 1.82.0 or greater on the targets used by Rust in the kernel. This restriction is reflected in Kconfig. It is possible to avoid the requirement of rustc 1.80.0 by using -Ctarget-feature=+reserve-x18 instead of -Zfixed-x18. However, this flag emits a warning during the build, so this patch does not add support for using it and instead requires 1.80.0 or greater. The dependency is placed on `select HAVE_RUST` to avoid a situation where enabling Rust silently turns off the sanitizer. Instead, turning on the sanitizer results in Rust being disabled. We generally do not want changes to CONFIG_RUST to result in any mitigations being changed or turned off. At the time of writing, rustc 1.82.0 only exists via the nightly release channel. There is a chance that the -Zsanitizer=shadow-call-stack flag will end up needing 1.83.0 instead, but I think it is small. Reviewed-by: Sami Tolvanen <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Alice Ryhl <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Fixed indentation using spaces. - Miguel ] Signed-off-by: Miguel Ojeda <[email protected]>
2024-09-12Merge tag 'riscv-for-linus-6.11-rc8' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - Two fixes for smp_processor_id() calls in preemptible sections: one in the perf driver, and one in the fence.i prctl. * tag 'riscv-for-linus-6.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Disable preemption while handling PR_RISCV_CTX_SW_FENCEI_OFF drivers: perf: Fix smp_processor_id() use in preemptible code