git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"The zero-copy changes are relatively significant, but regression risk
should be contained. The feature needs to be used to cause trouble.
Also it feels like we got an order of magnitude more semi-automated
"refactoring" chaff than usual, I wonder if it's just us.
Core & protocols:
- Support Device Memory TCP, ability to zero-copy receive TCP
payloads to a DMABUF region of memory while packet headers land
separately in normal kernel buffers, and TCP processes them as
usual.
- The ability to read the PTP PHC (Physical Hardware Clock) alongside
MONOTONIC_RAW timestamps with PTP_SYS_OFFSET_EXTENDED. Previously
only CLOCK_REALTIME was supported.
- Allow matching on all bits of IP DSCP for routing decisions.
Previously we only supported matching on TOS bits in IPv4, which is
a narrower interpretation of the same header field.
- Increase the range of weights used for multi-path routing from
8 bits to 16 bits.
- Add support for IPv6 PIO p flag in the Prefix Information Option
per draft-ietf-6man-pio-pflag.
- IPv6 IOAM6 support for new tunsrc encap mode for better
performance.
- Detect destinations which blackhole MPTCP traffic and avoid
initiating MPTCP connections to them for a certain period of time,
1h by default.
- Improve IPsec control path performance by removing the inexact
policies list.
- AF_VSOCK: add support for SIOCOUTQ ioctl.
- Add enum for reasons TCP reset was sent for easier tracing.
- Add SMC ringbufs usage statistics.
Drivers:
- Handle netconsole setup failures more gracefully, don't fail
loading, retain the specified target as disabled.
- Extend bonding's IPsec offload pass thru capabilities (ESN, stats).
Filtering:
- Add TCP_BPF_SOCK_OPS_CB_FLAGS to bpf_*sockopt() to address the case
when long-lived sockets miss a chance to set additional callbacks
if a sockops program was not attached early in their lifetime.
- Support using BPF skb helpers in tracepoints.
- Conntrack Netlink: support CTA_FILTER for flush.
- Improve SCTP support in nfnetlink_queue.
- Improve performance of large nftables flush transactions.
Things we sprinkled into general kernel code:
- selftests: support setting an "interpreter" for script files; make
it easy to run as separate cases tests where one "interpreter" is
fed various test descriptions (in our case packet sequences).
Driver API:
- Extend core and ethtool APIs to support many PHYs connected to a
single interface (PHY topologies).
- Extend cable diagnostics to specify whether Time Domain
Reflectometry (TDR) or Active Link Cable Diagnostic (ALCD) was
used.
- Add library for implementing MAC-PHY Ethernet drivers for SPI
devices compatible with Open Alliance 10BASE-T1x MAC-PHY Serial
Interface (TC6) standard.
- Add helpers to the PHY framework, for PHYs following the Open
Alliance standards:
- 1000BaseT1 link settings
- cable test and diagnostics
- Support listing / dumping all allocated RSS contexts.
- Add configuration for frequency Embedded SYNC in DPLL, which
magically embeds sync pulses into Ethernet signaling.
Device drivers:
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- use better FW APIs for queue reset
- support QOS and TPID settings for the SR-IOV VLAN
- support dynamic MSI-X allocation
- Intel (100G, ice, idpf):
- ice: support PCIe subfunctions
- iavf: add support for TC U32 filters on VFs
- ice: support Embedded SYNC in DPLL
- nVidia/Mellanox (mlx5):
- support HW managed steering tables
- support PCIe PTM cross timestamping
- AMD/Pensando:
- ionic: use page_pool to increase Rx performance
- Cisco (enic):
- report per-queue statistics
- Ethernet virtual:
- Microsoft vNIC:
- mana: support configuring ring length
- netvsc: enable more channels on systems with many CPUs
- IBM veth:
- optimize polling to improve TCP_RR performance
- optimize performance of Tx handling
- VirtIO net:
- synchronize the operstate with the admin state to allow a
lower virtio-net to propagate the link status to an upper
device like macvlan
- Ethernet NICs consumer, and embedded:
- Add driver for Realtek automotive PCIe devices (RTL9054,
RTL9068, RTL9072, RTL9075, RTL9071)
- Add driver for Microchip LAN8650/1 10BASE-T1S MAC-PHY.
- Microchip:
- lan743x: use phylink - support WOL, EEE, pause, link settings
- add Wake-on-LAN support for KSZ87xx family
- add KSZ8895/KSZ8864 switch support
- factor out FDMA code and use it in sparx5 and lan966x
(including DCB support in both)
- Synopsys (stmmac):
- support frame preemption (configured using TC and ethtool)
- support Loongson DWMAC (GMAC v3.73)
- support RockChips RK3576 DWMAC
- TI:
- am65-cpsw: add multi queue RX support
- icssg-prueth: HSR offload support
- Cadence (macb):
- enable software (hrtimer based) IRQ coalescing by default
- Xilinx (axienet):
- expose HW statistics
- improve multicast filtering
- relax Rx checksum offload constraints
- MediaTek:
- mt7530: add EN7581 support
- Aspeed (ftgmac100):
- report link speed and duplex
- Intel:
- igc: add mqprio offload
- igc: report EEE configuration
- RealTek (r8169):
- add support for RTL8126A rev.b
- Vitesse (vsc73xx):
- implement FDB add/del/dump operations
- Freescale (fs_enet):
- use phylink
- Ethernet PHYs:
- vitesse: implement downshift and MDI-X in vsc73xx PHYs
- microchip: support LAN887x, supporting IEEE 802.3bw (100BASE-T1)
and IEEE 802.3bp (1000BASE-T1) specifications
- add Applied Micro QT2025 PHY driver (in Rust)
- add Motorcomm yt8821 2.5G Ethernet PHY driver
- CAN:
- add driver for Rockchip RK3568 CAN-FD controller
- flexcan: add wakeup support for imx95
- kvaser_usb: set hardware timestamp on transmitted packets
- WiFi:
- mac80211/cfg80211:
- EHT rate support in AQL airtime fairness
- handle DFS (radar detection) per link in Multi-Link Operation
- RealTek (rtw89):
- support RTL8852BT and 8852BE-VT (WiFi 6)
- support hardware rfkill
- support HW encryption in unicast management frames
- support Wake-on-WLAN with supported network detection
- RealTek (rtw88):
- improve Rx performance by using USB frame aggregation
- support USB 3 with RTL8822CU/RTL8822BU
- Intel (iwlwifi/mvm):
- offload RLC/SMPS functionality to firmware
- Marvell (mwifiex):
- add host based MLME to enable WPA3
- Bluetooth:
- add support for Amlogic HCI UART protocol
- add support for ISO data/packets to Intel and NXP drivers"
* tag 'net-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1303 commits)
net/mlx5: HWS, check the correct variable in hws_send_ring_alloc_sq()
netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()
ice: Fix a NULL vs IS_ERR() check in probe()
ice: Fix a couple NULL vs IS_ERR() bugs
net: ethernet: fs_enet: Make the per clock optional
net: ti: icssg-prueth: Add multicast filtering support in HSR mode
net: ti: icssg-prueth: Enable HSR Tx duplication, Tx Tag and Rx Tag offload
net: ti: icssg-prueth: Add support for HSR frame forward offload
net: ti: icssg-prueth: Stop hardcoding def_inc
net: ti: icss-iep: Move icss_iep structure
net: ibm: emac: get rid of wol_irq
net: ibm: emac: remove all waiting code
net: ibm: emac: replace of_get_property
net: ibm: emac: use netdev's phydev directly
net: ibm: emac: use devm for register_netdev
net: ibm: emac: remove mii_bus with devm
net: ibm: emac: use devm for of_iomap
net: ibm: emac: manage emac_irq with devm
net: ibm: emac: use devm for alloc_etherdev
octeontx2-af: debugfs: Add Channel info to RPM map
...
|
|
Inspired by [1], remove the code that modifies ra, to avoid an
imbalanced RAS (return address stack), which may lead to incorrect
predictions on return.
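A return address stack predicts return targets by pushing an entry on each call and popping one on each return. The toy model below is a hypothetical sketch, not the riscv code this patch changes: it assumes a fixed-depth stack and shows why loading ra by hand and reaching a function with a plain jump unbalances the RAS. The callee's return pops an entry that belongs to its caller, and the returns after it keep mispredicting until the stack is back in sync.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAS_DEPTH 8 /* arbitrary depth chosen for the sketch */

struct ras {
	uintptr_t entry[RAS_DEPTH];
	size_t top;           /* number of valid entries */
	unsigned mispredicts; /* returns whose predicted target was wrong */
};

/* A real call (jal/jalr writing ra) pushes the return address. */
static void ras_call(struct ras *r, uintptr_t return_addr)
{
	assert(r->top <= RAS_DEPTH);
	if (r->top == RAS_DEPTH) {
		/* Full: discard the oldest entry to make room. */
		for (size_t i = 1; i < RAS_DEPTH; i++)
			r->entry[i - 1] = r->entry[i];
		r->top--;
	}
	r->entry[r->top++] = return_addr;
}

/* A return pops the predicted target and compares it with the real one. */
static void ras_ret(struct ras *r, uintptr_t actual_target)
{
	uintptr_t predicted = 0; /* empty RAS: no useful prediction */

	if (r->top > 0)
		predicted = r->entry[--r->top];
	if (predicted != actual_target)
		r->mispredicts++;
}
```

For example, a genuine call into a handler pushes its return address. If the handler then sets ra by hand and jumps to a callee, the callee's return pops the handler's entry (a miss) and the handler's own return finds the stack empty (another miss), while a balanced call/return pair predicts correctly.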
Link: https://lore.kernel.org/linux-riscv/[email protected]/ [1]
Signed-off-by: Jisheng Zhang <[email protected]>
Reviewed-by: Cyril Bur <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Alexandre Ghiti <[email protected]> says:
In RISC-V, after a new mapping is established, a sfence.vma needs to be
emitted for different reasons:
- if the uarch caches invalid entries, we need to invalidate it otherwise
we would trap on this invalid entry,
- if the uarch does not cache invalid entries, a reordered access could fail
to see the new mapping and then trap (sfence.vma acts as a fence).
We can actually avoid emitting those (mostly) useless and costly sfence.vma
by handling the traps instead:
- for new kernel mappings: only vmalloc mappings need to be taken care of,
other new mappings are rare and already emit the required sfence.vma if
needed.
That must be achieved very early in the exception path as explained in
patch 3, and this also fixes our fragile way of dealing with vmalloc faults.
- for new user mappings: Svvptc makes update_mmu_cache() a no-op but we can
take some gratuitous page faults (which are very unlikely though).
Patches 1 and 2 introduce Svvptc extension probing.
On our uarch that does not cache invalid entries and a 6.5 kernel, the
gains are measurable:
* Kernel boot: 6%
* ltp - mmapstress01: 8%
* lmbench - lat_pagefault: 20%
* lmbench - lat_mmap: 5%
Here are the corresponding numbers of sfence.vma emitted:
* Ubuntu boot to login:
Before: ~630k sfence.vma
After: ~200k sfence.vma
* ltp - mmapstress01
Before: ~45k
After: ~6.3k
* lmbench - lat_pagefault
Before: ~665k
After: 832 (!)
* lmbench - lat_mmap
Before: ~546k
After: 718 (!)
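Expressed as reductions, the counts above work out as follows. This is only arithmetic on the reported figures (the "before" values are approximate, as the "~" signs indicate), not a new measurement:

```c
#include <assert.h>

/* sfence.vma counts copied from the list above, with "k" values expanded. */
struct sfence_count {
	const char *workload;
	double before;
	double after;
};

static const struct sfence_count counts[] = {
	{ "Ubuntu boot to login",    630000.0, 200000.0 },
	{ "ltp - mmapstress01",       45000.0,   6300.0 },
	{ "lmbench - lat_pagefault", 665000.0,    832.0 },
	{ "lmbench - lat_mmap",      546000.0,    718.0 },
};

/* Percentage of sfence.vma instructions that are no longer emitted. */
static double reduction_pct(const struct sfence_count *c)
{
	assert(c->before > 0.0);
	return 100.0 * (c->before - c->after) / c->before;
}
```

That is roughly 68% fewer for boot, 86% fewer for mmapstress01, and about 99.9% fewer for both lmbench cases.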
Thanks to Ved and Matt Evans for triggering the discussion that led to
this patchset!
* b4-shazam-merge:
riscv: Stop emitting preventive sfence.vma for new userspace mappings with Svvptc
riscv: Stop emitting preventive sfence.vma for new vmalloc mappings
dt-bindings: riscv: Add Svvptc ISA extension description
riscv: Add ISA extension parsing for Svvptc
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
commit 5944ce092b97 (arch_topology: Build cacheinfo from primary CPU)
removed the init_cache_level() function from arch/riscv/kernel/cacheinfo.c
and relies on the init_cpu_topology() function in drivers/base/arch_topology.c
to call fetch_cache_info(), which in turn calls init_of_cache_level() to
populate the cache hierarchy information. However, init_cpu_topology() is only
called from smpboot.c:smp_prepare_cpus() and is thus only available when
CONFIG_SMP is defined.
To allow non-SMP kernels to still detect the cache hierarchy, add back
the init_cache_level() function. On SMP-enabled kernels, where
fetch_cache_info() is called from init_cpu_topology() earlier in the boot
phase, init_level_allocate_ci() handles this gracefully anyway.
Signed-off-by: Steffen Persvold <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Since commit f0bddf50586d ("riscv: entry: Convert to generic entry"),
_TIF_WORK_MASK is no longer used, so remove it.
Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
Signed-off-by: Jinjie Ruan <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
Reviewed-by: Andy Chiu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Jisheng Zhang <[email protected]> says:
Since commit 76329c693924 ("riscv: Use SYM_*() assembly macros instead
of deprecated ones"), most of riscv has been converted to the new style
SYM_ assembler annotations. The remaining file is SiFive's
errata_cip_453.S, so convert it to the new style SYM_ annotations as well.
After that, select ARCH_USE_SYM_ANNOTATIONS.
* b4-shazam-merge:
riscv: select ARCH_USE_SYM_ANNOTATIONS
riscv: errata: sifive: Use SYM_*() assembly macros
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Jinjie Ruan <[email protected]> says:
Add RISC-V USER_STACKTRACE support and, as Björn pointed out, fix the
fp alignment bug in perf_callchain_user() along the way.
* b4-shazam-merge:
riscv: stacktrace: Add USER_STACKTRACE support
riscv: Fix fp alignment bug in perf_callchain_user()
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
This is used in poison.h for the poison pointer offset. Based on the
current SV39, SV48 and SV57 VM layouts, 0xdead000000000000 is a proper
value that is not mappable; this can avoid potentially turning an oops
into an exploit.
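Why the value is unmappable: under Sv39, Sv48 and Sv57, bits 63 down to the top virtual-address bit must all be copies of that bit, and 0xdead000000000000 breaks that rule for all three layouts. The user-space check below is a sketch; it copies the LIST_POISON1/LIST_POISON2 offsets (0x100 and 0x122) that poison.h adds to POISON_POINTER_DELTA, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value chosen by this patch for 64-bit riscv. */
#define ILLEGAL_POINTER_VALUE 0xdead000000000000ULL

/* User-space copies of the poison.h list poison pointers. */
#define LIST_POISON1 (0x100ULL + ILLEGAL_POINTER_VALUE)
#define LIST_POISON2 (0x122ULL + ILLEGAL_POINTER_VALUE)

/*
 * Sv39/Sv48/Sv57: bits 63..(va_bits - 1) of a virtual address must all
 * equal bit (va_bits - 1); any other address faults on every access.
 */
static bool is_canonical(uint64_t addr, unsigned int va_bits)
{
	uint64_t upper;

	assert(va_bits > 1 && va_bits < 64);
	upper = addr >> (va_bits - 1);
	return upper == 0 || upper == (UINT64_MAX >> (va_bits - 1));
}

/* True if addr cannot be mapped under any of the three layouts. */
static bool poison_unmappable(uint64_t addr)
{
	return !is_canonical(addr, 39) && !is_canonical(addr, 48) &&
	       !is_canonical(addr, 57);
}
```

Both poison pointers fail all three checks, so following a poisoned list pointer faults rather than reaching memory that could have been mapped.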
Signed-off-by: Jisheng Zhang <[email protected]>
Fixes: fbe934d69eb7 ("RISC-V: Build Infrastructure")
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Pull kvm fix from Paolo Bonzini:
"Do not always honor guest PAT on CPUs that support self-snoop.
This triggers an issue in the bochsdrm driver, which used ioremap()
instead of ioremap_wc() to map the video RAM.
The revert lets video RAM use the WB memory type instead of the slower
UC memory type"
* tag 'for-linus-6.11' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"
|
|
Svvptc
The preventive sfence.vma instructions were emitted because new mappings
must be made visible to the page table walker. Svvptc guarantees that
this will happen within a bounded timeframe, so there is no need for
sfence.vma on uarchs that implement this extension; instead, we may take
gratuitous (but very unlikely) page faults, similarly to x86 and arm64.
This drastically reduces the number of sfence.vma emitted:
* Ubuntu boot to login:
Before: ~630k sfence.vma
After: ~200k sfence.vma
* ltp - mmapstress01
Before: ~45k
After: ~6.3k
* lmbench - lat_pagefault
Before: ~665k
After: 832 (!)
* lmbench - lat_mmap
Before: ~546k
After: 718 (!)
Signed-off-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
In 6.5, we removed the vmalloc fault path because that can't work (see
[1] [2]). Then in order to make sure that new page table entries were
seen by the page table walker, we had to preventively emit a sfence.vma
on all harts [3] but this solution is very costly since it relies on IPI.
And even then, we could end up in a loop of vmalloc faults if a vmalloc
allocation is done in the IPI path (for example if it is traced, see
[4]), which could result in a kernel stack overflow.
Those preventive sfence.vma needed to be emitted because:
- if the uarch caches invalid entries, the new mapping may not be
observed by the page table walker and an invalidation may be needed.
- if the uarch does not cache invalid entries, a reordered access
could "miss" the new mapping and trap: in that case, we would actually
only need to retry the access, no sfence.vma is required.
So this patch removes those preventive sfence.vma and actually handles
the possible (and unlikely) exceptions. And since the kernel stack
mappings lie in the vmalloc area, this handling must be done very early
when the trap is taken, at the very beginning of handle_exception: this
also rules out vmalloc allocations in the fault path.
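One way to picture "handle the trap instead of fencing preventively" is a per-hart pending bit. Creating a vmalloc mapping marks every hart as possibly stale, and the first vmalloc-range fault taken on a hart clears that hart's bit and retries the access. This is a hypothetical user-space sketch: the names and the address range are made up, the real handling is done in assembly at the start of handle_exception, and a uarch that caches invalid entries would also need a local sfence.vma before retrying.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_HARTS_SKETCH 64

/* Hypothetical vmalloc range for the sketch. */
#define VMALLOC_START_SKETCH 0xffffffc600000000ULL
#define VMALLOC_END_SKETCH   0xffffffd600000000ULL

/* One bit per hart: set while a new vmalloc mapping may be unseen there. */
static _Atomic uint64_t new_vmalloc_pending;

/* Called after creating a vmalloc mapping, instead of an IPI + sfence.vma. */
static void mark_new_vmalloc_mapping(void)
{
	atomic_store(&new_vmalloc_pending, UINT64_MAX);
}

/*
 * Called at the very start of trap handling. Returns true if the faulting
 * access should simply be retried, false if the fault must take the
 * normal path.
 */
static bool vmalloc_fault_should_retry(unsigned int hart, uint64_t addr)
{
	uint64_t bit;

	assert(hart < NR_HARTS_SKETCH);
	if (addr < VMALLOC_START_SKETCH || addr >= VMALLOC_END_SKETCH)
		return false;
	bit = UINT64_C(1) << hart;
	/* Clear only this hart's bit; the first fault after a new mapping retries. */
	return (atomic_fetch_and(&new_vmalloc_pending, ~bit) & bit) != 0;
}
```

A second fault on the same hart with no new mapping in between is treated as genuine, which keeps the retry from looping.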
Link: https://lore.kernel.org/linux-riscv/[email protected]/ [1]
Link: https://lore.kernel.org/linux-riscv/[email protected] [2]
Link: https://lore.kernel.org/linux-riscv/[email protected]/ [3]
Link: https://lore.kernel.org/lkml/[email protected]/ [4]
Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Yunhui Cui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Add support to parse the Svvptc string in the riscv,isa string.
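A minimal sketch of such parsing, assuming an underscore-separated, lowercase riscv,isa string such as "rv64imafdc_zicsr_svvptc". The helper isa_has_multiletter_ext() is hypothetical and is not the kernel's ISA parser:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if ext (e.g. "svvptc") appears as a whole
 * underscore-separated token after the base ISA in an isa string such as
 * "rv64imafdc_zicsr_svvptc". Assumes lowercase input.
 */
static bool isa_has_multiletter_ext(const char *isa, const char *ext)
{
	size_t ext_len = strlen(ext);
	const char *sep = strchr(isa, '_');

	assert(ext_len > 0);
	while (sep) {
		const char *start = sep + 1;
		const char *end = strchr(start, '_');
		size_t len = end ? (size_t)(end - start) : strlen(start);

		/* Compare whole tokens so "svvptcx" does not match "svvptc". */
		if (len == ext_len && strncmp(start, ext, len) == 0)
			return true;
		sep = end;
	}
	return false;
}
```

Matching whole tokens rather than using strstr() avoids false positives from extension names that merely share a prefix.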
Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Now, riscv has been converted to the new style SYM_ assembler
annotations. So select ARCH_USE_SYM_ANNOTATIONS to ensure the
deprecated macros such as ENTRY(), END(), WEAK() and so on are not
available and we don't regress.
Signed-off-by: Jisheng Zhang <[email protected]>
Reviewed-By: Clément Léger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
ENTRY()/END() macros are deprecated and we should make use of the
new SYM_*() macros [1] for better annotation of symbols. Replace the
deprecated ones with the new ones.
[1] https://docs.kernel.org/core-api/asm-annotations.html
Signed-off-by: Jisheng Zhang <[email protected]>
Reviewed-By: Clément Léger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Currently, userstacktrace is unsupported on riscv, so use the
perf_callchain_user() code as a blueprint to implement
arch_stack_walk_user(), which adds userstacktrace support on riscv.
Meanwhile, we can use arch_stack_walk_user() to simplify the implementation
of perf_callchain_user().
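The frame-pointer walk can be sketched in user space. This is a hypothetical illustration, not arch_stack_walk_user() itself. It assumes each frame saves a {previous fp, ra} record in the 16 bytes just below fp, which is the layout perf_callchain_user() reads. It also replaces copy_from_user() with a bounds-checked read from a fake buffer (fake_user_mem and read_frame() are made up), stops on a misaligned fp, an unreadable record, or ra == 0, and ignores the leaf-function case where ra is still in a register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Frame record saved just below fp: previous fp, then return address. */
struct stackframe {
	uint64_t fp;
	uint64_t ra;
};

/* Hypothetical stand-in for user memory: nwords words mapped at base. */
struct fake_user_mem {
	uint64_t base;
	const uint64_t *words;
	size_t nwords;
};

/* Bounds-checked stand-in for copy_from_user(); fails instead of faulting. */
static bool read_frame(const struct fake_user_mem *m, uint64_t addr,
		       struct stackframe *out)
{
	uint64_t idx;

	if (addr < m->base || (addr - m->base) % 8 != 0)
		return false;
	idx = (addr - m->base) / 8;
	if (idx + 1 >= m->nwords)
		return false;
	out->fp = m->words[idx];
	out->ra = m->words[idx + 1];
	return true;
}

/*
 * Record pc, then follow the fp chain, storing at most max addresses.
 * Stops on a misaligned fp (the calling convention keeps the stack 16-byte
 * aligned), an unreadable frame record, or a zero return address.
 */
static size_t walk_user_stack(const struct fake_user_mem *m, uint64_t pc,
			      uint64_t fp, uint64_t *out, size_t max)
{
	struct stackframe frame;
	size_t n = 0;

	assert(out != NULL || max == 0);
	if (n < max)
		out[n++] = pc;
	while (n < max) {
		if (fp & 0xf)
			break;
		if (!read_frame(m, fp - sizeof(frame), &frame))
			break;
		if (frame.ra == 0)
			break;
		out[n++] = frame.ra;
		fp = frame.fp;
	}
	return n;
}
```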
An ftrace test case is shown below:
# cd /sys/kernel/debug/tracing
# echo 1 > options/userstacktrace
# echo 1 > options/sym-userobj
# echo 1 > events/sched/sched_process_fork/enable
# cat trace
......
bash-178 [000] ...1. 97.968395: sched_process_fork: comm=bash pid=178 child_comm=bash child_pid=231
bash-178 [000] ...1. 97.970075: <user stack trace>
=> /lib/libc.so.6[+0xb5090]
A simple perf test also works, as shown below:
# perf record -e cpu-clock --call-graph fp top
# perf report --call-graph
.....
66.54% 0.00% top [kernel.kallsyms] [k] ret_from_exception
|
---ret_from_exception
|
|--58.97%--do_trap_ecall_u
| |
| |--17.34%--__riscv_sys_read
| | ksys_read
| | |
| | --16.88%--vfs_read
| | |
| | |--10.90%--seq_read
Signed-off-by: Jinjie Ruan <[email protected]>
Tested-by: Jinjie Ruan <[email protected]>
Cc: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The standard RISC-V calling convention says:
"The stack grows downward and the stack pointer is always
kept 16-byte aligned".
So perf_callchain_user() should check that fp is 16-byte aligned.
Link: https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf
Fixes: dbeb90b0c1eb ("riscv: Add perf callchain support")
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
This reverts commit 377b2f359d1f71c75f8cc352b5c81f2210312d83.
This caused a regression with the bochsdrm driver, which used ioremap()
instead of ioremap_wc() to map the video RAM. After the commit, the
WB memory type is used without the IGNORE_PAT, resulting in the slower
UC memory type. In fact, UC is slow enough to basically cause guests
to not boot... but only on new processors such as Sapphire Rapids and
Cascade Lake. Coffee Lake for example works properly, though that might
also be an effect of being on a larger, more NUMA system.
The driver has been fixed but that does not help older guests. Until we
figure out whether Cascade Lake and newer processors are working as
intended, revert the commit. Long term we might add a quirk, but the
details depend on whether the processors are working as intended: for
example if they are, the quirk might reference bochs-compatible devices,
e.g. in the name and documentation, so that userspace can disable the
quirk by default and only leave it enabled if such a device is being
exposed to the guest.
If instead this is actually a bug in CLX+, then the actions we need to
take are different and depend on the actual cause of the bug.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
KVM/riscv changes for 6.12
- Fix sbiret init before forwarding to userspace
- Don't zero-out PMU snapshot area before freeing data
- Allow legacy PMU access from guest
- Fix to allow hpmcounter31 from the guest
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.12
1. Revert qspinlock to test-and-set simple lock on VM.
2. Add Loongson Binary Translation extension support.
3. Add PMU support for guest.
4. Enable paravirt feature control from VMM.
5. Implement function kvm_para_has_feature().
|
|
The original reason for reserving the top 4GiB of the direct map
(space for modules/BPF/kernel) hasn't applied since the address
map was reworked for KASAN.
Signed-off-by: Stuart Menefy <[email protected]>
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The vdso.so.dbg is a debug version of vdso and could be used for debugging
purposes. For example, perf-annotate requires debugging info to show source
lines. So let's keep its debugging info.
Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: Cyril Bur <[email protected]>
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Most of the drivers which used this header have been deleted and most
of this code is obsolete, so move the only defines that are actually
used into arch/powerpc/platforms/chrp/pegasos_eth.c and delete the
file completely.
Signed-off-by: Gaosheng Cui <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Provide the s390 specific vdso getrandom() architecture backend.
The required _vdso_rng_data is placed within the _vdso_data vvar page,
by using a hardcoded offset larger than vdso_data.
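The layout constraint described here can be written as compile-time checks. In this sketch the structure contents, the page size and the 0x800 offset are all hypothetical placeholders; the only point taken from the text is that the RNG data starts past the end of vdso_data and still has to fit in the vvar page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholders for the sketch. */
#define VVAR_PAGE_SIZE       4096u
#define VDSO_RNG_DATA_OFFSET 0x800u

/* Simplified stand-ins; the real structures differ per architecture. */
struct vdso_data_sketch {
	uint32_t seq;
	int32_t clock_mode;
	uint64_t cycle_last;
	uint64_t mask;
};

struct vdso_rng_data_sketch {
	uint64_t generation;
	uint8_t is_ready;
};

/* The RNG data must start past vdso_data and still fit in the page. */
static_assert(VDSO_RNG_DATA_OFFSET >= sizeof(struct vdso_data_sketch),
	      "rng data overlaps vdso_data");
static_assert(VDSO_RNG_DATA_OFFSET + sizeof(struct vdso_rng_data_sketch) <=
	      VVAR_PAGE_SIZE, "rng data does not fit in the vvar page");

/* Locate the RNG data from the start of the mapped vvar page. */
static const struct vdso_rng_data_sketch *rng_data(const void *vvar_page)
{
	return (const void *)((const char *)vvar_page + VDSO_RNG_DATA_OFFSET);
}
```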
As required, the ChaCha20 implementation does not write to the stack.
The implementation more or less follows the arm64 implementation and
makes use of vector instructions. It has a fallback to the getrandom()
system call for machines where the vector facility is not installed.
The check whether the vector facility is installed, as well as an
optimization for machines with the vector-enhancements facility 2, is
implemented with alternatives, avoiding runtime checks.
Note that __kernel_getrandom() is implemented without the vdso user
wrapper, which would set up a stack frame for odd cases (aka very old
glibc variants) where the caller has not done that. All callers of
__kernel_getrandom() are required to set up a stack frame, as the C ABI
requires.
The vdso testcases vdso_test_getrandom and vdso_test_chacha pass.
Benchmark on a z16:
$ ./vdso_test_getrandom bench-single
vdso: 25000000 times in 0.493703559 seconds
syscall: 25000000 times in 6.584025337 seconds
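In per-call terms, the z16 figures above work out as follows (plain arithmetic on the reported timings):

```c
#include <assert.h>

/* Figures from the z16 bench-single run above. */
#define ITERATIONS      25000000.0
#define VDSO_SECONDS    0.493703559
#define SYSCALL_SECONDS 6.584025337

/* Average cost of one call in nanoseconds. */
static double ns_per_call(double seconds)
{
	assert(seconds > 0.0);
	return seconds * 1e9 / ITERATIONS;
}

/* How many times faster the vDSO path is than the syscall. */
static double speedup(void)
{
	return SYSCALL_SECONDS / VDSO_SECONDS;
}
```

That is roughly 19.7 ns per vDSO call versus about 263 ns per getrandom() syscall, a speedup of about 13x.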
Signed-off-by: Heiko Carstens <[email protected]>
Reviewed-by: Harald Freudenberger <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
commit f22ed71cd602 ("sparc32,leon: SRMMU MMU Table probe fix") changed
the return value from paddrbase to 'pte', but left the variable here.
That causes a build warning for this variable, so remove it.
make --keep-going CROSS_COMPILE=/home/alexs/0day/gcc-14.1.0-nolibc/sparc-linux/bin/sparc-linux- --jobs=16 KCFLAGS= -Wtautological-compare -Wno-error=return-type -Wreturn-type -Wcast-function-type -funsigned-char -Wundef -fstrict-flex-arrays=3 -Wformat-overflow -Wformat-truncation -Wrestrict -Wenum-conversion W=1 O=sparc ARCH=sparc defconfig SHELL=/bin/bash arch/sparc/mm/ mm/ -s
<stdin>:1519:2: warning: #warning syscall clone3 not implemented [-Wcpp]
../arch/sparc/mm/leon_mm.c: In function 'leon_swprobe':
../arch/sparc/mm/leon_mm.c:42:32: warning: variable 'paddrbase' set but not used [-Wunused-but-set-variable]
42 | unsigned int lvl, pte, paddrbase;
| ^~~~~~~~~
Signed-off-by: Alex Shi <[email protected]>
To: [email protected]
To: [email protected]
To: Christian Brauner <[email protected]>
To: Andreas Larsson <[email protected]>
To: David S. Miller <[email protected]>
Reviewed-by: Andreas Larsson <[email protected]>
Tested-by: Andreas Larsson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Andreas Larsson <[email protected]>
|
|
The vdso.h header file, which is included at many places, includes
generated header files. This can easily lead to recursive header file
inclusions if the vdso code is changed.
Therefore move the vdso symbol code, which requires the generated
header files, to a separate header file, and include it at the two
locations which require it.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Implement the infrastructure required to allow alternatives in vdso code.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Provide find_section() helper function which can be used to find a
section by name, similar to other architectures.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Let test_facility() generate a branch instruction if the tested facility is
a constant, and where the result cannot be evaluated during compile
time. The branch instruction defaults to "false" and is patched to nop
(branch not taken) if the tested facility is available.
This avoids runtime checks and is similar to x86's static_cpu_has() and
arm64's alternative_has_cap_likely().
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Patch all alternatives which depend on facilities from the decompressor.
There is no technical reason to split patching of such alternatives
between the decompressor and the kernel.
This simplifies alternative handling a bit, since one alternative type is
removed.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Disable compile time optimizations of test_facility() for the
decompressor. The decompressor should not contain any optimized code
depending on the architecture level set the kernel image is compiled
for to avoid unexpected operation exceptions.
Add a __DECOMPRESSOR check to test_facility() to enforce that
facilities are always checked during runtime for the decompressor.
Reviewed-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Extend getrandom() vDSO implementation to VDSO64.
Tested on QEMU on both ppc64_defconfig and ppc64le_defconfig.
Results from a Power9 (PowerNV):
~ # ./vdso_test_getrandom bench-single
vdso: 25000000 times in 0.787943615 seconds
libc: 25000000 times in 14.101887252 seconds
syscall: 25000000 times in 14.047475082 seconds
Signed-off-by: Christophe Leroy <[email protected]>
Tested-by: Madhavan Srinivasan <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
To be consistent with other VDSO functions, the function is called
__kernel_getrandom().
The __arch_chacha20_blocks_nostack() function is implemented basically
with 32-bit operations. It performs 4 QUARTERROUND operations in
parallel. There are enough registers to avoid using the stack:
On input:
r3: output bytes
r4: 32-byte key input
r5: 8-byte counter input/output
r6: number of 64-byte blocks to write to output
During operation:
stack: pointer to counter (r5) and non-volatile registers (r14-r31)
r0: counter of blocks (initialised with r6)
r4: Value '4' after key has been read, used for indexing
r5-r12: key
r14-r15: block counter
r16-r31: chacha state
At the end:
r0, r6-r12: Zeroised
r5, r14-r31: Restored
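For reference, QUARTERROUND is the standard ChaCha20 operation defined in RFC 8439. The portable C below is a sketch, not the powerpc assembly. It shows one quarter round and one double round (four column quarter rounds, then four diagonal quarter rounds). Within each half the four quarter rounds touch disjoint state words, which is what allows them to run side by side with the state held in registers (r16-r31 above).

```c
#include <assert.h>
#include <stdint.h>

static inline uint32_t rotl32(uint32_t v, int n)
{
	return (v << n) | (v >> (32 - n)); /* n is never 0 here */
}

/* ChaCha QUARTERROUND on four words of the 16-word state. */
static void quarterround(uint32_t x[16], int a, int b, int c, int d)
{
	assert(a >= 0 && a < 16 && b >= 0 && b < 16);
	assert(c >= 0 && c < 16 && d >= 0 && d < 16);
	x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32(x[d], 16);
	x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32(x[b], 12);
	x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32(x[d], 8);
	x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32(x[b], 7);
}

/* One double round: independent column rounds, then diagonal rounds. */
static void double_round(uint32_t x[16])
{
	quarterround(x, 0, 4,  8, 12);
	quarterround(x, 1, 5,  9, 13);
	quarterround(x, 2, 6, 10, 14);
	quarterround(x, 3, 7, 11, 15);
	quarterround(x, 0, 5, 10, 15);
	quarterround(x, 1, 6, 11, 12);
	quarterround(x, 2, 7,  8, 13);
	quarterround(x, 3, 4,  9, 14);
}
```

ChaCha20 applies ten such double rounds to produce each 64-byte block.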
Performance on powerpc 885 (using kernel selftest):
~# ./vdso_test_getrandom bench-single
vdso: 25000000 times in 62.938002291 seconds
libc: 25000000 times in 535.581916866 seconds
syscall: 25000000 times in 531.525042806 seconds
Performance on powerpc 8321 (using kernel selftest):
~# ./vdso_test_getrandom bench-single
vdso: 25000000 times in 16.899318858 seconds
libc: 25000000 times in 131.050596522 seconds
syscall: 25000000 times in 129.794790389 seconds
This first patch adds support for VDSO32. As selftests cannot easily
be generated only for VDSO32, and because the following patch brings
support for VDSO64 anyway, this patch opts out all code in
__arch_chacha20_blocks_nostack() so that vdso_test_chacha will not
fail to compile and will not crash on PPC64/PPC64LE, although the
selftest itself will fail.
Signed-off-by: Christophe Leroy <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
In order to avoid too much duplication when we add new VDSO
functionalities in C like getrandom, refactor common CFLAGS.
Signed-off-by: Christophe Leroy <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Commit 08c18b63d965 ("powerpc/vdso32: Add missing _restgpr_31_x to fix
build failure") added _restgpr_31_x to the vdso for gettimeofday, but
the work on getrandom shows that we will need more of those functions.
Remove _restgpr_31_x and link in crtsavres.o so that we get all
save/restore functions when optimising the kernel for size.
Signed-off-by: Christophe Leroy <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
When running in a non-root time namespace, the global VDSO data page
is replaced by a dedicated namespace data page and the global data
page is mapped next to it. Detailed explanations can be found at
commit 660fd04f9317 ("lib/vdso: Prepare for time namespace support").
When that happens, __kernel_get_syscall_map, __kernel_get_tbfreq
and __kernel_sync_dicache don't work anymore because they read 0
instead of the data they need.
To address that, clock_mode has to be read. When it is set to
VDSO_CLOCKMODE_TIMENS, it means it is a dedicated namespace data page
and the global data is located on the following page.
Add a macro called get_realdatapage which reads clock_mode and add
PAGE_SIZE to the pointer provided by get_datapage macro when
clock_mode is equal to VDSO_CLOCKMODE_TIMENS. Use this new macro
instead of get_datapage macro except for time functions as they handle
it internally.
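The logic of get_realdatapage can be rendered in C as follows. This is a hypothetical analogue of the assembly macro, with made-up names, a simplified data layout and a placeholder value standing in for VDSO_CLOCKMODE_TIMENS; only the "add PAGE_SIZE when clock_mode says TIMENS" step comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u
/* Placeholder; the kernel defines its own VDSO_CLOCKMODE_TIMENS value. */
#define CLOCKMODE_TIMENS_SKETCH 0x7fffffff

/* Simplified stand-in for the start of a vDSO data page. */
struct vdso_page_sketch {
	int32_t clock_mode;
	uint32_t tb_freq; /* example of data that helpers like tbfreq read */
};

/*
 * get_datapage yields the page mapped at the data-page slot. In a non-root
 * time namespace that is the namespace page, and the global data sits on
 * the page right after it, so step forward by one page.
 */
static const struct vdso_page_sketch *
get_realdatapage_sketch(const struct vdso_page_sketch *datapage)
{
	assert(datapage != NULL);
	if (datapage->clock_mode == CLOCKMODE_TIMENS_SKETCH)
		return (const void *)((const char *)datapage + PAGE_SIZE_SKETCH);
	return datapage;
}
```

Time functions do not need this step because, as noted above, they already handle the namespace page internally.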
Fixes: 74205b3fc2ef ("powerpc/vdso: Add support for time namespaces")
Reported-by: Jason A. Donenfeld <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Christophe Leroy <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Hook up the generic vDSO implementation to the aarch64 vDSO data page.
The _vdso_rng_data required data is placed within the _vdso_data vvar
page, by using an offset larger than the vdso_data.
The vDSO function requires a ChaCha20 implementation that does not write
to the stack, and that can do an entire ChaCha20 permutation. The one
provided uses NEON on the permute operation, with a fallback to the
syscall for chips that do not support AdvSIMD.
This also passes the vdso_test_chacha test along with
vdso_test_getrandom. The vdso_test_getrandom bench-single result on
Neoverse-N1 shows:
vdso: 25000000 times in 0.783884250 seconds
libc: 25000000 times in 8.780275399 seconds
syscall: 25000000 times in 8.786581518 seconds
A small fixup to arch/arm64/include/asm/mman.h was required to avoid
pulling kernel code into the vDSO, similar to what's already done in
arch/arm64/include/asm/rwonce.h.
Signed-off-by: Adhemerval Zanella <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Currently alternative_has_cap_unlikely() can be used in VDSO code, but
alternative_has_cap_likely() cannot as it references alt_cb_patch_nops,
which is not available when linking the VDSO. This is unfortunate as it
would be useful to have alternative_has_cap_likely() available in VDSO
code.
The use of alt_cb_patch_nops was added in commit:
d926079f17bf8aa4 ("arm64: alternatives: add shared NOP callback")
... as removing duplicate NOPs within the kernel Image saved a reasonable
amount of space.
Given that the VDSO code will have nowhere near as many alternative branches
as the main kernel image, this isn't much of a concern, and a few extra
NOPs aren't a massive problem.
Change alternative_has_cap_likely() to only use alt_cb_patch_nops for
the main kernel image, and allow duplicate NOPs in VDSO code.
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Adhemerval Zanella <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Hook up the generic vDSO implementation to the LoongArch vDSO data page
by providing the required __arch_chacha20_blocks_nostack,
__arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also
wire up the selftests.
Signed-off-by: Xi Ruoyao <[email protected]>
Acked-by: Huacai Chen <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Without a shared prototype, we would have to add one for each architecture
implementing vDSO getrandom. As most architectures will likely have
vDSO getrandom implemented in the near future, and we'd like to keep the
declarations compatible everywhere (to ease the work of libc implementors),
we should really just have one copy of the prototype.
This is also what's already done inside include/vdso/gettime.h for
those vDSO functions, so this continues that convention.
Suggested-by: Huacai Chen <[email protected]>
Signed-off-by: Xi Ruoyao <[email protected]>
Acked-by: Huacai Chen <[email protected]>
[Jason: rewrite docbook comment for prototype.]
Signed-off-by: Jason A. Donenfeld <[email protected]>
|
|
Depending on the architecture, building a 32-bit vDSO on a 64-bit kernel
is problematic when some system headers are included.
Minimise the amount of headers by moving needed items, such as
__{get,put}_unaligned_t, into dedicated common headers and in general
use more specific headers, similar to what was done in commit
8165b57bca21 ("linux/const.h: Extract common header for vDSO") and
commit 8c59ab839f52 ("lib/vdso: Enable common headers").
On some architectures this results in missing PAGE_SIZE, as was
described by commit 8b3843ae3634 ("vdso/datapage: Quick fix - use
asm/page-def.h for ARM64"), so define this if necessary, in the same way
as done prior by commit cffaefd15a8f ("vdso: Use CONFIG_PAGE_SHIFT in
vdso/datapage.h").
Removing linux/time64.h leads to missing 'struct timespec64' in
x86's asm/pvclock.h. Add a forward declaration of that struct in
that file.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
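The forward-declaration fix for asm/pvclock.h relies on a standard C rule: an incomplete `struct` tag declared at file scope is enough for prototypes that only pass pointers. A minimal sketch follows; `ts_to_ns_sketch()` is hypothetical, and the struct fields only mirror the usual timespec64 shape for illustration.

```c
#include <stdint.h>

/*
 * Forward declaration: without it, the tag below would be scoped to the
 * prototype alone and clash with the later definition.
 */
struct timespec64;

/* Hypothetical header-style prototype that only needs a pointer. */
int64_t ts_to_ns_sketch(const struct timespec64 *ts);

/* The full definition can come from another header (here, later on). */
struct timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

int64_t ts_to_ns_sketch(const struct timespec64 *ts)
{
	return ts->tv_sec * 1000000000LL + ts->tv_nsec;
}
```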
|
|
_vdso_data is specific to x86 and __arch_get_k_vdso_data() is provided
so that all architectures can provide the requested pointer.
Do the same with _vdso_rng_data, provide __arch_get_k_vdso_rng_data()
and don't use x86 _vdso_rng_data directly.
Until now vdso/vsyscall.h was only included by time/vsyscall.c but now
it will also be included in char/random.c, leading to a duplicate
declaration of _vdso_data and _vdso_rng_data.
To fix this issue, move them into a C file; vma.c looks like
the most appropriate candidate. There is no need to replace the definitions
in vsyscall.h with declarations, as declarations already exist in asm/vvar.h.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
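The accessor pattern described above can be sketched as follows: the arch file owns the storage, and generic code reaches it only through a hook. All names here are hypothetical stand-ins for `_vdso_rng_data` and `__arch_get_k_vdso_rng_data()`, and the struct fields are illustrative, not the kernel's exact layout.

```c
#include <stdint.h>

/* Hypothetical stand-in for the RNG data shared with the vDSO. */
struct vdso_rng_data_sketch {
	uint64_t generation;
	uint8_t is_ready;
};

/* Arch-private storage: only the arch code names this symbol. */
static struct vdso_rng_data_sketch arch_rng_data;

/* Arch hook: generic code asks for the pointer instead of the symbol. */
static inline struct vdso_rng_data_sketch *arch_get_k_vdso_rng_data_sketch(void)
{
	return &arch_rng_data;
}

/* Generic code (think char/random.c) only ever calls the hook. */
void generic_rng_reseed_sketch(void)
{
	struct vdso_rng_data_sketch *d = arch_get_k_vdso_rng_data_sketch();

	d->generation++;
	d->is_ready = 1;
}
```

Because generic code never references the arch symbol directly, each architecture is free to place the data wherever its vvar layout requires.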
|
|
Having the prototype for __arch_chacha20_blocks_nostack in
arch/x86/include/asm/vdso/getrandom.h meant that the prototype and large
doc comment were cloned by every architecture, which has been causing
unnecessary churn. Instead move it into include/vdso/getrandom.h, where
it can be shared by all archs implementing it.
As a side bonus, this then lets us use that prototype in the
vdso_test_chacha self test, to ensure that it matches the source, and
indeed doing so turned up some inconsistencies, which are rectified
here.
Suggested-by: Christophe Leroy <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
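A portable reference sketch of the kind of routine vdso_test_chacha compares against. It declares one prototype and then implements it, so the compiler checks that the two agree. The parameter shape (output buffer, 8-word key, 2-word block counter, block count) is our assumption about `__arch_chacha20_blocks_nostack`; the real prototype lives in include/vdso/getrandom.h and may differ. The state layout (64-bit counter in words 12-13, zero in words 14-15) is also assumed. For an all-zero key and counter it matches the RFC 7539 test vector, whose keystream begins 76 b8 e0 ad.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Single shared prototype (hypothetical name, assumed shape). */
void chacha20_blocks_ref(uint8_t *dst_bytes, const uint32_t *key,
			 uint32_t *counter, size_t nblocks);

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) do {					\
		a += b; d ^= a; d = ROTL32(d, 16);		\
		c += d; b ^= c; b = ROTL32(b, 12);		\
		a += b; d ^= a; d = ROTL32(d, 8);		\
		c += d; b ^= c; b = ROTL32(b, 7);		\
	} while (0)

void chacha20_blocks_ref(uint8_t *dst_bytes, const uint32_t *key,
			 uint32_t *counter, size_t nblocks)
{
	/* "expand 32-byte k" */
	static const uint32_t sigma[4] = {
		0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 };

	for (; nblocks; nblocks--, dst_bytes += 64) {
		uint32_t in[16], x[16];

		memcpy(in, sigma, sizeof(sigma));
		memcpy(in + 4, key, 8 * sizeof(uint32_t));
		in[12] = counter[0];
		in[13] = counter[1];
		in[14] = 0;
		in[15] = 0;
		memcpy(x, in, sizeof(in));

		for (int i = 0; i < 10; i++) {	/* 20 rounds */
			QR(x[0], x[4], x[8],  x[12]);	/* columns */
			QR(x[1], x[5], x[9],  x[13]);
			QR(x[2], x[6], x[10], x[14]);
			QR(x[3], x[7], x[11], x[15]);
			QR(x[0], x[5], x[10], x[15]);	/* diagonals */
			QR(x[1], x[6], x[11], x[12]);
			QR(x[2], x[7], x[8],  x[13]);
			QR(x[3], x[4], x[9],  x[14]);
		}

		/* Add the input state and serialize little-endian. */
		for (int i = 0; i < 16; i++) {
			uint32_t v = x[i] + in[i];

			dst_bytes[4 * i + 0] = (uint8_t)v;
			dst_bytes[4 * i + 1] = (uint8_t)(v >> 8);
			dst_bytes[4 * i + 2] = (uint8_t)(v >> 16);
			dst_bytes[4 * i + 3] = (uint8_t)(v >> 24);
		}

		/* 64-bit block counter split across two 32-bit words. */
		if (++counter[0] == 0)
			counter[1]++;
	}
}
```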
|
|
Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on xtensa.
[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
[ Apply Geert Uytterhoeven feedback. ]
Signed-off-by: Paul E. McKenney <[email protected]>
Tested-by: Yujie Liu <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
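The idea behind emulating a one-byte cmpxchg() with a word-sized one can be sketched in user space with C11 atomics. This is a simplified sketch, not the kernel's cmpxchg_emu_u8(). The caller passes the containing 32-bit word and a byte index instead of a byte pointer, and all names are hypothetical. The loop retries when a neighbouring byte changes underneath it and fails only if the target byte itself differs from `old`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* View one 32-bit word as its four bytes in memory order. */
union u8_32_sketch {
	uint32_t w;
	uint8_t b[4];
};

/*
 * Compare-and-exchange byte 'idx' of '*word' using only a 32-bit CAS.
 * Returns the byte value seen before the operation; the exchange
 * happened iff the return value equals 'old'.
 */
uint8_t cmpxchg_emu_u8_sketch(_Atomic uint32_t *word, unsigned idx,
			      uint8_t old, uint8_t new_val)
{
	union u8_32_sketch old32, new32;
	uint32_t cur = atomic_load(word);

	for (;;) {
		old32.w = cur;
		if (old32.b[idx] != old)
			return old32.b[idx];	/* target byte differs: fail */
		new32.w = old32.w;
		new32.b[idx] = new_val;		/* change only the target byte */
		if (atomic_compare_exchange_strong(word, &cur, new32.w))
			return old;		/* success */
		/* Word changed; 'cur' now holds the fresh value, so retry. */
	}
}
```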
|
|
Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on sh.
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
[ paulmck: Apply feedback from Naresh Kamboju. ]
[ Apply Geert Uytterhoeven feedback. ]
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: <[email protected]>
Acked-by: John Paul Adrian Glaubitz <[email protected]>
|
|
Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on arc.
[ paulmck: Drop two-byte support per Arnd Bergmann feedback. ]
[ paulmck: Apply feedback from Naresh Kamboju. ]
[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Apply feedback from Vineet Gupta. ]
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Andi Shyti <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: <[email protected]>
Acked-by: Vineet Gupta <[email protected]>
|
|
When entering the "len & sizeof(u32)" branch, len must be less than 8,
so after one operation len must be less than 4 (subtracting 4 from a
value in [4, 7] only clears bit 2).
At that point, "len -= sizeof(u32)" is not necessary for 64-bit CPUs.
Also replace `while' loops with equivalent `for' loops to make the
code structure a little better.
Suggested-by: Maciej W. Rozycki <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Suggested-by: Herbert Xu <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Guan Wentao <[email protected]>
Signed-off-by: WangYuli <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
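The tail-handling argument above can be checked with a sketch of the 64-bit path. A trivial byte sum (hypothetical `step()`) stands in for the CRC32 instructions, and only the loop structure is modelled, not the actual driver code. After the 8-byte `for` loop, len is below 8, so the bit tests on 4, 2 and 1 consume exactly the remaining bytes without ever decrementing len.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CRC32 instruction: sums n bytes. */
static uint32_t step(uint32_t acc, const uint8_t *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		acc += p[i];
	return acc;
}

uint32_t chunked_sum(const uint8_t *p, size_t len)
{
	uint32_t acc = 0;

	/* Consume 8-byte chunks; afterwards len < 8. */
	for (; len >= sizeof(uint64_t); len -= sizeof(uint64_t), p += sizeof(uint64_t))
		acc = step(acc, p, sizeof(uint64_t));

	/* At most one 4-byte chunk remains; no "len -= 4" is needed. */
	if (len & sizeof(uint32_t)) {
		acc = step(acc, p, sizeof(uint32_t));
		p += sizeof(uint32_t);
	}
	if (len & sizeof(uint16_t)) {
		acc = step(acc, p, sizeof(uint16_t));
		p += sizeof(uint16_t);
	}
	if (len & sizeof(uint8_t))
		acc = step(acc, p, sizeof(uint8_t));

	return acc;
}
```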
|
|
Cross-merge networking fixes after downstream PR.
No conflicts (sort of) and no adjacent changes.
This merge reverts commit b3c9e65eb227 ("net: hsr: remove seqnr_lock")
from net, as it was superseded by
commit 430d67bdcb04 ("net: hsr: Use the seqnr lock for frames received via interlink port.")
in net-next.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Introduce a Kconfig option for enabling the experimental option to
normalize integer types. This ensures that integer types of the same
size and signedness are considered compatible by the Control Flow
Integrity sanitizer.
The security impact of this flag is minimal. When Sami Tolvanen looked
into it, he found that integer normalization reduced the number of
unique type hashes in the kernel by ~1%, which is acceptable.
This option exists for compatibility with Rust, as C and Rust do not
have the same set of integer types. There are cases where C has two
different integer types of the same size and signedness, but Rust only
has one integer type of that size and signedness. When Rust calls into
C functions using such types in their signature, this results in CFI
failures. One example is 'unsigned long long' and 'unsigned long' which
are both 64-bit on LP64 targets, so on those targets this flag will give
both types the same CFI tag.
This flag changes the ABI heavily. It is not applied automatically when
CONFIG_RUST is turned on to make sure that the CONFIG_RUST option does
not change the ABI of C code. For example, some builds may need to make
other changes atomically with toggling this flag. Having it be a
separate option makes it possible to first turn on normalized integer
tags, and then later turn on CONFIG_RUST.
Similarly, when turning on CONFIG_RUST in a build, you may need a few
attempts where the RUST=y commit gets reverted a few times. It is
inconvenient if reverting RUST=y also requires reverting the changes you
made to support normalized integer tags.
To avoid having this flag impact builds that don't care about this, the
next patch in this series will make CONFIG_RUST turn on this option
using `select` rather than `depends on`.
Signed-off-by: Alice Ryhl <[email protected]>
Reviewed-by: Sami Tolvanen <[email protected]>
Tested-by: Gatlin Newhouse <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
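The 'unsigned long' versus 'unsigned long long' example can be made concrete in plain C. `_Generic` shows that the two are distinct types even where, on an LP64 target (assumed here), they have the same size and signedness. That distinctness is why CFI gives them different tags unless integer normalization is enabled. `TYPE_TAG` and `same_size_distinct_types()` are illustrative helpers only.

```c
/* Map an expression's type to a number, to show which type C picked. */
#define TYPE_TAG(x) _Generic((x),		\
		unsigned long: 1,		\
		unsigned long long: 2,		\
		default: 0)

/*
 * True on LP64 targets: both types are 64-bit and unsigned, yet C still
 * treats them as different types (and so would CFI type hashing).
 */
int same_size_distinct_types(void)
{
	return sizeof(unsigned long) == sizeof(unsigned long long) &&
	       TYPE_TAG(0UL) != TYPE_TAG(0ULL);
}
```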
|
|
Add all of the flags that are needed to support the shadow call stack
(SCS) sanitizer with Rust, and update Kconfig to allow only
configurations that work.
The -Zfixed-x18 flag is required to use SCS on arm64, and requires rustc
version 1.80.0 or greater. This restriction is reflected in Kconfig.
When CONFIG_DYNAMIC_SCS is enabled, the build will be configured to
include unwind tables in the build artifacts. Dynamic SCS uses the
unwind tables at boot to find all places that need to be patched. The
-Cforce-unwind-tables=y flag ensures that unwind tables are available
for Rust code.
In non-dynamic mode, the -Zsanitizer=shadow-call-stack flag is what
enables the SCS sanitizer. Using this flag requires rustc version 1.82.0
or greater on the targets used by Rust in the kernel. This restriction
is reflected in Kconfig.
It is possible to avoid the requirement of rustc 1.80.0 by using
-Ctarget-feature=+reserve-x18 instead of -Zfixed-x18. However, this flag
emits a warning during the build, so this patch does not add support for
using it and instead requires 1.80.0 or greater.
The dependency is placed on `select HAVE_RUST` to avoid a situation
where enabling Rust silently turns off the sanitizer. Instead, turning
on the sanitizer results in Rust being disabled. We generally do not
want changes to CONFIG_RUST to result in any mitigations being changed
or turned off.
At the time of writing, rustc 1.82.0 only exists via the nightly release
channel. There is a chance that the -Zsanitizer=shadow-call-stack flag
will end up needing 1.83.0 instead, but I think that chance is small.
Reviewed-by: Sami Tolvanen <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Alice Ryhl <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Fixed indentation using spaces. - Miguel ]
Signed-off-by: Miguel Ojeda <[email protected]>
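The arm64 version rules above can be summarized as a small predicate. This is a hypothetical helper that only encodes the rules as stated in this message. It does not reproduce the actual Kconfig expressions. Per the message, -Zfixed-x18 (rustc 1.80.0+) is needed whenever SCS is used. Dynamic SCS additionally needs only -Cforce-unwind-tables=y, while non-dynamic SCS needs -Zsanitizer=shadow-call-stack (rustc 1.82.0+).

```c
#include <stdbool.h>

/* Compare a rustc (major, minor) version against a minimum. */
static bool rustc_at_least(int major, int minor, int want_major, int want_minor)
{
	return major > want_major || (major == want_major && minor >= want_minor);
}

/* Can Rust be enabled alongside SCS on arm64, per the rules above? */
bool rust_scs_supported(int major, int minor, bool dynamic_scs)
{
	/* -Zfixed-x18 is required in both modes. */
	if (!rustc_at_least(major, minor, 1, 80))
		return false;
	/* Dynamic SCS only adds -Cforce-unwind-tables=y. */
	if (dynamic_scs)
		return true;
	/* Non-dynamic SCS uses -Zsanitizer=shadow-call-stack. */
	return rustc_at_least(major, minor, 1, 82);
}
```

When the predicate is false, the intended outcome is that Rust is unavailable (HAVE_RUST is not selected) rather than the sanitizer being silently dropped.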
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- Two fixes for smp_processor_id() calls in preemptible sections: one
in the perf driver, and one in the fence.i prctl.
* tag 'riscv-for-linus-6.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Disable preemption while handling PR_RISCV_CTX_SW_FENCEI_OFF
drivers: perf: Fix smp_processor_id() use in preemptible code
|