Age | Commit message (Collapse) | Author | Files | Lines |
|
Commit 2eab791f940b ("kbuild: dummy-tools: support MPROFILE_KERNEL
checks for ppc") added support for ppc64le's checks for
-mprofile-kernel.
Now, commit aec0ba7472a7 ("powerpc/64: Use -mprofile-kernel for big
endian ELFv2 kernels") added support for -mprofile-kernel even on
big-endian ppc.
So lift the check in gcc-check-mprofile-kernel.sh to support big-endian too.
Fixes: aec0ba7472a7 ("powerpc/64: Use -mprofile-kernel for big endian ELFv2 kernels")
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
When EF10 RXDP firmware is operating in cut-through mode, packet length
is not known at the time the RX prefix is generated, so it is left as
zero and RX event merging is inhibited to ensure that the length is
available in the RX event. However, it has been found that in certain
circumstances the RX events for these packets still get merged,
meaning the driver cannot read the length from the RX event, and tries
to use the length from the prefix.
The resulting zero-length SKBs cause crashes in GRO since commit
1d11fa696733 ("net-gro: remove GRO_DROP"), so add a check to the driver
to detect these zero-length RX events and discard the packet.
Signed-off-by: Edward Cree <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Sriram Yagnaraman says:
====================
Avoid TCP resets when using ECMP for load-balancing between multiple servers.
All packets in the same flow (L3/L4 depending on multipath hash policy)
should be directed to the same target, but after [0]/[1] we see stray
packets directed towards other targets. This, for instance, causes RST
to be sent on TCP connections.
The first two patches solve the problem by ignoring route hints for
destinations that are part of multipath group, by using new SKB flags
for IPv4 and IPv6. The third patch is a selftest that tests the
scenario.
Thanks to Ido, for reviewing and suggesting a way forward in [2] and
also suggesting how to write a selftest for this.
v4->v5:
- Fixed review comments from Ido
v3->v4:
- Remove single path test
- Rebase to latest
v2->v3:
- Add NULL check for skb in fib6_select_path (Ido Schimmel)
- Use fib_tests.sh for selftest instead of the forwarding suite (Ido
Schimmel)
v1->v2:
- Update to commit messages describing the solution (Ido Schimmel)
- Use perf stat to count fib table lookups in selftest (Ido Schimmel)
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
The test uses perf stat to count the number of fib:fib_table_lookup
tracepoint hits for IPv4 and the number of fib6:fib6_table_lookup for
IPv6. The measured count is checked to be within 5% of the total number
of packets sent via veth1.
Signed-off-by: Sriram Yagnaraman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Route hints when the nexthop is part of a multipath group causes packets
in the same receive batch to be sent to the same nexthop irrespective of
the multipath hash of the packet. So, do not extract route hint for
packets whose destination is part of a multipath group.
A new SKB flag IP6SKB_MULTIPATH is introduced for this purpose, set the
flag when route is looked up in fib6_select_path() and use it in
ip6_can_use_hint() to check for the existence of the flag.
Fixes: 197dbf24e360 ("ipv6: introduce and uses route look hints for list input.")
Signed-off-by: Sriram Yagnaraman <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Route hints when the nexthop is part of a multipath group causes packets
in the same receive batch to be sent to the same nexthop irrespective of
the multipath hash of the packet. So, do not extract route hint for
packets whose destination is part of a multipath group.
A new SKB flag IPSKB_MULTIPATH is introduced for this purpose, set the
flag when route is looked up in ip_mkroute_input() and use it in
ip_extract_route_hint() to check for the existence of the flag.
Fixes: 02b24941619f ("ipv4: use dst hint for ipv4 list receive")
Signed-off-by: Sriram Yagnaraman <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions
once per nskb") added the call to zero copy functions in skb_segment().
The change introduced a bug in skb_segment() because skb_orphan_frags()
may possibly change the number of fragments or allocate new fragments
altogether leaving nrfrags and frag to point to the old values. This can
cause a panic with stacktrace like the one below.
[ 193.894380] BUG: kernel NULL pointer dereference, address: 00000000000000bc
[ 193.895273] CPU: 13 PID: 18164 Comm: vh-net-17428 Kdump: loaded Tainted: G O 5.15.123+ #26
[ 193.903919] RIP: 0010:skb_segment+0xb0e/0x12f0
[ 194.021892] Call Trace:
[ 194.027422] <TASK>
[ 194.072861] tcp_gso_segment+0x107/0x540
[ 194.082031] inet_gso_segment+0x15c/0x3d0
[ 194.090783] skb_mac_gso_segment+0x9f/0x110
[ 194.095016] __skb_gso_segment+0xc1/0x190
[ 194.103131] netem_enqueue+0x290/0xb10 [sch_netem]
[ 194.107071] dev_qdisc_enqueue+0x16/0x70
[ 194.110884] __dev_queue_xmit+0x63b/0xb30
[ 194.121670] bond_start_xmit+0x159/0x380 [bonding]
[ 194.128506] dev_hard_start_xmit+0xc3/0x1e0
[ 194.131787] __dev_queue_xmit+0x8a0/0xb30
[ 194.138225] macvlan_start_xmit+0x4f/0x100 [macvlan]
[ 194.141477] dev_hard_start_xmit+0xc3/0x1e0
[ 194.144622] sch_direct_xmit+0xe3/0x280
[ 194.147748] __dev_queue_xmit+0x54a/0xb30
[ 194.154131] tap_get_user+0x2a8/0x9c0 [tap]
[ 194.157358] tap_sendmsg+0x52/0x8e0 [tap]
[ 194.167049] handle_tx_zerocopy+0x14e/0x4c0 [vhost_net]
[ 194.173631] handle_tx+0xcd/0xe0 [vhost_net]
[ 194.176959] vhost_worker+0x76/0xb0 [vhost]
[ 194.183667] kthread+0x118/0x140
[ 194.190358] ret_from_fork+0x1f/0x30
[ 194.193670] </TASK>
In this case calling skb_orphan_frags() updated nr_frags leaving nrfrags
local variable in skb_segment() stale. This resulted in the code hitting
i >= nrfrags prematurely and trying to move to next frag_skb using
list_skb pointer, which was NULL, and caused kernel panic. Move the call
to zero copy functions before using frags and nr_frags.
Fixes: bf5c25d60861 ("skbuff: in skb_segment, call zerocopy functions once per nskb")
Signed-off-by: Mohamed Khalfella <[email protected]>
Reported-by: Amit Goyal <[email protected]>
Cc: [email protected]
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Recent fixes for an embargoed hardware security vulnerability failed to
link with ld.lld (LLVM's linker). [0] To be fair, our documentation
mentions ``CC=clang`` foremost with ``LLVM=1`` being buried "below the
fold."
We want to encourage the use of ``LLVM=1`` rather than just
``CC=clang``. Make that suggestion "above the fold" and "front and
center" in our docs.
While here, the following additional changes were made:
- remove the bit about CROSS_COMPILE setting --target=, that's no longer
true.
- Add ARCH=loongarch to the list of maintained targets (though we're
still working on getting defconfig building cleanly at the moment;
we're pretty close).
- Bump ARCH=powerpc from CC=clang to LLVM=1 status.
- Promote ARCH=riscv from being Maintained to being Supported. Android
is working towards supporting RISC-V, and we have excellent support
from multiple companies in this regard.
- Note that the toolchain distribution on kernel.org has been built with
profile data from kernel builds.
- Note how to use ccache with clang.
Link: https://github.com/ClangBuiltLinux/linux/issues/1907 [0]
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
.llvm.call-graph-profile section is added by clang when the kernel is
built with profiles (e.g. -fprofile-sample-use= or -fprofile-use=).
Note that .llvm.call-graph-profile intentionally uses REL relocations
to decrease the object size, for more details see
https://reviews.llvm.org/D104080.
The section contains edge information derived from text sections,
so .llvm.call-graph-profile itself doesn't need more analysis as
the text sections have been analyzed.
This change fixes the kernel build with clang and a sample profile
which currently fails with:
"FATAL: modpost: Please add code to calculate addend for this architecture"
Signed-off-by: Denis Nikitin <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Fangrui Song <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
The modules_sign target is currently only available for in-tree modules,
but it actually works for external modules as well.
Move the modules_sign rule to the common part.
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nicolas Schier <[email protected]>
|
|
Commit d890f510c8e4 ("MODSIGN: Add modules_sign make target") introduced
'make modules_sign' to manually sign modules.
Some time later, commit d9d8d7ed498e ("MODSIGN: Add option to not sign
modules during modules_install") introduced CONFIG_MODULE_SIG_ALL.
If it was disabled, mod_sign_cmd was set to no-op ('true' command).
It affected not only 'make modules_install' but also 'make modules_sign'.
With CONFIG_MODULE_SIG_ALL=n, neither modules_install nor modules_sign
is able to sign modules.
Kbuild has kept that behavior, and nobody has complained about it, but
I think it is weird.
CONFIG_MODULE_SIG_ALL=n should turn off signing only for modules_install.
If users want to sign modules manually, modules_sign should be offered.
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nicolas Schier <[email protected]>
|
|
Move more relevant code to scripts/Makefile.modinst.
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nicolas Schier <[email protected]>
|
|
Calling 'mkdir' for every module results in redundant syscalls.
Use $(sort ...) to drop the duplicated directories.
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nicolas Schier <[email protected]>
|
|
Eric Dumazet says:
====================
net: another round of data-race annotations
Series inspired by some syzbot reports, taking care
of 4 socket fields that can be read locklessly.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
sk->sk_bind_phc is read locklessly. Add corresponding annotations.
Fixes: d463126e23f1 ("net: sock: extend SO_TIMESTAMPING for PHC binding")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Yangbo Lu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
sk->sk_tsflags can be read locklessly, add corresponding annotations.
Fixes: b9f40e21ef42 ("net-timestamp: move timestamp flags out of sk_flags")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
msk->rmem_fwd_alloc can be read locklessly.
Add mptcp_rmem_fwd_alloc_add(), similar to sk_forward_alloc_add(),
and appropriate READ_ONCE()/WRITE_ONCE() annotations.
Fixes: 6511882cdd82 ("mptcp: allocate fwd memory separately on the rx and tx path")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Every time sk->sk_forward_alloc is read locklessly,
add a READ_ONCE().
Add sk_forward_alloc_add() helper to centralize updates,
to reduce number of WRITE_ONCE().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
inet_sk_diag_fill() has been changed to use sk_forward_alloc_get(),
but sk_get_meminfo() was forgotten.
Fixes: 292e6077b040 ("net: introduce sk_forward_alloc_get()")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We should not call trace_handshake_cmd_done_err() if socket lookup has failed.
Also we should call trace_handshake_cmd_done_err() before releasing the file,
otherwise dereferencing sock->sk can return garbage.
This also reverts 7afc6d0a107f ("net/handshake: Fix uninitialized local variable")
Unable to handle kernel paging request at virtual address dfff800000000003
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
Data abort info:
ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[dfff800000000003] address between user and kernel address ranges
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 5986 Comm: syz-executor292 Not tainted 6.5.0-rc7-syzkaller-gfe4469582053 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : handshake_nl_done_doit+0x198/0x9c8 net/handshake/netlink.c:193
lr : handshake_nl_done_doit+0x180/0x9c8
sp : ffff800096e37180
x29: ffff800096e37200 x28: 1ffff00012dc6e34 x27: dfff800000000000
x26: ffff800096e373d0 x25: 0000000000000000 x24: 00000000ffffffa8
x23: ffff800096e373f0 x22: 1ffff00012dc6e38 x21: 0000000000000000
x20: ffff800096e371c0 x19: 0000000000000018 x18: 0000000000000000
x17: 0000000000000000 x16: ffff800080516cc4 x15: 0000000000000001
x14: 1fffe0001b14aa3b x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000003
x8 : 0000000000000003 x7 : ffff800080afe47c x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff800080a88078
x2 : 0000000000000001 x1 : 00000000ffffffa8 x0 : 0000000000000000
Call trace:
handshake_nl_done_doit+0x198/0x9c8 net/handshake/netlink.c:193
genl_family_rcv_msg_doit net/netlink/genetlink.c:970 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1050 [inline]
genl_rcv_msg+0x96c/0xc50 net/netlink/genetlink.c:1067
netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2549
genl_rcv+0x38/0x50 net/netlink/genetlink.c:1078
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x834/0xb18 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x56c/0x840 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmsg+0x26c/0x33c net/socket.c:2577
__do_sys_sendmsg net/socket.c:2586 [inline]
__se_sys_sendmsg net/socket.c:2584 [inline]
__arm64_sys_sendmsg+0x80/0x94 net/socket.c:2584
__invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
el0_svc+0x58/0x16c arch/arm64/kernel/entry-common.c:678
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Code: 12800108 b90043e8 910062b3 d343fe68 (387b6908)
Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Chuck Lever <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-08-31
We've added 15 non-merge commits during the last 3 day(s) which contain
a total of 17 files changed, 468 insertions(+), 97 deletions(-).
The main changes are:
1) BPF selftest fixes: one flake and one related to clang18 testing,
from Yonghong Song.
2) Fix a d_path BPF selftest failure after fast-forward from Linus'
tree, from Jiri Olsa.
3) Fix a preempt_rt splat in sockmap when using raw_spin_lock_t,
from John Fastabend.
4) Fix a xsk_diag_fill use-after-free race during socket cleanup,
from Magnus Karlsson.
5) Fix xsk_build_skb to address a buggy dereference of an ERR_PTR(),
from Tirthendu Sarkar.
6) Fix a bpftool build warning when compiled with -Wtype-limits,
from Yafang Shao.
7) Several misc fixes and cleanups in standardization docs,
from David Vernet.
8) Fix BPF selftest install to consider no_alu32/cpuv4/bpf-gcc flavors,
from Björn Töpel.
9) Annotate a data race in bpf_long_memcpy for KCSAN, from Daniel Borkmann.
10) Extend documentation with a description for CO-RE relocations,
from Eduard Zingerman.
11) Fix several invalid escape sequence warnings in bpf_doc.py script,
from Vishal Chourasia.
12) Fix the instruction set doc wrt offset of BPF-to-BPF call,
from Will Hawkins.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Include build flavors for install target
bpf: Annotate bpf_long_memcpy with data_race
selftests/bpf: Fix d_path test
bpf, docs: Fix invalid escape sequence warnings in bpf_doc.py
xsk: Fix xsk_diag use-after-free error during socket cleanup
bpf, docs: s/eBPF/BPF in standards documents
bpf, docs: Add abi.rst document to standardization subdirectory
bpf, docs: Move linux-notes.rst to root bpf docs tree
bpf, sockmap: Fix preempt_rt splat when using raw_spin_lock_t
docs/bpf: Add description for CO-RE relocations
bpf, docs: Correct source of offset for program-local call
selftests/bpf: Fix flaky cgroup_iter_sleepable subtest
xsk: Fix xsk_build_skb() error: 'skb' dereferencing possible ERR_PTR()
bpftool: Fix build warnings with -Wtype-limits
bpf: Prevent inlining of bpf_fentry_test7()
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pull NFS client updates from Anna Schumaker:
"New Features:
- Enable the NFS v4.2 READ_PLUS operation by default
Stable Fixes:
- NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
- NFS: Fix a potential data corruption
Bugfixes:
- Fix various READ_PLUS issues including:
- smatch warnings
- xdr size calculations
- scratch buffer handling
- 32bit / highmem xdr page handling
- Fix checkpatch errors in file.c
- Fix redundant readdir request after an EOF
- Fix handling of COPY ERR_OFFLOAD_NO_REQ
- Fix assignment of xprtdata.cred
Cleanups:
- Remove unused xprtrdma function declarations
- Clean up an integer overflow check to avoid a warning
- Clean up #includes in dns_resolve.c
- Clean up nfs4_get_device_info so we don't pass a NULL pointer
to __free_page()
- Clean up sunrpc TCP socket timeout configuration
- Guard against READDIR loops when entry names are too long
- Use EXCHID4_FLAG_USE_PNFS_DS for DS servers"
* tag 'nfs-for-6.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (22 commits)
pNFS: Fix assignment of xprtdata.cred
NFSv4.2: fix handling of COPY ERR_OFFLOAD_NO_REQ
NFS: Guard against READDIR loop when entry names exceed MAXNAMELEN
NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS server
NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver
SUNRPC: Don't override connect timeouts in rpc_clnt_add_xprt()
SUNRPC: Allow specification of TCP client connect timeout at setup
SUNRPC: Refactor and simplify connect timeout
SUNRPC: Set the TCP_SYNCNT to match the socket timeout
NFS: Fix a potential data corruption
nfs: fix redundant readdir request after get eof
nfs/blocklayout: Use the passed in gfp flags
filemap: Fix errors in file.c
NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
NFS: Move common includes outside ifdef
SUNRPC: clean up integer overflow check
xprtrdma: Remove unused function declaration rpcrdma_bc_post_recv()
NFS: Enable the READ_PLUS operation by default
SUNRPC: kmap() the xdr pages during decode
NFSv4.2: Rework scratch handling for READ_PLUS (again)
...
|
|
Pull nfsd updates from Chuck Lever:
"I'm thrilled to announce that the Linux in-kernel NFS server now
offers NFSv4 write delegations. A write delegation enables a client to
cache data and metadata for a single file more aggressively, reducing
network round trips and server workload. Many thanks to Dai Ngo for
contributing this facility, and to Jeff Layton and Neil Brown for
reviewing and testing it.
This release also sees the removal of all support for DES- and
triple-DES-based Kerberos encryption types in the kernel's SunRPC
implementation. These encryption types have been deprecated by the
Internet community for years and are considered insecure. This change
affects both the in-kernel NFS client and server.
The server's UDP and TCP socket transports have now fully adopted
David Howells' new bio_vec iterator so that no more than one sendmsg()
call is needed to transmit each RPC message. In particular, this helps
kTLS optimize record boundaries when sending RPC-with-TLS replies, and
it takes the server a baby step closer to handling file I/O via
folios.
We've begun work on overhauling the SunRPC thread scheduler to remove
a costly linked-list walk when looking for an idle RPC service thread
to wake. The pre-requisites are included in this release. Thanks to
Neil Brown for his ongoing work on this improvement"
* tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits)
Documentation: Add missing documentation for EXPORT_OP flags
SUNRPC: Remove unused declaration rpc_modcount()
SUNRPC: Remove unused declarations
NFSD: da_addr_body field missing in some GETDEVICEINFO replies
SUNRPC: Remove return value of svc_pool_wake_idle_thread()
SUNRPC: make rqst_should_sleep() idempotent()
SUNRPC: Clean up svc_set_num_threads
SUNRPC: Count ingress RPC messages per svc_pool
SUNRPC: Deduplicate thread wake-up code
SUNRPC: Move trace_svc_xprt_enqueue
SUNRPC: Add enum svc_auth_status
SUNRPC: change svc_xprt::xpt_flags bits to enum
SUNRPC: change svc_rqst::rq_flags bits to enum
SUNRPC: change svc_pool::sp_flags bits to enum
SUNRPC: change cache_head.flags bits to enum
SUNRPC: remove timeout arg from svc_recv()
SUNRPC: change svc_recv() to return void.
SUNRPC: call svc_process() from svc_recv().
nfsd: separate nfsd_last_thread() from nfsd_put()
nfsd: Simplify code around svc_exit_thread() call in nfsd()
...
|
|
The files fbmem.c, fb_defio.c, fbsysfs.c, fbmon.c, modedb.c, and
fbcmap.c were moved. Drop the path and just keep the file name.
Reported by kalekale in #kernel (Libera IRC).
Fixes: f7018c213502 ("video: move fbdev to drivers/video/fbdev")
Fixes: 19757fc8432a ("fbdev: move fbdev core files to separate directory")
Signed-off-by: Jonathan Neuschäfer <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
The .need_pwm and .need_chargepump fields in struct ssd1307fb_deviceinfo
are flags that can have only two possible values: 0 and 1.
Reduce kernel size by changing their types from int to bool.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Javier Martinez Canillas <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
Avoid those compiler warnings:
neofb.c:1959:3: warning: 'snprintf' will always be truncated;
specified size is 16, but format string expands to at least 17 [-Wfortify-source]
Signed-off-by: Helge Deller <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Reported-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Link: https://lore.kernel.org/all/CAKwvOdn0xoVWjQ6ufM_rojtKb0f1i1hW-J_xYGfKDNFdHwaeHQ@mail.gmail.com/
Link: https://github.com/ClangBuiltLinux/linux/issues/1923
|
|
Pull smb server updates from Steve French:
- fix potential overflows in decoding create and in session setup
requests
- cleanup fixes
- compounding fixes, including one for MacOS compounded read requests
- session setup error handling fix
- fix mode bit bug when applying force_directory_mode and
force_create_mode
- RDMA (smbdirect) write fix
* tag '6.6-rc-ksmbd-fixes-part1' of git://git.samba.org/ksmbd:
ksmbd: add missing calling smb2_set_err_rsp() on error
ksmbd: replace one-element array with flex-array member in struct smb2_ea_info
ksmbd: fix slub overflow in ksmbd_decode_ntlmssp_auth_blob()
ksmbd: fix wrong DataOffset validation of create context
ksmbd: Fix one kernel-doc comment
ksmbd: reduce descriptor size if remaining bytes is less than request size
ksmbd: fix `force create mode' and `force directory mode'
ksmbd: fix wrong interim response on compound
ksmbd: add support for read compound
ksmbd: switch to use kmemdup_nul() helper
|
|
Pull jfs updates from Dave Kleikamp:
"A few small fixes"
* tag 'jfs-6.6' of github.com:kleikamp/linux-shaggy:
jfs: validate max amount of blocks before allocation.
jfs: remove redundant initialization to pointer ip
jfs: fix invalid free of JFS_IP(ipimap)->i_imap in diUnmount
FS: JFS: (trivial) Fix grammatical error in extAlloc
fs/jfs: prevent double-free in dbUnmount() after failed jfs_remount()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Many ext4 and jbd2 cleanups and bug fixes:
- Cleanups in the ext4 remount code when going to and from read-only
- Cleanups in ext4's multiblock allocator
- Cleanups in the jbd2 setup/mounting code paths
- Performance improvements when appending to a delayed allocation file
- Miscellaneous syzbot and other bug fixes"
* tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
ext4: fix slab-use-after-free in ext4_es_insert_extent()
libfs: remove redundant checks of s_encoding
ext4: remove redundant checks of s_encoding
ext4: reject casefold inode flag without casefold feature
ext4: use LIST_HEAD() to initialize the list_head in mballoc.c
ext4: do not mark inode dirty every time when appending using delalloc
ext4: rename s_error_work to s_sb_upd_work
ext4: add periodic superblock update check
ext4: drop dio overwrite only flag and associated warning
ext4: add correct group descriptors and reserved GDT blocks to system zone
ext4: remove unused function declaration
ext4: mballoc: avoid garbage value from err
ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple()
ext4: change the type of blocksize in ext4_mb_init_cache()
ext4: fix unttached inode after power cut with orphan file feature enabled
jbd2: correct the end of the journal recovery scan range
ext4: ext4_get_{dev}_journal return proper error value
ext4: cleanup ext4_get_dev_journal() and ext4_get_journal()
jbd2: jbd2_journal_init_{dev,inode} return proper error return value
jbd2: drop useless error tag in jbd2_journal_wipe()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
- Allow blocking posix lock requests to be interrupted while waiting.
This requires a cancel request to be sent to the userspace daemon
where posix lock requests are processed across the cluster.
- Fix a posix lock patch from the previous cycle in which lock requests
from different file systems could be mixed up.
- Fix some long standing problems with nfs posix lock cancelation.
- Add a new debugfs file for printing queued callbacks.
- Stop modifying buffers that have been used to receive a message.
- Misc cleanups and some refactoring.
* tag 'dlm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: fix plock lookup when using multiple lockspaces
fs: dlm: don't use RCOM_NAMES for version detection
fs: dlm: create midcomms nodes when configure
fs: dlm: constify receive buffer
fs: dlm: drop rxbuf manipulation in dlm_recover_master_copy
fs: dlm: drop rxbuf manipulation in dlm_copy_master_names
fs: dlm: get recovery sequence number as parameter
fs: dlm: cleanup lock order
fs: dlm: remove clear_members_cb
fs: dlm: add plock dev tracepoints
fs: dlm: check on plock ops when exit dlm
fs: dlm: debugfs for queued callbacks
fs: dlm: remove unused processed_nodes
fs: dlm: add missing spin_unlock
fs: dlm: fix F_CANCELLK to cancel pending request
fs: dlm: allow to F_SETLKW getting interrupted
fs: dlm: remove twice newline
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull more superblock follow-on fixes from Christian Brauner:
"This contains two more small follow-up fixes for the super work this
cycle. I went through all filesystems once more and detected two minor
issues that still needed fixing:
- Some filesystems support mtd devices (e.g., mount -t jffs2 mtd2
/mnt). The mtd infrastructure uses the sb->s_mtd pointer to find an
existing superblock. When the mtd device is put and sb->s_mtd
cleared the superblock can still be found fs_supers and so this
risks a use-after-free.
Add a small patch that aligns mtd with what we did for regular
block devices and switch keying to rely on sb->s_dev.
(This was tested with mtd devices and jffs2 as xfstests doesn't
support mtd devices.)
- Switch nfs back to rely on kill_anon_super() so the superblock is
removed from the list of active supers before sb->s_fs_info is
freed"
* tag 'v6.6-vfs.super.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
NFS: switch back to using kill_anon_super
mtd: key superblock by device number
fs: export sget_dev()
|
|
Fix up missed semantic mis-merge between commits
161e393c0f63 ("mm: Make pte_mkwrite() take a VMA")
27af67f35631 ("powerpc/book3s64/mm: enable transparent pud hugepage")
where the newly introduced powerpc use of 'pte_mkwrite()' needs to use
the 'novma()' versions as per commit 2f0584f3f4bd ("mm: Rename arch
pte_mkwrite()'s to pte_mkwrite_novma()").
Fixes: df57721f9a63 ("Merge tag 'x86_shstk_for_6.6-rc1' of [...]")
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When XCR0[9] is set, PKRU can be read and written from userspace with
XSAVE and XRSTOR, even when CR4.PKE is clear.
Clear XCR0[9] when protection keys are disabled.
Reported-by: Tavis Ormandy <[email protected]>
Signed-off-by: Jim Mattson <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Commit 1756ddea6916 ("pstore: Remove worst-case compression size logic")
removed some clunky per-algorithm worst case size estimation routines on
the basis that we can always store pstore records uncompressed, and
these worst case estimations are about how much the size might
inadvertently *increase* due to encapsulation overhead when the input
cannot be compressed at all. So if compression results in a size
increase, we just store the original data instead.
However, it seems that the original code was misinterpreting these
calculations as an estimation of how much uncompressed data might fit
into a compressed buffer of a given size, and it was using the results
to consume the input data in larger chunks than the pstore record size,
relying on the compression to ensure that what ultimately gets stored
fits into the available space.
One result of this, as observed and reported by Linus, is that upgrading
to a newer kernel that includes the given commit may result in pstore
decompression errors reported in the kernel log. This is due to the fact
that the existing records may unexpectedly decompress to a size that is
larger than the pstore record size.
Another potential problem caused by this change is that we may
underutilize the fixed sized records on pstore backends such as ramoops.
And on pstore backends with variable sized records such as EFI, we will
end up creating many more entries than before to store the same amount
of compressed data.
So let's fix both issues, by bringing back the typical case estimation of
how much ASCII text captured from the dmesg log might fit into a pstore
record of a given size after compression. The original implementation
used the computation given below for zlib:
switch (size) {
/* buffer range for efivars */
case 1000 ... 2000:
cmpr = 56;
break;
case 2001 ... 3000:
cmpr = 54;
break;
case 3001 ... 3999:
cmpr = 52;
break;
/* buffer range for nvram, erst */
case 4000 ... 10000:
cmpr = 45;
break;
default:
cmpr = 60;
break;
}
return (size * 100) / cmpr;
We will use the previous worst-case of 60% for compression. For
decompression go extra large (3x) so we make sure there's enough space
for anything.
While at it, rate limit the error message so we don't flood the log
unnecessarily on systems that have accumulated a lot of pstore history.
Cc: Linus Torvalds <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Herbert Xu <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Co-developed-by: Kees Cook <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
|
|
The mx3fb driver does not support devicetree and i.MX has been converted
to a DT-only platform since kernel 5.10.
As there is no user for this driver anymore, just remove it.
Signed-off-by: Fabio Estevam <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
Convert list_for_each() to list_for_each_entry() so that the pos
list_head pointer and list_entry() call are no longer needed, which
can reduce a few lines of code. No functional changed.
Signed-off-by: Jinjie Ruan <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
When using the "install" or targets depending on install, e.g. "gen_tar",
the BPF machine flavors weren't included.
A command like:
| make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- O=/workspace/kbuild \
| HOSTCC=gcc FORMAT= SKIP_TARGETS="arm64 ia64 powerpc sparc64 x86 sgx" \
| -C tools/testing/selftests gen_tar
would not include bpf/no_alu32, bpf/cpuv4, or bpf/bpf-gcc.
Include the BPF machine flavors for "install" make target.
Signed-off-by: Björn Töpel <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
syzbot reported a data race splat between two processes trying to
update the same BPF map value via syscall on different CPUs:
BUG: KCSAN: data-race in bpf_percpu_array_update / bpf_percpu_array_update
write to 0xffffe8fffe7425d8 of 8 bytes by task 8257 on cpu 1:
bpf_long_memcpy include/linux/bpf.h:428 [inline]
bpf_obj_memcpy include/linux/bpf.h:441 [inline]
copy_map_value_long include/linux/bpf.h:464 [inline]
bpf_percpu_array_update+0x3bb/0x500 kernel/bpf/arraymap.c:380
bpf_map_update_value+0x190/0x370 kernel/bpf/syscall.c:175
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1749
bpf_map_do_batch+0x2df/0x3d0 kernel/bpf/syscall.c:4648
__sys_bpf+0x28a/0x780
__do_sys_bpf kernel/bpf/syscall.c:5241 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5239 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5239
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
write to 0xffffe8fffe7425d8 of 8 bytes by task 8268 on cpu 0:
bpf_long_memcpy include/linux/bpf.h:428 [inline]
bpf_obj_memcpy include/linux/bpf.h:441 [inline]
copy_map_value_long include/linux/bpf.h:464 [inline]
bpf_percpu_array_update+0x3bb/0x500 kernel/bpf/arraymap.c:380
bpf_map_update_value+0x190/0x370 kernel/bpf/syscall.c:175
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1749
bpf_map_do_batch+0x2df/0x3d0 kernel/bpf/syscall.c:4648
__sys_bpf+0x28a/0x780
__do_sys_bpf kernel/bpf/syscall.c:5241 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5239 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5239
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000000000 -> 0xfffffff000002788
The bpf_long_memcpy is used with 8-byte aligned pointers, power-of-8 size
and forced to use long read/writes to try to atomically copy long counters.
It is best-effort only and no barriers are here since it _will_ race with
concurrent updates from BPF programs. The bpf_long_memcpy() is called from
bpf(2) syscall. Marco suggested that the best way to make this known to
KCSAN would be to use data_race() annotation.
Reported-by: [email protected]
Suggested-by: Marco Elver <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Marco Elver <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Link: https://lore.kernel.org/bpf/57628f7a15e20d502247c3b55fceb1cb2b31f266.1693342186.git.daniel@iogearbox.net
|
|
Pull ARM updates from Russell King:
- Refactor VFP code and convert to C code (Ard Biesheuvel)
- Fix hardware breakpoint single-stepping using bpf_overflow_handler
- Make SMP stop calls asynchronous allowing panic from irq context to
work
- Fix for kernel-doc warnings for locomo
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
Revert part of ae1f8d793a19 ("ARM: 9304/1: add prototype for function called only from asm")
ARM: 9318/1: locomo: move kernel-doc to prevent warnings
ARM: 9317/1: kexec: Make smp stop calls asynchronous
ARM: 9316/1: hw_breakpoint: fix single-stepping when using bpf_overflow_handler
ARM: entry: Make asm coproc dispatch code NWFPE only
ARM: iwmmxt: Use undef hook to enable coprocessor for task
ARM: entry: Disregard Thumb undef exception in coproc dispatch
ARM: vfp: Use undef hook for handling VFP exceptions
ARM: kernel: Get rid of thread_info::used_cp[] array
ARM: vfp: Reimplement VFP exception entry in C code
ARM: vfp: Remove workaround for Feroceon CPUs
ARM: vfp: Record VFP bounces as perf emulation faults
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the
configured SMT state when hotplugging CPUs into the system
- Combine final TLB flush and lazy TLB mm shootdown IPIs when using the
Radix MMU to avoid a broadcast TLBIE flush on exit
- Drop the exclusion between ptrace/perf watchpoints, and drop the now
unused associated arch hooks
- Add support for the "nohlt" command line option to disable CPU idle
- Add support for -fpatchable-function-entry for ftrace, with GCC >=
13.1
- Rework memory block size determination, and support 256MB size on
systems with GPUs that have hotpluggable memory
- Various other small features and fixes
Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam
Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel
Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof
Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar,
Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor,
Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar
Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh
Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain,
Xiongfeng Wang, Yuan Tan, Zhang Rui, and Zheng Zengkai.
* tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (135 commits)
macintosh/ams: linux/platform_device.h is needed
powerpc/xmon: Reapply "Relax frame size for clang"
powerpc/mm/book3s64: Use 256M as the upper limit with coherent device memory attached
powerpc/mm/book3s64: Fix build error with SPARSEMEM disabled
powerpc/iommu: Fix notifiers being shared by PCI and VIO buses
powerpc/mpc5xxx: Add missing fwnode_handle_put()
powerpc/config: Disable SLAB_DEBUG_ON in skiroot
powerpc/pseries: Remove unused hcall tracing instruction
powerpc/pseries: Fix hcall tracepoints with JUMP_LABEL=n
powerpc: dts: add missing space before {
powerpc/eeh: Use pci_dev_id() to simplify the code
powerpc/64s: Move CPU -mtune options into Kconfig
powerpc/powermac: Fix unused function warning
powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT
powerpc: Don't include lppaca.h in paca.h
powerpc/pseries: Move hcall_vphn() prototype into vphn.h
powerpc/pseries: Move VPHN constants into vphn.h
cxl: Drop unused detach_spa()
powerpc: Drop zalloc_maybe_bootmem()
powerpc/powernv: Use struct opal_prd_msg in more places
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 shadow stack support from Dave Hansen:
"This is the long awaited x86 shadow stack support, part of Intel's
Control-flow Enforcement Technology (CET).
CET consists of two related security features: shadow stacks and
indirect branch tracking. This series implements just the shadow stack
part of this feature, and just for userspace.
The main use case for shadow stack is providing protection against
return oriented programming attacks. It works by maintaining a
secondary (shadow) stack using a special memory type that has
protections against modification. When executing a CALL instruction,
the processor pushes the return address to both the normal stack and
to the special permission shadow stack. Upon RET, the processor pops
the shadow stack copy and compares it to the normal stack copy.
For more information, refer to the links below for the earlier
versions of this patch set"
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/[email protected]/
* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
x86/shstk: Change order of __user in type
x86/ibt: Convert IBT selftest to asm
x86/shstk: Don't retry vm_munmap() on -EINTR
x86/kbuild: Fix Documentation/ reference
x86/shstk: Move arch detail comment out of core mm
x86/shstk: Add ARCH_SHSTK_STATUS
x86/shstk: Add ARCH_SHSTK_UNLOCK
x86: Add PTRACE interface for shadow stack
selftests/x86: Add shadow stack test
x86/cpufeatures: Enable CET CR4 bit for shadow stack
x86/shstk: Wire in shadow stack interface
x86: Expose thread features in /proc/$PID/status
x86/shstk: Support WRSS for userspace
x86/shstk: Introduce map_shadow_stack syscall
x86/shstk: Check that signal frame is shadow stack mem
x86/shstk: Check that SSP is aligned on sigreturn
x86/shstk: Handle signals for shadow stack
x86/shstk: Introduce routines modifying shstk
x86/shstk: Handle thread shadow stack
x86/shstk: Add user-mode shadow stack support
...
|
|
Fix this warning:
arch/x86/kernel/i8259.c:235: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* ELCR registers (0x4d0, 0x4d1) control edge/level of IRQ
CC arch/x86/kernel/irqinit.o
Signed-off-by: Vincenzo Palazzo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The Gather Data Sampling (GDS) vulnerability is common to all Skylake
processors. However, the "client" Skylakes* are now in this list:
https://www.intel.com/content/www/us/en/support/articles/000022396/processors.html
which means they are no longer included for new vulnerabilities here:
https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
or in other GDS documentation. Thus, they were not included in the
original GDS mitigation patches.
Mark SKYLAKE and SKYLAKE_L as vulnerable to GDS to match all the
other Skylake CPUs (which include Kaby Lake). Also group the CPUs
so that the ones that share the exact same vulnerabilities are next
to each other.
Last, move SRBDS to the end of each line. This makes it clear at a
glance that SKYLAKE_X is unique. Of the five Skylakes, it is the
only "server" CPU and has a different implementation from the
clients of the "special register" hardware, making it immune to SRBDS.
This makes the diff much harder to read, but the resulting table is
worth it.
I very much appreciate the report from Michael Zhivich about this
issue. Despite what level of support a hardware vendor is providing,
the kernel very much needs an accurate and up-to-date list of
vulnerable CPUs. More reports like this are very welcome.
* Client Skylakes are CPUID 406E3/506E3 which is family 6, models
0x4E and 0x5E, aka INTEL_FAM6_SKYLAKE and INTEL_FAM6_SKYLAKE_L.
Reported-by: Michael Zhivich <[email protected]>
Fixes: 8974eb588283 ("x86/speculation: Add Gather Data Sampling mitigation")
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Daniel Sneddon <[email protected]>
Cc: Linus Torvalds <[email protected]>
|
|
Explicitly include mmu.h in spte.h instead of relying on the "parent" to
include mmu.h. spte.h references a variety of macros and variables that
are defined/declared in mmu.h, and so including spte.h before (or instead
of) mmu.h will result in build errors, e.g.
arch/x86/kvm/mmu/spte.h: In function ‘is_mmio_spte’:
arch/x86/kvm/mmu/spte.h:242:23: error: ‘enable_mmio_caching’ undeclared
242 | likely(enable_mmio_caching);
| ^~~~~~~~~~~~~~~~~~~
arch/x86/kvm/mmu/spte.h: In function ‘is_large_pte’:
arch/x86/kvm/mmu/spte.h:302:22: error: ‘PT_PAGE_SIZE_MASK’ undeclared
302 | return pte & PT_PAGE_SIZE_MASK;
| ^~~~~~~~~~~~~~~~~
arch/x86/kvm/mmu/spte.h: In function ‘is_dirty_spte’:
arch/x86/kvm/mmu/spte.h:332:56: error: ‘PT_WRITABLE_MASK’ undeclared
332 | return dirty_mask ? spte & dirty_mask : spte & PT_WRITABLE_MASK;
| ^~~~~~~~~~~~~~~~
Fixes: 5a9624affe7c ("KVM: mmu: extract spte.h and spte.c")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When attempting to allocate a shadow root for a !visible guest root gfn,
e.g. that resides in MMIO space, load a dummy root that is backed by the
zero page instead of immediately synthesizing a triple fault shutdown
(using the zero page ensures any attempt to translate memory will generate
a !PRESENT fault and thus VM-Exit).
Unless the vCPU is racing with memslot activity, KVM will inject a page
fault due to not finding a visible slot in FNAME(walk_addr_generic), i.e.
the end result is mostly same, but critically KVM will inject a fault only
*after* KVM runs the vCPU with the bogus root.
Waiting to inject a fault until after running the vCPU fixes a bug where
KVM would bail from nested VM-Enter if L1 tried to run L2 with TDP enabled
and a !visible root. Even though a bad root will *probably* lead to
shutdown, (a) it's not guaranteed and (b) the CPU won't read the
underlying memory until after VM-Enter succeeds. E.g. if L1 runs L2 with
a VMX preemption timer value of '0', then architecturally the preemption
timer VM-Exit is guaranteed to occur before the CPU executes any
instruction, i.e. before the CPU needs to translate a GPA to a HPA (so
long as there are no injected events with higher priority than the
preemption timer).
If KVM manages to get to FNAME(fetch) with a dummy root, e.g. because
userspace created a memslot between installing the dummy root and handling
the page fault, simply unload the MMU to allocate a new root and retry the
instruction. Use KVM_REQ_MMU_FREE_OBSOLETE_ROOTS to drop the root, as
invoking kvm_mmu_free_roots() while holding mmu_lock would deadlock, and
conceptually the dummy root has indeeed become obsolete. The only
difference versus existing usage of KVM_REQ_MMU_FREE_OBSOLETE_ROOTS is
that the root has become obsolete due to memslot *creation*, not memslot
deletion or movement.
Reported-by: Reima Ishii <[email protected]>
Cc: Yu Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Explicitly inject a page fault if guest attempts to use a !visible gfn
as a page table. kvm_vcpu_gfn_to_hva_prot() will naturally handle the
case where there is no memslot, but doesn't catch the scenario where the
gfn points at a KVM-internal memslot.
Letting the guest backdoor its way into accessing KVM-internal memslots
isn't dangerous on its own, e.g. at worst the guest can crash itself, but
disallowing the behavior will simplify fixing how KVM handles !visible
guest root gfns (immediately synthesizing a triple fault when loading the
root is architecturally wrong).
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Explicitly check that tdp_iter_start() is handed a valid shadow page
to harden KVM against bugs, e.g. if KVM calls into the TDP MMU with an
invalid or shadow MMU root (which would be a fatal KVM bug), the shadow
page pointer will be NULL.
Opportunistically stop the TDP MMU iteration instead of continuing on
with garbage if the incoming root is bogus. Attempting to walk a garbage
root is more likely to caused major problems than doing nothing.
Cc: Yu Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Harden kvm_mmu_new_pgd() against NULL pointer dereference bugs by sanity
checking that the target root has an associated shadow page prior to
dereferencing said shadow page. The code in question is guaranteed to
only see roots with shadow pages as fast_pgd_switch() explicitly frees the
current root if it doesn't have a shadow page, i.e. is a PAE root, and
that in turn prevents valid roots from being cached, but that's all very
subtle.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add a dedicated helper for converting a root hpa to a shadow page in
anticipation of using a "dummy" root to handle the scenario where KVM
needs to load a valid shadow root (from hardware's perspective), but
the guest doesn't have a visible root to shadow. Similar to PAE roots,
the dummy root won't have an associated kvm_mmu_page and will need special
handling when finding a shadow page given a root.
Opportunistically retrieve the root shadow page in kvm_mmu_sync_roots()
*after* verifying the root is unsync (the dummy root can never be unsync).
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Open code gpa_to_gfn() in kvmgt_page_track_write() and drop KVMGT's
dependency on kvm_host.h, i.e. include only kvm_page_track.h. KVMGT
assumes "gfn == gpa >> PAGE_SHIFT" all over the place, including a few
lines below in the same function with the same gpa, i.e. there's no
reason to use KVM's helper for this one case.
No functional change intended.
Reviewed-by: Yan Zhao <[email protected]>
Tested-by: Yongwei Ma <[email protected]>
Reviewed-by: Zhi Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|