aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2024-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski23-51/+111
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/[email protected]/ No adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02Merge tag 'net-6.9-rc7' of ↵Linus Torvalds22-51/+110
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf. Relatively calm week, likely due to public holiday in most places. No known outstanding regressions. Current release - regressions: - rxrpc: fix wrong alignmask in __page_frag_alloc_align() - eth: e1000e: change usleep_range to udelay in PHY mdic access Previous releases - regressions: - gro: fix udp bad offset in socket lookup - bpf: fix incorrect runtime stat for arm64 - tipc: fix UAF in error path - netfs: fix a potential infinite loop in extract_user_to_sg() - eth: ice: ensure the copied buf is NUL terminated - eth: qeth: fix kernel panic after setting hsuid Previous releases - always broken: - bpf: - verifier: prevent userspace memory access - xdp: use flags field to disambiguate broadcast redirect - bridge: fix multicast-to-unicast with fraglist GSO - mptcp: ensure snd_nxt is properly initialized on connect - nsh: fix outer header access in nsh_gso_segment(). - eth: bcmgenet: fix racing registers access - eth: vxlan: fix stats counters. Misc: - a bunch of MAINTAINERS file updates" * tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) MAINTAINERS: mark MYRICOM MYRI-10G as Orphan MAINTAINERS: remove Ariel Elior net: gro: add flush check in udp_gro_receive_segment net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb ipv4: Fix uninit-value access in __ip_make_skb() s390/qeth: Fix kernel panic after setting hsuid vxlan: Pull inner IP header in vxlan_rcv(). tipc: fix a possible memleak in tipc_buf_append tipc: fix UAF in error path rxrpc: Clients must accept conn from any address net: core: reject skb_copy(_expand) for fraglist GSO skbs net: bridge: fix multicast-to-unicast with fraglist GSO mptcp: ensure snd_nxt is properly initialized on connect e1000e: change usleep_range to udelay in PHY mdic access net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341 cxgb4: Properly lock TX queue for the selftest. rxrpc: Fix using alignmask being zero for __page_frag_alloc_align() vxlan: Add missing VNI filter counter update in arp_reduce(). vxlan: Fix racy device stats updates. net: qede: use return from qede_parse_actions() ...
2024-05-02net/sched: unregister lockdep keys in qdisc_create/qdisc_alloc error pathDavide Caratti2-0/+2
Naresh and Eric report several errors (corrupted elements in the dynamic key hash list), when running tdc.py or syzbot. The error path of qdisc_alloc() and qdisc_create() frees the qdisc memory, but it forgets to unregister the lockdep key, thus causing use-after-free like the following one: ================================================================== BUG: KASAN: slab-use-after-free in lockdep_register_key+0x5f2/0x700 Read of size 8 at addr ffff88811236f2a8 by task ip/7925 CPU: 26 PID: 7925 Comm: ip Kdump: loaded Not tainted 6.9.0-rc2+ #648 Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013 Call Trace: <TASK> dump_stack_lvl+0x7c/0xc0 print_report+0xc9/0x610 kasan_report+0x89/0xc0 lockdep_register_key+0x5f2/0x700 qdisc_alloc+0x21d/0xb60 qdisc_create_dflt+0x63/0x3c0 attach_one_default_qdisc.constprop.37+0x8e/0x170 dev_activate+0x4bd/0xc30 __dev_open+0x275/0x380 __dev_change_flags+0x3f1/0x570 dev_change_flags+0x7c/0x160 do_setlink+0x1ea1/0x34b0 __rtnl_newlink+0x8c9/0x1510 rtnl_newlink+0x61/0x90 rtnetlink_rcv_msg+0x2f0/0xbc0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x420/0x630 netlink_sendmsg+0x732/0xbc0 __sock_sendmsg+0x1ea/0x280 ____sys_sendmsg+0x5a9/0x990 ___sys_sendmsg+0xf1/0x180 __sys_sendmsg+0xd3/0x180 do_syscall_64+0x96/0x180 entry_SYSCALL_64_after_hwframe+0x71/0x79 RIP: 0033:0x7f9503f4fa07 Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007fff6c729068 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000006630c681 RCX: 00007f9503f4fa07 RDX: 0000000000000000 RSI: 00007fff6c7290d0 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000078 R10: 000000000000009b R11: 0000000000000246 R12: 0000000000000001 R13: 00007fff6c729180 R14: 0000000000000000 R15: 000055bf67dd9040 </TASK> Allocated by task 7745: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 __kasan_kmalloc+0x7b/0x90 __kmalloc_node+0x1ff/0x460 qdisc_alloc+0xae/0xb60 qdisc_create+0xdd/0xfb0 tc_modify_qdisc+0x37e/0x1960 rtnetlink_rcv_msg+0x2f0/0xbc0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x420/0x630 netlink_sendmsg+0x732/0xbc0 __sock_sendmsg+0x1ea/0x280 ____sys_sendmsg+0x5a9/0x990 ___sys_sendmsg+0xf1/0x180 __sys_sendmsg+0xd3/0x180 do_syscall_64+0x96/0x180 entry_SYSCALL_64_after_hwframe+0x71/0x79 Freed by task 7745: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x36/0x60 __kasan_slab_free+0xfe/0x180 kfree+0x113/0x380 qdisc_create+0xafb/0xfb0 tc_modify_qdisc+0x37e/0x1960 rtnetlink_rcv_msg+0x2f0/0xbc0 netlink_rcv_skb+0x120/0x380 netlink_unicast+0x420/0x630 netlink_sendmsg+0x732/0xbc0 __sock_sendmsg+0x1ea/0x280 ____sys_sendmsg+0x5a9/0x990 ___sys_sendmsg+0xf1/0x180 __sys_sendmsg+0xd3/0x180 do_syscall_64+0x96/0x180 entry_SYSCALL_64_after_hwframe+0x71/0x79 Fix this ensuring that lockdep_unregister_key() is called before the qdisc struct is freed, also in the error path of qdisc_create() and qdisc_alloc(). Fixes: af0cb3fa3f9e ("net/sched: fix false lockdep warning on qdisc root lock") Reported-by: Linux Kernel Functional Testing <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Davide Caratti <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Tested-by: Ido Schimmel <[email protected]> Link: https://lore.kernel.org/r/2aa1ca0c0a3aa0acc15925c666c777a4b5de553c.1714496886.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-02net: gro: add flush check in udp_gro_receive_segmentRichard Gobert1-1/+11
GRO-GSO path is supposed to be transparent and as such L3 flush checks are relevant to all UDP flows merging in GRO. This patch uses the same logic and code from tcp_gro_receive, terminating merge if flush is non zero. Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Signed-off-by: Richard Gobert <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-05-02net: gro: fix udp bad offset in socket lookup by adding ↵Richard Gobert8-4/+13
{inner_}network_offset to napi_gro_cb Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp: additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the complete phase of gro. The functions always return skb->network_header, which in the case of encapsulated packets at the gro complete phase, is always set to the innermost L3 of the packet. That means that calling {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_ L3/L4 may return an unexpected value. This incorrect usage leads to a bug in GRO's UDP socket lookup. udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These *_hdr functions return network_header which will point to the innermost L3, resulting in the wrong offset being used in __udp{4,6}_lib_lookup with encapsulated packets. This patch adds network_offset and inner_network_offset to napi_gro_cb, and makes sure both are set correctly. To fix the issue, network_offsets union is used inside napi_gro_cb, in which both the outer and the inner network offsets are saved. Reproduction example: Endpoint configuration example (fou + local address bind) # ip fou add port 6666 ipproto 4 # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip # ip link set tun1 up # ip a add 1.1.1.2/24 dev tun1 Netperf TCP_STREAM result on net-next before patch is applied: net-next main, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.28 2.37 net-next main, GRO disabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2745.06 patch applied, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2877.38 Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Signed-off-by: Richard Gobert <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-05-02ipv4: Fix uninit-value access in __ip_make_skb()Shigeru Yoshida2-1/+4
KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb() tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause a race condition. If calling setsockopt(2) with IP_HDRINCL changes HDRINCL while __ip_make_skb() is running, the function will access icmphdr in the skb even if it is not included. This causes the issue reported by KMSAN. Check FLOWI_FLAG_KNOWN_NH on fl4->flowi4_flags instead of testing HDRINCL on the socket. Also, fl4->fl4_icmp_type and fl4->fl4_icmp_code are not initialized. These are union in struct flowi4 and are implicitly initialized by flowi4_init_output(), but we should not rely on specific union layout. Initialize these explicitly in raw_sendmsg(). [1] BUG: KMSAN: uninit-value in __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 ip_finish_skb include/net/ip.h:243 [inline] ip_push_pending_frames+0x4c/0x5c0 net/ipv4/ip_output.c:1508 raw_sendmsg+0x2381/0x2690 net/ipv4/raw.c:654 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x5f6/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35a/0x7c0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1318 [inline] __ip_append_data+0x49ab/0x68c0 net/ipv4/ip_output.c:1128 ip_append_data+0x1e7/0x260 net/ipv4/ip_output.c:1365 raw_sendmsg+0x22b1/0x2690 net/ipv4/raw.c:648 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75 CPU: 1 PID: 15709 Comm: syz-executor.7 Not tainted 6.8.0-11567-gb3603fcb79b1 #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014 Fixes: 99e5acae193e ("ipv4: Fix potential uninit variable access bug in __ip_make_skb()") Reported-by: syzkaller <[email protected]> Signed-off-by: Shigeru Yoshida <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-05-01net: dsa: Remove adjust_link pathsFlorian Fainelli2-130/+12
Now that we no longer any drivers using PHYLIB's adjust_link callback, remove all paths that made use of adjust_link as well as the associated functions. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01tipc: fix a possible memleak in tipc_buf_appendXin Long1-1/+1
__skb_linearize() doesn't free the skb when it fails, so move '*buf = NULL' after __skb_linearize(), so that the skb can be freed on the err path. Fixes: b7df21cf1b79 ("tipc: skb_linearize the head skb when reassembling msgs") Reported-by: Paolo Abeni <[email protected]> Signed-off-by: Xin Long <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Tung Nguyen <[email protected]> Link: https://lore.kernel.org/r/90710748c29a1521efac4f75ea01b3b7e61414cf.1714485818.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01tipc: fix UAF in error pathPaolo Abeni1-1/+5
Sam Page (sam4k) working with Trend Micro Zero Day Initiative reported a UAF in the tipc_buf_append() error path: BUG: KASAN: slab-use-after-free in kfree_skb_list_reason+0x47e/0x4c0 linux/net/core/skbuff.c:1183 Read of size 8 at addr ffff88804d2a7c80 by task poc/8034 CPU: 1 PID: 8034 Comm: poc Not tainted 6.8.2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014 Call Trace: <IRQ> __dump_stack linux/lib/dump_stack.c:88 dump_stack_lvl+0xd9/0x1b0 linux/lib/dump_stack.c:106 print_address_description linux/mm/kasan/report.c:377 print_report+0xc4/0x620 linux/mm/kasan/report.c:488 kasan_report+0xda/0x110 linux/mm/kasan/report.c:601 kfree_skb_list_reason+0x47e/0x4c0 linux/net/core/skbuff.c:1183 skb_release_data+0x5af/0x880 linux/net/core/skbuff.c:1026 skb_release_all linux/net/core/skbuff.c:1094 __kfree_skb linux/net/core/skbuff.c:1108 kfree_skb_reason+0x12d/0x210 linux/net/core/skbuff.c:1144 kfree_skb linux/./include/linux/skbuff.h:1244 tipc_buf_append+0x425/0xb50 linux/net/tipc/msg.c:186 tipc_link_input+0x224/0x7c0 linux/net/tipc/link.c:1324 tipc_link_rcv+0x76e/0x2d70 linux/net/tipc/link.c:1824 tipc_rcv+0x45f/0x10f0 linux/net/tipc/node.c:2159 tipc_udp_recv+0x73b/0x8f0 linux/net/tipc/udp_media.c:390 udp_queue_rcv_one_skb+0xad2/0x1850 linux/net/ipv4/udp.c:2108 udp_queue_rcv_skb+0x131/0xb00 linux/net/ipv4/udp.c:2186 udp_unicast_rcv_skb+0x165/0x3b0 linux/net/ipv4/udp.c:2346 __udp4_lib_rcv+0x2594/0x3400 linux/net/ipv4/udp.c:2422 ip_protocol_deliver_rcu+0x30c/0x4e0 linux/net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2e4/0x520 linux/net/ipv4/ip_input.c:233 NF_HOOK linux/./include/linux/netfilter.h:314 NF_HOOK linux/./include/linux/netfilter.h:308 ip_local_deliver+0x18e/0x1f0 linux/net/ipv4/ip_input.c:254 dst_input linux/./include/net/dst.h:461 ip_rcv_finish linux/net/ipv4/ip_input.c:449 NF_HOOK linux/./include/linux/netfilter.h:314 NF_HOOK linux/./include/linux/netfilter.h:308 ip_rcv+0x2c5/0x5d0 linux/net/ipv4/ip_input.c:569 __netif_receive_skb_one_core+0x199/0x1e0 linux/net/core/dev.c:5534 __netif_receive_skb+0x1f/0x1c0 linux/net/core/dev.c:5648 process_backlog+0x101/0x6b0 linux/net/core/dev.c:5976 __napi_poll.constprop.0+0xba/0x550 linux/net/core/dev.c:6576 napi_poll linux/net/core/dev.c:6645 net_rx_action+0x95a/0xe90 linux/net/core/dev.c:6781 __do_softirq+0x21f/0x8e7 linux/kernel/softirq.c:553 do_softirq linux/kernel/softirq.c:454 do_softirq+0xb2/0xf0 linux/kernel/softirq.c:441 </IRQ> <TASK> __local_bh_enable_ip+0x100/0x120 linux/kernel/softirq.c:381 local_bh_enable linux/./include/linux/bottom_half.h:33 rcu_read_unlock_bh linux/./include/linux/rcupdate.h:851 __dev_queue_xmit+0x871/0x3ee0 linux/net/core/dev.c:4378 dev_queue_xmit linux/./include/linux/netdevice.h:3169 neigh_hh_output linux/./include/net/neighbour.h:526 neigh_output linux/./include/net/neighbour.h:540 ip_finish_output2+0x169f/0x2550 linux/net/ipv4/ip_output.c:235 __ip_finish_output linux/net/ipv4/ip_output.c:313 __ip_finish_output+0x49e/0x950 linux/net/ipv4/ip_output.c:295 ip_finish_output+0x31/0x310 linux/net/ipv4/ip_output.c:323 NF_HOOK_COND linux/./include/linux/netfilter.h:303 ip_output+0x13b/0x2a0 linux/net/ipv4/ip_output.c:433 dst_output linux/./include/net/dst.h:451 ip_local_out linux/net/ipv4/ip_output.c:129 ip_send_skb+0x3e5/0x560 linux/net/ipv4/ip_output.c:1492 udp_send_skb+0x73f/0x1530 linux/net/ipv4/udp.c:963 udp_sendmsg+0x1a36/0x2b40 linux/net/ipv4/udp.c:1250 inet_sendmsg+0x105/0x140 linux/net/ipv4/af_inet.c:850 sock_sendmsg_nosec linux/net/socket.c:730 __sock_sendmsg linux/net/socket.c:745 __sys_sendto+0x42c/0x4e0 linux/net/socket.c:2191 __do_sys_sendto linux/net/socket.c:2203 __se_sys_sendto linux/net/socket.c:2199 __x64_sys_sendto+0xe0/0x1c0 linux/net/socket.c:2199 do_syscall_x64 linux/arch/x86/entry/common.c:52 do_syscall_64+0xd8/0x270 linux/arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6f/0x77 linux/arch/x86/entry/entry_64.S:120 RIP: 0033:0x7f3434974f29 Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 37 8f 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007fff9154f2b8 EFLAGS: 00000212 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3434974f29 RDX: 00000000000032c8 RSI: 00007fff9154f300 RDI: 0000000000000003 RBP: 00007fff915532e0 R08: 00007fff91553360 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000212 R12: 000055ed86d261d0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> In the critical scenario, either the relevant skb is freed or its ownership is transferred into a frag_lists. In both cases, the cleanup code must not free it again: we need to clear the skb reference earlier. Fixes: 1149557d64c9 ("tipc: eliminate unnecessary linearization of incoming buffers") Cc: [email protected] Reported-by: [email protected] # ZDI-CAN-23852 Acked-by: Xin Long <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/752f1ccf762223d109845365d07f55414058e5a3.1714484273.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01arp: Convert ioctl(SIOCGARP) to RCU.Kuniyuki Iwashima1-10/+18
ioctl(SIOCGARP) holds rtnl_lock() to get netdev by __dev_get_by_name() and copy dev->name safely and calls neigh_lookup() later, which looks up a neighbour entry under RCU. Let's replace __dev_get_by_name() with dev_get_by_name_rcu() and strscpy() with netdev_copy_name() to avoid locking rtnl_lock(). Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01net: Protect dev->name by seqlock.Kuniyuki Iwashima1-4/+23
We will convert ioctl(SIOCGARP) to RCU, and then we need to copy dev->name which is currently protected by rtnl_lock(). This patch does the following: 1) Add seqlock netdev_rename_lock to protect dev->name 2) Add netdev_copy_name() that copies dev->name to buffer under netdev_rename_lock 3) Use netdev_copy_name() in netdev_get_name() and drop devnet_rename_sem Suggested-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/netdev/CANn89iJEWs7AYSJqGCUABeVqOCTkErponfZdT5kV-iD=-SajnQ@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01arp: Get dev after calling arp_req_(delete|set|get)().Kuniyuki Iwashima1-36/+50
arp_ioctl() holds rtnl_lock() first regardless of cmd (SIOCDARP, SIOCSARP, and SIOCGARP) to get net_device by __dev_get_by_name() and copy dev->name safely. In the SIOCGARP path, arp_req_get() calls neigh_lookup(), which looks up a neighbour entry under RCU. We will extend the RCU section not to take rtnl_lock() and instead use dev_get_by_name_rcu() for SIOCGARP. As a preparation, let's move __dev_get_by_name() into another function and call it from arp_req_delete(), arp_req_set(), and arp_req_get(). Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01arp: Remove a nest in arp_req_get().Kuniyuki Iwashima1-13/+18
This is a prep patch to make the following changes tidy. No functional change intended. Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01arp: Factorise ip_route_output() call in arp_req_set() and arp_req_delete().Kuniyuki Iwashima1-20/+30
When ioctl(SIOCDARP/SIOCSARP) is issued for non-proxy entry (no ATF_COM) without arpreq.arp_dev[] set, arp_req_set() and arp_req_delete() looks up dev based on IPv4 address by ip_route_output(). Let's factorise the same code as arp_req_dev(). Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01arp: Validate netmask earlier for SIOCDARP and SIOCSARP in arp_ioctl().Kuniyuki Iwashima1-12/+15
When ioctl(SIOCDARP/SIOCSARP) is issued with ATF_PUBL, r.arp_netmask must be 0.0.0.0 or 255.255.255.255. Currently, the netmask is validated in arp_req_delete_public() or arp_req_set_public() under rtnl_lock(). We have ATF_NETMASK test in arp_ioctl() before holding rtnl_lock(), so let's move the netmask validation there. Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01arp: Move ATF_COM setting in arp_req_set().Kuniyuki Iwashima1-3/+6
In arp_req_set(), if ATF_PERM is set in arpreq.arp_flags, ATF_COM is set automatically. The flag will be used later for neigh_update() only when a neighbour entry is found. Let's set ATF_COM just before calling neigh_update(). Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01rxrpc: Clients must accept conn from any addressJeffrey Altman1-7/+2
The find connection logic of Transarc's Rx was modified in the mid-1990s to support multi-homed servers which might send a response packet from an address other than the destination address in the received packet. The rules for accepting a packet by an Rx initiator (RX_CLIENT_CONNECTION) were altered to permit acceptance of a packet from any address provided that the port number was unchanged and all of the connection identifiers matched (Epoch, CID, SecurityClass, ...). This change applies the same rules to the Linux implementation which makes it consistent with IBM AFS 3.6, Arla, OpenAFS and AuriStorFS. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: Jeffrey Altman <[email protected]> Acked-by: David Howells <[email protected]> Signed-off-by: Marc Dionne <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01ipv6: anycast: use call_rcu_hurry() in aca_put()Eric Dumazet1-3/+2
This is a followup of commit b5327b9a300e ("ipv6: use call_rcu_hurry() in fib6_info_release()"). I had another pmtu.sh failure, and found another lazy call_rcu() causing this failure. aca_free_rcu() calls fib6_info_release() which releases devices references. We must not delay it too much or risk unregister_netdevice/ref_tracker traces because references to netdev are not released in time. This should speedup device/netns dismantles when CONFIG_RCU_LAZY=y Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-01net: core: reject skb_copy(_expand) for fraglist GSO skbsFelix Fietkau1-8/+19
SKB_GSO_FRAGLIST skbs must not be linearized, otherwise they become invalid. Return NULL if such an skb is passed to skb_copy or skb_copy_expand, in order to prevent a crash on a potential later call to skb_gso_segment. Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-01net: bridge: fix multicast-to-unicast with fraglist GSOFelix Fietkau1-1/+1
Calling skb_copy on a SKB_GSO_FRAGLIST skb is not valid, since it returns an invalid linearized skb. This code only needs to change the ethernet header, so pskb_copy is the right function to call here. Fixes: 6db6f0eae605 ("bridge: multicast to unicast") Signed-off-by: Felix Fietkau <[email protected]> Acked-by: Paolo Abeni <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-04-30netpoll: Fix race condition in netpoll_owner_activeBreno Leitao1-1/+1
KCSAN detected a race condition in netpoll: BUG: KCSAN: data-race in net_rx_action / netpoll_send_skb write (marked) to 0xffff8881164168b0 of 4 bytes by interrupt on cpu 10: net_rx_action (./include/linux/netpoll.h:90 net/core/dev.c:6712 net/core/dev.c:6822) <snip> read to 0xffff8881164168b0 of 4 bytes by task 1 on cpu 2: netpoll_send_skb (net/core/netpoll.c:319 net/core/netpoll.c:345 net/core/netpoll.c:393) netpoll_send_udp (net/core/netpoll.c:?) <snip> value changed: 0x0000000a -> 0xffffffff This happens because netpoll_owner_active() needs to check if the current CPU is the owner of the lock, touching napi->poll_owner non atomically. The ->poll_owner field contains the current CPU holding the lock. Use an atomic read to check if the poll owner is the current CPU. Signed-off-by: Breno Leitao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30mptcp: ensure snd_nxt is properly initialized on connectPaolo Abeni1-0/+3
Christoph reported a splat hinting at a corrupted snd_una: WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Modules linked in: CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.9.0-rc1-gbbeac67456c9 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:__mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Code: be 06 01 00 00 bf 06 01 00 00 e8 a8 12 e7 fe e9 00 fe ff ff e8 8e 1a e7 fe 0f b7 ab 3e 02 00 00 e9 d3 fd ff ff e8 7d 1a e7 fe <0f> 0b 4c 8b bb e0 05 00 00 e9 74 fc ff ff e8 6a 1a e7 fe 0f 0b e9 RSP: 0018:ffffc9000013fd48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8881029bd280 RCX: ffffffff82382fe4 RDX: ffff8881003cbd00 RSI: ffffffff823833c3 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888138ba8000 R13: 0000000000000106 R14: ffff8881029bd908 R15: ffff888126560000 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f604a5dae38 CR3: 0000000101dac002 CR4: 0000000000170ef0 Call Trace: <TASK> __mptcp_clean_una_wakeup net/mptcp/protocol.c:1055 [inline] mptcp_clean_una_wakeup net/mptcp/protocol.c:1062 [inline] __mptcp_retrans+0x7f/0x7e0 net/mptcp/protocol.c:2615 mptcp_worker+0x434/0x740 net/mptcp/protocol.c:2767 process_one_work+0x1e0/0x560 kernel/workqueue.c:3254 process_scheduled_works kernel/workqueue.c:3335 [inline] worker_thread+0x3c7/0x640 kernel/workqueue.c:3416 kthread+0x121/0x170 kernel/kthread.c:388 ret_from_fork+0x44/0x50 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 </TASK> When fallback to TCP happens early on a client socket, snd_nxt is not yet initialized and any incoming ack will copy such value into snd_una. If the mptcp worker (dumbly) tries mptcp-level re-injection after such ack, that would unconditionally trigger a send buffer cleanup using 'bad' snd_una values. We could easily disable re-injection for fallback sockets, but such dumb behavior already helped catching a few subtle issues and a very low to zero impact in practice. Instead address the issue always initializing snd_nxt (and write_seq, for consistency) at connect time. Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect") Cc: [email protected] Reported-by: Christoph Paasch <[email protected]> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485 Tested-by: Christoph Paasch <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Link: https://lore.kernel.org/r/20240429-upstream-net-20240429-mptcp-snd_nxt-init-connect-v1-1-59ceac0a7dcb@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30net: move sysctl_mem_pcpu_rsv to net_hotdataEric Dumazet3-4/+4
sysctl_mem_pcpu_rsv is used in TCP fast path, move it to net_hodata for better cache locality. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30net: add <net/proto_memory.h>Eric Dumazet7-0/+7
Move some proto memory definitions out of <net/sock.h> Very few files need them, and following patch will include <net/hotdata.h> from <net/proto_memory.h> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30tcp: move tcp_out_of_memory() to net/ipv4/tcp.cEric Dumazet1-1/+9
tcp_out_of_memory() has a single caller: tcp_check_oom(). Following patch will also make sk_memory_allocated() not anymore visible from <net/sock.h> and <net/tcp.h> Add const qualifier to sock argument of tcp_out_of_memory() and tcp_check_oom(). Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30net: move sysctl_skb_defer_max to net_hotdataEric Dumazet5-4/+3
sysctl_skb_defer_max is used in TCP fast path, move it to net_hodata. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30net: move sysctl_max_skb_frags to net_hotdataEric Dumazet5-7/+7
sysctl_max_skb_frags is used in TCP and MPTCP fast paths, move it to net_hodata for better cache locality. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30inet: introduce dst_rtable() helperEric Dumazet17-46/+45
I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper") This patch does a similar change for IPv4. Instead of (struct rtable *)dst casts, we can use : #define dst_rtable(_ptr) \ container_of_const(_ptr, struct rtable, dst) Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30rxrpc: Fix using alignmask being zero for __page_frag_alloc_align()Yunsheng Lin3-7/+7
rxrpc_alloc_data_txbuf() may be called with data_align being zero in none_alloc_txbuf() and rxkad_alloc_txbuf(), data_align is supposed to be an order-based alignment value, but zero is not a valid order-based alignment value, and '~(data_align - 1)' doesn't result in a valid mask-based alignment value for __page_frag_alloc_align(). Fix it by passing a valid order-based alignment value in none_alloc_txbuf() and rxkad_alloc_txbuf(). Also use page_frag_alloc_align() expecting an order-based alignment value in rxrpc_alloc_data_txbuf() to avoid doing the alignment converting operation and to catch possible invalid alignment value in the future. Remove the 'if (data_align)' checking too, as it is always true for a valid order-based alignment value. Fixes: 6b2536462fd4 ("rxrpc: Fix use of changed alignment param to page_frag_alloc_align()") Fixes: 49489bb03a50 ("rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags") CC: Alexander Duyck <[email protected]> Signed-off-by: Yunsheng Lin <[email protected]> Acked-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30net: page_pool: support error injectionJakub Kicinski1-0/+2
Because of caching / recycling using the general page allocation failures to induce errors in page pool allocation is very hard. Add direct error injection support to page_pool_alloc_pages(). Reviewed-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30net/smc: implement DMB-merged operations of loopback-ismWen Gu2-15/+108
This implements operations related to merging sndbuf with peer DMB in loopback-ism. The DMB won't be freed until no sndbuf is attached to it. Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-and-tested-by: Jan Karcher <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30net/smc: adapt cursor update when sndbuf and peer DMB are mergedWen Gu1-2/+34
If the local sndbuf shares the same physical memory with peer DMB, the cursor update processing needs to be adapted to ensure that the data to be consumed won't be overwritten. So in this case, the fin_curs and sndbuf_space that were originally updated after sending the CDC message should be modified to not be update until the peer updates cons_curs. Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-and-tested-by: Jan Karcher <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30net/smc: {at|de}tach sndbuf to peer DMB if supportedWen Gu3-1/+77
If the device used by SMC-D supports merging local sndbuf to peer DMB, then create sndbuf descriptor and attach it to peer DMB once peer token is obtained, and detach and free the sndbuf descriptor when the connection is freed. Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-and-tested-by: Jan Karcher <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30net/smc: add operations to merge sndbuf with peer DMBWen Gu2-0/+44
In some scenarios using Emulated-ISM device, sndbuf can share the same physical memory region with peer DMB to avoid data copy from one side to the other. In such case the sndbuf is only a descriptor that describes the shared memory and does not actually occupy memory, it's more like a ghost buffer. +----------+ +----------+ | socket A | | socket B | +----------+ +----------+ | | +--------+ +--------+ | sndbuf | | DMB | | desc | | desc | +--------+ +--------+ | | | +----v-----+ +--------------------------> memory | +----------+ So here introduces three new SMC-D device operations to check if this feature is supported by device, and to {attach|detach} ghost sndbuf to peer DMB. For now only loopback-ism supports this. Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-and-tested-by: Jan Karcher <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30net/smc: register loopback-ism into SMC-D device listWen Gu3-16/+35
After the loopback-ism device is ready, add it to the SMC-D device list as an ISMv2 device, and always keep it at the beginning to ensure it is preferred for providing a shortcut for data transfer within the same kernel. Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-and-tested-by: Jan Karcher <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30net/smc: ignore loopback-ism when dumping SMC-D devicesWen Gu2-0/+7
Since loopback-ism is not a PCI device, the PCI information fed back by smc_nl_handle_smcd_dev() does not apply to loopback-ism. So currently ignore loopback-ism when dumping SMC-D devices. The netlink function of loopback-ism will be refactored when SMC netlink interface is updated. Link: https://lore.kernel.org/r/[email protected]/ Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-and-tested-by: Jan Karcher <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30net/smc: mark optional smcd_ops and check for support when calledWen Gu1-1/+8
Some operations are not supported by new introduced Emulated-ISM, so mark them as optional and check if the device supports them when called. Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-and-tested-by: Jan Karcher <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30net/smc: implement DMB-related operations of loopback-ismWen Gu2-3/+139
This implements DMB (un)registration and data move operations of loopback-ism device. Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-and-tested-by: Jan Karcher <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30net/smc: implement ID-related operations of loopback-ismWen Gu2-5/+60
This implements operations related to IDs for the loopback-ism device. loopback-ism uses an Extended GID that is a 128-bit GID instead of the existing ISM 64-bit GID, and uses the CHID defined with the reserved value 0xFFFF. Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-and-tested-by: Jan Karcher <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30net/smc: introduce loopback-ism for SMC intra-OS shortcutWen Gu5-1/+223
This introduces a kind of Emulated-ISM device named loopback-ism for SMCv2.1. The loopback-ism device is currently exclusive for SMC usage, and aims to provide an SMC shortcut for sockets within the same kernel, leading to improved intra-OS traffic performance. Configuration of this feature is managed through the config SMC_LO. Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Gerd Bayer <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-and-tested-by: Jan Karcher <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30net/smc: decouple ism_client from SMC-D DMB registrationWen Gu1-5/+2
The struct 'ism_client' is specialized for s390 platform firmware ISM. So replace it with 'void' to make SMCD DMB registration helper generic for both Emulated-ISM and existing ISM. Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-and-tested-by: Jan Karcher <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30sctp: prefer struct_size over open coded arithmeticErick Archer1-3/+4
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "ids" variable is a pointer to "struct sctp_assoc_ids" and this structure ends in a flexible array: struct sctp_assoc_ids { [...] sctp_assoc_t gaids_assoc_id[]; }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + size * count" in the kmalloc() function. Also, refactor the code adding the "ids_size" variable to avoid sizing twice. This way, the code is more readable and safer. This code was detected with the help of Coccinelle, and audited and modified manually. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Signed-off-by: Erick Archer <[email protected]> Acked-by: Xin Long <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/PAXPR02MB724871DB78375AB06B5171C88B152@PAXPR02MB7248.eurprd02.prod.outlook.com Signed-off-by: Paolo Abeni <[email protected]>
2024-04-30netdev: add queue statsXuan Zhuo1-2/+21
These stats are commonly. Support reporting those via netdev-genl queue stats. name: rx-hw-drops name: rx-hw-drop-overruns name: rx-csum-unnecessary name: rx-csum-none name: rx-csum-bad name: rx-hw-gro-packets name: rx-hw-gro-bytes name: rx-hw-gro-wire-packets name: rx-hw-gro-wire-bytes name: rx-hw-drop-ratelimits name: tx-hw-drops name: tx-hw-drop-errors name: tx-csum-none name: tx-needs-csum name: tx-hw-gso-packets name: tx-hw-gso-bytes name: tx-hw-gso-wire-packets name: tx-hw-gso-wire-bytes name: tx-hw-drop-ratelimits Signed-off-by: Xuan Zhuo <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-04-29net: hsr: init prune_proxy_timer soonerEric Dumazet1-1/+1
We must initialize prune_proxy_timer before we attempt a del_timer_sync() on it. syzbot reported the following splat: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 1 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-01199-gfc48de77d69d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: netns cleanup_net Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 assign_lock_key+0x238/0x270 kernel/locking/lockdep.c:976 register_lock_class+0x1cf/0x980 kernel/locking/lockdep.c:1289 __lock_acquire+0xda/0x1fd0 kernel/locking/lockdep.c:5014 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __timer_delete_sync+0x148/0x310 kernel/time/timer.c:1648 del_timer_sync include/linux/timer.h:185 [inline] hsr_dellink+0x33/0x80 net/hsr/hsr_netlink.c:132 default_device_exit_batch+0x956/0xa90 net/core/dev.c:11737 ops_exit_list net/core/net_namespace.c:175 [inline] cleanup_net+0x89d/0xcc0 net/core/net_namespace.c:637 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> ODEBUG: assert_init not available (active state 0) object: ffff88806d3fcd88 object type: timer_list hint: 0x0 WARNING: CPU: 1 PID: 11 at lib/debugobjects.c:517 debug_print_object+0x17a/0x1f0 lib/debugobjects.c:514 Fixes: 5055cccfc2d1 ("net: hsr: Provide RedBox support (HSR-SAN)") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Lukasz Majewski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-29Merge tag 'for-netdev' of ↵Jakub Kicinski8-36/+321
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-04-29 We've added 147 non-merge commits during the last 32 day(s) which contain a total of 158 files changed, 9400 insertions(+), 2213 deletions(-). The main changes are: 1) Add an internal-only BPF per-CPU instruction for resolving per-CPU memory addresses and implement support in x86 BPF JIT. This allows inlining per-CPU array and hashmap lookups and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko. 2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song. 3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various atomics in bpf_arena which can be JITed as a single x86 instruction, from Alexei Starovoitov. 4) Add support for passing mark with bpf_fib_lookup helper, from Anton Protopopov. 5) Add a new bpf_wq API for deferring events and refactor sleepable bpf_timer code to keep common code where possible, from Benjamin Tissoires. 6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs to check when NULL is passed for non-NULLable parameters, from Eduard Zingerman. 7) Harden the BPF verifier's and/or/xor value tracking, from Harishankar Vishwanathan. 8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel crypto subsystem, from Vadim Fedorenko. 9) Various improvements to the BPF instruction set standardization doc, from Dave Thaler. 10) Extend libbpf APIs to partially consume items from the BPF ringbuffer, from Andrea Righi. 11) Bigger batch of BPF selftests refactoring to use common network helpers and to drop duplicate code, from Geliang Tang. 12) Support bpf_tail_call_static() helper for BPF programs with GCC 13, from Jose E. Marchesi. 13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF program to have code sections where preemption is disabled, from Kumar Kartikeya Dwivedi. 14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs, from David Vernet. 15) Extend the BPF verifier to allow different input maps for a given bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu. 16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan. 17) Shut up a false-positive KMSAN splat in interpreter mode by unpoison the stack memory, from Martin KaFai Lau. 18) Improve xsk selftest coverage with new tests on maximum and minimum hardware ring size configurations, from Tushar Vyavahare. 19) Various ReST man pages fixes as well as documentation and bash completion improvements for bpftool, from Rameez Rehman & Quentin Monnet. 20) Fix libbpf with regards to dumping subsequent char arrays, from Quentin Deslandes. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits) bpf, docs: Clarify PC use in instruction-set.rst bpf_helpers.h: Define bpf_tail_call_static when building with GCC bpf, docs: Add introduction for use in the ISA Internet Draft selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args selftests/bpf: dummy_st_ops should reject 0 for non-nullable params bpf: check bpf_dummy_struct_ops program params for test runs selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops selftests/bpf: adjust dummy_st_ops_success to detect additional error bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable selftests/bpf: Add ring_buffer__consume_n test. bpf: Add bpf_guard_preempt() convenience macro selftests: bpf: crypto: add benchmark for crypto functions selftests: bpf: crypto skcipher algo selftests bpf: crypto: add skcipher to bpf crypto bpf: make common crypto API for TC/XDP programs bpf: update the comment for BTF_FIELDS_MAX selftests/bpf: Fix wq test. selftests/bpf: Use make_sockaddr in test_sock_addr selftests/bpf: Use connect_to_addr in test_sock_addr ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-29Merge tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-0/+1
Pull NFS client fixes from Trond Myklebust: - Fix an Oops in xs_tcp_tls_setup_socket - Fix an Oops due to missing error handling in nfs_net_init() * tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: Handle error of rpc_proc_register() in nfs_net_init(). SUNRPC: add a missing rpc_stat for TCP TLS
2024-04-29ipv6: introduce dst_rt6_info() helperEric Dumazet23-70/+60
Instead of (struct rt6_info *)dst casts, we can use : #define dst_rt6_info(_ptr) \ container_of_const(_ptr, struct rt6_info, dst) Some places needed missing const qualifiers : ip6_confirm_neigh(), ipv6_anycast_destination(), ipv6_unicast_destination(), has_gateway() v2: added missing parts (David Ahern) Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-04-29inet: use call_rcu_hurry() in inet_free_ifa()Eric Dumazet1-1/+6
This is a followup of commit c4e86b4363ac ("net: add two more call_rcu_hurry()") Our reference to ifa->ifa_dev must be freed ASAP to release the reference to the netdev the same way. inet_rcu_free_ifa() in_dev_put() -> in_dev_finish_destroy() -> netdev_put() This should speedup device/netns dismantles when CONFIG_RCU_LAZY=y Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-04-29net: give more chances to rcu in netdev_wait_allrefs_any()Eric Dumazet1-1/+2
This came while reviewing commit c4e86b4363ac ("net: add two more call_rcu_hurry()"). Paolo asked if adding one synchronize_rcu() would help. While synchronize_rcu() does not help, making sure to call rcu_barrier() before msleep(wait) is definitely helping to make sure lazy call_rcu() are completed. Instead of waiting ~100 seconds in my tests, the ref_tracker splats occurs one time only, and netdev_wait_allrefs_any() latency is reduced to the strict minimum. Ideally we should audit our call_rcu() users to make sure no refcount (or cascading call_rcu()) is held too long, because rcu_barrier() is quite expensive. Fixes: 0e4be9e57e8c ("net: use exponential backoff in netdev_wait_allrefs") Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/all/[email protected]/T/#m76d73ed6b03cd930778ac4d20a777f22a08d6824 Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-04-26Merge tag 'for-netdev' of ↵Jakub Kicinski2-14/+33
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-04-26 We've added 12 non-merge commits during the last 22 day(s) which contain a total of 14 files changed, 168 insertions(+), 72 deletions(-). The main changes are: 1) Fix BPF_PROBE_MEM in verifier and JIT to skip loads from vsyscall page, from Puranjay Mohan. 2) Fix a crash in XDP with devmap broadcast redirect when the latter map is in process of being torn down, from Toke Høiland-Jørgensen. 3) Fix arm64 and riscv64 BPF JITs to properly clear start time for BPF program runtime stats, from Xu Kuohai. 4) Fix a sockmap KCSAN-reported data race in sk_psock_skb_ingress_enqueue, from Jason Xing. 5) Fix BPF verifier error message in resolve_pseudo_ldimm64, from Anton Protopopov. 6) Fix missing DEBUG_INFO_BTF_MODULES Kconfig menu item, from Andrii Nakryiko. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test PROBE_MEM of VSYSCALL_ADDR on x86-64 bpf, x86: Fix PROBE_MEM runtime load check bpf: verifier: prevent userspace memory access xdp: use flags field to disambiguate broadcast redirect arm32, bpf: Reimplement sign-extension mov instruction riscv, bpf: Fix incorrect runtime stats bpf, arm64: Fix incorrect runtime stats bpf: Fix a verifier verbose message bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue MAINTAINERS: bpf: Add Lehui and Puranjay as riscv64 reviewers MAINTAINERS: Update email address for Puranjay Mohan bpf, kconfig: Fix DEBUG_INFO_BTF_MODULES Kconfig definition ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>