aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-12-10ipvs: use common functions for stats allocationJulian Anastasov1-41/+55
Move alloc_percpu/free_percpu logic in new functions Signed-off-by: Julian Anastasov <[email protected]> Cc: yunhong-cgl jiang <[email protected]> Cc: "dust.li" <[email protected]> Reviewed-by: Jiri Wiesner <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-12-10ipvs: add rcu protection to statsJulian Anastasov2-24/+50
In preparation to using RCU locking for the list with estimators, make sure the struct ip_vs_stats are released after RCU grace period by using RCU callbacks. This affects ipvs->tot_stats where we can not use RCU callbacks for ipvs, so we use allocated struct ip_vs_stats_rcu. For services and dests we force RCU callbacks for all cases. Signed-off-by: Julian Anastasov <[email protected]> Cc: yunhong-cgl jiang <[email protected]> Cc: "dust.li" <[email protected]> Reviewed-by: Jiri Wiesner <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-12-10SUNRPC: Fix crasher in unwrap_integ_data()Chuck Lever1-23/+32
If a zero length is passed to kmalloc() it returns 0x10, which is not a valid address. gss_verify_mic() subsequently crashes when it attempts to dereference that pointer. Instead of allocating this memory on every call based on an untrusted size value, use a piece of dynamically-allocated scratch memory that is always available. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
2022-12-10SUNRPC: Make the svc_authenticate tracepoint conditionalChuck Lever1-2/+1
Clean up: Simplify the tracepoint's only call site. Also, I noticed that when svc_authenticate() returns SVC_COMPLETE, it leaves rq_auth_stat set to an error value. That doesn't need to be recorded in the trace log. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
2022-12-10SUNRPC: Clean up xdr_write_pages()Chuck Lever1-9/+13
Make it more evident how xdr_write_pages() updates the tail buffer by using the convention of naming the iov pointer variable "tail". I spent more than a couple of hours chasing through code to understand this, so someone is likely to find this useful later. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
2022-12-10SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() failsChuck Lever1-2/+7
Fixes: 030d794bf498 ("SUNRPC: Use gssproxy upcall for server RPCGSS authentication.") Signed-off-by: Chuck Lever <[email protected]> Cc: <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
2022-12-09Merge tag 'ipsec-next-2022-12-09' of ↵Jakub Kicinski5-32/+385
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next 2022-12-09 1) Add xfrm packet offload core API. From Leon Romanovsky. 2) Add xfrm packet offload support for mlx5. From Leon Romanovsky and Raed Salem. 3) Fix a typto in a error message. From Colin Ian King. * tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: (38 commits) xfrm: Fix spelling mistake "oflload" -> "offload" net/mlx5e: Open mlx5 driver to accept IPsec packet offload net/mlx5e: Handle ESN update events net/mlx5e: Handle hardware IPsec limits events net/mlx5e: Update IPsec soft and hard limits net/mlx5e: Store all XFRM SAs in Xarray net/mlx5e: Provide intermediate pointer to access IPsec struct net/mlx5e: Skip IPsec encryption for TX path without matching policy net/mlx5e: Add statistics for Rx/Tx IPsec offloaded flows net/mlx5e: Improve IPsec flow steering autogroup net/mlx5e: Configure IPsec packet offload flow steering net/mlx5e: Use same coding pattern for Rx and Tx flows net/mlx5e: Add XFRM policy offload logic net/mlx5e: Create IPsec policy offload tables net/mlx5e: Generalize creation of default IPsec miss group and rule net/mlx5e: Group IPsec miss handles into separate struct net/mlx5e: Make clear what IPsec rx_err does net/mlx5e: Flatten the IPsec RX add rule path net/mlx5e: Refactor FTE setup code to be more clear net/mlx5e: Move IPsec flow table creation to separate function ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-09net: devlink: Add missing error check to devlink_resource_put()Gavrilov Ilia1-3/+4
When the resource size changes, the return value of the 'nla_put_u64_64bit' function is not checked. That has been fixed to avoid rechecking at the next step. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Note that this is harmless, we'd error out at the next put(). Signed-off-by: Ilia.Gavrilov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-09skbuff: Introduce slab_build_skb()Kees Cook2-9/+63
syzkaller reported: BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294 Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295 For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to build_skb(). When build_skb() is passed a frag_size of 0, it means the buffer came from kmalloc. In these cases, ksize() is used to find its actual size, but since the allocation may not have been made to that size, actually perform the krealloc() call so that all the associated buffer size checking will be correctly notified (and use the "new" pointer so that compiler hinting works correctly). Split this logic out into a new interface, slab_build_skb(), but leave the original 0 checking for now to catch any stragglers. Reported-by: [email protected] Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function") Cc: Pavel Begunkov <[email protected]> Cc: pepsipu <[email protected]> Cc: [email protected] Cc: Vlastimil Babka <[email protected]> Cc: kasan-dev <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: [email protected] Cc: Daniel Borkmann <[email protected]> Cc: Hao Luo <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: John Fastabend <[email protected]> Cc: [email protected] Cc: KP Singh <[email protected]> Cc: [email protected] Cc: Stanislav Fomichev <[email protected]> Cc: [email protected] Cc: Yonghong Song <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-09mptcp: return 0 instead of 'err' varMatthieu Baerts2-3/+3
When 'err' is 0, it looks clearer to return '0' instead of the variable called 'err'. The behaviour is then not modified, just a clearer code. By doing this, we can also avoid false positive smatch warnings like this one: net/mptcp/pm_netlink.c:1169 mptcp_pm_parse_pm_addr_attr() warn: missing error code? 'err' Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Suggested-by: Mat Martineau <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-09mptcp: use nlmsg_free instead of kfree_skbGeliang Tang1-4/+4
Use nlmsg_free() instead of kfree_skb() in pm_netlink.c. The SKB's have been created by nlmsg_new(). The proper cleaning way should then be done with nlmsg_free(). For the moment, nlmsg_free() is simply calling kfree_skb() so we don't change the behaviour here. Suggested-by: Jakub Kicinski <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-09net: openvswitch: Add support to count upcall packetswangchuanlei3-0/+107
Add support to count upall packets, when kmod of openvswitch upcall to count the number of packets for upcall succeed and failed, which is a better way to see how many packets upcalled on every interfaces. Signed-off-by: wangchuanlei <[email protected]> Acked-by: Eelco Chaudron <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-12-09net/sched: avoid indirect classify functions on retpoline kernelsPedro Tammela14-25/+49
Expose the necessary tc classifier functions and wire up cls_api to use direct calls in retpoline kernels. Signed-off-by: Pedro Tammela <[email protected]> Reviewed-by: Jamal Hadi Salim <[email protected]> Reviewed-by: Victor Nogueira <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-12-09net/sched: avoid indirect act functions on retpoline kernelsPedro Tammela21-42/+81
Expose the necessary tc act functions and wire up act_api to use direct calls in retpoline kernels. Signed-off-by: Pedro Tammela <[email protected]> Reviewed-by: Jamal Hadi Salim <[email protected]> Reviewed-by: Victor Nogueira <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-12-09net/sched: add retpoline wrapper for tcPedro Tammela1-0/+5
On kernels using retpoline as a spectrev2 mitigation, optimize actions and filters that are compiled as built-ins into a direct call. On subsequent patches we expose the classifiers and actions functions and wire up the wrapper into tc. Signed-off-by: Pedro Tammela <[email protected]> Reviewed-by: Jamal Hadi Salim <[email protected]> Reviewed-by: Victor Nogueira <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-12-09net: vmw_vsock: vmci: Check memcpy_from_msg()Artem Chernyshev1-1/+5
vmci_transport_dgram_enqueue() does not check the return value of memcpy_from_msg(). If memcpy_from_msg() fails, it is possible that uninitialized memory contents are sent unintentionally instead of user's message in the datagram to the destination. Return with an error if memcpy_from_msg() fails. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 0f7db23a07af ("vmci_transport: switch ->enqeue_dgram, ->enqueue_stream and ->dequeue_stream to msghdr") Signed-off-by: Artem Chernyshev <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Reviewed-by: Vishnu Dasa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-12-09xfrm: Fix spelling mistake "oflload" -> "offload"Colin Ian King1-1/+1
There is a spelling mistake in a NL_SET_ERR_MSG message. Fix it. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-12-08net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCPWillem de Bruijn2-1/+9
Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from write_seq sockets instead of snd_una. This should have been the behavior from the start. Because processes may now exist that rely on the established behavior, do not change behavior of the existing option, but add the right behavior with a new flag. It is encouraged to always set SOF_TIMESTAMPING_OPT_ID_TCP on stream sockets along with the existing SOF_TIMESTAMPING_OPT_ID. Intuitively the contract is that the counter is zero after the setsockopt, so that the next write N results in a notification for the last byte N - 1. On idle sockets snd_una == write_seq and this holds for both. But on sockets with data in transmission, snd_una records the unacked offset in the stream. This depends on the ACK response from the peer. A process cannot learn this in a race free manner (ioctl SIOCOUTQ is one racy approach). write_seq records the offset at the last byte written by the process. This is a better starting point. It matches the intuitive contract in all circumstances, unaffected by external behavior. The new timestamp flag necessitates increasing sk_tsflags to 32 bits. Move the field in struct sock to avoid growing the socket (for some common CONFIG variants). The UAPI interface so_timestamping.flags is already int, so 32 bits wide. Reported-by: Sotirios Delimanolis <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski26-75/+140
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-08netfilter: flowtable: add a 'default' case to flowtable datapathLi Qiong1-0/+8
Add a 'default' case in case return a uninitialized value of ret, this should not ever happen since the follow transmit path types: - FLOW_OFFLOAD_XMIT_UNSPEC - FLOW_OFFLOAD_XMIT_TC are never observed from this path. Add this check for safety reasons. Signed-off-by: Li Qiong <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-12-08netfilter: flowtable: really fix NAT IPv6 offloadQingfang DENG1-3/+3
The for-loop was broken from the start. It translates to: for (i = 0; i < 4; i += 4) which means the loop statement is run only once, so only the highest 32-bit of the IPv6 address gets mangled. Fix the loop increment. Fixes: 0e07e25b481a ("netfilter: flowtable: fix NAT IPv6 offload mangling") Fixes: 5c27d8d76ce8 ("netfilter: nf_flow_table_offload: add IPv6 support") Signed-off-by: Qingfang DENG <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-12-07ipv6: avoid use-after-free in ip6_fragment()Eric Dumazet1-0/+5
Blamed commit claimed rcu_read_lock() was held by ip6_fragment() callers. It seems to not be always true, at least for UDP stack. syzbot reported: BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:245 [inline] BUG: KASAN: use-after-free in ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951 Read of size 8 at addr ffff88801d403e80 by task syz-executor.3/7618 CPU: 1 PID: 7618 Comm: syz-executor.3 Not tainted 6.1.0-rc6-syzkaller-00012-g4312098baf37 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x15e/0x45d mm/kasan/report.c:395 kasan_report+0xbf/0x1f0 mm/kasan/report.c:495 ip6_dst_idev include/net/ip6_fib.h:245 [inline] ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951 __ip6_finish_output net/ipv6/ip6_output.c:193 [inline] ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161 ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966 udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286 udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313 udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 sock_write_iter+0x295/0x3d0 net/socket.c:1108 call_write_iter include/linux/fs.h:2191 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9ed/0xdd0 fs/read_write.c:584 ksys_write+0x1ec/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fde3588c0d9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fde365b6168 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007fde359ac050 RCX: 00007fde3588c0d9 RDX: 000000000000ffdc RSI: 00000000200000c0 RDI: 000000000000000a RBP: 00007fde358e7ae9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fde35acfb1f R14: 00007fde365b6300 R15: 0000000000022000 </TASK> Allocated by task 7618: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 __kasan_slab_alloc+0x82/0x90 mm/kasan/common.c:325 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3398 [inline] slab_alloc mm/slub.c:3406 [inline] __kmem_cache_alloc_lru mm/slub.c:3413 [inline] kmem_cache_alloc+0x2b4/0x3d0 mm/slub.c:3422 dst_alloc+0x14a/0x1f0 net/core/dst.c:92 ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344 ip6_rt_pcpu_alloc net/ipv6/route.c:1369 [inline] rt6_make_pcpu_route net/ipv6/route.c:1417 [inline] ip6_pol_route+0x901/0x1190 net/ipv6/route.c:2254 pol_lookup_func include/net/ip6_fib.h:582 [inline] fib6_rule_lookup+0x52e/0x6f0 net/ipv6/fib6_rules.c:121 ip6_route_output_flags_noref+0x2e6/0x380 net/ipv6/route.c:2625 ip6_route_output_flags+0x76/0x320 net/ipv6/route.c:2638 ip6_route_output include/net/ip6_route.h:98 [inline] ip6_dst_lookup_tail+0x5ab/0x1620 net/ipv6/ip6_output.c:1092 ip6_dst_lookup_flow+0x90/0x1d0 net/ipv6/ip6_output.c:1222 ip6_sk_dst_lookup_flow+0x553/0x980 net/ipv6/ip6_output.c:1260 udpv6_sendmsg+0x151d/0x2c80 net/ipv6/udp.c:1554 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 __sys_sendto+0x23a/0x340 net/socket.c:2117 __do_sys_sendto net/socket.c:2129 [inline] __se_sys_sendto net/socket.c:2125 [inline] __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 7599: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2e/0x40 mm/kasan/generic.c:511 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free+0x160/0x1c0 mm/kasan/common.c:200 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1724 [inline] slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1750 slab_free mm/slub.c:3661 [inline] kmem_cache_free+0xee/0x5c0 mm/slub.c:3683 dst_destroy+0x2ea/0x400 net/core/dst.c:127 rcu_do_batch kernel/rcu/tree.c:2250 [inline] rcu_core+0x81f/0x1980 kernel/rcu/tree.c:2510 __do_softirq+0x1fb/0xadc kernel/softirq.c:571 Last potentially related work creation: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481 call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798 dst_release net/core/dst.c:177 [inline] dst_release+0x7d/0xe0 net/core/dst.c:167 refdst_drop include/net/dst.h:256 [inline] skb_dst_drop include/net/dst.h:268 [inline] skb_release_head_state+0x250/0x2a0 net/core/skbuff.c:838 skb_release_all net/core/skbuff.c:852 [inline] __kfree_skb net/core/skbuff.c:868 [inline] kfree_skb_reason+0x151/0x4b0 net/core/skbuff.c:891 kfree_skb_list_reason+0x4b/0x70 net/core/skbuff.c:901 kfree_skb_list include/linux/skbuff.h:1227 [inline] ip6_fragment+0x2026/0x2770 net/ipv6/ip6_output.c:949 __ip6_finish_output net/ipv6/ip6_output.c:193 [inline] ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161 ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966 udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286 udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313 udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606 inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 sock_write_iter+0x295/0x3d0 net/socket.c:1108 call_write_iter include/linux/fs.h:2191 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9ed/0xdd0 fs/read_write.c:584 ksys_write+0x1ec/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Second to last potentially related work creation: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481 call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798 dst_release net/core/dst.c:177 [inline] dst_release+0x7d/0xe0 net/core/dst.c:167 refdst_drop include/net/dst.h:256 [inline] skb_dst_drop include/net/dst.h:268 [inline] __dev_queue_xmit+0x1b9d/0x3ba0 net/core/dev.c:4211 dev_queue_xmit include/linux/netdevice.h:3008 [inline] neigh_resolve_output net/core/neighbour.c:1552 [inline] neigh_resolve_output+0x51b/0x840 net/core/neighbour.c:1532 neigh_output include/net/neighbour.h:546 [inline] ip6_finish_output2+0x56c/0x1530 net/ipv6/ip6_output.c:134 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline] ip6_finish_output+0x694/0x1170 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:445 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] NF_HOOK include/linux/netfilter.h:296 [inline] mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 The buggy address belongs to the object at ffff88801d403dc0 which belongs to the cache ip6_dst_cache of size 240 The buggy address is located 192 bytes inside of 240-byte region [ffff88801d403dc0, ffff88801d403eb0) The buggy address belongs to the physical page: page:ffffea00007500c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d403 memcg:ffff888022f49c81 flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000000200 ffffea0001ef6580 dead000000000002 ffff88814addf640 raw: 0000000000000000 00000000800c000c 00000001ffffffff ffff888022f49c81 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 3719, tgid 3719 (kworker/0:6), ts 136223432244, free_ts 136222971441 prep_new_page mm/page_alloc.c:2539 [inline] get_page_from_freelist+0x10b5/0x2d50 mm/page_alloc.c:4288 __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5555 alloc_pages+0x1aa/0x270 mm/mempolicy.c:2285 alloc_slab_page mm/slub.c:1794 [inline] allocate_slab+0x213/0x300 mm/slub.c:1939 new_slab mm/slub.c:1992 [inline] ___slab_alloc+0xa91/0x1400 mm/slub.c:3180 __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3279 slab_alloc_node mm/slub.c:3364 [inline] slab_alloc mm/slub.c:3406 [inline] __kmem_cache_alloc_lru mm/slub.c:3413 [inline] kmem_cache_alloc+0x31a/0x3d0 mm/slub.c:3422 dst_alloc+0x14a/0x1f0 net/core/dst.c:92 ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344 icmp6_dst_alloc+0x71/0x680 net/ipv6/route.c:3261 mld_sendpack+0x5de/0xe70 net/ipv6/mcast.c:1809 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1459 [inline] free_pcp_prepare+0x65c/0xd90 mm/page_alloc.c:1509 free_unref_page_prepare mm/page_alloc.c:3387 [inline] free_unref_page+0x1d/0x4d0 mm/page_alloc.c:3483 __unfreeze_partials+0x17c/0x1a0 mm/slub.c:2586 qlink_free mm/kasan/quarantine.c:168 [inline] qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187 kasan_quarantine_reduce+0x184/0x210 mm/kasan/quarantine.c:294 __kasan_slab_alloc+0x66/0x90 mm/kasan/common.c:302 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3398 [inline] kmem_cache_alloc_node+0x304/0x410 mm/slub.c:3443 __alloc_skb+0x214/0x300 net/core/skbuff.c:497 alloc_skb include/linux/skbuff.h:1267 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1191 [inline] netlink_sendmsg+0x9a6/0xe10 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 __sys_sendto+0x23a/0x340 net/socket.c:2117 __do_sys_sendto net/socket.c:2129 [inline] __se_sys_sendto net/socket.c:2125 [inline] __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 1758fd4688eb ("ipv6: remove unnecessary dst_hold() in ip6_fragment()") Reported-by: [email protected] Signed-off-by: Eric Dumazet <[email protected]> Cc: Wei Wang <[email protected]> Cc: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07devlink: Expose port function commands to control migratableShay Drory1-0/+55
Expose port function commands to enable / disable migratable capability, this is used to set the port function as migratable. Live migration is the process of transferring a live virtual machine from one physical host to another without disrupting its normal operation. In order for a VM to be able to perform LM, all the VM components must be able to perform migration. e.g.: to be migratable. In order for VF to be migratable, VF must be bound to VFIO driver with migration support. When migratable capability is enabled for a function of the port, the device is making the necessary preparations for the function to be migratable, which might include disabling features which cannot be migrated. Example of LM with migratable function configuration: Set migratable of the VF's port function. $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 migratable disable $ devlink port function set pci/0000:06:00.0/2 migratable enable $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 migratable enable Bind VF to VFIO driver with migration support: $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind $ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind Attach VF to the VM. Start the VM. Perform LM. Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Acked-by: Shannon Nelson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07devlink: Expose port function commands to control RoCEShay Drory1-0/+113
Expose port function commands to enable / disable RoCE, this is used to control the port RoCE device capabilities. When RoCE is disabled for a function of the port, function cannot create any RoCE specific resources (e.g GID table). It also saves system memory utilization. For example disabling RoCE enable a VF/SF saves 1 Mbytes of system memory per function. Example of a PCI VF port which supports function configuration: Set RoCE of the VF's port function. $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 roce enable $ devlink port function set pci/0000:06:00.0/2 roce disable $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 roce disable Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07devlink: Validate port function requestShay Drory1-9/+23
In order to avoid partial request processing, validate the request before processing it. Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07bridge: mcast: Constify 'group' argument in br_multicast_new_port_group()Ido Schimmel2-2/+3
The 'group' argument is not modified, so mark it as 'const'. It will allow us to constify arguments of the callers of this function in future patches. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07bridge: mcast: Remove redundant function argumentsIdo Schimmel1-4/+5
Drop the first three arguments and instead extract them from the MDB configuration structure. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07bridge: mcast: Move checks out of critical sectionIdo Schimmel1-18/+18
The checks only require information parsed from the RTM_NEWMDB netlink message and do not rely on any state stored in the bridge driver. Therefore, there is no need to perform the checks in the critical section under the multicast lock. Move the checks out of the critical section. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07bridge: mcast: Remove br_mdb_parse()Ido Schimmel1-88/+5
The parsing of the netlink messages and the validity checks are now performed in br_mdb_config_init() so we can remove br_mdb_parse(). This finally allows us to stop passing netlink attributes deep in the MDB control path and only use the MDB configuration structure. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07bridge: mcast: Use MDB group key from configuration structureIdo Schimmel1-8/+7
The MDB group key (i.e., {source, destination, protocol, VID}) is currently determined under the multicast lock from the netlink attributes. Instead, use the group key from the MDB configuration structure that was prepared before acquiring the lock. No functional changes intended. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07bridge: mcast: Propagate MDB configuration structure furtherIdo Schimmel1-13/+11
As an intermediate step towards only using the new MDB configuration structure, pass it further in the control path instead of passing individual attributes. No functional changes intended. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07bridge: mcast: Use MDB configuration structure where possibleIdo Schimmel1-19/+15
The MDB configuration structure (i.e., struct br_mdb_config) now includes all the necessary information from the parsed RTM_{NEW,DEL}MDB netlink messages, so use it. This will later allow us to delete the calls to br_mdb_parse() from br_mdb_add() and br_mdb_del(). No functional changes intended. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07bridge: mcast: Remove redundant checksIdo Schimmel1-54/+9
These checks are now redundant as they are performed by br_mdb_config_init() while parsing the RTM_{NEW,DEL}MDB messages. Remove them. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07bridge: mcast: Centralize netlink attribute parsingIdo Schimmel2-0/+127
Netlink attributes are currently passed deep in the MDB creation call chain, making it difficult to add new attributes. In addition, some validity checks are performed under the multicast lock although they can be performed before it is ever acquired. As a first step towards solving these issues, parse the RTM_{NEW,DEL}MDB messages into a configuration structure, relieving other functions from the need to handle raw netlink attributes. Subsequent patches will convert the MDB code to use this configuration structure. This is consistent with how other rtnetlink objects are handled, such as routes and nexthops. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07net: ethernet: use sysfs_emit() to instead of scnprintf()ye xingchen1-1/+1
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: ye xingchen <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07Merge tag 'linux-can-fixes-for-6.1-20221207' of ↵Jakub Kicinski1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2022-12-07 The 1st patch is by Oliver Hartkopp and fixes a potential NULL pointer deref found by syzbot in the AF_CAN protocol. The next 2 patches are by Jiri Slaby and Max Staudt and add the missing flush_work() before freeing the underlying memory in the slcan and can327 driver. The last patch is by Frank Jungclaus and target the esd_usb driver and fixes the CAN error counters, allowing them to return to zero. * tag 'linux-can-fixes-for-6.1-20221207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: esd_usb: Allow REC and TEC to return to zero can: can327: flush TX_work on ldisc .close() can: slcan: fix freed work crash can: af_can: fix NULL pointer dereference in can_rcv_filter ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07Merge tag 'ieee802154-for-net-next-2022-12-05' of ↵Jakub Kicinski6-21/+150
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== ieee802154-next 2022-12-05 Miquel continued his work towards full scanning support. For this, we now allow the creation of dedicated coordinator interfaces to allow a PAN coordinator to serve in the network and set the needed address filters with the hardware. On top of this we have the first part to allow scanning for available 15.4 networks. A new netlink scan group, within the existing nl802154 API, was added. In addition Miquel fixed two issues that have been introduced in the former patches to free an skb correctly and clarifying an expression in the stack. From David Girault we got tracing support when registering new PANs. * tag 'ieee802154-for-net-next-2022-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next: mac802154: Trace the registration of new PANs ieee802154: Advertize coordinators discovery mac802154: Allow the creation of coordinator interfaces mac802154: Clarify an expression mac802154: Move an skb free within the rx path ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07Merge "do not rely on ALLOW_ERROR_INJECTION for fmod_ret" into bpf-nextAlexei Starovoitov1-3/+11
Merge commit 5b481acab4ce ("bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_ret") from hid tree into bpf-next. Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-07bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_retBenjamin Tissoires1-3/+11
The current way of expressing that a non-bpf kernel component is willing to accept that bpf programs can be attached to it and that they can change the return value is to abuse ALLOW_ERROR_INJECTION. This is debated in the link below, and the result is that it is not a reasonable thing to do. Reuse the kfunc declaration structure to also tag the kernel functions we want to be fmodret. This way we can control from any subsystem which functions are being modified by bpf without touching the verifier. Link: https://lore.kernel.org/all/[email protected]/ Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Benjamin Tissoires <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-12-07Merge tag 'ieee802154-for-net-2022-12-05' of ↵Paolo Abeni1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== pull-request: ieee802154 for net 2022-12-05 An update from ieee802154 for your *net* tree: Three small fixes this time around. Ziyang Xuan fixed an error code for a timeout during initialization of the cc2520 driver. Hauke Mehrtens fixed a crash in the ca8210 driver SPI communication due uninitialized SPI structures. Wei Yongjun added INIT_LIST_HEAD ieee802154_if_add() to avoid a potential null pointer dereference. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-12-07tipc: call tipc_lxc_xmit without holding node_read_lockXin Long1-3/+9
When sending packets between nodes in netns, it calls tipc_lxc_xmit() for peer node to receive the packets where tipc_sk_mcast_rcv()/tipc_sk_rcv() might be called, and it's pretty much like in tipc_rcv(). Currently the local 'node rw lock' is held during calling tipc_lxc_xmit() to protect the peer_net not being freed by another thread. However, when receiving these packets, tipc_node_add_conn() might be called where the peer 'node rw lock' is acquired. Then a dead lock warning is triggered by lockdep detector, although it is not a real dead lock: WARNING: possible recursive locking detected -------------------------------------------- conn_server/1086 is trying to acquire lock: ffff8880065cb020 (&n->lock#2){++--}-{2:2}, \ at: tipc_node_add_conn.cold.76+0xaa/0x211 [tipc] but task is already holding lock: ffff8880065cd020 (&n->lock#2){++--}-{2:2}, \ at: tipc_node_xmit+0x285/0xb30 [tipc] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&n->lock#2); lock(&n->lock#2); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by conn_server/1086: #0: ffff8880036d1e40 (sk_lock-AF_TIPC){+.+.}-{0:0}, \ at: tipc_accept+0x9c0/0x10b0 [tipc] #1: ffff8880036d5f80 (sk_lock-AF_TIPC/1){+.+.}-{0:0}, \ at: tipc_accept+0x363/0x10b0 [tipc] #2: ffff8880065cd020 (&n->lock#2){++--}-{2:2}, \ at: tipc_node_xmit+0x285/0xb30 [tipc] #3: ffff888012e13370 (slock-AF_TIPC){+...}-{2:2}, \ at: tipc_sk_rcv+0x2da/0x1b40 [tipc] Call Trace: <TASK> dump_stack_lvl+0x44/0x5b __lock_acquire.cold.77+0x1f2/0x3d7 lock_acquire+0x1d2/0x610 _raw_write_lock_bh+0x38/0x80 tipc_node_add_conn.cold.76+0xaa/0x211 [tipc] tipc_sk_finish_conn+0x21e/0x640 [tipc] tipc_sk_filter_rcv+0x147b/0x3030 [tipc] tipc_sk_rcv+0xbb4/0x1b40 [tipc] tipc_lxc_xmit+0x225/0x26b [tipc] tipc_node_xmit.cold.82+0x4a/0x102 [tipc] __tipc_sendstream+0x879/0xff0 [tipc] tipc_accept+0x966/0x10b0 [tipc] do_accept+0x37d/0x590 This patch avoids this warning by not holding the 'node rw lock' before calling tipc_lxc_xmit(). As to protect the 'peer_net', rcu_read_lock() should be enough, as in cleanup_net() when freeing the netns, it calls synchronize_rcu() before the free is continued. Also since tipc_lxc_xmit() is like the RX path in tipc_rcv(), it makes sense to call it under rcu_read_lock(). Note that the right lock order must be: rcu_read_lock(); tipc_node_read_lock(n); tipc_node_read_unlock(n); tipc_lxc_xmit(); rcu_read_unlock(); instead of: tipc_node_read_lock(n); rcu_read_lock(); tipc_node_read_unlock(n); tipc_lxc_xmit(); rcu_read_unlock(); and we have to call tipc_node_read_lock/unlock() twice in tipc_node_xmit(). Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns") Reported-by: Shuang Li <[email protected]> Signed-off-by: Xin Long <[email protected]> Link: https://lore.kernel.org/r/5bdd1f8fee9db695cfff4528a48c9b9d0523fb00.1670110641.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <[email protected]>
2022-12-07can: af_can: fix NULL pointer dereference in can_rcv_filterOliver Hartkopp1-3/+3
Analogue to commit 8aa59e355949 ("can: af_can: fix NULL pointer dereference in can_rx_register()") we need to check for a missing initialization of ml_priv in the receive path of CAN frames. Since commit 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device") the check for dev->type to be ARPHRD_CAN is not sufficient anymore since bonding or tun netdevices claim to be CAN devices but do not initialize ml_priv accordingly. Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device") Reported-by: [email protected] Reported-by: Wei Chen <[email protected]> Signed-off-by: Oliver Hartkopp <[email protected]> Link: https://lore.kernel.org/all/[email protected] Cc: [email protected] Signed-off-by: Marc Kleine-Budde <[email protected]>
2022-12-06ipv4: Fix incorrect route flushing when table ID 0 is usedIdo Schimmel1-0/+3
Cited commit added the table ID to the FIB info structure, but did not properly initialize it when table ID 0 is used. This can lead to a route in the default VRF with a preferred source address not being flushed when the address is deleted. Consider the following example: # ip address add dev dummy1 192.0.2.1/28 # ip address add dev dummy1 192.0.2.17/28 # ip route add 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 100 # ip route add table 0 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 200 # ip route show 198.51.100.0/24 198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 100 198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200 Both routes are installed in the default VRF, but they are using two different FIB info structures. One with a metric of 100 and table ID of 254 (main) and one with a metric of 200 and table ID of 0. Therefore, when the preferred source address is deleted from the default VRF, the second route is not flushed: # ip address del dev dummy1 192.0.2.17/28 # ip route show 198.51.100.0/24 198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200 Fix by storing a table ID of 254 instead of 0 in the route configuration structure. Add a test case that fails before the fix: # ./fib_tests.sh -t ipv4_del_addr IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Table ID 0 TEST: Route removed in default VRF when source address deleted [FAIL] Tests passed: 8 Tests failed: 1 And passes after: # ./fib_tests.sh -t ipv4_del_addr IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Table ID 0 TEST: Route removed in default VRF when source address deleted [ OK ] Tests passed: 9 Tests failed: 0 Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs") Reported-by: Donald Sharp <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-06ipv4: Fix incorrect route flushing when source address is deletedIdo Schimmel1-0/+1
Cited commit added the table ID to the FIB info structure, but did not prevent structures with different table IDs from being consolidated. This can lead to routes being flushed from a VRF when an address is deleted from a different VRF. Fix by taking the table ID into account when looking for a matching FIB info. This is already done for FIB info structures backed by a nexthop object in fib_find_info_nh(). Add test cases that fail before the fix: # ./fib_tests.sh -t ipv4_del_addr IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [FAIL] TEST: Route in default VRF not removed [ OK ] RTNETLINK answers: File exists TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [FAIL] Tests passed: 6 Tests failed: 2 And pass after: # ./fib_tests.sh -t ipv4_del_addr IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Tests passed: 8 Tests failed: 0 Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-06net/ncsi: Silence runtime memcpy() false positive warningKees Cook1-1/+2
The memcpy() in ncsi_cmd_handler_oem deserializes nca->data into a flexible array structure that overlapping with non-flex-array members (mfr_id) intentionally. Since the mem_to_flex() API is not finished, temporarily silence this warning, since it is a false positive, using unsafe_memcpy(). Reported-by: Joel Stanley <[email protected]> Link: https://lore.kernel.org/netdev/CACPK8Xdfi=OJKP0x0D1w87fQeFZ4A2DP2qzGCRcuVbpU-9=4sQ@mail.gmail.com/ Cc: Samuel Mendoza-Jonas <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-06SUNRPC: Fix missing release socket in rpc_sockname()Wang ShaoBo1-1/+1
socket dynamically created is not released when getting an unintended address family type in rpc_sockname(), direct to out_release for calling sock_release(). Fixes: 2e738fdce22f ("SUNRPC: Add API to acquire source address") Signed-off-by: Wang ShaoBo <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-12-06xprtrdma: Fix regbuf data not freed in rpcrdma_req_create()Zhang Xiaoxu1-1/+1
If rdma receive buffer allocate failed, should call rpcrdma_regbuf_free() to free the send buffer, otherwise, the buffer data will be leaked. Fixes: bb93a1ae2bf4 ("xprtrdma: Allocate req's regbufs at xprt create time") Signed-off-by: Zhang Xiaoxu <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-12-06tipc: Fix potential OOB in tipc_link_proto_rcv()YueHaibing1-1/+3
Fix the potential risk of OOB if skb_linearize() fails in tipc_link_proto_rcv(). Fixes: 5cbb28a4bf65 ("tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers") Signed-off-by: YueHaibing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-12-06ip_gre: do not report erspan version on GRE interfaceHangbin Liu1-19/+29
Although the type I ERSPAN is based on the barebones IP + GRE encapsulation and no extra ERSPAN header. Report erspan version on GRE interface looks unreasonable. Fix this by separating the erspan and gre fill info. IPv6 GRE does not have this info as IPv6 only supports erspan version 1 and 2. Reported-by: Jianlin Shi <[email protected]> Fixes: f989d546a2d5 ("erspan: Add type I version 0 support.") Signed-off-by: Hangbin Liu <[email protected]> Acked-by: William Tu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-12-05xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from ↵Eyal Birger5-2/+150
TC-BPF This change adds xfrm metadata helpers using the unstable kfunc call interface for the TC-BPF hooks. This allows steering traffic towards different IPsec connections based on logic implemented in bpf programs. This object is built based on the availability of BTF debug info. When setting the xfrm metadata, percpu metadata dsts are used in order to avoid allocating a metadata dst per packet. In order to guarantee safe module unload, the percpu dsts are allocated on first use and never freed. The percpu pointer is stored in net/core/filter.c so that it can be reused on module reload. The metadata percpu dsts take ownership of the original skb dsts so that they may be used as part of the xfrm transmission logic - e.g. for MTU calculations. Signed-off-by: Eyal Birger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>