aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-05-07netfilter: nftables: avoid overflows in nft_hash_buckets()Eric Dumazet1-1/+9
Number of buckets being stored in 32bit variables, we have to ensure that no overflows occur in nft_hash_buckets() syzbot injected a size == 0x40000000 and reported: UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 1 PID: 29539 Comm: syz-executor.4 Not tainted 5.12.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 __roundup_pow_of_two include/linux/log2.h:57 [inline] nft_hash_buckets net/netfilter/nft_set_hash.c:411 [inline] nft_hash_estimate.cold+0x19/0x1e net/netfilter/nft_set_hash.c:652 nft_select_set_ops net/netfilter/nf_tables_api.c:3586 [inline] nf_tables_newset+0xe62/0x3110 net/netfilter/nf_tables_api.c:4322 nfnetlink_rcv_batch+0xa09/0x24b0 net/netfilter/nfnetlink.c:488 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:612 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:630 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 Fixes: 0ed6389c483d ("netfilter: nf_tables: rename set implementations") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-05-05netfilter: nftables: Fix a memleak from userdata error path in new objectsPablo Neira Ayuso1-2/+2
Release object name if userdata allocation fails. Fixes: b131c96496b3 ("netfilter: nf_tables: add userdata support for nft_object") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-05-05netfilter: remove BUG_ON() after skb_header_pointer()Pablo Neira Ayuso6-7/+21
Several conntrack helpers and the TCP tracker assume that skb_header_pointer() never fails based on upfront header validation. Even if this should not ever happen, BUG_ON() is a too drastic measure, remove them. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-05-05netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL checkPablo Neira Ayuso1-0/+2
Do not assume that the tcph->doff field is correct when parsing for TCP options, skb_header_pointer() might fail to fetch these bits. Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-05-05netfilter: nfnetlink: add a missing rcu_read_unlock()Eric Dumazet1-0/+1
Reported by syzbot : BUG: sleeping function called from invalid context at include/linux/sched/mm.h:201 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 26899, name: syz-executor.5 1 lock held by syz-executor.5/26899: #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_get_subsys net/netfilter/nfnetlink.c:148 [inline] #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_rcv_msg+0x1da/0x1300 net/netfilter/nfnetlink.c:226 Preemption disabled at: [<ffffffff8917799e>] preempt_schedule_irq+0x3e/0x90 kernel/sched/core.c:5533 CPU: 1 PID: 26899 Comm: syz-executor.5 Not tainted 5.12.0-next-20210504-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:8338 might_alloc include/linux/sched/mm.h:201 [inline] slab_pre_alloc_hook mm/slab.h:500 [inline] slab_alloc_node mm/slub.c:2845 [inline] kmem_cache_alloc_node+0x33d/0x3e0 mm/slub.c:2960 __alloc_skb+0x20b/0x340 net/core/skbuff.c:413 alloc_skb include/linux/skbuff.h:1107 [inline] nlmsg_new include/net/netlink.h:953 [inline] netlink_ack+0x1ed/0xaa0 net/netlink/af_netlink.c:2437 netlink_rcv_skb+0x33d/0x420 net/netlink/af_netlink.c:2508 nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:650 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665f9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fa8a03ee188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 00000000004665f9 RDX: 0000000000000000 RSI: 0000000020000480 RDI: 0000000000000004 RBP: 00000000004bfce1 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf60 R13: 00007fffe864480f R14: 00007fa8a03ee300 R15: 0000000000022000 ================================================ WARNING: lock held when returning to user space! 5.12.0-next-20210504-syzkaller #0 Tainted: G W ------------------------------------------------ syz-executor.5/26899 is leaving the kernel with locks still held! 1 lock held by syz-executor.5/26899: #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_get_subsys net/netfilter/nfnetlink.c:148 [inline] #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_rcv_msg+0x1da/0x1300 net/netfilter/nfnetlink.c:226 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 26899 at kernel/rcu/tree_plugin.h:359 rcu_note_context_switch+0xfd/0x16e0 kernel/rcu/tree_plugin.h:359 Modules linked in: CPU: 0 PID: 26899 Comm: syz-executor.5 Tainted: G W 5.12.0-next-20210504-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:rcu_note_context_switch+0xfd/0x16e0 kernel/rcu/tree_plugin.h:359 Code: 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 2e 0d 00 00 8b bd cc 03 00 00 85 ff 7e 02 <0f> 0b 65 48 8b 2c 25 00 f0 01 00 48 8d bd cc 03 00 00 48 b8 00 00 RSP: 0000:ffffc90002fffdb0 EFLAGS: 00010002 RAX: 0000000000000007 RBX: ffff8880b9c36080 RCX: ffffffff8dc99bac RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001 RBP: ffff88808b9d1c80 R08: 0000000000000000 R09: ffffffff8dc96917 R10: fffffbfff1b92d22 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88808b9d1c80 R14: ffff88808b9d1c80 R15: ffffc90002ff8000 FS: 00007fa8a03ee700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f09896ed000 CR3: 0000000032070000 CR4: 00000000001526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __schedule+0x214/0x23e0 kernel/sched/core.c:5044 schedule+0xcf/0x270 kernel/sched/core.c:5226 exit_to_user_mode_loop kernel/entry/common.c:162 [inline] exit_to_user_mode_prepare+0x13e/0x280 kernel/entry/common.c:208 irqentry_exit_to_user_mode+0x5/0x40 kernel/entry/common.c:314 asm_sysvec_reschedule_ipi+0x12/0x20 arch/x86/include/asm/idtentry.h:637 RIP: 0033:0x4665f9 Fixes: 50f2db9e368f ("netfilter: nfnetlink: consolidate callback types") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-05-03netfilter: arptables: use pernet ops struct during unregisterFlorian Westphal3-6/+4
Like with iptables and ebtables, hook unregistration has to use the pernet ops struct, not the template. This triggered following splat: hook not found, pf 3 num 0 WARNING: CPU: 0 PID: 224 at net/netfilter/core.c:480 __nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480 [..] nf_unregister_net_hook net/netfilter/core.c:502 [inline] nf_unregister_net_hooks+0x117/0x160 net/netfilter/core.c:576 arpt_unregister_table_pre_exit+0x67/0x80 net/ipv4/netfilter/arp_tables.c:1565 Fixes: f9006acc8dfe5 ("netfilter: arp_tables: pass table pointer via nf_hook_ops") Reported-by: [email protected] Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-05-03netfilter: xt_SECMARK: add new revision to fix structure layoutPablo Neira Ayuso2-19/+75
This extension breaks when trying to delete rules, add a new revision to fix this. Fixes: 5e6874cdb8de ("[SECMARK]: Add xtables SECMARK target") Signed-off-by: Phil Sutter <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-05-03Documentation: ABI: sysfs-class-net-qmi: document pass-through fileDaniele Palmas1-0/+16
Add documentation for /sys/class/net/<iface>/qmi/pass_through Signed-off-by: Daniele Palmas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-05-03Revert "drivers/net/wan/hdlc_fr: Fix a double free in pvc_xmit"Xie He1-3/+2
This reverts commit 1b479fb80160 ("drivers/net/wan/hdlc_fr: Fix a double free in pvc_xmit"). 1. This commit is incorrect. "__skb_pad" will NOT free the skb on failure when its "free_on_error" parameter is "false". 2. This commit claims to fix my commit. But it didn't CC me?? Fixes: 1b479fb80160 ("drivers/net/wan/hdlc_fr: Fix a double free in pvc_xmit") Cc: Lv Yunlong <[email protected]> Signed-off-by: Xie He <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-05-03Merge branch 'sctp-race-fix'David S. Miller1-16/+22
Xin Long says: ==================== sctp: fix the race condition in sctp_destroy_sock in a proper way The original fix introduced a dead lock, and has to be removed in Patch 1/2, and we will get a proper way to fix it in Patch 2/2. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-05-03sctp: delay auto_asconf init until binding the first addrXin Long1-14/+17
As Or Cohen described: If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock held and sp->do_auto_asconf is true, then an element is removed from the auto_asconf_splist without any proper locking. This can happen in the following functions: 1. In sctp_accept, if sctp_sock_migrate fails. 2. In inet_create or inet6_create, if there is a bpf program attached to BPF_CGROUP_INET_SOCK_CREATE which denies creation of the sctp socket. This patch is to fix it by moving the auto_asconf init out of sctp_init_sock(), by which inet_create()/inet6_create() won't need to operate it in sctp_destroy_sock() when calling sk_common_release(). It also makes more sense to do auto_asconf init while binding the first addr, as auto_asconf actually requires an ANY addr bind, see it in sctp_addr_wq_timeout_handler(). This addresses CVE-2021-23133. Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications") Reported-by: Or Cohen <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-05-03Revert "net/sctp: fix race condition in sctp_destroy_sock"Xin Long1-5/+8
This reverts commit b166a20b07382b8bc1dcee2a448715c9c2c81b5b. This one has to be reverted as it introduced a dead lock, as syzbot reported: CPU0 CPU1 ---- ---- lock(&net->sctp.addr_wq_lock); lock(slock-AF_INET6); lock(&net->sctp.addr_wq_lock); lock(slock-AF_INET6); CPU0 is the thread of sctp_addr_wq_timeout_handler(), and CPU1 is that of sctp_close(). The original issue this commit fixed will be fixed in the next patch. Reported-by: [email protected] Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-05-03net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_infoPhillip Potter1-0/+4
Check at start of fill_frame_info that the MAC header in the supplied skb is large enough to fit a struct hsr_ethhdr, as otherwise this is not a valid HSR frame. If it is too small, return an error which will then cause the callers to clean up the skb. Fixes a KMSAN-found uninit-value bug reported by syzbot at: https://syzkaller.appspot.com/bug?id=f7e9b601f1414f814f7602a82b6619a8d80bce3f Reported-by: [email protected] Signed-off-by: Phillip Potter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-05-03sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_bXin Long1-1/+2
Normally SCTP_MIB_CURRESTAB is always incremented once asoc enter into ESTABLISHED from the state < ESTABLISHED and decremented when the asoc is being deleted. However, in sctp_sf_do_dupcook_b(), the asoc's state can be changed to ESTABLISHED from the state >= ESTABLISHED where it shouldn't increment SCTP_MIB_CURRESTAB. Otherwise, one asoc may increment MIB_CURRESTAB multiple times but only decrement once at the end. I was able to reproduce it by using scapy to do the 4-way shakehands, after that I replayed the COOKIE-ECHO chunk with 'peer_vtag' field changed to different values, and SCTP_MIB_CURRESTAB was incremented multiple times and never went back to 0 even when the asoc was freed. This patch is to fix it by only incrementing SCTP_MIB_CURRESTAB when the state < ESTABLISHED in sctp_sf_do_dupcook_b(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-05-03Merge branch 'sctp-bad-revert'David S. Miller2-8/+4
Xin Long says: ==================== sctp: fix the incorrect revert commit 35b4f24415c8 ("sctp: do asoc update earlier in sctp_sf_do_dupcook_a") only keeps the SHUTDOWN and COOKIE-ACK with the same asoc, not transport. So instead of revert commit 145cb2f7177d ("sctp: Fix bundling of SHUTDOWN with COOKIE-ACK"), we should revert 12dfd78e3a74 ("sctp: Fix SHUTDOWN CTSN Ack in the peer restart case"). ==================== Signed-off-by: David S. Miller <[email protected]>
2021-05-03Revert "sctp: Fix SHUTDOWN CTSN Ack in the peer restart case"Xin Long1-5/+1
This reverts commit 12dfd78e3a74825e6f0bc8df7ef9f938fbc6bfe3. This can be reverted as shutdown and cookie_ack chunk are using the same asoc since commit 35b4f24415c8 ("sctp: do asoc update earlier in sctp_sf_do_dupcook_a"). Reported-by: Jere Leppänen <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-05-03Revert "Revert "sctp: Fix bundling of SHUTDOWN with COOKIE-ACK""Xin Long1-3/+3
This reverts commit 7e9269a5acec6d841d22e12770a0b02db4f5d8f2. As Jere notice, commit 35b4f24415c8 ("sctp: do asoc update earlier in sctp_sf_do_dupcook_a") only keeps the SHUTDOWN and COOKIE-ACK with the same asoc, not transport. So we have to bring this patch back. Reported-by: Jere Leppänen <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-05-03ethernet:enic: Fix a use after free bug in enic_hard_start_xmitLv Yunlong1-2/+5
In enic_hard_start_xmit, it calls enic_queue_wq_skb(). Inside enic_queue_wq_skb, if some error happens, the skb will be freed by dev_kfree_skb(skb). But the freed skb is still used in skb_tx_timestamp(skb). My patch makes enic_queue_wq_skb() return error and goto spin_unlock() incase of error. The solution is provided by Govind. See https://lkml.org/lkml/2021/4/30/961. Fixes: fb7516d42478e ("enic: add sw timestamp support") Signed-off-by: Lv Yunlong <[email protected]> Acked-by: Govindarajulu Varadarajan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30net: stmmac: Remove duplicate declaration of stmmac_privWan Jiabing1-1/+0
In commit f4da56529da60 ("net: stmmac: Add support for external trigger timestamping"), struct stmmac_priv was declared at line 507 which caused duplicate struct declarations. Remove later duplicate declaration here. Signed-off-by: Wan Jiabing <[email protected]> Reviewed-by: Wong Vee Khee <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30net: phy: marvell: enable downshift by defaultMaxim Kochetkov1-12/+50
A number of PHYs support the PHY tunable to set and get downshift. However, only 88E1116R enables downshift by default. Extend this default enabled to all the PHYs that support the downshift tunable. Signed-off-by: Maxim Kochetkov <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30Merge branch 'sctp-chunk-fix'David S. Miller3-38/+39
Xin Long says: ==================== sctp: always send a chunk with the asoc that it belongs to Currently when processing a duplicate COOKIE-ECHO chunk, a new temp asoc would be created, then it creates the chunks with the new asoc. However, later on it uses the old asoc to send these chunks, which has caused quite a few issues. This patchset is to fix this and make sure that the COOKIE-ACK and SHUTDOWN chunks are created with the same asoc that will be used to send them out. v1->v2: - see Patch 3/3. ==================== Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30sctp: do asoc update earlier in sctp_sf_do_dupcook_bXin Long3-44/+30
The same thing should be done for sctp_sf_do_dupcook_b(). Meanwhile, SCTP_CMD_UPDATE_ASSOC cmd can be removed. v1->v2: - Fix the return value in sctp_sf_do_assoc_update(). Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30Revert "sctp: Fix bundling of SHUTDOWN with COOKIE-ACK"Xin Long1-3/+3
This can be reverted as shutdown and cookie_ack chunk are using the same asoc since the last patch. This reverts commit 145cb2f7177d94bc54563ed26027e952ee0ae03c. Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30sctp: do asoc update earlier in sctp_sf_do_dupcook_aXin Long1-5/+20
There's a panic that occurs in a few of envs, the call trace is as below: [] general protection fault, ... 0x29acd70f1000a: 0000 [#1] SMP PTI [] RIP: 0010:sctp_ulpevent_notify_peer_addr_change+0x4b/0x1fa [sctp] [] sctp_assoc_control_transport+0x1b9/0x210 [sctp] [] sctp_do_8_2_transport_strike.isra.16+0x15c/0x220 [sctp] [] sctp_cmd_interpreter.isra.21+0x1231/0x1a10 [sctp] [] sctp_do_sm+0xc3/0x2a0 [sctp] [] sctp_generate_timeout_event+0x81/0xf0 [sctp] This is caused by a transport use-after-free issue. When processing a duplicate COOKIE-ECHO chunk in sctp_sf_do_dupcook_a(), both COOKIE-ACK and SHUTDOWN chunks are allocated with the transort from the new asoc. However, later in the sideeffect machine, the old asoc is used to send them out and old asoc's shutdown_last_sent_to is set to the transport that SHUTDOWN chunk attached to in sctp_cmd_setup_t2(), which actually belongs to the new asoc. After the new_asoc is freed and the old asoc T2 timeout, the old asoc's shutdown_last_sent_to that is already freed would be accessed in sctp_sf_t2_timer_expire(). Thanks Alexander and Jere for helping dig into this issue. To fix it, this patch is to do the asoc update first, then allocate the COOKIE-ACK and SHUTDOWN chunks with the 'updated' old asoc. This would make more sense, as a chunk from an asoc shouldn't be sent out with another asoc. We had fixed quite a few issues caused by this. Fixes: 145cb2f7177d ("sctp: Fix bundling of SHUTDOWN with COOKIE-ACK") Reported-by: Alexander Sverdlin <[email protected]> Reported-by: [email protected] Reported-by: Michal Tesar <[email protected]> Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30afs, rxrpc: Add Marc Dionne as co-maintainerMarc Dionne1-0/+2
Add Marc Dionne as a co-maintainer for kafs and rxrpc. Signed-off-by: Marc Dionne <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30net: atheros: [email protected] is deadJohannes Berg2-2/+2
Remove it from the MODULE_AUTHOR statements referencing it. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30vsock/vmci: Remove redundant assignment to errYang Li1-2/+0
Variable 'err' is set to zero but this value is never read as it is overwritten with a new value later on, hence it is a redundant assignment and can be removed. Clean up the following clang-analyzer warning: net/vmw_vsock/vmci_transport.c:948:2: warning: Value stored to 'err' is never read [clang-analyzer-deadcode.DeadStores] Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30Merge branch 'hns3-fixes'David S. Miller3-3/+12
Huazhong Tan says: ==================== net: hns3: fixes for -net This series adds some bugfixes for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-04-30net: hns3: disable phy loopback setting in hclge_mac_start_phyYufeng Mo1-0/+2
If selftest and reset are performed at the same time, the phy loopback setting may be still in enable state after the reset, and device cannot link up. So fix this issue by disabling phy loopback before phy_start(). Fixes: 256727da7395 ("net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC") Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30net: hns3: clear unnecessary reset request in hclge_reset_rebuildYufeng Mo1-0/+6
HW error and global reset are reported through MSIX interrupts. The same error may be reported to different functions at the same time. When global reset begins, the pending reset request set by this error is unnecessary. So clear the pending reset request after the reset is complete to avoid the repeated reset. Fixes: f6162d44126c ("net: hns3: add handling of hw errors reported through MSIX") Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30net: hns3: use netif_tx_disable to stop the transmit queuePeng Li1-1/+1
Currently, netif_tx_stop_all_queues() is used to ensure that the xmit is not running, but for the concurrent case it will not take effect, since netif_tx_stop_all_queues() just sets a flag without locking to indicate that the xmit queue(s) should not be run. So use netif_tx_disable() to replace netif_tx_stop_all_queues(), it takes the xmit queue lock while marking the queue stopped. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Peng Li <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30net: hns3: fix for vxlan gpe tx checksum bugHao Chen1-2/+3
When skb->ip_summed is CHECKSUM_PARTIAL, for non-tunnel udp packet, which has a dest port as the IANA assigned, the hardware is expected to do the checksum offload, but the hardware whose version is below V3 will not do the checksum offload when udp dest port is 4790. So fixes it by doing the checksum in software for this case. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Hao Chen <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-30net: stmmac: cleared __FPE_REMOVING bit in stmmac_fpe_start_wq()Mohammad Athari Bin Ismail1-0/+1
An issue found when network interface is down and up again, FPE handshake fails to trigger. This is due to __FPE_REMOVING bit remains being set in stmmac_fpe_stop_wq() but not cleared in stmmac_fpe_start_wq(). This cause FPE workqueue task, stmmac_fpe_lp_task() not able to be executed. To fix this, add clearing __FPE_REMOVING bit in stmmac_fpe_start_wq(). Fixes: 5a5586112b92 ("net: stmmac: support FPE link partner hand-shaking procedure") Signed-off-by: Mohammad Athari Bin Ismail <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29net: dsa: ksz: ksz8863_smi_probe: set proper return value for ksz_switch_alloc()Oleksij Rempel1-1/+1
ksz_switch_alloc() will return NULL only if allocation is failed. So, the proper return value is -ENOMEM. Fixes: 60a364760002 ("net: dsa: microchip: Add Microchip KSZ8863 SMI based driver support") Signed-off-by: Oleksij Rempel <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29net: dsa: ksz: ksz8795_spi_probe: fix possible NULL pointer dereferenceOleksij Rempel1-0/+3
Fix possible NULL pointer dereference in case devm_kzalloc() failed to allocate memory Fixes: cc13e52c3a89 ("net: dsa: microchip: Add Microchip KSZ8863 SPI based driver support") Reported-by: Colin Ian King <[email protected]> Signed-off-by: Oleksij Rempel <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29net: dsa: ksz: ksz8863_smi_probe: fix possible NULL pointer dereferenceOleksij Rempel1-0/+3
Fix possible NULL pointer dereference in case devm_kzalloc() failed to allocate memory. Fixes: 60a364760002 ("net: dsa: microchip: Add Microchip KSZ8863 SMI based driver support") Reported-by: Colin Ian King <[email protected]> Signed-off-by: Oleksij Rempel <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29bnx2x: Remove redundant assignment to errYang Li1-1/+0
Variable 'err' is set to -EIO but this value is never read as it is overwritten with a new value later on, hence it is a redundant assignment and can be removed. Clean up the following clang-analyzer warning: drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1195:2: warning: Value stored to 'err' is never read [clang-analyzer-deadcode.DeadStores] Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29net: macb: Remove redundant assignment to queueJiapeng Chong1-2/+2
Variable queue is set to bp->queues but these values is not used as it is overwritten later on, hence redundant assignment can be removed. Cleans up the following clang-analyzer warning: drivers/net/ethernet/cadence/macb_main.c:4919:21: warning: Value stored to 'queue' during its initialization is never read [clang-analyzer-deadcode.DeadStores]. drivers/net/ethernet/cadence/macb_main.c:4832:21: warning: Value stored to 'queue' during its initialization is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <[email protected]> Signed-off-by: Jiapeng Chong <[email protected]> Acked-by: Nicolas Ferre <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29MAINTAINERS: move Murali Karicheri to creditsMichael Walle2-13/+5
His email bounces with permanent error "550 Invalid recipient". His last email was from 2020-09-09 on the LKML and he seems to have left TI. Signed-off-by: Michael Walle <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29MAINTAINERS: remove Wingman KwokMichael Walle1-1/+0
His email bounces with permanent error "550 Invalid recipient". His last email on the LKML was from 2015-10-22 on the LKML. Signed-off-by: Michael Walle <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29Merge branch 'hns3-fixes'David S. Miller4-3/+10
Huazhong Tan says: ==================== net: hns3: add some fixes for -net This series adds some fixes for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-04-29net: hns3: add check for HNS3_NIC_STATE_INITED in hns3_reset_notify_up_enet()Jian Shen1-0/+5
In some cases, the device is not initialized because reset failed. If another task calls hns3_reset_notify_up_enet() before reset retry, it will cause an error since uninitialized pointer access. So add check for HNS3_NIC_STATE_INITED before calling hns3_nic_net_open() in hns3_reset_notify_up_enet(). Fixes: bb6b94a896d4 ("net: hns3: Add reset interface implementation in client") Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29net: hns3: initialize the message content in hclge_get_link_mode()Yufeng Mo1-1/+1
The message sent to VF should be initialized, otherwise random value of some contents may cause improper processing by the target. So add a initialization to message in hclge_get_link_mode(). Fixes: 9194d18b0577 ("net: hns3: fix the problem that the supported port is empty") Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29net: hns3: fix incorrect configuration for igu_egu_hw_errYufeng Mo2-2/+4
According to the UM, the type and enable status of igu_egu_hw_err should be configured separately. Currently, the type field is incorrect when disable this error. So fix it by configuring these two fields separately. Fixes: bf1faf9415dd ("net: hns3: Add enable and process hw errors from IGU, EGU and NCSI") Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Huazhong Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29net: Remove redundant assignment to errYang Li1-3/+0
Variable 'err' is set to -ENOMEM but this value is never read as it is overwritten with a new value later on, hence the 'If statements' and assignments are redundantand and can be removed. Cleans up the following clang-analyzer warning: net/ipv6/seg6.c:126:4: warning: Value stored to 'err' is never read [clang-analyzer-deadcode.DeadStores] Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29bridge: Fix possible races between assigning rx_handler_data and setting ↵Zhang Zhengming1-2/+3
IFF_BRIDGE_PORT bit There is a crash in the function br_get_link_af_size_filtered, as the port_exists(dev) is true and the rx_handler_data of dev is NULL. But the rx_handler_data of dev is correct saved in vmcore. The oops looks something like: ... pc : br_get_link_af_size_filtered+0x28/0x1c8 [bridge] ... Call trace: br_get_link_af_size_filtered+0x28/0x1c8 [bridge] if_nlmsg_size+0x180/0x1b0 rtnl_calcit.isra.12+0xf8/0x148 rtnetlink_rcv_msg+0x334/0x370 netlink_rcv_skb+0x64/0x130 rtnetlink_rcv+0x28/0x38 netlink_unicast+0x1f0/0x250 netlink_sendmsg+0x310/0x378 sock_sendmsg+0x4c/0x70 __sys_sendto+0x120/0x150 __arm64_sys_sendto+0x30/0x40 el0_svc_common+0x78/0x130 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc In br_add_if(), we found there is no guarantee that assigning rx_handler_data to dev->rx_handler_data will before setting the IFF_BRIDGE_PORT bit of priv_flags. So there is a possible data competition: CPU 0: CPU 1: (RCU read lock) (RTNL lock) rtnl_calcit() br_add_slave() if_nlmsg_size() br_add_if() br_get_link_af_size_filtered() -> netdev_rx_handler_register ... // The order is not guaranteed ... -> dev->priv_flags |= IFF_BRIDGE_PORT; // The IFF_BRIDGE_PORT bit of priv_flags has been set -> if (br_port_exists(dev)) { // The dev->rx_handler_data has NOT been assigned -> p = br_port_get_rcu(dev); .... -> rcu_assign_pointer(dev->rx_handler_data, rx_handler_data); ... Fix it in br_get_link_af_size_filtered, using br_port_get_check_rcu() and checking the return value. Signed-off-by: Zhang Zhengming <[email protected]> Reviewed-by: Zhao Lei <[email protected]> Reviewed-by: Wang Xiaogang <[email protected]> Suggested-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29Merge branch 'fragment-stack-oob-read'David S. Miller2-8/+8
Davide Caratti says: ==================== fix stack OOB read while fragmenting IPv4 packets - patch 1/2 fixes openvswitch IPv4 fragmentation, that does a stack OOB read after commit d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmt") - patch 2/2 fixes the same issue in TC 'sch_frag' code ==================== Signed-off-by: David S. Miller <[email protected]>
2021-04-29net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packetsDavide Caratti1-4/+4
when 'act_mirred' tries to fragment IPv4 packets that had been previously re-assembled using 'act_ct', splats like the following can be observed on kernels built with KASAN: BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60 Read of size 1 at addr ffff888147009574 by task ping/947 CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 Call Trace: <IRQ> dump_stack+0x92/0xc1 print_address_description.constprop.7+0x1a/0x150 kasan_report.cold.13+0x7f/0x111 ip_do_fragment+0x1b03/0x1f60 sch_fragment+0x4bf/0xe40 tcf_mirred_act+0xc3d/0x11a0 [act_mirred] tcf_action_exec+0x104/0x3e0 fl_classify+0x49a/0x5e0 [cls_flower] tcf_classify_ingress+0x18a/0x820 __netif_receive_skb_core+0xae7/0x3340 __netif_receive_skb_one_core+0xb6/0x1b0 process_backlog+0x1ef/0x6c0 __napi_poll+0xaa/0x500 net_rx_action+0x702/0xac0 __do_softirq+0x1e4/0x97f do_softirq+0x71/0x90 </IRQ> __local_bh_enable_ip+0xdb/0xf0 ip_finish_output2+0x760/0x2120 ip_do_fragment+0x15a5/0x1f60 __ip_finish_output+0x4c2/0xea0 ip_output+0x1ca/0x4d0 ip_send_skb+0x37/0xa0 raw_sendmsg+0x1c4b/0x2d00 sock_sendmsg+0xdb/0x110 __sys_sendto+0x1d7/0x2b0 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f82e13853eb Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89 RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003 RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0 R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0 The buggy address belongs to the page: page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009 flags: 0x17ffffc0001000(reserved) raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 >ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 ^ ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then, in the following call graph: ip_do_fragment() ip_skb_dst_mtu() ip_dst_mtu_maybe_forward() ip_mtu_locked() the pointer to struct dst_entry is used as pointer to struct rtable: this turns the access to struct members like rt_mtu_locked into an OOB read in the stack. Fix this changing the temporary variable used for IPv4 packets in sch_fragment(), similarly to what is done for IPv6 few lines below. Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.") Cc: <[email protected]> # 5.11 Reported-by: Shuang Li <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29openvswitch: fix stack OOB read while fragmenting IPv4 packetsDavide Caratti1-4/+4
running openvswitch on kernels built with KASAN, it's possible to see the following splat while testing fragmentation of IPv4 packets: BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60 Read of size 1 at addr ffff888112fc713c by task handler2/1367 CPU: 0 PID: 1367 Comm: handler2 Not tainted 5.12.0-rc6+ #418 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 Call Trace: dump_stack+0x92/0xc1 print_address_description.constprop.7+0x1a/0x150 kasan_report.cold.13+0x7f/0x111 ip_do_fragment+0x1b03/0x1f60 ovs_fragment+0x5bf/0x840 [openvswitch] do_execute_actions+0x1bd5/0x2400 [openvswitch] ovs_execute_actions+0xc8/0x3d0 [openvswitch] ovs_packet_cmd_execute+0xa39/0x1150 [openvswitch] genl_family_rcv_msg_doit.isra.15+0x227/0x2d0 genl_rcv_msg+0x287/0x490 netlink_rcv_skb+0x120/0x380 genl_rcv+0x24/0x40 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f957079db07 Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 eb ec ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 24 ed ff ff 48 RSP: 002b:00007f956ce35a50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f957079db07 RDX: 0000000000000000 RSI: 00007f956ce35ae0 RDI: 0000000000000019 RBP: 00007f956ce35ae0 R08: 0000000000000000 R09: 00007f9558006730 R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 00007f956ce37308 R14: 00007f956ce35f80 R15: 00007f956ce35ae0 The buggy address belongs to the page: page:00000000af2a1d93 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x112fc7 flags: 0x17ffffc0000000() raw: 0017ffffc0000000 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected addr ffff888112fc713c is located in stack of task handler2/1367 at offset 180 in frame: ovs_fragment+0x0/0x840 [openvswitch] this frame has 2 objects: [32, 144) 'ovs_dst' [192, 424) 'ovs_rt' Memory state around the buggy address: ffff888112fc7000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888112fc7080: 00 f1 f1 f1 f1 00 00 00 00 00 00 00 00 00 00 00 >ffff888112fc7100: 00 00 00 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 ^ ffff888112fc7180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888112fc7200: 00 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 for IPv4 packets, ovs_fragment() uses a temporary struct dst_entry. Then, in the following call graph: ip_do_fragment() ip_skb_dst_mtu() ip_dst_mtu_maybe_forward() ip_mtu_locked() the pointer to struct dst_entry is used as pointer to struct rtable: this turns the access to struct members like rt_mtu_locked into an OOB read in the stack. Fix this changing the temporary variable used for IPv4 packets in ovs_fragment(), similarly to what is done for IPv6 few lines below. Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmt") Cc: <[email protected]> Acked-by: Eelco Chaudron <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-29seg6: add counters support for SRv6 BehaviorsAndrea Mayer2-2/+226
This patch provides counters for SRv6 Behaviors as defined in [1], section 6. For each SRv6 Behavior instance, counters defined in [1] are: - the total number of packets that have been correctly processed; - the total amount of traffic in bytes of all packets that have been correctly processed; In addition, this patch introduces a new counter that counts the number of packets that have NOT been properly processed (i.e. errors) by an SRv6 Behavior instance. Counters are not only interesting for network monitoring purposes (i.e. counting the number of packets processed by a given behavior) but they also provide a simple tool for checking whether a behavior instance is working as we expect or not. Counters can be useful for troubleshooting misconfigured SRv6 networks. Indeed, an SRv6 Behavior can silently drop packets for very different reasons (i.e. wrong SID configuration, interfaces set with SID addresses, etc) without any notification/message to the user. Due to the nature of SRv6 networks, diagnostic tools such as ping and traceroute may be ineffective: paths used for reaching a given router can be totally different from the ones followed by probe packets. In addition, paths are often asymmetrical and this makes it even more difficult to keep up with the journey of the packets and to understand which behaviors are actually processing our traffic. When counters are enabled on an SRv6 Behavior instance, it is possible to verify if packets are actually processed by such behavior and what is the outcome of the processing. Therefore, the counters for SRv6 Behaviors offer an non-invasive observability point which can be leveraged for both traffic monitoring and troubleshooting purposes. [1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters Troubleshooting using SRv6 Behavior counters -------------------------------------------- Let's make a brief example to see how helpful counters can be for SRv6 networks. Let's consider a node where an SRv6 End Behavior receives an SRv6 packet whose Segment Left (SL) is equal to 0. In this case, the End Behavior (which accepts only packets with SL >= 1) discards the packet and increases the error counter. This information can be leveraged by the network operator for troubleshooting. Indeed, the error counter is telling the user that the packet: (i) arrived at the node; (ii) the packet has been taken into account by the SRv6 End behavior; (iii) but an error has occurred during the processing. The error (iii) could be caused by different reasons, such as wrong route settings on the node or due to an invalid SID List carried by the SRv6 packet. Anyway, the error counter is used to exclude that the packet did not arrive at the node or it has not been processed by the behavior at all. Turning on/off counters for SRv6 Behaviors ------------------------------------------ Each SRv6 Behavior instance can be configured, at the time of its creation, to make use of counters. This is done through iproute2 which allows the user to create an SRv6 Behavior instance specifying the optional "count" attribute as shown in the following example: $ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0 per-behavior counters can be shown by adding "-s" to the iproute2 command line, i.e.: $ ip -s -6 route show 2001:db8::1 2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Impact of counters for SRv6 Behaviors on performance ==================================================== To determine the performance impact due to the introduction of counters in the SRv6 Behavior subsystem, we have carried out extensive tests. We chose to test the throughput achieved by the SRv6 End.DX2 Behavior because, among all the other behaviors implemented so far, it reaches the highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100 bytes. Three different tests were conducted in order to evaluate the overall throughput of the SRv6 End.DX2 Behavior in the following scenarios: 1) vanilla kernel (without the SRv6 Behavior counters patch) and a single instance of an SRv6 End.DX2 Behavior; 2) patched kernel with SRv6 Behavior counters and a single instance of an SRv6 End.DX2 Behavior with counters turned off; 3) patched kernel with SRv6 Behavior counters and a single instance of SRv6 End.DX2 Behavior with counters turned on. All tests were performed on a testbed deployed on the CloudLab facilities [2], a flexible infrastructure dedicated to scientific research on the future of Cloud Computing. Results of tests are shown in the following table: Scenario (1): average 1504764,81 pps (~1504,76 kpps); std. dev 3956,82 pps Scenario (2): average 1501469,78 pps (~1501,47 kpps); std. dev 2979,85 pps Scenario (3): average 1501315,13 pps (~1501,32 kpps); std. dev 2956,00 pps As can be observed, throughputs achieved in scenarios (2),(3) did not suffer any observable degradation compared to scenario (1). Thanks to Jakub Kicinski and David Ahern for their valuable suggestions and comments provided during the discussion of the proposed RFCs. [2] https://www.cloudlab.us Signed-off-by: Andrea Mayer <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>