aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2023-10-04kthread: add kthread_stop_putAndreas Gruenbacher1-2/+1
Add a kthread_stop_put() helper that stops a thread and puts its task struct. Use it to replace the various instances of kthread_stop() followed by put_task_struct(). Remove the kthread_stop_put() macro in usbip that is similar but doesn't return the result of kthread_stop(). [[email protected]: fix kerneldoc comment] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: document kthread_stop_put()'s argument] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andreas Gruenbacher <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04sunrpc: dynamically allocate the sunrpc_cred shrinkerQi Zheng1-8/+12
Use new APIs to dynamically allocate the sunrpc_cred shrinker. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qi Zheng <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Neil Brown <[email protected]> Cc: Olga Kornievskaia <[email protected]> Cc: Dai Ngo <[email protected]> Cc: Tom Talpey <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: Anna Schumaker <[email protected]> Cc: Abhinav Kumar <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alyssa Rosenzweig <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Bob Peterson <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Carlos Llamas <[email protected]> Cc: Chandan Babu R <[email protected]> Cc: Chao Yu <[email protected]> Cc: Chris Mason <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Christian Koenig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Coly Li <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: "Darrick J. Wong" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Airlie <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Sterba <[email protected]> Cc: Dmitry Baryshkov <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Huang Rui <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jeffle Xu <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Marijn Suijten <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Mike Snitzer <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Oleksandr Tyshchenko <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rob Clark <[email protected]> Cc: Rob Herring <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sean Paul <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Song Liu <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Steven Price <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tomeu Vizoso <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xuan Zhuo <[email protected]> Cc: Yue Hu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04Merge tag 'for-netdev' of ↵Jakub Kicinski3-9/+9
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-10-02 We've added 11 non-merge commits during the last 12 day(s) which contain a total of 12 files changed, 176 insertions(+), 41 deletions(-). The main changes are: 1) Fix BPF verifier to reset backtrack_state masks on global function exit as otherwise subsequent precision tracking would reuse them, from Andrii Nakryiko. 2) Several sockmap fixes for available bytes accounting, from John Fastabend. 3) Reject sk_msg egress redirects to non-TCP sockets given this is only supported for TCP sockets today, from Jakub Sitnicki. 4) Fix a syzkaller splat in bpf_mprog when hitting maximum program limits with BPF_F_BEFORE directive, from Daniel Borkmann and Nikolay Aleksandrov. 5) Fix BPF memory allocator to use kmalloc_size_roundup() to adjust size_index for selecting a bpf_mem_cache, from Hou Tao. 6) Fix arch_prepare_bpf_trampoline return code for s390 JIT, from Song Liu. 7) Fix bpf_trampoline_get when CONFIG_BPF_JIT is turned off, from Leon Hwang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Use kmalloc_size_roundup() to adjust size_index selftest/bpf: Add various selftests for program limits bpf, mprog: Fix maximum program check on mprog attachment bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets bpf, sockmap: Add tests for MSG_F_PEEK bpf, sockmap: Do not inc copied_seq when PEEK flag set bpf: tcp_read_skb needs to pop skb regardless of seq bpf: unconditionally reset backtrack_state masks on global func exit bpf: Fix tr dereferencing selftests/bpf: Check bpf_cubic_acked() is called via struct_ops s390/bpf: Let arch_prepare_bpf_trampoline return program size ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-04netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failureFlorian Westphal1-17/+29
nft_rbtree_gc_elem() walks back and removes the end interval element that comes before the expired element. There is a small chance that we've cached this element as 'rbe_ge'. If this happens, we hold and test a pointer that has been queued for freeing. It also causes spurious insertion failures: $ cat test-testcases-sets-0044interval_overlap_0.1/testout.log Error: Could not process rule: File exists add element t s { 0 - 2 } ^^^^^^ Failed to insert 0 - 2 given: table ip t { set s { type inet_service flags interval,timeout timeout 2s gc-interval 2s } } The set (rbtree) is empty. The 'failure' doesn't happen on next attempt. Reason is that when we try to insert, the tree may hold an expired element that collides with the range we're adding. While we do evict/erase this element, we can trip over this check: if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new)) return -ENOTEMPTY; rbe_ge was erased by the synchronous gc, we should not have done this check. Next attempt won't find it, so retry results in successful insertion. Restart in-kernel to avoid such spurious errors. Such restart are rare, unless userspace intentionally adds very large numbers of elements with very short timeouts while setting a huge gc interval. Even in this case, this cannot loop forever, on each retry an existing element has been removed. As the caller is holding the transaction mutex, its impossible for a second entity to add more expiring elements to the tree. After this it also becomes feasible to remove the async gc worker and perform all garbage collection from the commit path. Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal <[email protected]>
2023-10-04netfilter: nf_tables: Deduplicate nft_register_obj audit logsPhil Sutter1-16/+28
When adding/updating an object, the transaction handler emits suitable audit log entries already, the one in nft_obj_notify() is redundant. To fix that (and retain the audit logging from objects' 'update' callback), Introduce an "audit log free" variant for internal use. Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table") Signed-off-by: Phil Sutter <[email protected]> Reviewed-by: Richard Guy Briggs <[email protected]> Acked-by: Paul Moore <[email protected]> (Audit) Signed-off-by: Florian Westphal <[email protected]>
2023-10-04netfilter: handle the connecting collision properly in nf_conntrack_proto_sctpXin Long1-10/+33
In Scenario A and B below, as the delayed INIT_ACK always changes the peer vtag, SCTP ct with the incorrect vtag may cause packet loss. Scenario A: INIT_ACK is delayed until the peer receives its own INIT_ACK 192.168.1.2 > 192.168.1.1: [INIT] [init tag: 1328086772] 192.168.1.1 > 192.168.1.2: [INIT] [init tag: 1414468151] 192.168.1.2 > 192.168.1.1: [INIT ACK] [init tag: 1328086772] 192.168.1.1 > 192.168.1.2: [INIT ACK] [init tag: 1650211246] * 192.168.1.2 > 192.168.1.1: [COOKIE ECHO] 192.168.1.1 > 192.168.1.2: [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: [COOKIE ACK] Scenario B: INIT_ACK is delayed until the peer completes its own handshake 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] 192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] * This patch fixes it as below: In SCTP_CID_INIT processing: - clear ct->proto.sctp.init[!dir] if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir]. (Scenario E) - set ct->proto.sctp.init[dir]. In SCTP_CID_INIT_ACK processing: - drop it if !ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] && ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario B, Scenario C) - drop it if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario A) In SCTP_CID_COOKIE_ACK processing: - clear ct->proto.sctp.init[dir] and ct->proto.sctp.init[!dir]. (Scenario D) Also, it's important to allow the ct state to move forward with cookie_echo and cookie_ack from the opposite dir for the collision scenarios. There are also other Scenarios where it should allow the packet through, addressed by the processing above: Scenario C: new CT is created by INIT_ACK. Scenario D: start INIT on the existing ESTABLISHED ct. Scenario E: start INIT after the old collision on the existing ESTABLISHED ct. 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] (both side are stopped, then start new connection again in hours) 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 242308742] Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Xin Long <[email protected]> Signed-off-by: Florian Westphal <[email protected]>
2023-10-04netfilter: nft_payload: rebuild vlan header on h_proto accessFlorian Westphal1-1/+12
nft can perform merging of adjacent payload requests. This means that: ether saddr 00:11 ... ether type 8021ad ... is a single payload expression, for 8 bytes, starting at the ethernet source offset. Check that offset+length is fully within the source/destination mac addersses. This bug prevents 'ether type' from matching the correct h_proto in case vlan tag got stripped. Fixes: de6843be3082 ("netfilter: nft_payload: rebuild vlan header when needed") Reported-by: David Ward <[email protected]> Signed-off-by: Florian Westphal <[email protected]>
2023-10-04can: raw: Remove NULL check before dev_{put, hold}Jiapeng Chong1-2/+1
The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there is no need to check before using dev_{put, hold}, remove it to silence the warning: ./net/can/raw.c:497:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed. Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6231 Signed-off-by: Jiapeng Chong <[email protected]> Reported-by: Simon Horman <[email protected]> Acked-by: Oliver Hartkopp <[email protected]> Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Marc Kleine-Budde <[email protected]>
2023-10-03net: add sysctl to disable rfc4862 5.5.3e lifetime handlingPatrick Rohr1-13/+25
This change adds a sysctl to opt-out of RFC4862 section 5.5.3e's valid lifetime derivation mechanism. RFC4862 section 5.5.3e prescribes that the valid lifetime in a Router Advertisement PIO shall be ignored if it less than 2 hours and to reset the lifetime of the corresponding address to 2 hours. An in-progress 6man draft (see draft-ietf-6man-slaac-renum-07 section 4.2) is currently looking to remove this mechanism. While this draft has not been moving particularly quickly for other reasons, there is widespread consensus on section 4.2 which updates RFC4862 section 5.5.3e. Cc: Maciej Żenczykowski <[email protected]> Cc: Lorenzo Colitti <[email protected]> Cc: Jen Linkova <[email protected]> Signed-off-by: Patrick Rohr <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-03net: nfc: llcp: Add lock when modifying device listJeremy Cline1-0/+2
The device list needs its associated lock held when modifying it, or the list could become corrupted, as syzbot discovered. Reported-and-tested-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=c1d0a03d305972dbbe14 Signed-off-by: Jeremy Cline <[email protected]> Reviewed-by: Simon Horman <[email protected]> Fixes: 6709d4b7bc2e ("net: nfc: Fix use-after-free caused by nfc_llcp_find_local") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-03ethtool: plca: fix plca enable data type while parsing the valueParthiban Veerasooran1-16/+29
The ETHTOOL_A_PLCA_ENABLED data type is u8. But while parsing the value from the attribute, nla_get_u32() is used in the plca_update_sint() function instead of nla_get_u8(). So plca_cfg.enabled variable is updated with some garbage value instead of 0 or 1 and always enables plca even though plca is disabled through ethtool application. This bug has been fixed by parsing the values based on the attributes type in the policy. Fixes: 8580e16c28f3 ("net/ethtool: add netlink interface for the PLCA RS") Signed-off-by: Parthiban Veerasooran <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-03net: dsa: tag_ksz: Extend ksz9477_xmit() for HSR frame duplicationLukasz Majewski1-0/+8
The KSZ9477 has support for HSR (High-Availability Seamless Redundancy). One of its offloading (i.e. performed in the switch IC hardware) features is to duplicate received frame to both HSR aware switch ports. To achieve this goal - the tail TAG needs to be modified. To be more specific, both ports must be marked as destination (egress) ones. The NETIF_F_HW_HSR_DUP flag indicates that the device supports HSR and assures (in HSR core code) that frame is sent only once from HOST to switch with tail tag indicating both ports. Signed-off-by: Lukasz Majewski <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03net: dsa: notify drivers of MAC address changes on user portsVladimir Oltean1-0/+7
In some cases, drivers may need to veto the changing of a MAC address on a user port. Such is the case with KSZ9477 when it offloads a HSR device, because it programs the MAC address of multiple ports to a shared hardware register. Those ports need to have equal MAC addresses for the lifetime of the HSR offload. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Lukasz Majewski <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03net: dsa: propagate extack to ds->ops->port_hsr_join()Vladimir Oltean3-4/+6
Drivers can provide meaningful error messages which state a reason why they can't perform an offload, and dsa_slave_changeupper() already has the infrastructure to propagate these over netlink rather than printing to the kernel log. So pass the extack argument and modify the xrs700x driver's port_hsr_join() prototype. Also take the opportunity and use the extack for the 2 -EOPNOTSUPP cases from xrs700x_hsr_join(). Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Lukasz Majewski <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03ipv6: mark address parameters of udp_tunnel6_xmit_skb() as constBeniamino Galvani1-2/+3
The function doesn't modify the addresses passed as input, mark them as 'const' to make that clear. Signed-off-by: Beniamino Galvani <[email protected]> Reviewed-by: Guillaume Nault <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03udp_tunnel: Use flex array to simplify codeChristophe JAILLET1-9/+2
'n_tables' is small, UDP_TUNNEL_NIC_MAX_TABLES = 4 as a maximum. So there is no real point to allocate the 'entries' pointers array with a dedicate memory allocation. Using a flexible array for struct udp_tunnel_nic->entries avoids the overhead of an additional memory allocation. This also saves an indirection when the array is accessed. Finally, __counted_by() can be used for run-time bounds checking if configured and supported by the compiler. Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/4a096ba9cf981a588aa87235bb91e933ee162b3d.1695542544.git.christophe.jaillet@wanadoo.fr Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03tcp_metrics: optimize tcp_metrics_flush_all()Eric Dumazet1-2/+5
This is inspired by several syzbot reports where tcp_metrics_flush_all() was seen in the traces. We can avoid acquiring tcp_metrics_lock for empty buckets, and we should add one cond_resched() to break potential long loops. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03tcp_metrics: do not create an entry from tcp_init_metrics()Eric Dumazet1-1/+1
tcp_init_metrics() only wants to get metrics if they were previously stored in the cache. Creating an entry is adding useless costs, especially when tcp_no_metrics_save is set. Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03tcp_metrics: properly set tp->snd_ssthresh in tcp_init_metrics()Eric Dumazet1-5/+4
We need to set tp->snd_ssthresh to TCP_INFINITE_SSTHRESH in the case tcp_get_metrics() fails for some reason. Fixes: 9ad7c049f0f7 ("tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03tcp_metrics: add missing barriers on deleteEric Dumazet1-2/+2
When removing an item from RCU protected list, we must prevent store-tearing, using rcu_assign_pointer() or WRITE_ONCE(). Fixes: 04f721c671656 ("tcp_metrics: Rewrite tcp_metrics_flush_all") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03ipv6: tcp: add a missing nf_reset_ct() in 3WHS handlingIlya Maximets1-3/+7
Commit b0e214d21203 ("netfilter: keep conntrack reference until IPsecv6 policy checks are done") is a direct copy of the old commit b59c270104f0 ("[NETFILTER]: Keep conntrack reference until IPsec policy checks are done") but for IPv6. However, it also copies a bug that this old commit had. That is: when the third packet of 3WHS connection establishment contains payload, it is added into socket receive queue without the XFRM check and the drop of connection tracking context. That leads to nf_conntrack module being impossible to unload as it waits for all the conntrack references to be dropped while the packet release is deferred in per-cpu cache indefinitely, if not consumed by the application. The issue for IPv4 was fixed in commit 6f0012e35160 ("tcp: add a missing nf_reset_ct() in 3WHS handling") by adding a missing XFRM check and correctly dropping the conntrack context. However, the issue was introduced to IPv6 code afterwards. Fixing it the same way for IPv6 now. Fixes: b0e214d21203 ("netfilter: keep conntrack reference until IPsecv6 policy checks are done") Link: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Ilya Maximets <[email protected]> Acked-by: Florian Westphal <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03ipv4/fib: send notify when delete source address routesHangbin Liu2-0/+5
After deleting an interface address in fib_del_ifaddr(), the function scans the fib_info list for stray entries and calls fib_flush() and fib_table_flush(). Then the stray entries will be deleted silently and no RTM_DELROUTE notification will be sent. This lack of notification can make routing daemons, or monitor like `ip monitor route` miss the routing changes. e.g. + ip link add dummy1 type dummy + ip link add dummy2 type dummy + ip link set dummy1 up + ip link set dummy2 up + ip addr add 192.168.5.5/24 dev dummy1 + ip route add 7.7.7.0/24 dev dummy2 src 192.168.5.5 + ip -4 route 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 + ip monitor route + ip addr del 192.168.5.5/24 dev dummy1 Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5 Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5 As Ido reminded, fib_table_flush() isn't only called when an address is deleted, but also when an interface is deleted or put down. The lack of notification in these cases is deliberate. And commit 7c6bb7d2faaf ("net/ipv6: Add knob to skip DELROUTE message on device down") introduced a sysctl to make IPv6 behave like IPv4 in this regard. So we can't send the route delete notify blindly in fib_table_flush(). To fix this issue, let's add a new flag in "struct fib_info" to track the deleted prefer source address routes, and only send notify for them. After update: + ip monitor route + ip addr del 192.168.5.5/24 dev dummy1 Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5 Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5 Deleted 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5 Suggested-by: Thomas Haller <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Acked-by: Nicolas Dichtel <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-10-02handshake: Fix sign of key_serial_t fieldsChuck Lever1-2/+2
key_serial_t fields are signed integers. Use nla_get/put_s32 for those to avoid implicit signed conversion in the netlink protocol. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/169530167716.8905.645746457741372879.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-02handshake: Fix sign of socket file descriptor fieldsChuck Lever3-3/+3
Socket file descriptors are signed integers. Use nla_get/put_s32 for those to avoid implicit signed conversion in the netlink protocol. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/169530165057.8905.8650469415145814828.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-02net: openvswitch: Annotate struct dp_meter with __counted_byKees Cook1-1/+1
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct dp_meter. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Pravin B Shelar <[email protected]> Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-02net: openvswitch: Annotate struct dp_meter_instance with __counted_byKees Cook1-1/+1
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct dp_meter_instance. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Pravin B Shelar <[email protected]> Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-01inet: implement lockless getsockopt(IP_MULTICAST_IF)Eric Dumazet5-21/+20
Add missing annotations to inet->mc_index and inet->mc_addr to fix data-races. getsockopt(IP_MULTICAST_IF) can be lockless. setsockopt() side is left for later. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01inet: lockless IP_PKTOPTIONS implementationEric Dumazet1-39/+37
Current implementation is already lockless, because the socket lock is released before reading socket fields. Add missing READ_ONCE() annotations. Note that corresponding WRITE_ONCE() are needed, the order of the patches do not really matter. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01inet: implement lockless getsockopt(IP_UNICAST_IF)Eric Dumazet5-18/+21
Add missing READ_ONCE() annotations when reading inet->uc_index Implementing getsockopt(IP_UNICAST_IF) locklessly seems possible, the setsockopt() part might not be possible at the moment. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01inet: lockless getsockopt(IP_MTU)Eric Dumazet1-11/+9
sk_dst_get() does not require socket lock. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01inet: lockless getsockopt(IP_OPTIONS)Eric Dumazet1-10/+10
inet->inet_opt being RCU protected, we can use RCU instead of locking the socket. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01inet: implement lockless IP_TOSEric Dumazet7-31/+27
Some reads of inet->tos are racy. Add needed READ_ONCE() annotations and convert IP_TOS option lockless. v2: missing changes in include/net/route.h (David Ahern) Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01inet: implement lockless IP_MTU_DISCOVEREric Dumazet6-18/+14
inet->pmtudisc can be read locklessly. Implement proper lockless reads and writes to inet->pmtudisc ip_sock_set_mtu_discover() can now be called from arbitrary contexts. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01inet: implement lockless IP_MULTICAST_TTLEric Dumazet3-17/+18
inet->mc_ttl can be read locklessly. Implement proper lockless reads and writes to inet->mc_ttl Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: prevent address rewrite in kernel_bind()Jordan Rife4-5/+10
Similar to the change in commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect"), BPF hooks run on bind may rewrite the address passed to kernel_bind(). This change 1) Makes a copy of the bind address in kernel_bind() to insulate callers. 2) Replaces direct calls to sock->ops->bind() in net with kernel_bind() Link: https://lore.kernel.org/netdev/[email protected]/ Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Cc: [email protected] Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Jordan Rife <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: prevent rewrite of msg_name in sock_sendmsg()Jordan Rife1-6/+23
Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel space may observe their value of msg_name change in cases where BPF sendmsg hooks rewrite the send address. This has been confirmed to break NFS mounts running in UDP mode and has the potential to break other systems. This patch: 1) Creates a new function called __sock_sendmsg() with same logic as the old sock_sendmsg() function. 2) Replaces calls to sock_sendmsg() made by __sys_sendto() and __sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy, as these system calls are already protected. 3) Modifies sock_sendmsg() so that it makes a copy of msg_name if present before passing it down the stack to insulate callers from changes to the send address. Link: https://lore.kernel.org/netdev/[email protected]/ Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg") Cc: [email protected] Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Jordan Rife <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: replace calls to sock->ops->connect() with kernel_connect()Jordan Rife2-3/+3
commit 0bdf399342c5 ("net: Avoid address overwrite in kernel_connect") ensured that kernel_connect() will not overwrite the address parameter in cases where BPF connect hooks perform an address rewrite. This change replaces direct calls to sock->ops->connect() in net with kernel_connect() to make these call safe. Link: https://lore.kernel.org/netdev/[email protected]/ Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect") Cc: [email protected] Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Jordan Rife <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: annotate data-races around sk->sk_dst_pending_confirmEric Dumazet2-2/+2
This field can be read or written without socket lock being held. Add annotations to avoid load-store tearing. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: lockless implementation of SO_TXREHASHEric Dumazet1-13/+10
sk->sk_txrehash readers are already safe against concurrent change of this field. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: implement lockless SO_MAX_PACING_RATEEric Dumazet5-32/+36
SO_MAX_PACING_RATE setsockopt() does not need to hold the socket lock, because sk->sk_pacing_rate readers can run fine if the value is changed by other threads, after adding READ_ONCE() accessors. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: lockless implementation of SO_BUSY_POLL, SO_PREFER_BUSY_POLL, ↵Eric Dumazet1-24/+20
SO_BUSY_POLL_BUDGET Setting sk->sk_ll_usec, sk_prefer_busy_poll and sk_busy_poll_budget do not require the socket lock, readers are lockless anyway. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: lockless SO_{TYPE|PROTOCOL|DOMAIN|ERROR } setsockopt()Eric Dumazet1-6/+5
This options can not be set and return -ENOPROTOOPT, no need to acqure socket lock. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: lockless SO_PASSCRED, SO_PASSPIDFD and SO_PASSSECEric Dumazet1-11/+9
sock->flags are atomic, no need to hold the socket lock in sk_setsockopt() for SO_PASSCRED, SO_PASSPIDFD and SO_PASSSEC. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: implement lockless SO_PRIORITYEric Dumazet22-33/+34
This is a followup of 8bf43be799d4 ("net: annotate data-races around sk->sk_priority"). sk->sk_priority can be read and written without holding the socket lock. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01openvswitch: reduce stack usage in do_execute_actionsIlya Maximets1-12/+11
do_execute_actions() function can be called recursively multiple times while executing actions that require pipeline forking or recirculations. It may also be re-entered multiple times if the packet leaves openvswitch module and re-enters it through a different port. Currently, there is a 256-byte array allocated on stack in this function that is supposed to hold NSH header. Compilers tend to pre-allocate that space right at the beginning of the function: a88: 48 81 ec b0 01 00 00 sub $0x1b0,%rsp NSH is not a very common protocol, but the space is allocated on every recursive call or re-entry multiplying the wasted stack space. Move the stack allocation to push_nsh() function that is only used if NSH actions are actually present. push_nsh() is also a simple function without a possibility for re-entry, so the stack is returned right away. With this change the preallocated space is reduced by 256 B per call: b18: 48 81 ec b0 00 00 00 sub $0xb0,%rsp Signed-off-by: Ilya Maximets <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Eelco Chaudron [email protected] Reviewed-by: Aaron Conole <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()David Howells1-1/+1
Including the transhdrlen in length is a problem when the packet is partially filled (e.g. something like send(MSG_MORE) happened previously) when appending to an IPv4 or IPv6 packet as we don't want to repeat the transport header or account for it twice. This can happen under some circumstances, such as splicing into an L2TP socket. The symptom observed is a warning in __ip6_append_data(): WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800 that occurs when MSG_SPLICE_PAGES is used to append more data to an already partially occupied skbuff. The warning occurs when 'copy' is larger than the amount of data in the message iterator. This is because the requested length includes the transport header length when it shouldn't. This can be triggered by, for example: sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP); bind(sfd, ...); // ::1 connect(sfd, ...); // ::1 port 7 send(sfd, buffer, 4100, MSG_MORE); sendfile(sfd, dfd, NULL, 1024); Fix this by only adding transhdrlen into the length if the write queue is empty in l2tp_ip6_sendmsg(), analogously to how UDP does things. l2tp_ip_sendmsg() looks like it won't suffer from this problem as it builds the UDP packet itself. Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6") Reported-by: [email protected] Link: https://lore.kernel.org/r/[email protected]/ Suggested-by: Willem de Bruijn <[email protected]> Signed-off-by: David Howells <[email protected]> cc: Eric Dumazet <[email protected]> cc: Willem de Bruijn <[email protected]> cc: "David S. Miller" <[email protected]> cc: David Ahern <[email protected]> cc: Paolo Abeni <[email protected]> cc: Jakub Kicinski <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01neighbour: fix data-races around n->outputEric Dumazet2-6/+6
n->output field can be read locklessly, while a writer might change the pointer concurrently. Add missing annotations to prevent load-store tearing. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: l2tp_eth: use generic dev->stats fieldsEric Dumazet1-22/+12
Core networking has opt-in atomic variant of dev->stats, simply use DEV_STATS_INC(), DEV_STATS_ADD() and DEV_STATS_READ(). v2: removed @priv local var in l2tp_eth_dev_recv() (Simon) Signed-off-by: Eric Dumazet <[email protected]> Cc: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01net: fix possible store tearing in neigh_periodic_work()Eric Dumazet1-1/+3
While looking at a related syzbot report involving neigh_periodic_work(), I found that I forgot to add an annotation when deleting an RCU protected item from a list. Readers use rcu_deference(*np), we need to use either rcu_assign_pointer() or WRITE_ONCE() on writer side to prevent store tearing. I use rcu_assign_pointer() to have lockdep support, this was the choice made in neigh_flush_dev(). Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-01Merge tag 'for-net-2023-09-20' of ↵David S. Miller6-45/+58
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth bluetooth pull request for net: - Fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER - Fix handling of listen for ISO unicast - Fix build warnings - Fix leaking content of local_codecs - Add shutdown function for QCA6174 - Delete unused hci_req_prepare_suspend() declaration - Fix hci_link_tx_to RCU lock usage - Avoid redundant authentication Signed-off-by: David S. Miller <[email protected]>