aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2023-09-16tcp: call tcp_try_undo_recovery when an RTOd TFO SYNACK is ACKedAananth V1-4/+5
For passive TCP Fast Open sockets that had SYN/ACK timeout and did not send more data in SYN_RECV, upon receiving the final ACK in 3WHS, the congestion state may awkwardly stay in CA_Loss mode unless the CA state was undone due to TCP timestamp checks. However, if tcp_rcv_synrecv_state_fastopen() decides not to undo, then we should enter CA_Open, because at that point we have received an ACK covering the retransmitted SYNACKs. Currently, the icsk_ca_state is only set to CA_Open after we receive an ACK for a data-packet. This is because tcp_ack does not call tcp_fastretrans_alert (and tcp_process_loss) if !prior_packets Note that tcp_process_loss() calls tcp_try_undo_recovery(), so having tcp_rcv_synrecv_state_fastopen() decide that if we're in CA_Loss we should call tcp_try_undo_recovery() is consistent with that, and low risk. Fixes: dad8cea7add9 ("tcp: fix TFO SYNACK undo to avoid double-timestamp-undo") Signed-off-by: Aananth V <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-16net: core: Use the bitmap API to allocate bitmapsAndy Shevchenko1-3/+3
Use bitmap_zalloc() and bitmap_free() instead of hand-writing them. It is less verbose and it improves the type checking and semantic. While at it, add missing header inclusion (should be bitops.h, but with the above change it becomes bitmap.h). Suggested-by: Sergey Ryazanov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-0/+2
Alexei Starovoitov says: ==================== The following pull-request contains BPF updates for your *net* tree. We've added 21 non-merge commits during the last 8 day(s) which contain a total of 21 files changed, 450 insertions(+), 36 deletions(-). The main changes are: 1) Adjust bpf_mem_alloc buckets to match ksize(), from Hou Tao. 2) Check whether override is allowed in kprobe mult, from Jiri Olsa. 3) Fix btf_id symbol generation with ld.lld, from Jiri and Nick. 4) Fix potential deadlock when using queue and stack maps from NMI, from Toke Høiland-Jørgensen. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Alan Maguire, Biju Das, Björn Töpel, Dan Carpenter, Daniel Borkmann, Eduard Zingerman, Hsin-Wei Hung, Marcus Seyfarth, Nathan Chancellor, Satya Durga Srinivasu Prabhala, Song Liu, Stephen Rothwell ==================== Signed-off-by: David S. Miller <[email protected]>
2023-09-16net: add truesize debug checks in skb_{add|coalesce}_rx_frag()Eric Dumazet1-0/+4
It can be time consuming to track driver bugs, that might be detected too late from this confusing warning in skb_try_coalesce() WARN_ON_ONCE(delta < len); Add sanity check in skb_add_rx_frag() and skb_coalesce_rx_frag() to better track bug origin for CONFIG_DEBUG_NET=y builds. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-16net: use indirect call helpers for sk->sk_prot->release_cb()Eric Dumazet1-2/+5
When adding sk->sk_prot->release_cb() call from __sk_flush_backlog() Paolo suggested using indirect call helpers to take care of CONFIG_RETPOLINE=y case. It turns out Google had such mitigation for years in release_sock(), it is time to make this public :) Suggested-by: Paolo Abeni <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15bpf: expose information about supported xdp metadata kfuncStanislav Fomichev2-3/+13
Add new xdp-rx-metadata-features member to netdev netlink which exports a bitmask of supported kfuncs. Most of the patch is autogenerated (headers), the only relevant part is netdev.yaml and the changes in netdev-genl.c to marshal into netlink. Example output on veth: $ ip link add veth0 type veth peer name veth1 # ifndex == 12 $ ./tools/net/ynl/samples/netdev 12 Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12 veth1[12] xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0 Cc: [email protected] Cc: Willem de Bruijn <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-09-15bpf: make it easier to add new metadata kfuncStanislav Fomichev1-2/+2
No functional changes. Instead of having hand-crafted code in bpf_dev_bound_resolve_kfunc, move kfunc <> xmo handler relationship into XDP_METADATA_KFUNC_xxx. This way, any time new kfunc is added, we don't have to touch bpf_dev_bound_resolve_kfunc. Also document XDP_METADATA_KFUNC_xxx arguments since we now have more than two and it might be confusing what is what. Cc: [email protected] Cc: Willem de Bruijn <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-09-15xsk: add multi-buffer support for sockets sharing umemTirthendu Sarkar2-1/+4
Userspace applications indicate their multi-buffer capability to xsk using XSK_USE_SG socket bind flag. For sockets using shared umem the bind flag may contain XSK_USE_SG only for the first socket. For any subsequent socket the only option supported is XDP_SHARED_UMEM. Add option XDP_UMEM_SG_FLAG in umem config flags to store the multi-buffer handling capability when indicated by XSK_USE_SG option in bing flag by the first socket. Use this to derive multi-buffer capability for subsequent sockets in xsk core. Signed-off-by: Tirthendu Sarkar <[email protected]> Fixes: 81470b5c3c66 ("xsk: introduce XSK_USE_SG bind flag for xsk socket") Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-15netfilter, bpf: Adjust timeouts of non-confirmed CTs in bpf_ct_insert_entry()Ilya Leoshkevich1-0/+2
bpf_nf testcase fails on s390x: bpf_skb_ct_lookup() cannot find the entry that was added by bpf_ct_insert_entry() within the same BPF function. The reason is that this entry is deleted by nf_ct_gc_expired(). The CT timeout starts ticking after the CT confirmation; therefore nf_conn.timeout is initially set to the timeout value, and __nf_conntrack_confirm() sets it to the deadline value. bpf_ct_insert_entry() sets IPS_CONFIRMED_BIT, but does not adjust the timeout, making its value meaningless and causing false positives. Fix the problem by making bpf_ct_insert_entry() adjust the timeout, like __nf_conntrack_confirm(). Fixes: 2cdaa3eefed8 ("netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert()") Signed-off-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Cc: Florian Westphal <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-15Merge tag 'nf-23-09-13' of ↵David S. Miller5-34/+58
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf netfilter pull request 23-09-13 ==================== The following patchset contains Netfilter fixes for net: 1) Do not permit to remove rules from chain binding, otherwise double rule release is possible, triggering UaF. This rule deletion support does not make sense and userspace does not use this. Problem exists since the introduction of chain binding support. 2) rbtree GC worker only collects the elements that have expired. This operation is not destructive, therefore, turn write into read spinlock to avoid datapath contention due to GC worker run. This was not fixed in the recent GC fix batch in the 6.5 cycle. 3) pipapo set backend performs sync GC, therefore, catchall elements must use sync GC queue variant. This bug was introduced in the 6.5 cycle with the recent GC fixes. 4) Stop GC run if memory allocation fails in pipapo set backend, otherwise access to NULL pointer to GC transaction object might occur. This bug was introduced in the 6.5 cycle with the recent GC fixes. 5) rhash GC run uses an iterator that might hit EAGAIN to rewind, triggering double-collection of the same element. This bug was introduced in the 6.5 cycle with the recent GC fixes. 6) Do not permit to remove elements in anonymous sets, this type of sets are populated once and then bound to rules. This fix is similar to the chain binding patch coming first in this batch. API permits since the very beginning but it has no use case from userspace. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-09-15tcp: indent an if statementDan Carpenter1-1/+1
Indent this if statement one tab. Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15SUNRPC: Do not include crypto/algapi.hHerbert Xu2-3/+1
The header file crypto/algapi.h is for internal use only. Use the header file crypto/utils.h instead. Acked-by: Chuck Lever <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2023-09-15mptcp: Do not include crypto/algapi.hHerbert Xu1-1/+1
The header file crypto/algapi.h is for internal use only. Use the header file crypto/utils.h instead. Acked-by: Matthieu Baerts <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2023-09-15ah: Do not include crypto/algapi.hHerbert Xu2-2/+2
The header file crypto/algapi.h is for internal use only. Use the header file crypto/utils.h instead. Signed-off-by: Herbert Xu <[email protected]>
2023-09-15Bluetooth: Do not include crypto/algapi.hHerbert Xu1-2/+1
The header file crypto/algapi.h is for internal use only. Use the header file crypto/utils.h instead. Acked-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2023-09-15net/core: Fix ETH_P_1588 flow dissectorSasha Neftin1-1/+1
When a PTP ethernet raw frame with a size of more than 256 bytes followed by a 0xff pattern is sent to __skb_flow_dissect, nhoff value calculation is wrong. For example: hdr->message_length takes the wrong value (0xffff) and it does not replicate real header length. In this case, 'nhoff' value was overridden and the PTP header was badly dissected. This leads to a kernel crash. net/core: flow_dissector net/core flow dissector nhoff = 0x0000000e net/core flow dissector hdr->message_length = 0x0000ffff net/core flow dissector nhoff = 0x0001000d (u16 overflow) ... skb linear: 00000000: 00 a0 c9 00 00 00 00 a0 c9 00 00 00 88 skb frag: 00000000: f7 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Using the size of the ptp_header struct will allow the corrected calculation of the nhoff value. net/core flow dissector nhoff = 0x0000000e net/core flow dissector nhoff = 0x00000030 (sizeof ptp_header) ... skb linear: 00000000: 00 a0 c9 00 00 00 00 a0 c9 00 00 00 88 f7 ff ff skb linear: 00000010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff skb linear: 00000020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff skb frag: 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Kernel trace: [ 74.984279] ------------[ cut here ]------------ [ 74.989471] kernel BUG at include/linux/skbuff.h:2440! [ 74.995237] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 75.001098] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G U 5.15.85-intel-ese-standard-lts #1 [ 75.011629] Hardware name: Intel Corporation A-Island (CPU:AlderLake)/A-Island (ID:06), BIOS SB_ADLP.01.01.00.01.03.008.D-6A9D9E73-dirty Mar 30 2023 [ 75.026507] RIP: 0010:eth_type_trans+0xd0/0x130 [ 75.031594] Code: 03 88 47 78 eb c7 8b 47 68 2b 47 6c 48 8b 97 c0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb ab <0f> 0b b8 00 01 00 00 eb a2 48 85 ff 74 eb 48 8d 54 24 06 31 f6 b9 [ 75.052612] RSP: 0018:ffff9948c0228de0 EFLAGS: 00010297 [ 75.058473] RAX: 00000000000003f2 RBX: ffff8e47047dc300 RCX: 0000000000001003 [ 75.066462] RDX: ffff8e4e8c9ea040 RSI: ffff8e4704e0a000 RDI: ffff8e47047dc300 [ 75.074458] RBP: ffff8e4704e2acc0 R08: 00000000000003f3 R09: 0000000000000800 [ 75.082466] R10: 000000000000000d R11: ffff9948c0228dec R12: ffff8e4715e4e010 [ 75.090461] R13: ffff9948c0545018 R14: 0000000000000001 R15: 0000000000000800 [ 75.098464] FS: 0000000000000000(0000) GS:ffff8e4e8fb00000(0000) knlGS:0000000000000000 [ 75.107530] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 75.113982] CR2: 00007f5eb35934a0 CR3: 0000000150e0a002 CR4: 0000000000770ee0 [ 75.121980] PKRU: 55555554 [ 75.125035] Call Trace: [ 75.127792] <IRQ> [ 75.130063] ? eth_get_headlen+0xa4/0xc0 [ 75.134472] igc_process_skb_fields+0xcd/0x150 [ 75.139461] igc_poll+0xc80/0x17b0 [ 75.143272] __napi_poll+0x27/0x170 [ 75.147192] net_rx_action+0x234/0x280 [ 75.151409] __do_softirq+0xef/0x2f4 [ 75.155424] irq_exit_rcu+0xc7/0x110 [ 75.159432] common_interrupt+0xb8/0xd0 [ 75.163748] </IRQ> [ 75.166112] <TASK> [ 75.168473] asm_common_interrupt+0x22/0x40 [ 75.173175] RIP: 0010:cpuidle_enter_state+0xe2/0x350 [ 75.178749] Code: 85 c0 0f 8f 04 02 00 00 31 ff e8 39 6c 67 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 50 02 00 00 31 ff e8 52 b0 6d ff fb 45 85 f6 <0f> 88 b1 00 00 00 49 63 ce 4c 2b 2c 24 48 89 c8 48 6b d1 68 48 c1 [ 75.199757] RSP: 0018:ffff9948c013bea8 EFLAGS: 00000202 [ 75.205614] RAX: ffff8e4e8fb00000 RBX: ffffb948bfd23900 RCX: 000000000000001f [ 75.213619] RDX: 0000000000000004 RSI: ffffffff94206161 RDI: ffffffff94212e20 [ 75.221620] RBP: 0000000000000004 R08: 000000117568973a R09: 0000000000000001 [ 75.229622] R10: 000000000000afc8 R11: ffff8e4e8fb29ce4 R12: ffffffff945ae980 [ 75.237628] R13: 000000117568973a R14: 0000000000000004 R15: 0000000000000000 [ 75.245635] ? cpuidle_enter_state+0xc7/0x350 [ 75.250518] cpuidle_enter+0x29/0x40 [ 75.254539] do_idle+0x1d9/0x260 [ 75.258166] cpu_startup_entry+0x19/0x20 [ 75.262582] secondary_startup_64_no_verify+0xc2/0xcb [ 75.268259] </TASK> [ 75.270721] Modules linked in: 8021q snd_sof_pci_intel_tgl snd_sof_intel_hda_common tpm_crb snd_soc_hdac_hda snd_sof_intel_hda snd_hda_ext_core snd_sof_pci snd_sof snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress iTCO_wdt ac97_bus intel_pmc_bxt mei_hdcp iTCO_vendor_support snd_hda_codec_hdmi pmt_telemetry intel_pmc_core pmt_class snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg snd_hda_codec snd_hda_core kvm_intel snd_pcm snd_timer kvm snd mei_me soundcore tpm_tis irqbypass i2c_i801 mei tpm_tis_core pcspkr intel_rapl_msr tpm i2c_smbus intel_pmt thermal sch_fq_codel uio uhid i915 drm_buddy video drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm fuse configfs [ 75.342736] ---[ end trace 3785f9f360400e3a ]--- [ 75.347913] RIP: 0010:eth_type_trans+0xd0/0x130 [ 75.352984] Code: 03 88 47 78 eb c7 8b 47 68 2b 47 6c 48 8b 97 c0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb ab <0f> 0b b8 00 01 00 00 eb a2 48 85 ff 74 eb 48 8d 54 24 06 31 f6 b9 [ 75.373994] RSP: 0018:ffff9948c0228de0 EFLAGS: 00010297 [ 75.379860] RAX: 00000000000003f2 RBX: ffff8e47047dc300 RCX: 0000000000001003 [ 75.387856] RDX: ffff8e4e8c9ea040 RSI: ffff8e4704e0a000 RDI: ffff8e47047dc300 [ 75.395864] RBP: ffff8e4704e2acc0 R08: 00000000000003f3 R09: 0000000000000800 [ 75.403857] R10: 000000000000000d R11: ffff9948c0228dec R12: ffff8e4715e4e010 [ 75.411863] R13: ffff9948c0545018 R14: 0000000000000001 R15: 0000000000000800 [ 75.419875] FS: 0000000000000000(0000) GS:ffff8e4e8fb00000(0000) knlGS:0000000000000000 [ 75.428946] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 75.435403] CR2: 00007f5eb35934a0 CR3: 0000000150e0a002 CR4: 0000000000770ee0 [ 75.443410] PKRU: 55555554 [ 75.446477] Kernel panic - not syncing: Fatal exception in interrupt [ 75.453738] Kernel Offset: 0x11c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 75.465794] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: 4f1cc51f3488 ("net: flow_dissector: Parse PTP L2 packet header") Signed-off-by: Sasha Neftin <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_FLOWINFO_SEND implementationEric Dumazet11-21/+21
np->sndflow reads are racy. Use one bit ftom atomic inet->inet_flags instead, IPV6_FLOWINFO_SEND setsockopt() can be lockless. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_MTU_DISCOVER implementationEric Dumazet5-14/+13
Most np->pmtudisc reads are racy. Move this 3bit field on a full byte, add annotations and make IPV6_MTU_DISCOVER setsockopt() lockless. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_ROUTER_ALERT_ISOLATE implementationEric Dumazet2-9/+7
Reads from np->rtalert_isolate are racy. Move this flag to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: move np->repflow to atomic flagsEric Dumazet4-14/+13
Move np->repflow to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_RECVERR implemetationEric Dumazet8-27/+22
np->recverr is moved to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_DONTFRAG implementationEric Dumazet7-12/+11
Move np->dontfrag flag to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_AUTOFLOWLABEL implementationEric Dumazet2-13/+10
Move np->autoflowlabel and np->autoflowlabel_set in inet->inet_flags, to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_MULTICAST_ALL implementationEric Dumazet3-10/+8
Move np->mc_all to an atomic flags to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_RECVERR_RFC4884 implementationEric Dumazet2-10/+9
Move np->recverr_rfc4884 to an atomic flag to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_MINHOPCOUNT implementationEric Dumazet1-16/+15
Add one missing READ_ONCE() annotation in do_ipv6_getsockopt() and make IPV6_MINHOPCOUNT setsockopt() lockless. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_MTU implementationEric Dumazet2-16/+18
np->frag_size can be read/written without holding socket lock. Add missing annotations and make IPV6_MTU setsockopt() lockless. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_MULTICAST_HOPS implementationEric Dumazet4-16/+19
This fixes data-races around np->mcast_hops, and make IPV6_MULTICAST_HOPS lockless. Note that np->mcast_hops is never negative, thus can fit an u8 field instead of s16. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_MULTICAST_LOOP implementationEric Dumazet5-20/+14
Add inet6_{test|set|clear|assign}_bit() helpers. Note that I am using bits from inet->inet_flags, this might change in the future if we need more flags. While solving data-races accessing np->mc_loop, this patch also allows to implement lockless accesses to np->mcast_hops in the following patch. Also constify sk_mc_loop() argument. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_UNICAST_HOPS implementationEric Dumazet4-12/+14
Some np->hop_limit accesses are racy, when socket lock is not held. Add missing annotations and switch to full lockless implementation. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni3-25/+30
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14ipv6: mcast: Remove redundant comparison in igmp6_mcf_get_next()Gavrilov Ilia1-2/+0
The 'state->im' value will always be non-zero after the 'while' statement, so the check can be removed. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Gavrilov Ilia <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14ipv4: igmp: Remove redundant comparison in igmp_mcf_get_next()Gavrilov Ilia1-2/+0
The 'state->im' value will always be non-zero after the 'while' statement, so the check can be removed. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Gavrilov Ilia <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udplite: fix various data-racesEric Dumazet2-14/+16
udp->pcflag, udp->pcslen and udp->pcrlen reads/writes are racy. Move udp->pcflag to udp->udp_flags for atomicity, and add READ_ONCE()/WRITE_ONCE() annotations for pcslen and pcrlen. Fixes: ba4e58eca8aa ("[NET]: Supporting UDP-Lite (RFC 3828) in Linux") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udplite: remove UDPLITE_BITEric Dumazet2-2/+0
This flag is set but never read, we can remove it. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udp: annotate data-races around udp->encap_typeEric Dumazet5-13/+15
syzbot/KCSAN complained about UDP_ENCAP_L2TPINUDP setsockopt() racing. Add READ_ONCE()/WRITE_ONCE() to document races on this lockless field. syzbot report was: BUG: KCSAN: data-race in udp_lib_setsockopt / udp_lib_setsockopt read-write to 0xffff8881083603fa of 1 bytes by task 16557 on cpu 0: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read-write to 0xffff8881083603fa of 1 bytes by task 16554 on cpu 1: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x01 -> 0x05 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 16554 Comm: syz-executor.5 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GROEric Dumazet3-9/+5
Move udp->encap_enabled to udp->udp_flags. Add udp_test_and_set_bit() helper to allow lockless udp_tunnel_encap_enable() implementation. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udp: move udp->accept_udp_{l4|fraglist} to udp->udp_flagsEric Dumazet1-1/+1
These are read locklessly, move them to udp_flags to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udp: add missing WRITE_ONCE() around up->encap_rcvEric Dumazet1-2/+4
UDP_ENCAP_ESPINUDP_NON_IKE setsockopt() writes over up->encap_rcv while other cpus read it. Fixes: 067b207b281d ("[UDP]: Cleanup UDP encapsulation code") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udp: move udp->gro_enabled to udp->udp_flagsEric Dumazet3-6/+6
syzbot reported that udp->gro_enabled can be read locklessly. Use one atomic bit from udp->udp_flags. Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udp: move udp->no_check6_rx to udp->udp_flagsEric Dumazet2-5/+5
syzbot reported that udp->no_check6_rx can be read locklessly. Use one atomic bit from udp->udp_flags. Fixes: 1c19448c9ba6 ("net: Make enabling of zero UDP6 csums more restrictive") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udp: move udp->no_check6_tx to udp->udp_flagsEric Dumazet2-4/+4
syzbot reported that udp->no_check6_tx can be read locklessly. Use one atomic bit from udp->udp_flags Fixes: 1c19448c9ba6 ("net: Make enabling of zero UDP6 csums more restrictive") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udp: introduce udp->udp_flagsEric Dumazet2-9/+9
According to syzbot, it is time to use proper atomic flags for various UDP flags. Add udp_flags field, and convert udp->corkflag to first bit in it. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().Kuniyuki Iwashima1-7/+8
syzkaller found a memory leak in kcm_sendmsg(), and commit c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()") suppressed it by updating kcm_tx_msg(head)->last_skb if partial data is copied so that the following sendmsg() will resume from the skb. However, we cannot know how many bytes were copied when we get the error. Thus, we could mess up the MSG_MORE queue. When kcm_sendmsg() fails for SOCK_DGRAM, we should purge the queue as we do so for UDP by udp_flush_pending_frames(). Even without this change, when the error occurred, the following sendmsg() resumed from a wrong skb and the queue was messed up. However, we have yet to get such a report, and only syzkaller stumbled on it. So, this can be changed safely. Note this does not change SOCK_SEQPACKET behaviour. Fixes: c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()") Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14vsock: send SIGPIPE on write to shutdowned socketArseniy Krasnov1-0/+3
POSIX requires to send SIGPIPE on write to SOCK_STREAM socket which was shutdowned with SHUT_WR flag or its peer was shutdowned with SHUT_RD flag. Also we must not send SIGPIPE if MSG_NOSIGNAL flag is set. Signed-off-by: Arseniy Krasnov <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-13netfilter: nf_tables: Fix entries val in rule reset audit logPhil Sutter1-6/+10
The value in idx and the number of rules handled in that particular __nf_tables_dump_rules() call is not identical. The former is a cursor to pick up from if multiple netlink messages are needed, so its value is ever increasing. Fixing this is not just a matter of subtracting s_idx from it, though: When resetting rules in multiple chains, __nf_tables_dump_rules() is called for each and cb->args[0] is not adjusted in between. Introduce a dedicated counter to record the number of rules reset in this call in a less confusing way. While being at it, prevent the direct return upon buffer exhaustion: Any rules previously dumped into that skb would evade audit logging otherwise. Fixes: 9b5ba5c9c5109 ("netfilter: nf_tables: Unbreak audit log reset") Signed-off-by: Phil Sutter <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2023-09-13netfilter: conntrack: fix extension size tableFlorian Westphal1-2/+2
The size table is incorrect due to copypaste error, this reserves more size than needed. TSTAMP reserved 32 instead of 16 bytes. TIMEOUT reserved 16 instead of 8 bytes. Fixes: 5f31edc0676b ("netfilter: conntrack: move extension sizes into core") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2023-09-13NFSv4.1: fix pnfs MDS=DS session trunkingOlga Kornievskaia1-4/+7
Currently, when GETDEVICEINFO returns multiple locations where each is a different IP but the server's identity is same as MDS, then nfs4_set_ds_client() finds the existing nfs_client structure which has the MDS's max_connect value (and if it's 1), then the 1st IP on the DS's list will get dropped due to MDS trunking rules. Other IPs would be added as they fall under the pnfs trunking rules. For the list of IPs the 1st goes thru calling nfs4_set_ds_client() which will eventually call nfs4_add_trunk() and call into rpc_clnt_test_and_add_xprt() which has the check for MDS trunking. The other IPs (after the 1st one), would call rpc_clnt_add_xprt() which doesn't go thru that check. nfs4_add_trunk() is called when MDS trunking is happening and it needs to enforce the usage of max_connect mount option of the 1st mount. However, this shouldn't be applied to pnfs flow. Instead, this patch proposed to treat MDS=DS as DS trunking and make sure that MDS's max_connect limit does not apply to the 1st IP returned in the GETDEVICEINFO list. It does so by marking the newly created client with a new flag NFS_CS_PNFS which then used to pass max_connect value to use into the rpc_clnt_test_and_add_xprt() instead of the existing rpc client's max_connect value set by the MDS connection. For example, mount was done without max_connect value set so MDS's rpc client has cl_max_connect=1. Upon calling into rpc_clnt_test_and_add_xprt() and using rpc client's value, the caller passes in max_connect value which is previously been set in the pnfs path (as a part of handling GETDEVICEINFO list of IPs) in nfs4_set_ds_client(). However, when NFS_CS_PNFS flag is not set and we know we are doing MDS trunking, comparing a new IP of the same server, we then set the max_connect value to the existing MDS's value and pass that into rpc_clnt_test_and_add_xprt(). Fixes: dc48e0abee24 ("SUNRPC enforce creation of no more than max_connect xprts") Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2023-09-13Revert "SUNRPC: Fail faster on bad verifier"Trond Myklebust1-1/+1
This reverts commit 0701214cd6e66585a999b132eb72ae0489beb724. The premise of this commit was incorrect. There are exactly 2 cases where rpcauth_checkverf() will return an error: 1) If there was an XDR decode problem (i.e. garbage data). 2) If gss_validate() had a problem verifying the RPCSEC_GSS MIC. In the second case, there are again 2 subcases: a) The GSS context expires, in which case gss_validate() will force a new context negotiation on retry by invalidating the cred. b) The sequence number check failed because an RPC call timed out, and the client retransmitted the request using a new sequence number, as required by RFC2203. In neither subcase is this a fatal error. Reported-by: Russell Cattelan <[email protected]> Fixes: 0701214cd6e6 ("SUNRPC: Fail faster on bad verifier") Cc: [email protected] Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2023-09-13SUNRPC: Mark the cred for revalidation if the server rejects itTrond Myklebust1-0/+1
If the server rejects the credential as being stale, or bad, then we should mark it for revalidation before retransmitting. Fixes: 7f5667a5f8c4 ("SUNRPC: Clean up rpc_verify_header()") Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>