aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-06-20net: usb: rtl8150 fix unintiatilzed variables in rtl8150_get_link_ksettingsOliver Neukum1-1/+2
This functions retrieves values by passing a pointer. As the function that retrieves them can fail before touching the pointers, the variables must be initialized. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: [email protected] Signed-off-by: Oliver Neukum <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-20selftests: virtio_net: add forgotten config optionsJiri Pirko1-1/+7
One may use tools/testing/selftests/drivers/net/virtio_net/config for example for vng build command like this one: $ vng -v -b -f tools/testing/selftests/drivers/net/virtio_net/config In that case, the needed kernel config options are not turned on. Add the missed kernel config options. Reported-by: Jakub Kicinski <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Reported-by: Matthieu Baerts <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Fixes: ccfaed04db5e ("selftests: virtio_net: add initial tests") Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Xuan Zhuo <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-20Merge branch 'bnxt_en-bug-fixes-for-net'Jakub Kicinski3-136/+184
Michael Chan says: ==================== bnxt_en: Bug fixes for net The first firmware interface update is needed by the second patch to limit the number of TSO segments on the 5760X chips. The third patch fixes the TX error path for PTP packets. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-20bnxt_en: Restore PTP tx_avail count in case of skb_pad() errorPavan Chebbi1-3/+2
The current code only restores PTP tx_avail count when we get DMA mapping errors. Fix it so that the PTP tx_avail count will be restored for both DMA mapping errors and skb_pad() errors. Otherwise PTP TX timestamp will not be available after a PTP packet hits the skb_pad() error. Fixes: 83bb623c968e ("bnxt_en: Transmit and retrieve packet timestamps") Reviewed-by: Andy Gospodarek <[email protected]> Signed-off-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-20bnxt_en: Set TSO max segs on devices with limitsMichael Chan2-0/+4
Firmware will now advertise a non-zero TSO max segments if the device has a limit. 0 means no limit. The latest 5760X chip (early revs) has a limit of 2047 that cannot be exceeded. If exceeded, the chip will send out just a small number of segments. Call netif_set_tso_max_segs() if the device has a limit. Fixes: 2012a6abc876 ("bnxt_en: Add 5760X (P7) PCI IDs") Reviewed-by: Ajit Khaparde <[email protected]> Reviewed-by: Somnath Kotur <[email protected]> Signed-off-by: Michael Chan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-20bnxt_en: Update firmware interface to 1.10.3.44Michael Chan1-133/+178
The relevant change is the max_tso_segs value returned by firmware in the HWRM_FUNC_QCAPS response. This value will be used in the next patch to cap the TSO segments. Signed-off-by: Michael Chan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-20net: stmmac: Assign configured channel value to EXTTS eventOleksij Rempel1-1/+5
Assign the configured channel value to the EXTTS event in the timestamp interrupt handler. Without assigning the correct channel, applications like ts2phc will refuse to accept the event, resulting in errors such as: ... ts2phc[656.834]: config item end1.ts2phc.pin_index is 0 ts2phc[656.834]: config item end1.ts2phc.channel is 3 ts2phc[656.834]: config item end1.ts2phc.extts_polarity is 2 ts2phc[656.834]: config item end1.ts2phc.extts_correction is 0 ... ts2phc[656.862]: extts on unexpected channel ts2phc[658.141]: extts on unexpected channel ts2phc[659.140]: extts on unexpected channel Fixes: f4da56529da60 ("net: stmmac: Add support for external trigger timestamping") Cc: [email protected] Signed-off-by: Oleksij Rempel <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-06-20Merge tag 'nf-24-06-19' of ↵Paolo Abeni11-26/+776
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: Patch #1 fixes the suspicious RCU usage warning that resulted from the recent fix for the race between namespace cleanup and gc in ipset left out checking the pernet exit phase when calling rcu_dereference_protected(), from Jozsef Kadlecsik. Patch #2 fixes incorrect input and output netdevice in SRv6 prerouting hooks, from Jianguo Wu. Patch #3 moves nf_hooks_lwtunnel sysctl toggle to the netfilter core. The connection tracking system is loaded on-demand, this ensures availability of this knob regardless. Patch #4-#5 adds selftests for SRv6 netfilter hooks also from Jianguo Wu. netfilter pull request 24-06-19 * tag 'nf-24-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: add selftest for the SRv6 End.DX6 behavior with netfilter selftests: add selftest for the SRv6 End.DX4 behavior with netfilter netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core seg6: fix parameter passing when calling NF_HOOK() in End.DX4 and End.DX6 behaviors netfilter: ipset: Fix suspicious rcu_dereference_protected() ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-06-20net: do not leave a dangling sk pointer, when socket creation failsIgnat Korchagin1-0/+3
It is possible to trigger a use-after-free by: * attaching an fentry probe to __sock_release() and the probe calling the bpf_get_socket_cookie() helper * running traceroute -I 1.1.1.1 on a freshly booted VM A KASAN enabled kernel will log something like below (decoded and stripped): ================================================================== BUG: KASAN: slab-use-after-free in __sock_gen_cookie (./arch/x86/include/asm/atomic64_64.h:15 ./include/linux/atomic/atomic-arch-fallback.h:2583 ./include/linux/atomic/atomic-instrumented.h:1611 net/core/sock_diag.c:29) Read of size 8 at addr ffff888007110dd8 by task traceroute/299 CPU: 2 PID: 299 Comm: traceroute Tainted: G E 6.10.0-rc2+ #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1)) print_report (mm/kasan/report.c:378 mm/kasan/report.c:488) ? __sock_gen_cookie (./arch/x86/include/asm/atomic64_64.h:15 ./include/linux/atomic/atomic-arch-fallback.h:2583 ./include/linux/atomic/atomic-instrumented.h:1611 net/core/sock_diag.c:29) kasan_report (mm/kasan/report.c:603) ? __sock_gen_cookie (./arch/x86/include/asm/atomic64_64.h:15 ./include/linux/atomic/atomic-arch-fallback.h:2583 ./include/linux/atomic/atomic-instrumented.h:1611 net/core/sock_diag.c:29) kasan_check_range (mm/kasan/generic.c:183 mm/kasan/generic.c:189) __sock_gen_cookie (./arch/x86/include/asm/atomic64_64.h:15 ./include/linux/atomic/atomic-arch-fallback.h:2583 ./include/linux/atomic/atomic-instrumented.h:1611 net/core/sock_diag.c:29) bpf_get_socket_ptr_cookie (./arch/x86/include/asm/preempt.h:94 ./include/linux/sock_diag.h:42 net/core/filter.c:5094 net/core/filter.c:5092) bpf_prog_875642cf11f1d139___sock_release+0x6e/0x8e bpf_trampoline_6442506592+0x47/0xaf __sock_release (net/socket.c:652) __sock_create (net/socket.c:1601) ... Allocated by task 299 on cpu 2 at 78.328492s: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:68) __kasan_slab_alloc (mm/kasan/common.c:312 mm/kasan/common.c:338) kmem_cache_alloc_noprof (mm/slub.c:3941 mm/slub.c:4000 mm/slub.c:4007) sk_prot_alloc (net/core/sock.c:2075) sk_alloc (net/core/sock.c:2134) inet_create (net/ipv4/af_inet.c:327 net/ipv4/af_inet.c:252) __sock_create (net/socket.c:1572) __sys_socket (net/socket.c:1660 net/socket.c:1644 net/socket.c:1706) __x64_sys_socket (net/socket.c:1718) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Freed by task 299 on cpu 2 at 78.328502s: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:68) kasan_save_free_info (mm/kasan/generic.c:582) poison_slab_object (mm/kasan/common.c:242) __kasan_slab_free (mm/kasan/common.c:256) kmem_cache_free (mm/slub.c:4437 mm/slub.c:4511) __sk_destruct (net/core/sock.c:2117 net/core/sock.c:2208) inet_create (net/ipv4/af_inet.c:397 net/ipv4/af_inet.c:252) __sock_create (net/socket.c:1572) __sys_socket (net/socket.c:1660 net/socket.c:1644 net/socket.c:1706) __x64_sys_socket (net/socket.c:1718) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Fix this by clearing the struct socket reference in sk_common_release() to cover all protocol families create functions, which may already attached the reference to the sk object with sock_init_data(). Fixes: c5dbb89fc2ac ("bpf: Expose bpf_get_socket_cookie to tracing programs") Suggested-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Ignat Korchagin <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/netdev/[email protected]/T/ Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: D. Wythe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-06-19net/tcp_ao: Don't leak ao_info on error-pathDmitry Safonov1-2/+4
It seems I introduced it together with TCP_AO_CMDF_AO_REQUIRED, on version 5 [1] of TCP-AO patches. Quite frustrative that having all these selftests that I've written, running kmemtest & kcov was always in todo. [1]: https://lore.kernel.org/netdev/[email protected]/ Reported-by: Jakub Kicinski <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Fixes: 0aadc73995d0 ("net/tcp: Prevent TCP-MD5 with TCP-AO being set") Cc: [email protected] Signed-off-by: Dmitry Safonov <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-19ice: Fix VSI list rule with ICE_SW_LKUP_LAST typeMarcin Szycik1-2/+4
Adding/updating VSI list rule, as well as allocating/freeing VSI list resource are called several times with type ICE_SW_LKUP_LAST, which fails because ice_update_vsi_list_rule() and ice_aq_alloc_free_vsi_list() consider it invalid. Allow calling these functions with ICE_SW_LKUP_LAST. This fixes at least one issue in switchdev mode, where the same rule with different action cannot be added, e.g.: tc filter add dev $PF1 ingress protocol arp prio 0 flower skip_sw \ dst_mac ff:ff:ff:ff:ff:ff action mirred egress redirect dev $VF1_PR tc filter add dev $PF1 ingress protocol arp prio 0 flower skip_sw \ dst_mac ff:ff:ff:ff:ff:ff action mirred egress redirect dev $VF2_PR Fixes: 0f94570d0cae ("ice: allow adding advanced rules") Suggested-by: Michal Swiatkowski <[email protected]> Reviewed-by: Michal Swiatkowski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Marcin Szycik <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-19ipv6: bring NLM_DONE out to a separate recv() againJakub Kicinski1-1/+2
Commit under Fixes optimized the number of recv() calls needed during RTM_GETROUTE dumps, but we got multiple reports of applications hanging on recv() calls. Applications expect that a route dump will be terminated with a recv() reading an individual NLM_DONE message. Coalescing NLM_DONE is perfectly legal in netlink, but even tho reporters fixed the code in respective projects, chances are it will take time for those applications to get updated. So revert to old behavior (for now)? This is an IPv6 version of commit 460b0d33cf10 ("inet: bring NLM_DONE out to a separate recv() again"). Reported-by: Maciej Żenczykowski <[email protected]> Link: https://lore.kernel.org/all/CANP3RGc1RG71oPEBXNx_WZFP9AyphJefdO4paczN92n__ds4ow@mail.gmail.com Reported-by: Stefano Brivio <[email protected]> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth Reported-by: Ilya Maximets <[email protected]> Link: https://lore.kernel.org/all/[email protected] Fixes: 5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()") Tested-by: Ilya Maximets <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-19selftests: add selftest for the SRv6 End.DX6 behavior with netfilterJianguo Wu3-0/+342
this selftest is designed for evaluating the SRv6 End.DX6 behavior used with netfilter(rpfilter), in this example, for implementing IPv6 L3 VPN use cases. Signed-off-by: Jianguo Wu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-06-19selftests: add selftest for the SRv6 End.DX4 behavior with netfilterJianguo Wu3-0/+337
this selftest is designed for evaluating the SRv6 End.DX4 behavior used with netfilter(rpfilter), in this example, for implementing IPv4 L3 VPN use cases. Signed-off-by: Jianguo Wu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-06-19netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter coreJianguo Wu5-17/+87
Currently, the sysctl net.netfilter.nf_hooks_lwtunnel depends on the nf_conntrack module, but the nf_conntrack module is not always loaded. Therefore, accessing net.netfilter.nf_hooks_lwtunnel may have an error. Move sysctl nf_hooks_lwtunnel into the netfilter core. Fixes: 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane") Suggested-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Jianguo Wu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-06-19seg6: fix parameter passing when calling NF_HOOK() in End.DX4 and End.DX6 ↵Jianguo Wu1-4/+4
behaviors input_action_end_dx4() and input_action_end_dx6() are called NF_HOOK() for PREROUTING hook, in PREROUTING hook, we should passing a valid indev, and a NULL outdev to NF_HOOK(), otherwise may trigger a NULL pointer dereference, as below: [74830.647293] BUG: kernel NULL pointer dereference, address: 0000000000000090 [74830.655633] #PF: supervisor read access in kernel mode [74830.657888] #PF: error_code(0x0000) - not-present page [74830.659500] PGD 0 P4D 0 [74830.660450] Oops: 0000 [#1] PREEMPT SMP PTI ... [74830.664953] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [74830.666569] RIP: 0010:rpfilter_mt+0x44/0x15e [ipt_rpfilter] ... [74830.689725] Call Trace: [74830.690402] <IRQ> [74830.690953] ? show_trace_log_lvl+0x1c4/0x2df [74830.692020] ? show_trace_log_lvl+0x1c4/0x2df [74830.693095] ? ipt_do_table+0x286/0x710 [ip_tables] [74830.694275] ? __die_body.cold+0x8/0xd [74830.695205] ? page_fault_oops+0xac/0x140 [74830.696244] ? exc_page_fault+0x62/0x150 [74830.697225] ? asm_exc_page_fault+0x22/0x30 [74830.698344] ? rpfilter_mt+0x44/0x15e [ipt_rpfilter] [74830.699540] ipt_do_table+0x286/0x710 [ip_tables] [74830.700758] ? ip6_route_input+0x19d/0x240 [74830.701752] nf_hook_slow+0x3f/0xb0 [74830.702678] input_action_end_dx4+0x19b/0x1e0 [74830.703735] ? input_action_end_t+0xe0/0xe0 [74830.704734] seg6_local_input_core+0x2d/0x60 [74830.705782] lwtunnel_input+0x5b/0xb0 [74830.706690] __netif_receive_skb_one_core+0x63/0xa0 [74830.707825] process_backlog+0x99/0x140 [74830.709538] __napi_poll+0x2c/0x160 [74830.710673] net_rx_action+0x296/0x350 [74830.711860] __do_softirq+0xcb/0x2ac [74830.713049] do_softirq+0x63/0x90 input_action_end_dx4() passing a NULL indev to NF_HOOK(), and finally trigger a NULL dereference in rpfilter_mt()->rpfilter_is_loopback(): static bool rpfilter_is_loopback(const struct sk_buff *skb, const struct net_device *in) { // in is NULL return skb->pkt_type == PACKET_LOOPBACK || in->flags & IFF_LOOPBACK; } Fixes: 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane") Signed-off-by: Jianguo Wu <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-06-19netfilter: ipset: Fix suspicious rcu_dereference_protected()Jozsef Kadlecsik1-5/+6
When destroying all sets, we are either in pernet exit phase or are executing a "destroy all sets command" from userspace. The latter was taken into account in ip_set_dereference() (nfnetlink mutex is held), but the former was not. The patch adds the required check to rcu_dereference_protected() in ip_set_dereference(). Fixes: 4e7aaa6b82d6 ("netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type") Reported-by: [email protected] Reported-by: [email protected] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Jozsef Kadlecsik <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-06-19selftests: openvswitch: Set value to nla flags.Adrian Moreno1-1/+1
Netlink flags, although they don't have payload at the netlink level, are represented as having "True" as value in pyroute2. Without it, trying to add a flow with a flag-type action (e.g: pop_vlan) fails with the following traceback: Traceback (most recent call last): File "[...]/ovs-dpctl.py", line 2498, in <module> sys.exit(main(sys.argv)) ^^^^^^^^^^^^^^ File "[...]/ovs-dpctl.py", line 2487, in main ovsflow.add_flow(rep["dpifindex"], flow) File "[...]/ovs-dpctl.py", line 2136, in add_flow reply = self.nlm_request( ^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/nlsocket.py", line 822, in nlm_request return tuple(self._genlm_request(*argv, **kwarg)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/generic/__init__.py", line 126, in nlm_request return tuple(super().nlm_request(*argv, **kwarg)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/nlsocket.py", line 1124, in nlm_request self.put(msg, msg_type, msg_flags, msg_seq=msg_seq) File "[...]/pyroute2/netlink/nlsocket.py", line 389, in put self.sendto_gate(msg, addr) File "[...]/pyroute2/netlink/nlsocket.py", line 1056, in sendto_gate msg.encode() File "[...]/pyroute2/netlink/__init__.py", line 1245, in encode offset = self.encode_nlas(offset) ^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/__init__.py", line 1560, in encode_nlas nla_instance.setvalue(cell[1]) File "[...]/pyroute2/netlink/__init__.py", line 1265, in setvalue nlv.setvalue(nla_tuple[1]) ~~~~~~~~~^^^ IndexError: list index out of range Signed-off-by: Adrian Moreno <[email protected]> Acked-by: Aaron Conole <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-19octeontx2-pf: Fix linking objects into multiple modulesGeetha sowjanya3-2/+10
This patch fixes the below build warning messages that are caused due to linking same files to multiple modules by exporting the required symbols. "scripts/Makefile.build:244: drivers/net/ethernet/marvell/octeontx2/nic/Makefile: otx2_devlink.o is added to multiple modules: rvu_nicpf rvu_nicvf scripts/Makefile.build:244: drivers/net/ethernet/marvell/octeontx2/nic/Makefile: otx2_dcbnl.o is added to multiple modules: rvu_nicpf rvu_nicvf" Fixes: 8e67558177f8 ("octeontx2-pf: PFC config support with DCBx"). Signed-off-by: Geetha sowjanya <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-19octeontx2-pf: Add error handling to VLAN unoffload handlingSimon Horman1-1/+4
otx2_sq_append_skb makes used of __vlan_hwaccel_push_inside() to unoffload VLANs - push them from skb meta data into skb data. However, it omitts a check for __vlan_hwaccel_push_inside() returning NULL. Found by inspection based on [1] and [2]. Compile tested only. [1] Re: [PATCH net-next v1] net: stmmac: Enable TSO on VLANs https://lore.kernel.org/all/[email protected]/ [2] Re: [PATCH net-next v2] net: stmmac: Enable TSO on VLANs https://lore.kernel.org/all/CANn89i+11L5=tKsa7V7Aeyxaj6nYGRwy35PAbCRYJ73G+b25sg@mail.gmail.com/ Fixes: fd9d7859db6c ("octeontx2-pf: Implement ingress/egress VLAN offload") Signed-off-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-19Merge branch 'virtio_net-csum-xdp-fixes'David S. Miller1-3/+29
Heng Qi says: ==================== virtio_net: fixes for checksum offloading and XDP handling This series of patches aim to address two specific issues identified in the virtio_net driver related to checksum offloading and XDP processing of fully checksummed packets. The first patch corrects the handling of checksum offloading in the driver. The second patch addresses an issue where the XDP program had no trouble with fully checksummed packets. ==================== Signed-off-by: David S. Miller <[email protected]>
2024-06-19virtio_net: fixing XDP for fully checksummed packets handlingHeng Qi1-1/+19
The XDP program can't correctly handle partially checksummed packets, but works fine with fully checksummed packets. If the device has already validated fully checksummed packets, then the driver doesn't need to re-validate them, saving CPU resources. Additionally, the driver does not drop all partially checksummed packets when VIRTIO_NET_F_GUEST_CSUM is not negotiated. This is not a bug, as the driver has always done this. Fixes: 436c9453a1ac ("virtio-net: keep vnet header zeroed after processing XDP") Signed-off-by: Heng Qi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-19virtio_net: checksum offloading handling fixHeng Qi1-2/+10
In virtio spec 0.95, VIRTIO_NET_F_GUEST_CSUM was designed to handle partially checksummed packets, and the validation of fully checksummed packets by the device is independent of VIRTIO_NET_F_GUEST_CSUM negotiation. However, the specification erroneously stated: "If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device MUST set flags to zero and SHOULD supply a fully checksummed packet to the driver." This statement is inaccurate because even without VIRTIO_NET_F_GUEST_CSUM negotiation, the device can still set the VIRTIO_NET_HDR_F_DATA_VALID flag. Essentially, the device can facilitate the validation of these packets' checksums - a process known as RX checksum offloading - removing the need for the driver to do so. This scenario is currently not implemented in the driver and requires correction. The necessary specification correction[1] has been made and approved in the virtio TC vote. [1] https://lists.oasis-open.org/archives/virtio-comment/202401/msg00011.html Fixes: 4f49129be6fa ("virtio-net: Set RXCSUM feature if GUEST_CSUM is available") Signed-off-by: Heng Qi <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Acked-by: Jason Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-19net: usb: ax88179_178a: improve reset checkJose Ignacio Tornos Martinez1-5/+13
After ecf848eb934b ("net: usb: ax88179_178a: fix link status when link is set to down/up") to not reset from usbnet_open after the reset from usbnet_probe at initialization stage to speed up this, some issues have been reported. It seems to happen that if the initialization is slower, and some time passes between the probe operation and the open operation, the second reset from open is necessary too to have the device working. The reason is that if there is no activity with the phy, this is "disconnected". In order to improve this, the solution is to detect when the phy is "disconnected", and we can use the phy status register for this. So we will only reset the device from reset operation in this situation, that is, only if necessary. The same bahavior is happening when the device is stopped (link set to down) and later is restarted (link set to up), so if the phy keeps working we only need to enable the mac again, but if enough time passes between the device stop and restart, reset is necessary, and we can detect the situation checking the phy status register too. cc: [email protected] # 6.6+ Fixes: ecf848eb934b ("net: usb: ax88179_178a: fix link status when link is set to down/up") Reported-by: Yongqin Liu <[email protected]> Reported-by: Antje Miederhöfer <[email protected]> Reported-by: Arne Fitzenreiter <[email protected]> Tested-by: Yongqin Liu <[email protected]> Tested-by: Antje Miederhöfer <[email protected]> Signed-off-by: Jose Ignacio Tornos Martinez <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-18net: stmmac: No need to calculate speed divider when offload is disabledXiaolei Wang1-18/+22
commit be27b8965297 ("net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters") introduced a problem. When deleting, it prompts "Invalid portTransmitRate 0 (idleSlope - sendSlope)" and exits. Add judgment on cbs.enable. Only when offload is enabled, speed divider needs to be calculated. Fixes: be27b8965297 ("net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters") Signed-off-by: Xiaolei Wang <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18net: phy: dp83tg720: get master/slave configuration in link down stateOleksij Rempel1-3/+21
Get master/slave configuration for initial system start with the link in down state. This ensures ethtool shows current configuration. Also fixes link reconfiguration with ethtool while in down state, preventing ethtool from displaying outdated configuration. Even though dp83tg720_config_init() is executed periodically as long as the link is in admin up state but no carrier is detected, this is not sufficient for the link in admin down state where dp83tg720_read_status() is not periodically executed. To cover this case, we need an extra read role configuration in dp83tg720_config_aneg(). Fixes: cb80ee2f9bee1 ("net: phy: Add support for the DP83TG720S Ethernet PHY") Cc: [email protected] Signed-off-by: Oleksij Rempel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18net: phy: dp83tg720: wake up PHYs in managed modeOleksij Rempel1-3/+15
In case this PHY is bootstrapped for managed mode, we need to manually wake it. Otherwise no link will be detected. Cc: [email protected] Fixes: cb80ee2f9bee1 ("net: phy: Add support for the DP83TG720S Ethernet PHY") Signed-off-by: Oleksij Rempel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18selftests: openvswitch: Use bash as interpreterSimon Horman1-1/+1
openvswitch.sh makes use of substitutions of the form ${ns:0:1}, to obtain the first character of $ns. Empirically, this is works with bash but not dash. When run with dash these evaluate to an empty string and printing an error to stdout. # dash -c 'ns=client; echo "${ns:0:1}"' 2>error # cat error dash: 1: Bad substitution # bash -c 'ns=client; echo "${ns:0:1}"' 2>error c # cat error This leads to tests that neither pass nor fail. F.e. TEST: arp_ping [START] adding sandbox 'test_arp_ping' Adding DP/Bridge IF: sbx:test_arp_ping dp:arpping {, , } create namespaces ./openvswitch.sh: 282: eval: Bad substitution TEST: ct_connect_v4 [START] adding sandbox 'test_ct_connect_v4' Adding DP/Bridge IF: sbx:test_ct_connect_v4 dp:ct4 {, , } ./openvswitch.sh: 322: eval: Bad substitution create namespaces Resolve this by making openvswitch.sh a bash script. Fixes: 918423fda910 ("selftests: openvswitch: add an initial flow programming case") Signed-off-by: Simon Horman <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18ptp: fix integer overflow in max_vclocks_storeDan Carpenter1-2/+1
On 32bit systems, the "4 * max" multiply can overflow. Use kcalloc() to do the allocation to prevent this. Fixes: 44c494c8e30e ("ptp: track available ptp vclocks information") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Heng Qi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-18sched: act_ct: add netns into the key of tcf_ct_flow_tableXin Long1-5/+11
zones_ht is a global hashtable for flow_table with zone as key. However, it does not consider netns when getting a flow_table from zones_ht in tcf_ct_init(), and it means an act_ct action in netns A may get a flow_table that belongs to netns B if it has the same zone value. In Shuang's test with the TOPO: tcf2_c <---> tcf2_sw1 <---> tcf2_sw2 <---> tcf2_s tcf2_sw1 and tcf2_sw2 saw the same flow and used the same flow table, which caused their ct entries entering unexpected states and the TCP connection not able to end normally. This patch fixes the issue simply by adding netns into the key of tcf_ct_flow_table so that an act_ct action gets a flow_table that belongs to its own netns in tcf_ct_init(). Note that for easy coding we don't use tcf_ct_flow_table.nf_ft.net, as the ct_ft is initialized after inserting it to the hashtable in tcf_ct_flow_table_get() and also it requires to implement several functions in rhashtable_params including hashfn, obj_hashfn and obj_cmpfn. Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to flow table") Reported-by: Shuang Li <[email protected]> Signed-off-by: Xin Long <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/1db5b6cc6902c5fc6f8c6cbd85494a2008087be5.1718488050.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <[email protected]>
2024-06-18tipc: force a dst refcount before doing decryptionXin Long1-0/+1
As it says in commit 3bc07321ccc2 ("xfrm: Force a dst refcount before entering the xfrm type handlers"): "Crypto requests might return asynchronous. In this case we leave the rcu protected region, so force a refcount on the skb's destination entry before we enter the xfrm type input/output handlers." On TIPC decryption path it has the same problem, and skb_dst_force() should be called before doing decryption to avoid a possible crash. Shuang reported this issue when this warning is triggered: [] WARNING: include/net/dst.h:337 tipc_sk_rcv+0x1055/0x1ea0 [tipc] [] Kdump: loaded Tainted: G W --------- - - 4.18.0-496.el8.x86_64+debug [] Workqueue: crypto cryptd_queue_worker [] RIP: 0010:tipc_sk_rcv+0x1055/0x1ea0 [tipc] [] Call Trace: [] tipc_sk_mcast_rcv+0x548/0xea0 [tipc] [] tipc_rcv+0xcf5/0x1060 [tipc] [] tipc_aead_decrypt_done+0x215/0x2e0 [tipc] [] cryptd_aead_crypt+0xdb/0x190 [] cryptd_queue_worker+0xed/0x190 [] process_one_work+0x93d/0x17e0 Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Reported-by: Shuang Li <[email protected]> Signed-off-by: Xin Long <[email protected]> Link: https://lore.kernel.org/r/fbe3195fad6997a4eec62d9bf076b2ad03ac336b.1718476040.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <[email protected]>
2024-06-18net/sched: act_api: fix possible infinite loop in tcf_idr_check_alloc()David Ruth1-2/+1
syzbot found hanging tasks waiting on rtnl_lock [1] A reproducer is available in the syzbot bug. When a request to add multiple actions with the same index is sent, the second request will block forever on the first request. This holds rtnl_lock, and causes tasks to hang. Return -EAGAIN to prevent infinite looping, while keeping documented behavior. [1] INFO: task kworker/1:0:5088 blocked for more than 143 seconds. Not tainted 6.9.0-rc4-syzkaller-00173-g3cdb45594619 #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/1:0 state:D stack:23744 pid:5088 tgid:5088 ppid:2 flags:0x00004000 Workqueue: events_power_efficient reg_check_chans_work Call Trace: <TASK> context_switch kernel/sched/core.c:5409 [inline] __schedule+0xf15/0x5d00 kernel/sched/core.c:6746 __schedule_loop kernel/sched/core.c:6823 [inline] schedule+0xe7/0x350 kernel/sched/core.c:6838 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6895 __mutex_lock_common kernel/locking/mutex.c:684 [inline] __mutex_lock+0x5b8/0x9c0 kernel/locking/mutex.c:752 wiphy_lock include/net/cfg80211.h:5953 [inline] reg_leave_invalid_chans net/wireless/reg.c:2466 [inline] reg_check_chans_work+0x10a/0x10e0 net/wireless/reg.c:2481 Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=b87c222546179f4513a7 Signed-off-by: David Ruth <[email protected]> Reviewed-by: Jamal Hadi Salim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-06-18Merge branch 'net-lan743x-fixes-for-multiple-wol-related-issues'Paolo Abeni4-32/+146
Raju Lakkaraju says: ==================== net: lan743x: Fixes for multiple WOL related issues This patch series implement the following fixes: 1. Disable WOL upon resume in order to restore full data path operation 2. Support WOL at both the PHY and MAC appropriately 3. Remove interrupt mask clearing from config_init Patch-3 was sent seperately earlier. Review comments in link: https://lore.kernel.org/lkml/[email protected]/T/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-06-18net: phy: mxl-gpy: Remove interrupt mask clearing from config_initRaju Lakkaraju1-20/+38
When the system resumes from sleep, the phy_init_hw() function invokes config_init(), which clears all interrupt masks and causes wake events to be lost in subsequent wake sequences. Remove interrupt mask clearing from config_init() and preserve relevant masks in config_intr(). Fixes: 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: Raju Lakkaraju <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-06-18net: lan743x: Support WOL at both the PHY and MAC appropriatelyRaju Lakkaraju3-8/+58
Prevent options not supported by the PHY from being requested to it by the MAC Whenever a WOL option is supported by both, the PHY is given priority since that usually leads to better power savings. Fixes: e9e13b6adc33 ("lan743x: fix for potential NULL pointer dereference with bare card") Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: Raju Lakkaraju <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-06-18net: lan743x: disable WOL upon resume to restore full data path operationRaju Lakkaraju2-4/+50
When Wake-on-LAN (WoL) is active and the system is in suspend mode, triggering a system event can wake the system from sleep, which may block the data path. To restore normal data path functionality after waking, disable all wake-up events. Furthermore, clear all Write 1 to Clear (W1C) status bits by writing 1's to them. Fixes: 4d94282afd95 ("lan743x: Add power management support") Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: Raju Lakkaraju <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-06-18qca_spi: Make interrupt remembering atomicStefan Wahren3-14/+11
The whole mechanism to remember occurred SPI interrupts is not atomic, which could lead to unexpected behavior. So fix this by using atomic bit operations instead. Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-06-18netns: Make get_net_ns() handle zero refcount netYue Haibing1-2/+7
Syzkaller hit a warning: refcount_t: addition on 0; use-after-free. WARNING: CPU: 3 PID: 7890 at lib/refcount.c:25 refcount_warn_saturate+0xdf/0x1d0 Modules linked in: CPU: 3 PID: 7890 Comm: tun Not tainted 6.10.0-rc3-00100-gcaa4f9578aba-dirty #310 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:refcount_warn_saturate+0xdf/0x1d0 Code: 41 49 04 31 ff 89 de e8 9f 1e cd fe 84 db 75 9c e8 76 26 cd fe c6 05 b6 41 49 04 01 90 48 c7 c7 b8 8e 25 86 e8 d2 05 b5 fe 90 <0f> 0b 90 90 e9 79 ff ff ff e8 53 26 cd fe 0f b6 1 RSP: 0018:ffff8881067b7da0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff811c72ac RDX: ffff8881026a2140 RSI: ffffffff811c72b5 RDI: 0000000000000001 RBP: ffff8881067b7db0 R08: 0000000000000000 R09: 205b5d3730353139 R10: 0000000000000000 R11: 205d303938375420 R12: ffff8881086500c4 R13: ffff8881086500c4 R14: ffff8881086500b0 R15: ffff888108650040 FS: 00007f5b2961a4c0(0000) GS:ffff88823bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055d7ed36fd18 CR3: 00000001482f6000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? show_regs+0xa3/0xc0 ? __warn+0xa5/0x1c0 ? refcount_warn_saturate+0xdf/0x1d0 ? report_bug+0x1fc/0x2d0 ? refcount_warn_saturate+0xdf/0x1d0 ? handle_bug+0xa1/0x110 ? exc_invalid_op+0x3c/0xb0 ? asm_exc_invalid_op+0x1f/0x30 ? __warn_printk+0xcc/0x140 ? __warn_printk+0xd5/0x140 ? refcount_warn_saturate+0xdf/0x1d0 get_net_ns+0xa4/0xc0 ? __pfx_get_net_ns+0x10/0x10 open_related_ns+0x5a/0x130 __tun_chr_ioctl+0x1616/0x2370 ? __sanitizer_cov_trace_switch+0x58/0xa0 ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30 ? __pfx_tun_chr_ioctl+0x10/0x10 tun_chr_ioctl+0x2f/0x40 __x64_sys_ioctl+0x11b/0x160 x64_sys_call+0x1211/0x20d0 do_syscall_64+0x9e/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5b28f165d7 Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 8 RSP: 002b:00007ffc2b59c5e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5b28f165d7 RDX: 0000000000000000 RSI: 00000000000054e3 RDI: 0000000000000003 RBP: 00007ffc2b59c650 R08: 00007f5b291ed8c0 R09: 00007f5b2961a4c0 R10: 0000000029690010 R11: 0000000000000246 R12: 0000000000400730 R13: 00007ffc2b59cf40 R14: 0000000000000000 R15: 0000000000000000 </TASK> Kernel panic - not syncing: kernel: panic_on_warn set ... This is trigger as below: ns0 ns1 tun_set_iff() //dev is tun0 tun->dev = dev //ip link set tun0 netns ns1 put_net() //ref is 0 __tun_chr_ioctl() //TUNGETDEVNETNS net = dev_net(tun->dev); open_related_ns(&net->ns, get_net_ns); //ns1 get_net_ns() get_net() //addition on 0 Use maybe_get_net() in get_net_ns in case net's ref is zero to fix this Fixes: 0c3e0e3bb623 ("tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device") Signed-off-by: Yue Haibing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-06-17xfrm6: check ip6_dst_idev() return value in xfrm6_get_saddr()Eric Dumazet1-1/+7
ip6_dst_idev() can return NULL, xfrm6_get_saddr() must act accordingly. syzbot reported: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 PID: 12 Comm: kworker/u8:1 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker RIP: 0010:xfrm6_get_saddr+0x93/0x130 net/ipv6/xfrm6_policy.c:64 Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 97 00 00 00 4c 8b ab d8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 86 00 00 00 4d 8b 6d 00 e8 ca 13 47 01 48 b8 00 RSP: 0018:ffffc90000117378 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88807b079dc0 RCX: ffffffff89a0d6d7 RDX: 0000000000000000 RSI: ffffffff89a0d6e9 RDI: ffff88807b079e98 RBP: ffff88807ad73248 R08: 0000000000000007 R09: fffffffffffff000 R10: ffff88807b079dc0 R11: 0000000000000007 R12: ffffc90000117480 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4586d00440 CR3: 0000000079042000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> xfrm_get_saddr net/xfrm/xfrm_policy.c:2452 [inline] xfrm_tmpl_resolve_one net/xfrm/xfrm_policy.c:2481 [inline] xfrm_tmpl_resolve+0xa26/0xf10 net/xfrm/xfrm_policy.c:2541 xfrm_resolve_and_create_bundle+0x140/0x2570 net/xfrm/xfrm_policy.c:2835 xfrm_bundle_lookup net/xfrm/xfrm_policy.c:3070 [inline] xfrm_lookup_with_ifid+0x4d1/0x1e60 net/xfrm/xfrm_policy.c:3201 xfrm_lookup net/xfrm/xfrm_policy.c:3298 [inline] xfrm_lookup_route+0x3b/0x200 net/xfrm/xfrm_policy.c:3309 ip6_dst_lookup_flow+0x15c/0x1d0 net/ipv6/ip6_output.c:1256 send6+0x611/0xd20 drivers/net/wireguard/socket.c:139 wg_socket_send_skb_to_peer+0xf9/0x220 drivers/net/wireguard/socket.c:178 wg_socket_send_buffer_to_peer+0x12b/0x190 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation+0x227/0x360 drivers/net/wireguard/send.c:40 wg_packet_handshake_send_worker+0x1c/0x30 drivers/net/wireguard/send.c:51 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 process_scheduled_works kernel/workqueue.c:3312 [inline] worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-17ipv6: prevent possible NULL dereference in rt6_probe()Eric Dumazet1-0/+2
syzbot caught a NULL dereference in rt6_probe() [1] Bail out if __in6_dev_get() returns NULL. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cb: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000658-0x000000000000065f] CPU: 1 PID: 22444 Comm: syz-executor.0 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 RIP: 0010:rt6_probe net/ipv6/route.c:656 [inline] RIP: 0010:find_match+0x8c4/0xf50 net/ipv6/route.c:758 Code: 14 fd f7 48 8b 85 38 ff ff ff 48 c7 45 b0 00 00 00 00 48 8d b8 5c 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 19 RSP: 0018:ffffc900034af070 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90004521000 RDX: 00000000000000cb RSI: ffffffff8990d0cd RDI: 000000000000065c RBP: ffffc900034af150 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000002 R12: 000000000000000a R13: 1ffff92000695e18 R14: ffff8880244a1d20 R15: 0000000000000000 FS: 00007f4844a5a6c0(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b31b27000 CR3: 000000002d42c000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rt6_nh_find_match+0xfa/0x1a0 net/ipv6/route.c:784 nexthop_for_each_fib6_nh+0x26d/0x4a0 net/ipv4/nexthop.c:1496 __find_rr_leaf+0x6e7/0xe00 net/ipv6/route.c:825 find_rr_leaf net/ipv6/route.c:853 [inline] rt6_select net/ipv6/route.c:897 [inline] fib6_table_lookup+0x57e/0xa30 net/ipv6/route.c:2195 ip6_pol_route+0x1cd/0x1150 net/ipv6/route.c:2231 pol_lookup_func include/net/ip6_fib.h:616 [inline] fib6_rule_lookup+0x386/0x720 net/ipv6/fib6_rules.c:121 ip6_route_output_flags_noref net/ipv6/route.c:2639 [inline] ip6_route_output_flags+0x1d0/0x640 net/ipv6/route.c:2651 ip6_dst_lookup_tail.constprop.0+0x961/0x1760 net/ipv6/ip6_output.c:1147 ip6_dst_lookup_flow+0x99/0x1d0 net/ipv6/ip6_output.c:1250 rawv6_sendmsg+0xdab/0x4340 net/ipv6/raw.c:898 inet_sendmsg+0x119/0x140 net/ipv4/af_inet.c:853 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_write_iter+0x4b8/0x5c0 net/socket.c:1160 new_sync_write fs/read_write.c:497 [inline] vfs_write+0x6b6/0x1140 fs/read_write.c:590 ksys_write+0x1f8/0x260 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 52e1635631b3 ("[IPV6]: ROUTE: Add router_probe_interval sysctl.") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jason Xing <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-17ipv6: prevent possible NULL deref in fib6_nh_init()Eric Dumazet1-1/+1
syzbot reminds us that in6_dev_get() can return NULL. fib6_nh_init() ip6_validate_gw( &idev ) ip6_route_check_nh( idev ) *idev = in6_dev_get(dev); // can be NULL Oops: general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7] CPU: 0 PID: 11237 Comm: syz-executor.3 Not tainted 6.10.0-rc2-syzkaller-00249-gbe27b8965297 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024 RIP: 0010:fib6_nh_init+0x640/0x2160 net/ipv6/route.c:3606 Code: 00 00 fc ff df 4c 8b 64 24 58 48 8b 44 24 28 4c 8b 74 24 30 48 89 c1 48 89 44 24 28 48 8d 98 e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 0f 85 b3 17 00 00 8b 1b 31 ff 89 de e8 b8 8b RSP: 0018:ffffc900032775a0 EFLAGS: 00010202 RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000000000 RDX: 0000000000000010 RSI: ffffc90003277a54 RDI: ffff88802b3a08d8 RBP: ffffc900032778b0 R08: 00000000000002fc R09: 0000000000000000 R10: 00000000000002fc R11: 0000000000000000 R12: ffff88802b3a08b8 R13: 1ffff9200064eec8 R14: ffffc90003277a00 R15: dffffc0000000000 FS: 00007f940feb06c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000245e8000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ip6_route_info_create+0x99e/0x12b0 net/ipv6/route.c:3809 ip6_route_add+0x28/0x160 net/ipv6/route.c:3853 ipv6_route_ioctl+0x588/0x870 net/ipv6/route.c:4483 inet6_ioctl+0x21a/0x280 net/ipv6/af_inet6.c:579 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f940f07cea9 Fixes: 428604fb118f ("ipv6: do not set routes if disable_ipv6 has been enabled") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Lorenzo Bianconi <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-17selftests: mptcp: userspace_pm: fixed subtest namesMatthieu Baerts (NGI0)1-18/+28
It is important to have fixed (sub)test names in TAP, because these names are used to identify them. If they are not fixed, tracking cannot be done. Some subtests from the userspace_pm selftest were using random numbers in their names: the client and server address IDs from $RANDOM, and the client port number randomly picked by the kernel when creating the connection. These values have been replaced by 'client' and 'server' words: that's even more helpful than showing random numbers. Note that the addresses IDs are incremented and decremented in the test: +1 or -1 are then displayed in these cases. Not to loose info that can be useful for debugging in case of issues, these random numbers are now displayed at the beginning of the test. Fixes: f589234e1af0 ("selftests: mptcp: userspace_pm: format subtests results in TAP") Cc: [email protected] Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/20240614-upstream-net-20240614-selftests-mptcp-uspace-pm-fixed-test-names-v1-1-460ad3edb429@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-17tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()Eric Dumazet1-0/+1
Some applications were reporting ETIMEDOUT errors on apparently good looking flows, according to packet dumps. We were able to root cause the issue to an accidental setting of tp->retrans_stamp in the following scenario: - client sends TFO SYN with data. - server has TFO disabled, ACKs only SYN but not payload. - client receives SYNACK covering only SYN. - tcp_ack() eats SYN and sets tp->retrans_stamp to 0. - tcp_rcv_fastopen_synack() calls tcp_xmit_retransmit_queue() to retransmit TFO payload w/o SYN, sets tp->retrans_stamp to "now", but we are not in any loss recovery state. - TFO payload is ACKed. - we are not in any loss recovery state, and don't see any dupacks, so we don't get to any code path that clears tp->retrans_stamp. - tp->retrans_stamp stays non-zero for the lifetime of the connection. - after first RTO, tcp_clamp_rto_to_user_timeout() clamps second RTO to 1 jiffy due to bogus tp->retrans_stamp. - on clamped RTO with non-zero icsk_retransmits, retransmits_timed_out() sets start_ts from tp->retrans_stamp from TFO payload retransmit hours/days ago, and computes bogus long elapsed time for loss recovery, and suffers ETIMEDOUT early. Fixes: a7abf3cd76e1 ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()") CC: [email protected] Co-developed-by: Neal Cardwell <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Co-developed-by: Yuchung Cheng <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-17netrom: Fix a memory leak in nr_heartbeat_expiry()Gavrilov Ilia1-1/+2
syzbot reported a memory leak in nr_create() [0]. Commit 409db27e3a2e ("netrom: Fix use-after-free of a listening socket.") added sock_hold() to the nr_heartbeat_expiry() function, where a) a socket has a SOCK_DESTROY flag or b) a listening socket has a SOCK_DEAD flag. But in the case "a," when the SOCK_DESTROY flag is set, the file descriptor has already been closed and the nr_release() function has been called. So it makes no sense to hold the reference count because no one will call another nr_destroy_socket() and put it as in the case "b." nr_connect nr_establish_data_link nr_start_heartbeat nr_release switch (nr->state) case NR_STATE_3 nr->state = NR_STATE_2 sock_set_flag(sk, SOCK_DESTROY); nr_rx_frame nr_process_rx_frame switch (nr->state) case NR_STATE_2 nr_state2_machine() nr_disconnect() nr_sk(sk)->state = NR_STATE_0 sock_set_flag(sk, SOCK_DEAD) nr_heartbeat_expiry switch (nr->state) case NR_STATE_0 if (sock_flag(sk, SOCK_DESTROY) || (sk->sk_state == TCP_LISTEN && sock_flag(sk, SOCK_DEAD))) sock_hold() // ( !!! ) nr_destroy_socket() To fix the memory leak, let's call sock_hold() only for a listening socket. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with Syzkaller. [0]: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16 Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16 Fixes: 409db27e3a2e ("netrom: Fix use-after-free of a listening socket.") Signed-off-by: Gavrilov Ilia <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-14Merge branch '40GbE' of ↵Jakub Kicinski2-3/+30
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-06-11 (ice) This series contains updates to ice driver only. En-Wei Wu resolves IRQ collision during suspend. Paul corrects 200Gbps speed being reported as unknown. Wojciech adds retry mechanism when package download fails. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: implement AQ download pkg retry ice: fix 200G link speed message log ice: avoid IRQ collision to fix init failure on ACPI S3 resume ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-14netdev-genl: fix error codes when outputting XDP featuresJakub Kicinski1-8/+8
-EINVAL will interrupt the dump. The correct error to return if we have more data to dump is -EMSGSIZE. Discovered by doing: for i in `seq 80`; do ip link add type veth; done ./cli.py --dbg-small-recv 5300 --spec netdev.yaml --dump dev-get >> /dev/null [...] nl_len = 64 (48) nl_flags = 0x0 nl_type = 19 nl_len = 20 (4) nl_flags = 0x2 nl_type = 3 error: -22 Fixes: d3d854fd6a1d ("netdev-genl: create a simple family for netdev stuff") Reviewed-by: Amritha Nambiar <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-14Merge tag 'for-netdev' of ↵Jakub Kicinski9-11/+92
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-06-14 We've added 8 non-merge commits during the last 2 day(s) which contain a total of 9 files changed, 92 insertions(+), 11 deletions(-). The main changes are: 1) Silence a syzkaller splat under CONFIG_DEBUG_NET=y in pskb_pull_reason() triggered via __bpf_try_make_writable(), from Florian Westphal. 2) Fix removal of kfuncs during linking phase which then throws a kernel build warning via resolve_btfids about unresolved symbols, from Tony Ambardar. 3) Fix a UML x86_64 compilation failure from BPF as pcpu_hot symbol is not available on User Mode Linux, from Maciej Żenczykowski. 4) Fix a register corruption in reg_set_min_max triggering an invariant violation in BPF verifier, from Daniel Borkmann. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Harden __bpf_kfunc tag against linker kfunc removal compiler_types.h: Define __retain for __attribute__((__retain__)) bpf: Avoid splat in pskb_pull_reason bpf: fix UML x86_64 compile failure selftests/bpf: Add test coverage for reg_set_min_max handling bpf: Reduce stack consumption in check_stack_write_fixed_off bpf: Fix reg_set_min_max corruption of fake_reg MAINTAINERS: mailmap: Update Stanislav's email address ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-14bpf: Harden __bpf_kfunc tag against linker kfunc removalTony Ambardar1-1/+1
BPF kfuncs are often not directly referenced and may be inadvertently removed by optimization steps during kernel builds, thus the __bpf_kfunc tag mitigates against this removal by including the __used macro. However, this macro alone does not prevent removal during linking, and may still yield build warnings (e.g. on mips64el): [...] LD vmlinux BTFIDS vmlinux WARN: resolve_btfids: unresolved symbol bpf_verify_pkcs7_signature WARN: resolve_btfids: unresolved symbol bpf_lookup_user_key WARN: resolve_btfids: unresolved symbol bpf_lookup_system_key WARN: resolve_btfids: unresolved symbol bpf_key_put WARN: resolve_btfids: unresolved symbol bpf_iter_task_next WARN: resolve_btfids: unresolved symbol bpf_iter_css_task_new WARN: resolve_btfids: unresolved symbol bpf_get_file_xattr WARN: resolve_btfids: unresolved symbol bpf_ct_insert_entry WARN: resolve_btfids: unresolved symbol bpf_cgroup_release WARN: resolve_btfids: unresolved symbol bpf_cgroup_from_id WARN: resolve_btfids: unresolved symbol bpf_cgroup_acquire WARN: resolve_btfids: unresolved symbol bpf_arena_free_pages NM System.map SORTTAB vmlinux OBJCOPY vmlinux.32 [...] Update the __bpf_kfunc tag to better guard against linker optimization by including the new __retain compiler macro, which fixes the warnings above. Verify the __retain macro with readelf by checking object flags for 'R': $ readelf -Wa kernel/trace/bpf_trace.o Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al [...] [178] .text.bpf_key_put PROGBITS 00000000 6420 0050 00 AXR 0 0 8 [...] Key to Flags: [...] R (retain), D (mbind), p (processor specific) Fixes: 57e7c169cd6a ("bpf: Add __bpf_kfunc tag for marking kernel functions as kfuncs") Reported-by: kernel test robot <[email protected]> Signed-off-by: Tony Ambardar <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Jiri Olsa <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Cc: Yonghong Song <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Link: https://lore.kernel.org/bpf/ZlmGoT9KiYLZd91S@krava/T/ Link: https://lore.kernel.org/bpf/e9c64e9b5c073dabd457ff45128aabcab7630098.1717477560.git.Tony.Ambardar@gmail.com
2024-06-14compiler_types.h: Define __retain for __attribute__((__retain__))Tony Ambardar1-0/+23
Some code includes the __used macro to prevent functions and data from being optimized out. This macro implements __attribute__((__used__)), which operates at the compiler and IR-level, and so still allows a linker to remove objects intended to be kept. Compilers supporting __attribute__((__retain__)) can address this gap by setting the flag SHF_GNU_RETAIN on the section of a function/variable, indicating to the linker the object should be retained. This attribute is available since gcc 11, clang 13, and binutils 2.36. Provide a __retain macro implementing __attribute__((__retain__)), whose first user will be the '__bpf_kfunc' tag. [ Additional remark from discussion: Why is CONFIG_LTO_CLANG added here? The __used macro permits garbage collection at section level, so CLANG_LTO_CLANG without CONFIG_LD_DEAD_CODE_DATA_ELIMINATION should not change final section dynamics? The conditional guard was included to ensure consistent behaviour between __retain and other features forcing split sections. In particular, the same guard is used in vmlinux.lds.h to merge split sections where needed. For example, using __retain in LLVM builds without CONFIG_LTO was failing CI tests on kernel-patches/bpf because the kernel didn't boot properly. And in further testing, the kernel had no issues loading BPF kfunc modules with such split sections, so the module (partial) linking scripts were left alone. ] Signed-off-by: Tony Ambardar <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Cc: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/ZlmGoT9KiYLZd91S@krava/T/ Link: https://lore.kernel.org/bpf/b31bca5a5e6765a0f32cc8c19b1d9cdbfaa822b5.1717477560.git.Tony.Ambardar@gmail.com
2024-06-14bpf: Avoid splat in pskb_pull_reasonFlorian Westphal1-0/+5
syzkaller builds (CONFIG_DEBUG_NET=y) frequently trigger a debug hint in pskb_may_pull. We'd like to retain this debug check because it might hint at integer overflows and other issues (kernel code should pull headers, not huge value). In bpf case, this splat isn't interesting at all: such (nonsensical) bpf programs are typically generated by a fuzzer anyway. Do what Eric suggested and suppress such warning. For CONFIG_DEBUG_NET=n we don't need the extra check because pskb_may_pull will do the right thing: return an error without the WARN() backtrace. Fixes: 219eee9c0d16 ("net: skbuff: add overflow debug check to pull/push helpers") Reported-by: [email protected] Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Closes: https://syzkaller.appspot.com/bug?extid=0c4150bff9fff3bf023c Link: https://lore.kernel.org/netdev/[email protected]/ Link: https://lore.kernel.org/bpf/[email protected]