path: root/include/net
Age | Commit message | Author | Files | Lines
2023-10-18 | net: treat possible_net_t net pointer as an RCU one and add read_pnet_rcu() | Jiri Pirko | 1 | -3/+12
Make the net pointer stored in possible_net_t structure annotated as an RCU pointer. Change the access helpers to treat it as such. Introduce read_pnet_rcu() helper to allow caller to dereference the net pointer under RCU read lock. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-17 | Merge tag 'ipsec-2023-10-17' of ↵ | Jakub Kicinski | 1 | -0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2023-10-17 1) Fix a slab-use-after-free in xfrm_policy_inexact_list_reinsert. From Dong Chenchen. 2) Fix data-races in the xfrm interfaces dev->stats fields. From Eric Dumazet. 3) Fix a data-race in xfrm_gen_index. From Eric Dumazet. 4) Fix an inet6_dev refcount underflow. From Zhang Changzhong. 5) Check the return value of pskb_trim in esp_remove_trailer for esp4 and esp6. From Ma Ke. 6) Fix a data-race in xfrm_lookup_with_ifid. From Eric Dumazet. * tag 'ipsec-2023-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix a data-race in xfrm_lookup_with_ifid() net: ipv4: fix return value check in esp_remove_trailer net: ipv6: fix return value check in esp_remove_trailer xfrm6: fix inet6_dev refcount underflow problem xfrm: fix a data-race in xfrm_gen_index() xfrm: interface: use DEV_STATS_INC() net: xfrm: skip policies marked as dead while reinserting policies ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-17 | tcp: fix excessive TLP and RACK timeouts from HZ rounding | Neal Cardwell | 1 | -0/+3
We discovered from packet traces of slow loss recovery on kernels with the default HZ=250 setting (and min_rtt < 1ms) that after reordering, when receiving a SACKed sequence range, the RACK reordering timer was firing after about 16ms rather than the desired value of roughly min_rtt/4 + 2ms. The problem is largely due to the RACK reorder timer calculation adding in TCP_TIMEOUT_MIN, which is 2 jiffies. On kernels with HZ=250, this is 2*4ms = 8ms. The TLP timer calculation has the exact same issue. This commit fixes the TLP transmit timer and RACK reordering timer floor calculation to more closely match the intended 2ms floor even on kernels with HZ=250. It does this by adding in a new TCP_TIMEOUT_MIN_US floor of 2000 us and then converting to jiffies, instead of the current approach of converting to jiffies and then adding the TCP_TIMEOUT_MIN value of 2 jiffies. Our testing has verified that on kernels with HZ=1000, as expected, this does not produce significant changes in behavior, but on kernels with the default HZ=250 the latency improvement can be large. For example, our tests show that for HZ=250 kernels at low RTTs this fix roughly halves the latency for the RACK reorder timer: instead of mostly firing at 16ms it mostly fires at 8ms. Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Fixes: bb4d991a28cc ("tcp: adjust tail loss probe timeout") Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
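The rounding effect described in this message can be reproduced with plain integer arithmetic. The following is a minimal userspace sketch, not the kernel code: `sketch_usecs_to_jiffies()` is a hypothetical round-up conversion, and both timeout helpers are names invented here for illustration.

```c
#define SKETCH_USEC_PER_SEC 1000000UL

/* Hypothetical stand-in: convert microseconds to jiffies, rounding up. */
static unsigned long sketch_usecs_to_jiffies(unsigned long us, unsigned long hz)
{
	unsigned long us_per_jiffy = SKETCH_USEC_PER_SEC / hz;

	return (us + us_per_jiffy - 1) / us_per_jiffy;
}

/* Old approach: convert to jiffies first, then add a 2-jiffy floor. */
static unsigned long sketch_timeout_old(unsigned long us, unsigned long hz)
{
	return sketch_usecs_to_jiffies(us, hz) + 2;
}

/* New approach: add a 2000 us floor first, then convert to jiffies. */
static unsigned long sketch_timeout_new(unsigned long us, unsigned long hz)
{
	return sketch_usecs_to_jiffies(us + 2000, hz);
}
```

Under these assumptions, a 100 us base value at HZ=250 gives 3 jiffies (12 ms) with the old ordering and 1 jiffy (4 ms) with the new one, while at HZ=1000 both orderings give 3 jiffies. The firing times observed in the commit message are larger because a timer armed for N jiffies can expire up to one jiffy later.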
2023-10-17 | Merge tag 'wireless-next-2023-10-16' of ↵ | Jakub Kicinski | 1 | -0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.7 The second pull request for v6.7, with only driver changes this time. We have now support for mt7925 PCIe and USB variants, few new features and of course some fixes. Major changes: mt76 - mt7925 support ath12k - read board data variant name from SMBIOS wfx - Remain-On-Channel (ROC) support * tag 'wireless-next-2023-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (109 commits) wifi: rtw89: mac: do bf_monitor only if WiFi 6 chips wifi: rtw89: mac: set bf_assoc capabilities according to chip gen wifi: rtw89: mac: set bfee_ctrl() according to chip gen wifi: rtw89: mac: add registers of MU-EDCA parameters for WiFi 7 chips wifi: rtw89: mac: generalize register of MU-EDCA switch according to chip gen wifi: rtw89: mac: update RTS threshold according to chip gen wifi: rtlwifi: simplify TX command fill callbacks wifi: hostap: remove unused ioctl function wifi: atmel: remove unused ioctl function wifi: rtw89: coex: add annotation __counted_by() to struct rtw89_btc_btf_set_mon_reg wifi: rtw89: coex: add annotation __counted_by() for struct rtw89_btc_btf_set_slot_table wifi: rtw89: add EHT radiotap in monitor mode wifi: rtw89: show EHT rate in debugfs wifi: rtw89: parse TX EHT rate selected by firmware from RA C2H report wifi: rtw89: Add EHT rate mask as parameters of RA H2C command wifi: rtw89: parse EHT information from RX descriptor and PPDU status packet wifi: radiotap: add bandwidth definition of EHT U-SIG wifi: rtlwifi: use convenient list_count_nodes() wifi: p54: Annotate struct p54_cal_database with __counted_by wifi: brcmfmac: fweh: Add __counted_by for struct brcmf_fweh_queue_item and use struct_size() ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-17 | net, bpf: Add a warning if NAPI cb missed xdp_do_flush(). | Sebastian Andrzej Siewior | 1 | -0/+9
A few drivers were missing a xdp_do_flush() invocation after XDP_REDIRECT. Add three helper functions, one for each of the per-CPU lists. Each returns true if its per-CPU list is non-empty and flushes the list. Add xdp_do_check_flushed() which invokes each helper function and creates a warning if one of the functions had a non-empty list. Hide everything behind CONFIG_DEBUG_NET. Suggested-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-10-16 | Merge tag 'for-netdev' of ↵ | Jakub Kicinski | 1 | -0/+5
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-10-16 We've added 90 non-merge commits during the last 25 day(s) which contain a total of 120 files changed, 3519 insertions(+), 895 deletions(-). The main changes are: 1) Add missed stats for kprobes to retrieve the number of missed kprobe executions and subsequent executions of BPF programs, from Jiri Olsa. 2) Add cgroup BPF sockaddr hooks for unix sockets. The use case is for systemd to reimplement the LogNamespace feature which allows running multiple instances of systemd-journald to process the logs of different services, from Daan De Meyer. 3) Implement BPF CPUv4 support for s390x BPF JIT, from Ilya Leoshkevich. 4) Improve BPF verifier log output for scalar registers to better disambiguate their internal state wrt defaults vs min/max values matching, from Andrii Nakryiko. 5) Extend the BPF fib lookup helpers for IPv4/IPv6 to support retrieving the source IP address with a new BPF_FIB_LOOKUP_SRC flag, from Martynas Pumputis. 6) Add support for open-coded task_vma iterator to help with symbolization for BPF-collected user stacks, from Dave Marchevsky. 7) Add libbpf getters for accessing individual BPF ring buffers which is useful for polling them individually, for example, from Martin Kelly. 8) Extend AF_XDP selftests to validate the SHARED_UMEM feature, from Tushar Vyavahare. 9) Improve BPF selftests cross-building support for riscv arch, from Björn Töpel. 10) Add the ability to pin a BPF timer to the same calling CPU, from David Vernet. 11) Fix libbpf's bpf_tracing.h macros for riscv to use the generic implementation of PT_REGS_SYSCALL_REGS() to access syscall arguments, from Alexandre Ghiti. 12) Extend libbpf to support symbol versioning for uprobes, from Hengqi Chen. 13) Fix bpftool's skeleton code generation to guarantee that ELF data is 8 byte aligned, from Ian Rogers. 
14) Inherit system-wide cpu_mitigations_off() setting for Spectre v1/v4 security mitigations in BPF verifier, from Yafang Shao. 15) Annotate struct bpf_stack_map with __counted_by attribute to prepare BPF side for upcoming __counted_by compiler support, from Kees Cook. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (90 commits) bpf: Ensure proper register state printing for cond jumps bpf: Disambiguate SCALAR register state output in verifier logs selftests/bpf: Make align selftests more robust selftests/bpf: Improve missed_kprobe_recursion test robustness selftests/bpf: Improve percpu_alloc test robustness selftests/bpf: Add tests for open-coded task_vma iter bpf: Introduce task_vma open-coded iterator kfuncs selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c bpf: Don't explicitly emit BTF for struct btf_iter_num bpf: Change syscall_nr type to int in struct syscall_tp_t net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set bpf: Avoid unnecessary audit log for CPU security mitigations selftests/bpf: Add tests for cgroup unix socket address hooks selftests/bpf: Make sure mount directory exists documentation/bpf: Document cgroup unix socket address hooks bpftool: Add support for cgroup unix socket address hooks libbpf: Add support for cgroup unix socket address hooks bpf: Implement cgroup sockaddr hooks for unix sockets bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf bpf: Propagate modified uaddrlen from cgroup sockaddr programs ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-16 | page_pool: fragment API support for 32-bit arch with 64-bit DMA | Yunsheng Lin | 1 | -6/+14
Currently page_pool_alloc_frag() is not supported in 32-bit arch with 64-bit DMA because of the overlap issue between pp_frag_count and dma_addr_upper in 'struct page' for those arches, which seems to be quite common, see [1], which means driver may need to handle it when using fragment API. It is assumed that the combination of the above arch with an address space >16TB does not exist, as all those arches have 64b equivalent, it seems logical to use the 64b version for a system with a large address space. It is also assumed that dma address is page aligned when we are dma mapping a page aligned buffer, see [2]. That means we're storing 12 bits of 0 at the lower end for a dma address, we can reuse those bits for the above arches to support 32b+12b, which is 16TB of memory. If we make a wrong assumption, a warning is emitted so that user can report to us. 1. https://lore.kernel.org/all/[email protected]/ 2. https://lore.kernel.org/all/[email protected]/ Tested-by: Alexander Lobakin <[email protected]> Signed-off-by: Yunsheng Lin <[email protected]> CC: Lorenzo Bianconi <[email protected]> CC: Alexander Duyck <[email protected]> CC: Liang Chen <[email protected]> CC: Guillaume Tucker <[email protected]> CC: Matthew Wilcox <[email protected]> CC: Linux-MM <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
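The 32b+12b idea above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the page_pool implementation: `struct sketch_page`, both helper names, and the 4 KiB page size are hypothetical, and the `false` return stands in for the warning the message mentions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Hypothetical stand-in for the 32-bit slot available in struct page. */
struct sketch_page {
	uint32_t dma_addr;
};

/*
 * Store a page-aligned 64-bit DMA address in 32 bits by dropping the
 * 12 zero bits at the bottom. Returns false if the address is not page
 * aligned or does not fit in 32 + 12 = 44 bits (16 TB).
 */
static bool sketch_set_dma_addr(struct sketch_page *page, uint64_t addr)
{
	if (addr & ((1ULL << SKETCH_PAGE_SHIFT) - 1))
		return false;
	if ((addr >> SKETCH_PAGE_SHIFT) > UINT32_MAX)
		return false;
	page->dma_addr = (uint32_t)(addr >> SKETCH_PAGE_SHIFT);
	return true;
}

/* Recover the full address by shifting the stored value back up. */
static uint64_t sketch_get_dma_addr(const struct sketch_page *page)
{
	return (uint64_t)page->dma_addr << SKETCH_PAGE_SHIFT;
}
```

The two rejected cases correspond to the two assumptions in the message: page alignment of the mapped address, and an address space no larger than 16 TB.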
2023-10-16 | Merge tag 'for-net-2023-10-13' of ↵ | Jakub Kicinski | 1 | -1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix race when opening vhci device - Avoid memcmp() out of bounds warning - Correctly bounds check and pad HCI_MON_NEW_INDEX name - Fix using memcmp when comparing keys - Ignore error return for hci_devcd_register() in btrtl - Always check if connection is alive before deleting - Fix a refcnt underflow problem for hci_conn * tag 'for-net-2023-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_sock: Correctly bounds check and pad HCI_MON_NEW_INDEX name Bluetooth: avoid memcmp() out of bounds warning Bluetooth: hci_sock: fix slab oob read in create_monitor_event Bluetooth: btrtl: Ignore error return for hci_devcd_register() Bluetooth: hci_event: Fix coding style Bluetooth: hci_event: Fix using memcmp when comparing keys Bluetooth: Fix a refcnt underflow problem for hci_conn Bluetooth: hci_sync: always check if connection is alive before deleting Bluetooth: Reject connection with the device which has same BD_ADDR Bluetooth: hci_event: Ignore NULL link key Bluetooth: ISO: Fix invalid context error Bluetooth: vhci: Fix race when opening vhci device ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-16 | net: stub tcp_gro_complete if CONFIG_INET=n | Jacob Keller | 1 | -0/+4
A few networking drivers including bnx2x, bnxt, qede, and idpf call tcp_gro_complete as part of offloading TCP GRO. The function is only defined if CONFIG_INET is true, since it is TCP-specific and is meaningless if the kernel lacks IP networking support. The combination of trying to use the complex network drivers with CONFIG_NET but not CONFIG_INET is rather unlikely in practice: most use cases are going to need IP networking. The tcp_gro_complete function just sets some data in the socket buffer for use in processing the TCP packet in the event that the GRO was offloaded to the device. If the kernel lacks TCP support, such setup will simply go unused. The bnx2x, bnxt, and qede drivers wrap their TCP offload support in CONFIG_INET checks and skip handling on such kernels. The idpf driver did not check CONFIG_INET and thus fails to link if the kernel is configured with CONFIG_NET=y, CONFIG_IDPF=(m|y), and CONFIG_INET=n. While checking CONFIG_INET does allow the driver to bypass significantly more instructions in the event that we know TCP networking isn't supported, the configuration is unlikely to be used widely. Rather than require driver authors to care about this, stub the tcp_gro_complete function when CONFIG_INET=n. This allows drivers to be left as-is. It does mean the idpf driver will perform slightly more work than strictly necessary when CONFIG_INET=n, since it will still execute some of the skb setup in idpf_rx_rsc. However, that work would be performed in the case where CONFIG_INET=y anyways. I did not change the existing drivers, since they appear to wrap a significant portion of code when CONFIG_INET=n. There is little benefit in trashing these drivers just to unwrap and remove the CONFIG_INET check. Using a stub for tcp_gro_complete is still beneficial, as it means future drivers no longer need to worry about this case of CONFIG_NET=y and CONFIG_INET=n, which should reduce noise from buildbots that check such a configuration. 
Signed-off-by: Jacob Keller <[email protected]> Acked-by: Randy Dunlap <[email protected]> Tested-by: Randy Dunlap <[email protected]> # build-tested Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
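The stubbing pattern this commit relies on can be sketched as below. All names are hypothetical (`SKETCH_CONFIG_INET`, `sketch_gro_complete()`, `sketch_rx_finish()`); the sketch only shows the general header technique of providing an empty inline function when a feature is compiled out, not the actual declaration in include/net.

```c
struct sketch_skb;	/* opaque here; only a pointer is passed around */

#ifdef SKETCH_CONFIG_INET
/* Feature enabled: the real function is defined elsewhere. */
void sketch_gro_complete(struct sketch_skb *skb);
#else
/* Feature disabled: an empty inline stub keeps callers compiling and linking. */
static inline void sketch_gro_complete(struct sketch_skb *skb)
{
	(void)skb;
}
#endif

/* Hypothetical driver path: no #ifdef needed at the call site. */
static int sketch_rx_finish(struct sketch_skb *skb)
{
	sketch_gro_complete(skb);
	return 0;
}
```

The design choice is to pay for the configuration check once in the header, so that each driver does not have to wrap its call sites individually.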
2023-10-16 | tcp: Set pingpong threshold via sysctl | Haiyang Zhang | 2 | -4/+14
TCP pingpong threshold is 1 by default. But some applications, like SQL DB may prefer a higher pingpong threshold to activate delayed acks in quick ack mode for better performance. The pingpong threshold and related code were changed to 3 in the year 2019 in: commit 4a41f453bedf ("tcp: change pingpong threshold to 3") And reverted to 1 in the year 2022 in: commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"") There is no single value that fits all applications. Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for optimal performance based on the application needs. Signed-off-by: Haiyang Zhang <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Neal Cardwell <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-16 | net, sched: Add tcf_set_drop_reason for {__,}tcf_classify | Daniel Borkmann | 1 | -0/+3
Add an initial user for the newly added tcf_set_drop_reason() helper to set the drop reason for internal errors leading to TC_ACT_SHOT inside {__,}tcf_classify(). Right now this only adds a very basic SKB_DROP_REASON_TC_ERROR as a generic fallback indicator to mark drop locations. Where needed, such locations can be converted to more specific codes, for example, when hitting the reclassification limit, etc. Signed-off-by: Daniel Borkmann <[email protected]> Cc: Jamal Hadi Salim <[email protected]> Cc: Victor Nogueira <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-16 | net, sched: Make tc-related drop reason more flexible | Daniel Borkmann | 2 | -2/+7
Currently, the kfree_skb_reason() in sch_handle_{ingress,egress}() can only express a basic SKB_DROP_REASON_TC_INGRESS or SKB_DROP_REASON_TC_EGRESS reason. Victor kicked-off an initial proposal to make this more flexible by disambiguating verdict from return code by moving the verdict into struct tcf_result and letting tcf_classify() return a negative error. If hit, then two new drop reasons were added in the proposal, that is SKB_DROP_REASON_TC_INGRESS_ERROR as well as SKB_DROP_REASON_TC_EGRESS_ERROR. Further analysis of the actual error codes would have required to attach to tcf_classify via kprobe/kretprobe to more deeply debug skb and the returned error. In order to make the kfree_skb_reason() in sch_handle_{ingress,egress}() more extensible, it can be addressed in a more straight forward way, that is: Instead of placing the verdict into struct tcf_result, we can just put the drop reason in there, which does not require changes throughout various classful schedulers given the existing verdict logic can stay as is. Then, SKB_DROP_REASON_TC_ERROR{,_*} can be added to the enum skb_drop_reason to disambiguate between an error or an intentional drop. New drop reason error codes can be added successively to the tc code base. For internal error locations which have not yet been annotated with a SKB_DROP_REASON_TC_ERROR{,_*}, the fallback is SKB_DROP_REASON_TC_INGRESS and SKB_DROP_REASON_TC_EGRESS, respectively. Generic errors could be marked with a SKB_DROP_REASON_TC_ERROR code until they are converted to more specific ones if it is found that they would be useful for troubleshooting. While drop reasons have infrastructure for subsystem specific error codes which are currently used by mac80211 and ovs, Jakub mentioned that it is preferred for tc to use the enum skb_drop_reason core codes given it is a better fit and currently the tooling support is better, too. With regards to the latter: [...] 
I think Alastair (bpftrace) is working on auto-prettifying enums when bpftrace outputs maps. So we can do something like: $ bpftrace -e 'tracepoint:skb:kfree_skb { @[args->reason] = count(); }' Attaching 1 probe... ^C @[SKB_DROP_REASON_TC_INGRESS]: 2 @[SKB_CONSUMED]: 34 ^^^^^^^^^^^^ names!! Auto-magically. [...] Add a small helper tcf_set_drop_reason() which can be used to set the drop reason into the tcf_result. Signed-off-by: Daniel Borkmann <[email protected]> Cc: Jamal Hadi Salim <[email protected]> Cc: Victor Nogueira <[email protected]> Link: https://lore.kernel.org/netdev/[email protected] Reviewed-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
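The approach described in this message, storing a drop reason in the classification result and falling back to a generic code when none was set, can be sketched as follows. The enum, struct, and function names are hypothetical simplifications made up for this example; they are not the kernel's definitions.

```c
/* Hypothetical subset of drop reasons, for illustration only. */
enum sketch_drop_reason {
	SKETCH_DROP_NOT_SPECIFIED = 0,
	SKETCH_DROP_TC_INGRESS,
	SKETCH_DROP_TC_ERROR,
};

/* Hypothetical classification result carrying the drop reason. */
struct sketch_result {
	enum sketch_drop_reason drop_reason;
};

/* Small setter, so that error paths can annotate why they drop. */
static inline void sketch_set_drop_reason(struct sketch_result *res,
					  enum sketch_drop_reason reason)
{
	res->drop_reason = reason;
}

/* Caller side: use the annotated reason if any, else the generic fallback. */
static enum sketch_drop_reason
sketch_ingress_reason(const struct sketch_result *res)
{
	return res->drop_reason ? res->drop_reason : SKETCH_DROP_TC_INGRESS;
}
```

Because the verdict logic is untouched, locations that have not been annotated yet keep reporting the generic ingress or egress reason.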
2023-10-16 | ipv4: add new arguments to udp_tunnel_dst_lookup() | Beniamino Galvani | 1 | -3/+5
We want to make the function more generic so that it can be used by other UDP tunnel implementations such as geneve and vxlan. To do that, add the following arguments: - source and destination UDP port; - ifindex of the output interface, needed by vxlan; - the tos, because in some cases it is not taken from struct ip_tunnel_info (for example, when it's inherited from the inner packet); - the dst cache, because not all tunnel types (e.g. vxlan) want to use the one from struct ip_tunnel_info. With these parameters, the function no longer needs the full struct ip_tunnel_info as argument and we can pass only the relevant part of it (struct ip_tunnel_key). Suggested-by: Guillaume Nault <[email protected]> Signed-off-by: Beniamino Galvani <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-16 | ipv4: remove "proto" argument from udp_tunnel_dst_lookup() | Beniamino Galvani | 1 | -1/+1
The function is now UDP-specific, the protocol is always IPPROTO_UDP. Suggested-by: Guillaume Nault <[email protected]> Signed-off-by: Beniamino Galvani <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-16 | ipv4: rename and move ip_route_output_tunnel() | Beniamino Galvani | 2 | -6/+6
At the moment ip_route_output_tunnel() is used only by bareudp. Ideally, other UDP tunnel implementations should use it, but to do so the function needs to accept new parameters that are specific for UDP tunnels, such as the ports. Prepare for these changes by renaming the function to udp_tunnel_dst_lookup() and moving it to file net/ipv4/udp_tunnel_core.c. Suggested-by: Guillaume Nault <[email protected]> Signed-off-by: Beniamino Galvani <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-15 | vsock: check for MSG_ZEROCOPY support on send | Arseniy Krasnov | 1 | -0/+7
This feature totally depends on transport, so if transport doesn't support it, return error. Signed-off-by: Arseniy Krasnov <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-13 | Bluetooth: hci_sock: Correctly bounds check and pad HCI_MON_NEW_INDEX name | Kees Cook | 1 | -1/+1
The code pattern of memcpy(dst, src, strlen(src)) is almost always wrong. In this case it is wrong because it leaves memory uninitialized if it is less than sizeof(ni->name), and overflows ni->name when longer. Normally strtomem_pad() could be used here, but since ni->name is a trailing array in struct hci_mon_new_index, compilers that don't support -fstrict-flex-arrays=3 can't tell how large this array is via __builtin_object_size(). Instead, open-code the helper and use sizeof() since it will work correctly. Additionally mark ni->name as __nonstring since it appears to not be a %NUL terminated C string. Cc: Luiz Augusto von Dentz <[email protected]> Cc: Edward AD <[email protected]> Cc: Marcel Holtmann <[email protected]> Cc: Johan Hedberg <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: 18f547f3fc07 ("Bluetooth: hci_sock: fix slab oob read in create_monitor_event") Link: https://lore.kernel.org/lkml/202310110908.F2639D3276@keescook/ Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
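The problem with memcpy(dst, src, strlen(src)) and the open-coded fix can be illustrated with a small helper. This is a minimal sketch with a hypothetical name, `sketch_copy_pad()`, and is not the code added by the commit: it truncates when the source is too long and zero-pads when it is shorter, so the destination is always fully initialized and never overrun.

```c
#include <string.h>

/*
 * Copy a C string into a fixed-size field that is not required to be
 * NUL-terminated: truncate if the source is too long, zero-pad the
 * remainder if it is shorter.
 */
static void sketch_copy_pad(char *dst, size_t dst_size, const char *src)
{
	size_t len = strlen(src);

	if (len > dst_size)
		len = dst_size;
	memcpy(dst, src, len);
	memset(dst + len, 0, dst_size - len);
}
```

A caller would pass sizeof() of the destination array, which is the point made in the message: sizeof() is known at compile time even where __builtin_object_size() cannot determine the size of a trailing array.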
2023-10-13 | tcp: allow again tcp_disconnect() when threads are waiting | Paolo Abeni | 1 | -6/+4
As reported by Tom, .NET and applications built on top of it rely on connect(AF_UNSPEC) to async cancel pending I/O operations on TCP socket. The blamed commit below caused a regression, as such cancellation can now fail. As suggested by Eric, this change addresses the problem by explicitly causing blocking I/O operations to terminate immediately (with an error) when a concurrent disconnect() is executed. Instead of tracking the number of threads blocked on a given socket, track the number of disconnect() issued on such socket. If such counter changes after a blocking operation releasing and re-acquiring the socket lock, error out the current operation. Fixes: 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting") Reported-by: Tom Deseyn <[email protected]> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1886305 Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/f3b95e47e3dbed840960548aebaa8d954372db41.1697008693.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
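The counter-based detection described above can be modelled in a few lines. This is a single-threaded sketch with hypothetical names (`struct sketch_sock`, `sketch_disconnect()`, `sketch_wait_interrupted()`); locking and real sleeping are left out, and the callback merely stands in for whatever another thread does while the waiter sleeps.

```c
#include <stdbool.h>

/* Hypothetical socket with a count of disconnects issued so far. */
struct sketch_sock {
	unsigned int disconnects;
};

static void sketch_disconnect(struct sketch_sock *sk)
{
	sk->disconnects++;
}

/*
 * Model of a blocking operation: snapshot the counter before sleeping
 * (lock released) and compare after waking (lock re-acquired).
 * Returns true if a disconnect happened in between, in which case the
 * caller should fail the current operation with an error.
 */
static bool sketch_wait_interrupted(struct sketch_sock *sk,
				    void (*while_asleep)(struct sketch_sock *))
{
	unsigned int before = sk->disconnects;

	if (while_asleep)
		while_asleep(sk);
	return sk->disconnects != before;
}
```

Compared with counting blocked threads, the waiter only needs a local snapshot, and disconnect() itself never has to be refused.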
2023-10-13 | tls: use fixed size for tls_offload_context_{tx,rx}.driver_state | Sabrina Dubroca | 1 | -10/+4
driver_state is a flex array, but is always allocated by the tls core to a fixed size (TLS_DRIVER_STATE_SIZE_{TX,RX}). Simplify the code by making that size explicit so that sizeof(struct tls_offload_context_{tx,rx}) works. Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-13 | tls: store iv directly within cipher_context | Sabrina Dubroca | 1 | -1/+2
TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE is 20B, we don't get much benefit in cipher_context's size and can simplify the init code a bit. Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-13 | tls: rename MAX_IV_SIZE to TLS_MAX_IV_SIZE | Sabrina Dubroca | 1 | -1/+1
It's defined in include/net/tls.h, avoid using an overly generic name. Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-13 | tls: store rec_seq directly within cipher_context | Sabrina Dubroca | 1 | -1/+1
TLS_MAX_REC_SEQ_SIZE is 8B, we don't get anything by using kmalloc. Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-12 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 1 | -0/+1
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: kernel/bpf/verifier.c 829955981c55 ("bpf: Fix verifier log for async callback return values") a923819fb2c5 ("bpf: Treat first argument as return value for bpf_throw") Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-12 | wifi: radiotap: add bandwidth definition of EHT U-SIG | Ping-Ke Shih | 1 | -0/+6
Define EHT U-SIG bandwidth used by radiotap according to Table 36-28 "U-SIG field of an EHT MU PPDU" in 802.11be (D3.0). Signed-off-by: Ping-Ke Shih <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-10 | netfilter: cleanup struct nft_table | George Guo | 1 | -1/+4
Add comments for nlpid, family, udlen and udata in struct nft_table, and afinfo is no longer a member of struct nft_table, so remove the comment for it. Signed-off-by: George Guo <[email protected]> Signed-off-by: Florian Westphal <[email protected]>
2023-10-10 | netfilter: conntrack: simplify nf_conntrack_alter_reply | Florian Westphal | 1 | -4/+10
nf_conntrack_alter_reply doesn't do helper reassignment anymore. Remove the comments that make this claim. Furthermore, remove dead code from the function and place it in nf_conntrack.h. Signed-off-by: Florian Westphal <[email protected]>
2023-10-10 | net: macsec: indicate next pn update when offloading | Radu Pirea (NXP OSS) | 1 | -0/+1
Indicate next PN update using update_pn flag in macsec_context. Offloaded MACsec implementations do not know whether or not the MACSEC_SA_ATTR_PN attribute was passed for an SA update and assume that next PN should always be updated, but this is not always true. The PN can be reset to its initial value using the following command:
    $ ip macsec set macsec0 tx sa 0 off #octeontx2-pf case
Or, the update PN command will succeed even if the driver does not support PN updates:
    $ ip macsec set macsec0 tx sa 0 pn 1 on #mscc phy driver case
Comparing the initial PN with the new PN value is not a solution. When the user updates the PN using its initial value the command will succeed, even if the driver does not support it. Like this:
    $ ip macsec add macsec0 tx sa 0 pn 1 on key 00 \
      ead3664f508eb06c40ac7104cdae4ce5
    $ ip macsec set macsec0 tx sa 0 pn 1 on #mlx5 case
Signed-off-by: Radu Pirea (NXP OSS) <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-10 | tcp: record last received ipv6 flowlabel | David Morley | 2 | -1/+6
In order to better estimate whether a data packet has been retransmitted or is the result of a TLP, we save the last received ipv6 flowlabel. To make space for this field we resize the "ato" field in inet_connection_sock as the current value of TCP_DELACK_MAX can be fully contained in 8 bits and add a compile_time_assert ensuring this field is the required size. v2: addressed kernel bot feedback about dccp_delack_timer() v3: addressed build error introduced by commit bbf80d713fe7 ("tcp: derive delack_max from rto_min") Signed-off-by: David Morley <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Tested-by: David Morley <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-09 | bpf: Derive source IP addr via bpf_*_fib_lookup() | Martynas Pumputis | 1 | -0/+5
Extend the bpf_fib_lookup() helper by making it return the source IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set. For example, the following snippet can be used to derive the desired source IP address:
    struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr };

    ret = bpf_skb_fib_lookup(skb, &p, sizeof(p),
                             BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH);
    if (ret != BPF_FIB_LKUP_RET_SUCCESS)
            return TC_ACT_SHOT;

    /* the p.ipv4_src now contains the source address */
The inability to derive the proper source address may cause malfunctions in BPF-based dataplanes for hosts containing netdevs with more than one routable IP address or for multi-homed hosts. For example, Cilium implements packet masquerading in BPF. If an egressing netdev to which the Cilium's BPF prog is attached has multiple IP addresses, then only one [hardcoded] IP address can be used for masquerading. This breaks connectivity if any other IP address should have been selected instead, for example, when public and private addresses are attached to the same egress interface. The change was tested with Cilium [1]. Nikolay Aleksandrov helped to figure out the IPv6 addr selection. [1]: https://github.com/cilium/cilium/pull/28283 Signed-off-by: Martynas Pumputis <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-10-06 | Merge tag 'wireless-next-2023-10-06' of ↵ | Jakub Kicinski | 3 | -96/+182
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.7 The first pull request for v6.7, with both stack and driver changes. We have a big change how locking is handled in cfg80211 and mac80211 which removes several locks and hopefully simplifies the locking overall. In drivers rtw89 got MCC support and smaller features to other active drivers but nothing out of ordinary. Major changes: cfg80211 - remove wdev mutex, use the wiphy mutex instead - annotate iftype_data pointer with sparse - first kunit tests, for element defrag - remove unused scan_width support mac80211 - major locking rework, remove several locks like sta_mtx, key_mtx etc. and use the wiphy mutex instead - remove unused shifted rate support - support antenna control in frame injection (requires driver support) - convert RX_DROP_UNUSABLE to more detailed reason codes rtw89 - TDMA-based multi-channel concurrency (MCC) support iwlwifi - support set_antenna() operation - support frame injection antenna control ath12k - WCN7850: enable 320 MHz channels in 6 GHz band - WCN7850: hardware rfkill support - WCN7850: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS to make scan faster ath11k - add chip id board name while searching board-2.bin * tag 'wireless-next-2023-10-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (272 commits) wifi: rtlwifi: remove unreachable code in rtl92d_dm_check_edca_turbo() wifi: rtw89: debug: txpwr table supports Wi-Fi 7 chips wifi: rtw89: debug: show txpwr table according to chip gen wifi: rtw89: phy: set TX power RU limit according to chip gen wifi: rtw89: phy: set TX power limit according to chip gen wifi: rtw89: phy: set TX power offset according to chip gen wifi: rtw89: phy: set TX power by rate according to chip gen wifi: rtw89: mac: get TX power control register according to chip gen wifi: rtlwifi: use unsigned long for rtl_bssid_entry timestamp wifi: rtlwifi: fix EDCA 
limit set by BT coexistence wifi: rt2x00: fix MT7620 low RSSI issue wifi: rtw89: refine bandwidth 160MHz uplink OFDMA performance wifi: rtw89: refine uplink trigger based control mechanism wifi: rtw89: 8851b: update TX power tables to R34 wifi: rtw89: 8852b: update TX power tables to R35 wifi: rtw89: 8852c: update TX power tables to R67 wifi: rtw89: regd: configure Thailand in regulation type wifi: mac80211: add back SPDX identifier wifi: mac80211: fix ieee80211_drop_unencrypted_mgmt return type/value wifi: rtlwifi: cleanup few rtlxxxx_set_hw_reg() routines ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-06Merge wireless into wireless-nextJohannes Berg3-12/+34
Resolve several conflicts, mostly between changes/fixes in wireless and the locking rework in wireless-next. One of the conflicts actually shows a bug in wireless that we'll want to fix separately. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Kalle Valo <[email protected]>
2023-10-06flow_offload: Annotate struct flow_action_entry with __counted_byKees Cook1-1/+1
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct flow_action_entry. Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: [email protected] Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: David S. Miller <[email protected]>
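The annotation pattern described above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel's definition: `struct demo_action_table` and `demo_action_table_alloc()` are hypothetical names, and the macro fallback simply expands to nothing on compilers that do not yet provide the `counted_by` attribute.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Userspace stand-in for the kernel's __counted_by() macro (assumption:
 * the compiler exposes the attribute as __counted_by__; otherwise the
 * annotation compiles away and only documents intent).
 */
#ifndef __counted_by
# if defined(__has_attribute)
#  if __has_attribute(__counted_by__)
#   define __counted_by(member) __attribute__((__counted_by__(member)))
#  endif
# endif
#endif
#ifndef __counted_by
# define __counted_by(member)
#endif

/* Hypothetical structure: a counter followed by a flexible array member. */
struct demo_action_table {
	unsigned int num_entries;
	int entries[] __counted_by(num_entries);
};

/* Allocate a zeroed table with room for n entries; NULL on failure. */
struct demo_action_table *demo_action_table_alloc(unsigned int n)
{
	struct demo_action_table *t;

	t = calloc(1, sizeof(*t) + (size_t)n * sizeof(t->entries[0]));
	if (!t)
		return NULL;
	/* Set the counter before any access to entries[] so that
	 * bounds checks, where enabled, see the correct size. */
	t->num_entries = n;
	return t;
}
```

With the attribute available and bounds sanitizing enabled, an access such as `t->entries[t->num_entries]` is the kind of out-of-bounds index the commit message expects to be caught at run time.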
2023-10-06nexthop: Annotate struct nh_group with __counted_byKees Cook1-1/+1
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct nh_group. Cc: David Ahern <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: [email protected] Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-06nexthop: Annotate struct nh_notifier_grp_info with __counted_byKees Cook1-1/+1
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct nh_notifier_grp_info. Cc: David Ahern <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-06xfrm: pass struct net to xfrm_decode_session wrappersFlorian Westphal1-6/+6
Preparation patch; the extra arg is not yet used. No functional changes intended. This is needed to replace the xfrm session decode functions with the flow dissector. skb_flow_dissect() cannot be used as-is, because it attempts to deduce the 'struct net' to use for bpf program fetch from skb->sk or skb->dev, but the xfrm code path can see skbs that have neither sk nor dev filled in. So either the flow dissector needs to try harder, e.g. by also trying skb->dst->dev, or we have to pass the struct net explicitly. Passing the struct net doesn't look too bad to me; most places already have it available or can derive it from the output device. Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2023-10-06xfrm: Support GRO for IPv6 ESP in UDP encapsulationSteffen Klassert2-0/+5
This patch enables the GRO codepath for IPv6 ESP in UDP encapsulated packets. Decapsulation happens at L2 and saves a full round through the stack for each packet. This is also needed to support HW offload for ESP in UDP encapsulation. Signed-off-by: Steffen Klassert <[email protected]> Co-developed-by: Antony Antony <[email protected]> Signed-off-by: Antony Antony <[email protected]> Reviewed-by: Eyal Birger <[email protected]>
2023-10-06xfrm: Support GRO for IPv4 ESP in UDP encapsulationSteffen Klassert2-1/+3
This patch enables the GRO codepath for IPv4 ESP in UDP encapsulated packets. Decapsulation happens at L2 and saves a full round through the stack for each packet. This is also needed to support HW offload for ESP in UDP encapsulation. Enabling this would improve performance for the ESP-in-UDP datapath, i.e. IPsec with NAT in between. By default, GRO for ESP-in-UDP is disabled for UDP sockets. To enable this feature for an ESP socket, the following two options need to be set: 1. Enable ESP-in-UDP (this is already set by an IKE daemon): int type = UDP_ENCAP_ESPINUDP; setsockopt(fd, SOL_UDP, UDP_ENCAP, &type, sizeof(type)); 2. Enable GRO for the ESP-in-UDP socket: type = true; setsockopt(fd, SOL_UDP, UDP_GRO, &type, sizeof(type)); Enabling ESP-in-UDP has the side effect of preventing the Linux stack from seeing ESP packets at L3 (when ESP OFFLOAD is disabled), as packets are immediately decapsulated from UDP and decrypted. This change may affect nftables rules that match on ESP packets at L3. Developers/admins are advised to review and adapt any nftables rules accordingly before enabling this feature to prevent potential rule breakage. Also, tcpdump will not see ESP packets from an ESP-in-UDP flow when this is enabled. Signed-off-by: Steffen Klassert <[email protected]> Co-developed-by: Antony Antony <[email protected]> Signed-off-by: Antony Antony <[email protected]> Reviewed-by: Eyal Birger <[email protected]>
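The two socket options named in the message can be wrapped in one helper. This is a minimal sketch under stated assumptions: `espinudp_enable_gro()` is a hypothetical name, and the `#ifndef` fallbacks carry the numeric values from the Linux UAPI header `linux/udp.h` only for C libraries whose `netinet/udp.h` lacks them. Whether the calls succeed depends on the running kernel's configuration.

```c
#include <netinet/in.h>
#include <netinet/udp.h>
#include <sys/socket.h>

/* Fallback values, as defined in linux/udp.h, for older libc headers. */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_ENCAP
#define UDP_ENCAP 100
#endif
#ifndef UDP_ENCAP_ESPINUDP
#define UDP_ENCAP_ESPINUDP 2
#endif
#ifndef UDP_GRO
#define UDP_GRO 104
#endif

/*
 * Hypothetical helper: apply both options from the commit message to an
 * existing UDP socket. Returns 0 on success, -1 on failure with errno
 * left as set by setsockopt().
 */
int espinudp_enable_gro(int fd)
{
	int type = UDP_ENCAP_ESPINUDP;
	int on = 1;

	/* Step 1: mark the socket as an ESP-in-UDP encapsulation socket
	 * (normally already done by the IKE daemon). */
	if (setsockopt(fd, SOL_UDP, UDP_ENCAP, &type, sizeof(type)) < 0)
		return -1;

	/* Step 2: opt the socket in to GRO. */
	if (setsockopt(fd, SOL_UDP, UDP_GRO, &on, sizeof(on)) < 0)
		return -1;

	return 0;
}
```

A caller would typically create the socket with `socket(AF_INET, SOCK_DGRAM, 0)`, bind it to the NAT-traversal port, and then invoke the helper; on failure, `errno` indicates which option the kernel rejected.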
2023-10-05nexthop: Annotate struct nh_notifier_res_table_info with __counted_byKees Cook1-1/+1
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct nh_notifier_res_table_info. Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Tom Rix <[email protected]> Cc: [email protected] Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-05nexthop: Annotate struct nh_res_table with __counted_byKees Cook1-1/+1
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct nh_res_table. Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski7-11/+17
Cross-merge networking fixes after downstream PR. No conflicts (or adjacent changes of note). Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-05net_sched: export pfifo_fast prio2band[]Eric Dumazet1-0/+1
pfifo_fast prio2band[] is renamed to sch_default_prio2band[] and exported because we want to share it in FQ. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Dave Taht <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
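For readers unfamiliar with the table being exported, the sketch below shows how such a priority-to-band lookup works. It is an illustrative userspace model, not the kernel source: the `demo_` names are hypothetical, and the table values are assumed to be the long-standing pfifo_fast defaults, which should be checked against net/sched/sch_generic.c before being relied on.

```c
#include <stdint.h>

/* Assumption: mirrors the kernel's TC_PRIO_MAX (16 priority slots). */
#define DEMO_TC_PRIO_MAX 15

/*
 * Assumed copy of the pfifo_fast default mapping; band 0 is dequeued
 * first, band 2 last.
 */
static const uint8_t demo_prio2band[DEMO_TC_PRIO_MAX + 1] = {
	1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Map a packet priority to a band, masking it into the table range. */
unsigned int demo_band_for_priority(uint32_t priority)
{
	return demo_prio2band[priority & DEMO_TC_PRIO_MAX];
}
```

Sharing one table means a second qdisc, such as FQ in the commit above, can classify packets into the same bands without duplicating the mapping.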
2023-10-05net: mana: Fix oversized sge0 for GSO packetsHaiyang Zhang1-2/+3
Handle the case when GSO SKB linear length is too large. MANA NIC requires GSO packets to put only the header part to SGE0, otherwise the TX queue may stop at the HW level. So, use 2 SGEs for the skb linear part which contains more than the packet header. Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Haiyang Zhang <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Shradha Gupta <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-04tcp: fix quick-ack counting to count actual ACKs of new dataNeal Cardwell1-2/+4
This commit fixes quick-ack counting so that it only considers that a quick-ack has been provided if we are sending an ACK that newly acknowledges data. The code was erroneously using the number of data segments in outgoing skbs when deciding how many quick-ack credits to remove. This logic does not make sense, and could cause poor performance in request-response workloads, like RPC traffic, where requests or responses can be multi-segment skbs. When a TCP connection decides to send N quick-acks, that is to accelerate the cwnd growth of the congestion control module controlling the remote endpoint of the TCP connection. That quick-ack decision is purely about the incoming data and outgoing ACKs. It has nothing to do with the outgoing data or the size of outgoing data. And in particular, an ACK only serves the intended purpose of allowing the remote congestion control to grow the congestion window quickly if the ACK is ACKing or SACKing new data. The fix is simple: only count packets as serving the goal of the quickack mechanism if they are ACKing/SACKing new data. We can tell whether this is the case by checking inet_csk_ack_scheduled(), since we schedule an ACK exactly when we are ACKing/SACKing new data. Fixes: fc6415bcb0f5 ("[TCP]: Fix quick-ack decrementing with TSO.") Signed-off-by: Neal Cardwell <[email protected]> Reviewed-by: Yuchung Cheng <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
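The difference between the old and the fixed accounting can be shown with a toy model. This is a simplified sketch and not the kernel code: `struct demo_ack_state` and both functions are hypothetical, with `ack_scheduled` standing in for the result of `inet_csk_ack_scheduled()`.

```c
#include <stdbool.h>

/* Toy connection state (hypothetical). */
struct demo_ack_state {
	unsigned int quick;  /* remaining quick-ack credits */
	bool ack_scheduled;  /* true when newly received data awaits an ACK */
};

/* Old accounting: charge one credit per outgoing data segment. */
void demo_dec_quickack_old(struct demo_ack_state *s, unsigned int segs)
{
	s->quick = segs >= s->quick ? 0 : s->quick - segs;
}

/* Fixed accounting: charge a credit only when the outgoing packet
 * actually ACKs/SACKs new data. */
void demo_dec_quickack_new(struct demo_ack_state *s)
{
	unsigned int pkts = s->ack_scheduled ? 1 : 0;

	s->quick = pkts >= s->quick ? 0 : s->quick - pkts;
}
```

In this model, a 3-segment response sent while no ACK is pending costs three credits under the old rule and none under the new one, which is the request-response case the commit message describes.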
2023-10-04page_pool: fix documentation typosRandy Dunlap1-3/+3
Correct grammar for better readability. Signed-off-by: Randy Dunlap <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Ilias Apalodimas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-04net: appletalk: remove cops supportGreg Kroah-Hartman1-1/+0
The COPS Appletalk support is very old and was never said to actually work properly, and the firmware code for the devices is under a very suspect license. Remove it all to clear up the license issue; if it is still needed and actually used by anyone, we can add it back later once the license is cleared up. Reported-by: Prarit Bhargava <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Acked-by: Prarit Bhargava <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-04Merge tag 'wireless-2023-09-27' of ↵Jakub Kicinski1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Quite a collection of fixes this time, really too many to list individually. Many stack fixes, even rfkill (found by simulation and the new eevdf scheduler)! Also a bigger maintainers file cleanup, to remove old and redundant information. * tag 'wireless-2023-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (32 commits) wifi: iwlwifi: mvm: Fix incorrect usage of scan API wifi: mac80211: Create resources for disabled links wifi: cfg80211: avoid leaking stack data into trace wifi: mac80211: allow transmitting EAPOL frames with tainted key wifi: mac80211: work around Cisco AP 9115 VHT MPDU length wifi: cfg80211: Fix 6GHz scan configuration wifi: mac80211: fix potential key leak wifi: mac80211: fix potential key use-after-free wifi: mt76: mt76x02: fix MT76x0 external LNA gain handling wifi: brcmfmac: Replace 1-element arrays with flexible arrays wifi: mwifiex: Fix oob check condition in mwifiex_process_rx_packet wifi: rtw88: rtw8723d: Fix MAC address offset in EEPROM rfkill: sync before userspace visibility/changes wifi: mac80211: fix mesh id corruption on 32 bit systems wifi: cfg80211: add missing kernel-doc for cqm_rssi_work wifi: cfg80211: fix cqm_config access race wifi: iwlwifi: mvm: Fix a memory corruption issue wifi: iwlwifi: Ensure ack flag is properly cleared. wifi: iwlwifi: dbg_ini: fix structure packing iwlwifi: mvm: handle PS changes in vif_cfg_changed ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-03net: dsa: notify drivers of MAC address changes on user portsVladimir Oltean1-0/+10
In some cases, drivers may need to veto the changing of a MAC address on a user port. Such is the case with KSZ9477 when it offloads an HSR device, because it programs the MAC address of multiple ports to a shared hardware register. Those ports need to have equal MAC addresses for the lifetime of the HSR offload. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Lukasz Majewski <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03net: dsa: propagate extack to ds->ops->port_hsr_join()Vladimir Oltean1-1/+2
Drivers can provide meaningful error messages which state a reason why they can't perform an offload, and dsa_slave_changeupper() already has the infrastructure to propagate these over netlink rather than printing to the kernel log. So pass the extack argument and modify the xrs700x driver's port_hsr_join() prototype. Also take the opportunity and use the extack for the 2 -EOPNOTSUPP cases from xrs700x_hsr_join(). Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Lukasz Majewski <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03ipv6: mark address parameters of udp_tunnel6_xmit_skb() as constBeniamino Galvani1-2/+3
The function doesn't modify the addresses passed as input, mark them as 'const' to make that clear. Signed-off-by: Beniamino Galvani <[email protected]> Reviewed-by: Guillaume Nault <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-10-03ipv4/fib: send notify when delete source address routesHangbin Liu1-0/+1
After deleting an interface address in fib_del_ifaddr(), the function scans the fib_info list for stray entries and calls fib_flush() and fib_table_flush(). The stray entries are then deleted silently and no RTM_DELROUTE notification is sent. This lack of notification can make routing daemons, or monitors like `ip monitor route`, miss the routing changes, e.g.: + ip link add dummy1 type dummy + ip link add dummy2 type dummy + ip link set dummy1 up + ip link set dummy2 up + ip addr add 192.168.5.5/24 dev dummy1 + ip route add 7.7.7.0/24 dev dummy2 src 192.168.5.5 + ip -4 route 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 + ip monitor route + ip addr del 192.168.5.5/24 dev dummy1 Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5 Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5 As Ido pointed out, fib_table_flush() isn't only called when an address is deleted, but also when an interface is deleted or put down. The lack of notification in these cases is deliberate. And commit 7c6bb7d2faaf ("net/ipv6: Add knob to skip DELROUTE message on device down") introduced a sysctl to make IPv6 behave like IPv4 in this regard. So we can't send the route delete notification blindly in fib_table_flush(). To fix this issue, let's add a new flag in "struct fib_info" to track routes whose preferred source address was deleted, and only send notifications for them. 
After update: + ip monitor route + ip addr del 192.168.5.5/24 dev dummy1 Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5 Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5 Deleted 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5 Suggested-by: Thomas Haller <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Acked-by: Nicolas Dichtel <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>