aboutsummaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2023-06-17tcp: enforce receive buffer memory limits by allowing the tcp window to shrink[email protected]1-0/+1
Under certain circumstances, the tcp receive buffer memory limit set by autotuning (sk_rcvbuf) is increased due to incoming data packets as a result of the window not closing when it should be. This can result in the receive buffer growing all the way up to tcp_rmem[2], even for tcp sessions with a low BDP. To reproduce: Connect a TCP session with the receiver doing nothing and the sender sending small packets (an infinite loop of socket send() with 4 bytes of payload with a sleep of 1 ms in between each send()). This will cause the tcp receive buffer to grow all the way up to tcp_rmem[2]. As a result, a host can have individual tcp sessions with receive buffers of size tcp_rmem[2], and the host itself can reach tcp_mem limits, causing the host to go into tcp memory pressure mode. The fundamental issue is the relationship between the granularity of the window scaling factor and the number of byte ACKed back to the sender. This problem has previously been identified in RFC 7323, appendix F [1]. The Linux kernel currently adheres to never shrinking the window. In addition to the overallocation of memory mentioned above, the current behavior is functionally incorrect, because once tcp_rmem[2] is reached when no remediations remain (i.e. tcp collapse fails to free up any more memory and there are no packets to prune from the out-of-order queue), the receiver will drop in-window packets resulting in retransmissions and an eventual timeout of the tcp session. A receive buffer full condition should instead result in a zero window and an indefinite wait. In practice, this problem is largely hidden for most flows. It is not applicable to mice flows. Elephant flows can send data fast enough to "overrun" the sk_rcvbuf limit (in a single ACK), triggering a zero window. But this problem does show up for other types of flows. Examples are websockets and other type of flows that send small amounts of data spaced apart slightly in time. In these cases, we directly encounter the problem described in [1]. RFC 7323, section 2.4 [2], says there are instances when a retracted window can be offered, and that TCP implementations MUST ensure that they handle a shrinking window, as specified in RFC 1122, section 4.2.2.16 [3]. All prior RFCs on the topic of tcp window management have made clear that sender must accept a shrunk window from the receiver, including RFC 793 [4] and RFC 1323 [5]. This patch implements the functionality to shrink the tcp window when necessary to keep the right edge within the memory limit by autotuning (sk_rcvbuf). This new functionality is enabled with the new sysctl: net.ipv4.tcp_shrink_window Additional information can be found at: https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/ [1] https://www.rfc-editor.org/rfc/rfc7323#appendix-F [2] https://www.rfc-editor.org/rfc/rfc7323#section-2.4 [3] https://www.rfc-editor.org/rfc/rfc1122#page-91 [4] https://www.rfc-editor.org/rfc/rfc793 [5] https://www.rfc-editor.org/rfc/rfc1323 Signed-off-by: Mike Freemon <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-17net: sched: Remove unused qdisc_l2t()YueHaibing1-14/+0
This is unused since switch to psched_l2t_ns(). Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-15net: ioctl: Use kernel memory on protocol ioctl callbacksBreno Leitao4-3/+27
Most of the ioctls to net protocols operates directly on userspace argument (arg). Usually doing get_user()/put_user() directly in the ioctl callback. This is not flexible, because it is hard to reuse these functions without passing userspace buffers. Change the "struct proto" ioctls to avoid touching userspace memory and operate on kernel buffers, i.e., all protocol's ioctl callbacks is adapted to operate on a kernel memory other than on userspace (so, no more {put,get}_user() and friends being called in the ioctl callback). This changes the "struct proto" ioctl format in the following way: int (*ioctl)(struct sock *sk, int cmd, - unsigned long arg); + int *karg); (Important to say that this patch does not touch the "struct proto_ops" protocols) So, the "karg" argument, which is passed to the ioctl callback, is a pointer allocated to kernel space memory (inside a function wrapper). This buffer (karg) may contain input argument (copied from userspace in a prep function) and it might return a value/buffer, which is copied back to userspace if necessary. There is not one-size-fits-all format (that is I am using 'may' above), but basically, there are three type of ioctls: 1) Do not read from userspace, returns a result to userspace 2) Read an input parameter from userspace, and does not return anything to userspace 3) Read an input from userspace, and return a buffer to userspace. The default case (1) (where no input parameter is given, and an "int" is returned to userspace) encompasses more than 90% of the cases, but there are two other exceptions. Here is a list of exceptions: * Protocol RAW: * cmd = SIOCGETVIFCNT: * input and output = struct sioc_vif_req * cmd = SIOCGETSGCNT * input and output = struct sioc_sg_req * Explanation: for the SIOCGETVIFCNT case, userspace passes the input argument, which is struct sioc_vif_req. Then the callback populates the struct, which is copied back to userspace. * Protocol RAW6: * cmd = SIOCGETMIFCNT_IN6 * input and output = struct sioc_mif_req6 * cmd = SIOCGETSGCNT_IN6 * input and output = struct sioc_sg_req6 * Protocol PHONET: * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE * input int (4 bytes) * Nothing is copied back to userspace. For the exception cases, functions sock_sk_ioctl_inout() will copy the userspace input, and copy it back to kernel space. The wrapper that prepare the buffer and put the buffer back to user is sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the callee now calls sk_ioctl(), which will handle all cases. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-2/+12
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/mlx5/driver.h 617f5db1a626 ("RDMA/mlx5: Fix affinity assignment") dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports") https://lore.kernel.org/all/[email protected]/ tools/testing/selftests/net/mptcp/mptcp_join.sh 47867f0a7e83 ("selftests: mptcp: join: skip check if MIB counter not supported") 425ba803124b ("selftests: mptcp: join: support RM_ADDR for used endpoints or not") 45b1a1227a7a ("mptcp: introduces more address related mibs") 0639fa230a21 ("selftests: mptcp: add explicit check for new mibs") https://lore.kernel.org/netdev/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net/ No adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-15net: tls: make the offload check helper take skb not socketJakub Kicinski1-3/+5
All callers of tls_is_sk_tx_device_offloaded() currently do an equivalent of: if (skb->sk && tls_is_skb_tx_device_offloaded(skb->sk)) Have the helper accept skb and do the skb->sk check locally. Two drivers have local static inlines with similar wrappers already. While at it change the ifdef condition to TLS_DEVICE. Only TLS_DEVICE selects SOCK_VALIDATE_XMIT, so the two are equivalent. This makes removing the duplicated IS_ENABLED() check in funeth more obviously correct. Signed-off-by: Jakub Kicinski <[email protected]> Acked-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Tariq Toukan <[email protected]> Acked-by: Dimitris Michailidis <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-14wifi: iwlwifi: mvm: support U-SIG EHT validate checksJohannes Berg1-0/+2
Support new firmware that can validate the validate bits in sniffer mode, and advertise that fact and the result of the checks in the U-SIG radiotap field. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230613155501.c20480aa1171.Icc0d077dae01d662ccb948823e196aa9c5c87976@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-14wifi: mac80211: Do not use "non-MLD AP" syntaxIlan Peer1-3/+6
Instead clarify the cases where link ID == 0 is intended for an AP STA that is not part of an AP MLD. Signed-off-by: Ilan Peer <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230611121219.77236a2e26ad.I8193ca8e236c9eb015870471f77a7d5134da3156@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-14wifi: cfg80211: Support association to AP MLD with disabled linksIlan Peer1-1/+4
An AP part of an AP MLD might be temporarily disabled, and might be enabled later. Such a link should be included in the association exchange, but should not be used until enabled. Extend the NL80211_CMD_ASSOCIATE to also indicate disabled links. Signed-off-by: Ilan Peer <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230608163202.c4c61ee4c4a5.I784ef4a0d619fc9120514b5615458fbef3b3684a@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-14wifi: mac80211: Add getter functions for vif MLD stateIlan Peer1-0/+21
As a preparation to support disabled/dormant links, add the following function: - ieee80211_vif_usable_links(): returns the bitmap of the links that can be activated. Use this function in all the places that the bitmap of the usable links is needed. - ieee80211_vif_is_mld(): returns true iff the vif is an MLD. Use this function in all the places where an indication that the connection is a MLD is needed. Signed-off-by: Ilan Peer <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230608163202.86e3351da1fc.If6fe3a339fda2019f13f57ff768ecffb711b710a@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-14wifi: mac80211: allow disabling SMPS debugfs controlsMiri Korenblit1-0/+3
There are cases in which we don't want the user to override the smps mode, e.g. when SMPS should be disabled due to EMLSR. Add a driver flag to disable SMPS overriding and don't override if it is set. Signed-off-by: Miri Korenblit <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230608163202.ef129e80556c.I74a298fdc86b87074c95228d3916739de1400597@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-14wifi: mac80211: add helpers to access sband iftype dataJohannes Berg1-1/+43
There's quite a bit of code accessing sband iftype data (HE, HE 6 GHz, EHT) and we always need to remember to use the ieee80211_vif_type_p2p() helper. Add new helpers to directly get it from the sband/vif rather than having to call ieee80211_vif_type_p2p(). Convert most code with the following spatch: @@ expression vif, sband; @@ -ieee80211_get_he_iftype_cap(sband, ieee80211_vif_type_p2p(vif)) +ieee80211_get_he_iftype_cap_vif(sband, vif) @@ expression vif, sband; @@ -ieee80211_get_eht_iftype_cap(sband, ieee80211_vif_type_p2p(vif)) +ieee80211_get_eht_iftype_cap_vif(sband, vif) @@ expression vif, sband; @@ -ieee80211_get_he_6ghz_capa(sband, ieee80211_vif_type_p2p(vif)) +ieee80211_get_he_6ghz_capa_vif(sband, vif) Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230604120651.db099f49e764.Ie892966c49e22c7b7ee1073bc684f142debfdc84@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-14wifi: cfg80211: S1G rate information and calculationsGilad Itzkovitch1-3/+15
Increase the size of S1G rate_info flags to support S1G and add flags for new S1G MCS and the supported bandwidths. Also, include S1G rate information to netlink STA rate message. Lastly, add rate calculation function for S1G MCS. Signed-off-by: Gilad Itzkovitch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2023-06-14net/sched: qdisc_destroy() old ingress and clsact Qdiscs before graftingPeilin Ye1-0/+8
mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized in ingress_init() to point to net_device::miniq_ingress. ingress Qdiscs access this per-net_device pointer in mini_qdisc_pair_swap(). Similar for clsact Qdiscs and miniq_egress. Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER requests (thanks Hillf Danton for the hint), when replacing ingress or clsact Qdiscs, for example, the old Qdisc ("@old") could access the same miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"), causing race conditions [1] including a use-after-free bug in mini_qdisc_pair_swap() reported by syzbot: BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573 Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901 ... Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319 print_report mm/kasan/report.c:430 [inline] kasan_report+0x11c/0x130 mm/kasan/report.c:536 mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573 tcf_chain_head_change_item net/sched/cls_api.c:495 [inline] tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509 tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline] tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline] tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266 ... @old and @new should not affect each other. In other words, @old should never modify miniq_{in,e}gress after @new, and @new should not update @old's RCU state. Fixing without changing sch_api.c turned out to be difficult (please refer to Closes: for discussions). Instead, make sure @new's first call always happen after @old's last call (in {ingress,clsact}_destroy()) has finished: In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests, and call qdisc_destroy() for @old before grafting @new. Introduce qdisc_refcount_dec_if_one() as the counterpart of qdisc_refcount_inc_nz() used for filter requests. Introduce a non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check, just like qdisc_put() etc. Depends on patch "net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs". [1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under TC_H_ROOT (no longer possible after commit c7cfbd115001 ("net/sched: sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8 transmission queues: Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then adds a flower filter X to A. Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and b2) to replace A, then adds a flower filter Y to B. Thread 1 A's refcnt Thread 2 RTM_NEWQDISC (A, RTNL-locked) qdisc_create(A) 1 qdisc_graft(A) 9 RTM_NEWTFILTER (X, RTNL-unlocked) __tcf_qdisc_find(A) 10 tcf_chain0_head_change(A) mini_qdisc_pair_swap(A) (1st) | | RTM_NEWQDISC (B, RTNL-locked) RCU sync 2 qdisc_graft(B) | 1 notify_and_destroy(A) | tcf_block_release(A) 0 RTM_NEWTFILTER (Y, RTNL-unlocked) qdisc_destroy(A) tcf_chain0_head_change(B) tcf_chain0_head_change_cb_del(A) mini_qdisc_pair_swap(B) (2nd) mini_qdisc_pair_swap(A) (3rd) | ... ... Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its mini Qdisc, b1. Then, A calls mini_qdisc_pair_swap() again during ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets on eth0 will not find filter Y in sch_handle_ingress(). This is just one of the possible consequences of concurrently accessing miniq_{in,e}gress pointers. Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops") Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops") Reported-by: [email protected] Closes: https://lore.kernel.org/r/[email protected]/ Cc: Hillf Danton <[email protected]> Cc: Vlad Buslov <[email protected]> Signed-off-by: Peilin Ye <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-06-14net/sched: act_ct: Fix promotion of offloaded unreplied tuplePaul Blakey1-1/+1
Currently UNREPLIED and UNASSURED connections are added to the nf flow table. This causes the following connection packets to be processed by the flow table which then skips conntrack_in(), and thus such the connections will remain UNREPLIED and UNASSURED even if reply traffic is then seen. Even still, the unoffloaded reply packets are the ones triggering hardware update from new to established state, and if there aren't any to triger an update and/or previous update was missed, hardware can get out of sync with sw and still mark packets as new. Fix the above by: 1) Not skipping conntrack_in() for UNASSURED packets, but still refresh for hardware, as before the cited patch. 2) Try and force a refresh by reply-direction packets that update the hardware rules from new to established state. 3) Remove any bidirectional flows that didn't failed to update in hardware for re-insertion as bidrectional once any new packet arrives. Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections") Co-developed-by: Vlad Buslov <[email protected]> Signed-off-by: Vlad Buslov <[email protected]> Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-06-12kcm: Send multiple frags in one sendmsg()David Howells1-1/+1
Rewrite the AF_KCM transmission loop to send all the fragments in a single skb or frag_list-skb in one sendmsg() with MSG_SPLICE_PAGES set. The list of fragments in each skb is conveniently a bio_vec[] that can just be attached to a BVEC iter. Note: I'm working out the size of each fragment-skb by adding up bv_len for all the bio_vecs in skb->frags[] - but surely this information is recorded somewhere? For the skbs in head->frag_list, this is equal to skb->data_len, but not for the head. head->data_len includes all the tail frags too. Signed-off-by: David Howells <[email protected]> cc: Tom Herbert <[email protected]> cc: Tom Herbert <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-12net: flow_dissector: add support for cfm packetsZahari Doychev1-0/+21
Add support for dissecting cfm packets. The cfm packet header fields maintenance domain level and opcode can be dissected. Signed-off-by: Zahari Doychev <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-12tcp: remove size parameter from tcp_stream_alloc_skb()Eric Dumazet1-1/+1
Now all tcp_stream_alloc_skb() callers pass @size == 0, we can remove this parameter. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-12tcp: let tcp_send_syn_data() build headless packetsEric Dumazet1-0/+1
tcp_send_syn_data() is the last component in TCP transmit path to put payload in skb->head. Switch it to use page frags, so that we can remove dead code later. This allows to put more payload than previous implementation. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-12scm: add SO_PASSPIDFD and SCM_PIDFDAlexander Mikhalitsyn1-2/+37
Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS, but it contains pidfd instead of plain pid, which allows programmers not to care about PID reuse problem. We mask SO_PASSPIDFD feature if CONFIG_UNIX is not builtin because it depends on a pidfd_prepare() API which is not exported to the kernel modules. Idea comes from UAPI kernel group: https://uapi-group.org/kernel-features/ Big thanks to Christian Brauner and Lennart Poettering for productive discussions about this. Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: David Ahern <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Lennart Poettering <[email protected]> Cc: Luca Boccassi <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Tested-by: Luca Boccassi <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Alexander Mikhalitsyn <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-12net/sched: taprio: report class offload stats per TXQ, not per TCVladimir Oltean1-5/+5
The taprio Qdisc creates child classes per netdev TX queue, but taprio_dump_class_stats() currently reports offload statistics per traffic class. Traffic classes are groups of TXQs sharing the same dequeue priority, so this is incorrect and we shouldn't be bundling up the TXQ stats when reporting them, as we currently do in enetc. Modify the API from taprio to drivers such that they report TXQ offload stats and not TC offload stats. There is no change in the UAPI or in the global Qdisc stats. Fixes: 6c1adb650c8d ("net/sched: taprio: add netlink reporting for offload statistics counters") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-10Merge tag 'nf-23-06-08' of ↵David S. Miller1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf netfilter pull request 23-06-08 Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for net: 1) Add commit and abort set operation to pipapo set abort path. 2) Bail out immediately in case of ENOMEM in nfnetlink batch. 3) Incorrect error path handling when creating a new rule leads to dangling pointer in set transaction list. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-06-10net: move gso declarations and functions to their own filesEric Dumazet3-0/+111
Move declarations into include/net/gso.h and code into net/core/gso.c Signed-off-by: Eric Dumazet <[email protected]> Cc: Stanislav Fomichev <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-09Merge tag 'wireless-next-2023-06-09' of ↵Jakub Kicinski2-4/+96
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.5 The second pull request for v6.5. We have support for three new Realtek chipsets, all from different generations. Shows how active Realtek development is right now, even older generations are being worked on. Note: We merged wireless into wireless-next to avoid complex conflicts between the trees. Major changes: rtl8xxxu - RTL8192FU support rtw89 - RTL8851BE support rtw88 - RTL8723DS support ath11k - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode iwlwifi - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature cfg80211/mac80211 - more Multi-Link Operation (MLO) support such as hardware restart - fixes for a potential work/mutex deadlock and with it beginnings of the previously discussed locking simplifications * tag 'wireless-next-2023-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (162 commits) wifi: rtlwifi: remove misused flag from HAL data wifi: rtlwifi: remove unused dualmac control leftovers wifi: rtlwifi: remove unused timer and related code wifi: rsi: Do not set MMC_PM_KEEP_POWER in shutdown wifi: rsi: Do not configure WoWlan in shutdown hook if not enabled wifi: brcmfmac: Detect corner error case earlier with log wifi: rtw89: 8852c: update RF radio A/B parameters to R63 wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (3 of 3) wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (2 of 3) wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (1 of 3) wifi: rtw89: process regulatory for 6 GHz power type wifi: rtw89: regd: update regulatory map to R64-R40 wifi: rtw89: regd: judge 6 GHz according to chip and BIOS wifi: rtw89: refine clearing supported bands to check 2/5 GHz first wifi: rtw89: 8851b: configure CRASH_TRIGGER feature for 8851B wifi: rtw89: set TX power without precondition during setting channel wifi: rtw89: debug: txpwr table access only valid page according to chip wifi: rtw89: 8851b: enable hw_scan support wifi: cfg80211: move scan done work to wiphy work wifi: cfg80211: move sched scan stop to wiphy work ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-08ipv4, ipv6: Use splice_eof() to flushDavid Howells3-0/+3
Allow splice to undo the effects of MSG_MORE after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. For UDP, a pending packet will not be emitted if the socket is closed before it is flushed; with this change, it be flushed by ->splice_eof(). For TCP, it's not clear that MSG_MORE is actually effective. Suggested-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <[email protected]> cc: Kuniyuki Iwashima <[email protected]> cc: Willem de Bruijn <[email protected]> cc: David Ahern <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-08splice, net: Add a splice_eof op to file-ops and socket-opsDavid Howells1-0/+1
Add an optional method, ->splice_eof(), to allow splice to indicate the premature termination of a splice to struct file_operations and struct proto_ops. This is called if sendfile() or splice() encounters all of the following conditions inside splice_direct_to_actor(): (1) the user did not set SPLICE_F_MORE (splice only), and (2) an EOF condition occurred (->splice_read() returned 0), and (3) we haven't read enough to fulfill the request (ie. len > 0 still), and (4) we have already spliced at least one byte. A further patch will modify the behaviour of SPLICE_F_MORE to always be passed to the actor if either the user set it or we haven't yet read sufficient data to fulfill the request. Suggested-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> cc: Jens Axboe <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Jan Kara <[email protected]> cc: Jeff Layton <[email protected]> cc: David Hildenbrand <[email protected]> cc: Christian Brauner <[email protected]> cc: Chuck Lever <[email protected]> cc: Boris Pismenny <[email protected]> cc: John Fastabend <[email protected]> cc: [email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski9-18/+26
Cross-merge networking fixes after downstream PR. Conflicts: net/sched/sch_taprio.c d636fc5dd692 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping") dced11ef84fb ("net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()") net/ipv4/sysctl_net_ipv4.c e209fee4118f ("net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294") ccce324dabfe ("tcp: make the first N SYN RTO backoffs linear") https://lore.kernel.org/all/[email protected]/ No adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-08netfilter: nf_tables: integrate pipapo into commit protocolPablo Neira Ayuso1-1/+3
The pipapo set backend follows copy-on-update approach, maintaining one clone of the existing datastructure that is being updated. The clone and current datastructures are swapped via rcu from the commit step. The existing integration with the commit protocol is flawed because there is no operation to clean up the clone if the transaction is aborted. Moreover, the datastructure swap happens on set element activation. This patch adds two new operations for sets: commit and abort, these new operations are invoked from the commit and abort steps, after the transactions have been digested, and it updates the pipapo set backend to use it. This patch adds a new ->pending_update field to sets to maintain a list of sets that require this new commit and abort operations. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2023-06-07wifi: cfg80211: add a work abstraction with special semanticsJohannes Berg1-4/+91
Add a work abstraction at the cfg80211 level that will always hold the wiphy_lock() for any work executed and therefore also can be canceled safely (without waiting) while holding that. This improves on what we do now as with the new wiphy works we don't have to worry about locking while cancelling them safely. Also, don't let such works run while the device is suspended, since they'll likely need to interact with the device. Flush them before suspend though. Signed-off-by: Johannes Berg <[email protected]>
2023-06-07net: sched: move rtm_tca_policy declaration to include fileEric Dumazet1-0/+2
rtm_tca_policy is used from net/sched/sch_api.c and net/sched/cls_api.c, thus should be declared in an include file. This fixes the following sparse warning: net/sched/sch_api.c:1434:25: warning: symbol 'rtm_tca_policy' was not declared. Should it be static? Fixes: e331473fee3d ("net/sched: cls_api: add missing validation of netlink attributes") Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-07net: sched: add rcu annotations around qdisc->qdisc_sleepingEric Dumazet1-2/+4
syzbot reported a race around qdisc->qdisc_sleeping [1] It is time we add proper annotations to reads and writes to/from qdisc->qdisc_sleeping. [1] BUG: KCSAN: data-race in dev_graft_qdisc / qdisc_lookup_rcu read to 0xffff8881286fc618 of 8 bytes by task 6928 on cpu 1: qdisc_lookup_rcu+0x192/0x2c0 net/sched/sch_api.c:331 __tcf_qdisc_find+0x74/0x3c0 net/sched/cls_api.c:1174 tc_get_tfilter+0x18f/0x990 net/sched/cls_api.c:2547 rtnetlink_rcv_msg+0x7af/0x8c0 net/core/rtnetlink.c:6386 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x375/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmsg+0x1e3/0x270 net/socket.c:2586 __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __x64_sys_sendmsg+0x46/0x50 net/socket.c:2593 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd write to 0xffff8881286fc618 of 8 bytes by task 6912 on cpu 0: dev_graft_qdisc+0x4f/0x80 net/sched/sch_generic.c:1115 qdisc_graft+0x7d0/0xb60 net/sched/sch_api.c:1103 tc_modify_qdisc+0x712/0xf10 net/sched/sch_api.c:1693 rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6395 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x375/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmsg+0x1e3/0x270 net/socket.c:2586 __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __x64_sys_sendmsg+0x46/0x50 net/socket.c:2593 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 6912 Comm: syz-executor.5 Not tainted 6.4.0-rc3-syzkaller-00190-g0d85b27b0cc6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023 Fixes: 3a7d0d07a386 ("net: sched: extend Qdisc with rcu") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Vlad Buslov <[email protected]> Acked-by: Jamal Hadi Salim<[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-07rfs: annotate lockless accesses to sk->sk_rxhashEric Dumazet1-5/+13
Add READ_ONCE()/WRITE_ONCE() on accesses to sk->sk_rxhash. This also prevents a (smart ?) compiler to remove the condition in: if (sk->sk_rxhash != newval) sk->sk_rxhash = newval; We need the condition to avoid dirtying a shared cache line. Fixes: fec5e652e58f ("rfs: Receive Flow Steering") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-06Merge tag 'for-net-2023-06-05' of ↵Jakub Kicinski2-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fixes to debugfs registration - Fix use-after-free in hci_remove_ltk/hci_remove_irk - Fixes to ISO channel support - Fix missing checks for invalid L2CAP DCID - Fix l2cap_disconnect_req deadlock - Add lock to protect HCI_UNREGISTER * tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Add missing checks for invalid DCID Bluetooth: ISO: use correct CIS order in Set CIG Parameters event Bluetooth: ISO: don't try to remove CIG if there are bound CIS left Bluetooth: Fix l2cap_disconnect_req deadlock Bluetooth: hci_qca: fix debugfs registration Bluetooth: fix debugfs registration Bluetooth: hci_sync: add lock to protect HCI_UNREGISTER Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk Bluetooth: ISO: Fix CIG auto-allocation to select configurable CIG Bluetooth: ISO: consider right CIS when removing CIG at cleanup ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-06ipv6: rpl: Fix Route of Death.Kuniyuki Iwashima1-3/+0
A remote DoS vulnerability of RPL Source Routing is assigned CVE-2023-2156. The Source Routing Header (SRH) has the following format: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Next Header | Hdr Ext Len | Routing Type | Segments Left | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CmprI | CmprE | Pad | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | . . . Addresses[1..n] . . . | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The originator of an SRH places the first hop's IPv6 address in the IPv6 header's IPv6 Destination Address and the second hop's IPv6 address as the first address in Addresses[1..n]. The CmprI and CmprE fields indicate the number of prefix octets that are shared with the IPv6 Destination Address. When CmprI or CmprE is not 0, Addresses[1..n] are compressed as follows: 1..n-1 : (16 - CmprI) bytes n : (16 - CmprE) bytes Segments Left indicates the number of route segments remaining. When the value is not zero, the SRH is forwarded to the next hop. Its address is extracted from Addresses[n - Segment Left + 1] and swapped with IPv6 Destination Address. When Segment Left is greater than or equal to 2, the size of SRH is not changed because Addresses[1..n-1] are decompressed and recompressed with CmprI. OTOH, when Segment Left changes from 1 to 0, the new SRH could have a different size because Addresses[1..n-1] are decompressed with CmprI and recompressed with CmprE. Let's say CmprI is 15 and CmprE is 0. When we receive SRH with Segment Left >= 2, Addresses[1..n-1] have 1 byte for each, and Addresses[n] has 16 bytes. When Segment Left is 1, Addresses[1..n-1] is decompressed to 16 bytes and not recompressed. Finally, the new SRH will need more room in the header, and the size is (16 - 1) * (n - 1) bytes. Here the max value of n is 255 as Segment Left is u8, so in the worst case, we have to allocate 3825 bytes in the skb headroom. However, now we only allocate a small fixed buffer that is IPV6_RPL_SRH_WORST_SWAP_SIZE (16 + 7 bytes). If the decompressed size overflows the room, skb_push() hits BUG() below [0]. Instead of allocating the fixed buffer for every packet, let's allocate enough headroom only when we receive SRH with Segment Left 1. [0]: skbuff: skb_under_panic: text:ffffffff81c9f6e2 len:576 put:576 head:ffff8880070b5180 data:ffff8880070b4fb0 tail:0x70 end:0x140 dev:lo kernel BUG at net/core/skbuff.c:200! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 154 Comm: python3 Not tainted 6.4.0-rc4-00190-gc308e9ec0047 #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:skb_panic (net/core/skbuff.c:200) Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 80 6e 77 82 e8 ad 8b 60 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 RSP: 0018:ffffc90000003da0 EFLAGS: 00000246 RAX: 0000000000000085 RBX: ffff8880058a6600 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88807dc1c540 RDI: ffff88807dc1c540 RBP: ffffc90000003e48 R08: ffffffff82b392c8 R09: 00000000ffffdfff R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888005b1c800 R13: ffff8880070b51b8 R14: ffff888005b1ca18 R15: ffff8880070b5190 FS: 00007f4539f0b740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055670baf3000 CR3: 0000000005b0e000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <IRQ> skb_push (net/core/skbuff.c:210) ipv6_rthdr_rcv (./include/linux/skbuff.h:2880 net/ipv6/exthdrs.c:634 net/ipv6/exthdrs.c:718) ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5)) ip6_input_finish (./include/linux/rcupdate.h:805 net/ipv6/ip6_input.c:483) __netif_receive_skb_one_core (net/core/dev.c:5494) process_backlog (./include/linux/rcupdate.h:805 net/core/dev.c:5934) __napi_poll (net/core/dev.c:6496) net_rx_action (net/core/dev.c:6565 net/core/dev.c:6696) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572) do_softirq (kernel/softirq.c:472 kernel/softirq.c:459) </IRQ> <TASK> __local_bh_enable_ip (kernel/softirq.c:396) __dev_queue_xmit (net/core/dev.c:4272) ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:134) rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914) sock_sendmsg (net/socket.c:724 net/socket.c:747) __sys_sendto (net/socket.c:2144) __x64_sys_sendto (net/socket.c:2156 net/socket.c:2152 net/socket.c:2152) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7f453a138aea Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffcc212a1c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007ffcc212a288 RCX: 00007f453a138aea RDX: 0000000000000060 RSI: 00007f4539084c20 RDI: 0000000000000003 RBP: 00007f4538308e80 R08: 00007ffcc212a300 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f4539712d1b </TASK> Modules linked in: Fixes: 8610c7c6e3bd ("net: ipv6: add support for rpl sr exthdr") Reported-by: Max VA Closes: https://www.interruptlabs.co.uk/articles/linux-ipv6-route-of-death Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-06wifi: mac80211: provide a helper to fetch the medium synchronization delayEmmanuel Grumbach1-0/+3
There are drivers which need this information. Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230604120651.b1043f3126e2.Iad3806f8bf8df07f52ef0a02cc3d0373c44a8c93@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-06wifi: mac80211: fetch and store the EML capability informationEmmanuel Grumbach1-0/+2
We need to teach the low level driver about the EML capability which includes information for EMLSR / EMLMR operation. Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2023-06-06gro: decrease size of CBRichard Gobert1-10/+16
The GRO control block (NAPI_GRO_CB) is currently at its maximum size. This commit reduces its size by putting two groups of fields that are used only at different times into a union. Specifically, the fields frag0 and frag0_len are the fields that make up the frag0 optimisation mechanism, which is used during the initial parsing of the SKB. The fields last and age are used after the initial parsing, while the SKB is stored in the GRO list, waiting for other packets to arrive. There was one location in dev_gro_receive that modified the frag0 fields after setting last and age. I changed this accordingly without altering the code behaviour. Signed-off-by: Richard Gobert <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/20230601161407.GA9253@debian Signed-off-by: Paolo Abeni <[email protected]>
2023-06-06Merge tag 'v6.4-rc4' into wpan-next/stagingMiquel Raynal47-686/+1318
Linux 6.4-rc4
2023-06-05Bluetooth: ISO: use correct CIS order in Set CIG Parameters eventPauli Virtanen1-1/+2
The order of CIS handle array in Set CIG Parameters response shall match the order of the CIS_ID array in the command (Core v5.3 Vol 4 Part E Sec 7.8.97). We send CIS_IDs mainly in the order of increasing CIS_ID (but with "last" CIS first if it has fixed CIG_ID). In handling of the reply, we currently assume this is also the same as the order of hci_conn in hdev->conn_hash, but that is not true. Match the correct hci_conn to the correct handle by matching them based on the CIG+CIS combination. The CIG+CIS combination shall be unique for ISO_LINK hci_conn at state >= BT_BOUND, which we maintain in hci_le_set_cig_params. Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Pauli Virtanen <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2023-06-05Bluetooth: fix debugfs registrationJohan Hovold1-0/+1
Since commit ec6cef9cd98d ("Bluetooth: Fix SMP channel registration for unconfigured controllers") the debugfs interface for unconfigured controllers will be created when the controller is configured. There is however currently nothing preventing a controller from being configured multiple time (e.g. setting the device address using btmgmt) which results in failed attempts to register the already registered debugfs entries: debugfs: File 'features' in directory 'hci0' already present! debugfs: File 'manufacturer' in directory 'hci0' already present! debugfs: File 'hci_version' in directory 'hci0' already present! ... debugfs: File 'quirk_simultaneous_discovery' in directory 'hci0' already present! Add a controller flag to avoid trying to register the debugfs interface more than once. Fixes: ec6cef9cd98d ("Bluetooth: Fix SMP channel registration for unconfigured controllers") Cc: [email protected] # 4.0 Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2023-06-05Bluetooth: hci_sync: add lock to protect HCI_UNREGISTERZhengping Jiang1-0/+1
When the HCI_UNREGISTER flag is set, no jobs should be scheduled. Fix potential race when HCI_UNREGISTER is set after the flag is tested in hci_cmd_sync_queue. Fixes: 0b94f2651f56 ("Bluetooth: hci_sync: Fix queuing commands when HCI_UNREGISTER is set") Signed-off-by: Zhengping Jiang <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2023-06-02net/ipv6: convert skip_notify_on_dev_down sysctl to u8Eric Dumazet1-1/+1
Save a bit a space, and could help future sysctls to use the same pattern. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Acked-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02net/ipv6: fix bool/int mismatch for skip_notify_on_dev_downEric Dumazet1-1/+1
skip_notify_on_dev_down ctl table expects this field to be an int (4 bytes), not a bool (1 byte). Because proc_dou8vec_minmax() was added in 5.13, this patch converts skip_notify_on_dev_down to an int. Following patch then converts the field to u8 and use proc_dou8vec_minmax(). Fixes: 7c6bb7d2faaf ("net/ipv6: Add knob to skip DELROUTE message on device down") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Acked-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02ipv4: Drop tos parameter from flowi4_update_output()Guillaume Nault2-6/+3
Callers of flowi4_update_output() never try to update ->flowi4_tos: * ip_route_connect() updates ->flowi4_tos with its own current value. * ip_route_newports() has two users: tcp_v4_connect() and dccp_v4_connect. Both initialise fl4 with ip_route_connect(), which in turn sets ->flowi4_tos with RT_TOS(inet_sk(sk)->tos) and ->flowi4_scope based on SOCK_LOCALROUTE. Then ip_route_newports() updates ->flowi4_tos with RT_CONN_FLAGS(sk), which is the same as RT_TOS(inet_sk(sk)->tos), unless SOCK_LOCALROUTE is set on the socket. In that case, the lowest order bit is set to 1, to eventually inform ip_route_output_key_hash() to restrict the scope to RT_SCOPE_LINK. This is equivalent to properly setting ->flowi4_scope as ip_route_connect() did. * ip_vs_xmit.c initialises ->flowi4_tos with memset(0), then calls flowi4_update_output() with tos=0. * sctp_v4_get_dst() uses the same RT_CONN_FLAGS_TOS() when initialising ->flowi4_tos and when calling flowi4_update_output(). In the end, ->flowi4_tos never changes. So let's just drop the tos parameter. This will simplify the conversion of ->flowi4_tos from __u8 to dscp_t. Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-02net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294Akihiro Suda1-5/+1
With this commit, all the GIDs ("0 4294967294") can be written to the "net.ipv4.ping_group_range" sysctl. Note that 4294967295 (0xffffffff) is an invalid GID (see gid_valid() in include/linux/uidgid.h), and an attempt to register this number will cause -EINVAL. Prior to this commit, only up to GID 2147483647 could be covered. Documentation/networking/ip-sysctl.rst had "0 4294967295" as an example value, but this example was wrong and causing -EINVAL. Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Co-developed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Akihiro Suda <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-01devlink: bring port new reply backJiri Pirko1-1/+3
In the offending fixes commit I mistakenly removed the reply message of the port new command. I was under impression it is a new port notification, partly due to the "notify" in the name of the helper function. Bring the code sending reply with new port message back, this time putting it directly to devlink_nl_cmd_port_new_doit() Fixes: c496daeb8630 ("devlink: remove duplicate port notification") Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-01neighbour: fix unaligned access to pneigh_entryQingfang DENG1-1/+1
After the blamed commit, the member key is longer 4-byte aligned. On platforms that do not support unaligned access, e.g., MIPS32R2 with unaligned_action set to 1, this will trigger a crash when accessing an IPv6 pneigh_entry, as the key is cast to an in6_addr pointer. Change the type of the key to u32 to make it aligned. Fixes: 62dd93181aaa ("[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.") Signed-off-by: Qingfang DENG <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-2/+5
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/sfc/tc.c 622ab656344a ("sfc: fix error unwinds in TC offload") b6583d5e9e94 ("sfc: support TC decap rules matching on enc_src_port") net/mptcp/protocol.c 5b825727d087 ("mptcp: add annotations around msk->subflow accesses") e76c8ef5cc5b ("mptcp: refactor mptcp_stream_accept()") Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-01RDMA/mana_ib: Use v2 version of cfg_rx_steer_req to enable RX coalescingLong Li1-1/+3
With RX coalescing, one CQE entry can be used to indicate multiple packets on the receive queue. This saves processing time and PCI bandwidth over the CQ. The MANA Ethernet driver also uses the v2 version of the protocol. It doesn't use RX coalescing and its behavior is not changed. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Long Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-06-01tcp: fix mishandling when the sack compression is deferred.fuyuanli1-0/+1
In this patch, we mainly try to handle sending a compressed ack correctly if it's deferred. Here are more details in the old logic: When sack compression is triggered in the tcp_compressed_ack_kick(), if the sock is owned by user, it will set TCP_DELACK_TIMER_DEFERRED and then defer to the release cb phrase. Later once user releases the sock, tcp_delack_timer_handler() should send a ack as expected, which, however, cannot happen due to lack of ICSK_ACK_TIMER flag. Therefore, the receiver would not sent an ack until the sender's retransmission timeout. It definitely increases unnecessary latency. Fixes: 5d9f4262b7ea ("tcp: add SACK compression") Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: fuyuanli <[email protected]> Signed-off-by: Jason Xing <[email protected]> Link: https://lore.kernel.org/netdev/20230529113804.GA20300@didi-ThinkCentre-M920t-N000/ Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/20230531080150.GA20424@didi-ThinkCentre-M920t-N000 Signed-off-by: Paolo Abeni <[email protected]>
2023-05-31net/sched: taprio: add netlink reporting for offload statistics countersVladimir Oltean1-8/+39
Offloading drivers may report some additional statistics counters, some of them even suggested by 802.1Q, like TransmissionOverrun. In my opinion we don't have to limit ourselves to reporting counters only globally to the Qdisc/interface, especially if the device has more detailed reporting (per traffic class), since the more detailed info is valuable for debugging and can help identifying who is exceeding its time slot. But on the other hand, some devices may not be able to report both per TC and global stats. So we end up reporting both ways, and use the good old ethtool_put_stat() strategy to determine which statistics are supported by this NIC. Statistics which aren't set are simply not reported to netlink. For this reason, we need something dynamic (a nlattr nest) to be reported through TCA_STATS_APP, and not something daft like the fixed-size and inextensible struct tc_codel_xstats. A good model for xstats which are a nlattr nest rather than a fixed struct seems to be cake. # Global stats $ tc -s qdisc show dev eth0 root # Per-tc stats $ tc -s class show dev eth0 Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Vinicius Costa Gomes <[email protected]> Signed-off-by: David S. Miller <[email protected]>