aboutsummaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2024-06-25af_unix: Remove U_LOCK_DIAG.Kuniyuki Iwashima1-1/+0
sk_diag_dump_icons() acquires embryo's lock by unix_state_lock_nested() to fetch its peer. The embryo's ->peer is set to NULL only when its parent listener is close()d. Then, unix_release_sock() is called for each embryo after unlinking skb by skb_dequeue(). In sk_diag_dump_icons(), we hold the parent's recvq lock, so we need not acquire unix_state_lock_nested(), and peer is always non-NULL. Let's remove unnecessary unix_state_lock_nested() and non-NULL test for peer. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-06-25af_unix: Define locking order for U_LOCK_SECOND in unix_stream_connect().Kuniyuki Iwashima1-1/+0
While a SOCK_(STREAM|SEQPACKET) socket connect()s to another, we hold two locks of them by unix_state_lock() and unix_state_lock_nested() in unix_stream_connect(). Before unix_state_lock_nested(), the following is guaranteed by checking sk->sk_state: 1. The first socket is TCP_LISTEN 2. The second socket is not the first one 3. Simultaneous connect() must fail So, the client state can be TCP_CLOSE or TCP_LISTEN or TCP_ESTABLISHED. Let's define the expected states as unix_state_lock_cmp_fn() instead of using unix_state_lock_nested(). Note that 2. is detected by debug_spin_lock_before() and 3. cannot be expressed as lock_cmp_fn. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-06-25xfrm: Fix unregister netdevice hang on hardware offload.Steffen Klassert1-26/+10
When offloading xfrm states to hardware, the offloading device is attached to the skbs secpath. If a skb is free is deferred, an unregister netdevice hangs because the netdevice is still refcounted. Fix this by removing the netdevice from the xfrm states when the netdevice is unregistered. To find all xfrm states that need to be cleared we add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Steffen Klassert <[email protected]>
2024-06-24seg6: Use nested-BH locking for seg6_bpf_srh_states.Sebastian Andrzej Siewior1-0/+1
The access to seg6_bpf_srh_states is protected by disabling preemption. Based on the code, the entry point is input_action_end_bpf() and every other function (the bpf helper functions bpf_lwt_seg6_*()), that is accessing seg6_bpf_srh_states, should be called from within input_action_end_bpf(). input_action_end_bpf() accesses seg6_bpf_srh_states first at the top of the function and then disables preemption. This looks wrong because if preemption needs to be disabled as part of the locking mechanism then the variable shouldn't be accessed beforehand. Looking at how it is used via test_lwt_seg6local.sh then input_action_end_bpf() is always invoked from softirq context. If this is always the case then the preempt_disable() statement is superfluous. If this is not always invoked from softirq then disabling only preemption is not sufficient. Replace the preempt_disable() statement with nested-BH locking. This is not an equivalent replacement as it assumes that the invocation of input_action_end_bpf() always occurs in softirq context and thus the preempt_disable() is superfluous. Add a local_lock_t the data structure and use local_lock_nested_bh() for locking. Add lockdep_assert_held() to ensure the lock is held while the per-CPU variable is referenced in the helper functions. Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: David Ahern <[email protected]> Cc: Hao Luo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Fastabend <[email protected]> Cc: KP Singh <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Song Liu <[email protected]> Cc: Stanislav Fomichev <[email protected]> Cc: Yonghong Song <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-24net/ipv4: Use nested-BH locking for ipv4_tcp_sk.Sebastian Andrzej Siewior1-0/+5
ipv4_tcp_sk is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a sock member (original ipv4_tcp_sk) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: David Ahern <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+3
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c 1e7962114c10 ("bnxt_en: Restore PTP tx_avail count in case of skb_pad() error") 165f87691a89 ("bnxt_en: add timestamping statistics support") No adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-19netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter coreJianguo Wu1-0/+3
Currently, the sysctl net.netfilter.nf_hooks_lwtunnel depends on the nf_conntrack module, but the nf_conntrack module is not always loaded. Therefore, accessing net.netfilter.nf_hooks_lwtunnel may have an error. Move sysctl nf_hooks_lwtunnel into the netfilter core. Fixes: 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane") Suggested-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Jianguo Wu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-06-18net: mana: Add support for page sizes other than 4KB on ARM64Haiyang Zhang2-2/+11
As defined by the MANA Hardware spec, the queue size for DMA is 4KB minimal, and power of 2. And, the HWC queue size has to be exactly 4KB. To support page sizes other than 4KB on ARM64, define the minimal queue size as a macro separately from the PAGE_SIZE, which we always assumed it to be 4KB before supporting ARM64. Also, add MANA specific macros and update code related to size alignment, DMA region calculations, etc. Signed-off-by: Haiyang Zhang <[email protected]> Reviewed-by: Michael Kelley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-16Merge branch 'mana-shared' of ↵Leon Romanovsky8-21/+32
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Leon Romanovsky says: ==================== net: mana: Allow variable size indirection table Like we talked, I created new shared branch for this patch: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=mana-shared * 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: net: mana: Allow variable size indirection table ==================== Link: https://lore.kernel.org/all/20240612183051.GE4966@unreal Signed-off-by: Leon Romanovsky <[email protected]>
2024-06-13Merge branch 'mana-shared' of ↵Jakub Kicinski2-5/+8
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Leon Romanovsky says: ==================== net: mana: Allow variable size indirection table Like we talked, I created new shared branch for this patch: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=mana-shared * 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: net: mana: Allow variable size indirection table ==================== Link: https://lore.kernel.org/all/20240612183051.GE4966@unreal Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-6/+35
Cross-merge networking fixes after downstream PR. No conflicts, no adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-12flow_offload: add encapsulation control flag helpersAsbjørn Sloth Tønnesen1-0/+35
This patch adds two new helper functions: flow_rule_is_supp_enc_control_flags() flow_rule_has_enc_control_flags() They are intended to be used for validating encapsulation control flags, and compliment the similar helpers without "enc_" in the name. The only difference is that they have their own error message, to make it obvious if an unsupported flag error is related to FLOW_DISSECTOR_KEY_CONTROL or FLOW_DISSECTOR_KEY_ENC_CONTROL. flow_rule_has_enc_control_flags() is for drivers supporting FLOW_DISSECTOR_KEY_ENC_CONTROL, but not supporting any encapsulation control flags. (Currently all 4 drivers fits this category) flow_rule_is_supp_enc_control_flags() is currently only used for the above helper, but should also be used by drivers once they implement at least one encapsulation control flag. There is AFAICT currently no need for an "enc_" variant of flow_rule_match_has_control_flags(), as all drivers currently supporting FLOW_DISSECTOR_KEY_ENC_CONTROL, are already calling flow_rule_match_enc_control() directly. Only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <[email protected]> Reviewed-by: Davide Caratti <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-12net: ipv4: Add a sysctl to set multipath hash seedPetr Machata3-1/+32
When calculating hashes for the purpose of multipath forwarding, both IPv4 and IPv6 code currently fall back on flow_hash_from_keys(). That uses a randomly-generated seed. That's a fine choice by default, but unfortunately some deployments may need a tighter control over the seed used. In this patch, make the seed configurable by adding a new sysctl key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used specifically for multipath forwarding and not for the other concerns that flow_hash_from_keys() is used for, such as queue selection. Expose the knob as sysctl because other such settings, such as headers to hash, are also handled that way. Like those, the multipath hash seed is a per-netns variable. Despite being placed in the net.ipv4 namespace, the multipath seed sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of TCP variables. The seed used by flow_hash_from_keys() is a 128-bit quantity. However it seems that usually the seed is a much more modest value. 32 bits seem typical (Cisco, Cumulus), some systems go even lower. For that reason, and to decouple the user interface from implementation details, go with a 32-bit quantity, which is then quadruplicated to form the siphash key. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-12net: ipv4,ipv6: Pass multipath hash computation through a helperPetr Machata1-0/+7
The following patches will add a sysctl to control multipath hash seed. In order to centralize the hash computation, add a helper, fib_multipath_hash_from_keys(), and have all IPv4 and IPv6 route.c invocations of flow_hash_from_keys() go through this helper instead. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-12net: mana: Allow variable size indirection tableShradha Gupta2-5/+8
Allow variable size indirection table allocation in MANA instead of using a constant value MANA_INDIRECT_TABLE_SIZE. The size is now derived from the MANA_QUERY_VPORT_CONFIG and the indirection table is allocated dynamically. Signed-off-by: Shradha Gupta <[email protected]> Link: https://lore.kernel.org/r/1718015319-9609-1-git-send-email-shradhagupta@linux.microsoft.com Reviewed-by: Dexuan Cui <[email protected]> Reviewed-by: Haiyang Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-06-12wifi: cfg80211: add regulatory flag to allow VLP AP operationJohannes Berg1-0/+6
Add a regulatory flag to allow VLP AP operation even on channels otherwise marked NO_IR, which may be possible in some regulatory domains/countries. Note that this requires checking also when the beacon is changed, since that may change the regulatory power type. Reviewed-by: Miriam Rachel Korenblit <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Link: https://msgid.link/20240523120945.63792ce19790.Ie2a02750d283b78fbf3c686b10565fb0388889e2@changeid Signed-off-by: Johannes Berg <[email protected]>
2024-06-12wifi: cfg80211: refactor regulatory beaconing checkingJohannes Berg1-6/+48
There are two functions exported now, with different settings, refactor to just export a single function that take a struct with different settings. This will make it easier to add more parameters. Reviewed-by: Miriam Rachel Korenblit <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Link: https://msgid.link/20240523120945.d44c34dadfc2.I59b4403108e0dbf7fc6ae8f7522e1af520cffb1c@changeid Signed-off-by: Johannes Berg <[email protected]>
2024-06-12wifi: cfg80211: move enum ieee80211_ap_reg_power to cfg80211Johannes Berg1-0/+15
This really shouldn't have been in ieee80211.h, since it doesn't directly represent the spec. Move it to cfg80211 rather than mac80211 since upcoming changes will use it there. Reviewed-by: Miriam Rachel Korenblit <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Link: https://msgid.link/20240523120945.962b16c831cd.I5745962525b1b176c5b90d37b3720fc100eee406@changeid Signed-off-by: Johannes Berg <[email protected]>
2024-06-12wifi: cfg80211: use BIT() for flag enumsJohannes Berg1-51/+51
Use BIT(x) instead of 1<<x, in part because it's mostly missing spaces anyway, in part because it reads nicer. Reviewed-by: Miriam Rachel Korenblit <[email protected]> Reviewed-by: Ilan Peer <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Link: https://msgid.link/20240523120945.c21598fbf49c.Ib8f26c5e9f508aee19fdfa1fd4b5995f084c46d4@changeid Signed-off-by: Johannes Berg <[email protected]>
2024-06-12net/tcp: Remove tcp_hash_fail()Dmitry Safonov1-37/+0
Now there are tracepoints, that cover all functionality of tcp_hash_fail(), but also wire up missing places They are also faster, can be disabled and provide filtering. This potentially may create a regression if a userspace depends on dmesg logs. Fingers crossed, let's see if anyone complains in reality. Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-12net/tcp: Move tcp_inbound_hash() from headersDmitry Safonov1-74/+4
Two reasons: 1. It's grown up enough 2. In order to not do header spaghetti by including <trace/events/tcp.h>, which is necessary for TCP tracepoints. While at it, unexport and make static tcp_inbound_ao_hash(). Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-12net/tcp: Add a helper tcp_ao_hdr_maclen()Dmitry Safonov1-0/+5
It's going to be used more in TCP-AO tracepoints. Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-12net/tcp: Use static_branch_tcp_{md5,ao} to drop ifdefsDmitry Safonov1-10/+4
It's possible to clean-up some ifdefs by hiding that tcp_{md5,ao}_needed static branch is defined and compiled only under related configs, since commit 4c8530dc7d7d ("net/tcp: Only produce AO/MD5 logs if there are any keys"). Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Dmitry Safonov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-10Merge tag 'wireless-next-2024-06-07' of ↵Jakub Kicinski4-11/+49
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.11 The first "new features" pull request for v6.11 with changes both in stack and in drivers. Nothing out of ordinary, except that we have two conflicts this time: net/mac80211/cfg.c https://lore.kernel.org/all/[email protected] drivers/net/wireless/microchip/wilc1000/netdev.c https://lore.kernel.org/all/[email protected] Major changes: cfg80211/mac80211 * parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers wilc1000 * read MAC address during probe to make it visible to user space iwlwifi * bump FW API to 91 for BZ/SC devices * report 64-bit radiotap timestamp * enable P2P low latency by default * handle Transmit Power Envelope (TPE) advertised by AP * start using guard() rtlwifi * RTL8192DU support ath12k * remove unsupported tx monitor handling * channel 2 in 6 GHz band support * Spatial Multiplexing Power Save (SMPS) in 6 GHz band support * multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA) support * dynamic VLAN support * add panic handler for resetting the firmware state ath10k * add qcom,no-msa-ready-indicator Device Tree property * LED support for various chipsets * tag 'wireless-next-2024-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (194 commits) wifi: ath12k: add hw_link_id in ath12k_pdev wifi: ath12k: add panic handler wifi: rtw89: chan: Use swap() in rtw89_swap_sub_entity() wifi: brcm80211: remove unused structs wifi: brcm80211: use sizeof(*pointer) instead of sizeof(type) wifi: ath12k: do not process consecutive RDDM event dt-bindings: net: wireless: ath11k: Drop "qcom,ipq8074-wcss-pil" from example wifi: ath12k: fix memory leak in ath12k_dp_rx_peer_frag_setup() wifi: rtlwifi: handle return value of usb init TX/RX wifi: rtlwifi: Enable the new rtl8192du driver wifi: rtlwifi: Add rtl8192du/sw.c wifi: rtlwifi: Constify rtl_hal_cfg.{ops,usb_interface_cfg} and rtl_priv.cfg wifi: rtlwifi: Add rtl8192du/dm.{c,h} wifi: rtlwifi: Add rtl8192du/fw.{c,h} and rtl8192du/led.{c,h} wifi: rtlwifi: Add rtl8192du/rf.{c,h} wifi: rtlwifi: Add rtl8192du/trx.{c,h} wifi: rtlwifi: Add rtl8192du/phy.{c,h} wifi: rtlwifi: Add rtl8192du/hw.{c,h} wifi: rtlwifi: Add new members to struct rtl_priv for RTL8192DU wifi: rtlwifi: Add rtl8192du/table.{c,h} ... Signed-off-by: Jakub Kicinski <[email protected]> ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-10Bluetooth: L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQLuiz Augusto von Dentz1-4/+32
This removes the bogus check for max > hcon->le_conn_max_interval since the later is just the initial maximum conn interval not the maximum the stack could support which is really 3200=4000ms. In order to pass GAP/CONN/CPUP/BV-05-C one shall probably enter values of the following fields in IXIT that would cause hci_check_conn_params to fail: TSPX_conn_update_int_min TSPX_conn_update_int_max TSPX_conn_update_peripheral_latency TSPX_conn_update_supervision_timeout Link: https://github.com/bluez/bluez/issues/847 Fixes: e4b019515f95 ("Bluetooth: Enforce validation on max value of connection interval") Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2024-06-10geneve: Fix incorrect inner network header offset when innerprotoinherit is setGal Pressman1-2/+3
When innerprotoinherit is set, the tunneled packets do not have an inner Ethernet header. Change 'maclen' to not always assume the header length is ETH_HLEN, as there might not be a MAC header. This resolves issues with drivers (e.g. mlx5, in mlx5e_tx_tunnel_accel()) who rely on the skb inner network header offset to be correct, and use it for TX offloads. Fixes: d8a6213d70ac ("geneve: fix header validation in geneve[6]_xmit_skb") Signed-off-by: Gal Pressman <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-10tcp: move inet_twsk_schedule helper out of headerFlorian Westphal1-5/+0
Its no longer used outside inet_timewait_sock.c, so move it there. Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-10net: tcp/dccp: prepare for tw_timer un-pinningValentin Schneider1-2/+4
The TCP timewait timer is proving to be problematic for setups where scheduler CPU isolation is achieved at runtime via cpusets (as opposed to statically via isolcpus=domains). What happens there is a CPU goes through tcp_time_wait(), arming the time_wait timer, then gets isolated. TCP_TIMEWAIT_LEN later, the timer fires, causing interference for the now-isolated CPU. This is conceptually similar to the issue described in commit e02b93124855 ("workqueue: Unbind kworkers before sending them to exit()") Move inet_twsk_schedule() to within inet_twsk_hashdance(), with the ehash lock held. Expand the lock's critical section from inet_twsk_kill() to inet_twsk_deschedule_put(), serializing the scheduling vs descheduling of the timer. IOW, this prevents the following race: tcp_time_wait() inet_twsk_hashdance() inet_twsk_deschedule_put() del_timer_sync() inet_twsk_schedule() Thanks to Paolo Abeni for suggesting to leverage the ehash lock. This also restores a comment from commit ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait hashdance") as inet_twsk_hashdance() had a "Step 1" and "Step 3" comment, but the "Step 2" had gone missing. inet_twsk_deschedule_put() now acquires the ehash spinlock to synchronize with inet_twsk_hashdance_schedule(). To ease possible regression search, actual un-pin is done in next patch. Link: https://lore.kernel.org/all/[email protected]/ Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Valentin Schneider <[email protected]> Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-09RDMA/mana_ib: Process QP error events in mana_ibKonstantin Taranov1-0/+1
Process QP fatal events from the error event queue. For that, find the QP, using QPN from the event, and then call its event_handler. To find the QPs, store created RC QPs in an xarray. Signed-off-by: Konstantin Taranov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Wei Hu <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2024-06-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-3/+5
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/pensando/ionic/ionic_txrx.c d9c04209990b ("ionic: Mark error paths in the data path as unlikely") 491aee894a08 ("ionic: fix kernel panic in XDP_TX action") net/ipv6/ip6_fib.c b4cb4a1391dc ("net: use unrcu_pointer() helper") b01e1c030770 ("ipv6: fix possible race in __fib6_drop_pcpu_from()") Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-06tcp: move reqsk_alloc() to inet_connection_sock.cEric Dumazet1-33/+0
reqsk_alloc() has a single caller, no need to expose it in include/net/request_sock.h. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-06-06tcp: small changes in reqsk_put() and reqsk_free()Eric Dumazet1-2/+2
In reqsk_free(), use DEBUG_NET_WARN_ON_ONCE() instead of WARN_ON_ONCE() for a condition which never fired. In reqsk_put() directly call __reqsk_free(), there is no point checking rsk_refcnt again right after a transition to zero. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-06-06net: use unrcu_pointer() helperEric Dumazet1-1/+1
Toke mentioned unrcu_pointer() existence, allowing to remove some of the ugly casts we have when using xchg() for rcu protected pointers. Also make inet_rcv_compat const. Signed-off-by: Eric Dumazet <[email protected]> Cc: Toke Høiland-Jørgensen <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-06-05tcp: add sysctl_tcp_rto_min_usKevin Yang1-0/+1
Adding a sysctl knob to allow user to specify a default rto_min at socket init time, other than using the hard coded 200ms default rto_min. Note that the rto_min route option has the highest precedence for configuring this setting, followed by the TCP_BPF_RTO_MIN socket option, followed by the tcp_rto_min_us sysctl. Signed-off-by: Kevin Yang <[email protected]> Reviewed-by: Neal Cardwell <[email protected]> Reviewed-by: Yuchung Cheng <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Tony Lu <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05rtnetlink: make the "split" NLM_DONE handling genericJakub Kicinski1-0/+1
Jaroslav reports Dell's OMSA Systems Management Data Engine expects NLM_DONE in a separate recvmsg(), both for rtnl_dump_ifinfo() and inet_dump_ifaddr(). We already added a similar fix previously in commit 460b0d33cf10 ("inet: bring NLM_DONE out to a separate recv() again") Instead of modifying all the dump handlers, and making them look different than modern for_each_netdev_dump()-based dump handlers - put the workaround in rtnetlink code. This will also help us move the custom rtnl-locking from af_netlink in the future (in net-next). Note that this change is not touching rtnl_dump_all(). rtnl_dump_all() is different kettle of fish and a potential problem. We now mix families in a single recvmsg(), but NLM_DONE is not coalesced. Tested: ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_addr.yaml \ --dump getaddr --json '{"ifa-family": 2}' ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \ --dump getroute --json '{"rtm-family": 2}' ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_link.yaml \ --dump getlink Fixes: 3e41af90767d ("rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()") Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()") Reported-by: Jaroslav Pulchart <[email protected]> Link: https://lore.kernel.org/all/CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXKE0g@mail.gmail.com Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05devlink: Constify the 'table_ops' parameter of devl_dpipe_table_register()Christophe JAILLET1-2/+2
"struct devlink_dpipe_table_ops" only contains some function pointers. Update "struct devlink_dpipe_table" and the 'table_ops' parameter of devl_dpipe_table_register() so that structures in drivers can be constified. Constifying these structures will move some data to a read-only section, so increase overall security. Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05net: caif: remove unused structsDr. David Alan Gilbert1-2/+0
'cfpktq' has been unused since commit 73d6ac633c6c ("caif: code cleanup"). 'caif_packet_funcs' is declared but never defined. Remove both of them. Signed-off-by: Dr. David Alan Gilbert <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-05net: remove NULL-pointer net parameter in ip_metrics_convertJason Xing2-3/+2
When I was doing some experiments, I found that when using the first parameter, namely, struct net, in ip_metrics_convert() always triggers NULL pointer crash. Then I digged into this part, realizing that we can remove this one due to its uselessness. Signed-off-by: Jason Xing <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-06-04tcp: add a helper for setting EOR on tail skbJakub Kicinski1-0/+9
TLS (and hopefully soon PSP will) use EOR to prevent skbs with different decrypted state from getting merged, without adding new tests to the skb handling. In both cases once the connection switches to an "encrypted" state, all subsequent skbs will be encrypted, so a single "EOR fence" is sufficient to prevent mixing. Add a helper for setting the EOR bit, to make this arrangement more explicit. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-06-04tcp: wrap mptcp and decrypted checks into tcp_skb_can_collapse_rx()Jakub Kicinski1-0/+7
tcp_skb_can_collapse() checks for conditions which don't make sense on input. Because of this we ended up sprinkling a few pairs of mptcp_skb_can_collapse() and skb_cmp_decrypted() calls on the input path. Group them in a new helper. This should make it less likely that someone will check mptcp and not decrypted or vice versa when adding new code. This implicitly adds a decrypted check early in tcp_collapse(). AFAIU this will very slightly increase our ability to collapse packets under memory pressure, not a real bug. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Matthieu Baerts (NGI0) <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-06-04flow_dissector: add support for tunnel control flagsDavide Caratti2-0/+21
Dissect [no]csum, [no]dontfrag, [no]oam, [no]crit flags from skb metadata. This is a prerequisite for matching these control flags using TC flower. Suggested-by: Ilya Maximets <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-06-01net/tcp: Don't consider TCP_CLOSE in TCP_AO_ESTABLISHEDDmitry Safonov1-3/+4
TCP_CLOSE may or may not have current/rnext keys and should not be considered "established". The fast-path for TCP_CLOSE is SKB_DROP_REASON_TCP_CLOSE. This is what tcp_rcv_state_process() does anyways. Add an early drop path to not spend any time verifying segment signatures for sockets in TCP_CLOSE state. Cc: [email protected] # v6.7 Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments") Signed-off-by: Dmitry Safonov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-01net: qstat: extend kdoc about get_base_statsJakub Kicinski1-0/+2
mlx5 has a dedicated queue for PTP packets. Clarify that this sort of queues can also be accounted towards the base. Reviewed-by: Joe Damato <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-13/+19
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_classifier.c abd5576b9c57 ("net: ti: icssg-prueth: Add support for ICSSG switch firmware") 56a5cf538c3f ("net: ti: icssg-prueth: Fix start counter for ft1 filter") https://lore.kernel.org/all/[email protected]/ No other adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-30ipv6: sr: restruct ifdefinesHangbin Liu2-0/+14
There are too many ifdef in IPv6 segment routing code that may cause logic problems. like commit 160e9d275218 ("ipv6: sr: fix invalid unregister error path"). To avoid this, the init functions are redefined for both cases. The code could be more clear after all fidefs are removed. Suggested-by: Simon Horman <[email protected]> Suggested-by: David Ahern <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-29net: dsa: remove mac_prepare()/mac_finish() shimsRussell King (Oracle)1-6/+0
No DSA driver makes use of the mac_prepare()/mac_finish() shimmed operations anymore, so we can remove these. Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-29net: fix __dst_negative_advice() raceEric Dumazet2-11/+4
__dst_negative_advice() does not enforce proper RCU rules when sk->dst_cache must be cleared, leading to possible UAF. RCU rules are that we must first clear sk->sk_dst_cache, then call dst_release(old_dst). Note that sk_dst_reset(sk) is implementing this protocol correctly, while __dst_negative_advice() uses the wrong order. Given that ip6_negative_advice() has special logic against RTF_CACHE, this means each of the three ->negative_advice() existing methods must perform the sk_dst_reset() themselves. Note the check against NULL dst is centralized in __dst_negative_advice(), there is no need to duplicate it in various callbacks. Many thanks to Clement Lecigne for tracking this issue. This old bug became visible after the blamed commit, using UDP sockets. Fixes: a87cb3e48ee8 ("net: Facility to report route quality of connected sockets") Reported-by: Clement Lecigne <[email protected]> Diagnosed-by: Clement Lecigne <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Tom Herbert <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-29tcp: add tcp_done_with_error() helperEric Dumazet1-0/+1
tcp_reset() ends with a sequence that is carefuly ordered. We need to fix [e]poll bugs in the following patches, it makes sense to use a common helper. Suggested-by: Neal Cardwell <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Neal Cardwell <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-29wifi: nl80211: clean up coalescing rule handlingJohannes Berg1-1/+1
There's no need to allocate a tiny struct and then an array again, just allocate the two together and use __counted_by(). Also unify the freeing. Reviewed-by: Miriam Rachel Korenblit <[email protected]> Link: https://msgid.link/20240523120213.48a40cfb96f9.Ia02bf8f8fefbf533c64c5fa26175848d4a3a7899@changeid Signed-off-by: Johannes Berg <[email protected]>
2024-05-28Merge tag 'for-netdev' of ↵Jakub Kicinski1-2/+2
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-05-28 We've added 23 non-merge commits during the last 11 day(s) which contain a total of 45 files changed, 696 insertions(+), 277 deletions(-). The main changes are: 1) Rename skb's mono_delivery_time to tstamp_type for extensibility and add SKB_CLOCK_TAI type support to bpf_skb_set_tstamp(), from Abhishek Chauhan. 2) Add netfilter CT zone ID and direction to bpf_ct_opts so that arbitrary CT zones can be used from XDP/tc BPF netfilter CT helper functions, from Brad Cowie. 3) Several tweaks to the instruction-set.rst IETF doc to address the Last Call review comments, from Dave Thaler. 4) Small batch of riscv64 BPF JIT optimizations in order to emit more compressed instructions to the JITed image for better icache efficiency, from Xiao Wang. 5) Sort bpftool C dump output from BTF, aiming to simplify vmlinux.h diffing and forcing more natural type definitions ordering, from Mykyta Yatsenko. 6) Use DEV_STATS_INC() macro in BPF redirect helpers to silence a syzbot/KCSAN race report for the tx_errors counter, from Jiang Yunshui. 7) Un-constify bpf_func_info in bpftool to fix compilation with LLVM 17+ which started treating const structs as constants and thus breaking full BTF program name resolution, from Ivan Babrou. 8) Fix up BPF program numbers in test_sockmap selftest in order to reduce some of the test-internal array sizes, from Geliang Tang. 9) Small cleanup in Makefile.btf script to use test-ge check for v1.25-only pahole, from Alan Maguire. 10) Fix bpftool's make dependencies for vmlinux.h in order to avoid needless rebuilds in some corner cases, from Artem Savkov. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits) bpf, net: Use DEV_STAT_INC() bpf, docs: Fix instruction.rst indentation bpf, docs: Clarify call local offset bpf, docs: Add table captions bpf, docs: clarify sign extension of 64-bit use of 32-bit imm bpf, docs: Use RFC 2119 language for ISA requirements bpf, docs: Move sentence about returning R0 to abi.rst bpf: constify member bpf_sysctl_kern:: Table riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT riscv, bpf: Use STACK_ALIGN macro for size rounding up riscv, bpf: Optimize zextw insn with Zba extension selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets net: Add additional bit to support clockid_t timestamp type net: Rename mono_delivery_time to tstamp_type for scalabilty selftests/bpf: Update tests for new ct zone opts for nf_conntrack kfuncs net: netfilter: Make ct zone opts configurable for bpf ct helpers selftests/bpf: Fix prog numbers in test_sockmap bpf: Remove unused variable "prev_state" bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer bpf: Fix order of args in call to bpf_map_kvcalloc ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>