aboutsummaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2023-09-25wifi: mac80211: allow for_each_sta_active_link() under RCUJohannes Berg1-1/+1
Since we only use this to protect the dereference and with STA mutex, we can also allow this with just RCU. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230920211508.73c3e04985f4.I52ef396d693e0e381a73eade06850137d8900948@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-09-25wifi: mac80211: relax RCU check in for_each_vif_active_link()Johannes Berg1-1/+1
To iterate the vif links we don't necessarily need to be in an RCU critical section, it's also possible to hold the sdata/wdev mutex. Annotate for_each_vif_active_link() accordingly. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230920211508.858921bd2860.I01f456be8ce2a4fbd15e0d44302e2f7d72e91987@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-09-25wifi: cfg80211: split struct cfg80211_ap_settingsJohannes Berg1-1/+17
Using the full struct cfg80211_ap_settings for an update is misleading, since most settings cannot be updated. Split the update case off into a new struct cfg80211_ap_update. Change-Id: I3ba4dd9280938ab41252f145227a7005edf327e4 Signed-off-by: Johannes Berg <[email protected]>
2023-09-25wifi: mac80211: ethtool: always hold wiphy mutexJohannes Berg1-0/+4
Drivers should really be able to rely on the wiphy mutex being held all the time, unless otherwise documented. For ethtool, that wasn't quite right. Fix and clarify this in both code and documentation. Reported-by: [email protected] Fixes: 0e8185ce1dde ("wifi: mac80211: check wiphy mutex in ops") Signed-off-by: Johannes Berg <[email protected]>
2023-09-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni1-3/+4
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <[email protected]>
2023-09-20Bluetooth: hci_core: Fix build warningsLuiz Augusto von Dentz1-1/+1
This fixes the following warnings: net/bluetooth/hci_core.c: In function ‘hci_register_dev’: net/bluetooth/hci_core.c:2620:54: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 5 [-Wformat-truncation=] 2620 | snprintf(hdev->name, sizeof(hdev->name), "hci%d", id); | ^~ net/bluetooth/hci_core.c:2620:50: note: directive argument in the range [0, 2147483647] 2620 | snprintf(hdev->name, sizeof(hdev->name), "hci%d", id); | ^~~~~~~ net/bluetooth/hci_core.c:2620:9: note: ‘snprintf’ output between 5 and 14 bytes into a destination of size 8 2620 | snprintf(hdev->name, sizeof(hdev->name), "hci%d", id); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2023-09-20netfilter: nf_tables: fix memleak when more than 255 elements expiredFlorian Westphal1-1/+1
When more than 255 elements expired we're supposed to switch to a new gc container structure. This never happens: u8 type will wrap before reaching the boundary and nft_trans_gc_space() always returns true. This means we recycle the initial gc container structure and lose track of the elements that came before. While at it, don't deref 'gc' after we've passed it to call_rcu. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Reported-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Florian Westphal <[email protected]>
2023-09-19ipv6: lockless IPV6_ADDR_PREFERENCES implementationEric Dumazet2-16/+9
We have data-races while reading np->srcprefs Switch the field to a plain byte, add READ_ONCE() and WRITE_ONCE() annotations where needed, and IPV6_ADDR_PREFERENCES setsockopt() can now be lockless. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-09-18wifi: cfg80211: save power spectral density(psd) of regulatory ruleWen Gong2-1/+6
6 GHz regulatory domains introduces Power Spectral Density (PSD). The PSD value of the regulatory rule should be taken into effect for the ieee80211_channels falling into that particular regulatory rule. Save the values in the channel which has PSD value and add nl80211 attributes accordingly to handle it. Co-developed-by: Aditya Kumar Singh <[email protected]> Signed-off-by: Aditya Kumar Singh <[email protected]> Signed-off-by: Wen Gong <[email protected]> Link: https://lore.kernel.org/r/[email protected] [use hole in chan flags, reword docs] Signed-off-by: Johannes Berg <[email protected]>
2023-09-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2-4/+17
Alexei Starovoitov says: ==================== The following pull-request contains BPF updates for your *net-next* tree. We've added 73 non-merge commits during the last 9 day(s) which contain a total of 79 files changed, 5275 insertions(+), 600 deletions(-). The main changes are: 1) Basic BTF validation in libbpf, from Andrii Nakryiko. 2) bpf_assert(), bpf_throw(), exceptions in bpf progs, from Kumar Kartikeya Dwivedi. 3) next_thread cleanups, from Oleg Nesterov. 4) Add mcpu=v4 support to arm32, from Puranjay Mohan. 5) Add support for __percpu pointers in bpf progs, from Yonghong Song. 6) Fix bpf tailcall interaction with bpf trampoline, from Leon Hwang. 7) Raise irq_work in bpf_mem_alloc while irqs are disabled to improve refill probabablity, from Hou Tao. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Alan Maguire, Andrey Konovalov, Dave Marchevsky, "Eric W. Biederman", Jiri Olsa, Maciej Fijalkowski, Quentin Monnet, Russell King (Oracle), Song Liu, Stanislav Fomichev, Yonghong Song ==================== Signed-off-by: David S. Miller <[email protected]>
2023-09-17devlink: introduce possibility to expose info about nested devlinksJiri Pirko1-0/+2
In mlx5, there is a devlink instance created for PCI device. Also, one separate devlink instance is created for auxiliary device that represents the netdev of uplink port. This relation is currently invisible to the devlink user. Benefit from the rel infrastructure and allow for nested devlink instance to set the relationship for the nested-in devlink instance. Note that there may be many nested instances, therefore use xarray to hold the list of rel_indexes for individual nested instances. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-17devlink: convert linecard nested devlink to new rel infrastructureJiri Pirko1-2/+2
Benefit from the newly introduced rel infrastructure, treat the linecard nested devlink instances in the same way as port function instances. Convert the code to use the rel infrastructure. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-17devlink: expose peer SF devlink instanceJiri Pirko1-0/+3
Introduce a new helper devl_port_fn_devlink_set() to be used by driver assigning a devlink instance to the peer devlink port function. Expose this to user over new netlink attribute nested under port function nest to expose devlink handle related to the port function. This is particularly helpful for user to understand the relationship between devlink instances created for SFs and the port functions they belong to. Note that caller of devlink_port_notify() needs to hold devlink instance lock, put the assertion to devl_port_fn_devlink_set() to make this requirement explicit. Also note the limitations that only allow to make this assignment for registered objects. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15bpf: expose information about supported xdp metadata kfuncStanislav Fomichev1-1/+4
Add new xdp-rx-metadata-features member to netdev netlink which exports a bitmask of supported kfuncs. Most of the patch is autogenerated (headers), the only relevant part is netdev.yaml and the changes in netdev-genl.c to marshal into netlink. Example output on veth: $ ip link add veth0 type veth peer name veth1 # ifndex == 12 $ ./tools/net/ynl/samples/netdev 12 Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12 veth1[12] xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0 Cc: [email protected] Cc: Willem de Bruijn <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-09-15bpf: make it easier to add new metadata kfuncStanislav Fomichev1-4/+12
No functional changes. Instead of having hand-crafted code in bpf_dev_bound_resolve_kfunc, move kfunc <> xmo handler relationship into XDP_METADATA_KFUNC_xxx. This way, any time new kfunc is added, we don't have to touch bpf_dev_bound_resolve_kfunc. Also document XDP_METADATA_KFUNC_xxx arguments since we now have more than two and it might be confusing what is what. Cc: [email protected] Cc: Willem de Bruijn <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-09-15xsk: add multi-buffer support for sockets sharing umemTirthendu Sarkar1-0/+2
Userspace applications indicate their multi-buffer capability to xsk using XSK_USE_SG socket bind flag. For sockets using shared umem the bind flag may contain XSK_USE_SG only for the first socket. For any subsequent socket the only option supported is XDP_SHARED_UMEM. Add option XDP_UMEM_SG_FLAG in umem config flags to store the multi-buffer handling capability when indicated by XSK_USE_SG option in bing flag by the first socket. Use this to derive multi-buffer capability for subsequent sockets in xsk core. Signed-off-by: Tirthendu Sarkar <[email protected]> Fixes: 81470b5c3c66 ("xsk: introduce XSK_USE_SG bind flag for xsk socket") Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-09-15Merge tag 'nf-23-09-13' of ↵David S. Miller1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf netfilter pull request 23-09-13 ==================== The following patchset contains Netfilter fixes for net: 1) Do not permit to remove rules from chain binding, otherwise double rule release is possible, triggering UaF. This rule deletion support does not make sense and userspace does not use this. Problem exists since the introduction of chain binding support. 2) rbtree GC worker only collects the elements that have expired. This operation is not destructive, therefore, turn write into read spinlock to avoid datapath contention due to GC worker run. This was not fixed in the recent GC fix batch in the 6.5 cycle. 3) pipapo set backend performs sync GC, therefore, catchall elements must use sync GC queue variant. This bug was introduced in the 6.5 cycle with the recent GC fixes. 4) Stop GC run if memory allocation fails in pipapo set backend, otherwise access to NULL pointer to GC transaction object might occur. This bug was introduced in the 6.5 cycle with the recent GC fixes. 5) rhash GC run uses an iterator that might hit EAGAIN to rewind, triggering double-collection of the same element. This bug was introduced in the 6.5 cycle with the recent GC fixes. 6) Do not permit to remove elements in anonymous sets, this type of sets are populated once and then bound to rules. This fix is similar to the chain binding patch coming first in this batch. API permits since the very beginning but it has no use case from userspace. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_FLOWINFO_SEND implementationEric Dumazet1-0/+1
np->sndflow reads are racy. Use one bit ftom atomic inet->inet_flags instead, IPV6_FLOWINFO_SEND setsockopt() can be lockless. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_MTU_DISCOVER implementationEric Dumazet1-5/+9
Most np->pmtudisc reads are racy. Move this 3bit field on a full byte, add annotations and make IPV6_MTU_DISCOVER setsockopt() lockless. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_ROUTER_ALERT_ISOLATE implementationEric Dumazet1-0/+1
Reads from np->rtalert_isolate are racy. Move this flag to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: move np->repflow to atomic flagsEric Dumazet1-0/+1
Move np->repflow to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_RECVERR implemetationEric Dumazet2-3/+2
np->recverr is moved to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_DONTFRAG implementationEric Dumazet3-4/+5
Move np->dontfrag flag to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_AUTOFLOWLABEL implementationEric Dumazet2-1/+3
Move np->autoflowlabel and np->autoflowlabel_set in inet->inet_flags, to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_MULTICAST_ALL implementationEric Dumazet1-0/+1
Move np->mc_all to an atomic flags to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_RECVERR_RFC4884 implementationEric Dumazet1-0/+1
Move np->recverr_rfc4884 to an atomic flag to fix data-races. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_MULTICAST_HOPS implementationEric Dumazet1-1/+1
This fixes data-races around np->mcast_hops, and make IPV6_MULTICAST_HOPS lockless. Note that np->mcast_hops is never negative, thus can fit an u8 field instead of s16. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_MULTICAST_LOOP implementationEric Dumazet2-1/+2
Add inet6_{test|set|clear|assign}_bit() helpers. Note that I am using bits from inet->inet_flags, this might change in the future if we need more flags. While solving data-races accessing np->mc_loop, this patch also allows to implement lockless accesses to np->mcast_hops in the following patch. Also constify sk_mc_loop() argument. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-15ipv6: lockless IPV6_UNICAST_HOPS implementationEric Dumazet1-1/+1
Some np->hop_limit accesses are racy, when socket lock is not held. Add missing annotations and switch to full lockless implementation. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni1-1/+6
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udplite: fix various data-racesEric Dumazet1-5/+9
udp->pcflag, udp->pcslen and udp->pcrlen reads/writes are racy. Move udp->pcflag to udp->udp_flags for atomicity, and add READ_ONCE()/WRITE_ONCE() annotations for pcslen and pcrlen. Fixes: ba4e58eca8aa ("[NET]: Supporting UDP-Lite (RFC 3828) in Linux") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GROEric Dumazet1-6/+3
Move udp->encap_enabled to udp->udp_flags. Add udp_test_and_set_bit() helper to allow lockless udp_tunnel_encap_enable() implementation. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-14wifi: cfg80211: fix kernel-doc for wiphy_delayed_work_flush()Johannes Berg1-1/+1
Now that I fixed the function name, we see the parameters are wrong as well. Fix that too. Change-Id: I1a4cfea446875998a5a242ca36acc8244991a199 Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2023-09-13wifi: mac80211: Sanity check tx bitrate if not provided by driverStephen Douthit1-0/+5
If the driver doesn't fill NL80211_STA_INFO_TX_BITRATE in sta_set_sinfo() then as a fallback sta->deflink.tx_stats.last_rate is used. Unfortunately there's no guarantee that this has actually been set before it's used. Originally found when 'iw <dev> link' would always return a tx rate of 6Mbps regardless of actual link speed for the QCA9337 running firmware WLAN.TF.2.1-00021-QCARMSWP-1 in my netbook. Use the sanity check logic from ieee80211_fill_rx_status() and refactor that to use the new inline function. Signed-off-by: Stephen Douthit <[email protected]> Link: https://lore.kernel.org/r/[email protected] [change to bool ..._rate_valid() instead of int ..._rate_invalid()] Signed-off-by: Johannes Berg <[email protected]>
2023-09-13wifi: cfg80211: export DFS CAC time and usable state helper functionsAditya Kumar Singh1-0/+24
cfg80211 has cfg80211_chandef_dfs_usable() function to know whether at least one channel in the chandef is in usable state or not. Also, cfg80211_chandef_dfs_cac_time() function is there which tells the CAC time required for the given chandef. Make these two functions visible to drivers by exporting their symbol to global list of kernel symbols. Lower level drivers can make use of these two functions to be aware if CAC is required on the given chandef and for how long. For example drivers which maintains the CAC state internally can make use of these. Signed-off-by: Aditya Kumar Singh <[email protected]> Reviewed-by: Jeff Johnson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2023-09-13wifi: cfg80211: call reg_call_notifier on beacon hintsAbhishek Kumar1-0/+3
Currently the channel property updates are not propagated to driver. This causes issues in the discovery of hidden SSIDs and fails to connect to them. This change defines a new wiphy flag which when enabled by vendor driver, the reg_call_notifier callback will be trigger on beacon hints. This ensures that the channel property changes are visible to the vendor driver. The vendor changes the channels for active scans. This fixes the discovery issue of hidden SSID. Signed-off-by: Abhishek Kumar <[email protected]> Link: https://lore.kernel.org/r/20230629035254.1.I059fe585f9f9e896c2d51028ef804d197c8c009e@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-09-13wifi: cfg80211: modify prototype for change_beaconAloka Dixit1-1/+1
Modify the prototype for change_beacon() in struct cfg80211_op to accept cfg80211_ap_settings instead of cfg80211_beacon_data so that it can process data in addition to beacons. Modify the prototypes of ieee80211_change_beacon() and driver specific functions accordingly. Signed-off-by: Aloka Dixit <[email protected]> Reviewed-by: Jeff Johnson <[email protected]> Link: https://lore.kernel.org/r/[email protected] [while at it, remove pointless "if (info)" check in tracing that just makes all the lines longer than they need be - it's never NULL] Signed-off-by: Johannes Berg <[email protected]>
2023-09-13wifi: nl80211: fixes to FILS discovery updatesAloka Dixit1-0/+4
Add a new flag 'update' which is set to true during start_ap() if (and only if) one of the following two conditions are met: - Userspace passed an empty nested attribute which indicates that the feature should be disabled and templates deleted. - Userspace passed all the parameters for the nested attribute. Existing configuration will not be changed while the flag remains false. Add similar changes for unsolicited broadcast probe response transmission. Signed-off-by: Aloka Dixit <[email protected]> Reviewed-by: Jeff Johnson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2023-09-13wifi: cfg80211: remove scan_width supportJohannes Berg1-62/+1
There really isn't any support for scanning at different channel widths than 20 MHz since there's no way to set it. Remove this support for now, if somebody wants to maintain this whole thing later we can revisit how it should work. Signed-off-by: Johannes Berg <[email protected]>
2023-09-13wifi: cfg80211: add missing kernel-doc for cqm_rssi_workJohannes Berg1-0/+1
As reported by Stephen, I neglected to add the kernel-doc for the new struct member. Fix that. Reported-by: Stephen Rothwell <[email protected]> Fixes: 37c20b2effe9 ("wifi: cfg80211: fix cqm_config access race") Signed-off-by: Johannes Berg <[email protected]>
2023-09-13wifi: cfg80211: fix kernel-doc for wiphy_delayed_work_flush()Johannes Berg1-1/+1
Clearly, there's no space in the function name, not sure how that could've happened. Put the underscore that it should be. Reported-by: Stephen Rothwell <[email protected]> Fixes: 56cfb8ce1f7f ("wifi: cfg80211: add flush functions for wiphy work") Signed-off-by: Johannes Berg <[email protected]>
2023-09-13xfrm: fix a data-race in xfrm_gen_index()Eric Dumazet1-0/+1
xfrm_gen_index() mutual exclusion uses net->xfrm.xfrm_policy_lock. This means we must use a per-netns idx_generator variable, instead of a static one. Alternative would be to use an atomic variable. syzbot reported: BUG: KCSAN: data-race in xfrm_sk_policy_insert / xfrm_sk_policy_insert write to 0xffffffff87005938 of 4 bytes by task 29466 on cpu 0: xfrm_gen_index net/xfrm/xfrm_policy.c:1385 [inline] xfrm_sk_policy_insert+0x262/0x640 net/xfrm/xfrm_policy.c:2347 xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639 do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffffffff87005938 of 4 bytes by task 29460 on cpu 1: xfrm_sk_policy_insert+0x13e/0x640 xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639 do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00006ad8 -> 0x00006b18 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 29460 Comm: syz-executor.1 Not tainted 6.5.0-rc5-syzkaller-00243-g9106536c1aa3 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Fixes: 1121994c803f ("netns xfrm: policy insertion in netns") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Steffen Klassert <[email protected]> Cc: Herbert Xu <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2023-09-13tcp: Fix bind() regression for v4-mapped-v6 wildcard address.Kuniyuki Iwashima1-0/+5
Andrei Vagin reported bind() regression with strace logs. If we bind() a TCPv6 socket to ::FFFF:0.0.0.0 and then bind() a TCPv4 socket to 127.0.0.1, the 2nd bind() should fail but now succeeds. from socket import * s1 = socket(AF_INET6, SOCK_STREAM) s1.bind(('::ffff:0.0.0.0', 0)) s2 = socket(AF_INET, SOCK_STREAM) s2.bind(('127.0.0.1', s1.getsockname()[1])) During the 2nd bind(), if tb->family is AF_INET6 and sk->sk_family is AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check if tb has the v4-mapped-v6 wildcard address. The example above does not work after commit 5456262d2baa ("net: Fix incorrect address comparison when searching for a bind2 bucket"), but the blamed change is not the commit. Before the commit, the leading zeros of ::FFFF:0.0.0.0 were treated as 0.0.0.0, and the sequence above worked by chance. Technically, this case has been broken since bhash2 was introduced. Note that if we bind() two sockets to 127.0.0.1 and then ::FFFF:0.0.0.0, the 2nd bind() fails properly because we fall back to using bhash to detect conflicts for the v4-mapped-v6 address. Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Reported-by: Andrei Vagin <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-09-12tcp: defer regular ACK while processing socket backlogEric Dumazet1-0/+1
This idea came after a particular workload requested the quickack attribute set on routes, and a performance drop was noticed for large bulk transfers. For high throughput flows, it is best to use one cpu running the user thread issuing socket system calls, and a separate cpu to process incoming packets from BH context. (With TSO/GRO, bottleneck is usually the 'user' cpu) Problem is the user thread can spend a lot of time while holding the socket lock, forcing BH handler to queue most of incoming packets in the socket backlog. Whenever the user thread releases the socket lock, it must first process all accumulated packets in the backlog, potentially adding latency spikes. Due to flood mitigation, having too many packets in the backlog increases chance of unexpected drops. Backlog processing unfortunately shifts a fair amount of cpu cycles from the BH cpu to the 'user' cpu, thus reducing max throughput. This patch takes advantage of the backlog processing, and the fact that ACK are mostly cumulative. The idea is to detect we are in the backlog processing and defer all eligible ACK into a single one, sent from tcp_release_cb(). This saves cpu cycles on both sides, and network resources. Performance of a single TCP flow on a 200Gbit NIC: - Throughput is increased by 20% (100Gbit -> 120Gbit). - Number of generated ACK per second shrinks from 240,000 to 40,000. - Number of backlog drops per second shrinks from 230 to 0. Benchmark context: - Regular netperf TCP_STREAM (no zerocopy) - Intel(R) Xeon(R) Platinum 8481C (Saphire Rapids) - MAX_SKB_FRAGS = 17 (~60KB per GRO packet) This feature is guarded by a new sysctl, and enabled by default: /proc/sys/net/ipv4/tcp_backlog_ack_defer Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Yuchung Cheng <[email protected]> Acked-by: Neal Cardwell <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Acked-by: Dave Taht <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-12net: sock_release_ownership() cleanupEric Dumazet1-5/+4
sock_release_ownership() should only be called by user owning the socket lock. After prior commit, we can remove one condition. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-09-12ipv6: fix ip6_sock_set_addr_preferences() typoEric Dumazet1-1/+1
ip6_sock_set_addr_preferences() second argument should be an integer. SUNRPC attempts to set IPV6_PREFER_SRC_PUBLIC were translated to IPV6_PREFER_SRC_TMP Fixes: 18d5ad623275 ("ipv6: add ip6_sock_set_addr_preferences") Signed-off-by: Eric Dumazet <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Chuck Lever <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-09-12net: dst: remove unnecessary input parameter in dst_alloc and dst_initZhengchao Shao1-2/+2
Since commit 1202cdd66531("Remove DECnet support from kernel") has been merged, all callers pass in the initial_ref value of 1 when they call dst_alloc(). Therefore, remove initial_ref when the dst_alloc() is declared and replace initial_ref with 1 in dst_alloc(). Also when all callers call dst_init(), the value of initial_ref is 1. Therefore, remove the input parameter initial_ref of the dst_init() and replace initial_ref with the value 1 in dst_init. Signed-off-by: Zhengchao Shao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-09-11wifi: cfg80211: fix cqm_config access raceJohannes Berg1-1/+2
Max Schulze reports crashes with brcmfmac. The reason seems to be a race between userspace removing the CQM config and the driver calling cfg80211_cqm_rssi_notify(), where if the data is freed while cfg80211_cqm_rssi_notify() runs it will crash since it assumes wdev->cqm_config is set. This can't be fixed with a simple non-NULL check since there's nothing we can do for locking easily, so use RCU instead to protect the pointer, but that requires pulling the updates out into an asynchronous worker so they can sleep and call back into the driver. Since we need to change the free anyway, also change it to go back to the old settings if changing the settings fails. Reported-and-tested-by: Max Schulze <[email protected]> Closes: https://lore.kernel.org/r/[email protected] Fixes: 4a4b8169501b ("cfg80211: Accept multiple RSSI thresholds for CQM") Signed-off-by: Johannes Berg <[email protected]>
2023-09-11wifi: cfg80211: add ieee80211_fragment_element to public APIBenjamin Berg1-0/+12
This function will be used by the kunit tests within cfg80211. As it is generally useful, move it from mac80211 to cfg80211. Signed-off-by: Benjamin Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230827135854.5af9391659f5.Ie534ed6591ba02be8572d4d7242394f29e3af04b@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-09-11wifi: mac80211: add support for mld in ieee80211_chswitch_doneEmmanuel Grumbach1-2/+6
This allows to finalize the CSA per link. In case the switch didn't work, tear down the MLD connection. Also pass the ieee80211_bss_conf to post_channel_switch to let the driver know which link completed the switch. Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230828130311.3d3eacc88436.Ic2d14e2285aa1646216a56806cfd4a8d0054437c@changeid Signed-off-by: Johannes Berg <[email protected]>