blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-06-22	Merge tag 'nf-23-06-21' of ↵	Paolo Abeni	1	-2/+29
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

	Pablo Neira Ayuso says:

	====================
	Netfilter/IPVS fixes for net

	This is v3, including a crash fix for patch 01/14.

	The following patchset contains Netfilter/IPVS fixes for net:

	1) Fix UDP segmentation with IPVS tunneled traffic, from Terin Stock.
	2) Fix chain binding transaction logic, add a bound flag to rule transactions. Remove incorrect logic in nft_data_hold() and nft_data_release().
	3) Add a NFT_TRANS_PREPARE_ERROR deactivate state to deal with releasing the set/chain as a follow up to 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
	4) Drop map element references from preparation phase instead of set destroy path, otherwise bogus EBUSY with transactions such as:
	     flush chain ip x y
	     delete chain ip x w
	   where chain ip x y contains jump/goto from set elements.
	5) Pipapo set type does not regard generation mask from the walk iteration.
	6) Fix reference count underflow in set element reference to stateful object.
	7) Several patches to tighten the nf_tables API:
	   - disallow set element updates of bound anonymous set
	   - disallow unbound anonymous set/chain at the end of transaction.
	   - disallow updates of anonymous set.
	   - disallow timeout configuration for anonymous sets.
	8) Fix module reference leak in chain updates.
	9) Fix nfnetlink_osf module autoload.
	10) Fix deletion of basechain when NFTA_CHAIN_HOOK is specified as in iptables-nft.

	This Netfilter batch is larger than usual at this stage, I am aware we are fairly late in the -rc cycle, if you prefer to route them through net-next, please let me know.

	netfilter pull request 23-06-21

	* tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
	  netfilter: nf_tables: Fix for deleting base chains with payload
	  netfilter: nfnetlink_osf: fix module autoload
	  netfilter: nf_tables: drop module reference after updating chain
	  netfilter: nf_tables: disallow timeout for anonymous sets
	  netfilter: nf_tables: disallow updates of anonymous sets
	  netfilter: nf_tables: reject unbound chain set before commit phase
	  netfilter: nf_tables: reject unbound anonymous set before commit phase
	  netfilter: nf_tables: disallow element updates of bound anonymous sets
	  netfilter: nf_tables: fix underflow in object reference counter
	  netfilter: nft_set_pipapo: .walk does not deal with generations
	  netfilter: nf_tables: drop map element references from preparation phase
	  netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
	  netfilter: nf_tables: fix chain binding transaction logic
	  ipvs: align inner_mac_header for encapsulation
	====================

	Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Paolo Abeni <[email protected]>
2023-06-21	wifi: cfg80211: Retrieve PSD information from RNR AP information	Ilan Peer	1	-0/+2
	Retrieve the Power Spectral Density (PSD) value from RNR AP information entry and store it so it could be used by the drivers. PSD value is explained in Section 9.4.2.170 of Draft P802.11Revme_D2.0. Signed-off-by: Ilan Peer <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230619161906.067ded2b8fc3.I9f407ab5800cbb07045a0537a513012960ced740@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-21	wifi: mac80211: fix documentation config reference	Johannes Berg	1	-1/+1
	We shouldn't refer to CPTCFG_, that's for backports, in mainline that's just CONFIG_. Fix it. Signed-off-by: Johannes Berg <[email protected]>
2023-06-20	mctp: Reorder fields in 'struct mctp_route'	Christophe JAILLET	1	-2/+2
	Group some variables based on their sizes to reduce holes and avoid padding. On x86_64, this shrinks the size of 'struct mctp_route' from 72 to 64 bytes. It saves a few bytes of memory and is more cache-line friendly. Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Jeremy Kerr <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/393ad1a5aef0aa28d839eeb3d7477da0e0eeb0b0.1687080803.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <[email protected]>
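The effect of grouping fields by size can be reproduced in user space. The sketch below is only an illustration: `struct demo_route_before` and `struct demo_route_after` are hypothetical layouts, not the real `struct mctp_route`, and the exact sizes depend on the target ABI (on a typical LP64 target such as x86_64 they come out as 32 and 16 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (NOT the real struct mctp_route): one-byte fields
 * are interleaved with wider ones, so the compiler must insert padding
 * to keep each wider field naturally aligned. */
struct demo_route_before {
	uint8_t  min;   /* followed by padding up to pointer alignment */
	void    *dev;
	uint8_t  max;   /* followed by padding up to 4-byte alignment */
	uint32_t mtu;
	uint8_t  type;  /* followed by tail padding */
};

/* Same fields, ordered from widest to narrowest: the three one-byte
 * fields now share a single slot after mtu. */
struct demo_route_after {
	void    *dev;
	uint32_t mtu;
	uint8_t  min;
	uint8_t  max;
	uint8_t  type;
};

/* Bytes saved per object by the reordering on the current target. */
static size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_route_before) -
	       sizeof(struct demo_route_after);
}

/* Usage example: the reordered struct is never larger than the original. */
void demo_layout_example(void)
{
	assert(demo_bytes_saved() > 0);
}
```

In the kernel tree, the `pahole` tool is the usual way to find such holes; the sketch only shows why the ordering matters.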
2023-06-20	netfilter: nf_tables: reject unbound anonymous set before commit phase	Pablo Neira Ayuso	1	-0/+3
	Add a new list to track set transaction and to check for unbound anonymous sets before entering the commit phase. Bail out at the end of the transaction handling if an anonymous set remains unbound. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2023-06-20	netfilter: nf_tables: drop map element references from preparation phase	Pablo Neira Ayuso	1	-1/+4
	set .destroy callback releases the references to other objects in maps. This is very late and it results in spurious EBUSY errors. Drop refcount from the preparation phase instead, update set backend not to drop reference counter from set .destroy path. Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the reference counter because the transaction abort path releases the map references for each element since the set is unbound. The abort path also deals with releasing reference counter for new elements added to unbound sets. Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2023-06-20	netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain	Pablo Neira Ayuso	1	-0/+2
	Add a new state to deal with rule expressions deactivation from the newrule error path, otherwise the anonymous set remains in the list in inactive state for the next generation. Mark the set/chain transaction as unbound so the abort path releases this object, set it as inactive in the next generation so it is not reachable anymore from this transaction and reference counter is dropped. Fixes: 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2023-06-20	netfilter: nf_tables: fix chain binding transaction logic	Pablo Neira Ayuso	1	-1/+20
	Add bound flag to rule and chain transactions as in 6a0a8d10a366 ("netfilter: nf_tables: use-after-free in failing rule with bound set") to skip them in case that the chain is already bound from the abort path. This patch fixes an imbalance in the chain use refcnt that triggers a WARN_ON on the table and chain destroy path. This patch also disallows nested chain bindings, which is not supported from userspace. The logic to deal with chain binding in nft_data_hold() and nft_data_release() is not correct. The NFT_TRANS_PREPARE state needs a special handling in case a chain is bound but next expressions in the same rule fail to initialize as described by 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE"). The chain is left bound if rule construction fails, so the objects stored in this chain (and the chain itself) are released by the transaction records from the abort path, follow up patch ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain") completes this error handling. When deleting an existing rule, chain bound flag is set off so the rule expression .destroy path releases the objects. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2023-06-20	Merge tag 'ipsec-2023-06-20' of ↵	David S. Miller	1	-0/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec ipsec-2023-06-20
2023-06-20	net: dsa: introduce preferred_default_local_cpu_port and use on MT7530	Vladimir Oltean	1	-0/+8
	Since the introduction of the OF bindings, DSA has always had a policy that in case multiple CPU ports are present in the device tree, the numerically smallest one is always chosen.

	The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because it has higher bandwidth.

	The MT7530 driver developers had 3 options:
	- to modify DSA when the MT7531 switch support was introduced, such as to prefer the better port
	- to declare both CPU ports in device trees as CPU ports, and live with the sub-optimal performance resulting from not preferring the better port
	- to declare just port 6 in the device tree as a CPU port

	Of course they chose the path of least resistance (3rd option), kicking the can down the road. The hardware description in the device tree is supposed to be stable - developers are not supposed to adopt the strategy of piecemeal hardware description, where the device tree is updated in lockstep with the features that the kernel currently supports.

	Now, as a result of the fact that they did that, any attempts to modify the device tree and describe both CPU ports as CPU ports would make DSA change its default selection from port 6 to 5, effectively resulting in a performance degradation visible to users with the MT7531BE switch as can be seen below.

	Without preferring port 6:

	[ ID][Role] Interval         Transfer     Bitrate        Retr
	[ 5][TX-C]  0.00-20.00 sec   374 MBytes   157 Mbits/sec  734   sender
	[ 5][TX-C]  0.00-20.00 sec   373 MBytes   156 Mbits/sec        receiver
	[ 7][RX-C]  0.00-20.00 sec  1.81 GBytes   778 Mbits/sec    0   sender
	[ 7][RX-C]  0.00-20.00 sec  1.81 GBytes   777 Mbits/sec        receiver

	With preferring port 6:

	[ ID][Role] Interval         Transfer     Bitrate        Retr
	[ 5][TX-C]  0.00-20.00 sec  1.99 GBytes   856 Mbits/sec  273   sender
	[ 5][TX-C]  0.00-20.00 sec  1.99 GBytes   855 Mbits/sec        receiver
	[ 7][RX-C]  0.00-20.00 sec  1.72 GBytes   737 Mbits/sec   15   sender
	[ 7][RX-C]  0.00-20.00 sec  1.71 GBytes   736 Mbits/sec        receiver

	Using one port for WAN and the other ports for LAN is a very popular use case which is what this test emulates.

	As such, this change proposes that we retroactively modify stable kernels (which don't support the modification of the CPU port assignments, so as to let user space fix the problem and restore the throughput) to keep the mt7530 driver preferring port 6 even with device trees where the hardware is more fully described.

	Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch")
	Signed-off-by: Vladimir Oltean <[email protected]>
	Signed-off-by: Arınç ÜNAL <[email protected]>
	Reviewed-by: Russell King (Oracle) <[email protected]>
	Reviewed-by: Florian Fainelli <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
2023-06-19	ipv6: rpl: Remove pskb(_may)?_pull() in ipv6_rpl_srh_rcv().	Kuniyuki Iwashima	1	-3/+0
	As Eric Dumazet pointed out [0], ipv6_rthdr_rcv() pulls these data

	  - Segment Routing Header : 8
	  - Hdr Ext Len            : skb_transport_header(skb)[1] << 3

	needed by ipv6_rpl_srh_rcv(). We can remove pskb_may_pull() and replace pskb_pull() with skb_pull() in ipv6_rpl_srh_rcv().

	Link: https://lore.kernel.org/netdev/CANn89iLboLwLrHXeHJucAqBkEL_S0rJFog68t7wwwXO-aNf5Mg@mail.gmail.com/ [0]
	Signed-off-by: Kuniyuki Iwashima <[email protected]>
	Signed-off-by: Jakub Kicinski <[email protected]>
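The two quantities listed above add up to the total size of the routing header: a fixed 8 bytes plus the Hdr Ext Len field (the second byte of the header) counted in 8-byte units. A minimal user-space sketch of that arithmetic, with the hypothetical helper name `demo_rthdr_len()` standing in for the kernel expression:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total length in bytes of an IPv6 routing header whose first byte is at
 * hdr: 8 fixed bytes plus Hdr Ext Len (hdr[1]) in units of 8 bytes.
 * demo_rthdr_len() is a hypothetical name used only for this sketch. */
static size_t demo_rthdr_len(const uint8_t *hdr)
{
	return 8 + ((size_t)hdr[1] << 3);
}

/* Usage example: Hdr Ext Len = 2 describes a 24-byte header. */
void demo_rthdr_example(void)
{
	const uint8_t hdr[2] = { 0, 2 };

	assert(demo_rthdr_len(hdr) == 24);
}
```

Once that many bytes are known to be in the linear area, a further availability check on the same range is redundant, which is the reasoning the commit relies on.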
2023-06-19	xsk: Remove unused inline function xsk_buff_discard()	YueHaibing	1	-4/+0
	commit f2f167583601 ("xsk: Remove unused xsk_buff_discard") left behind this, remove it. Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-06-19	wifi: cfg80211/nl80211: Add support to indicate STA MLD setup links removal	Veerendranath Jakkam	1	-0/+13
	STA MLD setup links may get removed if the AP MLD removes the corresponding affiliated APs with Multi-Link reconfiguration as described in P802.11be_D3.0, section 35.3.6.2.2 Removing affiliated APs. Currently, there is no support to notify such an operation to cfg80211 and userspace. Add support for the drivers to indicate STA MLD setup links removal to cfg80211 and notify the same to userspace. Upon receiving such an indication from the driver, clear the MLO links information of the removed links in the WDEV. Signed-off-by: Veerendranath Jakkam <[email protected]> Link: https://lore.kernel.org/r/[email protected] [rename function and attribute, fix kernel-doc] Signed-off-by: Johannes Berg <[email protected]>
2023-06-19	wifi: cfg80211: fix regulatory disconnect with OCB/NAN	Johannes Berg	1	-12/+1
	Since regulatory disconnect was added, OCB and NAN interface types were added, which made it completely unusable for any driver that allowed OCB/NAN. Add OCB/NAN (though NAN doesn't do anything, we don't have any info) and also remove all the logic that opts out, so it won't be broken again if/when new interface types are added. Fixes: 6e0bd6c35b02 ("cfg80211: 802.11p OCB mode handling") Fixes: cb3b7d87652a ("cfg80211: add start / stop NAN commands") Signed-off-by: Johannes Berg <[email protected]> Link: https://lore.kernel.org/r/20230616222844.2794d1625a26.I8e78a3789a29e6149447b3139df724a6f1b46fc3@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19	wifi: cfg80211: add element defragmentation helper	Benjamin Berg	1	-0/+22
	This is already needed within mac80211 and support is also needed by cfg80211 to parse ML elements. Signed-off-by: Benjamin Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230616094949.29c3ebeed10d.I009c049289dd0162c2e858ed8b68d2875a672ed6@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19	wifi: cfg80211: add inform_bss op to update BSS	Benjamin Berg	1	-0/+13
	This new function is called from within the inform_bss(_frame)_data functions in order for the driver to update data that it is tracking. Signed-off-by: Benjamin Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230616094949.8d7781b0f965.I80041183072b75c081996a1a5a230b34aff5c668@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19	wifi: cfg80211: make TDLS management link-aware	Mukesh Sisodiya	1	-3/+4
	For multi-link operation (MLO), TDLS management frames need to be transmitted on a specific link. The TDLS setup request will add BSSID along with peer address and userspace will pass the link-id based on BSSID value to the driver (or mac80211). Signed-off-by: Mukesh Sisodiya <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230616094948.cb3d87c22812.Ia3d15ac4a9a182145bf2d418bcb3ddf4539cd0a7@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-19	wifi: mac80211: Support disabled links during association	Ilan Peer	1	-2/+4
	When the association is complete, do not configure disabled links, and track them as part of the interface data. Signed-off-by: Ilan Peer <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230608163202.c194fabeb81a.Iaefdef5ba0492afe9a5ede14c68060a4af36e444@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-18	tcp: Use per-vma locking for receive zerocopy	Arjun Roy	1	-0/+1
	Per-VMA locking allows us to lock a struct vm_area_struct without taking the process-wide mmap lock in read mode. Consider a process workload where the mmap lock is taken constantly in write mode. In this scenario, all zerocopy receives are periodically blocked during that period of time - though in principle, the memory ranges being used by TCP are not touched by the operations that need the mmap write lock. This results in performance degradation. Now consider another workload where the mmap lock is never taken in write mode, but there are many TCP connections using receive zerocopy that are concurrently receiving. These connections all take the mmap lock in read mode, but this does induce a lot of contention and atomic ops for this process-wide lock. This results in additional CPU overhead caused by contending on the cache line for this lock. However, with per-vma locking, both of these problems can be avoided. As a test, I ran an RPC-style request/response workload with 4KB payloads and receive zerocopy enabled, with 100 simultaneous TCP connections. I measured perf cycles within the find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and without per-vma locking enabled. When using process-wide mmap semaphore read locking, about 1% of measured perf cycles were within this path. With per-VMA locking, this value dropped to about 0.45%. Signed-off-by: Arjun Roy <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-17	tcp: enforce receive buffer memory limits by allowing the tcp window to shrink	[email protected]	1	-0/+1
	Under certain circumstances, the tcp receive buffer memory limit set by autotuning (sk_rcvbuf) is increased due to incoming data packets as a result of the window not closing when it should be. This can result in the receive buffer growing all the way up to tcp_rmem[2], even for tcp sessions with a low BDP.

	To reproduce: Connect a TCP session with the receiver doing nothing and the sender sending small packets (an infinite loop of socket send() with 4 bytes of payload with a sleep of 1 ms in between each send()). This will cause the tcp receive buffer to grow all the way up to tcp_rmem[2].

	As a result, a host can have individual tcp sessions with receive buffers of size tcp_rmem[2], and the host itself can reach tcp_mem limits, causing the host to go into tcp memory pressure mode.

	The fundamental issue is the relationship between the granularity of the window scaling factor and the number of bytes ACKed back to the sender. This problem has previously been identified in RFC 7323, appendix F [1].

	The Linux kernel currently adheres to never shrinking the window.

	In addition to the overallocation of memory mentioned above, the current behavior is functionally incorrect, because once tcp_rmem[2] is reached when no remediations remain (i.e. tcp collapse fails to free up any more memory and there are no packets to prune from the out-of-order queue), the receiver will drop in-window packets resulting in retransmissions and an eventual timeout of the tcp session. A receive buffer full condition should instead result in a zero window and an indefinite wait.

	In practice, this problem is largely hidden for most flows. It is not applicable to mice flows. Elephant flows can send data fast enough to "overrun" the sk_rcvbuf limit (in a single ACK), triggering a zero window.

	But this problem does show up for other types of flows. Examples are websockets and other types of flows that send small amounts of data spaced apart slightly in time. In these cases, we directly encounter the problem described in [1].

	RFC 7323, section 2.4 [2], says there are instances when a retracted window can be offered, and that TCP implementations MUST ensure that they handle a shrinking window, as specified in RFC 1122, section 4.2.2.16 [3]. All prior RFCs on the topic of tcp window management have made clear that the sender must accept a shrunk window from the receiver, including RFC 793 [4] and RFC 1323 [5].

	This patch implements the functionality to shrink the tcp window when necessary to keep the right edge within the memory limit by autotuning (sk_rcvbuf). This new functionality is enabled with the new sysctl: net.ipv4.tcp_shrink_window

	Additional information can be found at: https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/

	[1] https://www.rfc-editor.org/rfc/rfc7323#appendix-F
	[2] https://www.rfc-editor.org/rfc/rfc7323#section-2.4
	[3] https://www.rfc-editor.org/rfc/rfc1122#page-91
	[4] https://www.rfc-editor.org/rfc/rfc793
	[5] https://www.rfc-editor.org/rfc/rfc1323

	Signed-off-by: Mike Freemon <[email protected]>
	Reviewed-by: Eric Dumazet <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
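The granularity issue mentioned in the message comes from the window scale option: the 16-bit window field on the wire carries the window shifted right by the scale factor, so the peer only ever sees multiples of 2^wscale bytes. The sketch below shows just that arithmetic; `demo_wire_window()` and `demo_peer_window()` are hypothetical helpers written for this illustration and are not kernel functions, and the scale factor 7 is an arbitrary example value.

```c
#include <assert.h>
#include <stdint.h>

/* Value placed in the 16-bit TCP window field for a desired window of
 * window_bytes, given the negotiated shift count wscale. */
static uint16_t demo_wire_window(uint32_t window_bytes, unsigned int wscale)
{
	uint32_t field = window_bytes >> wscale;

	return field > 0xFFFF ? 0xFFFF : (uint16_t)field;
}

/* Window the peer reconstructs from the field it received. */
static uint32_t demo_peer_window(uint16_t field, unsigned int wscale)
{
	return (uint32_t)field << wscale;
}

/* Usage example with wscale = 7, i.e. 128-byte steps. */
void demo_window_example(void)
{
	/* A window that is a multiple of 128 survives the round trip. */
	assert(demo_peer_window(demo_wire_window(65536, 7), 7) == 65536);
	/* A window 4 bytes smaller cannot be expressed exactly; the peer
	 * sees the next lower multiple of 128. */
	assert(demo_peer_window(demo_wire_window(65532, 7), 7) == 65408);
}
```

With small writes such as the 4-byte sends in the reproducer, the change in buffered data between ACKs is far below one step, which is why the exact right edge cannot be tracked without either growing the buffer or retracting the window.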
2023-06-17	net: sched: Remove unused qdisc_l2t()	YueHaibing	1	-14/+0
	This is unused since switch to psched_l2t_ns(). Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-15	net: ioctl: Use kernel memory on protocol ioctl callbacks	Breno Leitao	4	-3/+27
	Most of the ioctls to net protocols operate directly on the userspace argument (arg), usually doing get_user()/put_user() directly in the ioctl callback. This is not flexible, because it is hard to reuse these functions without passing userspace buffers.

	Change the "struct proto" ioctls to avoid touching userspace memory and operate on kernel buffers, i.e., all protocols' ioctl callbacks are adapted to operate on kernel memory rather than on userspace (so, no more {put,get}_user() and friends being called in the ioctl callback).

	This changes the "struct proto" ioctl format in the following way:

	    int (*ioctl)(struct sock *sk, int cmd,
	-                unsigned long arg);
	+                int *karg);

	(Important to say that this patch does not touch the "struct proto_ops" protocols)

	So, the "karg" argument, which is passed to the ioctl callback, is a pointer allocated to kernel space memory (inside a function wrapper). This buffer (karg) may contain input argument (copied from userspace in a prep function) and it might return a value/buffer, which is copied back to userspace if necessary. There is no one-size-fits-all format (that is why I am using 'may' above), but basically, there are three types of ioctls:

	1) Do not read from userspace, returns a result to userspace
	2) Read an input parameter from userspace, and does not return anything to userspace
	3) Read an input from userspace, and return a buffer to userspace.

	The default case (1) (where no input parameter is given, and an "int" is returned to userspace) encompasses more than 90% of the cases, but there are two other exceptions. Here is a list of exceptions:

	* Protocol RAW:
	   * cmd = SIOCGETVIFCNT:
	     * input and output = struct sioc_vif_req
	   * cmd = SIOCGETSGCNT
	     * input and output = struct sioc_sg_req
	   * Explanation: for the SIOCGETVIFCNT case, userspace passes the input argument, which is struct sioc_vif_req. Then the callback populates the struct, which is copied back to userspace.

	* Protocol RAW6:
	   * cmd = SIOCGETMIFCNT_IN6
	     * input and output = struct sioc_mif_req6
	   * cmd = SIOCGETSGCNT_IN6
	     * input and output = struct sioc_sg_req6

	* Protocol PHONET:
	   * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE
	     * input int (4 bytes)
	   * Nothing is copied back to userspace.

	For the exception cases, the function sock_sk_ioctl_inout() copies the userspace input into kernel space and copies the result back to userspace.

	The wrapper that prepares the buffer and puts the buffer back to the user is sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the caller now calls sk_ioctl(), which will handle all cases.

	Signed-off-by: Breno Leitao <[email protected]>
	Reviewed-by: Willem de Bruijn <[email protected]>
	Reviewed-by: David Ahern <[email protected]>
	Reviewed-by: Kuniyuki Iwashima <[email protected]>
	Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
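The wrapper idea described in this commit can be mimicked in user space. The following is a rough sketch under stated assumptions, not the kernel implementation: `demo_sk_ioctl()`, `demo_copy()`, the `demo_dir` enum and both callbacks are hypothetical names, and a plain `memcpy()` stands in for the user/kernel copy primitives. It only illustrates that the callback sees a kernel-side `int *karg` while a single wrapper decides what is copied in and out.

```c
#include <assert.h>
#include <string.h>

/* Which way data flows for a given command (hypothetical classification
 * mirroring the three ioctl types listed in the commit message). */
enum demo_dir { DEMO_OUT, DEMO_IN, DEMO_INOUT };

/* Protocol callback: operates on a kernel-side int only. */
typedef int (*demo_ioctl_cb)(int cmd, int *karg);

/* Stand-in for the user/kernel copy helpers; always succeeds here. */
static int demo_copy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/* Wrapper: copy the argument in if needed, run the callback on the
 * kernel-side copy, copy the result out if needed. */
static int demo_sk_ioctl(demo_ioctl_cb cb, enum demo_dir dir, int cmd,
			 void *uarg)
{
	int karg = 0;
	int ret;

	if (dir != DEMO_OUT && demo_copy(&karg, uarg, sizeof(karg)))
		return -1;
	ret = cb(cmd, &karg);
	if (ret)
		return ret;
	if (dir != DEMO_IN && demo_copy(uarg, &karg, sizeof(karg)))
		return -1;
	return 0;
}

/* Type 1: no input, returns an int (made-up value for the sketch). */
static int demo_queue_len(int cmd, int *karg)
{
	(void)cmd;
	*karg = 42;
	return 0;
}

/* Type 3: reads an input and returns a value. */
static int demo_double(int cmd, int *karg)
{
	(void)cmd;
	*karg *= 2;
	return 0;
}

/* Usage example covering an output-only and an in/out command. */
void demo_ioctl_example(void)
{
	int uarg = 0;

	assert(demo_sk_ioctl(demo_queue_len, DEMO_OUT, 0, &uarg) == 0);
	assert(uarg == 42);
	uarg = 21;
	assert(demo_sk_ioctl(demo_double, DEMO_INOUT, 0, &uarg) == 0);
	assert(uarg == 42);
}
```

Keeping the copies in one place is what lets the same callbacks be reused by callers that already hold kernel buffers, which is the flexibility the commit message asks for.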
2023-06-15	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	3	-2/+12
	Cross-merge networking fixes after downstream PR.

	Conflicts:

	include/linux/mlx5/driver.h
	  617f5db1a626 ("RDMA/mlx5: Fix affinity assignment")
	  dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
	https://lore.kernel.org/all/[email protected]/

	tools/testing/selftests/net/mptcp/mptcp_join.sh
	  47867f0a7e83 ("selftests: mptcp: join: skip check if MIB counter not supported")
	  425ba803124b ("selftests: mptcp: join: support RM_ADDR for used endpoints or not")
	  45b1a1227a7a ("mptcp: introduces more address related mibs")
	  0639fa230a21 ("selftests: mptcp: add explicit check for new mibs")
	https://lore.kernel.org/netdev/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net/

	No adjacent changes.

	Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-15	net: tls: make the offload check helper take skb not socket	Jakub Kicinski	1	-3/+5
	All callers of tls_is_sk_tx_device_offloaded() currently do an equivalent of:

	  if (skb->sk && tls_is_sk_tx_device_offloaded(skb->sk))

	Have the helper accept skb and do the skb->sk check locally. Two drivers have local static inlines with similar wrappers already.

	While at it change the ifdef condition to TLS_DEVICE. Only TLS_DEVICE selects SOCK_VALIDATE_XMIT, so the two are equivalent. This makes removing the duplicated IS_ENABLED() check in funeth more obviously correct.

	Signed-off-by: Jakub Kicinski <[email protected]>
	Acked-by: Maxim Mikityanskiy <[email protected]>
	Reviewed-by: Simon Horman <[email protected]>
	Acked-by: Tariq Toukan <[email protected]>
	Acked-by: Dimitris Michailidis <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
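The refactoring pattern here is to move a NULL check that every caller repeats into the helper itself. A minimal user-space sketch, assuming mock types: `struct demo_sock`, `struct demo_skb` and `demo_is_skb_tx_offloaded()` are hypothetical stand-ins and do not reflect the real kernel structures or the body of the real helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock socket and packet buffer (hypothetical, for illustration only). */
struct demo_sock {
	bool tx_offloaded;
};

struct demo_skb {
	struct demo_sock *sk;  /* may be NULL for packets with no socket */
};

/* Helper takes the packet and performs the sk check locally, so callers
 * no longer need to write "skb->sk && ..." themselves. */
static bool demo_is_skb_tx_offloaded(const struct demo_skb *skb)
{
	return skb->sk && skb->sk->tx_offloaded;
}

/* Usage example: a packet without a socket is simply not offloaded. */
void demo_offload_example(void)
{
	struct demo_sock sk = { .tx_offloaded = true };
	struct demo_skb with_sk = { .sk = &sk };
	struct demo_skb without_sk = { .sk = NULL };

	assert(demo_is_skb_tx_offloaded(&with_sk));
	assert(!demo_is_skb_tx_offloaded(&without_sk));
}
```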
2023-06-14	wifi: iwlwifi: mvm: support U-SIG EHT validate checks	Johannes Berg	1	-0/+2
	Support new firmware that can validate the validate bits in sniffer mode, and advertise that fact and the result of the checks in the U-SIG radiotap field. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230613155501.c20480aa1171.Icc0d077dae01d662ccb948823e196aa9c5c87976@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-14	wifi: mac80211: Do not use "non-MLD AP" syntax	Ilan Peer	1	-3/+6
	Instead clarify the cases where link ID == 0 is intended for an AP STA that is not part of an AP MLD. Signed-off-by: Ilan Peer <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230611121219.77236a2e26ad.I8193ca8e236c9eb015870471f77a7d5134da3156@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-14	wifi: cfg80211: Support association to AP MLD with disabled links	Ilan Peer	1	-1/+4
	An AP part of an AP MLD might be temporarily disabled, and might be enabled later. Such a link should be included in the association exchange, but should not be used until enabled. Extend the NL80211_CMD_ASSOCIATE to also indicate disabled links. Signed-off-by: Ilan Peer <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230608163202.c4c61ee4c4a5.I784ef4a0d619fc9120514b5615458fbef3b3684a@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-14	wifi: mac80211: Add getter functions for vif MLD state	Ilan Peer	1	-0/+21
	As a preparation to support disabled/dormant links, add the following functions:

	- ieee80211_vif_usable_links(): returns the bitmap of the links that can be activated. Use this function in all the places that the bitmap of the usable links is needed.
	- ieee80211_vif_is_mld(): returns true iff the vif is an MLD. Use this function in all the places where an indication that the connection is an MLD is needed.

	Signed-off-by: Ilan Peer <[email protected]>
	Signed-off-by: Gregory Greenman <[email protected]>
	Link: https://lore.kernel.org/r/20230608163202.86e3351da1fc.If6fe3a339fda2019f13f57ff768ecffb711b710a@changeid
	Signed-off-by: Johannes Berg <[email protected]>
2023-06-14	wifi: mac80211: allow disabling SMPS debugfs controls	Miri Korenblit	1	-0/+3
	There are cases in which we don't want the user to override the smps mode, e.g. when SMPS should be disabled due to EMLSR. Add a driver flag to disable SMPS overriding and don't override if it is set. Signed-off-by: Miri Korenblit <[email protected]> Signed-off-by: Gregory Greenman <[email protected]> Link: https://lore.kernel.org/r/20230608163202.ef129e80556c.I74a298fdc86b87074c95228d3916739de1400597@changeid Signed-off-by: Johannes Berg <[email protected]>
2023-06-14	wifi: mac80211: add helpers to access sband iftype data	Johannes Berg	1	-1/+43
	There's quite a bit of code accessing sband iftype data (HE, HE 6 GHz, EHT) and we always need to remember to use the ieee80211_vif_type_p2p() helper. Add new helpers to directly get it from the sband/vif rather than having to call ieee80211_vif_type_p2p().

	Convert most code with the following spatch:

	@@
	expression vif, sband;
	@@
	-ieee80211_get_he_iftype_cap(sband, ieee80211_vif_type_p2p(vif))
	+ieee80211_get_he_iftype_cap_vif(sband, vif)

	@@
	expression vif, sband;
	@@
	-ieee80211_get_eht_iftype_cap(sband, ieee80211_vif_type_p2p(vif))
	+ieee80211_get_eht_iftype_cap_vif(sband, vif)

	@@
	expression vif, sband;
	@@
	-ieee80211_get_he_6ghz_capa(sband, ieee80211_vif_type_p2p(vif))
	+ieee80211_get_he_6ghz_capa_vif(sband, vif)

	Signed-off-by: Johannes Berg <[email protected]>
	Signed-off-by: Gregory Greenman <[email protected]>
	Link: https://lore.kernel.org/r/20230604120651.db099f49e764.Ie892966c49e22c7b7ee1073bc684f142debfdc84@changeid
	Signed-off-by: Johannes Berg <[email protected]>
2023-06-14	wifi: cfg80211: S1G rate information and calculations	Gilad Itzkovitch	1	-3/+15
	Increase the size of S1G rate_info flags to support S1G and add flags for new S1G MCS and the supported bandwidths. Also, include S1G rate information to netlink STA rate message. Lastly, add rate calculation function for S1G MCS. Signed-off-by: Gilad Itzkovitch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2023-06-14	net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting	Peilin Ye	1	-0/+8
	mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized in ingress_init() to point to net_device::miniq_ingress. ingress Qdiscs access this per-net_device pointer in mini_qdisc_pair_swap(). Similar for clsact Qdiscs and miniq_egress.

	Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER requests (thanks Hillf Danton for the hint), when replacing ingress or clsact Qdiscs, for example, the old Qdisc ("@old") could access the same miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"), causing race conditions [1] including a use-after-free bug in mini_qdisc_pair_swap() reported by syzbot:

	 BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
	 Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
	 ...
	 Call Trace:
	  <TASK>
	  __dump_stack lib/dump_stack.c:88 [inline]
	  dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
	  print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
	  print_report mm/kasan/report.c:430 [inline]
	  kasan_report+0x11c/0x130 mm/kasan/report.c:536
	  mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
	  tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
	  tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
	  tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
	  tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
	  tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
	 ...

	@old and @new should not affect each other. In other words, @old should never modify miniq_{in,e}gress after @new, and @new should not update @old's RCU state.

	Fixing without changing sch_api.c turned out to be difficult (please refer to Closes: for discussions). Instead, make sure @new's first call always happens after @old's last call (in {ingress,clsact}_destroy()) has finished:

	In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests, and call qdisc_destroy() for @old before grafting @new.

	Introduce qdisc_refcount_dec_if_one() as the counterpart of qdisc_refcount_inc_nz() used for filter requests. Introduce a non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check, just like qdisc_put() etc.

	Depends on patch "net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs".

	[1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under TC_H_ROOT (no longer possible after commit c7cfbd115001 ("net/sched: sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8 transmission queues:

	Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then adds a flower filter X to A.

	Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and b2) to replace A, then adds a flower filter Y to B.

	 Thread 1                          A's refcnt   Thread 2
	  RTM_NEWQDISC (A, RTNL-locked)
	   qdisc_create(A)                       1
	   qdisc_graft(A)                        9

	  RTM_NEWTFILTER (X, RTNL-unlocked)
	   __tcf_qdisc_find(A)                  10
	   tcf_chain0_head_change(A)
	   mini_qdisc_pair_swap(A) (1st)
	            |
	            |                                    RTM_NEWQDISC (B, RTNL-locked)
	         RCU sync                        2        qdisc_graft(B)
	            |                            1        notify_and_destroy(A)
	            |
	   tcf_block_release(A)                  0       RTM_NEWTFILTER (Y, RTNL-unlocked)
	   qdisc_destroy(A)                               tcf_chain0_head_change(B)
	   tcf_chain0_head_change_cb_del(A)               mini_qdisc_pair_swap(B) (2nd)
	   mini_qdisc_pair_swap(A) (3rd)                           |
	           ...                                            ...

	Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its mini Qdisc, b1. Then, A calls mini_qdisc_pair_swap() again during ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets on eth0 will not find filter Y in sch_handle_ingress().

	This is just one of the possible consequences of concurrently accessing miniq_{in,e}gress pointers.

	Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
	Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
	Reported-by: [email protected]
	Closes: https://lore.kernel.org/r/[email protected]/
	Cc: Hillf Danton <[email protected]>
	Cc: Vlad Buslov <[email protected]>
	Signed-off-by: Peilin Ye <[email protected]>
	Acked-by: Jamal Hadi Salim <[email protected]>
	Signed-off-by: Paolo Abeni <[email protected]>
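The commit above pairs qdisc_refcount_inc_nz() with a new qdisc_refcount_dec_if_one(). The following is a minimal userspace sketch of that pairing using C11 atomics, not the kernel implementation. The names toy_ref, toy_ref_inc_not_zero() and toy_ref_dec_if_one() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a Qdisc reference counter. */
struct toy_ref {
	atomic_int refcnt;
};

/* Take a reference unless the count has already dropped to zero. */
static bool toy_ref_inc_not_zero(struct toy_ref *r)
{
	int old = atomic_load(&r->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop the final reference only if the caller is the sole holder. */
static bool toy_ref_dec_if_one(struct toy_ref *r)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&r->refcnt, &expected, 0);
}
```

In this sketch, a failed toy_ref_dec_if_one() is the point at which a caller would report that the object is still busy, which loosely mirrors the -EBUSY return described in the commit message.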
2023-06-14	net/sched: act_ct: Fix promotion of offloaded unreplied tuple	Paul Blakey	1	-1/+1
	Currently UNREPLIED and UNASSURED connections are added to the nf flow table. This causes the following connection packets to be processed by the flow table which then skips conntrack_in(), and thus such connections will remain UNREPLIED and UNASSURED even if reply traffic is then seen. Even still, the unoffloaded reply packets are the ones triggering hardware update from new to established state, and if there aren't any to trigger an update and/or previous update was missed, hardware can get out of sync with sw and still mark packets as new. Fix the above by: 1) Not skipping conntrack_in() for UNASSURED packets, but still refresh for hardware, as before the cited patch. 2) Try and force a refresh by reply-direction packets that update the hardware rules from new to established state. 3) Remove any bidirectional flows that failed to update in hardware for re-insertion as bidirectional once any new packet arrives. Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections") Co-developed-by: Vlad Buslov <[email protected]> Signed-off-by: Vlad Buslov <[email protected]> Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-06-12	kcm: Send multiple frags in one sendmsg()	David Howells	1	-1/+1
	Rewrite the AF_KCM transmission loop to send all the fragments in a single skb or frag_list-skb in one sendmsg() with MSG_SPLICE_PAGES set. The list of fragments in each skb is conveniently a bio_vec[] that can just be attached to a BVEC iter. Note: I'm working out the size of each fragment-skb by adding up bv_len for all the bio_vecs in skb->frags[] - but surely this information is recorded somewhere? For the skbs in head->frag_list, this is equal to skb->data_len, but not for the head. head->data_len includes all the tail frags too. Signed-off-by: David Howells <[email protected]> cc: Tom Herbert <[email protected]> cc: Tom Herbert <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
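The note above about working out a fragment-skb's size by adding up bv_len can be sketched in userspace as follows. The struct toy_bvec is a hypothetical stand-in that only mirrors the shape of a bio_vec entry (page, length, offset), and toy_frag_bytes() is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one fragment descriptor. */
struct toy_bvec {
	void *page;          /* backing page, unused in this sketch */
	unsigned int len;    /* bytes of payload in this fragment */
	unsigned int offset; /* offset of the payload within the page */
};

/* Sum the payload length over an array of fragment descriptors. */
static size_t toy_frag_bytes(const struct toy_bvec *frags, size_t nr)
{
	size_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += frags[i].len;
	return total;
}
```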
2023-06-12	net: flow_dissector: add support for cfm packets	Zahari Doychev	1	-0/+21
	Add support for dissecting cfm packets. The cfm packet header fields maintenance domain level and opcode can be dissected. Signed-off-by: Zahari Doychev <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-12	tcp: remove size parameter from tcp_stream_alloc_skb()	Eric Dumazet	1	-1/+1
	Now that all tcp_stream_alloc_skb() callers pass @size == 0, we can remove this parameter. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-12	tcp: let tcp_send_syn_data() build headless packets	Eric Dumazet	1	-0/+1
	tcp_send_syn_data() is the last component in TCP transmit path to put payload in skb->head. Switch it to use page frags, so that we can remove dead code later. This allows more payload than the previous implementation. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-12	scm: add SO_PASSPIDFD and SCM_PIDFD	Alexander Mikhalitsyn	1	-2/+37
	Implement SCM_PIDFD, a new CMSG type analogous to SCM_CREDENTIALS, but it contains a pidfd instead of a plain pid, which allows programmers not to care about the PID reuse problem. We mask the SO_PASSPIDFD feature if CONFIG_UNIX is not builtin because it depends on a pidfd_prepare() API which is not exported to the kernel modules. Idea comes from UAPI kernel group: https://uapi-group.org/kernel-features/ Big thanks to Christian Brauner and Lennart Poettering for productive discussions about this. Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: David Ahern <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Lennart Poettering <[email protected]> Cc: Luca Boccassi <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Tested-by: Luca Boccassi <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Alexander Mikhalitsyn <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-12	net/sched: taprio: report class offload stats per TXQ, not per TC	Vladimir Oltean	1	-5/+5
	The taprio Qdisc creates child classes per netdev TX queue, but taprio_dump_class_stats() currently reports offload statistics per traffic class. Traffic classes are groups of TXQs sharing the same dequeue priority, so this is incorrect and we shouldn't be bundling up the TXQ stats when reporting them, as we currently do in enetc. Modify the API from taprio to drivers such that they report TXQ offload stats and not TC offload stats. There is no change in the UAPI or in the global Qdisc stats. Fixes: 6c1adb650c8d ("net/sched: taprio: add netlink reporting for offload statistics counters") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-10	Merge tag 'nf-23-06-08' of ↵	David S. Miller	1	-1/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf netfilter pull request 23-06-08 Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for net: 1) Add commit and abort set operation to pipapo set abort path. 2) Bail out immediately in case of ENOMEM in nfnetlink batch. 3) Incorrect error path handling when creating a new rule leads to dangling pointer in set transaction list. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-06-10	net: move gso declarations and functions to their own files	Eric Dumazet	3	-0/+111
	Move declarations into include/net/gso.h and code into net/core/gso.c Signed-off-by: Eric Dumazet <[email protected]> Cc: Stanislav Fomichev <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-09	Merge tag 'wireless-next-2023-06-09' of ↵	Jakub Kicinski	2	-4/+96
	git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.5 The second pull request for v6.5. We have support for three new Realtek chipsets, all from different generations. Shows how active Realtek development is right now, even older generations are being worked on. Note: We merged wireless into wireless-next to avoid complex conflicts between the trees. Major changes: rtl8xxxu - RTL8192FU support rtw89 - RTL8851BE support rtw88 - RTL8723DS support ath11k - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode iwlwifi - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature cfg80211/mac80211 - more Multi-Link Operation (MLO) support such as hardware restart - fixes for a potential work/mutex deadlock and with it beginnings of the previously discussed locking simplifications * tag 'wireless-next-2023-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (162 commits) wifi: rtlwifi: remove misused flag from HAL data wifi: rtlwifi: remove unused dualmac control leftovers wifi: rtlwifi: remove unused timer and related code wifi: rsi: Do not set MMC_PM_KEEP_POWER in shutdown wifi: rsi: Do not configure WoWlan in shutdown hook if not enabled wifi: brcmfmac: Detect corner error case earlier with log wifi: rtw89: 8852c: update RF radio A/B parameters to R63 wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (3 of 3) wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (2 of 3) wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (1 of 3) wifi: rtw89: process regulatory for 6 GHz power type wifi: rtw89: regd: update regulatory map to R64-R40 wifi: rtw89: regd: judge 6 GHz according to chip and BIOS wifi: rtw89: refine clearing supported bands to check 2/5 GHz first wifi: rtw89: 8851b: configure 
CRASH_TRIGGER feature for 8851B wifi: rtw89: set TX power without precondition during setting channel wifi: rtw89: debug: txpwr table access only valid page according to chip wifi: rtw89: 8851b: enable hw_scan support wifi: cfg80211: move scan done work to wiphy work wifi: cfg80211: move sched scan stop to wiphy work ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-08	ipv4, ipv6: Use splice_eof() to flush	David Howells	3	-0/+3
	Allow splice to undo the effects of MSG_MORE after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. For UDP, a pending packet will not be emitted if the socket is closed before it is flushed; with this change, it will be flushed by ->splice_eof(). For TCP, it's not clear that MSG_MORE is actually effective. Suggested-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <[email protected]> cc: Kuniyuki Iwashima <[email protected]> cc: Willem de Bruijn <[email protected]> cc: David Ahern <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-08	splice, net: Add a splice_eof op to file-ops and socket-ops	David Howells	1	-0/+1
	Add an optional method, ->splice_eof(), to allow splice to indicate the premature termination of a splice to struct file_operations and struct proto_ops. This is called if sendfile() or splice() encounters all of the following conditions inside splice_direct_to_actor(): (1) the user did not set SPLICE_F_MORE (splice only), and (2) an EOF condition occurred (->splice_read() returned 0), and (3) we haven't read enough to fulfill the request (ie. len > 0 still), and (4) we have already spliced at least one byte. A further patch will modify the behaviour of SPLICE_F_MORE to always be passed to the actor if either the user set it or we haven't yet read sufficient data to fulfill the request. Suggested-by: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> cc: Jens Axboe <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Jan Kara <[email protected]> cc: Jeff Layton <[email protected]> cc: David Hildenbrand <[email protected]> cc: Christian Brauner <[email protected]> cc: Chuck Lever <[email protected]> cc: Boris Pismenny <[email protected]> cc: John Fastabend <[email protected]> cc: [email protected] Signed-off-by: Jakub Kicinski <[email protected]>
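The four conditions listed above for calling ->splice_eof() can be read as a single predicate. The sketch below restates them in plain C. It is not taken from splice_direct_to_actor(); the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical predicate mirroring the commit message:
 *  (1) the user did not ask for "more" (SPLICE_F_MORE not set),
 *  (2) the read side returned 0, i.e. EOF,
 *  (3) the request is not yet fulfilled (remaining > 0),
 *  (4) at least one byte has already been spliced.
 */
static bool toy_should_call_splice_eof(bool user_set_more, long read_ret,
				       size_t remaining, size_t spliced)
{
	return !user_set_more && read_ret == 0 &&
	       remaining > 0 && spliced > 0;
}
```

All four conditions have to hold at once: an EOF before anything was spliced, or an EOF that coincides with the request being fully satisfied, would not trigger the call in this sketch.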
2023-06-08	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	9	-18/+26
	Cross-merge networking fixes after downstream PR. Conflicts: net/sched/sch_taprio.c d636fc5dd692 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping") dced11ef84fb ("net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()") net/ipv4/sysctl_net_ipv4.c e209fee4118f ("net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294") ccce324dabfe ("tcp: make the first N SYN RTO backoffs linear") https://lore.kernel.org/all/[email protected]/ No adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-08	netfilter: nf_tables: integrate pipapo into commit protocol	Pablo Neira Ayuso	1	-1/+3
	The pipapo set backend follows copy-on-update approach, maintaining one clone of the existing datastructure that is being updated. The clone and current datastructures are swapped via rcu from the commit step. The existing integration with the commit protocol is flawed because there is no operation to clean up the clone if the transaction is aborted. Moreover, the datastructure swap happens on set element activation. This patch adds two new operations for sets: commit and abort, these new operations are invoked from the commit and abort steps, after the transactions have been digested, and it updates the pipapo set backend to use it. This patch adds a new ->pending_update field to sets to maintain a list of sets that require this new commit and abort operations. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Pablo Neira Ayuso <[email protected]>
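The copy-on-update scheme with separate commit and abort steps can be illustrated with a small single-threaded sketch. This is not the pipapo code: all toy_* names are hypothetical, the dirty flag only loosely plays the role of tracking a pending update, and the pointer swap here is a plain assignment where the kernel swaps via RCU and has to wait for readers before reusing the old copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX 8

struct toy_table {
	int elems[TOY_MAX];
	int n;
};

struct toy_set {
	struct toy_table a, b;
	struct toy_table *live;  /* what lookups see */
	struct toy_table *clone; /* what the transaction edits */
	bool dirty;              /* clone differs from live */
};

static void toy_set_init(struct toy_set *s)
{
	memset(s, 0, sizeof(*s));
	s->live = &s->a;
	s->clone = &s->b;
}

/* Updates only ever touch the clone. */
static bool toy_set_insert(struct toy_set *s, int v)
{
	if (s->clone->n == TOY_MAX)
		return false;
	s->clone->elems[s->clone->n++] = v;
	s->dirty = true;
	return true;
}

/* Commit: publish the clone, then re-sync the old copy as the new clone. */
static void toy_set_commit(struct toy_set *s)
{
	struct toy_table *old;

	if (!s->dirty)
		return;
	old = s->live;
	s->live = s->clone; /* plain store; no RCU in this sketch */
	*old = *s->live;    /* safe here only because there are no readers */
	s->clone = old;
	s->dirty = false;
}

/* Abort: throw away pending edits by resetting the clone from live. */
static void toy_set_abort(struct toy_set *s)
{
	if (!s->dirty)
		return;
	*s->clone = *s->live;
	s->dirty = false;
}
```

The point of the abort operation is visible in the sketch: without it, an element inserted by a failed transaction would stay in the clone and be published by the next successful commit.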
2023-06-07	wifi: cfg80211: add a work abstraction with special semantics	Johannes Berg	1	-4/+91
	Add a work abstraction at the cfg80211 level that will always hold the wiphy_lock() for any work executed and therefore also can be canceled safely (without waiting) while holding that. This improves on what we do now as with the new wiphy works we don't have to worry about locking while cancelling them safely. Also, don't let such works run while the device is suspended, since they'll likely need to interact with the device. Flush them before suspend though. Signed-off-by: Johannes Berg <[email protected]>
2023-06-07	net: sched: move rtm_tca_policy declaration to include file	Eric Dumazet	1	-0/+2
	rtm_tca_policy is used from net/sched/sch_api.c and net/sched/cls_api.c, thus should be declared in an include file. This fixes the following sparse warning: net/sched/sch_api.c:1434:25: warning: symbol 'rtm_tca_policy' was not declared. Should it be static? Fixes: e331473fee3d ("net/sched: cls_api: add missing validation of netlink attributes") Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-07	net: sched: add rcu annotations around qdisc->qdisc_sleeping	Eric Dumazet	1	-2/+4
	syzbot reported a race around qdisc->qdisc_sleeping [1]

	It is time we add proper annotations to reads and writes to/from qdisc->qdisc_sleeping.

	[1]
	BUG: KCSAN: data-race in dev_graft_qdisc / qdisc_lookup_rcu

	read to 0xffff8881286fc618 of 8 bytes by task 6928 on cpu 1:
	 qdisc_lookup_rcu+0x192/0x2c0 net/sched/sch_api.c:331
	 __tcf_qdisc_find+0x74/0x3c0 net/sched/cls_api.c:1174
	 tc_get_tfilter+0x18f/0x990 net/sched/cls_api.c:2547
	 rtnetlink_rcv_msg+0x7af/0x8c0 net/core/rtnetlink.c:6386
	 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
	 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
	 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
	 netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
	 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
	 sock_sendmsg_nosec net/socket.c:724 [inline]
	 sock_sendmsg net/socket.c:747 [inline]
	 ____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
	 ___sys_sendmsg net/socket.c:2557 [inline]
	 __sys_sendmsg+0x1e3/0x270 net/socket.c:2586
	 __do_sys_sendmsg net/socket.c:2595 [inline]
	 __se_sys_sendmsg net/socket.c:2593 [inline]
	 __x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
	 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
	 do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
	 entry_SYSCALL_64_after_hwframe+0x63/0xcd

	write to 0xffff8881286fc618 of 8 bytes by task 6912 on cpu 0:
	 dev_graft_qdisc+0x4f/0x80 net/sched/sch_generic.c:1115
	 qdisc_graft+0x7d0/0xb60 net/sched/sch_api.c:1103
	 tc_modify_qdisc+0x712/0xf10 net/sched/sch_api.c:1693
	 rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6395
	 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
	 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
	 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
	 netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
	 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
	 sock_sendmsg_nosec net/socket.c:724 [inline]
	 sock_sendmsg net/socket.c:747 [inline]
	 ____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
	 ___sys_sendmsg net/socket.c:2557 [inline]
	 __sys_sendmsg+0x1e3/0x270 net/socket.c:2586
	 __do_sys_sendmsg net/socket.c:2595 [inline]
	 __se_sys_sendmsg net/socket.c:2593 [inline]
	 __x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
	 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
	 do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
	 entry_SYSCALL_64_after_hwframe+0x63/0xcd

	Reported by Kernel Concurrency Sanitizer on:
	CPU: 0 PID: 6912 Comm: syz-executor.5 Not tainted 6.4.0-rc3-syzkaller-00190-g0d85b27b0cc6 #0
	Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023

	Fixes: 3a7d0d07a386 ("net: sched: extend Qdisc with rcu")
	Reported-by: syzbot <[email protected]>
	Signed-off-by: Eric Dumazet <[email protected]>
	Cc: Vlad Buslov <[email protected]>
	Acked-by: Jamal Hadi Salim <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
2023-06-07	rfs: annotate lockless accesses to sk->sk_rxhash	Eric Dumazet	1	-5/+13
	Add READ_ONCE()/WRITE_ONCE() on accesses to sk->sk_rxhash. This also prevents a (smart ?) compiler from removing the condition in: if (sk->sk_rxhash != newval) sk->sk_rxhash = newval; We need the condition to avoid dirtying a shared cache line. Fixes: fec5e652e58f ("rfs: Receive Flow Steering") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
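The conditional-store pattern described above can be approximated in userspace with volatile accesses. This is only a sketch: TOY_READ_ONCE() and TOY_WRITE_ONCE() are hypothetical simplifications of the kernel macros, they rely on the GCC/Clang __typeof__ extension, and struct toy_sock is a stand-in for the real socket structure.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace approximations of the kernel macros. */
#define TOY_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the socket field being annotated. */
struct toy_sock {
	uint32_t rxhash;
};

/*
 * Store newval only when it differs from the current value, so an
 * unchanged hash does not dirty the cache line. Returns 1 if a store
 * was performed and 0 if it was skipped.
 */
static int toy_set_rxhash(struct toy_sock *sk, uint32_t newval)
{
	if (TOY_READ_ONCE(sk->rxhash) != newval) {
		TOY_WRITE_ONCE(sk->rxhash, newval);
		return 1;
	}
	return 0;
}
```

Because both the load and the store go through volatile accesses, the compiler cannot fold the comparison away and turn the function into an unconditional store, which is the behaviour the commit message wants to rule out.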