aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2023-07-29net: add missing data-race annotation for sk_ll_usecEric Dumazet1-1/+1
In a prior commit I forgot that sk_getsockopt() reads sk->sk_ll_usec without holding a lock. Fixes: 0dbffbb5335a ("net: annotate data race around sk_ll_usec") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-29net: add missing data-race annotations around sk->sk_peek_offEric Dumazet2-3/+3
sk_getsockopt() runs locklessly, thus we need to annotate the read of sk->sk_peek_off. While we are at it, add corresponding annotations to sk_set_peek_off() and unix_set_peek_off(). Fixes: b9bb53f3836f ("sock: convert sk_peek_offset functions to WRITE_ONCE") Signed-off-by: Eric Dumazet <[email protected]> Cc: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-29net: annotate data-races around sk->sk_markEric Dumazet20-34/+35
sk->sk_mark is often read while another thread could change the value. Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-29net: add missing READ_ONCE(sk->sk_rcvbuf) annotationEric Dumazet1-1/+1
In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_rcvbuf locklessly. Fixes: ebb3b78db7bf ("tcp: annotate sk->sk_rcvbuf lockless reads") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-29net: add missing READ_ONCE(sk->sk_sndbuf) annotationEric Dumazet1-1/+1
In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_sndbuf locklessly. Fixes: e292f05e0df7 ("tcp: annotate sk->sk_sndbuf lockless reads") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-29net: annotate data-races around sk->sk_{rcv|snd}timeoEric Dumazet2-12/+16
sk_getsockopt() runs without locks, we must add annotations to sk->sk_rcvtimeo and sk->sk_sndtimeo. In the future we might allow fetching these fields before we lock the socket in TCP fast path. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-29net: add missing READ_ONCE(sk->sk_rcvlowat) annotationEric Dumazet1-1/+1
In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_rcvlowat locklessly. Fixes: eac66402d1c3 ("net: annotate sk->sk_rcvlowat lockless reads") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-29net: annotate data-races around sk->sk_max_pacing_rateEric Dumazet1-3/+6
sk_getsockopt() runs locklessly. This means sk->sk_max_pacing_rate can be read while other threads are changing its value. Fixes: 62748f32d501 ("net: introduce SO_MAX_PACING_RATE") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-29net: annotate data-race around sk->sk_txrehashEric Dumazet1-2/+5
sk_getsockopt() runs locklessly. This means sk->sk_txrehash can be read while other threads are changing its value. Other locations were handled in commit cb6cd2cec799 ("tcp: Change SYN ACK retransmit behaviour to account for rehash") Fixes: 26859240e4ee ("txhash: Add socket option to control TX hash rethink behavior") Signed-off-by: Eric Dumazet <[email protected]> Cc: Akhmat Karakotov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-29net: annotate data-races around sk->sk_reserved_memEric Dumazet1-3/+4
sk_getsockopt() runs locklessly. This means sk->sk_reserved_mem can be read while other threads are changing its value. Add missing annotations where they are needed. Fixes: 2bb2f5fb21b0 ("net: add new socket option SO_RESERVE_MEM") Signed-off-by: Eric Dumazet <[email protected]> Cc: Wei Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-29net: gro: fix misuse of CB in udp socket lookupRichard Gobert4-8/+22
This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to `udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the device from CB. The fix changes it to fetch the device from `skb->dev`. l3mdev case requires special attention since it has a master and a slave device. Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Reported-by: Gal Pressman <[email protected]> Signed-off-by: Richard Gobert <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-28net: sched: cls_u32: Fix match key mis-addressingJamal Hadi Salim1-6/+50
A match entry is uniquely identified with an "address" or "path" in the form of: hashtable ID(12b):bucketid(8b):nodeid(12b). When creating table match entries all of hash table id, bucket id and node (match entry id) are needed to be either specified by the user or reasonable in-kernel defaults are used. The in-kernel default for a table id is 0x800(omnipresent root table); for bucketid it is 0x0. Prior to this fix there was none for a nodeid i.e. the code assumed that the user passed the correct nodeid and if the user passes a nodeid of 0 (as Mingi Cho did) then that is what was used. But nodeid of 0 is reserved for identifying the table. This is not a problem until we dump. The dump code notices that the nodeid is zero and assumes it is referencing a table and therefore references table struct tc_u_hnode instead of what was created i.e match entry struct tc_u_knode. Ming does an equivalent of: tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \ protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok Essentially specifying a table id 0, bucketid 1 and nodeid of zero Tableid 0 is remapped to the default of 0x800. Bucketid 1 is ignored and defaults to 0x00. Nodeid was assumed to be what Ming passed - 0x000 dumping before fix shows: ~$ tc filter ls dev dummy0 parent 10: filter protocol ip pref 1 u32 chain 0 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591 Note that the last line reports a table instead of a match entry (you can tell this because it says "ht divisor..."). As a result of reporting the wrong data type (misinterpretting of struct tc_u_knode as being struct tc_u_hnode) the divisor is reported with value of -30591. Ming identified this as part of the heap address (physmap_base is 0xffff8880 (-30591 - 1)). The fix is to ensure that when table entry matches are added and no nodeid is specified (i.e nodeid == 0) then we get the next available nodeid from the table's pool. After the fix, this is what the dump shows: $ tc filter ls dev dummy0 parent 10: filter protocol ip pref 1 u32 chain 0 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1 filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw match 0a000001/ffffffff at 12 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 Reported-by: Mingi Cho <[email protected]> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jamal Hadi Salim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-28netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter linkDaniel Xu1-15/+108
This commit adds support for enabling IP defrag using pre-existing netfilter defrag support. Basically all the flag does is bump a refcnt while the link the active. Checks are also added to ensure the prog requesting defrag support is run _after_ netfilter defrag hooks. We also take care to avoid any issues w.r.t. module unloading -- while defrag is active on a link, the module is prevented from unloading. Signed-off-by: Daniel Xu <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Link: https://lore.kernel.org/r/5cff26f97e55161b7d56b09ddcf5f8888a5add1d.1689970773.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <[email protected]>
2023-07-28netfilter: defrag: Add glue hooks for enabling/disabling defragDaniel Xu3-1/+33
We want to be able to enable/disable IP packet defrag from core bpf/netfilter code. In other words, execute code from core that could possibly be built as a module. To help avoid symbol resolution errors, use glue hooks that the modules will register callbacks with during module init. Signed-off-by: Daniel Xu <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Link: https://lore.kernel.org/r/f6a8824052441b72afe5285acedbd634bd3384c1.1689970773.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <[email protected]>
2023-07-28Merge branch 'in-kernel-support-for-the-tls-alert-protocol'Jakub Kicinski8-42/+197
Chuck Lever says: ==================== In-kernel support for the TLS Alert protocol IMO the kernel doesn't need user space (ie, tlshd) to handle the TLS Alert protocol. Instead, a set of small helper functions can be used to handle sending and receiving TLS Alerts for in-kernel TLS consumers. ==================== Merged on top of a tag in case it's needed in the NFS tree. Link: https://lore.kernel.org/r/169047923706.5241.1181144206068116926.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-28net/handshake: Trace events for TLS Alert helpersChuck Lever2-0/+9
Add observability for the new TLS Alert infrastructure. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Link: https://lore.kernel.org/r/169047947409.5241.14548832149596892717.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-28SUNRPC: Use new helpers to handle TLS AlertsChuck Lever2-41/+48
Use the helpers to parse the level and description fields in incoming alerts. "Warning" alerts are discarded, and "fatal" alerts mean the session is no longer valid. Signed-off-by: Chuck Lever <[email protected]> Link: https://lore.kernel.org/r/169047944747.5241.1974889594004407123.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-28net/handshake: Add helpers for parsing incoming TLS AlertsChuck Lever1-0/+42
Kernel TLS consumers can replace common TLS Alert parsing code with these helpers. Signed-off-by: Chuck Lever <[email protected]> Link: https://lore.kernel.org/r/169047942074.5241.13791647439480672048.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-28SUNRPC: Send TLS Closure alerts before closing a TCP socketChuck Lever2-0/+4
Before closing a TCP connection, the TLS protocol wants peers to send session close Alert notifications. Add those in both the RPC client and server. Signed-off-by: Chuck Lever <[email protected]> Link: https://lore.kernel.org/r/169047939404.5241.14392506226409865832.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-28net/handshake: Add API for sending TLS Closure alertsChuck Lever4-1/+91
This helper sends an alert only if a TLS session was established. Signed-off-by: Chuck Lever <[email protected]> Link: https://lore.kernel.org/r/169047936730.5241.618595693821012638.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-28net/tls: Move TLS protocol elements to a separate headerChuck Lever3-0/+3
Kernel TLS consumers will need definitions of various parts of the TLS protocol, but often do not need the function declarations and other infrastructure provided in <net/tls.h>. Break out existing standardized protocol elements into a separate header, and make room for a few more elements in subsequent patches. Signed-off-by: Chuck Lever <[email protected]> Link: https://lore.kernel.org/r/169047931374.5241.7713175865185969309.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-28net: change accept_ra_min_rtr_lft to affect all RA lifetimesPatrick Rohr2-21/+19
accept_ra_min_rtr_lft only considered the lifetime of the default route and discarded entire RAs accordingly. This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and applies the value to individual RA sections; in particular, router lifetime, PIO preferred lifetime, and RIO lifetime. If any of those lifetimes are lower than the configured value, the specific RA section is ignored. In order for the sysctl to be useful to Android, it should really apply to all lifetimes in the RA, since that is what determines the minimum frequency at which RAs must be processed by the kernel. Android uses hardware offloads to drop RAs for a fraction of the minimum of all lifetimes present in the RA (some networks have very frequent RAs (5s) with high lifetimes (2h)). Despite this, we have encountered networks that set the router lifetime to 30s which results in very frequent CPU wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the WiFi firmware) entirely on such networks, it seems better to ignore the misconfigured routers while still processing RAs from other IPv6 routers on the same network (i.e. to support IoT applications). The previous implementation dropped the entire RA based on router lifetime. This turned out to be hard to expand to the other lifetimes present in the RA in a consistent manner; dropping the entire RA based on RIO/PIO lifetimes would essentially require parsing the whole thing twice. Fixes: 1671bcfd76fd ("net: add sysctl accept_ra_min_rtr_lft") Cc: Lorenzo Colitti <[email protected]> Signed-off-by: Patrick Rohr <[email protected]> Reviewed-by: Maciej Żenczykowski <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-28net: convert some netlink netdev iterators to depend on the xarrayJakub Kicinski3-122/+51
Reap the benefits of easier iteration thanks to the xarray. Convert just the genetlink ones, those are easier to test. Reviewed-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-28net: store netdevs in an xarrayJakub Kicinski1-28/+54
Iterating over the netdev hash table for netlink dumps is hard. Dumps are done in "chunks" so we need to save the position after each chunk, so we know where to restart from. Because netdevs are stored in a hash table we remember which bucket we were in and how many devices we dumped. Since we don't hold any locks across the "chunks" - devices may come and go while we're dumping. If that happens we may miss a device (if device is deleted from the bucket we were in). We indicate to user space that this may have happened by setting NLM_F_DUMP_INTR. User space is supposed to dump again (I think) if it sees that. Somehow I doubt most user space gets this right.. To illustrate let's look at an example: System state: start: # [A, B, C] del: B # [A, C] with the hash table we may dump [A, B], missing C completely even tho it existed both before and after the "del B". Add an xarray and use it to allocate ifindexes. This way we can iterate ifindexes in order, without the worry that we'll skip one. We may still generate a dump of a state which "never existed", for example for a set of values and sequence of ops: System state: start: # [A, B] add: C # [A, C, B] del: B # [A, C] we may generate a dump of [A], if C got an index between A and B. System has never been in such state. But I'm 90% sure that's perfectly fine, important part is that we can't _miss_ devices which exist before and after. User space which wants to mirror kernel's state subscribes to notifications and does periodic dumps so it will know that C exists from the notification about its creation or from the next dump (next dump is _guaranteed_ to include C, if it doesn't get removed). To avoid any perf regressions keep the hash table for now. Most net namespaces have very few devices and microbenchmarking 1M lookups on Skylake I get the following results (not counting loopback to number of devs): #devs | hash | xa | delta 2 | 18.3 | 20.1 | + 9.8% 16 | 18.3 | 20.1 | + 9.5% 64 | 18.3 | 26.3 | +43.8% 128 | 20.4 | 26.3 | +28.6% 256 | 20.0 | 26.4 | +32.1% 1024 | 26.6 | 26.7 | + 0.2% 8192 |541.3 | 33.5 | -93.8% No surprises since the hash table has 256 entries. The microbenchmark scans indexes in order, if the pattern is more random xa starts to win at 512 devices already. But that's a lot of devices, in practice. Reviewed-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-28Merge tag 'ceph-for-6.5-rc4' of https://github.com/ceph/ceph-clientLinus Torvalds1-0/+1
Pull ceph fixes from Ilya Dryomov: "A patch to reduce the potential for erroneous RBD exclusive lock blocklisting (fencing) with a couple of prerequisites and a fixup to prevent metrics from being sent to the MDS even just once after that has been disabled by the user. All marked for stable" * tag 'ceph-for-6.5-rc4' of https://github.com/ceph/ceph-client: rbd: retrieve and check lock owner twice before blocklisting rbd: harden get_lock_owner_info() a bit rbd: make get_lock_owner_info() return a single locker or NULL ceph: never send metrics if disable_send_metrics is set
2023-07-28Merge tag '9p-fixes-6.5-rc3' of ↵Linus Torvalds2-38/+16
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p fixes from Eric Van Hensbergen: "Misc set of fixes for 9p. Most of these clean up warnings we've gotten out of compilation tools, but several of them were from inspection while hunting down a couple of regressions. The most important one is 75b396821cb7 ("fs/9p: remove unnecessary and overrestrictive check") which caused a regression for some folks by restricting mmap in any case where writeback caches weren't enabled. Most of the other bugs caught via inspection were type mismatches" * tag '9p-fixes-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: Remove unused extern declaration 9p: remove dead stores (variable set again without being read) 9p: virtio: skip incrementing unused variable 9p: virtio: make sure 'offs' is initialized in zc_request 9p: virtio: fix unlikely null pointer deref in handle_rerror 9p: fix ignored return value in v9fs_dir_release fs/9p: remove unnecessary invalidate_inode_pages2 fs/9p: fix type mismatch in file cache mode helper fs/9p: fix typo in comparison logic for cache mode fs/9p: remove unnecessary and overrestrictive check fs/9p: Fix a datatype used with V9FS_DIRECT_IO
2023-07-28batman-adv: Do not get eth header before batadv_check_management_packetRemi Pommarel2-2/+4
If received skb in batadv_v_elp_packet_recv or batadv_v_ogm_packet_recv is either cloned or non linearized then its data buffer will be reallocated by batadv_check_management_packet when skb_cow or skb_linearize get called. Thus geting ethernet header address inside skb data buffer before batadv_check_management_packet had any chance to reallocate it could lead to the following kernel panic: Unable to handle kernel paging request at virtual address ffffff8020ab069a Mem abort info: ESR = 0x96000007 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x07: level 3 translation fault Data abort info: ISV = 0, ISS = 0x00000007 CM = 0, WnR = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000040f45000 [ffffff8020ab069a] pgd=180000007fffa003, p4d=180000007fffa003, pud=180000007fffa003, pmd=180000007fefe003, pte=0068000020ab0706 Internal error: Oops: 96000007 [#1] SMP Modules linked in: ahci_mvebu libahci_platform libahci dvb_usb_af9035 dvb_usb_dib0700 dib0070 dib7000m dibx000_common ath11k_pci ath10k_pci ath10k_core mwl8k_new nf_nat_sip nf_conntrack_sip xhci_plat_hcd xhci_hcd nf_nat_pptp nf_conntrack_pptp at24 sbsa_gwdt CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.15.42-00066-g3242268d425c-dirty #550 Hardware name: A8k (DT) pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : batadv_is_my_mac+0x60/0xc0 lr : batadv_v_ogm_packet_recv+0x98/0x5d0 sp : ffffff8000183820 x29: ffffff8000183820 x28: 0000000000000001 x27: ffffff8014f9af00 x26: 0000000000000000 x25: 0000000000000543 x24: 0000000000000003 x23: ffffff8020ab0580 x22: 0000000000000110 x21: ffffff80168ae880 x20: 0000000000000000 x19: ffffff800b561000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 00dc098924ae0032 x14: 0f0405433e0054b0 x13: ffffffff00000080 x12: 0000004000000001 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : ffffffc076dae000 x6 : ffffff8000183700 x5 : ffffffc00955e698 x4 : ffffff80168ae000 x3 : ffffff80059cf000 x2 : ffffff800b561000 x1 : ffffff8020ab0696 x0 : ffffff80168ae880 Call trace: batadv_is_my_mac+0x60/0xc0 batadv_v_ogm_packet_recv+0x98/0x5d0 batadv_batman_skb_recv+0x1b8/0x244 __netif_receive_skb_core.isra.0+0x440/0xc74 __netif_receive_skb_one_core+0x14/0x20 netif_receive_skb+0x68/0x140 br_pass_frame_up+0x70/0x80 br_handle_frame_finish+0x108/0x284 br_handle_frame+0x190/0x250 __netif_receive_skb_core.isra.0+0x240/0xc74 __netif_receive_skb_list_core+0x6c/0x90 netif_receive_skb_list_internal+0x1f4/0x310 napi_complete_done+0x64/0x1d0 gro_cell_poll+0x7c/0xa0 __napi_poll+0x34/0x174 net_rx_action+0xf8/0x2a0 _stext+0x12c/0x2ac run_ksoftirqd+0x4c/0x7c smpboot_thread_fn+0x120/0x210 kthread+0x140/0x150 ret_from_fork+0x10/0x20 Code: f9403844 eb03009f 54fffee1 f94 Thus ethernet header address should only be fetched after batadv_check_management_packet has been called. Fixes: 0da0035942d4 ("batman-adv: OGMv2 - add basic infrastructure") Cc: [email protected] Signed-off-by: Remi Pommarel <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2023-07-28IPv6: add extack info for IPv6 address add/deleteHangbin Liu3-26/+58
Add extack info for IPv6 address add/delete, which would be useful for users to understand the problem without having to read kernel code. Suggested-by: Beniamino Galvani <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-28net: ethtool: Unify ETHTOOL_{G,S}RXFH rxnfc copyJoe Damato1-37/+38
ETHTOOL_GRXFH correctly copies in the full struct ethtool_rxnfc when FLOW_RSS is set; ETHTOOL_SRXFH needs a similar code path to handle the FLOW_RSS case so that ethtool can set the flow hash for custom RSS contexts (if supported by the driver). The copy code from ETHTOOL_GRXFH has been pulled out in to a helper so that it can be called in both ETHTOOL_{G,S}RXFH code paths. Acked-by: Edward Cree <[email protected]> Signed-off-by: Joe Damato <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-07-27net: Explicitly include correct DT includesRob Herring1-0/+1
The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it as merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Acked-by: Alex Elder <[email protected]> Reviewed-by: Bhupesh Sharma <[email protected]> Reviewed-by: Wei Fang <[email protected]> Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27Merge tag 'nf-next-23-07-27' of ↵Jakub Kicinski7-31/+22
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for net-next 1. silence a harmless warning for CONFIG_NF_CONNTRACK_PROCFS=n builds, from Zhu Wang. 2, 3: Allow NLA_POLICY_MASK to be used with BE16/BE32 types, and replace a few manual checks with nla_policy based one in nf_tables, from myself. 4: cleanup in ctnetlink to validate while parsing rather than using two steps, from Lin Ma. 5: refactor boyer-moore textsearch by moving a small chunk to a helper function, rom Jeremy Sowden. * tag 'nf-next-23-07-27' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: lib/ts_bm: add helper to reduce indentation and improve readability netfilter: conntrack: validate cta_ip via parsing netfilter: nf_tables: use NLA_POLICY_MASK to test for valid flag options netlink: allow be16 and be32 types in all uint policy checks nf_conntrack: fix -Wunused-const-variable= ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27Merge branch 'net-tls-fixes-for-nvme-over-tls'Jakub Kicinski5-20/+135
Hannes Reinecke says: ==================== net/tls: fixes for NVMe-over-TLS here are some small fixes to get NVMe-over-TLS up and running. The first set are just minor modifications to have MSG_EOR handled for TLS, but the second set implements the ->read_sock() callback for tls_sw. The ->read_sock() callbacks return -EIO when encountering any TLS Alert message, but as that's the default behaviour anyway I guess we can get away with it. ==================== Applied on top of the tag in case Sagi gets convinced to pull it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27net: flower: fix stack-out-of-bounds in fl_set_key_cfm()Eric Dumazet1-2/+3
Typical misuse of nla_parse_nested(array, XXX_MAX, ...); array must be declared as struct nlattr *array[XXX_MAX + 1]; v2: Based on feedbacks from Ido Schimmel and Zahari Doychev, I also changed TCA_FLOWER_KEY_CFM_OPT_MAX and cfm_opt_policy definitions. syzbot reported: BUG: KASAN: stack-out-of-bounds in __nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588 Write of size 32 at addr ffffc90003a0ee20 by task syz-executor296/5014 CPU: 0 PID: 5014 Comm: syz-executor296 Not tainted 6.5.0-rc2-syzkaller-00307-gd192f5382581 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0x163/0x540 mm/kasan/report.c:475 kasan_report+0x175/0x1b0 mm/kasan/report.c:588 kasan_check_range+0x27e/0x290 mm/kasan/generic.c:187 __asan_memset+0x23/0x40 mm/kasan/shadow.c:84 __nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588 __nla_parse+0x40/0x50 lib/nlattr.c:700 nla_parse_nested include/net/netlink.h:1262 [inline] fl_set_key_cfm+0x1e3/0x440 net/sched/cls_flower.c:1718 fl_set_key+0x2168/0x6620 net/sched/cls_flower.c:1884 fl_tmplt_create+0x1fe/0x510 net/sched/cls_flower.c:2666 tc_chain_tmplt_add net/sched/cls_api.c:2959 [inline] tc_ctl_chain+0x131d/0x1ac0 net/sched/cls_api.c:3068 rtnetlink_rcv_msg+0x82b/0xf50 net/core/rtnetlink.c:6424 netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2549 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x7c3/0x990 net/netlink/af_netlink.c:1365 netlink_sendmsg+0xa2a/0xd60 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x592/0x890 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2577 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f54c6150759 Code: 48 83 c4 28 c3 e8 d7 19 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffe06c30578 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f54c619902d RCX: 00007f54c6150759 RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003 RBP: 00007ffe06c30590 R08: 0000000000000000 R09: 00007ffe06c305f0 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54c61c35f0 R13: 00007ffe06c30778 R14: 0000000000000001 R15: 0000000000000001 </TASK> The buggy address belongs to stack of task syz-executor296/5014 and is located at offset 32 in frame: fl_set_key_cfm+0x0/0x440 net/sched/cls_flower.c:374 This frame has 1 object: [32, 56) 'nla_cfm_opt' The buggy address belongs to the virtual mapping at [ffffc90003a08000, ffffc90003a11000) created by: copy_process+0x5c8/0x4290 kernel/fork.c:2330 Fixes: 7cfffd5fed3e ("net: flower: add support for matching cfm fields") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Simon Horman <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Zahari Doychev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27net/tls: implement ->read_sock()Hannes Reinecke3-0/+103
Implement ->read_sock() function for use with nvme-tcp. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Cc: Boris Pismenny <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27net/tls: split tls_rx_reader_lockHannes Reinecke1-16/+22
Split tls_rx_reader_{lock,unlock} into an 'acquire/release' and the actual locking part. With that we can use the tls_rx_reader_lock in situations where the socket is already locked. Suggested-by: Sagi Grimberg <[email protected]> Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27net/tls: Use tcp_read_sock() instead of ops->read_sock()Hannes Reinecke1-2/+1
TLS resets the protocol operations, so the read_sock() callback might be changed, too. In this case using sock->ops->readsock() in tls_strp_read_copyin() will enter an infinite recursion if the read_sock() callback is calling tls_rx_rec_wait() which will call into sock->ops->readsock() via tls_strp_read_copyin(). But as tls_strp_read_copyin() is supposed to produce data from the consumed socket and that socket is always a TCP socket we can call tcp_read_sock() directly without having to deal with callbacks. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27net/tls: handle MSG_EOR for tls_device TX flowHannes Reinecke1-1/+5
tls_push_data() MSG_MORE, but bails out on MSG_EOR. Seeing that MSG_EOR is basically the opposite of MSG_MORE this patch adds handling MSG_EOR by treating it as the absence of MSG_MORE. Consequently we should return an error when both are set. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27net/tls: handle MSG_EOR for tls_sw TX flowHannes Reinecke1-1/+4
tls_sw_sendmsg() already handles MSG_MORE, but bails out on MSG_EOR. Seeing that MSG_EOR is basically the opposite of MSG_MORE this patch adds handling MSG_EOR by treating it as the negation of MSG_MORE. And erroring out if MSG_EOR is specified with MSG_MORE. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27net: dsa: fix older DSA drivers using phylinkRussell King (Oracle)1-1/+8
Older DSA drivers that do not provide an dsa_ops adjust_link method end up using phylink. Unfortunately, a recent phylink change that requires its supported_interfaces bitmap to be filled breaks these drivers because the bitmap remains empty. Rather than fixing each driver individually, fix it in the core code so we have a sensible set of defaults. Reported-by: Sergei Antonov <[email protected]> Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled") Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Tested-by: Vladimir Oltean <[email protected]> # dsa_loop Reviewed-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27dccp: Remove unused declaration dccp_feat_initialise_sysctls()YueHaibing1-1/+0
This is never used, so can remove it. Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE lengthLin Ma1-2/+6
There are totally 9 ndo_bridge_setlink handlers in the current kernel, which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3) i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5) ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7) nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink. By investigating the code, we find that 1-7 parse and use nlattr IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can lead to an out-of-attribute read and allow a malformed nlattr (e.g., length 0) to be viewed as a 2 byte integer. To avoid such issues, also for other ndo_bridge_setlink handlers in the future. This patch adds the nla_len check in rtnl_bridge_setlink and does an early error return if length mismatches. To make it works, the break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure this nla_for_each_nested iterates every attribute. Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink") Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops") Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Lin Ma <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27bridge: Remove unused declaration br_multicast_set_hash_max()YueHaibing1-1/+0
Since commit 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable") this is not used, so can remove it. Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Simon Horman <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27net: remove comment in ndisc_router_discoveryPatrick Rohr1-4/+0
Removes superfluous (and misplaced) comment from ndisc_router_discovery. Signed-off-by: Patrick Rohr <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski11-35/+83
Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
2023-07-27bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsingLin Ma1-1/+4
The nla_for_each_nested parsing in function bpf_sk_storage_diag_alloc does not check the length of the nested attribute. This can lead to an out-of-attribute read and allow a malformed nlattr (e.g., length 0) to be viewed as a 4 byte integer. This patch adds an additional check when the nlattr is getting counted. This makes sure the latter nla_get_u32 can access the attributes with the correct length. Fixes: 1ed4d92458a9 ("bpf: INET_DIAG support in bpf_sk_storage") Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Lin Ma <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-07-27virtio/vsock: support MSG_PEEK for SOCK_SEQPACKETArseniy Krasnov1-3/+60
This adds support of MSG_PEEK flag for SOCK_SEQPACKET type of socket. Difference with SOCK_STREAM is that this callback returns either length of the message or error. Signed-off-by: Arseniy Krasnov <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-07-27virtio/vsock: rework MSG_PEEK for SOCK_STREAMArseniy Krasnov1-22/+19
This reworks current implementation of MSG_PEEK logic: 1) Replaces 'skb_queue_walk_safe()' with 'skb_queue_walk()'. There is no need in the first one, as there are no removes of skb in loop. 2) Removes nested while loop - MSG_PEEK logic could be implemented without it: just iterate over skbs without removing it and copy data from each until destination buffer is not full. Signed-off-by: Arseniy Krasnov <[email protected]> Reviewed-by: Bobby Eshleman <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-07-27netfilter: conntrack: validate cta_ip via parsingLin Ma1-6/+2
In current ctnetlink_parse_tuple_ip() function, nested parsing and validation is splitting as two parts, which could be cleanup to a simplified form. As the nla_parse_nested_deprecated function supports validation in the fly. These two finially reach same place __nla_validate_parse with same validate flag. nla_parse_nested_deprecated __nla_parse(.., NL_VALIDATE_LIBERAL, ..) __nla_validate_parse nla_validate_nested_deprecated __nla_validate_nested(.., NL_VALIDATE_LIBERAL, ..) __nla_validate __nla_validate_parse This commit removes the call to nla_validate_nested_deprecated and pass cta_ip_nla_policy when do parsing. Signed-off-by: Lin Ma <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Florian Westphal <[email protected]>
2023-07-27netfilter: nf_tables: use NLA_POLICY_MASK to test for valid flag optionsFlorian Westphal5-25/+18
nf_tables relies on manual test of netlink attributes coming from userspace even in cases where this could be handled via netlink policy. Convert a bunch of 'flag' attributes to use NLA_POLICY_MASK checks. Signed-off-by: Florian Westphal <[email protected]>
2023-07-27nf_conntrack: fix -Wunused-const-variable=Zhu Wang1-0/+2
When building with W=1, the following warning occurs. net/netfilter/nf_conntrack_proto_dccp.c:72:27: warning: ‘dccp_state_names’ defined but not used [-Wunused-const-variable=] static const char * const dccp_state_names[] = { We include dccp_state_names in the macro CONFIG_NF_CONNTRACK_PROCFS, since it is only used in the place which is included in the macro CONFIG_NF_CONNTRACK_PROCFS. Fixes: 2bc780499aa3 ("[NETFILTER]: nf_conntrack: add DCCP protocol support") Signed-off-by: Zhu Wang <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Florian Westphal <[email protected]>