aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2024-07-15net: Add struct kernel_ethtool_ts_infoKory Maincent7-13/+21
In prevision to add new UAPI for hwtstamp we will be limited to the struct ethtool_ts_info that is currently passed in fixed binary format through the ETHTOOL_GET_TS_INFO ethtool ioctl. It would be good if new kernel code already started operating on an extensible kernel variant of that structure, similar in concept to struct kernel_hwtstamp_config vs struct hwtstamp_config. Since struct ethtool_ts_info is in include/uapi/linux/ethtool.h, here we introduce the kernel-only structure in include/linux/ethtool.h. The manual copy is then made in the function called by ETHTOOL_GET_TS_INFO. Acked-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20240709-feature_ptp_netnext-v17-6-b5317f50df2a@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: Change the API of PHY default timestamp to MACKory Maincent3-8/+7
Change the API to select MAC default time stamping instead of the PHY. Indeed the PHY is closer to the wire therefore theoretically it has less delay than the MAC timestamping but the reality is different. Due to lower time stamping clock frequency, latency in the MDIO bus and no PHC hardware synchronization between different PHY, the PHY PTP is often less precise than the MAC. The exception is for PHY designed specially for PTP case but these devices are not very widespread. For not breaking the compatibility default_timestamp flag has been introduced in phy_device that is set by the phy driver to know we are using the old API behavior. Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20240709-feature_ptp_netnext-v17-4-b5317f50df2a@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15Bluetooth: Remove hci_request.{c,h}Luiz Augusto von Dentz9-965/+1
This removes hci_request.{c,h} since it shall no longer be used. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-15Bluetooth: hci_sync: Remove remaining dependencies of hci_requestLuiz Augusto von Dentz2-20/+11
This removes the dependencies of hci_req_init and hci_request_cancel_all from hci_sync.c. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-15Bluetooth: hci_sync: Move handling of interleave_scanLuiz Augusto von Dentz2-7/+49
This moves handling of interleave_scan work to hci_sync.c since hci_request.c is deprecated. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-15Bluetooth: hci_core: Don't use hci_prepare_cmdLuiz Augusto von Dentz2-5/+4
This replaces the instance of hci_prepare_cmd with hci_cmd_sync_alloc since the former is part of hci_request.c which is considered deprecated. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-15Bluetooth: hci_core: Remove usage of hci_req_syncLuiz Augusto von Dentz2-30/+9
hci_request functions are considered deprecated so this replaces the usage of hci_req_sync with hci_inquiry_sync. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-15Bluetooth: Fix usage of __hci_cmd_sync_statusLuiz Augusto von Dentz1-15/+12
__hci_cmd_sync_status shall only be used if hci_req_sync_lock is _not_ required which is not the case of hci_dev_cmd so it needs to use hci_cmd_sync_status which uses hci_req_sync_lock internally. Fixes: f1a8f402f13f ("Bluetooth: L2CAP: Fix deadlock") Reported-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-15Bluetooth: hci_core: cleanup struct hci_devDmitry Antipov1-1/+0
Remove unused and set but otherwise unused 'discovery_old_state' and 'sco_last_tx' members of 'struct hci_dev'. The first one is a leftover after commit 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier"); the second one is originated from ancient 2.4.19 and I was unable to find any actual use since that. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-15net: dsa: prepare 'dsa_tag_8021q_bridge_join' for standalone usePawel Dembicki1-1/+4
The 'dsa_tag_8021q_bridge_join' could be used as a generic implementation of the 'ds->ops->port_bridge_join()' function. However, it is necessary to synchronize their arguments. This patch also moves the 'tx_fwd_offload' flag configuration line into 'dsa_tag_8021q_bridge_join' body. Currently, every (sja1105) driver sets it, and the future vsc73xx implementation will also need it for simplification. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20240713211620.1125910-11-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: vsc73xx: introduce tag 8021q for vsc73xxPawel Dembicki3-0/+75
This commit introduces a new tagger based on 802.1q tagging. It's designed for the vsc73xx driver. The VSC73xx family doesn't have any tag support for the RGMII port, but it could be based on VLANs. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patch.msgid.link/20240713211620.1125910-8-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: tag_sja1105: refactor skb->dev assignment to dsa_tag_8021q_find_user()Vladimir Oltean3-17/+24
A new tagging protocol implementation based on tag_8021q is on the horizon, and it appears that it also has to open-code the complicated logic of finding a source port based on a VLAN header. Create a single dsa_tag_8021q_find_user() and make sja1105 call it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20240713211620.1125910-7-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: tag_sja1105: prefer precise source port info on SJA1110 tooVladimir Oltean1-4/+4
Now that dsa_8021q_rcv() handles better the case where we don't overwrite the precise source information if it comes from an external (non-tag_8021q) source, we can now unify the call sequence between sja1105_rcv() and sja1110_rcv(). This is a preparatory change for creating a higher-level wrapper for the entire sequence which will live in tag_8021q. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20240713211620.1125910-6-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: tag_sja1105: absorb entire sja1105_vlan_rcv() into dsa_8021q_rcv()Vladimir Oltean4-36/+34
tag_sja1105 has a wrapper over dsa_8021q_rcv(): sja1105_vlan_rcv(), which determines whether the packet came from a bridge with vlan_filtering=1 (the case resolved via dsa_find_designated_bridge_port_by_vid()), or if it contains a tag_8021q header. Looking at a new tagger implementation for vsc73xx, based also on tag_8021q, it is becoming clear that the logic is needed there as well. So instead of forcing each tagger to wrap around dsa_8021q_rcv(), let's merge the logic into the core. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Link: https://patch.msgid.link/20240713211620.1125910-5-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: tag_sja1105: absorb logic for not overwriting precise info into ↵Vladimir Oltean2-23/+32
dsa_8021q_rcv() In both sja1105_rcv() and sja1110_rcv(), we may have precise source port information coming from parallel hardware mechanisms, in addition to the tag_8021q header. Only sja1105_rcv() has extra logic to not overwrite that precise info with what's present in the VLAN tag. This is because sja1110_rcv() gets by, by having a reversed set of checks when assigning skb->dev. When the source port is imprecise (vbid >=1), source_port and switch_id will be set to zeroes by dsa_8021q_rcv(), which might be problematic. But by checking for vbid >= 1 first, sja1110_rcv() fends that off. We would like to make more code common between sja1105_rcv() and sja1110_rcv(), and for that, we need to make sure that sja1110_rcv() also goes through the precise source port preservation logic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20240713211620.1125910-4-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: bridge: mst: Check vlan state for egress decisionElliot Ayrey1-2/+2
If a port is blocking in the common instance but forwarding in an MST instance, traffic egressing the bridge will be dropped because the state of the common instance is overriding that of the MST instance. Fix this by skipping the port state check in MST mode to allow checking the vlan state via br_allowed_egress(). This is similar to what happens in br_handle_frame_finish() when checking ingress traffic, which was introduced in the change below. Fixes: ec7328b59176 ("net: bridge: mst: Multiple Spanning Tree (MST) mode") Signed-off-by: Elliot Ayrey <elliot.ayrey@alliedtelesis.co.nz> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-14xdp: fix invalid wait context of page_pool_destroy()Taehee Yoo1-3/+1
If the driver uses a page pool, it creates a page pool with page_pool_create(). The reference count of page pool is 1 as default. A page pool will be destroyed only when a reference count reaches 0. page_pool_destroy() is used to destroy page pool, it decreases a reference count. When a page pool is destroyed, ->disconnect() is called, which is mem_allocator_disconnect(). This function internally acquires mutex_lock(). If the driver uses XDP, it registers a memory model with xdp_rxq_info_reg_mem_model(). The xdp_rxq_info_reg_mem_model() internally increases a page pool reference count if a memory model is a page pool. Now the reference count is 2. To destroy a page pool, the driver should call both page_pool_destroy() and xdp_unreg_mem_model(). The xdp_unreg_mem_model() internally calls page_pool_destroy(). Only page_pool_destroy() decreases a reference count. If a driver calls page_pool_destroy() then xdp_unreg_mem_model(), we will face an invalid wait context warning. Because xdp_unreg_mem_model() calls page_pool_destroy() with rcu_read_lock(). The page_pool_destroy() internally acquires mutex_lock(). Splat looks like: ============================= [ BUG: Invalid wait context ] 6.10.0-rc6+ #4 Tainted: G W ----------------------------- ethtool/1806 is trying to lock: ffffffff90387b90 (mem_id_lock){+.+.}-{4:4}, at: mem_allocator_disconnect+0x73/0x150 other info that might help us debug this: context-{5:5} 3 locks held by ethtool/1806: stack backtrace: CPU: 0 PID: 1806 Comm: ethtool Tainted: G W 6.10.0-rc6+ #4 f916f41f172891c800f2fed Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 Call Trace: <TASK> dump_stack_lvl+0x7e/0xc0 __lock_acquire+0x1681/0x4de0 ? _printk+0x64/0xe0 ? __pfx_mark_lock.part.0+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 lock_acquire+0x1b3/0x580 ? mem_allocator_disconnect+0x73/0x150 ? __wake_up_klogd.part.0+0x16/0xc0 ? __pfx_lock_acquire+0x10/0x10 ? dump_stack_lvl+0x91/0xc0 __mutex_lock+0x15c/0x1690 ? mem_allocator_disconnect+0x73/0x150 ? __pfx_prb_read_valid+0x10/0x10 ? mem_allocator_disconnect+0x73/0x150 ? __pfx_llist_add_batch+0x10/0x10 ? console_unlock+0x193/0x1b0 ? lockdep_hardirqs_on+0xbe/0x140 ? __pfx___mutex_lock+0x10/0x10 ? tick_nohz_tick_stopped+0x16/0x90 ? __irq_work_queue_local+0x1e5/0x330 ? irq_work_queue+0x39/0x50 ? __wake_up_klogd.part.0+0x79/0xc0 ? mem_allocator_disconnect+0x73/0x150 mem_allocator_disconnect+0x73/0x150 ? __pfx_mem_allocator_disconnect+0x10/0x10 ? mark_held_locks+0xa5/0xf0 ? rcu_is_watching+0x11/0xb0 page_pool_release+0x36e/0x6d0 page_pool_destroy+0xd7/0x440 xdp_unreg_mem_model+0x1a7/0x2a0 ? __pfx_xdp_unreg_mem_model+0x10/0x10 ? kfree+0x125/0x370 ? bnxt_free_ring.isra.0+0x2eb/0x500 ? bnxt_free_mem+0x5ac/0x2500 xdp_rxq_info_unreg+0x4a/0xd0 bnxt_free_mem+0x1356/0x2500 bnxt_close_nic+0xf0/0x3b0 ? __pfx_bnxt_close_nic+0x10/0x10 ? ethnl_parse_bit+0x2c6/0x6d0 ? __pfx___nla_validate_parse+0x10/0x10 ? __pfx_ethnl_parse_bit+0x10/0x10 bnxt_set_features+0x2a8/0x3e0 __netdev_update_features+0x4dc/0x1370 ? ethnl_parse_bitset+0x4ff/0x750 ? __pfx_ethnl_parse_bitset+0x10/0x10 ? __pfx___netdev_update_features+0x10/0x10 ? mark_held_locks+0xa5/0xf0 ? _raw_spin_unlock_irqrestore+0x42/0x70 ? __pm_runtime_resume+0x7d/0x110 ethnl_set_features+0x32d/0xa20 To fix this problem, it uses rhashtable_lookup_fast() instead of rhashtable_lookup() with rcu_read_lock(). Using xa without rcu_read_lock() here is safe. xa is freed by __xdp_mem_allocator_rcu_free() and this is called by call_rcu() of mem_xa_remove(). The mem_xa_remove() is called by page_pool_destroy() if a reference count reaches 0. The xa is already protected by the reference count mechanism well in the control plane. So removing rcu_read_lock() for page_pool_destroy() is safe. Fixes: c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240712095116.3801586-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14net: phy: bcm54811: New link mode for BroadR-ReachKamil Horák (2N)1-0/+3
Introduce a new link mode necessary for 10 MBit single-pair connection in BroadR-Reach mode on bcm5481x PHY by Broadcom. This new link mode, 10baseT1BRR, is known as 1BR10 in the Broadcom terminology. Another link mode to be used is 1BR100 and it is already present as 100baseT1, because Broadcom's 1BR100 became 100baseT1 (IEEE 802.3bw). Signed-off-by: Kamil Horák (2N) <kamilh@axis.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240712150709.3134474-2-kamilh@axis.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14af_packet: Handle outgoing VLAN packets without hardware offloadingChengen Du1-2/+84
The issue initially stems from libpcap. The ethertype will be overwritten as the VLAN TPID if the network interface lacks hardware VLAN offloading. In the outbound packet path, if hardware VLAN offloading is unavailable, the VLAN tag is inserted into the payload but then cleared from the sk_buff struct. Consequently, this can lead to a false negative when checking for the presence of a VLAN tag, causing the packet sniffing outcome to lack VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing tool may be unable to parse packets as expected. The TCI-TPID is missing because the prb_fill_vlan_info() function does not modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the payload and not in the sk_buff struct. The skb_vlan_tag_present() function only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header is stripped, preventing the packet capturing tool from determining the correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, which means the packet capturing tool cannot parse the L3 header correctly. Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") Cc: stable@vger.kernel.org Signed-off-by: Chengen Du <chengen.du@canonical.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240713114735.62360-1-chengen.du@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14Bluetooth: hci_core, hci_sync: cleanup struct discovery_stateDmitry Antipov1-2/+0
After commit 78db544b5d27 ("Bluetooth: hci_core: Remove le_restart_scan work"), 'scan_start' and 'scan_duration' of 'struct discovery_state' are still initialized but actually unused. So remove the aforementioned fields and adjust 'hci_discovery_filter_clear()' and 'le_scan_disable()' accordingly. Compile tested only. Fixes: 78db544b5d27 ("Bluetooth: hci_core: Remove le_restart_scan work") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-14Bluetooth: hci_event: Set QoS encryption from BIGInfo reportIulia Tanasescu1-0/+2
On a Broadcast Sink, after synchronizing to the PA transimitted by a Broadcast Source, the BIGInfo advertising reports emitted by the Controller hold the encryption field, which indicates whether the Broadcast Source is transmitting encrypted streams. This updates the PA sync hcon QoS with the encryption value reported in the BIGInfo report, so that this information is accurate if the userspace tries to access the QoS struct via getsockopt. Fixes: 1d11d70d1f6b ("Bluetooth: ISO: Pass BIG encryption info through QoS") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-14Bluetooth: Add vendor-specific packet classification for ISO dataYing Hsu1-0/+16
When HCI raw sockets are opened, the Bluetooth kernel module doesn't track CIS/BIS connections. User-space applications have to identify ISO data by maintaining connection information and look up the mapping for each ACL data packet received. Besides, btsnoop log captured in kernel couldn't tell ISO data from ACL data in this case. To avoid additional lookups, this patch introduces vendor-specific packet classification for Intel BT controllers to distinguish ISO data packets from ACL data packets. Signed-off-by: Ying Hsu <yinghsu@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-14Bluetooth: iso: remove unused struct 'iso_list_data'Dr. David Alan Gilbert1-5/+0
'iso_list_data' has been unused since the original commit ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-14Bluetooth: MGMT: Uninitialized variable in load_conn_param()Dan Carpenter1-1/+1
The "update" variable needs to be initialized to false. Fixes: 0ece498c27d8 ("Bluetooth: MGMT: Make MGMT_OP_LOAD_CONN_PARAM update existing connection") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-14tty: rfcomm: prefer array indexing over pointer arithmeticErick Archer1-6/+6
Refactor the list_for_each_entry() loop of rfcomm_get_dev_list() function to use array indexing instead of pointer arithmetic. This way, the code is more readable and idiomatic. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Erick Archer <erick.archer@outlook.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-14tty: rfcomm: prefer struct_size over open coded arithmeticErick Archer1-7/+4
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "dl" variable is a pointer to "struct rfcomm_dev_list_req" and this structure ends in a flexible array: struct rfcomm_dev_list_req { [...] struct rfcomm_dev_info dev_info[]; }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + count * size" in the kzalloc() and copy_to_user() functions. At the same time, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). In this case, it is important to note that the logic needs a little refactoring to ensure that the "dev_num" member is initialized before the first access to the flex array. Specifically, add the assignment before the list_for_each_entry() loop. Also remove the "size" variable as it is no longer needed. This way, the code is more readable and safer. This code was detected with the help of Coccinelle, and audited and modified manually. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Erick Archer <erick.archer@outlook.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-14Bluetooth: hci_core: Prefer array indexing over pointer arithmeticErick Archer1-2/+2
Refactor the list_for_each_entry() loop of hci_get_dev_list() function to use array indexing instead of pointer arithmetic. This way, the code is more readable and idiomatic. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Erick Archer <erick.archer@outlook.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-14Bluetooth: hci_core: Prefer struct_size over open coded arithmeticErick Archer1-7/+4
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "dl" variable is a pointer to "struct hci_dev_list_req" and this structure ends in a flexible array: struct hci_dev_list_req { [...] struct hci_dev_req dev_req[]; /* hci_dev_req structures */ }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + count * size" in the kzalloc() and copy_to_user() functions. At the same time, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). In this case, it is important to note that the logic needs a little refactoring to ensure that the "dev_num" member is initialized before the first access to the flex array. Specifically, add the assignment before the list_for_each_entry() loop. Also remove the "size" variable as it is no longer needed. This way, the code is more readable and safer. This code was detected with the help of Coccinelle, and audited and modified manually. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Erick Archer <erick.archer@outlook.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-14Bluetooth: MGMT: Make MGMT_OP_LOAD_CONN_PARAM update existing connectionLuiz Augusto von Dentz2-2/+66
This makes MGMT_OP_LOAD_CONN_PARAM update existing connection by dectecting the request is just for one connection, parameters already exists and there is a connection. Since this is a new behavior the revision is also updated to enable userspace to detect it. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-07-14Merge tag 'ipsec-next-2024-07-13' of ↵Jakub Kicinski12-10/+378
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-07-13 1) Support sending NAT keepalives in ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it. From Eyal Birger. 2) Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths. Currently, IPsec crypto offload is enabled for GRO code path only. This patchset support UDP encapsulation for the non GRO path. From Mike Yu. * tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP packet xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packet xfrm: Allow UDP encapsulation in crypto offload control path xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO path xfrm: support sending NAT keepalives in ESP in UDP states ==================== Link: https://patch.msgid.link/20240713102416.3272997-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14ipv6: take care of scope when choosing the src addrNicolas Dichtel1-1/+2
When the source address is selected, the scope must be checked. For example, if a loopback address is assigned to the vrf device, it must not be chosen for packets sent outside. CC: stable@vger.kernel.org Fixes: afbac6010aec ("net: ipv6: Address selection needs to consider L3 domains") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240710081521.3809742-4-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14ipv6: fix source address selection with route leakNicolas Dichtel2-1/+2
By default, an address assigned to the output interface is selected when the source address is not specified. This is problematic when a route, configured in a vrf, uses an interface from another vrf (aka route leak). The original vrf does not own the selected source address. Let's add a check against the output interface and call the appropriate function to select the source address. CC: stable@vger.kernel.org Fixes: 0d240e7811c4 ("net: vrf: Implement get_saddr for IPv6") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20240710081521.3809742-3-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14ipv4: fix source address selection with route leakNicolas Dichtel1-2/+11
By default, an address assigned to the output interface is selected when the source address is not specified. This is problematic when a route, configured in a vrf, uses an interface from another vrf (aka route leak). The original vrf does not own the selected source address. Let's add a check against the output interface and call the appropriate function to select the source address. CC: stable@vger.kernel.org Fixes: 8cbb512c923d ("net: Add source address lookup op for VRF") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240710081521.3809742-2-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14net: ethtool: pse-pd: Fix possible null-derefKory Maincent1-2/+2
Fix a possible null dereference when a PSE supports both c33 and PoDL, but only one of the netlink attributes is specified. The c33 or PoDL PSE capabilities are already validated in the ethnl_set_pse_validate() call. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20240705184116.13d8235a@kernel.org/ Fixes: 4d18e3ddf427 ("net: ethtool: pse-pd: Expand pse commands with the PSE PoE interface") Link: https://patch.msgid.link/20240711-fix_pse_pd_deref-v3-2-edd78fc4fe42@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14net: pse-pd: Do not return EOPNOSUPP if config is nullKory Maincent1-1/+3
For a PSE supporting both c33 and PoDL, setting config for one type of PoE leaves the other type's config null. Currently, this case returns EOPNOTSUPP, which is incorrect. Instead, we should do nothing if the configuration is empty. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Fixes: d83e13761d5b ("net: pse-pd: Use regulator framework within PSE framework") Link: https://patch.msgid.link/20240711-fix_pse_pd_deref-v3-1-edd78fc4fe42@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14Merge tag 'ipsec-2024-07-11' of ↵Jakub Kicinski8-17/+82
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-07-11 1) Fix esp_output_tail_tcp() on unsupported ESPINTCP. From Hagar Hemdan. 2) Fix two bugs in the recently introduced SA direction separation. From Antony Antony. 3) Fix unregister netdevice hang on hardware offload. We had to add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed. 4) Fix netdev reference count imbalance in xfrm_state_find. From Jianbo Liu. 5) Call xfrm_dev_policy_delete when killingi them on offloaded policies. Jianbo Liu. * tag 'ipsec-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: call xfrm_dev_policy_delete when kill policy xfrm: fix netdev reference count imbalance xfrm: Export symbol xfrm_dev_state_delete. xfrm: Fix unregister netdevice hang on hardware offload. xfrm: Log input direction mismatch error in one place xfrm: Fix input error path memory access net: esp: cleanup esp_output_tail_tcp() in case of unsupported ESPINTCP ==================== Link: https://patch.msgid.link/20240711100025.1949454-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13net: ethtool: Monotonically increase the message sequence numberDanielle Ratson1-1/+1
Currently, during the module firmware flashing process, unicast notifications are sent from the kernel using the same sequence number, making it impossible for user space to track missed notifications. Monotonically increase the message sequence number, so the order of notifications could be tracked effectively. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240711080934.2071869-1-danieller@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13tcp: Don't drop SYN+ACK for simultaneous connect().Kuniyuki Iwashima1-0/+9
RFC 9293 states that in the case of simultaneous connect(), the connection gets established when SYN+ACK is received. [0] TCP Peer A TCP Peer B 1. CLOSED CLOSED 2. SYN-SENT --> <SEQ=100><CTL=SYN> ... 3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT 4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED 5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ... 6. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED 7. ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED However, since commit 0c24604b68fc ("tcp: implement RFC 5961 4.2"), such a SYN+ACK is dropped in tcp_validate_incoming() and responded with Challenge ACK. For example, the write() syscall in the following packetdrill script fails with -EAGAIN, and wrong SNMP stats get incremented. 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8> +0 < S 0:0(0) win 1000 <mss 1000> +0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8> +0 < S. 0:0(0) ack 1 win 1000 +0 write(3, ..., 100) = 100 +0 > P. 1:101(100) ack 1 -- # packetdrill cross-synack.pkt cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable) # nstat ... TcpExtTCPChallengeACK 1 0.0 TcpExtTCPSYNChallenge 1 0.0 The problem is that bpf_skops_established() is triggered by the Challenge ACK instead of SYN+ACK. This causes the bpf prog to miss the chance to check if the peer supports a TCP option that is expected to be exchanged in SYN and SYN+ACK. Let's accept a bare SYN+ACK for active-open TCP_SYN_RECV sockets to avoid such a situation. Note that tcp_ack_snd_check() in tcp_rcv_state_process() is skipped not to send an unnecessary ACK, but this could be a bit risky for net.git, so this targets for net-next. Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240710171246.87533-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-12Merge branch '200GbE' of ↵Jakub Kicinski1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== idpf: XDP chapter I: convert Rx to libeth Alexander Lobakin says: XDP for idpf is currently 5 chapters: * convert Rx to libeth (this); * convert Tx and stats to libeth; * generic XDP and XSk code changes, libeth_xdp; * actual XDP for idpf via libeth_xdp; * XSk for idpf (^). Part I does the following: * splits &idpf_queue into 4 (RQ, SQ, FQ, CQ) and puts them on a diet; * ensures optimal cacheline placement, strictly asserts CL sizes; * moves currently unused/dead singleq mode out of line; * reuses libeth's Rx ptype definitions and helpers; * uses libeth's Rx buffer management for both header and payload; * eliminates memcpy()s and coherent DMA uses on hotpath, uses napi_build_skb() instead of in-place short skb allocation. Most idpf patches, except for the queue split, removes more lines than adds. Expect far better memory utilization and +5-8% on Rx depending on the case (+17% on skb XDP_DROP :>). * '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: idpf: use libeth Rx buffer management for payload buffer idpf: convert header split mode to libeth + napi_build_skb() libeth: support different types of buffers for Rx idpf: remove legacy Page Pool Ethtool stats idpf: reuse libeth's definitions of parsed ptype structures idpf: compile singleq code only under default-n CONFIG_IDPF_SINGLEQ idpf: merge singleq and splitq &net_device_ops idpf: strictly assert cachelines of queue and queue vector structures idpf: avoid bloating &idpf_q_vector with big %NR_CPUS idpf: split &idpf_queue into 4 strictly-typed queue structures idpf: stop using macros for accessing queue descriptors libeth: add cacheline / struct layout assertion helpers page_pool: use __cacheline_group_{begin, end}_aligned() cache: add __cacheline_group_{begin, end}_aligned() (+ couple more) ==================== Link: https://patch.msgid.link/20240710203031.188081-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-12Merge tag 'for-netdev' of ↵Jakub Kicinski1-1/+0
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-07-12 We've added 23 non-merge commits during the last 3 day(s) which contain a total of 18 files changed, 234 insertions(+), 243 deletions(-). The main changes are: 1) Improve BPF verifier by utilizing overflow.h helpers to check for overflows, from Shung-Hsi Yu. 2) Fix NULL pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT when attr->attach_prog_fd was not specified, from Tengda Wu. 3) Fix arm64 BPF JIT when generating code for BPF trampolines with BPF_TRAMP_F_CALL_ORIG which corrupted upper address bits, from Puranjay Mohan. 4) Remove test_run callback from lwt_seg6local_prog_ops which never worked in the first place and caused syzbot reports, from Sebastian Andrzej Siewior. 5) Relax BPF verifier to accept non-zero offset on KF_TRUSTED_ARGS/ /KF_RCU-typed BPF kfuncs, from Matt Bobrowski. 6) Fix a long standing bug in libbpf with regards to handling of BPF skeleton's forward and backward compatibility, from Andrii Nakryiko. 7) Annotate btf_{seq,snprintf}_show functions with __printf, from Alan Maguire. 8) BPF selftest improvements to reuse common network helpers in sk_lookup test and dropping the open-coded inetaddr_len() and make_socket() ones, from Geliang Tang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits) selftests/bpf: Test for null-pointer-deref bugfix in resolve_prog_type() bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT selftests/bpf: DENYLIST.aarch64: Skip fexit_sleep again bpf: use check_sub_overflow() to check for subtraction overflows bpf: use check_add_overflow() to check for addition overflows bpf: fix overflow check in adjust_jmp_off() bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.o bpf: annotate BTF show functions with __printf bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG selftests/bpf: Close obj in error path in xdp_adjust_tail selftests/bpf: Null checks for links in bpf_tcp_ca selftests/bpf: Use connect_fd_to_fd in sk_lookup selftests/bpf: Use start_server_addr in sk_lookup selftests/bpf: Use start_server_str in sk_lookup selftests/bpf: Close fd in error path in drop_on_reuseport selftests/bpf: Add ASSERT_OK_FD macro selftests/bpf: Add backlog for network_helper_opts selftests/bpf: fix compilation failure when CONFIG_NF_FLOW_TABLE=m bpf: Remove tst_run from lwt_seg6local_prog_ops. bpf: relax zero fixed offset constraint on KF_TRUSTED_ARGS/KF_RCU ... ==================== Link: https://patch.msgid.link/20240712212448.5378-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-5/+19
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c f7ce5eb2cb79 ("bnxt_en: Fix crash in bnxt_get_max_rss_ctx_ring()") 20c8ad72eb7f ("eth: bnxt: use the RSS context XArray instead of the local list") Adjacent changes: net/ethtool/ioctl.c 503757c80928 ("net: ethtool: Fix RSS setting") eac9122f0c41 ("net: ethtool: record custom RSS contexts in the XArray") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-12net: ethtool: let drivers declare max size of RSS indir table and keyJakub Kicinski1-10/+36
Some drivers (bnxt but I think also mlx5 from ML discussions) change the size of the indirection table depending on the number of Rx rings. Decouple the max table size from the size of the currently used table, so that we can reserve space in the context for table growth. Static members in ethtool_ops are good enough for now, we can add callbacks to read the max size more dynamically if someone needs that. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-12net: ethtool: let drivers remove lost RSS contextsJakub Kicinski1-0/+14
RSS contexts may get lost from a device, in various extreme circumstances. Specifically if the firmware leaks resources and resets, or crashes and either recovers in partially working state or the crash causes a different FW version to run - creating the context again may fail. Drivers should do their absolute best to prevent this from happening. When it does, however, telling user that a context exists, when it can't possibly be used any more is counter productive. Add a helper for drivers to discard contexts. Print an error, in the future netlink notification will also be sent. More robust approaches were proposed, like keeping the contexts but marking them as "dead" (but possibly resurrected by next reset). That may be better but it's unclear at this stage whether the effort is worth the benefits. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-12Merge tag 'net-6.10-rc8-2' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull more networking fixes from Jakub Kicinski: "A quick follow up to yesterday's pull. We got a regressions report for the bnxt patch as soon as it got to your tree. The ethtool fix is also good to have, although it's an older regression. Current release - regressions: - eth: bnxt_en: fix crash in bnxt_get_max_rss_ctx_ring() on older HW when user tries to decrease the ring count Previous releases - regressions: - ethtool: fix RSS setting, accept "no change" setting if the driver doesn't support the new features - eth: i40e: remove needless retries of NVM update, don't wait 20min when we know the firmware update won't succeed" * tag 'net-6.10-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: bnxt_en: Fix crash in bnxt_get_max_rss_ctx_ring() octeontx2-af: fix issue with IPv4 match for RSS octeontx2-af: fix issue with IPv6 ext match for RSS octeontx2-af: fix detection of IP layer octeontx2-af: fix a issue with cpt_lf_alloc mailbox octeontx2-af: replace cpt slot with lf id on reg write i40e: fix: remove needless retries of NVM update net: ethtool: Fix RSS setting
2024-07-12Merge tag 'ceph-for-6.10-rc8' of https://github.com/ceph/ceph-clientLinus Torvalds2-4/+17
Pull ceph fixes from Ilya Dryomov: "A fix for a possible use-after-free following "rbd unmap" or "umount" marked for stable and two kernel-doc fixups" * tag 'ceph-for-6.10-rc8' of https://github.com/ceph/ceph-client: libceph: fix crush_choose_firstn() kernel-doc warnings libceph: suppress crush_choose_indep() kernel-doc warnings libceph: fix race between delayed_work() and ceph_monc_stop()
2024-07-12xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP packetMike Yu2-2/+23
esp_xmit() is already able to handle UDP encapsulation through the call to esp_output_head(). However, the ESP header and the outer IP header are not correct and need to be corrected. Test: Enabled both dir=in/out IPsec crypto offload, and verified IPv4 UDP-encapsulated ESP packets on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-12xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packetMike Yu1-1/+2
If xfrm_input() is called with UDP_ENCAP_ESPINUDP, the packet is already processed in UDP layer that removes the UDP header. Therefore, there should be no much difference to treat it as an ESP packet in the XFRM stack. Test: Enabled dir=in IPsec crypto offload, and verified IPv4 UDP-encapsulated ESP packets on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-12xfrm: Allow UDP encapsulation in crypto offload control pathMike Yu1-3/+3
Unblock this limitation so that SAs with encapsulation specified can be passed to HW drivers. HW drivers can still reject the SA in their implementation of xdo_dev_state_add if the encapsulation is not supported. Test: Verified on Android device Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-12xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO pathMike Yu2-2/+5
IPsec crypt offload supports outbound IPv6 ESP packets, but it doesn't support inbound IPv6 ESP packets. This change enables the crypto offload for inbound IPv6 ESP packets that are not handled through GRO code path. If HW drivers add the offload information to the skb, the packet will be handled in the crypto offload rx code path. Apart from the change in crypto offload rx code path, the change in xfrm_policy_check is also needed. Exampe of RX data path: +-----------+ +-------+ | HW Driver |-->| wlan0 |--------+ +-----------+ +-------+ | v +---------------+ +------+ +------>| Network Stack |-->| Apps | | +---------------+ +------+ | | | v +--------+ +------------+ | ipsec1 |<--| XFRM Stack | +--------+ +------------+ Test: Enabled both in/out IPsec crypto offload, and verified IPv6 ESP packets on Android device on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-12l2tp: fix l2tp_session_register with colliding l2tpv3 IDsJames Chapman1-8/+10
When handling colliding L2TPv3 session IDs, we use the existing session IDR entry and link the new session on that using session->coll_list. However, when using an existing IDR entry, we must not do the idr_replace step. Fixes: aa5e17e1f5ec ("l2tp: store l2tpv3 sessions in per-net IDR") Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>