aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-09-28xfrm: Reinject transport-mode packets through workqueueLiu Jian1-5/+13
The following warning is displayed when the tcp6-multi-diffip11 stress test case of the LTP test suite is tested: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ns-tcpserver:48198] CPU: 0 PID: 48198 Comm: ns-tcpserver Kdump: loaded Not tainted 6.0.0-rc6+ #39 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : des3_ede_encrypt+0x27c/0x460 [libdes] lr : 0x3f sp : ffff80000ceaa1b0 x29: ffff80000ceaa1b0 x28: ffff0000df056100 x27: ffff0000e51e5280 x26: ffff80004df75030 x25: ffff0000e51e4600 x24: 000000000000003b x23: 0000000000802080 x22: 000000000000003d x21: 0000000000000038 x20: 0000000080000020 x19: 000000000000000a x18: 0000000000000033 x17: ffff0000e51e4780 x16: ffff80004e2d1448 x15: ffff80004e2d1248 x14: ffff0000e51e4680 x13: ffff80004e2d1348 x12: ffff80004e2d1548 x11: ffff80004e2d1848 x10: ffff80004e2d1648 x9 : ffff80004e2d1748 x8 : ffff80004e2d1948 x7 : 000000000bcaf83d x6 : 000000000000001b x5 : ffff80004e2d1048 x4 : 00000000761bf3bf x3 : 000000007f1dd0a3 x2 : ffff0000e51e4780 x1 : ffff0000e3b9a2f8 x0 : 00000000db44e872 Call trace: des3_ede_encrypt+0x27c/0x460 [libdes] crypto_des3_ede_encrypt+0x1c/0x30 [des_generic] crypto_cbc_encrypt+0x148/0x190 crypto_skcipher_encrypt+0x2c/0x40 crypto_authenc_encrypt+0xc8/0xfc [authenc] crypto_aead_encrypt+0x2c/0x40 echainiv_encrypt+0x144/0x1a0 [echainiv] crypto_aead_encrypt+0x2c/0x40 esp6_output_tail+0x1c8/0x5d0 [esp6] esp6_output+0x120/0x278 [esp6] xfrm_output_one+0x458/0x4ec xfrm_output_resume+0x6c/0x1f0 xfrm_output+0xac/0x4ac __xfrm6_output+0x130/0x270 xfrm6_output+0x60/0xec ip6_xmit+0x2ec/0x5bc inet6_csk_xmit+0xbc/0x10c __tcp_transmit_skb+0x460/0x8c0 tcp_write_xmit+0x348/0x890 __tcp_push_pending_frames+0x44/0x110 tcp_rcv_established+0x3c8/0x720 tcp_v6_do_rcv+0xdc/0x4a0 tcp_v6_rcv+0xc24/0xcb0 ip6_protocol_deliver_rcu+0xf0/0x574 ip6_input_finish+0x48/0x7c ip6_input+0x48/0xc0 ip6_rcv_finish+0x80/0x9c xfrm_trans_reinject+0xb0/0xf4 tasklet_action_common.constprop.0+0xf8/0x134 tasklet_action+0x30/0x3c __do_softirq+0x128/0x368 do_softirq+0xb4/0xc0 __local_bh_enable_ip+0xb0/0xb4 put_cpu_fpsimd_context+0x40/0x70 kernel_neon_end+0x20/0x40 sha1_base_do_update.constprop.0.isra.0+0x11c/0x140 [sha1_ce] sha1_ce_finup+0x94/0x110 [sha1_ce] crypto_shash_finup+0x34/0xc0 hmac_finup+0x48/0xe0 crypto_shash_finup+0x34/0xc0 shash_digest_unaligned+0x74/0x90 crypto_shash_digest+0x4c/0x9c shash_ahash_digest+0xc8/0xf0 shash_async_digest+0x28/0x34 crypto_ahash_digest+0x48/0xcc crypto_authenc_genicv+0x88/0xcc [authenc] crypto_authenc_encrypt+0xd8/0xfc [authenc] crypto_aead_encrypt+0x2c/0x40 echainiv_encrypt+0x144/0x1a0 [echainiv] crypto_aead_encrypt+0x2c/0x40 esp6_output_tail+0x1c8/0x5d0 [esp6] esp6_output+0x120/0x278 [esp6] xfrm_output_one+0x458/0x4ec xfrm_output_resume+0x6c/0x1f0 xfrm_output+0xac/0x4ac __xfrm6_output+0x130/0x270 xfrm6_output+0x60/0xec ip6_xmit+0x2ec/0x5bc inet6_csk_xmit+0xbc/0x10c __tcp_transmit_skb+0x460/0x8c0 tcp_write_xmit+0x348/0x890 __tcp_push_pending_frames+0x44/0x110 tcp_push+0xb4/0x14c tcp_sendmsg_locked+0x71c/0xb64 tcp_sendmsg+0x40/0x6c inet6_sendmsg+0x4c/0x80 sock_sendmsg+0x5c/0x6c __sys_sendto+0x128/0x15c __arm64_sys_sendto+0x30/0x40 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x170/0x194 do_el0_svc+0x38/0x4c el0_svc+0x28/0xe0 el0t_64_sync_handler+0xbc/0x13c el0t_64_sync+0x180/0x184 Get softirq info by bcc tool: ./softirqs -NT 10 Tracing soft irq event time... Hit Ctrl-C to end. 15:34:34 SOFTIRQ TOTAL_nsecs block 158990 timer 20030920 sched 46577080 net_rx 676746820 tasklet 9906067650 15:34:45 SOFTIRQ TOTAL_nsecs block 86100 sched 38849790 net_rx 676532470 timer 1163848790 tasklet 9409019620 15:34:55 SOFTIRQ TOTAL_nsecs sched 58078450 net_rx 475156720 timer 533832410 tasklet 9431333300 The tasklet software interrupt takes too much time. Therefore, the xfrm_trans_reinject executor is changed from tasklet to workqueue. Add add spin lock to protect the queue. This reduces the processing flow of the tcp_sendmsg function in this scenario. Fixes: acf568ee859f0 ("xfrm: Reinject transport-mode packets through tasklet") Signed-off-by: Liu Jian <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-09-27Add skb drop reasons to IPv6 UDP receive pathDonald Hunter1-6/+16
Enumerate the skb drop reasons in the receive path for IPv6 UDP packets. Signed-off-by: Donald Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-27net: tls: Add ARIA-GCM algorithmTaehee Yoo2-0/+96
RFC 6209 describes ARIA for TLS 1.2. ARIA-128-GCM and ARIA-256-GCM are defined in RFC 6209. This patch would offer performance increment and an opportunity for hardware offload. Benchmark results: iperf-ssl are used. CPU: intel i3-12100. TLS(openssl-3.0-dev) [ 3] 0.0- 1.0 sec 185 MBytes 1.55 Gbits/sec [ 3] 1.0- 2.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 2.0- 3.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 3.0- 4.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 4.0- 5.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 0.0- 5.0 sec 927 MBytes 1.56 Gbits/sec kTLS(aria-generic) [ 3] 0.0- 1.0 sec 198 MBytes 1.66 Gbits/sec [ 3] 1.0- 2.0 sec 194 MBytes 1.62 Gbits/sec [ 3] 2.0- 3.0 sec 194 MBytes 1.63 Gbits/sec [ 3] 3.0- 4.0 sec 194 MBytes 1.63 Gbits/sec [ 3] 4.0- 5.0 sec 194 MBytes 1.62 Gbits/sec [ 3] 0.0- 5.0 sec 974 MBytes 1.63 Gbits/sec kTLS(aria-avx wirh GFNI) [ 3] 0.0- 1.0 sec 632 MBytes 5.30 Gbits/sec [ 3] 1.0- 2.0 sec 657 MBytes 5.51 Gbits/sec [ 3] 2.0- 3.0 sec 657 MBytes 5.51 Gbits/sec [ 3] 3.0- 4.0 sec 656 MBytes 5.50 Gbits/sec [ 3] 4.0- 5.0 sec 656 MBytes 5.50 Gbits/sec [ 3] 0.0- 5.0 sec 3.18 GBytes 5.47 Gbits/sec Signed-off-by: Taehee Yoo <[email protected]> Reviewed-by: Vadim Fedorenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-27Merge tag 'wireless-2022-09-27' of ↵Jakub Kicinski6-10/+19
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== A few late-comer fixes: * locking in mac80211 MLME * non-QoS driver crash/regression * minstrel memory corruption * TX deadlock * TX queues not always enabled * HE/EHT bitrate calculation * tag 'wireless-2022-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: mlme: Fix double unlock on assoc success handling wifi: mac80211: mlme: Fix missing unlock on beacon RX wifi: mac80211: fix memory corruption in minstrel_ht_update_rates() wifi: mac80211: fix regression with non-QoS drivers wifi: mac80211: ensure vif queues are operational after start wifi: mac80211: don't start TX with fq->lock to fix deadlock wifi: cfg80211: fix MCS divisor value ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-27Bluetooth: hci_core: Fix not handling link timeouts propertlyLuiz Augusto von Dentz1-11/+23
Change that introduced the use of __check_timeout did not account for link types properly, it always assumes ACL_LINK is used thus causing hdev->acl_last_tx to be used even in case of LE_LINK and then again uses ACL_LINK with hci_link_tx_to. To fix this __check_timeout now takes the link type as parameter and then procedure to use the right last_tx based on the link type and pass it to hci_link_tx_to. Fixes: 1b1d29e51499 ("Bluetooth: Make use of __check_timeout on hci_sched_le") Signed-off-by: Luiz Augusto von Dentz <[email protected]> Tested-by: David Beinder <[email protected]>
2022-09-27NFC: hci: Split memcpy() of struct hcp_message flexible arrayKees Cook1-7/+5
To work around a misbehavior of the compiler's ability to see into composite flexible array structs (as detailed in the coming memcpy() hardening series[1]), split the memcpy() of the header and the payload so no false positive run-time overflow warning will be generated. This split already existed for the "firstfrag" case, so just generalize the logic further. [1] https://lore.kernel.org/linux-hardening/[email protected]/ Cc: Eric Dumazet <[email protected]> Cc: Paolo Abeni <[email protected]> Reported-by: "Gustavo A. R. Silva" <[email protected]> Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-27net: openvswitch: allow conntrack in non-initial user namespaceMichael Weiß1-5/+8
Similar to the previous commit, the Netlink interface of the OVS conntrack module was restricted to global CAP_NET_ADMIN by using GENL_ADMIN_PERM. This is changed to GENL_UNS_ADMIN_PERM to support unprivileged containers in non-initial user namespace. Signed-off-by: Michael Weiß <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-09-27net: openvswitch: allow metering in non-initial user namespaceMichael Weiß1-7/+7
The Netlink interface for metering was restricted to global CAP_NET_ADMIN by using GENL_ADMIN_PERM. To allow metring in a non-inital user namespace, e.g., a container, this is changed to GENL_UNS_ADMIN_PERM. Signed-off-by: Michael Weiß <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-09-27wifi: mac80211: mlme: Fix double unlock on assoc success handlingRafael Mendonca1-1/+0
Commit 6911458dc428 ("wifi: mac80211: mlme: refactor assoc success handling") moved the per-link setup out of ieee80211_assoc_success() into a new function ieee80211_assoc_config_link() but missed to remove the unlock of 'sta_mtx' in case of HE capability/operation missing on HE AP, which leads to a double unlock: ieee80211_assoc_success() { ... ieee80211_assoc_config_link() { ... if (!(link->u.mgd.conn_flags & IEEE80211_CONN_DISABLE_HE) && (!elems->he_cap || !elems->he_operation)) { mutex_unlock(&sdata->local->sta_mtx); ... } ... } ... mutex_unlock(&sdata->local->sta_mtx); ... } Fixes: 6911458dc428 ("wifi: mac80211: mlme: refactor assoc success handling") Signed-off-by: Rafael Mendonca <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-09-27wifi: mac80211: mlme: Fix missing unlock on beacon RXRafael Mendonca1-2/+6
Commit 98b0b467466c ("wifi: mac80211: mlme: use correct link_sta") switched to link station instead of deflink and added some checks to do that, which are done with the 'sta_mtx' mutex held. However, the error path of these checks does not unlock 'sta_mtx' before returning. Fixes: 98b0b467466c ("wifi: mac80211: mlme: use correct link_sta") Signed-off-by: Rafael Mendonca <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-09-27wifi: mac80211: fix memory corruption in minstrel_ht_update_rates()Paweł Lenkow1-2/+4
During our testing of WFM200 module over SDIO on i.MX6Q-based platform, we discovered a memory corruption on the system, tracing back to the wfx driver. Using kfence, it was possible to trace it back to the root cause, which is hw->max_rates set to 8 in wfx_init_common, while the maximum defined by IEEE80211_TX_TABLE_SIZE is 4. This causes array out-of-bounds writes during updates of the rate table, as seen below: BUG: KFENCE: memory corruption in kfree_rcu_work+0x320/0x36c Corrupted memory at 0xe0a4ffe0 [ 0x03 0x03 0x03 0x03 0x01 0x00 0x00 0x02 0x02 0x02 0x09 0x00 0x21 0xbb 0xbb 0xbb ] (in kfence-#81): kfree_rcu_work+0x320/0x36c process_one_work+0x3ec/0x920 worker_thread+0x60/0x7a4 kthread+0x174/0x1b4 ret_from_fork+0x14/0x2c 0x0 kfence-#81: 0xe0a4ffc0-0xe0a4ffdf, size=32, cache=kmalloc-64 allocated by task 297 on cpu 0 at 631.039555s: minstrel_ht_update_rates+0x38/0x2b0 [mac80211] rate_control_tx_status+0xb4/0x148 [mac80211] ieee80211_tx_status_ext+0x364/0x1030 [mac80211] ieee80211_tx_status+0xe0/0x118 [mac80211] ieee80211_tasklet_handler+0xb0/0xe0 [mac80211] tasklet_action_common.constprop.0+0x11c/0x148 __do_softirq+0x1a4/0x61c irq_exit+0xcc/0x104 call_with_stack+0x18/0x20 __irq_svc+0x80/0xb0 wq_worker_sleeping+0x10/0x100 wq_worker_sleeping+0x10/0x100 schedule+0x50/0xe0 schedule_timeout+0x2e0/0x474 wait_for_completion+0xdc/0x1ec mmc_wait_for_req_done+0xc4/0xf8 mmc_io_rw_extended+0x3b4/0x4ec sdio_io_rw_ext_helper+0x290/0x384 sdio_memcpy_toio+0x30/0x38 wfx_sdio_copy_to_io+0x88/0x108 [wfx] wfx_data_write+0x88/0x1f0 [wfx] bh_work+0x1c8/0xcc0 [wfx] process_one_work+0x3ec/0x920 worker_thread+0x60/0x7a4 kthread+0x174/0x1b4 ret_from_fork+0x14/0x2c 0x0 After discussion on the wireless mailing list it was clarified that the issue has been introduced by: commit ee0e16ab756a ("mac80211: minstrel_ht: fill all requested rates") and fix shall be in minstrel_ht_update_rates in rc80211_minstrel_ht.c. Fixes: ee0e16ab756a ("mac80211: minstrel_ht: fill all requested rates") Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/linux-wireless/[email protected]/ Cc: Jérôme Pouiller <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Peter Seiderer <[email protected]> Cc: Kalle Valo <[email protected]> Cc: Krzysztof Drobiński <[email protected]>, Signed-off-by: Paweł Lenkow <[email protected]> Signed-off-by: Lech Perczak <[email protected]> Reviewed-by: Peter Seiderer <[email protected]> Reviewed-by: Jérôme Pouiller <[email protected]> Acked-by: Felix Fietkau <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2022-09-27wifi: mac80211: fix regression with non-QoS driversHans de Goede1-0/+4
Commit 10cb8e617560 ("mac80211: enable QoS support for nl80211 ctrl port") changed ieee80211_tx_control_port() to aways call __ieee80211_select_queue() without checking local->hw.queues. __ieee80211_select_queue() returns a queue-id between 0 and 3, which means that now ieee80211_tx_control_port() may end up setting the queue-mapping for a skb to a value higher then local->hw.queues if local->hw.queues is less then 4. Specifically this is a problem for ralink rt2500-pci cards where local->hw.queues is 2. There this causes rt2x00queue_get_tx_queue() to return NULL and the following error to be logged: "ieee80211 phy0: rt2x00mac_tx: Error - Attempt to send packet over invalid queue 2", after which association with the AP fails. Other callers of __ieee80211_select_queue() skip calling it when local->hw.queues < IEEE80211_NUM_ACS, add the same check to ieee80211_tx_control_port(). This fixes ralink rt2500-pci and similar cards when less then 4 tx-queues no longer working. Fixes: 10cb8e617560 ("mac80211: enable QoS support for nl80211 ctrl port") Cc: Markus Theil <[email protected]> Suggested-by: Stanislaw Gruszka <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-09-27wifi: mac80211: ensure vif queues are operational after startAlexander Wetzel1-2/+2
Make sure local->queue_stop_reasons and vif.txqs_stopped stay in sync. When a new vif is created the queues may end up in an inconsistent state and be inoperable: Communication not using iTXQ will work, allowing to e.g. complete the association. But the 4-way handshake will time out. The sta will not send out any skbs queued in iTXQs. All normal attempts to start the queues will fail when reaching this state. local->queue_stop_reasons will have marked all queues as operational but vif.txqs_stopped will still be set, creating an inconsistent internal state. In reality this seems to be race between the mac80211 function ieee80211_do_open() setting SDATA_STATE_RUNNING and the wake_txqs_tasklet: Depending on the driver and the timing the queues may end up to be operational or not. Cc: [email protected] Fixes: f856373e2f31 ("wifi: mac80211: do not wake queues on a vif that is being stopped") Signed-off-by: Alexander Wetzel <[email protected]> Acked-by: Felix Fietkau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-09-27wifi: mac80211: don't start TX with fq->lock to fix deadlockAlexander Wetzel1-1/+1
ieee80211_txq_purge() calls fq_tin_reset() and ieee80211_purge_tx_queue(); Both are then calling ieee80211_free_txskb(). Which can decide to TX the skb again. There are at least two ways to get a deadlock: 1) When we have a TDLS teardown packet queued in either tin or frags ieee80211_tdls_td_tx_handle() will call ieee80211_subif_start_xmit() while we still hold fq->lock. ieee80211_txq_enqueue() will thus deadlock. 2) A variant of the above happens if aggregation is up and running: In that case ieee80211_iface_work() will deadlock with the original task: The original tasks already holds fq->lock and tries to get sta->lock after kicking off ieee80211_iface_work(). But the worker can get sta->lock prior to the original task and will then spin for fq->lock. Avoid these deadlocks by not sending out any skbs when called via ieee80211_free_txskb(). Signed-off-by: Alexander Wetzel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-09-27wifi: cfg80211: fix MCS divisor valueTamizh Chelvam Raja1-2/+2
The Bitrate for HE/EHT MCS6 is calculated wrongly due to the incorrect MCS divisor value for mcs6. Fix it with the proper value. previous mcs_divisor value = (11769/6144) = 1.915527 fixed mcs_divisor value = (11377/6144) = 1.851725 Fixes: 9c97c88d2f4b ("cfg80211: Add support to calculate and report 4096-QAM HE rates") Signed-off-by: Tamizh Chelvam Raja <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-09-27net/smc: Support SO_REUSEPORTTony Lu1-0/+1
This enables SO_REUSEPORT [1] for clcsock when it is set on smc socket, so that some applications which uses it can be transparently replaced with SMC. Also, this helps improve load distribution. Here is a simple test of NGINX + wrk with SMC. The CPU usage is collected on NGINX (server) side as below. Disable SO_REUSEPORT: 05:15:33 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 05:15:34 PM all 7.02 0.00 11.86 0.00 2.04 8.93 0.00 0.00 0.00 70.15 05:15:34 PM 0 0.00 0.00 0.00 0.00 16.00 70.00 0.00 0.00 0.00 14.00 05:15:34 PM 1 11.58 0.00 22.11 0.00 0.00 0.00 0.00 0.00 0.00 66.32 05:15:34 PM 2 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 98.00 05:15:34 PM 3 16.84 0.00 30.53 0.00 0.00 0.00 0.00 0.00 0.00 52.63 05:15:34 PM 4 28.72 0.00 44.68 0.00 0.00 0.00 0.00 0.00 0.00 26.60 05:15:34 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:15:34 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:15:34 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 Enable SO_REUSEPORT: 05:15:20 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 05:15:21 PM all 8.56 0.00 14.40 0.00 2.20 9.86 0.00 0.00 0.00 64.98 05:15:21 PM 0 0.00 0.00 4.08 0.00 14.29 76.53 0.00 0.00 0.00 5.10 05:15:21 PM 1 9.09 0.00 16.16 0.00 1.01 0.00 0.00 0.00 0.00 73.74 05:15:21 PM 2 9.38 0.00 16.67 0.00 1.04 0.00 0.00 0.00 0.00 72.92 05:15:21 PM 3 10.42 0.00 17.71 0.00 1.04 0.00 0.00 0.00 0.00 70.83 05:15:21 PM 4 9.57 0.00 15.96 0.00 0.00 0.00 0.00 0.00 0.00 74.47 05:15:21 PM 5 9.18 0.00 15.31 0.00 0.00 1.02 0.00 0.00 0.00 74.49 05:15:21 PM 6 8.60 0.00 15.05 0.00 0.00 0.00 0.00 0.00 0.00 76.34 05:15:21 PM 7 12.37 0.00 14.43 0.00 0.00 0.00 0.00 0.00 0.00 73.20 Using SO_REUSEPORT helps the load distribution of NGINX be more balanced. [1] https://man7.org/linux/man-pages/man7/socket.7.html Signed-off-by: Tony Lu <[email protected]> Acked-by: Wenjia Zhang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-09-26net: sched: act_ct: fix possible refcount leak in tcf_ct_init()Hangyu Hua1-1/+4
nf_ct_put need to be called to put the refcount got by tcf_ct_fill_params to avoid possible refcount leak when tcf_ct_flow_table_get fails. Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone") Signed-off-by: Hangyu Hua <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-26net/sched: taprio: simplify list iteration in taprio_dev_notifier()Vladimir Oltean1-9/+4
taprio_dev_notifier() subscribes to netdev state changes in order to determine whether interfaces which have a taprio root qdisc have changed their link speed, so the internal calculations can be adapted properly. The 'qdev' temporary variable serves no purpose, because we just use it only once, and can just as well use qdisc_dev(q->root) directly (or the "dev" that comes from the netdev notifier; this is because qdev is only interesting if it was the subject of the state change, _and_ its root qdisc belongs in the taprio list). The 'found' variable also doesn't really serve too much of a purpose either; we can just call taprio_set_picos_per_byte() within the loop, and exit immediately afterwards. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Vinicius Costa Gomes <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-26net: dsa: make user ports return to init_net on netns deletionVladimir Oltean1-0/+1
As pointed out during review, currently the following set of commands crashes the kernel: $ ip netns add ns0 $ ip link set swp0 netns ns0 $ ip netns del ns0 WARNING: CPU: 1 PID: 27 at net/core/dev.c:10884 unregister_netdevice_many+0xaa4/0xaec Workqueue: netns cleanup_net pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : unregister_netdevice_many+0xaa4/0xaec lr : unregister_netdevice_many+0x700/0xaec Call trace: unregister_netdevice_many+0xaa4/0xaec default_device_exit_batch+0x294/0x340 ops_exit_list+0xac/0xc4 cleanup_net+0x2e4/0x544 process_one_work+0x4ec/0xb40 ---[ end trace 0000000000000000 ]--- unregister_netdevice: waiting for swp0 to become free. Usage count = 2 This is because since DSA user ports, since they started populating dev->rtnl_link_ops in the blamed commit, gained a different treatment from default_device_exit_net(), which thinks these interfaces can now be unregistered. They can't; so set netns_refund = true to restore the behavior prior to populating dev->rtnl_link_ops. Fixes: 95f510d0b792 ("net: dsa: allow the DSA master to be seen and changed through rtnetlink") Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-26xdp: improve page_pool xdp_return performanceJesper Dangaard Brouer1-6/+4
During LPC2022 I meetup with my page_pool co-maintainer Ilias. When discussing page_pool code we realised/remembered certain optimizations had not been fully utilised. Since commit c07aea3ef4d4 ("mm: add a signature in struct page") struct page have a direct pointer to the page_pool object this page was allocated from. Thus, with this info it is possible to skip the rhashtable_lookup to find the page_pool object in __xdp_return(). The rcu_read_lock can be removed as it was tied to xdp_mem_allocator. The page_pool object is still safe to access as it tracks inflight pages and (potentially) schedules final release from a work queue. Created a micro benchmark of XDP redirecting from mlx5 into veth with XDP_DROP bpf-prog on the peer veth device. This increased performance 6.5% from approx 8.45Mpps to 9Mpps corresponding to using 7 nanosec (27 cycles at 3.8GHz) less per packet. Suggested-by: Ilias Apalodimas <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Toke Høiland-Jørgensen <[email protected]> Reviewed-by: Ilias Apalodimas <[email protected]> Link: https://lore.kernel.org/r/166377993287.1737053.10258297257583703949.stgit@firesoul Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-26SUNRPC: Fix typo in xdr_buf_subsegment's kdoc commentChuck Lever1-1/+1
Fix a typo. Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-09-26NFSD: Refactor common code out of dirlist helpersChuck Lever1-0/+22
The dust has settled a bit and it's become obvious what code is totally common between nfsd_init_dirlist_pages() and nfsd3_init_dirlist_pages(). Move that common code to SUNRPC. The new helper brackets the existing xdr_init_decode_pages() API. Signed-off-by: Chuck Lever <[email protected]>
2022-09-26SUNRPC: Clarify comment that documents svc_max_payload()Chuck Lever1-2/+6
Note the function returns a per-transport value, not a per-request value (eg, one that is related to the size of the available send or receive buffer space). Signed-off-by: Chuck Lever <[email protected]>
2022-09-26SUNRPC: Parametrize how much of argsize should be zeroedChuck Lever1-1/+1
Currently, SUNRPC clears the whole of .pc_argsize before processing each incoming RPC transaction. Add an extra parameter to struct svc_procedure to enable upper layers to reduce the amount of each operation's argument structure that is zeroed by SUNRPC. The size of struct nfsd4_compoundargs, in particular, is a lot to clear on each incoming RPC Call. A subsequent patch will cut this down to something closer to what NFSv2 and NFSv3 uses. This patch should cause no behavior changes. Signed-off-by: Chuck Lever <[email protected]>
2022-09-26SUNRPC: Optimize svc_process()Chuck Lever1-13/+11
Move exception handling code out of the hot path, and avoid the need for a bswap of a non-constant. Signed-off-by: Chuck Lever <[email protected]>
2022-09-26af_unix: Refactor unix_read_skb()Peilin Ye1-24/+10
Similar to udp_read_skb(), delete the unnecessary while loop in unix_read_skb() for readability. Since recv_actor() cannot return a value greater than skb->len (see sk_psock_verdict_recv()), remove the redundant check. Suggested-by: Cong Wang <[email protected]> Signed-off-by: Peilin Ye <[email protected]> Link: https://lore.kernel.org/r/7009141683ad6cd3785daced3e4a80ba0eb773b5.1663909008.git.peilin.ye@bytedance.com Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-26udp: Refactor udp_read_skb()Peilin Ye1-29/+17
Delete the unnecessary while loop in udp_read_skb() for readability. Additionally, since recv_actor() cannot return a value greater than skb->len (see sk_psock_verdict_recv()), remove the redundant check. Suggested-by: Cong Wang <[email protected]> Signed-off-by: Peilin Ye <[email protected]> Link: https://lore.kernel.org/r/343b5d8090a3eb764068e9f1d392939e2b423747.1663909008.git.peilin.ye@bytedance.com Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-26skmsg: Schedule psock work if the cached skb exists on the psockLiu Jian1-4/+8
In sk_psock_backlog function, for ingress direction skb, if no new data packet arrives after the skb is cached, the cached skb does not have a chance to be added to the receive queue of psock. As a result, the cached skb cannot be received by the upper-layer application. Fix this by reschedule the psock work to dispose the cached skb in sk_msg_recvmsg function. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-26net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memoryLiu Jian1-1/+2
Fixes the below NULL pointer dereference: [...] [ 14.471200] Call Trace: [ 14.471562] <TASK> [ 14.471882] lock_acquire+0x245/0x2e0 [ 14.472416] ? remove_wait_queue+0x12/0x50 [ 14.473014] ? _raw_spin_lock_irqsave+0x17/0x50 [ 14.473681] _raw_spin_lock_irqsave+0x3d/0x50 [ 14.474318] ? remove_wait_queue+0x12/0x50 [ 14.474907] remove_wait_queue+0x12/0x50 [ 14.475480] sk_stream_wait_memory+0x20d/0x340 [ 14.476127] ? do_wait_intr_irq+0x80/0x80 [ 14.476704] do_tcp_sendpages+0x287/0x600 [ 14.477283] tcp_bpf_push+0xab/0x260 [ 14.477817] tcp_bpf_sendmsg_redir+0x297/0x500 [ 14.478461] ? __local_bh_enable_ip+0x77/0xe0 [ 14.479096] tcp_bpf_send_verdict+0x105/0x470 [ 14.479729] tcp_bpf_sendmsg+0x318/0x4f0 [ 14.480311] sock_sendmsg+0x2d/0x40 [ 14.480822] ____sys_sendmsg+0x1b4/0x1c0 [ 14.481390] ? copy_msghdr_from_user+0x62/0x80 [ 14.482048] ___sys_sendmsg+0x78/0xb0 [ 14.482580] ? vmf_insert_pfn_prot+0x91/0x150 [ 14.483215] ? __do_fault+0x2a/0x1a0 [ 14.483738] ? do_fault+0x15e/0x5d0 [ 14.484246] ? __handle_mm_fault+0x56b/0x1040 [ 14.484874] ? lock_is_held_type+0xdf/0x130 [ 14.485474] ? find_held_lock+0x2d/0x90 [ 14.486046] ? __sys_sendmsg+0x41/0x70 [ 14.486587] __sys_sendmsg+0x41/0x70 [ 14.487105] ? intel_pmu_drain_pebs_core+0x350/0x350 [ 14.487822] do_syscall_64+0x34/0x80 [ 14.488345] entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] The test scenario has the following flow: thread1 thread2 ----------- --------------- tcp_bpf_sendmsg tcp_bpf_send_verdict tcp_bpf_sendmsg_redir sock_close tcp_bpf_push_locked __sock_release tcp_bpf_push //inet_release do_tcp_sendpages sock->ops->release sk_stream_wait_memory // tcp_close sk_wait_event sk->sk_prot->close release_sock(__sk); *** lock_sock(sk); __tcp_close sock_orphan(sk) sk->sk_wq = NULL release_sock **** lock_sock(__sk); remove_wait_queue(sk_sleep(sk), &wait); sk_sleep(sk) //NULL pointer dereference &rcu_dereference_raw(sk->sk_wq)->wait While waiting for memory in thread1, the socket is released with its wait queue because thread2 has closed it. This caused by tcp_bpf_send_verdict didn't increase the f_count of psock->sk_redir->sk_socket->file in thread1. We should check if SOCK_DEAD flag is set on wakeup in sk_stream_wait_memory before accessing the wait queue. Suggested-by: Jakub Sitnicki <[email protected]> Signed-off-by: Liu Jian <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Cc: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-25Merge 7e2cd21e02b3 ("Merge tag 'tty-6.0-rc7' of ↵Greg Kroah-Hartman53-300/+500
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty") into tty-next We need the tty fixes and api additions in this branch. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-09-24Bluetooth: RFCOMM: remove define-only RFCOMM_TTY_MAGIC ex-magic-numberнаб1-1/+0
Appeared in its present state in pre-git (2.5.41), never used Found with grep MAGIC Documentation/process/magic-number.rst | while read -r mag _; do git grep -wF "$mag" | grep -ve '^Documentation.*magic-number.rst:' \ -qe ':#define '"$mag" || git grep -wF "$mag" | while IFS=: read -r f _; do sed -i '/\b'"$mag"'\b/d' "$f"; done ; done Signed-off-by: Ahelenia Ziemiańska <[email protected]> Link: https://lore.kernel.org/r/f6d375201dfd99416ea03b49b3dd40af56c1537e.1663280877.git.nabijaczleweli@nabijaczleweli.xyz Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-09-23ipv6: tcp: send consistent autoflowlabel in RST packetsEric Dumazet1-1/+4
Blamed commit added a txhash parameter to tcp_v6_send_response() but forgot to update tcp_v6_send_reset() accordingly. Fixes: aa51b80e1af4 ("ipv6: tcp: send consistent autoflowlabel in SYN_RECV state") Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-23Merge branch 'for-6.0-fixes' into for-6.1Tejun Heo86-437/+744
for-6.0 has the following fix for cgroup_get_from_id(). 836ac87d ("cgroup: fix cgroup_get_from_id") which conflicts with the following two commits in for-6.1. 4534dee9 ("cgroup: cgroup: Honor caller's cgroup NS when resolving cgroup id") fa7e439c ("cgroup: Homogenize cgroup_get_from_id() return value") While the resolution is straightforward, the code ends up pretty ugly afterwards. Let's pull for-6.0-fixes into for-6.1 so that the code can be fixed up there. Signed-off-by: Tejun Heo <[email protected]>
2022-09-23Merge tag 'linux-can-next-for-6.1-20220923' of ↵Jakub Kicinski1-6/+19
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2022-09-23 The first 2 patches are by Ziyang Xuan and optimize registration and the sending in the CAN BCM protocol a bit. The next 8 patches target the gs_usb driver. 7 are by me and first fix the time hardware stamping support (added during this net-next cycle), rename a variable, convert the usb_control_msg + manual kmalloc()/kfree() to usb_control_msg_{send,rev}(), clean up the error handling and add switchable termination support. The patch by Rhett Aultman and Vasanth Sadhasivan convert the driver from usb_alloc_coherent()/usb_free_coherent() to kmalloc()/URB_FREE_BUFFER. The last patch is by Shang XiaoJing and removes an unneeded call to dev_err() from the ctucanfd driver. * tag 'linux-can-next-for-6.1-20220923' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: ctucanfd: Remove redundant dev_err call can: gs_usb: remove dma allocations can: gs_usb: add switchable termination support can: gs_usb: gs_make_candev(): clean up error handling can: gs_usb: convert from usb_control_msg() to usb_control_msg_{send,recv}() can: gs_usb: gs_cmd_reset(): rename variable holding struct gs_can pointer to dev can: gs_usb: gs_can_open(): initialize time counter before starting device can: gs_usb: add missing lock to protect struct timecounter::cycle_last can: gs_usb: gs_usb_get_timestamp(): fix endpoint parameter for usb_control_msg_recv() can: bcm: check the result of can_send() in bcm_can_tx() can: bcm: registration process optimization in bcm_module_init() ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-23can: bcm: check the result of can_send() in bcm_can_tx()Ziyang Xuan1-3/+4
If can_send() fail, it should not update frames_abs counter in bcm_can_tx(). Add the result check for can_send() in bcm_can_tx(). Suggested-by: Marc Kleine-Budde <[email protected]> Suggested-by: Oliver Hartkopp <[email protected]> Signed-off-by: Ziyang Xuan <[email protected]> Link: https://lore.kernel.org/all/9851878e74d6d37aee2f1ee76d68361a46f89458.1663206163.git.william.xuanziyang@huawei.com Acked-by: Oliver Hartkopp <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
2022-09-23can: bcm: registration process optimization in bcm_module_init()Ziyang Xuan1-3/+15
Now, register_netdevice_notifier() and register_pernet_subsys() are both after can_proto_register(). It can create CAN_BCM socket and process socket once can_proto_register() successfully, so it is possible missing notifier event or proc node creation because notifier or bcm proc directory is not registered or created yet. Although this is a low probability scenario, it is not impossible. Move register_pernet_subsys() and register_netdevice_notifier() to the front of can_proto_register(). In addition, register_pernet_subsys() and register_netdevice_notifier() may fail, check their results are necessary. Signed-off-by: Ziyang Xuan <[email protected]> Link: https://lore.kernel.org/all/823cff0ebec33fa9389eeaf8b8ded3217c32cb38.1663206163.git.william.xuanziyang@huawei.com Acked-by: Oliver Hartkopp <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
2022-09-23net: phy: Add support for rate matchingSean Anderson2-0/+6
This adds support for rate matching (also known as rate adaptation) to the phy subsystem. The general idea is that the phy interface runs at one speed, and the MAC throttles the rate at which it sends packets to the link speed. There's a good overview of several techniques for achieving this at [1]. This patch adds support for three: pause-frame based (such as in Aquantia phys), CRS-based (such as in 10PASS-TS and 2BASE-TL), and open-loop-based (such as in 10GBASE-W). This patch makes a few assumptions and a few non assumptions about the types of rate matching available. First, it assumes that different phys may use different forms of rate matching. Second, it assumes that phys can use rate matching for any of their supported link speeds (e.g. if a phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T). Third, it does not assume that all interface modes will use the same form of rate matching. Fourth, it does not assume that all phy devices will support rate matching (even if some do). Relaxing or strengthening these (non-)assumptions could result in a different API. For example, if all interface modes were assumed to use the same form of rate matching, then a bitmask of interface modes supportting rate matching would suffice. For some better visibility into the process, the current rate matching mode is exposed as part of the ethtool ksettings. For the moment, only read access is supported. I'm not sure what userspace might want to configure yet (disable it altogether, disable just one mode, specify the mode to use, etc.). For the moment, since only pause-based rate adaptation support is added in the next few commits, rate matching can be disabled altogether by adjusting the advertisement. 802.3 calls this feature "rate adaptation" in clause 49 (10GBASE-R) and "rate matching" in clause 61 (10PASS-TL and 2BASE-TS). Aquantia also calls this feature "rate adaptation". I chose "rate matching" because it is shorter, and because Russell doesn't think "adaptation" is correct in this context. Signed-off-by: Sean Anderson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-22ethtool: tunnels: check the return value of nla_nest_start()Li Zhong1-0/+2
Check the return value of nla_nest_start(). When starting the entry level nested attributes, if the tailroom of socket buffer is insufficient to store the attribute header and payload, the return value will be NULL. There is, however, no real bug here since if the skb is full nla_put_be16() will fail as well and we'll error out. Signed-off-by: Li Zhong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-22net/sched: use tc_qdisc_stats_dump() in qdiscZhengchao Shao22-147/+31
use tc_qdisc_stats_dump() in qdisc. Signed-off-by: Zhengchao Shao <[email protected]> Reviewed-by: Victor Nogueira <[email protected]> Tested-by: Victor Nogueira <[email protected]> Acked-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-22net/sched: taprio: remove unnecessary taprio_list_lockVladimir Oltean1-7/+0
The 3 functions that want access to the taprio_list: taprio_dev_notifier(), taprio_destroy() and taprio_init() are all called with the rtnl_mutex held, therefore implicitly serialized with respect to each other. A spin lock serves no purpose. Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Vinicius Costa Gomes <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-22net/tls: Support 256 bit keys with TX device offloadGal Pressman2-0/+13
Add the missing clause for 256 bit keys in tls_set_device_offload(), and the needed adjustments in tls_device_fallback.c. Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Gal Pressman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-22net/tls: Use cipher sizes structsGal Pressman2-51/+76
Use the newly introduced cipher sizes structs instead of the repeated switch cases churn. Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Gal Pressman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-22net/tls: Describe ciphers sizes by const structsTariq Toukan1-0/+17
Introduce cipher sizes descriptor. It helps reducing the amount of code duplications and repeated switch/cases that assigns the proper sizes according to the cipher type. Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Gal Pressman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski21-65/+132
drivers/net/ethernet/freescale/fec.h 7b15515fc1ca ("Revert "fec: Restart PPS after link state change"") 40c79ce13b03 ("net: fec: add stop mode support for imx8 platform") https://lore.kernel.org/all/[email protected]/ drivers/pinctrl/pinctrl-ocelot.c c297561bc98a ("pinctrl: ocelot: Fix interrupt controller") 181f604b33cd ("pinctrl: ocelot: add ability to be used in a non-mmio configuration") https://lore.kernel.org/all/[email protected]/ tools/testing/selftests/drivers/net/bonding/Makefile bbb774d921e2 ("net: Add tests for bonding and team address list management") 152e8ec77640 ("selftests/bonding: add a test for bonding lladdr target") https://lore.kernel.org/all/[email protected]/ drivers/net/can/usb/gs_usb.c 5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition") 45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-22Merge tag 'net-6.0-rc7' of ↵Linus Torvalds19-61/+125
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wifi, netfilter and can. A handful of awaited fixes here - revert of the FEC changes, bluetooth fix, fixes for iwlwifi spew. We added a warning in PHY/MDIO code which is triggering on a couple of platforms in a false-positive-ish way. If we can't iron that out over the week we'll drop it and re-add for 6.1. I've added a new "follow up fixes" section for fixes to fixes in 6.0-rcs but it may actually give the false impression that those are problematic or that more testing time would have caught them. So likely a one time thing. Follow up fixes: - nf_tables_addchain: fix nft_counters_enabled underflow - ebtables: fix memory leak when blob is malformed - nf_ct_ftp: fix deadlock when nat rewrite is needed Current release - regressions: - Revert "fec: Restart PPS after link state change" and the related "net: fec: Use a spinlock to guard `fep->ptp_clk_on`" - Bluetooth: fix HCIGETDEVINFO regression - wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2 - mptcp: fix fwd memory accounting on coalesce - rwlock removal fall out: - ipmr: always call ip{,6}_mr_forward() from RCU read-side critical section - ipv6: fix crash when IPv6 is administratively disabled - tcp: read multiple skbs in tcp_read_skb() - mdio_bus_phy_resume state warning fallout: - eth: ravb: fix PHY state warning splat during system resume - eth: sh_eth: fix PHY state warning splat during system resume Current release - new code bugs: - wifi: iwlwifi: don't spam logs with NSS>2 messages - eth: mtk_eth_soc: enable XDP support just for MT7986 SoC Previous releases - regressions: - bonding: fix NULL deref in bond_rr_gen_slave_id - wifi: iwlwifi: mark IWLMEI as broken Previous releases - always broken: - nf_conntrack helpers: - irc: tighten matching on DCC message - sip: fix ct_sip_walk_headers - osf: fix possible bogus match in nf_osf_find() - ipvlan: fix out-of-bound bugs caused by unset skb->mac_header - core: fix flow symmetric hash - bonding, team: unsync device addresses on ndo_stop - phy: micrel: fix shared interrupt on LAN8814" * tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) selftests: forwarding: add shebang for sch_red.sh bnxt: prevent skb UAF after handing over to PTP worker net: marvell: Fix refcounting bugs in prestera_port_sfp_bind() net: sched: fix possible refcount leak in tc_new_tfilter() net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD udp: Use WARN_ON_ONCE() in udp_read_skb() selftests: bonding: cause oops in bond_rr_gen_slave_id bonding: fix NULL deref in bond_rr_gen_slave_id net: phy: micrel: fix shared interrupt on LAN8814 net/smc: Stop the CLC flow if no link to map buffers on ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient net: atlantic: fix potential memory leak in aq_ndev_close() can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported can: gs_usb: gs_can_open(): fix race dev->can.state condition can: flexcan: flexcan_mailbox_read() fix return value for drop = true net: sh_eth: Fix PHY state warning splat during system resume net: ravb: Fix PHY state warning splat during system resume netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed netfilter: ebtables: fix memory leak when blob is malformed netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain() ...
2022-09-22xsk: Inherit need_wakeup flag for shared socketsJalal Mostafa2-4/+5
The flag for need_wakeup is not set for xsks with `XDP_SHARED_UMEM` flag and of different queue ids and/or devices. They should inherit the flag from the first socket buffer pool since no flags can be specified once `XDP_SHARED_UMEM` is specified. Fixes: b5aea28dca134 ("xsk: Add shared umem support between queue ids") Signed-off-by: Jalal Mostafa <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-09-22net: sched: fix possible refcount leak in tc_new_tfilter()Hangyu Hua1-0/+1
tfilter_put need to be called to put the refount got by tp->ops->get to avoid possible refcount leak when chain->tmplt_ops != NULL and chain->tmplt_ops != tp->ops. Fixes: 7d5509fa0d3d ("net: sched: extend proto ops with 'put' callback") Signed-off-by: Hangyu Hua <[email protected]> Reviewed-by: Vlad Buslov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-22udp: Use WARN_ON_ONCE() in udp_read_skb()Peilin Ye1-1/+1
Prevent udp_read_skb() from flooding the syslog. Suggested-by: Jakub Sitnicki <[email protected]> Signed-off-by: Peilin Ye <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-22net/smc: Unbind r/w buffer size from clcsock and make them tunableTony Lu3-7/+27
Currently, SMC uses smc->sk.sk_{rcv|snd}buf to create buffers for send buffer and RMB. And the values of buffer size are from tcp_{w|r}mem in clcsock. The buffer size from TCP socket doesn't fit SMC well. Generally, buffers are usually larger than TCP for SMC-R/-D to get higher performance, for they are different underlay devices and paths. So this patch unbinds buffer size from TCP, and introduces two sysctl knobs to tune them independently. Also, these knobs are per net namespace and work for containers. Signed-off-by: Tony Lu <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-09-22net/smc: Introduce a specific sysctl for TEST_LINK timeWen Gu3-1/+11
SMC-R tests the viability of link by sending out TEST_LINK LLC messages over RoCE fabric when connections on link have been idle for a time longer than keepalive interval (testlink time). But using tcp_keepalive_time as testlink time maybe not quite suitable because it is default no less than two hours[1], which is too long for single link to find peer dead. The active host will still use peer-dead link (QP) sending messages, and can't find out until get IB_WC_RETRY_EXC_ERR error CQEs, which takes more time than TEST_LINK timeout (SMC_LLC_WAIT_TIME) normally. So this patch introduces a independent sysctl for SMC-R to set link keepalive time, in order to detect link down in time. The default value is 30 seconds. [1] https://www.rfc-editor.org/rfc/rfc1122#page-101 Signed-off-by: Wen Gu <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>