aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-10-21net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OFRob Herring1-1/+1
Commit b0377116decd ("net: ethernet: Use device_get_match_data()") dropped the unconditional use of xgene_enet_of_match resulting in this warning: drivers/net/ethernet/apm/xgene/xgene_enet_main.c:2004:34: warning: unused variable 'xgene_enet_of_match' [-Wunused-const-variable] The fix is to drop of_match_ptr() which is not necessary because DT is always used for this driver (well, it could in theory support ACPI only, but CONFIG_OF is always enabled for arm64). Fixes: b0377116decd ("net: ethernet: Use device_get_match_data()") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Rob Herring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20i40e: sync next_to_clean and next_to_process for programming status descTirthendu Sarkar1-1/+8
When a programming status desc is encountered on the rx_ring, next_to_process is bumped along with cleaned_count but next_to_clean is not. This causes I40E_DESC_UNUSED() macro to misbehave resulting in overwriting whole ring with new buffers. Update next_to_clean to point to next_to_process on seeing a programming status desc if not in the middle of handling a multi-frag packet. Also, bump cleaned_count only for such case as otherwise next_to_clean buffer may be returned to hardware on reaching clean_threshold. Fixes: e9031f2da1ae ("i40e: introduce next_to_process to i40e_ring") Suggested-by: Maciej Fijalkowski <[email protected]> Reported-by: [email protected] Reported by: Solomon Peachy <[email protected]> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217678 Tested-by: [email protected] Tested by: Indrek Järve <[email protected]> Signed-off-by: Tirthendu Sarkar <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-20igc: Fix ambiguity in the ethtool advertisingSasha Neftin1-10/+25
The 'ethtool_convert_link_mode_to_legacy_u32' method does not allow us to advertise 2500M speed support and TP (twisted pair) properly. Convert to 'ethtool_link_ksettings_test_link_mode' to advertise supported speed and eliminate ambiguity. Fixes: 8c5ad0dae93c ("igc: Add ethtool support") Suggested-by: Dima Ruinskiy <[email protected]> Suggested-by: Vitaly Lifshits <[email protected]> Signed-off-by: Sasha Neftin <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-20neighbour: fix various data-racesEric Dumazet1-32/+35
1) tbl->gc_thresh1, tbl->gc_thresh2, tbl->gc_thresh3 and tbl->gc_interval can be written from sysfs. 2) tbl->last_flush is read locklessly from neigh_alloc() 3) tbl->proxy_queue.qlen is read locklessly from neightbl_fill_info() 4) neightbl_fill_info() reads cpu stats that can be changed concurrently. Fixes: c7fb64db001f ("[NETLINK]: Neighbour table configuration and statistics via rtnetlink") Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-20net: do not leave an empty skb in write queueEric Dumazet1-3/+5
Under memory stress conditions, tcp_sendmsg_locked() might call sk_stream_wait_memory(), thus releasing the socket lock. If a fresh skb has been allocated prior to this, we should not leave it in the write queue otherwise tcp_write_xmit() could panic. This apparently does not happen often, but a future change in __sk_mem_raise_allocated() that Shakeel and others are considering would increase chances of being hurt. Under discussion is to remove this controversial part: /* Fail only if socket is _under_ its sndbuf. * In this case we cannot block, so that we have to fail. */ if (sk->sk_wmem_queued + size >= sk->sk_sndbuf) { /* Force charge with __GFP_NOFAIL */ if (memcg_charge && !charged) { mem_cgroup_charge_skmem(sk->sk_memcg, amt, gfp_memcg_charge() | __GFP_NOFAIL); } return 1; } Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-20igb: Fix potential memory leak in igb_add_ethtool_nfc_entryMateusz Palczewski1-1/+5
Add check for return of igb_update_ethtool_nfc_entry so that in case of any potential errors the memory alocated for input will be freed. Fixes: 0e71def25281 ("igb: add support of RX network flow classification") Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: Mateusz Palczewski <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20treewide: Spelling fix in commentKunwu Chan1-1/+1
reques -> request Fixes: 09dde54c6a69 ("PS3: gelic: Add wireless support for PS3") Signed-off-by: Kunwu Chan <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20i40e: Fix I40E_FLAG_VF_VLAN_PRUNING valueIvan Vecera1-1/+1
Commit c87c938f62d8f1 ("i40e: Add VF VLAN pruning") added new PF flag I40E_FLAG_VF_VLAN_PRUNING but its value collides with existing I40E_FLAG_TOTAL_PORT_SHUTDOWN_ENABLED flag. Move the affected flag at the end of the flags and fix its value. Reproducer: [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close on [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 vf-vlan-pruning on [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close off [ 6323.142585] i40e 0000:02:00.0: Setting link-down-on-close not supported on this port (because total-port-shutdown is enabled) netlink error: Operation not supported [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 vf-vlan-pruning off [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close off The link-down-on-close flag cannot be modified after setting vf-vlan-pruning because vf-vlan-pruning shares the same bit with total-port-shutdown flag that prevents any modification of link-down-on-close flag. Fixes: c87c938f62d8 ("i40e: Add VF VLAN pruning") Cc: Mateusz Palczewski <[email protected]> Cc: Simon Horman <[email protected]> Signed-off-by: Ivan Vecera <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: David S. Miller <[email protected]>
2023-10-20iavf: initialize waitqueues before starting watchdog_taskMichal Schmidt1-2/+3
It is not safe to initialize the waitqueues after queueing the watchdog_task. It will be using them. The chance of this causing a real problem is very small, because there will be some sleeping before any of the waitqueues get used. I got a crash only after inserting an artificial sleep in iavf_probe. Queue the watchdog_task as the last step in iavf_probe. Add a comment to prevent repeating the mistake. Fixes: fe2647ab0c99 ("i40evf: prevent VF close returning before state transitions to DOWN") Signed-off-by: Michal Schmidt <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20r8169: fix the KCSAN reported data race in rtl_rx while reading desc->opts1Mirsad Goran Todorovac1-1/+1
KCSAN reported the following data-race bug: ================================================================== BUG: KCSAN: data-race in rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169 race at unknown origin, with read to 0xffff888117e43510 of 4 bytes by interrupt on cpu 21: rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169 __napi_poll (net/core/dev.c:6527) net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727) __do_softirq (kernel/softirq.c:553) __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632) irq_exit_rcu (kernel/softirq.c:647) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074 (discriminator 14)) asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645) cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291) cpuidle_enter (drivers/cpuidle/cpuidle.c:390) call_cpuidle (kernel/sched/idle.c:135) do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282) cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1)) start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294) secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433) value changed: 0x80003fff -> 0x3402805f Reported by Kernel Concurrency Sanitizer on: CPU: 21 PID: 0 Comm: swapper/21 Tainted: G L 6.6.0-rc2-kcsan-00143-gb5cbe7c00aa0 #41 Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023 ================================================================== drivers/net/ethernet/realtek/r8169_main.c: ========================================== 4429 → 4430 status = le32_to_cpu(desc->opts1); 4431 if (status & DescOwn) 4432 break; 4433 4434 /* This barrier is needed to keep us from reading 4435 * any other fields out of the Rx descriptor until 4436 * we know the status of DescOwn 4437 */ 4438 dma_rmb(); 4439 4440 if (unlikely(status & RxRES)) { 4441 if (net_ratelimit()) 4442 netdev_warn(dev, "Rx ERROR. status = %08x\n", Marco Elver explained that dma_rmb() doesn't prevent the compiler to tear up the access to desc->opts1 which can be written to concurrently. READ_ONCE() should prevent that from happening: 4429 → 4430 status = le32_to_cpu(READ_ONCE(desc->opts1)); 4431 if (status & DescOwn) 4432 break; 4433 As the consequence of this fix, this KCSAN warning was eliminated. Fixes: 6202806e7c03a ("r8169: drop member opts1_mask from struct rtl8169_private") Suggested-by: Marco Elver <[email protected]> Cc: Heiner Kallweit <[email protected]> Cc: [email protected] Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Mirsad Goran Todorovac <[email protected]> Acked-by: Marco Elver <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20r8169: fix the KCSAN reported data-race in rtl_tx while reading ↵Mirsad Goran Todorovac1-1/+1
TxDescArray[entry].opts1 KCSAN reported the following data-race: ================================================================== BUG: KCSAN: data-race in rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4368 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169 race at unknown origin, with read to 0xffff888140d37570 of 4 bytes by interrupt on cpu 21: rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4368 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169 __napi_poll (net/core/dev.c:6527) net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727) __do_softirq (kernel/softirq.c:553) __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632) irq_exit_rcu (kernel/softirq.c:647) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074 (discriminator 14)) asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645) cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291) cpuidle_enter (drivers/cpuidle/cpuidle.c:390) call_cpuidle (kernel/sched/idle.c:135) do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282) cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1)) start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294) secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433) value changed: 0xb0000042 -> 0x00000000 Reported by Kernel Concurrency Sanitizer on: CPU: 21 PID: 0 Comm: swapper/21 Tainted: G L 6.6.0-rc2-kcsan-00143-gb5cbe7c00aa0 #41 Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023 ================================================================== The read side is in drivers/net/ethernet/realtek/r8169_main.c ========================================= 4355 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp, 4356 int budget) 4357 { 4358 unsigned int dirty_tx, bytes_compl = 0, pkts_compl = 0; 4359 struct sk_buff *skb; 4360 4361 dirty_tx = tp->dirty_tx; 4362 4363 while (READ_ONCE(tp->cur_tx) != dirty_tx) { 4364 unsigned int entry = dirty_tx % NUM_TX_DESC; 4365 u32 status; 4366 → 4367 status = le32_to_cpu(tp->TxDescArray[entry].opts1); 4368 if (status & DescOwn) 4369 break; 4370 4371 skb = tp->tx_skb[entry].skb; 4372 rtl8169_unmap_tx_skb(tp, entry); 4373 4374 if (skb) { 4375 pkts_compl++; 4376 bytes_compl += skb->len; 4377 napi_consume_skb(skb, budget); 4378 } 4379 dirty_tx++; 4380 } 4381 4382 if (tp->dirty_tx != dirty_tx) { 4383 dev_sw_netstats_tx_add(dev, pkts_compl, bytes_compl); 4384 WRITE_ONCE(tp->dirty_tx, dirty_tx); 4385 4386 netif_subqueue_completed_wake(dev, 0, pkts_compl, bytes_compl, 4387 rtl_tx_slots_avail(tp), 4388 R8169_TX_START_THRS); 4389 /* 4390 * 8168 hack: TxPoll requests are lost when the Tx packets are 4391 * too close. Let's kick an extra TxPoll request when a burst 4392 * of start_xmit activity is detected (if it is not detected, 4393 * it is slow enough). -- FR 4394 * If skb is NULL then we come here again once a tx irq is 4395 * triggered after the last fragment is marked transmitted. 4396 */ 4397 if (READ_ONCE(tp->cur_tx) != dirty_tx && skb) 4398 rtl8169_doorbell(tp); 4399 } 4400 } tp->TxDescArray[entry].opts1 is reported to have a data-race and READ_ONCE() fixes this KCSAN warning. 4366 → 4367 status = le32_to_cpu(READ_ONCE(tp->TxDescArray[entry].opts1)); 4368 if (status & DescOwn) 4369 break; 4370 Cc: Heiner Kallweit <[email protected]> Cc: [email protected] Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Marco Elver <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Mirsad Goran Todorovac <[email protected]> Acked-by: Marco Elver <[email protected]> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: David S. Miller <[email protected]>
2023-10-20r8169: fix the KCSAN reported data-race in rtl_tx() while reading tp->cur_txMirsad Goran Todorovac1-1/+1
KCSAN reported the following data-race: ================================================================== BUG: KCSAN: data-race in rtl8169_poll [r8169] / rtl8169_start_xmit [r8169] write (marked) to 0xffff888102474b74 of 4 bytes by task 5358 on cpu 29: rtl8169_start_xmit (drivers/net/ethernet/realtek/r8169_main.c:4254) r8169 dev_hard_start_xmit (./include/linux/netdevice.h:4889 ./include/linux/netdevice.h:4903 net/core/dev.c:3544 net/core/dev.c:3560) sch_direct_xmit (net/sched/sch_generic.c:342) __dev_queue_xmit (net/core/dev.c:3817 net/core/dev.c:4306) ip_finish_output2 (./include/linux/netdevice.h:3082 ./include/net/neighbour.h:526 ./include/net/neighbour.h:540 net/ipv4/ip_output.c:233) __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:293) ip_finish_output (net/ipv4/ip_output.c:328) ip_output (net/ipv4/ip_output.c:435) ip_send_skb (./include/net/dst.h:458 net/ipv4/ip_output.c:127 net/ipv4/ip_output.c:1486) udp_send_skb (net/ipv4/udp.c:963) udp_sendmsg (net/ipv4/udp.c:1246) inet_sendmsg (net/ipv4/af_inet.c:840 (discriminator 4)) sock_sendmsg (net/socket.c:730 net/socket.c:753) __sys_sendto (net/socket.c:2177) __x64_sys_sendto (net/socket.c:2185) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) read to 0xffff888102474b74 of 4 bytes by interrupt on cpu 21: rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4397 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169 __napi_poll (net/core/dev.c:6527) net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727) __do_softirq (kernel/softirq.c:553) __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632) irq_exit_rcu (kernel/softirq.c:647) common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14)) asm_common_interrupt (./arch/x86/include/asm/idtentry.h:636) cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291) cpuidle_enter (drivers/cpuidle/cpuidle.c:390) call_cpuidle (kernel/sched/idle.c:135) do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282) cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1)) start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294) secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433) value changed: 0x002f4815 -> 0x002f4816 Reported by Kernel Concurrency Sanitizer on: CPU: 21 PID: 0 Comm: swapper/21 Tainted: G L 6.6.0-rc2-kcsan-00143-gb5cbe7c00aa0 #41 Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023 ================================================================== The write side of drivers/net/ethernet/realtek/r8169_main.c is: ================== 4251 /* rtl_tx needs to see descriptor changes before updated tp->cur_tx */ 4252 smp_wmb(); 4253 → 4254 WRITE_ONCE(tp->cur_tx, tp->cur_tx + frags + 1); 4255 4256 stop_queue = !netif_subqueue_maybe_stop(dev, 0, rtl_tx_slots_avail(tp), 4257 R8169_TX_STOP_THRS, 4258 R8169_TX_START_THRS); The read side is the function rtl_tx(): 4355 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp, 4356 int budget) 4357 { 4358 unsigned int dirty_tx, bytes_compl = 0, pkts_compl = 0; 4359 struct sk_buff *skb; 4360 4361 dirty_tx = tp->dirty_tx; 4362 4363 while (READ_ONCE(tp->cur_tx) != dirty_tx) { 4364 unsigned int entry = dirty_tx % NUM_TX_DESC; 4365 u32 status; 4366 4367 status = le32_to_cpu(tp->TxDescArray[entry].opts1); 4368 if (status & DescOwn) 4369 break; 4370 4371 skb = tp->tx_skb[entry].skb; 4372 rtl8169_unmap_tx_skb(tp, entry); 4373 4374 if (skb) { 4375 pkts_compl++; 4376 bytes_compl += skb->len; 4377 napi_consume_skb(skb, budget); 4378 } 4379 dirty_tx++; 4380 } 4381 4382 if (tp->dirty_tx != dirty_tx) { 4383 dev_sw_netstats_tx_add(dev, pkts_compl, bytes_compl); 4384 WRITE_ONCE(tp->dirty_tx, dirty_tx); 4385 4386 netif_subqueue_completed_wake(dev, 0, pkts_compl, bytes_compl, 4387 rtl_tx_slots_avail(tp), 4388 R8169_TX_START_THRS); 4389 /* 4390 * 8168 hack: TxPoll requests are lost when the Tx packets are 4391 * too close. Let's kick an extra TxPoll request when a burst 4392 * of start_xmit activity is detected (if it is not detected, 4393 * it is slow enough). -- FR 4394 * If skb is NULL then we come here again once a tx irq is 4395 * triggered after the last fragment is marked transmitted. 4396 */ → 4397 if (tp->cur_tx != dirty_tx && skb) 4398 rtl8169_doorbell(tp); 4399 } 4400 } Obviously from the code, an earlier detected data-race for tp->cur_tx was fixed in the line 4363: 4363 while (READ_ONCE(tp->cur_tx) != dirty_tx) { but the same solution is required for protecting the other access to tp->cur_tx: → 4397 if (READ_ONCE(tp->cur_tx) != dirty_tx && skb) 4398 rtl8169_doorbell(tp); The write in the line 4254 is protected with WRITE_ONCE(), but the read in the line 4397 might have suffered read tearing under some compiler optimisations. The fix eliminated the KCSAN data-race report for this bug. It is yet to be evaluated what happens if tp->cur_tx changes between the test in line 4363 and line 4397. This test should certainly not be cached by the compiler in some register for such a long time, while asynchronous writes to tp->cur_tx might have occurred in line 4254 in the meantime. Fixes: 94d8a98e6235c ("r8169: reduce number of workaround doorbell rings") Cc: Heiner Kallweit <[email protected]> Cc: [email protected] Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Marco Elver <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Mirsad Goran Todorovac <[email protected]> Acked-by: Marco Elver <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-19i40e: xsk: remove count_maskMaciej Fijalkowski1-9/+13
Cited commit introduced a neat way of updating next_to_clean that does not require boundary checks on each increment. This was done by masking the new value with (ring length - 1) mask. Problem is that this is applicable only for power of 2 ring sizes, for every other size this assumption can not be made. In turn, it leads to cleaning descriptors out of order as well as splats: [ 1388.411915] Workqueue: events xp_release_deferred [ 1388.411919] RIP: 0010:xp_free+0x1a/0x50 [ 1388.411921] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 8b 57 70 48 8d 47 70 48 89 e5 48 39 d0 74 06 <5d> c3 cc cc cc cc 48 8b 57 60 83 82 b8 00 00 00 01 48 8b 57 60 48 [ 1388.411922] RSP: 0018:ffa0000000a83cb0 EFLAGS: 00000206 [ 1388.411923] RAX: ff11000119aa5030 RBX: 000000000000001d RCX: ff110001129b6e50 [ 1388.411924] RDX: ff11000119aa4fa0 RSI: 0000000055555554 RDI: ff11000119aa4fc0 [ 1388.411925] RBP: ffa0000000a83cb0 R08: 0000000000000000 R09: 0000000000000000 [ 1388.411926] R10: 0000000000000001 R11: 0000000000000000 R12: ff11000115829b80 [ 1388.411927] R13: 000000000000005f R14: 0000000000000000 R15: ff11000119aa4fc0 [ 1388.411928] FS: 0000000000000000(0000) GS:ff11000277e00000(0000) knlGS:0000000000000000 [ 1388.411929] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1388.411930] CR2: 00007f1f564e6c14 CR3: 000000000783c005 CR4: 0000000000771ef0 [ 1388.411931] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1388.411931] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1388.411932] PKRU: 55555554 [ 1388.411933] Call Trace: [ 1388.411934] <IRQ> [ 1388.411935] ? show_regs+0x6e/0x80 [ 1388.411937] ? watchdog_timer_fn+0x1d2/0x240 [ 1388.411939] ? __pfx_watchdog_timer_fn+0x10/0x10 [ 1388.411941] ? __hrtimer_run_queues+0x10e/0x290 [ 1388.411945] ? clockevents_program_event+0xae/0x130 [ 1388.411947] ? hrtimer_interrupt+0x105/0x240 [ 1388.411949] ? __sysvec_apic_timer_interrupt+0x54/0x150 [ 1388.411952] ? sysvec_apic_timer_interrupt+0x7f/0x90 [ 1388.411955] </IRQ> [ 1388.411955] <TASK> [ 1388.411956] ? asm_sysvec_apic_timer_interrupt+0x1f/0x30 [ 1388.411958] ? xp_free+0x1a/0x50 [ 1388.411960] i40e_xsk_clean_rx_ring+0x5d/0x100 [i40e] [ 1388.411968] i40e_clean_rx_ring+0x14c/0x170 [i40e] [ 1388.411977] i40e_queue_pair_disable+0xda/0x260 [i40e] [ 1388.411986] i40e_xsk_pool_setup+0x192/0x1d0 [i40e] [ 1388.411993] i40e_reconfig_rss_queues+0x1f0/0x1450 [i40e] [ 1388.412002] xp_disable_drv_zc+0x73/0xf0 [ 1388.412004] ? mutex_lock+0x17/0x50 [ 1388.412007] xp_release_deferred+0x2b/0xc0 [ 1388.412010] process_one_work+0x178/0x350 [ 1388.412011] ? __pfx_worker_thread+0x10/0x10 [ 1388.412012] worker_thread+0x2f7/0x420 [ 1388.412014] ? __pfx_worker_thread+0x10/0x10 [ 1388.412015] kthread+0xf8/0x130 [ 1388.412017] ? __pfx_kthread+0x10/0x10 [ 1388.412019] ret_from_fork+0x3d/0x60 [ 1388.412021] ? __pfx_kthread+0x10/0x10 [ 1388.412023] ret_from_fork_asm+0x1b/0x30 [ 1388.412026] </TASK> It comes from picking wrong ring entries when cleaning xsk buffers during pool detach. Remove the count_mask logic and use they boundary check when updating next_to_process (which used to be a next_to_clean). Fixes: c8a8ca3408dc ("i40e: remove unnecessary memory writes of the next to clean pointer") Reported-by: Tushar Vyavahare <[email protected]> Tested-by: Tushar Vyavahare <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-19Merge tag 'net-6.6-rc7' of ↵Linus Torvalds97-466/+967
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, netfilter, WiFi. Feels like an up-tick in regression fixes, mostly for older releases. The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as fairly clear-cut user reported regressions. The mlx5 DMA bug was causing strife for 390x folks. The fixes themselves are not particularly scary, tho. No open investigations / outstanding reports at the time of writing. Current release - regressions: - eth: mlx5: perform DMA operations in the right locations, make devices usable on s390x, again - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve, previous fix of rejecting invalid config broke some scripts - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock - revert "ethtool: Fix mod state of verbose no_mask bitset", needs more work Current release - new code bugs: - tcp: fix listen() warning with v4-mapped-v6 address Previous releases - regressions: - tcp: allow tcp_disconnect() again when threads are waiting, it was denied to plug a constant source of bugs but turns out .NET depends on it - eth: mlx5: fix double-free if buffer refill fails under OOM - revert "net: wwan: iosm: enable runtime pm support for 7560", it's causing regressions and the WWAN team at Intel disappeared - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb, fix single-stream perf regression on some devices Previous releases - always broken: - Bluetooth: - fix issues in legacy BR/EDR PIN code pairing - correctly bounds check and pad HCI_MON_NEW_INDEX name - netfilter: - more fixes / follow ups for the large "commit protocol" rework, which went in as a fix to 6.5 - fix null-derefs on netlink attrs which user may not pass in - tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless Debian for keeping HZ=250 alive) - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent letting frankenstein UDP super-frames from getting into the stack - net: fix interface altnames when ifc moves to a new namespace - eth: qed: fix the size of the RX buffers - mptcp: avoid sending RST when closing the initial subflow" * tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits) Revert "ethtool: Fix mod state of verbose no_mask bitset" selftests: mptcp: join: no RST when rm subflow/addr mptcp: avoid sending RST when closing the initial subflow mptcp: more conservative check for zero probes tcp: check mptcp-level constraints for backlog coalescing selftests: mptcp: join: correctly check for no RST net: ti: icssg-prueth: Fix r30 CMDs bitmasks selftests: net: add very basic test for netdev names and namespaces net: move altnames together with the netdevice net: avoid UAF on deleted altname net: check for altname conflicts when changing netdev's netns net: fix ifname in netlink ntf during netns move net: ethernet: ti: Fix mixed module-builtin object net: phy: bcm7xxx: Add missing 16nm EPHY statistics ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr tcp_bpf: properly release resources on error paths net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve net: mdio-mux: fix C45 access returning -EIO after API change tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb octeon_ep: update BQL sent bytes before ringing doorbell ...
2023-10-19Merge tag 'loongarch-fixes-6.6-3' of ↵Linus Torvalds8-41/+51
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai ChenL "Fix 4-level pagetable building, disable WUC for pgprot_writecombine() like ioremap_wc(), use correct annotation for exception handlers, and a trivial cleanup" * tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Disable WUC for pgprot_writecombine() like ioremap_wc() LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage() LoongArch: Export symbol invalid_pud_table for modules building LoongArch: Use SYM_CODE_* to annotate exception handlers
2023-10-19Merge tag 'slab-fixes-for-6.6-rc6' of ↵Linus Torvalds1-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: - stable fix to prevent kernel warnings with KASAN_HW_TAGS on arm64 due to improperly resolved kmalloc alignment restrictions (Catalin Marinas) * tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()
2023-10-19Merge tag 'seccomp-v6.6-rc7' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp fix from Kees Cook: - Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby) * tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: perf/benchmark: fix seccomp_unotify benchmark for 32-bit
2023-10-19Merge tag 'v6.6-rc7.vfs.fixes' of ↵Linus Torvalds3-9/+10
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fix from Christian Brauner: "An openat() call from io_uring triggering an audit call can apparently cause the refcount of struct filename to be incremented from multiple threads concurrently during async execution, triggering a refcount underflow and hitting a BUG_ON(). That bug has been lurking around since at least v5.16 apparently. Switch to an atomic counter to fix that. The underflow check is downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily remove that check altogether tbh" * tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: audit,io_uring: io_uring openat triggers audit reference count underflow
2023-10-19Revert "ethtool: Fix mod state of verbose no_mask bitset"Kory Maincent1-26/+6
This reverts commit 108a36d07c01edbc5942d27c92494d1c6e4d45a0. It was reported that this fix breaks the possibility to remove existing WoL flags. For example: ~$ ethtool lan2 ... Supports Wake-on: pg Wake-on: d ... ~$ ethtool -s lan2 wol gp ~$ ethtool lan2 ... Wake-on: pg ... ~$ ethtool -s lan2 wol d ~$ ethtool lan2 ... Wake-on: pg ... This worked correctly before this commit because we were always updating a zero bitmap (since commit 6699170376ab ("ethtool: fix application of verbose no_mask bitset"), that is) so that the rest was left zero naturally. But now the 1->0 change (old_val is true, bit not present in netlink nest) no longer works. Reported-by: Oleksij Rempel <[email protected]> Reported-by: Michal Kubecek <[email protected]> Closes: https://lore.kernel.org/netdev/[email protected]/ Cc: [email protected] Fixes: 108a36d07c01 ("ethtool: Fix mod state of verbose no_mask bitset") Signed-off-by: Kory Maincent <[email protected]> Reviewed-by: Michal Kubecek <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-19Merge tag 'ntfs3_for_6.6' of ↵Linus Torvalds16-82/+197
https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 fixes from Konstantin Komarov: - memory leak - some logic errors, NULL dereferences - some code was refactored - more sanity checks * tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3: fs/ntfs3: Avoid possible memory leak fs/ntfs3: Fix directory element type detection fs/ntfs3: Fix possible null-pointer dereference in hdr_find_e() fs/ntfs3: Fix OOB read in ntfs_init_from_boot fs/ntfs3: fix panic about slab-out-of-bounds caused by ntfs_list_ea() fs/ntfs3: Fix NULL pointer dereference on error in attr_allocate_frame() fs/ntfs3: Fix possible NULL-ptr-deref in ni_readpage_cmpr() fs/ntfs3: Do not allow to change label if volume is read-only fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo fs/ntfs3: Refactoring and comments fs/ntfs3: Fix alternative boot searching fs/ntfs3: Allow repeated call to ntfs3_put_sbi fs/ntfs3: Use inode_set_ctime_to_ts instead of inode_set_ctime fs/ntfs3: Fix shift-out-of-bounds in ntfs_fill_super fs/ntfs3: fix deadlock in mark_as_free_ex fs/ntfs3: Add more attributes checks in mi_enum_attr() fs/ntfs3: Use kvmalloc instead of kmalloc(... __GFP_NOWARN) fs/ntfs3: Write immediately updated ntfs state fs/ntfs3: Add ckeck in ni_update_parent()
2023-10-19Merge branch 'mptcp-fixes-for-v6-6'Jakub Kicinski3-15/+43
Mat Martineau says: ==================== mptcp: Fixes for v6.6 Patch 1 corrects the logic for MP_JOIN tests where 0 RSTs are expected. Patch 2 ensures MPTCP packets are not incorrectly coalesced in the TCP backlog queue. Patch 3 avoids a zero-window probe and associated WARN_ON_ONCE() in an expected MPTCP reinjection scenario. Patches 4 & 5 allow an initial MPTCP subflow to be closed cleanly instead of always sending RST. Associated selftest is updated. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-19selftests: mptcp: join: no RST when rm subflow/addrMatthieu Baerts1-0/+13
Recently, we noticed that some RST were wrongly generated when removing the initial subflow. This patch makes sure RST are not sent when removing any subflows or any addresses. Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures") Cc: [email protected] Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-19mptcp: avoid sending RST when closing the initial subflowGeliang Tang1-6/+22
When closing the first subflow, the MPTCP protocol unconditionally calls tcp_disconnect(), which in turn generates a reset if the subflow is established. That is unexpected and different from what MPTCP does with MPJ subflows, where resets are generated only on FASTCLOSE and other edge scenarios. We can't reuse for the first subflow the same code in place for MPJ subflows, as MPTCP clean them up completely via a tcp_close() call, while must keep the first subflow socket alive for later re-usage, due to implementation constraints. This patch adds a new helper __mptcp_subflow_disconnect() that encapsulates, a logic similar to tcp_close, issuing a reset only when the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown otherwise. Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures") Cc: [email protected] Reviewed-by: Matthieu Baerts <[email protected]> Co-developed-by: Paolo Abeni <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-19mptcp: more conservative check for zero probesPaolo Abeni1-7/+1
Christoph reported that the MPTCP protocol can find the subflow-level write queue unexpectedly not empty while crafting a zero-window probe, hitting a warning: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 188 at net/mptcp/protocol.c:1312 mptcp_sendmsg_frag+0xc06/0xe70 Modules linked in: CPU: 0 PID: 188 Comm: kworker/0:2 Not tainted 6.6.0-rc2-g1176aa719d7a #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:mptcp_sendmsg_frag+0xc06/0xe70 net/mptcp/protocol.c:1312 RAX: 47d0530de347ff6a RBX: 47d0530de347ff6b RCX: ffff8881015d3c00 RDX: ffff8881015d3c00 RSI: 47d0530de347ff6b RDI: 47d0530de347ff6b RBP: 47d0530de347ff6b R08: ffffffff8243c6a8 R09: ffffffff82042d9c R10: 0000000000000002 R11: ffffffff82056850 R12: ffff88812a13d580 R13: 0000000000000001 R14: ffff88812b375e50 R15: ffff88812bbf3200 FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000695118 CR3: 0000000115dfc001 CR4: 0000000000170ef0 Call Trace: <TASK> __subflow_push_pending+0xa4/0x420 net/mptcp/protocol.c:1545 __mptcp_push_pending+0x128/0x3b0 net/mptcp/protocol.c:1614 mptcp_release_cb+0x218/0x5b0 net/mptcp/protocol.c:3391 release_sock+0xf6/0x100 net/core/sock.c:3521 mptcp_worker+0x6e8/0x8f0 net/mptcp/protocol.c:2746 process_scheduled_works+0x341/0x690 kernel/workqueue.c:2630 worker_thread+0x3a7/0x610 kernel/workqueue.c:2784 kthread+0x143/0x180 kernel/kthread.c:388 ret_from_fork+0x4d/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:304 </TASK> The root cause of the issue is that expectations are wrong: e.g. due to MPTCP-level re-injection we can hit the critical condition. Explicitly avoid the zero-window probe when the subflow write queue is not empty and drop the related warnings. Reported-by: Christoph Paasch <[email protected]> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/444 Fixes: f70cad1085d1 ("mptcp: stop relying on tcp_tx_skb_cache") Cc: [email protected] Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-19tcp: check mptcp-level constraints for backlog coalescingPaolo Abeni1-0/+1
The MPTCP protocol can acquire the subflow-level socket lock and cause the tcp backlog usage. When inserting new skbs into the backlog, the stack will try to coalesce them. Currently, we have no check in place to ensure that such coalescing will respect the MPTCP-level DSS, and that may cause data stream corruption, as reported by Christoph. Address the issue by adding the relevant admission check for coalescing in tcp_add_backlog(). Note the issue is not easy to reproduce, as the MPTCP protocol tries hard to avoid acquiring the subflow-level socket lock. Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Cc: [email protected] Reported-by: Christoph Paasch <[email protected]> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/420 Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-19selftests: mptcp: join: correctly check for no RSTMatthieu Baerts1-2/+6
The commit mentioned below was more tolerant with the number of RST seen during a test because in some uncontrollable situations, multiple RST can be generated. But it was not taking into account the case where no RST are expected: this validation was then no longer reporting issues for the 0 RST case because it is not possible to have less than 0 RST in the counter. This patch fixes the issue by adding a specific condition. Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: [email protected] Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-19net: ti: icssg-prueth: Fix r30 CMDs bitmasksMD Danish Anwar1-2/+2
The bitmasks for EMAC_PORT_DISABLE and EMAC_PORT_FORWARD r30 commands are wrong in the driver. Update the bitmasks of these commands to the correct ones as used by the ICSSG firmware. These bitmasks are backwards compatible and work with any ICSSG firmware version. Fixes: e9b4ece7d74b ("net: ti: icssg-prueth: Add Firmware config and classification APIs.") Signed-off-by: MD Danish Anwar <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-19Merge tag 'for-6.6-rc6-tag' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "Fix a bug in chunk size decision that could lead to suboptimal placement and filling patterns" * tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix stripe length calculation for non-zoned data chunk allocation
2023-10-19Merge branch 'net-fix-bugs-in-device-netns-move-and-rename'Paolo Abeni4-15/+141
Jakub Kicinski says: ==================== net: fix bugs in device netns-move and rename Daniel reported issues with the uevents generated during netdev namespace move, if the netdev is getting renamed at the same time. While the issue that he actually cares about is not fixed here, there is a bunch of seemingly obvious other bugs in this code. Fix the purely networking bugs while the discussion around the uevent fix is still ongoing. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-10-19selftests: net: add very basic test for netdev names and namespacesJakub Kicinski2-0/+88
Add selftest for fixes around naming netdevs and namespaces. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-19net: move altnames together with the netdeviceJakub Kicinski1-4/+9
The altname nodes are currently not moved to the new netns when netdevice itself moves: [ ~]# ip netns add test [ ~]# ip -netns test link add name eth0 type dummy [ ~]# ip -netns test link property add dev eth0 altname some-name [ ~]# ip -netns test link show dev some-name 2: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 1e:67:ed:19:3d:24 brd ff:ff:ff:ff:ff:ff altname some-name [ ~]# ip -netns test link set dev eth0 netns 1 [ ~]# ip link ... 3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff altname some-name [ ~]# ip li show dev some-name Device "some-name" does not exist. Remove them from the hash table when device is unlisted and add back when listed again. Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-19net: avoid UAF on deleted altnameJakub Kicinski1-1/+6
Altnames are accessed under RCU (dev_get_by_name_rcu()) but freed by kfree() with no synchronization point. Each node has one or two allocations (node and a variable-size name, sometimes the name is netdev->name). Adding rcu_heads here is a bit tedious. Besides most code which unlists the names already has rcu barriers - so take the simpler approach of adding synchronize_rcu(). Note that the one on the unregistration path (which matters more) is removed by the next fix. Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist") Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-19net: check for altname conflicts when changing netdev's netnsJakub Kicinski2-1/+11
It's currently possible to create an altname conflicting with an altname or real name of another device by creating it in another netns and moving it over: [ ~]$ ip link add dev eth0 type dummy [ ~]$ ip netns add test [ ~]$ ip -netns test link add dev ethX netns test type dummy [ ~]$ ip -netns test link property add dev ethX altname eth0 [ ~]$ ip -netns test link set dev ethX netns 1 [ ~]$ ip link ... 3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff ... 5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff altname eth0 Create a macro for walking the altnames, this hopefully makes it clearer that the list we walk contains only altnames. Which is otherwise not entirely intuitive. Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-19net: fix ifname in netlink ntf during netns moveJakub Kicinski1-13/+31
dev_get_valid_name() overwrites the netdev's name on success. This makes it hard to use in prepare-commit-like fashion, where we do validation first, and "commit" to the change later. Factor out a helper which lets us save the new name to a buffer. Use it to fix the problem of notification on netns move having incorrect name: 5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff 6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff [ ~]# ip link set dev eth0 netns 1 name eth1 ip monitor inside netns: Deleted inet eth0 Deleted inet6 eth0 Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7 Name is reported as eth1 in old netns for ifindex 5, already renamed. Fixes: d90310243fd7 ("net: device name allocation cleanups") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-19net: ethernet: ti: Fix mixed module-builtin objectMD Danish Anwar3-3/+19
With CONFIG_TI_K3_AM65_CPSW_NUSS=y and CONFIG_TI_ICSSG_PRUETH=m, k3-cppi-desc-pool.o is linked to a module and also to vmlinux even though the expected CFLAGS are different between builtins and modules. The build system is complaining about the following: k3-cppi-desc-pool.o is added to multiple modules: icssg-prueth ti-am65-cpsw-nuss Introduce the new module, k3-cppi-desc-pool, to provide the common functions to ti-am65-cpsw-nuss and icssg-prueth. Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver") Signed-off-by: MD Danish Anwar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-10-18Merge tag 'nf-23-10-18' of ↵Jakub Kicinski3-26/+83
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: updates for net First patch, from Phil Sutter, reduces number of audit notifications when userspace requests to re-set stateful objects. This change also comes with a selftest update. Second patch, also from Phil, moves the nftables audit selftest to its own netns to avoid interference with the init netns. Third patch, from Pablo Neira, fixes an inconsistency with the "rbtree" set backend: When set element X has expired, a request to delete element X should fail (like with all other backends). Finally, patch four, also from Pablo, reverts a recent attempt to speed up abort of a large pending update with the "pipapo" set backend. It could cause stray references to remain in the set, which then results in a double-free. * tag 'nf-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: revert do not remove elements if set backend implements .abort netfilter: nft_set_rbtree: .deactivate fails if element has expired selftests: netfilter: Run nft_audit.sh in its own netns netfilter: nf_tables: audit log object reset once per table ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-18Merge tag 'wireless-2023-10-18' of ↵Jakub Kicinski4-8/+6
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== A few more fixes: * prevent value bounce/glitch in rfkill GPIO probe * fix lockdep report in rfkill * fix error path leak in mac80211 key handling * use system_unbound_wq for wiphy work since it can take longer * tag 'wireless-2023-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: net: rfkill: reduce data->mtx scope in rfkill_fop_open net: rfkill: gpio: prevent value glitch during probe wifi: mac80211: fix error path key leak wifi: cfg80211: use system_unbound_wq for wiphy work ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-18net: phy: bcm7xxx: Add missing 16nm EPHY statisticsFlorian Fainelli1-0/+3
The .probe() function would allocate the necessary space and ensure that the library call sizes the number of statistics but the callbacks necessary to fetch the name and values were not wired up. Reported-by: Justin Chen <[email protected]> Fixes: f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165") Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-18ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddrEric Dumazet1-5/+9
syzbot reported a data-race while accessing nh->nh_saddr_genid [1] Add annotations, but leave the code lazy as intended. [1] BUG: KCSAN: data-race in fib_select_path / fib_select_path write to 0xffff8881387166f0 of 4 bytes by task 6778 on cpu 1: fib_info_update_nhc_saddr net/ipv4/fib_semantics.c:1334 [inline] fib_result_prefsrc net/ipv4/fib_semantics.c:1354 [inline] fib_select_path+0x292/0x330 net/ipv4/fib_semantics.c:2269 ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810 ip_route_output_key_hash net/ipv4/route.c:2644 [inline] __ip_route_output_key include/net/route.h:134 [inline] ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872 send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61 wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 read to 0xffff8881387166f0 of 4 bytes by task 6759 on cpu 0: fib_result_prefsrc net/ipv4/fib_semantics.c:1350 [inline] fib_select_path+0x1cb/0x330 net/ipv4/fib_semantics.c:2269 ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810 ip_route_output_key_hash net/ipv4/route.c:2644 [inline] __ip_route_output_key include/net/route.h:134 [inline] ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872 send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61 wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 value changed: 0x959d3217 -> 0x959d3218 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 6759 Comm: kworker/u4:15 Not tainted 6.6.0-rc4-syzkaller-00029-gcbf3a2cb156a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023 Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker Fixes: 436c3b66ec98 ("ipv4: Invalidate nexthop cache nh_saddr more correctly.") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-18tcp_bpf: properly release resources on error pathsPaolo Abeni1-4/+12
In the blamed commit below, I completely forgot to release the acquired resources before erroring out in the TCP BPF code, as reported by Dan. Address the issues by replacing the bogus return with a jump to the relevant cleanup code. Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Acked-by: Jakub Sitnicki <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/r/8f99194c698bcef12666f0a9a999c58f8b1cb52c.1697557782.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-18net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curvePedro Tammela1-4/+14
Christian Theune says: I upgraded from 6.1.38 to 6.1.55 this morning and it broke my traffic shaping script, leaving me with a non-functional uplink on a remote router. A 'rt' curve cannot be used as a inner curve (parent class), but we were allowing such configurations since the qdisc was introduced. Such configurations would trigger a UAF as Budimir explains: The parent will have vttree_insert() called on it in init_vf(), but will not have vttree_remove() called on it in update_vf() because it does not have the HFSC_FSC flag set. The qdisc always assumes that inner classes have the HFSC_FSC flag set. This is by design as it doesn't make sense 'qdisc wise' for an 'rt' curve to be an inner curve. Budimir's original patch disallows users to add classes with a 'rt' parent, but this is too strict as it breaks users that have been using 'rt' as a inner class. Another approach, taken by this patch, is to upgrade the inner 'rt' into a 'sc', warning the user in the process. It avoids the UAF reported by Budimir while also being more permissive to bad scripts/users/code using 'rt' as a inner class. Users checking the `tc class ls [...]` or `tc class get [...]` dumps would observe the curve change and are potentially breaking with this change. v1->v2: https://lore.kernel.org/all/[email protected]/ - Correct 'Fixes' tag and merge with revert (Jakub) Cc: Christian Theune <[email protected]> Cc: Budimir Markovic <[email protected]> Fixes: b3d26c5702c7 ("net/sched: sch_hfsc: Ensure inner classes have fsc curve") Signed-off-by: Pedro Tammela <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-18net: mdio-mux: fix C45 access returning -EIO after API changeVladimir Oltean1-0/+47
The mii_bus API conversion to read_c45() and write_c45() did not cover the mdio-mux driver before read() and write() were made C22-only. This broke arch/arm64/boot/dts/freescale/fsl-ls1028a-qds-13bb.dtso. The -EOPNOTSUPP from mdiobus_c45_read() is transformed by get_phy_c45_devs_in_pkg() into -EIO, is further propagated to of_mdiobus_register() and this makes the mdio-mux driver fail to probe the entire child buses, not just the PHYs that cause access errors. Fix the regression by introducing special c45 read and write accessors to mdio-mux which forward the operation to the parent MDIO bus. Fixes: db1a63aed89c ("net: phy: Remove fallback to old C45 method") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-18tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skbEric Dumazet1-2/+14
In commit 75eefc6c59fd ("tcp: tsq: add a shortcut in tcp_small_queue_check()") we allowed to send an skb regardless of TSQ limits being hit if rtx queue was empty or had a single skb, in order to better fill the pipe when/if TX completions were slow. Then later, commit 75c119afe14f ("tcp: implement rb-tree based retransmit queue") accidentally removed the special case for one skb in rtx queue. Stefan Wahren reported a regression in single TCP flow throughput using a 100Mbit fec link, starting from commit 65466904b015 ("tcp: adjust TSO packet sizes based on min_rtt"). This last commit only made the regression more visible, because it locked the TCP flow on a particular behavior where TSQ prevented two skbs being pushed downstream, adding silences on the wire between each TSO packet. Many thanks to Stefan for his invaluable help ! Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue") Link: https://lore.kernel.org/netdev/[email protected]/ Reported-by: Stefan Wahren <[email protected]> Tested-by: Stefan Wahren <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Neal Cardwell <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-18octeon_ep: update BQL sent bytes before ringing doorbellShinas Rasheed1-7/+6
Sometimes Tx is completed immediately after doorbell is updated, which causes Tx completion routing to update completion bytes before the same packet bytes are updated in sent bytes in transmit function, hence hitting BUG_ON() in dql_completed(). To avoid this, update BQL sent bytes before ringing doorbell. Fixes: 37d79d059606 ("octeon_ep: add Tx/Rx processing and interrupt support") Signed-off-by: Shinas Rasheed <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-18perf/benchmark: fix seccomp_unotify benchmark for 32-bitJiri Slaby (SUSE)1-1/+1
Commit 7d5cb68af638 (perf/benchmark: add a new benchmark for seccom_unotify) added a reference to __NR_seccomp into perf. This is fine as it added also a definition of __NR_seccomp for 64-bit. But it failed to do so for 32-bit as instead of ifndef, ifdef was used. Fix this typo (so fix the build of perf on 32-bit). Fixes: 7d5cb68af638 (perf/benchmark: add a new benchmark for seccom_unotify) Cc: Andrei Vagin <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: "Jiri Slaby (SUSE)" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2023-10-18Merge tag 'spi-fix-v6-6-rc4' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "A fix for the npcm-fiu driver in cases where there are no dummy bytes during reads" * tag 'spi-fix-v6-6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: npcm-fiu: Fix UMA reads when dummy.nbytes == 0
2023-10-18Merge tag 'regmap-fix-v6.6-rc6' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "A straightforward fix from Johan for a long standing bug in cases where we both have regmaps without devices and something is using dev_get_regmap()" * tag 'regmap-fix-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: fix NULL deref on lookup
2023-10-18netfilter: nf_tables: revert do not remove elements if set backend ↵Pablo Neira Ayuso1-4/+1
implements .abort nf_tables_abort_release() path calls nft_set_elem_destroy() for NFT_MSG_NEWSETELEM which releases the element, however, a reference to the element still remains in the working copy. Fixes: ebd032fa8818 ("netfilter: nf_tables: do not remove elements if set backend implements .abort") Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Florian Westphal <[email protected]>
2023-10-18netfilter: nft_set_rbtree: .deactivate fails if element has expiredPablo Neira Ayuso1-0/+2
This allows to remove an expired element which is not possible in other existing set backends, this is more noticeable if gc-interval is high so expired elements remain in the tree. On-demand gc also does not help in this case, because this is delete element path. Return NULL if element has expired. Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Florian Westphal <[email protected]>
2023-10-18selftests: netfilter: Run nft_audit.sh in its own netnsPhil Sutter1-0/+6
Don't mess with the host's firewall ruleset. Since audit logging is not per-netns, add an initial delay of a second so other selftests' netns cleanups have a chance to finish. Fixes: e8dbde59ca3f ("selftests: netfilter: Test nf_tables audit logging") Signed-off-by: Phil Sutter <[email protected]> Signed-off-by: Florian Westphal <[email protected]>