aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-10-24tcp: fix indefinite deferral of RTO with SACK renegingNeal Cardwell1-1/+2
This commit fixes a bug that can cause a TCP data sender to repeatedly defer RTOs when encountering SACK reneging. The bug is that when we're in fast recovery in a scenario with SACK reneging, every time we get an ACK we call tcp_check_sack_reneging() and it can note the apparent SACK reneging and rearm the RTO timer for srtt/2 into the future. In some SACK reneging scenarios that can happen repeatedly until the receive window fills up, at which point the sender can't send any more, the ACKs stop arriving, and the RTO fires at srtt/2 after the last ACK. But that can take far too long (O(10 secs)), since the connection is stuck in fast recovery with a low cwnd that cannot grow beyond ssthresh, even if more bandwidth is available. This fix changes the logic in tcp_check_sack_reneging() to only rearm the RTO timer if data is cumulatively ACKed, indicating forward progress. This avoids this kind of nearly infinite loop of RTO timer re-arming. In addition, this meets the goals of tcp_check_sack_reneging() in handling Windows TCP behavior that looks temporarily like SACK reneging but is not really. Many thanks to Jakub Kicinski and Neil Spring, who reported this issue and provided critical packet traces that enabled root-causing this issue. Also, many thanks to Jakub Kicinski for testing this fix. Fixes: 5ae344c949e7 ("tcp: reduce spurious retransmits due to transient SACK reneging") Reported-by: Jakub Kicinski <[email protected]> Reported-by: Neil Spring <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Cc: Yuchung Cheng <[email protected]> Tested-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-24Merge tag 'for-netdev' of ↵Jakub Kicinski8-5/+69
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2022-10-23 We've added 7 non-merge commits during the last 18 day(s) which contain a total of 8 files changed, 69 insertions(+), 5 deletions(-). The main changes are: 1) Wait for busy refill_work when destroying bpf memory allocator, from Hou. 2) Allow bpf_user_ringbuf_drain() callbacks to return 1, from David. 3) Fix dispatcher patchable function entry to 5 bytes nop, from Jiri. 4) Prevent decl_tag from being referenced in func_proto, from Stanislav. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Use __llist_del_all() whenever possbile during memory draining bpf: Wait for busy refill_work when destroying bpf memory allocator bpf: Fix dispatcher patchable function entry to 5 bytes nop bpf: prevent decl_tag from being referenced in func_proto selftests/bpf: Add reproducer for decl_tag in func_proto return type selftests/bpf: Make bpf_user_ringbuf_drain() selftest callback return 1 bpf: Allow bpf_user_ringbuf_drain() callbacks to return 1 ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-24tcp: fix a signed-integer-overflow bug in tcp_add_backlog()Lu Wei1-1/+3
The type of sk_rcvbuf and sk_sndbuf in struct sock is int, and in tcp_add_backlog(), the variable limit is caculated by adding sk_rcvbuf, sk_sndbuf and 64 * 1024, it may exceed the max value of int and overflow. This patch reduces the limit budget by halving the sndbuf to solve this issue since ACK packets are much smaller than the payload. Fixes: c9c3321257e1 ("tcp: add tcp_add_backlog()") Signed-off-by: Lu Wei <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-24net: lantiq_etop: don't free skb when returning NETDEV_TX_BUSYZhang Changzhong1-1/+0
The ndo_start_xmit() method must not free skb when returning NETDEV_TX_BUSY, since caller is going to requeue freed skb. Fixes: 504d4721ee8e ("MIPS: Lantiq: Add ethernet driver") Signed-off-by: Zhang Changzhong <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-24net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failedZhengchao Shao1-0/+7
When the ops_init() interface is invoked to initialize the net, but ops->init() fails, data is released. However, the ptr pointer in net->gen is invalid. In this case, when nfqnl_nf_hook_drop() is invoked to release the net, invalid address access occurs. The process is as follows: setup_net() ops_init() data = kzalloc(...) ---> alloc "data" net_assign_generic() ---> assign "date" to ptr in net->gen ... ops->init() ---> failed ... kfree(data); ---> ptr in net->gen is invalid ... ops_exit_list() ... nfqnl_nf_hook_drop() *q = nfnl_queue_pernet(net) ---> q is invalid The following is the Call Trace information: BUG: KASAN: use-after-free in nfqnl_nf_hook_drop+0x264/0x280 Read of size 8 at addr ffff88810396b240 by task ip/15855 Call Trace: <TASK> dump_stack_lvl+0x8e/0xd1 print_report+0x155/0x454 kasan_report+0xba/0x1f0 nfqnl_nf_hook_drop+0x264/0x280 nf_queue_nf_hook_drop+0x8b/0x1b0 __nf_unregister_net_hook+0x1ae/0x5a0 nf_unregister_net_hooks+0xde/0x130 ops_exit_list+0xb0/0x170 setup_net+0x7ac/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> Allocated by task 15855: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0xa1/0xb0 __kmalloc+0x49/0xb0 ops_init+0xe7/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 15855: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x155/0x1b0 slab_free_freelist_hook+0x11b/0x220 __kmem_cache_free+0xa4/0x360 ops_init+0xb9/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: f875bae06533 ("net: Automatically allocate per namespace data.") Signed-off-by: Zhengchao Shao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-24docs: netdev: offer performance feedback to contributorsJakub Kicinski1-0/+10
Some of us gotten used to producing large quantities of peer feedback at work, every 3 or 6 months. Extending the same courtesy to community members seems like a logical step. It may be hard for some folks to get validation of how important their work is internally, especially at smaller companies which don't employ many kernel experts. The concept of "peer feedback" may be a hyperscaler / silicon valley thing so YMMV. Hopefully we can build more context as we go. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-24Merge branch 'kcm-data-races'David S. Miller1-8/+15
Eric Dumazet says: ==================== kcm: annotate data-races This series address two different syzbot reports for KCM. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-10-24kcm: annotate data-races around kcm->rx_waitEric Dumazet1-6/+11
kcm->rx_psock can be read locklessly in kcm_rfree(). Annotate the read and writes accordingly. syzbot reported: BUG: KCSAN: data-race in kcm_rcv_strparser / kcm_rfree write to 0xffff88810784e3d0 of 1 bytes by task 1823 on cpu 1: reserve_rx_kcm net/kcm/kcmsock.c:283 [inline] kcm_rcv_strparser+0x250/0x3a0 net/kcm/kcmsock.c:363 __strp_recv+0x64c/0xd20 net/strparser/strparser.c:301 strp_recv+0x6d/0x80 net/strparser/strparser.c:335 tcp_read_sock+0x13e/0x5a0 net/ipv4/tcp.c:1703 strp_read_sock net/strparser/strparser.c:358 [inline] do_strp_work net/strparser/strparser.c:406 [inline] strp_work+0xe8/0x180 net/strparser/strparser.c:415 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 read to 0xffff88810784e3d0 of 1 bytes by task 17869 on cpu 0: kcm_rfree+0x121/0x220 net/kcm/kcmsock.c:181 skb_release_head_state+0x8e/0x160 net/core/skbuff.c:841 skb_release_all net/core/skbuff.c:852 [inline] __kfree_skb net/core/skbuff.c:868 [inline] kfree_skb_reason+0x5c/0x260 net/core/skbuff.c:891 kfree_skb include/linux/skbuff.h:1216 [inline] kcm_recvmsg+0x226/0x2b0 net/kcm/kcmsock.c:1161 ____sys_recvmsg+0x16c/0x2e0 ___sys_recvmsg net/socket.c:2743 [inline] do_recvmmsg+0x2f1/0x710 net/socket.c:2837 __sys_recvmmsg net/socket.c:2916 [inline] __do_sys_recvmmsg net/socket.c:2939 [inline] __se_sys_recvmmsg net/socket.c:2932 [inline] __x64_sys_recvmmsg+0xde/0x160 net/socket.c:2932 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x01 -> 0x00 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 17869 Comm: syz-executor.2 Not tainted 6.1.0-rc1-syzkaller-00010-gbb1a1146467a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-24kcm: annotate data-races around kcm->rx_psockEric Dumazet1-3/+5
kcm->rx_psock can be read locklessly in kcm_rfree(). Annotate the read and writes accordingly. We do the same for kcm->rx_wait in the following patch. syzbot reported: BUG: KCSAN: data-race in kcm_rfree / unreserve_rx_kcm write to 0xffff888123d827b8 of 8 bytes by task 2758 on cpu 1: unreserve_rx_kcm+0x72/0x1f0 net/kcm/kcmsock.c:313 kcm_rcv_strparser+0x2b5/0x3a0 net/kcm/kcmsock.c:373 __strp_recv+0x64c/0xd20 net/strparser/strparser.c:301 strp_recv+0x6d/0x80 net/strparser/strparser.c:335 tcp_read_sock+0x13e/0x5a0 net/ipv4/tcp.c:1703 strp_read_sock net/strparser/strparser.c:358 [inline] do_strp_work net/strparser/strparser.c:406 [inline] strp_work+0xe8/0x180 net/strparser/strparser.c:415 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 read to 0xffff888123d827b8 of 8 bytes by task 5859 on cpu 0: kcm_rfree+0x14c/0x220 net/kcm/kcmsock.c:181 skb_release_head_state+0x8e/0x160 net/core/skbuff.c:841 skb_release_all net/core/skbuff.c:852 [inline] __kfree_skb net/core/skbuff.c:868 [inline] kfree_skb_reason+0x5c/0x260 net/core/skbuff.c:891 kfree_skb include/linux/skbuff.h:1216 [inline] kcm_recvmsg+0x226/0x2b0 net/kcm/kcmsock.c:1161 ____sys_recvmsg+0x16c/0x2e0 ___sys_recvmsg net/socket.c:2743 [inline] do_recvmmsg+0x2f1/0x710 net/socket.c:2837 __sys_recvmmsg net/socket.c:2916 [inline] __do_sys_recvmmsg net/socket.c:2939 [inline] __se_sys_recvmmsg net/socket.c:2932 [inline] __x64_sys_recvmmsg+0xde/0x160 net/socket.c:2932 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffff88812971ce00 -> 0x0000000000000000 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 5859 Comm: syz-executor.3 Not tainted 6.0.0-syzkaller-12189-g19d17ab7c68b-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-24net: fman: Use physical address for userspace interfacesSean Anderson4-10/+10
Before 262f2b782e25 ("net: fman: Map the base address once"), the physical address of the MAC was exposed to userspace in two places: via sysfs and via SIOCGIFMAP. While this is not best practice, it is an external ABI which is in use by userspace software. The aforementioned commit inadvertently modified these addresses and made them virtual. This constitutes and ABI break. Additionally, it leaks the kernel's memory layout to userspace. Partially revert that commit, reintroducing the resource back into struct mac_device, while keeping the intended changes (the rework of the address mapping). Fixes: 262f2b782e25 ("net: fman: Map the base address once") Reported-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Sean Anderson <[email protected]> Acked-by: Madalin Bucur <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-24net/mlx5e: Cleanup MACsec uninitialization routineLeon Romanovsky1-10/+1
The mlx5e_macsec_cleanup() routine has NULL pointer dereferencing if mlx5 device doesn't support MACsec (priv->macsec will be NULL). While at it delete comment line, assignment and extra blank lines, so fix everything in one patch. Fixes: 1f53da676439 ("net/mlx5e: Create advanced steering operation (ASO) object for MACsec") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-24atlantic: fix deadlock at aq_nic_stopÍñigo Huguet2-24/+74
NIC is stopped with rtnl_lock held, and during the stop it cancels the 'service_task' work and free irqs. However, if CONFIG_MACSEC is set, rtnl_lock is acquired both from aq_nic_service_task and aq_linkstate_threaded_isr. Then a deadlock happens if aq_nic_stop tries to cancel/disable them when they've already started their execution. As the deadlock is caused by rtnl_lock, it causes many other processes to stall, not only atlantic related stuff. Fix it by introducing a mutex that protects each NIC's macsec related data, and locking it instead of the rtnl_lock from the service task and the threaded IRQ. Before this patch, all macsec data was protected with rtnl_lock, but maybe not all of it needs to be protected. With this new mutex, further efforts can be made to limit the protected data only to that which requires it. However, probably it doesn't worth it because all macsec's data accesses are infrequent, and almost all are done from macsec_ops or ethtool callbacks, called holding rtnl_lock, so macsec_mutex won't never be much contended. The issue appeared repeteadly attaching and deattaching the NIC to a bond interface. Doing that after this patch I cannot reproduce the bug. Fixes: 62c1c2e606f6 ("net: atlantic: MACSec offload skeleton") Reported-by: Li Liang <[email protected]> Suggested-by: Andrew Lunn <[email protected]> Signed-off-by: Íñigo Huguet <[email protected]> Reviewed-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-21nfp: only clean `sp_indiff` when application firmware is unloadedYinjun Zhang1-23/+15
Currently `sp_indiff` is cleaned when driver is removed. This will cause problem in multi-PF/multi-host case, considering one PF is removed while another is still in use. Since `sp_indiff` is the application firmware property, it should only be cleaned when the firmware is unloaded. Now let management firmware to clean it when necessary, driver only set it. Fixes: b1e4f11e426d ("nfp: refine the ABI of getting `sp_indiff` info") Signed-off-by: Yinjun Zhang <[email protected]> Signed-off-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-21Merge branch 'amd-xgbe-miscellaneous-fixes'Jakub Kicinski3-21/+68
Raju Rangoju says: ==================== amd-xgbe: Miscellaneous fixes (1) Fix the rrc for Yellow carp devices. CDR workaround path is disabled for YC devices, receiver reset cycle is not needed in such cases. (2) Add enumerations for mailbox command and sub-commands. Instead of using hard-coded values, use enums. (3) Enable PLL_CTL for fixed PHY modes only. Driver does not implement SW RRCM for Autoneg Off configuration, hence PLL is needed for fixed PHY modes only. (4) Fix the SFP compliance codes check for DAC cables. Some of the passive cables have non-zero data at offset 6 in SFP EEPROM data. So, fix the sfp compliance codes check. (5) Add a quirk for Molex passive cables to extend the rate ceiling to 0x78. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-21amd-xgbe: add the bit rate quirk for Molex cablesRaju Rangoju1-1/+8
The offset 12 (bit-rate) of EEPROM SFP DAC (passive) cables is expected to be in the range 0x64 to 0x68. However, the 5 meter and 7 meter Molex passive cables have the rate ceiling 0x78 at offset 12. Add a quirk for Molex passive cables to extend the rate ceiling to 0x78. Fixes: abf0a1c2b26a ("amd-xgbe: Add support for SFP+ modules") Signed-off-by: Raju Rangoju <[email protected]> Acked-by: Tom Lendacky <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-21amd-xgbe: fix the SFP compliance codes check for DAC cablesRaju Rangoju1-4/+4
The current XGBE code assumes that offset 6 of EEPROM SFP DAC (passive) cables is NULL. However, some cables (the 5 meter and 7 meter Molex passive cables) have non-zero data at offset 6. Fix the logic by moving the passive cable check above the active checks, so as not to be improperly identified as an active cable. This will fix the issue for any passive cable that advertises 1000Base-CX in offset 6. Fixes: abf0a1c2b26a ("amd-xgbe: Add support for SFP+ modules") Signed-off-by: Raju Rangoju <[email protected]> Acked-by: Tom Lendacky <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-21amd-xgbe: enable PLL_CTL for fixed PHY modes onlyRaju Rangoju1-2/+8
PLL control setting(RRC) is needed only in fixed PHY configuration to fix the peer-peer issues. Without the PLL control setting, the link up takes longer time in a fixed phy configuration. Driver implements SW RRC for Autoneg On configuration, hence PLL control setting (RRC) is not needed for AN On configuration, and can be skipped. Also, PLL re-initialization is not needed for PHY Power Off and RRC commands. Otherwise, they lead to mailbox errors. Added the changes accordingly. Fixes: daf182d360e5 ("net: amd-xgbe: Toggle PLL settings during rate change") Signed-off-by: Raju Rangoju <[email protected]> Acked-by: Tom Lendacky <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-21amd-xgbe: use enums for mailbox cmd and sub_cmdsRaju Rangoju2-13/+41
Instead of using hardcoded values, use enumerations for mailbox command and sub commands. Signed-off-by: Raju Rangoju <[email protected]> Acked-by: Tom Lendacky <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-21amd-xgbe: Yellow carp devices do not need rrcRaju Rangoju3-1/+7
Link stability issues are noticed on Yellow carp platforms when Receiver Reset Cycle is issued. Since the CDR workaround is disabled on these platforms, the Receiver Reset Cycle is not needed. So, avoid issuing rrc on Yellow carp platforms. Fixes: dbb6c58b5a61 ("net: amd-xgbe: Add Support for Yellow Carp Ethernet device") Signed-off-by: Raju Rangoju <[email protected]> Acked-by: Tom Lendacky <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-21Merge branch 'Wait for busy refill_work when destroying bpf memory allocator'Alexei Starovoitov1-2/+16
Hou Tao says: ==================== From: Hou Tao <[email protected]> Hi, The patchset aims to fix one problem of bpf memory allocator destruction when there is PREEMPT_RT kernel or kernel with arch_irq_work_has_interrupt() being false (e.g. 1-cpu arm32 host or mips). The root cause is that there may be busy refill_work when the allocator is destroying and it may incur oops or other problems as shown in patch #1. Patch #1 fixes the problem by waiting for the completion of irq work during destroying and patch #2 is just a clean-up patch based on patch #1. Please see individual patches for more details. Comments are always welcome. Change Log: v2: * patch 1: fix typos and add notes about the overhead of irq_work_sync() * patch 1 & 2: add Acked-by tags from [email protected] v1: https://lore.kernel.org/bpf/[email protected]/T/#t ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2022-10-21bpf: Use __llist_del_all() whenever possbile during memory drainingHou Tao1-2/+5
Except for waiting_for_gp list, there are no concurrent operations on free_by_rcu, free_llist and free_llist_extra lists, so use __llist_del_all() instead of llist_del_all(). waiting_for_gp list can be deleted by RCU callback concurrently, so still use llist_del_all(). Acked-by: Stanislav Fomichev <[email protected]> Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-10-21bpf: Wait for busy refill_work when destroying bpf memory allocatorHou Tao1-0/+11
A busy irq work is an unfinished irq work and it can be either in the pending state or in the running state. When destroying bpf memory allocator, refill_work may be busy for PREEMPT_RT kernel in which irq work is invoked in a per-CPU RT-kthread. It is also possible for kernel with arch_irq_work_has_interrupt() being false (e.g. 1-cpu arm32 host or mips) and irq work is inovked in timer interrupt. The busy refill_work leads to various issues. The obvious one is that there will be concurrent operations on free_by_rcu and free_list between irq work and memory draining. Another one is call_rcu_in_progress will not be reliable for the checking of pending RCU callback because do_call_rcu() may have not been invoked by irq work yet. The other is there will be use-after-free if irq work is freed before the callback of irq work is invoked as shown below: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page PGD 12ab94067 P4D 12ab94067 PUD 1796b4067 PMD 0 Oops: 0010 [#1] PREEMPT_RT SMP CPU: 5 PID: 64 Comm: irq_work/5 Not tainted 6.0.0-rt11+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0018:ffffadc080293e78 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffffcdc07fb6a388 RCX: ffffa05000a2e000 RDX: ffffa05000a2e000 RSI: ffffffff96cc9827 RDI: ffffcdc07fb6a388 ...... Call Trace: <TASK> irq_work_single+0x24/0x60 irq_work_run_list+0x24/0x30 run_irq_workd+0x23/0x30 smpboot_thread_fn+0x203/0x300 kthread+0x126/0x150 ret_from_fork+0x1f/0x30 </TASK> Considering the ease of concurrency handling, no overhead for irq_work_sync() under non-PREEMPT_RT kernel and has-irq-work-interrupt kernel and the short wait time used for irq_work_sync() under PREEMPT_RT (When running two test_maps on PREEMPT_RT kernel and 72-cpus host, the max wait time is about 8ms and the 99th percentile is 10us), just using irq_work_sync() to wait for busy refill_work to complete before memory draining and memory freeing. Fixes: 7c8199e24fa0 ("bpf: Introduce any context BPF specific memory allocator.") Acked-by: Stanislav Fomichev <[email protected]> Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-10-21MAINTAINERS: add keyword match on PTPJakub Kicinski1-0/+1
Most of PTP drivers live under ethernet and we have to keep telling people to CC the PTP maintainers. Let's try a keyword match, we can refine as we go if it causes false positives. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-21ethtool: pse-pd: fix null-deref on genl_info in dumpJakub Kicinski1-1/+1
ethnl_default_dump_one() passes NULL as info. It's correct not to set extack during dump, as we should just silently skip interfaces which can't provide the information. Reported-by: [email protected] Fixes: 18ff0bcda6d1 ("ethtool: add interface to interact with Ethernet Power Equipment") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Oleksij Rempel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-20nfc: virtual_ncidev: Fix memory leak in virtual_nci_send()Shang XiaoJing1-0/+3
skb should be free in virtual_nci_send(), otherwise kmemleak will report memleak. Steps for reproduction (simulated in qemu): cd tools/testing/selftests/nci make ./nci_dev BUG: memory leak unreferenced object 0xffff888107588000 (size 208): comm "nci_dev", pid 206, jiffies 4294945376 (age 368.248s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000008d94c8fd>] __alloc_skb+0x1da/0x290 [<00000000278bc7f8>] nci_send_cmd+0xa3/0x350 [<0000000081256a22>] nci_reset_req+0x6b/0xa0 [<000000009e721112>] __nci_request+0x90/0x250 [<000000005d556e59>] nci_dev_up+0x217/0x5b0 [<00000000e618ce62>] nfc_dev_up+0x114/0x220 [<00000000981e226b>] nfc_genl_dev_up+0x94/0xe0 [<000000009bb03517>] genl_family_rcv_msg_doit.isra.14+0x228/0x2d0 [<00000000b7f8c101>] genl_rcv_msg+0x35c/0x640 [<00000000c94075ff>] netlink_rcv_skb+0x11e/0x350 [<00000000440cfb1e>] genl_rcv+0x24/0x40 [<0000000062593b40>] netlink_unicast+0x43f/0x640 [<000000001d0b13cc>] netlink_sendmsg+0x73a/0xbf0 [<000000003272487f>] __sys_sendto+0x324/0x370 [<00000000ef9f1747>] __x64_sys_sendto+0xdd/0x1b0 [<000000001e437841>] do_syscall_64+0x3f/0x90 Fixes: e624e6c3e777 ("nfc: Add a virtual nci device driver") Signed-off-by: Shang XiaoJing <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20net: macb: Specify PHY PM management done by MACSergiu Moga1-0/+1
The `macb_resume`/`macb_suspend` methods already call the `phylink_start`/`phylink_stop` methods during their execution so explicitly say that the PM of the PHY is done by MAC by using the `mac_managed_pm` flag of the `struct phylink_config`. This also fixes the warning message issued during resume: WARNING: CPU: 0 PID: 237 at drivers/net/phy/phy_device.c:323 mdio_bus_phy_resume+0x144/0x148 Depends-on: 96de900ae78e ("net: phylink: add mac_managed_pm in phylink_config structure") Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Signed-off-by: Sergiu Moga <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Claudiu Beznea <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20Merge branch 'fix-some-issues-in-huawei-hinic-driver'Jakub Kicinski4-9/+14
Zhengchao Shao says: ==================== fix some issues in Huawei hinic driver Fix some issues in Huawei hinic driver. This patchset is compiled only, not tested. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20net: hinic: fix the issue of double release MBOX callback of VFZhengchao Shao1-1/+0
In hinic_vf_func_init(), if VF fails to register information with PF through the MBOX, the MBOX callback function of VF is released once. But it is released again in hinic_init_hwdev(). Remove one. Fixes: 7dd29ee12865 ("hinic: add sriov feature support") Signed-off-by: Zhengchao Shao <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20net: hinic: fix the issue of CMDQ memory leaksZhengchao Shao1-1/+1
When hinic_set_cmdq_depth() fails in hinic_init_cmdqs(), the cmdq memory is not released correctly. Fix it. Fixes: 72ef908bb3ff ("hinic: add three net_device_ops of vf") Signed-off-by: Zhengchao Shao <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20net: hinic: fix memory leak when reading function tableZhengchao Shao1-6/+12
When the input parameter idx meets the expected case option in hinic_dbg_get_func_table(), read_data is not released. Fix it. Fixes: 5215e16244ee ("hinic: add support to query function table") Signed-off-by: Zhengchao Shao <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20net: hinic: fix incorrect assignment issue in hinic_set_interrupt_cfg()Zhengchao Shao1-1/+1
The value of lli_credit_cnt is incorrectly assigned, fix it. Fixes: a0337c0dee68 ("hinic: add support to set and get irq coalesce") Signed-off-by: Zhengchao Shao <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20Merge branch 'selftests-net-fix-problems-in-some-drivers-net-tests'Jakub Kicinski9-9/+18
Benjamin Poirier says: ==================== selftests: net: Fix problems in some drivers/net tests Fix two problems mostly introduced in commit bbb774d921e2 ("net: Add tests for bonding and team address list management"). ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20selftests: net: Fix netdev name mismatch in cleanupBenjamin Poirier1-1/+1
lag_lib.sh creates the interfaces dummy1 and dummy2 whereas dev_addr_lists.sh:destroy() deletes the interfaces dummy0 and dummy1. Fix the mismatch in names. Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") Signed-off-by: Benjamin Poirier <[email protected]> Reviewed-by: Jonathan Toppins <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20selftests: net: Fix cross-tree inclusion of scriptsBenjamin Poirier9-8/+17
When exporting and running a subset of selftests via kselftest, files from parts of the source tree which were not exported are not available. A few tests are trying to source such files. Address the problem by using symlinks. The problem can be reproduced by running: make -C tools/testing/selftests gen_tar TARGETS="drivers/net/bonding" [... extract archive ...] ./run_kselftest.sh or: make kselftest KBUILD_OUTPUT=/tmp/kselftests TARGETS="drivers/net/bonding" Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") Fixes: eccd0a80dc7f ("selftests: net: dsa: add a stress test for unlocked FDB operations") Link: https://lore.kernel.org/netdev/[email protected]/ Reported-by: Jonathan Toppins <[email protected]> Signed-off-by: Benjamin Poirier <[email protected]> Reviewed-by: Jonathan Toppins <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20net: lan966x: Fix the rx drop counterHoratiu Vultur1-1/+9
Currently the rx drop is calculated as the sum of multiple HW drop counters. The issue is that not all the HW drop counters were added for the rx drop counter. So if for example you have a police that drops frames, they were not see in the rx drop counter. Fix this by updating how the rx drop counter is calculated. It is required to add also RX_RED_PRIO_* HW counters. Fixes: 12c2d0a5b8e2 ("net: lan966x: add ethtool configuration and statistics") Signed-off-by: Horatiu Vultur <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20net: netsec: fix error handling in netsec_register_mdio()Yang Yingliang1-0/+2
If phy_device_register() fails, phy_device_free() need be called to put refcount, so memory of phy device and device name can be freed in callback function. If get_phy_device() fails, mdiobus_unregister() need be called, or it will cause warning in mdiobus_free() and kobject is leaked. Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver") Signed-off-by: Yang Yingliang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20tipc: fix a null-ptr-deref in tipc_topsrv_acceptXin Long1-4/+12
syzbot found a crash in tipc_topsrv_accept: KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] Workqueue: tipc_rcv tipc_topsrv_accept RIP: 0010:kernel_accept+0x22d/0x350 net/socket.c:3487 Call Trace: <TASK> tipc_topsrv_accept+0x197/0x280 net/tipc/topsrv.c:460 process_one_work+0x991/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e4/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 It was caused by srv->listener that might be set to null by tipc_topsrv_stop() in net .exit whereas it's still used in tipc_topsrv_accept() worker. srv->listener is protected by srv->idr_lock in tipc_topsrv_stop(), so add a check for srv->listener under srv->idr_lock in tipc_topsrv_accept() to avoid the null-ptr-deref. To ensure the lsock is not released during the tipc_topsrv_accept(), move sock_release() after tipc_topsrv_work_stop() where it's waiting until the tipc_topsrv_accept worker to be done. Note that sk_callback_lock is used to protect sk->sk_user_data instead of srv->listener, and it should check srv in tipc_topsrv_listener_data_ready() instead. This also ensures that no more tipc_topsrv_accept worker will be started after tipc_conn_close() is called in tipc_topsrv_stop() where it sets sk->sk_user_data to null. Fixes: 0ef897be12b8 ("tipc: separate topology server listener socket from subcsriber sockets") Reported-by: [email protected] Signed-off-by: Xin Long <[email protected]> Acked-by: Jon Maloy <[email protected]> Link: https://lore.kernel.org/r/4eee264380c409c61c6451af1059b7fb271a7e7b.1666120790.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-20bpf: Fix dispatcher patchable function entry to 5 bytes nopJiri Olsa3-1/+32
The patchable_function_entry(5) might output 5 single nop instructions (depends on toolchain), which will clash with bpf_arch_text_poke check for 5 bytes nop instruction. Adding early init call for dispatcher that checks and change the patchable entry into expected 5 nop instruction if needed. There's no need to take text_mutex, because we are using it in early init call which is called at pre-smp time. Fixes: ceea991a019c ("bpf: Move bpf_dispatcher function out of ftrace locations") Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-10-20Merge tag 'net-6.1-rc2' of ↵Linus Torvalds58-185/+432
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. Current release - regressions: - revert "net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}" - revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()" - dsa: uninitialized variable in dsa_slave_netdevice_event() - eth: sunhme: uninitialized variable in happy_meal_init() Current release - new code bugs: - eth: octeontx2: fix resource not freed after malloc Previous releases - regressions: - sched: fix return value of qdisc ingress handling on success - sched: fix race condition in qdisc_graft() - udp: update reuse->has_conns under reuseport_lock. - tls: strp: make sure the TCP skbs do not have overlapping data - hsr: avoid possible NULL deref in skb_clone() - tipc: fix an information leak in tipc_topsrv_kern_subscr - phylink: add mac_managed_pm in phylink_config structure - eth: i40e: fix DMA mappings leak - eth: hyperv: fix a RX-path warning - eth: mtk: fix memory leaks Previous releases - always broken: - sched: cake: fix null pointer access issue when cake_init() fails" * tag 'net-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits) net: phy: dp83822: disable MDI crossover status change interrupt net: sched: fix race condition in qdisc_graft() net: hns: fix possible memory leak in hnae_ae_register() wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new() sfc: include vport_id in filter spec hash and equal() genetlink: fix kdoc warnings selftests: add selftest for chaining of tc ingress handling to egress net: Fix return value of qdisc ingress handling on success net: sched: sfb: fix null pointer access issue when sfb_init() fails Revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()" net: sched: cake: fix null pointer access issue when cake_init() fails ethernet: marvell: octeontx2 Fix resource not freed after malloc netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces. ionic: catch NULL pointer issue on reconfig net: hsr: avoid possible NULL deref in skb_clone() bnxt_en: fix memory leak in bnxt_nvm_test() ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed udp: Update reuse->has_conns under reuseport_lock. net: ethernet: mediatek: ppe: Remove the unused function mtk_foe_entry_usable() ...
2022-10-20Merge tag 'ata-6.1-rc2' of ↵Linus Torvalds7-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ata fixes from Damien Le Moal: "Several minor fixes: - Fix the module alias for the ahci_imx driver to get autoloading to work (Alexander) - Fix a potential array-index-out-of-bounds problem with the enclosure managment support in the ahci driver (Kai-Heng) - Several patches to fix compilation warnings thrown by clang in the ahci_st, sata_rcar, ahci_brcm, ahci_xgene, ahci_imx and ahci_qoriq drivers (me)" * tag 'ata-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: ahci_qoriq: Fix compilation warning ata: ahci_imx: Fix compilation warning ata: ahci_xgene: Fix compilation warning ata: ahci_brcm: Fix compilation warning ata: sata_rcar: Fix compilation warning ata: ahci_st: Fix compilation warning ata: ahci: Match EM_MAX_SLOTS with SATA_PMP_MAX_PORTS ata: ahci-imx: Fix MODULE_ALIAS
2022-10-20Merge tag 'for-6.1/dm-changes-v2' of ↵Linus Torvalds11-104/+110
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Fix dm-bufio to use test_bit_acquire to properly test_bit on arches with weaker memory ordering. - DM core replace DMWARN with DMERR or DMCRIT for fatal errors. - Enable WQ_HIGHPRI on DM verity target's verify_wq. - Add documentation for DM verity's try_verify_in_tasklet option. - Various typo and redundant word fixes in code and/or comments. * tag 'for-6.1/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm clone: Fix typo in block_device format specifier dm: remove unnecessary assignment statement in alloc_dev() dm verity: Add documentation for try_verify_in_tasklet option dm cache: delete the redundant word 'each' in comment dm raid: fix typo in analyse_superblocks code comment dm verity: enable WQ_HIGHPRI on verify_wq dm raid: delete the redundant word 'that' in comment dm: change from DMWARN to DMERR or DMCRIT for fatal errors dm bufio: use the acquire memory barrier when testing for B_READING
2022-10-19net: phy: dp83822: disable MDI crossover status change interruptFelix Riemann1-2/+1
If the cable is disconnected the PHY seems to toggle between MDI and MDI-X modes. With the MDI crossover status interrupt active this causes roughly 10 interrupts per second. As the crossover status isn't checked by the driver, the interrupt can be disabled to reduce the interrupt load. Fixes: 87461f7a58ab ("net: phy: DP83822 initial driver submission") Signed-off-by: Felix Riemann <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-19net: sched: fix race condition in qdisc_graft()Eric Dumazet1-2/+3
We had one syzbot report [1] in syzbot queue for a while. I was waiting for more occurrences and/or a repro but Dmitry Vyukov spotted the issue right away. <quoting Dmitry> qdisc_graft() drops reference to qdisc in notify_and_destroy while it's still assigned to dev->qdisc </quoting> Indeed, RCU rules are clear when replacing a data structure. The visible pointer (dev->qdisc in this case) must be updated to the new object _before_ RCU grace period is started (qdisc_put(old) in this case). [1] BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066 Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027 CPU: 0 PID: 21027 Comm: syz-executor.4 Not tainted 6.0.0-rc3-syzkaller-00363-g7726d4c3e60b #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report+0xb1/0x1e0 mm/kasan/report.c:495 __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066 __tcf_qdisc_find net/sched/cls_api.c:1051 [inline] tc_new_tfilter+0x34f/0x2200 net/sched/cls_api.c:2018 rtnetlink_rcv_msg+0x955/0xca0 net/core/rtnetlink.c:6081 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:734 ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536 __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f5efaa89279 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f5efbc31168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f5efab9bf80 RCX: 00007f5efaa89279 RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005 RBP: 00007f5efaae32e9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f5efb0cfb1f R14: 00007f5efbc31300 R15: 0000000000022000 </TASK> Allocated by task 21027: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:45 [inline] set_alloc_info mm/kasan/common.c:437 [inline] ____kasan_kmalloc mm/kasan/common.c:516 [inline] ____kasan_kmalloc mm/kasan/common.c:475 [inline] __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525 kmalloc_node include/linux/slab.h:623 [inline] kzalloc_node include/linux/slab.h:744 [inline] qdisc_alloc+0xb0/0xc50 net/sched/sch_generic.c:938 qdisc_create_dflt+0x71/0x4a0 net/sched/sch_generic.c:997 attach_one_default_qdisc net/sched/sch_generic.c:1152 [inline] netdev_for_each_tx_queue include/linux/netdevice.h:2437 [inline] attach_default_qdiscs net/sched/sch_generic.c:1170 [inline] dev_activate+0x760/0xcd0 net/sched/sch_generic.c:1229 __dev_open+0x393/0x4d0 net/core/dev.c:1441 __dev_change_flags+0x583/0x750 net/core/dev.c:8556 rtnl_configure_link+0xee/0x240 net/core/rtnetlink.c:3189 rtnl_newlink_create net/core/rtnetlink.c:3371 [inline] __rtnl_newlink+0x10b8/0x17e0 net/core/rtnetlink.c:3580 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3593 rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:734 ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536 __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 21020: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:45 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:367 [inline] ____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329 kasan_slab_free include/linux/kasan.h:200 [inline] slab_free_hook mm/slub.c:1754 [inline] slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1780 slab_free mm/slub.c:3534 [inline] kfree+0xe2/0x580 mm/slub.c:4562 rcu_do_batch kernel/rcu/tree.c:2245 [inline] rcu_core+0x7b5/0x1890 kernel/rcu/tree.c:2505 __do_softirq+0x1d3/0x9c6 kernel/softirq.c:571 Last potentially related work creation: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348 call_rcu+0x99/0x790 kernel/rcu/tree.c:2793 qdisc_put+0xcd/0xe0 net/sched/sch_generic.c:1083 notify_and_destroy net/sched/sch_api.c:1012 [inline] qdisc_graft+0xeb1/0x1270 net/sched/sch_api.c:1084 tc_modify_qdisc+0xbb7/0x1a00 net/sched/sch_api.c:1671 rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:734 ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536 __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Second to last potentially related work creation: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348 kvfree_call_rcu+0x74/0x940 kernel/rcu/tree.c:3322 neigh_destroy+0x431/0x630 net/core/neighbour.c:912 neigh_release include/net/neighbour.h:454 [inline] neigh_cleanup_and_release+0x1f8/0x330 net/core/neighbour.c:103 neigh_del net/core/neighbour.c:225 [inline] neigh_remove_one+0x37d/0x460 net/core/neighbour.c:246 neigh_forced_gc net/core/neighbour.c:276 [inline] neigh_alloc net/core/neighbour.c:447 [inline] ___neigh_create+0x18b5/0x29a0 net/core/neighbour.c:642 ip6_finish_output2+0xfb8/0x1520 net/ipv6/ip6_output.c:125 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline] ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:451 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2653 process_one_work+0x991/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e4/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 The buggy address belongs to the object at ffff88802065e000 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 56 bytes inside of 1024-byte region [ffff88802065e000, ffff88802065e400) The buggy address belongs to the physical page: page:ffffea0000819600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20658 head:ffffea0000819600 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 0000000000000000 dead000000000001 ffff888011841dc0 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 3523, tgid 3523 (sshd), ts 41495190986, free_ts 41417713212 prep_new_page mm/page_alloc.c:2532 [inline] get_page_from_freelist+0x109b/0x2ce0 mm/page_alloc.c:4283 __alloc_pages+0x1c7/0x510 mm/page_alloc.c:5515 alloc_pages+0x1a6/0x270 mm/mempolicy.c:2270 alloc_slab_page mm/slub.c:1824 [inline] allocate_slab+0x27e/0x3d0 mm/slub.c:1969 new_slab mm/slub.c:2029 [inline] ___slab_alloc+0x7f1/0xe10 mm/slub.c:3031 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3118 slab_alloc_node mm/slub.c:3209 [inline] __kmalloc_node_track_caller+0x2f2/0x380 mm/slub.c:4955 kmalloc_reserve net/core/skbuff.c:358 [inline] __alloc_skb+0xd9/0x2f0 net/core/skbuff.c:430 alloc_skb_fclone include/linux/skbuff.h:1307 [inline] tcp_stream_alloc_skb+0x38/0x580 net/ipv4/tcp.c:861 tcp_sendmsg_locked+0xc36/0x2f80 net/ipv4/tcp.c:1325 tcp_sendmsg+0x2b/0x40 net/ipv4/tcp.c:1483 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:734 sock_write_iter+0x291/0x3d0 net/socket.c:1108 call_write_iter include/linux/fs.h:2187 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9e9/0xdd0 fs/read_write.c:578 ksys_write+0x1e8/0x250 fs/read_write.c:631 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1449 [inline] free_pcp_prepare+0x5e4/0xd20 mm/page_alloc.c:1499 free_unref_page_prepare mm/page_alloc.c:3380 [inline] free_unref_page+0x19/0x4d0 mm/page_alloc.c:3476 __unfreeze_partials+0x17c/0x1a0 mm/slub.c:2548 qlink_free mm/kasan/quarantine.c:168 [inline] qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:294 __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447 kasan_slab_alloc include/linux/kasan.h:224 [inline] slab_post_alloc_hook mm/slab.h:727 [inline] slab_alloc_node mm/slub.c:3243 [inline] slab_alloc mm/slub.c:3251 [inline] __kmem_cache_alloc_lru mm/slub.c:3258 [inline] kmem_cache_alloc+0x267/0x3b0 mm/slub.c:3268 kmem_cache_zalloc include/linux/slab.h:723 [inline] alloc_buffer_head+0x20/0x140 fs/buffer.c:2974 alloc_page_buffers+0x280/0x790 fs/buffer.c:829 create_empty_buffers+0x2c/0xee0 fs/buffer.c:1558 ext4_block_write_begin+0x1004/0x1530 fs/ext4/inode.c:1074 ext4_da_write_begin+0x422/0xae0 fs/ext4/inode.c:2996 generic_perform_write+0x246/0x560 mm/filemap.c:3738 ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:270 ext4_file_write_iter+0x44a/0x1660 fs/ext4/file.c:679 call_write_iter include/linux/fs.h:2187 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x9e9/0xdd0 fs/read_write.c:578 Fixes: af356afa010f ("net_sched: reintroduce dev->qdisc for use by sch_api") Reported-by: syzbot <[email protected]> Diagnosed-by: Dmitry Vyukov <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-19net: hns: fix possible memory leak in hnae_ae_register()Yang Yingliang1-1/+3
Inject fault while probing module, if device_register() fails, but the refcount of kobject is not decreased to 0, the name allocated in dev_set_name() is leaked. Fix this by calling put_device(), so that name can be freed in callback function kobject_cleanup(). unreferenced object 0xffff00c01aba2100 (size 128): comm "systemd-udevd", pid 1259, jiffies 4294903284 (age 294.152s) hex dump (first 32 bytes): 68 6e 61 65 30 00 00 00 18 21 ba 1a c0 00 ff ff hnae0....!...... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000034783f26>] slab_post_alloc_hook+0xa0/0x3e0 [<00000000748188f2>] __kmem_cache_alloc_node+0x164/0x2b0 [<00000000ab0743e8>] __kmalloc_node_track_caller+0x6c/0x390 [<000000006c0ffb13>] kvasprintf+0x8c/0x118 [<00000000fa27bfe1>] kvasprintf_const+0x60/0xc8 [<0000000083e10ed7>] kobject_set_name_vargs+0x3c/0xc0 [<000000000b87affc>] dev_set_name+0x7c/0xa0 [<000000003fd8fe26>] hnae_ae_register+0xcc/0x190 [hnae] [<00000000fe97edc9>] hns_dsaf_ae_init+0x9c/0x108 [hns_dsaf] [<00000000c36ff1eb>] hns_dsaf_probe+0x548/0x748 [hns_dsaf] Fixes: 6fe6611ff275 ("net: add Hisilicon Network Subsystem hnae framework support") Signed-off-by: Yang Yingliang <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-19wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new()Yang Yingliang1-1/+1
Inject fault while probing module, if device_register() fails, but the refcount of kobject is not decreased to 0, the name allocated in dev_set_name() is leaked. Fix this by calling put_device(), so that name can be freed in callback function kobject_cleanup(). unreferenced object 0xffff88810152ad20 (size 8): comm "modprobe", pid 252, jiffies 4294849206 (age 22.713s) hex dump (first 8 bytes): 68 77 73 69 6d 30 00 ff hwsim0.. backtrace: [<000000009c3504ed>] __kmalloc_node_track_caller+0x44/0x1b0 [<00000000c0228a5e>] kvasprintf+0xb5/0x140 [<00000000cff8c21f>] kvasprintf_const+0x55/0x180 [<0000000055a1e073>] kobject_set_name_vargs+0x56/0x150 [<000000000a80b139>] dev_set_name+0xab/0xe0 Fixes: f36a111a74e7 ("wwan_hwsim: WWAN device simulator") Signed-off-by: Yang Yingliang <[email protected]> Reviewed-by: Loic Poulain <[email protected]> Acked-by: Sergey Ryazanov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-19sfc: include vport_id in filter spec hash and equal()Pieter Jansen van Vuuren2-7/+7
Filters on different vports are qualified by different implicit MACs and/or VLANs, so shouldn't be considered equal even if their other match fields are identical. Fixes: 7c460d9be610 ("sfc: Extend and abstract efx_filter_spec to cover Huntington/EF10") Co-developed-by: Edward Cree <[email protected]> Signed-off-by: Edward Cree <[email protected]> Signed-off-by: Pieter Jansen van Vuuren <[email protected]> Reviewed-by: Martin Habets <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski5-2/+8
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Missing flowi uid field in nft_fib expression, from Guillaume Nault. This is broken since the creation of the fib expression. 2) Relax sanity check to fix bogus EINVAL error when deleting elements belonging set intervals. Broken since 6.0-rc. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-19genetlink: fix kdoc warningsJakub Kicinski1-3/+5
Address a bunch of kdoc warnings: include/net/genetlink.h:81: warning: Function parameter or member 'module' not described in 'genl_family' include/net/genetlink.h:243: warning: expecting prototype for struct genl_info. Prototype was for struct genl_dumpit_info instead include/net/genetlink.h:419: warning: Function parameter or member 'net' not described in 'genlmsg_unicast' include/net/genetlink.h:438: warning: expecting prototype for gennlmsg_data(). Prototype was for genlmsg_data() instead include/net/genetlink.h:244: warning: Function parameter or member 'op' not described in 'genl_dumpit_info' Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-19Merge branch 'qdisc-ingress-success'David S. Miller3-0/+84
Paul Blakey says: ==================== net: Fix return value of qdisc ingress handling on success Fix patch + self-test with the currently broken scenario. v4->v3: Removed new line in self test and rebase (Paolo). v2->v3: Added DROP return to TC_ACT_SHOT case (Cong). v1->v2: Changed blamed commit Added self-test ==================== Signed-off-by: David S. Miller <[email protected]>
2022-10-19selftests: add selftest for chaining of tc ingress handling to egressPaul Blakey2-0/+80
This test runs a simple ingress tc setup between two veth pairs, then adds a egress->ingress rule to test the chaining of tc ingress pipeline to tc egress piepline. Signed-off-by: Paul Blakey <[email protected]> Signed-off-by: David S. Miller <[email protected]>