Age | Commit message (Collapse) | Author | Files | Lines |
|
When putting an inode during extent map shrinking we're doing a standard
iput() but that may take a long time in case the inode is dirty and we are
doing the final iput that triggers eviction - the VFS will have to wait
for writeback before calling the btrfs evict callback (see
fs/inode.c:evict()).
This slows down the task running the shrinker which may have been
triggered while updating some tree for example, meaning locks are held
as well as an open transaction handle.
Also if the iput() ends up triggering eviction and the inode has no links
anymore, then we trigger item truncation which requires flushing delayed
items, space reservation to start a transaction and that may trigger the
space reclaim task and wait for it, resulting in deadlocks in case the
reclaim task needs for example to commit a transaction and the shrinker
is being triggered from a path holding a transaction handle.
Syzbot reported such a case with the following stack traces:
======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc2-syzkaller-00010-g2ab795141095 #0 Not tainted
------------------------------------------------------
kswapd0/111 is trying to acquire lock:
ffff88801eae4610 (sb_internal#3){.+.+}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275
but task is already holding lock:
ffffffff8dd3a9a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xa88/0x1970 mm/vmscan.c:6924
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (fs_reclaim){+.+.}-{0:0}:
__fs_reclaim_acquire mm/page_alloc.c:3783 [inline]
fs_reclaim_acquire+0x102/0x160 mm/page_alloc.c:3797
might_alloc include/linux/sched/mm.h:334 [inline]
slab_pre_alloc_hook mm/slub.c:3890 [inline]
slab_alloc_node mm/slub.c:3980 [inline]
kmem_cache_alloc_lru_noprof+0x58/0x2f0 mm/slub.c:4019
btrfs_alloc_inode+0x118/0xb20 fs/btrfs/inode.c:8411
alloc_inode+0x5d/0x230 fs/inode.c:261
iget5_locked fs/inode.c:1235 [inline]
iget5_locked+0x1c9/0x2c0 fs/inode.c:1228
btrfs_iget_locked fs/btrfs/inode.c:5590 [inline]
btrfs_iget_path fs/btrfs/inode.c:5607 [inline]
btrfs_iget+0xfb/0x230 fs/btrfs/inode.c:5636
create_reloc_inode+0x403/0x820 fs/btrfs/relocation.c:3911
btrfs_relocate_block_group+0x471/0xe60 fs/btrfs/relocation.c:4114
btrfs_relocate_chunk+0x143/0x450 fs/btrfs/volumes.c:3373
__btrfs_balance fs/btrfs/volumes.c:4157 [inline]
btrfs_balance+0x211a/0x3f00 fs/btrfs/volumes.c:4534
btrfs_ioctl_balance fs/btrfs/ioctl.c:3675 [inline]
btrfs_ioctl+0x12ed/0x8290 fs/btrfs/ioctl.c:4742
__do_compat_sys_ioctl+0x2c3/0x330 fs/ioctl.c:1007
do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
__do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
-> #2 (btrfs_trans_num_extwriters){++++}-{0:0}:
join_transaction+0x164/0xf40 fs/btrfs/transaction.c:315
start_transaction+0x427/0x1a70 fs/btrfs/transaction.c:700
btrfs_rebuild_free_space_tree+0xaa/0x480 fs/btrfs/free-space-tree.c:1323
btrfs_start_pre_rw_mount+0x218/0xf60 fs/btrfs/disk-io.c:2999
open_ctree+0x41ab/0x52e0 fs/btrfs/disk-io.c:3554
btrfs_fill_super fs/btrfs/super.c:946 [inline]
btrfs_get_tree_super fs/btrfs/super.c:1863 [inline]
btrfs_get_tree+0x11e9/0x1b90 fs/btrfs/super.c:2089
vfs_get_tree+0x8f/0x380 fs/super.c:1780
fc_mount+0x16/0xc0 fs/namespace.c:1125
btrfs_get_tree_subvol fs/btrfs/super.c:2052 [inline]
btrfs_get_tree+0xa53/0x1b90 fs/btrfs/super.c:2090
vfs_get_tree+0x8f/0x380 fs/super.c:1780
do_new_mount fs/namespace.c:3352 [inline]
path_mount+0x6e1/0x1f10 fs/namespace.c:3679
do_mount fs/namespace.c:3692 [inline]
__do_sys_mount fs/namespace.c:3898 [inline]
__se_sys_mount fs/namespace.c:3875 [inline]
__ia32_sys_mount+0x295/0x320 fs/namespace.c:3875
do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
__do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
-> #1 (btrfs_trans_num_writers){++++}-{0:0}:
join_transaction+0x148/0xf40 fs/btrfs/transaction.c:314
start_transaction+0x427/0x1a70 fs/btrfs/transaction.c:700
btrfs_rebuild_free_space_tree+0xaa/0x480 fs/btrfs/free-space-tree.c:1323
btrfs_start_pre_rw_mount+0x218/0xf60 fs/btrfs/disk-io.c:2999
open_ctree+0x41ab/0x52e0 fs/btrfs/disk-io.c:3554
btrfs_fill_super fs/btrfs/super.c:946 [inline]
btrfs_get_tree_super fs/btrfs/super.c:1863 [inline]
btrfs_get_tree+0x11e9/0x1b90 fs/btrfs/super.c:2089
vfs_get_tree+0x8f/0x380 fs/super.c:1780
fc_mount+0x16/0xc0 fs/namespace.c:1125
btrfs_get_tree_subvol fs/btrfs/super.c:2052 [inline]
btrfs_get_tree+0xa53/0x1b90 fs/btrfs/super.c:2090
vfs_get_tree+0x8f/0x380 fs/super.c:1780
do_new_mount fs/namespace.c:3352 [inline]
path_mount+0x6e1/0x1f10 fs/namespace.c:3679
do_mount fs/namespace.c:3692 [inline]
__do_sys_mount fs/namespace.c:3898 [inline]
__se_sys_mount fs/namespace.c:3875 [inline]
__ia32_sys_mount+0x295/0x320 fs/namespace.c:3875
do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
__do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
-> #0 (sb_internal#3){.+.+}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain kernel/locking/lockdep.c:3869 [inline]
__lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137
lock_acquire kernel/locking/lockdep.c:5754 [inline]
lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5719
percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
__sb_start_write include/linux/fs.h:1655 [inline]
sb_start_intwrite include/linux/fs.h:1838 [inline]
start_transaction+0xbc1/0x1a70 fs/btrfs/transaction.c:694
btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275
btrfs_evict_inode+0x960/0xe80 fs/btrfs/inode.c:5291
evict+0x2ed/0x6c0 fs/inode.c:667
iput_final fs/inode.c:1741 [inline]
iput.part.0+0x5a8/0x7f0 fs/inode.c:1767
iput+0x5c/0x80 fs/inode.c:1757
btrfs_scan_root fs/btrfs/extent_map.c:1118 [inline]
btrfs_free_extent_maps+0xbd3/0x1320 fs/btrfs/extent_map.c:1189
super_cache_scan+0x409/0x550 fs/super.c:227
do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435
shrink_slab+0x18a/0x1310 mm/shrinker.c:662
shrink_one+0x493/0x7c0 mm/vmscan.c:4790
shrink_many mm/vmscan.c:4851 [inline]
lru_gen_shrink_node+0x89f/0x1750 mm/vmscan.c:4951
shrink_node mm/vmscan.c:5910 [inline]
kswapd_shrink_node mm/vmscan.c:6720 [inline]
balance_pgdat+0x1105/0x1970 mm/vmscan.c:6911
kswapd+0x5ea/0xbf0 mm/vmscan.c:7180
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
other info that might help us debug this:
Chain exists of:
sb_internal#3 --> btrfs_trans_num_extwriters --> fs_reclaim
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(btrfs_trans_num_extwriters);
lock(fs_reclaim);
rlock(sb_internal#3);
*** DEADLOCK ***
2 locks held by kswapd0/111:
#0: ffffffff8dd3a9a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xa88/0x1970 mm/vmscan.c:6924
#1: ffff88801eae40e0 (&type->s_umount_key#62){++++}-{3:3}, at: super_trylock_shared fs/super.c:562 [inline]
#1: ffff88801eae40e0 (&type->s_umount_key#62){++++}-{3:3}, at: super_cache_scan+0x96/0x550 fs/super.c:196
stack backtrace:
CPU: 0 PID: 111 Comm: kswapd0 Not tainted 6.10.0-rc2-syzkaller-00010-g2ab795141095 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114
check_noncircular+0x31a/0x400 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain kernel/locking/lockdep.c:3869 [inline]
__lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137
lock_acquire kernel/locking/lockdep.c:5754 [inline]
lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5719
percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
__sb_start_write include/linux/fs.h:1655 [inline]
sb_start_intwrite include/linux/fs.h:1838 [inline]
start_transaction+0xbc1/0x1a70 fs/btrfs/transaction.c:694
btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275
btrfs_evict_inode+0x960/0xe80 fs/btrfs/inode.c:5291
evict+0x2ed/0x6c0 fs/inode.c:667
iput_final fs/inode.c:1741 [inline]
iput.part.0+0x5a8/0x7f0 fs/inode.c:1767
iput+0x5c/0x80 fs/inode.c:1757
btrfs_scan_root fs/btrfs/extent_map.c:1118 [inline]
btrfs_free_extent_maps+0xbd3/0x1320 fs/btrfs/extent_map.c:1189
super_cache_scan+0x409/0x550 fs/super.c:227
do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435
shrink_slab+0x18a/0x1310 mm/shrinker.c:662
shrink_one+0x493/0x7c0 mm/vmscan.c:4790
shrink_many mm/vmscan.c:4851 [inline]
lru_gen_shrink_node+0x89f/0x1750 mm/vmscan.c:4951
shrink_node mm/vmscan.c:5910 [inline]
kswapd_shrink_node mm/vmscan.c:6720 [inline]
balance_pgdat+0x1105/0x1970 mm/vmscan.c:6911
kswapd+0x5ea/0xbf0 mm/vmscan.c:7180
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
So fix this by using btrfs_add_delayed_iput() so that the final iput is
delegated to the cleaner kthread.
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Reported-by: [email protected]
Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps")
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Currently, when built with "make W=1", the following warnings are
generated:
net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'work' not described in 'crush_choose_firstn'
net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'weight' not described in 'crush_choose_firstn'
net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'weight_max' not described in 'crush_choose_firstn'
net/ceph/crush/mapper.c:466: warning: Function parameter or struct member 'choose_args' not described in 'crush_choose_firstn'
Update the crush_choose_firstn() kernel-doc to document these
parameters.
Signed-off-by: Jeff Johnson <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Currently, when built with "make W=1", the following warnings are
generated:
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'map' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'work' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'bucket' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'weight' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'weight_max' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'x' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'left' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'numrep' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'type' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'out' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'outpos' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'tries' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'recurse_tries' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'recurse_to_leaf' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'out2' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'parent_r' not described in 'crush_choose_indep'
net/ceph/crush/mapper.c:655: warning: Function parameter or struct member 'choose_args' not described in 'crush_choose_indep'
These warnings are generated because the prologue comment for
crush_choose_indep() uses the kernel-doc prefix, but the actual
comment is a very brief description that is not in kernel-doc
format. Since this is a static function there is no need to fully
document the function, so replace the kernel-doc comment prefix with a
standard comment prefix to remove these warnings.
Signed-off-by: Jeff Johnson <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net:
Patch #1 fixes a bogus WARN_ON splat in nfnetlink_queue.
Patch #2 fixes a crash due to stack overflow in chain loop detection
by using the existing chain validation routines
Both patches from Florian Westphal.
netfilter pull request 24-07-11
* tag 'nf-24-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: prefer nft_chain_validate
netfilter: nfnetlink_queue: drop bogus WARN_ON
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-07-11
The following pull-request contains BPF updates for your *net* tree.
We've added 4 non-merge commits during the last 2 day(s) which contain
a total of 4 files changed, 262 insertions(+), 19 deletions(-).
The main changes are:
1) Fixes for a BPF timer lockup and a use-after-free scenario when timers
are used concurrently, from Kumar Kartikeya Dwivedi.
2) Fix the argument order in the call to bpf_map_kvcalloc() which could
otherwise lead to a compilation error, from Mohammad Shehar Yaar Tausif.
bpf-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add timer lockup selftest
bpf: Defer work in bpf_timer_cancel_and_free
bpf: Fail bpf_timer_cancel when callback is being cancelled
bpf: fix order of args in call to bpf_map_kvcalloc
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
When using a BPF program on kernel_connect(), the call can return -EPERM. This
causes xs_tcp_setup_socket() to loop forever, filling up the syslog and causing
the kernel to potentially freeze up.
Neil suggested:
This will propagate -EPERM up into other layers which might not be ready
to handle it. It might be safer to map EPERM to an error we would be more
likely to expect from the network system - such as ECONNREFUSED or ENETDOWN.
ECONNREFUSED as error seems reasonable. For programs setting a different error
can be out of reach (see handling in 4fbac77d2d09) in particular on kernels
which do not have f10d05966196 ("bpf: Make BPF_PROG_RUN_ARRAY return -err
instead of allow boolean"), thus given that it is better to simply remap for
consistent behavior. UDP does handle EPERM in xs_udp_send_request().
Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect")
Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind")
Co-developed-by: Lex Siegel <[email protected]>
Signed-off-by: Lex Siegel <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Anna Schumaker <[email protected]>
Link: https://github.com/cilium/cilium/issues/33395
Link: https://lore.kernel.org/bpf/[email protected]
Link: https://patch.msgid.link/9069ec1d59e4b2129fc23433349fd5580ad43921.1720075070.git.daniel@iogearbox.net
Signed-off-by: Paolo Abeni <[email protected]>
|
|
KASAN reports the following UAF:
BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
Read of size 1 at addr ffff888c07603600 by task handler130/6469
Call Trace:
<IRQ>
dump_stack_lvl+0x48/0x70
print_address_description.constprop.0+0x33/0x3d0
print_report+0xc0/0x2b0
kasan_report+0xd0/0x120
__asan_load1+0x6c/0x80
tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
tcf_ct_act+0x886/0x1350 [act_ct]
tcf_action_exec+0xf8/0x1f0
fl_classify+0x355/0x360 [cls_flower]
__tcf_classify+0x1fd/0x330
tcf_classify+0x21c/0x3c0
sch_handle_ingress.constprop.0+0x2c5/0x500
__netif_receive_skb_core.constprop.0+0xb25/0x1510
__netif_receive_skb_list_core+0x220/0x4c0
netif_receive_skb_list_internal+0x446/0x620
napi_complete_done+0x157/0x3d0
gro_cell_poll+0xcf/0x100
__napi_poll+0x65/0x310
net_rx_action+0x30c/0x5c0
__do_softirq+0x14f/0x491
__irq_exit_rcu+0x82/0xc0
irq_exit_rcu+0xe/0x20
common_interrupt+0xa1/0xb0
</IRQ>
<TASK>
asm_common_interrupt+0x27/0x40
Allocated by task 6469:
kasan_save_stack+0x38/0x70
kasan_set_track+0x25/0x40
kasan_save_alloc_info+0x1e/0x40
__kasan_krealloc+0x133/0x190
krealloc+0xaa/0x130
nf_ct_ext_add+0xed/0x230 [nf_conntrack]
tcf_ct_act+0x1095/0x1350 [act_ct]
tcf_action_exec+0xf8/0x1f0
fl_classify+0x355/0x360 [cls_flower]
__tcf_classify+0x1fd/0x330
tcf_classify+0x21c/0x3c0
sch_handle_ingress.constprop.0+0x2c5/0x500
__netif_receive_skb_core.constprop.0+0xb25/0x1510
__netif_receive_skb_list_core+0x220/0x4c0
netif_receive_skb_list_internal+0x446/0x620
napi_complete_done+0x157/0x3d0
gro_cell_poll+0xcf/0x100
__napi_poll+0x65/0x310
net_rx_action+0x30c/0x5c0
__do_softirq+0x14f/0x491
Freed by task 6469:
kasan_save_stack+0x38/0x70
kasan_set_track+0x25/0x40
kasan_save_free_info+0x2b/0x60
____kasan_slab_free+0x180/0x1f0
__kasan_slab_free+0x12/0x30
slab_free_freelist_hook+0xd2/0x1a0
__kmem_cache_free+0x1a2/0x2f0
kfree+0x78/0x120
nf_conntrack_free+0x74/0x130 [nf_conntrack]
nf_ct_destroy+0xb2/0x140 [nf_conntrack]
__nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack]
nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack]
__nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack]
tcf_ct_act+0x12ad/0x1350 [act_ct]
tcf_action_exec+0xf8/0x1f0
fl_classify+0x355/0x360 [cls_flower]
__tcf_classify+0x1fd/0x330
tcf_classify+0x21c/0x3c0
sch_handle_ingress.constprop.0+0x2c5/0x500
__netif_receive_skb_core.constprop.0+0xb25/0x1510
__netif_receive_skb_list_core+0x220/0x4c0
netif_receive_skb_list_internal+0x446/0x620
napi_complete_done+0x157/0x3d0
gro_cell_poll+0xcf/0x100
__napi_poll+0x65/0x310
net_rx_action+0x30c/0x5c0
__do_softirq+0x14f/0x491
The ct may be dropped if a clash has been resolved but is still passed to
the tcf_ct_flow_table_process_conn function for further usage. This issue
can be fixed by retrieving ct from skb again after confirming conntrack.
Fixes: 0cc254e5aa37 ("net/sched: act_ct: Offload connections with commit action")
Co-developed-by: Gerald Yang <[email protected]>
Signed-off-by: Gerald Yang <[email protected]>
Signed-off-by: Chengen Du <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The amount of TX space in the hardware buffer is tracked in the tx_space
variable. The initial value is currently only set during driver probing.
After closing the interface and reopening it the tx_space variable has
the last value it had before close. If it is smaller than the size of
the first send packet after reopeing the interface the queue will be
stopped. The queue is woken up after receiving a TX interrupt but this
will never happen since we did not send anything.
This commit moves the initialization of the tx_space variable to the
ks8851_net_open function right before starting the TX queue. Also query
the value from the hardware instead of using a hard coded value.
Only the SPI chip variant is affected by this issue because only this
driver variant actually depends on the tx_space variable in the xmit
function.
Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: [email protected]
Cc: [email protected] # 5.10+
Signed-off-by: Ronald Wahl <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
syzkaller triggered the warning [0] in udp_v4_early_demux().
In udp_v[46]_early_demux() and sk_lookup(), we do not touch the refcount
of the looked-up sk and use sock_pfree() as skb->destructor, so we check
SOCK_RCU_FREE to ensure that the sk is safe to access during the RCU grace
period.
Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
into the hash table. Moreover, the SOCK_RCU_FREE check is done too early
in udp_v[46]_early_demux() and sk_lookup(), so there could be a small race
window:
CPU1 CPU2
---- ----
udp_v4_early_demux() udp_lib_get_port()
| |- hlist_add_head_rcu()
|- sk = __udp4_lib_demux_lookup() |
|- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
`- sock_set_flag(sk, SOCK_RCU_FREE)
We had the same bug in TCP and fixed it in commit 871019b22d1b ("net:
set SOCK_RCU_FREE before inserting socket into hashtable").
Let's apply the same fix for UDP.
[0]:
WARNING: CPU: 0 PID: 11198 at net/ipv4/udp.c:2599 udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
Modules linked in:
CPU: 0 PID: 11198 Comm: syz-executor.1 Not tainted 6.9.0-g93bda33046e7 #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
Code: c5 7a 15 fe bb 01 00 00 00 44 89 e9 31 ff d3 e3 81 e3 bf ef ff ff 89 de e8 2c 74 15 fe 85 db 0f 85 02 06 00 00 e8 9f 7a 15 fe <0f> 0b e8 98 7a 15 fe 49 8d 7e 60 e8 4f 39 2f fe 49 c7 46 60 20 52
RSP: 0018:ffffc9000ce3fa58 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8318c92c
RDX: ffff888036ccde00 RSI: ffffffff8318c2f1 RDI: 0000000000000001
RBP: ffff88805a2dd6e0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0001ffffffffffff R12: ffff88805a2dd680
R13: 0000000000000007 R14: ffff88800923f900 R15: ffff88805456004e
FS: 00007fc449127640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc449126e38 CR3: 000000003de4b002 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
PKRU: 55555554
Call Trace:
<TASK>
ip_rcv_finish_core.constprop.0+0xbdd/0xd20 net/ipv4/ip_input.c:349
ip_rcv_finish+0xda/0x150 net/ipv4/ip_input.c:447
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ip_rcv+0x16c/0x180 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5624
__netif_receive_skb+0x21/0xd0 net/core/dev.c:5738
netif_receive_skb_internal net/core/dev.c:5824 [inline]
netif_receive_skb+0x271/0x300 net/core/dev.c:5884
tun_rx_batched drivers/net/tun.c:1549 [inline]
tun_get_user+0x24db/0x2c50 drivers/net/tun.c:2002
tun_chr_write_iter+0x107/0x1a0 drivers/net/tun.c:2048
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0x76f/0x8d0 fs/read_write.c:590
ksys_write+0xbf/0x190 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__x64_sys_write+0x41/0x50 fs/read_write.c:652
x64_sys_call+0xe66/0x1990 arch/x86/include/generated/asm/syscalls_64.h:2
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fc44a68bc1f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 cf f5 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c d0 f5 ff 48
RSP: 002b:00007fc449126c90 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007fc44a68bc1f
RDX: 0000000000000032 RSI: 00000000200000c0 RDI: 00000000000000c8
RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000032 R11: 0000000000000293 R12: 0000000000000000
R13: 000000000000000b R14: 00007fc44a5ec530 R15: 0000000000000000
</TASK>
Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
nft_chain_validate already performs loop detection because a cycle will
result in a call stack overflow (ctx->level >= NFT_JUMP_STACK_SIZE).
It also follows maps via ->validate callback in nft_lookup, so there
appears no reason to iterate the maps again.
nf_tables_check_loops() and all its helper functions can be removed.
This improves ruleset load time significantly, from 23s down to 12s.
This also fixes a crash bug. Old loop detection code can result in
unbounded recursion:
BUG: TASK stack guard page was hit at ....
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN
CPU: 4 PID: 1539 Comm: nft Not tainted 6.10.0-rc5+ #1
[..]
with a suitable ruleset during validation of register stores.
I can't see any actual reason to attempt to check for this from
nft_validate_register_store(), at this point the transaction is still in
progress, so we don't have a full picture of the rule graph.
For nf-next it might make sense to either remove it or make this depend
on table->validate_state in case we could catch an error earlier
(for improved error reporting to userspace).
Fixes: 20a69341f2d0 ("netfilter: nf_tables: add netlink set API")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Happens when rules get flushed/deleted while packet is out, so remove
this WARN_ON.
This WARN exists in one form or another since v4.14, no need to backport
this to older releases, hence use a more recent fixes tag.
Fixes: 3f8019688894 ("netfilter: move nf_reinject into nfnetlink_queue modules")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Do not attach SQI value if link is down. "SQI values are only valid if
link-up condition is present" per OpenAlliance specification of
100Base-T1 Interoperability Test suite [1]. The same rule would apply
for other link types.
[1] https://opensig.org/automotive-ethernet-specifications/#
Fixes: 806602191592 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)")
Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Woojung Huh <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Since 'ppp_async_encode()' assumes valid LCP packets (with code
from 1 to 7 inclusive), add 'ppp_check_packet()' to ensure that
LCP packet has an actual body beyond PPP_LCP header bytes, and
reject claimed-as-LCP but actually malformed data otherwise.
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=ec0723ba9605678b14bf
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Dmitry Antipov <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Add a selftest that tries to trigger a situation where two timer callbacks
are attempting to cancel each other's timer. By running them continuously,
we hit a condition where both run in parallel and cancel each other.
Without the fix in the previous patch, this would cause a lockup as
hrtimer_cancel on either side will wait for forward progress from the
callback.
Ensure that this situation leads to a EDEADLK error.
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The below commit introduced a warning message when phy state is not in
the states: PHY_HALTED, PHY_READY, and PHY_UP.
commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
mtk-star-emac doesn't need mdiobus suspend/resume. To fix the warning
message during resume, indicate the phy resume/suspend is managed by the
mac when probing.
Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Signed-off-by: Jian Hui Lee <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Tony Nguyen says:
====================
ice: Support to dump PHY config, FEC
Anil Samal says:
Implementation to dump PHY configuration and FEC statistics to
facilitate link level debugging of customer issues. Implementation has
two parts
a. Serdes equalization
# ethtool -d eth0
Output:
Offset Values
------ ------
0x0000: 00 00 00 00 03 00 00 00 05 00 00 00 01 08 00 40
0x0010: 01 00 00 40 00 00 39 3c 01 00 00 00 00 00 00 00
0x0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
0x01f0: 01 00 00 00 ef be ad de 8f 00 00 00 00 00 00 00
0x0200: 00 00 00 00 ef be ad de 00 00 00 00 00 00 00 00
0x0210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0230: 00 00 00 00 00 00 00 00 00 00 00 00 fa ff 00 00
0x0240: 06 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00
0x0250: 0f b0 0f b0 00 00 00 00 00 00 00 00 00 00 00 00
0x0260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x02a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x02b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x02c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x02d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x02e0: 00 00 00 00 00 00 00 00 00 00 00 00
Current implementation appends 176 bytes i.e. 44 bytes * 4 serdes lane.
For port with 2 serdes lane, first 88 bytes are valid values and
remaining 88 bytes are filled with zero. Similarly for port with 1
serdes lane, first 44 bytes are valid and remaining 132 bytes are marked
zero.
Each set of serdes equalizer parameter (i.e. set of 44 bytes) follows
below order
a. rx_equalization_pre2
b. rx_equalization_pre1
c. rx_equalization_post1
d. rx_equalization_bflf
e. rx_equalization_bfhf
f. rx_equalization_drate
g. tx_equalization_pre1
h. tx_equalization_pre3
i. tx_equalization_atten
j. tx_equalization_post1
k. tx_equalization_pre2
Where each individual equalizer parameter is of 4 bytes. As ethtool
prints values as individual bytes, for little endian machine these
values will be in reverse byte order.
b. FEC block counts
# ethtool -I --show-fec eth0
Output:
FEC parameters for eth0:
Supported/Configured FEC encodings: Auto RS BaseR
Active FEC encoding: RS
Statistics:
corrected_blocks: 0
uncorrectable_blocks: 0
This series do following:
Patch 1 - Implementation to support user provided flag for side band
queue command.
Patch 2 - Currently driver does not have a way to derive serdes lane
number, pcs quad , pcs port from port number. So we introduced a
mechanism to derive above info.
Ethtool interface extension to include FEC statistics counter.
Patch 3 - Ethtool interface extension to include serdes equalizer output.
v1: https://lore.kernel.org/netdev/[email protected]/
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
To debug link issues in the field, serdes Tx/Rx equalizer values
help to determine the health of serdes lane.
Extend 'ethtool -d' option to dump serdes Tx/Rx equalizer.
The following list of equalizer param is supported
a. rx_equalization_pre2
b. rx_equalization_pre1
c. rx_equalization_post1
d. rx_equalization_bflf
e. rx_equalization_bfhf
f. rx_equalization_drate
g. tx_equalization_pre1
h. tx_equalization_pre3
i. tx_equalization_atten
j. tx_equalization_post1
k. tx_equalization_pre2
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Anil Samal <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
To debug link issues in the field, it is paramount to
dump fec corrected/uncorrected block counts from firmware.
Firmware requires PCS quad number and PCS port number to
read FEC statistics. Current driver implementation does
not maintain above physical properties of a port.
Add new driver API to derive physical properties of an input
port.These properties include PCS quad number, PCS port number,
serdes lane count, primary serdes lane number.
Extend ethtool option '--show-fec' to support fec statistics.
The IEEE standard mandates two sets of counters:
- 30.5.1.1.17 aFECCorrectedBlocks
- 30.5.1.1.18 aFECUncorrectableBlocks
Standard defines above statistics per lane but current
implementation supports total FEC statistics per port
i.e. sum of all lane per port. Find sample output below
FEC parameters for ens21f0np0:
Supported/Configured FEC encodings: Auto RS BaseR
Active FEC encoding: RS
Statistics:
corrected_blocks: 0
uncorrectable_blocks: 0
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Anil Samal <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Current driver implementation for Sideband Queue supports a
fixed flag (ICE_AQ_FLAG_RD). To retrieve FEC statistics from
firmware, Sideband Queue command is used with a different flag.
Extend API for Sideband Queue command to use 'flags' as input
argument.
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Anil Samal <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Commit 861e8086029e ("e1000e: move force SMBUS from enable ulp function
to avoid PHY loss issue") resolved a PHY access loss during suspend on
Meteor Lake consumer platforms, but it affected corporate systems
incorrectly.
A better fix, working for both consumer and corporate systems, was
proposed in commit bfd546a552e1 ("e1000e: move force SMBUS near the end
of enable_ulp function"). However, it introduced a regression on older
devices, such as [8086:15B8], [8086:15F9], [8086:15BE].
This patch aims to fix the secondary regression, by limiting the scope of
the changes to Meteor Lake platforms only.
Fixes: bfd546a552e1 ("e1000e: move force SMBUS near the end of enable_ulp function")
Reported-by: Todd Brandt <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218940
Reported-by: Dieter Mummenschanz <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218936
Signed-off-by: Vitaly Lifshits <[email protected]>
Tested-by: Mor Bar-Gabay <[email protected]> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Convert enetc device binding file to yaml. Split to 3 yaml files,
'fsl,enetc.yaml', 'fsl,enetc-mdio.yaml', 'fsl,enetc-ierb.yaml'.
Additional Changes:
- Add pci<vendor id>,<production id> in compatible string.
- Ref to common ethernet-controller.yaml and mdio.yaml.
- Add Wei fang, Vladimir and Claudiu as maintainer.
- Update ENETC description.
- Remove fixed-link part.
Signed-off-by: Frank Li <[email protected]>
Reviewed-by: Rob Herring (Arm) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
retracted its window to zero, tcp_retransmit_timer() can
retransmit a packet every two jiffies (2 ms for HZ=1000),
for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
The fix is to make sure tcp_rtx_probe0_timed_out() takes
icsk->icsk_user_timeout into account.
Before blamed commit, the socket would not timeout after
icsk->icsk_user_timeout, but would use standard exponential
backoff for the retransmits.
Also worth noting that before commit e89688e3e978 ("net: tcp:
fix unexcepted socket die when snd_wnd is 0"), the issue
would last 2 minutes instead of 4.
Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neal Cardwell <[email protected]>
Reviewed-by: Jason Xing <[email protected]>
Reviewed-by: Jon Maxwell <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The RTL8211F PHY does support LED configuration, document support
for LEDs in the binding document.
Signed-off-by: Marek Vasut <[email protected]>
Reviewed-by: Rob Herring (Arm) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Kumar Kartikeya Dwivedi says:
====================
Fixes for BPF timer lockup and UAF
The following patches contain fixes for timer lockups and a
use-after-free scenario.
This set proposes to fix the following lockup situation for BPF timers.
CPU 1 CPU 2
bpf_timer_cb bpf_timer_cb
timer_cb1 timer_cb2
bpf_timer_cancel(timer_cb2) bpf_timer_cancel(timer_cb1)
hrtimer_cancel hrtimer_cancel
In this case, both callbacks will continue waiting for each other to
finish synchronously, causing a lockup.
The proposed fix adds support for tracking in-flight cancellations
*begun by other timer callbacks* for a particular BPF timer. Whenever
preparing to call hrtimer_cancel, a callback will increment the target
timer's counter, then inspect its in-flight cancellations, and if
non-zero, return -EDEADLK to avoid situations where the target timer's
callback is waiting for its completion.
This does mean that in cases where a callback is fired and cancelled, it
will be unable to cancel any timers in that execution. This can be
alleviated by maintaining the list of waiting callbacks in bpf_hrtimer
and searching through it to avoid interdependencies, but this may
introduce additional delays in bpf_timer_cancel, in addition to
requiring extra state at runtime which may need to be allocated or
reused from bpf_hrtimer storage. Moreover, extra synchronization is
needed to delete these elements from the list of waiting callbacks once
hrtimer_cancel has finished.
The second patch is for a deadlock situation similar to above in
bpf_timer_cancel_and_free, but also a UAF scenario that can occur if
timer is armed before entering it, if hrtimer_running check causes the
hrtimer_cancel call to be skipped.
As seen above, synchronous hrtimer_cancel would lead to deadlock (if
same callback tries to free its timer, or two timers free each other),
therefore we queue work onto the global workqueue to ensure outstanding
timers are cancelled before bpf_hrtimer state is freed.
Further details are in the patches.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Currently, the same case as previous patch (two timer callbacks trying
to cancel each other) can be invoked through bpf_map_update_elem as
well, or more precisely, freeing map elements containing timers. Since
this relies on hrtimer_cancel as well, it is prone to the same deadlock
situation as the previous patch.
It would be sufficient to use hrtimer_try_to_cancel to fix this problem,
as the timer cannot be enqueued after async_cancel_and_free. Once
async_cancel_and_free has been done, the timer must be reinitialized
before it can be armed again. The callback running in parallel trying to
arm the timer will fail, and freeing bpf_hrtimer without waiting is
sufficient (given kfree_rcu), and bpf_timer_cb will return
HRTIMER_NORESTART, preventing the timer from being rearmed again.
However, there exists a UAF scenario where the callback arms the timer
before entering this function, such that if cancellation fails (due to
timer callback invoking this routine, or the target timer callback
running concurrently). In such a case, if the timer expiration is
significantly far in the future, the RCU grace period expiration
happening before it will free the bpf_hrtimer state and along with it
the struct hrtimer, that is enqueued.
Hence, it is clear cancellation needs to occur after
async_cancel_and_free, and yet it cannot be done inline due to deadlock
issues. We thus modify bpf_timer_cancel_and_free to defer work to the
global workqueue, adding a work_struct alongside rcu_head (both used at
_different_ points of time, so can share space).
Update existing code comments to reflect the new state of affairs.
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Given a schedule:
timer1 cb timer2 cb
bpf_timer_cancel(timer2); bpf_timer_cancel(timer1);
Both bpf_timer_cancel calls would wait for the other callback to finish
executing, introducing a lockup.
Add an atomic_t count named 'cancelling' in bpf_hrtimer. This keeps
track of all in-flight cancellation requests for a given BPF timer.
Whenever cancelling a BPF timer, we must check if we have outstanding
cancellation requests, and if so, we must fail the operation with an
error (-EDEADLK) since cancellation is synchronous and waits for the
callback to finish executing. This implies that we can enter a deadlock
situation involving two or more timer callbacks executing in parallel
and attempting to cancel one another.
Note that we avoid incrementing the cancelling counter for the target
timer (the one being cancelled) if bpf_timer_cancel is not invoked from
a callback, to avoid spurious errors. The whole point of detecting
cur->cancelling and returning -EDEADLK is to not enter a busy wait loop
(which may or may not lead to a lockup). This does not apply in case the
caller is in a non-callback context, the other side can continue to
cancel as it sees fit without running into errors.
Background on prior attempts:
Earlier versions of this patch used a bool 'cancelling' bit and used the
following pattern under timer->lock to publish cancellation status.
lock(t->lock);
t->cancelling = true;
mb();
if (cur->cancelling)
return -EDEADLK;
unlock(t->lock);
hrtimer_cancel(t->timer);
t->cancelling = false;
The store outside the critical section could overwrite a parallel
requests t->cancelling assignment to true, to ensure the parallely
executing callback observes its cancellation status.
It would be necessary to clear this cancelling bit once hrtimer_cancel
is done, but lack of serialization introduced races. Another option was
explored where bpf_timer_start would clear the bit when (re)starting the
timer under timer->lock. This would ensure serialized access to the
cancelling bit, but may allow it to be cleared before in-flight
hrtimer_cancel has finished executing, such that lockups can occur
again.
Thus, we choose an atomic counter to keep track of all outstanding
cancellation requests and use it to prevent lockups in case callbacks
attempt to cancel each other while executing in parallel.
Reported-by: Dohyun Kim <[email protected]>
Reported-by: Neel Natu <[email protected]>
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The original function call passed size of smap->bucket before the number of
buckets which raises the error 'calloc-transposed-args' on compilation.
Vlastimil Babka added:
The order of parameters can be traced back all the way to 6ac99e8f23d4
("bpf: Introduce bpf sk local storage") accross several refactorings,
and that's why the commit is used as a Fixes: tag.
In v6.10-rc1, a different commit 2c321f3f70bc ("mm: change inlined
allocation helpers to account at the call site") however exposed the
order of args in a way that gcc-14 has enough visibility to start
warning about it, because (in !CONFIG_MEMCG case) bpf_map_kvcalloc is
then a macro alias for kvcalloc instead of a static inline wrapper.
To sum up the warning happens when the following conditions are all met:
- gcc-14 is used (didn't see it with gcc-13)
- commit 2c321f3f70bc is present
- CONFIG_MEMCG is not enabled in .config
- CONFIG_WERROR turns this from a compiler warning to error
Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
Reviewed-by: Andrii Nakryiko <[email protected]>
Tested-by: Christian Kujau <[email protected]>
Signed-off-by: Mohammad Shehar Yaar Tausif <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"21 hotfixes, 15 of which are cc:stable.
No identifiable theme here - all are singleton patches, 19 are for MM"
* tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
mm/hugetlb: fix potential race in __update_and_free_hugetlb_folio()
filemap: replace pte_offset_map() with pte_offset_map_nolock()
arch/xtensa: always_inline get_current() and current_thread_info()
sched.h: always_inline alloc_tag_{save|restore} to fix modpost warnings
MAINTAINERS: mailmap: update Lorenzo Stoakes's email address
mm: fix crashes from deferred split racing folio migration
lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat
mm: gup: stop abusing try_grab_folio
nilfs2: fix kernel bug on rename operation of broken directory
mm/hugetlb_vmemmap: fix race with speculative PFN walkers
cachestat: do not flush stats in recency check
mm/shmem: disable PMD-sized page cache if needed
mm/filemap: skip to create PMD-sized page cache if needed
mm/readahead: limit page cache size in page_cache_ra_order()
mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray
mm/damon/core: merge regions aggressively when max_nr_regions is unmet
Fix userfaultfd_api to return EINVAL as expected
mm: vmalloc: check if a hash-index is in cpu_possible_mask
mm: prevent derefencing NULL ptr in pfn_section_valid()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"One core change that moves a disk start message to a location where it
will only be printed once instead of twice plus a couple of error
handling race fixes in the ufs driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Do not repeat the starting disk message
scsi: ufs: core: Fix ufshcd_abort_one racing issue
scsi: ufs: core: Fix ufshcd_clear_cmd racing issue
|
|
Geliang Tang says:
====================
v2:
- only check the first "link" (link_nl) in test_mixed_links().
- Drop patch 2 in v1.
This patchset fixes a segfault and a bpf object leak in test_progs.
It is a resend patch 1 out of "skip ENOTSUPP BPF selftests" set as Eduard
suggested. Together with another fix for xdp_adjust_tail.
====================
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
If bpf_object__load() fails in test_xdp_adjust_frags_tail_grow(), "obj"
opened before this should be closed. So use "goto out" to close it instead
of using "return" here.
Fixes: 110221081aac ("bpf: selftests: update xdp_adjust_tail selftest to include xdp frags")
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/f282a1ed2d0e3fb38cceefec8e81cabb69cab260.1720615848.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Run bpf_tcp_ca selftests (./test_progs -t bpf_tcp_ca) on a Loongarch
platform, some "Segmentation fault" errors occur:
'''
test_dctcp:PASS:bpf_dctcp__open_and_load 0 nsec
test_dctcp:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/1 bpf_tcp_ca/dctcp:FAIL
test_cubic:PASS:bpf_cubic__open_and_load 0 nsec
test_cubic:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/2 bpf_tcp_ca/cubic:FAIL
test_dctcp_fallback:PASS:dctcp_skel 0 nsec
test_dctcp_fallback:PASS:bpf_dctcp__load 0 nsec
test_dctcp_fallback:FAIL:dctcp link unexpected error: -524
#29/4 bpf_tcp_ca/dctcp_fallback:FAIL
test_write_sk_pacing:PASS:open_and_load 0 nsec
test_write_sk_pacing:FAIL:attach_struct_ops unexpected error: -524
#29/6 bpf_tcp_ca/write_sk_pacing:FAIL
test_update_ca:PASS:open 0 nsec
test_update_ca:FAIL:attach_struct_ops unexpected error: -524
settcpca:FAIL:setsockopt unexpected setsockopt: \
actual -1 == expected -1
(network_helpers.c:99: errno: No such file or directory) \
Failed to call post_socket_cb
start_test:FAIL:start_server_str unexpected start_server_str: \
actual -1 == expected -1
test_update_ca:FAIL:ca1_ca1_cnt unexpected ca1_ca1_cnt: \
actual 0 <= expected 0
#29/9 bpf_tcp_ca/update_ca:FAIL
#29 bpf_tcp_ca:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x28)[0x5555567ed91c]
linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x7ffffee408b0]
./test_progs(bpf_link__update_map+0x80)[0x555556824a78]
./test_progs(+0x94d68)[0x5555564c4d68]
./test_progs(test_bpf_tcp_ca+0xe8)[0x5555564c6a88]
./test_progs(+0x3bde54)[0x5555567ede54]
./test_progs(main+0x61c)[0x5555567efd54]
/usr/lib64/libc.so.6(+0x22208)[0x7ffff2aaa208]
/usr/lib64/libc.so.6(__libc_start_main+0xac)[0x7ffff2aaa30c]
./test_progs(_start+0x48)[0x55555646bca8]
Segmentation fault
'''
This is because BPF trampoline is not implemented on Loongarch yet,
"link" returned by bpf_map__attach_struct_ops() is NULL. test_progs
crashs when this NULL link passes to bpf_link__update_map(). This
patch adds NULL checks for all links in bpf_tcp_ca to fix these errors.
If "link" is NULL, goto the newly added label "out" to destroy the skel.
v2:
- use "goto out" instead of "return" as Eduard suggested.
Fixes: 06da9f3bd641 ("selftests/bpf: Test switching TCP Congestion Control algorithms.")
Signed-off-by: Geliang Tang <[email protected]>
Reviewed-by: Alan Maguire <[email protected]>
Link: https://lore.kernel.org/r/b4c841492bd4ed97964e4e61e92827ce51bf1dc9.1720615848.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Pull VFIO fix from Alex Williamson:
- Recent stable backports are exposing a bug introduced in the v6.10
development cycle where a counter value is uninitialized. This leads
to regressions in userspace drivers like QEMU where where the kernel
might ask for an arbitrary buffer size or return out of memory itself
based on a bogus value. Zero initialize the counter. (Yi Liu)
* tag 'vfio-v6.10' of https://github.com/awilliam/linux-vfio:
vfio/pci: Init the count variable in collecting hot-reset devices
|
|
Geliang Tang says:
====================
v11:
- new patches 2, 4, 6.
- drop expect_errno from network_helper_opts as Eduard and Martin
suggested.
- drop sockmap_ktls patches from this set.
v10:
- a new patch 10 is added.
- patches 1-6, 8-9 unchanged, only commit logs updated.
- "err = -errno" is used in patches 7, 11, 12 to get the real error
number before checking value of "err".
v9:
- new patches 5-7, new struct member expect_errno for network_helper_opts.
- patches 1-4, 8-9 unchanged.
- update patches 10-11 to make sure all tests pass.
v8:
- only patch 8 updated, to fix errors reported by CI.
v7:
- address Martin's comments in v6. (thanks)
- use MAX(opts->backlog, 0) instead of opts->backlog.
- use connect_to_fd_opts instead connect_to_fd.
- more ASSERT_* to check errors.
v6:
- update patch 6 as Daniel suggested. (thanks)
v5:
- keep make_server and make_client as Eduard suggested.
v4:
- a new patch to use make_sockaddr in sockmap_ktls.
- a new patch to close fd in error path in drop_on_reuseport.
- drop make_server() in patch 7.
- drop make_client() too in patch 9.
v3:
- a new patch to add backlog for network_helper_opts.
- use start_server_str in sockmap_ktls now, not start_server.
v2:
- address Eduard's comments in v1. (thanks)
- fix errors reported by CI.
This patch set uses network helpers in sk_lookup, and drop the local
helpers inetaddr_len() and make_socket().
====================
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
This patch uses public helper connect_fd_to_fd() exported in
network_helpers.h instead of using getsockname() + connect() in
run_lookup_prog() in prog_tests/sk_lookup.c. This can simplify
the code.
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/7077c277cde5a1864cdc244727162fb75c8bb9c5.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
This patch uses public helper start_server_addr() in udp_recv_send()
in prog_tests/sk_lookup.c to simplify the code.
And use ASSERT_OK_FD() to check fd returned by start_server_addr().
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/f11cabfef4a2170ecb66a1e8e2e72116d8f621b3.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
This patch uses public helper start_server_str() to simplify make_server()
in prog_tests/sk_lookup.c.
Add a callback setsockopts() to do all sockopts, set it to post_socket_cb
pointer of struct network_helper_opts. And add a new struct cb_opts to save
the data needed to pass to the callback. Then pass this network_helper_opts
to start_server_str().
Also use ASSERT_OK_FD() to check fd returned by start_server_str().
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/5981539f5591d2c4998c962ef2bf45f34c940548.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
In the error path when update_lookup_map() fails in drop_on_reuseport in
prog_tests/sk_lookup.c, "server1", the fd of server 1, should be closed.
This patch fixes this by using "goto close_srv1" lable instead of "detach"
to close "server1" in this case.
Fixes: 0ab5539f8584 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/86aed33b4b0ea3f04497c757845cff7e8e621a2d.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Add a new dedicated ASSERT macro ASSERT_OK_FD to test whether a socket
FD is valid or not. It can be used to replace macros ASSERT_GT(fd, 0, ""),
ASSERT_NEQ(fd, -1, "") or statements (fd < 0), (fd != -1).
Suggested-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/ded75be86ac630a3a5099739431854c1ec33f0ea.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Some callers expect __start_server() helper to pass their own "backlog"
value to listen() instead of the default of 1. So this patch adds struct
member "backlog" for network_helper_opts to allow callers to set "backlog"
value via start_server_str() helper.
listen(fd, 0 /* backlog */) can be used to enforce syncookie. Meaning
backlog 0 is a legit value.
Using 0 as a default and changing it to 1 here is fine. It makes the test
program easier to write for the common case. Enforcing syncookie mode by
using backlog 0 is a niche use case but it should at least have a way for
the caller to do that. Thus, -ve backlog value is used here for the
syncookie use case. Please see the comment in network_helpers.h for
the details.
Signed-off-by: Geliang Tang <[email protected]>
Link: https://lore.kernel.org/r/1660229659b66eaad07aa2126e9c9fe217eba0dd.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Pull bcachefs fixes from Kent Overstreet:
- Switch some asserts to WARN()
- Fix a few "transaction not locked" asserts in the data read retry
paths and backpointers gc
- Fix a race that would cause the journal to get stuck on a flush
commit
- Add missing fsck checks for the fragmentation LRU
- The usual assorted ssorted syzbot fixes
* tag 'bcachefs-2024-07-10' of https://evilpiepirate.org/git/bcachefs: (22 commits)
bcachefs: Add missing bch2_trans_begin()
bcachefs: Fix missing error check in journal_entry_btree_keys_validate()
bcachefs: Warn on attempting a move with no replicas
bcachefs: bch2_data_update_to_text()
bcachefs: Log mount failure error code
bcachefs: Fix undefined behaviour in eytzinger1_first()
bcachefs: Mark bch_inode_info as SLAB_ACCOUNT
bcachefs: Fix bch2_inode_insert() race path for tmpfiles
closures: fix closure_sync + closure debugging
bcachefs: Fix journal getting stuck on a flush commit
bcachefs: io clock: run timer fns under clock lock
bcachefs: Repair fragmentation_lru in alloc_write_key()
bcachefs: add check for missing fragmentation in check_alloc_to_lru_ref()
bcachefs: bch2_btree_write_buffer_maybe_flush()
bcachefs: Add missing printbuf_tabstops_reset() calls
bcachefs: Fix loop restart in bch2_btree_transactions_read()
bcachefs: Fix bch2_read_retry_nodecode()
bcachefs: Don't use the new_fs() bucket alloc path on an initialized fs
bcachefs: Fix shift greater than integer size
bcachefs: Change bch2_fs_journal_stop() BUG_ON() to warning
...
|
|
In many cases, kernel netfilter functionality is built as modules.
If CONFIG_NF_FLOW_TABLE=m in particular, progs/xdp_flowtable.c
(and hence selftests) will fail to compile, so add a ___local
version of "struct flow_ports".
Fixes: c77e572d3a8c ("selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc")
Signed-off-by: Alan Maguire <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 6.10-rc8
Here's a fix for a long-standing issue in the mos7840 driver that can trigger
a crash when resuming from system suspend.
Included are also some new modem device ids.
All have been in linux-next with no reported issues.
* tag 'usb-serial-6.10-rc8' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: mos7840: fix crash on resume
USB: serial: option: add Rolling RW350-GL variants
USB: serial: option: add support for Foxconn T99W651
USB: serial: option: add Netprisma LCUK54 series modules
|
|
idpf uses Page Pool for data buffers with hardcoded buffer lengths of
4k for "classic" buffers and 2k for "short" ones. This is not flexible
and does not ensure optimal memory usage. Why would you need 4k buffers
when the MTU is 1500?
Use libeth for the data buffers and don't hardcode any buffer sizes. Let
them be calculated from the MTU for "classics" and then divide the
truesize by 2 for "short" ones. The memory usage is now greatly reduced
and 2 buffer queues starts make sense: on frames <= 1024, you'll recycle
(and resync) a page only after 4 HW writes rather than two.
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Currently, idpf uses the following model for the header buffers:
* buffers are allocated via dma_alloc_coherent();
* when receiving, napi_alloc_skb() is called and then the header is
copied to the newly allocated linear part.
This is far from optimal as DMA coherent zone is slow on many systems
and memcpy() neutralizes the idea and benefits of the header split. Not
speaking of that XDP can't be run on DMA coherent buffers, but at the
same time the idea of allocating an skb to run XDP program is ill.
Instead, use libeth to create page_pools for the header buffers, allocate
them dynamically and then build an skb via napi_build_skb() around them
with no memory copy. With one exception...
When you enable header split, you expect you'll always have a separate
header buffer, so that you could reserve headroom and tailroom only
there and then use full buffers for the data. For example, this is how
TCP zerocopy works -- you have to have the payload aligned to PAGE_SIZE.
The current hardware running idpf does *not* guarantee that you'll
always have headers placed separately. For example, on my setup, even
ICMP packets are written as one piece to the data buffers. You can't
build a valid skb around a data buffer in this case.
To not complicate things and not lose TCP zerocopy etc., when such thing
happens, use the empty header buffer and pull either full frame (if it's
short) or the Ethernet header there and build an skb around it. GRO
layer will pull more from the data buffer later. This W/A will hopefully
be removed one day.
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Unlike previous generations, idpf requires more buffer types for optimal
performance. This includes: header buffers, short buffers, and
no-overhead buffers (w/o headroom and tailroom, for TCP zerocopy when
the header split is enabled).
Introduce libeth Rx buffer type and calculate page_pool params
accordingly. All the HW-related details like buffer alignment are still
accounted. For the header buffers, pick 256 bytes as in most places in
the kernel (have you ever seen frames with bigger headers?).
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Page Pool Ethtool stats are deprecated since the Netlink Page Pool
interface introduction.
idpf receives big changes in Rx buffer management, including &page_pool
layout, so keeping these deprecated stats does only harm, not speaking
of that CONFIG_IDPF selects CONFIG_PAGE_POOL_STATS unconditionally,
while the latter is often turned off for better performance.
Remove all the references to PP stats from the Ethtool code. The stats
are still available in their full via the generic Netlink interface.
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
idpf's in-kernel parsed ptype structure is almost identical to the one
used in the previous Intel drivers, which means it can be converted to
use libeth's definitions and even helpers. The only difference is that
it doesn't use a constant table (libie), rather than one obtained from
the device.
Remove the driver counterpart and use libeth's helpers for hashes and
checksums. This slightly optimizes skb fields processing due to faster
checks. Also don't define big static array of ptypes in &idpf_vport --
allocate them dynamically. The pointer to it is anyway cached in
&idpf_rx_queue.
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Currently, all HW supporting idpf supports the singleq model, but none
of it advertises it by default, as splitq is supported and preferred
for multiple reasons. Still, this almost dead code often times adds
hotpath branches and redundant cacheline accesses.
While it can't currently be removed, add CONFIG_IDPF_SINGLEQ and build
the singleq code only when it's enabled manually. This corresponds to
-10 Kb of object code size and a good bunch of hotpath checks.
idpf_is_queue_model_split() works as a gate and compiles out to `true`
when the config option is disabled.
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
It makes no sense to have a second &net_device_ops struct (800 bytes of
rodata) with only one difference in .ndo_start_xmit, which can easily
be just one `if`. This `if` is a drop in the ocean and you won't see
any difference.
Define unified idpf_xmit_start(). The preparation for sending is the
same, just call either idpf_tx_splitq_frame() or idpf_tx_singleq_frame()
depending on the active model to actually map and send the skb.
Reviewed-by: Przemek Kitszel <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|