aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2021-12-19flow_offload: allow user to offload tc action to net deviceBaowen Zheng14-23/+231
Use flow_indr_dev_register/flow_indr_dev_setup_offload to offload tc action. We need to call tc_cleanup_flow_action to clean up tc action entry since in tc_setup_action, some actions may hold dev refcnt, especially the mirror action. Signed-off-by: Baowen Zheng <[email protected]> Signed-off-by: Louis Peens <[email protected]> Signed-off-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-12-19flow_offload: add ops to tc_action_ops for flow action setupBaowen Zheng13-208/+394
Add a new ops to tc_action_ops for flow action setup. Refactor function tc_setup_flow_action to use this new ops. We make this change to facilitate to add standalone action module. We will also use this ops to offload action independent of filter in following patch. Signed-off-by: Baowen Zheng <[email protected]> Signed-off-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-12-19flow_offload: rename offload functions with offload instead of flowBaowen Zheng3-14/+14
To improves readability, we rename offload functions with offload instead of flow. The term flow is related to exact matches, so we rename these functions with offload. We make this change to facilitate single action offload functions naming. Signed-off-by: Baowen Zheng <[email protected]> Signed-off-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-12-19flow_offload: add index to flow_action_entry structureBaowen Zheng1-2/+1
Add index to flow_action_entry structure and delete index from police and gate child structure. We make this change to offload tc action for driver to identify a tc action. Signed-off-by: Baowen Zheng <[email protected]> Signed-off-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-12-19flow_offload: fill flags to action structureBaowen Zheng14-14/+14
Fill flags to action structure to allow user control if the action should be offloaded to hardware or not. Signed-off-by: Baowen Zheng <[email protected]> Signed-off-by: Louis Peens <[email protected]> Signed-off-by: Simon Horman <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-12-18bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.Hao Luo1-32/+32
Some helper functions may modify its arguments, for example, bpf_d_path, bpf_get_stack etc. Previously, their argument types were marked as ARG_PTR_TO_MEM, which is compatible with read-only mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate, but technically incorrect, to modify a read-only memory by passing it into one of such helper functions. This patch tags the bpf_args compatible with immutable memory with MEM_RDONLY flag. The arguments that don't have this flag will be only compatible with mutable memory types, preventing the helper from modifying a read-only memory. The bpf_args that have MEM_RDONLY are compatible with both mutable memory and immutable memory. Signed-off-by: Hao Luo <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-12-18bpf: Introduce MEM_RDONLY flagHao Luo2-2/+2
This patch introduce a flag MEM_RDONLY to tag a reg value pointing to read-only memory. It makes the following changes: 1. PTR_TO_RDWR_BUF -> PTR_TO_BUF 2. PTR_TO_RDONLY_BUF -> PTR_TO_BUF | MEM_RDONLY Signed-off-by: Hao Luo <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-12-18bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULLHao Luo2-2/+2
We have introduced a new type to make bpf_reg composable, by allocating bits in the type to represent flags. One of the flags is PTR_MAYBE_NULL which indicates a pointer may be NULL. This patch switches the qualified reg_types to use this flag. The reg_types changed in this patch include: 1. PTR_TO_MAP_VALUE_OR_NULL 2. PTR_TO_SOCKET_OR_NULL 3. PTR_TO_SOCK_COMMON_OR_NULL 4. PTR_TO_TCP_SOCK_OR_NULL 5. PTR_TO_BTF_ID_OR_NULL 6. PTR_TO_MEM_OR_NULL 7. PTR_TO_RDONLY_BUF_OR_NULL 8. PTR_TO_RDWR_BUF_OR_NULL Signed-off-by: Hao Luo <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-12-18xdp: move the if dev statements to the firstYajun Deng1-5/+5
The xdp_rxq_info_unreg() called by xdp_rxq_info_reg() is meaningless when dev is NULL, so move the if dev statements to the first. Signed-off-by: Yajun Deng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-12-18ax25: NPD bug when detaching AX25 deviceLin Ma1-1/+3
The existing cleanup routine implementation is not well synchronized with the syscall routine. When a device is detaching, below race could occur. static int ax25_sendmsg(...) { ... lock_sock() ax25 = sk_to_ax25(sk); if (ax25->ax25_dev == NULL) // CHECK ... ax25_queue_xmit(skb, ax25->ax25_dev->dev); // USE ... } static void ax25_kill_by_device(...) { ... if (s->ax25_dev == ax25_dev) { s->ax25_dev = NULL; ... } Other syscall functions like ax25_getsockopt, ax25_getname, ax25_info_show also suffer from similar races. To fix them, this patch introduce lock_sock() into ax25_kill_by_device in order to guarantee that the nullify action in cleanup routine cannot proceed when another socket request is pending. Signed-off-by: Hanjie Wu <[email protected]> Signed-off-by: Lin Ma <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-12-189p/trans_virtio: Fix typo in the comment for p9_virtio_create()zhuxinran1-1/+1
couldlook ==> could look Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: zhuxinran <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
2021-12-17mptcp: clean up harmless false expressionsJean Sacren1-4/+4
entry->addr.id is u8 with a range from 0 to 255 and MAX_ADDR_ID is 255. We should drop both false expressions of (entry->addr.id > MAX_ADDR_ID). We should also remove the obsolete parentheses in the first if branch. Use U8_MAX for MAX_ADDR_ID and add a comment to show the link to mptcp_addr_info.id as suggested by Mr. Matthieu Baerts. Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Jean Sacren <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-17mptcp: enforce HoL-blocking estimationPaolo Abeni2-25/+48
The MPTCP packet scheduler has sub-optimal behavior with asymmetric subflows: if the faster subflow-level cwin is closed, the packet scheduler can enqueue "too much" data on a slower subflow. When all the data on the faster subflow is acked, if the mptcp-level cwin is closed, and link utilization becomes suboptimal. The solution is implementing blest-like[1] HoL-blocking estimation, transmitting only on the subflow with the shorter estimated time to flush the queued memory. If such subflows cwin is closed, we wait even if other subflows are available. This is quite simpler than the original blest implementation, as we leverage the pacing rate provided by the TCP socket. To get a more accurate estimation for the subflow linger-time, we maintain a per-subflow weighted average of such info. Additionally drop magic numbers usage in favor of newly defined macros and use more meaningful names for status variable. [1] http://dl.ifip.org/db/conf/networking/networking2016/1570234725.pdf Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/137 Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-17Revert "tipc: use consistent GFP flags"Hoang Le1-4/+4
This reverts commit 86c3a3e964d910a62eeb277d60b2a60ebefa9feb. The tipc_aead_init() function can be calling from an interrupt routine. This allocation might sleep with GFP_KERNEL flag, hence the following BUG is reported. [ 17.657509] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:230 [ 17.660916] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/3 [ 17.664093] preempt_count: 302, expected: 0 [ 17.665619] RCU nest depth: 2, expected: 0 [ 17.667163] Preemption disabled at: [ 17.667165] [<0000000000000000>] 0x0 [ 17.669753] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G W 5.16.0-rc4+ #1 [ 17.673006] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 17.675540] Call Trace: [ 17.676285] <IRQ> [ 17.676913] dump_stack_lvl+0x34/0x44 [ 17.678033] __might_resched.cold+0xd6/0x10f [ 17.679311] kmem_cache_alloc_trace+0x14d/0x220 [ 17.680663] tipc_crypto_start+0x4a/0x2b0 [tipc] [ 17.682146] ? kmem_cache_alloc_trace+0xd3/0x220 [ 17.683545] tipc_node_create+0x2f0/0x790 [tipc] [ 17.684956] tipc_node_check_dest+0x72/0x680 [tipc] [ 17.686706] ? ___cache_free+0x31/0x350 [ 17.688008] ? skb_release_data+0x128/0x140 [ 17.689431] tipc_disc_rcv+0x479/0x510 [tipc] [ 17.690904] tipc_rcv+0x71c/0x730 [tipc] [ 17.692219] ? __netif_receive_skb_core+0xb7/0xf60 [ 17.693856] tipc_l2_rcv_msg+0x5e/0x90 [tipc] [ 17.695333] __netif_receive_skb_list_core+0x20b/0x260 [ 17.697072] netif_receive_skb_list_internal+0x1bf/0x2e0 [ 17.698870] ? dev_gro_receive+0x4c2/0x680 [ 17.700255] napi_complete_done+0x6f/0x180 [ 17.701657] virtnet_poll+0x29c/0x42e [virtio_net] [ 17.703262] __napi_poll+0x2c/0x170 [ 17.704429] net_rx_action+0x22f/0x280 [ 17.705706] __do_softirq+0xfd/0x30a [ 17.706921] common_interrupt+0xa4/0xc0 [ 17.708206] </IRQ> [ 17.708922] <TASK> [ 17.709651] asm_common_interrupt+0x1e/0x40 [ 17.711078] RIP: 0010:default_idle+0x18/0x20 Fixes: 86c3a3e964d9 ("tipc: use consistent GFP flags") Acked-by: Jon Maloy <[email protected]> Signed-off-by: Hoang Le <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-17net: openvswitch: Fix matching zone id for invalid conns arriving from tcPaul Blakey2-1/+8
Zone id is not restored if we passed ct and ct rejected the connection, as there is no ct info on the skb. Save the zone from tc skb cb to tc skb extension and pass it on to ovs, use that info to restore the zone id for invalid connections. Fixes: d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct") Signed-off-by: Paul Blakey <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-17net/sched: flow_dissector: Fix matching on zone id for invalid connsPaul Blakey3-2/+5
If ct rejects a flow, it removes the conntrack info from the skb. act_ct sets the post_ct variable so the dissector will see this case as an +tracked +invalid state, but the zone id is lost with the conntrack info. To restore the zone id on such cases, set the last executed zone, via the tc control block, when passing ct, and read it back in the dissector if there is no ct info on the skb (invalid connection). Fixes: 7baf2429a1a9 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support") Signed-off-by: Paul Blakey <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-17net/sched: Extend qdisc control block with tc control blockPaul Blakey5-15/+19
BPF layer extends the qdisc control block via struct bpf_skb_data_end and because of that there is no more room to add variables to the qdisc layer control block without going over the skb->cb size. Extend the qdisc control block with a tc control block, and move all tc related variables to there as a pre-step for extending the tc control block with additional members. Signed-off-by: Paul Blakey <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-17Revert "xsk: Do not sleep in poll() when need_wakeup set"Magnus Karlsson1-2/+2
This reverts commit bd0687c18e635b63233dc87f38058cd728802ab4. This patch causes a Tx only workload to go to sleep even when it does not have to, leading to misserable performance in skb mode. It fixed one rare problem but created a much worse one, so this need to be reverted while I try to craft a proper solution to the original problem. Fixes: bd0687c18e63 ("xsk: Do not sleep in poll() when need_wakeup set") Signed-off-by: Magnus Karlsson <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-12-17bus: mhi: core: Add an API for auto queueing buffers for DL channelManivannan Sadhasivam1-1/+1
Add a new API "mhi_prepare_for_transfer_autoqueue" for using with client drivers like QRTR to request MHI core to autoqueue buffers for the DL channel along with starting both UL and DL channels. So far, the "auto_queue" flag specified by the controller drivers in channel definition served this purpose but this will be removed at some point in future. Cc: [email protected] Cc: Jakub Kicinski <[email protected]> Cc: David S. Miller <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Co-developed-by: Loic Poulain <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: Loic Poulain <[email protected]> Signed-off-by: Manivannan Sadhasivam <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2021-12-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller4-6/+9
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix UAF in set catch-all element, from Eric Dumazet. 2) Fix MAC mangling for multicast/loopback traffic in nfnetlink_queue and nfnetlink_log, from Ignacy Gawędzki. 3) Remove expired entries from ctnetlink dump path regardless the tuple direction, from Florian Westphal. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-12-17xfrm: state and policy should fail if XFRMA_IF_ID 0Antony Antony1-3/+18
xfrm ineterface does not allow xfrm if_id = 0 fail to create or update xfrm state and policy. With this commit: ip xfrm policy add src 192.0.2.1 dst 192.0.2.2 dir out if_id 0 RTNETLINK answers: Invalid argument ip xfrm state add src 192.0.2.1 dst 192.0.2.2 proto esp spi 1 \ reqid 1 mode tunnel aead 'rfc4106(gcm(aes))' \ 0x1111111111111111111111111111111111111111 96 if_id 0 RTNETLINK answers: Invalid argument v1->v2 change: - add Fixes: tag Fixes: 9f8550e4bd9d ("xfrm: fix disable_xfrm sysctl when used on xfrm interfaces") Signed-off-by: Antony Antony <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2021-12-17xfrm: interface with if_id 0 should return errorAntony Antony1-2/+12
xfrm interface if_id = 0 would cause xfrm policy lookup errors since Commit 9f8550e4bd9d. Now explicitly fail to create an xfrm interface when if_id = 0 With this commit: ip link add ipsec0 type xfrm dev lo if_id 0 Error: if_id must be non zero. v1->v2 change: - add Fixes: tag Fixes: 9f8550e4bd9d ("xfrm: fix disable_xfrm sysctl when used on xfrm interfaces") Signed-off-by: Antony Antony <[email protected]> Reviewed-by: Eyal Birger <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2021-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski25-57/+115
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-16Merge tag 'net-5.16-rc6' of ↵Linus Torvalds24-56/+113
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from mac80211, wifi, bpf. Relatively large batches of fixes from BPF and the WiFi stack, calm in general networking. Current release - regressions: - dpaa2-eth: fix buffer overrun when reporting ethtool statistics Current release - new code bugs: - bpf: fix incorrect state pruning for <8B spill/fill - iavf: - add missing unlocks in iavf_watchdog_task() - do not override the adapter state in the watchdog task (again) - mlxsw: spectrum_router: consolidate MAC profiles when possible Previous releases - regressions: - mac80211 fixes: - rate control, avoid driver crash for retransmitted frames - regression in SSN handling of addba tx - a memory leak where sta_info is not freed - marking TX-during-stop for TX in in_reconfig, prevent stall - cfg80211: acquire wiphy mutex on regulatory work - wifi drivers: fix build regressions and LED config dependency - virtio_net: fix rx_drops stat for small pkts - dsa: mv88e6xxx: unforce speed & duplex in mac_link_down() Previous releases - always broken: - bpf fixes: - kernel address leakage in atomic fetch - kernel address leakage in atomic cmpxchg's r0 aux reg - signed bounds propagation after mov32 - extable fixup offset - extable address check - mac80211: - fix the size used for building probe request - send ADDBA requests using the tid/queue of the aggregation session - agg-tx: don't schedule_and_wake_txq() under sta->lock, avoid deadlocks - validate extended element ID is present - mptcp: - never allow the PM to close a listener subflow (null-defer) - clear 'kern' flag from fallback sockets, prevent crash - fix deadlock in __mptcp_push_pending() - inet_diag: fix kernel-infoleak for UDP sockets - xsk: do not sleep in poll() when need_wakeup set - smc: avoid very long waits in smc_release() - sch_ets: don't remove idle classes from the round-robin list - netdevsim: - zero-initialize memory for bpf map's value, prevent info leak - don't let user space overwrite read only (max) ethtool parms - ixgbe: set X550 MDIO speed before talking to PHY - stmmac: - fix null-deref in flower deletion w/ VLAN prio Rx steering - dwmac-rk: fix oob read in rk_gmac_setup - ice: time stamping fixes - systemport: add global locking for descriptor life cycle" * tag 'net-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (89 commits) bpf, selftests: Fix racing issue in btf_skc_cls_ingress test selftest/bpf: Add a test that reads various addresses. bpf: Fix extable address check. bpf: Fix extable fixup offset. bpf, selftests: Add test case trying to taint map value pointer bpf: Make 32->64 bounds propagation slightly more robust bpf: Fix signed bounds propagation after mov32 sit: do not call ipip6_dev_free() from sit_init_net() net: systemport: Add global locking for descriptor lifecycle net/smc: Prevent smc_release() from long blocking net: Fix double 0x prefix print in SKB dump virtio_net: fix rx_drops stat for small pkts dsa: mv88e6xxx: fix debug print for SPEED_UNFORCED sfc_ef100: potential dereference of null pointer net: stmmac: dwmac-rk: fix oob read in rk_gmac_setup net: usb: lan78xx: add Allied Telesis AT29M2-AF net/packet: rx_owner_map depends on pg_vec netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc dpaa2-eth: fix ethtool statistics ixgbe: set X550 MDIO speed before talking to PHY ...
2021-12-16add missing bpf-cgroup.h includesJakub Kicinski3-0/+3
We're about to break the cgroup-defs.h -> bpf-cgroup.h dependency, make sure those who actually need more than the definition of struct cgroup_bpf include bpf-cgroup.h explicitly. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-12-16Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski1-2/+2
Daniel Borkmann says: ==================== pull-request: bpf 2021-12-16 We've added 15 non-merge commits during the last 7 day(s) which contain a total of 12 files changed, 434 insertions(+), 30 deletions(-). The main changes are: 1) Fix incorrect verifier state pruning behavior for <8B register spill/fill, from Paul Chaignon. 2) Fix x86-64 JIT's extable handling for fentry/fexit when return pointer is an ERR_PTR(), from Alexei Starovoitov. 3) Fix 3 different possibilities that BPF verifier missed where unprivileged could leak kernel addresses, from Daniel Borkmann. 4) Fix xsk's poll behavior under need_wakeup flag, from Magnus Karlsson. 5) Fix an oob-write in test_verifier due to a missed MAX_NR_MAPS bump, from Kumar Kartikeya Dwivedi. 6) Fix a race in test_btf_skc_cls_ingress selftest, from Martin KaFai Lau. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, selftests: Fix racing issue in btf_skc_cls_ingress test selftest/bpf: Add a test that reads various addresses. bpf: Fix extable address check. bpf: Fix extable fixup offset. bpf, selftests: Add test case trying to taint map value pointer bpf: Make 32->64 bounds propagation slightly more robust bpf: Fix signed bounds propagation after mov32 bpf, selftests: Update test case for atomic cmpxchg on r0 with pointer bpf: Fix kernel address leakage in atomic cmpxchg's r0 aux reg bpf, selftests: Add test case for atomic fetch on spilled pointer bpf: Fix kernel address leakage in atomic fetch selftests/bpf: Fix OOB write in test_verifier xsk: Do not sleep in poll() when need_wakeup set selftests/bpf: Tests for state pruning with u32 spill/fill bpf: Fix incorrect state pruning for <8B spill/fill ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-16sit: do not call ipip6_dev_free() from sit_init_net()Eric Dumazet1-1/+0
ipip6_dev_free is sit dev->priv_destructor, already called by register_netdevice() if something goes wrong. Alternative would be to make ipip6_dev_free() robust against multiple invocations, but other drivers do not implement this strategy. syzbot reported: dst_release underflow WARNING: CPU: 0 PID: 5059 at net/core/dst.c:173 dst_release+0xd8/0xe0 net/core/dst.c:173 Modules linked in: CPU: 1 PID: 5059 Comm: syz-executor.4 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:dst_release+0xd8/0xe0 net/core/dst.c:173 Code: 4c 89 f2 89 d9 31 c0 5b 41 5e 5d e9 da d5 44 f9 e8 1d 90 5f f9 c6 05 87 48 c6 05 01 48 c7 c7 80 44 99 8b 31 c0 e8 e8 67 29 f9 <0f> 0b eb 85 0f 1f 40 00 53 48 89 fb e8 f7 8f 5f f9 48 83 c3 a8 48 RSP: 0018:ffffc9000aa5faa0 EFLAGS: 00010246 RAX: d6894a925dd15a00 RBX: 00000000ffffffff RCX: 0000000000040000 RDX: ffffc90005e19000 RSI: 000000000003ffff RDI: 0000000000040000 RBP: 0000000000000000 R08: ffffffff816a1f42 R09: ffffed1017344f2c R10: ffffed1017344f2c R11: 0000000000000000 R12: 0000607f462b1358 R13: 1ffffffff1bfd305 R14: ffffe8ffffcb1358 R15: dffffc0000000000 FS: 00007f66c71a2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f88aaed5058 CR3: 0000000023e0f000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> dst_cache_destroy+0x107/0x1e0 net/core/dst_cache.c:160 ipip6_dev_free net/ipv6/sit.c:1414 [inline] sit_init_net+0x229/0x550 net/ipv6/sit.c:1936 ops_init+0x313/0x430 net/core/net_namespace.c:140 setup_net+0x35b/0x9d0 net/core/net_namespace.c:326 copy_net_ns+0x359/0x5c0 net/core/net_namespace.c:470 create_new_namespaces+0x4ce/0xa00 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0x11e/0x180 kernel/nsproxy.c:226 ksys_unshare+0x57d/0xb50 kernel/fork.c:3075 __do_sys_unshare kernel/fork.c:3146 [inline] __se_sys_unshare kernel/fork.c:3144 [inline] __x64_sys_unshare+0x34/0x40 kernel/fork.c:3144 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f66c882ce99 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f66c71a2168 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00007f66c893ff60 RCX: 00007f66c882ce99 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000048040200 RBP: 00007f66c8886ff1 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fff6634832f R14: 00007f66c71a2300 R15: 0000000000022000 </TASK> Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-16net/smc: Prevent smc_release() from long blockingD. Wythe1-1/+3
In nginx/wrk benchmark, there's a hung problem with high probability on case likes that: (client will last several minutes to exit) server: smc_run nginx client: smc_run wrk -c 10000 -t 1 http://server Client hangs with the following backtrace: 0 [ffffa7ce8Of3bbf8] __schedule at ffffffff9f9eOd5f 1 [ffffa7ce8Of3bc88] schedule at ffffffff9f9eløe6 2 [ffffa7ce8Of3bcaO] schedule_timeout at ffffffff9f9e3f3c 3 [ffffa7ce8Of3bd2O] wait_for_common at ffffffff9f9el9de 4 [ffffa7ce8Of3bd8O] __flush_work at ffffffff9fOfeOl3 5 [ffffa7ce8øf3bdfO] smc_release at ffffffffcO697d24 [smc] 6 [ffffa7ce8Of3be2O] __sock_release at ffffffff9f8O2e2d 7 [ffffa7ce8Of3be4ø] sock_close at ffffffff9f8ø2ebl 8 [ffffa7ce8øf3be48] __fput at ffffffff9f334f93 9 [ffffa7ce8Of3be78] task_work_run at ffffffff9flOlff5 10 [ffffa7ce8Of3beaO] do_exit at ffffffff9fOe5Ol2 11 [ffffa7ce8Of3bflO] do_group_exit at ffffffff9fOe592a 12 [ffffa7ce8Of3bf38] __x64_sys_exit_group at ffffffff9fOe5994 13 [ffffa7ce8Of3bf4O] do_syscall_64 at ffffffff9f9d4373 14 [ffffa7ce8Of3bfsO] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c This issue dues to flush_work(), which is used to wait for smc_connect_work() to finish in smc_release(). Once lots of smc_connect_work() was pending or all executing work dangling, smc_release() has to block until one worker comes to free, which is equivalent to wait another smc_connnect_work() to finish. In order to fix this, There are two changes: 1. For those idle smc_connect_work(), cancel it from the workqueue; for executing smc_connect_work(), waiting for it to finish. For that purpose, replace flush_work() with cancel_work_sync(). 2. Since smc_connect() hold a reference for passive closing, if smc_connect_work() has been cancelled, release the reference. Fixes: 24ac3a08e658 ("net/smc: rebuild nonblocking connect") Reported-by: Tony Lu <[email protected]> Tested-by: Dust Li <[email protected]> Reviewed-by: Dust Li <[email protected]> Reviewed-by: Tony Lu <[email protected]> Signed-off-by: D. Wythe <[email protected]> Acked-by: Karsten Graul <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-16fib: expand fib_rule_policyFlorian Westphal1-1/+17
Now that there is only one fib nla_policy there is no need to keep the macro around. Place it where its used. Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-16fib: rules: remove duplicated nla policiesFlorian Westphal6-28/+7
The attributes are identical in all implementations so move the ipv4 one into the core and remove the per-family nla policies. Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-16netfilter: ctnetlink: remove expired entries firstFlorian Westphal1-2/+3
When dumping conntrack table to userspace via ctnetlink, check if the ct has already expired before doing any of the 'skip' checks. This expires dead entries faster. /proc handler also removes outdated entries first. Reported-by: Vitaly Zuevsky <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-12-16netfilter: nf_nat_masquerade: add netns refcount tracker to masq_dev_workEric Dumazet1-1/+3
If compiled with CONFIG_NET_NS_REFCNT_TRACKER=y, using put_net_track() in iterate_cleanup_work() and netns_tracker_alloc() in nf_nat_masq_schedule() might help us finding netns refcount imbalances. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-12-16netfilter: nfnetlink: add netns refcount tracker to struct nfulnl_instanceEric Dumazet1-2/+3
If compiled with CONFIG_NET_NS_REFCNT_TRACKER=y, using put_net_track() in nfulnl_instance_free_rcu() and get_net_track() in instance_create() might help us finding netns refcount imbalances. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-12-16net: Fix double 0x prefix print in SKB dumpGal Pressman1-1/+1
When printing netdev features %pNF already takes care of the 0x prefix, remove the explicit one. Fixes: 6413139dfc64 ("skbuff: increase verbosity when dumping skb data") Signed-off-by: Gal Pressman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-12-15net/packet: rx_owner_map depends on pg_vecWillem de Bruijn1-2/+3
Packet sockets may switch ring versions. Avoid misinterpreting state between versions, whose fields share a union. rx_owner_map is only allocated with a packet ring (pg_vec) and both are swapped together. If pg_vec is NULL, meaning no packet ring was allocated, then neither was rx_owner_map. And the field may be old state from a tpacket_v3. Fixes: 61fad6816fc1 ("net/packet: tpacket_rcv: avoid a producer race condition") Reported-by: Syzbot <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextJakub Kicinski6-22/+12
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, mostly rather small housekeeping patches: 1) Remove unused variable in IPVS, from GuoYong Zheng. 2) Use memset_after in conntrack, from Kees Cook. 3) Remove leftover function in nfnetlink_queue, from Florian Westphal. 4) Remove redundant test on bool in conntrack, from Bernard Zhao. 5) egress support for nft_fwd, from Lukas Wunner. 6) Make pppoe work for br_netfilter, from Florian Westphal. 7) Remove unused variable in conntrack resize routine, from luo penghao. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: netfilter: conntrack: Remove useless assignment statements netfilter: bridge: add support for pppoe filtering netfilter: nft_fwd_netdev: Support egress hook netfilter: ctnetlink: remove useless type conversion to bool netfilter: nf_queue: remove leftover synchronize_rcu netfilter: conntrack: Use memset_startat() to zero struct nf_conn ipvs: remove unused variable for ip_vs_new_dest ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-16netfilter: conntrack: Remove useless assignment statementsluo penghao1-1/+0
The old_size assignment here will not be used anymore The clang_analyzer complains as follows: Value stored to 'old_size' is never read Reported-by: Zeal Robot <[email protected]> Signed-off-by: luo penghao <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-12-16netfilter: fix regression in looped (broad|multi)cast's MAC handlingIgnacy Gawędzki2-2/+4
In commit 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac header was cleared"), the test for non-empty MAC header introduced in commit 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC handling") has been replaced with a test for a set MAC header. This breaks the case when the MAC header has been reset (using skb_reset_mac_header), as is the case with looped-back multicast packets. As a result, the packets ending up in NFQUEUE get a bogus hwaddr interpreted from the first bytes of the IP header. This patch adds a test for a non-empty MAC header in addition to the test for a set MAC header. The same two tests are also implemented in nfnetlink_log.c, where the initial code of commit 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC handling") has not been touched, but where supposedly the same situation may happen. Fixes: 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac header was cleared") Signed-off-by: Ignacy Gawędzki <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-12-15netfilter: nf_tables: fix use-after-free in nft_set_catchall_destroy()Eric Dumazet1-2/+2
We need to use list_for_each_entry_safe() iterator because we can not access @catchall after kfree_rcu() call. syzbot reported: BUG: KASAN: use-after-free in nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4486 [inline] BUG: KASAN: use-after-free in nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline] BUG: KASAN: use-after-free in nft_set_destroy+0x3fd/0x4f0 net/netfilter/nf_tables_api.c:4493 Read of size 8 at addr ffff8880716e5b80 by task syz-executor.3/8871 CPU: 1 PID: 8871 Comm: syz-executor.3 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x2ed mm/kasan/report.c:247 __kasan_report mm/kasan/report.c:433 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:450 nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4486 [inline] nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline] nft_set_destroy+0x3fd/0x4f0 net/netfilter/nf_tables_api.c:4493 __nft_release_table+0x79f/0xcd0 net/netfilter/nf_tables_api.c:9626 nft_rcv_nl_event+0x4f8/0x670 net/netfilter/nf_tables_api.c:9688 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 blocking_notifier_call_chain kernel/notifier.c:318 [inline] blocking_notifier_call_chain+0x67/0x90 kernel/notifier.c:306 netlink_release+0xcb6/0x1dd0 net/netlink/af_netlink.c:788 __sock_release+0xcd/0x280 net/socket.c:649 sock_close+0x18/0x20 net/socket.c:1314 __fput+0x286/0x9f0 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:175 [inline] exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f75fbf28adb Code: 0f 05 48 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8 63 fc ff ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 fc ff ff 8b 44 RSP: 002b:00007ffd8da7ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f75fbf28adb RDX: 00007f75fc08e828 RSI: ffffffffffffffff RDI: 0000000000000003 RBP: 00007f75fc08a960 R08: 0000000000000000 R09: 00007f75fc08e830 R10: 00007ffd8da7ed10 R11: 0000000000000293 R12: 00000000002067c3 R13: 00007ffd8da7ed10 R14: 00007f75fc088f60 R15: 0000000000000032 </TASK> Allocated by task 8886: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:434 [inline] ____kasan_kmalloc mm/kasan/common.c:513 [inline] ____kasan_kmalloc mm/kasan/common.c:472 [inline] __kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:522 kasan_kmalloc include/linux/kasan.h:269 [inline] kmem_cache_alloc_trace+0x1ea/0x4a0 mm/slab.c:3575 kmalloc include/linux/slab.h:590 [inline] nft_setelem_catchall_insert net/netfilter/nf_tables_api.c:5544 [inline] nft_setelem_insert net/netfilter/nf_tables_api.c:5562 [inline] nft_add_set_elem+0x232e/0x2f40 net/netfilter/nf_tables_api.c:5936 nf_tables_newsetelem+0x6ff/0xbb0 net/netfilter/nf_tables_api.c:6032 nfnetlink_rcv_batch+0x1710/0x25f0 net/netfilter/nfnetlink.c:513 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x904/0xdf0 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 15335: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free mm/kasan/common.c:328 [inline] __kasan_slab_free+0xd1/0x110 mm/kasan/common.c:374 kasan_slab_free include/linux/kasan.h:235 [inline] __cache_free mm/slab.c:3445 [inline] kmem_cache_free_bulk+0x67/0x1e0 mm/slab.c:3766 kfree_bulk include/linux/slab.h:446 [inline] kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3273 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Last potentially related work creation: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 __kasan_record_aux_stack+0xb5/0xe0 mm/kasan/generic.c:348 kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3550 nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4489 [inline] nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline] nft_set_destroy+0x34a/0x4f0 net/netfilter/nf_tables_api.c:4493 __nft_release_table+0x79f/0xcd0 net/netfilter/nf_tables_api.c:9626 nft_rcv_nl_event+0x4f8/0x670 net/netfilter/nf_tables_api.c:9688 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 blocking_notifier_call_chain kernel/notifier.c:318 [inline] blocking_notifier_call_chain+0x67/0x90 kernel/notifier.c:306 netlink_release+0xcb6/0x1dd0 net/netlink/af_netlink.c:788 __sock_release+0xcd/0x280 net/socket.c:649 sock_close+0x18/0x20 net/socket.c:1314 __fput+0x286/0x9f0 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:175 [inline] exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff8880716e5b80 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [ffff8880716e5b80, ffff8880716e5bc0) The buggy address belongs to the page: page:ffffea0001c5b940 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8880716e5c00 pfn:0x716e5 flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000000200 ffffea0000911848 ffffea00007c4d48 ffff888010c40200 raw: ffff8880716e5c00 ffff8880716e5000 000000010000001e 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x242040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE), pid 3638, ts 211086074437, free_ts 211031029429 prep_new_page mm/page_alloc.c:2418 [inline] get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369 __alloc_pages_node include/linux/gfp.h:570 [inline] kmem_getpages mm/slab.c:1377 [inline] cache_grow_begin+0x75/0x470 mm/slab.c:2593 cache_alloc_refill+0x27f/0x380 mm/slab.c:2965 ____cache_alloc mm/slab.c:3048 [inline] ____cache_alloc mm/slab.c:3031 [inline] __do_cache_alloc mm/slab.c:3275 [inline] slab_alloc mm/slab.c:3316 [inline] __do_kmalloc mm/slab.c:3700 [inline] __kmalloc+0x3b3/0x4d0 mm/slab.c:3711 kmalloc include/linux/slab.h:595 [inline] kzalloc include/linux/slab.h:724 [inline] tomoyo_get_name+0x234/0x480 security/tomoyo/memory.c:173 tomoyo_parse_name_union+0xbc/0x160 security/tomoyo/util.c:260 tomoyo_update_path_number_acl security/tomoyo/file.c:687 [inline] tomoyo_write_file+0x629/0x7f0 security/tomoyo/file.c:1034 tomoyo_write_domain2+0x116/0x1d0 security/tomoyo/common.c:1152 tomoyo_add_entry security/tomoyo/common.c:2042 [inline] tomoyo_supervisor+0xbc7/0xf00 security/tomoyo/common.c:2103 tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline] tomoyo_path_number_perm+0x419/0x590 security/tomoyo/file.c:734 security_file_ioctl+0x50/0xb0 security/security.c:1541 __do_sys_ioctl fs/ioctl.c:868 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0xb3/0x200 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1338 [inline] free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389 free_unref_page_prepare mm/page_alloc.c:3309 [inline] free_unref_page+0x19/0x690 mm/page_alloc.c:3388 slab_destroy mm/slab.c:1627 [inline] slabs_destroy+0x89/0xc0 mm/slab.c:1647 cache_flusharray mm/slab.c:3418 [inline] ___cache_free+0x4cc/0x610 mm/slab.c:3480 qlink_free mm/kasan/quarantine.c:146 [inline] qlist_free_all+0x4e/0x110 mm/kasan/quarantine.c:165 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272 __kasan_slab_alloc+0x97/0xb0 mm/kasan/common.c:444 kasan_slab_alloc include/linux/kasan.h:259 [inline] slab_post_alloc_hook mm/slab.h:519 [inline] slab_alloc_node mm/slab.c:3261 [inline] kmem_cache_alloc_node+0x2ea/0x590 mm/slab.c:3599 __alloc_skb+0x215/0x340 net/core/skbuff.c:414 alloc_skb include/linux/skbuff.h:1126 [inline] nlmsg_new include/net/netlink.h:953 [inline] rtmsg_ifinfo_build_skb+0x72/0x1a0 net/core/rtnetlink.c:3808 rtmsg_ifinfo_event net/core/rtnetlink.c:3844 [inline] rtmsg_ifinfo_event net/core/rtnetlink.c:3835 [inline] rtmsg_ifinfo+0x83/0x120 net/core/rtnetlink.c:3853 netdev_state_change net/core/dev.c:1395 [inline] netdev_state_change+0x114/0x130 net/core/dev.c:1386 linkwatch_do_dev+0x10e/0x150 net/core/link_watch.c:167 __linkwatch_run_queue+0x233/0x6a0 net/core/link_watch.c:213 linkwatch_event+0x4a/0x60 net/core/link_watch.c:252 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 Memory state around the buggy address: ffff8880716e5a80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8880716e5b00: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc >ffff8880716e5b80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff8880716e5c00: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8880716e5c80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-12-15ethtool: always write dev in ethnl_parse_header_dev_getJakub Kicinski1-3/+2
Commit 0976b888a150 ("ethtool: fix null-ptr-deref on ref tracker") made the write to req_info.dev conditional, but as Eric points out in a different follow up the structure is often allocated on the stack and not kzalloc()'d so seems safer to always write the dev, in case it's garbage on input. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-12-15net: add net device refcount tracker to struct packet_typeEric Dumazet2-5/+13
Most notable changes are in af_packet, tipc ones are trivial. Signed-off-by: Eric Dumazet <[email protected]> Cc: Jon Maloy <[email protected]> Cc: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-12-15ethtool: use ethnl_parse_header_dev_put()Eric Dumazet16-18/+23
It seems I missed that most ethnl_parse_header_dev_get() callers declare an on-stack struct ethnl_req_info, and that they simply call dev_put(req_info.dev) when about to return. Add ethnl_parse_header_dev_put() helper to properly untrack reference taken by ethnl_parse_header_dev_get(). Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-12-14mptcp: fix deadlock in __mptcp_push_pending()Maxim Galaganov1-1/+1
__mptcp_push_pending() may call mptcp_flush_join_list() with subflow socket lock held. If such call hits mptcp_sockopt_sync_all() then subsequently __mptcp_sockopt_sync() could try to lock the subflow socket for itself, causing a deadlock. sysrq: Show Blocked State task:ss-server state:D stack: 0 pid: 938 ppid: 1 flags:0x00000000 Call Trace: <TASK> __schedule+0x2d6/0x10c0 ? __mod_memcg_state+0x4d/0x70 ? csum_partial+0xd/0x20 ? _raw_spin_lock_irqsave+0x26/0x50 schedule+0x4e/0xc0 __lock_sock+0x69/0x90 ? do_wait_intr_irq+0xa0/0xa0 __lock_sock_fast+0x35/0x50 mptcp_sockopt_sync_all+0x38/0xc0 __mptcp_push_pending+0x105/0x200 mptcp_sendmsg+0x466/0x490 sock_sendmsg+0x57/0x60 __sys_sendto+0xf0/0x160 ? do_wait_intr_irq+0xa0/0xa0 ? fpregs_restore_userregs+0x12/0xd0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f9ba546c2d0 RSP: 002b:00007ffdc3b762d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f9ba56c8060 RCX: 00007f9ba546c2d0 RDX: 000000000000077a RSI: 0000000000e5e180 RDI: 0000000000000234 RBP: 0000000000cc57f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ba56c8060 R13: 0000000000b6ba60 R14: 0000000000cc7840 R15: 41d8685b1d7901b8 </TASK> Fix the issue by using __mptcp_flush_join_list() instead of plain mptcp_flush_join_list() inside __mptcp_push_pending(), as suggested by Florian. The sockopt sync will be deferred to the workqueue. Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/244 Suggested-by: Florian Westphal <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Signed-off-by: Maxim Galaganov <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-14mptcp: clear 'kern' flag from fallback socketsFlorian Westphal1-1/+3
The mptcp ULP extension relies on sk->sk_sock_kern being set correctly: It prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from working for plain tcp sockets (any userspace-exposed socket). But in case of fallback, accept() can return a plain tcp sk. In such case, sk is still tagged as 'kernel' and setsockopt will work. This will crash the kernel, The subflow extension has a NULL ctx->conn mptcp socket: BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0 Call Trace: tcp_data_ready+0xf8/0x370 [..] Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-14mptcp: remove tcp ulp setsockopt supportFlorian Westphal1-1/+0
TCP_ULP setsockopt cannot be used for mptcp because its already used internally to plumb subflow (tcp) sockets to the mptcp layer. syzbot managed to trigger a crash for mptcp connections that are in fallback mode: KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027] CPU: 1 PID: 1083 Comm: syz-executor.3 Not tainted 5.16.0-rc2-syzkaller #0 RIP: 0010:tls_build_proto net/tls/tls_main.c:776 [inline] [..] __tcp_set_ulp net/ipv4/tcp_ulp.c:139 [inline] tcp_set_ulp+0x428/0x4c0 net/ipv4/tcp_ulp.c:160 do_tcp_setsockopt+0x455/0x37c0 net/ipv4/tcp.c:3391 mptcp_setsockopt+0x1b47/0x2400 net/mptcp/sockopt.c:638 Remove support for TCP_ULP setsockopt. Fixes: d9e4c1291810 ("mptcp: only admit explicitly supported sockopt") Reported-by: [email protected] Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-14net: linkwatch: be more careful about dev->linkwatch_dev_trackerEric Dumazet1-1/+12
Apparently a concurrent linkwatch_add_event() could run while we are in __linkwatch_run_queue(). We need to free dev->linkwatch_dev_tracker tracker under lweventlist_lock protection to avoid this race. syzbot report: [ 77.935949][ T3661] reference already released. [ 77.941015][ T3661] allocated in: [ 77.944482][ T3661] linkwatch_fire_event+0x202/0x260 [ 77.950318][ T3661] netif_carrier_on+0x9c/0x100 [ 77.955120][ T3661] __ieee80211_sta_join_ibss+0xc52/0x1590 [ 77.960888][ T3661] ieee80211_sta_create_ibss.cold+0xd2/0x11f [ 77.966908][ T3661] ieee80211_ibss_work.cold+0x30e/0x60f [ 77.972483][ T3661] ieee80211_iface_work+0xb70/0xd00 [ 77.977715][ T3661] process_one_work+0x9ac/0x1680 [ 77.982671][ T3661] worker_thread+0x652/0x11c0 [ 77.987371][ T3661] kthread+0x405/0x4f0 [ 77.991465][ T3661] ret_from_fork+0x1f/0x30 [ 77.995895][ T3661] freed in: [ 77.999006][ T3661] linkwatch_do_dev+0x96/0x160 [ 78.004014][ T3661] __linkwatch_run_queue+0x233/0x6a0 [ 78.009496][ T3661] linkwatch_event+0x4a/0x60 [ 78.014099][ T3661] process_one_work+0x9ac/0x1680 [ 78.019034][ T3661] worker_thread+0x652/0x11c0 [ 78.023719][ T3661] kthread+0x405/0x4f0 [ 78.027810][ T3661] ret_from_fork+0x1f/0x30 [ 78.042541][ T3661] ------------[ cut here ]------------ [ 78.048253][ T3661] WARNING: CPU: 0 PID: 3661 at lib/ref_tracker.c:120 ref_tracker_free.cold+0x110/0x14e [ 78.062364][ T3661] Modules linked in: [ 78.066424][ T3661] CPU: 0 PID: 3661 Comm: kworker/0:5 Not tainted 5.16.0-rc4-next-20211210-syzkaller #0 [ 78.076075][ T3661] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ 78.090648][ T3661] Workqueue: events linkwatch_event [ 78.095890][ T3661] RIP: 0010:ref_tracker_free.cold+0x110/0x14e [ 78.102191][ T3661] Code: ea 03 48 c1 e0 2a 0f b6 04 02 84 c0 74 04 3c 03 7e 4c 8b 7b 18 e8 6b 54 e9 fa e8 26 4d 57 f8 4c 89 ee 48 89 ef e8 fb 33 36 00 <0f> 0b 41 bd ea ff ff ff e9 bd 60 e9 fa 4c 89 f7 e8 16 45 a2 f8 e9 [ 78.127211][ T3661] RSP: 0018:ffffc90002b5fb18 EFLAGS: 00010246 [ 78.133684][ T3661] RAX: 0000000000000000 RBX: ffff88807467f700 RCX: 0000000000000000 [ 78.141928][ T3661] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001 [ 78.150087][ T3661] RBP: ffff888057e105b8 R08: 0000000000000001 R09: ffffffff8ffa1967 [ 78.158211][ T3661] R10: 0000000000000001 R11: 0000000000000000 R12: 1ffff9200056bf65 [ 78.166204][ T3661] R13: 0000000000000292 R14: ffff88807467f718 R15: 00000000c0e0008c [ 78.174321][ T3661] FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 [ 78.183310][ T3661] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 78.190156][ T3661] CR2: 000000c000208800 CR3: 000000007f7b5000 CR4: 00000000003506f0 [ 78.198235][ T3661] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 78.206214][ T3661] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 78.214328][ T3661] Call Trace: [ 78.217679][ T3661] <TASK> [ 78.220621][ T3661] ? __sanitizer_cov_trace_const_cmp4+0x1c/0x70 [ 78.226981][ T3661] ? nlmsg_notify+0xbe/0x280 [ 78.231607][ T3661] ? ref_tracker_dir_exit+0x330/0x330 [ 78.237654][ T3661] ? linkwatch_do_dev+0x96/0x160 [ 78.242628][ T3661] ? __linkwatch_run_queue+0x233/0x6a0 [ 78.248170][ T3661] ? linkwatch_event+0x4a/0x60 [ 78.252946][ T3661] ? process_one_work+0x9ac/0x1680 [ 78.258136][ T3661] ? worker_thread+0x853/0x11c0 [ 78.263020][ T3661] ? kthread+0x405/0x4f0 [ 78.267905][ T3661] ? ret_from_fork+0x1f/0x30 [ 78.272670][ T3661] ? netdev_state_change+0xa1/0x130 [ 78.278019][ T3661] ? netdev_exit+0xd0/0xd0 [ 78.282466][ T3661] ? dev_activate+0x420/0xa60 [ 78.287261][ T3661] linkwatch_do_dev+0x96/0x160 [ 78.292043][ T3661] __linkwatch_run_queue+0x233/0x6a0 [ 78.297505][ T3661] ? linkwatch_do_dev+0x160/0x160 [ 78.302561][ T3661] linkwatch_event+0x4a/0x60 [ 78.307225][ T3661] process_one_work+0x9ac/0x1680 [ 78.312292][ T3661] ? pwq_dec_nr_in_flight+0x2a0/0x2a0 [ 78.317757][ T3661] ? rwlock_bug.part.0+0x90/0x90 [ 78.322726][ T3661] ? _raw_spin_lock_irq+0x41/0x50 [ 78.327844][ T3661] worker_thread+0x853/0x11c0 [ 78.332543][ T3661] ? process_one_work+0x1680/0x1680 [ 78.338500][ T3661] kthread+0x405/0x4f0 [ 78.342610][ T3661] ? set_kthread_struct+0x130/0x130 Fixes: 63f13937cbe9 ("net: linkwatch: add net device refcount tracker") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-14mptcp: adjust to use netns refcount trackerEric Dumazet1-1/+1
MPTCP can change sk_net_refcnt after sock_create_kern() call. We need to change its corresponding get_net() to avoid a splat at release time, as in : refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 0 PID: 3599 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 1 PID: 3599 Comm: syz-fuzzer Not tainted 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d b1 99 a1 09 31 ff 89 de e8 5d 3a 9c fd 84 db 75 e0 e8 74 36 9c fd 48 c7 c7 60 00 05 8a c6 05 91 99 a1 09 01 e8 cc 4b 27 05 <0f> 0b eb c4 e8 58 36 9c fd 0f b6 1d 80 99 a1 09 31 ff 89 de e8 28 RSP: 0018:ffffc90001f5fab0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888021873a00 RSI: ffffffff815f1e28 RDI: fffff520003ebf48 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815ebbce R11: 0000000000000000 R12: 1ffff920003ebf5b R13: 00000000ffffffef R14: ffffffff8d2fcd94 R15: ffffc90001f5fd10 FS: 000000c00008a090(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0a5b59e300 CR3: 000000001cbe6000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] ref_tracker_free+0x4fe/0x610 lib/ref_tracker.c:101 netns_tracker_free include/net/net_namespace.h:327 [inline] put_net_track include/net/net_namespace.h:341 [inline] __sk_destruct+0x4a6/0x920 net/core/sock.c:2042 sk_destruct+0xbd/0xe0 net/core/sock.c:2058 __sk_free+0xef/0x3d0 net/core/sock.c:2069 sk_free+0x78/0xa0 net/core/sock.c:2080 sock_put include/net/sock.h:1911 [inline] __mptcp_close_ssk+0x435/0x590 net/mptcp/protocol.c:2276 __mptcp_destroy_sock+0x35f/0x830 net/mptcp/protocol.c:2702 mptcp_close+0x5f8/0x7f0 net/mptcp/protocol.c:2750 inet_release+0x12e/0x280 net/ipv4/af_inet.c:428 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:476 __sock_release+0xcd/0x280 net/socket.c:649 sock_close+0x18/0x20 net/socket.c:1314 __fput+0x286/0x9f0 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:175 [inline] exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: ffa84b5ffb37 ("net: add netns refcount tracker to struct sock") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Cc: Mat Martineau <[email protected]> Cc: Florian Westphal <[email protected]> Acked-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-14ipv6: use GFP_ATOMIC in rt6_probe()Eric Dumazet1-1/+1
syzbot reminded me that rt6_probe() can run from atomic contexts. stack backtrace: CPU: 1 PID: 7461 Comm: syz-executor.2 Not tainted 5.16.0-rc4-next-20211210-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_usage_bug kernel/locking/lockdep.c:203 [inline] valid_state kernel/locking/lockdep.c:3945 [inline] mark_lock_irq kernel/locking/lockdep.c:4148 [inline] mark_lock.cold+0x61/0x8e kernel/locking/lockdep.c:4605 mark_usage kernel/locking/lockdep.c:4500 [inline] __lock_acquire+0x11d5/0x54a0 kernel/locking/lockdep.c:4981 lock_acquire kernel/locking/lockdep.c:5639 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604 __fs_reclaim_acquire mm/page_alloc.c:4550 [inline] fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4564 might_alloc include/linux/sched/mm.h:253 [inline] slab_pre_alloc_hook mm/slab.h:739 [inline] slab_alloc_node mm/slub.c:3145 [inline] slab_alloc mm/slub.c:3239 [inline] kmem_cache_alloc_trace+0x3b/0x2c0 mm/slub.c:3256 kmalloc include/linux/slab.h:581 [inline] kzalloc include/linux/slab.h:715 [inline] ref_tracker_alloc+0xe1/0x430 lib/ref_tracker.c:74 netdev_tracker_alloc include/linux/netdevice.h:3860 [inline] dev_hold_track include/linux/netdevice.h:3877 [inline] rt6_probe net/ipv6/route.c:661 [inline] find_match.part.0+0xac9/0xd00 net/ipv6/route.c:752 find_match net/ipv6/route.c:825 [inline] __find_rr_leaf+0x17f/0xd20 net/ipv6/route.c:826 find_rr_leaf net/ipv6/route.c:847 [inline] rt6_select net/ipv6/route.c:891 [inline] fib6_table_lookup+0x649/0xa20 net/ipv6/route.c:2185 ip6_pol_route+0x1c5/0x11e0 net/ipv6/route.c:2221 pol_lookup_func include/net/ip6_fib.h:580 [inline] fib6_rule_lookup+0x52a/0x6f0 net/ipv6/fib6_rules.c:120 ip6_route_output_flags_noref+0x2e2/0x380 net/ipv6/route.c:2629 ip6_route_output_flags+0x72/0x320 net/ipv6/route.c:2642 ip6_route_output include/net/ip6_route.h:98 [inline] ip6_dst_lookup_tail+0x5ab/0x1620 net/ipv6/ip6_output.c:1070 ip6_dst_lookup_flow+0x8c/0x1d0 net/ipv6/ip6_output.c:1200 geneve_get_v6_dst+0x46f/0x9a0 drivers/net/geneve.c:858 geneve6_xmit_skb drivers/net/geneve.c:991 [inline] geneve_xmit+0x520/0x3530 drivers/net/geneve.c:1074 __netdev_start_xmit include/linux/netdevice.h:4685 [inline] netdev_start_xmit include/linux/netdevice.h:4699 [inline] xmit_one net/core/dev.c:3473 [inline] dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489 __dev_queue_xmit+0x2983/0x3640 net/core/dev.c:4112 neigh_resolve_output net/core/neighbour.c:1522 [inline] neigh_resolve_output+0x50e/0x820 net/core/neighbour.c:1502 neigh_output include/net/neighbour.h:541 [inline] ip6_finish_output2+0x56e/0x14f0 net/ipv6/ip6_output.c:126 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline] __ip6_finish_output+0x61e/0xe80 net/ipv6/ip6_output.c:170 ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224 dst_output include/net/dst.h:451 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] ndisc_send_skb+0xa99/0x17f0 net/ipv6/ndisc.c:508 ndisc_send_rs+0x12e/0x6f0 net/ipv6/ndisc.c:702 addrconf_rs_timer+0x3f2/0x820 net/ipv6/addrconf.c:3898 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421 expire_timers kernel/time/timer.c:1466 [inline] __run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734 __run_timers kernel/time/timer.c:1715 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> Fixes: fb67510ba9bd ("ipv6: add net device refcount tracker to rt6_probe_deferred()") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-14xsk: Do not sleep in poll() when need_wakeup setMagnus Karlsson1-2/+2
Do not sleep in poll() when the need_wakeup flag is set. When this flag is set, the application needs to explicitly wake up the driver with a syscall (poll, recvmsg, sendmsg, etc.) to guarantee that Rx and/or Tx processing will be processed promptly. But the current code in poll(), sleeps first then wakes up the driver. This means that no driver processing will occur (baring any interrupts) until the timeout has expired. Fix this by checking the need_wakeup flag first and if set, wake the driver and return to the application. Only if need_wakeup is not set should the process sleep if there is a timeout set in the poll() call. Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings") Reported-by: Keith Wiles <[email protected]> Signed-off-by: Magnus Karlsson <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-12-14Revert "pktgen: use min() to make code cleaner"David S. Miller1-2/+3
This reverts commit 13510fef48a3803d9ee8f044b015dacfb06fe0f5. Causes build warnings. Signed-off-by: David S. Miller <[email protected]>