aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2020-04-30net: bridge: vlan: Add a schedule point during VLAN processingIdo Schimmel1-0/+1
User space can request to delete a range of VLANs from a bridge slave in one netlink request. For each deleted VLAN the FDB needs to be traversed in order to flush all the affected entries. If a large range of VLANs is deleted and the number of FDB entries is large or the FDB lock is contented, it is possible for the kernel to loop through the deleted VLANs for a long time. In case preemption is disabled, this can result in a soft lockup. Fix this by adding a schedule point after each VLAN is deleted to yield the CPU, if needed. This is safe because the VLANs are traversed in process context. Fixes: bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: Stefan Priebe - Profihost AG <[email protected]> Tested-by: Stefan Priebe - Profihost AG <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-30mptcp: fix uninitialized value accessPaolo Abeni1-1/+1
tcp_v{4,6}_syn_recv_sock() set 'own_req' only when returning a not NULL 'child', let's check 'own_req' only if child is available to avoid an - unharmful - UBSAN splat. v1 -> v2: - reference the correct hash Fixes: 4c8941de781c ("mptcp: avoid flipping mp_capable field in syn_recv_sock()") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-30mptcp: initialize the data_fin field for mpc packetsPaolo Abeni1-0/+1
When parsing MPC+data packets we set the dss field, so we must also initialize the data_fin, or we can find stray value there. Fixes: 9a19371bf029 ("mptcp: fix data_fin handing in RX path") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-30mptcp: fix 'use_ack' option access.Paolo Abeni1-1/+1
The mentioned RX option field is initialized only for DSS packet, we must access it only if 'dss' is set too, or the subflow will end-up in a bad status, leading to RFC violations. Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-30mptcp: avoid a WARN on bad input.Paolo Abeni1-2/+2
Syzcaller has found a way to trigger the WARN_ON_ONCE condition in check_fully_established(). The root cause is a legit fallback to TCP scenario, so replace the WARN with a plain message on a more strict condition. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-30mptcp: move option parsing into mptcp_incoming_options()Paolo Abeni5-69/+115
The mptcp_options_received structure carries several per packet flags (mp_capable, mp_join, etc.). Such fields must be cleared on each packet, even on dropped ones or packet not carrying any MPTCP options, but the current mptcp code clears them only on TCP option reset. On several races/corner cases we end-up with stray bits in incoming options, leading to WARN_ON splats. e.g.: [ 171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713 [ 171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531) [ 171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata [ 171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95 [ 171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 [ 171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531) [ 171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c [ 171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282 [ 171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e [ 171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955 [ 171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca [ 171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020 [ 171.228460] FS: 00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000 [ 171.230065] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0 [ 171.232586] Call Trace: [ 171.233109] <IRQ> [ 171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691) [ 171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832) [ 171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1)) [ 171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217) [ 171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822) [ 171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730) [ 171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009) [ 171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1)) [ 171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232) [ 171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252) [ 171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539) [ 171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135) [ 171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640) [ 171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293) [ 171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083) [ 171.282358] </IRQ> We could address the issue clearing explicitly the relevant fields in several places - tcp_parse_option, tcp_fast_parse_options, possibly others. Instead we move the MPTCP option parsing into the already existing mptcp ingress hook, so that we need to clear the fields in a single place. This allows us dropping an MPTCP hook from the TCP code and removing the quite large mptcp_options_received from the tcp_sock struct. On the flip side, the MPTCP sockets will traverse the option space twice (in tcp_parse_option() and in mptcp_incoming_options(). That looks acceptable: we already do that for syn and 3rd ack packets, plain TCP socket will benefit from it, and even MPTCP sockets will experience better code locality, reducing the jumps between TCP and MPTCP code. v1 -> v2: - rebased on current '-net' tree Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-30mptcp: consolidate synack processing.Paolo Abeni3-28/+24
Currently the MPTCP code uses 2 hooks to process syn-ack packets, mptcp_rcv_synsent() and the sk_rx_dst_set() callback. We can drop the first, moving the relevant code into the latter, reducing the hooking into the TCP code. This is also needed by the next patch. v1 -> v2: - use local tcp sock ptr instead of casting the sk variable several times - DaveM Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-29mptcp: replace mptcp_disconnect with a stubFlorian Westphal1-5/+6
Paolo points out that mptcp_disconnect is bogus: "lock_sock(sk); looks suspicious (lock should be already held by the caller) And call to: tcp_disconnect(sk, flags); too, sk is not a tcp socket". ->disconnect() gets called from e.g. inet_stream_connect when one tries to disassociate a connected socket again (to re-connect without closing the socket first). MPTCP however uses mptcp_stream_connect, not inet_stream_connect, for the mptcp-socket connect call. inet_stream_connect only gets called indirectly, for the tcp socket, so any ->disconnect() calls end up calling tcp_disconnect for that tcp subflow sk. This also explains why syzkaller has not yet reported a problem here. So for now replace this with a stub that doesn't do anything. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/14 Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-28net/x25: Fix null-ptr-deref in x25_disconnectYueHaibing1-4/+6
We should check null before do x25_neigh_put in x25_disconnect, otherwise may cause null-ptr-deref like this: #include <sys/socket.h> #include <linux/x25.h> int main() { int sck_x25; sck_x25 = socket(AF_X25, SOCK_SEQPACKET, 0); close(sck_x25); return 0; } BUG: kernel NULL pointer dereference, address: 00000000000000d8 CPU: 0 PID: 4817 Comm: t2 Not tainted 5.7.0-rc3+ #159 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3- RIP: 0010:x25_disconnect+0x91/0xe0 Call Trace: x25_release+0x18a/0x1b0 __sock_release+0x3d/0xc0 sock_close+0x13/0x20 __fput+0x107/0x270 ____fput+0x9/0x10 task_work_run+0x6d/0xb0 exit_to_usermode_loop+0x102/0x110 do_syscall_64+0x23c/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Reported-by: [email protected] Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect") Signed-off-by: YueHaibing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-27Merge tag 'batadv-net-for-davem-20200427' of git://git.open-mesh.org/linux-mergeDavid S. Miller3-10/+4
Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - fix random number generation in network coding, by George Spelvin - fix reference counter leaks, by Xiyu Yang (3 patches) ==================== Signed-off-by: David S. Miller <[email protected]>
2020-04-27sch_sfq: validate silly quantum valuesEric Dumazet1-0/+9
syzbot managed to set up sfq so that q->scaled_quantum was zero, triggering an infinite loop in sfq_dequeue() More generally, we must only accept quantum between 1 and 2^18 - 7, meaning scaled_quantum must be in [1, 0x7FFF] range. Otherwise, we also could have a loop in sfq_dequeue() if scaled_quantum happens to be 0x8000, since slot->allot could indefinitely switch between 0 and 0x8000. Fixes: eeaeb068f139 ("sch_sfq: allow big packets and be fair") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: [email protected] Cc: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-27sch_choke: avoid potential panic in choke_reset()Eric Dumazet1-1/+2
If choke_init() could not allocate q->tab, we would crash later in choke_reset(). BUG: KASAN: null-ptr-deref in memset include/linux/string.h:366 [inline] BUG: KASAN: null-ptr-deref in choke_reset+0x208/0x340 net/sched/sch_choke.c:326 Write of size 8 at addr 0000000000000000 by task syz-executor822/7022 CPU: 1 PID: 7022 Comm: syz-executor822 Not tainted 5.7.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 __kasan_report.cold+0x5/0x4d mm/kasan/report.c:515 kasan_report+0x33/0x50 mm/kasan/common.c:625 check_memory_region_inline mm/kasan/generic.c:187 [inline] check_memory_region+0x141/0x190 mm/kasan/generic.c:193 memset+0x20/0x40 mm/kasan/common.c:85 memset include/linux/string.h:366 [inline] choke_reset+0x208/0x340 net/sched/sch_choke.c:326 qdisc_reset+0x6b/0x520 net/sched/sch_generic.c:910 dev_deactivate_queue.constprop.0+0x13c/0x240 net/sched/sch_generic.c:1138 netdev_for_each_tx_queue include/linux/netdevice.h:2197 [inline] dev_deactivate_many+0xe2/0xba0 net/sched/sch_generic.c:1195 dev_deactivate+0xf8/0x1c0 net/sched/sch_generic.c:1233 qdisc_graft+0xd25/0x1120 net/sched/sch_api.c:1051 tc_modify_qdisc+0xbab/0x1a00 net/sched/sch_api.c:1670 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5454 netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362 ___sys_sendmsg+0x100/0x170 net/socket.c:2416 __sys_sendmsg+0xec/0x1b0 net/socket.c:2449 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 Fixes: 77e62da6e60c ("sch_choke: drop all packets in queue during reset") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Cc: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-27fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checksEric Dumazet1-1/+1
My intent was to not let users set a zero drop_batch_size, it seems I once again messed with min()/max(). Fixes: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()") Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-27net/tls: Fix sk_psock refcnt leak when in tls_data_ready()Xiyu Yang1-2/+3
tls_data_ready() invokes sk_psock_get(), which returns a reference of the specified sk_psock object to "psock" with increased refcnt. When tls_data_ready() returns, local variable "psock" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of tls_data_ready(). When "psock->ingress_msg" is empty but "psock" is not NULL, the function forgets to decrease the refcnt increased by sk_psock_get(), causing a refcnt leak. Fix this issue by calling sk_psock_put() on all paths when "psock" is not NULL. Signed-off-by: Xiyu Yang <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-27net/x25: Fix x25_neigh refcnt leak when x25 disconnectXiyu Yang1-0/+4
x25_connect() invokes x25_get_neigh(), which returns a reference of the specified x25_neigh object to "x25->neighbour" with increased refcnt. When x25 connect success and returns, the reference still be hold by "x25->neighbour", so the refcount should be decreased in x25_disconnect() to keep refcount balanced. The reference counting issue happens in x25_disconnect(), which forgets to decrease the refcnt increased by x25_get_neigh() in x25_connect(), causing a refcnt leak. Fix this issue by calling x25_neigh_put() before x25_disconnect() returns. Signed-off-by: Xiyu Yang <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-27net/tls: Fix sk_psock refcnt leak in bpf_exec_tx_verdict()Xiyu Yang1-0/+2
bpf_exec_tx_verdict() invokes sk_psock_get(), which returns a reference of the specified sk_psock object to "psock" with increased refcnt. When bpf_exec_tx_verdict() returns, local variable "psock" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of bpf_exec_tx_verdict(). When "policy" equals to NULL but "psock" is not NULL, the function forgets to decrease the refcnt increased by sk_psock_get(), causing a refcnt leak. Fix this issue by calling sk_psock_put() on this error path before bpf_exec_tx_verdict() returns. Signed-off-by: Xiyu Yang <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-27vsock/virtio: fix multiple packet delivery to monitoring devicesStefano Garzarella1-0/+4
In virtio_transport.c, if the virtqueue is full, the transmitting packet is queued up and it will be sent in the next iteration. This causes the same packet to be delivered multiple times to monitoring devices. We want to continue to deliver packets to monitoring devices before it is put in the virtqueue, to avoid that replies can appear in the packet capture before the transmitted packet. This patch fixes the issue, adding a new flag (tap_delivered) in struct virtio_vsock_pkt, to check if the packet is already delivered to monitoring devices. In vhost/vsock.c, we are splitting packets, so we must set 'tap_delivered' to false when we queue up the same virtio_vsock_pkt to handle the remaining bytes. Signed-off-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-25net: remove obsolete commentEric Dumazet1-1/+0
Commit b656722906ef ("net: Increase the size of skb_frag_t") removed the 16bit limitation of a frag on some 32bit arches. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-25mptcp: fix race in msk status updatePaolo Abeni1-1/+1
Currently subflow_finish_connect() changes unconditionally any msk socket status other than TCP_ESTABLISHED. If an unblocking connect() races with close(), we can end-up triggering: IPv4: Attempt to release TCP socket in state 1 00000000e32b8b7e when the msk socket is disposed. Be sure to enter the established status only from SYN_SENT. Fixes: c3c123d16c0e ("net: mptcp: don't hang in mptcp_sendmsg() after TCP fallback") Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds29-105/+191
Pull networking fixes from David Miller: 1) Fix memory leak in netfilter flowtable, from Roi Dayan. 2) Ref-count leaks in netrom and tipc, from Xiyu Yang. 3) Fix warning when mptcp socket is never accepted before close, from Florian Westphal. 4) Missed locking in ovs_ct_exit(), from Tonghao Zhang. 5) Fix large delays during PTP synchornization in cxgb4, from Rahul Lakkireddy. 6) team_mode_get() can hang, from Taehee Yoo. 7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver, from Niklas Schnelle. 8) Fix handling of bpf XADD on BTF memory, from Jann Horn. 9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson. 10) Missing queue memory release in iwlwifi pcie code, from Johannes Berg. 11) Fix NULL deref in macvlan device event, from Taehee Yoo. 12) Initialize lan87xx phy correctly, from Yuiko Oshino. 13) Fix looping between VRF and XFRM lookups, from David Ahern. 14) etf packet scheduler assumes all sockets are full sockets, which is not necessarily true. From Eric Dumazet. 15) Fix mptcp data_fin handling in RX path, from Paolo Abeni. 16) fib_select_default() needs to handle nexthop objects, from David Ahern. 17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun. 18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca. 19) Correct rx/tx stats in bcmgenet driver, from Doug Berger. 20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix from Luke Nelson. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits) selftests/bpf: Fix a couple of broken test_btf cases tools/runqslower: Ensure own vmlinux.h is picked up first bpf: Make bpf_link_fops static bpftool: Respect the -d option in struct_ops cmd selftests/bpf: Add test for freplace program with expected_attach_type bpf: Propagate expected_attach_type when verifying freplace programs bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd bpf, x86_32: Fix logic error in BPF_LDX zero-extension bpf, x86_32: Fix clobbering of dst for BPF_JSET bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension bpf: Fix reStructuredText markup net: systemport: suppress warnings on failed Rx SKB allocations net: bcmgenet: suppress warnings on failed Rx SKB allocations macsec: avoid to set wrong mtu mac80211: sta_info: Add lockdep condition for RCU list usage mac80211: populate debugfs only after cfg80211 init net: bcmgenet: correct per TX/RX ring statistics net: meth: remove spurious copyright text net: phy: bcm84881: clear settings on link down chcr: Fix CPU hard lockup ...
2020-04-24Merge tag 'mac80211-for-net-2020-04-24' of ↵David S. Miller5-20/+45
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Just three changes: * fix a wrong GFP_KERNEL in hwsim * fix the debugfs mess after the mac80211 registration race fix * suppress false-positive RCU list lockdep warnings ==================== Signed-off-by: David S. Miller <[email protected]>
2020-04-24mac80211: sta_info: Add lockdep condition for RCU list usageMadhuparna Bhowmik1-1/+2
The function sta_info_get_by_idx() uses RCU list primitive. It is called with local->sta_mtx held from mac80211/cfg.c. Add lockdep expression to avoid any false positive RCU list warnings. Signed-off-by: Madhuparna Bhowmik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-04-24mac80211: populate debugfs only after cfg80211 initJohannes Berg4-19/+43
When fixing the initialization race, we neglected to account for the fact that debugfs is initialized in wiphy_register(), and some debugfs things went missing (or rather were rerooted to the global debugfs root). Fix this by adding debugfs entries only after wiphy_register(). This requires some changes in the rate control code since it currently adds debugfs at alloc time, which can no longer be done after the reordering. Reported-by: Jouni Malinen <[email protected]> Reported-by: kernel test robot <[email protected]> Reported-by: Hauke Mehrtens <[email protected]> Reported-by: Felix Fietkau <[email protected]> Cc: [email protected] Fixes: 52e04b4ce5d0 ("mac80211: fix race in ieee80211_register_hw()") Signed-off-by: Johannes Berg <[email protected]> Acked-by: Sumit Garg <[email protected]> Link: https://lore.kernel.org/r/20200423111344.0e00d3346f12.Iadc76a03a55093d94391fc672e996a458702875d@changeid Signed-off-by: Johannes Berg <[email protected]>
2020-04-23net/x25: Fix x25_neigh refcnt leak when receiving frameXiyu Yang1-1/+3
x25_lapb_receive_frame() invokes x25_get_neigh(), which returns a reference of the specified x25_neigh object to "nb" with increased refcnt. When x25_lapb_receive_frame() returns, local variable "nb" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one path of x25_lapb_receive_frame(). When pskb_may_pull() returns false, the function forgets to decrease the refcnt increased by x25_get_neigh(), causing a refcnt leak. Fix this issue by calling x25_neigh_put() when pskb_may_pull() returns false. Fixes: cb101ed2c3c7 ("x25: Handle undersized/fragmented skbs") Signed-off-by: Xiyu Yang <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-23mptcp/pm_netlink.c : add check for nla_put_in/6_addrBo YU1-5/+7
Normal there should be checked for nla_put_in6_addr like other usage in net. Detected by CoverityScan, CID# 1461639 Fixes: 01cacb00b35c ("mptcp: add netlink-based PM") Signed-off-by: Bo YU <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-23Merge tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6Linus Torvalds9-29/+47
Pull nfsd fixes from Chuck Lever: "The first set of 5.7-rc fixes for NFS server issues. These were all unresolved at the time the 5.7 window opened, and needed some additional time to ensure they were correctly addressed. They are ready now. At the moment I know of one more urgent issue regarding the NFS server. A fix has been tested and is under review. I expect to send one more pull request, containing this fix (which now consists of 3 patches). Fixes: - Address several use-after-free and memory leak bugs - Prevent a backchannel livelock" * tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6: svcrdma: Fix leak of svc_rdma_recv_ctxt objects svcrdma: Fix trace point use-after-free race SUNRPC: Fix backchannel RPC soft lockups SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge nfsd: memory corruption in nfsd4_lock()
2020-04-22ipv4: Update fib_select_default to handle nexthop objectsDavid Ahern1-3/+3
A user reported [0] hitting the WARN_ON in fib_info_nh: [ 8633.839816] ------------[ cut here ]------------ [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381 ... [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381 ... [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286 [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000 [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000 [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0 [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60 [ 8633.839857] FS: 00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000 [ 8633.839858] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0 [ 8633.839867] Call Trace: [ 8633.839871] ip_route_output_key_hash_rcu+0x421/0x890 [ 8633.839873] ip_route_output_key_hash+0x5e/0x80 [ 8633.839876] ip_route_output_flow+0x1a/0x50 [ 8633.839878] __ip4_datagram_connect+0x154/0x310 [ 8633.839880] ip4_datagram_connect+0x28/0x40 [ 8633.839882] __sys_connect+0xd6/0x100 ... The WARN_ON is triggered in fib_select_default which is invoked when there are multiple default routes. Update the function to use fib_info_nhc and convert the nexthop checks to use fib_nh_common. Add test case that covers the affected code path. [0] https://github.com/FRRouting/frr/issues/6089 Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects") Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-22netlabel: Kconfig: Update reference for NetLabel Tools projectSalvatore Bonaccorso1-1/+1
The NetLabel Tools project has moved from http://netlabel.sf.net to a GitHub project. Update to directly refer to the new home for the tools. Signed-off-by: Salvatore Bonaccorso <[email protected]> Acked-by: Paul Moore <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-22mptcp: fix data_fin handing in RX pathPaolo Abeni1-2/+1
The data fin flag is set only via a DSS option, but mptcp_incoming_options() copies it unconditionally from the provided RX options. Since we do not clear all the mptcp sock RX options in a socket free/alloc cycle, we can end-up with a stray data_fin value while parsing e.g. MPC packets. That would lead to mapping data corruption and will trigger a few WARN_ON() in the RX path. Instead of adding a costly memset(), fetch the data_fin flag only for DSS packets - when we always explicitly initialize such bit at option parsing time. Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-22sctp: Fix SHUTDOWN CTSN Ack in the peer restart caseJere Leppänen1-1/+5
When starting shutdown in sctp_sf_do_dupcook_a(), get the value for SHUTDOWN Cumulative TSN Ack from the new association, which is reconstructed from the cookie, instead of the old association, which the peer doesn't have anymore. Otherwise the SHUTDOWN is either ignored or replied to with an ABORT by the peer because CTSN Ack doesn't match the peer's Initial TSN. Fixes: bdf6fa52f01b ("sctp: handle association restarts when the socket is closed.") Signed-off-by: Jere Leppänen <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-22sctp: Fix bundling of SHUTDOWN with COOKIE-ACKJere Leppänen1-3/+3
When we start shutdown in sctp_sf_do_dupcook_a(), we want to bundle the SHUTDOWN with the COOKIE-ACK to ensure that the peer receives them at the same time and in the correct order. This bundling was broken by commit 4ff40b86262b ("sctp: set chunk transport correctly when it's a new asoc"), which assigns a transport for the COOKIE-ACK, but not for the SHUTDOWN. Fix this by passing a reference to the COOKIE-ACK chunk as an argument to sctp_sf_do_9_2_start_shutdown() and onward to sctp_make_shutdown(). This way the SHUTDOWN chunk is assigned the same transport as the COOKIE-ACK chunk, which allows them to be bundled. In sctp_sf_do_9_2_start_shutdown(), the void *arg parameter was previously unused. Now that we're taking it into use, it must be a valid pointer to a chunk, or NULL. There is only one call site where it's not, in sctp_sf_autoclose_timer_expire(). Fix that too. Fixes: 4ff40b86262b ("sctp: set chunk transport correctly when it's a new asoc") Signed-off-by: Jere Leppänen <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-22net: dsa: don't fail to probe if we couldn't set the MTUVladimir Oltean1-5/+3
There is no reason to fail the probing of the switch if the MTU couldn't be configured correctly (either the switch port itself, or the host port) for whatever reason. MTU-sized traffic probably won't work, sure, but we can still probably limp on and support some form of communication anyway, which the users would probably appreciate more. Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") Reported-by: Oleksij Rempel <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-22sched: etf: do not assume all sockets are full blownEric Dumazet1-3/+4
skb->sk does not always point to a full blown socket, we need to use sk_fullsock() before accessing fields which only make sense on full socket. BUG: KASAN: use-after-free in report_sock_error+0x286/0x300 net/sched/sch_etf.c:141 Read of size 1 at addr ffff88805eb9b245 by task syz-executor.5/9630 CPU: 1 PID: 9630 Comm: syz-executor.5 Not tainted 5.7.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:382 __kasan_report.cold+0x35/0x4d mm/kasan/report.c:511 kasan_report+0x33/0x50 mm/kasan/common.c:625 report_sock_error+0x286/0x300 net/sched/sch_etf.c:141 etf_enqueue_timesortedlist+0x389/0x740 net/sched/sch_etf.c:170 __dev_xmit_skb net/core/dev.c:3710 [inline] __dev_queue_xmit+0x154a/0x30a0 net/core/dev.c:4021 neigh_hh_output include/net/neighbour.h:499 [inline] neigh_output include/net/neighbour.h:508 [inline] ip6_finish_output2+0xfb5/0x25b0 net/ipv6/ip6_output.c:117 __ip6_finish_output+0x442/0xab0 net/ipv6/ip6_output.c:143 ip6_finish_output+0x34/0x1f0 net/ipv6/ip6_output.c:153 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x239/0x810 net/ipv6/ip6_output.c:176 dst_output include/net/dst.h:435 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip6_xmit+0xe1a/0x2090 net/ipv6/ip6_output.c:280 tcp_v6_send_synack+0x4e7/0x960 net/ipv6/tcp_ipv6.c:521 tcp_rtx_synack+0x10d/0x1a0 net/ipv4/tcp_output.c:3916 inet_rtx_syn_ack net/ipv4/inet_connection_sock.c:669 [inline] reqsk_timer_handler+0x4c2/0xb40 net/ipv4/inet_connection_sock.c:763 call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1405 expire_timers kernel/time/timer.c:1450 [inline] __run_timers kernel/time/timer.c:1774 [inline] __run_timers kernel/time/timer.c:1741 [inline] run_timer_softirq+0x623/0x1600 kernel/time/timer.c:1787 __do_softirq+0x26c/0x9f7 kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x192/0x1d0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:546 [inline] smp_apic_timer_interrupt+0x19e/0x600 arch/x86/kernel/apic/apic.c:1140 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:des_encrypt+0x157/0x9c0 lib/crypto/des.c:792 Code: 85 22 06 00 00 41 31 dc 41 8b 4d 04 44 89 e2 41 83 e4 3f 4a 8d 3c a5 60 72 72 88 81 e2 3f 3f 3f 3f 48 89 f8 48 c1 e8 03 31 d9 <0f> b6 34 28 48 89 f8 c1 c9 04 83 e0 07 83 c0 03 40 38 f0 7c 09 40 RSP: 0018:ffffc90003b5f6c0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff10e4e55 RBX: 00000000d2f846d0 RCX: 00000000d2f846d0 RDX: 0000000012380612 RSI: ffffffff839863ca RDI: ffffffff887272a8 RBP: dffffc0000000000 R08: ffff888091d0a380 R09: 0000000000800081 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000012 R13: ffff8880a8ae8078 R14: 00000000c545c93e R15: 0000000000000006 cipher_crypt_one crypto/cipher.c:75 [inline] crypto_cipher_encrypt_one+0x124/0x210 crypto/cipher.c:82 crypto_cbcmac_digest_update+0x1b5/0x250 crypto/ccm.c:830 crypto_shash_update+0xc4/0x120 crypto/shash.c:119 shash_ahash_update+0xa3/0x110 crypto/shash.c:246 crypto_ahash_update include/crypto/hash.h:547 [inline] hash_sendmsg+0x518/0xad0 crypto/algif_hash.c:102 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x308/0x7e0 net/socket.c:2362 ___sys_sendmsg+0x100/0x170 net/socket.c:2416 __sys_sendmmsg+0x195/0x480 net/socket.c:2506 __do_sys_sendmmsg net/socket.c:2535 [inline] __se_sys_sendmmsg net/socket.c:2532 [inline] __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2532 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x45c829 Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f6d9528ec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000004fc080 RCX: 000000000045c829 RDX: 0000000000000001 RSI: 0000000020002640 RDI: 0000000000000004 RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000008d7 R14: 00000000004cb7aa R15: 00007f6d9528f6d4 Fixes: 4b15c7075352 ("net/sched: Make etf report drops on error_queue") Fixes: 25db26a91364 ("net/sched: Introduce the ETF Qdisc") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Cc: Vinicius Costa Gomes <[email protected]> Reviewed-by: Vinicius Costa Gomes <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-22xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finishDavid Ahern2-4/+0
IPSKB_XFRM_TRANSFORMED and IP6SKB_XFRM_TRANSFORMED are skb flags set by xfrm code to tell other skb handlers that the packet has been passed through the xfrm output functions. Simplify the code and just always set them rather than conditionally based on netfilter enabled thus making the flag available for other users. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-21cgroup, netclassid: remove double cond_reschedJiri Slaby1-3/+1
Commit 018d26fcd12a ("cgroup, netclassid: periodically release file_lock on classid") added a second cond_resched to write_classid indirectly by update_classid_task. Remove the one in write_classid. Signed-off-by: Jiri Slaby <[email protected]> Cc: Dmitry Yakunin <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: David S. Miller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2-4/+6
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) flow_block_cb memleak in nf_flow_table_offload_del_cb(), from Roi Dayan. 2) Fix error path handling in nf_nat_inet_register_fn(), from Hillf Danton. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-04-21batman-adv: Fix refcnt leak in batadv_v_ogm_processXiyu Yang1-1/+1
batadv_v_ogm_process() invokes batadv_hardif_neigh_get(), which returns a reference of the neighbor object to "hardif_neigh" with increased refcount. When batadv_v_ogm_process() returns, "hardif_neigh" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling paths of batadv_v_ogm_process(). When batadv_v_ogm_orig_get() fails to get the orig node and returns NULL, the refcnt increased by batadv_hardif_neigh_get() is not decreased, causing a refcnt leak. Fix this issue by jumping to "out" label when batadv_v_ogm_orig_get() fails to get the orig node. Fixes: 9323158ef9f4 ("batman-adv: OGMv2 - implement originators logic") Signed-off-by: Xiyu Yang <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-04-21batman-adv: Fix refcnt leak in batadv_store_throughput_overrideXiyu Yang1-1/+1
batadv_show_throughput_override() invokes batadv_hardif_get_by_netdev(), which gets a batadv_hard_iface object from net_dev with increased refcnt and its reference is assigned to a local pointer 'hard_iface'. When batadv_store_throughput_override() returns, "hard_iface" becomes invalid, so the refcount should be decreased to keep refcount balanced. The issue happens in one error path of batadv_store_throughput_override(). When batadv_parse_throughput() returns NULL, the refcnt increased by batadv_hardif_get_by_netdev() is not decreased, causing a refcnt leak. Fix this issue by jumping to "out" label when batadv_parse_throughput() returns NULL. Fixes: 0b5ecc6811bd ("batman-adv: add throughput override attribute to hard_ifaces") Signed-off-by: Xiyu Yang <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-04-21batman-adv: Fix refcnt leak in batadv_show_throughput_overrideXiyu Yang1-0/+1
batadv_show_throughput_override() invokes batadv_hardif_get_by_netdev(), which gets a batadv_hard_iface object from net_dev with increased refcnt and its reference is assigned to a local pointer 'hard_iface'. When batadv_show_throughput_override() returns, "hard_iface" becomes invalid, so the refcount should be decreased to keep refcount balanced. The issue happens in the normal path of batadv_show_throughput_override(), which forgets to decrease the refcnt increased by batadv_hardif_get_by_netdev() before the function returns, causing a refcnt leak. Fix this issue by calling batadv_hardif_put() before the batadv_show_throughput_override() returns in the normal path. Fixes: 0b5ecc6811bd ("batman-adv: add throughput override attribute to hard_ifaces") Signed-off-by: Xiyu Yang <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-04-21batman-adv: fix batadv_nc_random_weight_tqGeorge Spelvin1-8/+1
and change to pseudorandom numbers, as this is a traffic dithering operation that doesn't need crypto-grade. The previous code operated in 4 steps: 1. Generate a random byte 0 <= rand_tq <= 255 2. Multiply it by BATADV_TQ_MAX_VALUE - tq 3. Divide by 255 (= BATADV_TQ_MAX_VALUE) 4. Return BATADV_TQ_MAX_VALUE - rand_tq This would apperar to scale (BATADV_TQ_MAX_VALUE - tq) by a random value between 0/255 and 255/255. But! The intermediate value between steps 3 and 4 is stored in a u8 variable. So it's truncated, and most of the time, is less than 255, after which the division produces 0. Specifically, if tq is odd, the product is always even, and can never be 255. If tq is even, there's exactly one random byte value that will produce a product byte of 255. Thus, the return value is 255 (511/512 of the time) or 254 (1/512 of the time). If we assume that the truncation is a bug, and the code is meant to scale the input, a simpler way of looking at it is that it's returning a random value between tq and BATADV_TQ_MAX_VALUE, inclusive. Well, we have an optimized function for doing just that. Fixes: 3c12de9a5c75 ("batman-adv: network coding - code and transmit packets if possible") Signed-off-by: George Spelvin <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-04-20mptcp: drop req socket remote_key* fieldsPaolo Abeni3-17/+19
We don't need them, as we can use the current ingress opt data instead. Setting them in syn_recv_sock() may causes inconsistent mptcp socket status, as per previous commit. Fixes: cc7972ea1932 ("mptcp: parse and emit MP_CAPABLE option according to v1 spec") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-20mptcp: avoid flipping mp_capable field in syn_recv_sock()Paolo Abeni1-16/+30
If multiple CPUs races on the same req_sock in syn_recv_sock(), flipping such field can cause inconsistent child socket status. When racing, the CPU losing the req ownership may still change the mptcp request socket mp_capable flag while the CPU owning the request is cloning the socket, leaving the child socket with 'is_mptcp' set but no 'mp_capable' flag. Such socket will stay with 'conn' field cleared, heading to oops in later mptcp callback. Address the issue tracking the fallback status in a local variable. Fixes: 58b09919626b ("mptcp: create msk early") Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-20mptcp: handle mptcp listener destruction via rcuFlorian Westphal1-0/+3
Following splat can occur during self test: BUG: KASAN: use-after-free in subflow_data_ready+0x156/0x160 Read of size 8 at addr ffff888100c35c28 by task mptcp_connect/4808 subflow_data_ready+0x156/0x160 tcp_child_process+0x6a3/0xb30 tcp_v4_rcv+0x2231/0x3730 ip_protocol_deliver_rcu+0x5c/0x860 ip_local_deliver_finish+0x220/0x360 ip_local_deliver+0x1c8/0x4e0 ip_rcv_finish+0x1da/0x2f0 ip_rcv+0xd0/0x3c0 __netif_receive_skb_one_core+0xf5/0x160 __netif_receive_skb+0x27/0x1c0 process_backlog+0x21e/0x780 net_rx_action+0x35f/0xe90 do_softirq+0x4c/0x50 [..] This occurs when accessing subflow_ctx->conn. Problem is that tcp_child_process() calls listen sockets' sk_data_ready() notification, but it doesn't hold the listener lock. Another cpu calling close() on the listener will then cause transition of refcount to 0. Fixes: 58b09919626bf ("mptcp: create msk early") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-20ipv6: fix restrict IPV6_ADDRFORM operationJohn Haxby1-7/+6
Commit b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation") fixed a problem found by syzbot an unfortunate logic error meant that it also broke IPV6_ADDRFORM. Rearrange the checks so that the earlier test is just one of the series of checks made before moving the socket from IPv6 to IPv4. Fixes: b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation") Signed-off-by: John Haxby <[email protected]> Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
2020-04-20net: openvswitch: ovs_ct_exit to be done under ovs_lockTonghao Zhang2-2/+5
syzbot wrote: | ============================= | WARNING: suspicious RCU usage | 5.7.0-rc1+ #45 Not tainted | ----------------------------- | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!! | | other info that might help us debug this: | rcu_scheduler_active = 2, debug_locks = 1 | ... | | stack backtrace: | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 | Workqueue: netns cleanup_net | Call Trace: | ... | ovs_ct_exit | ovs_exit_net | ops_exit_list.isra.7 | cleanup_net | process_one_work | worker_thread To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add lockdep_ovsl_is_held as optional lockdep expression. Link: https://lore.kernel.org/lkml/[email protected] Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit") Cc: Pravin B Shelar <[email protected]> Cc: Yi-Hung Wei <[email protected]> Reported-by: [email protected] Signed-off-by: Tonghao Zhang <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-19netfilter: nat: fix error handling upon registering inet hookHillf Danton1-2/+2
A case of warning was reported by syzbot. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 19934 at net/netfilter/nf_nat_core.c:1106 nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 19934 Comm: syz-executor.5 Not tainted 5.6.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 panic+0x2e3/0x75c kernel/panic.c:221 __warn.cold+0x2f/0x35 kernel/panic.c:582 report_bug+0x27b/0x2f0 lib/bug.c:195 fixup_bug arch/x86/kernel/traps.c:175 [inline] fixup_bug arch/x86/kernel/traps.c:170 [inline] do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267 do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027 RIP: 0010:nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106 Code: ff df 48 c1 ea 03 80 3c 02 00 75 75 48 8b 44 24 10 4c 89 ef 48 c7 00 00 00 00 00 e8 e8 f8 53 fb e9 4d fe ff ff e8 ee 9c 16 fb <0f> 0b e9 41 fe ff ff e8 e2 45 54 fb e9 b5 fd ff ff 48 8b 7c 24 20 RSP: 0018:ffffc90005487208 EFLAGS: 00010246 RAX: 0000000000040000 RBX: 0000000000000004 RCX: ffffc9001444a000 RDX: 0000000000040000 RSI: ffffffff865c94a2 RDI: 0000000000000005 RBP: ffff88808b5cf000 R08: ffff8880a2620140 R09: fffffbfff14bcd79 R10: ffffc90005487208 R11: fffffbfff14bcd78 R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 nf_nat_ipv6_unregister_fn net/netfilter/nf_nat_proto.c:1017 [inline] nf_nat_inet_register_fn net/netfilter/nf_nat_proto.c:1038 [inline] nf_nat_inet_register_fn+0xfc/0x140 net/netfilter/nf_nat_proto.c:1023 nf_tables_register_hook net/netfilter/nf_tables_api.c:224 [inline] nf_tables_addchain.constprop.0+0x82e/0x13c0 net/netfilter/nf_tables_api.c:1981 nf_tables_newchain+0xf68/0x16a0 net/netfilter/nf_tables_api.c:2235 nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362 ___sys_sendmsg+0x100/0x170 net/socket.c:2416 __sys_sendmsg+0xec/0x1b0 net/socket.c:2449 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 and to quiesce it, unregister NFPROTO_IPV6 hook instead of NFPROTO_INET in case of failing to register NFPROTO_IPV4 hook. Reported-by: syzbot <[email protected]> Fixes: d164385ec572 ("netfilter: nat: add inet family nat support") Cc: Florian Westphal <[email protected]> Cc: Stefano Brivio <[email protected]> Signed-off-by: Hillf Danton <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-04-18mptcp: fix 'Attempt to release TCP socket in state' warningsFlorian Westphal2-3/+12
We need to set sk_state to CLOSED, else we will get following: IPv4: Attempt to release TCP socket in state 3 00000000b95f109e IPv4: Attempt to release TCP socket in state 10 00000000b95f109e First one is from inet_sock_destruct(), second one from mptcp_sk_clone failure handling. Setting sk_state to CLOSED isn't enough, we also need to orphan sk so it has DEAD flag set. Otherwise, a very similar warning is printed from inet_sock_destruct(). Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-18mptcp: fix splat when incoming connection is never accepted before exit/closeFlorian Westphal2-1/+25
Following snippet (replicated from syzkaller reproducer) generates warning: "IPv4: Attempt to release TCP socket in state 1". int main(void) { struct sockaddr_in sin1 = { .sin_family = 2, .sin_port = 0x4e20, .sin_addr.s_addr = 0x010000e0, }; struct sockaddr_in sin2 = { .sin_family = 2, .sin_addr.s_addr = 0x0100007f, }; struct sockaddr_in sin3 = { .sin_family = 2, .sin_port = 0x4e20, .sin_addr.s_addr = 0x0100007f, }; int r0 = socket(0x2, 0x1, 0x106); int r1 = socket(0x2, 0x1, 0x106); bind(r1, (void *)&sin1, sizeof(sin1)); connect(r1, (void *)&sin2, sizeof(sin2)); listen(r1, 3); return connect(r0, (void *)&sin3, 0x4d); } Reason is that the newly generated mptcp socket is closed via the ulp release of the tcp listener socket when its accept backlog gets purged. To fix this, delay setting the ESTABLISHED state until after userspace calls accept and via mptcp specific destructor. Fixes: 58b09919626bf ("mptcp: create msk early") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/9 Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-18ipv6: rpl: fix full address compressionAlexander Aring1-3/+4
This patch makes it impossible that cmpri or cmpre values are set to the value 16 which is not possible, because these are 4 bit values. We currently run in an overflow when assigning the value 16 to it. According to the standard a value of 16 can be interpreted as a full elided address which isn't possible to set as compression value. A reason why this cannot be set is that the current ipv6 header destination address should never show up inside the segments of the rpl header. In this case we run in a overflow and the address will have no compression at all. Means cmpri or compre is set to 0. As we handle cmpri and cmpre sometimes as unsigned char or 4 bit value inside the rpl header the current behaviour ends in an invalid header format. This patch simple use the best compression method if we ever run into the case that the destination address is showed up inside the rpl segments. We avoid the overflow handling and the rpl header is still valid, even when we have the destination address inside the rpl segments. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-18tipc: Fix potential tipc_node refcnt leak in tipc_rcvXiyu Yang1-1/+3
tipc_rcv() invokes tipc_node_find() twice, which returns a reference of the specified tipc_node object to "n" with increased refcnt. When tipc_rcv() returns or a new object is assigned to "n", the original local reference of "n" becomes invalid, so the refcount should be decreased to keep refcount balanced. The issue happens in some paths of tipc_rcv(), which forget to decrease the refcnt increased by tipc_node_find() and will cause a refcnt leak. Fix this issue by calling tipc_node_put() before the original object pointed by "n" becomes invalid. Signed-off-by: Xiyu Yang <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>