aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-10-15net/smc: fix use-after-free of delayed eventsKarsten Graul1-8/+5
When a delayed event is enqueued then the event worker will send this event the next time it is running and no other flow is currently active. The event handler is called for the delayed event, and the pointer to the event keeps set in lgr->delayed_event. This pointer is cleared later in the processing by smc_llc_flow_start(). This can lead to a use-after-free condition when the processing does not reach smc_llc_flow_start(), but frees the event because of an error situation. Then the delayed_event pointer is still set but the event is freed. Fix this by always clearing the delayed event pointer when the event is provided to the event handler for processing, and remove the code to clear it in smc_llc_flow_start(). Fixes: 555da9af827d ("net/smc: add event-based llc_flow framework") Signed-off-by: Karsten Graul <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-15bpfilter: Fix build error with CONFIG_BPFILTER_UMHYueHaibing1-1/+1
IF CONFIG_BPFILTER_UMH is set, building fails: In file included from /usr/include/sys/socket.h:33:0, from net/bpfilter/main.c:6: /usr/include/bits/socket.h:390:10: fatal error: asm/socket.h: No such file or directory #include <asm/socket.h> ^~~~~~~~~~~~~~ compilation terminated. scripts/Makefile.userprogs:43: recipe for target 'net/bpfilter/main.o' failed make[2]: *** [net/bpfilter/main.o] Error 1 Add missing include path to fix this. Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-15net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_infoLeon Romanovsky1-1/+4
The access of tcf_tunnel_info() produces the following splat, so fix it by dereferencing the tcf_tunnel_key_params pointer with marker that internal tcfa_liock is held. ============================= WARNING: suspicious RCU usage 5.9.0+ #1 Not tainted ----------------------------- include/net/tc_act/tc_tunnel_key.h:59 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by tc/34839: #0: ffff88828572c2a0 (&p->tcfa_lock){+...}-{2:2}, at: tc_setup_flow_action+0xb3/0x48b5 stack backtrace: CPU: 1 PID: 34839 Comm: tc Not tainted 5.9.0+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x9a/0xd0 tc_setup_flow_action+0x14cb/0x48b5 fl_hw_replace_filter+0x347/0x690 [cls_flower] fl_change+0x2bad/0x4875 [cls_flower] tc_new_tfilter+0xf6f/0x1ba0 rtnetlink_rcv_msg+0x5f2/0x870 netlink_rcv_skb+0x124/0x350 netlink_unicast+0x433/0x700 netlink_sendmsg+0x6f1/0xbd0 sock_sendmsg+0xb0/0xe0 ____sys_sendmsg+0x4fa/0x6d0 ___sys_sendmsg+0x12e/0x1b0 __sys_sendmsg+0xa4/0x120 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f1f8cd4fe57 Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffdc1e193b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1f8cd4fe57 RDX: 0000000000000000 RSI: 00007ffdc1e19420 RDI: 0000000000000003 RBP: 000000005f85aafa R08: 0000000000000001 R09: 00007ffdc1e1936c R10: 000000000040522d R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 00007ffdc1e1d6f0 R15: 0000000000482420 Fixes: 3ebaf6da0716 ("net: sched: Do not assume RTNL is held in tunnel key action helpers") Fixes: 7a47281439ba ("net: sched: lock action when translating it to flow_action infra") Signed-off-by: Leon Romanovsky <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-14Merge branch 'ibmveth-gso-fix'Jakub Kicinski1-4/+15
David Wilder says: ==================== ibmveth gso fix The ibmveth driver is a virtual Ethernet driver used on IBM pSeries systems. Gso packets can be sent between LPARS (virtual hosts) without segmentation, by flagging gso packets using one of two methods depending on the firmware version. Some gso packet were not correctly identified by the receiver. This patch-set corrects this issue. V2: - Added fix tags. - Byteswap the constant at compilation time. - Updated the commit message to clarify what frame validation is performed by the hypervisor. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-14ibmveth: Identify ingress large send packets.David Wilder1-1/+12
Ingress large send packets are identified by either: The IBMVETH_RXQ_LRG_PKT flag in the receive buffer or with a -1 placed in the ip header checksum. The method used depends on firmware version. Frame geometry and sufficient header validation is performed by the hypervisor eliminating the need for further header checks here. Fixes: 7b5967389f5a ("ibmveth: set correct gso_size and gso_type") Signed-off-by: David Wilder <[email protected]> Reviewed-by: Thomas Falcon <[email protected]> Reviewed-by: Cristobal Forno <[email protected]> Reviewed-by: Pradeep Satyanarayana <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-14ibmveth: Switch order of ibmveth_helper calls.David Wilder1-5/+5
ibmveth_rx_csum_helper() must be called after ibmveth_rx_mss_helper() as ibmveth_rx_csum_helper() may alter ip and tcp checksum values. Fixes: 66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.") Signed-off-by: David Wilder <[email protected]> Reviewed-by: Thomas Falcon <[email protected]> Reviewed-by: Cristobal Forno <[email protected]> Reviewed-by: Pradeep Satyanarayana <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-14cxgb4: handle 4-tuple PEDIT to NAT mode translationHerat Ramani2-13/+177
The 4-tuple NAT offload via PEDIT always overwrites all the 4-tuple fields even if they had not been explicitly enabled. If any fields in the 4-tuple are not enabled, then the hardware overwrites the disabled fields with zeros, instead of ignoring them. So, add a parser that can translate the enabled 4-tuple PEDIT fields to one of the NAT mode combinations supported by the hardware and hence avoid overwriting disabled fields to 0. Any rule with unsupported NAT mode combination is rejected. Signed-off-by: Herat Ramani <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-14Merge branch 'l3mdev-icmp-error-route-lookup-fixes'Jakub Kicinski5-6/+653
Mathieu Desnoyers says: ==================== l3mdev icmp error route lookup fixes Here is a series of fixes for ipv4 and ipv6 which ensure the route lookup is performed on the right routing table in VRF configurations when sending TTL expired icmp errors (useful for traceroute). It includes tests for both ipv4 and ipv6. These fixes address specifically address the code paths involved in sending TTL expired icmp errors. As detailed in the individual commit messages, those fixes do not address similar icmp errors related to network namespaces and unreachable / fragmentation needed messages, which appear to use different code paths. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-14selftests: Add VRF route leaking testsMichael Jeanson2-0/+627
The objective of the tests is to check that ICMP errors generated while crossing between VRFs are properly routed back to the source host. The first ttl test sends a ping with a ttl of 1 from h1 to h2 and parses the output of the command to check that a ttl expired error is received. The second ttl test runs traceroute from h1 to h2 and parses the output to check for a hop on r1. The mtu test sends a ping with a payload of 1450 from h1 to h2, through r1 which has an interface with a mtu of 1400 and parses the output of the command to check that a fragmentation needed error is received. [ The IPv6 MTU test still fails with the symmetric routing setup. It appears to be caused by source address selection picking ::1. Fixing this is beyond the scope of this series. ] Signed-off-by: Michael Jeanson <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-14ipv6/icmp: l3mdev: Perform icmp error route lookup on source device routing ↵Mathieu Desnoyers2-4/+5
table (v2) As per RFC4443, the destination address field for ICMPv6 error messages is copied from the source address field of the invoking packet. In configurations with Virtual Routing and Forwarding tables, looking up which routing table to use for sending ICMPv6 error messages is currently done by using the destination net_device. If the source and destination interfaces are within separate VRFs, or one in the global routing table and the other in a VRF, looking up the source address of the invoking packet in the destination interface's routing table will fail if the destination interface's routing table contains no route to the invoking packet's source address. One observable effect of this issue is that traceroute6 does not work in the following cases: - Route leaking between global routing table and VRF - Route leaking between VRFs Use the source device routing table when sending ICMPv6 error messages. [ In the context of ipv4, it has been pointed out that a similar issue may exist with ICMP errors triggered when forwarding between network namespaces. It would be worthwhile to investigate whether ipv6 has similar issues, but is outside of the scope of this investigation. ] [ Testing shows that similar issues exist with ipv6 unreachable / fragmentation needed messages. However, investigation of this additional failure mode is beyond this investigation's scope. ] Link: https://tools.ietf.org/html/rfc4443 Signed-off-by: Mathieu Desnoyers <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-14ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing ↵Mathieu Desnoyers1-2/+21
table (v2) As per RFC792, ICMP errors should be sent to the source host. However, in configurations with Virtual Routing and Forwarding tables, looking up which routing table to use is currently done by using the destination net_device. commit 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain") changes the interface passed to l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev to skb_dst(skb_in)->dev. This effectively uses the destination device rather than the source device for choosing which routing table should be used to lookup where to send the ICMP error. Therefore, if the source and destination interfaces are within separate VRFs, or one in the global routing table and the other in a VRF, looking up the source host in the destination interface's routing table will fail if the destination interface's routing table contains no route to the source host. One observable effect of this issue is that traceroute does not work in the following cases: - Route leaking between global routing table and VRF - Route leaking between VRFs Preferably use the source device routing table when sending ICMP error messages. If no source device is set, fall-back on the destination device routing table. Else, use the main routing table (index 0). [ It has been pointed out that a similar issue may exist with ICMP errors triggered when forwarding between network namespaces. It would be worthwhile to investigate, but is outside of the scope of this investigation. ] [ It has also been pointed out that a similar issue exists with unreachable / fragmentation needed messages, which can be triggered by changing the MTU of eth1 in r1 to 1400 and running: ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2 Some investigation points to raw_icmp_error() and raw_err() as being involved in this last scenario. The focus of this patch is TTL expired ICMP messages, which go through icmp_route_lookup. Investigation of failure modes related to raw_icmp_error() is beyond this investigation's scope. ] Fixes: 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain") Link: https://tools.ietf.org/html/rfc792 Signed-off-by: Mathieu Desnoyers <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski9-34/+168
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Extend nf_queue selftest to cover re-queueing, non-gso mode and delayed queueing, from Florian Westphal. 2) Clear skb->tstamp in IPVS forwarding path, from Julian Anastasov. 3) Provide netlink extended error reporting for EEXIST case. 4) Missing VLAN offload tag and proto in log target. ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-13ip_gre: set dev->hard_header_len and dev->needed_headroom properlyCong Wang1-4/+11
GRE tunnel has its own header_ops, ipgre_header_ops, and sets it conditionally. When it is set, it assumes the outer IP header is already created before ipgre_xmit(). This is not true when we send packets through a raw packet socket, where L2 headers are supposed to be constructed by user. Packet socket calls dev_validate_header() to validate the header. But GRE tunnel does not set dev->hard_header_len, so that check can be simply bypassed, therefore uninit memory could be passed down to ipgre_xmit(). Similar for dev->needed_headroom. dev->hard_header_len is supposed to be the length of the header created by dev->header_ops->create(), so it should be used whenever header_ops is set, and dev->needed_headroom should be used when it is not set. Reported-and-tested-by: [email protected] Cc: William Tu <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Xie He <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-13socket: don't clear SOCK_TSTAMP_NEW when SO_TIMESTAMPNS is disabledChristian Eggers1-1/+0
SOCK_TSTAMP_NEW (timespec64 instead of timespec) is also used for hardware time stamps (configured via SO_TIMESTAMPING_NEW). User space (ptp4l) first configures hardware time stamping via SO_TIMESTAMPING_NEW which sets SOCK_TSTAMP_NEW. In the next step, ptp4l disables SO_TIMESTAMPNS(_NEW) (software time stamps), but this must not switch hardware time stamps back to "32 bit mode". This problem happens on 32 bit platforms were the libc has already switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW socket options). ptp4l complains with "missing timestamp on transmitted peer delay request" because the wrong format is received (and discarded). Fixes: 887feae36aee ("socket: Add SO_TIMESTAMP[NS]_NEW") Fixes: 783da70e8396 ("net: add sock_enable_timestamps") Signed-off-by: Christian Eggers <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Acked-by: Deepa Dinamani <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-13socket: fix option SO_TIMESTAMPING_NEWChristian Eggers1-7/+3
The comparison of optname with SO_TIMESTAMPING_NEW is wrong way around, so SOCK_TSTAMP_NEW will first be set and than reset again. Additionally move it out of the test for SOF_TIMESTAMPING_RX_SOFTWARE as this seems unrelated. This problem happens on 32 bit platforms were the libc has already switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW socket options). ptp4l complains with "missing timestamp on transmitted peer delay request" because the wrong format is received (and discarded). Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW") Signed-off-by: Christian Eggers <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Deepa Dinamani <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Acked-by: Deepa Dinamani <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-14netfilter: nf_log: missing vlan offload tag and protoPablo Neira Ayuso5-7/+39
Dump vlan tag and proto for the usual vlan offload case if the NF_LOG_MACDECODE flag is set on. Without this information the logging is misleading as there is no reference to the VLAN header. [12716.993704] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0800 SRC=192.168.10.2 DST=172.217.168.163 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2548 DF PROTO=TCP SPT=55848 DPT=80 WINDOW=501 RES=0x00 ACK FIN URGP=0 [12721.157643] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0806 ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=86:6c:92:ea:d6:73 IPSRC=192.168.10.2 MACDST=0e:3b:eb:86:73:76 IPDST=192.168.10.1 Fixes: 83e96d443b37 ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-10-13docs: networking: update XPS to account for netif_set_xps_queueWillem de Bruijn1-3/+3
With the introduction of netif_set_xps_queue, XPS can be enabled by the driver at initialization. Update the documentation to reflect this, as otherwise users may incorrectly believe that the feature is off by default. Fixes: 537c00de1c9b ("net: Add functions netif_reset_xps_queue and netif_set_xps_queue") Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-12net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()Marek Vasut1-2/+23
The phy_reset_after_clk_enable() is always called with ndev->phydev, however that pointer may be NULL even though the PHY device instance already exists and is sufficient to perform the PHY reset. This condition happens in fec_open(), where the clock must be enabled first, then the PHY must be reset, and then the PHY IDs can be read out of the PHY. If the PHY still is not bound to the MAC, but there is OF PHY node and a matching PHY device instance already, use the OF PHY node to obtain the PHY device instance, and then use that PHY device instance when triggering the PHY reset. Fixes: 1b0a83ac04e3 ("net: fec: add phy_reset_after_clk_enable() support") Signed-off-by: Marek Vasut <[email protected]> Cc: Christoph Niedermaier <[email protected]> Cc: David S. Miller <[email protected]> Cc: NXP Linux Team <[email protected]> Cc: Richard Leitner <[email protected]> Cc: Shawn Guo <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-12mlx4: handle non-napi callers to napi_pollJonathan Lemon2-1/+4
netcons calls napi_poll with a budget of 0 to transmit packets. Handle this by: - skipping RX processing - do not try to recycle TX packets to the RX cache Signed-off-by: Jonathan Lemon <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-12net: korina: fix kfree of rx/tx descriptor arrayValentin Vidic1-1/+2
kmalloc returns KSEG0 addresses so convert back from KSEG1 in kfree. Also make sure array is freed when the driver is unloaded from the kernel. Fixes: ef11291bcd5f ("Add support the Korina (IDT RC32434) Ethernet MAC") Signed-off-by: Valentin Vidic <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-12net: dsa: microchip: fix race conditionChristian Eggers1-7/+9
Between queuing the delayed work and finishing the setup of the dsa ports, the process may sleep in request_module() (via phy_device_create()) and the queued work may be executed prior to the switch net devices being registered. In ksz_mib_read_work(), a NULL dereference will happen within netof_carrier_ok(dp->slave). Not queuing the delayed work in ksz_init_mib_timer() makes things even worse because the work will now be queued for immediate execution (instead of 2000 ms) in ksz_mac_link_down() via dsa_port_link_register_of(). Call tree: ksz9477_i2c_probe() \--ksz9477_switch_register() \--ksz_switch_register() +--dsa_register_switch() | \--dsa_switch_probe() | \--dsa_tree_setup() | \--dsa_tree_setup_switches() | +--dsa_switch_setup() | | +--ksz9477_setup() | | | \--ksz_init_mib_timer() | | | |--/* Start the timer 2 seconds later. */ | | | \--schedule_delayed_work(&dev->mib_read, msecs_to_jiffies(2000)); | | \--__mdiobus_register() | | \--mdiobus_scan() | | \--get_phy_device() | | +--get_phy_id() | | \--phy_device_create() | | |--/* sleeping, ksz_mib_read_work() can be called meanwhile */ | | \--request_module() | | | \--dsa_port_setup() | +--/* Called for non-CPU ports */ | +--dsa_slave_create() | | +--/* Too late, ksz_mib_read_work() may be called beforehand */ | | \--port->slave = ... | ... | +--Called for CPU port */ | \--dsa_port_link_register_of() | \--ksz_mac_link_down() | +--/* mib_read must be initialized here */ | +--/* work is already scheduled, so it will be executed after 2000 ms */ | \--schedule_delayed_work(&dev->mib_read, 0); \-- /* here port->slave is setup properly, scheduling the delayed work should be safe */ Solution: 1. Do not queue (only initialize) delayed work in ksz_init_mib_timer(). 2. Only queue delayed work in ksz_mac_link_down() if init is completed. 3. Queue work once in ksz_switch_register(), after dsa_register_switch() has completed. Fixes: 7c6ff470aa86 ("net: dsa: microchip: add MIB counter reading support") Signed-off-by: Christian Eggers <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-12netfilter: nftables: extend error reporting for chain updatesPablo Neira Ayuso1-5/+14
The initial support for netlink extended ACK is missing the chain update path, which results in misleading error reporting in case of EEXIST. Fixes 36dd1bcc07e5 ("netfilter: nf_tables: initial support for extended ACK reporting") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-10-12ipvs: clear skb->tstamp in forwarding pathJulian Anastasov1-0/+6
fq qdisc requires tstamp to be cleared in forwarding path Reported-by: Evgeny B <[email protected]> Link: https://bugzilla.kernel.org/show_bug.cgi?id=209427 Suggested-by: Eric Dumazet <[email protected]> Fixes: 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths") Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Signed-off-by: Julian Anastasov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-10-12selftests: netfilter: extend nfqueue test caseFlorian Westphal2-22/+109
add a test with re-queueing: usespace doesn't pass accept verdict, but tells to re-queue to another nf_queue instance. Also, make the second nf-queue program use non-gso mode, kernel will have to perform software segmentation. Lastly, do not queue every packet, just one per second, and add delay when re-injecting the packet to the kernel. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-10-10ipv4: Restore flowi4_oif update before call to xfrm_lookup_routeDavid Ahern1-1/+3
Tobias reported regressions in IPsec tests following the patch referenced by the Fixes tag below. The root cause is dropping the reset of the flowi4_oif after the fib_lookup. Apparently it is needed for xfrm cases, so restore the oif update to ip_route_output_flow right before the call to xfrm_lookup_route. Fixes: 2fbc6e89b2f1 ("ipv4: Update exception handling for multipath routes via same device") Reported-by: Tobias Brunner <[email protected]> Signed-off-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-10Merge branch 'mptcp-some-fallback-fixes'Jakub Kicinski4-9/+58
Paolo Abeni says: ==================== mptcp: some fallback fixes pktdrill pointed-out we currently don't handle properly some fallback scenario for MP_JOIN subflows The first patch addresses such issue. Patch 2/2 fixes a related pre-existing issue that is more evident after 1/2: we could keep using for MPTCP signaling closed subflows. ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-10mptcp: subflows garbage collectionPaolo Abeni3-0/+24
The msk can close MP_JOIN subflows if the initial handshake fails. Currently such subflows are kept alive in the conn_list until the msk itself is closed. Beyond the wasted memory, we could end-up sending the DATA_FIN and the DATA_FIN ack on such socket, even after a reset. Fixes: 43b54c6ee382 ("mptcp: Use full MPTCP-level disconnect state machine") Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-10mptcp: fix fallback for MP_JOIN subflowsPaolo Abeni3-9/+34
Additional/MP_JOIN subflows that do not pass some initial handshake tests currently causes fallback to TCP. That is an RFC violation: we should instead reset the subflow and leave the the msk untouched. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/91 Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-10net: smc: fix missing brace warning for old compilersPujin Shi1-2/+2
For older versions of gcc, the array = {0}; will cause warnings: net/smc/smc_llc.c: In function 'smc_llc_add_link_local': net/smc/smc_llc.c:1212:9: warning: missing braces around initializer [-Wmissing-braces] struct smc_llc_msg_add_link add_llc = {0}; ^ net/smc/smc_llc.c:1212:9: warning: (near initialization for 'add_llc.hd') [-Wmissing-braces] net/smc/smc_llc.c: In function 'smc_llc_srv_delete_link_local': net/smc/smc_llc.c:1245:9: warning: missing braces around initializer [-Wmissing-braces] struct smc_llc_msg_del_link del_llc = {0}; ^ net/smc/smc_llc.c:1245:9: warning: (near initialization for 'del_llc.hd') [-Wmissing-braces] 2 warnings generated Fixes: 4dadd151b265 ("net/smc: enqueue local LLC messages") Signed-off-by: Pujin Shi <[email protected]> Acked-by: Karsten Graul <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-10net: smc: fix missing brace warning for old compilersPujin Shi1-1/+1
For older versions of gcc, the array = {0}; will cause warnings: net/smc/smc_llc.c: In function 'smc_llc_send_link_delete_all': net/smc/smc_llc.c:1317:9: warning: missing braces around initializer [-Wmissing-braces] struct smc_llc_msg_del_link delllc = {0}; ^ net/smc/smc_llc.c:1317:9: warning: (near initialization for 'delllc.hd') [-Wmissing-braces] 1 warnings generated Fixes: f3811fd7bc97 ("net/smc: send DELETE_LINK, ALL message and wait for send to complete") Signed-off-by: Pujin Shi <[email protected]> Acked-by: Karsten Graul <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-10Merge tag 'linux-can-fixes-for-5.9-20201008' of ↵Jakub Kicinski2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can ==================== linux-can-fixes-for-5.9-20201008 The first patch is by Lucas Stach and fixes m_can driver by removing an erroneous call to m_can_class_suspend() in runtime suspend. Which causes the pinctrl state to get stuck on the "sleep" state, which breaks all CAN functionality on SoCs where this state is defined. The last two patches target the j1939 protocol: Cong Wang fixes a syzbot finding of an uninitialized variable in the j1939 transport protocol. I contribute a patch, that fixes the initialization of a same uninitialized variable in a different function. ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-09tipc: fix NULL pointer dereference in tipc_named_rcvHoang Huu Le2-2/+10
In the function node_lost_contact(), we call __skb_queue_purge() without grabbing the list->lock. This can cause to a race-condition why processing the list 'namedq' in calling path tipc_named_rcv()->tipc_named_dequeue(). [] BUG: kernel NULL pointer dereference, address: 0000000000000000 [] #PF: supervisor read access in kernel mode [] #PF: error_code(0x0000) - not-present page [] PGD 7ca63067 P4D 7ca63067 PUD 6c553067 PMD 0 [] Oops: 0000 [#1] SMP NOPTI [] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G O 5.9.0-rc6+ #2 [] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS [...] [] RIP: 0010:tipc_named_rcv+0x103/0x320 [tipc] [] Code: 41 89 44 24 10 49 8b 16 49 8b 46 08 49 c7 06 00 00 00 [...] [] RSP: 0018:ffffc900000a7c58 EFLAGS: 00000282 [] RAX: 00000000000012ec RBX: 0000000000000000 RCX: ffff88807bde1270 [] RDX: 0000000000002c7c RSI: 0000000000002c7c RDI: ffff88807b38f1a8 [] RBP: ffff88807b006288 R08: ffff88806a367800 R09: ffff88806a367900 [] R10: ffff88806a367a00 R11: ffff88806a367b00 R12: ffff88807b006258 [] R13: ffff88807b00628a R14: ffff888069334d00 R15: ffff88806a434600 [] FS: 0000000000000000(0000) GS:ffff888079480000(0000) knlGS:0[...] [] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [] CR2: 0000000000000000 CR3: 0000000077320000 CR4: 00000000000006e0 [] Call Trace: [] ? tipc_bcast_rcv+0x9a/0x1a0 [tipc] [] tipc_rcv+0x40d/0x670 [tipc] [] ? _raw_spin_unlock+0xa/0x20 [] tipc_l2_rcv_msg+0x55/0x80 [tipc] [] __netif_receive_skb_one_core+0x8c/0xa0 [] process_backlog+0x98/0x140 [] net_rx_action+0x13a/0x420 [] __do_softirq+0xdb/0x316 [] ? smpboot_thread_fn+0x2f/0x1e0 [] ? smpboot_thread_fn+0x74/0x1e0 [] ? smpboot_thread_fn+0x14e/0x1e0 [] run_ksoftirqd+0x1a/0x40 [] smpboot_thread_fn+0x149/0x1e0 [] ? sort_range+0x20/0x20 [] kthread+0x131/0x150 [] ? kthread_unuse_mm+0xa0/0xa0 [] ret_from_fork+0x22/0x30 [] Modules linked in: veth tipc(O) ip6_udp_tunnel udp_tunnel [...] [] CR2: 0000000000000000 [] ---[ end trace 65c276a8e2e2f310 ]--- To fix this, we need to grab the lock of the 'namedq' list on both path calling. Fixes: cad2929dc432 ("tipc: update a binding service via broadcast") Acked-by: Jon Maloy <[email protected]> Signed-off-by: Hoang Huu Le <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-09tipc: fix the skb_unshare() in tipc_buf_append()Cong Wang1-1/+2
skb_unshare() drops a reference count on the old skb unconditionally, so in the failure case, we end up freeing the skb twice here. And because the skb is allocated in fclone and cloned by caller tipc_msg_reassemble(), the consequence is actually freeing the original skb too, thus triggered the UAF by syzbot. Fix this by replacing this skb_unshare() with skb_cloned()+skb_copy(). Fixes: ff48b6222e65 ("tipc: use skb_unshare() instead in tipc_buf_append()") Reported-and-tested-by: [email protected] Cc: Jon Maloy <[email protected]> Cc: Ying Xue <[email protected]> Signed-off-by: Cong Wang <[email protected]> Reviewed-by: Xin Long <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-09net/tls: remove a duplicate function prototypeRandy Dunlap1-4/+0
Remove one of the two instances of the function prototype for tls_validate_xmit_skb(). Signed-off-by: Randy Dunlap <[email protected]> Cc: Boris Pismenny <[email protected]> Cc: Aviad Yehezkel <[email protected]> Cc: John Fastabend <[email protected]> Cc: Daniel Borkmann <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-09net/tls: sendfile fails with ktls offloadRohit Maheshwari1-5/+6
At first when sendpage gets called, if there is more data, 'more' in tls_push_data() gets set which later sets pending_open_record_frags, but when there is no more data in file left, and last time tls_push_data() gets called, pending_open_record_frags doesn't get reset. And later when 2 bytes of encrypted alert comes as sendmsg, it first checks for pending_open_record_frags, and since this is set, it creates a record with 0 data bytes to encrypt, meaning record length is prepend_size + tag_size only, which causes problem. We should set/reset pending_open_record_frags based on more bit. Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Rohit Maheshwari <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-09net: tlan: Fix typo abitraryNaoki Hayama1-1/+1
Fix comment typo. s/abitrary/arbitrary/ Signed-off-by: Naoki Hayama <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-09net: ipv6: Discard next-hop MTU less than minimum link MTUGeorg Kohmann1-1/+2
When a ICMPV6_PKT_TOOBIG report a next-hop MTU that is less than the IPv6 minimum link MTU, the estimated path MTU is reduced to the minimum link MTU. This behaviour breaks TAHI IPv6 Core Conformance Test v6LC4.1.6: Packet Too Big Less than IPv6 MTU. Referring to RFC 8201 section 4: "If a node receives a Packet Too Big message reporting a next-hop MTU that is less than the IPv6 minimum link MTU, it must discard it. A node must not reduce its estimate of the Path MTU below the IPv6 minimum link MTU on receipt of a Packet Too Big message." Drop the path MTU update if reported MTU is less than the minimum link MTU. Signed-off-by: Georg Kohmann <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-09net: ipa: skip suspend/resume activities if not set upAlex Elder1-0/+6
When processing a system suspend request we suspend modem endpoints if they are enabled, and call ipa_cmd_tag_process() (which issues IPA commands) to ensure the IPA pipeline is cleared. It is an error to attempt to issue an IPA command before setup is complete, so this is clearly a bug. But we also shouldn't suspend or resume any endpoints that have not been set up. Have ipa_endpoint_suspend() and ipa_endpoint_resume() immediately return if setup hasn't completed, to avoid any attempt to configure endpoints or issue IPA commands in that case. Fixes: 84f9bd12d46d ("soc: qcom: ipa: IPA endpoints") Tested-by: Matthias Kaehlcke <[email protected]> Signed-off-by: Alex Elder <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski1-7/+25
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter selftests fixes from Fabian Frederick: 1) Extend selftest nft_meta.sh to check for meta cpu. 2) Fix selftest nft_meta.sh error reporting. 3) Fix shellcheck warnings in selftest nft_meta.sh. 4) Extend selftest nft_meta.sh to check for meta time. ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-09net: mptcp: make DACK4/DACK8 usage consistent among all subflowsDavide Caratti3-4/+3
using packetdrill it's possible to observe the same MPTCP DSN being acked by different subflows with DACK4 and DACK8. This is in contrast with what specified in RFC8684 §3.3.2: if an MPTCP endpoint transmits a 64-bit wide DSN, it MUST be acknowledged with a 64-bit wide DACK. Fix 'use_64bit_ack' variable to make it a property of MPTCP sockets, not TCP subflows. Fixes: a0c1d0eafd1e ("mptcp: Use 32-bit DATA_ACK when possible") Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-09net: fec: Fix PHY init after phy_reset_after_clk_enable()Marek Vasut1-5/+5
The phy_reset_after_clk_enable() does a PHY reset, which means the PHY loses its register settings. The fec_enet_mii_probe() starts the PHY and does the necessary calls to configure the PHY via PHY framework, and loads the correct register settings into the PHY. Therefore, fec_enet_mii_probe() should be called only after the PHY has been reset, not before as it is now. Fixes: 1b0a83ac04e3 ("net: fec: add phy_reset_after_clk_enable() support") Reviewed-by: Andrew Lunn <[email protected]> Tested-by: Richard Leitner <[email protected]> Signed-off-by: Marek Vasut <[email protected]> Cc: Christoph Niedermaier <[email protected]> Cc: David S. Miller <[email protected]> Cc: NXP Linux Team <[email protected]> Cc: Shawn Guo <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-08net: j1939: j1939_session_fresh_new(): fix missing initialization of skbcntMarc Kleine-Budde1-0/+1
This patch add the initialization of skbcnt, similar to: e009f95b1543 can: j1935: j1939_tp_tx_dat_new(): fix missing initialization of skbcnt Let's play save and initialize this skbcnt as well. Suggested-by: Jakub Kicinski <[email protected]> Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Marc Kleine-Budde <[email protected]>
2020-10-08Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds4-68/+115
Pull vhost fixes from Michael Tsirkin: "Some last minute vhost,vdpa fixes. The last two of them haven't been in next but they do seem kind of obvious, very small and safe, fix bugs reported in the field, and they are both in a new mlx5 vdpa driver, so it's not like we can introduce regressions" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/mlx5: Fix dependency on MLX5_CORE vdpa/mlx5: should keep avail_index despite device status vhost-vdpa: fix page pinning leakage in error path vhost-vdpa: fix vhost_vdpa_map() on error condition vhost: Don't call log_access_ok() when using IOTLB vhost: Use vhost_get_used_size() in vhost_vring_set_addr() vhost: Don't call access_ok() when using IOTLB vhost vdpa: fix vhost_vdpa_open error handling
2020-10-08can: j1935: j1939_tp_tx_dat_new(): fix missing initialization of skbcntCong Wang1-0/+1
This fixes an uninit-value warning: BUG: KMSAN: uninit-value in can_receive+0x26b/0x630 net/can/af_can.c:650 Reported-and-tested-by: [email protected] Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Cc: Robin van der Gracht <[email protected]> Cc: Oleksij Rempel <[email protected]> Cc: Pengutronix Kernel Team <[email protected]> Cc: Oliver Hartkopp <[email protected]> Cc: Marc Kleine-Budde <[email protected]> Signed-off-by: Cong Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Kleine-Budde <[email protected]>
2020-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds26-350/+212
Pull networking fixes from Jakub Kicinski: "One more set of fixes from the networking tree: - add missing input validation in nl80211_del_key(), preventing out-of-bounds access - last minute fix / improvement of a MRP netlink (uAPI) interface introduced in 5.9 (current) release - fix "unresolved symbol" build error under CONFIG_NET w/o CONFIG_INET due to missing tcp_timewait_sock and inet_timewait_sock BTF. - fix 32 bit sub-register bounds tracking in the bpf verifier for OR case - tcp: fix receive window update in tcp_add_backlog() - openvswitch: handle DNAT tuple collision in conntrack-related code - r8169: wait for potential PHY reset to finish after applying a FW file, avoiding unexpected PHY behaviour and failures later on - mscc: fix tail dropping watermarks for Ocelot switches - avoid use-after-free in macsec code after a call to the GRO layer - avoid use-after-free in sctp error paths - add a device id for Cellient MPL200 WWAN card - rxrpc fixes: - fix the xdr encoding of the contents read from an rxrpc key - fix a BUG() for a unsupported encoding type. - fix missing _bh lock annotations. - fix acceptance handling for an incoming call where the incoming call is encrypted. - the server token keyring isn't network namespaced - it belongs to the server, so there's no need. Namespacing it means that request_key() fails to find it. - fix a leak of the server keyring" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (21 commits) net: usb: qmi_wwan: add Cellient MPL200 card macsec: avoid use-after-free in macsec_handle_frame() r8169: consider that PHY reset may still be in progress after applying firmware openvswitch: handle DNAT tuple collision sctp: fix sctp_auth_init_hmacs() error path bridge: Netlink interface fix. net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key() bpf: Fix scalar32_min_max_or bounds tracking tcp: fix receive window update in tcp_add_backlog() net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails mptcp: more DATA FIN fixes net: mscc: ocelot: warn when encoding an out-of-bounds watermark value net: mscc: ocelot: divide watermark value by 60 when writing to SYS_ATOP net: qrtr: ns: Fix the incorrect usage of rcu_read_lock() rxrpc: Fix server keyring leak rxrpc: The server keyring isn't network-namespaced rxrpc: Fix accept on a connection that need securing rxrpc: Fix some missing _bh annotations on locking conn->state_lock rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read() rxrpc: Fix rxkad token xdr encoding ...
2020-10-08vdpa/mlx5: Fix dependency on MLX5_COREEli Cohen1-4/+3
Remove propmt for selecting MLX5_VDPA by the user and modify MLX5_VDPA_NET to select MLX5_VDPA. Also modify MLX5_VDPA_NET to depend on mlx5_core. This fixes an issue where configuration sets 'y' for MLX5_VDPA_NET while MLX5_CORE is compiled as a module causing link errors. Reported-by: kernel test robot <[email protected]> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 device")s Signed-off-by: Eli Cohen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2020-10-08vdpa/mlx5: should keep avail_index despite device statusSi-Wei Liu1-6/+14
A VM with mlx5 vDPA has below warnings while being reset: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11) vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11) We should allow userspace emulating the virtio device be able to get to vq's avail_index, regardless of vDPA device status. Save the index that was last seen when virtq was stopped, so that userspace doesn't complain. Signed-off-by: Si-Wei Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Eli Cohen <[email protected]>
2020-10-08net: usb: qmi_wwan: add Cellient MPL200 cardWilken Gottwalt1-0/+1
Add usb ids of the Cellient MPL200 card. Signed-off-by: Wilken Gottwalt <[email protected]> Acked-by: Bjørn Mork <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-08macsec: avoid use-after-free in macsec_handle_frame()Eric Dumazet1-1/+3
De-referencing skb after call to gro_cells_receive() is not allowed. We need to fetch skb->len earlier. Fixes: 5491e7c6b1a9 ("macsec: enable GRO and RPS on macsec devices") Signed-off-by: Eric Dumazet <[email protected]> Cc: Paolo Abeni <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-10-08r8169: consider that PHY reset may still be in progress after applying firmwareHeiner Kallweit1-0/+7
Some firmware files trigger a PHY soft reset and don't wait for it to be finished. PHY register writes directly after applying the firmware may fail or provide unexpected results therefore. Fix this by waiting for bit BMCR_RESET to be cleared after applying firmware. There's nothing wrong with the referenced change, it's just that the fix will apply cleanly only after this change. Fixes: 89fbd26cca7e ("r8169: fix firmware not resetting tp->ocp_base") Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>