aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-09-17hv_netvsc: Add validation for untrusted Hyper-V valuesAndres Beltran4-20/+188
For additional robustness in the face of Hyper-V errors or malicious behavior, validate all values that originate from packets that Hyper-V has sent to the guest in the host-to-guest ring buffer. Ensure that invalid values cannot cause indexing off the end of an array, or subvert an existing validation via integer overflow. Ensure that outgoing packets do not have any leftover guest memory that has not been zeroed out. Signed-off-by: Andres Beltran <[email protected]> Co-developed-by: Andrea Parri (Microsoft) <[email protected]> Signed-off-by: Andrea Parri (Microsoft) <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: [email protected] Reviewed-by: Haiyang Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-16net: dsa: microchip: ksz8795: really set the correct number of portsMatthias Schiffer1-1/+1
The KSZ9477 and KSZ8795 use the port_cnt field differently: For the KSZ9477, it includes the CPU port(s), while for the KSZ8795, it doesn't. It would be a good cleanup to make the handling of both drivers match, but as a first step, fix the recently broken assignment of num_ports in the KSZ8795 driver (which completely broke probing, as the CPU port index was always failing the num_ports check). Fixes: af199a1a9cb0 ("net: dsa: microchip: set the correct number of ports") Signed-off-by: Matthias Schiffer <[email protected]> Reviewed-by: Codrin Ciubotariu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-16geneve: add transport ports in route lookup for geneveMark Gray1-10/+27
This patch adds transport ports information for route lookup so that IPsec can select Geneve tunnel traffic to do encryption. This is needed for OVS/OVN IPsec with encrypted Geneve tunnels. This can be tested by configuring a host-host VPN using an IKE daemon and specifying port numbers. For example, for an Openswan-type configuration, the following parameters should be configured on both hosts and IPsec set up as-per normal: $ cat /etc/ipsec.conf conn in ... left=$IP1 right=$IP2 ... leftprotoport=udp/6081 rightprotoport=udp ... conn out ... left=$IP1 right=$IP2 ... leftprotoport=udp rightprotoport=udp/6081 ... The tunnel can then be setup using "ip" on both hosts (but changing the relevant IP addresses): $ ip link add tun type geneve id 1000 remote $IP2 $ ip addr add 192.168.0.1/24 dev tun $ ip link set tun up This can then be tested by pinging from $IP1: $ ping 192.168.0.2 Without this patch the traffic is unencrypted on the wire. Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels") Signed-off-by: Qiuyu Xiao <[email protected]> Signed-off-by: Mark Gray <[email protected]> Reviewed-by: Greg Rose <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-16net: hns: kerneldoc fixesLu Wei1-20/+20
Fix some parameter description or spelling mistakes. Signed-off-by: Lu Wei <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller10-38/+47
Alexei Starovoitov says: ==================== pull-request: bpf 2020-09-15 The following pull-request contains BPF updates for your *net* tree. We've added 12 non-merge commits during the last 19 day(s) which contain a total of 10 files changed, 47 insertions(+), 38 deletions(-). The main changes are: 1) docs/bpf fixes, from Andrii. 2) ld_abs fix, from Daniel. 3) socket casting helpers fix, from Martin. 4) hash iterator fixes, from Yonghong. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-09-15bpf: Fix a rcu warning for bpffs map pretty-printYonghong Song1-1/+3
Running selftest ./btf_btf -p the kernel had the following warning: [ 51.528185] WARNING: CPU: 3 PID: 1756 at kernel/bpf/hashtab.c:717 htab_map_get_next_key+0x2eb/0x300 [ 51.529217] Modules linked in: [ 51.529583] CPU: 3 PID: 1756 Comm: test_btf Not tainted 5.9.0-rc1+ #878 [ 51.530346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014 [ 51.531410] RIP: 0010:htab_map_get_next_key+0x2eb/0x300 ... [ 51.542826] Call Trace: [ 51.543119] map_seq_next+0x53/0x80 [ 51.543528] seq_read+0x263/0x400 [ 51.543932] vfs_read+0xad/0x1c0 [ 51.544311] ksys_read+0x5f/0xe0 [ 51.544689] do_syscall_64+0x33/0x40 [ 51.545116] entry_SYSCALL_64_after_hwframe+0x44/0xa9 The related source code in kernel/bpf/hashtab.c: 709 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key) 710 { 711 struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 712 struct hlist_nulls_head *head; 713 struct htab_elem *l, *next_l; 714 u32 hash, key_size; 715 int i = 0; 716 717 WARN_ON_ONCE(!rcu_read_lock_held()); In kernel/bpf/inode.c, bpffs map pretty print calls map->ops->map_get_next_key() without holding a rcu_read_lock(), hence causing the above warning. To fix the issue, just surrounding map->ops->map_get_next_key() with rcu read lock. Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap") Reported-by: Alexei Starovoitov <[email protected]> Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Cc: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-15bpf: Bpf_skc_to_* casting helpers require a NULL check on skMartin KaFai Lau1-7/+7
The bpf_skc_to_* type casting helpers are available to BPF_PROG_TYPE_TRACING. The traced PTR_TO_BTF_ID may be NULL. For example, the skb->sk may be NULL. Thus, these casting helpers need to check "!sk" also and this patch fixes them. Fixes: 0d4fad3e57df ("bpf: Add bpf_skc_to_udp6_sock() helper") Fixes: 478cfbdf5f13 ("bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers") Fixes: af7ec1383361 ("bpf: Add bpf_skc_to_tcp6_sock() helper") Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Yonghong Song <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-15ipv4: Update exception handling for multipath routes via same deviceDavid Ahern1-5/+8
Kfir reported that pmtu exceptions are not created properly for deployments where multipath routes use the same device. After some digging I see 2 compounding problems: 1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after* the route lookup. This is the second use case where this has been a problem (the first is related to use of vti devices with VRF). I can not find any reason for the oif to be changed after the lookup; the code goes back to the start of git. It does not seem logical so remove it. 2. fib_lookups for exceptions do not call fib_select_path to handle multipath route selection based on the hash. The end result is that the fib_lookup used to add the exception always creates it based using the first leg of the route. An example topology showing the problem: | host1 +------+ | eth0 | .209 +------+ | +------+ switch | br0 | +------+ | +---------+---------+ | host2 | host3 +------+ +------+ | eth0 | .250 | eth0 | 192.168.252.252 +------+ +------+ +-----+ +-----+ | vti | .2 | vti | 192.168.247.3 +-----+ +-----+ \ / ================================= tunnels 192.168.247.1/24 for h in host1 host2 host3; do ip netns add ${h} ip -netns ${h} link set lo up ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1 done ip netns add switch ip -netns switch li set lo up ip -netns switch link add br0 type bridge stp 0 ip -netns switch link set br0 up for n in 1 2 3; do ip -netns switch link add eth-sw type veth peer name eth-h${n} ip -netns switch li set eth-h${n} master br0 up ip -netns switch li set eth-sw netns host${n} name eth0 done ip -netns host1 addr add 192.168.252.209/24 dev eth0 ip -netns host1 link set dev eth0 up ip -netns host1 route add 192.168.247.0/24 \ nexthop via 192.168.252.250 dev eth0 nexthop via 192.168.252.252 dev eth0 ip -netns host2 addr add 192.168.252.250/24 dev eth0 ip -netns host2 link set dev eth0 up ip -netns host2 addr add 192.168.252.252/24 dev eth0 ip -netns host3 link set dev eth0 up ip netns add tunnel ip -netns tunnel li set lo up ip -netns tunnel li add br0 type bridge ip -netns tunnel li set br0 up for n in $(seq 11 20); do ip -netns tunnel addr add dev br0 192.168.247.${n}/24 done for n in 2 3 do ip -netns tunnel link add vti${n} type veth peer name eth${n} ip -netns tunnel link set eth${n} mtu 1360 master br0 up ip -netns tunnel link set vti${n} netns host${n} mtu 1360 up ip -netns host${n} addr add dev vti${n} 192.168.247.${n}/24 done ip -netns tunnel ro add default nexthop via 192.168.247.2 nexthop via 192.168.247.3 ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.11 ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.15 ip -netns host1 ro ls cache Before this patch the cache always shows exceptions against the first leg in the multipath route; 192.168.252.250 per this example. Since the hash has an initial random seed, you may need to vary the final octet more than what is listed. In my tests, using addresses between 11 and 19 usually found 1 that used both legs. With this patch, the cache will have exceptions for both legs. Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions") Reported-by: Kfir Itzhak <[email protected]> Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-15net: tipc: kerneldoc fixesLu Wei1-1/+2
Fix parameter description of tipc_link_bc_create() Reported-by: Hulk Robot <[email protected]> Fixes: 16ad3f4022bb ("tipc: introduce variable window congestion control") Signed-off-by: Lu Wei <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-15ibmvnic: update MAINTAINERSDany Madden1-1/+3
Update supporters for IBM Power SRIOV Virtual NIC Device Driver. Thomas Falcon is moving on to other works. Dany Madden, Lijun Pan and Sukadev Bhattiprolu are the current supporters. Signed-off-by: Dany Madden <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14docs/bpf: Remove source code linksAndrii Nakryiko1-1/+1
Make path to bench_ringbufs.c just a text, not a special link. Fixes: 97abb2b39682 ("docs/bpf: Add BPF ring buffer design notes") Reported-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-14xsk: Fix number of pinned pages/umem size discrepancyBjörn Töpel1-9/+8
For AF_XDP sockets, there was a discrepancy between the number of of pinned pages and the size of the umem region. The size of the umem region is used to validate the AF_XDP descriptor addresses. The logic that pinned the pages covered by the region only took whole pages into consideration, creating a mismatch between the size and pinned pages. A user could then pass AF_XDP addresses outside the range of pinned pages, but still within the size of the region, crashing the kernel. This change correctly calculates the number of pages to be pinned. Further, the size check for the aligned mode is simplified. Now the code simply checks if the size is divisible by the chunk size. Fixes: bbff2f321a86 ("xsk: new descriptor addressing scheme") Reported-by: Ciara Loftus <[email protected]> Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Tested-by: Ciara Loftus <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-14net: sched: initialize with 0 before setting erspan md->uXin Long1-0/+1
In fl_set_erspan_opt(), all bits of erspan md was set 1, as this function is also used to set opt MASK. However, when setting for md->u.index for opt VALUE, the rest bits of the union md->u will be left 1. It would cause to fail the match of the whole md when version is 1 and only index is set. This patch is to fix by initializing with 0 before setting erspan md->u. Reported-by: Shuang Li <[email protected]> Fixes: 79b1011cb33d ("net: sched: allow flower to match erspan options") Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14Merge branch 'net-improve-vxlan-option-process-in-net_sched-and-lwtunnel'David S. Miller4-1/+8
Xin Long says: ==================== net: improve vxlan option process in net_sched and lwtunnel This patch is to do some mask when setting vxlan option in net_sched and lwtunnel, so that only available bits can be set on vxlan md gbp. This would help when users don't know exactly vxlan's gbp bits, and avoid some mismatch because of some unavailable bits set by users. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-09-14lwtunnel: only keep the available bits when setting vxlan md->gbpXin Long1-0/+1
As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata on vxlan rx/tx path, only dont_learn/policy_applied/policy_id fields can be set to or parse from the packet for vxlan gbp option. So do the mask when set it in lwtunnel, as it does in act_tunnel_key and cls_flower. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14net: sched: only keep the available bits when setting vxlan md->gbpXin Long3-1/+7
As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata on vxlan rx/tx path, only dont_learn/policy_applied/policy_id fields can be set to or parse from the packet for vxlan gbp option. So we'd better do the mask when set it in act_tunnel_key and cls_flower. Otherwise, when users don't know these bits, they may configure with a value which can never be matched. Reported-by: Shuang Li <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14tipc: use skb_unshare() instead in tipc_buf_append()Xin Long1-1/+2
In tipc_buf_append() it may change skb's frag_list, and it causes problems when this skb is cloned. skb_unclone() doesn't really make this skb's flag_list available to change. Shuang Li has reported an use-after-free issue because of this when creating quite a few macvlan dev over the same dev, where the broadcast packets will be cloned and go up to the stack: [ ] BUG: KASAN: use-after-free in pskb_expand_head+0x86d/0xea0 [ ] Call Trace: [ ] dump_stack+0x7c/0xb0 [ ] print_address_description.constprop.7+0x1a/0x220 [ ] kasan_report.cold.10+0x37/0x7c [ ] check_memory_region+0x183/0x1e0 [ ] pskb_expand_head+0x86d/0xea0 [ ] process_backlog+0x1df/0x660 [ ] net_rx_action+0x3b4/0xc90 [ ] [ ] Allocated by task 1786: [ ] kmem_cache_alloc+0xbf/0x220 [ ] skb_clone+0x10a/0x300 [ ] macvlan_broadcast+0x2f6/0x590 [macvlan] [ ] macvlan_process_broadcast+0x37c/0x516 [macvlan] [ ] process_one_work+0x66a/0x1060 [ ] worker_thread+0x87/0xb10 [ ] [ ] Freed by task 3253: [ ] kmem_cache_free+0x82/0x2a0 [ ] skb_release_data+0x2c3/0x6e0 [ ] kfree_skb+0x78/0x1d0 [ ] tipc_recvmsg+0x3be/0xa40 [tipc] So fix it by using skb_unshare() instead, which would create a new skb for the cloned frag and it'll be safe to change its frag_list. The similar things were also done in sctp_make_reassembled_event(), which is using skb_copy(). Reported-by: Shuang Li <[email protected]> Fixes: 37e22164a8a3 ("tipc: rename and move message reassembly function") Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14tipc: Fix memory leak in tipc_group_create_member()Peilin Ye1-4/+10
tipc_group_add_to_tree() returns silently if `key` matches `nkey` of an existing node, causing tipc_group_create_member() to leak memory. Let tipc_group_add_to_tree() return an error in such a case, so that tipc_group_create_member() can handle it properly. Fixes: 75da2163dbb6 ("tipc: introduce communication groups") Reported-and-tested-by: [email protected] Cc: Hillf Danton <[email protected]> Link: https://syzkaller.appspot.com/bug?id=048390604fe1b60df34150265479202f10e13aff Signed-off-by: Peilin Ye <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14ipv4: Initialize flowi4_multipath_hash in data pathDavid Ahern4-0/+4
flowi4_multipath_hash was added by the commit referenced below for tunnels. Unfortunately, the patch did not initialize the new field for several fast path lookups that do not initialize the entire flow struct to 0. Fix those locations. Currently, flowi4_multipath_hash is random garbage and affects the hash value computed by fib_multipath_hash for multipath selection. Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash") Signed-off-by: David Ahern <[email protected]> Cc: wenxu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14Merge branch 'net-lantiq-Fix-bugs-in-NAPI-handling'David S. Miller1-8/+13
Hauke Mehrtens says: ==================== net: lantiq: Fix bugs in NAPI handling This fixes multiple bugs in the NAPI handling. Changes since: v1: - removed stable tag from "net: lantiq: use netif_tx_napi_add() for TX NAPI" - Check the NAPI budged in "net: lantiq: Use napi_complete_done()" - Add extra fix "net: lantiq: Disable IRQs only if NAPI gets scheduled" ==================== Signed-off-by: David S. Miller <[email protected]>
2020-09-14net: lantiq: Disable IRQs only if NAPI gets scheduledHauke Mehrtens1-3/+5
The napi_schedule() call will only schedule the NAPI if it is not already running. To make sure that we do not deactivate interrupts without scheduling NAPI only deactivate the interrupts in case NAPI also gets scheduled. Signed-off-by: Hauke Mehrtens <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14net: lantiq: Use napi_complete_done()Hauke Mehrtens1-4/+4
Use napi_complete_done() and activate the interrupts when this function returns true. This way the generic NAPI code can take care of activating the interrupts. Signed-off-by: Hauke Mehrtens <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14net: lantiq: use netif_tx_napi_add() for TX NAPIHauke Mehrtens1-1/+1
netif_tx_napi_add() should be used for NAPI in the TX direction instead of the netif_napi_add() function. Signed-off-by: Hauke Mehrtens <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14net: lantiq: Wake TX queue againHauke Mehrtens1-0/+3
The call to netif_wake_queue() when the TX descriptors were freed was missing. When there are no TX buffers available the TX queue will be stopped, but it was not started again when they are available again, this is fixed in this patch. Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver") Signed-off-by: Hauke Mehrtens <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14rndis_host: increase sleep time in the query-response loopOlympia Giannou1-1/+1
Some WinCE devices face connectivity issues via the NDIS interface. They fail to register, resulting in -110 timeout errors and failures during the probe procedure. In this kind of WinCE devices, the Windows-side ndis driver needs quite more time to be loaded and configured, so that the linux rndis host queries to them fail to be responded correctly on time. More specifically, when INIT is called on the WinCE side - no other requests can be served by the Client and this results in a failed QUERY afterwards. The increase of the waiting time on the side of the linux rndis host in the command-response loop leaves the INIT process to complete and respond to a QUERY, which comes afterwards. The WinCE devices with this special "feature" in their ndis driver are satisfied by this fix. Signed-off-by: Olympia Giannou <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-11net: ethernet: ti: cpsw_new: fix suspend/resumeGrygorii Strashko1-0/+53
Add missed suspend/resume callbacks to properly restore networking after suspend/resume cycle. Fixes: ed3525eda4c4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac") Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-11net: ipa: fix u32_replace_bits by u32p_xxx versionVadym Kochan1-2/+2
Looks like u32p_replace_bits() should be used instead of u32_replace_bits() which does not modifies the value but returns the modified version. Fixes: 2b9feef2b6c2 ("soc: qcom: ipa: filter and routing tables") Signed-off-by: Vadym Kochan <[email protected]> Reviewed-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-11hinic: fix rewaking txq after netif_tx_disableLuo bin2-16/+28
When calling hinic_close in hinic_set_channels, all queues are stopped after netif_tx_disable, but some queue may be rewaken in free_tx_poll by mistake while drv is handling tx irq. If one queue is rewaken core may call hinic_xmit_frame to send pkt after netif_tx_disable within a short time which may results in accessing memory that has been already freed in hinic_close. So we call napi_disable before netif_tx_disable in hinic_close to fix this bug. Fixes: 2eed5a8b614b ("hinic: add set_channels ethtool_ops support") Signed-off-by: Luo bin <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-11taprio: Fix allowing too small intervalsVinicius Costa Gomes1-11/+17
It's possible that the user specifies an interval that couldn't allow any packet to be transmitted. This also avoids the issue of the hrtimer handler starving the other threads because it's running too often. The solution is to reject interval sizes that according to the current link speed wouldn't allow any packet to be transmitted. Reported-by: [email protected] Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") Signed-off-by: Vinicius Costa Gomes <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-11Merge branch '40GbE' of ↵David S. Miller3-18/+43
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2020-09-09 This series contains updates to i40e and igc drivers. Stefan Assmann changes num_vlans to u16 to fix may be used uninitialized error and propagates error in i40_set_vsi_promisc() for i40e. Vinicius corrects timestamping latency values for i225 devices and accounts for TX timestamping delay for igc. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-09-11enetc: Fix mdio bus removal on PF probe bailoutClaudiu Manoil1-1/+1
This is the correct resolution for the conflict from merging the "net" tree fix: commit 26cb7085c898 ("enetc: Remove the mdio bus on PF probe bailout") with the "net-next" new work: commit 07095c025ac2 ("net: enetc: Use DT protocol information to set up the ports") that moved mdio bus allocation to an ealier stage of the PF probing routine. Fixes: a57066b1a019 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") Signed-off-by: Claudiu Manoil <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10docs/bpf: Fix ringbuf documentationAndrii Nakryiko1-4/+1
Remove link to litmus tests that didn't make it to upstream. Fix ringbuf benchmark link. I wasn't able to test this with `make htmldocs`, unfortunately, because of Sphinx dependencies. But bench_ringbufs.c path is certainly correct now. Fixes: 97abb2b39682 ("docs/bpf: Add BPF ring buffer design notes") Reported-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-10net: dec: de2104x: Increase receive ring size for TulipLucy Yan1-1/+1
Increase Rx ring size to address issue where hardware is reaching the receive work limit. Before: [ 102.223342] de2104x 0000:17:00.0 eth0: rx work limit reached [ 102.245695] de2104x 0000:17:00.0 eth0: rx work limit reached [ 102.251387] de2104x 0000:17:00.0 eth0: rx work limit reached [ 102.267444] de2104x 0000:17:00.0 eth0: rx work limit reached Signed-off-by: Lucy Yan <[email protected]> Reviewed-by: Moritz Fischer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10netlink: fix doc about nlmsg_parse/nla_validateNicolas Dichtel1-2/+0
There is no @validate argument. CC: Johannes Berg <[email protected]> Fixes: 3de644035446 ("netlink: re-add parse/validate functions in strict mode") Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10net: DCB: Validate DCB_ATTR_DCB_BUFFER argumentPetr Machata1-0/+8
The parameter passed via DCB_ATTR_DCB_BUFFER is a struct dcbnl_buffer. The field prio2buffer is an array of IEEE_8021Q_MAX_PRIORITIES bytes, where each value is a number of a buffer to direct that priority's traffic to. That value is however never validated to lie within the bounds set by DCBX_MAX_BUFFERS. The only driver that currently implements the callback is mlx5 (maintainers CCd), and that does not do any validation either, in particual allowing incorrect configuration if the prio2buffer value does not fit into 4 bits. Instead of offloading the need to validate the buffer index to drivers, do it right there in core, and bounce the request if the value is too large. CC: Parav Pandit <[email protected]> CC: Saeed Mahameed <[email protected]> Fixes: e549f6f9c098 ("net/dcb: Add dcbnl buffer attribute") Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10Merge branch 'net-Fix-bridge-enslavement-failure'David S. Miller2-1/+48
Ido Schimmel says: ==================== net: Fix bridge enslavement failure Patch #1 fixes an issue in which an upper netdev cannot be enslaved to a bridge when it has multiple netdevs with different parent identifiers beneath it. Patch #2 adds a test case using two netdevsim instances. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-09-10selftests: rtnetlink: Test bridge enslavement with different parent IDsIdo Schimmel1-0/+47
Test that an upper device of netdevs with different parent IDs can be enslaved to a bridge. The test fails without previous commit. Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10net: Fix bridge enslavement failureIdo Schimmel1-1/+1
When a netdev is enslaved to a bridge, its parent identifier is queried. This is done so that packets that were already forwarded in hardware will not be forwarded again by the bridge device between netdevs belonging to the same hardware instance. The operation fails when the netdev is an upper of netdevs with different parent identifiers. Instead of failing the enslavement, have dev_get_port_parent_id() return '-EOPNOTSUPP' which will signal the bridge to skip the query operation. Other callers of the function are not affected by this change. Fixes: 7e1146e8c10c ("net: devlink: introduce devlink_compat_switch_id_get() helper") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: Vasundhara Volam <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10net: mvneta: fix possible use-after-free in mvneta_xdp_put_buffLorenzo Bianconi1-2/+2
Release first buffer as last one since it contains references to subsequent fragments. This code will be optimized introducing multi-buffer bit in xdp_buff structure. Fixes: ca0e014609f05 ("net: mvneta: move skb build after descriptors processing") Signed-off-by: Lorenzo Bianconi <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10s390/qeth: delay draining the TX buffersJulian Wiedmann2-2/+2
Wait until the QDIO data connection is severed. Otherwise the device might still be processing the buffers, and end up accessing skb data that we already freed. Fixes: 8b5026bc1693 ("s390/qeth: fix qdio teardown after early init error") Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10net: Fix broken NETIF_F_CSUM_MASK spell in netdev_features.hMiaohe Lin1-1/+1
Remove the weird space inside the NETIF_F_CSUM_MASK. Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10net: Correct the comment of dst_dev_put()Miaohe Lin1-1/+1
Since commit 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries"), we use blackhole_netdev to invalidate dst entries instead of loopback device anymore. Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10hdlc_ppp: add range checks in ppp_cp_parse_cr()Dan Carpenter1-5/+11
There are a couple bugs here: 1) If opt[1] is zero then this results in a forever loop. If the value is less than 2 then it is invalid. 2) It assumes that "len" is more than sizeof(valid_accm) or 6 which can result in memory corruption. In the case of LCP_OPTION_ACCM, then we should check "opt[1]" instead of "len" because, if "opt[1]" is less than sizeof(valid_accm) then "nak_len" gets out of sync and it can lead to memory corruption in the next iterations through the loop. In case of LCP_OPTION_MAGIC, the only valid value for opt[1] is 6, but the code is trying to log invalid data so we should only discard the data when "len" is less than 6 because that leads to a read overflow. Reported-by: ChenNan Of Chaitin Security Research Lab <[email protected]> Fixes: e022c2f07ae5 ("WAN: new synchronous PPP implementation for generic HDLC.") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10net: phy: call phy_disable_interrupts() in phy_attach_direct() insteadYoshihiro Shimoda1-4/+4
Since the micrel phy driver calls phy_init_hw() as a workaround, the commit 9886a4dbd2aa ("net: phy: call phy_disable_interrupts() in phy_init_hw()") disables the interrupt unexpectedly. So, call phy_disable_interrupts() in phy_attach_direct() instead. Otherwise, the phy cannot link up after the ethernet cable was disconnected. Note that other drivers (like at803x.c) also calls phy_init_hw(). So, perhaps, the driver caused a similar issue too. Fixes: 9886a4dbd2aa ("net: phy: call phy_disable_interrupts() in phy_init_hw()") Signed-off-by: Yoshihiro Shimoda <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10hv_netvsc: Cache the current data path to avoid duplicate call and messageDexuan Cui2-1/+23
The previous change "hv_netvsc: Switch the data path at the right time during hibernation" adds the call of netvsc_vf_changed() upon NETDEV_CHANGE, so it's necessary to avoid the duplicate call and message when the VF is brought UP or DOWN. Signed-off-by: Dexuan Cui <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10hv_netvsc: Switch the data path at the right time during hibernationDexuan Cui1-10/+1
When netvsc_resume() is called, the mlx5 VF NIC has not been resumed yet, so in the future the host might sliently fail the call netvsc_vf_changed() -> netvsc_switch_datapath() there, even if the call works now. Call netvsc_vf_changed() in the NETDEV_CHANGE event handler: at that time the mlx5 VF NIC has been resumed. Fixes: 19162fd4063a ("hv_netvsc: Fix hibernation for mlx5 VF driver") Signed-off-by: Dexuan Cui <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10net: sch_generic: aviod concurrent reset and enqueue op for lockless qdiscYunsheng Lin1-15/+33
Currently there is concurrent reset and enqueue operation for the same lockless qdisc when there is no lock to synchronize the q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in qdisc_deactivate() called by dev_deactivate_queue(), which may cause out-of-bounds access for priv->ring[] in hns3 driver if user has requested a smaller queue num when __dev_xmit_skb() still enqueue a skb with a larger queue_mapping after the corresponding qdisc is reset, and call hns3_nic_net_xmit() with that skb later. Reused the existing synchronize_net() in dev_deactivate_many() to make sure skb with larger queue_mapping enqueued to old qdisc(which is saved in dev_queue->qdisc_sleeping) will always be reset when dev_reset_queue() is called. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Signed-off-by: Yunsheng Lin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10net: dsa: microchip: look for phy-mode in port nodesHelmut Grohne5-18/+47
Documentation/devicetree/bindings/net/dsa/dsa.txt says that the phy-mode property should be specified on port nodes. However, the microchip drivers read it from the switch node. Let the driver use the per-port property and fall back to the old location with a warning. Fix in-tree users. Signed-off-by: Helmut Grohne <[email protected]> Link: https://lore.kernel.org/netdev/20200617082235.GA1523@laureti-dev/ Acked-by: Alexandre Belloni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10mptcp: fix kmalloc flag in mptcp_pm_nl_get_local_idGeliang Tang1-1/+1
mptcp_pm_nl_get_local_id may be called in interrupt context, so we need to use GFP_ATOMIC flag to allocate memory to avoid sleeping in atomic context. [ 280.209809] BUG: sleeping function called from invalid context at mm/slab.h:498 [ 280.209812] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1680, name: kworker/1:3 [ 280.209814] INFO: lockdep is turned off. [ 280.209816] CPU: 1 PID: 1680 Comm: kworker/1:3 Tainted: G W 5.9.0-rc3-mptcp+ #146 [ 280.209818] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 280.209820] Workqueue: events mptcp_worker [ 280.209822] Call Trace: [ 280.209824] <IRQ> [ 280.209826] dump_stack+0x77/0xa0 [ 280.209829] ___might_sleep.cold+0xa6/0xb6 [ 280.209832] kmem_cache_alloc_trace+0x1d1/0x290 [ 280.209835] mptcp_pm_nl_get_local_id+0x23c/0x410 [ 280.209840] subflow_init_req+0x1e9/0x2ea [ 280.209843] ? inet_reqsk_alloc+0x1c/0x120 [ 280.209845] ? kmem_cache_alloc+0x264/0x290 [ 280.209849] tcp_conn_request+0x303/0xae0 [ 280.209854] ? printk+0x53/0x6a [ 280.209857] ? tcp_rcv_state_process+0x28f/0x1374 [ 280.209859] tcp_rcv_state_process+0x28f/0x1374 [ 280.209864] ? tcp_v4_do_rcv+0xb3/0x1f0 [ 280.209866] tcp_v4_do_rcv+0xb3/0x1f0 [ 280.209869] tcp_v4_rcv+0xed6/0xfa0 [ 280.209873] ip_protocol_deliver_rcu+0x28/0x270 [ 280.209875] ip_local_deliver_finish+0x89/0x120 [ 280.209877] ip_local_deliver+0x180/0x220 [ 280.209881] ip_rcv+0x166/0x210 [ 280.209885] __netif_receive_skb_one_core+0x82/0x90 [ 280.209888] process_backlog+0xd6/0x230 [ 280.209891] net_rx_action+0x13a/0x410 [ 280.209895] __do_softirq+0xcf/0x468 [ 280.209899] asm_call_on_stack+0x12/0x20 [ 280.209901] </IRQ> [ 280.209903] ? ip_finish_output2+0x240/0x9a0 [ 280.209906] do_softirq_own_stack+0x4d/0x60 [ 280.209908] do_softirq.part.0+0x2b/0x60 [ 280.209911] __local_bh_enable_ip+0x9a/0xa0 [ 280.209913] ip_finish_output2+0x264/0x9a0 [ 280.209916] ? rcu_read_lock_held+0x4d/0x60 [ 280.209920] ? ip_output+0x7a/0x250 [ 280.209922] ip_output+0x7a/0x250 [ 280.209925] ? __ip_finish_output+0x330/0x330 [ 280.209928] __ip_queue_xmit+0x1dc/0x5a0 [ 280.209931] __tcp_transmit_skb+0xa0f/0xc70 [ 280.209937] tcp_connect+0xb03/0xff0 [ 280.209939] ? lockdep_hardirqs_on_prepare+0xe7/0x190 [ 280.209942] ? ktime_get_with_offset+0x125/0x150 [ 280.209944] ? trace_hardirqs_on+0x1c/0xe0 [ 280.209948] tcp_v4_connect+0x449/0x550 [ 280.209953] __inet_stream_connect+0xbb/0x320 [ 280.209955] ? mark_held_locks+0x49/0x70 [ 280.209958] ? lockdep_hardirqs_on_prepare+0xe7/0x190 [ 280.209960] ? __local_bh_enable_ip+0x6b/0xa0 [ 280.209963] inet_stream_connect+0x32/0x50 [ 280.209966] __mptcp_subflow_connect+0x1fd/0x242 [ 280.209972] mptcp_pm_create_subflow_or_signal_addr+0x2db/0x600 [ 280.209975] mptcp_worker+0x543/0x7a0 [ 280.209980] process_one_work+0x26d/0x5b0 [ 280.209984] ? process_one_work+0x5b0/0x5b0 [ 280.209987] worker_thread+0x48/0x3d0 [ 280.209990] ? process_one_work+0x5b0/0x5b0 [ 280.209993] kthread+0x117/0x150 [ 280.209996] ? kthread_park+0x80/0x80 [ 280.209998] ret_from_fork+0x22/0x30 Fixes: 01cacb00b35cb ("mptcp: add netlink-based PM") Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-10Merge branch 'mptcp-fix-subflow-s-local_id-remote_id-issues'David S. Miller2-4/+20
Geliang Tang says: ==================== mptcp: fix subflow's local_id/remote_id issues v2: - add Fixes tags; - simply with 'return addresses_equal'; - use 'reversed Xmas tree' way. ==================== Signed-off-by: David S. Miller <[email protected]>