aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2016-02-29batman-adv: B.A.T.M.A.N. V - implement neighbor comparison API callsAntonio Quartulli1-0/+38
Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: ELP - send unicast ELP packets for throughput samplingAntonio Quartulli2-0/+72
In case of an unused wireless link, the mac80211 throughput estimation won't get updated further. Consequently, the reported throughput metric will become obsolete. With this patch unicast sampling is introduced by periodically sending unicast ELP packets to each neighbor on idle WiFi links. These sampling packets will fill an entire frame, so that the measurement is as reliable as possible Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: ELP - compute the metric based on the estimated throughputAntonio Quartulli7-2/+164
In case of wireless interface retrieve the throughput by querying cfg80211. To perform this call a separate work must be scheduled because the function may sleep and this is not allowed within an RCU protected context (RCU in this case is used to iterate over all the neighbours). Use ethtool to retrieve information about an Ethernet link like HALF/FULL_DUPLEX and advertised bandwidth (e.g. 100/10Mbps). The metric is updated each time a new ELP packet is sent, this way it is possible to timely react to a metric variation which can imply (for example) a neighbour disconnection. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: keep track of when unicast packets are sentAntonio Quartulli10-34/+75
To enable ELP to send probing packets over wireless links only if needed, batman-adv must keep track of the last time it sent a unicast packet towards every neighbour. For this purpose a 2 main changes are introduced: 1) a new member of the elp_neigh_node structure stores the last time a unicast packet was sent towards this neighbour; 2) a wrapper function for sending unicast packets is implemented. This function will simply update the member describe din point 1) and then forward the packet to the real sending routine. Point 2) implies that any code-path leading to a unicast sending now has to use the new wrapper. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: add throughput override attribute to hard_ifacesAntonio Quartulli5-2/+86
This attribute is exported to user space to disable the link throughput auto-detection by setting a fixed value. The throughput override value is used when batman-adv is computing the link throughput towards a neighbour. If the value is set to 0 then batman-adv will try to detect the throughput by itself. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: OGMv2 - implement originators logicAntonio Quartulli5-41/+566
Add the support for recognising new originators in the network and rebroadcast their OGMs. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: OGMv2 - add basic infrastructureAntonio Quartulli9-3/+427
This is the initial implementation of the new OGM protocol (version 2). It has been designed to work on top of the newly added ELP. In the previous version the OGM protocol was used to both measure link qualities and flood the network with the metric information. In this version the protocol is in charge of the latter task only, leaving the former to ELP. This means being able to decouple the interval used by the neighbor discovery from the OGM broadcasting, which revealed to be costly in dense networks and needed to be relaxed so leading to a less responsive routing protocol. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: ELP - adding sysfs parameter for elp intervalLinus Luessing1-0/+7
This parameter can be set individually on each interface and allows the configuration of the elp interval for the link quality measurements during runtime. Usually it is desirable to set it to a higher (= slower) value on interfaces which have a more static characteristic (e.g. wired interfaces) or very dense neighbourhoods to reduce overhead. Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> [[email protected]: respin on top of the latest master] Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: ELP - creating neighbor structuresLinus Luessing5-1/+214
Initially developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: ELP - adding basic infrastructureLinus Luessing9-1/+363
The B.A.T.M.A.N. protocol originally only used a single message type (called OGM) to determine the link qualities to the direct neighbors and spreading these link quality information through the whole mesh. This procedure is summarized on the BATMAN concept page and explained in details in the RFC draft published in 2008. This approach was chosen for its simplicity during the protocol design phase and the implementation. However, it also bears some drawbacks: * Wireless interfaces usually come with some packet loss, therefore a higher broadcast rate is desirable to allow a fast reaction on flaky connections. Other interfaces of the same host might be connected to Ethernet LANs / VPNs / etc which rarely exhibit packet loss would benefit from a lower broadcast rate to reduce overhead. * It generally is more desirable to detect local link quality changes at a faster rate than propagating all these changes through the entire mesh (the far end of the mesh does not need to care about local link quality changes that much). Other optimizations strategies, like reducing overhead, might be possible if OGMs weren't used for all tasks in the mesh at the same time. As a result detecting local link qualities shall be handled by an independent message type, ELP, whereas the OGM message type remains responsible for flooding the mesh with these link quality information and determining the overall path transmit qualities. Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: Add hard_iface specific sysfs wrapper macros for UINTLinus Luessing1-0/+49
This allows us to easily add a sysfs parameter for an unsigned int later, which is not for a batman mesh interface (e.g. bat0), but for a common interface instead. It allows reading and writing an atomic_t in hard_iface (instead of bat_priv compared to the mesh variant). Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> [[email protected]: rename functions and move macros] Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-26net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump.MINOURA Makoto / 箕浦 真3-6/+20
When the send skbuff reaches the end, nlmsg_put and friends returns -EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called within a for_each_netdev loop and the first fdb entry of a following netdev could fit in the remaining skbuff. This breaks the mechanism of cb->args[0] and idx to keep track of the entries that are already dumped, which results missing entries in bridge fdb show command. Signed-off-by: Minoura Makoto <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-26GSO: Provide software checksum of tunneled UDP fragmentation offloadAlexander Duyck3-7/+37
On reviewing the code I realized that GRE and UDP tunnels could cause a kernel panic if we used GSO to segment a large UDP frame that was sent through the tunnel with an outer checksum and hardware offloads were not available. In order to correct this we need to update the feature flags that are passed to the skb_segment function so that in the event of UDP fragmentation being requested for the inner header the segmentation function will correctly generate the checksum for the payload if we cannot segment the outer header. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-26net: l3mdev: prefer VRF master for source address selectionDavid Lamparter1-0/+17
When selecting an address in context of a VRF, the vrf master should be preferred for address selection. If it isn't, the user has a hard time getting the system to select to their preference - the code will pick the address off the first in-VRF interface it can find, which on a router could well be a non-routable address. Signed-off-by: David Lamparter <[email protected]> Signed-off-by: David Ahern <[email protected]> [dsa: Fixed comment style and removed extra blank link ] Signed-off-by: David S. Miller <[email protected]>
2016-02-26net: l3mdev: address selection should only consider devices in L3 domainDavid Ahern2-2/+14
David Lamparter noted a use case where the source address selection fails to pick an address from a VRF interface - unnumbered interfaces. Relevant commands from his script: ip addr add 9.9.9.9/32 dev lo ip link set lo up ip link add name vrf0 type vrf table 101 ip rule add oif vrf0 table 101 ip rule add iif vrf0 table 101 ip link set vrf0 up ip addr add 10.0.0.3/32 dev vrf0 ip link add name dummy2 type dummy ip link set dummy2 master vrf0 up --> note dummy2 has no address - unnumbered device ip route add 10.2.2.2/32 dev dummy2 table 101 ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02 tcpdump -ni dummy2 & And using ping instead of his socat example: $ ping -I vrf0 -c1 10.2.2.2 ping: Warning: source address might be selected on device other than vrf0. PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data. >From tcpdump: 12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64 Note the source address is from lo and is not a VRF local address. With this patch: $ ping -I vrf0 -c1 10.2.2.2 PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data. >From tcpdump: 12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64 Now the source address comes from vrf0. The ipv4 function for selecting source address takes a const argument. Removing the const requires touching a lot of places, so instead l3mdev_master_ifindex_rcu is changed to take a const argument and then do the typecast to non-const as required by netdev_master_upper_dev_get_rcu. This is similar to what l3mdev_fib_table_rcu does. IPv6 for unnumbered interfaces appears to be selecting the addresses properly. Cc: David Lamparter <[email protected]> Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-26Merge branch 'for-linus' of ↵Linus Torvalds2-6/+13
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "There are two small messenger bug fixes and a log spam regression fix" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: don't spam dmesg with stray reply warnings libceph: use the right footer size when skipping a message libceph: don't bail early from try_read() when skipping a message
2016-02-266lowpan: iphc: fix invalid case handlingAlexander Aring1-2/+2
This patch fixes the return value in a case which should never occur. Instead returning "-EINVAL" we return LOWPAN_IPHC_DAM_00 which is invalid on context based addresses. Also change the WARN_ON_ONCE to WARN_ONCE which was suggested by Dan Carpenter. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-02-25Merge tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-1/+1
Pull nfsd bugfix from Bruce Fields: "One fix for a bug that could cause a NULL write past the end of a buffer in case of unusually long writes to some system interfaces used by mountd and other nfs support utilities" * tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux: sunrpc/cache: fix off-by-one in qword_get()
2016-02-25net: ethtool: remove unused __ethtool_get_settingsDavid Decotigny1-31/+14
replaced by __ethtool_get_link_ksettings. Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: core: use __ethtool_get_ksettingsDavid Decotigny2-12/+14
Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: bridge: use __ethtool_get_ksettingsDavid Decotigny1-3/+3
Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: 8021q: use __ethtool_get_ksettingsDavid Decotigny1-4/+4
Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: ethtool: add new ETHTOOL_xLINKSETTINGS APIDavid Decotigny1-6/+447
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API, handled by the new get_link_ksettings/set_link_ksettings callbacks. This API provides support for most legacy ethtool_cmd fields, adds support for larger link mode masks (up to 4064 bits, variable length), and removes ethtool_cmd deprecated fields (transceiver/maxrxpkt/maxtxpkt). This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides the following backward compatibility properties: - legacy ethtool with legacy drivers: no change, still using the get_settings/set_settings callbacks. - legacy ethtool with new get/set_link_ksettings drivers: the new driver callbacks are used, data internally converted to legacy ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link mode mask. ETHTOOL_SSET will fail if user tries to set the ethtool_cmd deprecated fields to non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if driver sets higher bits. - future ethtool with legacy drivers: no change, still using the get_settings/set_settings callbacks, internally converted to new data structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be ignored and seen as 0 from user space. Note that that "future" ethtool tool will not allow changes to these deprecated fields. - future ethtool with new drivers: direct call to the new callbacks. By "future" ethtool, what is meant is: - query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if fails - set: query first and remember which of ETHTOOL_GLINKSETTINGS or ETHTOOL_GSET was successful + if ETHTOOL_GLINKSETTINGS was successful, then change config with ETHTOOL_SLINKSETTINGS. A failure there is final (do not try ETHTOOL_SSET). + otherwise ETHTOOL_GSET was successful, change config with ETHTOOL_SSET. A failure there is final (do not try ETHTOOL_SLINKSETTINGS). The interaction user/kernel via the new API requires a small ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link mode bitmaps. If kernel doesn't agree with user, it returns the bitmap length it is expecting from user as a negative length (and cmd field is 0). When kernel and user agree, kernel returns valid info in all fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS). Data structure crossing user/kernel boundary is 32/64-bit agnostic. Converted internally to a legal kernel bitmap. The internal __ethtool_get_settings kernel helper will gradually be replaced by __ethtool_get_link_ksettings by the time the first "link_settings" drivers start to appear. So this patch doesn't change it, it will be removed before it needs to be changed. Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: Facility to report route quality of connected socketsTom Herbert1-0/+4
This patch add the SO_CNX_ADVICE socket option (setsockopt only). The purpose is to allow an application to give feedback to the kernel about the quality of the network path for a connected socket. The value argument indicates the type of quality report. For this initial patch the only supported advice is a value of 1 which indicates "bad path, please reroute"-- the action taken by the kernel is to call dst_negative_advice which will attempt to choose a different ECMP route, reset the TX hash for flow label and UDP source port in encapsulation, etc. This facility should be useful for connected UDP sockets where only the application can provide any feedback about path quality. It could also be useful for TCP applications that have additional knowledge about the path outside of the normal TCP control loop. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: ipv6: Make address flushing on ifdown optionalDavid Ahern1-15/+121
Currently, all ipv6 addresses are flushed when the interface is configured down, including global, static addresses: $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000 inet6 2100:1::2/120 scope global valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link valid_lft forever preferred_lft forever $ ip link set dev eth1 down $ ip -6 addr show dev eth1 << nothing; all addresses have been flushed>> Add a new sysctl to make this behavior optional. The new setting defaults to flush all addresses to maintain backwards compatibility. When the set global addresses with no expire times are not flushed on an admin down. The sysctl is per-interface or system-wide for all interfaces $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1 or $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1 Will keep addresses on eth1 on an admin down. $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000 inet6 2100:1::2/120 scope global valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link valid_lft forever preferred_lft forever $ ip link set dev eth1 down $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000 inet6 2100:1::2/120 scope global tentative valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative valid_lft forever preferred_lft forever Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25tipc: fix null deref crash in compat config pathFlorian Westphal1-0/+1
msg.dst_sk needs to be set up with a valid socket because some callbacks later derive the netns from it. Fixes: 263ea09084d172d ("Revert "genl: Add genlmsg_new_unicast() for unicast message allocation") Reported-by: Jon Maloy <[email protected]> Bisected-by: Jon Maloy <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Acked-by Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25tipc: fix crash during node removalJon Paul Maloy1-13/+11
When the TIPC module is unloaded, we have identified a race condition that allows a node reference counter to go to zero and the node instance being freed before the node timer is finished with accessing it. This leads to occasional crashes, especially in multi-namespace environments. The scenario goes as follows: CPU0:(node_stop) CPU1:(node_timeout) // ref == 2 1: if(!mod_timer()) 2: if (del_timer()) 3: tipc_node_put() // ref -> 1 4: tipc_node_put() // ref -> 0 5: kfree_rcu(node); 6: tipc_node_get(node) 7: // BOOM! We now clean up this functionality as follows: 1) We remove the node pointer from the node lookup table before we attempt deactivating the timer. This way, we reduce the risk that tipc_node_find() may obtain a valid pointer to an instance marked for deletion; a harmless but undesirable situation. 2) We use del_timer_sync() instead of del_timer() to safely deactivate the node timer without any risk that it might be reactivated by the timeout handler. There is no risk of deadlock here, since the two functions never touch the same spinlocks. 3: We remove a pointless tipc_node_get() + tipc_node_put() from the timeout handler. Reported-by: Zhijiang Hu <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25tipc: eliminate risk of finding to-be-deleted node instanceJon Paul Maloy1-9/+9
Although we have never seen it happen, we have identified the following problematic scenario when nodes are stopped and deleted: CPU0: CPU1: tipc_node_xxx() //ref == 1 tipc_node_put() //ref -> 0 tipc_node_find() // node still in table tipc_node_delete() list_del_rcu(n. list) tipc_node_get() //ref -> 1, bad kfree_rcu() tipc_node_put() //ref to 0 again. kfree_rcu() // BOOM! We fix this by introducing use of the conditional kref_get_if_not_zero() instead of kref_get() in the function tipc_node_find(). This eliminates any risk of post-mortem access. Reported-by: Zhijiang Hu <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: fix bridge multicast packet checksum validationLinus Lüssing1-2/+20
We need to update the skb->csum after pulling the skb, otherwise an unnecessary checksum (re)computation can ocure for IGMP/MLD packets in the bridge code. Additionally this fixes the following splats for network devices / bridge ports with support for and enabled RX checksum offloading: [...] [ 43.986968] eth0: hw csum failure [ 43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2 [ 43.996193] Hardware name: BCM2709 [ 43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] (show_stack+0x10/0x14) [ 44.007432] [<8001cf14>] (show_stack) from [<801ab614>] (dump_stack+0x80/0x90) [ 44.014695] [<801ab614>] (dump_stack) from [<802e4548>] (__skb_checksum_complete+0x6c/0xac) [ 44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] (ipv6_mc_validate_checksum+0x104/0x178) [ 44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] (skb_checksum_trimmed+0x130/0x188) [ 44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] (ipv6_mc_check_mld+0x118/0x338) [ 44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] (br_multicast_rcv+0x5dc/0xd00) [ 44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] (br_handle_frame_finish+0xac/0x51c) [...] Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code") Reported-by: Álvaro Fernández Rojas <[email protected]> Signed-off-by: Linus Lüssing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: dsa: drop vlan_getnextVivien Didelot1-34/+1
The VLAN GetNext operation is specific to some switches, and thus can be complicated to implement for some drivers. Remove the support for the vlan_getnext/port_pvid_get approach in favor of the generic and simpler port_vlan_dump function. Signed-off-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: dsa: add port_vlan_dump routineVivien Didelot1-0/+3
Similar to port_fdb_dump, add a port_vlan_dump function to DSA drivers which gets passed the switchdev VLAN object and callback. This function, if implemented, takes precedence over the soon legacy vlan_getnext/port_pvid_get approach. Signed-off-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net_sched: add network namespace support for tc actionsWANG Cong13-138/+699
Currently tc actions are stored in a per-module hashtable, therefore are visible to all network namespaces. This is probably the last part of the tc subsystem which is not aware of netns now. This patch makes them per-netns, several tc action API's need to be adjusted for this. The tc action API code is ugly due to historical reasons, we need to refactor that code in the future. Cc: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net_sched: prepare tcf_hashinfo_destroy() for netns supportWANG Cong1-3/+29
We only release the memory of the hashtable itself, not its entries inside. This is not a problem yet since we only call it in module release path, and module is refcount'ed by actions. This would be a problem after we move the per module hinfo into per netns in the latter patch. Cc: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25NFC: Close a race condition in llcp_sock_getname()Cong Wang1-0/+6
llcp_sock_getname() checks llcp_sock->dev to make sure llcp_sock is already connected or bound, however, we could be in the middle of llcp_sock_bind() where llcp_sock->dev is bound and llcp_sock->service_name_len is set, but llcp_sock->service_name is not, in this case we would lead to copy some bytes from a NULL pointer. Just lock the sock since this is not a hot path anyway. Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Samuel Ortiz <[email protected]>
2016-02-25NFC: Use GFP_USER for user-controlled kmallocCong Wang1-2/+2
These two functions are called in sendmsg path, and the 'len' is passed from user-space, so we should not allow malicious users to OOM kernel on purpose. Reported-by: Dmitry Vyukov <[email protected]> Acked-by: Eric Dumazet <[email protected]> Reviewed-by: Julian Calaby <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Samuel Ortiz <[email protected]>
2016-02-24Merge tag 'mac80211-for-davem-2016-02-23' of ↵David S. Miller7-20/+60
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Another small set of fixes: * stop critical protocol session on disconnect to avoid it getting stuck * wext: fix two RTNL message ordering issues * fix an uninitialized value (found by KASAN) * fix an out-of-bounds access (also found by KASAN) * clear connection keys when freeing them in all cases (IBSS, all other places already did so) * fix expected throughput unit to get consistent values * set default TX aggregation timeout to 0 in minstrel to avoid (really just hide) issues and perform better ==================== Signed-off-by: David S. Miller <[email protected]>
2016-02-24bpf: fix csum setting for bpf_set_tunnel_keyDaniel Borkmann1-2/+4
The fix in 35e2d1152b22 ("tunnels: Allow IPv6 UDP checksums to be correctly controlled.") changed behavior for bpf_set_tunnel_key() when in use with IPv6 and thus uncovered a bug that TUNNEL_CSUM needed to be set but wasn't. As a result, the stack dropped ingress vxlan IPv6 packets, that have been sent via eBPF through collect meta data mode due to checksum now being zero. Since after LCO, we enable IPv4 checksum by default, so make that analogous and only provide a flag BPF_F_ZERO_CSUM_TX for the user to turn it off in IPv4 case. Fixes: 35e2d1152b22 ("tunnels: Allow IPv6 UDP checksums to be correctly controlled.") Fixes: c6c33454072f ("bpf: support ipv6 for bpf_skb_{set,get}_tunnel_key") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-24netfilter: ipset: Fix set:list type crash when flush/dump set in parallelJozsef Kadlecsik2-30/+28
Flushing/listing entries was not RCU safe, so parallel flush/dump could lead to kernel crash. Bug reported by Deniz Eren. Fixes netfilter bugzilla id #1050. Signed-off-by: Jozsef Kadlecsik <[email protected]>
2016-02-24libceph: don't spam dmesg with stray reply warningsIlya Dryomov1-2/+2
Commit d15f9d694b77 ("libceph: check data_len in ->alloc_msg()") mistakenly bumped the log level on the "tid %llu unknown, skipping" message. Turn it back into a dout() - stray replies are perfectly normal when OSDs flap, crash, get killed for testing purposes, etc. Cc: [email protected] # 4.3+ Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2016-02-24libceph: use the right footer size when skipping a messageIlya Dryomov1-2/+9
ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13. Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated. Cc: [email protected] # 3.19+ Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2016-02-24libceph: don't bail early from try_read() when skipping a messageIlya Dryomov1-2/+2
The contract between try_read() and try_write() is that when called each processes as much data as possible. When instructed by osd_client to skip a message, try_read() is violating this contract by returning after receiving and discarding a single message instead of checking for more. try_write() then gets a chance to write out more requests, generating more replies/skips for try_read() to handle, forcing the messenger into a starvation loop. Cc: [email protected] # 3.10+ Reported-by: Varada Kari <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]> Tested-by: Varada Kari <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2016-02-24ipv4: only create late gso-skb if skb is already set up with CHECKSUM_PARTIALHannes Frederic Sowa1-1/+4
Otherwise we break the contract with GSO to only pass CHECKSUM_PARTIAL skbs down. This can easily happen with UDP+IPv4 sockets with the first MSG_MORE write smaller than the MTU, second write is a sendfile. Returning -EOPNOTSUPP lets the callers fall back into normal sendmsg path, were we calculate the checksum manually during copying. Commit d749c9cbffd6 ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets") started to exposes this bug. Fixes: d749c9cbffd6 ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets") Reported-by: Jiri Benc <[email protected]> Cc: Jiri Benc <[email protected]> Reported-by: Wakko Warner <[email protected]> Cc: Wakko Warner <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-24eth: Pull header from first fragment via eth_get_headlenAlexander Duyck1-1/+2
We want to try and pull the L4 header in if it is available in the first fragment. As such add the flag to indicate we want to pull the headers on the first fragment in. Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-24flow_dissector: Use same pointer for IPv4 and IPv6 addressesAlexander Duyck1-6/+5
The IPv6 parsing was using a local pointer when it could use the same pointer as the IPv4 portion of the code since the key_addrs can support both IPv4 and IPv6 as it is just a pointer. Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-24flow_dissector: Correctly handle parsing FCoEAlexander Duyck1-2/+5
The flow dissector bits handling FCoE didn't bother to actually validate that the space there was enough for the FCoE header. So we need to update things so that if there is room we add the header and report a good result, otherwise we do not add the header, and report the bad result. Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-24flow_dissector: Fix fragment handling for header length computationAlexander Duyck1-3/+7
It turns out that for IPv4 we were reporting the ip_proto of the fragment, and for IPv6 we were not. This patch updates that behavior so that we always report the IP protocol of the fragment. In addition it takes the steps of updating the payload offset code so that we will determine the start of the payload not including the L4 header for any fragment after the first. Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-24flow_dissector: Check for IP fragmentation even if not using IPv4 addressAlexander Duyck1-8/+9
This patch corrects the logic for the IPv4 parsing so that it is consistent with how we handle IPv6. Specifically if we do not have the flow key indicating we want the addresses we still may need to take a look at the IP fragmentation bits and to see if we should stop after we have recognized the L3 header. Fixes: 807e165dc44f ("flow_dissector: Add control/reporting of fragmentation") Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-24soreuseport: fix merge conflict in tcp bindCraig Gallek1-0/+1
One of the validation checks for the new array-based TCP SO_REUSEPORT validation was unintentionally dropped in ea8add2b1903. This adds it back. Lack of this check allows the user to allocate multiple sock_reuseport structures (leaking all but the first). Fixes: ea8add2b1903 ("tcp/dccp: better use of ephemeral ports in bind()") Signed-off-by: Craig Gallek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-246lowpan: fix error checking codeAndrzej Hajda1-1/+1
Bool variable 'fail' is always non-negative, it indicates an error if it is true. The problem has been detected using coccinelle script scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci Signed-off-by: Andrzej Hajda <[email protected]> Acked-by: Alexander Aring <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-02-246lowpan: iphc: fix stateful multicast compressionAlexander Aring1-1/+2
In case of multicast address we need to set always the LOWPAN_IPHC_M bit and if a destination context identifier was found for a multicast address then we need to set the LOWPAN_IPHC_DAC as well. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>