path: root/net
Age  Commit message  Author  Files  Lines
2016-04-14netfilter: x_tables: don't move to non-existent next ruleFlorian Westphal3-3/+13
Ben Hawkes says: In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it is possible for a user-supplied ipt_entry structure to have a large next_offset field. This field is not bounds checked prior to writing a counter value at the supplied offset. Base chains enforce absolute verdict. User defined chains are supposed to end with an unconditional return, xtables userspace adds them automatically. But if such return is missing we will move to non-existent next rule. Reported-by: Ben Hawkes <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
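The missing check described above can be illustrated with a small userspace sketch. It is not the kernel patch: `struct demo_entry` and `demo_next_rule_in_bounds()` are hypothetical names, and the struct is a heavily reduced stand-in for `ipt_entry`. The idea is only that a user-supplied `next_offset` must be validated against the size of the rule blob before anything is read or written at the resulting position.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for a rule header; the real
 * struct ipt_entry carries many more fields. */
struct demo_entry {
    uint16_t target_offset; /* unused in this sketch */
    uint16_t next_offset;   /* user-supplied distance to the next rule */
};

/* Return true only if the rule at byte offset pos points to a next rule
 * whose header lies completely inside the blob of blob_size bytes. */
static bool demo_next_rule_in_bounds(const unsigned char *blob,
                                     size_t blob_size, size_t pos)
{
    struct demo_entry e;

    /* The current header itself must fit inside the blob. */
    if (blob_size < sizeof(e) || pos > blob_size - sizeof(e))
        return false;
    memcpy(&e, blob + pos, sizeof(e));

    /* An offset smaller than the header would overlap the current rule. */
    if (e.next_offset < sizeof(e))
        return false;

    /* Reject offsets that would move past the end of the blob. */
    if (e.next_offset > blob_size - pos)
        return false;

    /* The next header must also fit completely. */
    return pos + e.next_offset <= blob_size - sizeof(e);
}
```

A caller would refuse to walk to the next rule (and refuse to touch any counter there) whenever this returns false.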
2016-04-13dsa: Rename phys_port_mask to enabled_port_maskAndrew Lunn1-4/+4
The phys in phys_port_mask suggests this mask is about PHYs. In fact, it means physical ports. Rename to enabled_port_mask, indicating external enabled ports of the switch, which is hopefully less confusing. Signed-off-by: Andrew Lunn <[email protected]> Tested-by: Vivien Didelot <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-13net: dsa: Remove allocation of driver private memoryAndrew Lunn1-1/+1
The drivers now allocate their own memory for private usage. Remove the allocation from the core code. Signed-off-by: Andrew Lunn <[email protected]> Acked-by: Florian Fainelli <[email protected]> Tested-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-13net: dsa: Have the switch driver allocate there own private memoryAndrew Lunn1-3/+5
Now the switch devices have a dev pointer, make use of it for allocating the drivers private data structures using a devm_kzalloc(). Signed-off-by: Andrew Lunn <[email protected]> Acked-by: Florian Fainelli <[email protected]> Tested-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-13net: dsa: Pass the dsa device to the switch driversAndrew Lunn1-3/+4
By passing a device structure to the switch devices, it allows them to use devm_* methods for resource management. Signed-off-by: Andrew Lunn <[email protected]> Acked-by: Florian Fainelli <[email protected]> Tested-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-13Merge tag 'mac80211-next-for-davem-2016-04-13' of ↵David S. Miller35-205/+205
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== To synchronize with Kalle, here's just a big change that affects all drivers - removing the duplicated enum ieee80211_band and replacing it by enum nl80211_band. On top of that, just a small documentation update. ==================== Signed-off-by: David S. Miller <[email protected]>
2016-04-13tipc: remove remnants of old broadcast codeJon Paul Maloy1-15/+0
We remove a couple of leftover fields in struct tipc_bearer. Those were used by the old broadcast implementation, and are not needed any longer. There are no functional changes in this commit. Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-136lowpan: move mac802154 headerAlexander Aring1-3/+0
In case of link-layer specific handling for 802.15.4 we need to cast to 802.15.4 specific structures. Simply add this header when including the 6lowpan header. Signed-off-by: Alexander Aring <[email protected]> Reviewed-by: Stefan Schmidt <[email protected]> Acked-by: Jukka Rissanen <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-04-136lowpan: add lowpan_is_ll functionAlexander Aring1-0/+9
This patch adds the lowpan_is_ll function, which can be used to make a special 6lowpan linklayer handling for a specific 6lowpan linklayer type. Signed-off-by: Alexander Aring <[email protected]> Reviewed-by: Stefan Schmidt<[email protected]> Acked-by: Jukka Rissanen <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-04-136lowpan: move eui64 uncompress functionAlexander Aring1-16/+0
This function will be used by later functionality in branches other than generic 6lowpan, so we move it to the global 6lowpan header. Signed-off-by: Alexander Aring <[email protected]> Reviewed-by: Stefan Schmidt <[email protected]> Acked-by: Jukka Rissanen <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-04-136lowpan: iphc: remove unnecessary zero dataAlexander Aring1-1/+1
This patch removes unnecessary zero data for a stack variable. Signed-off-by: Alexander Aring <[email protected]> Acked-by: Jukka Rissanen <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-04-136lowpan: iphc: rename add lowpan prefixAlexander Aring1-25/+31
This patch adds a lowpan prefix to each function that doesn't currently have one. Reviewed-by: Stefan Schmidt <[email protected]> Signed-off-by: Alexander Aring <[email protected]> Acked-by: Jukka Rissanen <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-04-136lowpan: move lowpan_802154_dev to 6lowpanAlexander Aring1-12/+0
This patch moves the 802.15.4 link layer specific structures to generic 6lowpan. This is necessary for special 802.15.4 6lowpan handling in 6lowpan generic layer. Reviewed-by: Stefan Schmidt <[email protected]> Signed-off-by: Alexander Aring <[email protected]> Acked-by: Jukka Rissanen <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-04-136lowpan: change naming for lowpan private dataAlexander Aring8-87/+91
This patch changes the naming for interface private data for lowpan interfaces. The current private data scheme is:

  -------------------------------------------------
  |    6LoWPAN Generic    |   LinkLayer 6LoWPAN   |
  -------------------------------------------------

The current naming schemes are:

- 6LoWPAN Generic:
  - lowpan_priv
- LinkLayer 6LoWPAN:
  - BTLE
    - lowpan_dev
  - 802.15.4:
    - lowpan_dev_info

The new naming scheme with this patch will be:

- 6LoWPAN Generic:
  - lowpan_dev
- LinkLayer 6LoWPAN:
  - BTLE
    - lowpan_btle_dev
  - 802.15.4:
    - lowpan_802154_dev

Signed-off-by: Alexander Aring <[email protected]> Reviewed-by: Stefan Schmidt <[email protected]> Acked-by: Jukka Rissanen <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-04-13ieee802154: 6lowpan: fix short addr hashAlexander Aring1-1/+1
The short address is unique in combination with the panid. This patch will add the panid for generating an ieee802154 address hash. Reviewed-by: Stefan Schmidt <[email protected]> Signed-off-by: Alexander Aring <[email protected]> Acked-by: Jukka Rissanen <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
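As a rough illustration of why the PAN ID belongs in the hash input: two devices in different PANs may share the same 16-bit short address, so only the pair identifies a device. The helper below is a hypothetical userspace sketch, not the kernel's hashing code; `demo_short_addr_key()` is an invented name.

```c
#include <stdint.h>

/* Hypothetical helper: combine the PAN ID and the short address into one
 * 32-bit key, so that equal short addresses in different PANs yield
 * different keys. */
static uint32_t demo_short_addr_key(uint16_t pan_id, uint16_t short_addr)
{
    return ((uint32_t)pan_id << 16) | short_addr;
}
```

With this key, short address 0x0005 in PAN 0x0001 and short address 0x0005 in PAN 0x0002 no longer collide, which a hash over the short address alone cannot guarantee.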
2016-04-13nl802154: avoid address change while running lowpanAlexander Aring1-0/+10
The generation of autoconfigured IPv6 link-local addresses starts with a notification on interface up. To handle autoconfiguration according to RFC 4944 requires pan id and short address to generate an autoconfigured link-local address. This patch will avoid changing of these link-layer address configuration while lowpan interface is up. Reviewed-by: Stefan Schmidt <[email protected]> Signed-off-by: Alexander Aring <[email protected]> Acked-by: Jukka Rissanen <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-04-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller6-36/+273
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains the first batch of Netfilter updates for your net-next tree.

1) Define pr_fmt() in nf_conntrack, from Weongyo Jeong.
2) Define and register netfilter's afinfo for the bridge family, this comes in preparation for native nfqueue's bridge for nft, from Stephane Bryant.
3) Add new attributes to store layer 2 and VLAN headers to nfqueue, also from Stephane Bryant.
4) Parse new NFQA_VLAN and NFQA_L2HDR nfqueue netlink attributes coming from userspace, from Stephane Bryant.
5) Use net->ipv6.devconf_all->hop_limit instead of hardcoded hop_limit in IPv6 SYNPROXY, from Liping Zhang.
6) Remove unnecessary check for dst == NULL in nf_reject_ipv6, from Haishuang Yan.
7) Deinline ctnetlink event report functions, from Florian Westphal.
====================

Signed-off-by: David S. Miller <[email protected]>
2016-04-13netfilter: ebtables: Fix extension lookup with identical namePhil Sutter1-1/+5
If a requested extension exists as module and is not loaded, ebt_check_match() might accidentally use an NFPROTO_UNSPEC one with same name and fail. Reproduced with limit match: Given xt_limit and ebt_limit both built as module, the following would fail:

    modprobe xt_limit
    ebtables -I INPUT --limit 1/s -j ACCEPT

The fix is to make ebt_check_match() distrust a found NFPROTO_UNSPEC extension and retry after requesting an appropriate module.

Cc: Florian Westphal <[email protected]> Signed-off-by: Phil Sutter <[email protected]> Acked-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-04-12netfilter: conntrack: move expectation event helper to ecache.cFlorian Westphal1-0/+30
Not performance critical, it is only invoked when an expectation is added/destroyed. While at it, kill unused nf_ct_expect_event() wrapper. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-04-12netfilter: conntrack: de-inline nf_conntrack_eventmask_reportFlorian Westphal1-0/+54
Way too large; move it to nf_conntrack_ecache.c. Reduces total object size by 1216 byte on my machine. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-04-12Merge branch 'for-upstream' of ↵David S. Miller4-6/+26
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2016-04-12

Here's a set of Bluetooth & 802.15.4 patches intended for the 4.7 kernel:

- Fix for race condition in vhci driver
- Memory leak fix for ieee802154/adf7242 driver
- Improvements to deal with single-mode (LE-only) Bluetooth controllers
- Fix for allowing the BT_SECURITY_FIPS security level
- New BCM2E71 ACPI ID
- NULL pointer dereference fix for hci_ldisc driver

Let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <[email protected]>
2016-04-12cfg80211: remove enum ieee80211_bandJohannes Berg35-205/+205
This enum is already perfectly aliased to enum nl80211_band, and the only reason for it is that we get IEEE80211_NUM_BANDS out of it. There's no really good reason to not declare the number of bands in nl80211 though, so do that and remove the cfg80211 one. Signed-off-by: Johannes Berg <[email protected]>
2016-04-12nl80211: check netlink protocol in socket release notificationDmitry Ivanov1-1/+1
A non-privileged user can create a netlink socket with the same port_id as used by an existing open nl80211 netlink socket (e.g. as used by a hostapd process) with a different protocol number. Closing this socket will then lead to the notification going to nl80211's socket release notification handler, and possibly cause an action such as removing a virtual interface. Fix this issue by checking that the netlink protocol is NETLINK_GENERIC. Since generic netlink has no notifier chain of its own, we can't fix the problem more generically. Fixes: 026331c4d9b5 ("cfg80211/mac80211: allow registering for and sending action frames") Cc: [email protected] Signed-off-by: Dmitry Ivanov <[email protected]> [rewrite commit message] Signed-off-by: Johannes Berg <[email protected]>
2016-04-11KEYS: Add a facility to restrict new links into a keyringDavid Howells2-3/+3
Add a facility whereby proposed new links to be added to a keyring can be vetted, permitting them to be rejected if necessary. This can be used to block public keys from which the signature cannot be verified or for which the signature verification fails. It could also be used to provide blacklisting.

This affects operations like add_key(), KEYCTL_LINK and KEYCTL_INSTANTIATE.

To this end:

(1) A function pointer is added to the key struct that, if set, points to the vetting function. This is called as:

        int (*restrict_link)(struct key *keyring,
                             const struct key_type *key_type,
                             unsigned long key_flags,
                             const union key_payload *key_payload),

    where 'keyring' will be the keyring being added to, key_type and key_payload will describe the key being added and key_flags[*] can be AND'ed with KEY_FLAG_TRUSTED.

    [*] This parameter will be removed in a later patch when KEY_FLAG_TRUSTED is removed.

    The function should return 0 to allow the link to take place or an error (typically -ENOKEY, -ENOPKG or -EKEYREJECTED) to reject the link.

    The pointer should not be set directly, but rather should be set through keyring_alloc().

    Note that if called during add_key(), preparse is called before this method, but a key isn't actually allocated until after this function is called.

(2) KEY_ALLOC_BYPASS_RESTRICTION is added. This can be passed to key_create_or_update() or key_instantiate_and_link() to bypass the restriction check.

(3) KEY_FLAG_TRUSTED_ONLY is removed. The entire contents of a keyring with this restriction emplaced can be considered 'trustworthy' by virtue of being in the keyring when that keyring is consulted.

(4) key_alloc() and keyring_alloc() take an extra argument that will be used to set restrict_link in the new key. This ensures that the pointer is set before the key is published, thus preventing a window of unrestrictedness. Normally this argument will be NULL.

(5) As a temporary affair, keyring_restrict_trusted_only() is added. It should be passed to keyring_alloc() as the extra argument instead of setting KEY_FLAG_TRUSTED_ONLY on a keyring. This will be replaced in a later patch with functions that look in the appropriate places for authoritative keys.

Signed-off-by: David Howells <[email protected]> Reviewed-by: Mimi Zohar <[email protected]>
2016-04-11net: vrf: Fix dev refcnt leak due to IPv6 prefix routeDavid Ahern1-0/+10
ifupdown2 found a kernel bug with IPv6 routes and movement from the main table to the VRF table. Sequence of events:

Create the interface and add addresses:

    ip link add dev eth4.105 link eth4 type vlan id 105
    ip addr add dev eth4.105 8.105.105.10/24
    ip -6 addr add dev eth4.105 2008:105:105::10/64

At this point IPv6 has inserted a prefix route in the main table even though the interface is 'down'. From there the VRF device is created:

    ip link add dev vrf105 type vrf table 105
    ip addr add dev vrf105 9.9.105.10/32
    ip -6 addr add dev vrf105 2000:9:105::10/128
    ip link set vrf105 up

Then the interface is enslaved, while still in the 'down' state:

    ip link set dev eth4.105 master vrf105

Since the device is down, the VRF driver cycling the device does not send the NETDEV_UP and NETDEV_DOWN events but rather the NETDEV_CHANGE event, which does not flush the routes inserted prior. When the link is brought up

    ip link set dev eth4.105 up

the prefix route is added in the VRF table, but the route is not removed from the main table.

Fix by handling the NETDEV_CHANGEUPPER event similar to what was implemented for IPv4 in 7f49e7a38b77 ("net: Flush local routes when device changes vrf association")

Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11net: vrf: Fix dst reference countingDavid Ahern2-6/+8
Vivek reported a kernel exception deleting a VRF with an active connection through it. The root cause is that the socket has a cached reference to a dst that is destroyed. Converting the dst_destroy to dst_release and letting proper reference counting kick in does not work as the dst has a reference to the device which needs to be released as well. I talked to Hannes about this at netdev and he pointed out the ipv4 and ipv6 dst handling has dst_ifdown for just this scenario. Rather than continuing with the reinvented dst wheel in VRF just remove it and leverage the ipv4 and ipv6 versions. Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver") Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11rxrpc: Create a null security type and get rid of conditional callsDavid Howells9-61/+105
Create a null security type for security index 0 and get rid of all conditional calls to the security operations. We expect normally to be using security, so this should be of little negative impact. Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11rxrpc: Absorb the rxkad security moduleDavid Howells6-134/+85
Absorb the rxkad security module into the af_rxrpc module so that there's only one module file. This avoids a circular dependency whereby rxkad pins af_rxrpc and cached connections pin rxkad but can't be manually evicted (they will expire eventually and cease pinning). With this change, af_rxrpc can just be unloaded, despite having cached connections. Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11rxrpc: Don't assume transport address family and size when using itDavid Howells2-4/+4
Don't assume transport address family and size when using the peer address to send a packet. Instead, use the start of the transport address rather than any particular element of the union and use the transport address length noted inside the sockaddr_rxrpc struct. This will be necessary when IPv6 support is introduced. Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11rxrpc: Don't pass gfp around in incoming call handling functionsDavid Howells4-12/+9
Don't pass gfp around in incoming call handling functions, but rather hard code it at the points where we actually need it since the value comes from within the rxrpc driver and is always the same. Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11rxrpc: Differentiate local and remote abort codes in structsDavid Howells8-21/+37
In the rxrpc_connection and rxrpc_call structs, there's one field to hold the abort code, no matter whether that value was generated locally to be sent or was received from the peer via an abort packet. Split the abort code fields in two for cleanliness sake and add an error field to hold the Linux error number to the rxrpc_call struct too (sometimes this is generated in a context where we can't return it to userspace directly). Furthermore, add a skb mark to indicate a packet that caused a local abort to be generated so that recvmsg() can pick up the correct abort code. A future addition will need to be to indicate to userspace the difference between aborts via a control message. Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11rxrpc: Static arrays of strings should be const char *const[]David Howells2-2/+2
Static arrays of strings should be const char *const[]. Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
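For readers unfamiliar with the idiom, here is a minimal sketch of what the two `const` qualifiers buy. The table and lookup function are hypothetical examples, not taken from the rxrpc code: the first `const` protects the characters of each string, the second protects the pointers stored in the array, so the whole table can live in read-only data.

```c
#include <stddef.h>

/* Hypothetical name table: neither the strings nor the pointers to them
 * can be modified through this array. */
static const char *const demo_state_names[] = {
    "idle",
    "active",
    "closed",
};

/* Look up a name, falling back to "unknown" for out-of-range indices. */
static const char *demo_state_name(size_t idx)
{
    size_t n = sizeof(demo_state_names) / sizeof(demo_state_names[0]);

    return idx < n ? demo_state_names[idx] : "unknown";
}
```

With only `const char *names[]`, an assignment such as `names[0] = "oops";` would still compile; with the second `const` the compiler rejects it.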
2016-04-11rxrpc: Move some miscellaneous bits out into their own fileDavid Howells5-84/+106
Move some miscellaneous bits out into their own file to make it easier to split the call handling. Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11rxrpc: Disable a debugging statement that has been left enabled.David Howells1-1/+1
Disable a debugging statement that has been left enabled Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11rxrpc: do not pull udp headers on receiveWillem de Bruijn1-2/+2
Commit e6afc8ace6dd modified the udp receive path by pulling the udp header before queuing an skbuff onto the receive queue. Rxrpc also calls skb_recv_datagram to dequeue an skb from a udp socket. Modify this receive path to also no longer expect udp headers. Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Signed-off-by: Willem de Bruijn <[email protected]> Tested-by: Thierry Reding <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11sunrpc: do not pull udp headers on receiveWillem de Bruijn3-7/+5
Commit e6afc8ace6dd modified the udp receive path by pulling the udp header before queuing an skbuff onto the receive queue. Sunrpc also calls skb_recv_datagram to dequeue an skb from a udp socket. Modify this receive path to also no longer expect udp headers. Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Reported-by: Franklin S Cooper Jr. <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Tested-by: Thierry Reding <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11tipc: purge deferred updates from dead nodesErik Hugne1-0/+19
If a peer node becomes unavailable, in addition to removing the nametable entries from this node we also need to purge all deferred updates associated with this node. Signed-off-by: Erik Hugne <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11tipc: make dist queue pernetErik Hugne3-9/+11
Nametable updates received from the network that cannot be applied immediately are placed on a defer queue. This queue is global to the TIPC module, which might cause problems when using TIPC in containers. To prevent nametable updates from escaping into the wrong namespace, we make the queue pernet instead. Signed-off-by: Erik Hugne <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-11net: ipv4: Consider failed nexthops in multipath routesDavid Ahern2-5/+40
Multipath route lookups should consider knowledge about next hops and not select a hop that is known to be failed.

Example:

                     [h2]                   [h3]   15.0.0.5
                      |                      |
                     3|                     3|
                    [SP1]                  [SP2]--+
                     1  2                   1     2
                     |  |     /-------------+     |
                     |   \   /                    |
                     |     X                      |
                     |    / \                     |
                     |   /   \---------------\    |
                     1  2                     1   2
         12.0.0.2  [TOR1] 3-----------------3 [TOR2]  12.0.0.3
                     4                         4
                      \                       /
                        \                    /
                         \                  /
                          -------|   |-----/
                                 1   2
                                [TOR3]
                                  3|
                                   |
                                 [h1]  12.0.0.1

host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:

    root@h1:~# ip ro ls
    ...
    12.0.0.0/24 dev swp1 proto kernel scope link src 12.0.0.1
    15.0.0.0/16
            nexthop via 12.0.0.2 dev swp1 weight 1
            nexthop via 12.0.0.3 dev swp1 weight 1
    ...

If the link between tor3 and tor1 is down and the link between tor1 and tor2 is down as well, then tor1 is effectively cut off from h1. Yet the route lookups in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one and ssh 15.0.0.5 gets the other. Connections that attempt to use the 12.0.0.2 nexthop fail since that neighbor is not reachable:

    root@h1:~# ip neigh show
    ...
    12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
    12.0.0.2 dev swp1 FAILED
    ...

The failed path can be avoided by considering known neighbor information when selecting next hops. If the neighbor lookup fails we have no knowledge about the nexthop, so give it a shot. If there is an entry then only select the nexthop if the state is sane. This is similar to what fib_detect_death does.

To maintain backward compatibility use of the neighbor information is based on a new sysctl, fib_multipath_use_neigh.

Signed-off-by: David Ahern <[email protected]> Reviewed-by: Julian Anastasov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
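The selection rule described in this message can be sketched in userspace C. This is an illustration under simplifying assumptions, not the kernel implementation: all `demo_` names are hypothetical, weights are ignored, and "state is sane" is reduced to "not FAILED". The `use_neigh` flag plays the role of the fib_multipath_use_neigh sysctl.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced neighbour states for illustration only. */
enum demo_neigh_state {
    DEMO_NEIGH_NONE,      /* no neighbour entry: nothing known */
    DEMO_NEIGH_REACHABLE,
    DEMO_NEIGH_STALE,
    DEMO_NEIGH_FAILED,
};

struct demo_nexthop {
    const char *gw;               /* gateway address, informational */
    enum demo_neigh_state state;  /* result of the neighbour lookup */
};

/* No entry means no knowledge, so the hop is still tried; only a
 * neighbour known to have failed is skipped. */
static bool demo_nexthop_usable(const struct demo_nexthop *nh)
{
    return nh->state != DEMO_NEIGH_FAILED;
}

/* Pick a nexthop index for a flow hash; n must be greater than zero.
 * With use_neigh set, failed hops are skipped; if every hop has failed,
 * fall back to the hash-based choice. */
static size_t demo_select_nexthop(const struct demo_nexthop *nhs, size_t n,
                                  unsigned int hash, bool use_neigh)
{
    size_t first = hash % n;

    if (!use_neigh)
        return first;

    for (size_t i = 0; i < n; i++) {
        size_t idx = (first + i) % n;

        if (demo_nexthop_usable(&nhs[idx]))
            return idx;
    }
    return first;
}
```

Applied to the example above, a table holding 12.0.0.2 (FAILED) and 12.0.0.3 (REACHABLE) always yields the second entry when `use_neigh` is true, while the legacy behaviour (`use_neigh` false) keeps alternating by hash.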
2016-04-11->getxattr(): pass dentry and inode as separate argumentsAl Viro1-1/+1
Signed-off-by: Al Viro <[email protected]>
2016-04-10netlink: don't send NETLINK_URELEASE for unbound socketsDmitry Ivanov1-1/+1
All existing users of NETLINK_URELEASE use it to clean up resources that were previously allocated to a socket via some command. As a result, no users require getting this notification for unbound sockets. Sending it for unbound sockets, however, is a problem because any user (including unprivileged users) can create a socket that uses the same ID as an existing socket. Binding this new socket will fail, but if the NETLINK_URELEASE notification is generated for such sockets, the users thereof will be tricked into thinking the socket that they allocated the resources for is closed. In the nl80211 case, this will cause destruction of virtual interfaces that still belong to an existing hostapd process; this is the case that Dmitry noticed. In the NFC case, it will cause a poll abort. In the case of netlink log/queue it will cause them to stop reporting events, as if NFULNL_CFG_CMD_UNBIND/NFQNL_CFG_CMD_UNBIND had been called. Fix this problem by checking that the socket is bound before generating the NETLINK_URELEASE notification. Cc: [email protected] Signed-off-by: Dmitry Ivanov <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
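The rule introduced here can be reduced to a few lines. The sketch below is a userspace model with hypothetical names (`struct demo_sock`, `demo_release_notify()`), not the netlink code itself; it only shows that a socket which merely claims an existing port ID, but never managed to bind it, must not trigger a release notification for that ID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced model of a netlink socket. */
struct demo_sock {
    uint32_t portid; /* port ID the socket asked for */
    bool bound;      /* true only if binding that ID succeeded */
};

/* Decide on socket close whether listeners should be told that portid
 * was released. Returns true and stores the port ID only for sockets
 * that were actually bound. */
static bool demo_release_notify(const struct demo_sock *sk,
                                uint32_t *portid_out)
{
    if (!sk->bound)
        return false;

    *portid_out = sk->portid;
    return true;
}
```

In this model, closing an unbound impostor socket with the same port ID as a running daemon produces no notification, so resources tied to the daemon's socket stay in place.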
2016-04-10decnet: Do not build routes to devices without decnet private data.David S. Miller1-1/+8
In particular, make sure we check for decnet private presence for loopback devices. Signed-off-by: David S. Miller <[email protected]>
2016-04-10sctp: avoid refreshing heartbeat timer too oftenMarcelo Ricardo Leitner4-33/+40
Currently on high rate SCTP streams the heartbeat timer refresh can consume quite a lot of resources as timer updates are costly and it contains a random factor, which a) is also costly and b) invalidates mod_timer() optimization for not editing a timer to the same value. It may even cause the timer to be slightly advanced, for no good reason.

As suggested by David Laight this patch now removes this timer update from hot path by leaving the timer on and re-evaluating upon its expiration if the heartbeat is still needed or not, similarly to what is done for TCP. If it's not needed anymore the timer is re-scheduled to the new timeout, considering the time already elapsed.

For this, we now record the last tx timestamp per transport, updated in the same spots as hb timer was restarted on tx. Also split up sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the heartbeat one.

On loopback with MTU of 65535 and data chunks with 1636, so that we have a considerable amount of chunks without stressing system calls, netperf -t SCTP_STREAM -l 30, perf looked like this before:

    Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
      Overhead  Command  Shared Object      Symbol
    +    6,15%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
    -    5,43%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
       - _raw_write_unlock_irqrestore
          - 96,54% _raw_spin_unlock_irqrestore
             - 36,14% mod_timer
                + 97,24% sctp_transport_reset_timers
                + 2,76% sctp_do_sm
             + 33,65% __wake_up_sync_key
             + 28,77% sctp_ulpq_tail_event
             + 1,40% del_timer
          - 1,84% mod_timer
             + 99,03% sctp_transport_reset_timers
             + 0,97% sctp_do_sm
          + 1,50% sctp_ulpq_tail_event

And after this patch, now with netperf -l 60:

    Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
      Overhead  Command  Shared Object      Symbol
    +    5,65%  netperf  [kernel.vmlinux]   [k] memcpy_erms
    +    5,59%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
    -    5,05%  netperf  [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
       - _raw_spin_unlock_irqrestore
          + 49,89% __wake_up_sync_key
          + 45,68% sctp_ulpq_tail_event
          - 2,85% mod_timer
             + 76,51% sctp_transport_reset_t3_rtx
             + 23,49% sctp_do_sm
          + 1,55% del_timer
    +    2,50%  netperf  [sctp]             [k] sctp_datamsg_from_user
    +    2,26%  netperf  [sctp]             [k] sctp_sendmsg

Throughput-wise, from 6800mbps without the patch to 7050mbps with it, ~3.7%.

Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
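The lazy re-evaluation described in this message can be modelled with a single function. This is a hedged sketch, not the SCTP code: `demo_hb_timer_expired()` is a hypothetical name, time is a plain tick counter, the random jitter mentioned above is left out, and `now >= last_tx` is assumed. The transmit path only stores a timestamp; the decision is made when the timer fires.

```c
#include <stdint.h>

/* Called when the heartbeat timer fires at time now (in ticks).
 * last_tx is the timestamp recorded by the transmit path and
 * hb_interval is the configured heartbeat interval.
 *
 * Returns 0 if the path has been idle for a full interval, meaning a
 * heartbeat is due now; otherwise returns the remaining delay after
 * which the timer should fire again. Assumes now >= last_tx. */
static uint64_t demo_hb_timer_expired(uint64_t now, uint64_t last_tx,
                                      uint64_t hb_interval)
{
    uint64_t elapsed = now - last_tx;

    if (elapsed >= hb_interval)
        return 0;

    /* Data was sent recently: re-arm for the rest of the interval only. */
    return hb_interval - elapsed;
}
```

The design point is that the hot path costs one store per transmitted packet, while the timer itself is touched at most once per interval instead of once per packet.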
2016-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller34-110/+203
2016-04-09ipv6: fix inet6_lookup_listener()Eric Dumazet1-2/+2
A stupid refactoring bug in inet6_lookup_listener() needs to be fixed in order to get proper SO_REUSEPORT behavior. Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Maciej Żenczykowski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds24-51/+144
Pull networking fixes from David Miller:

1) Stale SKB data pointer access across pskb_may_pull() calls in L2TP, from Haishuang Yan.
2) Fix multicast frame handling in mac80211 AP code, from Felix Fietkau.
3) mac80211 station hashtable insert errors not handled properly, fix from Johannes Berg.
4) Fix TX descriptor count limit handling in e1000, from Alexander Duyck.
5) Revert a buggy netdev refcount fix in netpoll, from Bjorn Helgaas.
6) Must assign rtnl_link_ops of the device before registering it, fix in ip6_tunnel from Thadeu Lima de Souza Cascardo.
7) Memory leak fix in tc action net exit, from WANG Cong.
8) Add missing AF_KCM entries to name tables, from Dexuan Cui.
9) Fix regression in GRE handling of csums wrt. FOU, from Alexander Duyck.
10) Fix memory allocation alignment and congestion map corruption in RDS, from Shamir Rabinovitch.
11) Fix default qdisc regression in tuntap driver, from Jason Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  bridge, netem: mark mailing lists as moderated
  tuntap: restore default qdisc
  mpls: find_outdev: check for err ptr in addition to NULL check
  ipv6: Count in extension headers in skb->network_header
  RDS: fix congestion map corruption for PAGE_SIZE > 4k
  RDS: memory allocated must be align to 8
  GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU
  net: add the AF_KCM entries to family name tables
  MAINTAINERS: intel-wired-lan list is moderated
  lib/test_bpf: Add additional BPF_ADD tests
  lib/test_bpf: Add test to check for result of 32-bit add that overflows
  lib/test_bpf: Add tests for unsigned BPF_JGT
  lib/test_bpf: Fix JMP_JSET tests
  VSOCK: Detach QP check should filter out non matching QPs.
  stmmac: fix adjust link call in case of a switch is attached
  af_packet: tone down the Tx-ring unsupported spew.
  net_sched: fix a memory leak in tc action
  samples/bpf: Enable powerpc support
  samples/bpf: Use llc in PATH, rather than a hardcoded value
  samples/bpf: Fix build breakage with map_perf_test_user.c
  ...
2016-04-08net: dsa: make the VLAN add function return voidVivien Didelot1-8/+3
The switchdev design implies that a software error should not happen in the commit phase since it must have been previously reported in the prepare phase. If a hardware error occurs during the commit phase, there is nothing switchdev can do about it. The DSA layer separates port_vlan_prepare and port_vlan_add for simplicity and convenience. If a hardware error occurs during the commit phase, there is no need to report it outside the driver itself. Make the DSA port_vlan_add routine return void for explicitness. Signed-off-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-08net: dsa: make the FDB add function return voidVivien Didelot1-8/+8
The switchdev design implies that a software error should not happen in the commit phase since it must have been previously reported in the prepare phase. If a hardware error occurs during the commit phase, there is nothing switchdev can do about it. The DSA layer separates port_fdb_prepare and port_fdb_add for simplicity and convenience. If a hardware error occurs during the commit phase, there is no need to report it outside the DSA driver itself. Make the DSA port_fdb_add routine return void for explicitness. Signed-off-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-08net: dsa: make the STP state function return voidVivien Didelot1-17/+15
The DSA layer doesn't care about the return code of the port_stp_update routine, so make it void in the layer and the DSA drivers. Replace the useless dsa_slave_stp_update function with a dsa_slave_stp_state function used to reply to the switchdev SWITCHDEV_ATTR_ID_PORT_STP_STATE attribute. In the meantime, rename port_stp_update to port_stp_state_set to make the state change explicit. Signed-off-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-04-08Merge tag 'mac80211-next-for-davem-2016-04-06' of ↵David S. Miller39-984/+1809
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
For the 4.7 cycle, we have a number of changes:

* Bob's mesh mode rhashtable conversion, this includes the rhashtable API change for allocation flags
* BSSID scan, connect() command reassoc support (Jouni)
* fast (optimised data only) and support for RSS in mac80211 (myself)
* various smaller changes
====================

Signed-off-by: David S. Miller <[email protected]>