aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2013-11-06net: Explicitly initialize u64_stats_sync structures for lockdepJohn Stultz12-5/+137
In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure, uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worth while. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <[email protected]> Acked-by: Julian Anastasov <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Alexey Kuznetsov <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Hideaki YOSHIFUJI <[email protected]> Cc: James Morris <[email protected]> Cc: Jesse Gross <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Mirko Lindner <[email protected]> Cc: Patrick McHardy <[email protected]> Cc: Roger Luethi <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Simon Horman <[email protected]> Cc: Stephen Hemminger <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Petazzoni <[email protected]> Cc: Wensong Zhang <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-11-05ipv6: drop the judgement in rt6_alloc_cow()Duan Jiong1-5/+3
Now rt6_alloc_cow() is only called by ip6_pol_route() when rt->rt6i_flags doesn't contain both RTF_NONEXTHOP and RTF_GATEWAY, and rt->rt6i_flags hasn't been changed in ip6_rt_copy(). So there is no neccessary to judge whether rt->rt6i_flags contains RTF_GATEWAY or not. Signed-off-by: Duan Jiong <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-05ipv6: fix headroom calculation in udp6_ufo_fragmentHannes Frederic Sowa1-1/+1
Commit 1e2bd517c108816220f262d7954b697af03b5f9c ("udp6: Fix udp fragmentation for tunnel traffic.") changed the calculation if there is enough space to include a fragment header in the skb from a skb->mac_header dervived one to skb_headroom. Because we already peeled off the skb to transport_header this is wrong. Change this back to check if we have enough room before the mac_header. This fixes a panic Saran Neti reported. He used the tbf scheduler which skb_gso_segments the skb. The offsets get negative and we panic in memcpy because the skb was erroneously not expanded at the head. Reported-by: Saran Neti <[email protected]> Cc: Pravin B Shelar <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-05ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACEHannes Frederic Sowa5-5/+11
Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery, their sockets won't accept and install new path mtu information and they will always use the interface mtu for outgoing packets. It is guaranteed that the packet is not fragmented locally. But we won't set the DF-Flag on the outgoing frames. Florian Weimer had the idea to use this flag to ensure DNS servers are never generating outgoing fragments. They may well be fragmented on the path, but the server never stores or usees path mtu values, which could well be forged in an attack. (The root of the problem with path MTU discovery is that there is no reliable way to authenticate ICMP Fragmentation Needed But DF Set messages because they are sent from intermediate routers with their source addresses, and the IMCP payload will not always contain sufficient information to identify a flow.) Recent research in the DNS community showed that it is possible to implement an attack where DNS cache poisoning is feasible by spoofing fragments. This work was done by Amir Herzberg and Haya Shulman: <https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf> This issue was previously discussed among the DNS community, e.g. <http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>, without leading to fixes. This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK" for the enforcement of the non-fragmentable checks. If other users than ip_append_page/data should use this semantic too, we have to add a new flag to IPCB(skb)->flags to suppress local fragmentation and check for this in ip_finish_output. Many thanks to Florian Weimer for the idea and feedback while implementing this patch. Cc: David S. Miller <[email protected]> Suggested-by: Florian Weimer <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-05Merge branch 'for-upstream' of ↵John W. Linville14-900/+1434
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
2013-11-05Merge branch 'for-john' of ↵John W. Linville2-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
2013-11-05Merge branch 'for-john' of ↵John W. Linville32-446/+1205
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Conflicts: net/wireless/reg.c
2013-11-05ipv6: remove old conditions on flow label sharingFlorent Fourcot1-33/+0
The code of flow label in Linux Kernel follows the rules of RFC 1809 (an informational one) for conditions on flow label sharing. There rules are not in the last proposed standard for flow label (RFC 6437), or in the previous one (RFC 3697). Since this code does not follow any current or old standard, we can remove it. With this removal, the ipv6_opt_cmp function is now a dead code and it can be removed too. Changelog to v1: * add justification for the change * remove the condition on IPv6 options [ Remove ipv6_hdr_cmp and it is now unused as well. -DaveM ] Signed-off-by: Florent Fourcot <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-05Merge branch 'for-davem' of ↵David S. Miller34-745/+4116
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== Please accept the following pull request intended for the 3.13 tree... I had intended to pass most of these to you as much as two weeks ago. Unfortunately, I failed to account for the effects of bad Internet connections and my own fatique/laziness while traveling. On the bright side, at least these have been baking in linux-next for some time! For the mac80211 bits, Johannes says: "This time I have two fixes for P2P (which requires not using CCK rates) and a workaround for APs with broken WMM information." For the iwlwifi bits, Johannes says: "I have a few fixes for warnings/issues: one from Alex, fixing scan timings, one from Emmanuel fixing a WARN_ON in the DVM driver, one from Stanislaw removing a trigger-happy WARN_ON in the MVM driver and a change from myself to try to recover when the device isn't processing commands quickly." And: "For this round, I have a lot of changes: * power management improvements * BT coexistence improvements/updates * new device support * VHT support * IBSS support (though due to a small bug it requires new firmware) * various other fixes/improvements." For the Bluetooth bits, Gustavo says: "More patches for 3.12, busy times for Bluetooth. More than a 100 commits since the last pull. The bulk of work comes from Johan and Marcel, they are doing fixes and improvements all over the Bluetooth subsystem, as the diffstat can show." For the ath10k and ath6kl bits, Kalle says: "Bartosz added support to ath10k for our 10.x AP firmware branch, which gives us AP specific features and fixes. We still support the main firmware branch as well just like before, ath10k detects runtime what firmware is used. Unfortunately the firmware interface in 10.x branch is somewhat different so there was quite a lot of changes in ath10k for this. Michal and Sujith did some performance improvements in ath10k. Vladimir fixed a compiler warning and Fengguang removed an extra semicolon." For the NFC bits, Samuel says: "It's a fairly big one, with the following highlights: - NFC digital layer implementation: Most NFC chipsets implement the NFC digital layer in firmware, but others have more basic functionalities and expect the host to implement the digital layer. This layer sits below the NFC core. - Sony's port100 support: This is "soft" NFC USB dongle that expects the digital layer to be implemented on the host. This is the first user of our NFC digital stack implementation. - Secure element API: We now provide a netlink API for enabling, disabling and discovering NFC attached (embedded or UICC ones) secure elements. With some userspace help, this allows us to support NFC payments. Only the pn544 driver currently supports that API. - NCI SPI fixes and improvements: In order to support NCI devices over SPI, we fixed and improved our NCI/SPI implementation. The currently most deployed NFC NCI chipset, Broadcom's bcm2079x, supports that mode and we're planning to use our NCI/SPI framework to implement a driver for it. - pn533 fragmentation support in target mode: This was the only missing feature from our pn533 impementation. We now support fragmentation in both Tx and Rx modes, in target mode." On top of all that, brcmfmac and rt2x00 both get the usual flurry of updates. A few other drivers get hit here or there as well. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-11-04net: introduce skb_coalesce_rx_frag()Jason Wang1-0/+12
Sometimes we need to coalesce the rx frags to avoid frag list. One example is virtio-net driver which tries to use small frags for both MTU sized packet and GSO packet. So this patch introduce skb_coalesce_rx_frag() to do this. Cc: Rusty Russell <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Michael Dalton <[email protected]> Cc: Eric Dumazet <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Jason Wang <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-04tcp: properly handle stretch acks in slow startYuchung Cheng14-67/+56
Slow start now increases cwnd by 1 if an ACK acknowledges some packets, regardless the number of packets. Consequently slow start performance is highly dependent on the degree of the stretch ACKs caused by receiver or network ACK compression mechanisms (e.g., delayed-ACK, GRO, etc). But slow start algorithm is to send twice the amount of packets of packets left so it should process a stretch ACK of degree N as if N ACKs of degree 1, then exits when cwnd exceeds ssthresh. A follow up patch will use the remainder of the N (if greater than 1) to adjust cwnd in the congestion avoidance phase. In addition this patch retires the experimental limited slow start (LSS) feature. LSS has multiple drawbacks but questionable benefit. The fractional cwnd increase in LSS requires a loop in slow start even though it's rarely used. Configuring such an increase step via a global sysctl on different BDPS seems hard. Finally and most importantly the slow start overshoot concern is now better covered by the Hybrid slow start (hystart) enabled by default. Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-04tcp: enable sockets to use MSG_FASTOPEN by defaultYuchung Cheng1-1/+1
Applications have started to use Fast Open (e.g., Chrome browser has such an optional flag) and the feature has gone through several generations of kernels since 3.7 with many real network tests. It's time to enable this flag by default for applications to test more conveniently and extensively. Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-04Merge branch 'master' of ↵David S. Miller5-11/+52
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: ==================== This batch contains fives nf_tables patches for your net-next tree, they are: * Fix possible use after free in the module removal path of the x_tables compatibility layer, from Dan Carpenter. * Add filter chain type for the bridge family, from myself. * Fix Kconfig dependencies of the nf_tables bridge family with the core, from myself. * Fix sparse warnings in nft_nat, from Tomasz Bursztyka. * Remove duplicated include in the IPv4 family support for nf_tables, from Wei Yongjun. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-11-04Merge branch 'master' of ↵David S. Miller20-193/+280
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== This is another batch containing Netfilter/IPVS updates for your net-next tree, they are: * Six patches to make the ipt_CLUSTERIP target support netnamespace, from Gao feng. * Two cleanups for the nf_conntrack_acct infrastructure, introducing a new structure to encapsulate conntrack counters, from Holger Eitzenberger. * Fix missing verdict in SCTP support for IPVS, from Daniel Borkmann. * Skip checksum recalculation in SCTP support for IPVS, also from Daniel Borkmann. * Fix behavioural change in xt_socket after IP early demux, from Florian Westphal. * Fix bogus large memory allocation in the bitmap port set type in ipset, from Jozsef Kadlecsik. * Fix possible compilation issues in the hash netnet set type in ipset, also from Jozsef Kadlecsik. * Define constants to identify netlink callback data in ipset dumps, again from Jozsef Kadlecsik. * Use sock_gen_put() in xt_socket to replace xt_socket_put_sk, from Eric Dumazet. * Improvements for the SH scheduler in IPVS, from Alexander Frolkin. * Remove extra delay due to unneeded rcu barrier in IPVS net namespace cleanup path, from Julian Anastasov. * Save some cycles in ip6t_REJECT by skipping checksum validation in packets leaving from our stack, from Stanislav Fomichev. * Fix IPVS_CMD_ATTR_MAX definition in IPVS, larger that required, from Julian Anastasov. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-11-04netfilter: nft_compat: use _safe version of list_for_eachDan Carpenter1-4/+4
We need to use the _safe version of list_for_each_entry() here otherwise we have a use after free bug. Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-11-04Merge branch 'master' of ↵David S. Miller12-2288/+2496
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== Open vSwitch A set of updates for net-next/3.13. Major changes are: * Restructure flow handling code to be more logically organized and easier to read. * Rehashing of the flow table is moved from a workqueue to flow installation time. Before, heavy load could block the workqueue for excessive periods of time. * Additional debugging information is provided to help diagnose megaflows. * It's now possible to match on TCP flags. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-11-04net: checksum: fix warning in skb_checksumDaniel Borkmann1-1/+1
This patch fixes a build warning in skb_checksum() by wrapping the csum_partial() usage in skb_checksum(). The problem is that on a few architectures, csum_partial is used with prefix asmlinkage whereas on most architectures it's not. So fix this up generically as we did with csum_block_add_ext() to match the signature. Introduced by 2817a336d4d ("net: skb_checksum: allow custom update/combine for walking skb"). Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-04Merge branch 'master' of ↵John W. Linville34-745/+4116
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/net/wireless/brcm80211/brcmfmac/sdio_host.h
2013-11-04Merge branch 'master' of ↵John W. Linville15-31/+164
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless Conflicts: drivers/net/wireless/iwlwifi/pcie/drv.c
2013-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller26-96/+153
Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <[email protected]>
2013-11-04net: sctp: do not trigger BUG_ON in sctp_cmd_delete_tcbDaniel Borkmann1-1/+0
Introduced in f9e42b853523 ("net: sctp: sideeffect: throw BUG if primary_path is NULL"), we intended to find a buggy assoc that's part of the assoc hash table with a primary_path that is NULL. However, we better remove the BUG_ON for now and find a more suitable place to assert for these things as Mark reports that this also triggers the bug when duplication cookie processing happens, and the assoc is not part of the hash table (so all good in this case). Such a situation can for example easily be reproduced by: tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1 tc qdisc add dev eth0 parent 1:2 handle 20: netem loss 20% tc filter add dev eth0 protocol ip parent 1: prio 2 u32 match ip \ protocol 132 0xff match u8 0x0b 0xff at 32 flowid 1:2 This drops 20% of COOKIE-ACK packets. After some follow-up discussion with Vlad we came to the conclusion that for now we should still better remove this BUG_ON() assertion, and come up with two follow-ups later on, that is, i) find a more suitable place for this assertion, and possibly ii) have a special allocator/initializer for such kind of temporary assocs. Reported-by: Mark Thomas <[email protected]> Signed-off-by: Vlad Yasevich <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Neil Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-03net/hsr: Add support for the High-availability Seamless Redundancy protocol ↵Arvid Brodin12-0/+2339
(HSRv0) High-availability Seamless Redundancy ("HSR") provides instant failover redundancy for Ethernet networks. It requires a special network topology where all nodes are connected in a ring (each node having two physical network interfaces). It is suited for applications that demand high availability and very short reaction time. HSR acts on the Ethernet layer, using a registered Ethernet protocol type to send special HSR frames in both directions over the ring. The driver creates virtual network interfaces that can be used just like any ordinary Linux network interface, for IP/TCP/UDP traffic etc. All nodes in the network ring must be HSR capable. This code is a "best effort" to comply with the HSR standard as described in IEC 62439-3:2010 (HSRv0). Signed-off-by: Arvid Brodin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-03net: extend net_device allocation to vmalloc()Eric Dumazet2-6/+18
Joby Poriyath provided a xen-netback patch to reduce the size of xenvif structure as some netdev allocation could fail under memory pressure/fragmentation. This patch is handling the problem at the core level, allowing any netdev structures to use vmalloc() if kmalloc() failed. As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT to kzalloc() flags to do this fallback only when really needed. Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Joby Poriyath <[email protected]> Cc: Ben Hutchings <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-03net: sctp: fix and consolidate SCTP checksumming codeDaniel Borkmann1-8/+1
This fixes an outstanding bug found through IPVS, where SCTP packets with skb->data_len > 0 (non-linearized) and empty frag_list, but data accumulated in frags[] member, are forwarded with incorrect checksum letting SCTP initial handshake fail on some systems. Linearizing each SCTP skb in IPVS to prevent that would not be a good solution as this leads to an additional and unnecessary performance penalty on the load-balancer itself for no good reason (as we actually only want to update the checksum, and can do that in a different/better way presented here). The actual problem is elsewhere, namely, that SCTP's checksumming in sctp_compute_cksum() does not take frags[] into account like skb_checksum() does. So while we are fixing this up, we better reuse the existing code that we have anyway in __skb_checksum() and use it for walking through the data doing checksumming. This will not only fix this issue, but also consolidates some SCTP code with core sk_buff code, bringing it closer together and removing respectively avoiding reimplementation of skb_checksum() for no good reason. As crc32c() can use hardware implementation within the crypto layer, we leave that intact (it wraps around / falls back to e.g. slice-by-8 algorithm in __crc32c_le() otherwise); plus use the __crc32c_le_combine() combinator for crc32c blocks. Also, we remove all other SCTP checksumming code, so that we only have to use sctp_compute_cksum() from now on; for doing that, we need to transform SCTP checkumming in output path slightly, and can leave the rest intact. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-03net: skb_checksum: allow custom update/combine for walking skbDaniel Borkmann1-10/+21
Currently, skb_checksum walks over 1) linearized, 2) frags[], and 3) frag_list data and calculats the one's complement, a 32 bit result suitable for feeding into itself or csum_tcpudp_magic(), but unsuitable for SCTP as we're calculating CRC32c there. Hence, in order to not re-implement the very same function in SCTP (and maybe other protocols) over and over again, use an update() + combine() callback internally to allow for walking over the skb with different algorithms. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-03netfilter: nf_tables: remove duplicated include from nf_tables_ipv4.cWei Yongjun1-1/+0
Remove duplicated include. Signed-off-by: Wei Yongjun <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-11-03netfilter: ctnetlink: account both directions in one stepHolger Eitzenberger1-25/+24
With the intent to dump other accounting data later. This patch is a cleanup. Signed-off-by: Holger Eitzenberger <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-11-03netfilter: introduce nf_conn_acct structureHolger Eitzenberger4-20/+30
Encapsulate counters for both directions into nf_conn_acct. During that process also consistently name pointers to the extend 'acct', not 'counters'. This patch is a cleanup. Signed-off-by: Holger Eitzenberger <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-11-02net: flow_dissector: fail on evil iph->ihlJason Wang1-1/+1
We don't validate iph->ihl which may lead a dead loop if we meet a IPIP skb whose iph->ihl is zero. Fix this by failing immediately when iph->ihl is evil (less than 5). This issue were introduced by commit ec5efe7946280d1e84603389a1030ccec0a767ae (rps: support IPIP encapsulation). Cc: Eric Dumazet <[email protected]> Cc: Petr Matousek <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Daniel Borkmann <[email protected]> Signed-off-by: Jason Wang <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-02Merge branch 'master' of ↵David S. Miller4-69/+41
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Conflicts: net/xfrm/xfrm_policy.c Minor merge conflict in xfrm_policy.c, consisting of overlapping changes which were trivial to resolve. Signed-off-by: David S. Miller <[email protected]>
2013-11-02Merge branch 'master' of ↵David S. Miller3-10/+18
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== 1) Fix a possible race on ipcomp scratch buffers because of too early enabled siftirqs. From Michal Kubecek. 2) The current xfrm garbage collector threshold is too small for some workloads, resulting in bad performance on these workloads. Increase the threshold from 1024 to 32768. 3) Some codepaths might not have a dst_entry attached to the skb when calling xfrm_decode_session(). So add a check to prevent a null pointer dereference in this case. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-11-01openvswitch: Use flow hash during flow lookup operation.Pravin B Shelar1-1/+1
Flow->hash can be used to detect hash collisions and avoid flow key compare in flow lookup. Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-11-01openvswitch: TCP flags matching support.Jarno Rajahalme3-2/+33
tcp_flags=flags/mask Bitwise match on TCP flags. The flags and mask are 16-bit num‐ bers written in decimal or in hexadecimal prefixed by 0x. Each 1-bit in mask requires that the corresponding bit in port must match. Each 0-bit in mask causes the corresponding bit to be ignored. TCP protocol currently defines 9 flag bits, and additional 3 bits are reserved (must be transmitted as zero), see RFCs 793, 3168, and 3540. The flag bits are, numbering from the least significant bit: 0: FIN No more data from sender. 1: SYN Synchronize sequence numbers. 2: RST Reset the connection. 3: PSH Push function. 4: ACK Acknowledgement field significant. 5: URG Urgent pointer field significant. 6: ECE ECN Echo. 7: CWR Congestion Windows Reduced. 8: NS Nonce Sum. 9-11: Reserved. 12-15: Not matchable, must be zero. Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-11-01openvswitch: Widen TCP flags handling.Jarno Rajahalme3-7/+5
Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t). The kernel interface remains at 8 bits, which makes no functional difference now, as none of the higher bits is currently of interest to the userspace. Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-11-01openvswitch: Enable all GSO features on internal port.Pravin B Shelar1-1/+1
OVS already can handle all types of segmentation offloads that are supported by the kernel. Following patch specifically enables UDP and IPV6 segmentation offloads. Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: Jesse Gross <[email protected]>
2013-11-01Merge branch 'linus' into sched/coreIngo Molnar61-236/+475
Resolve cherry-picking conflicts: Conflicts: mm/huge_memory.c mm/memory.c mm/mprotect.c See this upstream merge commit for more details: 52469b4fcd4f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Signed-off-by: Ingo Molnar <[email protected]>
2013-11-01xfrm: Fix null pointer dereference when decoding sessionsSteffen Klassert2-2/+10
On some codepaths the skb does not have a dst entry when xfrm_decode_session() is called. So check for a valid skb_dst() before dereferencing the device interface index. We use 0 as the device index if there is no valid skb_dst(), or at reverse decoding we use skb_iif as device interface index. Bug was introduced with git commit bafd4bd4dc ("xfrm: Decode sessions with output interface."). Reported-by: Meelis Roos <[email protected]> Tested-by: Meelis Roos <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2013-10-31SUNRPC: Cleanup xs_destroy()Trond Myklebust1-10/+5
There is no longer any need for a separate xs_local_destroy() helper. Signed-off-by: Trond Myklebust <[email protected]>
2013-10-31SUNRPC: close a rare race in xs_tcp_setup_socket.NeilBrown1-4/+9
We have one report of a crash in xs_tcp_setup_socket. The call path to the crash is: xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested. The 'sock' passed to that last function is NULL. The only way I can see this happening is a concurrent call to xs_close: xs_close -> xs_reset_transport -> sock_release -> inet_release inet_release sets: sock->sk = NULL; inet_stream_connect calls lock_sock(sock->sk); which gets NULL. All calls to xs_close are protected by XPRT_LOCKED as are most activations of the workqueue which runs xs_tcp_setup_socket. The exception is xs_tcp_schedule_linger_timeout. So presumably the timeout queued by the later fires exactly when some other code runs xs_close(). To protect against this we can move the cancel_delayed_work_sync() call from xs_destory() to xs_close(). As xs_close is never called from the worker scheduled on ->connect_worker, this can never deadlock. Signed-off-by: NeilBrown <[email protected]> [Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets] Signed-off-by: Trond Myklebust <[email protected]>
2013-10-306lowpan: cleanup skb copy dataAlexander Aring1-5/+8
This patch drops the direct memcpy on skb and uses the right skb memcpy functions. Also remove an unnecessary check if plen is non zero. Signed-off-by: Alexander Aring <[email protected]> Reviewed-by: Werner Almesberger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-10-306lowpan: set 6lowpan network and transport headerAlexander Aring1-0/+2
This is necessary to access network header with the skb_network_header function instead of calculate the position with mac_len, etc. Do the same for the transport header, when we replace the IPv6 header with the 6LoWPAN header. Signed-off-by: Alexander Aring <[email protected]> Acked-by: Werner Almesberger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-10-306lowpan: set and use mac_len for mac header lengthAlexander Aring2-12/+3
Set the mac header length while creating the 802.15.4 mac header. Drop the function for recalculate mac header length in upper layers which was static and works for intra pan communication only. Signed-off-by: Alexander Aring <[email protected]> Reviewed-by: Werner Almesberger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-10-306lowpan: remove unnecessary set of headersAlexander Aring1-4/+0
On receiving side we don't need to set any headers in skb because the 6LoWPAN layer do not access it. Currently these values will set twice after calling netif_rx. Signed-off-by: Alexander Aring <[email protected]> Reviewed-by: Werner Almesberger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-10-30ipv6: remove the unnecessary statement in find_match()Duan Jiong1-1/+1
After reading the function rt6_check_neigh(), we can know that the RT6_NUD_FAIL_SOFT can be returned only when the IS_ENABLE(CONFIG_IPV6_ROUTER_PREF) is false. so in function find_match(), there is no need to execute the statement !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF). Signed-off-by: Duan Jiong <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-10-30mac802154: Use pr_err(...) rather than printk(KERN_ERR ...)Chen Weilong1-4/+2
This change is inspired by checkpatch. Signed-off-by: Weilong Chen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-10-30tipc: remove two indentation levels in tipc_recv_msg routineYing Xue1-89/+84
The message dispatching part of tipc_recv_msg() is wrapped layers of while/if/if/switch, causing out-of-control indentation and does not look very good. We reduce two indentation levels by separating the message dispatching from the blocks that checks link state and sequence numbers, allowing longer function and arg names to be consistently indented without wrapping. Additionally we also rename "cont" label to "discard" and add one new label called "unlock_discard" to make code clearer. In all, these are cosmetic changes that do not alter the operation of TIPC in any way. Signed-off-by: Ying Xue <[email protected]> Reviewed-by: Erik Hugne <[email protected]> Cc: David Laight <[email protected]> Cc: Andreas Bofjäll <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-10-30SUNRPC: remove duplicated include from clnt.cWei Yongjun1-1/+0
Remove duplicated include. Signed-off-by: Wei Yongjun <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2013-10-30Merge branch 'dma_complete' into nextVinod Koul1-2/+2
2013-10-29tcp: temporarily disable Fast Open on SYN timeoutYuchung Cheng2-3/+8
Fast Open currently has a fall back feature to address SYN-data being dropped but it requires the middle-box to pass on regular SYN retry after SYN-data. This is implemented in commit aab487435 ("net-tcp: Fast Open client - detecting SYN-data drops") However some NAT boxes will drop all subsequent packets after first SYN-data and blackholes the entire connections. An example is in commit 356d7d8 "netfilter: nf_conntrack: fix tcp_in_window for Fast Open". The sender should note such incidents and fall back to use the regular TCP handshake on subsequent attempts temporarily as well: after the second SYN timeouts the original Fast Open SYN is most likely lost. When such an event recurs Fast Open is disabled based on the number of recurrences exponentially. Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-10-30net: ipvs: sctp: do not recalc sctp csum when ports didn't changeDaniel Borkmann1-6/+33
Unlike UDP or TCP, we do not take the pseudo-header into account in SCTP checksums. So in case port mapping is the very same, we do not need to recalculate the whole SCTP checksum in software, which is very expensive. Also, similarly as in TCP, take into account when a private helper mangled the packet. In that case, we also need to recalculate the checksum even if ports might be same. Thanks for feedback regarding skb->ip_summed checks from Julian Anastasov; here's a discussion on these checks for snat and dnat: * For snat_handler(), we can see CHECKSUM_PARTIAL from virtual devices, and from LOCAL_OUT, otherwise it should be CHECKSUM_UNNECESSARY. In general, in snat it is more complex. skb contains the original route and ip_vs_route_me_harder() can change the route after snat_handler. So, for locally generated replies from local server we can not preserve the CHECKSUM_PARTIAL mode. It is an chicken or egg dilemma: snat_handler needs the device after rerouting (to check for NETIF_F_SCTP_CSUM), while ip_route_me_harder() wants the snat_handler() to put the new saddr for proper rerouting. * For dnat_handler(), we should not see CHECKSUM_COMPLETE for SCTP, in fact the small set of drivers that support SCTP offloading return CHECKSUM_UNNECESSARY on correctly received SCTP csum. We can see CHECKSUM_PARTIAL from local stack or received from virtual drivers. The idea is that SCTP decides to avoid csum calculation if hardware supports offloading. IPVS can change the device after rerouting to real server but we can preserve the CHECKSUM_PARTIAL mode if the new device supports offloading too. This works because skb dst is changed before dnat_handler and we see the new device. So, checks in the 'if' part will decide whether it is ok to keep CHECKSUM_PARTIAL for the output. If the packet was with CHECKSUM_NONE, hence we deal with unknown checksum. As we recalculate the sum for IP header in all cases, it should be safe to use CHECKSUM_UNNECESSARY. We can forward wrong checksum in this case (without cp->app). In case of CHECKSUM_UNNECESSARY, the csum was valid on receive. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>