aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4/tcp_output.c
AgeCommit message (Collapse)AuthorFilesLines
2012-04-21tcp: Initial repair modePavel Emelyanov1-3/+13
This includes (according the the previous description): * TCP_REPAIR sockoption This one just puts the socket in/out of the repair mode. Allowed for CAP_NET_ADMIN and for closed/establised sockets only. When repair mode is turned off and the socket happens to be in the established state the window probe is sent to the peer to 'unlock' the connection. * TCP_REPAIR_QUEUE sockoption This one sets the queue which we're about to repair. The 'no-queue' is set by default. * TCP_QUEUE_SEQ socoption Sets the write_seq/rcv_nxt of a selected repaired queue. Allowed for TCP_CLOSE-d sockets only. When the socket changes its state the other seq-s are changed by the kernel according to the protocol rules (most of the existing code is actually reused). * Ability to forcibly bind a socket to a port The sk->sk_reuse is set to SK_FORCE_REUSE. * Immediate connect modification The connect syscall initializes the connection, then directly jumps to the code which finalizes it. * Silent close modification The close just aborts the connection (similar to SO_LINGER with 0 time) but without sending any FIN/RST-s to peer. Signed-off-by: Pavel Emelyanov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-04-21tcp: Move code aroundPavel Emelyanov1-2/+2
This is just the preparation patch, which makes the needed for TCP repair code ready for use. Signed-off-by: Pavel Emelyanov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-04-18tcp: fix retransmit of partially acked framesEric Dumazet1-0/+1
Alexander Beregalov reported skb_over_panic errors and provided stack trace. I occurs commit a21d45726aca (tcp: avoid order-1 allocations on wifi and tx path) added a regression, when a retransmit is done after a partial ACK. tcp_retransmit_skb() tries to aggregate several frames if the first one has enough available room to hold the following ones payload. This is controlled by /proc/sys/net/ipv4/tcp_retrans_collapse tunable (default : enabled) Problem is we must make sure _pskb_trim_head() doesnt fool skb_availroom() when pulling some bytes from skb (this pull is done when receiver ACK part of the frame). Reported-by: Alexander Beregalov <[email protected]> Cc: Marc MERLIN <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-04-15net: cleanup unsigned to unsigned intEric Dumazet1-12/+12
Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-04-11tcp: avoid order-1 allocations on wifi and tx pathEric Dumazet1-1/+1
Marc Merlin reported many order-1 allocations failures in TX path on its wireless setup, that dont make any sense with MTU=1500 network, and non SG capable hardware. After investigation, it turns out TCP uses sk_stream_alloc_skb() and used as a convention skb_tailroom(skb) to know how many bytes of data payload could be put in this skb (for non SG capable devices) Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER + sizeof(struct skb_shared_info) being above 2048) Later, mac80211 layer need to add some bytes at the tail of skb (IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is available has to call pskb_expand_head() and request order-1 allocations. This patch changes sk_stream_alloc_skb() so that only sk->sk_prot->max_header bytes of headroom are reserved, and use a new skb field, avail_size to hold the data payload limit. This way, order-0 allocations done by TCP stack can leave more than 2 KB of tailroom and no more allocation is performed in mac80211 layer (or any layer needing some tailroom) avail_size is unioned with mark/dropcount, since mark will be set later in IP stack for output packets. Therefore, skb size is unchanged. Reported-by: Marc MERLIN <[email protected]> Tested-by: Marc MERLIN <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-02-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-4/+2
2012-01-30tcp: fix tcp_trim_head() to adjust segment count with skb MSSNeal Cardwell1-4/+2
This commit fixes tcp_trim_head() to recalculate the number of segments in the skb with the skb's existing MSS, so trimming the head causes the skb segment count to be monotonically non-increasing - it should stay the same or go down, but not increase. Previously tcp_trim_head() used the current MSS of the connection. But if there was a decrease in MSS between original transmission and ACK (e.g. due to PMTUD), this could cause tcp_trim_head() to counter-intuitively increase the segment count when trimming bytes off the head of an skb. This violated assumptions in tcp_tso_acked() that tcp_trim_head() only decreases the packet count, so that packets_acked in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to pass u32 pkts_acked values as large as 0xffffffff to ca_ops->pkts_acked(). As an aside, if tcp_trim_head() had really wanted the skb to reflect the current MSS, it should have called tcp_set_skb_tso_segs() unconditionally, since a decrease in MSS would mean that a single-packet skb should now be sliced into multiple segments. Signed-off-by: Neal Cardwell <[email protected]> Acked-by: Nandita Dukkipati <[email protected]> Acked-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-01-26tcp: add LINUX_MIB_TCPRETRANSFAIL counterEric Dumazet1-1/+3
It might be useful to get a counter of failed tcp_retransmit_skb() calls. Reported-by: Satoru Moriya <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-12-12foundations of per-cgroup memory pressure controlling.Glauber Costa1-1/+1
This patch replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to acessor macros. Those macros can either receive a socket argument, or a mem_cgroup argument, depending on the context they live in. Since we're only doing a macro wrapping here, no performance impact at all is expected in the case where we don't have cgroups disabled. Signed-off-by: Glauber Costa <[email protected]> Reviewed-by: Hiroyouki Kamezawa <[email protected]> CC: David S. Miller <[email protected]> CC: Eric W. Biederman <[email protected]> CC: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-12-05tcp: fix tcp_trim_head()Eric Dumazet1-5/+8
commit f07d960df3 (tcp: avoid frag allocation for small frames) breaked assumption in tcp stack that skb is either linear (skb->data_len == 0), or fully fragged (skb->data_len == skb->len) tcp_trim_head() made this assumption, we must fix it. Thanks to Vijay for providing a very detailed explanation. Reported-by: Vijay Subramanian <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-12-04tcp: take care of misalignmentsEric Dumazet1-1/+9
We discovered that TCP stack could retransmit misaligned skbs if a malicious peer acknowledged sub MSS frame. This currently can happen only if output interface is non SG enabled : If SG is enabled, tcp builds headless skbs (all payload is included in fragments), so the tcp trimming process only removes parts of skb fragments, header stay aligned. Some arches cant handle misalignments, so force a head reallocation and shrink headroom to MAX_TCP_HEADER. Dont care about misaligments on x86 and PPC (or other arches setting NET_IP_ALIGN to 0) This patch introduces __pskb_copy() which can specify the headroom of new head, and pskb_copy() becomes a wrapper on top of __pskb_copy() Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-11-29tcp: do not scale TSO segment size with reordering degreeNeal Cardwell1-1/+1
Since 2005 (c1b4a7e69576d65efc31a8cea0714173c2841244) tcp_tso_should_defer has been using tcp_max_burst() as a target limit for deciding how large to make outgoing TSO packets when not using sysctl_tcp_tso_win_divisor. But since 2008 (dd9e0dda66ba38a2ddd1405ac279894260dc5c36) tcp_max_burst() returns the reordering degree. We should not have tcp_tso_should_defer attempt to build larger segments just because there is more reordering. This commit splits the notion of deferral size used in TSO from the notion of burst size used in cwnd moderation, and returns the TSO deferral limit to its original value. Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-11-08tcp: Fix comments for Nagle algorithmFeng King1-1/+1
TCP_NODELAY is weaker than TCP_CORK, when TCP_CORK was set, small segments will always pass Nagle test regardless of TCP_NODELAY option. Signed-off-by: Feng King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-10-21tcp: add const qualifiers where possibleEric Dumazet1-35/+37
Adding const qualifiers to pointers can ease code review, and spot some bugs. It might allow compiler to optimize code further. For example, is it legal to temporary write a null cksum into tcphdr in tcp_md5_hash_header() ? I am afraid a sniffer could catch the temporary null value... Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-10-19net: add skb frag size accessorsEric Dumazet1-3/+5
To ease skb->truesize sanitization, its better to be able to localize all references to skb frags size. Define accessors : skb_frag_size() to fetch frag size, and skb_frag_size_{set|add|sub}() to manipulate it. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-09-27tcp: rename tcp_skb_cb flagsEric Dumazet1-31/+32
Rename struct tcp_skb_cb "flags" to "tcp_flags" to ease code review and maintenance. Its content is a combination of FIN/SYN/RST/PSH/ACK/URG/ECE/CWR flags Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-08-24Proportional Rate Reduction for TCP.Nandita Dukkipati1-1/+6
This patch implements Proportional Rate Reduction (PRR) for TCP. PRR is an algorithm that determines TCP's sending rate in fast recovery. PRR avoids excessive window reductions and aims for the actual congestion window size at the end of recovery to be as close as possible to the window determined by the congestion control algorithm. PRR also improves accuracy of the amount of data sent during loss recovery. The patch implements the recommended flavor of PRR called PRR-SSRB (Proportional rate reduction with slow start reduction bound) and replaces the existing rate halving algorithm. PRR improves upon the existing Linux fast recovery under a number of conditions including: 1) burst losses where the losses implicitly reduce the amount of outstanding data (pipe) below the ssthresh value selected by the congestion control algorithm and, 2) losses near the end of short flows where application runs out of data to send. As an example, with the existing rate halving implementation a single loss event can cause a connection carrying short Web transactions to go into the slow start mode after the recovery. This is because during recovery Linux pulls the congestion window down to packets_in_flight+1 on every ACK. A short Web response often runs out of new data to send and its pipe reduces to zero by the end of recovery when all its packets are drained from the network. Subsequent HTTP responses using the same connection will have to slow start to raise cwnd to ssthresh. PRR on the other hand aims for the cwnd to be as close as possible to ssthresh by the end of recovery. A description of PRR and a discussion of its performance can be found at the following links: - IETF Draft: http://tools.ietf.org/html/draft-mathis-tcpm-proportional-rate-reduction-01 - IETF Slides: http://www.ietf.org/proceedings/80/slides/tcpm-6.pdf http://tools.ietf.org/agenda/81/slides/tcpm-2.pdf - Paper to appear in Internet Measurements Conference (IMC) 2011: Improving TCP Loss Recovery Nandita Dukkipati, Matt Mathis, Yuchung Cheng Signed-off-by: Nandita Dukkipati <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-08-24net: ipv4: convert to SKB frag APIsIan Campbell1-1/+1
Signed-off-by: Ian Campbell <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Alexey Kuznetsov <[email protected]> Cc: "Pekka Savola (ipv6)" <[email protected]> Cc: James Morris <[email protected]> Cc: Hideaki YOSHIFUJI <[email protected]> Cc: Patrick McHardy <[email protected]> Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
2011-05-08inet: Pass flowi to ->queue_xmit().David S. Miller1-1/+1
This allows us to acquire the exact route keying information from the protocol, however that might be managed. It handles all of the possibilities, from the simplest case of storing the key in inet->cork.fl to the more complex setup SCTP has where individual transports determine the flow. Signed-off-by: David S. Miller <[email protected]>
2011-04-07Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6Linus Torvalds1-1/+1
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings
2011-04-01tcp: len check is unnecessarily devastating, change to WARN_ONIlpo Järvinen1-1/+2
All callers are prepared for alloc failures anyway, so this error can safely be boomeranged to the callers domain without super bad consequences. ...At worst the connection might go into a state where each RTO tries to (unsuccessfully) re-fragment with such a mis-sized value and eventually dies. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-31Fix common misspellingsLucas De Marchi1-1/+1
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <[email protected]>
2011-02-21tcp: undo_retrans counter fixesYuchung Cheng1-1/+1
Fix a bug that undo_retrans is incorrectly decremented when undo_marker is not set or undo_retrans is already 0. This happens when sender receives more DSACK ACKs than packets retransmitted during the current undo phase. This may also happen when sender receives DSACK after the undo operation is completed or cancelled. Fix another bug that undo_retrans is incorrectly incremented when sender retransmits an skb and tcp_skb_pcount(skb) > 1 (TSO). This case is rare but not impossible. Signed-off-by: Yuchung Cheng <[email protected]> Acked-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-01-13Merge branch 'for-next' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2010-12-22Merge branch 'master' into for-nextJiri Kosina1-19/+23
Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.
2010-12-20TCP: increase default initial receive window.Nandita Dukkipati1-3/+8
This patch changes the default initial receive window to 10 mss (defined constant). The default window is limited to the maximum of 10*1460 and 2*mss (when mss > 1460). draft-ietf-tcpm-initcwnd-00 is a proposal to the IETF that recommends increasing TCP's initial congestion window to 10 mss or about 15KB. Leading up to this proposal were several large-scale live Internet experiments with an initial congestion window of 10 mss (IW10), where we showed that the average latency of HTTP responses improved by approximately 10%. This was accompanied by a slight increase in retransmission rate (0.5%), most of which is coming from applications opening multiple simultaneous connections. To understand the extreme worst case scenarios, and fairness issues (IW10 versus IW3), we further conducted controlled testbed experiments. We came away finding minimal negative impact even under low link bandwidths (dial-ups) and small buffers. These results are extremely encouraging to adopting IW10. However, an initial congestion window of 10 mss is useless unless a TCP receiver advertises an initial receive window of at least 10 mss. Fortunately, in the large-scale Internet experiments we found that most widely used operating systems advertised large initial receive windows of 64KB, allowing us to experiment with a wide range of initial congestion windows. Linux systems were among the few exceptions that advertised a small receive window of 6KB. The purpose of this patch is to fix this shortcoming. References: 1. A comprehensive list of all IW10 references to date. http://code.google.com/speed/protocols/tcpm-IW10.html 2. Paper describing results from large-scale Internet experiments with IW10. http://ccr.sigcomm.org/drupal/?q=node/621 3. Controlled testbed experiments under worst case scenarios and a fairness study. http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf 4. Raw test data from testbed experiments (Linux senders/receivers) with initial congestion and receive windows of both 10 mss. http://research.csc.ncsu.edu/netsrv/?q=content/iw10 5. Internet-Draft. Increasing TCP's Initial Window. https://datatracker.ietf.org/doc/draft-ietf-tcpm-initcwnd/ Signed-off-by: Nandita Dukkipati <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-12-13net: Abstract default ADVMSS behind an accessor.David S. Miller1-5/+9
Make all RTAX_ADVMSS metric accesses go through a new helper function, dst_metric_advmss(). Leave the actual default metric as "zero" in the real metric slot, and compute the actual default value dynamically via a new dst_ops AF specific callback. For stacked IPSEC routes, we use the advmss of the path which preserves existing behavior. Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates advmss on pmtu updates. This inconsistency in advmss handling results in more raw metric accesses than I wish we ended up with. Signed-off-by: David S. Miller <[email protected]>
2010-12-08Merge branch 'master' of ↵David S. Miller1-19/+23
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath/ath9k/ar9003_eeprom.c net/llc/af_llc.c
2010-12-08tcp: protect sysctl_tcp_cookie_size readsEric Dumazet1-12/+15
Make sure sysctl_tcp_cookie_size is read once in tcp_cookie_size_check(), or we might return an illegal value to caller if sysctl_tcp_cookie_size is changed by another cpu. Signed-off-by: Eric Dumazet <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: William Allen Simpson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-12-08tcp: avoid a possible divide by zeroEric Dumazet1-2/+4
sysctl_tcp_tso_win_divisor might be set to zero while one cpu runs in tcp_tso_should_defer(). Make sure we dont allow a divide by zero by reading sysctl_tcp_tso_win_divisor exactly once. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-12-08tcp: Bug fix in initialization of receive window.Nandita Dukkipati1-5/+4
The bug has to do with boundary checks on the initial receive window. If the initial receive window falls between init_cwnd and the receive window specified by the user, the initial window is incorrectly brought down to init_cwnd. The correct behavior is to allow it to remain unchanged. Signed-off-by: Nandita Dukkipati <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-12-02tcp: use TCP_BASE_MSS to set basic mss valueShan Wei1-1/+1
TCP_BASE_MSS is defined, but not used. commit 5d424d5a introduce this macro, so use it to initial sysctl_tcp_base_mss. commit 5d424d5a674f782d0659a3b66d951f412901faee Author: John Heffner <[email protected]> Date: Mon Mar 20 17:53:41 2006 -0800 [TCP]: MTU probing Signed-off-by: Shan Wei <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-24xps: Improvements in TX queue selectionTom Herbert1-1/+4
In dev_pick_tx, don't do work in calculating queue index or setting the index in the sock unless the device has more than one queue. This allows the sock to be set only with a queue index of a multi-queue device which is desirable if device are stacked like in a tunnel. We also allow the mapping of a socket to queue to be changed. To maintain in order packet transmission a flag (ooo_okay) has been added to the sk_buff structure. If a transport layer sets this flag on a packet, the transmit queue can be changed for the socket. Presumably, the transport would set this if there was no possbility of creating OOO packets (for instance, there are no packets in flight for the socket). This patch includes the modification in TCP output for setting this flag. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-17network: tcp_connect should return certain errors up the stackEric Paris1-1/+4
The current tcp_connect code completely ignores errors from sending an skb. This makes sense in many situations (like -ENOBUFFS) but I want to be able to immediately fail connections if they are denied by the SELinux netfilter hook. Netfilter does not normally return ECONNREFUSED when it drops a packet so we respect that error code as a final and fatal error that can not be recovered. Based-on-patch-by: Patrick McHardy <[email protected]> Signed-off-by: Eric Paris <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-01tree-wide: fix comment/printk typosUwe Kleine-König1-1/+1
"gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2010-09-23net: return operator cleanupEric Dumazet1-4/+4
Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-09-01tcp: update also tcp_output with regard to RFC 5681Gerrit Renker1-9/+3
Thanks to Ilpo Jarvinen, this updates also the initial window setting for tcp_output with regard to RFC 5681. Signed-off-by: Gerrit Renker <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-08-22tcp: allow effective reduction of TCP's rcv-buffer via setsockoptHagen Paul Pfeifer1-0/+11
Via setsockopt it is possible to reduce the socket RX buffer (SO_RCVBUF). TCP method to select the initial window and window scaling option in tcp_select_initial_window() currently misbehaves and do not consider a reduced RX socket buffer via setsockopt. Even though the server's RX buffer is reduced via setsockopt() to 256 byte (Initial Window 384 byte => 256 * 2 - (256 * 2 / 4)) the window scale option is still 7: 192.168.1.38.40676 > 78.47.222.210.5001: Flags [S], seq 2577214362, win 5840, options [mss 1460,sackOK,TS val 338417 ecr 0,nop,wscale 0], length 0 78.47.222.210.5001 > 192.168.1.38.40676: Flags [S.], seq 1570631029, ack 2577214363, win 384, options [mss 1452,sackOK,TS val 2435248895 ecr 338417,nop,wscale 7], length 0 192.168.1.38.40676 > 78.47.222.210.5001: Flags [.], ack 1, win 5840, options [nop,nop,TS val 338421 ecr 2435248895], length 0 Within tcp_select_initial_window() the original space argument - a representation of the rx buffer size - is expanded during tcp_select_initial_window(). Only sysctl_tcp_rmem[2], sysctl_rmem_max and window_clamp are considered to calculate the initial window. This patch adjust the window_clamp argument if the user explicitly reduce the receive buffer. Signed-off-by: Hagen Paul Pfeifer <[email protected]> Cc: David S. Miller <[email protected]> Cc: Patrick McHardy <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-07-20Merge branch 'master' of ↵David S. Miller1-0/+3
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/vhost/net.c net/bridge/br_device.c Fix merge conflict in drivers/vhost/net.c with guidance from Stephen Rothwell. Revert the effects of net-2.6 commit 573201f36fd9c7c6d5218cdcd9948cee700b277d since net-next-2.6 has fixes that make bridge netpoll work properly thus we don't need it disabled. Signed-off-by: David S. Miller <[email protected]>
2010-07-19tcp: fix crash in tcp_xmit_retransmit_queueIlpo Järvinen1-0/+3
It can happen that there are no packets in queue while calling tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns NULL and that gets deref'ed to get sacked into a local var. There is no work to do if no packets are outstanding so we just exit early. This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out guard to make joining diff nicer). Signed-off-by: Ilpo Järvinen <[email protected]> Reported-by: Lennart Schulte <[email protected]> Tested-by: Lennart Schulte <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-07-12net/ipv4: EXPORT_SYMBOL cleanupsEric Dumazet1-7/+5
CodingStyle cleanups EXPORT_SYMBOL should immediately follow the symbol declaration. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-06-28tcp: tso_fragment() might avoid GFP_ATOMICEric Dumazet1-3/+3
We can pass a gfp argument to tso_fragment() and avoid GFP_ATOMIC allocations sometimes. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-06-15tcp: unify tcp flag macrosChangli Gao1-30/+29
unify tcp flag macros: TCPHDR_FIN, TCPHDR_SYN, TCPHDR_RST, TCPHDR_PSH, TCPHDR_ACK, TCPHDR_URG, TCPHDR_ECE and TCPHDR_CWR. TCBCB_FLAG_* are replaced with the corresponding TCPHDR_*. Signed-off-by: Changli Gao <[email protected]> ---- include/net/tcp.h | 24 ++++++------- net/ipv4/tcp.c | 8 ++-- net/ipv4/tcp_input.c | 2 - net/ipv4/tcp_output.c | 59 ++++++++++++++++----------------- net/netfilter/nf_conntrack_proto_tcp.c | 32 ++++++----------- net/netfilter/xt_TCPMSS.c | 4 -- 6 files changed, 58 insertions(+), 71 deletions(-) Signed-off-by: David S. Miller <[email protected]>
2010-05-17tcp: tcp_synack_options() fix Eric Dumazet1-5/+4
Commit 33ad798c924b4a (tcp: options clean up) introduced a problem if MD5+SACK+timestamps were used in initial SYN message. Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP sessions, but since 40 bytes of tcp options space are not enough to store all the bits needed, we chose to disable timestamps in this case. We send a SYN-ACK _without_ timestamp option, but socket has timestamps enabled and all further outgoing messages contain a TS block, all with the initial timestamp of the remote peer. Fix is to really disable timestamps option for the whole session. Reported-by: Bijay Singh <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-05-16net: Introduce sk_route_nocapsEric Dumazet1-1/+1
TCP-MD5 sessions have intermittent failures, when route cache is invalidated. ip_queue_xmit() has to find a new route, calls sk_setup_caps(sk, &rt->u.dst), destroying the sk->sk_route_caps &= ~NETIF_F_GSO_MASK that MD5 desperately try to make all over its way (from tcp_transmit_skb() for example) So we send few bad packets, and everything is fine when tcp_transmit_skb() is called again for this socket. Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a socket field, sk_route_nocaps, containing bits to mask on sk_route_caps. Reported-by: Bhaskar Dutta <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-04-22tcp: fix outsegs stat for TSO segmentsTom Herbert1-2/+3
Account for TSO segments of an skb in TCP_MIB_OUTSEGS counter. Without doing this, the counter can be off by orders of magnitude from the actual number of segments sent. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-04-20net: Fix various endianness glitchesEric Dumazet1-2/+2
Sparse can help us find endianness bugs, but we need to make some cleanups to be able to more easily spot real bugs. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-04-15net: replace ipfragok with skb->local_dfShan Wei1-1/+1
As Herbert Xu said: we should be able to simply replace ipfragok with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function) has droped ipfragok and set local_df value properly. The patch kills the ipfragok parameter of .queue_xmit(). Signed-off-by: Shan Wei <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-04-11tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skbDavid S. Miller1-0/+1
Back in commit 04a0551c87363f100b04d28d7a15a632b70e18e7 ("loopback: Drop obsolete ip_summed setting") we stopped setting CHECKSUM_UNNECESSARY in the loopback xmit. This is because such a setting was a lie since it implies that the checksum field of the packet is properly filled in. Instead what happens normally is that CHECKSUM_PARTIAL is set and skb->csum is calculated as needed. But this was only happening for TCP data packets (via the skb->ip_summed assignment done in tcp_sendmsg()). It doesn't happen for non-data packets like ACKs etc. Fix this by setting skb->ip_summed in the common non-data packet constructor. It already is setting skb->csum to zero. But this reminds us that we still have things like ip_output.c's ip_dev_loopback_xmit() which sets skb->ip_summed to the value CHECKSUM_UNNECESSARY, which Herbert's patch teaches us is not valid. So we'll have to address that at some point too. Signed-off-by: David S. Miller <[email protected]>
2010-04-11inet: Remove unused send_check length argumentHerbert Xu1-1/+1
inet: Remove unused send_check length argument This patch removes the unused length argument from the send_check function in struct inet_connection_sock_af_ops. Signed-off-by: Herbert Xu <[email protected]> Tested-by: Yinghai <[email protected]> Signed-off-by: David S. Miller <[email protected]>