path: root/net/ipv4
Age  Commit message  Author  Files  Lines
2008-09-23tcp: Fix order of tests in tcp_retransmit_skb()David S. Miller1-1/+1
tcp_write_queue_next() must only be called if we know that tcp_skb_is_last() evaluates to false. Signed-off-by: David S. Miller <[email protected]>
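The ordering constraint can be illustrated with a small standalone sketch. The list type and helpers below (`struct node`, `node_is_last`, `node_next`, `can_merge_with_next`) are hypothetical stand-ins, not the kernel's write-queue code or the actual change in this patch. The point is only that `&&` short-circuits, so the "is last" test has to come before the call that fetches the next entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue entry; not the kernel's struct sk_buff. */
struct node {
    struct node *next;   /* NULL for the last entry */
    int len;
};

static int node_is_last(const struct node *n)
{
    return n->next == NULL;
}

/* Only valid when node_is_last(n) is false. */
static const struct node *node_next(const struct node *n)
{
    assert(n->next != NULL);
    return n->next;
}

/* Safe ordering: the "is last" test guards the call to node_next(). */
static int can_merge_with_next(const struct node *n, int limit)
{
    return !node_is_last(n) && n->len + node_next(n)->len <= limit;
}
```

Swapping the two operands of `&&` would call `node_next()` on the last entry before the guard has a chance to reject it.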
2008-09-21net: Remove __skb_insert() calls outside of skbuff internals.David S. Miller1-2/+2
This minor cleanup simplifies later changes which will convert struct sk_buff and friends over to using struct list_head. Signed-off-by: David S. Miller <[email protected]>
2008-09-22ipvs: Fix unused label warningSven Wegener1-0/+2
Signed-off-by: Sven Wegener <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-22ipvs: Restrict sync message to 255 connectionsSven Wegener1-2/+4
The nr_conns variable in the sync message header is only eight bits wide and will overflow on interfaces with a large MTU. As a result the backup won't parse all connections contained in the sync buffer. On regular ethernet with an MTU of 1500 this isn't a problem, because we can't overflow the value, but consider jumbo frames being used on a cross-over connection between both directors. We now restrict the size of the sync buffer, so that we never put more than 255 connections into a single sync buffer. Signed-off-by: Sven Wegener <[email protected]> Signed-off-by: Simon Horman <[email protected]>
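A rough sketch of the arithmetic behind the 255-connection limit follows. The header and entry sizes (`SYNC_HEADER_LEN`, `SYNC_CONN_LEN`) and the function names are assumptions chosen for illustration, not the real IPVS sync message layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYNC_HEADER_LEN 8u    /* assumed header size, illustration only */
#define SYNC_CONN_LEN   24u   /* assumed per-connection entry size */

/* Largest usable buffer that keeps the entry count within a uint8_t. */
static size_t sync_buf_limit(size_t mtu_payload)
{
    size_t max_len = SYNC_HEADER_LEN + (size_t)UINT8_MAX * SYNC_CONN_LEN;

    return mtu_payload < max_len ? mtu_payload : max_len;
}

/* Number of whole connection entries that fit into a buffer. */
static size_t sync_conns_in(size_t buf_len)
{
    assert(buf_len >= SYNC_HEADER_LEN);
    return (buf_len - SYNC_HEADER_LEN) / SYNC_CONN_LEN;
}
```

With these assumed sizes, a 1480-byte payload never reaches 255 entries, while an uncapped 9000-byte jumbo payload would hold more entries than an eight-bit counter can represent.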
2008-09-21tcp: advertise MSS requested by userTom Quetchenbach2-3/+14
I'm trying to use the TCP_MAXSEG option to setsockopt() to set the MSS for both sides of a bidirectional connection. man tcp says: "If this option is set before connection establishment, it also changes the MSS value announced to the other end in the initial packet." However, the kernel only uses the MTU/route cache to set the advertised MSS. That means if I set the MSS to, say, 500 before calling connect(), I will send at most 500-byte packets, but I will still receive 1500-byte packets in reply. This is a bug, either in the kernel or the documentation. This patch (applies to latest net-2.6) reduces the advertised value to that requested by the user as long as setsockopt() is called before connect() or accept(). This seems like the behavior that one would expect as well as that which is documented. I've tried to make sure that things that depend on the advertised MSS are set correctly. Signed-off-by: Tom Quetchenbach <[email protected]> Signed-off-by: David S. Miller <[email protected]>
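From userspace, the usage described above looks roughly like the sketch below. The helper name `request_mss` is made up for this example; `setsockopt()` with `IPPROTO_TCP` and `TCP_MAXSEG` is the standard sockets API. The call has to happen before `connect()` (or before `listen()`/`accept()` on the passive side) for the requested value to influence the MSS announced in the initial packet.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Request an MSS on a TCP socket that is not yet connected.
 * Returns 0 on success and -1 on error (errno is set by setsockopt()).
 */
static int request_mss(int fd, int mss)
{
    assert(mss > 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));
}
```

A caller would create the socket, call `request_mss(fd, 500)`, and only then call `connect()`.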
2008-09-20net: Use hton[sl]() instead of __constant_hton[sl]() where applicableArnaldo Carvalho de Melo1-2/+2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: back retransmit_high when it over-estimatedIlpo Järvinen1-2/+10
If a lost skb is sacked, we might have nothing to retransmit as high as retransmit_high is pointing, so place it lower to avoid unnecessary walking. This is mainly for the case where high L'ed skbs get sacked. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: don't clear lost_skb_hint when not necessaryIlpo Järvinen1-1/+13
Most importantly avoid doing it with cumulative ACK. However, since we also have lost_cnt_hint in the picture, which needs adjustments, it's not as trivial as dealing with retransmit_skb_hint (and cannot be done in all the places where we could trivially leave retransmit_skb_hint untouched). With the previous patch, this should mostly remove the O(n^2) behavior when cumulative ACKs start flowing once a rexmit after a lossy round-trip has made it through. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: don't clear retransmit_skb_hint when not necessaryIlpo Järvinen2-4/+8
Most importantly avoid doing it with cumulative ACK. Not clearing means that we no longer need n^2 processing in resolution of each fast recovery. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: remove retransmit_skb_hint clearing from failureIlpo Järvinen1-3/+1
This doesn't make much sense here afaict, and probably never has. Since fragmenting and collapsing deal with the hints by themselves, there should be very little reason for the rexmit loop to do that. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: reorganize retransmit code loopsIlpo Järvinen1-46/+33
Both loops are quite similar, so they can be combined with little effort. As a result, forward_skb_hint becomes obsolete as well. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: remove tp->lost_out guard to make joining diff nicerIlpo Järvinen1-37/+38
The validity of retransmit_high must then be ensured if no L'ed skb exists! This makes a minor change to behavior: we now have to iterate the head to find out that the loop terminates. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: Reorganize skb tagbit checksIlpo Järvinen1-19/+19
Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: remove obsolete validity concernIlpo Järvinen1-4/+0
Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: add tcp_can_forward_retransmitIlpo Järvinen1-18/+28
Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: No need to clear retransmit_skb_hint when SACKingIlpo Järvinen1-7/+0
Because the lost counter no longer requires tuning, this is trivial to remove (the tuning wouldn't have been too hard either): SACKing certainly cannot make a "new" retransmittable skb appear below retransmit_skb_hint. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: Kill precaution that's very likely obsoleteIlpo Järvinen1-4/+0
I suspect it might have been related to the changed amount of lost skbs, which was counted by retransmit_cnt_hint that got changed. The place for this clearing was very illogical anyway, it should have been after the LOST-bit clearing loop to make any sense. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: convert retransmit_cnt_hint to seqnoIlpo Järvinen2-32/+27
The main benefit is that we can then freely point retransmit_skb_hint anywhere we want, because there is no longer any need to know what count changes that would involve. Since this is really used only as a terminator, the unnecessary work is one walk at most, and if some retransmissions are necessary after that point later on, the walk is not a complete waste of time anyway. Since retransmit_high must be kept valid, all lost markers must ensure that. I have also now learned how those "holes" in the rexmittable skbs can appear: MTU probing creates them. So I removed the misleading comment as well. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
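The idea of using a sequence number as a loop terminator, instead of a count that has to be kept in step with the queue, can be sketched as follows. The `struct seg` record and `count_rexmit_candidates()` are hypothetical and are not taken from the patch. `seq_before()` uses the usual wraparound-safe signed-difference comparison for 32-bit sequence numbers, which relies on the two's-complement conversion that common compilers provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe "seq1 is before seq2" for 32-bit sequence numbers. */
static int seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* Hypothetical segment record: starting sequence number plus a lost flag. */
struct seg {
    uint32_t seq;
    int lost;
};

/* Count lost segments, stopping at the first one at or beyond 'high'. */
static size_t count_rexmit_candidates(const struct seg *q, size_t n,
                                      uint32_t high)
{
    size_t i, count = 0;

    assert(q != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!seq_before(q[i].seq, high))
            break;          /* terminator reached: stop walking */
        if (q[i].lost)
            count++;
    }
    return count;
}
```

Because the terminator is a position in sequence space rather than a count, it stays meaningful when entries before it change state. The worst case is one walk that finds nothing to do.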
2008-09-20tcp: add helper for lost bit togglingIlpo Järvinen1-10/+12
This is useful because we will soon need to verify in many places, which makes things slightly more complex than they used to be. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: move tcp_verify_retransmit_hintIlpo Järvinen1-13/+13
Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: Partial hint clearing has again become meaninglessIlpo Järvinen2-5/+4
I.e., the difference between partial and full clearing doesn't exist anymore since the SACK optimizations got dropped by a sacktag rewrite. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-17ipvs: change some __constant_htons() to htons()Brian Haley2-2/+2
Change __constant_htons() to htons() in the IPVS code when not in an initializer. -Brian Signed-off-by: Brian Haley <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-17ipvs: add __aquire/__release annotations to ip_vs_info_seq_start/ip_vs_info_seq_stopSimon Horman1-0/+2
This teaches sparse that the following are not problems: make C=1 CHECK net/ipv4/ipvs/ip_vs_ctl.c net/ipv4/ipvs/ip_vs_ctl.c:1793:14: warning: context imbalance in 'ip_vs_info_seq_start' - wrong count at exit net/ipv4/ipvs/ip_vs_ctl.c:1842:13: warning: context imbalance in 'ip_vs_info_seq_stop' - unexpected unlock Acked-by: Sven Wegener <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-17ipvs: supply a valid 0 address to ip_vs_conn_new()Simon Horman1-1/+2
ip_vs_conn_new expects a union nf_inet_addr as the type for its address parameters, not a plain integer. This problem was detected by sparse. make C=1 CHECK net/ipv4/ipvs/ip_vs_core.c net/ipv4/ipvs/ip_vs_core.c:469:9: warning: Using plain integer as NULL pointer Acked-by: Sven Wegener <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-17ipvs: only unlock in ip_vs_edit_service() if already lockedSimon Horman1-3/+4
Jumping to out unlocks __ip_vs_svc_lock, but that lock is not taken until after code that may jump to out. This problem was detected by sparse. make C=1 CHECK net/ipv4/ipvs/ip_vs_ctl.c net/ipv4/ipvs/ip_vs_ctl.c:1332:2: warning: context imbalance in 'ip_vs_edit_service' - unexpected unlock Acked-by: Sven Wegener <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
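The class of bug described here, and the usual fix of using separate exit labels, can be shown with a toy example. Everything in it (`toy_lock`, `toy_unlock`, `edit_entry`, the weight limits) is hypothetical. It only mirrors the shape of the fix: error paths taken before the lock is acquired must skip the unlock.

```c
#include <assert.h>
#include <errno.h>

static int lock_depth;   /* toy lock: tracks acquire/release balance */

static void toy_lock(void)
{
    lock_depth++;
}

static void toy_unlock(void)
{
    assert(lock_depth > 0);   /* catches an "unexpected unlock" */
    lock_depth--;
}

static int edit_entry(int weight)
{
    int ret = 0;

    if (weight < 0) {
        ret = -EINVAL;
        goto out;             /* lock not taken yet: skip the unlock */
    }

    toy_lock();
    if (weight > 100) {
        ret = -ERANGE;
        goto out_unlock;      /* lock is held: release it on the way out */
    }
    /* ... update the entry under the lock ... */

out_unlock:
    toy_unlock();
out:
    return ret;
}
```

Each path through `edit_entry()` leaves `lock_depth` at zero, which is the balance sparse checks for in the real code.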
2008-09-15udp: Fix rcv socket lockingHerbert Xu1-29/+33
The previous patch in response to the recursive locking on IPsec reception is broken as it tries to drop the BH socket lock while in user context. This patch fixes it by shrinking the section protected by the socket lock to sock_queue_rcv_skb only. The only reason we added the lock is for the accounting which happens in that function. Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-12net: ip_vs_proto_{tcp,udp} build fixStephen Rothwell2-0/+2
Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into lvs-next-2.6Simon Horman3-3/+48
2008-09-09This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp"Gerrit Renker1-2/+15
as it accidentally contained the wrong set of patches. These will be submitted separately. Signed-off-by: Gerrit Renker <[email protected]>
2008-09-08Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_expDavid S. Miller1-15/+2
Conflicts: net/dccp/input.c net/dccp/options.c
2008-09-08Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6David S. Miller3-3/+48
Conflicts: net/mac80211/mlme.c
2008-09-09ipvs: Embed user stats structure into kernel stats structureSven Wegener3-67/+56
Instead of duplicating the fields, integrate a user stats structure into the kernel stats structure. This is more robust when the members are changed, because they are now automatically kept in sync. Signed-off-by: Sven Wegener <[email protected]> Reviewed-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
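A minimal sketch of the embedding approach follows, with made-up field names (`conns`, `inbytes`, `last_conns`) rather than the real IPVS structures. The kernel-side structure contains the user-visible structure as a member, so exporting the counters is a single copy and the two layouts cannot drift apart.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-visible counters (the part exported to userspace). */
struct user_stats {
    uint32_t conns;
    uint64_t inbytes;
};

/* Kernel-side stats embed the user struct instead of repeating its fields. */
struct kernel_stats {
    struct user_stats ustats;  /* shared layout, always in sync */
    uint32_t last_conns;       /* internal-only bookkeeping */
};

static void export_stats(const struct kernel_stats *ks, struct user_stats *dst)
{
    assert(ks != NULL && dst != NULL);
    memcpy(dst, &ks->ustats, sizeof(*dst));
}
```

Adding a field to `struct user_stats` automatically makes it available on the kernel side, with no second definition to update.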
2008-09-09ipvs: Restrict connection table size via KconfigSven Wegener1-1/+2
Instead of checking the value in include/net/ip_vs.h, we can just restrict the range in our Kconfig file. This will prevent values outside of the range early. Signed-off-by: Sven Wegener <[email protected]> Reviewed-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-09IPVS: Remove incorrect ip_route_me_harder(), fix IPv6Julius Volz1-9/+0
Remove an incorrect ip_route_me_harder() that was probably a result of merging my IPv6 patches with the local client patches. With this, IPv6+NAT are working again. Signed-off-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-09ipvs: handle PARTIAL_CHECKSUMSimon Horman2-4/+70
Now that LVS can load balance locally generated traffic, packets may come from the loopback device and thus may have a partial checksum. The existing code allows for the case where there is no checksum at all for TCP, however Herbert Xu has confirmed that this is not legal. Signed-off-by: Simon Horman <[email protected]> Acked-by: Julius Volz <[email protected]>
2008-09-08netns : fix kernel panic in timewait socket destructionDaniel Lezcano2-0/+36
How to reproduce?

- create a network namespace
- use tcp protocol and get timewait socket
- exit the network namespace
- after a moment (when the timewait socket is destroyed), the kernel panics.

# BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
PGD 119985067 PUD 11c5c0067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3
RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30
RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00
RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200
R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8800bff9e000, task ffff88011ff76690)
Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a 0000000000000008
 0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7
 ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108
Call Trace:
 <IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e
 [<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e
 [<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193
 [<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd
 [<ffffffff8200d08c>] ? call_softirq+0x1c/0x28
 [<ffffffff8200e611>] ? do_softirq+0x2c/0x68
 [<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9
 [<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70
 <EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b
 [<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d
Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7 65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0 48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00
RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
 RSP <ffff88011ff7fed0>
CR2: 0000000000000007

This patch provides a function to purge all timewait sockets related to a network namespace. The life cycle of the timewait sockets is not tied to the network namespace, which means the timewait sockets stay alive while the network namespace dies. The timewait sockets exist to avoid receiving a duplicate packet from the network; if the network namespace is freed, the network stack is removed, so there is no chance of receiving any packets from the outside world. Furthermore, having a pending destruction timer on these sockets after the network namespace has been freed is not safe and will lead to an oops if the timer callback tries to access data belonging to the namespace, as for example in:

inet_twdr_do_twkill_work -> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);

Purging the timewait sockets at network namespace destruction will:
1) speed up memory freeing for the namespace
2) fix the kernel panic on asynchronous timewait destruction

Signed-off-by: Daniel Lezcano <[email protected]> Acked-by: Denis V. Lunev <[email protected]> Acked-by: Eric W. Biederman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-08IPVS: use ipv6_addr_copy()Simon Horman1-2/+2
It is standard to use ipv6_addr_copy() to fill in the in6 element of a union nf_inet_addr snet. Thanks to Julius Volz for pointing this out. Cc: Brian Haley <[email protected]> Signed-off-by: Simon Horman <[email protected]> Acked-by: Julius Volz <[email protected]>
2008-09-08IPVS: fix bogus indentationSimon Horman1-1/+1
Sorry, this was my error. Thanks to Julius Volz for pointing it out. Signed-off-by: Simon Horman <[email protected]> Acked-by: Julius Volz <[email protected]>
2008-09-08ipvs: Reject ipv6 link-local addresses for destinationsSven Wegener1-1/+2
We can't use non-local link-local addresses for destinations, without knowing the interface on which we can reach the address. Reject them for now. Signed-off-by: Sven Wegener <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
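For illustration, a userspace check for link-local IPv6 addresses (fe80::/10) can be written with the standard `IN6_IS_ADDR_LINKLOCAL` macro. The helper name `is_link_local_v6` is made up for this sketch, and it covers only the "is link-local" part of the test. Whether the address belongs to a local interface, which the commit message also takes into account, is not checked here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

/*
 * Returns 1 if 'text' parses as an IPv6 link-local address,
 * 0 if it is some other IPv6 address, and -1 if it does not parse.
 */
static int is_link_local_v6(const char *text)
{
    struct in6_addr addr;

    assert(text != NULL);
    if (inet_pton(AF_INET6, text, &addr) != 1)
        return -1;
    return IN6_IS_ADDR_LINKLOCAL(&addr) ? 1 : 0;
}
```

A destination such as `fe80::1` is ambiguous without an interface (scope) to reach it on, which is why rejecting it is the conservative choice.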
2008-09-08ipvs: Mark tcp/udp v4 and v6 debug functions staticSven Wegener1-2/+2
They are only used in this file, so they should be static. Signed-off-by: Sven Wegener <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-08ipvs: Return negative error values from ip_vs_edit_service()Sven Wegener1-2/+2
Like the other code in this function does. Signed-off-by: Sven Wegener <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-08ipvs: Use pointer to address from sync messageSven Wegener1-3/+3
We want a pointer to it, not the value cast to a pointer. Signed-off-by: Sven Wegener <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
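The distinction between "a pointer to the address" and "the address value cast to a pointer" is easy to get wrong, so here is a small hypothetical example. `struct msg`, `read_addr` and `msg_addr` are illustrative names, not the IPVS sync code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout with an embedded IPv4 address. */
struct msg {
    uint16_t port;
    uint32_t addr;
};

/* Expects a pointer to the address bytes and returns the value stored there. */
static uint32_t read_addr(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}

static uint32_t msg_addr(const struct msg *m)
{
    /*
     * Wrong: read_addr((const void *)(uintptr_t)m->addr) would treat the
     * address value itself as a memory location and read from there.
     * Right: pass the location of the field.
     */
    return read_addr(&m->addr);
}
```

Both forms compile once a cast is added, which is why this kind of mistake tends to be found by review or static checkers rather than by the compiler.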
2008-09-05ipvs: load balance ipv6 connections from a local processSimon Horman1-50/+41
This allows IPVS to load balance IPv6 connections made by a local process. For example a proxy server running locally. External client --> pound:443 -> Local:443 --> IPVS:80 --> RealServer This is an extension to the IPv4 work done in this area by Siim Põder and Malcolm Turnbull. Cc: Siim Põder <[email protected]> Cc: Malcolm Turnbull <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-05ipvs: load balance IPv4 connections from a local processMalcolm Turnbull2-94/+134
This allows IPVS to load balance connections made by a local process. For example a proxy server running locally. External client --> pound:443 -> Local:443 --> IPVS:80 --> RealServer Signed-off-by: Siim Põder <[email protected]> Signed-off-by: Malcolm Turnbull <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-05IPVS: Allow adding IPv6 services from userspaceJulius Volz1-5/+48
Allow adding IPv6 services through the genetlink interface and add checks to see if the chosen scheduler is supported with IPv6 and whether the supplied prefix length is sane. Make sure the service count exported via the sockopt interface only counts IPv4 services. Signed-off-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-05IPVS: Activate IPv6 Netfilter hooksJulius Volz1-0/+37
Register the previously defined or adapted netfilter hook functions for IPv6 as PF_INET6 hooks. Signed-off-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-05IPVS: Adjust various debug outputs to use new macrosJulius Volz4-63/+78
Adjust various debug outputs to use the new *_BUF macro variants for correct output of v4/v6 addresses. Signed-off-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-05IPVS: Add function to determine if IPv6 address is localVince Busam1-7/+49
Add __ip_vs_addr_is_local_v6() to find out if an IPv6 address belongs to a local interface. Use this function to decide whether to set the IP_VS_CONN_F_LOCALNODE flag for IPv6 destinations. Signed-off-by: Vince Busam <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-05IPVS: Turn off FTP application helper for IPv6Julius Volz1-0/+16
Immediately return from FTP application helper and do nothing when dealing with IPv6 packets. IPv6 is not supported by this helper yet. Signed-off-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-05IPVS: Disable sync daemon for IPv6 connectionsJulius Volz1-1/+2
Disable the sync daemon for IPv6 connections; it works only with IPv4 for now. Signed-off-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>