aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)AuthorFilesLines
2008-10-07tcp: Fix possible double-ack w/ user dmaAli Saidi1-1/+2
From: Ali Saidi <[email protected]> When TCP receive copy offload is enabled it's possible that tcp_rcv_established() will cause two acks to be sent for a single packet. In the case that a tcp_dma_early_copy() is successful, copied_early is set to true which causes tcp_cleanup_rbuf() to be called early which can send an ack. Further along in tcp_rcv_established(), __tcp_ack_snd_check() is called and will schedule a delayed ACK. If no packets are processed before the delayed ack timer expires the packet will be acked twice. Signed-off-by: David S. Miller <[email protected]>
2008-10-07netns: make udpv6 mib per/namespaceDenis V. Lunev1-3/+0
Signed-off-by: Denis V. Lunev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-07tcp: cleanup messy initializerIlpo Järvinen1-2/+2
I'm quite sure that if I give this function in its old format for you to inspect, you start to wonder what is the type of demanded or if it's a global variable. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-07tcp: kill pointless urg_modeIlpo Järvinen4-14/+20
It all started from me noticing that this urgent check in tcp_clean_rtx_queue is unnecessarily inside the loop. Then I took a longer look to it and found out that the users of urg_mode can trivially do without, well almost, there was one gotcha. Bonus: those funny people who use urg with >= 2^31 write_seq - snd_una could now rejoice too (that's the only purpose for the between being there, otherwise a simple compare would have done the thing). Not that I assume that the rest of the tcp code happily lives with such mind-boggling numbers :-). Alas, it turned out to be impossible to set wmem to such numbers anyway, yes I really tried a big sendfile after setting some wmem but nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf... So I hacked a bit variable to long and found out that it seems to work... :-) Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-07net: wrap sk->sk_backlog_rcv()Peter Zijlstra2-2/+2
Wrap calling sk->sk_backlog_rcv() in a function. This will allow extending the generic sk_backlog_rcv behaviour. Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-07inet: Don't lookup the socket if there's a socket attached to the skbKOVACS Krisztian1-3/+7
Use the socket cached in the skb if it's present. Signed-off-by: KOVACS Krisztian <[email protected]> Acked-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-07inet: Add udplib_lookup_skb() helpersKOVACS Krisztian1-2/+12
To be able to use the cached socket reference in the skb during input processing we add a new set of lookup functions that receive the skb on their argument list. Signed-off-by: KOVACS Krisztian <[email protected]> Acked-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-07inet_hashtables: Add inet_lookup_skb helpersArnaldo Carvalho de Melo1-2/+1
To be able to use the cached socket reference in the skb during input processing we add a new set of lookup functions that receive the skb on their argument list. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: KOVACS Krisztian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into ↵Simon Horman14-51/+106
lvs-next-2.6
2008-10-07IPVS: Move IPVS to net/netfilter/ipvsJulius Volz26-13979/+0
Since IPVS now has partial IPv6 support, this patch moves IPVS from net/ipv4/ipvs to net/netfilter/ipvs. It's a result of: $ git mv net/ipv4/ipvs net/netfilter and adapting the relevant Kconfigs/Makefiles to the new path. Signed-off-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-10-06tcp: Respect SO_RCVLOWAT in tcp_poll().David S. Miller1-4/+8
Based upon a report by Vito Caputo. Signed-off-by: David S. Miller <[email protected]>
2008-10-01udp: Export UDP socket lookup functionKOVACS Krisztian1-0/+7
The iptables tproxy code has to be able to do UDP socket hash lookups, so we have to provide an exported lookup function for this purpose. Signed-off-by: KOVACS Krisztian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-01tcp: Port redirection support for TCPKOVACS Krisztian3-1/+4
Current TCP code relies on the local port of the listening socket being the same as the destination address of the incoming connection. Port redirection used by many transparent proxying techniques obviously breaks this, so we have to store the original destination port address. This patch extends struct inet_request_sock and stores the incoming destination port value there. It also modifies the handshake code to use that value as the source port when sending reply packets. Signed-off-by: KOVACS Krisztian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-01ipv4: Make Netfilter's ip_route_me_harder() non-local address compatibleKOVACS Krisztian4-1/+9
Netfilter's ip_route_me_harder() tries to re-route packets either generated or re-routed by Netfilter. This patch changes ip_route_me_harder() to handle packets from non-locally-bound sockets with IP_TRANSPARENT set as local and to set the appropriate flowi flags when re-doing the routing lookup. Signed-off-by: KOVACS Krisztian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-01tcp: Handle TCP SYN+ACK/ACK/RST transparencyKOVACS Krisztian1-3/+9
The TCP stack sends out SYN+ACK/ACK/RST reply packets in response to incoming packets. The non-local source address check on output bites us again, as replies for transparently redirected traffic won't have a chance to leave the node. This patch selectively sets the FLOWI_FLAG_ANYSRC flag when doing the route lookup for those replies. Transparent replies are enabled if the listening socket has the transparent socket flag set. Signed-off-by: KOVACS Krisztian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-01ipv4: Make inet_sock.h independent of route.hKOVACS Krisztian1-0/+1
inet_iif() in inet_sock.h requires route.h. Since users of inet_iif() usually require other route.h functionality anyway this patch moves inet_iif() to route.h. Signed-off-by: KOVACS Krisztian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-01ipv4: Allow binding to non-local addresses if IP_TRANSPARENT is setTóth László Attila1-1/+1
Setting IP_TRANSPARENT is not really useful without allowing non-local binds for the socket. To make user-space code simpler we allow these binds even if IP_TRANSPARENT is set but IP_FREEBIND is not. Signed-off-by: Tóth László Attila <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-01ipv4: Implement IP_TRANSPARENT socket optionKOVACS Krisztian2-1/+15
This patch introduces the IP_TRANSPARENT socket option: enabling that will make the IPv4 routing omit the non-local source address check on output. Setting IP_TRANSPARENT requires NET_ADMIN capability. Signed-off-by: KOVACS Krisztian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-01ipv4: Loosen source address check on IPv4 outputJulian Anastasov1-7/+13
ip_route_output() contains a check to make sure that no flows with non-local source IP addresses are routed. This obviously makes using such addresses impossible. This patch introduces a flowi flag which makes omitting this check possible. The new flag provides a way of handling transparent and non-transparent connections differently. Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: KOVACS Krisztian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-01Merge branch 'master' of ↵David S. Miller2-30/+34
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath9k/core.c drivers/net/wireless/ath9k/main.c net/core/dev.c
2008-10-01tcp: Fix NULL dereference in tcp_4_send_ack()Vitaliy Gusev1-1/+1
Fix NULL dereference in tcp_4_send_ack(). As skb->dev is reset to NULL in tcp_v4_rcv() thus OOPS occurs: BUG: unable to handle kernel NULL pointer dereference at 00000000000004d0 IP: [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250 Stack: ffff810005dbb000 ffff810015c8acc0 e77b2c6e5f861600 a01610802e90cb6d 0a08010100000000 88afffff88afffff 0000000080762be8 0000000115c872e8 0004122000000000 0000000000000001 ffffffff80762b88 0000000000000020 Call Trace: <IRQ> [<ffffffff80499c33>] tcp_v4_reqsk_send_ack+0x20/0x22 [<ffffffff8049bce5>] tcp_check_req+0x108/0x14c [<ffffffff8047aaf7>] ? rt_intern_hash+0x322/0x33c [<ffffffff80499846>] tcp_v4_do_rcv+0x399/0x4ec [<ffffffff8045ce4b>] ? skb_checksum+0x4f/0x272 [<ffffffff80485b74>] ? __inet_lookup_listener+0x14a/0x15c [<ffffffff8049babc>] tcp_v4_rcv+0x6a1/0x701 [<ffffffff8047e739>] ip_local_deliver_finish+0x157/0x24a [<ffffffff8047ec9a>] ip_local_deliver+0x72/0x7c [<ffffffff8047e5bd>] ip_rcv_finish+0x38d/0x3b2 [<ffffffff803d3548>] ? scsi_io_completion+0x19d/0x39e [<ffffffff8047ebe5>] ip_rcv+0x2a2/0x2e5 [<ffffffff80462faa>] netif_receive_skb+0x293/0x303 [<ffffffff80465a9b>] process_backlog+0x80/0xd0 [<ffffffff802630b4>] ? __rcu_process_callbacks+0x125/0x1b4 [<ffffffff8046560e>] net_rx_action+0xb9/0x17f [<ffffffff80234cc5>] __do_softirq+0xa3/0x164 [<ffffffff8020c52c>] call_softirq+0x1c/0x28 <EOI> [<ffffffff8020de1c>] do_softirq+0x34/0x72 [<ffffffff80234b8e>] local_bh_enable_ip+0x3f/0x50 [<ffffffff804d43ca>] _spin_unlock_bh+0x12/0x14 [<ffffffff804599cd>] release_sock+0xb8/0xc1 [<ffffffff804a6f9a>] inet_stream_connect+0x146/0x25c [<ffffffff80243078>] ? autoremove_wake_function+0x0/0x38 [<ffffffff8045751f>] sys_connect+0x68/0x8e [<ffffffff80291818>] ? fd_install+0x5f/0x68 [<ffffffff80457784>] ? sock_map_fd+0x55/0x62 [<ffffffff8020b39b>] system_call_after_swapgs+0x7b/0x80 Code: 41 10 11 d0 83 d0 00 4d 85 ed 89 45 c0 c7 45 c4 08 00 00 00 74 07 41 8b 45 04 89 45 c8 48 8b 43 20 8b 4d b8 48 8d 55 b0 48 89 de <48> 8b 80 d0 04 00 00 48 8b b8 60 01 00 00 e8 20 ae fe ff 65 48 RIP [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250 RSP <ffffffff80762b78> CR2: 00000000000004d0 Signed-off-by: Vitaliy Gusev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-23tcp: Fix queue traversal in tcp_use_frto().David S. Miller1-0/+2
We must check tcp_skb_is_last() before doing a tcp_write_queue_next(). Signed-off-by: David S. Miller <[email protected]>
2008-09-23tcp: Fix order of tests in tcp_retransmit_skb()David S. Miller1-1/+1
tcp_write_queue_next() must only be made if we know that tcp_skb_is_last() evaluates to false. Signed-off-by: David S. Miller <[email protected]>
2008-09-21net: Remove __skb_insert() calls outside of skbuff internals.David S. Miller1-2/+2
This minor cleanup simplifies later changes which will convert struct sk_buff and friends over to using struct list_head. Signed-off-by: David S. Miller <[email protected]>
2008-09-22ipvs: Fix unused label warningSven Wegener1-0/+2
Signed-off-by: Sven Wegener <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-22ipvs: Restrict sync message to 255 connectionsSven Wegener1-2/+4
The nr_conns variable in the sync message header is only eight bits wide and will overflow on interfaces with a large MTU. As a result the backup won't parse all connections contained in the sync buffer. On regular ethernet with an MTU of 1500 this isn't a problem, because we can't overflow the value, but consider jumbo frames being used on a cross-over connection between both directors. We now restrict the size of the sync buffer, so that we never put more than 255 connections into a single sync buffer. Signed-off-by: Sven Wegener <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-21tcp: advertise MSS requested by userTom Quetchenbach2-3/+14
I'm trying to use the TCP_MAXSEG option to setsockopt() to set the MSS for both sides of a bidirectional connection. man tcp says: "If this option is set before connection establishment, it also changes the MSS value announced to the other end in the initial packet." However, the kernel only uses the MTU/route cache to set the advertised MSS. That means if I set the MSS to, say, 500 before calling connect(), I will send at most 500-byte packets, but I will still receive 1500-byte packets in reply. This is a bug, either in the kernel or the documentation. This patch (applies to latest net-2.6) reduces the advertised value to that requested by the user as long as setsockopt() is called before connect() or accept(). This seems like the behavior that one would expect as well as that which is documented. I've tried to make sure that things that depend on the advertised MSS are set correctly. Signed-off-by: Tom Quetchenbach <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20net: Use hton[sl]() instead of __constant_hton[sl]() where applicableArnaldo Carvalho de Melo1-2/+2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: back retransmit_high when it over-estimatedIlpo Järvinen1-2/+10
If lost skb is sacked, we might have nothing to retransmit as high as the retransmit_high is pointing to, so place it lower to avoid unnecessary walking. This is mainly for the case where high L'ed skbs gets sacked. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: don't clear lost_skb_hint when not necessaryIlpo Järvinen1-1/+13
Most importantly avoid doing it with cumulative ACK. However, since we have lost_cnt_hint in the picture as well needing adjustments, it's not as trivial as dealing with retransmit_skb_hint (and cannot be done in the all place we could trivially leave retransmit_skb_hint untouched). With the previous patch, this should mostly remove O(n^2) behavior while cumulative ACKs start flowing once rexmit after a lossy round-trip made it through. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: don't clear retransmit_skb_hint when not necessaryIlpo Järvinen2-4/+8
Most importantly avoid doing it with cumulative ACK. Not clearing means that we no longer need n^2 processing in resolution of each fast recovery. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: remove retransmit_skb_hint clearing from failureIlpo Järvinen1-3/+1
This doesn't much sense here afaict, probably never has. Since fragmenting and collapsing deal the hints by themselves, there should be very little reason for the rexmit loop to do that. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: reorganize retransmit code loopsIlpo Järvinen1-46/+33
Both loops are quite similar, so they can be combined with little effort. As a result, forward_skb_hint becomes obsolete as well. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: remove tp->lost_out guard to make joining diff nicerIlpo Järvinen1-37/+38
The validity of the retransmit_high must then be ensured if no L'ed skb exits! This makes a minor change to behavior, we now have to iterate the head to find out that the loop terminates. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: Reorganize skb tagbit checksIlpo Järvinen1-19/+19
Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: remove obsolete validity concernIlpo Järvinen1-4/+0
Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: add tcp_can_forward_retransmitIlpo Järvinen1-18/+28
Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: No need to clear retransmit_skb_hint when SACKingIlpo Järvinen1-7/+0
Because lost counter no longer requires tuning, this is trivial to remove (the tuning wouldn't have been too hard either) because no "new" retransmittable skb appeared below retransmit_skb_hint when SACKing for sure. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: Kill precaution that's very likely obsoleteIlpo Järvinen1-4/+0
I suspect it might have been related to the changed amount of lost skbs, which was counted by retransmit_cnt_hint that got changed. The place for this clearing was very illogical anyway, it should have been after the LOST-bit clearing loop to make any sense. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: convert retransmit_cnt_hint to seqnoIlpo Järvinen2-32/+27
Main benefit in this is that we can then freely point the retransmit_skb_hint to anywhere we want to because there's no longer need to know what would be the count changes involve, and since this is really used only as a terminator, unnecessary work is one time walk at most, and if some retransmissions are necessary after that point later on, the walk is not full waste of time anyway. Since retransmit_high must be kept valid, all lost markers must ensure that. Now I also have learned how those "holes" in the rexmittable skbs can appear, mtu probe does them. So I removed the misleading comment as well. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: add helper for lost bit togglingIlpo Järvinen1-10/+12
This useful because we'd need to verifying soon in many places which makes things slightly more complex than it used to be. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: move tcp_verify_retransmit_hintIlpo Järvinen1-13/+13
Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20tcp: Partial hint clearing has again become meaninglessIlpo Järvinen2-5/+4
Ie., the difference between partial and all clearing doesn't exists anymore since the SACK optimizations got dropped by an sacktag rewrite. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-17ipvs: change some __constant_htons() to htons()Brian Haley2-2/+2
Change __contant_htons() to htons() in the IPVS code when not in an initializer. -Brian Signed-off-by: Brian Haley <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-17ipvs: add __aquire/__release annotations to ↵Simon Horman1-0/+2
ip_vs_info_seq_start/ip_vs_info_seq_stop This teaches sparse that the following are not problems: make C=1 CHECK net/ipv4/ipvs/ip_vs_ctl.c net/ipv4/ipvs/ip_vs_ctl.c:1793:14: warning: context imbalance in 'ip_vs_info_seq_start' - wrong count at exit net/ipv4/ipvs/ip_vs_ctl.c:1842:13: warning: context imbalance in 'ip_vs_info_seq_stop' - unexpected unlock Acked-by: Sven Wegener <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-17ipvs: supply a valid 0 address to ip_vs_conn_new()Simon Horman1-1/+2
ip_vs_conn_new expects a union nf_inet_addr as the type for its address parameters, not a plain integer. This problem was detected by sparse. make C=1 CHECK net/ipv4/ipvs/ip_vs_core.c net/ipv4/ipvs/ip_vs_core.c:469:9: warning: Using plain integer as NULL pointer Acked-by: Sven Wegener <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-17ipvs: only unlock in ip_vs_edit_service() if already lockedSimon Horman1-3/+4
Jumping to out unlocks __ip_vs_svc_lock, but that lock is not taken until after code that may jump to out. This problem was detected by sparse. make C=1 CHECK net/ipv4/ipvs/ip_vs_ctl.c net/ipv4/ipvs/ip_vs_ctl.c:1332:2: warning: context imbalance in 'ip_vs_edit_service' - unexpected unlock Acked-by: Sven Wegener <[email protected]> Acked-by: Julius Volz <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2008-09-15udp: Fix rcv socket lockingHerbert Xu1-29/+33
The previous patch in response to the recursive locking on IPsec reception is broken as it tries to drop the BH socket lock while in user context. This patch fixes it by shrinking the section protected by the socket lock to sock_queue_rcv_skb only. The only reason we added the lock is for the accounting which happens in that function. Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-12net: ip_vs_proto_{tcp,udp} build fixStephen Rothwell2-0/+2
Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into ↵Simon Horman3-3/+48
lvs-next-2.6