path: root/net
Age | Commit message | Author | Files | Lines
2011-03-15 | ipvs: rename estimator functions | Julian Anastasov | 2 | -8/+8
Rename ip_vs_new_estimator to ip_vs_start_estimator and ip_vs_kill_estimator to ip_vs_stop_estimator to better match their logic. Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2011-03-15 | ipvs: optimize rates reading | Julian Anastasov | 2 | -39/+25
Move the estimator reading from estimation_timer to user context. ip_vs_read_estimator() will be used to decode the rate values. As the decoded rates are not set by the estimation timer, there is no need to reset them in ip_vs_zero_stats. There is no need for ip_vs_new_estimator() to encode stats to rates: if the destination is in trash, both the stats and the rates are inactive. Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2011-03-15 | ipvs: properly zero stats and rates | Julian Anastasov | 2 | -43/+68
Currently, the new percpu counters are not zeroed and the zero commands do not work as expected, we still show the old sum of percpu values. OTOH, we can not reset the percpu counters from user context without causing the incrementing to use old and bogus values. So, as Eric Dumazet suggested fix that by moving all overhead to stats reading in user context. Do not introduce overhead in timer context (estimator) and incrementing (packet handling in softirqs). The new ustats0 field holds the zero point for all counter values, the rates always use 0 as base value as before. When showing the values to user space just give the difference between counters and the base values. The only drawback is that percpu stats are not zeroed, they are accessible only from /proc and are new interface, so it should not be a compatibility problem as long as the sum stats are correct after zeroing. Signed-off-by: Julian Anastasov <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: Simon Horman <[email protected]>
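The zero-point idea in this commit message can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: the names `demo_stats`, `demo_stats_inc`, `demo_stats_zero` and `demo_stats_read` are hypothetical, and a single counter stands in for the full set of stats.

```c
#include <stdint.h>

/* Hypothetical, simplified model: one running counter plus its zero point. */
struct demo_stats {
    uint64_t inpkts;   /* running sum, only ever incremented */
    uint64_t inpkts0;  /* zero point recorded by the last "zero" command */
};

/* Packet path: increment only, never reset. */
static void demo_stats_inc(struct demo_stats *s, uint64_t n)
{
    s->inpkts += n;
}

/* "Zero" command: remember the current value instead of clearing it. */
static void demo_stats_zero(struct demo_stats *s)
{
    s->inpkts0 = s->inpkts;
}

/* Reader: report the difference between the counter and its zero point. */
static uint64_t demo_stats_read(const struct demo_stats *s)
{
    return s->inpkts - s->inpkts0;
}
```

All the extra work lands in the zero and read paths, which run in user context; the increment path stays a plain addition.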
2011-03-15 | ipvs: reorganize tot_stats | Julian Anastasov | 3 | -26/+28
The global tot_stats contains cpustats field just like the stats for dest and svc, so better use it to simplify the usage in estimation_timer. As tot_stats is registered as estimator we can remove the special ip_vs_read_cpu_stats call for tot_stats. Fix ip_vs_read_cpu_stats to be called under stats lock because it is still used as synchronization between estimation timer and user context (the stats readers). Also, make sure ip_vs_stats_percpu_show reads properly the u64 stats from user context. Signed-off-by: Julian Anastasov <[email protected]> Eric Dumazet <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2011-03-15 | netfilter:ipvs: use kmemdup | Shan Wei | 2 | -7/+5
The semantic patch that makes this output is available in scripts/coccinelle/api/memdup.cocci. More information about semantic patching is available at http://coccinelle.lip6.fr/ Signed-off-by: Shan Wei <[email protected]> Signed-off-by: Simon Horman <[email protected]>
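For readers unfamiliar with the pattern, the transformation replaces an allocate-then-copy pair with a single duplicate call. The sketch below is a user-space stand-in, assuming `malloc`/`memcpy` in place of the kernel allocator; `demo_memdup` is a hypothetical name and is not the kernel's `kmemdup()`.

```c
#include <stdlib.h>
#include <string.h>

/*
 * User-space stand-in for the allocate-and-copy idiom that kmemdup()
 * folds into one call. Returns NULL on allocation failure; the caller
 * owns the returned buffer and must free() it.
 */
static void *demo_memdup(const void *src, size_t len)
{
    void *p = malloc(len);

    if (p)
        memcpy(p, src, len);
    return p;
}
```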
2011-03-15 | ipvs: remove _bh from percpu stats reading | Julian Anastasov | 1 | -4/+4
ip_vs_read_cpu_stats is called only from timer, so no need for _bh locks. Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Hans Schillstrom <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2011-03-15 | ipvs: avoid lookup for fwmark 0 | Julian Anastasov | 1 | -3/+5
Restore the previous behaviour to lookup for fwmark service only when fwmark is non-null. This saves only CPU. Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Hans Schillstrom <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2011-03-14 | net: dcbnl: Update copyright dates | Mark Rustad | 1 | -1/+1
Signed-off-by: Mark Rustad <[email protected]> Signed-off-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-14 | tcp_cubic: fix low utilization of CUBIC with HyStart | Sangtae Ha | 1 | -0/+7
HyStart sets the initial exit point of slow start. Suppose that HyStart exits at 0.5BDP in a BDP network and no history exists. If the BDP of a network is large, CUBIC's initial cwnd growth may be too conservative to utilize the link. CUBIC increases the cwnd 20% per RTT in this case. Signed-off-by: Sangtae Ha <[email protected]> Acked-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-14 | tcp_cubic: make the delay threshold of HyStart less sensitive | Sangtae Ha | 1 | -1/+1
Make HyStart less sensitive to abrupt delay variations due to buffer bloat. Signed-off-by: Sangtae Ha <[email protected]> Acked-by: Stephen Hemminger <[email protected]> Reported-by: Lucas Nussbaum <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-14 | tcp_cubic: enable high resolution ack time if needed | stephen hemminger | 1 | -0/+4
This is a refined version of an earlier patch by Lucas Nussbaum. Cubic needs RTT values in milliseconds. If HZ < 1000 then the values will be too coarse. Signed-off-by: Stephen Hemminger <[email protected]> Reported-by: Lucas Nussbaum <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-14 | tcp_cubic: fix clock dependency | stephen hemminger | 1 | -12/+19
The hystart code was written with assumption that HZ=1000. Replace the use of jiffies with bictcp_clock as a millisecond real time clock. Signed-off-by: Stephen Hemminger <[email protected]> Reported-by: Lucas Nussbaum <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-14 | tcp_cubic: make ack train delta value a parameter | stephen hemminger | 1 | -1/+4
Make the spacing between ACKs that indicates a train a tunable value like other hystart values. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-14 | tcp_cubic: fix comparison of jiffies | stephen hemminger | 1 | -2/+4
Jiffies wraps around, so the correct way to compare is to cast the difference to a signed value. Note: cubic is not using the full jiffies value on 64-bit architectures because using a full unsigned long makes struct bictcp grow too large for the available ca_priv area. Includes a correction from Sangtae Ha to improve ack train detection. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
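The signed-cast comparison can be illustrated with a small stand-alone sketch. It assumes 32-bit timestamps and a hypothetical helper name `demo_time_after32`; it is in the spirit of the kernel's wraparound-safe comparisons but is not copied from the patch.

```c
#include <stdint.h>

/*
 * Returns nonzero if 32-bit timestamp a is later than b, even when the
 * counter has wrapped between the two samples. The unsigned subtraction
 * wraps modulo 2^32; reinterpreting the result as signed gives the
 * correct ordering as long as the two values are less than 2^31 ticks
 * apart. (The unsigned-to-signed conversion is two's complement on all
 * mainstream compilers.)
 */
static int demo_time_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}
```

A plain `a > b` would report that timestamp 3 is earlier than 0xFFFFFFF0, although 3 was sampled just after the counter wrapped.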
2011-03-14 | tcp: fix RTT for quick packets in congestion control | stephen hemminger | 1 | -1/+1
In the congestion control interface, the callback for each ACK includes an estimated round trip time in microseconds. Some algorithms need high resolution (Vegas style) but most only need jiffie resolution. If the RTT is not accurate (as with a retransmission), -1 is used as a flag value. With coarse resolution, if the RTT is less than a jiffie then 0 should be returned rather than no estimate. Otherwise algorithms that expect good ACKs to trigger slow start (like CUBIC HyStart) will be confused. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
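A minimal sketch of the distinction between "RTT of zero ticks" and "no estimate", under stated assumptions: `DEMO_HZ` and `demo_coarse_rtt_us` are hypothetical names, and the tick rate of 100 is chosen only for the example. It is not the kernel's code path.

```c
#include <stdint.h>

#define DEMO_HZ 100  /* assumed tick rate for this example only */

/*
 * Convert a coarse RTT sample measured in ticks to microseconds.
 * A negative input means "no valid sample" (for example a retransmitted
 * segment) and is passed through as the flag value -1.
 * A sample of 0 ticks is a real, very short RTT and must yield 0,
 * not -1; treating it as "no estimate" is the bug described above.
 */
static int32_t demo_coarse_rtt_us(long seq_rtt_ticks)
{
    if (seq_rtt_ticks < 0)
        return -1;
    return (int32_t)(seq_rtt_ticks * (1000000 / DEMO_HZ));
}
```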
2011-03-14 | af_unix: update locking comment | Daniel Baluta | 1 | -1/+1
We latch our state using a spinlock not a r/w kind of lock. Signed-off-by: Daniel Baluta <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-14 | bridge: skip forwarding delay if not using STP | stephen hemminger | 1 | -2/+2
If Spanning Tree Protocol is not enabled, there is no good reason for the bridge code to wait for the forwarding delay period before enabling the link. The purpose of the forwarding delay is to allow STP to learn about other bridges before nominating itself. The only possible impact is that when starting up a new port the bridge may flood a packet now, where previously it might have seen traffic from the other host and preseeded the forwarding table. Includes change for local variable br already available in that func. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-14 | bridge: control carrier based on ports online | stephen hemminger | 3 | -13/+27
This makes the bridge device behave like a physical device. In earlier releases the bridge always asserted carrier. This changes the behavior so that bridge device carrier is on only if one or more ports are in the forwarding state. This should help IPv6 autoconfiguration, DHCP, and routing daemons. I did brief testing with Network and Virt manager and they seem fine, but since this changes behavior of bridge, it should wait until net-next (2.6.39). Signed-off-by: Stephen Hemminger <[email protected]> Reviewed-by: Nicolas de Pesloüan <[email protected]> Tested-By: Adam Majer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
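The carrier rule stated here (carrier on only if at least one port is forwarding) can be sketched as follows. The enum and function names are hypothetical and the port states are a simplified stand-in for the bridge's real state machine.

```c
#include <stddef.h>

/* Hypothetical, simplified port states for illustration. */
enum demo_port_state {
    DEMO_DISABLED,
    DEMO_LISTENING,
    DEMO_LEARNING,
    DEMO_FORWARDING,
    DEMO_BLOCKING
};

/* Carrier is asserted only if one or more ports are forwarding. */
static int demo_bridge_carrier_ok(const enum demo_port_state *ports, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ports[i] == DEMO_FORWARDING)
            return 1;
    }
    return 0;  /* no ports, or none forwarding: carrier off */
}
```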
2011-03-14 | Merge branch 'tipc-Mar14-2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/net-next-2.6 | David S. Miller | 1 | -1/+1
2011-03-14 | pktgen: bug fix in transmission headers with frags=0 | Daniel Turull | 1 | -2/+1
(bug introduced by commit 26ad787962ef84677a48c560 (pktgen: speedup fragmented skbs)) The headers of pktgen were incorrectly added in a pktgen packet without frags (frags=0). There was an offset in the pktgen headers. The cause was in reusing the pgh variable as a return variable in skb_put when adding the payload to the skb. Signed-off-by: Daniel Turull <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Eric Dumazet <[email protected]>
2011-03-14 | mac80211: fix channel type recalculation with HT and non-HT interfaces | Felix Fietkau | 1 | -0/+3
When running an AP interface along with the cooked monitor interface created by hostapd, adding an interface and deleting it again triggers a channel type recalculation during which the (non-HT) monitor interface takes precedence over the HT AP interface, thus causing the channel type to be set to non-HT. Fix this by ensuring that a wider channel type will not be overwritten by a narrower channel type. Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: John W. Linville <[email protected]>
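The "never overwrite a wider type with a narrower one" rule amounts to taking a maximum over the interfaces. The sketch below assumes a hypothetical enum ordered by width and a hypothetical helper `demo_merge_chan_type`; mac80211's real channel type enum and recalculation logic are more involved.

```c
/* Hypothetical channel types, ordered from narrowest to widest. */
enum demo_chan_type {
    DEMO_NO_HT = 0,
    DEMO_HT20  = 1,
    DEMO_HT40  = 2
};

/*
 * Fold one interface's channel type into the running result.
 * The result only ever widens, so the order in which interfaces
 * are visited does not matter.
 */
static enum demo_chan_type demo_merge_chan_type(enum demo_chan_type cur,
                                                enum demo_chan_type next)
{
    return next > cur ? next : cur;
}
```

With this rule an HT AP interface followed by a non-HT monitor interface still yields the HT type.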
2011-03-14 | mac80211: Shortcut minstrel_ht rate setup for non-MRR capable devices | Helmut Schaa | 1 | -7/+34
Devices without multi rate retry support won't be able to use all rates as specified by minstrel_ht. Hence, we can simply skip setting up further rates as the devices will only use the first one. Also add a special case for devices with only two possible tx rates. We use sample_rate -> max_prob_rate for sampling and max_tp_rate -> max_prob_rate by default. Signed-off-by: Helmut Schaa <[email protected]> Signed-off-by: John W. Linville <[email protected]>
2011-03-14 | netfilter: nf_conntrack: fix sysctl memory leak | Stephen Hemminger | 1 | -0/+1
Message in log because sysctl table was not empty at netns exit WARNING: at net/sysctl_net.c:84 sysctl_net_exit+0x2a/0x2c() Instrumenting showed that the nf_conntrack_timestamp was the entry that was being created but not cleared. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2011-03-14 | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 | Linus Torvalds | 3 | -15/+64
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: NFSROOT should default to "proto=udp"
  nfs4: remove duplicated #include
  NFSv4: nfs4_state_mark_reclaim_nograce() should be static
  NFSv4: Fix the setlk error handler
  NFSv4.1: Fix the handling of the SEQUENCE status bits
  NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
  NFSv4.1 reclaim complete must wait for completion
  NFSv4: remove duplicate clientid in struct nfs_client
  NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY
  sunrpc: Propagate errors from xs_bind() through xs_create_sock()
  (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid
  nfs: fix compilation warning
  nfs: add kmalloc return value check in decode_and_add_ds
  SUNRPC: Remove resource leak in svc_rdma_send_error()
  nfs: close NFSv4 COMMIT vs. CLOSE race
  SUNRPC: Close a race in __rpc_wait_for_completion_task()
2011-03-14 | netfilter: x_tables: return -ENOENT for non-existant matches/targets | Patrick McHardy | 1 | -2/+2
As Stephen correctly points out, we need to return -ENOENT in xt_find_match()/xt_find_target() after the patch "netfilter: x_tables: misuse of try_then_request_module" in order to properly indicate a non-existent module to the caller. Signed-off-by: Patrick McHardy <[email protected]>
2011-03-14 | tipc: delete extra semicolon blocking node deletion | Paul Gortmaker | 1 | -1/+1
Remove bogus semicolon only recently introduced in 34e46258cb9f5 that blocks cleanup of nodes for N>1 on shutdown. Signed-off-by: Paul Gortmaker <[email protected]>
2011-03-14 | kill path_lookup() | Al Viro | 1 | -1/+1
all remaining callers pass LOOKUP_PARENT to it, so flags argument can die; renamed to kern_path_parent() Signed-off-by: Al Viro <[email protected]>
2011-03-13 | inetpeer: should use call_rcu() variant | Eric Dumazet | 1 | -1/+1
After commit 7b46ac4e77f3224a (inetpeer: Don't disable BH for initial fast RCU lookup.), we should use call_rcu() to wait for a proper RCU grace period. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-13 | xfrm: Add user interface for esn and big anti-replay windows | Steffen Klassert | 2 | -14/+87
This patch adds a netlink based user interface to configure esn and big anti-replay windows. The new netlink attribute XFRMA_REPLAY_ESN_VAL is used to configure the new implementation. If the XFRM_STATE_ESN flag is set, we use esn and support for big anti-replay windows for the configured state. If this flag is not set we use the new implementation with 32 bit sequence numbers. A big anti-replay window can be configured in this case anyway. Signed-off-by: Steffen Klassert <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-13 | xfrm: Add support for IPsec extended sequence numbers | Steffen Klassert | 2 | -1/+193
This patch adds support for IPsec extended sequence numbers (esn) as defined in RFC 4303. The bits to manage the anti-replay window are based on a patch from Alex Badea. Signed-off-by: Steffen Klassert <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-13 | xfrm: Support anti-replay window size bigger than 32 packets | Steffen Klassert | 1 | -1/+206
As it is, the anti-replay bitmap in struct xfrm_replay_state can only accommodate 32 packets, even though it is possible to configure anti-replay window sizes up to 255 packets from userspace. So we reject any packet with a sequence number within the configured window but outside the bitmap. With this patch, we represent the anti-replay window as a bitmap of variable length that can be accessed via the new struct xfrm_replay_state_esn. Thus, we have no limit on the window size anymore. To use the new anti-replay window implementation, new userspace tools are required. We leave the old implementation untouched to stay in sync with old userspace tools. Signed-off-by: Steffen Klassert <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
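To make the idea of a variable-length anti-replay bitmap concrete, here is a small user-space sketch. Everything in it is an assumption for illustration: the `demo_replay` struct, the circular `seq % window` indexing, and the function names are hypothetical and do not mirror struct xfrm_replay_state_esn. It ignores 32-bit sequence number wraparound and requires a window size greater than zero.

```c
#include <stdint.h>
#include <stdlib.h>

struct demo_replay {
    uint32_t top;     /* highest sequence number accepted so far */
    uint32_t window;  /* window size in packets, must be > 0 */
    uint32_t *bmp;    /* (window + 31) / 32 words, bit index = seq % window */
};

static struct demo_replay *demo_replay_new(uint32_t window)
{
    struct demo_replay *r = malloc(sizeof(*r));

    if (!r)
        return NULL;
    r->bmp = calloc((window + 31) / 32, sizeof(uint32_t));
    if (!r->bmp) {
        free(r);
        return NULL;
    }
    r->top = 0;
    r->window = window;
    return r;
}

static int demo_test_bit(const struct demo_replay *r, uint32_t seq)
{
    uint32_t pos = seq % r->window;

    return (r->bmp[pos / 32] >> (pos % 32)) & 1u;
}

static void demo_assign_bit(struct demo_replay *r, uint32_t seq, int val)
{
    uint32_t pos = seq % r->window;

    if (val)
        r->bmp[pos / 32] |= 1u << (pos % 32);
    else
        r->bmp[pos / 32] &= ~(1u << (pos % 32));
}

/* Returns 1 if seq is accepted (and records it), 0 if it is rejected. */
static int demo_replay_accept(struct demo_replay *r, uint32_t seq)
{
    if (seq == 0)
        return 0;
    if (seq > r->top) {
        /* Window slides forward: clear the slots that are being reused. */
        uint32_t diff = seq - r->top;
        uint32_t n = diff < r->window ? diff : r->window;

        for (uint32_t i = 0; i < n; i++)
            demo_assign_bit(r, seq - i, 0);
        demo_assign_bit(r, seq, 1);
        r->top = seq;
        return 1;
    }
    if (r->top - seq >= r->window)
        return 0;  /* older than the window */
    if (demo_test_bit(r, seq))
        return 0;  /* already seen: replay */
    demo_assign_bit(r, seq, 1);
    return 1;
}
```

Because the bitmap is allocated from the configured window size, a window of 64 or 255 packets is tracked in full rather than being truncated to 32 bits.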
2011-03-13 | xfrm: Move IPsec replay detection functions to a separate file | Steffen Klassert | 6 | -124/+154
To support multiple versions of replay detection, we move the replay detection functions to a separate file and make them accessible via function pointers contained in the struct xfrm_replay. Signed-off-by: Steffen Klassert <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-13 | esp6: Add support for IPsec extended sequence numbers | Steffen Klassert | 1 | -19/+86
This patch adds IPsec extended sequence numbers support to esp6. We use the authencesn crypto algorithm to handle esp with separate encryption/authentication algorithms. Signed-off-by: Steffen Klassert <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-13 | esp4: Add support for IPsec extended sequence numbers | Steffen Klassert | 1 | -18/+82
This patch adds IPsec extended sequence numbers support to esp4. We use the authencesn crypto algorithm to handle esp with separate encryption/authentication algorithms. Signed-off-by: Steffen Klassert <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-13 | xfrm: Use separate low and high order bits of the sequence numbers in xfrm_skb_cb | Steffen Klassert | 6 | -9/+9
To support IPsec extended sequence numbers, we split the output sequence numbers of xfrm_skb_cb in low and high order 32 bits and we add the high order 32 bits to the input sequence numbers. All users are updated accordingly. Signed-off-by: Steffen Klassert <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
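Splitting a 64-bit extended sequence number into low and high 32-bit halves can be sketched as below. The struct and function names are hypothetical and do not reflect the actual layout of xfrm_skb_cb.

```c
#include <stdint.h>

/* Hypothetical container for the two halves of an extended sequence number. */
struct demo_seq {
    uint32_t low;  /* low-order 32 bits (the part carried on the wire) */
    uint32_t hi;   /* high-order 32 bits (tracked locally by each end) */
};

static struct demo_seq demo_seq_split(uint64_t seq)
{
    struct demo_seq s = { (uint32_t)seq, (uint32_t)(seq >> 32) };

    return s;
}

static uint64_t demo_seq_join(struct demo_seq s)
{
    return ((uint64_t)s.hi << 32) | s.low;
}
```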
2011-03-13 | Merge branch 'tipc-Mar13-2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/net-next-2.6 | David S. Miller | 21 | -387/+273
2011-03-13 | ipv4: Fix PMTU update. | Hiroaki SHIMODA | 2 | -5/+18
On current net-next-2.6, when Linux receives ICMP Type: 3, Code: 4 (Destination unreachable (Fragmentation needed)),
  icmp_unreach
  -> ip_rt_frag_needed (peer->pmtu_expires is set here)
  -> tcp_v4_err
  -> do_pmtu_discovery
  -> ip_rt_update_pmtu (peer->pmtu_expires is already set, so check_peer_pmtu is skipped.)
  -> check_peer_pmtu
check_peer_pmtu is skipped and MTU is not updated. To fix this, let check_peer_pmtu execute unconditionally. And some minor fixes
1) Avoid potential peer->pmtu_expires set to be zero.
2) In check_peer_pmtu, argument of time_before is reversed.
3) check_peer_pmtu expects peer->pmtu_orig is initialized as zero, but not initialized.
Signed-off-by: Hiroaki SHIMODA <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-13 | tipc: Eliminate obsolete routine for handling routed messages | Allan Stephens | 3 | -16/+0
Eliminates a routine that is used in handling messages arriving from another cluster or zone. Such messages can no longer be received by TIPC now that multi-cluster and multi-zone network support has been eliminated. Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
2011-03-13 | tipc: Eliminate remaining support for routing table messages | Allan Stephens | 3 | -46/+4
Gets rid of all remaining code relating to ROUTE_DISTRIBUTOR messages. These messages were only used in multi-cluster and multi-zone networks, which TIPC no longer supports. (For safety, TIPC now treats such messages the same way that it handles other unrecognized messages.) Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
2011-03-13 | tipc: Remove bearer flag indicating existence of broadcast address | Allan Stephens | 2 | -7/+2
Eliminates the flag in the TIPC bearer structure that indicates if the bearer supports broadcasting, since the flag is always set to 1 and serves no useful purpose. Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
2011-03-13 | tipc: Don't respond to neighbor discovery request on blocked bearer | Allan Stephens | 1 | -1/+1
Adds a check to prevent TIPC from trying to respond to an incoming LINK_CONFIG request message if the associated bearer is currently prohibited from sending messages. Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
2011-03-13 | tipc: Eliminate unnecessary constant for neighbor discovery msg size | Allan Stephens | 2 | -3/+2
Eliminates an unnecessary constant that defines the size of a LINK_CONFIG message, and uses one of the existing standard message size symbols in its place. (The defunct constant was located in the wrong place anyway, since it was grouped with other constants that define message users instead of message sizes.) Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
2011-03-13 | tipc: Remove unused field in bearer structure | Allan Stephens | 2 | -3/+0
Eliminates a field in TIPC's bearer objects that is set, but never referenced. Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
2011-03-13 | tipc: Correct misnamed references to neighbor discovery domain | Allan Stephens | 3 | -9/+9
Renames items that are improperly labelled as "network scope" items (which are represented by simple integer values) rather than "network domain" items (which are represented by <Z.C.N>-type network addresses). This change is purely cosmetic, and does not affect the operation of TIPC. Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
2011-03-13 | tipc: Optimizations to link creation code | Allan Stephens | 5 | -36/+28
Enhances link creation code as follows: 1) Detects illegal attempts to add a requested link earlier in the link creation process. This prevents TIPC from wasting time initializing a link object it then throws away, and also eliminates the code needed to do the throwing away. 2) Passes in the node object associated with the requested link. This allows TIPC to eliminate a search to locate the node object, as well as code that attempted to create the node if it doesn't exist. Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
2011-03-13 | tipc: Give Tx of discovery responses priority over link messages | Allan Stephens | 1 | -7/+9
Delay releasing the node lock when processing a neighbor discovery message until after the optional discovery response message has been sent. This helps ensure that any link protocol messages sent by a link endpoint created as a result of a neighbor discovery request are received after the discovery response is received, thereby giving the receiving node a chance to create a peer link endpoint to consume those link protocol messages, if one does not already exist. Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
2011-03-13 | tipc: Cosmetic changes to neighbor discovery logic | Allan Stephens | 1 | -44/+59
Reworks the appearance of the routine that processes incoming LINK_CONFIG messages to keep the main logic flow at a consistent level of indentation, and to add comments outlining the various phases involved in processing each message. This rework is being done to allow upcoming enhancements to this routine to be integrated more cleanly. The diff isn't really readable, so know that it was a case of the old code being like:

    tipc_disc_recv_msg(..)
    {
            if (in_own_cluster(orig)) {
                    ... lines and lines of stuff ...
            }
    }

which is now replaced with the more sane:

    tipc_disc_recv_msg(..)
    {
            if (!in_own_cluster(orig))
                    return;
            ... lines and lines of stuff ...
    }

Instances of spin locking within the reindented block were replaced with the identical tipc_node_[un]lock() abstractions. Note that all these changes are cosmetic in nature, and do not change the way LINK_CONFIG messages are processed. Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
2011-03-13 | tipc: Fix redundant link field handling in link protocol message | Allan Stephens | 1 | -1/+3
Ensures that the "redundant link exists" field of the LINK_PROTOCOL messages sent by a link endpoint is set if and only if the sending node has at least one other working link to the peer node. Previously, the bit was set only if there were at least 2 working links to the peer node, meaning the bit was incorrectly left unset in messages sent by a non-working link endpoint when exactly one alternate working link was available. The revised code now takes the state of the link sending the message into account when deciding if an alternate link exists. Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
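The rule described here ("set if and only if the sender has at least one other working link") can be written as a single comparison. The function below is a hypothetical sketch that takes the node's count of working links and the sending link's own state as plain integers; it is not the TIPC source.

```c
/*
 * Returns 1 if the sending node has at least one working link to the
 * peer other than the link sending the message.
 *
 * working_links  number of working links to the peer node
 * this_link_up   nonzero if the sending link itself is working
 *
 * The old behaviour corresponds to "working_links > 1", which is wrong
 * when the sending link is down and exactly one other link is up.
 */
static int demo_redundant_link_flag(int working_links, int this_link_up)
{
    return working_links > (this_link_up ? 1 : 0);
}
```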
2011-03-13 | tipc: make msg_set_redundant_link() consistent with other set ops | Allan Stephens | 2 | -11/+3
All the other boolean-like msg_set_X(m) operations don't export both a msg_set_X(m) and a msg_clear_X(m), but instead just have the single msg_set_X(m, val) variant. Make the redundant_link one consistent by having the set take a value, and delete the msg_clear_redundant_link() anomaly. This is a cosmetic change and should not change behaviour. Signed-off-by: Allan Stephens <[email protected]> Signed-off-by: Paul Gortmaker <[email protected]>
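A single setter that takes a value, replacing a set/clear pair, might look like the following sketch. The bit position, the flat `uint32_t` header word, and both function names are assumptions made for the example; TIPC's real message header accessors differ.

```c
#include <stdint.h>

#define DEMO_REDUNDANT_LINK_BIT 0x1u  /* hypothetical bit position */

/* One setter covers both cases: val != 0 sets the bit, val == 0 clears it. */
static void demo_msg_set_redundant_link(uint32_t *word, uint32_t val)
{
    *word = (*word & ~DEMO_REDUNDANT_LINK_BIT) |
            (val ? DEMO_REDUNDANT_LINK_BIT : 0u);
}

static uint32_t demo_msg_redundant_link(uint32_t word)
{
    return (word & DEMO_REDUNDANT_LINK_BIT) ? 1u : 0u;
}
```

Callers that compute the flag as an expression can pass it straight through instead of branching between a set and a clear call.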
2011-03-13 | tipc: cosmetic - function names are not to be full sentences | Paul Gortmaker | 4 | -9/+9
Function names like "tipc_node_has_redundant_links" are unwieldy and result in long lines even for simple statements. The "has" doesn't add any value, so dropping it is a slight step in the right direction. This is a cosmetic change, basic result of: for i in `grep -l tipc_node_has_ *` ; do sed -i s/tipc_node_has_/tipc_node_/ $i ; done Signed-off-by: Paul Gortmaker <[email protected]>