aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)AuthorFilesLines
2010-11-28net: add some KERN_CONT markers to continuation linesUwe Kleine-König1-16/+16
Cc: [email protected] Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-28tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)Alexey Dobriyan1-1/+5
tcp_win_from_space() does the following: if (sysctl_tcp_adv_win_scale <= 0) return space >> (-sysctl_tcp_adv_win_scale); else return space - (space >> sysctl_tcp_adv_win_scale); "space" is int. As per C99 6.5.7 (3) shifting int for 32 or more bits is undefined behaviour. Indeed, if sysctl_tcp_adv_win_scale is exactly 32, space >> 32 equals space and function returns 0. Which means we busyloop in tcp_fixup_rcvbuf(). Restrict net.ipv4.tcp_adv_win_scale to [-31, 31]. Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312 Steps to reproduce: echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale wget www.kernel.org [softlockup] Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-27netns: Don't leak others' openreq-s in procPavel Emelyanov1-1/+3
The /proc/net/tcp leaks openreq sockets from other namespaces. Signed-off-by: Pavel Emelyanov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-27rtnl: make link af-specific updates atomicThomas Graf1-5/+21
As David pointed out correctly, updates to af-specific attributes are currently not atomic. If multiple changes are requested and one of them fails, previous updates may have been applied already leaving the link behind in a undefined state. This patch splits the function parse_link_af() into two functions validate_link_af() and set_link_at(). validate_link_af() is placed to validate_linkmsg() check for errors as early as possible before any changes to the link have been made. set_link_af() is called to commit the changes later. This method is not fail proof, while it is currently sufficient to make set_link_af() inerrable and thus 100% atomic, the validation function method will not be able to detect all error scenarios in the future, there will likely always be errors depending on states which are f.e. not protected by rtnl_mutex and thus may change between validation and setting. Also, instead of silently ignoring unknown address families and config blocks for address families which did not register a set function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are returned to avoid comitting 4 out of 5 update requests without notifying the user. Signed-off-by: Thomas Graf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-24tcp: Make TCP_MAXSEG minimum more correct.David S. Miller1-1/+1
Use TCP_MIN_MSS instead of constant 64. Reported-by: Min Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-24xps: Improvements in TX queue selectionTom Herbert1-1/+4
In dev_pick_tx, don't do work in calculating queue index or setting the index in the sock unless the device has more than one queue. This allows the sock to be set only with a queue index of a multi-queue device which is desirable if device are stacked like in a tunnel. We also allow the mapping of a socket to queue to be changed. To maintain in order packet transmission a flag (ooo_okay) has been added to the sk_buff structure. If a transport layer sets this flag on a packet, the transmit queue can be changed for the socket. Presumably, the transport would set this if there was no possbility of creating OOO packets (for instance, there are no packets in flight for the socket). This patch includes the modification in TCP output for setting this flag. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-22Net: ipv4: netfilter: Makefile: Remove deprecated kbuild goal definitionsTracey Dent1-3/+3
Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-21net: allow GFP_HIGHMEM in __vmalloc()Eric Dumazet1-1/+1
We forgot to use __GFP_HIGHMEM in several __vmalloc() calls. In ceph, add the missing flag. In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is cleaner and allows using HIGHMEM pages as well. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-19Merge branch 'master' of ↵David S. Miller1-0/+3
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bonding/bond_main.c net/core/net-sysfs.c net/ipv6/addrconf.c
2010-11-18igmp: refine skb allocationsEric Dumazet1-4/+13
IGMP allocates MTU sized skbs. This may fail for large MTU (order-2 allocations), so add a fallback to try lower sizes. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-18bonding: IGMP handling cleanupEric Dumazet1-13/+19
Instead of iterating in_dev->mc_list from bonding driver, its better to call a helper function provided by igmp.c Details of implementation (locking) are private to igmp code. ip_mc_rejoin_group(struct ip_mc_list *im) becomes ip_mc_rejoin_groups(struct in_device *in_dev); Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-17net: ipv4: tcp_probe: cleanup snprintf() useVasiliy Kulikov1-2/+2
snprintf() returns number of bytes that were copied if there is no overflow. This code uses return value as number of copied bytes. Theoretically format string '%lu.%09lu %pI4:%u %pI4:%u %d %#x %#x %u %u %u %u\n' may be expanded up to 163 bytes. In reality tv.tv_sec is just few bytes instead of 20, 2 ports are just 5 bytes each instead of 10, length is 5 bytes instead of 10. The rest is an unstrusted input. Theoretically if tv_sec is big then copy_to_user() would overflow tbuf. tbuf was increased to fit in 163 bytes. snprintf() is used to follow return value semantic. Signed-off-by: Vasiliy Kulikov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-17net: use the macros defined for the members of flowiChangli Gao17-198/+106
Use the macros defined for the members of flowi to clean the code up. Signed-off-by: Changli Gao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-17ipv4: AF_INET link address familyThomas Graf1-0/+75
Implements the AF_INET link address family exposing the per device configuration settings via netlink using the attribute IFLA_INET_CONF. The format of IFLA_INET_CONF differs depending on the direction the attribute is sent. The attribute sent by the kernel consists of a u32 array, basically a 1:1 copy of in_device->cnf.data[]. The attribute expected by the kernel must consist of a sequence of nested u32 attributes, each representing a change request, e.g. [IFLA_INET_CONF] = { [IPV4_DEVCONF_FORWARDING] = 1, [IPV4_DEVCONF_NOXFRM] = 0, } libnl userspace API documentation and example available from: http://www.infradead.org/~tgr/libnl/doc-git/group__link__inet.html Signed-off-by: Thomas Graf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-17network: tcp_connect should return certain errors up the stackEric Paris1-1/+4
The current tcp_connect code completely ignores errors from sending an skb. This makes sense in many situations (like -ENOBUFFS) but I want to be able to immediately fail connections if they are denied by the SELinux netfilter hook. Netfilter does not normally return ECONNREFUSED when it drops a packet so we respect that error code as a final and fatal error that can not be recovered. Based-on-patch-by: Patrick McHardy <[email protected]> Signed-off-by: Eric Paris <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-16xfrm: update flowi saddr in icmp_send if unsetUlrich Weber1-0/+3
otherwise xfrm_lookup will fail to find correct policy Signed-off-by: Ulrich Weber <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-16udp: use atomic_inc_not_zero_hintEric Dumazet1-2/+2
UDP sockets refcount is usually 2, unless an incoming frame is going to be queued in receive or backlog queue. Using atomic_inc_not_zero_hint() permits to reduce latency, because processor issues less memory transactions. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-15Docs/Kconfig: Update: osdl.org -> linuxfoundation.orgMichael Witten1-1/+3
Some of the documentation refers to web pages under the domain `osdl.org'. However, `osdl.org' now redirects to `linuxfoundation.org'. Rather than rely on redirections, this patch updates the addresses appropriately; for the most part, only documentation that is meant to be current has been updated. The patch should be pretty quick to scan and check; each new web-page url was gotten by trying out the original URL in a browser and then simply copying the the redirected URL (formatting as necessary). There is some conflict as to which one of these domain names is preferred: linuxfoundation.org linux-foundation.org So, I wrote: [email protected] and got this reply: Message-ID: <[email protected]> Date: Mon, 15 Nov 2010 10:41:42 -0800 From: David Ames <[email protected]> ... linuxfoundation.org is preferred. The canonical name for our web site is www.linuxfoundation.org. Our list site is actually lists.linux-foundation.org. Regarding email linuxfoundation.org is preferred there are a few people who choose to use linux-foundation.org for their own reasons. Consequently, I used `linuxfoundation.org' for web pages and `lists.linux-foundation.org' for mailing-list web pages and email addresses; the only personal email address I updated from `@osdl.org' was that of Andrew Morton, who prefers `linux-foundation.org' according `git log'. Signed-off-by: Michael Witten <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2010-11-15xfrm: use gre key as flow upper protocol infoTimo Teräs2-5/+22
The GRE Key field is intended to be used for identifying an individual traffic flow within a tunnel. It is useful to be able to have XFRM policy selector matches to have different policies for different GRE tunnels. Signed-off-by: Timo Teräs <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-15netfilter: nf_nat_amanda: rename a variableEric Dumazet1-4/+4
Avoid a sparse warning about 'ret' variable shadowing Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2010-11-15netfilter: add __rcu annotationsEric Dumazet2-7/+15
Use helpers to reduce number of sparse warnings (CONFIG_SPARSE_RCU_POINTER=y) Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2010-11-15ipv4: Fix build with multicast disabled.David S. Miller1-10/+10
net/ipv4/igmp.c: In function 'ip_mc_inc_group': net/ipv4/igmp.c:1228: error: implicit declaration of function 'for_each_pmc_rtnl' net/ipv4/igmp.c:1228: error: expected ';' before '{' token net/ipv4/igmp.c: In function 'ip_mc_unmap': net/ipv4/igmp.c:1333: error: expected ';' before 'igmp_group_dropped' ... Move for_each_pmc_rcu and for_each_pmc_rtnl macro definitions outside of multicast ifdef protection. Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-15netfilter: nf_nat: don't use atomic bit operationChangli Gao1-2/+2
As we own the conntrack and the others can't see it until we confirm it, we don't need to use atomic bit operation on ct->status. Signed-off-by: Changli Gao <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2010-11-15netfilter: xt_LOG: do print MAC header on FORWARDJan Engelhardt1-2/+1
I am observing consistent behavior even with bridges, so let's unlock this. xt_mac is already usable in FORWARD, too. Section 9 of http://ebtables.sourceforge.net/br_fw_ia/br_fw_ia.html#section9 says the MAC source address is changed, but my observation does not match that claim -- the MAC header is retained. Signed-off-by: Jan Engelhardt <[email protected]> [Patrick; code inspection seems to confirm this] Signed-off-by: Patrick McHardy <[email protected]>
2010-11-14Merge branch 'master' of ↵David S. Miller7-24/+22
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2010-11-12tcp: Don't change unlocked socket state in tcp_v4_err().David S. Miller1-5/+3
Alexey Kuznetsov noticed a regression introduced by commit f1ecd5d9e7366609d640ff4040304ea197fbc618 ("Revert Backoff [v3]: Revert RTO on ICMP destination unreachable") The RTO and timer modification code added to tcp_v4_err() doesn't check sock_owned_by_user(), which if true means we don't have exclusive access to the socket and therefore cannot modify it's critical state. Just skip this new code block if sock_owned_by_user() is true and eliminate the now superfluous sock_owned_by_user() code block contained within. Reported-by: Alexey Kuznetsov <[email protected]> Signed-off-by: David S. Miller <[email protected]> CC: Damian Lukowski <[email protected]> Acked-by: Eric Dumazet <[email protected]>
2010-11-12igmp: RCU conversion of in_dev->mc_listEric Dumazet1-119/+104
in_dev->mc_list is protected by one rwlock (in_dev->mc_list_lock). This can easily be converted to a RCU protection. Writers hold RTNL, so mc_list_lock is removed, not replaced by a spinlock. Signed-off-by: Eric Dumazet <[email protected]> Cc: Cypher Wu <[email protected]> Cc: Américo Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-11ipv4: Make rt->fl.iif tests lest obscure.David S. Miller5-15/+15
When we test rt->fl.iif against zero, we're seeing if it's an output or an input route. Make that explicit with some helper functions. Signed-off-by: David S. Miller <[email protected]>
2010-11-11net: get rid of rtable->idevEric Dumazet2-57/+4
It seems idev field in struct rtable has no special purpose, but adding extra atomic ops. We hold refcounts on the device itself (using percpu data, so pretty cheap in current kernel). infiniband case is solved using dst.dev instead of idev->dev Removal of this field means routing without route cache is now using shared data, percpu data, and only potential contention is a pair of atomic ops on struct neighbour per forwarded packet. About 5% speedup on routing test. Signed-off-by: Eric Dumazet <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Sean Hefty <[email protected]> Cc: Hal Rosenstock <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-10tcp: Increase TCP_MAXSEG socket option minimum.David S. Miller1-1/+1
As noted by Steve Chen, since commit f5fff5dc8a7a3f395b0525c02ba92c95d42b7390 ("tcp: advertise MSS requested by user") we can end up with a situation where tcp_select_initial_window() does a divide by a zero (or even negative) mss value. The problem is that sometimes we effectively subtract TCPOLEN_TSTAMP_ALIGNED and/or TCPOLEN_MD5SIG_ALIGNED from the mss. Fix this by increasing the minimum from 8 to 64. Reported-by: Steve Chen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-10net: avoid limits overflowEric Dumazet5-15/+17
Robin Holt tried to boot a 16TB machine and found some limits were reached : sysctl_tcp_mem[2], sysctl_udp_mem[2] We can switch infrastructure to use long "instead" of "int", now atomic_long_t primitives are available for free. Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Robin Holt <[email protected]> Reviewed-by: Robin Holt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-09net/ipv4/tcp.c: Update WARN usesJoe Perches1-9/+7
Coalesce long formats. Align arguments. Remove KERN_<level>. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-09inet: fix ip_mc_drop_socket()Eric Dumazet1-3/+1
commit 8723e1b4ad9be4444 (inet: RCU changes in inetdev_by_index()) forgot one call site in ip_mc_drop_socket() We should not decrease idev refcount after inetdev_by_index() call, since refcount is not increased anymore. Reported-by: Markus Trippelsdorf <[email protected]> Reported-by: Miles Lane <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-04inet_diag: Make sure we actually run the same bytecode we audited.Nelson Elhage1-11/+16
We were using nlmsg_find_attr() to look up the bytecode by attribute when auditing, but then just using the first attribute when actually running bytecode. So, if we received a message with two attribute elements, where only the second had type INET_DIAG_REQ_BYTECODE, we would validate and run different bytecode strings. Fix this by consistently using nlmsg_find_attr everywhere. Signed-off-by: Nelson Elhage <[email protected]> Signed-off-by: Thomas Graf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-04fib: fib_result_assign() should not change fib refcountsEric Dumazet1-4/+1
After commit ebc0ffae5 (RCU conversion of fib_lookup()), fib_result_assign() should not change fib refcounts anymore. Thanks to Michael who did the bisection and bug report. Reported-by: Michael Ellerman <[email protected]> Tested-by: Michael Ellerman <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-11-03Merge branch 'master' of ↵David S. Miller3-20/+22
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2010-11-03ipv4: netfilter: ip_tables: fix information leak to userlandVasiliy Kulikov1-0/+1
Structure ipt_getinfo is copied to userland with the field "name" that has the last elements unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2010-11-03ipv4: netfilter: arp_tables: fix information leak to userlandVasiliy Kulikov1-0/+1
Structure arpt_getinfo is copied to userland with the field "name" that has the last elements unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2010-11-01tree-wide: fix comment/printk typosUwe Kleine-König1-1/+1
"gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2010-10-30ip_gre: fix fallback tunnel setupEric Dumazet1-3/+3
Before making the fallback tunnel visible to lookups, we should make sure it is completely setup, once ipgre_tunnel_init() had been called and tstats per_cpu pointer allocated. move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from ipgre_fb_tunnel_init() to ipgre_init_net() Based on a patch from Pavel Emelyanov Reported-by: Pavel Emelyanov <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-29netfilter: nf_nat: fix compiler warning with CONFIG_NF_CT_NETLINK=nPatrick McHardy1-20/+20
net/ipv4/netfilter/nf_nat_core.c:52: warning: 'nf_nat_proto_find_get' defined but not used net/ipv4/netfilter/nf_nat_core.c:66: warning: 'nf_nat_proto_put' defined but not used Reported-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2010-10-28fib: Fix fib zone and its hash leak on namespace stopPavel Emelyanov3-1/+24
When we stop a namespace we flush the table and free one, but the added fn_zone-s (and their hashes if grown) are leaked. Need to free. Tries releases all its stuff in the flushing code. Shame on us - this bug exists since the very first make-fib-per-net patches in 2.6.27 :( Signed-off-by: Pavel Emelyanov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-27tunnels: Fix tunnels change rcu protectionPavel Emelyanov2-0/+2
After making rcu protection for tunnels (ipip, gre, sit and ip6) a bug was introduced into the SIOCCHGTUNNEL code. The tunnel is first unlinked, then addresses change, then it is linked back probably into another bucket. But while changing the parms, the hash table is unlocked to readers and they can lookup the improper tunnel. Respective commits are b7285b79 (ipip: get rid of ipip_lock), 1507850b (gre: get rid of ipgre_lock), 3a43be3c (sit: get rid of ipip6_lock) and 94767632 (ip6tnl: get rid of ip6_tnl_lock). The quick fix is to wait for quiescent state to pass after unlinking, but if it is inappropriate I can invent something better, just let me know. Signed-off-by: Pavel Emelyanov <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-27inetpeer: __rcu annotationsEric Dumazet1-58/+80
Adds __rcu annotations to inetpeer (struct inet_peer)->avl_left (struct inet_peer)->avl_right This is a tedious cleanup, but removes one smp_wmb() from link_to_pool() since we now use more self documenting rcu_assign_pointer(). Note the use of RCU_INIT_POINTER() instead of rcu_assign_pointer() in all cases we dont need a memory barrier. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-27tunnels: add __rcu annotationsEric Dumazet1-10/+19
Add __rcu annotations to : (struct ip_tunnel)->prl (struct ip_tunnel_prl_entry)->next (struct xfrm_tunnel)->next struct xfrm_tunnel *tunnel4_handlers struct xfrm_tunnel *tunnel64_handlers And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-27net: add __rcu annotations to protocolEric Dumazet1-3/+5
Add __rcu annotations to : struct net_protocol *inet_protos struct net_protocol *inet6_protos And use appropriate casts to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-27ipv4: add __rcu annotations to routes.cEric Dumazet1-29/+46
Add __rcu annotations to : (struct dst_entry)->rt_next (struct rt_hash_bucket)->chain And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-26fib_hash: fix rcu sparse and logical errorsEric Dumazet1-16/+20
While fixing CONFIG_SPARSE_RCU_POINTER errors, I had to fix accesses to fz->fz_hash for real. - &fz->fz_hash[fn_hash(f->fn_key, fz)] + rcu_dereference(fz->fz_hash) + fn_hash(f->fn_key, fz) Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-25ipv4: add __rcu annotations to ip_ra_chainEric Dumazet1-3/+7
Add __rcu annotations to : (struct ip_ra_chain)->next struct ip_ra_chain *ip_ra_chain; And use appropriate rcu primitives. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-25net: add __rcu annotation to sk_filterEric Dumazet1-1/+1
Add __rcu annotation to : (struct sock)->sk_filter And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>