aboutsummaryrefslogtreecommitdiff
path: root/include/net/netns/ipv4.h
AgeCommit message (Collapse)AuthorFilesLines
2015-02-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+1
Conflicts: drivers/net/vxlan.c drivers/vhost/net.c include/linux/if_vlan.h net/core/dev.c The net/core/dev.c conflict was the overlap of one commit marking an existing function static whilst another was adding a new function. In the include/linux/if_vlan.h case, the type used for a local variable was changed in 'net', whereas the function got rewritten to fix a stacked vlan bug in 'net-next'. In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next' overlapped with an endainness fix for VHOST 1.0 in 'net'. In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter in 'net-next' whereas in 'net' there was a bug fix to pass in the correct network namespace pointer in calls to this function. Signed-off-by: David S. Miller <[email protected]>
2015-02-01ipv4: tcp: get rid of ugly unicast_sockEric Dumazet1-0/+1
In commit be9f4a44e7d41 ("ipv4: tcp: remove per net tcp_sock") I tried to address contention on a socket lock, but the solution I chose was horrible : commit 3a7c384ffd57e ("ipv4: tcp: unicast_sock should not land outside of TCP stack") addressed a selinux regression. commit 0980e56e506b ("ipv4: tcp: set unicast_sock uc_ttl to -1") took care of another regression. commit b5ec8eeac46 ("ipv4: fix ip_send_skb()") fixed another regression. commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate") was another shot in the dark. Really, just use a proper socket per cpu, and remove the skb_orphan() call, to re-enable flow control. This solves a serious problem with FQ packet scheduler when used in hostile environments, as we do not want to allocate a flow structure for every RST packet sent in response to a spoofed packet. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-01-31ipv4: icmp: use percpu allocationEric Dumazet1-1/+2
Get rid of nr_cpu_ids and use modern percpu allocation. Note that the sockets themselves are not yet allocated using NUMA affinity. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-09-09net/ipv4: bind ip_nonlocal_bind to current netnsVincent Bernat1-0/+1
net.ipv4.ip_nonlocal_bind sysctl was global to all network namespaces. This patch allows to set a different value for each network namespace. Signed-off-by: Vincent Bernat <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-05-14ipv4: make ip_local_reserved_ports per netnsWANG Cong1-0/+4
ip_local_port_range is already per netns, so should ip_local_reserved_ports be. And since it is none by default we don't actually need it when we don't enable CONFIG_SYSCTL. By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port() Cc: "David S. Miller" <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-05-13net: support marking accepting TCP socketsLorenzo Colitti1-0/+1
When using mark-based routing, sockets returned from accept() may need to be marked differently depending on the incoming connection request. This is the case, for example, if different socket marks identify different networks: a listening socket may want to accept connections from all networks, but each connection should be marked with the network that the request came in on, so that subsequent packets are sent on the correct network. This patch adds a sysctl to mark TCP sockets based on the fwmark of the incoming SYN packet. If enabled, and an unmarked socket receives a SYN, then the SYN packet's fwmark is written to the connection's inet_request_sock, and later written back to the accepted socket when the connection is established. If the socket already has a nonzero mark, then the behaviour is the same as it is today, i.e., the listening socket's fwmark is used. Black-box tested using user-mode linux: - IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the mark of the incoming SYN packet. - The socket returned by accept() is marked with the mark of the incoming SYN packet. - Tested with syncookies=1 and syncookies=2. Signed-off-by: Lorenzo Colitti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-05-13net: add a sysctl to reflect the fwmark on repliesLorenzo Colitti1-0/+2
Kernel-originated IP packets that have no user socket associated with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.) are emitted with a mark of zero. Add a sysctl to make them have the same mark as the packet they are replying to. This allows an administrator that wishes to do so to use mark-based routing, firewalling, etc. for these replies by marking the original packets inbound. Tested using user-mode linux: - ICMP/ICMPv6 echo replies and errors. - TCP RST packets (IPv4 and IPv6). Signed-off-by: Lorenzo Colitti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-05-08ping: move ping_group_range out of CONFIG_SYSCTLCong Wang1-1/+6
Similarly, when CONFIG_SYSCTL is not set, ping_group_range should still work, just that no one can change it. Therefore we should move it out of sysctl_net_ipv4.c. And, it should not share the same seqlock with ip_local_port_range. BTW, rename it to ->ping_group_range instead. Cc: David S. Miller <[email protected]> Cc: Francois Romieu <[email protected]> Reported-by: Stefan de Konink <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-05-08ipv4: move local_port_range out of CONFIG_SYSCTLCong Wang1-1/+1
When CONFIG_SYSCTL is not set, ip_local_port_range should still work, just that no one can change it. Therefore we should move it out of sysctl_inet.c. Also, rename it to ->ip_local_ports instead. Cc: David S. Miller <[email protected]> Cc: Francois Romieu <[email protected]> Reported-by: Stefan de Konink <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-01-13ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against ↵Hannes Frederic Sowa1-0/+1
pmtu spoofing While forwarding we should not use the protocol path mtu to calculate the mtu for a forwarded packet but instead use the interface mtu. We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was introduced for multicast forwarding. But as it does not conflict with our usage in unicast code path it is perfect for reuse. I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular dependencies because of IPSKB_FORWARDED. Because someone might have written a software which does probe destinations manually and expects the kernel to honour those path mtus I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone can disable this new behaviour. We also still use mtus which are locked on a route for forwarding. The reason for this change is, that path mtus information can be injected into the kernel via e.g. icmp_err protocol handler without verification of local sockets. As such, this could cause the IPv4 forwarding path to wrongfully emit fragmentation needed notifications or start to fragment packets along a path. Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED won't be set and further fragmentation logic will use the path mtu to determine the fragmentation size. They also recheck packet size with help of path mtu discovery and report appropriate errors. Cc: Eric Dumazet <[email protected]> Cc: David Miller <[email protected]> Cc: John Heffner <[email protected]> Cc: Steffen Klassert <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-12-18inet: make no_pmtu_disc per namespace and kill ipv4_configHannes Frederic Sowa1-0/+1
The other field in ipv4_config, log_martians, was converted to a per-interface setting, so we can just remove the whole structure. Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-10-21tcp_memcontrol: Remove the per netns control.Eric W. Biederman1-1/+0
The code that is implemented is per memory cgroup not per netns, and having per netns bits is just confusing. Remove the per netns bits to make it easier to see what is really going on. Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-09-30net ipv4: Convert ipv4.ip_local_port_range to be per netns v3Eric W. Biederman1-0/+6
- Move sysctl_local_ports from a global variable into struct netns_ipv4. - Modify inet_get_local_port_range to take a struct net, and update all of the callers. - Move the initialization of sysctl_local_ports into sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c v2: - Ensure indentation used tabs - Fixed ip.h so it applies cleanly to todays net-next v3: - Compile fixes of strange callers of inet_get_local_port_range. This patch now successfully passes an allmodconfig build. Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range Originally-by: Samya <[email protected]> Acked-by: Nicolas Dichtel <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-07-31net: split rt_genid for ipv4 and ipv6fan.du1-0/+1
Current net name space has only one genid for both IPv4 and IPv6, it has below drawbacks: - Add/delete an IPv4 address will invalidate all IPv6 routing table entries. - Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table entries even when the policy is only applied for one address family. Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6 separately in a fine granularity. Signed-off-by: Fan Du <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-02-06xfrm: make gc_thresh configurable in all namespacesMichal Kubecek1-0/+1
The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh sysctl but currently only in init_net, other namespaces always use the default value. This can substantially limit the number of IPsec tunnels that can be effectively used. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2013-01-06tcp: make sysctl_tcp_ecn namespace awareHannes Frederic Sowa1-0/+2
As per suggestion from Eric Dumazet this patch makes tcp_ecn sysctl namespace aware. The reason behind this patch is to ease the testing of ecn problems on the internet and allows applications to tune their own use of ecn. Cc: Eric Dumazet <[email protected]> Cc: David Miller <[email protected]> Cc: Stephen Hemminger <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-09-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+0
Conflicts: drivers/net/team/team.c drivers/net/usb/qmi_wwan.c net/batman-adv/bat_iv_ogm.c net/ipv4/fib_frontend.c net/ipv4/route.c net/l2tp/l2tp_netlink.c The team, fib_frontend, route, and l2tp_netlink conflicts were simply overlapping changes. qmi_wwan and bat_iv_ogm were of the "use HEAD" variety. With help from Antonio Quartulli. Signed-off-by: David S. Miller <[email protected]>
2012-09-18netns: move net->ipv4.rt_genid to net->rt_genidNicolas Dichtel1-1/+0
This commit prepares the use of rt_genid by both IPv4 and IPv6. Initialization is left in IPv4 part. Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextPablo Neira Ayuso1-1/+2
This merges (3f509c6 netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation) to Patrick McHardy's IPv6 NAT changes.
2012-08-30netfilter: add protocol independent NAT corePatrick McHardy1-2/+0
Convert the IPv4 NAT implementation to a protocol independent core and address family specific modules. Signed-off-by: Patrick McHardy <[email protected]>
2012-08-14userns: Use kgids for sysctl_ping_group_rangeEric W. Biederman1-1/+2
- Store sysctl_ping_group_range as a paire of kgid_t values instead of a pair of gid_t values. - Move the kgid conversion work from ping_init_sock into ipv4_ping_group_range - For invalid cases reset to the default disabled state. With the kgid_t conversion made part of the original value sanitation from userspace understand how the code will react becomes clearer and it becomes possible to set the sysctl ping group range from something other than the initial user namespace. Cc: Vasiliy Kulikov <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]>
2012-07-30ipv4: remove rt_cache_rebuild_countEric Dumazet1-2/+0
After IP route cache removal, rt_cache_rebuild_count is no longer used. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-07-20tcp: use hash_32() in tcp_metricsEric Dumazet1-1/+1
Fix a missing roundup_pow_of_two(), since tcpmhash_entries is not guaranteed to be a power of two. Uses hash_32() instead of custom hash. tcpmhash_entries should be an unsigned int. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-07-19ipv4: tcp: remove per net tcp_sockEric Dumazet1-1/+0
tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket per network namespace. This leads to bad behavior on multiqueue NICS, because many cpus contend for the socket lock and once socket lock is acquired, extra false sharing on various socket fields slow down the operations. To better resist to attacks, we use a percpu socket. Each cpu can run without contention, using appropriate memory (local node) Additional features : 1) We also mirror the queue_mapping of the incoming skb, so that answers use the same queue if possible. 2) Setting SOCK_USE_WRITE_QUEUE socket flag speedup sock_wfree() 3) We now limit the number of in-flight RST/ACK [1] packets per cpu, instead of per namespace, and we honor the sysctl_wmem_default limit dynamically. (Prior to this patch, sysctl_wmem_default value was copied at boot time, so any further change would not affect tcp_sock limit) [1] These packets are only generated when no socket was matched for the incoming packet. Reported-by: Bill Sommerfeld <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-07-10tcp: Maintain dynamic metrics in local cache.David S. Miller1-0/+3
Maintain a local hash table of TCP dynamic metrics blobs. Computed TCP metrics are no longer maintained in the route metrics. The table uses RCU and an extremely simple hash so that it has low latency and low overhead. A simple hash is legitimate because we only make metrics blobs for fully established connections. Some tweaking of the default hash table sizes, metric timeouts, and the hash chain length limit certainly could use some tweaking. But the basic design seems sound. With help from Eric Dumazet and Joe Perches. Signed-off-by: David S. Miller <[email protected]>
2012-07-05ipv4: Avoid overhead when no custom FIB rules are installed.David S. Miller1-0/+8
If the user hasn't actually installed any custom rules, or fiddled with the default ones, don't go through the whole FIB rules layer. It's just pure overhead. Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check the individual tables by hand, one by one. Also, move fib_num_tclassid_users into the ipv4 network namespace. Signed-off-by: David S. Miller <[email protected]>
2012-06-08inetpeer: add namespace support for inetpeerGao feng1-1/+1
now inetpeer doesn't support namespace,the information will be leaking across namespace. this patch move the global vars v4_peers and v6_peers to netns_ipv4 and netns_ipv6 as a field peers. add struct pernet_operations inetpeer_ops to initial pernet inetpeer data. and change family_to_base and inet_getpeer to support namespace. Signed-off-by: Gao feng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-12-12per-netns ipv4 sysctl_tcp_memGlauber Costa1-0/+1
This patch allows each namespace to independently set up its levels for tcp memory pressure thresholds. This patch alone does not buy much: we need to make this values per group of process somehow. This is achieved in the patches that follows in this patchset. Signed-off-by: Glauber Costa <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> CC: David S. Miller <[email protected]> CC: Eric W. Biederman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-05-13net: ipv4: add IPPROTO_ICMP socket kindVasiliy Kulikov1-0/+2
This patch adds IPPROTO_ICMP socket kind. It makes it possible to send ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages without any special privileges. In other words, the patch makes it possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In order not to increase the kernel's attack surface, the new functionality is disabled by default, but is enabled at bootup by supporting Linux distributions, optionally with restriction to a group or a group range (see below). Similar functionality is implemented in Mac OS X: http://www.manpagez.com/man/4/icmp/ A new ping socket is created with socket(PF_INET, SOCK_DGRAM, PROT_ICMP) Message identifiers (octets 4-5 of ICMP header) are interpreted as local ports. Addresses are stored in struct sockaddr_in. No port numbers are reserved for privileged processes, port 0 is reserved for API ("let the kernel pick a free number"). There is no notion of remote ports, remote port numbers provided by the user (e.g. in connect()) are ignored. Data sent and received include ICMP headers. This is deliberate to: 1) Avoid the need to transport headers values like sequence numbers by other means. 2) Make it easier to port existing programs using raw sockets. ICMP headers given to send() are checked and sanitized. The type must be ICMP_ECHO and the code must be zero (future extensions might relax this, see below). The id is set to the number (local port) of the socket, the checksum is always recomputed. ICMP reply packets received from the network are demultiplexed according to their id's, and are returned by recv() without any modifications. IP header information and ICMP errors of those packets may be obtained via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source quenches and redirects are reported as fake errors via the error queue (IP_RECVERR); the next hop address for redirects is saved to ee_info (in network order). socket(2) is restricted to the group range specified in "/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning that nobody (not even root) may create ping sockets. Setting it to "100 100" would grant permissions to the single group (to either make /sbin/ping g+s and owned by this group or to grant permissions to the "netadmins" group), "0 4294967295" would enable it for the world, "100 4294967295" would enable it for the users, but not daemons. The existing code might be (in the unlikely case anyone needs it) extended rather easily to handle other similar pairs of ICMP messages (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply etc.). Userspace ping util & patch for it: http://openwall.info/wiki/people/segoon/ping For Openwall GNU/*/Linux it was the last step on the road to the setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels) is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs: http://mirrors.kernel.org/openwall/Owl/current/iso/ Initially this functionality was written by Pavel Kankovsky for Linux 2.4.32, but unfortunately it was never made public. All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with the patch. PATCH v3: - switched to flowi4. - minor changes to be consistent with raw sockets code. PATCH v2: - changed ping_debug() to pr_debug(). - removed CONFIG_IP_PING. - removed ping_seq_fops.owner field (unused for procfs). - switched to proc_net_fops_create(). - switched to %pK in seq_printf(). PATCH v1: - fixed checksumming bug. - CAP_NET_RAW may not create icmp sockets anymore. RFC v2: - minor cleanups. - introduced sysctl'able group range to restrict socket(2). Signed-off-by: Vasiliy Kulikov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-03-24ipv4: Invalidate nexthop cache nh_saddr more correctly.David S. Miller1-0/+1
Any operation that: 1) Brings up an interface 2) Adds an IP address to an interface 3) Deletes an IP address from an interface can potentially invalidate the nh_saddr value, requiring it to be recomputed. Perform the recomputation lazily using a generation ID. Reported-by: Julian Anastasov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2011-01-14netfilter: nf_conntrack: use is_vmalloc_addr()Patrick McHardy1-1/+0
Use is_vmalloc_addr() in nf_ct_free_hashtable() and get rid of the vmalloc flags to indicate that a hash table has been allocated using vmalloc(). Signed-off-by: Patrick McHardy <[email protected]>
2010-05-08ipv4: remove ip_rt_secret timer (v4)Neil Horman1-1/+0
A while back there was a discussion regarding the rt_secret_interval timer. Given that we've had the ability to do emergency route cache rebuilds for awhile now, based on a statistical analysis of the various hash chain lengths in the cache, the use of the flush timer is somewhat redundant. This patch removes the rt_secret_interval sysctl, allowing us to rely solely on the statistical analysis mechanism to determine the need for route cache flushes. Signed-off-by: Neil Horman <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-04-13ipv4: ipmr: support multiple tablesPatrick McHardy1-0/+5
This patch adds support for multiple independant multicast routing instances, named "tables". Userspace multicast routing daemons can bind to a specific table instance by issuing a setsockopt call using a new option MRT_TABLE. The table number is stored in the raw socket data and affects all following ipmr setsockopt(), getsockopt() and ioctl() calls. By default, a single table (RT_TABLE_DEFAULT) is created with a default routing rule pointing to it. Newly created pimreg devices have the table number appended ("pimregX"), with the exception of devices created in the default table, which are named just "pimreg" for compatibility reasons. Packets are directed to a specific table instance using routing rules, similar to how regular routing rules work. Currently iif, oif and mark are supported as keys, source and destination addresses could be supported additionally. Example usage: - bind pimd/xorp/... to a specific table: uint32_t table = 123; setsockopt(fd, IPPROTO_IP, MRT_TABLE, &table, sizeof(table)); - create routing rules directing packets to the new table: # ip mrule add iif eth0 lookup 123 # ip mrule add oif eth0 lookup 123 Signed-off-by: Patrick McHardy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-04-13ipv4: ipmr: move mroute data into seperate structurePatrick McHardy1-12/+1
Signed-off-by: Patrick McHardy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-04-13ipv4: ipmr: convert struct mfc_cache to struct list_headPatrick McHardy1-2/+2
Signed-off-by: Patrick McHardy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-04-13ipv4: ipmr: move unres_queue and timer to per-namespace dataPatrick McHardy1-0/+2
The unres_queue is currently shared between all namespaces. Following patches will additionally allow to create multiple multicast routing tables in each namespace. Having a single shared queue for all these users seems to excessive, move the queue and the cleanup timer to the per-namespace data to unshare it. As a side-effect, this fixes a bug in the seq file iteration functions: the first entry returned is always from the current namespace, entries returned after that may belong to any namespace. Signed-off-by: Patrick McHardy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-02-10Merge branch 'master' of /repos/git/net-next-2.6Patrick McHardy1-0/+1
Signed-off-by: Patrick McHardy <[email protected]>
2010-02-08netfilter: nf_conntrack: fix hash resizing with namespacesPatrick McHardy1-0/+1
As noticed by Jon Masters <[email protected]>, the conntrack hash size is global and not per namespace, but modifiable at runtime through /sys/module/nf_conntrack/hashsize. Changing the hash size will only resize the hash in the current namespace however, so other namespaces will use an invalid hash size. This can cause crashes when enlarging the hashsize, or false negative lookups when shrinking it. Move the hash size into the per-namespace data and only use the global hash size to initialize the per-namespace value when instanciating a new namespace. Additionally restrict hash resizing to init_net for now as other namespaces are not handled currently. Cc: [email protected] Signed-off-by: Patrick McHardy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-01-18netfilter: netns: #ifdef ->iptable_security, ->ip6table_securityAlexey Dobriyan1-0/+2
'security' tables depend on SECURITY, so ifdef them. Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2009-01-22netns: ipmr: declare reg_vif_num per-namespaceBenjamin Thery1-0/+3
Preliminary work to make IPv4 multicast routing netns-aware. Declare variable 'reg_vif_num' per-namespace, move into struct netns_ipv4. At the moment, this variable is only referenced in init_net. Signed-off-by: Benjamin Thery <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2009-01-22netns: ipmr: declare mroute_do_assert and mroute_do_pim per-namespaceBenjamin Thery1-0/+2
Preliminary work to make IPv4 multicast routing netns-aware. Declare IPv multicast routing variables 'mroute_do_assert' and 'mroute_do_pim' per-namespace in struct netns_ipv4. At the moment, these variables are only referenced in init_net. Signed-off-by: Benjamin Thery <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2009-01-22netns: ipmr: declare counter cache_resolve_queue_len per-namespaceBenjamin Thery1-0/+1
Preliminary work to make IPv4 multicast routing netns-aware. Declare variable cache_resolve_queue_len per-namespace: move it into struct netns_ipv4. This variable counts the number of unresolved cache entries queued in the list mfc_unres_queue. This list is kept global to all netns as the number of entries per namespace is limited to 10 (hardcoded in routine ipmr_cache_unresolved). Entries belonging to different namespaces in mfc_unres_queue will be identified by matching the mfc_net member introduced previously in struct mfc_cache. Keeping this list global to all netns, also allows us to keep a single timer (ipmr_expire_timer) to handle their expiration. In some places cache_resolve_queue_len value was tested for arming or deleting the timer. These tests were equivalent to testing mfc_unres_queue value instead and are replaced in this patch. At the moment, cache_resolve_queue_len is only referenced in init_net. Signed-off-by: Benjamin Thery <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2009-01-22netns: ipmr: dynamically allocate mfc_cache_arrayBenjamin Thery1-0/+1
Preliminary work to make IPv4 multicast routing netns-aware. Dynamically allocate IPv4 multicast forwarding cache, mfc_cache_array, and move it to struct netns_ipv4. At the moment, mfc_cache_array is only referenced in init_net. Signed-off-by: Benjamin Thery <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2009-01-22netns: ipmr: dynamically allocate vif_tableBenjamin Thery1-0/+2
Preliminary work to make IPv6 multicast routing netns-aware. Dynamically allocate interface table vif_table and move it to struct netns_ipv4, and update MIF_EXISTS() macro. At the moment, vif_table is only referenced in init_net. Signed-off-by: Benjamin Thery <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2009-01-22netns: ipmr: allocate mroute_socket per-namespace.Benjamin Thery1-0/+4
Preliminary work to make IPv4 multicast routing netns-aware. Make IPv4 multicast routing mroute_socket per-namespace, moves it into struct netns_ipv4. At the moment, mroute_socket is only referenced in init_net. Signed-off-by: Benjamin Thery <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-27net: implement emergency route cache rebulds when gc_elasticity is exceededNeil Horman1-0/+2
This is a patch to provide on demand route cache rebuilding. Currently, our route cache is rebulid periodically regardless of need. This introduced unneeded periodic latency. This patch offers a better approach. Using code provided by Eric Dumazet, we compute the standard deviation of the average hash bucket chain length while running rt_check_expire. Should any given chain length grow to larger that average plus 4 standard deviations, we trigger an emergency hash table rebuild for that net namespace. This allows for the common case in which chains are well behaved and do not grow unevenly to not incur any latency at all, while those systems (which may be being maliciously attacked), only rebuild when the attack is detected. This patch take 2 other factors into account: 1) chains with multiple entries that differ by attributes that do not affect the hash value are only counted once, so as not to unduly bias system to rebuilding if features like QOS are heavily used 2) if rebuilding crosses a certain threshold (which is adjustable via the added sysctl in this patch), route caching is disabled entirely for that net namespace, since constant rebuilding is less efficient that no caching at all Tested successfully by me. Signed-off-by: Neil Horman <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-10-08netfilter: netns nat: per-netns bysource hashAlexey Dobriyan1-0/+2
Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2008-10-08netfilter: netns nat: per-netns NAT tableAlexey Dobriyan1-0/+1
Same story as with iptable_filter, iptables_raw tables. Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2008-07-05netns: place rt_genid into struct netDenis V. Lunev1-0/+1
Signed-off-by: Denis V. Lunev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-07-05netns: make rt_secret_rebuild timer per namespaceDenis V. Lunev1-0/+2
Signed-off-by: Denis V. Lunev <[email protected]> Signed-off-by: David S. Miller <[email protected]>