aboutsummaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)AuthorFilesLines
2015-12-14Merge branch 'master' of ↵Pablo Neira Ayuso18-95/+125
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Resolve conflict between commit 264640fc2c5f4f ("ipv6: distinguish frag queues by device for multicast and link-local packets") from the net tree and commit 029f7f3b8701c ("netfilter: ipv6: nf_defrag: avoid/free clone operations") from the nf-next tree. Signed-off-by: Pablo Neira Ayuso <[email protected]> Conflicts: net/ipv6/netfilter/nf_conntrack_reasm.c
2015-12-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-0/+1
Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains Netfilter fixes for you net tree, specifically for nf_tables and nfnetlink_queue, they are: 1) Avoid a compilation warning in nfnetlink_queue that was introduced in the previous merge window with the simplification of the conntrack integration, from Arnd Bergmann. 2) nfnetlink_queue is leaking the pernet subsystem registration from a failure path, patch from Nikolay Borisov. 3) Pass down netns pointer to batch callback in nfnetlink, this is the largest patch and it is not a bugfix but it is a dependency to resolve a splat in the correct way. 4) Fix a splat due to incorrect socket memory accounting with nfnetlink skbuff clones. 5) Add missing conntrack dependencies to NFT_DUP_IPV4 and NFT_DUP_IPV6. 6) Traverse the nftables commit list in reverse order from the commit path, otherwise we crash when the user applies an incremental update via 'nft -f' that deletes an object that was just introduced in this batch, from Xin Long. Regarding the compilation warning fix, many people have sent us (and keep sending us) patches to address this, that's why I'm including this batch even if this is not critical. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-12-10netfilter: nf_dup: add missing dependencies with NF_CONNTRACKPablo Neira Ayuso1-0/+1
CONFIG_NF_CONNTRACK=m CONFIG_NF_DUP_IPV4=y results in: net/built-in.o: In function `nf_dup_ipv4': >> (.text+0xd434f): undefined reference to `nf_conntrack_untracked' Reported-by: kbuild test robot <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-12-09netfilter: ipv6: nf_defrag: fix NULL deref panicFlorian Westphal1-5/+8
Valdis reports NULL deref in nf_ct_frag6_gather. Problem is bogus use of skb_queue_walk() -- we miss first skb in the list since we start with head->next instead of head. In case the element we're looking for was head->next we won't find a result and then trip over NULL iter. (defrag uses plain NULL-terminated list rather than one terminated by head-of-list-pointer, which is what skb_queue_walk expects). Fixes: 029f7f3b8701cc7a ("netfilter: ipv6: nf_defrag: avoid/free clone operations") Reported-by: Valdis Kletnieks <[email protected]> Tested-by: Valdis Kletnieks <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-12-05ipv6: keep existing flags when setting IFA_F_OPTIMISTICBjørn Mork1-1/+1
Commit 64236f3f3d74 ("ipv6: introduce IFA_F_STABLE_PRIVACY flag") failed to update the setting of the IFA_F_OPTIMISTIC flag, causing the IFA_F_STABLE_PRIVACY flag to be lost if IFA_F_OPTIMISTIC is set. Cc: Erik Kline <[email protected]> Cc: Fernando Gont <[email protected]> Cc: Lorenzo Colitti <[email protected]> Cc: YOSHIFUJI Hideaki/吉藤英明 <[email protected]> Cc: Hannes Frederic Sowa <[email protected]> Fixes: 64236f3f3d74 ("ipv6: introduce IFA_F_STABLE_PRIVACY flag") Signed-off-by: Bjørn Mork <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-05ipv6: Only act upon NETDEV_*_TYPE_CHANGE if we have ipv6 addressesAndrew Lunn1-1/+2
An interface changing type may not have IPv6 addresses. Don't call the address configuration type change in this case. Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-04gre6: allow to update all parameters via rtnlNicolas Dichtel1-5/+3
Parameters were updated only if the kernel was unable to find the tunnel with the new parameters, ie only if core pamareters were updated (keys, addr, link, type). Now it's possible to update ttl, hoplimit, flowinfo and flags. Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller17-88/+102
Conflicts: drivers/net/ethernet/renesas/ravb_main.c kernel/bpf/syscall.c net/ipv4/ipmr.c All three conflicts were cases of overlapping changes. Signed-off-by: David S. Miller <[email protected]>
2015-12-03net: ipv6: restrict hop_limit sysctl setting to range [1; 255]Phil Sutter1-1/+15
Setting a value bigger than 255 resulted in using only the lower eight bits of that value as it is assigned to the u8 header field. To avoid this unexpected result, reject such values. Setting a value of zero is technically possible, but hosts receiving such a packet have to treat it like hop_limit was set to one, according to RFC2460. Therefore I don't see a use-case for that. Setting a route's hop_limit to zero in iproute2 means to use the sysctl default, which is not the case here: Setting e.g. net.conf.eth0.hop_limit=0 will not make the kernel use net.conf.all.hop_limit for outgoing packets on eth0. To avoid these kinds of confusion, reject zero. Signed-off-by: Phil Sutter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-03ipv6: kill sk_dst_lockEric Dumazet4-26/+4
While testing the np->opt RCU conversion, I found that UDP/IPv6 was using a mixture of xchg() and sk_dst_lock to protect concurrent changes to sk->sk_dst_cache, leading to possible corruptions and crashes. ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest way to fix the mess is to remove sk_dst_lock completely, as we did for IPv4. __ip6_dst_store() and ip6_dst_store() share same implementation. sk_setup_caps() being called with socket lock being held or not, we have to use sk_dst_set() instead of __sk_dst_set() Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);" in ip6_dst_store() before the sk_setup_caps(sk, dst) call. This is because ip6_dst_store() can be called from process context, without any lock held. As soon as the dst is installed in sk->sk_dst_cache, dst can be freed from another cpu doing a concurrent ip6_dst_store() Doing the dst dereference before doing the install is needed to make sure no use after free would trigger. Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-02tcp: suppress too verbose messages in tcp_send_ack()Eric Dumazet1-3/+3
If tcp_send_ack() can not allocate skb, we properly handle this and setup a timer to try later. Use __GFP_NOWARN to avoid polluting syslog in the case host is under memory pressure, so that pertinent messages are not lost under a flood of useless information. sk_gfp_atomic() can use its gfp_mask argument (all callers currently were using GFP_ATOMIC before this patch) We rename sk_gfp_atomic() to sk_gfp_mask() to clearly express this function now takes into account its second argument (gfp_mask) Note that when tcp_transmit_skb() is called with clone_it set to false, we do not attempt memory allocations, so can pass a 0 gfp_mask, which most compilers can emit faster than a non zero or constant value. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-02ipv6: add complete rcu protection around np->optEric Dumazet9-36/+74
This patch addresses multiple problems : UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions while socket is not locked : Other threads can change np->opt concurrently. Dmitry posted a syzkaller (http://github.com/google/syzkaller) program desmonstrating use-after-free. Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock() and dccp_v6_request_recv_sock() also need to use RCU protection to dereference np->opt once (before calling ipv6_dup_options()) This patch adds full RCU protection to np->opt Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-01Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests"Nicolas Dichtel3-9/+5
This reverts commit ab450605b35caa768ca33e86db9403229bf42be4. In IPv6, we cannot inherit the dst of the original dst. ndisc packets are IPv6 packets and may take another route than the original packet. This patch breaks the following scenario: a packet comes from eth0 and is forwarded through vxlan1. The encapsulated packet triggers an NS which cannot be sent because of the wrong route. CC: Jiri Benc <[email protected]> CC: Thomas Graf <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-30net: remove unnecessary mroute.h includesNikolay Aleksandrov1-1/+0
It looks like many files are including mroute.h unnecessarily, so remove the include. Most importantly remove it from ipv6. CC: Hideaki YOSHIFUJI <[email protected]> CC: Steffen Klassert <[email protected]> CC: Herbert Xu <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-24net: ipmr, ip6mr: fix vif/tunnel failure race conditionNikolay Aleksandrov1-4/+0
Since (at least) commit b17a7c179dd3 ("[NET]: Do sysfs registration as part of register_netdevice."), netdev_run_todo() deals only with unregistration, so we don't need to do the rtnl_unlock/lock cycle to finish registration when failing pimreg or dvmrp device creation. In fact that opens a race condition where someone can delete the device while rtnl is unlocked because it's fully registered. The problem gets worse when netlink support is introduced as there are more points of entry that can cause it and it also makes reusing that code correctly impossible. Signed-off-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-24ipv6: distinguish frag queues by device for multicast and link-local packetsMichal Kubeček2-5/+10
If a fragmented multicast packet is received on an ethernet device which has an active macvlan on top of it, each fragment is duplicated and received both on the underlying device and the macvlan. If some fragments for macvlan are processed before the whole packet for the underlying device is reassembled, the "overlapping fragments" test in ip6_frag_queue() discards the whole fragment queue. To resolve this, add device ifindex to the search key and require it to match reassembling multicast packets and packets to link-local addresses. Note: similar patch has been already submitted by Yoshifuji Hideaki in http://patchwork.ozlabs.org/patch/220979/ but got lost and forgotten for some reason. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-23netfilter: ipv6: avoid nf_iterate recursionFlorian Westphal2-49/+36
The previous patch changed nf_ct_frag6_gather() to morph reassembled skb with the previous one. This means that the return value is always NULL or the skb argument. So change it to an err value. Instead of invoking NF_HOOK recursively with threshold to skip already-called hooks we can now just return NF_ACCEPT to move on to the next hook except for -EINPROGRESS (which means skb has been queued for reassembly), in which case we return NF_STOLEN. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-11-23netfilter: ipv6: nf_defrag: avoid/free clone operationsFlorian Westphal2-71/+40
commit 6aafeef03b9d9ecf ("netfilter: push reasm skb through instead of original frag skbs") changed ipv6 defrag to not use the original skbs anymore. So rather than keeping the original skbs around just to discard them afterwards just use the original skbs directly for the fraglist of the newly assembled skb and remove the extra clone/free operations. The skb that completes the fragment queue is morphed into a the reassembled one instead, just like ipv4 defrag. openvswitch doesn't need any additional skb_morph magic anymore to deal with this situation so just remove that. A followup patch can then also remove the NF_HOOK (re)invocation in the ipv6 netfilter defrag hook. Cc: Joe Stringer <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-11-23netfilter: remove duplicate includestephen hemminger1-1/+0
Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-11-22net: ip6mr: fix static mfc/dev leaks on table destructionNikolay Aleksandrov1-7/+8
Similar to ipv4, when destroying an mrt table the static mfc entries and the static devices are kept, which leads to devices that can never be destroyed (because of refcnt taken) and leaked memory. Make sure that everything is cleaned up on netns destruction. Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code") CC: Benjamin Thery <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-22net: IPv6 fib lookup tracepointDavid Ahern1-0/+10
Add tracepoint to show fib6 table lookups and result. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-18net/ip6_tunnel: fix dst leakPaolo Abeni1-1/+1
the commit cdf3464e6c6b ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel") introduced percpu storage for ip6_tunnel dst cache, but while clearing such cache it used raw_cpu_ptr to walk the per cpu entries, so cached dst on non current cpu are not actually reset. This patch replaces raw_cpu_ptr with per_cpu_ptr, properly cleaning such storage. Fixes: cdf3464e6c6b ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel") Signed-off-by: Paolo Abeni <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-16snmp: Remove duplicate OUTMCAST stat incrementNeil Horman1-2/+0
the OUTMCAST stat is double incremented, getting bumped once in the mcast code itself, and again in the common ip output path. Remove the mcast bump, as its not needed Validated by the reporter, with good results Signed-off-by: Neil Horman <[email protected]> Reported-by: Claus Jensen <[email protected]> CC: Claus Jensen <[email protected]> CC: David Miller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-15tcp: ensure proper barriers in lockless contextsEric Dumazet1-4/+15
Some functions access TCP sockets without holding a lock and might output non consistent data, depending on compiler and or architecture. tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ... Introduce sk_state_load() and sk_state_store() to fix the issues, and more clearly document where this lack of locking is happening. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-15ipv6: Check rt->dst.from for the DST_NOCACHE routeMartin KaFai Lau1-1/+2
All DST_NOCACHE rt6_info used to have rt->dst.from set to its parent. After commit 8e3d5be73681 ("ipv6: Avoid double dst_free"), DST_NOCACHE is also set to rt6_info which does not have a parent (i.e. rt->dst.from is NULL). This patch catches the rt->dst.from == NULL case. Fixes: 8e3d5be73681 ("ipv6: Avoid double dst_free") Signed-off-by: Martin KaFai Lau <[email protected]> Cc: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-15ipv6: Check expire on DST_NOCACHE routeMartin KaFai Lau1-1/+10
Since the expires of the DST_NOCACHE rt can be set during the ip6_rt_update_pmtu(), we also need to consider the expires value when doing ip6_dst_check(). This patches creates __rt6_check_expired() to only check the expire value (if one exists) of the current rt. In rt6_dst_from_check(), it adds __rt6_check_expired() as one of the condition check. Signed-off-by: Martin KaFai Lau <[email protected]> Cc: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-15ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 treeMartin KaFai Lau1-1/+7
The original bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1272571 The setup has a IPv4 GRE tunnel running in a IPSec. The bug happens when ndisc starts sending router solicitation at the gre interface. The simplified oops stack is like: __lock_acquire+0x1b2/0x1c30 lock_acquire+0xb9/0x140 _raw_write_lock_bh+0x3f/0x50 __ip6_ins_rt+0x2e/0x60 ip6_ins_rt+0x49/0x50 ~~~~~~~~ __ip6_rt_update_pmtu.part.54+0x145/0x250 ip6_rt_update_pmtu+0x2e/0x40 ~~~~~~~~ ip_tunnel_xmit+0x1f1/0xf40 __gre_xmit+0x7a/0x90 ipgre_xmit+0x15a/0x220 dev_hard_start_xmit+0x2bd/0x480 __dev_queue_xmit+0x696/0x730 dev_queue_xmit+0x10/0x20 neigh_direct_output+0x11/0x20 ip6_finish_output2+0x21f/0x770 ip6_finish_output+0xa7/0x1d0 ip6_output+0x56/0x190 ~~~~~~~~ ndisc_send_skb+0x1d9/0x400 ndisc_send_rs+0x88/0xc0 ~~~~~~~~ The rt passed to ip6_rt_update_pmtu() is created by icmp6_dst_alloc() and it is not managed by the fib6 tree, so its rt6i_table == NULL. When __ip6_rt_update_pmtu() creates a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL and it causes the ip6_ins_rt() oops. During pmtu update, we only want to create a RTF_CACHE clone from a rt which is currently managed (or owned) by the fib6 tree. It means either rt->rt6i_node != NULL or rt is a RTF_PCPU clone. It is worth to note that rt6i_table may not be NULL even it is not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()). Hence, rt6i_node is a better check instead of rt6i_table. Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu") Signed-off-by: Martin KaFai Lau <[email protected]> Reported-by: Chris Siebenmann <[email protected]> Cc: Chris Siebenmann <[email protected]> Cc: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-05tcp: use correct req pointer in tcp_move_syn() callsEric Dumazet1-1/+1
I mistakenly took wrong request sock pointer when calling tcp_move_syn() @req_unhash is either a copy of @req, or a NULL value for FastOpen connexions (as we do not expect to unhash the temporary request sock from ehash table) Fixes: 805c4bc05705 ("tcp: fix req->saved_syn race") Signed-off-by: Eric Dumazet <[email protected]> Cc: Ying Cai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-05tcp: fix req->saved_syn raceEric Dumazet1-8/+12
For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix ireq->pktopts race"), we need to make sure we do not access req->saved_syn unless we own the request sock. This fixes races for listeners using TCP_SAVE_SYN option. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Ying Cai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-04ipv6: clean up dev_snmp6 proc entry when we fail to initialize inet6_devSabrina Dubroca1-0/+1
In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up the dev_snmp6 entry that we have already registered for this device. Call snmp6_unregister_dev in this case. Fixes: a317a2f19da7d ("ipv6: fail early when creating netdev named all or default") Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-25/+16
Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were fixing the "BH-ness" of the counter bumps whilst in 'net-next' the functions were modified to take an explicit 'net' parameter. Signed-off-by: David S. Miller <[email protected]>
2015-11-03ipv6: fix tunnel error handlingMichal Kubeček1-1/+11
Both tunnel6_protocol and tunnel46_protocol share the same error handler, tunnel6_err(), which traverses through tunnel6_handlers list. For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g. in tunnel46_rcv(). Current code can generate an ICMPv6 error message with an IPv4 packet embedded in it. Fixes: 73d605d1abbd ("[IPSEC]: changing API of xfrm6_tunnel_register") Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-03xfrm: dst_entries_init() per-net dst_opsDan Streetman1-15/+38
Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops templates; their dst_entries counters will never be used. Move the xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init and dst_entries_destroy for each net namespace. The ipv4 and ipv6 xfrms each create dst_ops template, and perform dst_entries_init on the templates. The template values are copied to each net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops pcpuc_entries field is a percpu counter and cannot be used correctly by simply copying it to another object. The result of this is a very subtle bug; changes to the dst entries counter from one net namespace may sometimes get applied to a different net namespace dst entries counter. This is because of how the percpu counter works; it has a main count field as well as a pointer to the percpu variables. Each net namespace maintains its own main count variable, but all point to one set of percpu variables. When any net namespace happens to change one of the percpu variables to outside its small batch range, its count is moved to the net namespace's main count variable. So with multiple net namespaces operating concurrently, the dst_ops entries counter can stray from the actual value that it should be; if counts are consistently moved from one net namespace to another (which my testing showed is likely), then one net namespace winds up with a negative dst_ops count while another winds up with a continually increasing count, eventually reaching its gc_thresh limit, which causes all new traffic on the net namespace to fail with -ENOBUFS. Signed-off-by: Dan Streetman <[email protected]> Signed-off-by: Dan Streetman <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2015-11-02sit: fix sit0 percpu double allocationsEric Dumazet1-22/+4
sit0 device allocates its percpu storage twice : - One time in ipip6_tunnel_init() - One time in ipip6_fb_tunnel_init() Thus we leak 48 bytes per possible cpu per network namespace dismantle. ipip6_fb_tunnel_init() can be much simpler and does not return an error, and should be called after register_netdev() Note that ipip6_tunnel_clone_6rd() also needs to be called after register_netdev() (calling ipip6_tunnel_init()) Fixes: ebe084aafb7e ("sit: Use ipip6_tunnel_init as the ndo_init function.") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Cc: Steffen Klassert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02net: fix percpu memory leaksEric Dumazet2-6/+18
This patch fixes following problems : 1) percpu_counter_init() can return an error, therefore init_frag_mem_limit() must propagate this error so that inet_frags_init_net() can do the same up to its callers. 2) If ip[46]_frags_ns_ctl_register() fail, we must unwind properly and free the percpu_counter. Without this fix, we leave freed object in percpu_counters global list (if CONFIG_HOTPLUG_CPU) leading to crashes. This bug was detected by KASAN and syzkaller tool (http://github.com/google/syzkaller) Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Cc: Hannes Frederic Sowa <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02ipv6: fix crash on ICMPv6 redirects with prohibited/blackholed sourceMatthias Schiffer1-2/+1
There are other error values besides ip6_null_entry that can be returned by ip6_route_redirect(): fib6_rule_action() can also result in ip6_blk_hole_entry and ip6_prohibit_entry if such ip rules are installed. Only checking for ip6_null_entry in rt6_do_redirect() causes ip6_ins_rt() to be called with rt->rt6i_table == NULL in these cases, making the kernel crash. Signed-off-by: Matthias Schiffer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-02tcp/dccp: fix ireq->pktopts raceEric Dumazet1-9/+9
IPv6 request sockets store a pointer to skb containing the SYN packet to be able to transfer it to full blown socket when 3WHS is done (ireq->pktopts -> np->pktoptions) As explained in commit 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions"), we must transfer the skb only if we won the hashdance race, if multiple cpus receive the 'ack' packet completing 3WHS at the same time. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-01ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragmentHannes Frederic Sowa1-4/+4
CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one of those warn about them once and handle them gracefully by recalculating the checksum. Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <[email protected]> Cc: Vlad Yasevich <[email protected]> Cc: Benjamin Coddington <[email protected]> Cc: Tom Herbert <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-01ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked socketsHannes Frederic Sowa1-37/+33
We cannot reliable calculate packet size on MSG_MORE corked sockets and thus cannot decide if they are going to be fragmented later on, so better not use CHECKSUM_PARTIAL in the first place. The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in the existence of IPv6 extension headers, but the condition was wrong. Fix it up, too. Also the condition to check whether the packet fits into one fragment was wrong and has been corrected. Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <[email protected]> Cc: Vlad Yasevich <[email protected]> Cc: Benjamin Coddington <[email protected]> Cc: Tom Herbert <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-4/+3
2015-10-30Merge branch 'master' of ↵David S. Miller1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-10-30 1) The flow cache is limited by the flow cache limit which depends on the number of cpus and the xfrm garbage collector threshold which is independent of the number of cpus. This leads to the fact that on systems with more than 16 cpus we hit the xfrm garbage collector limit and refuse new allocations, so new flows are dropped. On systems with 16 or less cpus, we hit the flowcache limit. In this case, we shrink the flow cache instead of refusing new flows. We increase the xfrm garbage collector threshold to INT_MAX to get the same behaviour, independent of the number of cpus. 2) Fix some unaligned accesses on sparc systems. From Sowmini Varadhan. 3) Fix some header checks in _decode_session4. We may call pskb_may_pull with a negative value converted to unsigened int from pskb_may_pull. This can lead to incorrect policy lookups. We fix this by a check of the data pointer position before we call pskb_may_pull. 4) Reload skb header pointers after calling pskb_may_pull in _decode_session4 as this may change the pointers into the packet. 5) Add a missing statistic counter on inner mode errors. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-10-30ipv6: recreate ipv6 link-local addresses when increasing MTU over IPV6_MIN_MTUAlexander Duyck1-19/+27
This change makes it so that we reinitialize the interface if the MTU is increased back above IPV6_MIN_MTU and the interface is up. Cc: Hannes Frederic Sowa <[email protected]> Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-10-29ipv6: protect mtu calculation of wrap-around and infinite loop by rounding ↵Hannes Frederic Sowa1-0/+2
issues Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. This is the second version of the patch which doesn't use the overflow_usub function, which got reverted for now. Suggested-by: Linus Torvalds <[email protected]> Cc: Linus Torvalds <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-10-29Revert "Merge branch 'ipv6-overflow-arith'"Hannes Frederic Sowa1-5/+1
Linus dislikes these changes. To not hold up the net-merge let's revert it for now and fix the bug like Linus suggested. This reverts commit ec3661b42257d9a06cf0d318175623ac7a660113, reversing changes made to c80dbe04612986fd6104b4a1be21681b113b5ac9. Cc: Linus Torvalds <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-10-27ipv6: Export nf_ct_frag6_consume_orig()Joe Stringer1-0/+1
This is needed in openvswitch to fix an skb leak in the next patch. Signed-off-by: Joe Stringer <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-10-26ipv6: icmp: include addresses in debug messagesBjørn Mork1-4/+8
Messages like "icmp6_send: no reply to icmp error" are close to useless. Adding source and destination addresses to provide some more clue. Signed-off-by: Bjørn Mork <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-10-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller7-16/+52
Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflicts was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <[email protected]>
2015-10-23tcp/dccp: fix hashdance race for passive sessionsEric Dumazet1-3/+6
Multiple cpus can process duplicates of incoming ACK messages matching a SYN_RECV request socket. This is a rare event under normal operations, but definitely can happen. Only one must win the race, otherwise corruption would occur. To fix this without adding new atomic ops, we use logic in inet_ehash_nolisten() to detect the request was present in the same ehash bucket where we try to insert the new child. If request socket was not found, we have to undo the child creation. This actually removes a spin_lock()/spin_unlock() pair in reqsk_queue_unlink() for the fast path. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-10-23ipv6: protect mtu calculation of wrap-around and infinite loop by rounding ↵Hannes Frederic Sowa1-1/+5
issues Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. Reported-by: Dmitry Vyukov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-10-23ipv6: fix the incorrect return value of throw routelucien2-5/+26
The error condition -EAGAIN, which is signaled by throw routes, tells the rules framework to walk on searching for next matches. If the walk ends and we stop walking the rules with the result of a throw route we have to translate the error conditions to -ENETUNREACH. Signed-off-by: Xin Long <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>