aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2017-02-20openvswitch: Set event bit after initializing labels.Jarno Rajahalme1-3/+6
Connlabels are included in conntrack netlink event messages only if the IPCT_LABEL bit is set in the event cache (see ctnetlink_conntrack_event()). Set it after initializing labels for a new connection. Found upon further system testing, where it was noticed that labels were missing from the conntrack events. Fixes: 193e30967897 ("openvswitch: Do not trigger events for unconfirmed connections.") Signed-off-by: Jarno Rajahalme <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-19sctp: check duplicate node before inserting a new transportXin Long1-0/+13
sctp has changed to use rhlist for transport rhashtable since commit 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable"). But rhltable_insert_key doesn't check the duplicate node when inserting a node, unlike rhashtable_lookup_insert_key. It may cause duplicate assoc/transport in rhashtable. like: client (addr A, B) server (addr X, Y) connect to X INIT (1) ------------> connect to Y INIT (2) ------------> INIT_ACK (1) <------------ INIT_ACK (2) <------------ After sending INIT (2), one transport will be created and hashed into rhashtable. But when receiving INIT_ACK (1) and processing the address params, another transport will be created and hashed into rhashtable with the same addr Y and EP as the last transport. This will confuse the assoc/transport's lookup. This patch is to fix it by returning err if any duplicate node exists before inserting it. Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable") Reported-by: Fabio M. Di Nitto <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-19sctp: add reconf chunk eventXin Long1-0/+30
This patch is to add reconf chunk event based on the sctp event frame in rx path, it will call sctp_sf_do_reconf to process the reconf chunk. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-19sctp: add reconf chunk processXin Long1-0/+54
This patch is to add a function to process the incoming reconf chunk, in which it verifies the chunk, and traverses the param and process it with the right function one by one. sctp_sf_do_reconf would be the process function of reconf chunk event. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-19sctp: add a function to verify the sctp reconf chunkXin Long1-0/+59
This patch is to add a function sctp_verify_reconf to do some length check and multi-params check for sctp stream reconf according to rfc6525 section 3.1. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-19sctp: implement receiver-side procedures for the Incoming SSN Reset Request ↵Xin Long2-7/+71
Parameter This patch is to implement Receiver-Side Procedures for the Incoming SSN Reset Request Parameter described in rfc6525 section 5.2.3. It's also to move str_list endian conversion out of sctp_make_strreset_req, so that sctp_make_strreset_req can be used more conveniently to process inreq. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-19sctp: implement receiver-side procedures for the Outgoing SSN Reset Request ↵Xin Long1-0/+113
Parameter This patch is to implement Receiver-Side Procedures for the Outgoing SSN Reset Request Parameter described in rfc6525 section 5.2.2. Note that some checks must be after request_seq check, as even those checks fail, strreset_inseq still has to be increase by 1. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-19sctp: add support for generating stream ssn reset event notificationXin Long1-0/+29
This patch is to add Stream Reset Event described in rfc6525 section 6.1.1. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-19sctp: add support for generating stream reconf resp chunkXin Long1-0/+74
This patch is to define Re-configuration Response Parameter described in rfc6525 section 4.4. As optional fields are only for SSN/TSN Reset Request Parameter, it uses another function to make that. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-30/+44
2017-02-18ipv6: release dst on error in ip6_dst_lookup_tailWillem de Bruijn1-2/+4
If ip6_dst_lookup_tail has acquired a dst and fails the IPv4-mapped check, release the dst before returning an error. Fixes: ec5e3b0a1d41 ("ipv6: Inhibit IPv4-mapped src address on the wire.") Signed-off-by: Willem de Bruijn <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17irda: Fix lockdep annotations in hashbin_delete().David S. Miller1-18/+16
A nested lock depth was added to the hasbin_delete() code but it doesn't actually work some well and results in tons of lockdep splats. Fix the code instead to properly drop the lock around the operation and just keep peeking the head of the hashbin queue. Reported-by: Dmitry Vyukov <[email protected]> Tested-by: Dmitry Vyukov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17tcp: use page_ref_inc() in tcp_sendmsg()Eric Dumazet1-1/+1
sk_page_frag_refill() allocates either a compound page or an order-0 page. We can use page_ref_inc() which is slightly faster than get_page() Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17tcp: accommodate sequence number to a peer's shrunk receive window caused by ↵Cui, Cheng1-2/+5
precision loss in window scaling Prevent sending out a left-shifted sequence number from a Linux sender in response to a peer's shrunk receive-window caused by losing least significant bits in window-scaling. Cc: "David S. Miller" <[email protected]> Cc: Alexey Kuznetsov <[email protected]> Cc: James Morris <[email protected]> Cc: Hideaki YOSHIFUJI <[email protected]> Cc: Patrick McHardy <[email protected]> Signed-off-by: Cheng Cui <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17rds:Remove unnecessary ib_ring unallocZhu Yanjun1-1/+0
In the function rds_ib_xmit_atomic, ib_ring is not allocated successfully. As such, it is not necessary to unalloc it. Cc: Joe Jin <[email protected]> Cc: Junxiao Bi <[email protected]> Signed-off-by: Zhu Yanjun <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17pkt_sched: Remove useless qdisc_stab_lockGao Feng1-12/+0
The qdisc_stab_lock is used in qdisc_get_stab and qdisc_put_stab. These two functions are invoked in qdisc_create, qdisc_change, and qdisc_destroy which run fully under RTNL. So it already makes sure only one could access the qdisc_stab_list at the same time. Then it is unnecessary to use qdisc_stab_lock now. Signed-off-by: Gao Feng <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17rxrpc: Change module filename to rxrpc.koDavid Howells1-6/+6
Change module filename from af-rxrpc.ko to rxrpc.ko so as to be consistent with the other protocol drivers. Also adjust the documentation to reflect this. Further, there is no longer a standalone rxkad module, as it has been merged into the rxrpc core, so get rid of references to that. Reported-by: Marc Dionne <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17rtnl: don't account unused struct ifla_port_vsi in rtnl_port_sizeDaniel Borkmann1-4/+7
When allocating rtnl dump messages, struct ifla_port_vsi is never dumped, so we can save header plus payload in rtnl_port_size(). Infact, attribute IFLA_PORT_VSI_TYPE and struct ifla_port_vsi are not used anywhere in the kernel. We only need to keep the nla policy should applications in user space be filling this out. Same NLA_BINARY issue exists as was fixed in 364d5716a7ad ("rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY") and others, but then again IFLA_PORT_VSI_TYPE is not used anywhere, so just add a comment that it's unused. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17bridge: don't indicate expiry on NTF_EXT_LEARNED fdb entriesRoopa Prabhu1-1/+1
added_by_external_learn fdb entries are added and expired by external entities like switchdev driver or external controllers. ageing is already disabled for such entries. Hence, don't indicate expiry for such fdb entries. CC: Nikolay Aleksandrov <[email protected]> CC: Jiri Pirko <[email protected]> CC: Ido Schimmel <[email protected]> Signed-off-by: Roopa Prabhu <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Tested-by: Ido Schimmel <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17bpf: make jited programs visible in tracesDaniel Borkmann2-1/+9
Long standing issue with JITed programs is that stack traces from function tracing check whether a given address is kernel code through {__,}kernel_text_address(), which checks for code in core kernel, modules and dynamically allocated ftrace trampolines. But what is still missing is BPF JITed programs (interpreted programs are not an issue as __bpf_prog_run() will be attributed to them), thus when a stack trace is triggered, the code walking the stack won't see any of the JITed ones. The same for address correlation done from user space via reading /proc/kallsyms. This is read by tools like perf, but the latter is also useful for permanent live tracing with eBPF itself in combination with stack maps when other eBPF types are part of the callchain. See offwaketime example on dumping stack from a map. This work tries to tackle that issue by making the addresses and symbols known to the kernel. The lookup from *kernel_text_address() is implemented through a latched RB tree that can be read under RCU in fast-path that is also shared for symbol/size/offset lookup for a specific given address in kallsyms. The slow-path iteration through all symbols in the seq file done via RCU list, which holds a tiny fraction of all exported ksyms, usually below 0.1 percent. Function symbols are exported as bpf_prog_<tag>, in order to aide debugging and attribution. This facility is currently enabled for root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening is active in any mode. The rationale behind this is that still a lot of systems ship with world read permissions on kallsyms thus addresses should not get suddenly exposed for them. If that situation gets much better in future, we always have the option to change the default on this. Likewise, unprivileged programs are not allowed to add entries there either, but that is less of a concern as most such programs types relevant in this context are for root-only anyway. If enabled, call graphs and stack traces will then show a correct attribution; one example is illustrated below, where the trace is now visible in tooling such as perf script --kallsyms=/proc/kallsyms and friends. Before: 7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux) f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so) After: 7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux) [...] 7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux) f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so) Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
2017-02-17bpf: mark all registered map/prog types as __ro_after_initDaniel Borkmann1-9/+9
All map types and prog types are registered to the BPF core through bpf_register_map_type() and bpf_register_prog_type() during init and remain unchanged thereafter. As by design we don't (and never will) have any pluggable code that can register to that at any later point in time, lets mark all the existing bpf_{map,prog}_type_list objects in the tree as __ro_after_init, so they can be moved to read-only section from then onwards. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17dccp: fix freeing skb too early for IPV6_RECVPKTINFOAndrey Konovalov1-1/+2
In the current DCCP implementation an skb for a DCCP_PKT_REQUEST packet is forcibly freed via __kfree_skb in dccp_rcv_state_process if dccp_v6_conn_request successfully returns. However, if IPV6_RECVPKTINFO is set on a socket, the address of the skb is saved to ireq->pktopts and the ref count for skb is incremented in dccp_v6_conn_request, so skb is still in use. Nevertheless, it gets freed in dccp_rcv_state_process. Fix by calling consume_skb instead of doing goto discard and therefore calling __kfree_skb. Similar fixes for TCP: fb7e2399ec17f1004c0e0ccfd17439f8759ede01 [TCP]: skb is unexpectedly freed. 0aea76d35c9651d55bbaf746e7914e5f9ae5a25d tcp: SYN packets are now simply consumed Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17bridge: vlan_tunnel: explicitly reset metadata attrs to NULL on failureRoopa Prabhu1-0/+2
Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Signed-off-by: Roopa Prabhu <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17tipc: Fix tipc_sk_reinit race conditionsHerbert Xu2-11/+23
There are two problems with the function tipc_sk_reinit. Firstly it's doing a manual walk over an rhashtable. This is broken as an rhashtable can be resized and if you manually walk over it during a resize then you may miss entries. Secondly it's missing memory barriers as previously the code used spinlocks which provide the barriers implicitly. This patch fixes both problems. Fixes: 07f6c4bc048a ("tipc: convert tipc reference table to...") Signed-off-by: Herbert Xu <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17net/sched: cls_bpf: Reflect HW offload statusOr Gerlitz1-2/+11
BPF classifier support for the "in hw" offloading flags. Signed-off-by: Or Gerlitz <[email protected]> Reviewed-by: Amir Vadai <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17net/sched: cls_u32: Reflect HW offload statusOr Gerlitz1-0/+10
U32 support for the "in hw" offloading flags. Signed-off-by: Or Gerlitz <[email protected]> Reviewed-by: Amir Vadai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17net/sched: cls_matchall: Reflect HW offloading statusOr Gerlitz1-2/+10
Matchall support for the "in hw" offloading flags. Signed-off-by: Or Gerlitz <[email protected]> Reviewed-by: Amir Vadai <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17net/sched: cls_flower: Reflect HW offload statusOr Gerlitz1-0/+5
Flower support for the "in hw" offloading flags. Signed-off-by: Or Gerlitz <[email protected]> Reviewed-by: Amir Vadai <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17net/sched: cls_matchall: Dump the classifier flagsOr Gerlitz1-0/+3
The classifier flags are not dumped to user-space, do that. Signed-off-by: Or Gerlitz <[email protected]> Acked-by: Jiri Pirko <[email protected]> Acked-by: Yotam Gigi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17net/sched: cls_flower: Properly handle classifier flags dumpingOr Gerlitz1-1/+2
Dump the classifier flags only if non zero and make sure to check the return status of the handler that puts them into the netlink msg. Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-17packet: Do not call fanout_release from atomic contextsAnoob Soman1-9/+22
Commit 6664498280cf ("packet: call fanout_release, while UNREGISTERING a netdev"), unfortunately, introduced the following issues. 1. calling mutex_lock(&fanout_mutex) (fanout_release()) from inside rcu_read-side critical section. rcu_read_lock disables preemption, most often, which prohibits calling sleeping functions. [ ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side critical section! [ ] [ ] rcu_scheduler_active = 1, debug_locks = 0 [ ] 4 locks held by ovs-vswitchd/1969: [ ] #0: (cb_lock){++++++}, at: [<ffffffff8158a6c9>] genl_rcv+0x19/0x40 [ ] #1: (ovs_mutex){+.+.+.}, at: [<ffffffffa04878ca>] ovs_vport_cmd_del+0x4a/0x100 [openvswitch] [ ] #2: (rtnl_mutex){+.+.+.}, at: [<ffffffff81564157>] rtnl_lock+0x17/0x20 [ ] #3: (rcu_read_lock){......}, at: [<ffffffff81614165>] packet_notifier+0x5/0x3f0 [ ] [ ] Call Trace: [ ] [<ffffffff813770c1>] dump_stack+0x85/0xc4 [ ] [<ffffffff810c9077>] lockdep_rcu_suspicious+0x107/0x110 [ ] [<ffffffff810a2da7>] ___might_sleep+0x57/0x210 [ ] [<ffffffff810a2fd0>] __might_sleep+0x70/0x90 [ ] [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0 [ ] [<ffffffff810de93f>] ? vprintk_default+0x1f/0x30 [ ] [<ffffffff81186e88>] ? printk+0x4d/0x4f [ ] [<ffffffff816106dd>] fanout_release+0x1d/0xe0 [ ] [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0 2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock). "sleeping function called from invalid context" [ ] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620 [ ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd [ ] INFO: lockdep is turned off. [ ] Call Trace: [ ] [<ffffffff813770c1>] dump_stack+0x85/0xc4 [ ] [<ffffffff810a2f52>] ___might_sleep+0x202/0x210 [ ] [<ffffffff810a2fd0>] __might_sleep+0x70/0x90 [ ] [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0 [ ] [<ffffffff816106dd>] fanout_release+0x1d/0xe0 [ ] [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0 3. calling dev_remove_pack(&fanout->prot_hook), from inside spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack() -> synchronize_net(), which might sleep. [ ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x00000002 [ ] INFO: lockdep is turned off. [ ] Call Trace: [ ] [<ffffffff813770c1>] dump_stack+0x85/0xc4 [ ] [<ffffffff81186274>] __schedule_bug+0x64/0x73 [ ] [<ffffffff8162b8cb>] __schedule+0x6b/0xd10 [ ] [<ffffffff8162c5db>] schedule+0x6b/0x80 [ ] [<ffffffff81630b1d>] schedule_timeout+0x38d/0x410 [ ] [<ffffffff810ea3fd>] synchronize_sched_expedited+0x53d/0x810 [ ] [<ffffffff810ea6de>] synchronize_rcu_expedited+0xe/0x10 [ ] [<ffffffff8154eab5>] synchronize_net+0x35/0x50 [ ] [<ffffffff8154eae3>] dev_remove_pack+0x13/0x20 [ ] [<ffffffff8161077e>] fanout_release+0xbe/0xe0 [ ] [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0 4. fanout_release() races with calls from different CPU. To fix the above problems, remove the call to fanout_release() under rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to __fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure fanout->prot_hook is removed as well. Fixes: 6664498280cf ("packet: call fanout_release, while UNREGISTERING a netdev") Reported-by: Eric Dumazet <[email protected]> Signed-off-by: Anoob Soman <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-16Merge branch 'master' of ↵David S. Miller21-141/+394
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-02-16 1) Make struct xfrm_input_afinfo const, nothing writes to it. From Florian Westphal. 2) Remove all places that write to the afinfo policy backend and make the struct const then. From Florian Westphal. 3) Prepare for packet consuming gro callbacks and add ESP GRO handlers. ESP packets can be decapsulated at the GRO layer then. It saves a round through the stack for each ESP packet. Please note that this has a merge coflict between commit 63fca65d0863 ("net: add confirm_neigh method to dst_ops") from net-next and 3d7d25a68ea5 ("xfrm: policy: remove garbage_collect callback") a2817d8b279b ("xfrm: policy: remove family field") from ipsec-next. The conflict can be solved as it is done in linux-next. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-02-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller11-44/+74
2017-02-15rhashtable: Revert nested table changes.David S. Miller2-23/+11
This reverts commits: 6a25478077d987edc5e2f880590a2bc5fcab4441 9dbbfb0ab6680c6a85609041011484e6658e7d3c 40137906c5f55c252194ef5834130383e639536f It's too risky to put in this late in the release cycle. We'll put these changes into the next merge window instead. Signed-off-by: David S. Miller <[email protected]>
2017-02-15openvswitch: Set internal device max mtu to ETH_MAX_MTU.Jarno Rajahalme1-0/+2
Commit 91572088e3fd ("net: use core MTU range checking in core net infra") changed the openvswitch internal device to use the core net infra for controlling the MTU range, but failed to actually set the max_mtu as described in the commit message, which now defaults to ETH_DATA_LEN. This patch fixes this by setting max_mtu to ETH_MAX_MTU after ether_setup() call. Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra") Signed-off-by: Jarno Rajahalme <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-15net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notificationMarcus Huewe1-1/+2
When setting a neigh related sysctl parameter, we always send a NETEVENT_DELAY_PROBE_TIME_UPDATE netevent. For instance, when executing sysctl net.ipv6.neigh.wlp3s0.retrans_time_ms=2000 a NETEVENT_DELAY_PROBE_TIME_UPDATE netevent is generated. This is caused by commit 2a4501ae18b5 ("neigh: Send a notification when DELAY_PROBE_TIME changes"). According to the commit's description, it was intended to generate such an event when setting the "delay_first_probe_time" sysctl parameter. In order to fix this, only generate this event when actually setting the "delay_first_probe_time" sysctl parameter. This fix should not have any unintended side-effects, because all but one registered netevent callbacks check for other netevent event types (the registered callbacks were obtained by grepping for "register_netevent_notifier"). The only callback that uses the NETEVENT_DELAY_PROBE_TIME_UPDATE event is mlxsw_sp_router_netevent_event() (in drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c): in case of this event, it only accesses the DELAY_PROBE_TIME of the passed neigh_parms. Fixes: 2a4501ae18b5 ("neigh: Send a notification when DELAY_PROBE_TIME changes") Signed-off-by: Marcus Huewe <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-15esp: Add a software GRO codepathSteffen Klassert11-7/+287
This patch adds GRO ifrastructure and callbacks for ESP on ipv4 and ipv6. In case the GRO layer detects an ESP packet, the esp{4,6}_gro_receive() function does a xfrm state lookup and calls the xfrm input layer if it finds a matching state. The packet will be decapsulated and reinjected it into layer 2. Signed-off-by: Steffen Klassert <[email protected]>
2017-02-15xfrm: Extend the sec_path for IPsec offloadingSteffen Klassert1-0/+2
We need to keep per packet offloading informations across the layers. So we extend the sec_path to carry these for the input and output offload codepath. Signed-off-by: Steffen Klassert <[email protected]>
2017-02-15xfrm: Export xfrm_parse_spi.Steffen Klassert1-0/+1
We need it in the ESP offload handlers, so export it. Signed-off-by: Steffen Klassert <[email protected]>
2017-02-15net: Prepare gro for packet consuming gro callbacksSteffen Klassert2-0/+11
The upcomming IPsec ESP gro callbacks will consume the skb, so prepare for that. Signed-off-by: Steffen Klassert <[email protected]>
2017-02-15net: Add a skb_gro_flush_final helper.Steffen Klassert3-3/+3
Add a skb_gro_flush_final helper to prepare for consuming skbs in call_gro_receive. We will extend this helper to not touch the skb if the skb is consumed by a gro callback with a followup patch. We need this to handle the upcomming IPsec ESP callbacks as they reinject the skb to the napi_gro_receive asynchronous. The handler is used in all gro_receive functions that can call the ESP gro handlers. Signed-off-by: Steffen Klassert <[email protected]>
2017-02-15xfrm: Add a secpath_set helper.Steffen Klassert2-24/+25
Add a new helper to set the secpath to the skb. This avoids code duplication, as this is used in multiple places. Signed-off-by: Steffen Klassert <[email protected]>
2017-02-14tcp: tcp_probe: use spin_lock_bh()Eric Dumazet1-2/+2
tcp_rcv_established() can now run in process context. We need to disable BH while acquiring tcp probe spinlock, or risk a deadlock. Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Ricardo Nabinger Sanchez <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-14packet: fix races in fanout_add()Eric Dumazet1-25/+30
Multiple threads can call fanout_add() at the same time. We need to grab fanout_mutex earlier to avoid races that could lead to one thread freeing po->rollover that was set by another thread. Do the same in fanout_release(), for peace of mind, and to help us finding lockdep issues earlier. Fixes: dc99f600698d ("packet: Add fanout support.") Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state") Signed-off-by: Eric Dumazet <[email protected]> Cc: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-14net_sched: nla_memdup_cookie() can be staticWei Yongjun1-1/+1
Fixes the following sparse warning: net/sched/act_api.c:532:5: warning: symbol 'nla_memdup_cookie' was not declared. Should it be static? Signed-off-by: Wei Yongjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-14kcm: fix a null pointer dereference in kcm_sendmsg()WANG Cong1-2/+4
In commit 98e3862ca2b1 ("kcm: fix 0-length case for kcm_sendmsg()") I tried to avoid skb allocation for 0-length case, but missed a check for NULL pointer in the non EOR case. Fixes: 98e3862ca2b1 ("kcm: fix 0-length case for kcm_sendmsg()") Reported-by: Dmitry Vyukov <[email protected]> Cc: Tom Herbert <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-14bridge: fdb: converge fdb_delete_by functions into oneNikolay Aleksandrov1-47/+15
We can simplify the logic of entries pointing to the bridge by converging the fdb_delete_by functions, this would allow us to use the same function for both cases since the fdb's dst is set to NULL if it is pointing to the bridge thus we can always check for a port match. Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-14bridge: fdb: add proper lock checks in searching functionsNikolay Aleksandrov2-0/+13
In order to avoid new errors add checks to br_fdb_find and fdb_find_rcu functions. The first requires hash_lock, the second obviously RCU. Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-14bridge: fdb: converge fdb searching functions into oneNikolay Aleksandrov4-70/+54
Before this patch we had 3 different fdb searching functions which was confusing. This patch reduces all of them to one - fdb_find_rcu(), and two flavors: br_fdb_find() which requires hash_lock and br_fdb_find_rcu which requires RCU. This makes it clear what needs to be used, we also remove two abusers of __br_fdb_get which called it under hash_lock and replace them with br_fdb_find(). Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-14ipv6: Handle IPv4-mapped src to in6addr_any dst.Jonathan T. Leighton3-8/+21
This patch adds a check on the type of the source address for the case where the destination address is in6addr_any. If the source is an IPv4-mapped IPv6 source address, the destination is changed to ::ffff:127.0.0.1, and otherwise the destination is changed to ::1. This is done in three locations to handle UDP calls to either connect() or sendmsg() and TCP calls to connect(). Note that udpv6_sendmsg() delays handling an in6addr_any destination until very late, so the patch only needs to handle the case where the source is an IPv4-mapped IPv6 address. Signed-off-by: Jonathan T. Leighton <[email protected]> Signed-off-by: David S. Miller <[email protected]>