aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2016-02-18tcp/dccp: fix another race at listener dismantleEric Dumazet5-35/+35
Ilya reported following lockdep splat: kernel: ========================= kernel: [ BUG: held lock freed! ] kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted kernel: ------------------------- kernel: swapper/5/0 is freeing memory ffff880035c9d200-ffff880035c9dbff, with a lock still held there! kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0 kernel: 4 locks held by swapper/5/0: kernel: #0: (rcu_read_lock){......}, at: [<ffffffff8169ef6b>] netif_receive_skb_internal+0x4b/0x1f0 kernel: #1: (rcu_read_lock){......}, at: [<ffffffff816e977f>] ip_local_deliver_finish+0x3f/0x380 kernel: #2: (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>] sk_clone_lock+0x19b/0x440 kernel: #3: (&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0 To properly fix this issue, inet_csk_reqsk_queue_add() needs to return to its callers if the child as been queued into accept queue. We also need to make sure listener is still there before calling sk->sk_data_ready(), by holding a reference on it, since the reference carried by the child can disappear as soon as the child is put on accept queue. Reported-by: Ilya Dryomov <[email protected]> Fixes: ebb516af60e1 ("tcp/dccp: fix race at listener dismantle phase") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-18route: check and remove route cache when we get routeXin Long1-14/+63
Since the gc of ipv4 route was removed, the route cached would has no chance to be removed, and even it has been timeout, it still could be used, cause no code to check it's expires. Fix this issue by checking and removing route cache when we get route. Signed-off-by: Xin Long <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-18net_sched: Improve readability of filter processingJamal Hadi Salim1-1/+1
Signed-off-by: Jamal Hadi Salim <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-18bridge: switchdev: Offload VLAN flags to hardware bridgeIdo Schimmel1-0/+11
When VLANs are created / destroyed on a VLAN filtering bridge (MASTER flag set), the configuration is passed down to the hardware. However, when only the flags (e.g. PVID) are toggled, the configuration is done in the software bridge alone. While it is possible to pass these flags to hardware when invoked with the SELF flag set, this creates inconsistency with regards to the way the VLANs are initially configured. Pass the flags down to the hardware even when the VLAN already exists and only the flags are toggled. Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-18net_sched fix: reclassification needs to consider ether protocol changesJamal Hadi Salim1-0/+1
actions could change the etherproto in particular with ethernet tunnelled data. Typically such actions, after peeling the outer header, will ask for the packet to be reclassified. We then need to restart the classification with the new proto header. Example setup used to catch this: sudo tc qdisc add dev $ETH ingress sudo $TC filter add dev $ETH parent ffff: pref 1 protocol 802.1Q \ u32 match u32 0 0 flowid 1:1 \ action vlan pop reclassify Fixes: 3b3ae880266d ("net: sched: consolidate tc_classify{,_compat}") Signed-off-by: Jamal Hadi Salim <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-18net-sysfs: remove unused fmt_long_hexColin Ian King1-1/+0
Ever since commit 04ed3e741d0f133e02bed7fa5c98edba128f90e7 ("net: change netdev->features to u32") the format string fmt_long_hex has not been used, so we may as well remove it. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17tcp: correctly crypto_alloc_hash return checkInsu Yun1-1/+1
crypto_alloc_hash never returns NULL Signed-off-by: Insu Yun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17vlan: change return type of vlan_proc_rem_devZhang Shengju2-4/+3
Since function vlan_proc_rem_dev() will only return 0, it's better to return void instead of int. Signed-off-by: Zhang Shengju <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17net: dsa: Unregister slave_dev in error pathFlorian Fainelli1-0/+1
With commit 0071f56e46da ("dsa: Register netdev before phy"), we are now trying to free a network device that has been previously registered, and in case of errors, this will make us hit the BUG_ON(dev->reg_state != NETREG_UNREGISTERED) condition. Fix this by adding a missing unregister_netdev() before free_netdev(). Fixes: 0071f56e46da ("dsa: Register netdev before phy") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-18netfilter: ipvs: avoid unused variable warningsArnd Bergmann2-15/+8
The proc_create() and remove_proc_entry() functions do not reference their arguments when CONFIG_PROC_FS is disabled, so we get a couple of warnings about unused variables in IPVS: ipvs/ip_vs_app.c:608:14: warning: unused variable 'net' [-Wunused-variable] ipvs/ip_vs_ctl.c:3950:14: warning: unused variable 'net' [-Wunused-variable] ipvs/ip_vs_ctl.c:3994:14: warning: unused variable 'net' [-Wunused-variable] This removes the local variables and instead looks them up separately for each use, which obviously avoids the warning. Signed-off-by: Arnd Bergmann <[email protected]> Fixes: 4c50a8ce2b63 ("netfilter: ipvs: avoid unused variable warning") Acked-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2016-02-18netfilter: ipvs: Remove noisy debug print from ip_vs_del_serviceYannick Brosseau1-2/+0
This have been there for a long time, but does not seem to add value Signed-off-by: Yannick Brosseau <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2016-02-17ipv4: Remove inet_lro libraryBen Hutchings3-383/+0
There are no longer any in-tree drivers that use it. Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17af_llc: fix types on llc_ui_wait_for_connOne Thousand Gnomes1-2/+2
The timeout is a long, we return it truncated if it is huge. Basically harmless as the only caller does a boolean check, but tidy it up anyway. (64bit build tested this time. Thank you 0day) Signed-off-by: Alan Cox <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17sctp: remove the unused sctp_datamsg_free()Xin Long1-13/+0
Since commit 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg") used sctp_datamsg_put in sctp_sendmsg, instead of sctp_datamsg_free, this function has no use in sctp. So we will remove it. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17sctp: remove rcu_read_lock in sctp_seq_dump_remote_addrs()Xin Long1-2/+0
sctp_seq_dump_remote_addrs is only called by sctp_assocs_seq_show() and it has been protected by rcu_read_lock that is from rhashtable_walk_start(). So we will remove this one. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17sctp: move rcu_read_lock from __sctp_lookup_association to ↵Xin Long1-2/+2
sctp_lookup_association __sctp_lookup_association() is only invoked by sctp_v4_err() and sctp_rcv(), both which run on the rx BH, and it has been protected by rcu_read_lock [see ip_local_deliver_finish() / ipv6_rcv()]. So we can move it to sctp_lookup_association, only let sctp_lookup_association use rcu_read_lock. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17l2tp: Fix error creating L2TP tunnelsMark Tomlinson1-4/+14
A previous commit (33f72e6) added notification via netlink for tunnels when created/modified/deleted. If the notification returned an error, this error was returned from the tunnel function. If there were no listeners, the error code ESRCH was returned, even though having no listeners is not an error. Other calls to this and other similar notification functions either ignore the error code, or filter ESRCH. This patch checks for ESRCH and does not flag this as an error. Reviewed-by: Hamish Martin <[email protected]> Signed-off-by: Mark Tomlinson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17core: remove unneded headers for net cgroup controllers.Rosen, Rami2-2/+0
commit 3ed80a6 (cgroup: drop module support) made including module.h redundant in the net cgroup controllers, netclassid_cgroup.c and netprio_cgroup.c. This patch removes them. Signed-off-by: Rami Rosen <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17auth_gss: fix panic in gss_pipe_downcall() in fips modeScott Mayhew1-1/+1
On Mon, 15 Feb 2016, Trond Myklebust wrote: > Hi Scott, > > On Mon, Feb 15, 2016 at 2:28 PM, Scott Mayhew <[email protected]> wrote: > > md5 is disabled in fips mode, and attempting to import a gss context > > using md5 while in fips mode will result in crypto_alg_mod_lookup() > > returning -ENOENT, which will make its way back up to > > gss_pipe_downcall(), where the BUG() is triggered. Handling the -ENOENT > > allows for a more graceful failure. > > > > Signed-off-by: Scott Mayhew <[email protected]> > > --- > > net/sunrpc/auth_gss/auth_gss.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c > > index 799e65b..c30fc3b 100644 > > --- a/net/sunrpc/auth_gss/auth_gss.c > > +++ b/net/sunrpc/auth_gss/auth_gss.c > > @@ -737,6 +737,9 @@ gss_pipe_downcall(struct file *filp, const char __user *src, size_t mlen) > > case -ENOSYS: > > gss_msg->msg.errno = -EAGAIN; > > break; > > + case -ENOENT: > > + gss_msg->msg.errno = -EPROTONOSUPPORT; > > + break; > > default: > > printk(KERN_CRIT "%s: bad return from " > > "gss_fill_context: %zd\n", __func__, err); > > -- > > 2.4.3 > > > > Well debugged, but I unfortunately do have to ask if this patch is > sufficient? In addition to -ENOENT, and -ENOMEM, it looks to me as if > crypto_alg_mod_lookup() can also fail with -EINTR, -ETIMEDOUT, and > -EAGAIN. Don't we also want to handle those? You're right, I was focusing on the panic that I could easily reproduce. I'm still not sure how I could trigger those other conditions. > > In fact, peering into the rats nest that is > gss_import_sec_context_kerberos(), it looks as if that is just a tiny > subset of all the errors that we might run into. Perhaps the right > thing to do here is to get rid of the BUG() (but keep the above > printk) and just return a generic error? That sounds fine to me -- updated patch attached. -Scott >From d54c6b64a107a90a38cab97577de05f9a4625052 Mon Sep 17 00:00:00 2001 From: Scott Mayhew <[email protected]> Date: Mon, 15 Feb 2016 15:12:19 -0500 Subject: [PATCH] auth_gss: remove the BUG() from gss_pipe_downcall() Instead return a generic error via gss_msg->msg.errno. None of the errors returned by gss_fill_context() should necessarily trigger a kernel panic. Signed-off-by: Scott Mayhew <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2016-02-17xprtrdma: rpcrdma_bc_receive_call() should init rq_private_buf.lenChuck Lever1-0/+2
Some NFSv4.1 OPEN requests were hanging waiting for the NFS server to finish recalling delegations. Turns out that each NFSv4.1 CB request on RDMA gets a GARBAGE_ARGS reply from the Linux client. Commit 756b9b37cfb2e3dc added a line in bc_svc_process that overwrites the incoming rq_rcv_buf's length with the value in rq_private_buf.len. But rpcrdma_bc_receive_call() does not invoke xprt_complete_bc_request(), thus rq_private_buf.len is not initialized. svc_process_common() is invoked with a zero-length RPC message, and fails. Fixes: 756b9b37cfb2e3dc ('SUNRPC: Fix callback channel') Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-02-17net: add tc offload feature flagJohn Fastabend1-0/+1
Its useful to turn off the qdisc offload feature at a per device level. This gives us a big hammer to enable/disable offloading. More fine grained control (i.e. per rule) may be supported later. Signed-off-by: John Fastabend <[email protected]> Acked-by: Jiri Pirko <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17net: sched: add cls_u32 offload hooks for netdevsJohn Fastabend1-2/+97
This patch allows netdev drivers to consume cls_u32 offloads via the ndo_setup_tc ndo op. This works aligns with how network drivers have been doing qdisc offloads for mqprio. Signed-off-by: John Fastabend <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17net: rework setup_tc ndo op to consume general tc operandJohn Fastabend1-3/+6
This patch updates setup_tc so we can pass additional parameters into the ndo op in a generic way. To do this we provide structured union and type flag. This lets each classifier and qdisc provide its own set of attributes without having to add new ndo ops or grow the signature of the callback. Signed-off-by: John Fastabend <[email protected]> Acked-by: Jiri Pirko <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-17net: rework ndo tc op to consume additional qdisc handle parameterJohn Fastabend1-2/+3
The ndo_setup_tc() op was added to support drivers offloading tx qdiscs however only support for mqprio was ever added. So we only ever added support for passing the number of traffic classes to the driver. This patch generalizes the ndo_setup_tc op so that a handle can be provided to indicate if the offload is for ingress or egress or potentially even child qdiscs. CC: Murali Karicheri <[email protected]> CC: Shradha Shah <[email protected]> CC: Or Gerlitz <[email protected]> CC: Ariel Elior <[email protected]> CC: Jeff Kirsher <[email protected]> CC: Bruce Allan <[email protected]> CC: Jesse Brandeburg <[email protected]> CC: Don Skidmore <[email protected]> Signed-off-by: John Fastabend <[email protected]> Acked-by: Jiri Pirko <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16net: Export ip fragment sysctl to unprivileged usersNikolay Borisov1-4/+0
Now that all the ip fragmentation related sysctls are namespaceified there is no reason to hide them anymore from "root" users inside containers. Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16ipv4: namespacify ip fragment max dist sysctl knobNikolay Borisov1-12/+13
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16ipv4: namespacify ip_early_demux sysctl knobNikolay Borisov3-12/+10
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16ipv4: Namespacify ip_dynaddr sysctl knobNikolay Borisov2-15/+10
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16igmp: net: Move igmp namespace init to correct fileNikolay Borisov2-6/+14
When igmp related sysctl were namespacified their initializatin was erroneously put into the tcp socket namespace constructor. This patch moves the relevant code into the igmp namespace constructor to keep things consistent. Also sprinkle some #ifdefs to silence warnings Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16ipv4: Namespaceify ip_default_ttl sysctl knobNikolay Borisov6-18/+23
Signed-off-by: Nikolay Borisov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16tcp: add tcpi_min_rtt and tcpi_notsent_bytes to tcp_infoEric Dumazet1-0/+6
tcpi_min_rtt reports the minimal rtt observed by TCP stack for the flow, in usec unit. Might be ~0U if not yet known. tcpi_notsent_bytes reports the amount of bytes in the write queue that were not yet sent. This is done in a single patch to not add a temporary 32bit padding hole in tcp_info. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16tcp: md5: release request socket instead of listenerEric Dumazet1-2/+4
If tcp_v4_inbound_md5_hash() returns an error, we must release the refcount on the request socket, not on the listener. The bug was added for IPv4 only. Fixes: 079096f103fac ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16net/ipv4: add dst cache support for gre lwtunnelsPaolo Abeni1-3/+10
In case of UDP traffic with datagram length below MTU this gives about 4% performance increase Signed-off-by: Paolo Abeni <[email protected]> Suggested-and-Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16net: add dst_cache to ovs vxlan lwtunnelPaolo Abeni3-1/+16
In case of UDP traffic with datagram length below MTU this give about 2% performance increase when tunneling over ipv4 and about 60% when tunneling over ipv6 Signed-off-by: Paolo Abeni <[email protected]> Suggested-and-acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16ip_tunnel: replace dst_cache with generic implementationPaolo Abeni3-73/+23
The current ip_tunnel cache implementation is prone to a race that will cause the wrong dst to be cached on cuncurrent dst cache miss and ip tunnel update via netlink. Replacing with the generic implementation fix the issue. Signed-off-by: Paolo Abeni <[email protected]> Suggested-and-acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16net: replace dst_cache ip6_tunnel implementation with the generic onePaolo Abeni4-104/+14
This also fix a potential race into the existing tunnel code, which could lead to the wrong dst to be permanenty cached: CPU1: CPU2: <xmit on ip6_tunnel> <cache lookup fails> dst = ip6_route_output(...) <tunnel params are changed via nl> dst_cache_reset() // no effect, // the cache is empty dst_cache_set() // the wrong dst // is permanenty stored // into the cache With the new dst implementation the above race is not possible since the first cache lookup after dst_cache_reset will fail due to the timestamp check Signed-off-by: Paolo Abeni <[email protected]> Suggested-and-acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16net: add dst_cache supportPaolo Abeni3-0/+173
This patch add a generic, lockless dst cache implementation. The need for lock is avoided updating the dst cache fields only in per cpu scope, and requiring that the cache manipulation functions are invoked with the local bh disabled. The refresh_ts and reset_ts fields are used to ensure the cache consistency in case of cuncurrent cache update (dst_cache_set*) and reset operation (dst_cache_reset). Consider the following scenario: CPU1: CPU2: <cache lookup with emtpy cache: it fails> <get dst via uncached route lookup> <related configuration changes> dst_cache_reset() dst_cache_set() The dst entry set passed to dst_cache_set() should not be used for later dst cache lookup, because it's obtained using old configuration values. Since the refresh_ts is updated only on dst_cache lookup, the cached value in the above scenario will be discarded on the next lookup. Signed-off-by: Paolo Abeni <[email protected]> Suggested-and-acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16tcp: do not set rtt_min to 1Eric Dumazet1-1/+4
There are some cases where rtt_us derives from deltas of jiffies, instead of using usec timestamps. Since we want to track minimal rtt, better to assume a delta of 0 jiffie might be in fact be very close to 1 jiffie. It is kind of sad jiffies_to_usecs(1) calls a function instead of simply using a constant. Fixes: f672258391b42 ("tcp: track min RTT using windowed min-filter") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Cc: Yuchung Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16net: dsa: remove phy_disconnect from error pathSascha Hauer1-1/+0
The phy has not been initialized, disconnecting it in the error path results in a NULL pointer exception. Drop the phy_disconnect from the error path. Signed-off-by: Sascha Hauer <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Acked-by: Neil Armstrong <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16tipc: refactor node xmit and fix memory leaksRichard Alpe2-24/+38
Refactor tipc_node_xmit() to fail fast and fail early. Fix several potential memory leaks in unexpected error paths. Reported-by: Dmitry Vyukov <[email protected]> Reviewed-by: Jon Maloy <[email protected]> Signed-off-by: Richard Alpe <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16tipc: fix premature addition of node to lookup tableJon Paul Maloy1-6/+6
In commit 5266698661401a ("tipc: let broadcast packet reception use new link receive function") we introduced a new per-node broadcast reception link instance. This link is created at the moment the node itself is created. Unfortunately, the allocation is done after the node instance has already been added to the node lookup hash table. This creates a potential race condition, where arriving broadcast packets are able to find and access the node before it has been fully initialized, and before the above mentioned link has been created. The result is occasional crashes in the function tipc_bcast_rcv(), which is trying to access the not-yet existing link. We fix this by deferring the addition of the node instance until after it has been fully initialized in the function tipc_node_create(). Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16bridge: mdb: avoid uninitialized variable warningArnd Bergmann1-2/+2
A recent change to the mdb code confused the compiler to the point where it did not realize that the port-group returned from br_mdb_add_group() is always valid when the function returns a nonzero return value, so we get a spurious warning: net/bridge/br_mdb.c: In function 'br_mdb_add': net/bridge/br_mdb.c:542:4: error: 'pg' may be used uninitialized in this function [-Werror=maybe-uninitialized] __br_mdb_notify(dev, entry, RTM_NEWMDB, pg); Slightly rearranging the code in br_mdb_add_group() makes the problem go away, as gcc is clever enough to see that both functions check for 'ret != 0'. Signed-off-by: Arnd Bergmann <[email protected]> Fixes: 9e8430f8d60d ("bridge: mdb: Passing the port-group pointer to br_mdb module") Signed-off-by: David S. Miller <[email protected]>
2016-02-16net: Copy inner L3 and L4 headers as unaligned on GRE TEBAlexander Duyck1-0/+7
This patch corrects the unaligned accesses seen on GRE TEB tunnels when generating hash keys. Specifically what this patch does is make it so that we force the use of skb_copy_bits when the GRE inner headers will be unaligned due to NET_IP_ALIGNED being a non-zero value. Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16ethtool: ensure channel counts are within bounds during SCHANNELSKeller, Jacob E1-2/+11
Add a sanity check to ensure that all requested channel sizes are within bounds, which should reduce errors in driver implementation. Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}Keller, Jacob E1-0/+55
Ethernet drivers implementing both {GS}RXFH and {GS}CHANNELS ethtool ops incorrectly allow SCHANNELS when it would conflict with the settings from SRXFH. This occurs because it is not possible for drivers to understand whether their Rx flow indirection table has been configured or is in the default state. In addition, drivers currently behave in various ways when increasing the number of Rx channels. Some drivers will always destroy the Rx flow indirection table when this occurs, whether it has been set by the user or not. Other drivers will attempt to preserve the table even if the user has never modified it from the default driver settings. Neither of these situation is desirable because it leads to unexpected behavior or loss of user configuration. The correct behavior is to simply return -EINVAL when SCHANNELS would conflict with the current Rx flow table settings. However, it should only do so if the current settings were modified by the user. If we required that the new settings never conflict with the current (default) Rx flow settings, we would force users to first reduce their Rx flow settings and then reduce the number of Rx channels. This patch proposes a solution implemented in net/core/ethtool.c which ensures that all drivers behave correctly. It checks whether the RXFH table has been configured to non-default settings, and stores this information in a private netdev flag. When the number of channels is requested to change, it first ensures that the current Rx flow table is not going to assign flows to now disabled channels. Signed-off-by: Jacob Keller <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller7-16/+91
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contain a rather large batch for your net that includes accumulated bugfixes, they are: 1) Run conntrack cleanup from workqueue process context to avoid hitting soft lockup via watchdog for large tables. This is required by the IPv6 masquerading extension. From Florian Westphal. 2) Use original skbuff from nfnetlink batch when calling netlink_ack() on error since this needs to access the skb->sk pointer. 3) Incremental fix on top of recent Sasha Levin's lock fix for conntrack resizing. 4) Fix several problems in nfnetlink batch message header sanitization and error handling, from Phil Turnbull. 5) Select NF_DUP_IPV6 based on CONFIG_IPV6, from Arnd Bergmann. 6) Fix wrong signess in return values on nf_tables counter expression, from Anton Protopopov. Due to the NetDev 1.1 organization burden, I had no chance to pass up this to you any sooner in this release cycle, sorry about that. ==================== Signed-off-by: David S. Miller <[email protected]>
2016-02-16af_unix: Guard against other == sk in unix_dgram_sendmsgRainer Weikusat1-1/+6
The unix_dgram_sendmsg routine use the following test if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) { to determine if sk and other are in an n:1 association (either established via connect or by using sendto to send messages to an unrelated socket identified by address). This isn't correct as the specified address could have been bound to the sending socket itself or because this socket could have been connected to itself by the time of the unix_peer_get but disconnected before the unix_state_lock(other). In both cases, the if-block would be entered despite other == sk which might either block the sender unintentionally or lead to trying to unlock the same spin lock twice for a non-blocking send. Add a other != sk check to guard against this. Fixes: 7d267278a9ec ("unix: avoid use-after-free in ep_remove_wait_queue") Reported-By: Philipp Hahn <[email protected]> Signed-off-by: Rainer Weikusat <[email protected]> Tested-by: Philipp Hahn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16af_unix: Don't set err in unix_stream_read_generic unless there was an errorRainer Weikusat1-6/+10
The present unix_stream_read_generic contains various code sequences of the form err = -EDISASTER; if (<test>) goto out; This has the unfortunate side effect of possibly causing the error code to bleed through to the final out: return copied ? : err; and then to be wrongly returned if no data was copied because the caller didn't supply a data buffer, as demonstrated by the program available at http://pad.lv/1540731 Change it such that err is only set if an error condition was detected. Fixes: 3822b5c2fc62 ("af_unix: Revert 'lock_interruptible' in stream receive code") Reported-by: Joseph Salisbury <[email protected]> Signed-off-by: Rainer Weikusat <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-16batman-adv: Avoid endless loop in bat-on-bat netdevice checkAndrew Lunn1-0/+25
batman-adv checks in different situation if a new device is already on top of a different batman-adv device. This is done by getting the iflink of a device and all its parent. It assumes that this iflink is always a parent device in an acyclic graph. But this assumption is broken by devices like veth which are actually a pair of two devices linked to each other. The recursive check would therefore get veth0 when calling dev_get_iflink on veth1. And it gets veth0 when calling dev_get_iflink with veth1. Creating a veth pair and loading batman-adv freezes parts of the system ip link add veth0 type veth peer name veth1 modprobe batman-adv An RCU stall will be detected on the system which cannot be fixed. INFO: rcu_sched self-detected stall on CPU 1: (5264 ticks this GP) idle=3e9/140000000000001/0 softirq=144683/144686 fqs=5249 (t=5250 jiffies g=46 c=45 q=43) Task dump for CPU 1: insmod R running task 0 247 245 0x00000008 ffffffff8151f140 ffffffff8107888e ffff88000fd141c0 ffffffff8151f140 0000000000000000 ffffffff81552df0 ffffffff8107b420 0000000000000001 ffff88000e3fa700 ffffffff81540b00 ffffffff8107d667 0000000000000001 Call Trace: <IRQ> [<ffffffff8107888e>] ? rcu_dump_cpu_stacks+0x7e/0xd0 [<ffffffff8107b420>] ? rcu_check_callbacks+0x3f0/0x6b0 [<ffffffff8107d667>] ? hrtimer_run_queues+0x47/0x180 [<ffffffff8107cf9d>] ? update_process_times+0x2d/0x50 [<ffffffff810873fb>] ? tick_handle_periodic+0x1b/0x60 [<ffffffff810290ae>] ? smp_trace_apic_timer_interrupt+0x5e/0x90 [<ffffffff813bbae2>] ? apic_timer_interrupt+0x82/0x90 <EOI> [<ffffffff812c3fd7>] ? __dev_get_by_index+0x37/0x40 [<ffffffffa0031f3e>] ? batadv_hard_if_event+0xee/0x3a0 [batman_adv] [<ffffffff812c5801>] ? register_netdevice_notifier+0x81/0x1a0 [...] This can be avoided by checking if two devices are each others parent and stopping the check in this situation. Fixes: b7eddd0b3950 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface") Signed-off-by: Andrew Lunn <[email protected]> [[email protected]: rewritten description, extracted fix] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-16batman-adv: Only put orig_node_vlan list reference when removedSven Eckelmann1-2/+4
The batadv_orig_node_vlan reference counter in batadv_tt_global_size_mod can only be reduced when the list entry was actually removed. Otherwise the reference counter may reach zero when batadv_tt_global_size_mod is called from two different contexts for the same orig_node_vlan but only one context is actually removing the entry from the list. The release function for this orig_node_vlan is not called inside the vlan_list_lock spinlock protected region because the function batadv_tt_global_size_mod still holds a orig_node_vlan reference for the object pointer on the stack. Thus the actual release function (when required) will be called only at the end of the function. Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>