aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2015-08-25ip6_gre: release cached dst on tunnel removalhuaibin Wang1-0/+1
When a tunnel is deleted, the cached dst entry should be released. This problem may prevent the removal of a netns (seen with a x-netns IPv6 gre tunnel): unregister_netdevice: waiting for lo to become free. Usage count = 3 CC: Dmitry Kozlov <[email protected]> Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: huaibin Wang <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25route: fix a use-after-freeWANG Cong1-1/+2
This patch fixes the following crash: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc7+ #166 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88010656d280 ti: ffff880106570000 task.ti: ffff880106570000 RIP: 0010:[<ffffffff8182f91b>] [<ffffffff8182f91b>] dst_destroy+0xa6/0xef RSP: 0018:ffff880107603e38 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff8800d225a000 RCX: ffffffff82250fd0 RDX: 0000000000000001 RSI: ffffffff82250fd0 RDI: 6b6b6b6b6b6b6b6b RBP: ffff880107603e58 R08: 0000000000000001 R09: 0000000000000001 R10: 000000000000b530 R11: ffff880107609000 R12: 0000000000000000 R13: ffffffff82343c40 R14: 0000000000000000 R15: ffffffff8182fb4f FS: 0000000000000000(0000) GS:ffff880107600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fcabd9d3000 CR3: 00000000d7279000 CR4: 00000000000006e0 Stack: ffffffff82250fd0 ffff8801077d6f00 ffffffff82253c40 ffff8800d225a000 ffff880107603e68 ffffffff8182fb5d ffff880107603f08 ffffffff810d795e ffffffff810d7648 ffff880106574000 ffff88010656d280 ffff88010656d280 Call Trace: <IRQ> [<ffffffff8182fb5d>] dst_destroy_rcu+0xe/0x1d [<ffffffff810d795e>] rcu_process_callbacks+0x618/0x7eb [<ffffffff810d7648>] ? rcu_process_callbacks+0x302/0x7eb [<ffffffff8182fb4f>] ? dst_gc_task+0x1eb/0x1eb [<ffffffff8107e11b>] __do_softirq+0x178/0x39f [<ffffffff8107e52e>] irq_exit+0x41/0x95 [<ffffffff81a4f215>] smp_apic_timer_interrupt+0x34/0x40 [<ffffffff81a4d5cd>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff8100b968>] ? default_idle+0x21/0x32 [<ffffffff8100b966>] ? default_idle+0x1f/0x32 [<ffffffff8100bf19>] arch_cpu_idle+0xf/0x11 [<ffffffff810b0bc7>] default_idle_call+0x1f/0x21 [<ffffffff810b0dce>] cpu_startup_entry+0x1ad/0x273 [<ffffffff8102fe67>] start_secondary+0x135/0x156 dst is freed right before lwtstate_put(), this is not correct... Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry") Acked-by: Jiri Benc <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25net-next: Fix warning while make xmldocs caused by skbuff.cMasanari Iida1-2/+2
This patch fix following warnings. .//net/core/skbuff.c:407: warning: No description found for parameter 'len' .//net/core/skbuff.c:407: warning: Excess function parameter 'length' description in '__netdev_alloc_skb' .//net/core/skbuff.c:476: warning: No description found for parameter 'len' .//net/core/skbuff.c:476: warning: Excess function parameter 'length' description in '__napi_alloc_skb' Signed-off-by: Masanari Iida <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25ah4: Fix error return in ah_input().David S. Miller1-1/+3
Noticed by Herbert Xu. Signed-off-by: David S. Miller <[email protected]>
2015-08-25ah6: fix error return codeJulia Lawall1-1/+3
Return a negative error code on failure. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e1,e2; @@ ( if (\(ret < 0\|ret != 0\)) { ... return ret; } | ret = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: check for valid cm_id before initiating connection[email protected]1-2/+13
Connection could have been dropped while the route is being resolved so check for valid cm_id before initiating the connection. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: return EMSGSIZE for oversize requests before processing/queueingMukesh Kacker1-5/+6
rds_send_queue_rm() allows for the "current datagram" being queued to exceed SO_SNDBUF thresholds by checking bytes queued without counting in length of current datagram. (Since sk_sndbuf is set to twice requested SO_SNDBUF value as a kernel heuristic this is usually fine!) If this "current datagram" squeezing past the threshold is itself many times the size of the sk_sndbuf threshold itself then even twice the SO_SNDBUF does not save us and it gets queued but cannot be transmitted. Threads block and deadlock and device becomes unusable. The check for this datagram not exceeding SNDBUF thresholds (EMSGSIZE) is not done on this datagram as that check is only done if queueing attempt fails. (Datagrams that follow this datagram fail queueing attempts, go through the check and eventually trip EMSGSIZE error but zero length datagrams silently fail!) This fix moves the check for datagrams exceeding SNDBUF limits before any processing or queueing is attempted and returns EMSGSIZE early in the rds_sndmsg() code. This change also ensures that all datagrams get checked for exceeding SNDBUF/sk_sndbuf size limits and the large datagrams that exceed those limits do not get to rds_send_queue_rm() code for processing. Signed-off-by: Mukesh Kacker <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: make sure rds_send_drop_to properly takes the m_rs_lock[email protected]1-1/+15
rds_send_drop_to() is used during socket tear down to find all the messages on the socket and flush them . It can race with the acking code unless it takes the m_rs_lock on each and every message. This plugs a hole where we didn't take m_rs_lock on any message that didn't have the RDS_MSG_ON_CONN set. Taking m_rs_lock avoids double frees and other memory corruptions as the ack code trusts the message m_rs pointer on a socket that had actually been freed. We must take m_rs_lock to access m_rs. Because of lock nesting and rs access, we also need to acquire rs_lock. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: Don't destroy the rdma id until after we're done using itSantosh Shilimkar1-1/+2
During connection resets, we are destroying the rdma id too soon. We can't destroy it when it is still in use. So lets move rdma_destroy_id() after we clear the rings. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: Fix assertion level from fatal to warning[email protected]2-2/+2
Fix the asserion level since its not fatal and can be hit in normal execution paths. There is no need to take the system down. We keep the WARN_ON() to detect the condition if we get here with bad pages. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: Make sure we do a signaled send for large-send[email protected]1-0/+5
WR(Work Requests )always generate a WC(Work Completion) with signaled send. Default RDS ib code is setup for un-signaled completion. Since RDS connction is persistent, we can end up sending the data even after large-send when the remote end is not active(for any reason). By doing a signaled send at least once per large-send, we can at least detect the problem in work completion handler there by avoiding sending more data to inactive remote. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: Mark message mapped before transmit[email protected]1-8/+16
rds_send_xmit() marks the rds message map flag after xmit_[rdma/atomic]() which is clearly wrong. We need to maintain the ownership between transport and rds. Also take care of error path. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: add a sock_destruct callback debug aid[email protected]1-0/+9
This helps to detect the accidental processes/apps trying to destroy the RDS socket which they are sharing with other processes/apps. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: check for congestion updates during rds_send_xmit[email protected]1-1/+2
Ensure we don't keep sending the data if the link is congested. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: make sure we post recv buffers[email protected]5-8/+57
If we get an ENOMEM during rds_ib_recv_refill, we might never come back and refill again later. Patch makes sure to kick krdsd into helping out. To achieve this we add RDS_RECV_REFILL flag and update in the refill path based on that so that at least some therad will keep posting receive buffers. Since krdsd and softirq both might race for refill, we decide to schedule on work queue based on ring_low instead of ring_empty. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: don't update ip address tables if the address hasn't changed[email protected]1-2/+7
If the ip address tables hasn't changed, there is no need to remove them only to be added back again. Lets fix it. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: destroy the ib state earlier during shutdown[email protected]1-8/+10
Destroy ib state early during shutdown. Otherwise we can get callbacks after the QP isn't really able to handle them. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: always free recv frag as we free its ring entry[email protected]1-3/+10
We were still seeing rare occurrences of the WARN_ON(recv->r_frag) which indicates that the recv refill path was finding allocated frags in ring entries that were marked free. These were usually followed by OOM crashes. They only seem to be occurring in the presence of completion errors and connection resets. This patch ensures that we free the frag as we mark the ring entry free. This should stop the refill path from finding allocated frags in ring entries that were marked free. Reviewed-by: Ajaykumar Hotchandani <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25RDS: restore return value in rds_cmsg_rdma_args()[email protected]1-0/+2
In rds_cmsg_rdma_args() 'ret' is used by rds_pin_pages() which returns number of pinned pages on success. And the same value is returned to the caller of rds_cmsg_rdma_args() on success which is not intended. Commit f4a3fc03c1d7 ("RDS: Clean up error handling in rds_cmsg_rdma_args") removed the 'ret = 0' line which broke RDS RDMA mode. Fix it by restoring the return value on rds_pin_pages() success keeping the clean-up in place. Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25tcp: refine pacing rate determinationEric Dumazet2-1/+36
When TCP pacing was added back in linux-3.12, we chose to apply a fixed ratio of 200 % against current rate, to allow probing for optimal throughput even during slow start phase, where cwnd can be doubled every other gRTT. At Google, we found it was better applying a different ratio while in Congestion Avoidance phase. This ratio was set to 120 %. We've used the normal tcp_in_slow_start() helper for a while, then tuned the condition to select the conservative ratio as soon as cwnd >= ssthresh/2 : - After cwnd reduction, it is safer to ramp up more slowly, as we approach optimal cwnd. - Initial ramp up (ssthresh == INFINITY) still allows doubling cwnd every other RTT. Signed-off-by: Eric Dumazet <[email protected]> Cc: Neal Cardwell <[email protected]> Cc: Yuchung Cheng <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25xfrm: Use VRF master index if output device is enslavedDavid Ahern2-4/+10
Directs route lookups to VRF table. Compiles out if NET_VRF is not enabled. With this patch able to successfully bring up ipsec tunnels in VRFs, even with duplicate network configuration. Signed-off-by: David Ahern <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Acked-by: Steffen Klassert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25tcp: fix slow start after idle vs TSO/GSOEric Dumazet3-8/+9
slow start after idle might reduce cwnd, but we perform this after first packet was cooked and sent. With TSO/GSO, it means that we might send a full TSO packet even if cwnd should have been reduced to IW10. Moving the SSAI check in skb_entail() makes sense, because we slightly reduce number of times this check is done, especially for large send() and TCP Small queue callbacks from softirq context. As Neal pointed out, we also need to perform the check if/when receive window opens. Tested: Following packetdrill test demonstrates the problem // Test of slow start after idle `sysctl -q net.ipv4.tcp_slow_start_after_idle=1` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.100 < . 1:1(0) ack 1 win 511 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0 +0 write(4, ..., 26000) = 26000 +0 > . 1:5001(5000) ack 1 +0 > . 5001:10001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10 }% +.100 < . 1:1(0) ack 10001 win 511 +0 %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }% +0 > . 10001:20001(10000) ack 1 +0 > P. 20001:26001(6000) ack 1 +.100 < . 1:1(0) ack 26001 win 511 +0 %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }% +4 write(4, ..., 20000) = 20000 // If slow start after idle works properly, we should send 5 MSS here (cwnd/2) +0 > . 26001:31001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }% +0 > . 31001:36001(5000) ack 1 Signed-off-by: Eric Dumazet <[email protected]> Cc: Neal Cardwell <[email protected]> Cc: Yuchung Cheng <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-25batman-adv: beautify supported routing algorithm listMarek Lindner1-1/+1
Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: Fix conditional statements indentationSven Eckelmann1-1/+1
commit 29b9256e6631 ("batman-adv: consider outgoing interface in OGM sending") incorrectly indented the interface check code. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: Add lockdep_asserts for documented external locksSven Eckelmann3-0/+11
Some functions already have documentation about locks they require inside their kerneldoc header. These can be directly tested during runtime using the lockdep asserts. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: Annotate deleting functions with external lock via lockdepSven Eckelmann4-5/+25
Functions which use (h)list_del* are requiring correct locking when they operate on global lists. Most of the time the search in the list and the delete are done in the same function. All other cases should have it visible that they require a special lock to avoid race conditions. Lockdep asserts can be used to check these problem during runtime when the lockdep functionality is enabled. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: convert bat_priv->tt.req_list to hlistMarek Lindner3-16/+19
Since the list's tail is never accessed using a double linked list head wastes memory. Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: Fix gw_bandwidth calculation on 32 bit systemsSven Eckelmann1-7/+42
The TVLV for the gw_bandwidth stores everything as u32. But the gw_bandwidth reads the signed long which limits the maximum value to (2 ** 31 - 1) on systems with 4 byte long. Also the input value is always converted from either Mibit/s or Kibit/s to 100Kibit/s. This reduces the values even further when the user sets it via the default unit Kibit/s. It may even cause an integer overflow and end up with a value the user never intended. Instead read the values as u64, check for possible overflows, do the unit adjustments and then reduce the size to u32. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: Return EINVAL on invalid gw_bandwidth changeSven Eckelmann1-2/+2
Invalid speed settings by the user are currently acknowledged as correct but not stored. Instead the return of the store operation of the file "gw_bandwidth" should indicate that the given value is not acceptable. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: prevent potential hlist double deletionMarek Lindner1-1/+1
The hlist_del_rcu() call in batadv_tt_global_size_mod() does not check if the element still is part of the list prior to deletion. The atomic list counter should prevent the worst but converting to hlist_del_init_rcu() ensures the element can't be deleted more than once. Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: convert orig_node->vlan_list to hlistMarek Lindner3-9/+10
Since the list's tail is never accessed using a double linked list head wastes memory. Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: Remove batadv_ types forward declarationsSven Eckelmann11-27/+0
main.h is included in every file and is the only way to access types.h. This makes forward declarations for all types defined in types.h unnecessary. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: rename batadv_new_tt_req_node to batadv_tt_req_node_newMarek Lindner1-4/+9
Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: update kernel doc of batadv_tt_global_del_orig_entry()Marek Lindner1-6/+12
The updated kernel doc & additional comment shall prevent accidental copy & paste errors or calling the function without the required precautions. Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: Remove multiple assignment per lineSven Eckelmann7-17/+39
The Linux CodingStyle disallows multiple assignments in a single line. (see chapter 1) Reported-by: Markus Pargmann <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: Fix kerneldoc over 80 column linesSven Eckelmann2-4/+4
Kerneldoc required single line documentation in the past (before 2009). Therefore, the 80 columns limit per line check of checkpatch was disabled for kerneldoc. But kerneldoc is not excluded anymore from it and checkpatch now enabled the check again. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-25batman-adv: Replace C99 int types with kernel typeSven Eckelmann30-585/+563
(s|u)(8|16|32|64) are the preferred types in the kernel. The use of the standard C99 types u?int(8|16|32|64)_t are objected by some people and even checkpatch now warns about using them. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-08-24net: Fix RCU splat in af_keyDavid Ahern1-23/+23
Hit the following splat testing VRF change for ipsec: [ 113.475692] =============================== [ 113.476194] [ INFO: suspicious RCU usage. ] [ 113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED Not tainted [ 113.477545] ------------------------------- [ 113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 Illegal context switch in RCU read-side critical section! [ 113.479288] [ 113.479288] other info that might help us debug this: [ 113.479288] [ 113.480207] [ 113.480207] rcu_scheduler_active = 1, debug_locks = 1 [ 113.480931] 2 locks held by setkey/6829: [ 113.481371] #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff814e9887>] pfkey_sendmsg+0xfb/0x213 [ 113.482509] #1: (rcu_read_lock){......}, at: [<ffffffff814e767f>] rcu_read_lock+0x0/0x6e [ 113.483509] [ 113.483509] stack backtrace: [ 113.484041] CPU: 0 PID: 6829 Comm: setkey Not tainted 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED [ 113.485422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 113.486845] 0000000000000001 ffff88001d4c7a98 ffffffff81518af2 ffffffff81086962 [ 113.487732] ffff88001d538480 ffff88001d4c7ac8 ffffffff8107ae75 ffffffff8180a154 [ 113.488628] 0000000000000b30 0000000000000000 00000000000000d0 ffff88001d4c7ad8 [ 113.489525] Call Trace: [ 113.489813] [<ffffffff81518af2>] dump_stack+0x4c/0x65 [ 113.490389] [<ffffffff81086962>] ? console_unlock+0x3d6/0x405 [ 113.491039] [<ffffffff8107ae75>] lockdep_rcu_suspicious+0xfa/0x103 [ 113.491735] [<ffffffff81064032>] rcu_preempt_sleep_check+0x45/0x47 [ 113.492442] [<ffffffff8106404d>] ___might_sleep+0x19/0x1c8 [ 113.493077] [<ffffffff81064268>] __might_sleep+0x6c/0x82 [ 113.493681] [<ffffffff81133190>] cache_alloc_debugcheck_before.isra.50+0x1d/0x24 [ 113.494508] [<ffffffff81134876>] kmem_cache_alloc+0x31/0x18f [ 113.495149] [<ffffffff814012b5>] skb_clone+0x64/0x80 [ 113.495712] [<ffffffff814e6f71>] pfkey_broadcast_one+0x3d/0xff [ 113.496380] [<ffffffff814e7b84>] pfkey_broadcast+0xb5/0x11e [ 113.497024] [<ffffffff814e82d1>] pfkey_register+0x191/0x1b1 [ 113.497653] [<ffffffff814e9770>] pfkey_process+0x162/0x17e [ 113.498274] [<ffffffff814e9895>] pfkey_sendmsg+0x109/0x213 In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes the RCU lock. Since pfkey_broadcast takes the RCU lock the allocation argument is pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock. The one call outside of rcu can be done with GFP_KERNEL. Fixes: 7f6b9dbd5afbd ("af_key: locking change") Signed-off-by: David Ahern <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-24ila: Precompute checksum difference for translationsTom Herbert1-0/+18
In the ILA build state for LWT compute the checksum difference to apply to transport checksums that include the IPv6 pseudo header. The difference is between the route destination (from fib6_config) and the locator to write. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-24lwt: Add cfg argument to build_stateTom Herbert6-10/+19
Add cfg and family arguments to lwt build state functions. cfg is a void pointer and will either be a pointer to a fib_config or fib6_config structure. The family parameter indicates which one (either AF_INET or AF_INET6). LWT encpasulation implementation may use the fib configuration to build the LWT state. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-23Merge tag 'nfc-next-4.3-1' of ↵David S. Miller3-5/+106
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.3 pull request This is the NFC pull request for 4.3. With this one we have: - A new driver for Samsung's S3FWRN5 NFC chipset. In order to properly support this driver, a few NCI core routines needed to be exported. Future drivers like Intel's Fields Peak will benefit from this. - SPI support as a physical transport for STM st21nfcb. - An additional netlink API for sending replies back to userspace from vendor commands. - 2 small fixes for TI's trf7970a - A few st-nci fixes. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-08-23tipc: fix stale link problem during synchronizationJon Paul Maloy2-3/+12
Recent changes to the link synchronization means that we can now just drop packets arriving on the synchronizing link before the synch point is reached. This has lead to significant simplifications to the implementation, but also turns out to have a flip side that we need to consider. Under unlucky circumstances, the two endpoints may end up repeatedly dropping each other's packets, while immediately asking for retransmission of the same packets, just to drop them once more. This pattern will eventually be broken when the synch point is reached on the other link, but before that, the endpoints may have arrived at the retransmission limit (stale counter) that indicates that the link should be broken. We see this happen at rare occasions. The fix for this is to not ask for retransmissions when a link is in state LINK_SYNCHING. The fact that the link has reached this state means that it has already received the first SYNCH packet, and that it knows the synch point. Hence, it doesn't need any more packets until the other link has reached the synch point, whereafter it can go ahead and ask for the missing packets. However, because of the reduced traffic on the synching link that follows this change, it may now take longer to discover that the synch point has been reached. We compensate for this by letting all packets, on any of the links, trig a check for synchronization termination. This is possible because the packets themselves don't contain any information that is needed for discovering this condition. Reviewed-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-23tipc: interrupt link synchronization when a link goes downJon Paul Maloy2-4/+9
When we introduced the new link failover/synch mechanism in commit 6e498158a827fd515b514842e9a06bdf0f75ab86 ("tipc: move link synch and failover to link aggregation level"), we missed the case when the non-tunnel link goes down during the link synchronization period. In this case the tunnel link will remain in state LINK_SYNCHING, something leading to unpredictable behavior when the failover procedure is initiated. In this commit, we ensure that the node and remaining link goes back to regular communication state (SELF_UP_PEER_UP/LINK_ESTABLISHED) when one of the parallel links goes down. We also ensure that we don't re-enter synch mode if subsequent SYNCH packets arrive on the remaining link. Reviewed-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-23tipc: eliminate risk of premature link setup during failoverJon Paul Maloy1-1/+3
When a link goes down, and there is still a working link towards its destination node, a failover is initiated, and the failed link is not allowed to re-establish until that procedure is finished. To ensure this, the concerned link endpoints are set to state LINK_FAILINGOVER, and the node endpoints to NODE_FAILINGOVER during the failover period. However, if the link reset is due to a disabled bearer, the corres- ponding link endpoint is deleted, and only the node endpoint knows about the ongoing failover. Now, if the disabled bearer is re-enabled during the failover period, the discovery mechanism may create a new link endpoint that is ready to be established, despite that this is not permitted. This situation may cause both the ongoing failover and any subsequent link synchronization to fail. In this commit, we ensure that a newly created link goes directly to state LINK_FAILINGOVER if the corresponding node state is NODE_FAILINGOVER. This eliminates the problem described above. Furthermore, we tighten the criteria for which packets are allowed to end a failover state in the function tipc_node_check_state(). By checking that the receiving link is up and running, instead of just checking that it is not in failover mode, we eliminate the risk that protocol packets from the re-created link may cause the failover to be prematurely terminated. Reviewed-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-23netlink: mmap: fix tx type checkKen-ichirou MATSUZAWA1-1/+1
I can't send netlink message via mmaped netlink socket since commit: a8866ff6a5bce7d0ec465a63bc482a85c09b0d39 netlink: make the check for "send from tx_ring" deterministic msg->msg_iter.type is set to WRITE (1) at SYSCALL_DEFINE6(sendto, ... import_single_range(WRITE, ... iov_iter_init(1, WRITE, ... call path, so that we need to check the type by iter_is_iovec() to accept the WRITE. Signed-off-by: Ken-ichirou MATSUZAWA <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-23fou: Do WARN_ON_ONCE in gue_gro_receive for bad proto callbacksTom Herbert1-1/+1
Do WARN_ON_ONCE instead of WARN_ON in gue_gro_receive when the offload callcaks are bad (either don't exist or gro_receive is not specified). Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-23gro: Fix remcsum offload to deal with frags in GROTom Herbert1-16/+12
The remote checksum offload GRO did not consider the case that frag0 might be in use. This patch fixes that by accessing headers using the skb_gro functions and not saving offsets relative to skb->head. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-08-22Merge branch 'for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 9p regression fix from Al Viro: "Fix for breakage introduced when switching p9_client_{read,write}() to struct iov_iter * (went into 4.1)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: 9p: ensure err is initialized to 0 in p9_client_read/write
2015-08-229p: ensure err is initialized to 0 in p9_client_read/writeVincent Bernat1-0/+2
Some use of those functions were providing unitialized values to those functions. Notably, when reading 0 bytes from an empty file on a 9P filesystem, the return code of read() was not 0. Tested with this simple program: #include <assert.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> int main(int argc, const char **argv) { assert(argc == 2); char buffer[256]; int fd = open(argv[1], O_RDONLY|O_NOCTTY); assert(fd >= 0); assert(read(fd, buffer, 0) == 0); return 0; } Cc: [email protected] # v4.1 Signed-off-by: Vincent Bernat <[email protected]> Signed-off-by: Al Viro <[email protected]>
2015-08-21mm: make page pfmemalloc check more robustMichal Hocko1-1/+1
Commit c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") added checks for page->pfmemalloc to __skb_fill_page_desc(): if (page->pfmemalloc && !page->mapping) skb->pfmemalloc = true; It assumes page->mapping == NULL implies that page->pfmemalloc can be trusted. However, __delete_from_page_cache() can set set page->mapping to NULL and leave page->index value alone. Due to being in union, a non-zero page->index will be interpreted as true page->pfmemalloc. So the assumption is invalid if the networking code can see such a page. And it seems it can. We have encountered this with a NFS over loopback setup when such a page is attached to a new skbuf. There is no copying going on in this case so the page confuses __skb_fill_page_desc which interprets the index as pfmemalloc flag and the network stack drops packets that have been allocated using the reserves unless they are to be queued on sockets handling the swapping which is the case here and that leads to hangs when the nfs client waits for a response from the server which has been dropped and thus never arrive. The struct page is already heavily packed so rather than finding another hole to put it in, let's do a trick instead. We can reuse the index again but define it to an impossible value (-1UL). This is the page index so it should never see the value that large. Replace all direct users of page->pfmemalloc by page_is_pfmemalloc which will hide this nastiness from unspoiled eyes. The information will get lost if somebody wants to use page->index obviously but that was the case before and the original code expected that the information should be persisted somewhere else if that is really needed (e.g. what SLAB and SLUB do). [[email protected]: fix blooper in slub] Fixes: c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") Signed-off-by: Michal Hocko <[email protected]> Debugged-by: Vlastimil Babka <[email protected]> Debugged-by: Jiri Bohac <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: David Miller <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: <[email protected]> [3.6+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>