path: root/net
Age | Commit message | Author | Files | Lines
2016-01-19 | svcrdma: Add class for RDMA backwards direction transport | Chuck Lever | 7 | -15/+470
To support the server-side of an NFSv4.1 backchannel on RDMA connections, add a transport class that enables backward direction messages on an existing forward channel connection. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Bruce Fields <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2016-01-19 | svcrdma: Define maximum number of backchannel requests | Chuck Lever | 2 | -12/+18
Extra resources for handling backchannel requests have to be pre-allocated when a transport instance is created. Set up additional fields in svcxprt_rdma to track these resources. The max_requests fields are elements of the RPC-over-RDMA protocol, so they should be u32. To ensure that unsigned arithmetic is used everywhere, some other fields in the svcxprt_rdma struct are updated. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Bruce Fields <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2016-01-19 | svcrdma: Make map_xdr non-static | Chuck Lever | 1 | -7/+7
Pre-requisite to use map_xdr in the backchannel code. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Bruce Fields <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2016-01-19 | svcrdma: Remove last two __GFP_NOFAIL call sites | Chuck Lever | 2 | -2/+7
Clean up. These functions can otherwise fail, so check for page allocation failures too. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Bruce Fields <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2016-01-19 | svcrdma: Add gfp flags to svc_rdma_post_recv() | Chuck Lever | 2 | -4/+6
svc_rdma_post_recv() allocates pages for receive buffers on-demand. It uses GFP_KERNEL so the allocator tries hard, and may sleep. But I'm about to add a call to svc_rdma_post_recv() from a function that may not sleep. Since all svc_rdma_post_recv() call sites can tolerate its failure, allow it to fail if the page allocator returns nothing. Longer term, receive buffers, being a finite resource per-connection, should be pre-allocated and re-used. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Bruce Fields <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
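The pattern is worth seeing in miniature. A hedged sketch (hypothetical names, not the actual svcrdma code) of threading the GFP mask through so a caller in atomic context can request a non-sleeping allocation and handle failure:

	#include <linux/gfp.h>
	#include <linux/errno.h>

	struct my_xprt;				/* hypothetical transport handle */

	static int my_post_recv(struct my_xprt *xprt, gfp_t flags)
	{
		struct page *page = alloc_page(flags);

		if (!page)
			return -ENOMEM;		/* every call site must tolerate this */
		/* ... build and post the receive work request ... */
		return 0;
	}

	/* process context: allocator may sleep and tries hard
	 *	my_post_recv(xprt, GFP_KERNEL);
	 * atomic context: must not sleep, failure is acceptable
	 *	my_post_recv(xprt, GFP_NOWAIT);
	 */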
2016-01-19 | svcrdma: Remove unused req_map and ctxt kmem_caches | Chuck Lever | 2 | -42/+0
Clean up. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Bruce Fields <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2016-01-19 | svcrdma: Improve allocation of struct svc_rdma_req_map | Chuck Lever | 2 | -13/+78
To ensure this allocation cannot fail and will not sleep, pre-allocate the req_map structures per-connection. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Bruce Fields <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2016-01-19 | svcrdma: Improve allocation of struct svc_rdma_op_ctxt | Chuck Lever | 1 | -13/+89
When the maximum payload size of NFS READ and WRITE was increased by commit cc9a903d915c ("svcrdma: Change maximum server payload back to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt increased to over 6KB (on x86_64). That makes allocating one of these from a kmem_cache more likely to fail in situations when system memory is exhausted. Since I'm about to add a caller where this allocation must always work _and_ it cannot sleep, pre-allocate ctxts for each connection. Another motivation for this change is that NFSv4.x servers are required by specification not to drop NFS requests. Pre-allocating memory resources reduces the likelihood of a drop. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Bruce Fields <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
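A minimal sketch of the per-connection pre-allocation pattern this describes, with illustrative names (the real svc_rdma_op_ctxt tracks far more state):

	#include <linux/list.h>
	#include <linux/spinlock.h>

	struct op_ctxt {
		struct list_head free;		/* linkage in the per-conn pool */
		/* ... ~6KB of sge arrays and page pointers in the real struct ... */
	};

	struct conn {
		spinlock_t ctxt_lock;
		struct list_head free_ctxts;	/* populated when conn is accepted */
	};

	/* Cannot fail and never sleeps once the pool has been filled. */
	static struct op_ctxt *ctxt_get(struct conn *c)
	{
		struct op_ctxt *ctxt;

		spin_lock(&c->ctxt_lock);
		ctxt = list_first_entry(&c->free_ctxts, struct op_ctxt, free);
		list_del_init(&ctxt->free);
		spin_unlock(&c->ctxt_lock);
		return ctxt;
	}

	static void ctxt_put(struct conn *c, struct op_ctxt *ctxt)
	{
		spin_lock(&c->ctxt_lock);
		list_add(&ctxt->free, &c->free_ctxts);
		spin_unlock(&c->ctxt_lock);
	}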
2016-01-19 | svcrdma: Clean up process_context() | Chuck Lever | 1 | -23/+21
Be sure the completed ctxt is put in every path. The xprt enqueue can take a while, so put the completed ctxt back in circulation _before_ enqueuing the xprt. Remove/disable debugging. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Bruce Fields <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2016-01-19 | svcrdma: Clean up rdma_create_xprt() | Chuck Lever | 1 | -8/+1
kzalloc is used here, so setting the atomic fields to zero is unnecessary. sc_ord is set again in handle_connect_req. The other fields are re-initialized in svc_rdma_accept(). Signed-off-by: Chuck Lever <[email protected]> Acked-by: Bruce Fields <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2016-01-19 | soreuseport: fix NULL ptr dereference SO_REUSEPORT after bind | Craig Gallek | 1 | -1/+8
Marc Dionne discovered a NULL pointer dereference when setting SO_REUSEPORT on a socket after it is bound. This patch removes the assumption that at least one socket in the reuseport group is bound with the SO_REUSEPORT option before other bind calls occur. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Reported-by: Marc Dionne <[email protected]> Signed-off-by: Craig Gallek <[email protected]> Tested-by: Marc Dionne <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-01-19 | af_iucv: Validate socket address length in iucv_sock_bind() | Ursula Braun | 1 | -0/+3
Signed-off-by: Ursula Braun <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Reviewed-by: Evgeny Cherkashin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-01-19 | udp: fix potential infinite loop in SO_REUSEPORT logic | Eric Dumazet | 2 | -22/+42
Using a combination of connected and un-connected sockets, Dmitry was able to trigger soft lockups with his fuzzer. The problem is that sockets in the SO_REUSEPORT array might have different scores. Right after sk2=socket(), setsockopt(sk2,...,SO_REUSEPORT, on) and bind(sk2, ...), but _before_ the connect(sk2) is done, sk2 is added into the soreuseport array, with a score which is smaller than the score of the first socket sk1 found in the hash table (I am speaking of the regular UDP hash table), if sk1 had the connect() done, giving a +8 to its score.

hash bucket [X] -> sk1 -> sk2 -> NULL
  sk1 score = 14 (because it did a connect())
  sk2 score = 6

SO_REUSEPORT fast selection is an optimization. If it turns out the score of the selected socket does not match the score of the first socket, just fall back to the old SO_REUSEPORT logic instead of trying to be too smart. Normal SO_REUSEPORT users do not mix different kinds of sockets, as this mechanism is used for load balancing traffic. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Craig Gallek <[email protected]> Acked-by: Craig Gallek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
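To make the score asymmetry concrete, here is an illustrative scoring function loosely modelled on UDP's compute_score() (hypothetical types and fields, not the kernel's):

	#include <linux/types.h>

	struct udp_sock_info {		/* hypothetical stand-in for struct sock */
		bool	connected;
		__be32	peer_addr;
		__be16	peer_port;
	};

	static int score_sketch(const struct udp_sock_info *sk,
				__be32 saddr, __be16 sport)
	{
		int score = 6;		/* local address and port matched */

		if (sk->connected) {
			if (sk->peer_addr != saddr || sk->peer_port != sport)
				return -1;	/* connected to another peer */
			score += 8;	/* full 4-tuple match: score becomes 14 */
		}
		return score;
	}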
2016-01-18 | ovs: limit ovs recursions in ovs_execute_actions to not corrupt stack | Hannes Frederic Sowa | 1 | -5/+14
It was seen that defective configurations of openvswitch could overwrite the STACK_END_MAGIC and cause a hard crash of the kernel because of too many recursions within ovs. This problem arises due to the high stack usage of openvswitch. The rest of the kernel is fine with the current limit of 10 (RECURSION_LIMIT). We use the already existing recursion counter in ovs_execute_actions to implement an upper bound of 5 recursions. Cc: Pravin Shelar <[email protected]> Cc: Simon Horman <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
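A minimal sketch of bounding the recursion with the existing counter (names illustrative; the real counter is per-CPU and the real code drops the packet with a rate-limited message):

	#include <linux/errno.h>

	#define OVS_RECURSION_LIMIT 5

	struct datapath;
	struct sk_buff;

	/* stands in for the real per-action execution loop */
	int do_execute_actions(struct datapath *dp, struct sk_buff *skb);

	static int exec_actions_level;	/* per-CPU in the real code */

	static int execute_actions_sketch(struct datapath *dp,
					  struct sk_buff *skb)
	{
		int err;

		if (exec_actions_level >= OVS_RECURSION_LIMIT)
			return -ELOOP;	/* drop rather than eat more stack */

		exec_actions_level++;
		err = do_execute_actions(dp, skb);
		exec_actions_level--;
		return err;
	}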
2016-01-18 | netfilter: nf_tables_netdev: fix error path in module initialization | Pablo Neira Ayuso | 1 | -4/+4
Unregister the chain type and return error, otherwise this leaks the subscription to the netdevice notifier call chain. Signed-off-by: Pablo Neira Ayuso <[email protected]>
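The shape of the fix is the usual init-path unwind. A hedged sketch (the symbol names and the chain-type return convention are assumed from the file being patched, not verified against it):

	#include <linux/netdevice.h>
	#include <net/netfilter/nf_tables.h>

	/* nft_filter_chain_netdev and nf_tables_netdev_notifier are assumed
	 * to be defined earlier in the same file. */
	static int __init nf_tables_netdev_init_sketch(void)
	{
		int ret;

		ret = nft_register_chain_type(&nft_filter_chain_netdev);
		if (ret)
			return ret;

		ret = register_netdevice_notifier(&nf_tables_netdev_notifier);
		if (ret)
			/* the fix: undo step one before bailing out,
			 * instead of leaking the registration */
			nft_unregister_chain_type(&nft_filter_chain_netdev);

		return ret;
	}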
2016-01-18 | netfilter: xt_TCPMSS: handle CHECKSUM_COMPLETE in tcpmss_tg6() | Eric Dumazet | 1 | -2/+7
In case the MSS option is added to the TCP options, the skb length increases by 4. IPv6 needs to update skb->csum if the skb has CHECKSUM_COMPLETE, otherwise the kernel complains loudly in netdev_rx_csum_fault() with a stack dump. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
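A sketch of the repair pattern for inserted bytes (hypothetical helper, not the xt_TCPMSS code itself):

	#include <linux/skbuff.h>
	#include <net/checksum.h>

	/* Fold 'len' newly inserted bytes at 'data' into a packet that
	 * carries CHECKSUM_COMPLETE; without this, netdev_rx_csum_fault()
	 * later complains. */
	static void csum_account_insertion(struct sk_buff *skb,
					   const void *data, int len)
	{
		if (skb->ip_summed == CHECKSUM_COMPLETE)
			skb->csum = csum_add(skb->csum,
					     csum_partial(data, len, 0));
	}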
2016-01-17 | sctp: the temp asoc's transports should not be hashed/unhashed | Xin Long | 2 | -2/+8
Re-establish the previous behavior and avoid hashing temporary asocs by checking t->asoc->temp in sctp_(un)hash_transport. Also, remove the check of t->asoc->temp in __sctp_lookup_association, since they are never hashed now. Fixes: 4f0087812648 ("sctp: apply rhashtable api to send/recv path") Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Reported-by: Vlad Yasevich <[email protected]> Signed-off-by: David S. Miller <[email protected]>
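A sketch of the guard (mirrors the commit text; not the full function, and it assumes the sctp headers are in scope):

	#include <net/sctp/structs.h>

	static int sctp_hash_transport_sketch(struct sctp_transport *t)
	{
		if (t->asoc->temp)	/* temporary asocs are never hashed */
			return 0;
		/* ... proceed with the rhashtable insert as before ... */
		return 0;
	}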
2016-01-16 | batman-adv: Drop immediate orig_node free function | Sven Eckelmann | 3 | -27/+13
It is not allowed to free the memory of an object that is part of a list protected by RCU read-side critical sections without making sure that no other context is still accessing the object. This is usually done by removing the references to the object and then waiting until the RCU grace period is over, after which nothing may legitimately access it anymore. But the _now functions ignore this completely. They free the object directly even when a different context may still be accessing it. This has to be avoided, so these functions must be removed and all callers have to use batadv_orig_node_free_ref. Fixes: 72822225bd41 ("batman-adv: Fix rcu_barrier() miss due to double call_rcu() in TT code") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
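This and the following five batman-adv removals enforce the same reference pattern. A minimal generic sketch (illustrative names; batman-adv's actual code of that era used atomic_t counters rather than kref) of deferring the free past the grace period:

	#include <linux/kref.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct node_sketch {
		struct kref refcount;
		struct rcu_head rcu;
	};

	static void node_release(struct kref *ref)
	{
		struct node_sketch *node =
			container_of(ref, struct node_sketch, refcount);

		/* never kfree() directly: RCU readers may still hold it */
		kfree_rcu(node, rcu);
	}

	static void node_put(struct node_sketch *node)
	{
		kref_put(&node->refcount, node_release);
	}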
2016-01-16 | batman-adv: Drop immediate batadv_hard_iface free function | Sven Eckelmann | 2 | -20/+7
It is not allowed to free the memory of an object that is part of a list protected by RCU read-side critical sections without making sure that no other context is still accessing the object. This is usually done by removing the references to the object and then waiting until the RCU grace period is over, after which nothing may legitimately access it anymore. But the _now functions ignore this completely. They free the object directly even when a different context may still be accessing it. This has to be avoided, so these functions must be removed and all callers have to use batadv_hardif_free_ref. Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16 | batman-adv: Drop immediate neigh_ifinfo free function | Sven Eckelmann | 1 | -24/+10
It is not allowed to free the memory of an object that is part of a list protected by RCU read-side critical sections without making sure that no other context is still accessing the object. This is usually done by removing the references to the object and then waiting until the RCU grace period is over, after which nothing may legitimately access it anymore. But the _now functions ignore this completely. They free the object directly even when a different context may still be accessing it. This has to be avoided, so these functions must be removed and all callers have to use batadv_neigh_ifinfo_free_ref. Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16 | batman-adv: Drop immediate batadv_hardif_neigh_node free function | Sven Eckelmann | 1 | -33/+13
It is not allowed to free the memory of an object that is part of a list protected by RCU read-side critical sections without making sure that no other context is still accessing the object. This is usually done by removing the references to the object and then waiting until the RCU grace period is over, after which nothing may legitimately access it anymore. But the _now functions ignore this completely. They free the object directly even when a different context may still be accessing it. This has to be avoided, so these functions must be removed and all callers have to use batadv_hardif_neigh_free_ref. Fixes: cef63419f7db ("batman-adv: add list of unique single hop neighbors per hard-interface") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16 | batman-adv: Drop immediate batadv_neigh_node free function | Sven Eckelmann | 1 | -23/+10
It is not allowed to free the memory of an object that is part of a list protected by RCU read-side critical sections without making sure that no other context is still accessing the object. This is usually done by removing the references to the object and then waiting until the RCU grace period is over, after which nothing may legitimately access it anymore. But the _now functions ignore this completely. They free the object directly even when a different context may still be accessing it. This has to be avoided, so these functions must be removed and all callers have to use batadv_neigh_node_free_ref. Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16 | batman-adv: Drop immediate batadv_orig_ifinfo free function | Sven Eckelmann | 1 | -28/+31
It is not allowed to free the memory of an object that is part of a list protected by RCU read-side critical sections without making sure that no other context is still accessing the object. This is usually done by removing the references to the object and then waiting until the RCU grace period is over, after which nothing may legitimately access it anymore. But the _now functions ignore this completely. They free the object directly even when a different context may still be accessing it. This has to be avoided, so these functions must be removed and all callers have to use batadv_orig_ifinfo_free_ref. Fixes: 7351a4822d42 ("batman-adv: split out router from orig_node") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16 | batman-adv: Avoid recursive call_rcu for batadv_nc_node | Sven Eckelmann | 1 | -11/+8
The batadv_nc_node_free_ref function uses call_rcu to delay the free of the batadv_nc_node object until no (already started) rcu_read_lock is enabled anymore. This makes sure that no context is still trying to access the object which should be removed. But batadv_nc_node also contains a reference to orig_node which must be removed. The reference drop of orig_node was done in the call_rcu function batadv_nc_node_free_rcu but should actually be done in the batadv_nc_node_release function to avoid nested call_rcus. This is important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will not detect the inner call_rcu as relevant for its execution. Otherwise this barrier will most likely be inserted in the queue before the callback of the first call_rcu was executed. The caller of rcu_barrier will therefore continue to run before the inner call_rcu callback finished. Fixes: d56b1705e28c ("batman-adv: network coding - detect coding nodes and remove these after timeout") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16 | batman-adv: Avoid recursive call_rcu for batadv_bla_claim | Sven Eckelmann | 1 | -7/+3
The batadv_claim_free_ref function uses call_rcu to delay the free of the batadv_bla_claim object until no (already started) rcu_read_lock is enabled anymore. This makes sure that no context is still trying to access the object which should be removed. But batadv_bla_claim also contains a reference to backbone_gw which must be removed. The reference drop of backbone_gw was done in the call_rcu function batadv_claim_free_rcu but should actually be done in the batadv_claim_release function to avoid nested call_rcus. This is important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will not detect the inner call_rcu as relevant for its execution. Otherwise this barrier will most likely be inserted in the queue before the callback of the first call_rcu was executed. The caller of rcu_barrier will therefore continue to run before the inner call_rcu callback finished. Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code") Signed-off-by: Sven Eckelmann <[email protected]> Acked-by: Simon Wunderlich <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
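Both of these recursive-call_rcu fixes move the parent reference drop out of the RCU callback. Sketched generically, building on the node_sketch/node_put pattern shown earlier (names illustrative):

	struct child_sketch {
		struct kref refcount;
		struct rcu_head rcu;
		struct node_sketch *parent;	/* e.g. orig_node or backbone_gw */
	};

	static void child_release(struct kref *ref)
	{
		struct child_sketch *child =
			container_of(ref, struct child_sketch, refcount);

		/* drop the parent reference here, in the release function,
		 * NOT inside an RCU callback: a single rcu_barrier() then
		 * suffices to flush every pending free */
		node_put(child->parent);
		kfree_rcu(child, rcu);
	}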
2016-01-15 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds | 11 | -30/+59
Pull networking fixes from David Miller:
 "A quick set of bug fixes after the initial networking merge:

  1) The netlink multicast group storage allocator was only tested with nr_groups equal to 1; make it work for other values too. From Matti Vaittinen.

  2) Check build_skb() return value in macb and hip04_eth drivers, from Weidong Wang.

  3) Don't leak x25_asy on x25_asy_open() failure.

  4) More DMA map/unmap fixes in 3c59x from Neil Horman.

  5) Don't clobber IP skb control block during GSO segmentation, from Konstantin Khlebnikov.

  6) ECN helpers for ipv6 don't fix up the checksum, from Eric Dumazet.

  7) Fix SKB segment utilization estimation in xen-netback, from David Vrabel.

  8) Fix lockdep splat in bridge addrlist handling, from Nikolay Aleksandrov"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  bgmac: Fix reversed test of build_skb() return value.
  bridge: fix lockdep addr_list_lock false positive splat
  net: smsc: Add support h8300
  xen-netback: free queues after freeing the net device
  xen-netback: delete NAPI instance when queue fails to initialize
  xen-netback: use skb to determine number of required guest Rx requests
  net: sctp: Move sequence start handling into sctp_transport_get_idx()
  ipv6: update skb->csum when CE mark is propagated
  net: phy: turn carrier off on phy attach
  net: macb: clear interrupts when disabling them
  sctp: support to lookup with ep+paddr in transport rhashtable
  net: hns: fixes no syscon error when init mdio
  dts: hisi: fixes no syscon fault when init mdio
  net: preserve IP control block during GSO segmentation
  fsl/fman: Delete one function call "put_device" in dtsec_config()
  hip04_eth: fix missing error handle for build_skb failed
  3c59x: fix another page map/single unmap imbalance
  3c59x: balance page maps and unmaps
  x25_asy: Free x25_asy on x25_asy_open() failure.
  mlxsw: fix SWITCHDEV_OBJ_ID_PORT_MDB
  ...
2016-01-15 | Merge tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux | Linus Torvalds | 4 | -8/+50
Pull nfsd updates from Bruce Fields:
 "Smaller bugfixes and cleanup, including a fix for failures of kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK storms that can affect some high-availability NFS setups"

* tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux:
  nfsd: add new io class tracepoint
  nfsd: give up on CB_LAYOUTRECALLs after two lease periods
  nfsd: Fix nfsd leaks sunrpc module references
  lockd: constify nlmsvc_binding structure
  lockd: use to_delayed_work
  nfsd: use to_delayed_work
  Revert "svcrdma: Do not send XDR roundup bytes for a write chunk"
  lockd: Register callbacks on the inetaddr_chain and inet6addr_chain
  nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain
  sunrpc: Add a function to close temporary transports immediately
  nfsd: don't base cl_cb_status on stale information
  nfsd4: fix gss-proxy 4.1 mounts for some AD principals
  nfsd: fix unlikely NULL deref in mach_creds_match
  nfsd: minor consolidation of mach_cred handling code
  nfsd: helper for dup of possibly NULL string
  svcrpc: move some initialization to common code
  nfsd: fix a warning message
  nfsd: constify nfsd4_callback_ops structure
  nfsd: recover: constify nfsd4_client_tracking_ops structures
  svcrdma: Do not send XDR roundup bytes for a write chunk
2016-01-15 | bridge: fix lockdep addr_list_lock false positive splat | Nikolay Aleksandrov | 1 | -0/+8
After promisc mode management was introduced a bridge device could do dev_set_promiscuity from its ndo_change_rx_flags() callback which in turn can be called after the bridge's addr_list_lock has been taken (e.g. by dev_uc_add). This causes a false positive lockdep splat because the port interfaces' addr_list_lock is taken when br_manage_promisc() runs after the bridge's addr list lock was already taken. To remove the false positive introduce a custom bridge addr_list_lock class and set it on bridge init. A simple way to reproduce this is with the following:

$ brctl addbr br0
$ ip l add l br0 br0.100 type vlan id 100
$ ip l set br0 up
$ ip l set br0.100 up
$ echo 1 > /sys/class/net/br0/bridge/vlan_filtering
$ brctl addif br0 eth0

Splat:
[ 43.684325] =============================================
[ 43.684485] [ INFO: possible recursive locking detected ]
[ 43.684636] 4.4.0-rc8+ #54 Not tainted
[ 43.684755] ---------------------------------------------
[ 43.684906] brctl/1187 is trying to acquire lock:
[ 43.685047] (_xmit_ETHER){+.....}, at: [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
[ 43.685460] but task is already holding lock:
[ 43.685618] (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
[ 43.686015] other info that might help us debug this:
[ 43.686316] Possible unsafe locking scenario:
[ 43.686743] CPU0
[ 43.686967] ----
[ 43.687197] lock(_xmit_ETHER);
[ 43.687544] lock(_xmit_ETHER);
[ 43.687886] *** DEADLOCK ***
[ 43.688438] May be due to missing lock nesting notation
[ 43.688882] 2 locks held by brctl/1187:
[ 43.689134] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81510317>] rtnl_lock+0x17/0x20
[ 43.689852] #1: (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
[ 43.690575] stack backtrace:
[ 43.690970] CPU: 0 PID: 1187 Comm: brctl Not tainted 4.4.0-rc8+ #54
[ 43.691270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[ 43.691770] ffffffff826a25c0 ffff8800369fb8e0 ffffffff81360ceb ffffffff826a25c0
[ 43.692425] ffff8800369fb9b8 ffffffff810d0466 ffff8800369fb968 ffffffff81537139
[ 43.693071] ffff88003a08c880 0000000000000000 00000000ffffffff 0000000002080020
[ 43.693709] Call Trace:
[ 43.693931] [<ffffffff81360ceb>] dump_stack+0x4b/0x70
[ 43.694199] [<ffffffff810d0466>] __lock_acquire+0x1e46/0x1e90
[ 43.694483] [<ffffffff81537139>] ? netlink_broadcast_filtered+0x139/0x3e0
[ 43.694789] [<ffffffff8153b5da>] ? nlmsg_notify+0x5a/0xc0
[ 43.695064] [<ffffffff810d10f5>] lock_acquire+0xe5/0x1f0
[ 43.695340] [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
[ 43.695623] [<ffffffff815edea5>] _raw_spin_lock_bh+0x45/0x80
[ 43.695901] [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
[ 43.696180] [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
[ 43.696460] [<ffffffff8150189c>] dev_set_promiscuity+0x3c/0x50
[ 43.696750] [<ffffffffa0586845>] br_port_set_promisc+0x25/0x50 [bridge]
[ 43.697052] [<ffffffffa05869aa>] br_manage_promisc+0x8a/0xe0 [bridge]
[ 43.697348] [<ffffffffa05826ee>] br_dev_change_rx_flags+0x1e/0x20 [bridge]
[ 43.697655] [<ffffffff81501532>] __dev_set_promiscuity+0x132/0x1f0
[ 43.697943] [<ffffffff81501672>] __dev_set_rx_mode+0x82/0x90
[ 43.698223] [<ffffffff815072de>] dev_uc_add+0x5e/0x80
[ 43.698498] [<ffffffffa05b3c62>] vlan_device_event+0x542/0x650 [8021q]
[ 43.698798] [<ffffffff8109886d>] notifier_call_chain+0x5d/0x80
[ 43.699083] [<ffffffff810988b6>] raw_notifier_call_chain+0x16/0x20
[ 43.699374] [<ffffffff814f456e>] call_netdevice_notifiers_info+0x6e/0x80
[ 43.699678] [<ffffffff814f4596>] call_netdevice_notifiers+0x16/0x20
[ 43.699973] [<ffffffffa05872be>] br_add_if+0x47e/0x4c0 [bridge]
[ 43.700259] [<ffffffffa058801e>] add_del_if+0x6e/0x80 [bridge]
[ 43.700548] [<ffffffffa0588b5f>] br_dev_ioctl+0xaf/0xc0 [bridge]
[ 43.700836] [<ffffffff8151a7ac>] dev_ifsioc+0x30c/0x3c0
[ 43.701106] [<ffffffff8151aac9>] dev_ioctl+0xf9/0x6f0
[ 43.701379] [<ffffffff81254345>] ? mntput_no_expire+0x5/0x450
[ 43.701665] [<ffffffff812543ee>] ? mntput_no_expire+0xae/0x450
[ 43.701947] [<ffffffff814d7b02>] sock_do_ioctl+0x42/0x50
[ 43.702219] [<ffffffff814d8175>] sock_ioctl+0x1e5/0x290
[ 43.702500] [<ffffffff81242d0b>] do_vfs_ioctl+0x2cb/0x5c0
[ 43.702771] [<ffffffff81243079>] SyS_ioctl+0x79/0x90
[ 43.703033] [<ffffffff815eebb6>] entry_SYSCALL_64_fastpath+0x16/0x7a

CC: Vlad Yasevich <[email protected]> CC: Stephen Hemminger <[email protected]> CC: Bridge list <[email protected]> CC: Andy Gospodarek <[email protected]> CC: Roopa Prabhu <[email protected]> Fixes: 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode.") Reported-by: Andy Gospodarek <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
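The fix boils down to giving the bridge device its own lockdep class for addr_list_lock. A minimal sketch of the mechanism (illustrative wrapper name; the real change lives in the bridge setup path):

	#include <linux/netdevice.h>

	static struct lock_class_key bridge_addr_list_lock_key;

	static void br_set_lockdep_class_sketch(struct net_device *dev)
	{
		/* distinct class: taking a port's _xmit_ETHER lock underneath
		 * the bridge's no longer looks recursive to lockdep */
		lockdep_set_class(&dev->addr_list_lock,
				  &bridge_addr_list_lock_key);
	}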
2016-01-15 | net: sctp: Move sequence start handling into sctp_transport_get_idx() | Geert Uytterhoeven | 1 | -3/+3
net/sctp/proc.c: In function ‘sctp_transport_get_idx’: net/sctp/proc.c:313: warning: ‘obj’ may be used uninitialized in this function This is currently a false positive, as all callers check for a zero offset first, and handle this case in the exact same way. Move the check and handling into sctp_transport_get_idx() to kill the compiler warning, and avoid future bugs. Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-01-15 | ipv6: update skb->csum when CE mark is propagated | Eric Dumazet | 1 | -1/+1
When a tunnel decapsulates the outer header, it has to comply with RFC 6040 and eventually propagate the CE mark into the inner header. It turns out IP6_ECN_set_ce() does not correctly update skb->csum for CHECKSUM_COMPLETE packets, triggering the infamous "hw csum failure" messages and stack traces. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
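The repair pattern: when the IPv6 header word is rewritten to set CE, fold the before/after difference into skb->csum. A sketch close in shape to the actual one-line fix (variable names illustrative):

	#include <linux/skbuff.h>
	#include <net/checksum.h>

	/* 'from' and 'to' are the 32-bit header word before and after the
	 * CE bits are set. */
	static void ce_fixup_csum(struct sk_buff *skb, __be32 from, __be32 to)
	{
		if (skb->ip_summed == CHECKSUM_COMPLETE)
			skb->csum = csum_add(csum_sub(skb->csum,
						      (__force __wsum)from),
					     (__force __wsum)to);
	}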
2016-01-15 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds | 8 | -136/+59
Merge first patch-bomb from Andrew Morton:

 - A few hotfixes which missed 4.4 because I was asleep. cc'ed to -stable

 - A few misc fixes

 - OCFS2 updates

 - Part of MM. Including pretty large changes to page-flags handling and to thp management which have been buffered up for 2-3 cycles now. I have a lot of MM material this time. [ It turns out the THP part wasn't quite ready, so that got dropped from this series - Linus ]

* emailed patches from Andrew Morton <[email protected]>: (117 commits)
  zsmalloc: reorganize struct size_class to pack 4 bytes hole
  mm/zbud.c: use list_last_entry() instead of list_tail_entry()
  zram/zcomp: do not zero out zcomp private pages
  zram: pass gfp from zcomp frontend to backend
  zram: try vmalloc() after kmalloc()
  zram/zcomp: use GFP_NOIO to allocate streams
  mm: add tracepoint for scanning pages
  drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64
  mm/page_isolation: use macro to judge the alignment
  mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE()
  mm: rework virtual memory accounting
  include/linux/memblock.h: fix ordering of 'flags' argument in comments
  mm: move lru_to_page to mm_inline.h
  Documentation/filesystems: describe the shared memory usage/accounting
  memory-hotplug: don't BUG() in register_memory_resource()
  hugetlb: make mm and fs code explicitly non-modular
  mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations
  mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd()
  mm: make sure isolate_lru_page() is never called for tail page
  vmstat: make vmstat_updater deferrable again and shut down on idle
  ...
2016-01-15 | sctp: support to lookup with ep+paddr in transport rhashtable | Xin Long | 1 | -15/+23
Now, when we sendmsg, we translate the ep to laddr by selecting the first element of the list, and then do a lookup for a transport. But sctp_hash_cmp() will compare it against the asoc addr_list, which may be a subset of the ep addr_list, meaning that this chosen laddr may not be there, and thus making it impossible to find the transport. So we fix it by using ep + paddr to look up transports in the hashtable. In sctp_hash_cmp, if .ep is set, we check whether this ep == asoc->ep; otherwise we do the laddr check. Fixes: d6c0256a60e6 ("sctp: add the rhashtable apis for sctp global transport hashtable") Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Reported-by: Vlad Yasevich <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-01-15 | net: preserve IP control block during GSO segmentation | Konstantin Khlebnikov | 4 | -4/+9
skb_gso_segment() uses the skb control block during segmentation. This patch adds 32 bytes of room for the previous control block, which will be copied into all resulting segments. This fixes a kernel crash when fragmenting forwarded packets: fragmentation requires a valid IP CB in the skb for clearing ip options. The patch also removes the custom save/restore in the ovs code, which is now redundant. Signed-off-by: Konstantin Khlebnikov <[email protected]> Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com Signed-off-by: David S. Miller <[email protected]>
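A sketch of the save/restore the patch centralizes (sizes and names illustrative; the real patch reserves the room inside the GSO control block layout itself rather than on the stack):

	#include <linux/err.h>
	#include <linux/netdevice.h>
	#include <linux/string.h>

	static struct sk_buff *gso_segment_keep_cb(struct sk_buff *skb,
						   netdev_features_t features)
	{
		unsigned char cb[32];	/* room for IPCB/IP6CB, per the patch */
		struct sk_buff *segs, *seg;

		memcpy(cb, skb->cb, sizeof(cb));  /* segmentation clobbers cb */
		segs = skb_gso_segment(skb, features);
		if (IS_ERR_OR_NULL(segs))
			return segs;
		for (seg = segs; seg; seg = seg->next)	/* restore in each */
			memcpy(seg->cb, cb, sizeof(cb));
		return segs;
	}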
2016-01-14 | Merge tag 'nfs-for-4.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds | 10 | -92/+298
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
  - Fix a regression in the SunRPC socket polling code
  - Fix the attribute cache revalidation code
  - Fix race in __update_open_stateid()
  - Fix an lo->plh_block_lgets imbalance in layoutreturn
  - Fix an Oopsable typo in ff_mirror_match_fh()

  Features:
  - pNFS layout recall performance improvements.
  - pNFS/flexfiles: Support server-supplied layoutstats sampling period

  Bugfixes + cleanups:
  - NFSv4: Don't perform cached access checks before we've OPENed the file
  - Fix starvation issues with background flushes
  - Reclaim writes should be flushed as unstable writes if there are already entries in the commit lists
  - Various bugfixes from Chuck to fix NFS/RDMA send queue ordering problems
  - Ensure that we propagate fatal layoutget errors back to the application
  - Fixes for sundry flexfiles layoutstats bugs
  - Fix files/flexfiles to not cache invalidated layouts in the DS commit buckets"

* tag 'nfs-for-4.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits)
  NFS: Fix a compile warning about unused variable in nfs_generic_pg_pgios()
  NFSv4: Fix a compile warning about no prototype for nfs4_ioctl()
  NFS: Use wait_on_atomic_t() for unlock after readahead
  SUNRPC: Fixup socket wait for memory
  NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments
  NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures
  NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()
  NFSv4.1/pNFS: Fix a race in initiate_file_draining()
  NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout
  NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode
  NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids
  NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()
  NFS: Relax requirements in nfs_flush_incompatible
  NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid
  NFS: Allow multiple commit requests in flight per file
  NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs
  SUNRPC: Fix a missing break in rpc_anyaddr()
  pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh()
  NFS: Fix attribute cache revalidation
  NFS: Ensure we revalidate attributes before using execute_ok()
  ...
2016-01-14 | mm: memcontrol: switch to the updated jump-label API | Johannes Weiner | 1 | -2/+2
According to <linux/jump_label.h> the direct use of struct static_key is deprecated. Update the socket and slab accounting code accordingly. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: David S. Miller <[email protected]> Reported-by: Jason Baron <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
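For reference, a sketch of the API migration in question (the key name here is hypothetical, not the one the patch touches):

	#include <linux/jump_label.h>

	static DEFINE_STATIC_KEY_FALSE(accounting_key);

	/* deprecated style being replaced:
	 *	struct static_key accounting_key = STATIC_KEY_INIT_FALSE;
	 *	if (static_key_false(&accounting_key)) ...
	 */
	static bool accounting_enabled(void)
	{
		return static_branch_unlikely(&accounting_key);
	}

	/* toggled at runtime with static_branch_inc()/static_branch_dec() */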
2016-01-14 | mm: memcontrol: generalize the socket accounting jump label | Johannes Weiner | 2 | -7/+2
The unified hierarchy memory controller is going to use this jump label as well to control the networking callbacks. Move it to the memory controller code and give it a more generic name. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-14 | net: tcp_memcontrol: simplify linkage between socket and page counter | Johannes Weiner | 5 | -97/+36
There won't be any separate counters for socket memory consumed by protocols other than TCP in the future. Remove the indirection and link sockets directly to their owning memory cgroup. Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-14 | net: tcp_memcontrol: sanitize tcp memory accounting callbacks | Johannes Weiner | 2 | -12/+21
There won't be a tcp control soft limit, so integrating the memcg code into the global skmem limiting scheme complicates things unnecessarily. Replace this with simple and clear charge and uncharge calls--hidden behind a jump label--to account skb memory. Note that this is not purely aesthetic: as a result of shoehorning the per-memcg code into the same memory accounting functions that handle the global level, the old code would compare the per-memcg consumption against the smaller of the per-memcg limit and the global limit. This allowed the total consumption of multiple sockets to exceed the global limit, as long as the individual sockets stayed within bounds. After this change, the code will always compare the per-memcg consumption to the per-memcg limit, and the global consumption to the global limit, and thus close this loophole. Without a soft limit, the per-memcg memory pressure state in sockets is generally questionable. However, we did it until now, so we continue to enter it when the hard limit is hit, and packets are dropped, to let other sockets in the cgroup know that they shouldn't grow their transmit windows, either. However, keep it simple in the new callback model and leave memory pressure lazily when the next packet is accepted (as opposed to doing it synchronously when packets are processed). When packets are dropped, network performance will already be in the toilet, so that should be a reasonable trade-off. As described above, consumption is now checked on the per-memcg level and the global level separately. Likewise, memory pressure states are maintained on both the per-memcg level and the global level, and a socket is considered under pressure when either level asserts as much. Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
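A sketch of the resulting charge call, as I read the series (the mem_cgroup_charge_skmem()/sk_memcg names follow the patch set; treat the exact shape as an assumption):

	#include <net/sock.h>

	/* Try the per-memcg counter first; a false return means the cgroup
	 * hit its hard limit and the caller should back off. */
	static bool sk_try_charge_pages(struct sock *sk, unsigned int nr_pages)
	{
		if (mem_cgroup_sockets_enabled && sk->sk_memcg)
			return mem_cgroup_charge_skmem(sk->sk_memcg, nr_pages);
		return true;	/* no memcg accounting: global limits only */
	}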
2016-01-14 | net: tcp_memcontrol: simplify the per-memcg limit access | Johannes Weiner | 1 | -8/+0
tcp_memcontrol replicates the global sysctl_mem limit array per cgroup, but it only ever sets these entries to the value of the memory_allocated page_counter limit. Use the latter directly. Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-14 | net: tcp_memcontrol: remove dead per-memcg count of allocated sockets | Johannes Weiner | 1 | -3/+0
The number of allocated sockets is used for calculations in the soft limit phase, where packets are accepted but the socket is under memory pressure. Since there is no soft limit phase in tcp_memcontrol, and memory pressure is only entered when packets are already dropped, this is actually dead code. Remove it. As this is the last user of parent_cg_proto(), remove that too. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: David S. Miller <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-14 | net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label | Johannes Weiner | 3 | -9/+7
Move the jump-label from sock_update_memcg() and sock_release_memcg() to the callsite, and so eliminate those function calls when socket accounting is not enabled. This also eliminates the need for dummy functions because the calls will be optimized away if the Kconfig options are not enabled. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: David S. Miller <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-14 | memcg: do not allow to disable tcp accounting after limit is set | Vladimir Davydov | 1 | -12/+5
There are two bits defined for cg_proto->flags - MEMCG_SOCK_ACTIVATED and MEMCG_SOCK_ACTIVE - both are set in tcp_update_limit, but the former is never cleared while the latter can be cleared by unsetting the limit. This makes it possible to disable tcp socket accounting for new sockets after it was enabled by writing -1 to memory.kmem.tcp.limit_in_bytes, while still guaranteeing that the memcg_socket_limit_enabled static key will be decremented on memcg destruction. This functionality looks dubious, because it is not clear what a use case would be. By enabling tcp accounting a user accepts the price. If they then find the performance degradation unacceptable, they can always restart their workload with tcp accounting disabled. It does not seem there is any need to flip it while the workload is running. Besides, it contradicts how the kmem accounting API works: writing whatever to memory.kmem.limit_in_bytes enables kmem accounting for the cgroup in question, after which it cannot be disabled. Therefore one might expect that writing -1 to memory.kmem.tcp.limit_in_bytes just enables socket accounting w/o limiting it, which might be useful by itself, but it isn't true. Since this API peculiarity is not documented anywhere, I propose to drop it. This will allow simplifying the code by dropping cg_proto->flags. Signed-off-by: Vladimir Davydov <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-14 | kmemcg: account certain kmem allocations to memcg | Vladimir Davydov | 2 | -2/+2
Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below:

 - threadinfo
 - task_struct
 - task_delay_info
 - pid
 - cred
 - mm_struct
 - vm_area_struct and vm_region (nommu)
 - anon_vma and anon_vma_chain
 - signal_struct
 - sighand_struct
 - fs_struct
 - files_struct
 - fdtable and fdtable->full_fds_bits
 - dentry and external_name
 - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [[email protected]: coding-style fixes] Signed-off-by: Vladimir Davydov <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
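The two opt-in mechanisms, sketched with a hypothetical object type:

	#include <linux/init.h>
	#include <linux/slab.h>
	#include <linux/errno.h>

	struct widget { int id; };		/* hypothetical object type */

	static struct kmem_cache *widget_cachep;

	/* slab side: every object from this cache is charged to the
	 * allocating task's memcg */
	static int __init widget_cache_init(void)
	{
		widget_cachep = KMEM_CACHE(widget, SLAB_ACCOUNT);
		return widget_cachep ? 0 : -ENOMEM;
	}

	/* kmalloc side: one-off allocations opt in with __GFP_ACCOUNT,
	 * e.g. kmalloc(size, GFP_KERNEL | __GFP_ACCOUNT) */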
2016-01-14 | netfilter: nft_ct: keep counters away from CONFIG_NF_CONNTRACK_LABELS | Pablo Neira Ayuso | 1 | -1/+1
This is accidental, they don't depend on the label infrastructure. Fixes: 48f66c905a97 ("netfilter: nft_ct: add byte/packet counter support") Reported-by: Arnd Bergmann <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Acked-by: Florian Westphal <[email protected]>
2016-01-14 | mac80211: Don't buffer non-bufferable MMPDUs | Helmut Schaa | 1 | -0/+5
Non-bufferable MMPDUs are sent out to STAs even while in PS mode (for example probe responses). Applying filtered frame handling for these doesn't make much sense and only increases air utilization when the STA wakes up. Hence, apply filtered frame handling only for bufferable MMPDUs. Discovered while testing an old VOIP phone that started probing for APs while in PS mode. The mac80211/ath9k AP where the STA was associated would reply with a probe response, but the phone had sometimes already moved to a new channel and could no longer ack the probe response. In that case mac80211 applied filtered frame handling for the un-acked probe response. Signed-off-by: Helmut Schaa <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
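A sketch of the gate; ieee80211_is_bufferable_mmpdu() is a real helper in linux/ieee80211.h, though its exact signature has varied across kernel versions:

	#include <linux/ieee80211.h>

	static bool apply_filtered_frame_handling(__le16 fc)
	{
		/* data frames and bufferable MMPDUs only; probe responses
		 * and other non-bufferable MMPDUs go out even to dozing
		 * STAs, so filtering them is pointless */
		return ieee80211_is_data(fc) ||
		       ieee80211_is_bufferable_mmpdu(fc);
	}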
2016-01-14 | mac80211: handle sched_scan_stopped vs. hw restart race | Eliad Peller | 2 | -0/+9
On hw restart, mac80211 might try to reconfigure already stopped sched scan, if ieee80211_sched_scan_stopped_work() wasn't scheduled yet. This in turn will keep the device driver with scheduled scan configured, while both mac80211 and cfg80211 will clear their sched scan state once the work is scheduled. Fix it by ignoring ieee80211_sched_scan_stopped() calls while in hw restart, and flush the work before starting the reconfiguration. Signed-off-by: Eliad Peller <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2016-01-14 | mac80211: fix PS-Poll handling | Emmanuel Grumbach | 1 | -1/+1
My commit below broke PS-Poll handling. In case the driver has no frames buffered, driver_release_tids will be 0, but calling find_highest_prio_tid() with 0 as a parameter is not a good idea: fls(0) - 1 = -1. This bug caused mac80211 to think that frames were buffered in the driver, and the driver in turn was confused because mac80211 asked it to release frames that were not reported to exist. On iwlwifi, this led to the WARNING below:

WARNING: CPU: 0 PID: 11230 at drivers/net/wireless/intel/iwlwifi/mvm/sta.c:1733 iwl_mvm_sta_modify_sleep_tx_count+0x2af/0x320 [iwlmvm]()
ffffffffc0627c60 ffff8800069b7648 ffffffff81888913 0000000000000000
0000000000000000 ffff8800069b7688 ffffffff81089d6a ffff8800069b7678
0000000000000001 ffff88003b35abf0 ffff88000698b128 ffff8800069b76d4
Call Trace:
[<ffffffff81888913>] dump_stack+0x4c/0x65
[<ffffffff81089d6a>] warn_slowpath_common+0x8a/0xc0
[<ffffffff81089e5a>] warn_slowpath_null+0x1a/0x20
[<ffffffffc05f36bf>] iwl_mvm_sta_modify_sleep_tx_count+0x2af/0x320 [iwlmvm]
[<ffffffffc05dae41>] iwl_mvm_mac_release_buffered_frames+0x31/0x40 [iwlmvm]
[<ffffffffc045d8b6>] ieee80211_sta_ps_deliver_response+0x6e6/0xd80 [mac80211]
[<ffffffffc0461296>] ieee80211_sta_ps_deliver_poll_response+0x26/0x30 [mac80211]
[<ffffffffc048f743>] ieee80211_rx_handlers+0xa83/0x2900 [mac80211]
[<ffffffffc04917ad>] ieee80211_prepare_and_rx_handle+0x1ed/0xa70 [mac80211]
[<ffffffffc045e3d5>] ? sta_info_get_bss+0x5/0x4a0 [mac80211]
[<ffffffffc04925b6>] ieee80211_rx_napi+0x586/0xcd0 [mac80211]
[<ffffffffc05eaa3e>] iwl_mvm_rx_rx_mpdu+0x59e/0xc60 [iwlmvm]

Fixes: 0ead2510f8ce ("mac80211: allow the driver to send EOSP when needed") Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
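The arithmetic at fault, and the guard, in a sketch (illustrative function, not the mac80211 code):

	#include <linux/bitops.h>

	/* fls(0) is defined to return 0, so the old code's fls(tids) - 1
	 * turned "no TIDs buffered" into TID -1. */
	static int find_highest_tid_sketch(unsigned int tids)
	{
		if (!tids)
			return -1;	/* nothing buffered in the driver */
		return fls(tids) - 1;	/* highest set bit = highest prio TID */
	}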
2016-01-14 | mac80211: clear local->sched_scan_req properly on reconfig | Eliad Peller | 1 | -1/+4
On reconfig, in case of sched_scan_req->n_scan_plans > 1, local->sched_scan_req was never cleared, although cfg80211_sched_scan_stopped_rtnl() was called, resulting in local->sched_scan_req holding a stale pointer and preventing further scheduled scan requests. Clear it explicitly in this case. Fixes: 42a7e82c6792 ("mac80211: Do not restart scheduled scan if multiple scan plans are set") Signed-off-by: Eliad Peller <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2016-01-14 | mac80211: avoid ROC during hw restart | Eliad Peller | 3 | -2/+18
Defer ROC requests during hw restart, as the driver might not be fully configured in this stage (e.g. channel contexts were not added yet) Signed-off-by: Eliad Peller <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2016-01-14 | regulatory: fix world regulatory domain data | Johannes Berg | 1 | -5/+7
The rule definitions here aren't really valid; they would be rejected if they came from userspace, due to the bandwidth specified being bigger than the rule's width. This is largely inconsequential since the other rules around them do enable the bandwidth, but express that better using the NL80211_RRF_AUTO_BW flag. Signed-off-by: Johannes Berg <[email protected]>
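A sketch of what such a rule looks like with the flag (a single-rule regdomain with made-up frequencies and powers, not the actual world-roaming table):

	#include <net/cfg80211.h>

	/* the rule declares an honest 20 MHz width and lets
	 * NL80211_RRF_AUTO_BW widen it by combining adjacent rules */
	static const struct ieee80211_regdomain regdom_sketch = {
		.n_reg_rules = 1,
		.alpha2 = "00",
		.reg_rules = {
			REG_RULE(2457-10, 2482+10, 20, 6, 20,
				 NL80211_RRF_NO_IR | NL80211_RRF_AUTO_BW),
		},
	};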