blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2014-06-04	SUNRPC: Move congestion window constants to header file	Chuck Lever	1	-19/+9
	I would like to use one of the RPC client's congestion algorithm constants in transport-specific code. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Reset connection timeout after successful reconnect	Chuck Lever	1	-0/+1
	If the new connection is able to make forward progress, reset the re-establish timeout. Otherwise it keeps growing even if disconnect events are rare. The same behavior as TCP is adopted: reconnect immediately if the transport instance has been able to make some forward progress. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Use macros for reconnection timeout constants	Chuck Lever	1	-7/+12
	Clean up: Ensure the same max and min constant values are used everywhere when setting reconnect timeouts. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Allocate missing pagelist	Shirley Ma	1	-0/+6
	GETACL relies on transport layer to alloc memory for reply buffer. However xprtrdma assumes that the reply buffer (pagelist) has been pre-allocated in upper layer. This problem was reported by IOL OFA lab test on PPC. Signed-off-by: Shirley Ma <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Tested-by: Edward Mossman <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Remove Tavor MTU setting	Chuck Lever	1	-14/+0
	Clean up. Remove HCA-specific clutter in xprtrdma, which is supposed to be device-independent. Hal Rosenstock <[email protected]> observes: > Note that there is OpenSM option (enable_quirks) to return 1K MTU > in SA PathRecord responses for Tavor so that can be used for this. > The default setting for enable_quirks is FALSE so that would need > changing. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting	Chuck Lever	1	-9/+20
	Devesh Sharma <[email protected]> reports that after a disconnect, his HCA is failing to create a fresh QP, leaving ia_ri->ri_id->qp set to NULL. But xprtrdma still allows RPCs to wake up and post LOCAL_INV as they exit, causing an oops. rpcrdma_ep_connect() is allowing the wake-up by leaking the QP creation error code (-EPERM in this case) to the RPC client's generic layer. xprt_connect_status() does not recognize -EPERM, so it kills pending RPC tasks immediately rather than retrying the connect. Re-arrange the QP creation logic so that when it fails on reconnect, it leaves ->qp with the old QP rather than NULL. If pending RPC tasks wake and exit, LOCAL_INV work requests will flush rather than oops. On initial connect, leaving ->qp == NULL is OK, since there are no pending RPCs that might use ->qp. But be sure not to try to destroy a NULL QP when rpcrdma_ep_connect() is retried. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Reduce the number of hardway buffer allocations	Chuck Lever	1	-12/+13
	While marshaling an RPC/RDMA request, the inline_{rsize,wsize} settings determine whether an inline request is used, or whether read or write chunks lists are built. The current default value of these settings is 1024. Any RPC request smaller than 1024 bytes is sent to the NFS server completely inline. rpcrdma_buffer_create() allocates and pre-registers a set of RPC buffers for each transport instance, also based on the inline rsize and wsize settings. RPC/RDMA requests and replies are built in these buffers. However, if an RPC/RDMA request is expected to be larger than 1024, a buffer has to be allocated and registered for that RPC, and deregistered and released when the RPC is complete. This is known has a "hardway allocation." Since the introduction of NFSv4, the size of RPC requests has become larger, and hardway allocations are thus more frequent. Hardway allocations are significant overhead, and they waste the existing RPC buffers pre-allocated by rpcrdma_buffer_create(). We'd like fewer hardway allocations. Increasing the size of the pre-registered buffers is the most direct way to do this. However, a blanket increase of the inline thresholds has interoperability consequences. On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000 bytes for each RPC request buffer, using kmalloc(). Due to internal fragmentation, this wastes nearly 1200 bytes because kmalloc() already returns an 8192-byte piece of memory for a 7000-byte allocation request, though the extra space remains unused. So let's round up the size of the pre-allocated buffers, and make use of the unused space in the kmalloc'd memory. This change reduces the amount of hardway allocated memory for an NFSv4 general connectathon run from 1322092 to 9472 bytes (99%). Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Limit work done by completion handler	Chuck Lever	2	-4/+7
	Sagi Grimberg <[email protected]> points out that a steady stream of CQ events could starve other work because of the boundless loop pooling in rpcrdma_{send,recv}_poll(). Instead of a (potentially infinite) while loop, return after collecting a budgeted number of completions. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Sagi Grimberg <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrmda: Reduce calls to ib_poll_cq() in completion handlers	Chuck Lever	2	-18/+42
	Change the completion handlers to grab up to 16 items per ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16 items are returned. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrmda: Reduce lock contention in completion handlers	Chuck Lever	1	-4/+10
	Skip the ib_poll_cq() after re-arming, if the provider knows there are no additional items waiting. (Have a look at commit ed23a727 for more details). Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Split the completion queue	Chuck Lever	2	-92/+137
	The current CQ handler uses the ib_wc.opcode field to distinguish between event types. However, the contents of that field are not reliable if the completion status is not IB_WC_SUCCESS. When an error completion occurs on a send event, the CQ handler schedules a tasklet with something that is not a struct rpcrdma_rep. This is never correct behavior, and sometimes it results in a panic. To resolve this issue, split the completion queue into a send CQ and a receive CQ. The send CQ handler now handles only struct rpcrdma_mw wr_id's, and the receive CQ handler now handles only struct rpcrdma_rep wr_id's. Fix suggested by Shirley Ma <[email protected]> Reported-by: Rafael Reiter <[email protected]> Fixes: 5c635e09cec0feeeb310968e51dad01040244851 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211 Signed-off-by: Chuck Lever <[email protected]> Tested-by: Klemens Senn <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Make rpcrdma_ep_destroy() return void	Chuck Lever	3	-13/+4
	Clean up: rpcrdma_ep_destroy() returns a value that is used only to print a debugging message. rpcrdma_ep_destroy() already prints debugging messages in all error cases. Make rpcrdma_ep_destroy() return void instead. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Simplify rpcrdma_deregister_external() synopsis	Chuck Lever	4	-10/+4
	Clean up: All remaining callers of rpcrdma_deregister_external() pass NULL as the last argument, so remove that argument. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: mount reports "Invalid mount option" if memreg mode not supported	Chuck Lever	1	-4/+4
	If the selected memory registration mode is not supported by the underlying provider/HCA, the NFS mount command reports that there was an invalid mount option, and fails. This is misleading. Reporting a problem allocating memory is a lot closer to the truth. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Fall back to MTHCAFMR when FRMR is not supported	Chuck Lever	1	-16/+15
	An audit of in-kernel RDMA providers that do not support the FRMR memory registration shows that several of them support MTHCAFMR. Prefer MTHCAFMR when FRMR is not supported. If MTHCAFMR is not supported, only then choose ALLPHYSICAL. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Remove REGISTER memory registration mode	Chuck Lever	2	-88/+5
	All kernel RDMA providers except amso1100 support either MTHCAFMR or FRMR, both of which are faster than REGISTER. amso1100 can continue to use ALLPHYSICAL. The only other ULP consumer in the kernel that uses the reg_phys_mr verb is Lustre. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Remove MEMWINDOWS registration modes	Chuck Lever	4	-203/+7
	The MEMWINDOWS and MEMWINDOWS_ASYNC memory registration modes were intended as stop-gap modes before the introduction of FRMR. They are now considered obsolete. MEMWINDOWS_ASYNC is also considered unsafe because it can leave client memory registered and exposed for an indeterminant time after each I/O. At this point, the MEMWINDOWS modes add needless complexity, so remove them. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: Remove BOUNCEBUFFERS memory registration mode	Chuck Lever	3	-28/+1
	Clean up: This memory registration mode is slow and was never meant for use in production environments. Remove it to reduce implementation complexity. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context	Chuck Lever	3	-7/+21
	An IB provider can invoke rpcrdma_conn_func() in an IRQ context, thus rpcrdma_conn_func() cannot be allowed to directly invoke generic RPC functions like xprt_wake_pending_tasks(). Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	nfs-rdma: Fix for FMR leaks	Allen Andrews	1	-35/+38
	Two memory region leaks were found during testing: 1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMR's ib_alloc_fast_reg_mr is called and then ib_alloc_fast_reg_page_list is called. If ib_alloc_fast_reg_page_list returns an error it bails out of the routine dropping the last ib_alloc_fast_reg_mr frmr region creating a memory leak. Added code to dereg the last frmr if ib_alloc_fast_reg_page_list fails. 2. rpcrdma_buffer_destroy: While cleaning up, the routine will only free the MR's on the rb_mws list if there are rb_send_bufs present. However, in rpcrdma_buffer_create while the rb_mws list is being built if one of the MR allocation requests fail after some MR's have been allocated on the rb_mws list the routine never gets to create any rb_send_bufs but instead jumps to the rpcrdma_buffer_destroy routine which will never free the MR's on rb_mws list because the rb_send_bufs were never created. This leaks all the MR's on the rb_mws list that were created prior to one of the MR allocations failing. Issue(2) was seen during testing. Our adapter had a finite number of MR's available and we created enough connections to where we saw an MR allocation failure on our Nth NFS connection request. After the kernel cleaned up the resources it had allocated for the Nth connection we noticed that FMR's had been leaked due to the coding error described above. Issue(1) was seen during a code review while debugging issue(2). Signed-off-by: Allen Andrews <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04	xprtrdma: mind the device's max fast register page list depth	Steve Wise	3	-16/+36
	Some rdma devices don't support a fast register page list depth of at least RPCRDMA_MAX_DATA_SEGS. So xprtrdma needs to chunk its fast register regions according to the minimum of the device max supported depth or RPCRDMA_MAX_DATA_SEGS. Signed-off-by: Steve Wise <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-03	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	13	-44/+131
	Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <[email protected]>
2014-06-03	net: remove some unless free on failure in alloc_netdev_mqs()	WANG Cong	1	-5/+0
	When we jump to free_pcpu on failure in alloc_netdev_mqs() rx and tx queues are not yet allocated, so no need to free them. Cc: David S. Miller <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-03	rtnetlink: fix a memory leak when ->newlink fails	Cong Wang	1	-3/+7
	It is possible that ->newlink() fails before registering the device, in this case we should just free it, it's safe to call free_netdev(). Fixes: commit 0e0eee2465df77bcec2 (net: correct error path in rtnl_newlink()) Cc: David S. Miller <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-03	xfrm: fix race between netns cleanup and state expire notification	Michal Kubecek	1	-11/+25
	The xfrm_user module registers its pernet init/exit after xfrm itself so that its net exit function xfrm_user_net_exit() is executed before xfrm_net_exit() which calls xfrm_state_fini() to cleanup the SA's (xfrm states). This opens a window between zeroing net->xfrm.nlsk pointer and deleting all xfrm_state instances which may access it (via the timer). If an xfrm state expires in this window, xfrm_exp_state_notify() will pass null pointer as socket to nlmsg_multicast(). As the notifications are called inside rcu_read_lock() block, it is sufficient to retrieve the nlsk socket with rcu_dereference() and check the it for null. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-03	Merge branch 'locking-core-for-linus' of ↵	Linus Torvalds	17	-36/+34
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core locking updates from Ingo Molnar: "The main changes in this cycle were: - reduced/streamlined smp_mb__() interface that allows more usecases and makes the existing ones less buggy, especially in rarer architectures - add rwsem implementation comments - bump up lockdep limits" 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) rwsem: Add comments to explain the meaning of the rwsem's count field lockdep: Increase static allocations arch: Mass conversion of smp_mb__() arch,doc: Convert smp_mb__() arch,xtensa: Convert smp_mb__() arch,x86: Convert smp_mb__() arch,tile: Convert smp_mb__() arch,sparc: Convert smp_mb__() arch,sh: Convert smp_mb__() arch,score: Convert smp_mb__() arch,s390: Convert smp_mb__() arch,powerpc: Convert smp_mb__() arch,parisc: Convert smp_mb__() arch,openrisc: Convert smp_mb__() arch,mn10300: Convert smp_mb__() arch,mips: Convert smp_mb__() arch,metag: Convert smp_mb__() arch,m68k: Convert smp_mb__() arch,m32r: Convert smp_mb__() arch,ia64: Convert smp_mb__() ...
2014-06-02	Merge branch 'ethtool-rssh-fixes' of ↵	David S. Miller	1	-62/+48
	git://git.kernel.org/pub/scm/linux/kernel/git/bwh/net-next Ben Hutchings says: ==================== Pull request: Fixes for new ethtool RSS commands This addresses several problems I previously identified with the new ETHTOOL_{G,S}RSSH commands: 1. Missing validation of reserved parameters 2. Vague documentation 3. Use of unnamed magic number 4. No consolidation with existing driver operations I don't currently have access to suitable network hardware, but have tested these changes with a dummy driver that can support various combinations of operations and sizes, together with (a) Debian's ethtool 3.13 (b) ethtool 3.14 with the submitted patch to use ETHTOOL_{G,S}RSSH and minor adjustment for fixes 1 and 3. v2: Update RSS operations in vmxnet3 too ==================== Signed-off-by: David S. Miller <[email protected]>
2014-06-03	ethtool: Check that reserved fields of struct ethtool_rxfh are 0	Ben Hutchings	1	-53/+36
	We should fail rather than silently ignoring use of these extensions. Signed-off-by: Ben Hutchings <[email protected]>
2014-06-03	ethtool: Replace ethtool_ops::{get,set}_rxfh_indir() with {get,set}_rxfh()	Ben Hutchings	1	-4/+4
	ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers regardless of whether they expose the hash key, unless you try to set a hash key for a driver that doesn't expose it. Signed-off-by: Ben Hutchings <[email protected]> Acked-by: Jeff Kirsher <[email protected]>
2014-06-02	bridge: Add bridge ifindex to bridge fdb notify msgs	Roopa Prabhu	1	-0/+3
	(This patch was previously posted as RFC at http://patchwork.ozlabs.org/patch/352677/) This patch adds NDA_MASTER attribute to neighbour attributes enum for bridge/master ifindex. And adds NDA_MASTER to bridge fdb notify msgs. Today bridge fdb notifications dont contain bridge information. Userspace can derive it from the port information in the fdb notification. However this is tricky in some scenarious. Example, bridge port delete notification comes before bridge fdb delete notifications. And we have seen problems in userspace when using libnl where, the bridge fdb delete notification handling code does not understand which bridge this fdb entry is part of because the bridge and port association has already been deleted. And these notifications (port membership and fdb) are generated on separate rtnl groups. Fixing the order of notifications could possibly solve the problem for some cases (I can submit a separate patch for that). This patch chooses to add NDA_MASTER to bridge fdb notify msgs because it not only solves the problem described above, but also helps userspace avoid another lookup into link msgs to derive the master index. Signed-off-by: Roopa Prabhu <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	net: filter: fix possible memory leak in __sk_prepare_filter()	Leon Yu	1	-1/+6
	__sk_prepare_filter() was reworked in commit bd4cf0ed3 (net: filter: rework/optimize internal BPF interpreter's instruction set) so that it should have uncharged memory once things went wrong. However that work isn't complete. Error is handled only in __sk_migrate_filter() while memory can still leak in the error path right after sk_chk_filter(). Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Leon Yu <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Tested-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	tcp: fix cwnd undo on DSACK in F-RTO	Yuchung Cheng	1	-6/+5
	This bug is discovered by an recent F-RTO issue on tcpm list https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html The bug is that currently F-RTO does not use DSACK to undo cwnd in certain cases: upon receiving an ACK after the RTO retransmission in F-RTO, and the ACK has DSACK indicating the retransmission is spurious, the sender only calls tcp_try_undo_loss() if some never retransmisted data is sacked (FLAG_ORIG_DATA_SACKED). The correct behavior is to unconditionally call tcp_try_undo_loss so the DSACK information is used properly to undo the cwnd reduction. Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	fib_trie: use seq_file_net rather than seq->private	David Ahern	1	-1/+1
	Make fib_triestat_seq_show consistent with other /proc/net files and use seq_file_net. Signed-off-by: David Ahern <[email protected]> Cc: David S. Miller <[email protected]> Cc: Alexey Kuznetsov <[email protected]> Cc: James Morris <[email protected]> Cc: Hideaki YOSHIFUJI <[email protected]> Cc: Patrick McHardy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	netlink: Only check file credentials for implicit destinations	Eric W. Biederman	1	-1/+6
	It was possible to get a setuid root or setcap executable to write to it's stdout or stderr (which has been set made a netlink socket) and inadvertently reconfigure the networking stack. To prevent this we check that both the creator of the socket and the currentl applications has permission to reconfigure the network stack. Unfortunately this breaks Zebra which always uses sendto/sendmsg and creates it's socket without any privileges. To keep Zebra working don't bother checking if the creator of the socket has privilege when a destination address is specified. Instead rely exclusively on the privileges of the sender of the socket. Note from Andy: This is exactly Eric's code except for some comment clarifications and formatting fixes. Neither I nor, I think, anyone else is thrilled with this approach, but I'm hesitant to wait on a better fix since 3.15 is almost here. Note to stable maintainers: This is a mess. An earlier series of patches in 3.15 fix a rather serious security issue (CVE-2014-0181), but they did so in a way that breaks Zebra. The offending series includes: commit aa4cf9452f469f16cea8c96283b641b4576d4a7b Author: Eric W. Biederman <[email protected]> Date: Wed Apr 23 14:28:03 2014 -0700 net: Add variants of capable for use on netlink messages If a given kernel version is missing that series of fixes, it's probably worth backporting it and this patch. if that series is present, then this fix is critical if you care about Zebra. Cc: [email protected] Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	net: fix inet_getid() and ipv6_select_ident() bugs	Eric Dumazet	1	-8/+3
	I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery is disabled. Note how GSO/TSO packets do not have monotonically incrementing ID. 06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396) 06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212) 06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972) 06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292) 06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764) It appears I introduced this bug in linux-3.1. inet_getid() must return the old value of peer->ip_id_count, not the new one. Lets revert this part, and remove the prevention of a null identification field in IPv6 Fragment Extension Header, which is dubious and not even done properly. Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	bridge: Prevent insertion of FDB entry with disallowed vlan	Toshiaki Makita	3	-2/+37
	br_handle_local_finish() is allowing us to insert an FDB entry with disallowed vlan. For example, when port 1 and 2 are communicating in vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can interfere with their communication by spoofed src mac address with vlan id 10. Note: Even if it is judged that a frame should not be learned, it should not be dropped because it is destined for not forwarding layer but higher layer. See IEEE 802.1Q-2011 8.13.10. Signed-off-by: Toshiaki Makita <[email protected]> Acked-by: Vlad Yasevich <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	Merge branch 'for-davem' of ↵	David S. Miller	23	-157/+645
	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-06-02 Please pull this remaining batch of updates intended for the 3.16 stream... For the mac80211 bits, Johannes says: "The remainder for -next right now is mostly fixes, and a handful of small new things like some CSA infrastructure, the regdb script mW/dBm conversion change and sending wiphy notifications." For the bluetooth bits, Gustavo says: "Some more patches for 3.16. There is nothing really special here, just a bunch of clean ups, fixes plus some small improvements. Please pull." For the nfc bits, Samuel says: "We have: - Felica (Type3) tags support for trf7970a - Type 4b tags support for port100 - st21nfca DTS typo fix - A few sparse warning fixes" For the atheros bits, Kalle says: "Ben added support for setting antenna configurations. Michal improved warm reset so that we would not need to fall back to cold reset that often, an issue where ath10k stripped protected flag while in monitor mode and made module initialisation asynchronous to fix the problems with firmware loading when the driver is linked to the kernel. Luca removed unused channel_switch_beacon callbacks both from ath9k and ath10k. Marek fixed Protected Management Frames (PMF) when using Action Frames. Also we had other small fixes everywhere in the driver." Along with that, there are a handful of updates to a variety of drivers. This includes updates to at76c50x-usb, ath9k, b43, brcmfmac, mwifiex, rsi, rtlwifi, and wil6210. ==================== Signed-off-by: David S. Miller <[email protected]>
2014-06-02	inetpeer: get rid of ip_id_count	Eric Dumazet	12	-113/+38
	Ideally, we would need to generate IP ID using a per destination IP generator. linux kernels used inet_peer cache for this purpose, but this had a huge cost on servers disabling MTU discovery. 1) each inet_peer struct consumes 192 bytes 2) inetpeer cache uses a binary tree of inet_peer structs, with a nominal size of ~66000 elements under load. 3) lookups in this tree are hitting a lot of cache lines, as tree depth is about 20. 4) If server deals with many tcp flows, we have a high probability of not finding the inet_peer, allocating a fresh one, inserting it in the tree with same initial ip_id_count, (cf secure_ip_id()) 5) We garbage collect inet_peer aggressively. IP ID generation do not have to be 'perfect' Goal is trying to avoid duplicates in a short period of time, so that reassembly units have a chance to complete reassembly of fragments belonging to one message before receiving other fragments with a recycled ID. We simply use an array of generators, and a Jenkin hash using the dst IP as a key. ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it belongs (it is only used from this file) secure_ip_id() and secure_ipv6_id() no longer are needed. Rename ip_select_ident_more() to ip_select_ident_segs() to avoid unnecessary decrement/increment of the number of segments. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	net: Add support for device specific address syncing	Alexander Duyck	1	-0/+85
	This change provides a function to be used in order to break the ndo_set_rx_mode call into a set of address add and remove calls. The code is based on the implementation of dev_uc_sync/dev_mc_sync. Since they essentially do the same thing but with only one dev I simply named my functions __dev_uc_sync/__dev_mc_sync. I also implemented an unsync version of the functions as well to allow for cleanup on close. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	6lowpan_rtnl: fix off by one while fragmentation	Alexander Aring	1	-1/+1
	This patch fix a off by one error while fragmentation. If the frag_cap value is equal to skb_unprocessed value we need to stop the fragmentation loop because the last fragment which has a size of skb_unprocessed fits into the frag capability size. This issue was introduced by commit d4b2816d67d6e07b2f27037f282d8db03a5829d7 ("6lowpan: fix fragmentation"). Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	6lowpan_rtnl: fix fragmentation with two fragments	Alexander Aring	1	-2/+2
	This patch fix the 6LoWPAN fragmentation for the case if we have exactly two fragments. The problem is that the (skb_unprocessed >= frag_cap) condition is always false on the second fragment after sending the first fragment. A fragmentation with only one fragment doesn't make any sense. The solution is that we use a do while loop here, that ensures we sending always a minimum of two fragments if we need a fragmentation. This issue was introduced by commit d4b2816d67d6e07b2f27037f282d8db03a5829d7 ("6lowpan: fix fragmentation"). Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	genetlink: remove superfluous assignment	Denis ChengRq	1	-5/+1
	the local variable ops and n_ops were just read out from family, and not changed, hence no need to assign back. Validation functions should operate on const parameters and not change anything. Signed-off-by: Cheng Renquan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02	Merge branch 'master' of ↵	John W. Linville	23	-157/+645
	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
2014-06-02	Bluetooth: Fix L2CAP deadlock	Jukka Taimisto	1	-1/+4
	-[0x01 Introduction We have found a programming error causing a deadlock in Bluetooth subsystem of Linux kernel. The problem is caused by missing release_sock() call when L2CAP connection creation fails due full accept queue. The issue can be reproduced with 3.15-rc5 kernel and is also present in earlier kernels. -[0x02 Details The problem occurs when multiple L2CAP connections are created to a PSM which contains listening socket (like SDP) and left pending, for example, configuration (the underlying ACL link is not disconnected between connections). When L2CAP connection request is received and listening socket is found the l2cap_sock_new_connection_cb() function (net/bluetooth/l2cap_sock.c) is called. This function locks the 'parent' socket and then checks if the accept queue is full. 1178 lock_sock(parent); 1179 1180 /* Check for backlog size */ 1181 if (sk_acceptq_is_full(parent)) { 1182 BT_DBG("backlog full %d", parent->sk_ack_backlog); 1183 return NULL; 1184 } If case the accept queue is full NULL is returned, but the 'parent' socket is not released. Thus when next L2CAP connection request is received the code blocks on lock_sock() since the parent is still locked. Also note that for connections already established and waiting for configuration to complete a timeout will occur and l2cap_chan_timeout() (net/bluetooth/l2cap_core.c) will be called. All threads calling this function will also be blocked waiting for the channel mutex since the thread which is waiting on lock_sock() alread holds the channel mutex. We were able to reproduce this by sending continuously L2CAP connection request followed by disconnection request containing invalid CID. This left the created connections pending configuration. After the deadlock occurs it is impossible to kill bluetoothd, btmon will not get any more data etc. requiring reboot to recover. -[0x03 Fix Releasing the 'parent' socket when l2cap_sock_new_connection_cb() returns NULL seems to fix the issue. Signed-off-by: Jukka Taimisto <[email protected]> Reported-by: Tommi Mäkilä <[email protected]> Signed-off-by: Johan Hedberg <[email protected]> Cc: [email protected]
2014-06-02	netfilter: nf_tables: atomic allocation in set notifications from rcu callback	Pablo Neira Ayuso	1	-6/+6
	Use GFP_ATOMIC allocations when sending removal notifications of anonymous sets from rcu callback context. Sleeping in that context is illegal. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2014-06-02	netfilter: nf_tables: allow to delete several objects from a batch	Pablo Neira Ayuso	1	-9/+31
	Three changes to allow the deletion of several objects with dependencies in one transaction, they are: 1) Introduce speculative counter increment/decrement that is undone in the abort path if required, thus we avoid hitting -EBUSY when deleting the chain. The counter updates are reverted in the abort path. 2) Increment/decrement table/chain use counter for each set/rule. We need this to fully rely on the use counters instead of the list content, eg. !list_empty(&chain->rules) which evaluate true in the middle of the transaction. 3) Decrement table use counter when an anonymous set is bound to the rule in the commit path. This avoids hitting -EBUSY when deleting the table that contains anonymous sets. The anonymous sets are released in the nf_tables_rule_destroy path. This should not be a problem since the rule already bumped the use counter of the chain, so the bound anonymous set reflects dependencies through the rule object, which already increases the chain use counter. So the general assumption after this patch is that the use counters are bumped by direct object dependencies. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2014-06-02	netfilter: nft_rbtree: introduce locking	Pablo Neira Ayuso	1	-1/+21
	There's no rbtree rcu version yet, so let's fall back on the spinlock to protect the concurrent access of this structure both from user (to update the set content) and kernel-space (in the packet path). Signed-off-by: Pablo Neira Ayuso <[email protected]>
2014-06-02	netfilter: nf_tables: release objects in reverse order in the abort path	Pablo Neira Ayuso	1	-1/+2
	The patch c7c32e7 ("netfilter: nf_tables: defer all object release via rcu") indicates that we always release deleted objects in the reverse order, but that is only needed in the abort path. These are the two possible scenarios when releasing objects: 1) Deletion scenario in the commit path: no need to release objects in the reverse order since userspace already ensures that dependencies are fulfilled), ie. userspace tells us to delete rule -> ... -> rule -> chain -> table. In this case, we have to release the objects in the same order as userspace provided. 2) Deletion scenario in the abort path: we have to iterate in the reverse order to undo what it cannot be added, ie. userspace sent us a batch that includes: table -> chain -> rule -> ... -> rule, and that needs to be partially undone. In this case, we have to release objects in the reverse order to ensure that the set and chain objects point to valid rule and table objects. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2014-06-02	netfilter: nf_tables: fix wrong transaction ordering in set elements	Pablo Neira Ayuso	1	-2/+2
	The transaction needs to be placed at the end of the commit list, otherwise event notifications are reordered and we may crash when releasing object via call_rcu. This problem was introduced in 60319eb ("netfilter: nf_tables: use new transaction infrastructure to handle elements"). Reported-by: Arturo Borrero Gonzalez <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2014-06-02	netfilter: nfnetlink_acct: Fix memory leak	Mathieu Poirier	1	-1/+0
	Allocation of memory need only to happen once, that is after the proper checks on the NFACCT_FLAGS have been done. Otherwise the code can return without freeing already allocated memory. Signed-off-by: Mathieu Poirier <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>