aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2014-06-05bridge: Fix incorrect judgment of promiscToshiaki Makita1-1/+2
br_manage_promisc() incorrectly expects br_auto_port() to return only 0 or 1, while it actually returns flags, i.e., a subset of BR_AUTO_MASK. Signed-off-by: Toshiaki Makita <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-05MPLS: Use mpls_features to activate software MPLS GSO segmentationSimon Horman1-1/+27
If an MPLS packet requires segmentation then use mpls_features to determine if the software implementation should be used. As no driver advertises MPLS GSO segmentation this will always be the case. I had not noticed that this was necessary before as software MPLS GSO segmentation was already being used in my test environment. I believe that the reason for that is the skbs in question always had fragments and the driver I used does not advertise NETIF_F_FRAGLIST (which seems to be the case for most drivers). Thus software segmentation was activated by skb_gso_ok(). This introduces the overhead of an extra call to skb_network_protocol() in the case where where CONFIG_NET_MPLS_GSO is set and skb->ip_summed == CHECKSUM_NONE. Thanks to Jesse Gross for prompting me to investigate this. Signed-off-by: Simon Horman <[email protected]> Acked-by: YAMAMOTO Takashi <[email protected]> Acked-by: Thomas Graf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-05Merge branch 'for-upstream' of ↵John W. Linville6-37/+60
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
2014-06-05ipv4: use skb frags api in udp4_hwcsum()WANG Cong1-4/+5
Cc: "David S. Miller" <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-05net: use the new API kvfree()WANG Cong11-58/+12
It is available since v3.15-rc5. Cc: Pablo Neira Ayuso <[email protected]> Cc: "David S. Miller" <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-05dns_resolver: Do not accept domain names longer than 255 charsManuel Schölling1-2/+2
According to RFC1035 "[...] the total length of a domain name (i.e., label octets and label length octets) is restricted to 255 octets or less." Signed-off-by: Manuel Schölling <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-04vxlan: Add support for UDP checksums (v4 sending, v6 zero csums)Tom Herbert1-1/+1
Added VXLAN link configuration for sending UDP checksums, and allowing TX and RX of UDP6 checksums. Also, call common iptunnel_handle_offloads and added GSO support for checksums. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-04gre: Call gso_make_checksumTom Herbert8-4/+17
Call gso_make_checksum. This should have the benefit of using a checksum that may have been previously computed for the packet. This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that offload GRE GSO with and without the GRE checksum offloaed. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-04net: Add GSO support for UDP tunnels with checksumTom Herbert6-23/+28
Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates that a device is capable of computing the UDP checksum in the encapsulating header of a UDP tunnel. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-04tcp: Call gso_make_checksumTom Herbert1-5/+2
Call common gso_make_checksum when calculating checksum for a TCP GSO segment. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-04net: Support for multiple checksums with gsoTom Herbert2-1/+15
When creating a GSO packet segment we may need to set more than one checksum in the packet (for instance a TCP checksum and UDP checksum for VXLAN encapsulation). To be efficient, we want to do checksum calculation for any part of the packet at most once. This patch adds csum_start offset to skb_gso_cb. This tracks the starting offset for skb->csum which is initially set in skb_segment. When a protocol needs to compute a transport checksum it calls gso_make_checksum which computes the checksum value from the start of transport header to csum_start and then adds in skb->csum to get the full checksum. skb->csum and csum_start are then updated to reflect the checksum of the resultant packet starting from the transport header. This patch also adds a flag to skbuff, encap_hdr_csum, which is set in *gso_segment fucntions to indicate that a tunnel protocol needs checksum calculation Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-04l2tp: call udp{6}_set_csumTom Herbert1-49/+5
Call common functions to set checksum for UDP tunnel. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-04udp: Generic functions to set checksumTom Herbert2-0/+75
Added udp_set_csum and udp6_set_csum functions to set UDP checksums in packets. These are for simple UDP packets such as those that might be created in UDP tunnels. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-04ipv6: Fix regression caused by efe4208 in udp_v6_mcast_next()Sven Wegener1-4/+4
Commit efe4208 ("ipv6: make lookups simpler and faster") introduced a regression in udp_v6_mcast_next(), resulting in multicast packets not reaching the destination sockets under certain conditions. The packet's IPv6 addresses are wrongly compared to the IPv6 addresses from the function's socket argument, which indicates the starting point for looping, instead of the loop variable. If the addresses from the first socket do not match the packet's addresses, no socket in the list will match. Signed-off-by: Sven Wegener <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-04net: Revert "fib_trie: use seq_file_net rather than seq->private"Sasha Levin1-1/+1
This reverts commit 30f38d2fdd79f13fc929489f7e6e517b4a4bfe63. fib_triestat is surrounded by a big lie: while it claims that it's a seq_file (fib_triestat_seq_open, fib_triestat_seq_show), it isn't: static const struct file_operations fib_triestat_fops = { .owner = THIS_MODULE, .open = fib_triestat_seq_open, .read = seq_read, .llseek = seq_lseek, .release = single_release_net, }; Yes, fib_triestat is just a regular file. A small detail (assuming CONFIG_NET_NS=y) is that while for seq_files you could do seq_file_net() to get the net ptr, doing so for a regular file would be wrong and would dereference an invalid pointer. The fib_triestat lie claimed a victim, and trying to show the file would be bad for the kernel. This patch just reverts the issue and fixes fib_triestat, which still needs a rewrite to either be a seq_file or stop claiming it is. Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-04xprtrdma: Disconnect on registration failureChuck Lever2-23/+42
If rpcrdma_register_external() fails during request marshaling, the current RPC request is killed. Instead, this RPC should be retried after reconnecting the transport instance. The most likely reason for registration failure with FRMR is a failed post_send, which would be due to a remote transport disconnect or memory exhaustion. These issues can be recovered by a retry. Problems encountered in the marshaling logic itself will not be corrected by trying again, so these should still kill a request. Now that we've added a clean exit for marshaling errors, take the opportunity to defang some BUG_ON's. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Remove BUG_ON() call sitesChuck Lever2-9/+12
If an error occurs in the marshaling logic, fail the RPC request being processed, but leave the client running. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Avoid deadlock when credit window is resetChuck Lever3-19/+7
Update the cwnd while processing the server's reply. Otherwise the next task on the xprt_sending queue is still subject to the old credit window. Currently, no task is awoken if the old congestion window is still exceeded, even if the new window is larger, and a deadlock results. This is an issue during a transport reconnect. Servers don't normally shrink the credit window, but the client does reset it to 1 when reconnecting so the server can safely grow it again. As a minor optimization, remove the hack of grabbing the initial cwnd size (which happens to be RPC_CWNDSCALE) and using that value as the congestion scaling factor. The scaling value is invariant, and we are better off without the multiplication operation. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04SUNRPC: Move congestion window constants to header fileChuck Lever1-19/+9
I would like to use one of the RPC client's congestion algorithm constants in transport-specific code. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Reset connection timeout after successful reconnectChuck Lever1-0/+1
If the new connection is able to make forward progress, reset the re-establish timeout. Otherwise it keeps growing even if disconnect events are rare. The same behavior as TCP is adopted: reconnect immediately if the transport instance has been able to make some forward progress. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Use macros for reconnection timeout constantsChuck Lever1-7/+12
Clean up: Ensure the same max and min constant values are used everywhere when setting reconnect timeouts. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Allocate missing pagelistShirley Ma1-0/+6
GETACL relies on transport layer to alloc memory for reply buffer. However xprtrdma assumes that the reply buffer (pagelist) has been pre-allocated in upper layer. This problem was reported by IOL OFA lab test on PPC. Signed-off-by: Shirley Ma <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Tested-by: Edward Mossman <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Remove Tavor MTU settingChuck Lever1-14/+0
Clean up. Remove HCA-specific clutter in xprtrdma, which is supposed to be device-independent. Hal Rosenstock <[email protected]> observes: > Note that there is OpenSM option (enable_quirks) to return 1K MTU > in SA PathRecord responses for Tavor so that can be used for this. > The default setting for enable_quirks is FALSE so that would need > changing. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnectingChuck Lever1-9/+20
Devesh Sharma <[email protected]> reports that after a disconnect, his HCA is failing to create a fresh QP, leaving ia_ri->ri_id->qp set to NULL. But xprtrdma still allows RPCs to wake up and post LOCAL_INV as they exit, causing an oops. rpcrdma_ep_connect() is allowing the wake-up by leaking the QP creation error code (-EPERM in this case) to the RPC client's generic layer. xprt_connect_status() does not recognize -EPERM, so it kills pending RPC tasks immediately rather than retrying the connect. Re-arrange the QP creation logic so that when it fails on reconnect, it leaves ->qp with the old QP rather than NULL. If pending RPC tasks wake and exit, LOCAL_INV work requests will flush rather than oops. On initial connect, leaving ->qp == NULL is OK, since there are no pending RPCs that might use ->qp. But be sure not to try to destroy a NULL QP when rpcrdma_ep_connect() is retried. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Reduce the number of hardway buffer allocationsChuck Lever1-12/+13
While marshaling an RPC/RDMA request, the inline_{rsize,wsize} settings determine whether an inline request is used, or whether read or write chunks lists are built. The current default value of these settings is 1024. Any RPC request smaller than 1024 bytes is sent to the NFS server completely inline. rpcrdma_buffer_create() allocates and pre-registers a set of RPC buffers for each transport instance, also based on the inline rsize and wsize settings. RPC/RDMA requests and replies are built in these buffers. However, if an RPC/RDMA request is expected to be larger than 1024, a buffer has to be allocated and registered for that RPC, and deregistered and released when the RPC is complete. This is known has a "hardway allocation." Since the introduction of NFSv4, the size of RPC requests has become larger, and hardway allocations are thus more frequent. Hardway allocations are significant overhead, and they waste the existing RPC buffers pre-allocated by rpcrdma_buffer_create(). We'd like fewer hardway allocations. Increasing the size of the pre-registered buffers is the most direct way to do this. However, a blanket increase of the inline thresholds has interoperability consequences. On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000 bytes for each RPC request buffer, using kmalloc(). Due to internal fragmentation, this wastes nearly 1200 bytes because kmalloc() already returns an 8192-byte piece of memory for a 7000-byte allocation request, though the extra space remains unused. So let's round up the size of the pre-allocated buffers, and make use of the unused space in the kmalloc'd memory. This change reduces the amount of hardway allocated memory for an NFSv4 general connectathon run from 1322092 to 9472 bytes (99%). Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Limit work done by completion handlerChuck Lever2-4/+7
Sagi Grimberg <[email protected]> points out that a steady stream of CQ events could starve other work because of the boundless loop pooling in rpcrdma_{send,recv}_poll(). Instead of a (potentially infinite) while loop, return after collecting a budgeted number of completions. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Sagi Grimberg <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrmda: Reduce calls to ib_poll_cq() in completion handlersChuck Lever2-18/+42
Change the completion handlers to grab up to 16 items per ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16 items are returned. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrmda: Reduce lock contention in completion handlersChuck Lever1-4/+10
Skip the ib_poll_cq() after re-arming, if the provider knows there are no additional items waiting. (Have a look at commit ed23a727 for more details). Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Split the completion queueChuck Lever2-92/+137
The current CQ handler uses the ib_wc.opcode field to distinguish between event types. However, the contents of that field are not reliable if the completion status is not IB_WC_SUCCESS. When an error completion occurs on a send event, the CQ handler schedules a tasklet with something that is not a struct rpcrdma_rep. This is never correct behavior, and sometimes it results in a panic. To resolve this issue, split the completion queue into a send CQ and a receive CQ. The send CQ handler now handles only struct rpcrdma_mw wr_id's, and the receive CQ handler now handles only struct rpcrdma_rep wr_id's. Fix suggested by Shirley Ma <[email protected]> Reported-by: Rafael Reiter <[email protected]> Fixes: 5c635e09cec0feeeb310968e51dad01040244851 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211 Signed-off-by: Chuck Lever <[email protected]> Tested-by: Klemens Senn <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Make rpcrdma_ep_destroy() return voidChuck Lever3-13/+4
Clean up: rpcrdma_ep_destroy() returns a value that is used only to print a debugging message. rpcrdma_ep_destroy() already prints debugging messages in all error cases. Make rpcrdma_ep_destroy() return void instead. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Simplify rpcrdma_deregister_external() synopsisChuck Lever4-10/+4
Clean up: All remaining callers of rpcrdma_deregister_external() pass NULL as the last argument, so remove that argument. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: mount reports "Invalid mount option" if memreg mode not supportedChuck Lever1-4/+4
If the selected memory registration mode is not supported by the underlying provider/HCA, the NFS mount command reports that there was an invalid mount option, and fails. This is misleading. Reporting a problem allocating memory is a lot closer to the truth. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Fall back to MTHCAFMR when FRMR is not supportedChuck Lever1-16/+15
An audit of in-kernel RDMA providers that do not support the FRMR memory registration shows that several of them support MTHCAFMR. Prefer MTHCAFMR when FRMR is not supported. If MTHCAFMR is not supported, only then choose ALLPHYSICAL. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Remove REGISTER memory registration modeChuck Lever2-88/+5
All kernel RDMA providers except amso1100 support either MTHCAFMR or FRMR, both of which are faster than REGISTER. amso1100 can continue to use ALLPHYSICAL. The only other ULP consumer in the kernel that uses the reg_phys_mr verb is Lustre. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Remove MEMWINDOWS registration modesChuck Lever4-203/+7
The MEMWINDOWS and MEMWINDOWS_ASYNC memory registration modes were intended as stop-gap modes before the introduction of FRMR. They are now considered obsolete. MEMWINDOWS_ASYNC is also considered unsafe because it can leave client memory registered and exposed for an indeterminant time after each I/O. At this point, the MEMWINDOWS modes add needless complexity, so remove them. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: Remove BOUNCEBUFFERS memory registration modeChuck Lever3-28/+1
Clean up: This memory registration mode is slow and was never meant for use in production environments. Remove it to reduce implementation complexity. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process contextChuck Lever3-7/+21
An IB provider can invoke rpcrdma_conn_func() in an IRQ context, thus rpcrdma_conn_func() cannot be allowed to directly invoke generic RPC functions like xprt_wake_pending_tasks(). Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04nfs-rdma: Fix for FMR leaksAllen Andrews1-35/+38
Two memory region leaks were found during testing: 1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMR's ib_alloc_fast_reg_mr is called and then ib_alloc_fast_reg_page_list is called. If ib_alloc_fast_reg_page_list returns an error it bails out of the routine dropping the last ib_alloc_fast_reg_mr frmr region creating a memory leak. Added code to dereg the last frmr if ib_alloc_fast_reg_page_list fails. 2. rpcrdma_buffer_destroy: While cleaning up, the routine will only free the MR's on the rb_mws list if there are rb_send_bufs present. However, in rpcrdma_buffer_create while the rb_mws list is being built if one of the MR allocation requests fail after some MR's have been allocated on the rb_mws list the routine never gets to create any rb_send_bufs but instead jumps to the rpcrdma_buffer_destroy routine which will never free the MR's on rb_mws list because the rb_send_bufs were never created. This leaks all the MR's on the rb_mws list that were created prior to one of the MR allocations failing. Issue(2) was seen during testing. Our adapter had a finite number of MR's available and we created enough connections to where we saw an MR allocation failure on our Nth NFS connection request. After the kernel cleaned up the resources it had allocated for the Nth connection we noticed that FMR's had been leaked due to the coding error described above. Issue(1) was seen during a code review while debugging issue(2). Signed-off-by: Allen Andrews <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04xprtrdma: mind the device's max fast register page list depthSteve Wise3-16/+36
Some rdma devices don't support a fast register page list depth of at least RPCRDMA_MAX_DATA_SEGS. So xprtrdma needs to chunk its fast register regions according to the minimum of the device max supported depth or RPCRDMA_MAX_DATA_SEGS. Signed-off-by: Steve Wise <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller13-44/+131
Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <[email protected]>
2014-06-03net: remove some unless free on failure in alloc_netdev_mqs()WANG Cong1-5/+0
When we jump to free_pcpu on failure in alloc_netdev_mqs() rx and tx queues are not yet allocated, so no need to free them. Cc: David S. Miller <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-03rtnetlink: fix a memory leak when ->newlink failsCong Wang1-3/+7
It is possible that ->newlink() fails before registering the device, in this case we should just free it, it's safe to call free_netdev(). Fixes: commit 0e0eee2465df77bcec2 (net: correct error path in rtnl_newlink()) Cc: David S. Miller <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-03xfrm: fix race between netns cleanup and state expire notificationMichal Kubecek1-11/+25
The xfrm_user module registers its pernet init/exit after xfrm itself so that its net exit function xfrm_user_net_exit() is executed before xfrm_net_exit() which calls xfrm_state_fini() to cleanup the SA's (xfrm states). This opens a window between zeroing net->xfrm.nlsk pointer and deleting all xfrm_state instances which may access it (via the timer). If an xfrm state expires in this window, xfrm_exp_state_notify() will pass null pointer as socket to nlmsg_multicast(). As the notifications are called inside rcu_read_lock() block, it is sufficient to retrieve the nlsk socket with rcu_dereference() and check the it for null. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-03Merge branch 'locking-core-for-linus' of ↵Linus Torvalds17-36/+34
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core locking updates from Ingo Molnar: "The main changes in this cycle were: - reduced/streamlined smp_mb__*() interface that allows more usecases and makes the existing ones less buggy, especially in rarer architectures - add rwsem implementation comments - bump up lockdep limits" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) rwsem: Add comments to explain the meaning of the rwsem's count field lockdep: Increase static allocations arch: Mass conversion of smp_mb__*() arch,doc: Convert smp_mb__*() arch,xtensa: Convert smp_mb__*() arch,x86: Convert smp_mb__*() arch,tile: Convert smp_mb__*() arch,sparc: Convert smp_mb__*() arch,sh: Convert smp_mb__*() arch,score: Convert smp_mb__*() arch,s390: Convert smp_mb__*() arch,powerpc: Convert smp_mb__*() arch,parisc: Convert smp_mb__*() arch,openrisc: Convert smp_mb__*() arch,mn10300: Convert smp_mb__*() arch,mips: Convert smp_mb__*() arch,metag: Convert smp_mb__*() arch,m68k: Convert smp_mb__*() arch,m32r: Convert smp_mb__*() arch,ia64: Convert smp_mb__*() ...
2014-06-02Merge branch 'ethtool-rssh-fixes' of ↵David S. Miller1-62/+48
git://git.kernel.org/pub/scm/linux/kernel/git/bwh/net-next Ben Hutchings says: ==================== Pull request: Fixes for new ethtool RSS commands This addresses several problems I previously identified with the new ETHTOOL_{G,S}RSSH commands: 1. Missing validation of reserved parameters 2. Vague documentation 3. Use of unnamed magic number 4. No consolidation with existing driver operations I don't currently have access to suitable network hardware, but have tested these changes with a dummy driver that can support various combinations of operations and sizes, together with (a) Debian's ethtool 3.13 (b) ethtool 3.14 with the submitted patch to use ETHTOOL_{G,S}RSSH and minor adjustment for fixes 1 and 3. v2: Update RSS operations in vmxnet3 too ==================== Signed-off-by: David S. Miller <[email protected]>
2014-06-03ethtool: Check that reserved fields of struct ethtool_rxfh are 0Ben Hutchings1-53/+36
We should fail rather than silently ignoring use of these extensions. Signed-off-by: Ben Hutchings <[email protected]>
2014-06-03ethtool: Replace ethtool_ops::{get,set}_rxfh_indir() with {get,set}_rxfh()Ben Hutchings1-4/+4
ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers regardless of whether they expose the hash key, unless you try to set a hash key for a driver that doesn't expose it. Signed-off-by: Ben Hutchings <[email protected]> Acked-by: Jeff Kirsher <[email protected]>
2014-06-02bridge: Add bridge ifindex to bridge fdb notify msgsRoopa Prabhu1-0/+3
(This patch was previously posted as RFC at http://patchwork.ozlabs.org/patch/352677/) This patch adds NDA_MASTER attribute to neighbour attributes enum for bridge/master ifindex. And adds NDA_MASTER to bridge fdb notify msgs. Today bridge fdb notifications dont contain bridge information. Userspace can derive it from the port information in the fdb notification. However this is tricky in some scenarious. Example, bridge port delete notification comes before bridge fdb delete notifications. And we have seen problems in userspace when using libnl where, the bridge fdb delete notification handling code does not understand which bridge this fdb entry is part of because the bridge and port association has already been deleted. And these notifications (port membership and fdb) are generated on separate rtnl groups. Fixing the order of notifications could possibly solve the problem for some cases (I can submit a separate patch for that). This patch chooses to add NDA_MASTER to bridge fdb notify msgs because it not only solves the problem described above, but also helps userspace avoid another lookup into link msgs to derive the master index. Signed-off-by: Roopa Prabhu <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02net: filter: fix possible memory leak in __sk_prepare_filter()Leon Yu1-1/+6
__sk_prepare_filter() was reworked in commit bd4cf0ed3 (net: filter: rework/optimize internal BPF interpreter's instruction set) so that it should have uncharged memory once things went wrong. However that work isn't complete. Error is handled only in __sk_migrate_filter() while memory can still leak in the error path right after sk_chk_filter(). Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Leon Yu <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Tested-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-02tcp: fix cwnd undo on DSACK in F-RTOYuchung Cheng1-6/+5
This bug is discovered by an recent F-RTO issue on tcpm list https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html The bug is that currently F-RTO does not use DSACK to undo cwnd in certain cases: upon receiving an ACK after the RTO retransmission in F-RTO, and the ACK has DSACK indicating the retransmission is spurious, the sender only calls tcp_try_undo_loss() if some never retransmisted data is sacked (FLAG_ORIG_DATA_SACKED). The correct behavior is to unconditionally call tcp_try_undo_loss so the DSACK information is used properly to undo the cwnd reduction. Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>