aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-02-09SUNRPC: Handle connection reset more efficiently.Trond Myklebust1-16/+18
If the connection reset is due to an active call on our side, then the state change is sometimes not reported. Catch those instances using xs_error_report() instead. Also remove the xs_tcp_shutdown() call in xs_tcp_send_request() as the change in behaviour makes it redundant. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flagTrond Myklebust3-3/+0
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_releaseTrond Myklebust1-5/+1
Use of socket shutdown() means that we monitor the shutdown process through the xs_tcp_state_change() callback, so it is preferable to a full close in all cases unless we're destroying the transport. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connectionTrond Myklebust1-2/+2
The previous behaviour left the connection half-open in order to try to scrape the last replies from the socket. Now that we have more reliable reconnection, change the behaviour to close down the socket faster. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORTTrond Myklebust2-4/+0
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09SUNRPC: Remove TCP socket linger codeTrond Myklebust1-35/+0
Now that we no longer use the partial shutdown code when closing the socket, we no longer need to worry about the TCP linger2 state. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: Remove TCP client connection reset hackTrond Myklebust2-67/+1
Instead we rely on SO_REUSEPORT to provide the reconnection semantics that we need for NFSv2/v3. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: TCP/UDP always close the old socket before reconnectingTrond Myklebust1-2/+3
It is not safe to call xs_reset_transport() from inside xs_udp_setup_socket() or xs_tcp_setup_socket(), since they do not own the correct locks. Instead, do it in xs_connect(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: Add helpers to prevent socket create from racingTrond Myklebust3-6/+41
The socket lock is currently held by the task that is requesting the connection be established. While that is efficient in the case where the connection happens quickly, it is racy in the case where it doesn't. What we really want is for the connect helper to be able to block access to the socket while it is being set up. This patch does so by arranging to transfer the socket lock from the task that is requesting the connect attempt, and then releasing that lock once everything is done. This scheme also gives us automatic protection against collisions with the RPC close code, so we can kill the cancel_delayed_work_sync() call in xs_close(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: Ensure xs_reset_transport() resets the close connection flagsTrond Myklebust1-16/+13
Otherwise, we may end up looping. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: Do not clear the source port in xs_reset_transportTrond Myklebust1-2/+0
Now that we can reuse bound ports after a close, we never really want to clear the transport's source port after it has been set. Doing so really messes up the NFSv3 DRC on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: Handle EADDRINUSE on connectTrond Myklebust2-0/+5
Now that we're setting SO_REUSEPORT, we still need to handle the case where a connect() is attempted, but the old socket is still lingering. Essentially, all we want to do here is handle the error by waiting a few seconds and then retrying. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08SUNRPC: Set SO_REUSEPORT socket option for TCP connectionsTrond Myklebust1-4/+49
When using TCP, we need the ability to reuse port numbers after a disconnection, so that the NFSv3 server knows that we're the same client. Currently we use a hack to work around the TCP socket's TIME_WAIT: we send an RST instead of closing, which doesn't always work... The SO_REUSEPORT option added in Linux 3.9 allows us to bind multiple TCP connections to the same source address+port combination, and thus to use ordinary TCP close() instead of the current hack. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08Merge tag 'nfs-rdma-for-3.20-part-2' of ↵Trond Myklebust1-3/+4
git://git.linux-nfs.org/projects/anna/nfs-rdma NFS: RDMA Client Sparse Fixes This patch fixes a sparse warning in the initial submission. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-for-3.20-part-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma: xprtrdma: Address sparse complaint in rpcr_to_rdmar()
2015-02-05NFSv4.1: Fix pnfs_put_lseg racesTrond Myklebust1-34/+19
pnfs_layoutreturn_free_lseg_async() can also race with inode put in the general case. We can now fix this, and also simplify the code. Cc: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-05NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFSTrond Myklebust1-1/+1
In we want to be able to call pnfs_send_layoutreturn() from within the writeback path, we really want it to use GFP_NOFS in order to prevent recursion. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-05NFSv4.1: Pin the inode and super block in asynchronous layoutreturnsTrond Myklebust2-8/+12
If we're sending an asynchronous layoutreturn, then we need to ensure that the inode and the super block remain pinned. Cc: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Peng Tao <tao.peng@primarydata.com>
2015-02-05NFSv4.1: Pin the inode and super block in asynchronous layoutcommitTrond Myklebust2-8/+12
If we're sending an asynchronous layoutcommit, then we need to ensure that the inode and the super block remain pinned. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Peng Tao <tao.peng@primarydata.com>
2015-02-05NFSv4: Ensure we reference the inode for return-on-close in delegreturnTrond Myklebust3-9/+36
If we have to do a return-on-close in the delegreturn code, then we must ensure that the inode and super block remain referenced. Cc: Peng Tao <tao.peng@primarydata.com> Cc: stable@vger.kernel.org # 3.17.x Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Peng Tao <tao.peng@primarydata.com>
2015-02-05xprtrdma: Address sparse complaint in rpcr_to_rdmar()Chuck Lever1-3/+4
With "make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__": linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: warning: incorrect type in initializer (different base types) linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: expected restricted __be32 [usertype] *buffer linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: got unsigned int [usertype] *rq_buffer As far as I can tell this is a false positive. Reported-by: kbuild-all@01.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-02-04NFSv4.1: Ask for no delegation on OPEN if using O_DIRECTTrond Myklebust3-26/+85
If we're using NFSv4.1, then we have the ability to let the server know whether or not we believe that returning a delegation as part of our OPEN request would be useful. The feature needs to be used with care, since the client sending the request doesn't necessarily know how other clients are using that file, and how they may be affected by the delegation. For this reason, our initial use of the feature will be to let the server know when the client believes that handing out a delegation would not be useful. The first application for this function is when opening the file using O_DIRECT. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-04NFS: Add Anna Schumaker as co-maintainer for the NFS clientTrond Myklebust1-0/+1
Anna has essentially been performing the duties of co-maintainer for the past several years. In recognition of those efforts, I'd like to add her to the maintainers file. Cc: Anna Schumaker <anna.schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-03SUNRPC: NULL utsname dereference on NFS umount during namespace cleanupTrond Myklebust4-12/+24
Fix an Oopsable condition when nsm_mon_unmon is called as part of the namespace cleanup, which now apparently happens after the utsname has been freed. Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Cc: stable@vger.kernel.org # 3.18 Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-03Merge branch 'flexfiles'Trond Myklebust39-1041/+4268
* flexfiles: (53 commits) pnfs: lookup new lseg at lseg boundary nfs41: .init_read and .init_write can be called with valid pg_lseg pnfs: Update documentation on the Layout Drivers pnfs/flexfiles: Add the FlexFile Layout Driver nfs: count DIO good bytes correctly with mirroring nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags nfs/flexfiles: send layoutreturn before freeing lseg nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE nfs41: allow async version layoutreturn nfs41: add range to layoutreturn args pnfs: allow LD to ask to resend read through pnfs nfs: add nfs_pgio_current_mirror helper nfs: only reset desc->pg_mirror_idx when mirroring is supported nfs41: add a debug warning if we destroy an unempty layout pnfs: fail comparison when bucket verifier not set nfs: mirroring support for direct io nfs: add mirroring support to pgio layer pnfs: pass ds_commit_idx through the commit path ... Conflicts: fs/nfs/pnfs.c fs/nfs/pnfs.h
2015-02-03pnfs: lookup new lseg at lseg boundaryWeston Andros Adamson1-2/+8
Before mirroring support was added, the pageio descriptor's pg_lseg was set to null when an RPC was sent. Because of this, pg_init was called at lseg boundaries with pg_lseg = NULL, and it could be set to the new lseg. Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03nfs41: .init_read and .init_write can be called with valid pg_lsegPeng Tao1-21/+20
With pgio refactoring in v3.15, .init_read and .init_write can be called with valid pgio->pg_lseg. file layout was fixed at that time by commit c6194271f (pnfs: filelayout: support non page aligned layouts). But the generic helper still needs to be fixed. Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03pnfs: Update documentation on the Layout DriversTom Haynes1-6/+7
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03pnfs/flexfiles: Add the FlexFile Layout DriverTom Haynes13-12/+2325
The flexfile layout is a new layout that extends the file layout. It is currently being drafted as a specification at https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/ Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Tao Peng <bergwolf@primarydata.com>
2015-02-03nfs: count DIO good bytes correctly with mirroringPeng Tao1-4/+8
When resending to MDS, we might resend multiple mirroring requests to MDS. As a result, nfs_direct_good_bytes() ends up counting bytes multiple times, causing application to get wrong return results in read/write syscalls. Fix it by tracking start of a dreq and checking the range of pgio header. Cc: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03nfs41: wait for LAYOUTRETURN before retrying LAYOUTGETPeng Tao3-3/+45
Also take care to stop waiting if someone clears retry bit. Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writesPeng Tao2-0/+7
To allow pnfs LD to ask direct writes to be resend. Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flagsPeng Tao2-0/+21
Use it to indicate that LD wants to retry layoutget. LD can set it whenever it wants the common pnfs code to return and retry pnfs path through a new layout. The bit gets cleared when client does a new layoutget, when client closes the file (ROC case), or when kernel needs to evict the inode (non-ROC case). Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03nfs/flexfiles: send layoutreturn before freeing lsegPeng Tao1-25/+56
Otherwise we'll lose error tracking information when encoding layoutreturn. pnfs_put_lseg may be called from rpc callbacks. So we should not call pnfs_send_layoutreturn directly because it can deadlock in the rpc layer. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSEPeng Tao3-7/+36
When it is set, generic pnfs would try to send layoutreturn right before last close/delegation_return regard less NFS_LAYOUT_ROC is set or not. LD can then make sure layoutreturn is always sent rather than being omitted. The difference against NFS_LAYOUT_RETURN is that NFS_LAYOUT_RETURN_BEFORE_CLOSE does not block usage of the layout so LD can set it and expect generic layer to try pnfs path at the same time. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03nfs41: allow async version layoutreturnPeng Tao3-8/+16
Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03nfs41: add range to layoutreturn argsPeng Tao3-5/+7
So that callers can specify which range to return. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03pnfs: allow LD to ask to resend read through pnfsPeng Tao2-1/+16
If current IO cannot be completed due to some transient errors, LD may want to ask generic layer to resend the request through pnfs again. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03nfs: add nfs_pgio_current_mirror helperPeng Tao4-14/+24
Let it return current nfs_pgio_mirror in use depending on pg_mirror_count. For read, we always use pg_mirrors[0], so this effectively gives us freedom to use pg_mirror_idx to track the actual mirror to read from through out the IO stack. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03nfs: only reset desc->pg_mirror_idx when mirroring is supportedPeng Tao2-4/+11
so that we don't reset desc->pg_mirror_idx for read unnecessarily. Remove WARN_ON_ONCE from __nfs_pageio_add_request to allow LD to set pg_mirror_idx for read where pg_mirror_count is always 1. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03nfs41: add a debug warning if we destroy an unempty layoutPeng Tao1-0/+2
So that we can detect the case if some layout segments are still pinned which is surely a bug that we need to fix. Signed-off-by: Peng Tao <tao.peng@primarydata.com>
2015-02-03pnfs: fail comparison when bucket verifier not setWeston Andros Adamson1-1/+5
This skips the WARN_ON_ONCE, but doesnt change behavior (the memcmp would fail). Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
2015-02-03nfs: mirroring support for direct ioWeston Andros Adamson1-14/+57
The current mirroring code only notices short writes to the first mirror. This patch keeps per-mirror byte counts and only considers a byte to be written once all mirrors report so. Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03nfs: add mirroring support to pgio layerWeston Andros Adamson9-67/+311
This patch adds mirrored write support to the pgio layer. The default is to use one mirror, but pgio callers may define callbacks to change this to any value up to the (arbitrarily selected) limit of 16. The basic idea is to break out members of nfs_pageio_descriptor that cannot be shared between mirrored DSes and put them in a new structure. Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03pnfs: pass ds_commit_idx through the commit pathWeston Andros Adamson6-17/+24
Pass ds_commit_idx through the nfs commit path. It's used to select the commit bucket when using pnfs and is ignored when not using pnfs. Several functions had to be changed: nfs_retry_commit, nfs_mark_request_commit, pnfs_mark_request_commit and the pnfs layout driver .mark_request_commit functions. Signed-off-by: Tom Haynes <loghyr@primarydata.com>
2015-02-03nfs: rename pgio header ds_idx to ds_commit_idxWeston Andros Adamson3-11/+9
'ds_commit_idx' is a better name - it is used to select the right commit bucket for pnfs. Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03nfs: handle overlapping reqs in lock_and_joinWeston Andros Adamson1-6/+11
This is needed for mirrored DS support, where multuple requests cover the same range. Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03pnfs: release lseg in pnfs_generic_pg_cleanupWeston Andros Adamson5-18/+21
This is needed to support mirrored writes - the first write can't just trash the lseg, we need to keep it around until all mirrors have written. Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03nfs: introduce pg_cleanup op for pgio descriptorsWeston Andros Adamson2-1/+5
Add a new operation to nfs_pageio_ops that is called on nfs_pageio_complete. Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
2015-02-03nfs/filelayout: use pnfs_error_mark_layout_for_returnPeng Tao2-11/+1
Instead of calling layoutreturn directly, call pnfs_error_mark_layout_for_return to mark layouts for return and let generic code return layout when layout segments are freed. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com> Conflicts: fs/nfs/filelayout/filelayout.c
2015-02-03nfs41: clear NFS_LAYOUT_RETURN if layoutreturn is sent or failed to sendPeng Tao2-0/+6
So that pnfs path is not disabled for ever. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>