path: root/fs/nfs
Age | Commit message | Author | Files | Lines
2016-01-04 | NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures | Trond Myklebust | 2 | -2/+9
Signed-off-by: Trond Myklebust <[email protected]>
2016-01-04 | NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid() | Trond Myklebust | 1 | -5/+5
Make it more obvious what we're returning... Signed-off-by: Trond Myklebust <[email protected]>
2016-01-04 | NFSv4.1/pNFS: Fix a race in initiate_file_draining() | Trond Myklebust | 1 | -4/+1
Peng Tao points out that the call to pnfs_mark_matching_lsegs_return() could race with pnfs_put_lseg(), in which case the layout segment is cleared, but no layoutreturn will be sent. Fix is to replace the call to pnfs_mark_matching_lsegs_invalid(). Reported-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2016-01-04 | NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout | Trond Myklebust | 2 | -7/+21
Fix a bug whereby if all the layout segments could be immediately freed, the call to pnfs_error_mark_layout_for_return() would never result in a layoutreturn. Signed-off-by: Trond Myklebust <[email protected]>
2016-01-04 | NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode | Trond Myklebust | 1 | -4/+12
If pnfs_mark_matching_lsegs_return() needs to mark a layout segment for return, then it must also set the return iomode. Signed-off-by: Trond Myklebust <[email protected]>
2016-01-04 | NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids | Trond Myklebust | 1 | -3/+3
Signed-off-by: Trond Myklebust <[email protected]>
2016-01-04 | NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn() | Trond Myklebust | 1 | -6/+6
A stateid is a structure, pass it as a pointer. Signed-off-by: Trond Myklebust <[email protected]>
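The two stateid cleanups above (copy through one helper, pass by pointer rather than by value) can be illustrated with a minimal userspace sketch. `demo_stateid`, `demo_stateid_copy()` and `demo_send_layoutreturn()` are hypothetical stand-ins invented for this example, not the kernel's definitions; the only assumption carried over is the NFSv4 stateid layout of a 4-byte seqid followed by 12 opaque bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_STATEID_OTHER_SIZE 12 /* NFSv4 stateid: 4-byte seqid + 12 opaque bytes */

/* Hypothetical userspace stand-in for a stateid; not the kernel's type. */
struct demo_stateid {
	uint32_t seqid;
	unsigned char other[DEMO_STATEID_OTHER_SIZE];
};

/* Copy through one helper so every caller copies the whole structure. */
static void demo_stateid_copy(struct demo_stateid *dst,
			      const struct demo_stateid *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/*
 * Take the stateid by const pointer: the caller's 16-byte structure is not
 * duplicated as an argument, and const documents that it is only read.
 * Returns the seqid that would be sent (illustrative only).
 */
static uint32_t demo_send_layoutreturn(const struct demo_stateid *stateid)
{
	struct demo_stateid args;

	demo_stateid_copy(&args, stateid);
	return args.seqid;
}
```

Passing a pointer keeps the call cheap and makes it explicit which object the callee reads; the single copy helper is the one place that decides how a stateid is duplicated.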
2015-12-31 | NFS: Relax requirements in nfs_flush_incompatible | Trond Myklebust | 3 | -7/+8
If two processes share the same credentials and NFSv4 open stateid, then allow them both to dirty the same page, even if their nfs_open_context differs. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-31 | NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid | Trond Myklebust | 6 | -0/+37
If the layout segment is invalid, then we should not be adding more write requests to the commit list. Instead, those writes should be replayed after requesting a new layout. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-31 | NFS: Allow multiple commit requests in flight per file | Trond Myklebust | 5 | -49/+35
Allow synchronous RPC calls to wait for pending RPC calls to finish, but also allow asynchronous ones to just fire off another commit. With this patch, the xfstests generic/074 test completes in 226s instead of 242s Signed-off-by: Trond Myklebust <[email protected]>
2015-12-31 | NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs | Trond Myklebust | 4 | -19/+22
The flexfiles layout in particular, seems to want to poke around in the O_DIRECT flags when retransmitting. This patch sets up an interface to allow it to call back into O_DIRECT to handle retransmission correctly. It also fixes a potential bug whereby we could change the behaviour of O_DIRECT if an error is already pending. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-30 | switch ->get_link() to delayed_call, kill ->put_link() | Al Viro | 1 | -3/+3
Signed-off-by: Al Viro <[email protected]>
2015-12-30 | pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh() | Trond Myklebust | 1 | -1/+1
Jeff reports seeing an Oops in ff_layout_alloc_lseg. Turns out copy+paste has played cruel tricks on a nested loop. Reported-by: Jeff Layton <[email protected]> Cc: [email protected] # 4.3+ Signed-off-by: Trond Myklebust <[email protected]>
2015-12-30 | NFS: Fix attribute cache revalidation | Trond Myklebust | 1 | -15/+39
If a NFSv4 client uses the cache_consistency_bitmask in order to request only information about the change attribute, timestamps and size, then it has not revalidated all attributes, and hence the attribute timeout timestamp should not be updated. Reported-by: Donald Buczek <[email protected]> Cc: [email protected] Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFS: Ensure we revalidate attributes before using execute_ok() | Trond Myklebust | 1 | -2/+16
Donald Buczek reports that NFS clients can also report incorrect results for access() due to lack of revalidation of attributes before calling execute_ok(). Looking closely, it seems chdir() is afflicted with the same problem. Fix is to ensure we call nfs_revalidate_inode_rcu() or nfs_revalidate_inode() as appropriate before deciding to trust execute_ok(). Reported-by: Donald Buczek <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | Merge branch 'flexfiles' | Trond Myklebust | 14 | -174/+339
* flexfiles:
  pNFS/flexfiles: Ensure we record layoutstats even if RPC is terminated early
  pNFS: Add flag to track if we've called nfs4_ff_layout_stat_io_start_read/write
  pNFS/flexfiles: Fix a statistics gathering imbalance
  pNFS/flexfiles: Don't mark the entire layout as failed, when returning it
  pNFS/flexfiles: Don't prevent flexfiles client from retrying LAYOUTGET
  pnfs/flexfiles: count io stat in rpc_count_stats callback
  pnfs/flexfiles: do not mark delay-like status as DS failure
  NFS41: map NFS4ERR_LAYOUTUNAVAILABLE to ENODATA
  nfs: only remove page from mapping if launder_page fails
  nfs: handle request add failure properly
  nfs: centralize pgio error cleanup
  nfs: clean up rest of reqs when failing to add one
  NFS41: pop some layoutget errors to application
  pNFS/flexfiles: Support server-supplied layoutstats sampling period
2015-12-28 | NFSv4: List stateid information in the callback tracepoints | Trond Myklebust | 2 | -6/+79
The stateid is extremely valuable when debugging. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL | Trond Myklebust | 1 | -1/+1
If the client is promising to return the layout ASAP, then there is no need to return DELAY and have the server retry. Instead default to the normal procedure described in RFC5661. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1 | Trond Myklebust | 1 | -0/+20
The RFC requires us to check if the server is recalling a stateid that we haven't yet received. If so, tell it to wait. Signed-off-by: Trond Myklebust <[email protected]>
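The check described above can be sketched as a seqid comparison. This is a minimal illustration, assuming the rule that a recall whose seqid is more than one ahead of the seqid the client holds implies a LAYOUTGET reply is still in flight; the enum and function names are hypothetical, and seqid wraparound is deliberately not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative outcomes; hypothetical names, not kernel or protocol values. */
enum demo_recall_action {
	DEMO_RECALL_PROCESS,	/* recall refers to state the client already holds */
	DEMO_RECALL_DELAY,	/* a newer stateid has not arrived yet: ask the server to wait */
};

/*
 * Compare the seqid in the recall with the seqid the client currently holds.
 * Assumption: a gap larger than one means the server has issued a stateid
 * update that the client has not received. Wraparound is ignored here.
 */
static enum demo_recall_action demo_check_recall_seqid(uint32_t held_seqid,
							uint32_t recall_seqid)
{
	if (recall_seqid > held_seqid + 1)
		return DEMO_RECALL_DELAY;
	return DEMO_RECALL_PROCESS;
}
```

For example, a client holding seqid 3 would process a recall carrying seqid 4 but ask the server to wait on one carrying seqid 5.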
2015-12-28 | pNFS: If we have to delay the layout callback, mark the layout for return | Trond Myklebust | 3 | -3/+18
If the client needs to delay the layout callback, then speed up the recall process by marking the remaining layout segments to be actively returned by the client. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFSv4.1/pNFS: Add a helper to mark the layout as returned | Trond Myklebust | 4 | -1/+17
This ensures that we don't reuse the stateid if a layout return or implied layout return means that we've returned all layout segments Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS: Ensure nfs4_layoutget_prepare returns the correct error | Trond Myklebust | 1 | -4/+5
If we're unable to perform the layoutget due to an invalid open stateid or a bulk recall, ensure that we return the error so that the caller can decide on an appropriate action. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS/flexfiles: Ensure we record layoutstats even if RPC is terminated early | Trond Myklebust | 1 | -6/+31
Currently, we will only record the layoutstats correctly if the RPC call successfully obtains a slot. If we exit before that happens, then we may find ourselves starting the busy timer through the call in ff_layout_(read|write)_prepare_layoutstats, but never stopping it. The same thing happens if we're doing DA-DS. The fix is to ensure that we catch these cases in the rpc_release() callback. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS: Add flag to track if we've called nfs4_ff_layout_stat_io_start_read/write | Trond Myklebust | 1 | -25/+70
Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS/flexfiles: Fix a statistics gathering imbalance | Trond Myklebust | 1 | -1/+1
When we replay a failed read, write or commit to the dataserver, we need to ensure that we call ff_layout_read_prepare_v3(), ff_layout_write_prepare_v3 or ff_layout_commit_prepare_v3() so that we reset the statistics. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS/flexfiles: Don't mark the entire layout as failed, when returning it | Trond Myklebust | 2 | -3/+1
In pNFS/flexfiles, we want to return the layout without necessarily marking it as having completely failed. We therefore move the call to pnfs_layout_io_set_failed() out of pnfs_error_mark_layout_for_return(), and then ensure that pNFS/files layout calls it separately. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS/flexfiles: Don't prevent flexfiles client from retrying LAYOUTGET | Trond Myklebust | 4 | -53/+6
Fix a bug in which flexfiles clients are falling back to I/O through the MDS even when the FF_FLAGS_NO_IO_THRU_MDS flag is set. The flexfiles client will always report errors through the LAYOUTRETURN and/or LAYOUTERROR mechanisms, so it should normally be safe for it to retry the LAYOUTGET until it fails or succeeds. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pnfs/flexfiles: count io stat in rpc_count_stats callback | Peng Tao | 1 | -12/+10
If the client ever restarts IO due to some errors, we'll end up miscounting IO stats if we do the counting in the .rpc_done callback. Move it to the .rpc_count_stats callback, which is only called when releasing the RPC. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pnfs/flexfiles: do not mark delay-like status as DS failure | Peng Tao | 1 | -1/+8
We just need to delay and retry in these cases. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFS41: map NFS4ERR_LAYOUTUNAVAILABLE to ENODATA | Peng Tao | 1 | -0/+9
Map it to ENODATA instead of EIO, which is a fatal error and fails the application. We'll go inband after getting NFS4ERR_LAYOUTUNAVAILABLE. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | nfs: only remove page from mapping if launder_page fails | Peng Tao | 2 | -17/+24
Instead of dropping pages whenever a write fails, only do it when we get a fatal failure in launder_page writeback. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | nfs: handle request add failure properly | Peng Tao | 5 | -31/+67
When we fail to queue a read page to the IO descriptor, we need to clean it up; otherwise it hangs around and prevents the nfs module from being removed. When we fail to queue a write page to the IO descriptor, we need to clean it up and also save the failure status to the open context. Then at file close, we can try to write pages back again and drop the page if it fails to write back in .launder_page, which will be done in the next patch. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | nfs: centralize pgio error cleanup | Peng Tao | 2 | -32/+33
In case we fail during setting things up for read/write IO, set pg_error in IO descriptor and do the cleanup in nfs_pageio_add_request, where we clean up all pages that are still hanging around on the IO descriptor. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | nfs: clean up rest of reqs when failing to add one | Peng Tao | 1 | -3/+14
If we fail to set up things before sending anything over wire, we need to clean up the reqs that are still attached to the IO descriptor. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFS41: pop some layoutget errors to application | Peng Tao | 6 | -14/+78
For ERESTARTSYS/EIO/EROFS/ENOSPC/E2BIG in layoutget, we should just bail out instead of hiding the error and retrying inband IO. Change all the call sites to pop the error all the way up. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
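The classification described above can be sketched as a small predicate. This is an illustration only: `demo_layoutget_error_is_fatal()` is a hypothetical name, and `DEMO_ERESTARTSYS` is defined locally because ERESTARTSYS is kernel-internal and is not exposed by userspace `<errno.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * ERESTARTSYS is not available in userspace <errno.h>; 512 mirrors the
 * kernel-internal value and is defined here only for this sketch.
 */
#define DEMO_ERESTARTSYS 512

/*
 * Return true for the layoutget errors the commit message says should be
 * passed up to the application, false for errors where falling back to
 * inband IO (through the MDS) remains an option.
 */
static bool demo_layoutget_error_is_fatal(int err)
{
	switch (err) {
	case -DEMO_ERESTARTSYS:
	case -EIO:
	case -EROFS:
	case -ENOSPC:
	case -E2BIG:
		return true;
	default:
		return false;
	}
}
```

A caller would check this predicate on the layoutget result and return the error immediately when it is true, instead of silently retrying the IO inband.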
2015-12-28 | pNFS/flexfiles: Support server-supplied layoutstats sampling period | Trond Myklebust | 2 | -3/+14
Some servers want to be able to control the frequency with which clients report layoutstats, for instance, in order to monitor QoS for a particular file or set of files. In order to support this, the flexfiles layout allows the server to pass this info as a hint in the layout payload. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFS: Flush reclaim writes using FLUSH_COND_STABLE | Trond Myklebust | 1 | -1/+1
If there are already writes queued up for commit, then don't flush just this page even if it is a reclaim issue. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFS: Background flush should not be low priority | Trond Myklebust | 1 | -2/+0
Background flush is needed in order to satisfy the global page limits. Don't subvert by reducing the priority. This should also address a write starvation issue that was reported by Neil Brown. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFSv4.1/pnfs: Fixup an lo->plh_block_lgets imbalance in layoutreturn | Trond Myklebust | 1 | -1/+0
Since commit 2d8ae84fbc32, nothing is bumping lo->plh_block_lgets in the layoutreturn path, so it should not be touched in nfs4_layoutreturn_release either. Fixes: 2d8ae84fbc32 ("NFSv4.1/pnfs: Remove redundant lo->plh_block_lgets...") Cc: [email protected] # 4.3+ Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFSv4: Don't perform cached access checks before we've OPENed the file | Trond Myklebust | 1 | -0/+3
Donald Buczek reports that a nfs4 client incorrectly denies execute access based on outdated file mode (missing 'x' bit). After the mode on the server is 'fixed' (chmod +x) further execution attempts continue to fail, because the nfs ACCESS call updates the access parameter but not the mode parameter or the mode in the inode.

The root cause is ultimately that the VFS is calling may_open() before the NFS client has a chance to OPEN the file and hence revalidate the access and attribute caches.

Al Viro suggests:
>>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
>>> that things will be caught by server anyway?
>>
>> That can work as long as we're guaranteed that everything that calls
>> inode_permission() with MAY_OPEN on a regular file will also follow up
>> with a vfs_open() or dentry_open() on success. Is this always the
>> case?
>
> 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS since
> it doesn't have ->tmpfile() instance anyway)
>
> 2) in atomic_open(), after the call of ->atomic_open() has succeeded.
>
> 3) in do_last(), followed on success by vfs_open()
>
> That's all. All calls of inode_permission() that get MAY_OPEN come from
> may_open(), and there's no other callers of that puppy.

Reported-by: Donald Buczek <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771
Link: http://lkml.kernel.org/r/[email protected]
Cc: Al Viro <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | nfs: machine credential support for additional operations | Andrew Elble | 1 | -0/+20
Allow LAYOUTRETURN and DELEGRETURN to use machine credentials if the server supports it. Add request for OPEN_DOWNGRADE as the close path also uses that. Signed-off-by: Andrew Elble <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | nfs: do not initialise statics to 0 | Wei Tang | 1 | -1/+1
This patch fixes the checkpatch.pl error to nfs4sysctl.c: ERROR: do not initialise statics to 0 Signed-off-by: Wei Tang <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS: Modify pnfs_update_layout tracepoints to use layout stateid | Trond Myklebust | 3 | -16/+28
Instead of displaying a layout segment pointer in these tracepoints, let's use the layout stateid, now that Olga gave us a set of tools for displaying them. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFSv4: Fix unused variable warnings in nfs4_init_*_client_string() | Trond Myklebust | 1 | -6/+3
Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | nfs: add new tracepoint for pnfs_update_layout | Jeff Layton | 2 | -6/+88
pnfs_update_layout is really the "nexus" of layout handling. If it returns NULL then we end up going through the MDS. This patch adds some tracepoints to that function that allow us to determine the cause when we end up going through the MDS unexpectedly. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | Adding tracepoint to cached open | Olga Kornievskaia | 2 | -0/+41
Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | Adding stateid information to tracepoints | Olga Kornievskaia | 3 | -32/+245
Operations to which stateid information is added: close, delegreturn, open, read, setattr, layoutget, layoutcommit, test_stateid, write, lock, locku, lockt

Format is "stateid=<seqid>:<crc32 hash stateid.other>", also "openstateid=", "layoutstateid=", and "lockstateid=" for open_file, layoutget, set_lock tracepoints.

New function is added to internal.h, nfs_stateid_hash(), to compute the hash

trace_nfs4_setattr() is moved from nfs4_do_setattr() to _nfs4_do_setattr() to get access to stateid.

trace_nfs4_setattr and trace_nfs4_delegreturn are changed from INODE_EVENT to new event type, INODE_STATEID_EVENT which is same as INODE_EVENT but adds stateid information

for locking tracepoints, moved trace_nfs4_set_lock() into _nfs4_do_setlk() to get access to stateid information, and removed trace_nfs4_lock_reclaim(), trace_nfs4_lock_expired() as they call into _nfs4_do_setlk() and both were previously same LOCK_EVENT type.

Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
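The "stateid=<seqid>:<crc32 hash stateid.other>" format described above can be sketched in userspace as follows. This is an approximation under stated assumptions: `demo_crc32()` is the standard reflected CRC-32 (polynomial 0xEDB88320), and the kernel's nfs_stateid_hash() may seed or finalize the CRC differently; `demo_format_stateid()` and its exact printf format are hypothetical, not the tracepoint's actual output code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_STATEID_OTHER_SIZE 12 /* opaque part of an NFSv4 stateid */

/* Standard bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
static uint32_t demo_crc32(const unsigned char *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Render "stateid=<seqid>:<hash>" into out. Hashing the 12 opaque bytes
 * keeps trace lines short while still letting two events be correlated.
 * Returns the snprintf() result (length of the formatted string).
 */
static int demo_format_stateid(char *out, size_t outlen, uint32_t seqid,
			       const unsigned char other[DEMO_STATEID_OTHER_SIZE])
{
	return snprintf(out, outlen, "stateid=%u:0x%08x", (unsigned int)seqid,
			(unsigned int)demo_crc32(other, DEMO_STATEID_OTHER_SIZE));
}
```

Two trace events that print the same hash with increasing seqids can then be read as successive updates of one stateid.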
2015-12-28 | NFS: Allow the combination pNFS and labeled NFS | Trond Myklebust | 1 | -0/+3
Fix the nfs4_pnfs_open_bitmap so that it also allows for labeled NFS. Signed-off-by: Trond Myklebust <trond,[email protected]>
2015-12-28 | NFS42: handle layoutstats stateid error | Peng Tao | 1 | -2/+27
When server returns layoutstats stateid error, we should invalidate client's layout so that next IO can trigger new layoutget. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | nfs: Fix race in __update_open_stateid() | Andrew Elble | 1 | -1/+1
We've seen this in a packet capture - I've intermixed what I think was going on. The fix here is to grab the so_lock sooner.

1964379 -> #1 open (for write) reply seqid=1
1964393 -> #2 open (for read) reply seqid=2

   __nfs4_close(), state->n_wronly--
   nfs4_state_set_mode_locked(), changes state->state = [R]
   state->flags is [RW]
   state->state is [R], state->n_wronly == 0, state->n_rdonly == 1

1964398 -> #3 open (for write) call -> because close is already running
1964399 -> downgrade (to read) call seqid=2 (close of #1)
1964402 -> #3 open (for write) reply seqid=3

   __update_open_stateid()
   nfs_set_open_stateid_locked(), changes state->flags
   state->flags is [RW]
   state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
   new sequence number is exposed now via nfs4_stateid_copy()

   next step would be update_open_stateflags(), pending so_lock

1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1)

   nfs4_close_prepare() gets so_lock and recalcs flags -> send close

1964405 -> downgrade (to read) call seqid=3 (close of #1 retry)

   __update_open_stateid() gets so_lock
 * update_open_stateflags() updates state->n_wronly.
   nfs4_state_set_mode_locked() updates state->state

   state->flags is [RW]
   state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1

 * should have suppressed the preceding nfs4_close_prepare() from
   sending open_downgrade

1964406 -> write call
1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry)

   nfs_clear_open_stateid_locked()
   state->flags is [R]
   state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1

1964409 -> write reply (fails, openmode)

Signed-off-by: Andrew Elble <[email protected]>
Cc: stable@vger,kernel.org
Signed-off-by: Trond Myklebust <[email protected]>