path: root/fs/nfs/flexfilelayout/flexfilelayout.c
Age | Commit message | Author | Files | Lines
2016-08-29 | pNFS/flexfiles: Fix an Oopsable condition when connection to the DS fails | Trond Myklebust | 1 | -19/+18
If the attempt to connect to a DS fails inside ff_layout_pg_init_read or ff_layout_pg_init_write, then we currently end up clearing the layout segment carried by the struct nfs_pageio_descriptor, causing an Oops when we later call into ff_layout_read_pagelist/ff_layout_write_pagelist. The fix is to ensure we return the layout and then retry. Fixes: 446ca2195303 ("pNFS/flexfiles: When initing reads or writes, we...") Cc: [email protected] # v4.7+ Signed-off-by: Trond Myklebust <[email protected]>
2016-08-14 | pNFS/flexfiles: Fix layoutstat periodic reporting | Trond Myklebust | 1 | -4/+4
Putting the periodicity timer in the mirror instances is causing non-scalable reporting behaviour and missed reporting intervals. When you recall layouts and/or implement client side mirroring, it leads to consecutive reports with only a few ms between RPC calls. Signed-off-by: Trond Myklebust <[email protected]> Fixes: d0379a5d066a9 ("pNFS/flexfiles: Support server-supplied...")
2016-07-05 | pNFS: Files and flexfiles always need to commit before layoutcommit | Trond Myklebust | 1 | -2/+5
So ensure that we mark the layout for commit once the write is done, and then ensure that the commit to ds is finished before sending layoutcommit. Note that by doing this, we're able to optimise away the commit for the case of servers that don't need layoutcommit in order to return updated attributes. Signed-off-by: Trond Myklebust <[email protected]>
2016-07-05 | pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit() | Trond Myklebust | 1 | -9/+10
Let's just have one place where we check ff_layout_need_layoutcommit(). Signed-off-by: Trond Myklebust <[email protected]>
2016-07-05 | pNFS/flexfiles: Fix layoutcommit after a commit to DS | Trond Myklebust | 1 | -2/+1
We should always do a layoutcommit after commit to DS, except if the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <[email protected]>
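The rule in this commit message can be sketched in a few lines of userspace C. This is only an illustration, not the kernel code: `demo_commit_lseg`, `demo_need_layoutcommit()` and the bit value chosen for `DEMO_FF_FLAGS_NO_LAYOUTCOMMIT` are hypothetical names and values introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bit; the name echoes the commit message, the value is an assumption. */
#define DEMO_FF_FLAGS_NO_LAYOUTCOMMIT 0x1u

/* Hypothetical stand-in for a flexfiles layout segment. */
struct demo_commit_lseg {
    uint32_t flags;
};

/* Returns true when a LAYOUTCOMMIT should follow a COMMIT to the DS,
 * i.e. whenever the segment has not set the "no layoutcommit" flag. */
bool demo_need_layoutcommit(const struct demo_commit_lseg *lseg)
{
    assert(lseg != NULL);
    return (lseg->flags & DEMO_FF_FLAGS_NO_LAYOUTCOMMIT) == 0;
}
```

Keeping the check in a single helper matches the spirit of the cleanup commit listed just above, which wants one place where the flag is tested.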
2016-05-26 | pnfs: pnfs_update_layout needs to consider if strict iomode checking is on | Tom Haynes | 1 | -13/+36
As flexfiles has FF_FLAGS_NO_READ_IO, there is a need to generically support enforcing that an IOMODE_RW segment will not allow READ I/O. Signed-off-by: Tom Haynes <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-26 | nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled | Tom Haynes | 1 | -2/+3
Signed-off-by: Tom Haynes <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-17 | flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED | Jeff Layton | 1 | -2/+0
Setting just the NFS_LAYOUT_RETURN_REQUESTED flag doesn't do anything, unless there are lsegs that are also being marked for return. At the point where that happens this flag is also set, so these set_bit calls don't do anything useful. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-17 | pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set | Jeff Layton | 1 | -2/+2
Otherwise, we'll end up returning layouts that we've just received if the client issues a new LAYOUTGET prior to the LAYOUTRETURN. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-17 | pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes | Tom Haynes | 1 | -4/+25
If we are initializing reads or writes and can not connect to a DS, then check whether or not IO is allowed through the MDS. If it is allowed, reset to the MDS. Else, fail the layout segment and force a retry of a new layout segment. Signed-off-by: Tom Haynes <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
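The three-way decision described in this commit message can be modelled as a small userspace C function. It is a sketch under stated assumptions, not the driver code: the `demo_*` types and names are hypothetical, and the boolean `no_io_thru_mds` merely stands in for the FF_FLAGS_NO_IO_THRU_MDS layout flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Possible outcomes when setting up a read or write (hypothetical names). */
enum demo_io_path {
    DEMO_IO_VIA_DS,       /* data server reachable: do pNFS I/O */
    DEMO_IO_VIA_MDS,      /* fall back to the metadata server */
    DEMO_IO_RETRY_LAYOUT  /* fail the segment and request a new layout */
};

/* Hypothetical stand-in for the state consulted at init time. */
struct demo_init_lseg {
    bool ds_connected;    /* did the connection to the DS succeed? */
    bool no_io_thru_mds;  /* models FF_FLAGS_NO_IO_THRU_MDS */
};

/* Pick the I/O path: DS if connected, else MDS if allowed, else retry. */
enum demo_io_path demo_pg_init(const struct demo_init_lseg *lseg)
{
    assert(lseg != NULL);
    if (lseg->ds_connected)
        return DEMO_IO_VIA_DS;
    if (!lseg->no_io_thru_mds)
        return DEMO_IO_VIA_MDS;
    return DEMO_IO_RETRY_LAYOUT;
}
```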
2016-05-17 | pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io | Tom Haynes | 1 | -3/+2
Whenever we check to see if we have the needed number of DSes for the action, we may also have to check to see whether IO is allowed to go to the MDS or not. [jlayton: fix merge conflict due to lack of localio patches here] Signed-off-by: Tom Haynes <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-17 | pNFS/flexfile: Fix erroneous fall back to read/write through the MDS | Trond Myklebust | 1 | -17/+6
This patch fixes a problem whereby the pNFS client falls back to doing reads and writes through the metadata server even when the layout flag FF_FLAGS_NO_IO_THRU_MDS is set. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-17 | NFSv4: Label stateids with the type | Trond Myklebust | 1 | -3/+4
In order to more easily distinguish what kind of stateid we are dealing with, introduce a type that can be used to label the stateid structure. The label will be useful both for debugging, but also when dealing with operations like SETATTR, READ and WRITE that can take several different types of stateid as arguments. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-09 | nfs: have flexfiles mirror keep creds for both ro and rw layouts | Jeff Layton | 1 | -7/+23
A mirror can be shared between multiple layouts, even with different iomodes. That makes stats gathering simpler, but it causes a problem when we get different creds in READ vs. RW layouts. The current code drops the newer credentials onto the floor when this occurs. That's problematic when you fetch a READ layout first, and then a RW. If the READ layout doesn't have the correct creds to do a write, then writes will fail. We could just overwrite the READ credentials with the RW ones, but that would break the ability for the server to fence the layout for reads if things go awry. We need to be able to revert to the earlier READ creds if the RW layout is returned afterward. The simplest fix is to just keep two sets of creds per mirror. One for READ layouts and one for RW, and then use the appropriate set depending on the iomode of the layout segment. Also fix up some RCU nits that sparse found. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
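The "two sets of creds per mirror" idea can be illustrated with a short userspace C sketch. Everything here is hypothetical (`demo_cred`, `demo_mirror`, `demo_mirror_cred()` and the enum values); the real driver works with RPC credentials under RCU, which this example leaves out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative iomodes; names and values are assumptions for this sketch. */
enum demo_iomode { DEMO_IOMODE_READ = 1, DEMO_IOMODE_RW = 2 };

/* Hypothetical credential: just a uid/gid pair. */
struct demo_cred {
    unsigned int uid;
    unsigned int gid;
};

/* A mirror shared by READ and RW layouts keeps one credential for each. */
struct demo_mirror {
    const struct demo_cred *ro_cred;
    const struct demo_cred *rw_cred;
};

/* Select the credential that matches the iomode of the layout segment. */
const struct demo_cred *demo_mirror_cred(const struct demo_mirror *m,
                                         enum demo_iomode iomode)
{
    assert(m != NULL);
    return iomode == DEMO_IOMODE_READ ? m->ro_cred : m->rw_cred;
}
```

Because neither slot overwrites the other, returning the RW layout leaves the READ credential untouched, which is the property the commit message asks for.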
2016-05-09 | nfs: get a reference to the credential in ff_layout_alloc_lseg | Jeff Layton | 1 | -7/+34
We're just as likely to have allocation problems here as we would if we delay looking up the credential like we currently do. Fix the code to get a rpc_cred reference early, as soon as the mirror is set up. This allows us to eliminate the mirror early if there is a problem getting an rpc credential. This also allows us to drop the uid/gid from the layout_mirror struct as well. In the event that we find an existing mirror where this one would go, we swap in the new creds unconditionally, and drop the reference to the old one. Note that the old ff_layout_update_mirror_cred function wouldn't set this pointer unless the DS version was 3, but we don't know what the DS version is at this point. I'm a little unclear on why it did that as you still need creds to talk to v4 servers as well. I have the code set it regardless of the DS version here. Also note the change to using generic creds instead of calling lookup_cred directly. With that change, we also need to populate the group_info pointer in the acred as some functions expect that to never be NULL. Instead of allocating one every time however, we can allocate one when the module is loaded and share it since the group_info is refcounted. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-09 | nfs: have ff_layout_get_ds_cred take a reference to the cred | Jeff Layton | 1 | -6/+9
In later patches, we're going to want to allow the creds to be updated when we get a new layout with updated creds. Have this function take a reference to the cred that is later put once the call has been dispatched. Also, prepare for this change by ensuring we follow RCU rules when getting a reference to the cred as well. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-09 | NFS: Save struct inode * inside nfs_commit_info to clarify usage of i_lock | Dave Wysochanski | 1 | -2/+2
Commit ea2cf22 created nfs_commit_info and saved &inode->i_lock inside this NFS specific structure. This obscures the usage of i_lock. Instead, save struct inode * so later it's clear the spinlock taken is i_lock. Should be no functional change. Signed-off-by: Dave Wysochanski <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-01-27 | NFS: Cleanup - rename NFS_LAYOUT_RETURN_BEFORE_CLOSE | Trond Myklebust | 1 | -1/+1
NFS_LAYOUT_RETURN_BEFORE_CLOSE is being used to signal that a layoutreturn is needed, either due to a layout recall or to a layout error. Rename it to NFS_LAYOUT_RETURN_REQUESTED in order to clarify its purpose. Signed-off-by: Trond Myklebust <[email protected]>
2016-01-22 | Merge branch 'bugfixes' | Trond Myklebust | 1 | -4/+2
* bugfixes:
  pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn
  pNFS/flexfiles: Improve merging of errors in LAYOUTRETURN
2016-01-22 | pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn | Trond Myklebust | 1 | -4/+2
We must not skip encoding the statistics, or the server will see an XDR encoding error. Signed-off-by: Trond Myklebust <[email protected]> Cc: [email protected] # 4.0+
2016-01-07 | Merge branch 'bugfixes' | Trond Myklebust | 1 | -1/+1
* bugfixes:
  SUNRPC: Fixup socket wait for memory
  SUNRPC: Fix a missing break in rpc_anyaddr()
  pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh()
  NFS: Fix attribute cache revalidation
  NFS: Ensure we revalidate attributes before using execute_ok()
  NFS: Flush reclaim writes using FLUSH_COND_STABLE
  NFS: Background flush should not be low priority
  NFSv4.1/pnfs: Fixup an lo->plh_block_lgets imbalance in layoutreturn
  NFSv4: Don't perform cached access checks before we've OPENed the file
  NFS: Allow the combination pNFS and labeled NFS
  NFS42: handle layoutstats stateid error
  nfs: Fix race in __update_open_stateid()
  nfs: fix missing assignment in nfs4_sequence_done tracepoint
2016-01-04 | Merge branch 'pnfs_generic' | Trond Myklebust | 1 | -12/+1
* pnfs_generic:
  NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments
  NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures
  NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()
  NFSv4.1/pNFS: Fix a race in initiate_file_draining()
  NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout
  NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode
  NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids
  NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()
  NFS: Relax requirements in nfs_flush_incompatible
  NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid
  NFS: Allow multiple commit requests in flight per file
  NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs
  NFSv4: List stateid information in the callback tracepoints
  NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL
  NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1
  pNFS: If we have to delay the layout callback, mark the layout for return
  NFSv4.1/pNFS: Add a helper to mark the layout as returned
  pNFS: Ensure nfs4_layoutget_prepare returns the correct error
2015-12-31 | NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs | Trond Myklebust | 1 | -12/+1
The flexfiles layout in particular, seems to want to poke around in the O_DIRECT flags when retransmitting. This patch sets up an interface to allow it to call back into O_DIRECT to handle retransmission correctly. It also fixes a potential bug whereby we could change the behaviour of O_DIRECT if an error is already pending. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-30 | pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh() | Trond Myklebust | 1 | -1/+1
Jeff reports seeing an Oops in ff_layout_alloc_lseg. Turns out copy+paste has played cruel tricks on a nested loop. Reported-by: Jeff Layton <[email protected]> Cc: [email protected] # 4.3+ Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS/flexfiles: Ensure we record layoutstats even if RPC is terminated early | Trond Myklebust | 1 | -6/+31
Currently, we will only record the layoutstats correctly if the RPC call successfully obtains a slot. If we exit before that happens, then we may find ourselves starting the busy timer through the call in ff_layout_(read|write)_prepare_layoutstats, but never stopping it. The same thing happens if we're doing DA-DS. The fix is to ensure that we catch these cases in the rpc_release() callback. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS: Add flag to track if we've called nfs4_ff_layout_stat_io_start_read/write | Trond Myklebust | 1 | -25/+70
Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS/flexfiles: Fix a statistics gathering imbalance | Trond Myklebust | 1 | -1/+1
When we replay a failed read, write or commit to the dataserver, we need to ensure that we call ff_layout_read_prepare_v3(), ff_layout_write_prepare_v3() or ff_layout_commit_prepare_v3() so that we reset the statistics. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS/flexfiles: Don't prevent flexfiles client from retrying LAYOUTGET | Trond Myklebust | 1 | -4/+0
Fix a bug in which flexfiles clients are falling back to I/O through the MDS even when the FF_FLAGS_NO_IO_THRU_MDS flag is set. The flexfiles client will always report errors through the LAYOUTRETURN and/or LAYOUTERROR mechanisms, so it should normally be safe for it to retry the LAYOUTGET until it fails or succeeds. Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pnfs/flexfiles: count io stat in rpc_count_stats callback | Peng Tao | 1 | -12/+10
If the client ever restarts IO due to some errors, we'll end up miscounting IO stats if we do the counting in the .rpc_done callback. Move it to the .rpc_count_stats callback, which is only called when releasing the RPC. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pnfs/flexfiles: do not mark delay-like status as DS failure | Peng Tao | 1 | -1/+8
We just need to delay and retry in these cases. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | NFS41: pop some layoutget errors to application | Peng Tao | 1 | -3/+22
For ERESTARTSYS/EIO/EROFS/ENOSPC/E2BIG in layoutget, we should just bail out instead of hiding the error and retrying inband IO. Change all the call sites to pop the error all the way up. Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-12-28 | pNFS/flexfiles: Support server-supplied layoutstats sampling period | Trond Myklebust | 1 | -3/+13
Some servers want to be able to control the frequency with which clients report layoutstats, for instance, in order to monitor QoS for a particular file or set of files. In order to support this, the flexfiles layout allows the server to pass this info as a hint in the layout payload. Signed-off-by: Trond Myklebust <[email protected]>
2015-11-02 | pNFS/flexfiles: Add support for FF_FLAGS_NO_IO_THRU_MDS | Trond Myklebust | 1 | -2/+16
For loosely coupled pNFS/flexfiles systems, there is often no advantage at all in going through the MDS for I/O, since the MDS is subject to the same limitations as all other clients when talking to DSes. If a DS is unresponsive, I/O through the MDS will fail. For such systems, the only scalable solution is to have the pNFS clients retry doing pNFS, and so the protocol now provides a flag that allows the pNFS server to signal this. If LAYOUTGET returns FF_FLAGS_NO_IO_THRU_MDS, then we should assume that the MDS wants the client to retry using these devices, even if they were previously marked as being unavailable. To do so, we add a helper, ff_layout_mark_devices_valid() that will be called from layoutget. Signed-off-by: Trond Myklebust <[email protected]>
2015-11-02 | pNFS/flexfiles: When mirrored, retry failed reads by switching mirrors | Trond Myklebust | 1 | -8/+14
If the pNFS/flexfiles file is mirrored, and a read to one mirror fails, then we should bump the mirror index, so that we retry to a different mirror. Once we've iterated through all mirrors and all failed, we can return the layout and issue a new LAYOUTGET. Signed-off-by: Trond Myklebust <[email protected]>
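The retry policy in this commit message amounts to a simple failover loop. The userspace C sketch below models it with an array of booleans standing in for "a read to mirror i succeeds"; `demo_read_with_failover()` and its parameters are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Try mirrors in order, starting at 'start', bumping the index after each
 * failed read. Returns the index of the first mirror that succeeds, or -1
 * once every remaining mirror has failed. In the scheme described above, -1
 * is the point where the caller would return the layout and issue a new
 * LAYOUTGET. */
int demo_read_with_failover(const bool *mirror_ok, size_t mirror_count,
                            size_t start)
{
    assert(mirror_ok != NULL || mirror_count == 0);
    for (size_t idx = start; idx < mirror_count; idx++) {
        if (mirror_ok[idx])
            return (int)idx;
    }
    return -1;
}
```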
2015-09-02 | NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb | Trond Myklebust | 1 | -11/+9
Signed-off-by: Trond Myklebust <[email protected]>
2015-09-02 | NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error() | Trond Myklebust | 1 | -9/+1
When I/O cannot complete due to a fatal error on the DS, ensure that we invalidate the corresponding layout segment and return it. Signed-off-by: Trond Myklebust <[email protected]>
2015-09-01 | NFSv4.1/flexfiles: Fix freeing of mirrors | Trond Myklebust | 1 | -12/+2
Mirrors are now shared objects, so we should not be freeing them directly inside ff_layout_free_lseg(). We should already be doing the right thing in _ff_layout_free_lseg(), so just let it handle things. Also ensure that ff_layout_free_mirror() frees the RPC credential if it is set. Fixes: 28a0d72c6867 ("Add refcounting to struct nfs4_ff_layout_mirror") Signed-off-by: Trond Myklebust <[email protected]>
2015-08-30 | NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors | Trond Myklebust | 1 | -8/+16
If the file was fenced and/or has been deleted on the DS, then we want to retry pNFS after a layoutreturn with error report. If the server cannot fix the problem, then we rely on it to tell us so in the response to the LAYOUTGET. Signed-off-by: Trond Myklebust <[email protected]>
2015-08-25 | NFSv4.1/flexfiles: Allow coalescing of new layout segments and existing ones | Trond Myklebust | 1 | -0/+60
In order to ensure atomicity of updates, we merge the old layout segments into the new ones, and then invalidate the old ones. Also ensure that we order the list of layout segments so that RO segments are preferred over RW. Signed-off-by: Trond Myklebust <[email protected]>
2015-08-25 | NFSv4.1/flexfile: ff_layout_remove_mirror can be static | kbuild test robot | 1 | -1/+1
Signed-off-by: Fengguang Wu <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-08-25 | NFSv4.2/pnfs: Make the layoutstats timer configurable | Trond Myklebust | 1 | -1/+4
Allow advanced users to set the layoutstats timer in order to lengthen or shorten the period between layoutstat transmissions to the server. Signed-off-by: Trond Myklebust <[email protected]>
2015-08-25 | NFSv4.1/flexfile: Ensure uniqueness of mirrors across layout segments | Trond Myklebust | 1 | -29/+96
Keep the full list of mirrors in the struct nfs4_ff_layout_mirror so that they can be shared among the layout segments that use them. Also ensure that we send out only one copy of the layoutstats per mirror. Signed-off-by: Trond Myklebust <[email protected]>
2015-08-25 | NFSv4.1/flexfiles: Remove mirror backpointer to lseg. | Trond Myklebust | 1 | -13/+12
When we start sharing mirrors between several lsegs, we won't be able to keep it. Signed-off-by: Trond Myklebust <[email protected]>
2015-08-25 | NFSv4.1/flexfiles: Add refcounting to struct nfs4_ff_layout_mirror | Trond Myklebust | 1 | -9/+27
We do want to share mirrors between layout segments, so add a refcount to enable that. Signed-off-by: Trond Myklebust <[email protected]>
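Sharing one object between several owners through a reference count can be sketched as follows. This is a plain, non-atomic userspace illustration, not the kernel implementation (which would need atomic operations to be safe under concurrency); `demo_ref_mirror` and the `demo_mirror_*()` helpers are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mirror object shared by several layout segments. */
struct demo_ref_mirror {
    unsigned int refcount;  /* number of current holders */
    int id;
};

/* Allocate a mirror with a single reference held by the caller. */
struct demo_ref_mirror *demo_mirror_alloc(int id)
{
    struct demo_ref_mirror *m = malloc(sizeof(*m));
    if (m == NULL)
        return NULL;
    m->refcount = 1;
    m->id = id;
    return m;
}

/* Take an extra reference, e.g. when another segment starts using it. */
struct demo_ref_mirror *demo_mirror_get(struct demo_ref_mirror *m)
{
    assert(m != NULL && m->refcount > 0);
    m->refcount++;
    return m;
}

/* Drop a reference; frees the mirror and returns 1 when it was the last. */
int demo_mirror_put(struct demo_ref_mirror *m)
{
    assert(m != NULL && m->refcount > 0);
    if (--m->refcount > 0)
        return 0;
    free(m);
    return 1;
}
```

With this shape, a layout segment never frees a mirror directly; it only drops its own reference, and the last holder triggers the free.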
2015-08-25 | NFS41/flexfiles: zero out DS write wcc | Peng Tao | 1 | -0/+2
We do not want to update inode attributes with DS values. Cc: [email protected] # v4.0+ Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-08-20 | NFSv4.1/pnfs Ensure flexfiles reports all connection related errors | Trond Myklebust | 1 | -13/+35
Make sure that we also handle RPC level connection and protocol negotiation errors. Reported-by: Tom Haynes <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-08-20 | NFSv4.1/pnfs: Ensure the flexfiles layoutstats timers are consistent | Trond Myklebust | 1 | -27/+24
We want to ensure that the stopwatches for the busy timer and the aggregate timer are consistent. This means that they need to use the same start/stop times. Signed-off-by: Trond Myklebust <[email protected]>
2015-08-19 | NFS41/flexfiles: update inode after write finishes | Peng Tao | 1 | -0/+3
Otherwise we break fstest case tests/read_write/mctime.t. Does files layout need the same fix as well? Cc: [email protected] # v4.0+ Cc: Anna Schumaker <[email protected]> Signed-off-by: Peng Tao <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-08-17 | nfs: remove some dead code in ff_layout_pg_get_mirror_count_write | Jeff Layton | 1 | -2/+0
We already know that pg_lseg is NULL here. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-08-12 | NFSv4.2/pnfs: Use GFP_NOIO for layoutstat reporting in the writeback path | Trond Myklebust | 1 | -2/+4
Prevent a potential deadlock. Signed-off-by: Trond Myklebust <[email protected]>