aboutsummaryrefslogtreecommitdiff
path: root/fs/nfs/flexfilelayout/flexfilelayout.c
AgeCommit message (Collapse)AuthorFilesLines
2017-11-17fs, nfs: convert nfs_client.cl_count from atomic_t to refcount_tElena Reshetova1-6/+6
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs_client.cl_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <[email protected]> Reviewed-by: David Windsor <[email protected]> Reviewed-by: Hans Liljestrand <[email protected]> Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2017-11-17fs, nfs: convert nfs4_ff_layout_mirror.ref from atomic_t to refcount_tElena Reshetova1-4/+4
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nfs4_ff_layout_mirror.ref is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <[email protected]> Reviewed-by: David Windsor <[email protected]> Reviewed-by: Hans Liljestrand <[email protected]> Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2017-07-19pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit()Trond Myklebust1-0/+4
If the layout has expired due to a fencing event, then we should not attempt to commit to the DS. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2017-07-13PNFS for stateid errors retry against MDS firstOlga Kornievskaia1-33/+0
Upon receiving a stateid error such as BAD_STATEID, the client should retry the operation against the MDS before deciding to do stateid recovery. Previously, the code would initiate state recovery and it could lead to a race in a state manager that could chose an incorrect recovery method which would lead to the EIO failure for the application. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2017-05-24pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()Dan Carpenter1-0/+1
If xdr_inline_decode() fails then we end up returning ERR_PTR(0). The caller treats NULL returns as -ENOMEM so it doesn't really hurt runtime, but obviously we intended to set an error code here. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2017-05-09pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabledTrond Myklebust1-0/+11
Layoutstats is always desirable when using the flexfiles driver, so we should enable it if that driver is being loaded. It is safe to do so, because even when the mount specifies NFSv4.1, we will turn it off if the server tells us it is unsupported. Signed-off-by: Trond Myklebust <[email protected]>
2017-04-29pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure pathTrond Myklebust1-3/+8
If the attempt to write through pNFS fails, we need to use the same failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS flag is set or we have sufficient valid DSes, then we must retry through pNFS Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <[email protected]>
2017-04-25pNFS: Ensure we check layout segment validity in the pg_init() callbackTrond Myklebust1-0/+2
If we have a layout segment cached in pgio->pg_lseg, we should check it for validity before reusing it in a new RPC request. Otherwise, if we recoalesce, we can end up looping forever. Signed-off-by: Trond Myklebust <[email protected]>
2017-03-01Merge tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds1-39/+21
Pull NFS client updates from Anna Schumaker: "Highlights include: Stable bugfixes: - NFSv4: Fix memory and state leak in _nfs4_open_and_get_state - xprtrdma: Fix Read chunk padding - xprtrdma: Per-connection pad optimization - xprtrdma: Disable pad optimization by default - xprtrdma: Reduce required number of send SGEs - nlm: Ensure callback code also checks that the files match - pNFS/flexfiles: If the layout is invalid, it must be updated before retrying - NFSv4: Fix reboot recovery in copy offload - Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE" - NFSv4: fix getacl head length estimation - NFSv4: fix getacl ERANGE for sum ACL buffer sizes Features: - Add and use dprintk_cont macros - Various cleanups to NFS v4.x to reduce code duplication and complexity - Remove unused cr_magic related code - Improvements to sunrpc "read from buffer" code - Clean up sunrpc timeout code and allow changing TCP timeout parameters - Remove duplicate mw_list management code in xprtrdma - Add generic functions for encoding and decoding xdr streams Bugfixes: - Clean up nfs_show_mountd_netid - Make layoutreturn_ops static and use NULL instead of 0 to fix sparse warnings - Properly handle -ERESTARTSYS in nfs_rename() - Check if register_shrinker() failed during rpcauth_init() - Properly clean up procfs/pipefs entries - Various NFS over RDMA related fixes - Silence unititialized variable warning in sunrpc" * tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (64 commits) NFSv4: fix getacl ERANGE for some ACL buffer sizes NFSv4: fix getacl head length estimation Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE" NFSv4: Fix reboot recovery in copy offload pNFS/flexfiles: If the layout is invalid, it must be updated before retrying NFSv4: Clean up owner/group attribute decode SUNRPC: Add a helper function xdr_stream_decode_string_dup() NFSv4: Remove bogus "struct nfs_client" argument from decode_ace() NFSv4: Fix the underestimation of delegation XDR space reservation NFSv4: Replace callback string decode function with a generic NFSv4: Replace the open coded decode_opaque_inline() with the new generic NFSv4: Replace ad-hoc xdr encode/decode helpers with xdr_stream_* generics SUNRPC: Add generic helpers for xdr_stream encode/decode sunrpc: silence uninitialized variable warning nlm: Ensure callback code also checks that the files match sunrpc: Allow xprt->ops->timer method to sleep xprtrdma: Refactor management of mw_list field xprtrdma: Handle stale connection rejection xprtrdma: Properly recover FRWRs with in-flight FASTREG WRs xprtrdma: Shrink send SGEs array ...
2017-02-27lib/vsprintf.c: remove %Z supportAlexey Dobriyan1-2/+2
Now that %z is standartised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller. Use BUILD_BUG_ON in a couple ATM drivers. In case anyone didn't notice lib/vsprintf.o is about half of SLUB which is in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more. Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22pNFS/flexfiles: If the layout is invalid, it must be updated before retryingTrond Myklebust1-6/+7
If we see that our pNFS READ/WRITE/COMMIT operation failed, but we also see that our layout segment is no longer valid, then we need to get a new layout segment before retrying. Fixes: 90816d1ddacf ("NFSv4.1/flexfiles: Don't mark the entire deviceid...") Cc: [email protected] # v4.2+ Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2017-02-21NFSv4: Replace ad-hoc xdr encode/decode helpers with xdr_stream_* genericsTrond Myklebust1-4/+1
Signed-off-by: Trond Myklebust <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2017-01-30pNFS/flexfiles: Make local symbol layoutreturn_ops staticWei Yongjun1-1/+1
Fixes the following sparse warning: fs/nfs/flexfilelayout/flexfilelayout.c:2114:34: warning: symbol 'layoutreturn_ops' was not declared. Should it be static? Signed-off-by: Wei Yongjun <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2017-01-30NFS: Use nfs4_setup_sequence() everywhereAnna Schumaker1-28/+12
This does the right thing depending on if we have a session, rather than needing to handle this manually in multiple places. Signed-off-by: Anna Schumaker <[email protected]>
2016-12-25ktime: Get rid of ktime_equal()Thomas Gleixner1-1/+1
No point in going through loops and hoops instead of just comparing the values. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]>
2016-12-25ktime: Get rid of the unionThomas Gleixner1-2/+1
ktime is a union because the initial implementation stored the time in scalar nanoseconds on 64 bit machine and in a endianess optimized timespec variant for 32bit machines. The Y2038 cleanup removed the timespec variant and switched everything to scalar nanoseconds. The union remained, but become completely pointless. Get rid of the union and just keep ktime_t as simple typedef of type s64. The conversion was done with coccinelle and some manual mopping up. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]>
2016-12-19pNFS/flexfiles: delete deviceid, don't mark inactiveWeston Andros Adamson1-2/+4
Instead of marking a device inactive, remove it from the cache entirely. Flexfiles has a way to report errors back to the server, so we don't want to stop devices from being tried again for 120 seconds. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2016-12-09pNFS/flexfiles: Ensure we have enough buffer for layoutreturnTrond Myklebust1-6/+26
The flexfiles client can piggyback both layout errors and layoutstats as part of the layoutreturn. Both these payloads can get large, with 20 layout error entries taking up about 1.2K, and 4 layoutstats entries taking up another 1K. This patch allows a maximum payload of 4k by allocating a full page. Signed-off-by: Trond Myklebust <[email protected]>
2016-12-09pNFS/flexfiles: Remove a redundant parameter in ff_layout_encode_ioerr()Trond Myklebust1-6/+4
Signed-off-by: Trond Myklebust <[email protected]>
2016-12-08pNFS/flexfiles: Fix a deadlock on LAYOUTGETFred Isaman1-34/+3
We encountered a deadlock where the SEQUENCE that accompanied the LAYOUTGET triggered a session drain, while ff_layout_alloc_lseg triggered a GETDEVICEINFO. The GETDEVICEINFO hung waiting for the session drain, while the LAYOUTGET held the slot waiting for alloc_lseg to finish. Avoid this by moving the call to nfs4_find_get_deviceid out of ff_layout_alloc_lseg and into nfs4_ff_layout_prepare_ds. Signed-off-by: Fred Isaman <[email protected]> [[email protected]: pNFS/flexfiles: fix races in ff_layout_mirror_valid] Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2016-12-03pNFS/flexfiles: Support sending layoutstats in layoutreturnTrond Myklebust1-6/+76
Add the ability to send an array of layoutstats entries as part of layoutreturn. Signed-off-by: Trond Myklebust <[email protected]>
2016-12-03pNFS/flexfiles: Minor refactoring before adding iostats to layoutreturnTrond Myklebust1-25/+34
Signed-off-by: Trond Myklebust <[email protected]>
2016-12-03NFS: Fix up read of mirror statsTrond Myklebust1-0/+2
Need to lock while reading in order to ensure 64-bit reads are correct. Signed-off-by: Trond Myklebust <[email protected]>
2016-12-03pNFS/flexfiles: Clean up layoutstatsTrond Myklebust1-20/+7
Signed-off-by: Trond Myklebust <[email protected]>
2016-12-03pNFS/flexfiles: Refactor encoding of the layoutreturn payloadTrond Myklebust1-13/+50
Add the layout error payload to the flexfiles layoutreturn private data, and set up the encoding mechanisms. This is a refactoring in preparation for adding the layout iostats payload. Signed-off-by: Trond Myklebust <[email protected]>
2016-12-02pNFS/flexfiles: Only send layoutstats updates for mirrors that were updatedTrond Myklebust1-0/+6
If there have been no reads or writes to a given mirror since the last layoutstats update, then don't resend the same data. Signed-off-by: Trond Myklebust <[email protected]>
2016-12-02pNFS/flexfiles: Don't attempt to send layoutstats if there are no entriesTrond Myklebust1-0/+5
If the list of mirrors is empty, then don't send an RPC call. Signed-off-by: Trond Myklebust <[email protected]>
2016-12-01pNFS: Get rid of unnecessary layout parameter in encode_layoutreturn callbackTrond Myklebust1-2/+2
The parameter is already present in the "args" structure. Signed-off-by: Trond Myklebust <[email protected]>
2016-12-01pNFS: Fix a deadlock between read resends and layoutreturnTrond Myklebust1-0/+4
We must not call nfs_pageio_init_read() on a new nfs_pageio_descriptor while holding a reference to a layout segment, as that can deadlock pnfs_update_layout(). Fixes: d67ae825a59d6 ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <[email protected]> Cc: [email protected] # v4.0+
2016-09-27NFSv4.x: Allow callers of nfs_remove_bad_delegation() to specify a stateidTrond Myklebust1-1/+1
Allow the callers of nfs_remove_bad_delegation() to specify the stateid that needs to be marked as bad. Signed-off-by: Trond Myklebust <[email protected]> Tested-by: Oleg Drokin <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-08-29pNFS/flexfiles: Fix an Oopsable condition when connection to the DS failsTrond Myklebust1-19/+18
If the attempt to connect to a DS fails inside ff_layout_pg_init_read or ff_layout_pg_init_write, then we currently end up clearing the layout segment carried by the struct nfs_pageio_descriptor, causing an Oops when we later call into ff_layout_read_pagelist/ff_layout_write_pagelist. The fix is to ensure we return the layout and then retry. Fixes: 446ca2195303 ("pNFS/flexfiles: When initing reads or writes, we...") Cc: [email protected] # v4.7+ Signed-off-by: Trond Myklebust <[email protected]>
2016-08-14pNFS/flexfiles: Fix layoutstat periodic reportingTrond Myklebust1-4/+4
Putting the periodicity timer in the mirror instances is causing non-scalable reporting behaviour and missed reporting intervals. When you recall layouts and/or implement client side mirroring, it leads to consecutive reports with only a few ms between RPC calls. Signed-off-by: Trond Myklebust <[email protected]> Fixes: d0379a5d066a9 ("pNFS/flexfiles: Support server-supplied...")
2016-07-05pNFS: Files and flexfiles always need to commit before layoutcommitTrond Myklebust1-2/+5
So ensure that we mark the layout for commit once the write is done, and then ensure that the commit to ds is finished before sending layoutcommit. Note that by doing this, we're able to optimise away the commit for the case of servers that don't need layoutcommit in order to return updated attributes. Signed-off-by: Trond Myklebust <[email protected]>
2016-07-05pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit()Trond Myklebust1-9/+10
Let's just have one place where we check ff_layout_need_layoutcommit(). Signed-off-by: Trond Myklebust <[email protected]>
2016-07-05pNFS/flexfiles: Fix layoutcommit after a commit to DSTrond Myklebust1-2/+1
We should always do a layoutcommit after commit to DS, except if the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <[email protected]>
2016-05-26pnfs: pnfs_update_layout needs to consider if strict iomode checking is onTom Haynes1-13/+36
As flexfiles has FF_FLAGS_NO_READ_IO, there is a need to generically support enforcing that a IOMODE_RW segment will not allow READ I/O. Signed-off-by: Tom Haynes <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-26nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and ↵Tom Haynes1-2/+3
reading is disabled Signed-off-by: Tom Haynes <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-17flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTEDJeff Layton1-2/+0
Setting just the NFS_LAYOUT_RETURN_REQUESTED flag doesn't do anything, unless there are lsegs that are also being marked for return. At the point where that happens this flag is also set, so these set_bit calls don't do anything useful. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-17pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit setJeff Layton1-2/+2
Otherwise, we'll end up returning layouts that we've just received if the client issues a new LAYOUTGET prior to the LAYOUTRETURN. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-17pNFS/flexfiles: When initing reads or writes, we might have to retry ↵Tom Haynes1-4/+25
connecting to DSes If we are initializing reads or writes and can not connect to a DS, then check whether or not IO is allowed through the MDS. If it is allowed, reset to the MDS. Else, fail the layout segment and force a retry of a new layout segment. Signed-off-by: Tom Haynes <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-17pNFS/flexfiles: When checking for available DSes, conditionally check for MDS ioTom Haynes1-3/+2
Whenever we check to see if we have the needed number of DSes for the action, we may also have to check to see whether IO is allowed to go to the MDS or not. [jlayton: fix merge conflict due to lack of localio patches here] Signed-off-by: Tom Haynes <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-17pNFS/flexfile: Fix erroneous fall back to read/write through the MDSTrond Myklebust1-17/+6
This patch fixes a problem whereby the pNFS client falls back to doing reads and writes through the metadata server even when the layout flag FF_FLAGS_NO_IO_THRU_MDS is set. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-17NFSv4: Label stateids with the typeTrond Myklebust1-3/+4
In order to more easily distinguish what kind of stateid we are dealing with, introduce a type that can be used to label the stateid structure. The label will be useful both for debugging, but also when dealing with operations like SETATTR, READ and WRITE that can take several different types of stateid as arguments. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-09nfs: have flexfiles mirror keep creds for both ro and rw layoutsJeff Layton1-7/+23
A mirror can be shared between multiple layouts, even with different iomodes. That makes stats gathering simpler, but it causes a problem when we get different creds in READ vs. RW layouts. The current code drops the newer credentials onto the floor when this occurs. That's problematic when you fetch a READ layout first, and then a RW. If the READ layout doesn't have the correct creds to do a write, then writes will fail. We could just overwrite the READ credentials with the RW ones, but that would break the ability for the server to fence the layout for reads if things go awry. We need to be able to revert to the earlier READ creds if the RW layout is returned afterward. The simplest fix is to just keep two sets of creds per mirror. One for READ layouts and one for RW, and then use the appropriate set depending on the iomode of the layout segment. Also fix up some RCU nits that sparse found. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-09nfs: get a reference to the credential in ff_layout_alloc_lsegJeff Layton1-7/+34
We're just as likely to have allocation problems here as we would if we delay looking up the credential like we currently do. Fix the code to get a rpc_cred reference early, as soon as the mirror is set up. This allows us to eliminate the mirror early if there is a problem getting an rpc credential. This also allows us to drop the uid/gid from the layout_mirror struct as well. In the event that we find an existing mirror where this one would go, we swap in the new creds unconditionally, and drop the reference to the old one. Note that the old ff_layout_update_mirror_cred function wouldn't set this pointer unless the DS version was 3, but we don't know what the DS version is at this point. I'm a little unclear on why it did that as you still need creds to talk to v4 servers as well. I have the code set it regardless of the DS version here. Also note the change to using generic creds instead of calling lookup_cred directly. With that change, we also need to populate the group_info pointer in the acred as some functions expect that to never be NULL. Instead of allocating one every time however, we can allocate one when the module is loaded and share it since the group_info is refcounted. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-09nfs: have ff_layout_get_ds_cred take a reference to the credJeff Layton1-6/+9
In later patches, we're going to want to allow the creds to be updated when we get a new layout with updated creds. Have this function take a reference to the cred that is later put once the call has been dispatched. Also, prepare for this change by ensuring we follow RCU rules when getting a reference to the cred as well. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-05-09NFS: Save struct inode * inside nfs_commit_info to clarify usage of i_lockDave Wysochanski1-2/+2
Commit ea2cf22 created nfs_commit_info and saved &inode->i_lock inside this NFS specific structure. This obscures the usage of i_lock. Instead, save struct inode * so later it's clear the spinlock taken is i_lock. Should be no functional change. Signed-off-by: Dave Wysochanski <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2016-01-27NFS: Cleanup - rename NFS_LAYOUT_RETURN_BEFORE_CLOSETrond Myklebust1-1/+1
NFS_LAYOUT_RETURN_BEFORE_CLOSE is being used to signal that a layoutreturn is needed, either due to a layout recall or to a layout error. Rename it to NFS_LAYOUT_RETURN_REQUESTED in order to clarify its purpose. Signed-off-by: Trond Myklebust <[email protected]>
2016-01-22Merge branch 'bugfixes'Trond Myklebust1-4/+2
* bugfixes: pNFS/flexfiles: Fix an XDR encoding bug in layoutreturn pNFS/flexfiles: Improve merging of errors in LAYOUTRETURN
2016-01-22pNFS/flexfiles: Fix an XDR encoding bug in layoutreturnTrond Myklebust1-4/+2
We must not skip encoding the statistics, or the server will see an XDR encoding error. Signed-off-by: Trond Myklebust <[email protected]> Cc: [email protected] # 4.0+