path: root/fs/nfs
Age | Commit message | Author | Files | Lines
2022-03-02 | NFS: Convert readdir page cache to use a cookie based index | Trond Myklebust | 2 | -84/+69
Instead of using a linear index to address the pages, use the cookie of the first entry, since that is what we use to match the page anyway. This allows us to avoid re-reading the entire cache on a seekdir() type of operation. The latter is very common when re-exporting NFS, and is a major performance drain. The change does affect our duplicate cookie detection, since we can no longer rely on the page index as a linear offset for detecting whether we looped backwards. However since we no longer do a linear search through all the pages on each call to nfs_readdir(), this is less of a concern than it was previously. The other downside is that invalidate_mapping_pages() no longer can use the page index to avoid clearing pages that have been read. A subsequent patch will restore the functionality this provides to the 'ls -l' heuristic. Signed-off-by: Trond Myklebust <[email protected]>
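A minimal user-space sketch of the idea, not the kernel code: pages are stored in a map keyed by the cookie they start at, so a seekdir()-style jump is a direct lookup rather than a walk from page 0. The names `CookieIndexedCache`, `page_for` and `fake_server` are hypothetical, and a page is modelled simply as a list of cookies.

```python
from typing import Callable, Dict, List

Page = List[int]  # a page is modelled as the list of entry cookies it holds


class CookieIndexedCache:
    """Toy directory cache whose pages are keyed by the cookie they start at."""

    def __init__(self, fetch: Callable[[int], Page]) -> None:
        self._fetch = fetch               # hypothetical stand-in for a READDIR call
        self._pages: Dict[int, Page] = {}
        self.fetches = 0                  # number of simulated server round trips

    def page_for(self, cookie: int) -> Page:
        page = self._pages.get(cookie)    # direct lookup, no walk from page 0
        if page is None:
            page = self._fetch(cookie)
            self._pages[cookie] = page
            self.fetches += 1
        return page


def fake_server(cookie: int, per_page: int = 4, last: int = 20) -> Page:
    """Return up to per_page cookies following `cookie` (hypothetical server)."""
    return list(range(cookie + 1, min(cookie + per_page, last) + 1))


cache = CookieIndexedCache(fake_server)
first = cache.page_for(0)     # start of the directory
jumped = cache.page_for(12)   # seekdir()-style jump straight to cookie 12
again = cache.page_for(12)    # served from the cache, no extra fetch
```

In this toy model the jump to cookie 12 costs one fetch and touches none of the pages that precede it.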
2022-03-02 | NFS: Clean up page array initialisation/free | Trond Myklebust | 1 | -10/+6
Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Trace effects of the readdirplus heuristic | Trond Myklebust | 2 | -1/+60
Enable tracking of when the readdirplus heuristic causes a page cache invalidation. Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Trace effects of readdirplus on the dcache | Trond Myklebust | 2 | -0/+8
Trace the effects of readdirplus on attribute and dentry revalidation. Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Add basic readdir tracing | Trond Myklebust | 2 | -1/+80
Add tracing to track how often the client goes to the server for updated readdir information. Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Don't request readdirplus when revalidation was forced | Trond Myklebust | 1 | -10/+16
If the revalidation was forced, due to the presence of a LOOKUP_EXCL or a LOOKUP_REVAL flag, then readdirplus won't help. It also can't help when we're doing a path component lookup. Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Readdirplus can't help lookup for case insensitive filesystems | Trond Myklebust | 1 | -0/+2
If the filesystem is case insensitive, then readdirplus can't help with cache misses, since it won't return case folded variants of the filename. Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFSv4: Ask for a full XDR buffer of readdir goodness | Trond Myklebust | 2 | -6/+7
Instead of pretending that we know the ratio of directory info vs readdirplus attribute info, just set the 'dircount' field to the same value as the 'maxcount' field. Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Don't ask for readdirplus unless it can help nfs_getattr() | Trond Myklebust | 1 | -20/+25
If attribute caching is turned off, then use of readdirplus is not going to help stat() performance. Readdirplus also doesn't help if a file is being written to, since we will have to flush those writes in order to sync the mtime/ctime. Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Improve heuristic for readdirplus | Trond Myklebust | 4 | -34/+55
The heuristic for readdirplus is designed to try to detect 'ls -l' and similar patterns. It does so by looking for cache hit/miss patterns in both the attribute cache and in the dcache of the files in a given directory, and then sets a flag for the readdirplus code to interpret. The problem with this approach is that a single attribute or dcache miss can cause the NFS code to force a refresh of the attributes for the entire set of files contained in the directory. To be able to make a more nuanced decision, let's sample the number of hits and misses in the set of open directory descriptors. That allows us to set thresholds at which we start preferring READDIRPLUS over regular READDIR, or at which we start to force a re-read of the remaining readdir cache using READDIRPLUS. Signed-off-by: Trond Myklebust <[email protected]>
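A rough sketch of what a threshold-based decision of this kind could look like. The commit message does not give the thresholds or the exact rule, so `PREFER_PLUS_MISSES`, `FORCE_REREAD_MISSES`, `DirContextStats` and `decide` are all hypothetical and exist only to illustrate the shape of the approach.

```python
from dataclasses import dataclass


@dataclass
class DirContextStats:
    """Per-open-directory sample of cache hits and misses (toy model)."""
    hits: int = 0
    misses: int = 0


# Hypothetical thresholds, chosen only for illustration.
PREFER_PLUS_MISSES = 4
FORCE_REREAD_MISSES = 16


def decide(stats: DirContextStats) -> str:
    """Return which readdir strategy this toy heuristic would pick."""
    # Many misses that also outnumber the hits: re-read what remains.
    if stats.misses >= FORCE_REREAD_MISSES and stats.misses > stats.hits:
        return "reread-with-readdirplus"
    # A handful of misses: prefer READDIRPLUS for subsequent reads.
    if stats.misses >= PREFER_PLUS_MISSES:
        return "readdirplus"
    # A single stray miss no longer changes the strategy.
    return "readdir"
```

The point of sampling counts rather than setting a flag is visible in the last branch: one miss among many hits leaves the plain READDIR path in place.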
2022-03-02 | NFS: Reduce use of uncached readdir | Trond Myklebust | 1 | -20/+3
When reading a very large directory, we want to try to keep the page cache up to date if doing so is inexpensive. With the change to allow readdir to continue reading even when the cache is incomplete, we no longer need to fall back to uncached readdir in order to scale to large directories. Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Simplify nfs_readdir_xdr_to_array() | Trond Myklebust | 1 | -18/+11
Recent changes to readdir mean that we can cope with partially filled page cache entries, so we no longer need to rely on looping in nfs_readdir_xdr_to_array(). Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: If the cookie verifier changes, we must invalidate the page cache | Trond Myklebust | 1 | -1/+6
Ensure that if the cookie verifier changes when we use the zero-valued cookie, then we invalidate any cached pages. Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Adjust the amount of readahead performed by NFS readdir | Trond Myklebust | 1 | -1/+52
The current NFS readdir code will always try to maximise the amount of readahead it performs on the assumption that we can cache anything that isn't immediately read by the process. There are several cases where this assumption breaks down, including when the 'ls -l' heuristic kicks in to try to force use of readdirplus as a batch replacement for lookup/getattr. This patch therefore tries to tone down the amount of readahead we perform, and adjust it to try to match the amount of data being requested by user space. Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Don't advance the page pointer unless the page is full | Trond Myklebust | 1 | -10/+22
When we hit the end of the data in the readdir page, we don't want to start filling a new page, unless this one is full. Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Don't re-read the entire page cache to find the next cookie | Trond Myklebust | 1 | -3/+7
If the page cache entry that was last read gets invalidated for some reason, then make sure we can re-create it on the next call to readdir. This, combined with the cache page validation, allows us to reuse the cached value of page-index on successive calls to nfs_readdir. Credit is due to Benjamin Coddington for showing that the concept works, and that it allows for improved cache sharing between processes even in the case where pages are lost due to LRU or active invalidation. Suggested-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-03-02 | NFS: Store the change attribute in the directory page cache | Trond Myklebust | 1 | -31/+37
Use the change attribute and the first cookie in a directory page cache entry to validate that the page is up to date. Suggested-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
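As an illustration only, the validation described here can be modelled as a two-field comparison. `CachedPage` and `page_is_valid` are hypothetical names, and the real page layout is not shown in this log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CachedPage:
    """Toy cached directory page carrying the data needed to validate it."""
    change_attr: int                  # directory change attribute when the page was filled
    first_cookie: int                 # cookie the page starts at
    entries: List[str] = field(default_factory=list)


def page_is_valid(page: CachedPage, dir_change_attr: int, wanted_cookie: int) -> bool:
    """A page is usable only if the directory is unchanged and it starts where we need it."""
    return page.change_attr == dir_change_attr and page.first_cookie == wanted_cookie


page = CachedPage(change_attr=7, first_cookie=100, entries=["a", "b"])
```

A page that fails either comparison would be refilled from the server rather than trusted.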
2022-02-28 | NFSD: Move svc_serv_ops::svo_function into struct svc_serv | Chuck Lever | 1 | -32/+11
Hoist svo_function back into svc_serv and remove struct svc_serv_ops, since the struct is now devoid of fields. Signed-off-by: Chuck Lever <[email protected]>
2022-02-28 | NFSD: Remove svc_serv_ops::svo_module | Chuck Lever | 2 | -6/+2
struct svc_serv_ops is about to be removed. Neil Brown says:
> I suspect svo_module can go as well - I don't think the thread is
> ever the thing that primarily keeps a module active.
A random sample of kthread_create() callers shows sunrpc is the only one that manages module reference count in this way. Suggested-by: Neil Brown <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-02-28 | SUNRPC: Remove svc_shutdown_net() | Chuck Lever | 1 | -1/+1
Clean up: svc_shutdown_net() now does nothing but call svc_close_net(). Replace all external call sites. svc_close_net() is renamed to be the inverse of svc_xprt_create(). Signed-off-by: Chuck Lever <[email protected]>
2022-02-28 | SUNRPC: Rename svc_create_xprt() | Chuck Lever | 1 | -6/+6
Clean up: Use the "svc_xprt_<task>" function naming convention as is used for other external APIs. Signed-off-by: Chuck Lever <[email protected]>
2022-02-28 | SUNRPC: Remove the .svo_enqueue_xprt method | Chuck Lever | 1 | -2/+0
We have never been able to track down and address the underlying cause of the performance issues with workqueue-based service support. svo_enqueue_xprt is called multiple times per RPC, so it adds instruction path length, but always ends up at the same function: svc_xprt_do_enqueue(). We do not anticipate needing this flexibility for dynamic nfsd thread management support. As a micro-optimization, remove .svo_enqueue_xprt because Spectre/Meltdown makes virtual function calls more costly. This change essentially reverts commit b9e13cdfac70 ("nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operation"). Signed-off-by: Chuck Lever <[email protected]>
2022-02-28 | NFS: Calculate page offsets algorithmically | Trond Myklebust | 1 | -5/+13
Instead of relying on counting the page offsets as we walk through the page cache, switch to calculating them algorithmically. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-28 | NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context | Trond Myklebust | 1 | -7/+4
Signed-off-by: Trond Myklebust <[email protected]>
2022-02-28 | NFS: Initialise the readdir verifier as best we can in nfs_opendir() | Trond Myklebust | 1 | -0/+1
For the purpose of ensuring that opendir() followed by seekdir() work as correctly as possible, try to initialise the readdir verifier in nfs_opendir(). Signed-off-by: Trond Myklebust <[email protected]>
2022-02-28 | NFS: Trace lookup revalidation failure | Trond Myklebust | 1 | -12/+5
Enable tracing of lookup revalidation failures. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-28 | NFS: Return valid errors from nfs2/3_decode_dirent() | Trond Myklebust | 2 | -16/+7
Valid return values for decode_dirent() callback functions are:
 0: Success
 -EBADCOOKIE: End of directory
 -EAGAIN: End of xdr_stream
All errors need to map into one of those three values. Fixes: 573c4e1ef53a ("NFS: Simplify ->decode_dirent() calling sequence") Signed-off-by: Trond Myklebust <[email protected]>
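A small sketch of the contract above, under stated assumptions. How the real decoders choose between the two error values is not described in this log, so the `end_of_directory` parameter and the `normalise_decode_error` helper are hypothetical. `EBADCOOKIE` is a Linux kernel-internal errno (523) that Python's `errno` module does not define, so it is spelled out as a constant.

```python
import errno

EBADCOOKIE = 523  # kernel-internal errno; not exported to user space


def normalise_decode_error(err: int, end_of_directory: bool) -> int:
    """Collapse any decoder result into 0, -EBADCOOKIE or -EAGAIN."""
    if err == 0:
        return 0                 # success
    if end_of_directory:
        return -EBADCOOKIE       # end of directory
    return -errno.EAGAIN         # end of xdr_stream: caller should fetch more data


ALLOWED = {0, -EBADCOOKIE, -errno.EAGAIN}
```

Whatever error the decoder hits internally, a caller written against this contract only ever has to handle the three values in `ALLOWED`.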
2022-02-28 | Revert "NFSv4: use unique client identifiers in network namespaces" | Trond Myklebust | 1 | -14/+0
This reverts commit 50c790a0b69bdc420f00f30bdf348d6c90194c78. The functionality is believed to be capable of causing regressions in existing setups, so the author has requested that it be reverted. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFS: Use of mapping_set_error() results in spurious errors | Trond Myklebust | 1 | -1/+4
The use of mapping_set_error() in conjunction with calls to filemap_check_errors() is problematic because every error gets reported as either an EIO or an ENOSPC by filemap_check_errors() in functions such as filemap_write_and_wait() or filemap_write_and_wait_range(). In almost all cases, we prefer to use the more nuanced wb errors. Fixes: b8946d7bfb94 ("NFS: Revalidate the file mapping on all fatal writeback errors") Signed-off-by: Trond Myklebust <[email protected]>
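The loss of detail described above can be illustrated with a simplified user-space model of the two legacy error flags. This is a sketch, not the kernel implementation: `ToyMapping`, `toy_mapping_set_error` and `toy_filemap_check_errors` are hypothetical names, and the model ignores the separate writeback error tracking entirely.

```python
import errno


class ToyMapping:
    """Toy address space that carries only the two legacy error flags."""

    def __init__(self) -> None:
        self.as_eio = False
        self.as_enospc = False


def toy_mapping_set_error(mapping: ToyMapping, error: int) -> None:
    """Record an error; anything other than -ENOSPC is stored as 'EIO'."""
    if error == 0:
        return
    if error == -errno.ENOSPC:
        mapping.as_enospc = True
    else:
        mapping.as_eio = True


def toy_filemap_check_errors(mapping: ToyMapping) -> int:
    """Report and clear the flags; the original error code is not recoverable."""
    ret = 0
    if mapping.as_enospc:
        mapping.as_enospc = False
        ret = -errno.ENOSPC
    if mapping.as_eio:
        mapping.as_eio = False
        ret = -errno.EIO
    return ret


m = ToyMapping()
toy_mapping_set_error(m, -errno.EDQUOT)   # a quota error goes in...
reported = toy_filemap_check_errors(m)    # ...and comes back out as -EIO
```

In this model a quota error is indistinguishable from a generic I/O error by the time it is reported, which is the kind of spurious result the commit describes.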
2022-02-25 | NFS: Clean up NFSv4.2 xattrs | Trond Myklebust | 3 | -12/+18
Add a helper for the xattr mask so that we can get rid of the inlined ifdefs. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFS: Remove unnecessary XATTR cache invalidation in nfs_fhget() | Trond Myklebust | 1 | -2/+0
We should never expect the 'xattr_cache' to be non-null in that case, hence nfs_set_cache_invalid() is just going to optimise it away. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFS: NFSv2/v3 clients should never be setting NFS_CAP_XATTR | Trond Myklebust | 2 | -0/+2
Ensure that we always initialise the 'xattr_support' field in struct nfs_fsinfo, so that nfs_server_set_fsinfo() doesn't declare our NFSv2/v3 client to be capable of supporting the NFSv4.2 xattr protocol by setting the NFS_CAP_XATTR capability. This configuration can cause nfs_do_access() to set access mode bits that are unsupported by the NFSv3 ACCESS call, which may confuse spec-compliant servers. Reported-by: Olga Kornievskaia <[email protected]> Fixes: b78ef845c35d ("NFSv4.2: query the server for extended attribute support") Cc: [email protected] Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFS: Remove unused flag NFS_INO_REVAL_PAGECACHE | Trond Myklebust | 2 | -4/+2
Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFS: Replace last uses of NFS_INO_REVAL_PAGECACHE | Trond Myklebust | 2 | -14/+12
Now that we have more fine grained attribute revalidation, let's just get rid of NFS_INO_REVAL_PAGECACHE. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFSv4: use unique client identifiers in network namespaces | Benjamin Coddington | 1 | -0/+14
In order to differentiate client state, assign a random uuid to the uniquifying portion of the client identifier when a network namespace is created. Containers may still override this value if they wish to maintain stable client identifiers by writing to /sys/fs/nfs/net/client/identifier, either by udev rules or other means. Signed-off-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFSv4.1 support for NFS4_RESULT_PRESERVER_UNLINKED | Olga Kornievskaia | 2 | -2/+10
In 4.1+, the server is allowed to set a flag NFS4_RESULT_PRESERVE_UNLINKED in reply to the OPEN, that tells the client that it does not need to do a silly rename of an opened file when it's being removed. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFSv4.2/copyoffload: Convert GFP_NOFS to GFP_KERNEL | Trond Myklebust | 3 | -8/+8
There doesn't seem to be any reason why the copy offload code can't use GFP_KERNEL. It can't get called by direct reclaim. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFSv4/flexfiles: Convert GFP_NOFS to GFP_KERNEL | Trond Myklebust | 2 | -9/+10
Assume that the higher layers will have set memalloc_nofs_save/restore as appropriate. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFS: Convert GFP_NOFS to GFP_KERNEL | Trond Myklebust | 4 | -14/+13
Assume that sections that should not re-enter the filesystem are already protected with memalloc_nofs_save/restore call, so relax those GFP_NOFS instances which might be used by other contexts. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFSv4.2: Fix up an invalid combination of memory allocation flags | Trond Myklebust | 1 | -4/+3
We should use either GFP_KERNEL or GFP_NOFS, but not both. Also strip GFP_KERNEL_ACCOUNT down to GFP_KERNEL. This memory is shrinkable, so does not need to be limited by kmemcg. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFSv4: Charge NFSv4 open state trackers to kmemcg | Trond Myklebust | 2 | -4/+5
Allow kmemcg to limit the number of NFSv4 delegation, lock and open state trackers. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFS: Charge open/lock file contexts to kmemcg | Trond Myklebust | 2 | -3/+3
Allow kmemcg to limit the number of open/lock file contexts, in the same way that it limits the parent file descriptors. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFSv4: Protect the state recovery thread against direct reclaim | Trond Myklebust | 1 | -0/+12
If memory allocation triggers a direct reclaim from the state recovery thread, then we can deadlock. Use memalloc_nofs_save/restore to ensure that doesn't happen. Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFSv4.2: fix reference count leaks in _nfs42_proc_copy_notify() | Xin Xiong | 1 | -3/+6
The reference counting issue happens in two error paths in the function _nfs42_proc_copy_notify(). In both error paths, the function simply returns the error code and forgets to balance the refcount of object `ctx`, bumped by get_nfs_open_context() earlier, which may cause refcount leaks. Fix it by balancing refcount of the `ctx` object before the function returns in both error paths. Signed-off-by: Xin Xiong <[email protected]> Signed-off-by: Xiyu Yang <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
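The pattern being fixed can be sketched in user space as follows. This is an illustration rather than the kernel change: `OpenContext`, `get_ctx`, `put_ctx`, `copy_notify` and the `fail_step` names are hypothetical, and Python's `try`/`finally` stands in for whatever cleanup style the real function uses.

```python
import errno


class OpenContext:
    """Toy stand-in for an open context with a reference count."""

    def __init__(self) -> None:
        self.refcount = 1


def get_ctx(ctx: OpenContext) -> OpenContext:
    ctx.refcount += 1
    return ctx


def put_ctx(ctx: OpenContext) -> None:
    ctx.refcount -= 1


def copy_notify(ctx: OpenContext, fail_step: str = "") -> int:
    """Take a reference, do the work, and drop the reference on every path."""
    held = get_ctx(ctx)
    try:
        if fail_step == "lock_context":   # first early-return error path
            return -errno.ENOMEM
        if fail_step == "stateid":        # second early-return error path
            return -errno.EAGAIN
        return 0
    finally:
        put_ctx(held)                     # runs for success and both error returns


ctx = OpenContext()
results = [copy_notify(ctx), copy_notify(ctx, "lock_context"), copy_notify(ctx, "stateid")]
```

Without the `finally` clause, each early return would leave `ctx.refcount` one higher than before the call, which is the leak the commit message describes.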
2022-02-25 | Convert NFS from readpages to readahead | Matthew Wilcox (Oracle) | 3 | -12/+17
NFS is one of the last two users of the deprecated ->readpages aop. This conversion looks straightforward, but I have only compile-tested it. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-02-25 | NFS: simplify check for freeing cn_resp | Tom Rix | 1 | -2/+2
nfs42_files_from_same_server() is called to check if freeing cn_resp is required, just do the free. Signed-off-by: Tom Rix <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-02-16 | NFS: Do not report writeback errors in nfs_getattr() | Trond Myklebust | 1 | -6/+3
The result of the writeback, whether it is an ENOSPC or an EIO, or anything else, does not inhibit the NFS client from reporting the correct file timestamps. Fixes: 79566ef018f5 ("NFS: Getattr doesn't require data sync semantics") Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2022-02-14 | NFS: LOOKUP_DIRECTORY is also ok with symlinks | Trond Myklebust | 1 | -2/+2
Commit ac795161c936 (NFSv4: Handle case where the lookup of a directory fails) [1], part of Linux since 5.17-rc2, introduced a regression, where a symbolic link on an NFS mount to a directory on another NFS does not resolve(?) the first time it is accessed: Reported-by: Paul Menzel <[email protected]> Fixes: ac795161c936 ("NFSv4: Handle case where the lookup of a directory fails") Signed-off-by: Trond Myklebust <[email protected]> Tested-by: Donald Buczek <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2022-02-14 | NFS: Remove an incorrect revalidation in nfs4_update_changeattr_locked() | Trond Myklebust | 1 | -2/+1
In nfs4_update_changeattr_locked(), we don't need to set the NFS_INO_REVAL_PAGECACHE flag, because we already know the value of the change attribute, and we're already flagging the size. In fact, this forces us to revalidate the change attribute a second time for no good reason. This extra flag appears to have been introduced as part of the xattr feature, when update_changeattr_locked() was converted for use by the xattr code. Fixes: 1b523ca972ed ("nfs: modify update_changeattr to deal with regular files") Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2022-02-08 | NFS: Fix nfs4_proc_get_locations() kernel-doc comment | Yang Li | 1 | -1/+2
Add the description of @server and @fhandle, and remove the excess @inode in nfs4_proc_get_locations() kernel-doc comment to remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'.
fs/nfs/nfs4proc.c:8219: warning: Function parameter or member 'server' not described in 'nfs4_proc_get_locations'
fs/nfs/nfs4proc.c:8219: warning: Function parameter or member 'fhandle' not described in 'nfs4_proc_get_locations'
fs/nfs/nfs4proc.c:8219: warning: Excess function parameter 'inode' description in 'nfs4_proc_get_locations'
Reported-by: Abaci Robot <[email protected]> Signed-off-by: Yang Li <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>