aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-08-13NFS: Remove a bogus flag setting in pnfs_write_done_resend_to_mdsTrond Myklebust1-1/+0
Since pnfs_write_done_resend_to_mds() does not actually call end_page_writeback() on the pages that are being redirected to the metadata server, callers of fsync() do not see the I/O as complete until the writeback to the MDS finishes. We therefore do not need to set NFS_CONTEXT_RESEND_WRITES, since there is nothing to redrive. Signed-off-by: Trond Myklebust <[email protected]>
2022-08-13NFS: Fix another fsync() issue after a server rebootTrond Myklebust4-11/+12
Currently, when the writeback code detects a server reboot, it redirties any pages that were not committed to disk, and it sets the flag NFS_CONTEXT_RESEND_WRITES in the nfs_open_context of the file descriptor that dirtied the file. While this allows the file descriptor in question to redrive its own writes, it violates the fsync() requirement that we should be synchronising all writes to disk. While the problem is infrequent, we do see corner cases where an untimely server reboot causes the fsync() call to abandon its attempt to sync data to disk and causing data corruption issues due to missed error conditions or similar. In order to tighted up the client's ability to deal with this situation without introducing livelocks, add a counter that records the number of times pages are redirtied due to a server reboot-like condition, and use that in fsync() to redrive the sync to disk. Fixes: 2197e9b06c22 ("NFS: Fix up fsync() when the server rebooted") Cc: [email protected] Signed-off-by: Trond Myklebust <[email protected]>
2022-08-13NFS: Fix missing unlock in nfs_unlink()Sun Ke1-1/+3
Add the missing unlock before goto. Fixes: 3c59366c207e ("NFS: don't unhash dentry during unlink/rename") Signed-off-by: Sun Ke <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-08-09NFS: Improve readpage/writepage tracingTrond Myklebust1-28/+26
Switch formatting to better match that used by other NFS tracepoints. Signed-off-by: Trond Myklebust <[email protected]>
2022-08-09NFS: Improve O_DIRECT tracingTrond Myklebust1-12/+9
Switch the formatting to match the other NFS tracepoints. Signed-off-by: Trond Myklebust <[email protected]>
2022-08-09NFS: Improve write error tracingTrond Myklebust3-20/+27
Don't leak request pointers, but use the "device:inode" labelling that is used by all the other trace points. Furthermore, replace use of page indexes with an offset, again in order to align behaviour with other NFS trace points. Signed-off-by: Trond Myklebust <[email protected]>
2022-08-08NFS: don't unhash dentry during unlink/renameNeilBrown2-18/+63
NFS unlink() (and rename over existing target) must determine if the file is open, and must perform a "silly rename" instead of an unlink (or before rename) if it is. Otherwise the client might hold a file open which has been removed on the server. Consequently if it determines that the file isn't open, it must block any subsequent opens until the unlink/rename has been completed on the server. This is currently achieved by unhashing the dentry. This forces any open attempt to the slow-path for lookup which will block on i_rwsem on the directory until the unlink/rename completes. A future patch will change the VFS to only get a shared lock on i_rwsem for unlink, so this will no longer work. Instead we introduce an explicit interlock. A special value is stored in dentry->d_fsdata while the unlink/rename is running and ->d_revalidate blocks while that value is present. When ->d_revalidate unblocks, the dentry will be invalid. This closes the race without requiring exclusion on i_rwsem. d_fsdata is already used in two different ways. 1/ an IS_ROOT directory dentry might have a "devname" stored in d_fsdata. Such a dentry doesn't have a name and so cannot be the target of unlink or rename. For safety we check if an old devname is still stored, and remove it if it is. 2/ a dentry with DCACHE_NFSFS_RENAMED set will have a 'struct nfs_unlinkdata' stored in d_fsdata. While this is set maydelete() will fail, so an unlink or rename will never proceed on such a dentry. Neither of these can be in effect when a dentry is the target of unlink or rename. So we can expect d_fsdata to be NULL, and store a special value ((void*)1) which is given the name NFS_FSDATA_BLOCKED to indicate that any lookup will be blocked. The d_count() is incremented under d_lock() when a lookup finds the dentry, so we check d_count() is low, and set NFS_FSDATA_BLOCKED under the same lock to avoid any races. Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-08-02NFSv4/pnfs: Fix a use-after-free bug in openTrond Myklebust1-5/+6
If someone cancels the open RPC call, then we must not try to free either the open slot or the layoutget operation arguments, since they are likely still in use by the hung RPC call. Fixes: 6949493884fe ("NFSv4: Don't hold the layoutget locks across multiple RPC calls") Signed-off-by: Trond Myklebust <[email protected]>
2022-08-02NFS: nfs_async_write_reschedule_io must not recurse into the writeback codeTrond Myklebust1-2/+0
It is not safe to call filemap_fdatawrite_range() from nfs_async_write_reschedule_io(), since we're often calling from a page reclaim context. Just let fsync() redrive the writeback for us. Signed-off-by: Trond Myklebust <[email protected]>
2022-07-27SUNRPC: Don't reuse bvec on retransmission of the requestTrond Myklebust4-21/+22
If a request is re-encoded and then retransmitted, we need to make sure that we also re-encode the bvec, in case the page lists have changed. Fixes: ff053dbbaffe ("SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()") Signed-off-by: Trond Myklebust <[email protected]>
2022-07-27SUNRPC: Reinitialise the backchannel request buffers before reuseTrond Myklebust1-0/+14
When we're reusing the backchannel requests instead of freeing them, then we should reinitialise any values of the send/receive xdr_bufs so that they reflect the available space. Fixes: 0d2a970d0ae5 ("SUNRPC: Fix a backchannel race") Signed-off-by: Trond Myklebust <[email protected]>
2022-07-27NFSv4.1: RECLAIM_COMPLETE must handle EACCESZhang Xianwei1-0/+3
A client should be able to handle getting an EACCES error while doing a mount operation to reclaim state due to NFS4CLNT_RECLAIM_REBOOT being set. If the server returns RPC_AUTH_BADCRED because authentication failed when we execute "exportfs -au", then RECLAIM_COMPLETE will go a wrong way. After mount succeeds, all OPEN call will fail due to an NFS4ERR_GRACE error being returned. This patch is to fix it by resending a RPC request. Signed-off-by: Zhang Xianwei <[email protected]> Signed-off-by: Yi Wang <[email protected]> Fixes: aa5190d0ed7d ("NFSv4: Kill nfs4_async_handle_error() abuses by NFSv4.1") Signed-off-by: Trond Myklebust <[email protected]>
2022-07-25NFSv4.1 probe offline transports for trunking on session creationOlga Kornievskaia1-0/+8
Once the session is established call into the SUNRPC layer to check if any offlined trunking connections should be re-enabled. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-25SUNRPC create a function that probes only offline transportsOlga Kornievskaia2-0/+67
For only offline transports, attempt to check connectivity via a NULL call and, if that succeeds, call a provided session trunking detection function. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-25SUNRPC export xprt_iter_rewind functionOlga Kornievskaia2-1/+2
Make xprt_iter_rewind callable outside of xprtmultipath.c Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-25SUNRPC restructure rpc_clnt_setup_test_and_add_xprtOlga Kornievskaia1-21/+31
In preparation for code re-use, pull out the part of the rpc_clnt_setup_test_and_add_xprt() portion that sends a NULL rpc and then calls a session trunking function into a helper function. Re-organize the end of the function for code re-use. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-25NFSv4.1 remove xprt from xprt_switch if session trunking test failsOlga Kornievskaia1-0/+3
If we are doing a session trunking test and it fails for the transport, then remove this transport from the xprt_switch group. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-25SUNRPC create an rpc function that allows xprt removal from rpc_clntOlga Kornievskaia5-8/+24
Expose a function that allows a removal of xprt from the rpc_clnt. When called from NFS that's running a trunked transport then don't decrement the active transport counter. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-25SUNRPC enable back offline transports in trunking discoveryOlga Kornievskaia2-0/+15
When we are adding a transport to a xprt_switch that's already on the list but has been marked OFFLINE, then make the state ONLINE since it's been tested now. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-25SUNRPC create an iterator to list only OFFLINE xprtsOlga Kornievskaia3-12/+101
Create a new iterator helper that will go thru the all the transports in the switch and return transports that are marked OFFLINE. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-25NFSv4.1 offline trunkable transports on DESTROY_SESSIONOlga Kornievskaia1-0/+1
When session is destroy, some of the transports might no longer be valid trunks for the new session. Offline existing transports. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-25SUNRPC add function to offline remove trunkable transportsOlga Kornievskaia2-0/+47
Iterate thru available transports in the xprt_switch for all trunkable transports offline and possibly remote them as well. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-25SUNRPC expose functions for offline remote xprt functionalityOlga Kornievskaia3-23/+40
Re-arrange the code that make offline transport and delete transport callable functions. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23SUNRPC: Remove xdr_align_data() and xdr_expand_hole()Anna Schumaker2-68/+0
These functions are no longer needed now that the NFS client places data and hole segments directly. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23NFS: Replace the READ_PLUS decoding codeAnna Schumaker1-82/+88
We now take a 2-step process that allows us to place data and hole segments directly at their final position in the xdr_stream without needing to do a bunch of redundant copies to expand holes. Due to the variable lengths of each segment, the xdr metadata might cross page boundaries which I account for by setting a small scratch buffer so xdr_inline_decode() won't fail. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23SUNRPC: Add a function for zeroing out a portion of an xdr_streamAnna Schumaker2-0/+25
This will be used during READ_PLUS decoding for handling HOLE segments. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23SUNRPC: Add a function for directly setting the xdr page lenAnna Schumaker2-0/+31
We need to do this step during READ_PLUS decoding so that we know pages are the right length and any extra data has been preserved in the tail. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23SUNRPC: Introduce xdr_stream_move_subsegment()Anna Schumaker2-0/+61
I do this by creating an xdr subsegment for the range we will be operating over. This lets me shift data to the correct place without potentially overwriting anything already there. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23NFS: Replace fs_context-related dprintk() call sites with tracepointsChuck Lever2-10/+73
Contributed as part of the long patch series that converts NFS from using dprintk to tracepoints for observability. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23SUNRPC: Replace dprintk() call site in xs_data_readyChuck Lever2-2/+24
Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23SUNRPC: Fail faster on bad verifierChuck Lever1-1/+1
A bad verifier is not a garbage argument, it's an authentication failure. Retrying it doesn't make the problem go away, and delays upper layer recovery steps. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23nfs: only issue commit in DIO codepath if we have uncommitted dataJeff Layton3-19/+32
Currently, we try to determine whether to issue a commit based on nfs_write_need_commit which looks at the current verifier. In the case where we got a short write and then tried to follow it up with one that failed, the verifier can't be trusted. What we really want to know is whether the pgio request had any successful writes that came back as UNSTABLE. Add a new flag to the pgio request, and use that to indicate that we've had a successful unstable write. Only issue a commit if that flag is set. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23nfs: always check dreq->error after a commitJeff Layton1-1/+2
When the client gets back a short DIO write, it will then attempt to issue another write to finish the DIO request. If that write then fails (as is often the case in an -ENOSPC situation), then we still may need to issue a COMMIT if the earlier short write was unstable. If that COMMIT then succeeds, then we don't want the client to reschedule the write requests, and to instead just return a short write. Otherwise, we can end up looping over the same DIO write forever. Always consult dreq->error after a successful RPC, even when the flag state is not NFS_ODIRECT_DONE. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2028370 Reported-by: Boyang Xue <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23nfs: add new nfs_direct_req tracepoint eventsJeff Layton3-33/+114
Add some new tracepoints to the DIO write code. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-23SUNRPC: Shrink size of struct rpc_taskTrond Myklebust1-2/+2
Move the field 'tk_rpc_status' so that we eliminate a 4 byte hole in the structure. For x86_64, this shrinks the size of the struct by 8 bytes. 'pahole' output before the change: /* size: 232, cachelines: 4, members: 27 */ /* sum members: 222, holes: 1, sum holes: 4 */ /* sum bitfield members: 8 bits (1 bytes) */ /* padding: 5 */ /* last cacheline: 40 bytes */ 'pahole' output after the change: /* size: 224, cachelines: 4, members: 27 */ /* padding: 1 */ /* last cacheline: 32 bytes */ Signed-off-by: Trond Myklebust <[email protected]>
2022-07-13NFSv4: Fix races in the legacy idmapper upcallTrond Myklebust1-22/+24
nfs_idmap_instantiate() will cause the process that is waiting in request_key_with_auxdata() to wake up and exit. If there is a second process waiting for the idmap->idmap_mutex, then it may wake up and start a new call to request_key_with_auxdata(). If the call to idmap_pipe_downcall() from the first process has not yet finished calling nfs_idmap_complete_pipe_upcall_locked(), then we may end up triggering the WARN_ON_ONCE() in nfs_idmap_prepare_pipe_upcall(). The fix is to ensure that we clear idmap->idmap_upcall_data before calling nfs_idmap_instantiate(). Fixes: e9ab41b620e4 ("NFSv4: Clean up the legacy idmapper upcall") Signed-off-by: Trond Myklebust <[email protected]>
2022-07-12NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZEAnna Schumaker4-10/+31
Previously, we required this to value to be a power of 2 for UDP related reasons. This patch keeps the power of 2 rule for UDP but allows more flexibility for TCP and RDMA. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-12sunrpc: fix expiry of auth credsDan Aloni1-1/+1
Before this commit, with a large enough LRU of expired items (100), the loop skipped all the expired items and was entirely ineffectual in trimming the LRU list. Fixes: 95cd623250ad ('SUNRPC: Clean up the AUTH cache code') Signed-off-by: Dan Aloni <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-12nfs: fix port value parsingIan Kent1-1/+1
The valid values of nfs options port and mountport are 0 to USHRT_MAX. The fs parser will return a fail for port values that are negative and the sloppy option handling then returns success. But the sloppy option handling is meant to return success for invalid options not valid options with invalid values. Restricting the sloppy option override to handle failure returns for invalid options only is sufficient to resolve this problem. Changes: v2: utilize the return value from fs_parse() to resolve this problem instead of changing the parameter definitions. Suggested-by: Trond Myklebust <[email protected]> Signed-off-by: Ian Kent <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-12nfs: Replace kmap() with kmap_local_page()Fabio M. De Francesco1-2/+2
The use of kmap() is being deprecated in favor of kmap_local_page(). With kmap_local_page(), the mapping is per thread, CPU local and not globally visible. Furthermore, the mapping can be acquired from any context (including interrupts). Therefore, use kmap_local_page() in nfs_do_filldir() because this mapping is per thread, CPU local, and not globally visible. Suggested-by: Ira Weiny <[email protected]> Signed-off-by: Fabio M. De Francesco <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-12NFS: remove redundant code in nfs_file_write()ChenXiaoSong1-2/+0
filemap_fdatawait_range() will always return 0, after patch 6c984083ec24 ("NFS: Use of mapping_set_error() results in spurious errors"), it will not save the wb err in struct address_space->flags: result = filemap_fdatawait_range(file->f_mapping, ...) = 0 filemap_check_errors(mapping) = 0 test_bit(..., &mapping->flags) // flags is 0 Signed-off-by: ChenXiaoSong <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-12nfs/blocklayout: refactor block device openingChristoph Hellwig1-31/+11
Deduplicate the helpers to open a device node by passing a name prefix argument and using the same helper for both kinds of paths. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-07-12NFSv4.1: Handle NFS4ERR_DELAY replies to OP_SEQUENCE correctlyTrond Myklebust1-1/+0
Don't assume that the NFS4ERR_DELAY means that the server is processing this slot id. Fixes: 3453d5708b33 ("NFSv4.1: Avoid false retries when RPC calls are interrupted") Signed-off-by: Trond Myklebust <[email protected]>
2022-07-12NFSv4.1: Don't decrease the value of seq_nr_highest_sentTrond Myklebust1-3/+2
When we're trying to figure out what the server may or may not have seen in terms of request numbers, do not assume that requests with a larger number were missed, just because we saw a reply to a request with a smaller number. Fixes: 3453d5708b33 ("NFSv4.1: Avoid false retries when RPC calls are interrupted") Signed-off-by: Trond Myklebust <[email protected]>
2022-07-10NFS: Fix case insensitive renamesTrond Myklebust1-0/+4
For filesystems that are case insensitive and case preserving, we need to be able to rename from one case folded variant of the filename to another. Currently, if we have looked up the target filename before the call to rename, then we may have a hashed dentry with that target name in the dcache, causing the vfs to optimise away the rename. To avoid that, let's drop the target dentry, and leave it to the server to optimise away the rename if that is the correct thing to do. Signed-off-by: Trond Myklebust <[email protected]>
2022-07-10pNFS/files: Handle RDMA connection errors correctlyTrond Myklebust1-0/+2
The RPC/RDMA driver will return -EPROTO and -ENODEV as connection errors under certain circumstances. Make sure that we handle them correctly and avoid cycling forever in a LAYOUTGET/LAYOUTRETURN loop. Signed-off-by: Trond Myklebust <[email protected]>
2022-07-10pNFS/flexfiles: Report RDMA connection errors to the serverTrond Myklebust1-0/+4
The RPC/RDMA driver will return -EPROTO and -ENODEV as connection errors under certain circumstances. Make sure that we handle them and report them to the server. If not, we can end up cycling forever in a LAYOUTGET/LAYOUTRETURN loop. Fixes: a12f996d3413 ("NFSv4/pNFS: Use connections to a DS that are all of the same protocol family") Cc: [email protected] # 5.11.x Signed-off-by: Trond Myklebust <[email protected]>
2022-07-10Revert "pNFS: nfs3_set_ds_client should set NFS_CS_NOPING"Trond Myklebust1-1/+0
This reverts commit c6eb58435b98bd843d3179664a0195ff25adb2c3. If a transport is down, then we want to fail over to other transports if they are listed in the GETDEVICEINFO reply. Fixes: c6eb58435b98 ("pNFS: nfs3_set_ds_client should set NFS_CS_NOPING") Cc: [email protected] # 5.11.x Signed-off-by: Trond Myklebust <[email protected]>
2022-07-10SUNRPC: Fix an RPC/RDMA performance regressionTrond Myklebust3-12/+6
Use the standard gfp mask instead of using GFP_NOWAIT. The latter causes issues when under memory pressure. Signed-off-by: Trond Myklebust <[email protected]>
2022-07-10Linux 5.19-rc6Linus Torvalds1-1/+1