Age  Commit message  Author  Files  Lines
2014-12-09fs: nfsd: Fix signedness bug in compare_blobRasmus Villemoes1-8/+7
Bugs similar to the one in acbbe6fbb240 (kcmp: fix standard comparison bug) are in rich supply.

In this variant, the problem is that struct xdr_netobj::len has type unsigned int, so the expression o1->len - o2->len _also_ has type unsigned int; it has completely well-defined semantics, and the result is some non-negative integer, which is always representable in a long long. But this means that if the conditional triggers, we are guaranteed to return a positive value from compare_blob.

In this case it could be fixed by

  -       res = o1->len - o2->len;
  +       res = (long long)o1->len - (long long)o2->len;

but I'd rather eliminate the usually broken 'return a - b;' idiom.

Reviewed-by: Jeff Layton <[email protected]>
Cc: <[email protected]>
Signed-off-by: Rasmus Villemoes <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
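The signedness problem described above can be reproduced in userspace. The following is a minimal sketch, not the kernel code: `struct blob` is a hypothetical stand-in for struct xdr_netobj, and the two function names are made up for illustration. It assumes long long is wider than unsigned int, which holds on common platforms.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct xdr_netobj; only the fields used here. */
struct blob {
    unsigned int len;
    const unsigned char *data;
};

/* Buggy variant: the subtraction is carried out in unsigned int, so the
 * value stored in res is never negative. A shorter o1 compares as "greater". */
int compare_blob_buggy(const struct blob *o1, const struct blob *o2)
{
    long long res = o1->len - o2->len;

    if (res)
        return res > 0 ? 1 : -1;
    return memcmp(o1->data, o2->data, o1->len);
}

/* Comparison-based variant: no subtraction, so nothing can wrap around. */
int compare_blob_fixed(const struct blob *o1, const struct blob *o2)
{
    assert(o1 != NULL && o2 != NULL);

    if (o1->len < o2->len)
        return -1;
    if (o1->len > o2->len)
        return 1;
    return memcmp(o1->data, o2->data, o1->len);
}
```

With a 1-byte blob compared against a 2-byte blob, the buggy variant reports "greater" while the comparison-based variant reports "less".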
2014-12-09sunrpc: add some tracepoints around enqueue and dequeue of svc_xprtJeff Layton2-7/+109
These were useful when I was tracking down a race condition between svc_xprt_do_enqueue and svc_get_next_xprt. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-12-09sunrpc: convert to lockless lookup of queued server threadsJeff Layton4-103/+132
Testing has shown that the pool->sp_lock can be a bottleneck on a busy server. Every time data is received on a socket, the server must take that lock in order to dequeue a thread from the sp_threads list.

Address this problem by eliminating the sp_threads list (which contains threads that are currently idle) and replacing it with a RQ_BUSY flag in svc_rqst. This allows us to walk the sp_all_threads list under the rcu_read_lock and find a suitable thread for the xprt by doing a test_and_set_bit.

Note that we do still have a potential atomicity problem however with this approach. We don't want svc_xprt_do_enqueue to set the rqst->rq_xprt pointer unless a test_and_set_bit of RQ_BUSY returned zero (which indicates that the thread was idle). But, by the time we check that, the bit could be flipped by a waking thread.

To address this, we acquire a new per-rqst spinlock (rq_lock) and take that before doing the test_and_set_bit. If that returns false, then we can set rq_xprt and drop the spinlock. Then, when the thread wakes up, it must set the bit under the same spinlock and can trust that if it was already set then the rq_xprt is also properly set.

With this scheme, the case where we have an idle thread no longer needs to take the highly contended pool->sp_lock at all, and that removes the bottleneck.

That still leaves one issue: What of the case where we walk the whole sp_all_threads list and don't find an idle thread? Because the search is lockless, it's possible for the queueing to race with a thread that is going to sleep. To address that, we queue the xprt and then search again.

If we find an idle thread at that point, we can't attach the xprt to it directly since that might race with a different thread waking up and finding it. All we can do is wake the idle thread back up and let it attempt to find the now-queued xprt.

Signed-off-by: Jeff Layton <[email protected]>
Tested-by: Chris Worley <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
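The core of the idle-thread search can be sketched in userspace with C11 atomics. This is only an analogue under stated assumptions: `struct worker`, `test_and_set_busy` and `claim_idle_worker` are hypothetical names, `atomic_fetch_or` stands in for the kernel's test_and_set_bit, and the RCU list walk, the per-rqst rq_lock and the "queue, then search again" fallback are all left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RQ_BUSY_BIT 0   /* bit number; mirrors the RQ_BUSY flag named above */

/* Hypothetical stand-in for svc_rqst. */
struct worker {
    atomic_ulong flags;
    void *xprt;         /* work item handed to this worker */
};

/* Userspace analogue of test_and_set_bit(): sets the busy bit atomically
 * and returns nonzero if it was already set. */
int test_and_set_busy(struct worker *w)
{
    unsigned long mask = 1UL << RQ_BUSY_BIT;

    return (atomic_fetch_or(&w->flags, mask) & mask) != 0;
}

/* Walk all workers and claim the first idle one; returns NULL if every
 * worker is busy. */
struct worker *claim_idle_worker(struct worker *all, size_t n, void *xprt)
{
    assert(all != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!test_and_set_busy(&all[i])) {
            /* In the patch this assignment is made under rq_lock. */
            all[i].xprt = xprt;
            return &all[i];
        }
    }
    return NULL;
}
```

Because the claim is a single atomic read-modify-write, two enqueuers can never both win the same idle worker, which is what lets the common path avoid the pool-wide lock.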
2014-12-09sunrpc: fix potential races in pool_stats collectionJeff Layton2-9/+9
In a later patch, we'll be removing some spinlocking around the socket and thread queueing code in order to fix some contention problems. At that point, the stats counters will no longer be protected by the sp_lock. Change the counters to atomic_long_t fields, except for the "sockets_queued" counter which will still be manipulated under a spinlock. Signed-off-by: Jeff Layton <[email protected]> Tested-by: Chris Worley <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-12-09sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free itJeff Layton3-5/+10
...also make the manipulation of sp_all_threads list use RCU-friendly functions. Signed-off-by: Jeff Layton <[email protected]> Tested-by: Chris Worley <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-12-09sunrpc: require svc_create callers to pass in meaningful shutdown routineJeff Layton2-4/+1
Currently all svc_create callers pass in NULL for the shutdown parm, which then gets fixed up to be svc_rpcb_cleanup if the service uses rpcbind. Simplify this by instead having the only caller that requires it (lockd) pass in svc_rpcb_cleanup and get rid of the special casing. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-12-09sunrpc: have svc_wake_up only deal with pool 0Jeff Layton1-21/+16
The way that svc_wake_up works is a bit inefficient. It walks all of the available pools for a service and either wakes up a task in each one or sets the SP_TASK_PENDING flag in each one. When svc_wake_up is called, there is no need to wake up more than one thread to do this work. In practice, only lockd currently uses this function and it's single threaded anyway. Thus, this just boils down to doing a wake up of a thread in pool 0 or setting a single flag. Eliminate the for loop in this function and change it to just operate on pool 0. Also update the comments that sit above it and get rid of some code that has been commented out for years now. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-12-09sunrpc: convert sp_task_pending flag to use atomic bitopsJeff Layton2-5/+6
In a later patch, we'll want to be able to handle this flag without holding the sp_lock. Change this field to an unsigned long flags field, and declare a new flag in it that can be managed with atomic bitops. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-12-09sunrpc: move rq_cachetype field to better optimize spaceJeff Layton1-1/+1
There are a couple of holes in the svc_rqst field on x86_64. Move the rq_cachetype to a different location to eliminate both of them. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
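How moving one field can close two holes is easiest to see on a toy struct. The layouts below are hypothetical and are not the real svc_rqst fields; the padding noted in the comments is what a typical x86_64 ABI produces and may differ elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout with two holes on a typical x86_64 ABI. */
struct with_holes {
    int   a;    /* 4 bytes, then typically 4 bytes of padding before p */
    void *p;
    int   b;    /* 4 bytes, then typically 4 bytes of padding before q */
    void *q;
};

/* Same members, with b moved next to a so that it fills the first hole. */
struct repacked {
    int   a;
    int   b;
    void *p;
    void *q;
};

/* Bytes saved by the reordering on the current ABI (may be 0 on some). */
size_t bytes_saved(void)
{
    assert(sizeof(struct repacked) <= sizeof(struct with_holes));
    return sizeof(struct with_holes) - sizeof(struct repacked);
}
```

On kernel builds, a tool such as pahole is the usual way to find holes like these in real structures.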
2014-12-09sunrpc: move rq_splice_ok flag into rq_flagsJeff Layton7-12/+13
Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-12-09sunrpc: move rq_dropme flag into rq_flagsJeff Layton5-9/+9
Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-12-09sunrpc: move rq_usedeferral flag to rq_flagsJeff Layton5-9/+10
Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-12-09sunrpc: move rq_local field to rq_flagsJeff Layton4-5/+9
Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-12-09sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to itJeff Layton5-10/+22
In a later patch, we're going to need some atomic bit flags. Since that field will need to be an unsigned long, we mitigate that space consumption by migrating some other bitflags to the new field. Start with the rq_secure flag. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
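Folding several boolean fields into one unsigned long that is manipulated with atomic bit operations might look roughly like this in userspace. This is a sketch with C11 atomics, not the kernel's set_bit/clear_bit/test_bit; the enumerator names are assumptions modeled on the field names in this log, and the `rq_*_bit` helpers are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers; names are assumed from the fields moved in this series. */
enum { RQ_SECURE, RQ_LOCAL, RQ_USEDEFERRAL, RQ_DROPME, RQ_SPLICE_OK };

/* Hypothetical stand-in for svc_rqst with a single flags word. */
struct request {
    atomic_ulong rq_flags;
};

static bool valid_bit(int nr)
{
    return nr >= 0 && nr < (int)(sizeof(unsigned long) * CHAR_BIT);
}

/* Atomically set bit nr. */
void rq_set_bit(int nr, struct request *r)
{
    assert(valid_bit(nr));
    atomic_fetch_or(&r->rq_flags, 1UL << nr);
}

/* Atomically clear bit nr. */
void rq_clear_bit(int nr, struct request *r)
{
    assert(valid_bit(nr));
    atomic_fetch_and(&r->rq_flags, ~(1UL << nr));
}

/* Read bit nr. */
bool rq_test_bit(int nr, struct request *r)
{
    assert(valid_bit(nr));
    return (atomic_load(&r->rq_flags) >> nr) & 1UL;
}
```

Each helper touches only its own bit, so flags owned by different code paths can share the word without a lock.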
2014-12-09Merge tag 'nfs-for-3.19-1' into nfsd for-3.19 branchJ. Bruce Fields271-1021/+2556
Mainly what I need is 860a0d9e511f "sunrpc: add some tracepoints in svc_rqst handling functions", which subsequent server rpc patches from jlayton depend on. I'm merging this later tag on the assumption that's more likely to be a tested and stable point.
2014-12-01nfsd: minor off by one checks in __write_versions()Dan Carpenter1-3/+3
My static checker complains that if "len == remaining" then it means we have truncated the last character off the version string. The intent of the code is that we print as many versions as we can without truncating a version. Then we put a newline at the end. If the newline can't fit we return -EINVAL. Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
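The boundary condition is easy to get wrong, so here is a small userspace sketch of the intended behavior. It is not the kernel's __write_versions(): the function name, the "+N " token format and the -1 return (standing in for -EINVAL) are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append as many "+<version> " tokens as fit without truncating one, then
 * terminate with a newline. Returns the number of bytes written, or -1 if
 * the newline does not fit. */
int write_versions(char *buf, size_t size, const int *vers, size_t nvers)
{
    char *p = buf;
    size_t remaining = size;

    assert(buf != NULL && vers != NULL);

    for (size_t i = 0; i < nvers; i++) {
        int len = snprintf(p, remaining, "+%d ", vers[i]);

        /* snprintf returns the length it wanted to write, excluding the
         * NUL. len == remaining means the last character was cut off, so
         * the check has to be >=, not >. */
        if (len < 0 || (size_t)len >= remaining)
            break;
        p += len;
        remaining -= (size_t)len;
    }

    /* Need room for '\n' plus the terminating NUL. */
    if (remaining < 2)
        return -1;
    p[0] = '\n';
    p[1] = '\0';
    return (int)(p + 1 - buf);
}
```

With a 6-byte buffer and versions 2, 3, 4, the second token would need exactly the 3 remaining bytes and lose its last character to the NUL; the >= check stops before it and the result is "+2 " followed by a newline.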
2014-12-01sunrpc: release svc_pool_map reference when serv allocation failsJeff Layton1-5/+7
Currently, it leaks when the allocation fails. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2014-12-01sunrpc: eliminate the XPT_DETACHED flagJeff Layton2-7/+4
All it does is indicate whether a xprt has already been deleted from a list or not, which is unnecessary since we use list_del_init and it's always set and checked under the sv_lock anyway. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
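The reason a separate "already deleted" flag is redundant can be shown with a minimal userspace re-implementation of the list helpers. This is a sketch modeled on the kernel's circular doubly linked list, not the kernel header itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* An initialized node points at itself, which is also the "empty" state. */
void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink entry and reinitialize it, so that a later list_empty(entry) is
 * true and a repeated list_del_init(entry) is harmless. */
void list_del_init(struct list_head *entry)
{
    assert(entry != NULL && entry->next != NULL && entry->prev != NULL);
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    init_list_head(entry);
}
```

After list_del_init the node itself records that it is off the list, so list_empty() on the node answers the question the flag used to answer.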
2014-11-27sunrpc: add a debugfs rpc_xprt directory with an info file in itJeff Layton4-7/+134
Add a new directory hierarchy under the debugfs sunrpc/ directory:

    sunrpc/
        rpc_xprt/
            <xprt id>/

Within that directory, we can put files that give info about the xprts. We do have the (minor) problem that there is no succinct, unique identifier for rpc_xprts. So we generate them synthetically with a static atomic_t counter.

For now, this directory just holds an "info" file, but we may add other files to it in the future.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
2014-11-27sunrpc: add debugfs file for displaying client rpc_task queueJeff Layton7-1/+245
It's possible to get a dump of the RPC task queue by writing a value to /proc/sys/sunrpc/rpc_debug. If you write any value to that file, you get a dump of the RPC client task list into the log buffer. This is a rather inconvenient interface however, and makes it hard to get immediate info about the task queue.

Add a new directory hierarchy under debugfs:

    sunrpc/
        rpc_clnt/
            <clientid>/

Within each clientid directory we create a new "tasks" file that will dump info similar to what shows up in the log buffer, but with a few small differences -- we avoid printing raw kernel addresses in favor of symbolic names and the XID is also displayed.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
2014-11-26Merge tag 'nfs-rdma-for-3.19' of ↵Trond Myklebust3-17/+107
git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next

Pull NFS client RDMA changes for 3.19 from Anna Schumaker:

"NFS: Client side changes for RDMA

These patches include various bugfixes and cleanups for using NFS over RDMA, including better error handling and performance improvements by using pad optimization.

Signed-off-by: Anna Schumaker <[email protected]>"

* tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
  xprtrdma: Display async errors
  xprtrdma: Enable pad optimization
  xprtrdma: Re-write rpcrdma_flush_cqs()
  xprtrdma: Refactor tasklet scheduling
  xprtrdma: unmap all FMRs during transport disconnect
  xprtrdma: Cap req_cqinit
  xprtrdma: Return an errno from rpcrdma_register_external()
2014-11-26Merge tag 'nfs-cel-for-3.19' of ↵Trond Myklebust4-26/+39
git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next

Pull additional NFS client changes for 3.19 from Anna Schumaker:

"NFS: Generic client side changes from Chuck

These patches include fixes for iostats and SETCLIENTID in addition to cleaning up the nfs4_init_callback() function.

Signed-off-by: Anna Schumaker <[email protected]>"

* tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
  NFS: Clean up nfs4_init_callback()
  NFS: SETCLIENTID XDR buffer sizes are incorrect
  SUNRPC: serialize iostats updates
2014-11-25nfs: Add DEALLOCATE supportAnna Schumaker8-2/+91
This patch adds support for using the NFS v4.2 operation DEALLOCATE to punch holes in a file. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-25nfs: Add ALLOCATE supportAnna Schumaker11-1/+183
This patch adds support for using the NFS v4.2 operation ALLOCATE to preallocate data in a file. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-25NFS: Clean up nfs4_init_callback()Chuck Lever1-17/+14
nfs4_init_callback() is never invoked for NFS versions other than 4. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-11-25NFS: SETCLIENTID XDR buffer sizes are incorrectChuck Lever1-4/+6
Use the correct calculation of the maximum size of a clientaddr4 when encoding and decoding SETCLIENTID operations. clientaddr4 is defined in section 2.2.10 of RFC3530bis-31. The usage in encode_setclientid_maxsz is missing the 4-byte length in both strings, but is otherwise correct. decode_setclientid_maxsz simply asks for a page of receive buffer space, which is unnecessarily large (more than 4KB). Note that a SETCLIENTID reply is either clientid+verifier, or clientaddr4, depending on the returned NFS status. It doesn't hurt to allocate enough space for both. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-11-25SUNRPC: serialize iostats updatesChuck Lever2-5/+19
Occasionally mountstats reports a negative retransmission rate. Ensure that two RPCs completing concurrently don't confuse the sums in the transport's op_metrics array. Since pNFS filelayout can invoke rpc_count_iostats() on another transport from xprt_release(), we can't rely on simply holding the transport_lock in xprt_release(). There's nothing for it but hard serialization. One spin lock per RPC operation should make this as painless as it can be. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-11-25xprtrdma: Display async errorsChuck Lever1-4/+32
An async error upcall is a hard error, and should be reported in the system log. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-11-25xprtrdma: Enable pad optimizationChuck Lever1-1/+1
The Linux NFS/RDMA server used to reject NFSv3 WRITE requests when pad optimization was enabled. That bug was fixed by commit e560e3b510d2 ("svcrdma: Add zero padding if the client doesn't send it"). We can now enable pad optimization on the client, which helps performance and is supported now by both Linux and Solaris servers. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-11-25xprtrdma: Re-write rpcrdma_flush_cqs()Chuck Lever1-2/+9
Currently rpcrdma_flush_cqs() attempts to avoid code duplication, and simply invokes rpcrdma_recvcq_upcall and rpcrdma_sendcq_upcall.

1. rpcrdma_flush_cqs() can run concurrently with provider upcalls. Both flush_cqs() and the upcalls were invoking ib_poll_cq() in different threads using the same wc buffers (ep->rep_recv_wcs and ep->rep_send_wcs), added by commit 1c00dd077654 ("xprtrmda: Reduce calls to ib_poll_cq() in completion handlers"). During transport disconnect processing, this sometimes resulted in the same reply getting added to the rpcrdma_tasklets_g list more than once, which corrupted the list.

2. The upcall functions drain only a limited number of CQEs, thanks to the poll budget added by commit 8301a2c047cc ("xprtrdma: Limit work done by completion handler").

Fixes: a7bc211ac926 ("xprtrdma: On disconnect, don't ignore ... ")
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=276
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
2014-11-25xprtrdma: Refactor tasklet schedulingChuck Lever1-5/+12
Restore the separate function that schedules the reply handling tasklet. I need to call it from two different paths. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-11-25xprtrdma: unmap all FMRs during transport disconnectChuck Lever2-2/+42
When using RPCRDMA_MTHCAFMR memory registration, after a few transport disconnect / reconnect cycles, ib_map_phys_fmr() starts to return EINVAL because the provider has exhausted its map pool. Make sure that all FMRs are unmapped during transport disconnect, and that ->send_request remarshals them during an RPC retransmit. This resets the transport's MRs to ensure that none are leaked during a disconnect. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-11-25xprtrdma: Cap req_cqinitChuck Lever2-1/+9
Recent work made FRMR registration and invalidation completions unsignaled. This greatly reduces the adapter interrupt rate.

Every so often, however, a posted send Work Request is allowed to signal. Otherwise, the provider's Work Queue will wrap and the workload will hang.

The number of Work Requests that are allowed to remain unsignaled is determined by the value of req_cqinit. Currently, this is set to the size of the send Work Queue divided by two, minus 1.

For FRMR, the send Work Queue is the maximum number of concurrent RPCs (currently 32) times the maximum number of Work Requests an RPC might use (currently 7, though some adapters may need more).

For mlx4, this is 224 entries. This leaves completion signaling disabled for 111 send Work Requests.

Some providers hold back dispatching Work Requests until a CQE is generated. If completions are disabled, then no CQEs are generated for quite some time, and that can stall the Work Queue.

I've seen this occur running xfstests generic/113 over NFSv4, where eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM because the Work Queue has overflowed. The connection is dropped and re-established.

Cap the rep_cqinit setting so completions are not left turned off for too long.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
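The arithmetic in this message can be checked with a few lines of C. The function below is a hypothetical illustration, not the xprtrdma code; in particular the `cap` parameter is an assumption, since the message does not state which limit the patch applies.

```c
#include <assert.h>

/* Number of send Work Requests allowed to stay unsignaled: half the send
 * Work Queue depth, minus one, limited to `cap` (hypothetical parameter). */
int cqinit_budget(int max_rpcs, int wrs_per_rpc, int cap)
{
    assert(max_rpcs > 0 && wrs_per_rpc > 0 && cap >= 0);

    int send_wq_depth = max_rpcs * wrs_per_rpc;  /* 32 * 7 = 224 above */
    int budget = send_wq_depth / 2 - 1;          /* 224 / 2 - 1 = 111 */

    if (budget > cap)
        budget = cap;
    if (budget < 0)
        budget = 0;
    return budget;
}
```

With 32 concurrent RPCs and 7 Work Requests each, an effectively unlimited cap gives the 111 quoted in the message, and any smaller cap bounds how long completions stay disabled.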
2014-11-25xprtrdma: Return an errno from rpcrdma_register_external()Chuck Lever1-2/+2
The RPC/RDMA send_request method and the chunk registration code expect an errno from the registration function. This allows the upper layers to distinguish between a recoverable failure (for example, temporary memory exhaustion) and a hard failure (for example, a bug in the registration logic). Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-11-24nfs: define nfs_inc_fscache_stats and using it as possibleLi RongQing2-12/+17
Define nfs_inc_fscache_stats and use it wherever a counter is incremented by one, which saves passing one parameter. Signed-off-by: Li RongQing <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24nfs: replace nfs_add_stats with nfs_inc_stats when add oneLi RongQing2-2/+2
Signed-off-by: Li RongQing <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24NFS: Deletion of unnecessary checks before the function call "nfs_put_client"Markus Elfring2-12/+6
The nfs_put_client() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24sunrpc: eliminate RPC_TRACEPOINTSJeff Layton3-13/+4
It's always set to the same value as CONFIG_TRACEPOINTS, so we can just use that instead. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24sunrpc: eliminate RPC_DEBUGJeff Layton32-56/+53
It's always set to whatever CONFIG_SUNRPC_DEBUG is, so just use that. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24lockd: eliminate LOCKD_DEBUGJeff Layton2-6/+2
LOCKD_DEBUG is always the same value as CONFIG_SUNRPC_DEBUG, so we can just use it instead. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24nfs41: fix nfs4_proc_layoutget error handlingPeng Tao1-3/+3
nfs4_layoutget_release() drops layout hdr refcnt. Grab the refcnt early so that it is safe to call .release in case nfs4_alloc_pages fails. Signed-off-by: Peng Tao <[email protected]> Fixes: a47970ff78147 ("NFSv4.1: Hold reference to layout hdr in layoutget") Cc: [email protected] # 3.9+ Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24NFS: fix subtle change in COMMIT behaviorWeston Andros Adamson5-15/+27
Recent work in the pgio layer made it possible for there to be more than one request per page. This caused a subtle change in commit behavior, because write.c:nfs_commit_unstable_pages compares the number of *pages* waiting for writeback against the number of requests on a commit list to choose when to send a COMMIT in a non-blocking flush. This is probably hard to hit in normal operation - you have to be using rsize/wsize < PAGE_SIZE, or pnfs with lots of boundaries that are not page aligned to have a noticeable change in behavior. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24pnfs/blocklayout: fix end calculation in pnfs_num_cont_bytesChristoph Hellwig1-1/+1
Use the number of pages in the pagecache mapping instead of the number of pnfs requests which is only slightly related. Reported-by: Weston Andros Adamson <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24sunrpc: add tracepoints in xs_tcp_data_recvJeff Layton3-60/+104
Add tracepoints inside the main loop on xs_tcp_data_recv that allow us to keep an eye on what's happening during each phase of it. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24sunrpc: add new tracepoints in xprt handling codeJeff Layton3-2/+78
...so we can keep track of when calls are sent and replies received. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24sunrpc: add some tracepoints in svc_rqst handling functionsJeff Layton3-19/+88
...just around svc_send, svc_recv and svc_process for now. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24NFS: Use nfs_server_capable() for checknig NFS_CAP_SEEKAnna Schumaker1-1/+1
This should make the code easier to maintain in the future. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-24nfs: Remove dead case from nfs4_map_errors()Jan Kara1-2/+0
NFS4ERR_ACCESS has number 13 and thus is matched and returned immediately at the beginning of nfs4_map_errors() and there's no point in checking it later. Coverity-id: 733891 Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-11-23Linux 3.18-rc6Linus Torvalds1-1/+1
2014-11-23uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUMEAndy Lutomirski2-2/+1
x86 calls do_notify_resume on paranoid returns if TIF_UPROBE is set but not on non-paranoid returns. I suspect that this is a mistake and that the code only works because int3 is paranoid. Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME from the uprobes code. Reported-by: Oleg Nesterov <[email protected]> Acked-by: Srikar Dronamraju <[email protected]> Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>