Currently all svc_create callers pass in NULL for the shutdown parm,
which then gets fixed up to be svc_rpcb_cleanup if the service uses
rpcbind.
Simplify this by instead having the only caller that requires it
(lockd) pass in svc_rpcb_cleanup and get rid of the special casing.
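A minimal sketch of the resulting call sites, assuming the svc_create()
signature of this era (program, bufsize, shutdown callback); the programs
and buffer sizes shown are illustrative:

    /* lockd: the one service that wants rpcbind cleanup asks for it */
    serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, svc_rpcb_cleanup);

    /* every other caller passes NULL; no special-cased fixup remains */
    serv = svc_create(&nfsd_program, NFSD_BUFSIZE, NULL);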
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
The way that svc_wake_up works is a bit inefficient. It walks all of the
available pools for a service and either wakes up a task in each one or
sets the SP_TASK_PENDING flag in each one.
When svc_wake_up is called, there is no need to wake up more than one
thread to do this work. In practice, only lockd currently uses this
function and it's single threaded anyway. Thus, this just boils down to
doing a wake up of a thread in pool 0 or setting a single flag.
Eliminate the for loop in this function and change it to just operate on
pool 0. Also update the comments that sit above it and get rid of some
code that has been commented out for years now.
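A sketch of the simplified function, assuming the names used in this
series (sv_pools, sp_threads, SP_TASK_PENDING) and the direct
wake_up_process() style the later patches adopt:

    void svc_wake_up(struct svc_serv *serv)
    {
        struct svc_pool *pool = &serv->sv_pools[0];  /* pool 0 only */
        struct svc_rqst *rqstp;

        spin_lock_bh(&pool->sp_lock);
        if (!list_empty(&pool->sp_threads)) {
            rqstp = list_first_entry(&pool->sp_threads,
                                     struct svc_rqst, rq_list);
            wake_up_process(rqstp->rq_task);
        } else {
            /* no idle thread; record that work is pending */
            set_bit(SP_TASK_PENDING, &pool->sp_flags);
        }
        spin_unlock_bh(&pool->sp_lock);
    }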
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
In a later patch, we'll want to be able to handle this flag without
holding the sp_lock. Change this field to an unsigned long flags
field, and declare a new flag in it that can be managed with atomic
bitops.
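A sketch of the conversion, with names taken from the text; the enum
declaration style is an assumption:

    /* in struct svc_pool: was a bool protected by sp_lock */
    unsigned long sp_flags;

    enum {
        SP_TASK_PENDING,   /* still work to do even if no xprt is queued */
    };

    /* now safe without holding sp_lock */
    set_bit(SP_TASK_PENDING, &pool->sp_flags);
    if (test_and_clear_bit(SP_TASK_PENDING, &pool->sp_flags))
        dprintk("svc: deferred work pending\n");   /* illustrative */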
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
In a later patch, we're going to need some atomic bit flags. Since the
field holding them will need to be an unsigned long, we mitigate the
space consumption by migrating some other bit flags to the new field.
Start with the rq_secure flag.
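The migration itself, sketched with assumed names:

    /* before: unsigned int rq_secure : 1;  (a one-bit bitfield) */

    /* after: one bit in the shared unsigned long rq_flags word */
    enum {
        RQ_SECURE,          /* request came in from a secure port */
    };

    set_bit(RQ_SECURE, &rqstp->rq_flags);     /* was rqstp->rq_secure = 1 */
    secure = test_bit(RQ_SECURE, &rqstp->rq_flags);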
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Mainly what I need is 860a0d9e511f "sunrpc: add some tracepoints in
svc_rqst handling functions", which subsequent server rpc patches from
jlayton depend on. I'm merging this later tag on the assumption that's
more likely to be a tested and stable point.
|
|
Currently, it leaks when the allocation fails.
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
All it does is indicate whether an xprt has already been deleted from
a list or not, which is unnecessary since we use list_del_init and it's
always set and checked under the sv_lock anyway.
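Since list_del_init() leaves the node pointing at itself, "already
deleted" can always be re-derived under the sv_lock; a sketch:

    spin_lock_bh(&serv->sv_lock);
    if (!list_empty(&xprt->xpt_list))
        list_del_init(&xprt->xpt_list);  /* idempotent once initialized */
    spin_unlock_bh(&serv->sv_lock);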
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Add a new directory hierarchy under the debugfs sunrpc/ directory:
sunrpc/
rpc_xprt/
<xprt id>/
Within that directory, we can put files that give info about the
xprts. We do have the (minor) problem that there is no succinct,
unique identifier for rpc_xprts. So we generate them synthetically
with a static atomic_t counter.
For now, this directory just holds an "info" file, but we may add
other files to it in the future.
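A sketch of the synthetic-id scheme; rpc_xprt_debugfs_dir and
xprt_info_fops are assumed names for the parent dentry and the info
file operations:

    static atomic_t cur_id;
    char name[16];

    /* no natural unique key for an rpc_xprt, so mint one */
    xprt->debugfs_id = atomic_inc_return(&cur_id);
    snprintf(name, sizeof(name), "%u", xprt->debugfs_id);
    xprt->debugfs = debugfs_create_dir(name, rpc_xprt_debugfs_dir);
    debugfs_create_file("info", 0400, xprt->debugfs, xprt, &xprt_info_fops);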
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
It's possible to get a dump of the RPC task queue by writing a value to
/proc/sys/sunrpc/rpc_debug. If you write any value to that file, you get
a dump of the RPC client task list into the log buffer. This is a rather
inconvenient interface however, and makes it hard to get immediate info
about the task queue.
Add a new directory hierarchy under debugfs:
sunrpc/
rpc_clnt/
<clientid>/
Within each clientid directory we create a new "tasks" file that will
dump info similar to what shows up in the log buffer, but with a few
small differences -- we avoid printing raw kernel addresses in favor of
symbolic names and the XID is also displayed.
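A sketch of what a seq_file show callback for "tasks" might look like;
the format string is illustrative:

    static int tasks_show(struct seq_file *f, void *v)
    {
        struct rpc_task *task = v;
        struct rpc_rqst *req = task->tk_rqstp;
        u32 xid = req ? be32_to_cpu(req->rq_xid) : 0;

        /* %ps prints a symbolic name instead of a raw kernel address */
        seq_printf(f, "%5u %04x %6d 0x%08x %ps\n",
                   task->tk_pid, task->tk_flags, task->tk_status,
                   xid, task->tk_action);
        return 0;
    }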
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Merge tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next
Pull NFS client RDMA changes for 3.19 from Anna Schumaker:
"NFS: Client side changes for RDMA
These patches include various bugfixes and cleanups for using NFS over RDMA,
including better error handling and performance improvements from pad
optimization.
Signed-off-by: Anna Schumaker <[email protected]>"
* tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
xprtrdma: Display async errors
xprtrdma: Enable pad optimization
xprtrdma: Re-write rpcrdma_flush_cqs()
xprtrdma: Refactor tasklet scheduling
xprtrdma: unmap all FMRs during transport disconnect
xprtrdma: Cap req_cqinit
xprtrdma: Return an errno from rpcrdma_register_external()
|
|
Merge tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next
Pull additional NFS client changes for 3.19 from Anna Schumaker:
"NFS: Generic client side changes from Chuck
These patches include fixes for iostats and SETCLIENTID, in addition to
cleaning up the nfs4_init_callback() function.
Signed-off-by: Anna Schumaker <[email protected]>"
* tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
NFS: Clean up nfs4_init_callback()
NFS: SETCLIENTID XDR buffer sizes are incorrect
SUNRPC: serialize iostats updates
|
|
Occasionally mountstats reports a negative retransmission rate.
Ensure that two RPCs completing concurrently don't confuse the sums
in the transport's op_metrics array.
Since pNFS filelayout can invoke rpc_count_iostats() on another
transport from xprt_release(), we can't rely on simply holding the
transport_lock in xprt_release(). There's nothing for it but hard
serialization. One spin lock per RPC operation should make this as
painless as it can be.
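In sketch form, assuming a spinlock embedded in each per-op entry:

    struct rpc_iostats {
        spinlock_t    om_lock;  /* one lock per RPC operation */
        unsigned long om_ops;   /* plus the timing sums, elided here */
    };

    spin_lock(&op_metrics->om_lock);
    op_metrics->om_ops++;     /* concurrent RPCs can no longer race the sums */
    spin_unlock(&op_metrics->om_lock);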
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
An async error upcall is a hard error, and should be reported in
the system log.
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
The Linux NFS/RDMA server used to reject NFSv3 WRITE requests when
pad optimization was enabled. That bug was fixed by commit
e560e3b510d2 ("svcrdma: Add zero padding if the client doesn't send
it").
We can now enable pad optimization on the client, which helps
performance and is supported now by both Linux and Solaris servers.
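The enabling change is plausibly just a default flip on the existing
xprtrdma module parameter (a sketch):

    /* was 0: clients zero-padded short write chunks themselves */
    static unsigned int xprt_rdma_pad_optimize = 1;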
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Currently rpcrdma_flush_cqs() attempts to avoid code duplication by
simply invoking rpcrdma_recvcq_upcall and rpcrdma_sendcq_upcall. That
approach has two problems:
1. rpcrdma_flush_cqs() can run concurrently with provider upcalls.
Both flush_cqs() and the upcalls were invoking ib_poll_cq() in
different threads using the same wc buffers (ep->rep_recv_wcs
and ep->rep_send_wcs), added by commit 1c00dd077654 ("xprtrmda:
Reduce calls to ib_poll_cq() in completion handlers").
During transport disconnect processing, this sometimes resulted
in the same reply getting added to the rpcrdma_tasklets_g list
more than once, which corrupted the list.
2. The upcall functions drain only a limited number of CQEs,
thanks to the poll budget added by commit 8301a2c047cc
("xprtrdma: Limit work done by completion handler").
Fixes: a7bc211ac926 ("xprtrdma: On disconnect, don't ignore ... ")
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=276
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Restore the separate function that schedules the reply handling
tasklet. I need to call it from two different paths.
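A sketch of the restored helper, using the global tasklet list and lock
that xprtrdma already has:

    static void rpcrdma_schedule_tasklet(struct list_head *sched_list)
    {
        unsigned long flags;

        spin_lock_irqsave(&rpcrdma_tk_lock_g, flags);
        list_splice_tail(sched_list, &rpcrdma_tasklets_g);
        spin_unlock_irqrestore(&rpcrdma_tk_lock_g, flags);
        tasklet_schedule(&rpcrdma_tasklet_g);
    }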
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
When using RPCRDMA_MTHCAFMR memory registration, after a few
transport disconnect / reconnect cycles, ib_map_phys_fmr() starts to
return EINVAL because the provider has exhausted its map pool.
Make sure that all FMRs are unmapped during transport disconnect,
and that ->send_request remarshals them during an RPC retransmit.
This resets the transport's MRs to ensure that none are leaked
during a disconnect.
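A sketch of the disconnect-side reset, assuming the buffer's rb_all
list of rpcrdma_mw entries:

    static void rpcrdma_reset_fmrs(struct rpcrdma_buffer *buf)
    {
        struct rpcrdma_mw *r;
        LIST_HEAD(l);
        int rc;

        list_for_each_entry(r, &buf->rb_all, mw_all) {
            INIT_LIST_HEAD(&l);
            list_add(&r->r.fmr->list, &l);
            rc = ib_unmap_fmr(&l);   /* return the map to the pool */
            if (rc)
                pr_err("RPC: ib_unmap_fmr failed %i\n", rc);
        }
    }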
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Recent work made FRMR registration and invalidation completions
unsignaled. This greatly reduces the adapter interrupt rate.
Every so often, however, a posted send Work Request is allowed to
signal. Otherwise, the provider's Work Queue will wrap and the
workload will hang.
The number of Work Requests that are allowed to remain unsignaled is
determined by the value of req_cqinit. Currently, this is set to the
size of the send Work Queue divided by two, minus 1.
For FRMR, the send Work Queue is the maximum number of concurrent
RPCs (currently 32) times the maximum number of Work Requests an
RPC might use (currently 7, though some adapters may need more).
For mlx4, this is 224 entries. This leaves completion signaling
disabled for 111 send Work Requests.
Some providers hold back dispatching Work Requests until a CQE is
generated. If completions are disabled, then no CQEs are generated
for quite some time, and that can stall the Work Queue.
I've seen this occur running xfstests generic/113 over NFSv4, where
eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM
because the Work Queue has overflowed. The connection is dropped
and re-established.
Cap the rep_cqinit setting so completions are not left turned off
for too long.
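The cap in sketch form, with the arithmetic above worked into comments;
RPCRDMA_MAX_UNSIGNALED_SENDS is an assumed name for the new limit:

    /* send queue: 32 concurrent RPCs x 7 WRs each = 224 entries,
     * so depth / 2 - 1 = 111 sends between forced completions */
    ep->rep_cqinit = ep->rep_attr.cap.max_send_wr / 2 - 1;
    if (ep->rep_cqinit > RPCRDMA_MAX_UNSIGNALED_SENDS)
        ep->rep_cqinit = RPCRDMA_MAX_UNSIGNALED_SENDS;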
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
The RPC/RDMA send_request method and the chunk registration code
expect an errno from the registration function. This allows
the upper layers to distinguish between a recoverable failure
(for example, temporary memory exhaustion) and a hard failure
(for example, a bug in the registration logic).
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
It's always set to the same value as CONFIG_TRACEPOINTS, so we can just
use that instead.
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
It's always set to whatever CONFIG_SUNRPC_DEBUG is, so just use that.
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Add tracepoints inside the main loop of xs_tcp_data_recv that allow
us to keep an eye on what's happening during each phase of it.
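A sketch of one such tracepoint in TRACE_EVENT form; the two fields
shown (XID and bytes copied) are assumptions about what each phase
would want to record:

    TRACE_EVENT(xs_tcp_data_recv,
        TP_PROTO(struct sock_xprt *xs),
        TP_ARGS(xs),

        TP_STRUCT__entry(
            __field(u32, xid)
            __field(unsigned long, copied)
        ),

        TP_fast_assign(
            __entry->xid = be32_to_cpu(xs->tcp_xid);
            __entry->copied = xs->tcp_copied;
        ),

        TP_printk("xid=0x%08x copied=%lu",
                  __entry->xid, __entry->copied)
    );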
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
...so we can keep track of when calls are sent and replies received.
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
...just around svc_send, svc_recv and svc_process for now.
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
In addition to nfsd bugfixes, there are some fixes in -rc5 for client
bugs that can interfere with my testing.
|
|
Both xprt_lookup_rqst() and xprt_complete_rqst() require that you
take the transport lock in order to avoid races with xprt_transmit().
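The contract, sketched the way a reply-handling caller uses it:

    spin_lock(&xprt->transport_lock);
    req = xprt_lookup_rqst(xprt, xid);    /* may return NULL */
    if (req)
        xprt_complete_rqst(req->rq_task, copied);
    spin_unlock(&xprt->transport_lock);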
Signed-off-by: Trond Myklebust <[email protected]>
Cc: [email protected]
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Bruce reported that he was seeing the following BUG pop:
BUG: sleeping function called from invalid context at mm/slab.c:2846
in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs
2 locks held by mount.nfs/4539:
#0: (nfs_clid_init_mutex){+.+.+.}, at: [<ffffffffa01c0a9a>] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4]
#1: (rcu_read_lock){......}, at: [<ffffffffa00e3185>] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
Preemption disabled at:[<ffffffff81a4f082>] printk+0x4d/0x4f
CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e9 #3393
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001
0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0
0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00
Call Trace:
[<ffffffff81a534cf>] dump_stack+0x4f/0x7c
[<ffffffff81097854>] __might_sleep+0x114/0x180
[<ffffffff8118e4f3>] __kmalloc+0x1a3/0x280
[<ffffffffa00e31d8>] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss]
[<ffffffffa00e3185>] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
[<ffffffffa006b438>] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc]
[<ffffffffa01b0469>] nfs4_proc_setclientid+0x199/0x380 [nfsv4]
[<ffffffffa01b04d0>] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4]
[<ffffffffa01bdf1a>] nfs40_discover_server_trunking+0xda/0x150 [nfsv4]
[<ffffffffa01bde45>] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4]
[<ffffffffa01c0acf>] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4]
[<ffffffffa01c8e24>] nfs4_init_client+0x104/0x2f0 [nfsv4]
[<ffffffffa01539b4>] nfs_get_client+0x314/0x3f0 [nfs]
[<ffffffffa0153780>] ? nfs_get_client+0xe0/0x3f0 [nfs]
[<ffffffffa01c83aa>] nfs4_set_client+0x8a/0x110 [nfsv4]
[<ffffffffa0069708>] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc]
[<ffffffffa01c9b2f>] nfs4_create_server+0x12f/0x390 [nfsv4]
[<ffffffffa01c1472>] nfs4_remote_mount+0x32/0x60 [nfsv4]
[<ffffffff81196489>] mount_fs+0x39/0x1b0
[<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
[<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
[<ffffffffa01c1396>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
[<ffffffffa01c1784>] nfs4_try_mount+0x44/0xc0 [nfsv4]
[<ffffffffa01549b7>] ? get_nfs_version+0x27/0x90 [nfs]
[<ffffffffa0161a2d>] nfs_fs_mount+0x47d/0xd60 [nfs]
[<ffffffff81a59c5e>] ? mutex_unlock+0xe/0x10
[<ffffffffa01606a0>] ? nfs_remount+0x430/0x430 [nfs]
[<ffffffffa01609c0>] ? nfs_clone_super+0x140/0x140 [nfs]
[<ffffffff81196489>] mount_fs+0x39/0x1b0
[<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
[<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
[<ffffffff811b5830>] do_mount+0x210/0xbe0
[<ffffffff811b54ca>] ? copy_mount_options+0x3a/0x160
[<ffffffff811b651f>] SyS_mount+0x6f/0xb0
[<ffffffff81a5c852>] system_call_fastpath+0x12/0x17
Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping
the rcu_read_lock before doing the allocation and then reacquiring it
and redoing the dereference before doing the copy. If we find that the
string has somehow grown in the meantime, we'll reallocate and try again.
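In outline, the fix looks like this (a sketch; the acceptor lives in
the RCU-protected gss context, and error paths are compressed):

    unsigned int len;
    char *string = NULL;
    struct xdr_netobj *acceptor;
again:
    rcu_read_lock();
    len = rcu_dereference(gss_cred->gc_ctx)->gc_acceptor.len;
    rcu_read_unlock();

    /* sleepable allocation, now outside the read-side critical section */
    string = kmalloc(len + 1, GFP_KERNEL);
    if (!string)
        return string;

    rcu_read_lock();
    acceptor = &rcu_dereference(gss_cred->gc_ctx)->gc_acceptor;
    if (len < acceptor->len) {
        rcu_read_unlock();    /* it grew; reallocate and retry */
        kfree(string);
        goto again;
    }
    memcpy(string, acceptor->data, acceptor->len);
    string[acceptor->len] = '\0';
    rcu_read_unlock();
    return string;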
Cc: <[email protected]> # v3.17+
Reported-by: "J. Bruce Fields" <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
The m->pool_to[] array has "maxpools" number of elements. It's
allocated in svc_pool_map_alloc_arrays() which we called earlier in the
function. This test should be >= instead of >.
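In sketch form (assuming the test is an assertion on the pool index):

    /* pool_to[] holds maxpools entries, so maxpools itself is out of range */
    BUG_ON(pidx >= m->maxpools);    /* previously: pidx > m->maxpools */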
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Reported-by: Andrea Arcangeli <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Thanks to Andrea Arcangeli for pointing out these checks are
obviously unnecessary given the preceding calculations.
Reported-by: Andrea Arcangeli <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Pull nfsd updates from Bruce Fields:
"Highlights:
- support the NFSv4.2 SEEK operation (allowing clients to support
SEEK_HOLE/SEEK_DATA), thanks to Anna.
- end the grace period early in a number of cases, mitigating a
long-standing annoyance, thanks to Jeff
- improve SMP scalability, thanks to Trond"
* 'for-3.18' of git://linux-nfs.org/~bfields/linux: (55 commits)
nfsd: eliminate "to_delegation" define
NFSD: Implement SEEK
NFSD: Add generic v4.2 infrastructure
svcrdma: advertise the correct max payload
nfsd: introduce nfsd4_callback_ops
nfsd: split nfsd4_callback initialization and use
nfsd: introduce a generic nfsd4_cb
nfsd: remove nfsd4_callback.cb_op
nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence
nfsd: fix nfsd4_cb_recall_done error handling
nfsd4: clarify how grace period ends
nfsd4: stop grace_time update at end of grace period
nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients
nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls
nfsd: serialize nfsdcltrack upcalls for a particular client
nfsd: pass extra info in env vars to upcalls to allow for early grace period end
nfsd: add a v4_end_grace file to /proc/fs/nfsd
lockd: add a /proc/fs/lockd/nlm_end_grace file
nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE
nfsd: remove redundant boot_time parm from grace_done client tracking op
...
|
|
* bugfixes:
NFSv4.1: Fix an NFSv4.1 state renewal regression
NFSv4: fix open/lock state recovery error handling
NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
NFS: Fabricate fscache server index key correctly
SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
nfs: fix duplicate proc entries
|
|
Svcrdma currently advertises 1MB, which is too large. The correct value
is the minimum of RPCSVC_MAXPAYLOAD and the max scatter-gather allowed
in an NFSRDMA IO chunk * the host page size. This bug is usually benign
because the Linux x64 NFSRDMA client already limits the payload size to
the correct value (64*4096 = 256KB). But if the Linux client is PPC64
with a 64KB page size, then the client will indeed use a payload size
that will overflow the server.
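The corrected limit, sketched with names used elsewhere in xprtrdma
(RPCRDMA_MAX_DATA_SEGS is the per-chunk scatter-gather cap):

    /* min of the generic RPC payload cap and what one RDMA chunk can
     * carry: 64 segments x 4096-byte pages = 256KB on x64 */
    #define RPCRDMA_MAXPAYLOAD min_t(unsigned int, RPCSVC_MAXPAYLOAD, \
                                     RPCRDMA_MAX_DATA_SEGS << PAGE_SHIFT)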
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
The flag RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT was introduced in order
to allow NFSv4 clients to disable resend timeouts. Since those timeouts
cause the RPC layer to break the connection, they mess up the duplicate
reply caches that remain indexed on the port number in NFSv4.
This patch includes the code that was missing in the original to
set the appropriate flag in struct rpc_clnt, when the caller of
rpc_create() sets RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT.
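The missing plumbing is small; a sketch of the rpc_create() side,
assuming the cl_noretranstimeo flag added by the original commit:

    if (args->flags & RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT)
        clnt->cl_noretranstimeo = 1;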
Fixes: 8a19a0b6cb2e (SUNRPC: Add RPC task and client level options to...)
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Now that nfs_release_page() doesn't block indefinitely, other deadlock
avoidance mechanisms aren't needed.
- it doesn't hurt for kswapd to block occasionally. If it doesn't
want to block it would clear __GFP_WAIT. The current_is_kswapd()
was only added to avoid deadlocks and we have a new approach for
that.
- memory allocation in the SUNRPC layer can very rarely try to
->releasepage() a page it is trying to handle. The deadlock
is removed as nfs_release_page() doesn't block indefinitely.
So we don't need to set PF_FSTRANS for sunrpc network operations any
more.
Signed-off-by: NeilBrown <[email protected]>
Acked-by: Jeff Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
If an iptables drop rule is added for an nfs server, the client can end up in
a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM
is ignored since the prior bits of the packet may have been successfully queued
and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request()
thinks that because some bits were queued it should return -EAGAIN. We then try
the request again and again, resulting in cpu spinning. Reproducer:
1) open a file on the nfs server '/nfs/foo' (mounted using udp)
2) iptables -A OUTPUT -d <nfs server ip> -j DROP
3) write to /nfs/foo
4) close /nfs/foo
5) iptables -D OUTPUT -d <nfs server ip> -j DROP
The softlockup occurs in step 4 above.
The previous patch, allows xs_sendpages() to return both a sent count and
any error values that may have occurred. Thus, if we get an -EPERM, return
that to the higher level code.
With this patch in place we can successfully abort the above sequence and
avoid the softlockup.
I also tried the above test case on an nfs mount on tcp and although the system
does not softlockup, I still ended up with the 'hung_task' firing after 120
seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix,
since -EPERM appears to get ignored much lower down in the stack and does not
propagate up to xs_sendpages(). This case is not quite as insidious as the
softlockup and it is not addressed here.
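A sketch of the UDP-side handling after the previous patch (queued
bytes now come back via a separate out-parameter; the control flow is
simplified here):

    int sent = 0;
    int status = xs_sendpages(transport->sock, xs_addr(xprt),
                              xprt->addrlen, xdr, 0, true, &sent);

    if (sent == xdr->len)
        return 0;              /* whole datagram queued */
    if (status == -EPERM)
        return status;         /* hard error: don't spin on retries */
    return sent > 0 ? -EAGAIN : status;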
Reported-by: Yigong Lou <[email protected]>
Signed-off-by: Jason Baron <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
If an error is returned after the first bits of a packet have already been
successfully queued, xs_sendpages() will return a positive 'int' value
indicating success. Callers seem to treat this as -EAGAIN.
However, there are cases where its not a question of waiting for the write
queue to drain. For example, when there is an iptables rule dropping packets
to the destination, the lower level code can return -EPERM only after parts
of the packet have been successfully queued. In this case, we can end up
continuously retrying resulting in a kernel softlockup.
This patch is intended to make no change in behavior; it is preparation for
subsequent patches that can make decisions based both on the number of bytes
sent by xs_sendpages() and on any errors that may have been returned.
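A sketch of the signature change (the sent_p out-parameter is the
addition):

    /* before: one int carried both the byte count and any error */
    static int xs_sendpages(struct socket *sock, struct sockaddr *addr,
                            int addrlen, struct xdr_buf *xdr,
                            unsigned int base, bool zerocopy);

    /* after: bytes queued come back via *sent_p, and the return value
     * is 0 or a negative errno */
    static int xs_sendpages(struct socket *sock, struct sockaddr *addr,
                            int addrlen, struct xdr_buf *xdr,
                            unsigned int base, bool zerocopy, int *sent_p);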
Signed-off-by: Jason Baron <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
When aborting a connection to preserve source ports, don't wake the task in
xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the
connection needs to be re-established since it preserves the task's status
instead of setting it to the status of the aborting kernel_connect().
This may also avoid a potential conflict on the socket's lock.
Signed-off-by: Benjamin Coddington <[email protected]>
Cc: [email protected] # 3.14+
Signed-off-by: Trond Myklebust <[email protected]>
|
|
When attempting to establish a local ephemeral endpoint for a TCP or UDP
socket, do not explicitly call bind; instead, let it happen implicitly when
the socket is first used.
The main motivating factor for this change is when TCP runs out of unique
ephemeral ports (i.e. cannot find any ephemeral ports which are not a part of
*any* TCP connection). In this situation if you explicitly call bind, then the
call will fail with EADDRINUSE. However, if you allow the allocation of an
ephemeral port to happen implicitly as part of connect (or other functions),
then ephemeral ports can be reused, so long as the combination of (local_ip,
local_port, remote_ip, remote_port) is unique for TCP sockets on the system.
This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP
sockets the same.
This can allow mount.nfs(8) to continue to function successfully, even in the
face of misbehaving applications which are creating a large number of TCP
connections.
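A sketch of the ephemeral-port case in xs_bind(); the reserved-port
path stays as it was:

    /* port 0 means "any ephemeral port": skip the explicit bind and let
     * connect() assign one, so a port can be shared across distinct
     * (local ip, local port, remote ip, remote port) tuples */
    if (port == 0)
        return 0;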
Signed-off-by: Chris Perl <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
xprt_lookup_rqst() and bc_send_request() display a byte-swapped XID,
but receive_cb_reply() does not.
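Plausibly a one-line change at the dprintk (a sketch):

    /* match the other sites: show the XID in host byte order */
    dprintk("%s: xid %08x\n", __func__, be32_to_cpu(req->rq_xid));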
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
current_task appears to be x86-only, oops.
Let's just delete this check entirely:
Any developer that adds a new user without setting rq_task will get a
crash the first time they test it. I also don't think there are
normally any important locks held here, and I can't see any other reason
why killing a server thread would bring the whole box down.
So the effort to fail gracefully here looks like overkill.
Reported-by: Stephen Rothwell <[email protected]>
Fixes: 983c684466e0 "SUNRPC: get rid of the request wait queue"
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
We really do not want to do ioctls in the server's fast path. Instead, let's
use the fact that we managed to read a full record as the indicator that
we should try to read the socket again.
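The heuristic in sketch form; the record-length field name is an
assumption:

    /* a whole record was consumed, so more data may already be waiting:
     * re-arm the transport instead of asking the socket via ioctl */
    if (len == svsk->sk_reclen)
        set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);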
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Just move the transport locking out of the spin lock protected area
altogether.
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
We should definitely not be exiting svc_get_next_xprt() with the
thread enqueued. Fix this by ensuring that we fall through to
the dequeue.
Also move the test itself outside the spin lock protected section.
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
We're always _only_ waking up tasks from within the sp_threads list, so
we know that they are enqueued and alive. The rq_wait waitqueue is just
a distraction with extra atomic semantics.
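The wake path then reduces to a direct thread wake (a sketch):

    /* sp_threads entries are known alive and parked in the scheduler,
     * so no waitqueue handshake is needed */
    rqstp = list_first_entry(&pool->sp_threads, struct svc_rqst, rq_list);
    list_del_init(&rqstp->rq_list);
    wake_up_process(rqstp->rq_task);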
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|