Age | Commit message | Author | Files | Lines
2014-06-04 | SUNRPC: Move congestion window constants to header file | Chuck Lever | 2 | -19/+15
I would like to use one of the RPC client's congestion algorithm constants in transport-specific code. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: Reset connection timeout after successful reconnect | Chuck Lever | 1 | -0/+1
If the new connection is able to make forward progress, reset the re-establish timeout. Otherwise it keeps growing even if disconnect events are rare. The same behavior as TCP is adopted: reconnect immediately if the transport instance has been able to make some forward progress. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: Use macros for reconnection timeout constants | Chuck Lever | 1 | -7/+12
Clean up: Ensure the same max and min constant values are used everywhere when setting reconnect timeouts. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: Allocate missing pagelist | Shirley Ma | 1 | -0/+6
GETACL relies on the transport layer to allocate memory for the reply buffer. However, xprtrdma assumes that the reply buffer (pagelist) has been pre-allocated by the upper layer. This problem was reported by the IOL OFA lab test on PPC. Signed-off-by: Shirley Ma <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Tested-by: Edward Mossman <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: Remove Tavor MTU setting | Chuck Lever | 1 | -14/+0
Clean up. Remove HCA-specific clutter in xprtrdma, which is supposed to be device-independent. Hal Rosenstock <[email protected]> observes: > Note that there is OpenSM option (enable_quirks) to return 1K MTU > in SA PathRecord responses for Tavor so that can be used for this. > The default setting for enable_quirks is FALSE so that would need > changing. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting | Chuck Lever | 1 | -9/+20
Devesh Sharma <[email protected]> reports that after a disconnect, his HCA is failing to create a fresh QP, leaving ia->ri_id->qp set to NULL. But xprtrdma still allows RPCs to wake up and post LOCAL_INV as they exit, causing an oops. rpcrdma_ep_connect() is allowing the wake-up by leaking the QP creation error code (-EPERM in this case) to the RPC client's generic layer. xprt_connect_status() does not recognize -EPERM, so it kills pending RPC tasks immediately rather than retrying the connect. Re-arrange the QP creation logic so that when it fails on reconnect, it leaves ->qp pointing at the old QP rather than NULL. If pending RPC tasks wake and exit, LOCAL_INV work requests will flush rather than oops. On initial connect, leaving ->qp == NULL is OK, since there are no pending RPCs that might use ->qp. But be sure not to try to destroy a NULL QP when rpcrdma_ep_connect() is retried. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
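For illustration, a minimal sketch of the reordered reconnect logic described above, assuming the new QP is created before the old one is torn down. make_new_cm_id() is a hypothetical helper; only the rdma_cm calls are the real API:

    #include <rdma/rdma_cm.h>
    #include <linux/err.h>

    /* Sketch only: on failure, the old id (and its QP) stays in place,
     * so pending LOCAL_INV work requests flush instead of oopsing on
     * a NULL qp pointer. */
    static int swap_qp_on_reconnect(struct rdma_cm_id **idp,
                                    struct ib_pd *pd,
                                    struct ib_qp_init_attr *attr)
    {
        struct rdma_cm_id *old_id = *idp, *id;
        int rc;

        id = make_new_cm_id();              /* hypothetical helper */
        if (IS_ERR(id))
            return PTR_ERR(id);

        rc = rdma_create_qp(id, pd, attr);
        if (rc) {
            rdma_destroy_id(id);
            return rc;                      /* *idp keeps the old QP */
        }

        if (old_id->qp)                     /* NULL on a retried initial connect */
            rdma_destroy_qp(old_id);
        rdma_destroy_id(old_id);
        *idp = id;
        return 0;
    }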
2014-06-04 | xprtrdma: Reduce the number of hardway buffer allocations | Chuck Lever | 1 | -12/+13
While marshaling an RPC/RDMA request, the inline_{rsize,wsize} settings determine whether an inline request is used, or whether read or write chunk lists are built. The current default value of these settings is 1024. Any RPC request smaller than 1024 bytes is sent to the NFS server completely inline. rpcrdma_buffer_create() allocates and pre-registers a set of RPC buffers for each transport instance, also based on the inline rsize and wsize settings. RPC/RDMA requests and replies are built in these buffers. However, if an RPC/RDMA request is expected to be larger than 1024 bytes, a buffer has to be allocated and registered for that RPC, then deregistered and released when the RPC is complete. This is known as a "hardway allocation." Since the introduction of NFSv4, the size of RPC requests has grown, and hardway allocations are thus more frequent. Hardway allocations are significant overhead, and they waste the existing RPC buffers pre-allocated by rpcrdma_buffer_create(). We'd like fewer hardway allocations. Increasing the size of the pre-registered buffers is the most direct way to do this. However, a blanket increase of the inline thresholds has interoperability consequences. On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000 bytes for each RPC request buffer, using kmalloc(). Due to internal fragmentation, this wastes nearly 1200 bytes: kmalloc() returns an 8192-byte piece of memory for a 7000-byte allocation request anyway, but the extra space remains unused. So let's round up the size of the pre-allocated buffers and make use of the unused space in the kmalloc'd memory. This change reduces the amount of hardway-allocated memory for an NFSv4 general connectathon run from 1322092 to 9472 bytes (a 99% reduction). Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
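The rounding idea is generic: pad the request up to the slab bucket kmalloc() will hand back anyway. A hedged sketch (the 7000/8192 figures come from the commit text; the helper name and interface are ours):

    #include <linux/log2.h>
    #include <linux/slab.h>

    /* Sketch: a ~7000-byte request lands in kmalloc's 8192-byte bucket
     * regardless, so ask for the rounded-up size and let the RPC buffer
     * (and the inline thresholds) use the whole slab object. */
    static void *alloc_rpc_buffer(size_t wanted, size_t *usable)
    {
        size_t len = roundup_pow_of_two(wanted);   /* 7000 -> 8192 */
        void *buf = kmalloc(len, GFP_KERNEL);

        if (buf)
            *usable = len;   /* caller can size buffers to 'len' */
        return buf;
    }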
2014-06-04 | xprtrdma: Limit work done by completion handler | Chuck Lever | 2 | -4/+7
Sagi Grimberg <[email protected]> points out that a steady stream of CQ events could starve other work because of the unbounded polling loop in rpcrdma_{send,recv}_poll(). Instead of a (potentially infinite) while loop, return after collecting a budgeted number of completions. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Sagi Grimberg <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
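In outline, a budgeted handler looks something like the sketch below. The budget value and the consumer are illustrative; ib_poll_cq() is the real verb:

    #include <rdma/ib_verbs.h>

    #define CQ_POLL_BUDGET 8                /* illustrative budget */

    /* Sketch: collect at most CQ_POLL_BUDGET completions per upcall,
     * then return so other softirq work can run.  Anything left over
     * is picked up on the next completion upcall. */
    static void poll_budgeted(struct ib_cq *cq)
    {
        struct ib_wc wc;
        int budget = CQ_POLL_BUDGET;

        while (budget-- && ib_poll_cq(cq, 1, &wc) == 1) {
            /* handle_completion(&wc); -- hypothetical consumer */
        }
    }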
2014-06-04 | xprtrdma: Reduce calls to ib_poll_cq() in completion handlers | Chuck Lever | 2 | -18/+42
Change the completion handlers to grab up to 16 items per ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16 items are returned. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: Reduce lock contention in completion handlers | Chuck Lever | 1 | -4/+10
Skip the ib_poll_cq() after re-arming, if the provider knows there are no additional items waiting. (Have a look at commit ed23a727 for more details). Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
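The optimization leans on the IB_CQ_REPORT_MISSED_EVENTS flag: with it, ib_req_notify_cq() returns greater than zero only when completions may have slipped in while the CQ was being re-armed, so a zero return lets the follow-up poll be skipped. A sketch, where drain_cq() is a hypothetical poll-until-empty helper:

    #include <rdma/ib_verbs.h>

    /* Sketch: drain, re-arm, and only poll again if the provider
     * reports that events may have been missed during re-arming. */
    static void cq_upcall(struct ib_cq *cq)
    {
        drain_cq(cq);                       /* hypothetical helper */

        if (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
                                 IB_CQ_REPORT_MISSED_EVENTS) > 0)
            drain_cq(cq);                   /* provider saw more work */
    }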
2014-06-04 | xprtrdma: Split the completion queue | Chuck Lever | 2 | -92/+137
The current CQ handler uses the ib_wc.opcode field to distinguish between event types. However, the contents of that field are not reliable if the completion status is not IB_WC_SUCCESS. When an error completion occurs on a send event, the CQ handler schedules a tasklet with something that is not a struct rpcrdma_rep. This is never correct behavior, and sometimes it results in a panic. To resolve this issue, split the completion queue into a send CQ and a receive CQ. The send CQ handler now handles only struct rpcrdma_mw wr_id's, and the receive CQ handler now handles only struct rpcrdma_rep wr_id's. Fix suggested by Shirley Ma <[email protected]> Reported-by: Rafael Reiter <[email protected]> Fixes: 5c635e09cec0feeeb310968e51dad01040244851 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211 Signed-off-by: Chuck Lever <[email protected]> Tested-by: Klemens Senn <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
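Splitting gives each handler a homogeneous wr_id type. The sketch below shows the shape of such a setup under that assumption; handler names, depths, and the wrapper function are illustrative, while ib_create_cq()/ib_destroy_cq() are the verbs of this era:

    #include <rdma/ib_verbs.h>
    #include <linux/err.h>

    /* Sketch: one CQ per direction, so the send handler only ever sees
     * send-side wr_ids and the recv handler only receive-side wr_ids,
     * even for flush/error completions where wc.opcode is unreliable. */
    static int create_split_cqs(struct ib_device *device, void *ctx,
                                int send_depth, int recv_depth,
                                struct ib_qp_init_attr *attr)
    {
        struct ib_cq *sendcq, *recvcq;

        sendcq = ib_create_cq(device, send_cq_handler,   /* hypothetical */
                              cq_event_handler, ctx, send_depth, 0);
        if (IS_ERR(sendcq))
            return PTR_ERR(sendcq);

        recvcq = ib_create_cq(device, recv_cq_handler,   /* hypothetical */
                              cq_event_handler, ctx, recv_depth, 0);
        if (IS_ERR(recvcq)) {
            ib_destroy_cq(sendcq);
            return PTR_ERR(recvcq);
        }

        attr->send_cq = sendcq;     /* the QP now posts sends and    */
        attr->recv_cq = recvcq;     /* receives to different CQs     */
        return 0;
    }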
2014-06-04 | xprtrdma: Make rpcrdma_ep_destroy() return void | Chuck Lever | 3 | -13/+4
Clean up: rpcrdma_ep_destroy() returns a value that is used only to print a debugging message. rpcrdma_ep_destroy() already prints debugging messages in all error cases. Make rpcrdma_ep_destroy() return void instead. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: Simplify rpcrdma_deregister_external() synopsis | Chuck Lever | 4 | -10/+4
Clean up: All remaining callers of rpcrdma_deregister_external() pass NULL as the last argument, so remove that argument. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: mount reports "Invalid mount option" if memreg mode not supported | Chuck Lever | 1 | -4/+4
If the selected memory registration mode is not supported by the underlying provider/HCA, the NFS mount command reports that there was an invalid mount option, and fails. This is misleading. Reporting a problem allocating memory is a lot closer to the truth. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: Fall back to MTHCAFMR when FRMR is not supported | Chuck Lever | 1 | -16/+15
An audit of in-kernel RDMA providers that do not support the FRMR memory registration shows that several of them support MTHCAFMR. Prefer MTHCAFMR when FRMR is not supported. If MTHCAFMR is not supported, only then choose ALLPHYSICAL. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: Remove REGISTER memory registration mode | Chuck Lever | 2 | -88/+5
All kernel RDMA providers except amso1100 support either MTHCAFMR or FRMR, both of which are faster than REGISTER. amso1100 can continue to use ALLPHYSICAL. The only other ULP consumer in the kernel that uses the reg_phys_mr verb is Lustre. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: Remove MEMWINDOWS registration modes | Chuck Lever | 4 | -203/+7
The MEMWINDOWS and MEMWINDOWS_ASYNC memory registration modes were intended as stop-gap modes before the introduction of FRMR. They are now considered obsolete. MEMWINDOWS_ASYNC is also considered unsafe because it can leave client memory registered and exposed for an indeterminate time after each I/O. At this point, the MEMWINDOWS modes add needless complexity, so remove them. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: Remove BOUNCEBUFFERS memory registration mode | Chuck Lever | 3 | -28/+1
Clean up: This memory registration mode is slow and was never meant for use in production environments. Remove it to reduce implementation complexity. Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context | Chuck Lever | 3 | -7/+21
An IB provider can invoke rpcrdma_conn_func() in an IRQ context, thus rpcrdma_conn_func() cannot be allowed to directly invoke generic RPC functions like xprt_wake_pending_tasks(). Signed-off-by: Chuck Lever <[email protected]> Tested-by: Steve Wise <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2014-06-04 | nfs-rdma: Fix for FMR leaks | Allen Andrews | 1 | -35/+38
Two memory region leaks were found during testing:

1. rpcrdma_buffer_create: When allocating RPCRDMA_FRMRs, ib_alloc_fast_reg_mr is called and then ib_alloc_fast_reg_page_list. If ib_alloc_fast_reg_page_list returns an error, the routine bails out and drops the FRMR just created by ib_alloc_fast_reg_mr, leaking that memory region. Added code to deregister the last FRMR if ib_alloc_fast_reg_page_list fails.

2. rpcrdma_buffer_destroy: While cleaning up, the routine only frees the MRs on the rb_mws list if rb_send_bufs are present. However, if one of the MR allocation requests in rpcrdma_buffer_create fails after some MRs have already been added to the rb_mws list, the routine never gets to create any rb_send_bufs; it jumps straight to rpcrdma_buffer_destroy, which then never frees the MRs on the rb_mws list. This leaks every MR created before the failing allocation.

Issue (2) was seen during testing. Our adapter had a finite number of MRs available, and we created enough connections that an MR allocation failed on the Nth NFS connection request. After the kernel cleaned up the resources it had allocated for the Nth connection, we noticed that FMRs had been leaked due to the coding error described above. Issue (1) was found during a code review while debugging issue (2). Signed-off-by: Allen Andrews <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
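A sketch of the error-path fix for issue (1), using the fast-reg verbs of this kernel era; the wrapper function is illustrative:

    #include <rdma/ib_verbs.h>
    #include <linux/err.h>

    /* Sketch: if the page list allocation fails, deregister the MR
     * that was just allocated instead of leaking it (issue 1 above). */
    static int alloc_frmr_pair(struct ib_pd *pd, struct ib_device *device,
                               int depth, struct ib_mr **mrp,
                               struct ib_fast_reg_page_list **plp)
    {
        struct ib_mr *mr = ib_alloc_fast_reg_mr(pd, depth);
        struct ib_fast_reg_page_list *pl;

        if (IS_ERR(mr))
            return PTR_ERR(mr);

        pl = ib_alloc_fast_reg_page_list(device, depth);
        if (IS_ERR(pl)) {
            ib_dereg_mr(mr);        /* the fix: unwind the paired MR */
            return PTR_ERR(pl);
        }

        *mrp = mr;
        *plp = pl;
        return 0;
    }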
2014-06-04 | xprtrdma: mind the device's max fast register page list depth | Steve Wise | 3 | -16/+36
Some RDMA devices don't support a fast register page list depth of at least RPCRDMA_MAX_DATA_SEGS, so xprtrdma needs to chunk its fast register regions according to the minimum of the device's maximum supported depth and RPCRDMA_MAX_DATA_SEGS. Signed-off-by: Steve Wise <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
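The clamping itself reduces to a min() against the queried device attributes; a sketch, assuming ib_query_device() (the attribute interface of this era) and treating the wrapper as illustrative:

    #include <rdma/ib_verbs.h>
    #include <linux/kernel.h>

    /* Sketch: bound the fast-reg chunk depth by what the device
     * actually supports; RPCRDMA_MAX_DATA_SEGS is xprtrdma's own
     * ceiling.  Larger RPCs get split into multiple chunks. */
    static unsigned int frmr_depth(struct ib_device *device)
    {
        struct ib_device_attr attr;
        unsigned int depth = RPCRDMA_MAX_DATA_SEGS;

        if (ib_query_device(device, &attr) == 0)
            depth = min_t(unsigned int, depth,
                          attr.max_fast_reg_page_list_len);
        return depth;
    }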
2014-05-29 | Push the file layout driver into a subdirectory | Tom Haynes | 5 | -12/+15
The object and block layouts already exist in their own subdirectories. This patch completes the set! Note that since a layout driver already implies NFSv4, I stripped the nfs4 prefix out of the file names. Signed-off-by: Tom Haynes <[email protected]> Acked-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | pNFS: Handle allocation errors correctly in objlayout_alloc_layout_hdr() | Trond Myklebust | 1 | -4/+4
Return the NULL pointer when the allocation fails. Cc: Boaz Harrosh <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | pNFS: Handle allocation errors correctly in filelayout_alloc_layout_hdr() | Trond Myklebust | 1 | -1/+1
Return the NULL pointer when the allocation fails. Reported-by: Fengguang Wu <[email protected]> Cc: <[email protected]> # 3.5.x Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: Apply NFS_MOUNT_CMP_FLAGMASK to nfs_compare_remount_data() | Scott Mayhew | 1 | -13/+13
Those flags are obsolete and checking them can incorrectly cause remount operations to fail. Signed-off-by: Scott Mayhew <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | NFSv4: Use error handler on failed GETATTR with successful OPEN | Andy Adamson | 1 | -1/+1
Place the call to resend the failed GETATTR under the error handler so that, when appropriate, the GETATTR is retried more than once. The server can fail the GETATTR op in the OPEN compound with a recoverable error such as NFS4ERR_DELAY. In the case of an O_EXCL open, the server has already created the file, so a retransmission of the OPEN call would fail with NFS4ERR_EXIST. Signed-off-by: Andy Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
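The standard NFSv4 retry pattern the resend is moved under looks roughly like this sketch; _resend_getattr() is a placeholder for the actual resend call, while nfs4_handle_exception() is the stock error handler that waits out errors like NFS4ERR_DELAY and sets exception.retry:

    /* Sketch: loop the resend through the NFSv4 exception handler so
     * recoverable errors are retried instead of failing the open. */
    static int getattr_with_retry(struct nfs_server *server,
                                  struct nfs_fattr *fattr)
    {
        struct nfs4_exception exception = { };
        int err;

        do {
            err = _resend_getattr(server, fattr);   /* placeholder */
            err = nfs4_handle_exception(server, err, &exception);
        } while (exception.retry);
        return err;
    }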
2014-05-29 | NFS: Fix a potential busy wait in nfs_page_group_lock | Trond Myklebust | 1 | -10/+9
We cannot allow nfs_page_group_lock to use TASK_KILLABLE here, since the loop would cause a busy wait if somebody kills the task. Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | NFS: Fix error handling in __nfs_pageio_add_request | Trond Myklebust | 1 | -0/+6
Handle the case where nfs_create_request() returns an error. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | net, sunrpc: suppress allocation warning in rpc_malloc() | David Rientjes | 1 | -2/+3
rpc_malloc() allocates with GFP_NOWAIT without making any attempt at reclaim, so it easily fails when low on memory. This ends up spamming the kernel log:

    SLAB: Unable to allocate memory on node 0 (gfp=0x4000)
      cache: kmalloc-8192, object size: 8192, order: 1
      node 0: slabs: 207/207, objs: 207/207, free: 0
    rekonq: page allocation failure: order:1, mode:0x204000
    CPU: 2 PID: 14321 Comm: rekonq Tainted: G O 3.15.0-rc3-12.gfc9498b-desktop+ #6
    Hardware name: System manufacturer System Product Name/M4A785TD-V EVO, BIOS 2105 07/23/2010
     0000000000000000 ffff880010ff17d0 ffffffff815e693c 0000000000204000
     ffff880010ff1858 ffffffff81137bd2 0000000000000000 0000001000000000
     ffff88011ffebc38 0000000000000001 0000000000204000 ffff88011ffea000
    Call Trace:
     [<ffffffff815e693c>] dump_stack+0x4d/0x6f
     [<ffffffff81137bd2>] warn_alloc_failed+0xd2/0x140
     [<ffffffff8113be19>] __alloc_pages_nodemask+0x7e9/0xa30
     [<ffffffff811824a8>] kmem_getpages+0x58/0x140
     [<ffffffff81183de6>] fallback_alloc+0x1d6/0x210
     [<ffffffff81183be3>] ____cache_alloc_node+0x123/0x150
     [<ffffffff81185953>] __kmalloc+0x203/0x490
     [<ffffffffa06b0ee2>] rpc_malloc+0x32/0xa0 [sunrpc]
     [<ffffffffa06a6999>] call_allocate+0xb9/0x170 [sunrpc]
     [<ffffffffa06b19d8>] __rpc_execute+0x88/0x460 [sunrpc]
     [<ffffffffa06b2da9>] rpc_execute+0x59/0xc0 [sunrpc]
     [<ffffffffa06a932b>] rpc_run_task+0x6b/0x90 [sunrpc]
     [<ffffffffa077b5c1>] nfs4_call_sync_sequence+0x51/0x80 [nfsv4]
     [<ffffffffa077d45d>] _nfs4_do_setattr+0x1ed/0x280 [nfsv4]
     [<ffffffffa0782a72>] nfs4_do_setattr+0x72/0x180 [nfsv4]
     [<ffffffffa078334c>] nfs4_proc_setattr+0xbc/0x140 [nfsv4]
     [<ffffffffa074a7e8>] nfs_setattr+0xd8/0x240 [nfs]
     [<ffffffff811baa71>] notify_change+0x231/0x380
     [<ffffffff8119cf5c>] chmod_common+0xfc/0x120
     [<ffffffff8119df80>] SyS_chmod+0x40/0x90
     [<ffffffff815f4cfd>] system_call_fastpath+0x1a/0x1f
    ...

If the allocation fails, simply return NULL and avoid spamming the kernel log. Reported-by: Marc Dietrich <[email protected]> Signed-off-by: David Rientjes <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
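The change amounts to tagging the opportunistic allocation so the page allocator stays quiet on failure; a minimal sketch (the wrapper name is ours):

    #include <linux/slab.h>

    /* Sketch: the allocation is opportunistic by design (GFP_NOWAIT,
     * no reclaim), so add __GFP_NOWARN to suppress the splat and let
     * the caller's fallback path handle NULL. */
    static void *rpc_buf_alloc(size_t size)
    {
        return kmalloc(size, GFP_NOWAIT | __GFP_NOWARN);
    }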
2014-05-29 | nfs: support page groups in nfs_read_completion | Weston Andros Adamson | 1 | -7/+17
nfs_read_completion relied on the fact that there was a 1:1 mapping of page to nfs_request, but this has now changed. Regions not covered by a request have already been zeroed elsewhere. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | pnfs: filelayout: support non page aligned layouts | Weston Andros Adamson | 1 | -32/+20
Use the new pg_test interface to adjust requests to fit in the current stripe / segment. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | pnfs: allow non page aligned pnfs layout segments | Weston Andros Adamson | 1 | -15/+10
Remove alignment checks that would revert to the MDS, and change pg_test to return the maximum amount left in the segment (or other pg_test call) up to the size of the passed request, or 0 if no space is left. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | pnfs: support multiple verfs per direct req | Weston Andros Adamson | 4 | -6/+109
Support direct requests that span multiple pnfs data servers by comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket. Continue to use dreq->verf if the MDS is used / non-pNFS. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: remove data list from pgio header | Weston Andros Adamson | 3 | -61/+24
Since the ability to split pages into subpage requests has been added, nfs_pgio_header->rpc_list only ever has one pgio data. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: use > 1 request to handle bsize < PAGE_SIZE | Weston Andros Adamson | 1 | -69/+11
Use the newly added support for multiple requests per page for rsize/wsize < PAGE_SIZE, instead of having multiple read / write data structures per pageio header. This allows us to get rid of nfs_pgio_multi. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: chain calls to pg_test | Weston Andros Adamson | 3 | -21/+30
Now that pg_test can change the size of the request (by returning a non-zero size smaller than the request), pg_test functions that call other pg_test functions must return the minimum of the results, or 0 if any fail. Also clean up the logic of some pg_test functions so that all checks are for conditions where coalescing is not possible. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
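A sketch of how chained pg_test callbacks compose under this rule; layout_pg_test() is illustrative, while nfs_generic_pg_test() is the generic callee named in this series:

    #include <linux/kernel.h>
    #include <linux/nfs_page.h>

    /* Sketch: pass through the tighter of the two limits, where 0
     * (cannot coalesce) dominates any size. */
    static size_t chained_pg_test(struct nfs_pageio_descriptor *pgio,
                                  struct nfs_page *prev,
                                  struct nfs_page *req)
    {
        size_t a = layout_pg_test(pgio, prev, req);     /* illustrative */
        size_t b = nfs_generic_pg_test(pgio, prev, req);

        return (a && b) ? min(a, b) : 0;
    }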
2014-05-29 | nfs: allow coalescing of subpage requests | Weston Andros Adamson | 1 | -4/+0
Remove check that the request covers a whole page. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | pnfs: clean up filelayout_alloc_commit_info | Weston Andros Adamson | 1 | -20/+26
Remove unneeded else statement and clean up how commit info dataserver buckets are replaced. Suggested-by: Trond Myklebust <[email protected]> Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: page group support in nfs_mark_uptodate | Weston Andros Adamson | 1 | -7/+67
Change how nfs_mark_uptodate checks to see if writes cover a whole page. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: page group syncing in write path | Weston Andros Adamson | 3 | -12/+24
Operations that modify state for a whole page must be synchronized across all requests within a page group. In the write path, this means calling end_page_writeback and removing the head request from an inode. Neither of these operations should be performed until all requests in a page group have reached the point where they would call them. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: page group syncing in read path | Weston Andros Adamson | 3 | -5/+21
Operations that modify state for a whole page must be synchronized across all requests within a page group. In the read path, this means calling unlock_page and SetPageUptodate. Neither of these functions should be called until all requests in a page group have reached the point where they would call them. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: add support for multiple nfs reqs per page | Weston Andros Adamson | 5 | -21/+236
Add "page groups" - a circular list of nfs requests (struct nfs_page) that all reference the same page. This gives nfs read and write paths the ability to account for sub-page regions independently. This somewhat follows the design of struct buffer_head's sub-page accounting. Only "head" requests are ever added/removed from the inode list in the buffered write path. "head" and "sub" requests are treated the same through the read path and the rest of the write/commit path. Requests are given an extra reference across the life of the list. Page groups are never rejoined after being split. If the read/write request fails and the client falls back to another path (ie revert to MDS in PNFS case), the already split requests are pushed through the recoalescing code again, which may split them further and then coalesce them into properly sized requests on the wire. Fragmentation shouldn't be a problem with the current design, because we flush all requests in page group when a non-contiguous request is added, so the only time resplitting should occur is on a resend of a read or write. This patch lays the groundwork for sub-page splitting, but does not actually do any splitting. For now all page groups have one request as pg_test functions don't yet split pages. There are several related patches that are needed support multiple requests per page group. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: call nfs_can_coalesce_requests for every req | Weston Andros Adamson | 2 | -15/+22
Call nfs_can_coalesce_requests for every request, even the first one. This is needed for future patches to give pg_test a way to inform add_request to reduce the size of the request. Now @prev can be null in nfs_can_coalesce_requests and pg_test functions. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: modify pg_test interface to return size_t | Weston Andros Adamson | 7 | -23/+62
This is a step toward allowing pg_test to inform the coalescing code to reduce the size of requests so they may fit in whatever scheme the pg_test callback wants to define. For now, just return the size of the request if there is space, or 0 if there is not. This shouldn't change any behavior, as it acts the same as when the pg_test functions returned bool. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: remove unused arg from nfs_create_request | Weston Andros Adamson | 5 | -12/+6
@inode is passed but not used. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | nfs: clean up PG_* flags | Weston Andros Adamson | 1 | -6/+4
Remove unused flags PG_NEED_COMMIT and PG_NEED_RESCHED. Add comments describing how each flag is used. Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | pnfs: fix race in filelayout commit path | Weston Andros Adamson | 1 | -4/+16
Hold the lock while modifying commit info dataserver buckets. The following oops can be reproduced by running iozone for a while against a 2-DS pynfs filelayout server:

    general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
    Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache
    CPU: 0 PID: 903 Comm: iozone Not tainted 3.15.0-rc1-branch-dros_testing+ #44
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference
    task: ffff880078164480 ti: ffff88006e972000 task.ti: ffff88006e972000
    RIP: 0010:[<ffffffffa01936e1>] [<ffffffffa01936e1>] nfs_init_commit+0x22/0x
    RSP: 0018:ffff88006e973d30 EFLAGS: 00010246
    RAX: ffff88006e973e00 RBX: ffff88006e828800 RCX: ffff88006e973e10
    RDX: 0000000000000000 RSI: ffff88006e973e00 RDI: dead4ead00000000
    RBP: ffff88006e973d38 R08: ffff88006e8289d8 R09: 0000000000000000
    R10: ffff88006e8289d8 R11: 0000000000016988 R12: ffff88006e973b98
    R13: ffff88007a0a6648 R14: ffff88006e973e10 R15: ffff88006e828800
    FS: 00007f2ce396b740(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f03278a1000 CR3: 0000000079043000 CR4: 00000000001407f0
    Stack:
     ffff88006e8289d8 ffff88006e973da8 ffffffffa00f144f ffff88006e9478c0
     ffff88006e973e00 ffff88006de21080 0000000100000002 ffff880079be6c48
     ffff88006e973d70 ffff88006e973d70 ffff88006e973e10 ffff88006de21080
    Call Trace:
     [<ffffffffa00f144f>] filelayout_commit_pagelist+0x1ae/0x34a [nfs_layout_nfsv
     [<ffffffffa0194f72>] nfs_generic_commit_list+0x92/0xc4 [nfs]
     [<ffffffffa0195053>] nfs_commit_inode+0xaf/0x114 [nfs]
     [<ffffffffa01892bd>] nfs_file_fsync_commit+0x82/0xbe [nfs]
     [<ffffffffa01ceb0d>] nfs4_file_fsync+0x59/0x9b [nfsv4]
     [<ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20
     [<ffffffff8114ee60>] vfs_fsync+0x1c/0x1e
     [<ffffffffa01891c2>] nfs_file_flush+0x7f/0x84 [nfs]
     [<ffffffff81127a43>] filp_close+0x3c/0x72
     [<ffffffff81140e12>] __close_fd+0x82/0x9a
     [<ffffffff81127a9c>] SyS_close+0x23/0x4c
     [<ffffffff814acd12>] system_call_fastpath+0x16/0x1b
    Code: 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8
    RIP [<ffffffffa01936e1>] nfs_init_commit+0x22/0xe1 [nfs]
     RSP <ffff88006e973d30>
    ---[ end trace 732fe6419b235e2f ]---

Suggested-by: Trond Myklebust <[email protected]> Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
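The fix itself is conventional: take the commit-info lock across the bucket swap so nfs_init_commit() never observes a half-replaced array. A sketch under the assumption that the structure fields (cinfo->lock, ds->buckets, ds->nbuckets) match this era's pNFS commit code; treat all of them as illustrative:

    #include <linux/spinlock.h>
    #include <linux/slab.h>

    /* Sketch: serialize bucket replacement against concurrent commit
     * paths that walk cinfo->ds->buckets. */
    static int swap_commit_buckets(struct nfs_commit_info *cinfo,
                                   struct pnfs_commit_bucket *new,
                                   int size)
    {
        struct pnfs_commit_bucket *old;

        spin_lock(cinfo->lock);
        if (cinfo->ds->nbuckets >= size) {
            spin_unlock(cinfo->lock);
            kfree(new);                 /* lost the race; keep theirs */
            return 0;
        }
        old = cinfo->ds->buckets;       /* old entries are copied into
                                         * 'new' before publishing */
        cinfo->ds->buckets = new;
        cinfo->ds->nbuckets = size;
        spin_unlock(cinfo->lock);
        kfree(old);
        return 0;
    }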
2014-05-29 | NFS: Create a common nfs_pageio_ops struct | Anna Schumaker | 4 | -17/+11
At this point the read and write structures look identical, so combine them into something shared by both. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | NFS: Create a common generic_pg_pgios() | Anna Schumaker | 4 | -51/+28
What we have here are two functions that look identical. Let's share some more code! Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-05-29 | NFS: Create a common multiple_pgios() function | Anna Schumaker | 4 | -60/+25
Once again, these two functions look identical in the read and write case. Time to combine them together! Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>