path: root/fs/xfs/xfs_trace.h
Age | Commit message | Author | Files | Lines
2016-10-03 | Merge branch 'xfs-4.9-log-recovery-fixes' into for-next | Dave Chinner | 1 | -6/+33
2016-09-26 | xfs: log recovery tracepoints to track current lsn and buffer submission | Brian Foster | 1 | -2/+29
Log recovery has particular rules around buffer submission along with tricky corner cases where independent transactions can share an LSN. As such, it can be difficult to follow when/why buffers are submitted during recovery.

Add a couple of tracepoints to post the current LSN of a record when a new record is being processed and when a buffer is being skipped due to LSN ordering. Also, update the recover item class to include the LSN of the current transaction for the item being processed.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
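As a reader's aid, here is a minimal C sketch of the LSN-ordering check that the new buffer-skip tracepoint reports on; the type and helper name are simplified stand-ins, not the kernel's actual code:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t xfs_lsn_t;

    /*
     * Recovery must not replay an older log copy over a buffer that
     * already contains changes from this LSN or later; this is the
     * condition under which the skip tracepoint would record the
     * current LSN for postmortem analysis.
     */
    static bool recovery_should_skip_buffer(xfs_lsn_t buffer_lsn,
                                            xfs_lsn_t current_lsn)
    {
        return buffer_lsn != 0 && buffer_lsn >= current_lsn;
    }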
2016-09-26 | xfs: remote attribute blocks aren't really userdata | Dave Chinner | 1 | -4/+4
When adding a new remote attribute, we write the attribute to the new extent before the allocation transaction is committed. This means we cannot reuse busy extents as that violates crash consistency semantics. Hence we currently treat remote attribute extent allocation like userdata because it has the same overwrite ordering constraints as userdata.

Unfortunately, this also allows the allocator to incorrectly apply extent size hints to the remote attribute extent allocation. This results in interesting failures, such as transaction block reservation overruns and in-memory inode attribute fork corruption.

To fix this, we need to separate the busy extent reuse configuration from the userdata configuration. This changes the definition of XFS_BMAPI_METADATA slightly - it now means that the allocation is metadata and reuse of busy extents is acceptable due to the metadata ordering semantics of the journal. If this flag is not set, it means the allocation is for data that has unordered writeback, and hence busy extent reuse is not allowed. It no longer implies the allocation is for user data, just that the data write will not be strictly ordered. This matches the semantics for both user data and remote attribute block allocation.

As such, this patch changes the "userdata" field to a "datatype" field, and adds a "no busy reuse" flag to the field. When we detect an unordered data extent allocation, we immediately set the no reuse flag. We then set the "user data" flags based on the inode fork we are allocating the extent to. Hence we only set userdata flags on data fork allocations now and consider attribute fork remote extents to be an unordered metadata extent. The result is that remote attribute extents now have the expected allocation semantics, and the data fork allocation behaviour is completely unchanged.

It should be noted that there may be other ways to fix this (e.g. use ordered metadata buffers for the remote attribute extent data write) but they are more invasive and difficult to validate both from a design and implementation POV. Hence this patch takes the simple, obvious route to fixing the problem...

Reported-and-tested-by: Ross Zwisler <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
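A rough sketch of the userdata-to-datatype split described above; the flag and helper names here are illustrative assumptions, not the kernel's exact definitions:

    /* Sketch only: flags mirroring the described "datatype" semantics. */
    #define ALLOC_USERDATA  (1 << 0)    /* unordered user data write */
    #define ALLOC_NOBUSY    (1 << 1)    /* must not reuse busy extents */

    struct alloc_args {
        int datatype;   /* replaces the old boolean "userdata" field */
    };

    /*
     * Remote attribute blocks: metadata, but written before the
     * allocation transaction commits, so busy reuse is forbidden.
     */
    static void remote_attr_alloc_args(struct alloc_args *args)
    {
        args->datatype = ALLOC_NOBUSY;
    }

    /*
     * Data fork: user data with unordered writeback, so the no-reuse
     * flag is set immediately, as the commit message describes.
     */
    static void data_fork_alloc_args(struct alloc_args *args)
    {
        args->datatype = ALLOC_USERDATA | ALLOC_NOBUSY;
    }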
2016-09-19 | xfs: set up per-AG free space reservations | Darrick J. Wong | 1 | -10/+65
One unfortunate quirk of the reference count and reverse mapping btrees -- they can expand in size when blocks are written to *other* allocation groups if, say, one large extent becomes a lot of tiny extents. Since we don't want to start throwing errors in the middle of CoWing, we need to reserve some blocks to handle future expansion. The transaction block reservation counters aren't sufficient here because we have to have a reserve of blocks in every AG, not just somewhere in the filesystem.

Therefore, create two per-AG block reservation pools. One feeds the AGFL so that rmapbt expansion always succeeds, and the other feeds all other metadata so that refcountbt expansion never fails.

Use the count of how many reserved blocks we need to have on hand to create a virtual reservation in the AG. Through selective clamping of the maximum length of allocation requests and of the length of the longest free extent, we can make it look like there's less free space in the AG unless the reservation owner is asking for blocks. In other words, play some accounting tricks in-core to make sure that we always have blocks available. On the plus side, there's nothing to clean up if we crash, which is in contrast to the strategy that the rough draft used (actually removing extents from the freespace btrees).

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
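The in-core accounting trick reduces to clamping the free space that non-owners may see; a simplified sketch with illustrative names:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t xfs_extlen_t;

    struct ag_reservation {
        xfs_extlen_t freeblks;  /* real free blocks in this AG */
        xfs_extlen_t reserved;  /* held back for btree expansion */
    };

    /*
     * Non-owners see the AG as smaller than it really is, so the
     * reserve is always there when the rmapbt/refcountbt must grow.
     */
    static xfs_extlen_t ag_available(const struct ag_reservation *r,
                                     bool caller_owns_reservation)
    {
        if (caller_owns_reservation)
            return r->freeblks; /* owner may dip into the reserve */
        return r->freeblks > r->reserved ? r->freeblks - r->reserved : 0;
    }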
2016-08-30 | xfs: track log done items directly in the deferred pending work item | Darrick J. Wong | 1 | -1/+1
Christoph reports slab corruption when a deferred refcount update aborts during _defer_finish(). The cause of this was broken log item state tracking in xfs_defer_pending -- upon an abort, _defer_trans_abort() will call abort_intent on all intent items, including the ones that have already had a done item attached.

This is incorrect because each intent item has two refcounts: the first is released when the intent item is committed to the log, and the second is released when the _done_ item is committed to the log, or by the intent creator if there is no done item. In other words, once we log the done item, responsibility for releasing the intent item's second refcount is transferred to the done item and /must not/ be performed by anything else.

The dfp_committed flag should have been tracking whether or not we had a done item so that _defer_trans_abort could decide if it needs to abort the intent item, but due to a thinko this was not the case. Rip it out and track the done item directly so that we do the right thing w.r.t. intent item freeing.

Signed-off-by: Darrick J. Wong <[email protected]>
Reported-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
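The fix amounts to keying the abort decision off the done item pointer itself rather than a committed flag; a sketch under simplified, assumed types:

    /* Illustrative only; the real structures live in the defer code. */
    struct log_item;

    struct defer_pending {
        struct log_item *dfp_intent; /* holds the intent's 2nd refcount */
        struct log_item *dfp_done;   /* NULL until a done item is logged */
    };

    extern void abort_intent(struct log_item *lip);

    static void defer_abort_one(struct defer_pending *dfp)
    {
        /*
         * Once a done item exists, the intent's second reference
         * belongs to it and must not be dropped by the abort path.
         */
        if (dfp->dfp_intent && !dfp->dfp_done)
            abort_intent(dfp->dfp_intent);
    }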
2016-08-17 | xfs: simplify xfs_file_iomap_begin | Christoph Hellwig | 1 | -1/+0
We'll never get nimap == 0 for a successful return from xfs_bmapi_read, so don't try to handle it.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-08-03 | xfs: convert unwritten status of reverse mappings | Darrick J. Wong | 1 | -0/+5
Provide a function to convert an unwritten rmap extent to a real one and vice versa.

[ dchinner: Note that this algorithm and code was derived from the existing bmapbt unwritten extent conversion code in xfs_bmap_add_extent_unwritten_real(). ]

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-08-03 | xfs: add an extent to the rmap btree | Darrick J. Wong | 1 | -0/+2
Originally-From: Dave Chinner <[email protected]>

Now all the btree, free space and transaction infrastructure is in place, we can finally add the code to insert reverse mappings to the rmap btree. Freeing will be done in a separate patch, so just the addition operation can be focussed on here.

[darrick: handle owner offsets when adding rmaps]
[dchinner: remove remaining debug printk statements]
[darrick: move unwritten bit to rm_offset]

Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-08-03 | xfs: add tracepoints for the rmap functions | Darrick J. Wong | 1 | -2/+49
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-08-03 | xfs: add rmap btree operations | Darrick J. Wong | 1 | -0/+3
Originally-From: Dave Chinner <[email protected]>

Implement the generic btree operations needed to manipulate rmap btree blocks. This is very similar to the per-ag freespace btree implementation, and uses the AGFL for allocation and freeing of blocks.

Adapt the rmap btree to store owner offsets within each rmap record, and to handle the primary key being redefined as the tuple [agblk, owner, offset]. The expansion of the primary key is crucial to allowing multiple owners per extent.

[darrick: adapt the btree ops to deal with offsets]
[darrick: remove init_rec_from_key]
[darrick: move unwritten bit to rm_offset]

Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-08-03 | xfs: define the on-disk rmap btree format | Darrick J. Wong | 1 | -0/+2
Originally-From: Dave Chinner <[email protected]>

Now we have all the surrounding call infrastructure in place, we can start filling out the rmap btree implementation. Start with the on-disk btree format; add everything needed to read, write and manipulate rmap btree blocks. This prepares the way for adding the btree operations implementation.

[darrick: record owner and offset info in rmap btree]
[darrick: fork, bmbt and unwritten state in rmap btree]
[darrick: flags are a separate field in xfs_rmap_irec]
[darrick: calculate maxlevels separately]
[darrick: move the 'unwritten' bit into unused parts of rm_offset]

Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-08-03 | xfs: introduce rmap extent operation stubs | Darrick J. Wong | 1 | -0/+77
Originally-From: Dave Chinner <[email protected]>

Add the stubs into the extent allocation and freeing paths that the rmap btree implementation will hook into. While doing this, add the trace points that will be used to track rmap btree extent manipulations.

[[email protected]: Extend the stubs to take full owner info.]

Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-08-03 | xfs: add tracepoints and error injection for deferred extent freeing | Darrick J. Wong | 1 | -1/+4
Add a couple of tracepoints for the deferred extent free operation and a site for injecting errors while finishing the operation. This makes it easier to debug deferred ops and test log redo.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-08-03 | xfs: add tracepoints for the deferred ops mechanism | Darrick J. Wong | 1 | -0/+198
Add tracepoints for the internals of the deferred ops mechanism and tracepoint classes for clients of the dops, to make debugging easier.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-08-03 | xfs: introduce interval queries on btrees | Darrick J. Wong | 1 | -0/+1
Create a function to enable querying of btree records mapping to a range of keys. This will be used in subsequent patches to allow querying the reverse mapping btree to find the extents mapped to a range of physical blocks, though the generic code can be used for any range query.

The overlapped query range function needs to use the btree get_block helper because the root block could be an inode, in which case bc_bufs[nlevels-1] will be NULL.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-08-03 | xfs: support btrees with overlapping intervals for keys | Darrick J. Wong | 1 | -0/+36
On a filesystem with both reflink and reverse mapping enabled, it's possible to have multiple rmap records referring to the same blocks on disk. When overlapping intervals are possible, querying a classic btree to find all records intersecting a given interval is inefficient because we cannot use the left side of the search interval to filter out non-matching records the same way that we can use the existing btree key to filter out records coming after the right side of the search interval. This will become important once we want to use the rmap btree to rebuild BMBTs, or implement the (future) fsmap ioctl. (For the non-overlapping case, we can perform such queries trivially by starting at the left side of the interval and walking the tree until we pass the right side.)

Therefore, extend the btree code to come closer to supporting intervals as a first-class record attribute. This involves widening the btree node's key space to store both the lowest key reachable via the node pointer (as the btree does now) and the highest key reachable via the same pointer, and teaching the btree modifying functions to keep the highest-key records up to date. This behavior can be turned on via a new btree ops flag so that btrees that cannot store overlapping intervals don't pay the overhead costs in terms of extra code and disk format changes.

When we're deleting a record in a btree that supports overlapped interval records and the deletion results in two btree blocks being joined, we defer updating the high/low keys until after all possible joining (at higher levels in the tree) have finished. At this point, the btree pointers at all levels have been updated to remove the empty blocks and we can update the low and high keys. When we're doing this, we must be careful to update the keys of all node pointers up to the root instead of stopping at the first set of keys that don't need updating. This is because it's possible for a single deletion to cause joining of multiple levels of tree, and so we need to update everything going back to the root.

The diff_two_keys functions return < 0, 0, or > 0 if key1 is less than, equal to, or greater than key2, respectively. This is consistent with the rest of the kernel and the C library.

In btree_updkeys(), we need to evaluate the force_all parameter before running the key diff to avoid reading uninitialized memory when we're forcing a key update. This happens when we've allocated an empty slot at level N + 1 to point to a new block at level N and we're in the process of filling out the new keys.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
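With a low and a high key stored per node pointer, pruning a subtree during an interval query becomes a plain overlap test; a small sketch with assumed types:

    #include <stdbool.h>
    #include <stdint.h>

    struct key_range {
        uint64_t lo;    /* lowest key reachable via this pointer */
        uint64_t hi;    /* highest key reachable via this pointer */
    };

    /*
     * A subtree can be skipped if and only if its [lo, hi] range
     * cannot intersect the query interval.
     */
    static bool subtree_may_overlap(const struct key_range *node,
                                    const struct key_range *query)
    {
        return !(node->hi < query->lo || node->lo > query->hi);
    }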
2016-07-20 | Merge branch 'xfs-4.8-split-dax-dio' into for-next | Dave Chinner | 1 | -11/+10
2016-07-20 | xfs: split direct I/O and DAX path | Christoph Hellwig | 1 | -0/+2
So far the DAX code overloaded the direct I/O code path. There is very little in common between the two, and untangling them allows us to clean up both variants. As a side effect we also get separate trace points for both I/O types.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-07-20 | xfs: kill ioflags | Christoph Hellwig | 1 | -11/+8
Now that we have the direct I/O kiocb flag there is no real need to sample the value inside of XFS, and the invis flag was only ever partially used and isn't worth keeping this infrastructure around for. This also splits the read tracepoint into buffered vs direct, as we did for writes a long time ago.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-06-21 | Merge branch 'xfs-4.8-misc-fixes-2' into for-next | Dave Chinner | 1 | -0/+1
2016-06-21 | xfs: enable buffer deadlock postmortem diagnosis via ftrace | Darrick J. Wong | 1 | -0/+1
Create a second buf_trylock tracepoint so that we can distinguish between a successful and a failed trylock. With this piece, we can use a script to look at the ftrace output to detect buffer deadlocks.

[dchinner: update to if/else as per hch's suggestion]

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
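The if/else shape this describes, sketched with stand-in declarations (the tracepoint names follow the changelog and should be read as assumptions):

    #include <stdbool.h>

    struct xfs_buf;
    extern bool semaphore_trylock(struct xfs_buf *bp);
    extern void trace_xfs_buf_trylock(struct xfs_buf *bp);
    extern void trace_xfs_buf_trylock_fail(struct xfs_buf *bp);

    static bool buf_trylock_sketch(struct xfs_buf *bp)
    {
        bool locked = semaphore_trylock(bp);

        if (locked)
            trace_xfs_buf_trylock(bp);
        else
            /* the new tracepoint: a failed trylock is the breadcrumb
             * a deadlock-postmortem script looks for in the ftrace log */
            trace_xfs_buf_trylock_fail(bp);
        return locked;
    }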
2016-06-21 | xfs: implement iomap based buffered write path | Christoph Hellwig | 1 | -0/+3
Convert XFS to use the new iomap based multipage write path. This involves implementing the ->iomap_begin and ->iomap_end methods, and switching the buffered file write, page_mkwrite and xfs_iozero paths to the new iomap helpers.

With this change __xfs_get_blocks will never be used for buffered writes, and the code handling them can be removed.

Based on earlier code from Dave Chinner.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Bob Peterson <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-05-20 | Merge branch 'xfs-4.7-error-cfg' into for-next | Dave Chinner | 1 | -1/+0
2016-05-20 | Merge branch 'xfs-4.7-misc-fixes' into for-next | Dave Chinner | 1 | -4/+6
2016-05-18 | xfs: add configurable error support to metadata buffers | Carlos Maiolino | 1 | -1/+0
With the error configuration handle for async metadata write errors in place, we can now add initial support to the IO error processing in xfs_buf_iodone_error(). Add an infrastructure function to look up the configuration handle, and rearrange the error handling to prepare the way for different error handling configurations to be used.

Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-04-06 | xfs: Add caller function output to xfs_log_force tracepoint | Carlos Maiolino | 1 | -4/+6
I had sent this patch yesterday, but for some reason it didn't reach the xfs list, so I am sending it again.

Outputting the caller of xfs_log_force might be useful when tracing log checkpoint problems without the need to build a kernel with DEBUG.

Signed-off-by: Carlos Maiolino <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-04-06 | xfs: remove transaction types | Christoph Hellwig | 1 | -4/+1
These aren't used for CIL-style logging and can be dropped.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2016-02-08 | xfs: don't use ioends for direct write completions | Christoph Hellwig | 1 | -5/+4
We only need to communicate two bits of information to the direct I/O completion handler:

 (1) do we need to convert any unwritten extents in the range
 (2) do we need to check if we need to update the inode size based on the range passed to the completion handler

We can use the private data passed to the get_block handler and the completion handler as a simple bitmask to communicate this information instead of the current complicated infrastructure reusing the ioends from the buffer I/O path, and thus avoiding a memory allocation and a context switch for any non-trivial direct write. As a nice side effect we also decouple the direct I/O path implementation from that of the buffered I/O path.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
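A sketch of the two-bit completion protocol; the flag and helper names follow the description above and are assumptions, not necessarily the names used in the code:

    #define DIO_FLAG_UNWRITTEN  (1 << 0)    /* convert unwritten extents */
    #define DIO_FLAG_APPEND     (1 << 1)    /* check/update the inode size */

    extern void convert_unwritten_extents(long long offset, long long size);
    extern void update_inode_size(long long new_size);

    /* Completion handler: the private data is just a bitmask. */
    static void dio_write_complete(unsigned int flags,
                                   long long offset, long long size)
    {
        if (flags & DIO_FLAG_UNWRITTEN)
            convert_unwritten_extents(offset, size);
        if (flags & DIO_FLAG_APPEND)
            update_inode_size(offset + size);
    }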
2016-01-08 | xfs: add tracepoints to readpage calls | Dave Chinner | 1 | -0/+26
This allows us to see page cache driven readahead in action as it passes through XFS. This helps to understand buffered read throughput problems such as readahead IO sizes being too small for the underlying device to reach max throughput.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2015-11-03 | Merge branch 'xfs-dax-updates' into for-next | Dave Chinner | 1 | -0/+1
2015-11-03 | xfs: add ->pfn_mkwrite support for DAX | Dave Chinner | 1 | -0/+1
->pfn_mkwrite support is needed so that when a page with allocated backing store takes a write fault we can check that the fault has not raced with a truncate and is pointing to a region beyond the current end of file. This also allows us to update the timestamp on the inode, which fixes a generic/080 failure.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2015-10-12 | xfs: add an xfs_zero_eof() tracepoint | Brian Foster | 1 | -0/+1
Add a tracepoint in xfs_zero_eof() to facilitate tracking and debugging EOF zeroing events. This has proven useful in the context of other direct I/O tracepoints to ensure EOF zeroing occurs within appropriate file ranges.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2015-09-08 | xfs: huge page fault support | Matthew Wilcox | 1 | -0/+1
Use DAX to provide support for huge pages.

Signed-off-by: Matthew Wilcox <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2015-08-19 | xfs: icreate log item recovery and cancellation tracepoints | Brian Foster | 1 | -0/+34
Various log items have recovery tracepoints to identify whether a particular log item is recovered or cancelled. Add the equivalent tracepoints for the icreate transaction.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2015-05-29 | xfs: allocate sparse inode chunks on full chunk allocation failure | Brian Foster | 1 | -0/+47
xfs_ialloc_ag_alloc() makes several attempts to allocate a full inode chunk. If all else fails, reduce the allocation to the sparse length and alignment and attempt to allocate a sparse inode chunk.

If sparse chunk allocation succeeds, check whether an inobt record already exists that can track the chunk. If so, inherit and update the existing record. Otherwise, insert a new record for the sparse chunk.

Create helpers to align sparse chunk inode records and insert or update existing records in the inode btrees. The xfs_inobt_insert_sprec() helper implements the merge or update semantics required for sparse inode records with respect to both the inobt and finobt. To update the inobt, either insert a new record or merge with an existing record. To update the finobt, use the updated inobt record to either insert or replace an existing record.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
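A sketch of the merge-or-insert semantics for a sparse chunk record, under an assumed, much-simplified record layout (the real logic lives in xfs_inobt_insert_sprec()):

    #include <stdbool.h>
    #include <stdint.h>

    struct inobt_rec {
        uint32_t startino;  /* chunk's first inode number */
        uint16_t holemask;  /* set bits mark missing chunk regions */
    };

    extern bool inobt_lookup(uint32_t startino, struct inobt_rec *rec);
    extern void inobt_insert(const struct inobt_rec *rec);
    extern void inobt_update(const struct inobt_rec *rec);

    static void inobt_insert_sparse_rec(struct inobt_rec *nrec)
    {
        struct inobt_rec orec;

        if (inobt_lookup(nrec->startino, &orec)) {
            /* An existing record tracks this chunk: keep only the
             * regions missing from both, i.e. merge the two sparse
             * chunks into one record. */
            nrec->holemask &= orec.holemask;
            inobt_update(nrec);
        } else {
            inobt_insert(nrec);
        }
    }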
2015-04-16 | Merge branch 'xfs-dio-extend-fix' into for-next | Dave Chinner | 1 | -0/+5
Conflicts:
	fs/xfs/xfs_file.c
2015-04-16 | xfs: DIO writes within EOF don't need an ioend | Dave Chinner | 1 | -0/+1
DIO writes that lie entirely within EOF have nothing to do in IO completion. In this case, we don't need no steekin' ioend, and so we can avoid allocating an ioend until we have a mapping that spans EOF.

This means that IO completion has two contexts - deferred completion to the dio workqueue that uses an ioend, and interrupt completion that does nothing because there is nothing that can be done in this context.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2015-04-16 | xfs: handle DIO overwrite EOF update completion correctly | Dave Chinner | 1 | -0/+1
Currently a DIO overwrite that extends the EOF (e.g. sub-block IO or a write into allocated blocks beyond EOF) requires a transaction for the EOF update. This is done in IO completion context, but we aren't explicitly handling this situation properly and so it can run in interrupt context. Ensure that we defer IO that spans EOF correctly to the DIO completion workqueue, and now that we have an ioend in IO completion we can use the common ioend completion path to do all the work.

Note: we do not preallocate the append transaction as we can have multiple mapping and allocation calls per direct IO. Hence preallocating can still leave us with nested transactions by attempting to map and allocate more blocks after we've preallocated an append transaction.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2015-04-16 | xfs: DIO needs an ioend for writes | Dave Chinner | 1 | -0/+3
Currently we can only tell DIO completion that an IO requires unwritten extent completion. This is done by a hacky non-null private pointer passed to IO completion, but the private pointer does not actually contain any information that is used.

We also need to pass to IO completion the fact that the IO may be beyond EOF and so a size update transaction needs to be done. This is currently determined by checks in the IO completion, but we need to determine if this is necessary at block mapping time as we need to defer the size update transactions to a completion workqueue, just like unwritten extent conversion.

To do this, first we need to allocate and pass an ioend to IO completion. Add this for unwritten extent conversion; we'll do the EOF updates in the next commit.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2015-03-25 | Merge branch 'fallocate-insert-range' into for-next | Dave Chinner | 1 | -0/+1
2015-03-25 | Merge branch 'xfs-misc-fixes-for-4.1-2' into for-next | Dave Chinner | 1 | -10/+10
Conflicts:
	fs/xfs/libxfs/xfs_bmap.c
	fs/xfs/xfs_inode.c
2015-03-25 | xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate | Namjae Jeon | 1 | -0/+1
This patch implements fallocate's FALLOC_FL_INSERT_RANGE for XFS; see the usage example after this list:

 1) Make sure that both offset and len are block size aligned.
 2) Update the i_size of inode by len bytes.
 3) Compute the file's logical block number against offset. If the computed block number is not the starting block of the extent, split the extent such that the block number is the starting block of the extent.
 4) Shift all the extents which are lying between [offset, last allocated extent] towards right by len bytes. This step will make a hole of len bytes at offset.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
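From userspace the new functionality is reached through fallocate(2); a small, self-contained example (the file name and offsets are arbitrary, and the kernel and filesystem must support insert range):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/falloc.h>
    #include <stdio.h>

    int main(void)
    {
        int fd = open("testfile", O_RDWR);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        /* offset and len must both be block size aligned; this
         * shifts everything from 64k onwards right by 64k, leaving
         * a 64k hole at offset 64k. */
        if (fallocate(fd, FALLOC_FL_INSERT_RANGE, 65536, 65536) < 0)
            perror("fallocate(FALLOC_FL_INSERT_RANGE)");
        return 0;
    }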
2015-03-25 | xfs: %pF is only for function pointers | Scott Wood | 1 | -10/+10
Use %pS for actual addresses, otherwise you'll get bad output on arches like ppc64 where %pF expects a function descriptor.

Signed-off-by: Scott Wood <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
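An illustration of the difference as it would appear in kernel code (%pF dereferences a function descriptor on such arches, while %pS symbolizes a raw text address):

    /* kernel context, illustration only */
    #include <linux/printk.h>

    void report_caller(unsigned long caller_ip, void (*fn)(void))
    {
        printk("caller: %pS\n", (void *)caller_ip); /* raw address: %pS */
        printk("fn:     %pF\n", fn);                /* real fn pointer: %pF */
    }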
2015-02-23 | xfs: use i_mmaplock on write faults | Dave Chinner | 1 | -0/+1
Take the i_mmaplock over write page faults. These come through the ->page_mkwrite callout, so we need to wrap those calls with the i_mmaplock. This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock -> i_lock.

Also, move the page_mkwrite wrapper to the same region of xfs_file.c as the read fault wrappers and add a tracepoint.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2015-02-23 | xfs: use i_mmaplock on read faults | Dave Chinner | 1 | -0/+2
Take the i_mmaplock over read page faults. These come through the ->fault callout, so we need to wrap the generic implementation with the i_mmaplock. While there, add tracepoints for the read fault as it passes through XFS.

This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock -> i_lock.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
2014-10-02 | xfs: introduce xfs_buf_submit[_wait] | Dave Chinner | 1 | -1/+2
There is a lot of cookie-cutter code that looks like:

	if (shutdown)
		handle buffer error
	xfs_buf_iorequest(bp)
	error = xfs_buf_iowait(bp)
	if (error)
		handle buffer error

spread through XFS. There's significant complexity now in xfs_buf_iorequest() to specifically handle this sort of synchronous IO pattern, but there's all sorts of nasty surprises in different error handling code dependent on who owns the buffer references and the locks.

Pull this pattern into a single helper, where we can hide all the synchronous IO warts and hence make the error handling for all the callers much saner. This removes the need for a special extra reference to protect IO completion processing, as we can now hold a single reference across dispatch and waiting, simplifying the sync IO semantics and error handling.

In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and make it explicitly handle asynchronous IO. This forces all users to be switched specifically to one interface or the other and removes any ambiguity between how the interfaces are to be used. It also means that xfs_buf_iowait() goes away.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
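A sketch of the consolidation: one helper owns the shutdown check, dispatch and wait, holding a single buffer reference across the whole sequence (names and error handling here are simplified assumptions, not the kernel's exact code):

    #include <errno.h>

    struct xfs_buf;
    extern int fs_is_shutdown(void);
    extern void buf_ioend_error(struct xfs_buf *bp, int error);
    extern void buf_dispatch(struct xfs_buf *bp);
    extern int buf_iowait(struct xfs_buf *bp);

    static int buf_submit_wait(struct xfs_buf *bp)
    {
        if (fs_is_shutdown()) {
            buf_ioend_error(bp, -EIO); /* one place to fail buffers */
            return -EIO;
        }
        buf_dispatch(bp);       /* reference held across dispatch... */
        return buf_iowait(bp);  /* ...and across the wait */
    }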
2014-06-12 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds | 1 | -1/+0
Pull vfs updates from Al Viro:
 "This the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff.

  In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c).

  In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
  lock_parent: don't step on stale ->d_parent of all-but-freed one
  kill generic_file_splice_write()
  ceph: switch to iter_file_splice_write()
  shmem: switch to iter_file_splice_write()
  nfs: switch to iter_splice_write_file()
  fs/splice.c: remove unneeded exports
  ocfs2: switch to iter_file_splice_write()
  ->splice_write() via ->write_iter()
  bio_vec-backed iov_iter
  optimize copy_page_{to,from}_iter()
  bury generic_file_aio_{read,write}
  lustre: get rid of messing with iovecs
  ceph: switch to ->write_iter()
  ceph_sync_direct_write: stop poking into iov_iter guts
  ceph_sync_read: stop poking into iov_iter guts
  new helper: copy_page_from_iter()
  fuse: switch to ->write_iter()
  btrfs: switch to ->write_iter()
  ocfs2: switch to ->write_iter()
  xfs: switch to ->write_iter()
  ...
2014-06-12 | ->splice_write() via ->write_iter() | Al Viro | 1 | -1/+0
iter_file_splice_write() - a ->splice_write() instance that gathers the pipe buffers, builds a bio_vec-based iov_iter covering those and feeds it to ->write_iter(). A bunch of simple cases converted to that...

[AV: fixed the braino spotted by Cyrill]

Signed-off-by: Al Viro <[email protected]>
2014-05-15 | Merge branch 'xfs-filestreams-lookup' into for-next | Dave Chinner | 1 | -0/+58
2014-04-23 | xfs: add filestream allocator tracepoints | Christoph Hellwig | 1 | -0/+58
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>