From 5eda43000064a69a39fb7869cc63c9571535ad29 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Thu, 2 Feb 2017 15:14:02 -0800 Subject: xfs: mark speculative prealloc CoW fork extents unwritten Christoph Hellwig pointed out that there's a potentially nasty race when performing simultaneous nearby directio cow writes: "Thread 1 writes a range from B to c " B --------- C p "a little later thread 2 writes from A to B " A --------- B p [editor's note: the 'p' denote cowextsize boundaries, which I added to make this more clear] "but the code preallocates beyond B into the range where thread "1 has just written, but ->end_io hasn't been called yet. "But once ->end_io is called thread 2 has already allocated "up to the extent size hint into the write range of thread 1, "so the end_io handler will splice the unintialized blocks from "that preallocation back into the file right after B." We can avoid this race by ensuring that thread 1 cannot accidentally remap the blocks that thread 2 allocated (as part of speculative preallocation) as part of t2's write preparation in t1's end_io handler. The way we make this happen is by taking advantage of the unwritten extent flag as an intermediate step. Recall that when we begin the process of writing data to shared blocks, we create a delayed allocation extent in the CoW fork: D: --RRRRRRSSSRRRRRRRR--- C: ------DDDDDDD--------- When a thread prepares to CoW some dirty data out to disk, it will now convert the delalloc reservation into an /unwritten/ allocated extent in the cow fork. The da conversion code tries to opportunistically allocate as much of a (speculatively prealloc'd) extent as possible, so we may end up allocating a larger extent than we're actually writing out: D: --RRRRRRSSSRRRRRRRR--- U: ------UUUUUUU--------- Next, we convert only the part of the extent that we're actively planning to write to normal (i.e. not unwritten) status: D: --RRRRRRSSSRRRRRRRR--- U: ------UURRUUU--------- If the write succeeds, the end_cow function will now scan the relevant range of the CoW fork for real extents and remap only the real extents into the data fork: D: --RRRRRRRRSRRRRRRRR--- U: ------UU--UUU--------- This ensures that we never obliterate valid data fork extents with unwritten blocks from the CoW fork. Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_reflink.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 108 insertions(+), 8 deletions(-) (limited to 'fs/xfs/xfs_reflink.c') diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 07593a362cd0..8c8c4f4676da 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -82,11 +82,22 @@ * mappings are a reservation against the free space in the filesystem; * adjacent mappings can also be combined into fewer larger mappings. * + * As an optimization, the CoW extent size hint (cowextsz) creates + * outsized aligned delalloc reservations in the hope of landing out of + * order nearby CoW writes in a single extent on disk, thereby reducing + * fragmentation and improving future performance. + * + * D: --RRRRRRSSSRRRRRRRR--- (data fork) + * C: ------DDDDDDD--------- (CoW fork) + * * When dirty pages are being written out (typically in writepage), the - * delalloc reservations are converted into real mappings by allocating - * blocks and replacing the delalloc mapping with real ones. A delalloc - * mapping can be replaced by several real ones if the free space is - * fragmented. + * delalloc reservations are converted into unwritten mappings by + * allocating blocks and replacing the delalloc mapping with real ones. + * A delalloc mapping can be replaced by several unwritten ones if the + * free space is fragmented. + * + * D: --RRRRRRSSSRRRRRRRR--- + * C: ------UUUUUUU--------- * * We want to adapt the delalloc mechanism for copy-on-write, since the * write paths are similar. The first two steps (creating the reservation @@ -101,13 +112,29 @@ * Block-aligned directio writes will use the same mechanism as buffered * writes. * + * Just prior to submitting the actual disk write requests, we convert + * the extents representing the range of the file actually being written + * (as opposed to extra pieces created for the cowextsize hint) to real + * extents. This will become important in the next step: + * + * D: --RRRRRRSSSRRRRRRRR--- + * C: ------UUrrUUU--------- + * * CoW remapping must be done after the data block write completes, * because we don't want to destroy the old data fork map until we're sure * the new block has been written. Since the new mappings are kept in a * separate fork, we can simply iterate these mappings to find the ones * that cover the file blocks that we just CoW'd. For each extent, simply * unmap the corresponding range in the data fork, map the new range into - * the data fork, and remove the extent from the CoW fork. + * the data fork, and remove the extent from the CoW fork. Because of + * the presence of the cowextsize hint, however, we must be careful + * only to remap the blocks that we've actually written out -- we must + * never remap delalloc reservations nor CoW staging blocks that have + * yet to be written. This corresponds exactly to the real extents in + * the CoW fork: + * + * D: --RRRRRRrrSRRRRRRRR--- + * C: ------UU--UUU--------- * * Since the remapping operation can be applied to an arbitrary file * range, we record the need for the remap step as a flag in the ioend @@ -296,6 +323,65 @@ xfs_reflink_reserve_cow( return 0; } +/* Convert part of an unwritten CoW extent to a real one. */ +STATIC int +xfs_reflink_convert_cow_extent( + struct xfs_inode *ip, + struct xfs_bmbt_irec *imap, + xfs_fileoff_t offset_fsb, + xfs_filblks_t count_fsb, + struct xfs_defer_ops *dfops) +{ + struct xfs_bmbt_irec irec = *imap; + xfs_fsblock_t first_block; + int nimaps = 1; + + if (imap->br_state == XFS_EXT_NORM) + return 0; + + xfs_trim_extent(&irec, offset_fsb, count_fsb); + trace_xfs_reflink_convert_cow(ip, &irec); + if (irec.br_blockcount == 0) + return 0; + return xfs_bmapi_write(NULL, ip, irec.br_startoff, irec.br_blockcount, + XFS_BMAPI_COWFORK | XFS_BMAPI_CONVERT, &first_block, + 0, &irec, &nimaps, dfops); +} + +/* Convert all of the unwritten CoW extents in a file's range to real ones. */ +int +xfs_reflink_convert_cow( + struct xfs_inode *ip, + xfs_off_t offset, + xfs_off_t count) +{ + struct xfs_bmbt_irec got; + struct xfs_defer_ops dfops; + struct xfs_mount *mp = ip->i_mount; + struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK); + xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); + xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + count); + xfs_extnum_t idx; + bool found; + int error; + + xfs_ilock(ip, XFS_ILOCK_EXCL); + + /* Convert all the extents to real from unwritten. */ + for (found = xfs_iext_lookup_extent(ip, ifp, offset_fsb, &idx, &got); + found && got.br_startoff < end_fsb; + found = xfs_iext_get_extent(ifp, ++idx, &got)) { + error = xfs_reflink_convert_cow_extent(ip, &got, offset_fsb, + end_fsb - offset_fsb, &dfops); + if (error) + break; + } + + /* Finish up. */ + xfs_iunlock(ip, XFS_ILOCK_EXCL); + return error; +} + /* Allocate all CoW reservations covering a range of blocks in a file. */ static int __xfs_reflink_allocate_cow( @@ -328,6 +414,7 @@ __xfs_reflink_allocate_cow( goto out_unlock; ASSERT(nimaps == 1); + /* Make sure there's a CoW reservation for it. */ error = xfs_reflink_reserve_cow(ip, &imap, &shared); if (error) goto out_trans_cancel; @@ -337,14 +424,16 @@ __xfs_reflink_allocate_cow( goto out_trans_cancel; } + /* Allocate the entire reservation as unwritten blocks. */ xfs_trans_ijoin(tp, ip, 0); error = xfs_bmapi_write(tp, ip, imap.br_startoff, imap.br_blockcount, - XFS_BMAPI_COWFORK, &first_block, + XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, &first_block, XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), &imap, &nimaps, &dfops); if (error) goto out_trans_cancel; + /* Finish up. */ error = xfs_defer_finish(&tp, &dfops, NULL); if (error) goto out_trans_cancel; @@ -389,11 +478,12 @@ xfs_reflink_allocate_cow_range( if (error) { trace_xfs_reflink_allocate_cow_range_error(ip, error, _RET_IP_); - break; + return error; } } - return error; + /* Convert the CoW extents to regular. */ + return xfs_reflink_convert_cow(ip, offset, count); } /* @@ -641,6 +731,16 @@ xfs_reflink_end_cow( ASSERT(!isnullstartblock(got.br_startblock)); + /* + * Don't remap unwritten extents; these are + * speculatively preallocated CoW extents that have been + * allocated but have not yet been involved in a write. + */ + if (got.br_state == XFS_EXT_UNWRITTEN) { + idx--; + goto next_extent; + } + /* Unmap the old blocks in the data fork. */ xfs_defer_init(&dfops, &firstfsb); rlen = del.br_blockcount; -- cgit From dcf9585a7511147c7ffd580be8580dd39bc52fb6 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 6 Feb 2017 10:46:01 -0800 Subject: xfs: return the converted extent in __xfs_reflink_convert_cow We'll need it for the direct I/O code. Also rename the function to xfs_reflink_convert_cow_extent to describe it a bit better. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_reflink.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) (limited to 'fs/xfs/xfs_reflink.c') diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 8c8c4f4676da..219bc96bfc71 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -332,20 +332,19 @@ xfs_reflink_convert_cow_extent( xfs_filblks_t count_fsb, struct xfs_defer_ops *dfops) { - struct xfs_bmbt_irec irec = *imap; xfs_fsblock_t first_block; int nimaps = 1; if (imap->br_state == XFS_EXT_NORM) return 0; - xfs_trim_extent(&irec, offset_fsb, count_fsb); - trace_xfs_reflink_convert_cow(ip, &irec); - if (irec.br_blockcount == 0) + xfs_trim_extent(imap, offset_fsb, count_fsb); + trace_xfs_reflink_convert_cow(ip, imap); + if (imap->br_blockcount == 0) return 0; - return xfs_bmapi_write(NULL, ip, irec.br_startoff, irec.br_blockcount, + return xfs_bmapi_write(NULL, ip, imap->br_startoff, imap->br_blockcount, XFS_BMAPI_COWFORK | XFS_BMAPI_CONVERT, &first_block, - 0, &irec, &nimaps, dfops); + 0, imap, &nimaps, dfops); } /* Convert all of the unwritten CoW extents in a file's range to real ones. */ -- cgit From a14234c72bf41ac96bc8c98e96e2c84b6d4bd4f2 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 6 Feb 2017 10:50:49 -0800 Subject: xfs: go straight to real allocations for direct I/O COW writes When we allocate COW fork blocks for direct I/O writes we currently first create a delayed allocation, and then convert it to a real allocation once we've got the delayed one. As there is no good reason for that this patch instead makes use call xfs_bmapi_write from the COW allocation path. The only interesting bits are a few tweaks the low-level allocator to allow for this, most notably the need to remove the call to xfs_bmap_extsize_align for the cowextsize in xfs_bmap_btalloc - for the existing convert case it's a no-op, but for the direct allocation case it would blow up our block reservation way beyond what we reserved for the transaction. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_bmap.c | 3 +- fs/xfs/xfs_reflink.c | 94 +++++++++++++++++++++++++++++++++--------------- 2 files changed, 68 insertions(+), 29 deletions(-) (limited to 'fs/xfs/xfs_reflink.c') diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index dcffbb09444e..1cee514de19e 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -2895,13 +2895,14 @@ xfs_bmap_add_extent_hole_real( ASSERT(!isnullstartblock(new->br_startblock)); ASSERT(!bma->cur || !(bma->cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL)); - ASSERT(whichfork != XFS_COW_FORK); XFS_STATS_INC(mp, xs_add_exlist); state = 0; if (whichfork == XFS_ATTR_FORK) state |= BMAP_ATTRFORK; + if (whichfork == XFS_COW_FORK) + state |= BMAP_COWFORK; /* * Check and set flags if this segment has a left neighbor. diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 219bc96bfc71..9bba084c1436 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -389,62 +389,100 @@ __xfs_reflink_allocate_cow( xfs_fileoff_t end_fsb) { struct xfs_mount *mp = ip->i_mount; - struct xfs_bmbt_irec imap; + struct xfs_bmbt_irec imap, got; struct xfs_defer_ops dfops; struct xfs_trans *tp; xfs_fsblock_t first_block; - int nimaps = 1, error; - bool shared; + int nimaps, error, lockmode; + bool shared, trimmed; + xfs_filblks_t resaligned; + xfs_extlen_t resblks; + xfs_extnum_t idx; - xfs_defer_init(&dfops, &first_block); + resaligned = xfs_aligned_fsb_count(*offset_fsb, end_fsb - *offset_fsb, + xfs_get_cowextsz_hint(ip)); + resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned); - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, - XFS_TRANS_RESERVE, &tp); + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp); if (error) return error; - xfs_ilock(ip, XFS_ILOCK_EXCL); + lockmode = XFS_ILOCK_EXCL; + xfs_ilock(ip, lockmode); - /* Read extent from the source file. */ - nimaps = 1; - error = xfs_bmapi_read(ip, *offset_fsb, end_fsb - *offset_fsb, - &imap, &nimaps, 0); - if (error) - goto out_unlock; - ASSERT(nimaps == 1); + /* + * Even if the extent is not shared we might have a preallocation for + * it in the COW fork. If so use it. + */ + if (xfs_iext_lookup_extent(ip, ip->i_cowfp, *offset_fsb, &idx, &got) && + got.br_startoff <= *offset_fsb) { + /* If we have a real allocation in the COW fork we're done. */ + if (!isnullstartblock(got.br_startblock)) { + xfs_trim_extent(&got, *offset_fsb, + end_fsb - *offset_fsb); + *offset_fsb = got.br_startoff + got.br_blockcount; + goto out_trans_cancel; + } + } else { + nimaps = 1; + error = xfs_bmapi_read(ip, *offset_fsb, end_fsb - *offset_fsb, + &imap, &nimaps, 0); + if (error) + goto out_trans_cancel; + ASSERT(nimaps == 1); + + /* Trim the mapping to the nearest shared extent boundary. */ + error = xfs_reflink_trim_around_shared(ip, &imap, &shared, + &trimmed); + if (error) + goto out_trans_cancel; + + if (!shared) { + *offset_fsb = imap.br_startoff + imap.br_blockcount; + goto out_trans_cancel; + } - /* Make sure there's a CoW reservation for it. */ - error = xfs_reflink_reserve_cow(ip, &imap, &shared); + *offset_fsb = imap.br_startoff; + end_fsb = imap.br_startoff + imap.br_blockcount; + } + + error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0, + XFS_QMOPT_RES_REGBLKS); if (error) goto out_trans_cancel; - if (!shared) { - *offset_fsb = imap.br_startoff + imap.br_blockcount; - goto out_trans_cancel; - } + xfs_trans_ijoin(tp, ip, 0); + + xfs_defer_init(&dfops, &first_block); + nimaps = 1; /* Allocate the entire reservation as unwritten blocks. */ - xfs_trans_ijoin(tp, ip, 0); - error = xfs_bmapi_write(tp, ip, imap.br_startoff, imap.br_blockcount, + error = xfs_bmapi_write(tp, ip, *offset_fsb, end_fsb - *offset_fsb, XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, &first_block, - XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), - &imap, &nimaps, &dfops); + resblks, &imap, &nimaps, &dfops); if (error) - goto out_trans_cancel; + goto out_bmap_cancel; /* Finish up. */ error = xfs_defer_finish(&tp, &dfops, NULL); if (error) - goto out_trans_cancel; + goto out_bmap_cancel; error = xfs_trans_commit(tp); + if (error) + goto out_unlock; *offset_fsb = imap.br_startoff + imap.br_blockcount; + out_unlock: - xfs_iunlock(ip, XFS_ILOCK_EXCL); + xfs_iunlock(ip, lockmode); return error; -out_trans_cancel: + +out_bmap_cancel: xfs_defer_cancel(&dfops); + xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0, + XFS_QMOPT_RES_REGBLKS); +out_trans_cancel: xfs_trans_cancel(tp); goto out_unlock; } -- cgit From 3c68d44a2b49a0ac9165faa9c191e1e618c8a8d5 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 6 Feb 2017 10:51:03 -0800 Subject: xfs: allocate direct I/O COW blocks in iomap_begin Instead of preallocating all the required COW blocks in the high-level write code do it inside the iomap code, like we do for all other I/O. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_file.c | 8 --- fs/xfs/xfs_iomap.c | 31 +++++------ fs/xfs/xfs_reflink.c | 146 +++++++++++++++++++-------------------------------- fs/xfs/xfs_reflink.h | 4 +- fs/xfs/xfs_trace.h | 2 - 5 files changed, 68 insertions(+), 123 deletions(-) (limited to 'fs/xfs/xfs_reflink.c') diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 2a695a8f4fe7..086440e79b86 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -561,14 +561,6 @@ xfs_file_dio_aio_write( } trace_xfs_file_direct_write(ip, count, iocb->ki_pos); - - /* If this is a block-aligned directio CoW, remap immediately. */ - if (xfs_is_reflink_inode(ip) && !unaligned_io) { - ret = xfs_reflink_allocate_cow_range(ip, iocb->ki_pos, count); - if (ret) - goto out; - } - ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, xfs_dio_write_end_io); out: xfs_iunlock(ip, iolock); diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 06bfc82dd00b..e0bc290396dc 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -995,37 +995,31 @@ xfs_file_iomap_begin( offset_fsb = XFS_B_TO_FSBT(mp, offset); end_fsb = XFS_B_TO_FSB(mp, offset + length); - if (xfs_is_reflink_inode(ip) && - (flags & IOMAP_WRITE) && (flags & IOMAP_DIRECT)) { - shared = xfs_reflink_find_cow_mapping(ip, offset, &imap); - if (shared) { - xfs_iunlock(ip, lockmode); - goto alloc_done; - } - ASSERT(!isnullstartblock(imap.br_startblock)); - } - error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap, &nimaps, 0); if (error) goto out_unlock; - if ((flags & IOMAP_REPORT) || - (xfs_is_reflink_inode(ip) && - (flags & IOMAP_WRITE) && (flags & IOMAP_DIRECT))) { + if (flags & IOMAP_REPORT) { /* Trim the mapping to the nearest shared extent boundary. */ error = xfs_reflink_trim_around_shared(ip, &imap, &shared, &trimmed); if (error) goto out_unlock; - - ASSERT((flags & IOMAP_REPORT) || !shared); } if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) { - error = xfs_reflink_reserve_cow(ip, &imap, &shared); - if (error) - goto out_unlock; + if (flags & IOMAP_DIRECT) { + /* may drop and re-acquire the ilock */ + error = xfs_reflink_allocate_cow(ip, &imap, &shared, + &lockmode); + if (error) + goto out_unlock; + } else { + error = xfs_reflink_reserve_cow(ip, &imap, &shared); + if (error) + goto out_unlock; + } end_fsb = imap.br_startoff + imap.br_blockcount; length = XFS_FSB_TO_B(mp, end_fsb) - offset; @@ -1054,7 +1048,6 @@ xfs_file_iomap_begin( if (error) return error; -alloc_done: iomap->flags = IOMAP_F_NEW; trace_xfs_iomap_alloc(ip, offset, length, 0, &imap); } else { diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 9bba084c1436..1f61877b065b 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -382,74 +382,75 @@ xfs_reflink_convert_cow( } /* Allocate all CoW reservations covering a range of blocks in a file. */ -static int -__xfs_reflink_allocate_cow( +int +xfs_reflink_allocate_cow( struct xfs_inode *ip, - xfs_fileoff_t *offset_fsb, - xfs_fileoff_t end_fsb) + struct xfs_bmbt_irec *imap, + bool *shared, + uint *lockmode) { struct xfs_mount *mp = ip->i_mount; - struct xfs_bmbt_irec imap, got; + xfs_fileoff_t offset_fsb = imap->br_startoff; + xfs_filblks_t count_fsb = imap->br_blockcount; + struct xfs_bmbt_irec got; struct xfs_defer_ops dfops; - struct xfs_trans *tp; + struct xfs_trans *tp = NULL; xfs_fsblock_t first_block; - int nimaps, error, lockmode; - bool shared, trimmed; + int nimaps, error = 0; + bool trimmed; xfs_filblks_t resaligned; - xfs_extlen_t resblks; + xfs_extlen_t resblks = 0; xfs_extnum_t idx; - resaligned = xfs_aligned_fsb_count(*offset_fsb, end_fsb - *offset_fsb, - xfs_get_cowextsz_hint(ip)); - resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned); - - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp); - if (error) - return error; - - lockmode = XFS_ILOCK_EXCL; - xfs_ilock(ip, lockmode); +retry: + ASSERT(xfs_is_reflink_inode(ip)); + ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED)); /* * Even if the extent is not shared we might have a preallocation for * it in the COW fork. If so use it. */ - if (xfs_iext_lookup_extent(ip, ip->i_cowfp, *offset_fsb, &idx, &got) && - got.br_startoff <= *offset_fsb) { + if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &idx, &got) && + got.br_startoff <= offset_fsb) { + *shared = true; + /* If we have a real allocation in the COW fork we're done. */ if (!isnullstartblock(got.br_startblock)) { - xfs_trim_extent(&got, *offset_fsb, - end_fsb - *offset_fsb); - *offset_fsb = got.br_startoff + got.br_blockcount; - goto out_trans_cancel; + xfs_trim_extent(&got, offset_fsb, count_fsb); + *imap = got; + goto convert; } + + xfs_trim_extent(imap, got.br_startoff, got.br_blockcount); } else { - nimaps = 1; - error = xfs_bmapi_read(ip, *offset_fsb, end_fsb - *offset_fsb, - &imap, &nimaps, 0); - if (error) - goto out_trans_cancel; - ASSERT(nimaps == 1); + error = xfs_reflink_trim_around_shared(ip, imap, shared, &trimmed); + if (error || !*shared) + goto out; + } - /* Trim the mapping to the nearest shared extent boundary. */ - error = xfs_reflink_trim_around_shared(ip, &imap, &shared, - &trimmed); - if (error) - goto out_trans_cancel; + if (!tp) { + resaligned = xfs_aligned_fsb_count(imap->br_startoff, + imap->br_blockcount, xfs_get_cowextsz_hint(ip)); + resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned); - if (!shared) { - *offset_fsb = imap.br_startoff + imap.br_blockcount; - goto out_trans_cancel; - } + xfs_iunlock(ip, *lockmode); + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp); + *lockmode = XFS_ILOCK_EXCL; + xfs_ilock(ip, *lockmode); - *offset_fsb = imap.br_startoff; - end_fsb = imap.br_startoff + imap.br_blockcount; + if (error) + return error; + + error = xfs_qm_dqattach_locked(ip, 0); + if (error) + goto out; + goto retry; } error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0, XFS_QMOPT_RES_REGBLKS); if (error) - goto out_trans_cancel; + goto out; xfs_trans_ijoin(tp, ip, 0); @@ -457,9 +458,9 @@ __xfs_reflink_allocate_cow( nimaps = 1; /* Allocate the entire reservation as unwritten blocks. */ - error = xfs_bmapi_write(tp, ip, *offset_fsb, end_fsb - *offset_fsb, + error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount, XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, &first_block, - resblks, &imap, &nimaps, &dfops); + resblks, imap, &nimaps, &dfops); if (error) goto out_bmap_cancel; @@ -470,57 +471,18 @@ __xfs_reflink_allocate_cow( error = xfs_trans_commit(tp); if (error) - goto out_unlock; - - *offset_fsb = imap.br_startoff + imap.br_blockcount; - -out_unlock: - xfs_iunlock(ip, lockmode); - return error; - + return error; +convert: + return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb, + &dfops); out_bmap_cancel: xfs_defer_cancel(&dfops); xfs_trans_unreserve_quota_nblks(tp, ip, (long)resblks, 0, XFS_QMOPT_RES_REGBLKS); -out_trans_cancel: - xfs_trans_cancel(tp); - goto out_unlock; -} - -/* Allocate all CoW reservations covering a part of a file. */ -int -xfs_reflink_allocate_cow_range( - struct xfs_inode *ip, - xfs_off_t offset, - xfs_off_t count) -{ - struct xfs_mount *mp = ip->i_mount; - xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); - xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + count); - int error; - - ASSERT(xfs_is_reflink_inode(ip)); - - trace_xfs_reflink_allocate_cow_range(ip, offset, count); - - /* - * Make sure that the dquots are there. - */ - error = xfs_qm_dqattach(ip, 0); - if (error) - return error; - - while (offset_fsb < end_fsb) { - error = __xfs_reflink_allocate_cow(ip, &offset_fsb, end_fsb); - if (error) { - trace_xfs_reflink_allocate_cow_range_error(ip, error, - _RET_IP_); - return error; - } - } - - /* Convert the CoW extents to regular. */ - return xfs_reflink_convert_cow(ip, offset, count); +out: + if (tp) + xfs_trans_cancel(tp); + return error; } /* diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h index 1583c4727cb1..33ac9b8db683 100644 --- a/fs/xfs/xfs_reflink.h +++ b/fs/xfs/xfs_reflink.h @@ -28,8 +28,8 @@ extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip, extern int xfs_reflink_reserve_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap, bool *shared); -extern int xfs_reflink_allocate_cow_range(struct xfs_inode *ip, - xfs_off_t offset, xfs_off_t count); +extern int xfs_reflink_allocate_cow(struct xfs_inode *ip, + struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode); extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset, xfs_off_t count); extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset, diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index b654893130e2..fb7555e73a62 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3247,7 +3247,6 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_enospc); DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow); DEFINE_RW_EVENT(xfs_reflink_reserve_cow); -DEFINE_RW_EVENT(xfs_reflink_allocate_cow_range); DEFINE_SIMPLE_IO_EVENT(xfs_reflink_bounce_dio_write); DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping); @@ -3257,7 +3256,6 @@ DEFINE_SIMPLE_IO_EVENT(xfs_reflink_cancel_cow_range); DEFINE_SIMPLE_IO_EVENT(xfs_reflink_end_cow); DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_remap); -DEFINE_INODE_ERROR_EVENT(xfs_reflink_allocate_cow_range_error); DEFINE_INODE_ERROR_EVENT(xfs_reflink_cancel_cow_range_error); DEFINE_INODE_ERROR_EVENT(xfs_reflink_end_cow_error); -- cgit From c5ecb42342852892f978572ddc6dca703460f25a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 6 Feb 2017 17:45:51 -0800 Subject: xfs: update ctime and mtime on clone destinatation inodes We're changing both metadata and data, so we need to update the timestamps for clone operations. Dedupe on the other hand does not change file data, and only changes invisible metadata so the timestamps should not be updated. This follows existing btrfs behavior. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong [darrick: remove redundant is_dedupe test] Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_reflink.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'fs/xfs/xfs_reflink.c') diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 1f61877b065b..2f3d2d5c2934 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -954,13 +954,14 @@ STATIC int xfs_reflink_update_dest( struct xfs_inode *dest, xfs_off_t newlen, - xfs_extlen_t cowextsize) + xfs_extlen_t cowextsize, + bool is_dedupe) { struct xfs_mount *mp = dest->i_mount; struct xfs_trans *tp; int error; - if (newlen <= i_size_read(VFS_I(dest)) && cowextsize == 0) + if (is_dedupe && newlen <= i_size_read(VFS_I(dest)) && cowextsize == 0) return 0; error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp); @@ -981,6 +982,10 @@ xfs_reflink_update_dest( dest->i_d.di_flags2 |= XFS_DIFLAG2_COWEXTSIZE; } + if (!is_dedupe) { + xfs_trans_ichgtime(tp, dest, + XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + } xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE); error = xfs_trans_commit(tp); @@ -1294,7 +1299,8 @@ xfs_reflink_remap_range( !(dest->i_d.di_flags2 & XFS_DIFLAG2_COWEXTSIZE)) cowextsize = src->i_d.di_cowextsize; - ret = xfs_reflink_update_dest(dest, pos_out + len, cowextsize); + ret = xfs_reflink_update_dest(dest, pos_out + len, cowextsize, + is_dedupe); out_unlock: xfs_iunlock(src, XFS_MMAPLOCK_EXCL); -- cgit From 93aaead52a9eebdc20dc8fa673c350e592a06949 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Mon, 13 Feb 2017 22:52:27 -0800 Subject: xfs: fix uninitialized variable in _reflink_convert_cow Fix an uninitialize variable. Reported-by: Dan Carpenter Reviewed-by: Brian Foster Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_reflink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/xfs/xfs_reflink.c') diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 2f3d2d5c2934..da6d08fb359c 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -362,7 +362,7 @@ xfs_reflink_convert_cow( xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + count); xfs_extnum_t idx; bool found; - int error; + int error = 0; xfs_ilock(ip, XFS_ILOCK_EXCL); -- cgit From 3802a345321a08093ba2ddb1849e736f84e8d450 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 7 Mar 2017 16:45:58 -0800 Subject: xfs: only reclaim unwritten COW extents periodically We only want to reclaim preallocations from our periodic work item. Currently this is archived by looking for a dirty inode, but that check is rather fragile. Instead add a flag to xfs_reflink_cancel_cow_* so that the caller can ask for just cancelling unwritten extents in the COW fork. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong [darrick: fix typos in commit message] Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_aops.c | 2 +- fs/xfs/xfs_icache.c | 2 +- fs/xfs/xfs_inode.c | 2 +- fs/xfs/xfs_reflink.c | 23 ++++++++++++++++------- fs/xfs/xfs_reflink.h | 4 ++-- fs/xfs/xfs_super.c | 2 +- 6 files changed, 22 insertions(+), 13 deletions(-) (limited to 'fs/xfs/xfs_reflink.c') diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index bf65a9ea8642..aa8a6f0d09c3 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -293,7 +293,7 @@ xfs_end_io( goto done; if (ioend->io_bio->bi_error) { error = xfs_reflink_cancel_cow_range(ip, - ioend->io_offset, ioend->io_size); + ioend->io_offset, ioend->io_size, true); goto done; } error = xfs_reflink_end_cow(ip, ioend->io_offset, diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 7234b9748c36..3531f8f72fa5 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -1608,7 +1608,7 @@ xfs_inode_free_cowblocks( xfs_ilock(ip, XFS_IOLOCK_EXCL); xfs_ilock(ip, XFS_MMAPLOCK_EXCL); - ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF); + ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false); xfs_iunlock(ip, XFS_MMAPLOCK_EXCL); xfs_iunlock(ip, XFS_IOLOCK_EXCL); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index edfa6a55b064..7eaf1ef74e3c 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1615,7 +1615,7 @@ xfs_itruncate_extents( /* Remove all pending CoW reservations. */ error = xfs_reflink_cancel_cow_blocks(ip, &tp, first_unmap_block, - last_block); + last_block, true); if (error) goto out; diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index da6d08fb359c..4a84c5ea266d 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -548,14 +548,18 @@ xfs_reflink_trim_irec_to_next_cow( } /* - * Cancel all pending CoW reservations for some block range of an inode. + * Cancel CoW reservations for some block range of an inode. + * + * If cancel_real is true this function cancels all COW fork extents for the + * inode; if cancel_real is false, real extents are not cleared. */ int xfs_reflink_cancel_cow_blocks( struct xfs_inode *ip, struct xfs_trans **tpp, xfs_fileoff_t offset_fsb, - xfs_fileoff_t end_fsb) + xfs_fileoff_t end_fsb, + bool cancel_real) { struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK); struct xfs_bmbt_irec got, del; @@ -579,7 +583,7 @@ xfs_reflink_cancel_cow_blocks( &idx, &got, &del); if (error) break; - } else { + } else if (del.br_state == XFS_EXT_UNWRITTEN || cancel_real) { xfs_trans_ijoin(*tpp, ip, 0); xfs_defer_init(&dfops, &firstfsb); @@ -621,13 +625,17 @@ xfs_reflink_cancel_cow_blocks( } /* - * Cancel all pending CoW reservations for some byte range of an inode. + * Cancel CoW reservations for some byte range of an inode. + * + * If cancel_real is true this function cancels all COW fork extents for the + * inode; if cancel_real is false, real extents are not cleared. */ int xfs_reflink_cancel_cow_range( struct xfs_inode *ip, xfs_off_t offset, - xfs_off_t count) + xfs_off_t count, + bool cancel_real) { struct xfs_trans *tp; xfs_fileoff_t offset_fsb; @@ -653,7 +661,8 @@ xfs_reflink_cancel_cow_range( xfs_trans_ijoin(tp, ip, 0); /* Scrape out the old CoW reservations */ - error = xfs_reflink_cancel_cow_blocks(ip, &tp, offset_fsb, end_fsb); + error = xfs_reflink_cancel_cow_blocks(ip, &tp, offset_fsb, end_fsb, + cancel_real); if (error) goto out_cancel; @@ -1450,7 +1459,7 @@ next: * We didn't find any shared blocks so turn off the reflink flag. * First, get rid of any leftover CoW mappings. */ - error = xfs_reflink_cancel_cow_blocks(ip, tpp, 0, NULLFILEOFF); + error = xfs_reflink_cancel_cow_blocks(ip, tpp, 0, NULLFILEOFF, true); if (error) return error; diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h index 33ac9b8db683..d29a7967f029 100644 --- a/fs/xfs/xfs_reflink.h +++ b/fs/xfs/xfs_reflink.h @@ -39,9 +39,9 @@ extern void xfs_reflink_trim_irec_to_next_cow(struct xfs_inode *ip, extern int xfs_reflink_cancel_cow_blocks(struct xfs_inode *ip, struct xfs_trans **tpp, xfs_fileoff_t offset_fsb, - xfs_fileoff_t end_fsb); + xfs_fileoff_t end_fsb, bool cancel_real); extern int xfs_reflink_cancel_cow_range(struct xfs_inode *ip, xfs_off_t offset, - xfs_off_t count); + xfs_off_t count, bool cancel_real); extern int xfs_reflink_end_cow(struct xfs_inode *ip, xfs_off_t offset, xfs_off_t count); extern int xfs_reflink_recover_cow(struct xfs_mount *mp); diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 890862f2447c..685c042a120f 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -953,7 +953,7 @@ xfs_fs_destroy_inode( XFS_STATS_INC(ip->i_mount, vn_remove); if (xfs_is_reflink_inode(ip)) { - error = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF); + error = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true); if (error && !XFS_FORCED_SHUTDOWN(ip->i_mount)) xfs_warn(ip->i_mount, "Error %d while evicting CoW blocks for inode %llu.", -- cgit