aboutsummaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)AuthorFilesLines
2014-06-06xfs: use xfs_da_geometry for block size in attr codeDave Chinner3-10/+10
Rather than using the superblock value obtained through the xfs_mount. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: remove mp->m_dir_geo from directory loggingDave Chinner6-148/+137
We don't pass the xfs_da_args or the geometry all the way down to the directory buffer logging code, hence we have to use mp->m_dir_geo here. Fix this to use the geometry passed via the xfs_da_args, and convert all the directory logging functions for consistency. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: reduce direct usage of mp->m_dir_geoDave Chinner3-74/+65
There are many places in the directory code were we don't pass the args into and so have to extract the geometry direct from the mount structure. Push the args or the geometry into these leaf functions so that we don't need to grab it from the struct xfs_mount. This, in turn, brings use to the point where directory geometry is no longer a property of the struct xfs_mount; it is not a global property anymore, and hence we can start to consider per-directory configuration of physical geometries. Start by converting the xfs_dir_isblock/leaf code - pass in the xfs_da_args and convert the readdir code to use xfs_da_args like the rest of the directory code to pass information around. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: move node entry counts to xfs_da_geometryDave Chinner5-16/+11
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: convert dir/attr btree threshold to xfs_da_geometryDave Chinner4-6/+2
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: convert m_dirblksize to xfs_da_geometryDave Chinner14-106/+115
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: convert m_dirblkfsbs to xfs_da_geometryDave Chinner9-33/+27
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: convert directory segment limits to xfs_da_geometryDave Chinner8-42/+37
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: convert directory db conversion to xfs_da_geometryDave Chinner9-70/+93
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: convert directory dablk conversion to xfs_da_geometryDave Chinner6-43/+48
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: convert dir byte/off conversion to xfs_da_geometryDave Chinner3-13/+14
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: kill XFS_DIR2...FIRSTDB macrosDave Chinner5-17/+17
They are just simple wrappers around xfs_dir2_byte_to_db(), and we've already removed one usage earlier in the patch set. Kill the rest before we start removing the xfs_mount from conversion functions. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: move directory block translatiosn to xfs_dir2_priv.hDave Chinner3-138/+139
Because they aren't actually part of the on-disk format, and so shouldn't be in xfs_da_format.h. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: introduce directory geometry structureDave Chinner8-21/+97
The directory code has a dependency on the struct xfs_mount to supply the directory block geometry. Block size, block log size, and other parameters are pre-caclulated in the struct xfs_mount or access directly from the superblock embedded in the struct xfs_mount. Extract all of this geometry information out of the struct xfs_mount and superblock and place it into a new struct xfs_da_geometry defined by the directory code. Allocate and initialise it at mount time, and attach it to the struct xfs_mount so it canbe passed back into the directory code appropriately rather than using the struct xfs_mount. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-20Merge branch 'xfs-feature-bit-cleanup' into for-nextDave Chinner11-327/+101
Conflicts: fs/xfs/xfs_inode.c
2014-05-20Merge branch 'xfs-misc-fixes-2-for-3.16' into for-nextDave Chinner7-64/+66
Conflicts: fs/xfs/xfs_ialloc.c
2014-05-20xfs: fix compile error when libxfs header used in C++ codeRoger Willcocks2-4/+4
xfs_ialloc.h:102: error: expected ',' or '...' before 'delete' Simple parameter rename, no changes to behaviour. Signed-off-by: Roger Willcocks <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-20xfs: fix infinite loop at xfs_vm_writepage on 32bit systemJie Liu1-6/+43
Write to a file with an offset greater than 16TB on 32-bit system and then trigger page write-back via sync(1) will cause task hang. # block_size=4096 # offset=$(((2**32 - 1) * $block_size)) # xfs_io -f -c "pwrite $offset $block_size" /storage/test_file # sync INFO: task sync:2590 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. sync D c1064a28 0 2590 2097 0x00000000 ..... Call Trace: [<c1064a28>] ? ttwu_do_wakeup+0x18/0x130 [<c1066d0e>] ? try_to_wake_up+0x1ce/0x220 [<c1066dbf>] ? wake_up_process+0x1f/0x40 [<c104fc2e>] ? wake_up_worker+0x1e/0x30 [<c15b6083>] schedule+0x23/0x60 [<c15b3c2d>] schedule_timeout+0x18d/0x1f0 [<c12a143e>] ? do_raw_spin_unlock+0x4e/0x90 [<c10515f1>] ? __queue_delayed_work+0x91/0x150 [<c12a12ef>] ? do_raw_spin_lock+0x3f/0x100 [<c12a143e>] ? do_raw_spin_unlock+0x4e/0x90 [<c15b5b5d>] wait_for_completion+0x7d/0xc0 [<c1066d60>] ? try_to_wake_up+0x220/0x220 [<c116a4d2>] sync_inodes_sb+0x92/0x180 [<c116fb05>] sync_inodes_one_sb+0x15/0x20 [<c114a8f8>] iterate_supers+0xb8/0xc0 [<c116faf0>] ? fdatawrite_one_bdev+0x20/0x20 [<c116fc21>] sys_sync+0x31/0x80 [<c15be18d>] sysenter_do_call+0x12/0x28 This issue can be triggered via xfstests/generic/308. The reason is that the end_index is unsigned long with maximum value '2^32-1=4294967295' on 32-bit platform, and the given offset cause it wrapped to 0, so that the following codes will repeat again and again until the task schedule time out: end_index = offset >> PAGE_CACHE_SHIFT; last_index = (offset - 1) >> PAGE_CACHE_SHIFT; if (page->index >= end_index) { unsigned offset_into_page = offset & (PAGE_CACHE_SIZE - 1); /* * Just skip the page if it is fully outside i_size, e.g. due * to a truncate operation that is in progress. */ if (page->index >= end_index + 1 || offset_into_page == 0) { ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ unlock_page(page); return 0; } In order to check if a page is fully outsids i_size or not, we can fix the code logic as below: if (page->index > end_index || (page->index == end_index && offset_into_page == 0)) Secondly, there still has another similar issue when calculating the end offset for mapping the filesystem blocks to the file blocks for delalloc. With the same tests to above, run unmount(8) will cause kernel panic if CONFIG_XFS_DEBUG is enabled: XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || \ ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 964 kernel BUG at fs/xfs/xfs_message.c:108! invalid opcode: 0000 [#1] SMP task: edddc100 ti: ec6ee000 task.ti: ec6ee000 EIP: 0060:[<f83d87cb>] EFLAGS: 00010296 CPU: 1 EIP is at assfail+0x2b/0x30 [xfs] .............. Call Trace: [<f83d9cd4>] xfs_fs_destroy_inode+0x74/0x120 [xfs] [<c115ddf1>] destroy_inode+0x31/0x50 [<c115deff>] evict+0xef/0x170 [<c115dfb2>] dispose_list+0x32/0x40 [<c115ea3a>] evict_inodes+0xca/0xe0 [<c1149706>] generic_shutdown_super+0x46/0xd0 [<c11497b9>] kill_block_super+0x29/0x70 [<c1149a14>] deactivate_locked_super+0x44/0x70 [<c114a427>] deactivate_super+0x47/0x60 [<c1161c3d>] mntput_no_expire+0xcd/0x120 [<c1162ae8>] SyS_umount+0xa8/0x370 [<c1162dce>] SyS_oldumount+0x1e/0x20 [<c15be18d>] sysenter_do_call+0x12/0x28 That because the end_offset is evaluated to 0 which is the same reason to above, hence the mapping and covertion for dealloc file blocks to file system blocks did not happened. This patch just fixed both issues. Reported-by: Michael L. Semon <[email protected]> Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-20xfs: remove redundant checks from xfs_da_read_bufDave Chinner1-41/+0
All of the verification checks of magic numbers are now done by verifiers, so ther eis no need to check them again once the buffer has been successfully read. If the magic number is bad, it won't even get to that code to verify it so it really serves no purpose at all anymore. Remove it. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-20xfs: log vector rounding leaks log spaceDave Chinner2-9/+17
The addition of direct formatting of log items into the CIL linear buffer added alignment restrictions that the start of each vector needed to be 64 bit aligned. Hence padding was added in xlog_finish_iovec() to round up the vector length to ensure the next vector started with the correct alignment. This adds a small number of bytes to the size of the linear buffer that is otherwise unused. The issue is that we then use the linear buffer size to determine the log space used by the log item, and this includes the unused space. Hence when we account for space used by the log item, it's more than is actually written into the iclogs, and hence we slowly leak this space. This results on log hangs when reserving space, with threads getting stuck with these stack traces: Call Trace: [<ffffffff81d15989>] schedule+0x29/0x70 [<ffffffff8150d3a2>] xlog_grant_head_wait+0xa2/0x1a0 [<ffffffff8150d55d>] xlog_grant_head_check+0xbd/0x140 [<ffffffff8150ee33>] xfs_log_reserve+0x103/0x220 [<ffffffff814b7f05>] xfs_trans_reserve+0x2f5/0x310 ..... The 4 bytes is significant. Brain Foster did all the hard work in tracking down a reproducable leak to inode chunk allocation (it went away with the ikeep mount option). His rough numbers were that creating 50,000 inodes leaked 11 log blocks. This turns out to be roughly 800 inode chunks or 1600 inode cluster buffers. That works out at roughly 4 bytes per cluster buffer logged, and at that I started looking for a 4 byte leak in the buffer logging code. What I found was that a struct xfs_buf_log_format structure for an inode cluster buffer is 28 bytes in length. This gets rounded up to 32 bytes, but the vector length remains 28 bytes. Hence the CIL ticket reservation is decremented by 32 bytes (via lv->lv_buf_len) for that vector rather than 28 bytes which are written into the log. The fix for this problem is to separately track the bytes used by the log vectors in the item and use that instead of the buffer length when accounting for the log space that will be used by the formatted log item. Again, thanks to Brian Foster for doing all the hard work and long hours to isolate this leak and make finding the bug relatively simple. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-20xfs: remove XFS_TRANS_RESERVE in collapse rangeNamjae Jeon1-2/+0
There is no need to dip into reserve pool. Reserve pool is used for much more important things. And xfs_trans_reserve will never return ENOSPC because punch hole is already done. If we get ENOSPC, collapse range will be simply failed. Cc: Brian Foster <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Ashish Sangwan <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-20xfs: remove shared supberlock feature checkingDave Chinner3-16/+6
We reject any filesystem that is mounted with this feature bit set, so we don't need to check for it anywhere else. Remove the function for checking if the feature bit is set and any code that uses it. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-20xfs: don't need dirv2 checks anymoreDave Chinner4-18/+2
If the the V2 directory feature bit is not set in the superblock feature mask the filesystem will fail the good version check. Hence we don't need any other version checking on the dir2 feature bit in the code as the filesystem will not mount without it set. Remove the checking code. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-20xfs: turn NLINK feature on by defaultDave Chinner9-164/+30
mkfs has turned on the XFS_SB_VERSION_NLINKBIT feature bit by default since November 2007. It's about time we simply made the kernel code turn it on by default and so always convert v1 inodes to v2 inodes when reading them in from disk or allocating them. This This removes needless version checks and modification when bumping link counts on inodes, and will take code out of a few common code paths. text data bss dec hex filename 783251 100867 616 884734 d7ffe fs/xfs/xfs.o.orig 782664 100867 616 884147 d7db3 fs/xfs/xfs.o.patched Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-20xfs: keep sb_bad_features2 the same a sb_features2Dave Chinner1-0/+2
Whenever we update sb_features2, we need to update sb_bad_features2 so that they remain identical on disk. This prevents future mounts or userspace utilities from getting confused over which features the filesystem supports. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-20xfs: make superblock version checks reflect realityDave Chinner1-144/+76
We only support filesystems that have v2 directory support, and than means all the checking and handling of superblock versions prior to this support being added is completely unnecessary overhead. Strip out all the version 1-3 support, sanitise the good version checking to reflect the supported versions, update all the feature supported functions and clean up all the support bit definitions to reflect the fact that we no longer care about Irix bootloader flag regions for v4 feature bits. Also, convert the return values to boolean types and remove typedefs from function declarations to clean up calling conventions, too. Because the feature bit checking is all inline code, this relatively small cleanup has a noticable impact on code size: text data bss dec hex filename 785195 100867 616 886678 d8796 fs/xfs/xfs.o.orig 783595 100867 616 885078 d8156 fs/xfs/xfs.o.patched i.e. it reduces it by 1600 bytes. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-15Merge branch 'xfs-attr-cleanup' into for-nextDave Chinner1-211/+117
Conflicts: fs/xfs/xfs_attr.c
2014-05-15Merge branch 'xfs-misc-fixes-1-for-3.16' into for-nextDave Chinner11-267/+128
2014-05-15Merge branch 'xfs-free-inode-btree' into for-nextDave Chinner18-137/+852
2014-05-15Merge branch 'xfs-filestreams-lookup' into for-nextDave Chinner10-806/+403
2014-05-15Merge branch 'xfs-unused-args-cleanup' into for-nextDave Chinner43-166/+113
2014-05-15xfs: list_lru_init returns a negative errorDave Chinner1-12/+14
And we don't invert it properly when initialising the dquot lru list. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-15xfs: negate xfs_icsb_init_counters error value Dave Chinner1-1/+1
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-15xfs: negate mount workqueue init error valueDave Chinner1-1/+1
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-15xfs: fix wrong err sign on xfs_set_acl()Dave Chinner1-2/+2
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-15xfs: fix wrong errno from xfs_initxattrsDave Chinner1-4/+4
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-15xfs: correct error sign on COLLAPSE_RANGE errorsDave Chinner1-2/+2
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-15xfs: xfs_commit_metadata returns wrong errnoDave Chinner1-1/+1
Invert it. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-15xfs: fix incorrect error sign in xfs_file_aio_readDave Chinner1-1/+1
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-15xfs: xfs_dir_fsync() returns positive errnoDave Chinner1-1/+1
And it should be negative. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-13xfs: pass struct da_args to xfs_attr_calc_sizeChristoph Hellwig1-9/+5
And remove a very confused comment. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-13xfs: simplify attr name setupChristoph Hellwig1-45/+29
Replace xfs_attr_name_to_xname with a new xfs_attr_args_init helper that sets up the basic da_args structure without using a temporary xfs_name structure. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-13xfs: fold xfs_attr_remove_int into xfs_attr_removeChristoph Hellwig1-62/+35
Also remove a useless ilock roundtrip for the first attr fork check, it's racy anyway and we redo it later under the ilock before we start the removal. Plus various minor style fixes to the new xfs_attr_remove. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-13xfs: fold xfs_attr_get_int into xfs_attr_getChristoph Hellwig1-52/+27
This allows doing an unlocked check if an attr for is present at all and slightly reduce the lock hold time if we actually do an attr get. Plus various minor style fixes to the new xfs_attr_get. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-13xfs: fold xfs_attr_set_int into xfs_attr_setChristoph Hellwig1-66/+44
Plus various minor style fixes to the new xfs_attr_set. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-08Merge tag 'xfs-for-linus-3.15-rc5' of git://oss.sgi.com/xfs/xfsLinus Torvalds9-50/+77
Pull xfs fixes from Dave Chinner: "The main fix is adding support for default ACLs on O_TMPFILE opened inodes to bring XFS into line with other filesystems. Metadata CRCs are now also considered well enough tested to be fully supported, so we're removing the shouty warnings issued at mount time for filesystems with that format. And there's transaction block reservation overrun fix. Summary: - fix a remote attribute size calculation bug that leads to a transaction overrun - add default ACLs to O_TMPFILE files - Remove the EXPERIMENTAL tag from filesystems with metadata CRC support" * tag 'xfs-for-linus-3.15-rc5' of git://oss.sgi.com/xfs/xfs: xfs: remote attribute overwrite causes transaction overrun xfs: initialize default acls for ->tmpfile() xfs: fully support v5 format filesystems
2014-05-07xfs: fix directory readahead offset off-by-oneDave Chinner1-2/+1
Directory readahead can throw loud scary but harmless warnings when multiblock directories are in use a specific pattern of discontiguous blocks are found in the directory. That is, if a hole follows a discontiguous block, it will throw a warning like: XFS (dm-1): xfs_da_do_buf: bno 637 dir: inode 34363923462 XFS (dm-1): [00] br_startoff 637 br_startblock 1917954575 br_blockcount 1 br_state 0 XFS (dm-1): [01] br_startoff 638 br_startblock -2 br_blockcount 1 br_state 0 And dump a stack trace. This is because the readahead offset increment loop does a double increment of the block index - it does an increment for the loop iteration as well as increase the loop counter by the number of blocks in the extent. As a result, the readahead offset does not get incremented correctly for discontiguous blocks and hence can ask for readahead of a directory block from an offset part way through a directory block. If that directory block is followed by a hole, it will trigger a mapping warning like the above. The bad readahead will be ignored, though, because the main directory block read loop uses the correct mapping offsets rather than the readahead offset and so will ignore the bad readahead altogether. Fix the warning by ensuring that the readahead offset is correctly incremented. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-07xfs: don't sleep in xlog_cil_force_lsn on shutdownDave Chinner2-11/+48
Reports of a shutdown hang when fsyncing a directory have surfaced, such as this: [ 3663.394472] Call Trace: [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70 [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs] [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs] [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs] [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0 [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20 [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b If we trigger a shutdown in xlog_cil_push() from xlog_write(), we will never wake waiters on the current push sequence number, so anything waiting in xlog_cil_force_lsn() for that push sequence number to come up will not get woken and hence stall the shutdown. Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in the push abort handling, in the log shutdown code when waking all waiters, and adding a shutdown check in the sequence completion wait loops to ensure they abort when a wakeup due to a shutdown occurs. Reported-by: Boris Ranto <[email protected]> Reported-by: Eric Sandeen <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-07xfs: truncate_setsize should be outside transactionsDave Chinner1-4/+16
truncate_setsize() removes pages from the page cache, and hence requires page locks to be held. It is not valid to lock a page cache page inside a transaction context as we can hold page locks when we we reserve space for a transaction. If we do, then we expose an ABBA deadlock between log space reservation and page locks. That is, both the write path and writeback lock a page, then start a transaction for block allocation, which means they can block waiting for a log reservation with the page lock held. If we hold a log reservation and then do something that locks a page (e.g. truncate_setsize in xfs_setattr_size) then that page lock can block on the page locked and waiting for a log reservation. If the transaction that is waiting for the page lock is the only active transaction in the system that can free log space via a commit, then writeback will never make progress and so log space will never free up. This issue with xfs_setattr_size() was introduced back in 2010 by commit fa9b227 ("xfs: new truncate sequence") which moved the page cache truncate from outside the transaction context (what was xfs_itruncate_data()) to inside the transaction context as a call to truncate_setsize(). The reason truncate_setsize() was located where in this place was that we can't shouldn't change the file size until after we are in the transaction context and the operation will either succeed or shut down the filesystem on failure. However, block_truncate_page() already modifies the file contents before we enter the transaction context, so we can't really fulfill this guarantee in any way. Hence we may as well ensure that on success or failure, the in-memory inode and data is truncated away and that the application cleans up the mess appropriately. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-05-06xfs: switch to ->write_iter()Al Viro1-20/+9
Signed-off-by: Al Viro <[email protected]>