aboutsummaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)AuthorFilesLines
2014-07-24xfs: fix uflags detection at xfs_fs_rm_xquotaJie Liu1-1/+1
We are intended to check up uflags against FS_PROJ_QUOTA rather than FS_USER_UQUOTA once more, it looks to me like a typo, but might cause the project quota metadata space can not be removed. Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: remove XFS_IS_OQUOTA_ON macrosJie Liu1-2/+0
Remove the XFS_IS_OQUOTA_ON macros as it is obsoleted. Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: tidy up xfs_set_inode32Eric Sandeen1-2/+4
xfs_set_inode32() caught my eye because it had weird spacing around the "-1's". In cleaning that up, I realized that the assignment in the declaration of "ino" is never used; it's rewritten before it gets read. Drop the ino initializer from its declaration since it's not used, and move the agino initialization into the body of the function, mostly so that we can have pretty whitespace and not exceed 80 columns. :) Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: allow inode allocations in post-growfs disk spaceEric Sandeen3-11/+17
Today, if we perform an xfs_growfs which adds allocation groups, mp->m_maxagi is not properly updated when the growfs is complete. Therefore inodes will continue to be allocated only in the AGs which existed prior to the growfs, and the new space won't be utilized. This is because of this path in xfs_growfs_data_private(): xfs_growfs_data_private xfs_initialize_perag(mp, nagcount, &nagimax); if (mp->m_flags & XFS_MOUNT_32BITINODES) index = xfs_set_inode32(mp); else index = xfs_set_inode64(mp); if (maxagi) *maxagi = index; where xfs_set_inode* iterates over the (old) agcount in mp->m_sb.sb_agblocks, which has not yet been updated in the growfs path. So "index" will be returned based on the old agcount, not the new one, and new AGs are not available for inode allocation. Fix this by explicitly passing the proper AG count (which xfs_initialize_perag() already has) down another level, so that xfs_set_inode* can make the proper decision about acceptable AGs for inode allocation in the potentially newly-added AGs. This has been broken since 3.7, when these two xfs_set_inode* functions were added in commit 2d2194f. Prior to that, we looped over "agcount" not sb_agblocks in these calculations. Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: mark xfs_qm_quotacheck as staticJie Liu2-96/+94
xfs_qm_quotacheck() is not used outside of xfs_qm.c. Mark it static and move it around in the file to avoid a forward declaration. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: fix cil push sequence after log recoveryMark Tinguely1-2/+0
When the CIL checkpoint is fully written to the log, the LSN of the checkpoint commit record is written into the CIL context structure. This allows log force waiters to correctly detect when the checkpoint they are waiting on have been fully written into the log buffers. However, the initial context after mount is initialised with a non-zero commit LSN, so appears to waiters as though it is complete even though it may not have even been pushed, let alone written to the log buffers. Hence a log force immediately after a filesystem is mounted may not behave correctly, nor does commit record ordering if multiple CIL pushes interleave immediately after mount. To fix this, make sure the initial context commit LSN is not touched until the first checkpointis actually pushed. [dchinner: rewrite commit message] Signed-off-by: Mark Tinguely <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: squash prealloc while over quota free space as wellBrian Foster1-6/+14
From: Brian Foster <[email protected]> Commit 4d559a3b introduced heavy prealloc. squashing to catch the case of requesting too large a prealloc on smaller filesystems, leading to repeated flush and retry cycles that occur on ENOSPC. Now that we issue eofblocks scans on EDQUOT/ENOSPC, squash the prealloc against the minimum available free space across all applicable quotas as well to avoid a similar problem of repeated eofblocks scans. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: run an eofblocks scan on ENOSPC/EDQUOTBrian Foster4-4/+87
From: Brian Foster <[email protected]> Speculative preallocation and and the associated throttling metrics assume we're working with large files on large filesystems. Users have reported inefficiencies in these mechanisms when we happen to be dealing with large files on smaller filesystems. This can occur because while prealloc throttling is aggressive under low free space conditions, it is not active until we reach 5% free space or less. For example, a 40GB filesystem has enough space for several files large enough to have multi-GB preallocations at any given time. If those files are slow growing, they might reserve preallocation for long periods of time as well as avoid the background scanner due to frequent modification. If a new file is written under these conditions, said file has no access to this already reserved space and premature ENOSPC is imminent. To handle this scenario, modify the buffered write ENOSPC handling and retry sequence to invoke an eofblocks scan. In the smaller filesystem scenario, the eofblocks scan resets the usage of preallocation such that when the 5% free space threshold is met, throttling effectively takes over to provide fair and efficient preallocation until legitimate ENOSPC. The eofblocks scan is selective based on the nature of the failure. For example, an EDQUOT failure in a particular quota will use a filtered scan for that quota. Because we don't know which quota might have caused an allocation failure at any given time, we include each applicable quota determined to be under low free space conditions in the scan. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: support a union-based filter for eofblocks scansBrian Foster2-1/+33
From: Brian Foster <[email protected]> The eofblocks scan inode filter uses intersection logic by default. E.g., specifying both user and group quota ids filters out inodes that are not covered by both the specified user and group quotas. This is suitable for behavior exposed to userspace. Scans that are initiated from within the kernel might require more broad semantics, such as scanning all inodes under each quota associated with an inode to alleviate low free space conditions in each. Create the XFS_EOF_FLAGS_UNION flag to support a conditional union-based filtering algorithm for eofblocks scans. This flag is intentionally left out of the valid mask as it is not supported for scans initiated from userspace. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: add scan owner field to xfs_eofblocksBrian Foster2-1/+14
From: Brian Foster <[email protected]> The scan owner field represents an optional inode number that is responsible for the current scan. The purpose is to identify that an inode is under iolock and as such, the iolock shouldn't be attempted when trimming eofblocks. This is an internal only field. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: introduce xfs_bulkstat_grab_ichunkJie Liu1-50/+69
From: Jie Liu <[email protected]> Introduce xfs_bulkstat_grab_ichunk() to look up an inode chunk in where the given inode resides, then grab the record. Update the data for the pointed-to record if the inode was not the last in the chunk and there are some left allocated, return the grabbed inode count on success. Refactor xfs_bulkstat() with it. Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: introduce xfs_bulkstat_ichunk_raJie Liu1-24/+32
From: Jie Liu <[email protected]> Introduce xfs_bulkstat_ichunk_ra() to loop over all clusters in the next inode chunk, then performs readahead if there are any allocated inodes in that cluster. Refactor xfs_bulkstat() with it. Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: fix error handling at xfs_bulkstatJie Liu1-31/+6
From: Jie Liu <[email protected]> We should not ignore the btree operation errors at xfs_bulkstat() but to propagate them if any. This patch fix two places in this function and the remaining things will be fixed with code refactoring thereafter. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: remove redundant user buffer count checks at xfs_bulkstatJie Liu1-3/+1
From: Jie Liu <[email protected]> Remove the redundant user buffer and count checks as it has already been validated at xfs_ioc_bulkstat(). Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: fix error handling at xfs_inumbersJie Liu1-57/+40
From: Jie Liu <[email protected]> To fetch the file system number tables, we currently just ignore the errors and proceed to loop over the next AG or bump agino to the next chunk in case of btree operations failed, that is not properly because those errors might hint us potential file system problems. This patch rework xfs_inumbers() to handle the btree operation errors as well as the loop conditions. Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: consolidate xfs_inumbersJie Liu2-38/+32
From: Jie Liu <[email protected]> Consolidate xfs_inumbers() to make the formatter function return correct error and make the source code looks a bit neat. Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: remove xfs_bulkstat_singleChristoph Hellwig3-57/+2
From: Christoph Hellwig <[email protected]> xfs_bukstat_one doesn't have any failure case that would go away when called through xfs_bulkstat, so remove the fallback and the now unessecary xfs_bulkstat_single function. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jie Liu <[email protected]> Signed-off-by: Jie Liu <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-24xfs: remove redundant stat assignment in xfs_bulkstat_one_intJie Liu1-4/+1
From: Jie Liu <[email protected]> Remove the redundant BULKSTAT_RV_NOTHING assignment in case of call xfs_iget() failed at xfs_bulkstat_one_int(). Signed-off-by: Jie Liu <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-15xfs: add log attributes for log lsn and grant head dataBrian Foster1-0/+64
Create log attributes to export the current runtime state of the log to sysfs. Note that the filesystem should be frozen for consistency across attributes. The following per-mount attributes are created: log_head_lsn, log_tail_lsn, reserve_grant_head and write_grant_head. These represent the physical log head, tail and reserve and write grant heads respectively. Attribute values are exported in the following format: "cycle:[block,byte]" ... where cycle represents the log cycle and [block,bytes] represents either the basic block or byte offset of the log, depending on the attribute. Log sequence number (LSN) values are encoded in basic blocks and grant heads are encoded in bytes. All values are in decimal format. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-15xfs: add xlog sysfs kobject and attribute handlersBrian Foster4-0/+64
Embed a kobject into the xfs log data structure (xlog). This creates a 'log' subdirectory for every XFS mount instance in sysfs. The lifecycle of the log kobject is tied to the lifecycle of the log. Also define a set of generic attribute handlers associated with the log kobject in preparation for the addition of attributes. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-15xfs: add xfs_mount sysfs kobjectBrian Foster6-1/+133
Embed a base kobject into xfs_mount. This creates a kobject associated with each XFS mount and a subdirectory in sysfs with the name of the filesystem. The subdirectory lifecycle matches that of the mount. Also add the new xfs_sysfs.[c,h] source files with some XFS sysfs infrastructure to facilitate attribute creation. Note that there are currently no attributes exported as part of the xfs_mount kobject. It exists solely to serve as a per-mount container for child objects. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-15xfs: add a sysfs ksetBrian Foster1-1/+11
Create a sysfs kset to contain all sub-objects associated with the XFS module. The kset is created and removed on module initialization and removal respectively. The kset uses fs_obj as a parent. This leads to the creation of a /sys/fs/xfs directory when the kset exists. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-15xfs: fix a couple error sequence jumps in xfs_mountfs()Brian Foster1-2/+2
xfs_mountfs() has a couple failure conditions that do not jump to the correct labels. Specifically: - xfs_initialize_perag_data() failure does not deallocate the log even though it occurs after log initialization - xfs_mount_reset_sbqflags() failure returns the error directly rather than jump to the error sequence Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-15Merge branch 'xfs-libxfs-restructure' into for-nextDave Chinner102-1212/+1180
2014-07-15xfs: null unused quota inodes when quota is onDave Chinner1-4/+21
When quota is on, it is expected that unused quota inodes have a value of NULLFSINO. The changes to support a separate project quota in 3.12 broken this rule for non-project quota inode enabled filesystem, as the code now refuses to write the group quota inode if neither group or project quotas are enabled. This regression was introduced by commit d892d58 ("xfs: Start using pquotaino from the superblock"). In this case, we should be writing NULLFSINO rather than nothing to ensure that we leave the group quota inode in a valid state while quotas are enabled. Failure to do so doesn't cause a current kernel to break - the separate project quota inodes introduced translation code to always treat a zero inode as NULLFSINO. This was introduced by commit 0102629 ("xfs: Initialize all quota inodes to be NULLFSINO") with is also in 3.12 but older kernels do not do this and hence taking a filesystem back to an older kernel can result in quotas failing initialisation at mount time. When that happens, we see this in dmesg: [ 1649.215390] XFS (sdb): Mounting Filesystem [ 1649.316894] XFS (sdb): Failed to initialize disk quotas. [ 1649.316902] XFS (sdb): Ending clean mount By ensuring that we write NULLFSINO to quota inodes that aren't active, we avoid this problem. We have to be really careful when determining if the quota inodes are active or not, because we don't want to write a NULLFSINO if the quota inodes are active and we simply aren't updating them. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-15xfs: refine the allocation stack switchDave Chinner6-62/+90
The allocation stack switch at xfs_bmapi_allocate() has served it's purpose, but is no longer a sufficient solution to the stack usage problem we have in the XFS allocation path. Whilst the kernel stack size is now 16k, that is not a valid reason for undoing all our "keep stack usage down" modifications. What it does allow us to do is have the freedom to refine and perfect the modifications knowing that if we get it wrong it won't blow up in our faces - we have a safety net now. This is important because we still have the issue of older kernels having smaller stacks and that they are still supported and are demonstrating a wide range of different stack overflows. Red Hat has several open bugs for allocation based stack overflows from directory modifications and direct IO block allocation and these problems still need to be solved. If we can solve them upstream, then distro's won't need to bake their own unique solutions. To that end, I've observed that every allocation based stack overflow report has had a specific characteristic - it has happened during or directly after a bmap btree block split. That event requires a new block to be allocated to the tree, and so we effectively stack one allocation stack on top of another, and that's when we get into trouble. A further observation is that bmap btree block splits are much rarer than writeback allocation - over a range of different workloads I've observed the ratio of bmap btree inserts to splits ranges from 100:1 (xfstests run) to 10000:1 (local VM image server with sparse files that range in the hundreds of thousands to millions of extents). Either way, bmap btree split events are much, much rarer than allocation events. Finally, we have to move the kswapd state to the allocation workqueue work when allocation is done on behalf of kswapd. This is proving to cause significant perturbation in performance under memory pressure and appears to be generating allocation deadlock warnings under some workloads, so avoiding the use of a workqueue for the majority of kswapd writeback allocation will minimise the impact of such behaviour. Hence it makes sense to move the stack switch to xfs_btree_split() and only do it for bmap btree splits. Stack switches during allocation will be much rarer, so there won't be significant performacne overhead caused by switching stacks. The worse case stack from all allocation paths will be split, not just writeback. And the majority of memory allocations will be done in the correct context (e.g. kswapd) without causing additional latency, and so we simplify the memory reclaim interactions between processes, workqueues and kswapd. The worst stack I've been able to generate with this patch in place is 5600 bytes deep. It's very revealing because we exit XFS at: 37) 1768 64 kmem_cache_alloc+0x13b/0x170 about 1800 bytes of stack consumed, and the remaining 3800 bytes (and 36 functions) is memory reclaim, swap and the IO stack. And this occurs in the inode allocation from an open(O_CREAT) syscall, not writeback. The amount of stack being used is much less than I've previously be able to generate - fs_mark testing has been able to generate stack usage of around 7k without too much trouble; with this patch it's only just getting to 5.5k. This is primarily because the metadata allocation paths (e.g. directory blocks) are no longer causing double splits on the same stack, and hence now stack tracing is showing swapping being the worst stack consumer rather than XFS. Performance of fs_mark inode create workloads is unchanged. Performance of fs_mark async fsync workloads is consistently good with context switches reduced by around 150,000/s (30%). Performance of dbench, streaming IO and postmark is unchanged. Allocation deadlock warnings have not been seen on the workloads that generated them since adding this patch. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-07-15Revert "xfs: block allocation work needs to be kswapd aware"Dave Chinner2-20/+9
This reverts commit 1f6d64829db78a7e1d63e15c9f48f0a5d2b5a679. This commit resulted in regressions in performance in low memory situations where kswapd was doing writeback of delayed allocation blocks. It resulted in significant parallelism of the kswapd work and with the special kswapd flags meant that hundreds of active allocation could dip into kswapd specific memory reserves and avoid being throttled. This cause a large amount of performance variation, as well as random OOM-killer invocations that didn't previously exist. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-25xfs: global error sign conversionDave Chinner67-887/+882
Convert all the errors the core XFs code to negative error signs like the rest of the kernel and remove all the sign conversion we do in the interface layers. Errors for conversion (and comparison) found via searches like: $ git grep " E" fs/xfs $ git grep "return E" fs/xfs $ git grep " E[A-Z].*;$" fs/xfs Negation points found via searches like: $ git grep "= -[a-z,A-Z]" fs/xfs $ git grep "return -[a-z,A-D,F-Z]" fs/xfs $ git grep " -[a-z].*;" fs/xfs [ with some bits I missed from Brian Foster ] Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-25libxfs: move source filesDave Chinner27-31/+32
Move all the source files that are shared with userspace into libxfs/. This is done as one big chunk simpy to get it done quickly Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-25libxfs: move header filesDave Chinner27-0/+0
Move all the header files that are shared with userspace into libxfs. This is done as one big chunk simpy to get it done quickly. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-25xfs: create libxfs infrastructureDave Chinner3-1/+6
To minimise the differences between kernel and userspace code, split the kernel code into the same structure as the userspace code. That is, the gneric core functionality of XFS is moved to a libxfs/ directory and treat it as a layering barrier in the XFS code. This patch introduces the libxfs directory, the build infrastructure and an initial source and header file to build. The libxfs directory will contain the header files that are needed to build libxfs - most of userspace does not care about the location of these header files as they are accessed indirectly. Hence keeping them inside libxfs makes it easy to track the changes and script the sync process as the directory structure will be identical. To allow this changeover to occur in the kernel code, there are some temporary infrastructure in the makefiles to grab the header filesystem from both locations. Once all the files are moved, modifications will be made in the source code that will make the need for these include directives go away. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-22xfs: Nuke XFS_ERROR macroEric Sandeen45-519/+487
XFS_ERROR was designed long ago to trap return values, but it's not runtime configurable, it's not consistently used, and we can do similar error trapping with ftrace scripts and triggers from userspace. Just nuke XFS_ERROR and associated bits. Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-22xfs: return is not a functionEric Sandeen14-141/+140
return is not a function. "return(EIO);" is silly; "return (EIO);" moreso. return is not a function. Nuke the pointless parens. [dchinner: catch a couple of extra cases in xfs_attr_list.c, xfs_acl.c and xfs_linux.h.] Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-12Merge branch 'for-linus' of ↵Linus Torvalds3-103/+34
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "This the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff. In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits) lock_parent: don't step on stale ->d_parent of all-but-freed one kill generic_file_splice_write() ceph: switch to iter_file_splice_write() shmem: switch to iter_file_splice_write() nfs: switch to iter_splice_write_file() fs/splice.c: remove unneeded exports ocfs2: switch to iter_file_splice_write() ->splice_write() via ->write_iter() bio_vec-backed iov_iter optimize copy_page_{to,from}_iter() bury generic_file_aio_{read,write} lustre: get rid of messing with iovecs ceph: switch to ->write_iter() ceph_sync_direct_write: stop poking into iov_iter guts ceph_sync_read: stop poking into iov_iter guts new helper: copy_page_from_iter() fuse: switch to ->write_iter() btrfs: switch to ->write_iter() ocfs2: switch to ->write_iter() xfs: switch to ->write_iter() ...
2014-06-12Merge commit '9f12600fe425bc28f0ccba034a77783c09c15af4' into for-linusAl Viro13-75/+104
Backmerge of dcache.c changes from mainline. It's that, or complete rebase... Conflicts: fs/splice.c Signed-off-by: Al Viro <[email protected]>
2014-06-12->splice_write() via ->write_iter()Al Viro2-43/+1
iter_file_splice_write() - a ->splice_write() instance that gathers the pipe buffers, builds a bio_vec-based iov_iter covering those and feeds it to ->write_iter(). A bunch of simple cases coverted to that... [AV: fixed the braino spotted by Cyrill] Signed-off-by: Al Viro <[email protected]>
2014-06-11Merge tag 'xfs-for-linus-3.16-rc1' of git://oss.sgi.com/xfs/xfsLinus Torvalds90-2784/+2634
Pull xfs updates from Dave Chinner: "This update contains: - cleanup removing unused function args - rework of the filestreams allocator to use dentry cache parent lookups - new on-disk free inode btree and optimised inode allocator - various bug fixes - rework of internal attribute API - cleanup of superblock feature bit support to remove historic cruft - more fixes and minor cleanups - added a new directory/attribute geometry abstraction - yet more fixes and minor cleanups" * tag 'xfs-for-linus-3.16-rc1' of git://oss.sgi.com/xfs/xfs: (86 commits) xfs: fix xfs_da_args sparse warning in xfs_readdir xfs: Fix rounding in xfs_alloc_fix_len() xfs: tone down writepage/releasepage WARN_ONs xfs: small cleanup in xfs_lowbit64() xfs: kill xfs_buf_geterror() xfs: xfs_readsb needs to check for magic numbers xfs: block allocation work needs to be kswapd aware xfs: remove redundant geometry information from xfs_da_state xfs: replace attr LBSIZE with xfs_da_geometry xfs: pass xfs_da_args to xfs_attr_leaf_newentsize xfs: use xfs_da_geometry for block size in attr code xfs: remove mp->m_dir_geo from directory logging xfs: reduce direct usage of mp->m_dir_geo xfs: move node entry counts to xfs_da_geometry xfs: convert dir/attr btree threshold to xfs_da_geometry xfs: convert m_dirblksize to xfs_da_geometry xfs: convert m_dirblkfsbs to xfs_da_geometry xfs: convert directory segment limits to xfs_da_geometry xfs: convert directory db conversion to xfs_da_geometry xfs: convert directory dablk conversion to xfs_da_geometry ...
2014-06-10fs,userns: Change inode_capable to capable_wrt_inode_uidgidAndy Lutomirski1-1/+1
The kernel has no concept of capabilities with respect to inodes; inodes exist independently of namespaces. For example, inode_capable(inode, CAP_LINUX_IMMUTABLE) would be nonsense. This patch changes inode_capable to check for uid and gid mappings and renames it to capable_wrt_inode_uidgid, which should make it more obvious what it does. Fixes CVE-2014-4014. Cc: Theodore Ts'o <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: [email protected] Signed-off-by: Andy Lutomirski <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-10Merge branch 'xfs-misc-fixes-3-for-3.16' into for-nextDave Chinner13-53/+60
2014-06-10Merge branch 'xfs-da-geom' into for-nextDave Chinner26-778/+819
2014-06-10xfs: fix xfs_da_args sparse warning in xfs_readdirDave Chinner1-1/+1
The kbuild test robot reported: >> fs/xfs/xfs_dir2_readdir.c:672:41: sparse: Using plain integer as NULL pointer Fix it. Reported-by: kbuild test robot <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: Fix rounding in xfs_alloc_fix_len()Jan Kara1-10/+8
Rounding in xfs_alloc_fix_len() is wrong. As the comment states, the result should be a number of a form (k*prod+mod) however due to sign mistake the result is different. As a result allocations on raid arrays could be misaligned in some cases. This also seems to fix occasional assertion failure: XFS_WANT_CORRUPTED_GOTO(rlen <= flen, error0) in xfs_alloc_ag_vextent_size(). Also add an assertion that the result of xfs_alloc_fix_len() is of expected form. Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: tone down writepage/releasepage WARN_ONsChristoph Hellwig1-3/+3
I recently ran into the issue fixed by "xfs: kill buffers over failed write ranges properly" which spams the log with lots of backtraces. Make debugging any issues like that easier by using WARN_ON_ONCE in the writeback code. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: small cleanup in xfs_lowbit64()Dan Carpenter1-2/+5
There are two checkpatch.pl complaints here because of the bad indenting and because of the assignment inside the condition. Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Eric Sandeen <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: kill xfs_buf_geterror()Dave Chinner8-23/+7
Most of the callers are just calling ASSERT(!xfs_buf_geterror()) which means they are checking for bp->b_error == 0. If bp is null in this case, we will assert fail, and hence it's no different in result to oopsing because of a null bp. In some cases, errors have already been checked for or the function returning the buffer can't return a buffer with an error, so it's just a redundant assert. Either way, the assert can either be removed. The other two non-assert callers can just test for a buffer and error properly. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: xfs_readsb needs to check for magic numbersDave Chinner1-6/+17
Commit daba542 ("xfs: skip verification on initial "guess" superblock read") dropped the use of a verifier for the initial superblock read so we can probe the sector size of the filesystem stored in the superblock. It, however, now fails to validate that what was read initially is actually an XFS superblock and hence will fail the sector size check and return ENOSYS. This causes probe-based mounts to fail because it expects XFS to return EINVAL when it doesn't recognise the superblock format. cc: <[email protected]> Reported-by: Plamen Petrov <[email protected]> Tested-by: Plamen Petrov <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: block allocation work needs to be kswapd awareDave Chinner2-9/+20
Upon memory pressure, kswapd calls xfs_vm_writepage() from shrink_page_list(). This can result in delayed allocation occurring and that gets deferred to the the allocation workqueue. The allocation then runs outside kswapd context, which means if it needs memory (and it does to demand page metadata from disk) it can block in shrink_inactive_list() waiting for IO congestion. These blocking waits are normally avoiding in kswapd context, so under memory pressure writeback from kswapd can be arbitrarily delayed by memory reclaim. To avoid this, pass the kswapd context to the allocation being done by the workqueue, so that memory reclaim understands correctly that the work is being done for kswapd and therefore it is not blocked and does not delay memory reclaim. To avoid issues with int->char conversion of flag fields (as noticed in v1 of this patch) convert the flag fields in the struct xfs_bmalloca to bool types. pahole indicates these variables are still single byte variables, so no extra space is consumed by this change. cc: <[email protected]> Reported-by: Tetsuo Handa <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: remove redundant geometry information from xfs_da_stateDave Chinner5-35/+20
It's carried in state->args->geo, so there's no need to duplicate it and use more stack space than necessary. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: replace attr LBSIZE with xfs_da_geometryDave Chinner4-92/+89
Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2014-06-06xfs: pass xfs_da_args to xfs_attr_leaf_newentsizeDave Chinner3-35/+19
As it's only ever called from contexts where the xfs_da_args is present and contains all the information needed inside the args structure. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Brian Foster <[email protected]> Signed-off-by: Dave Chinner <[email protected]>