aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2018-01-29ceph: fix incorrect snaprealm when adding capsYan, Zheng2-3/+19
Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-01-29ceph: fix un-balanced fsc->writeback_count updateYan, Zheng1-3/+6
Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-01-29ceph: track read contexts in ceph_file_infoYan, Zheng3-9/+66
Previously ceph_read_iter() uses current->journal to pass context info to ceph_readpages(), so that ceph_readpages() can distinguish read(2) from readahead(2)/fadvise(2)/madvise(2). The problem is that page fault can happen when copying data to userspace memory. Page fault may call other filesystem's page_mkwrite() if the userspace memory is mapped to a file. The later filesystem may also want to use current->journal. The fix is define a on-stack data structure in ceph_read_iter(), add it to context list in ceph_file_info. ceph_readpages() searches the list, find if there is a context belongs to current thread. Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-01-29ceph: avoid dereferencing invalid pointer during cached readdirYan, Zheng2-19/+66
Readdir cache keeps array of dentry pointers in page cache. If any dentry in readdir cache gets pruned, ceph_d_prune() disables readdir cache for later readdir syscall. The problem is that ceph_d_prune() ignores unhashed dentry. Ideally MDS should have already revoked CEPH_CAP_FILE_SHARED (which also disables readdir cache) when dentry gets unhashed. But if it is somehow MDS does not properly revoke CEPH_CAP_FILE_SHARED and the unhashed dentry gets pruned later, ceph_d_prune() will not disable readdir cache, later readdir may reference invalid dentry pointer. The fix is make ceph_d_prune() do extra check for unhashed dentry. Disable readdir cache if the unhashed dentry is still referenced by readdir cache. Another fix in this patch is handle d_splice_alias(). If a dentry gets spliced into new parent dentry, treat it as if it was pruned (call ceph_d_prune() for it). Signed-off-by: "Yan, Zheng" <[email protected]> Acked-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-01-29ceph: use atomic_t for ceph_inode_info::i_shared_genYan, Zheng4-12/+13
It allows accessing i_shared_gen without holding i_ceph_lock. It is preparation for later patch. Signed-off-by: "Yan, Zheng" <[email protected]> Acked-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-01-29ceph: cleanup traceless reply handling for renameYan, Zheng2-12/+6
ceph_fill_trace() already calls ceph_invalidate_dir_request() for traceless reply. No need to duplicate the code in ceph_rename(). Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-01-29ceph: voluntarily drop Fx cap for readdir requestYan, Zheng1-0/+1
MDS need to rdlock directory inode's filelock when handling readdir request. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke message. Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-01-29ceph: properly drop caps for setattr requestYan, Zheng1-6/+6
For CEPH_SETATTR_ATIME, MDS needs to xlock filelock, Fsxrw caps are not allowed for xlocked filelock. For CEPH_SETATTR_SIZE request that truncates file to smaller size, MDS needs to xlock filelock, Fsxrw caps are not allowed for xlocked filelock. Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-01-29ceph: voluntarily drop Lx cap for link/rename requestsYan, Zheng1-2/+2
MDS need to xlock inode's linklock when handling link/rename requests. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke message. Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-01-29ceph: voluntarily drop Ax cap for requests that create new inodeYan, Zheng3-10/+19
MDS need to rdlock directory inode's authlock when handling these requests. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke message. Signed-off-by: "Yan, Zheng" <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-01-29GFS2: Don't try to end a non-existent transaction in unlinkBob Peterson1-3/+2
Before this patch, if function gfs2_unlink failed to get a valid transaction (for example, not enough journal blocks) it would go to label out_end_trans which did gfs2_trans_end. But if the trans_begin failed, there's no transaction to end, and trying to do so results in: kernel BUG at fs/gfs2/trans.c:117! This patch changes the goto so that it does not try to end a non-existent transaction. Signed-off-by: Bob Peterson <[email protected]>
2018-01-29xfs: remove experimental tag for reflinksChristoph Hellwig1-5/+1
But reject reflink + DAX file systems for now until the code to support reflinks on DAX is actually implemented. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> [darrick: port to 4.16] Signed-off-by: Darrick J. Wong <[email protected]>
2018-01-29xfs: don't screw up direct writes when freesp is fragmentedDarrick J. Wong1-0/+20
xfs_bmap_btalloc is given a range of file offset blocks that must be allocated to some data/attr/cow fork. If the fork has an extent size hint associated with it, the request will be enlarged on both ends to try to satisfy the alignment hint. If free space is fragmentated, sometimes we can allocate some blocks but not enough to fulfill any of the requested range. Since bmapi_allocate always trims the new extent mapping to match the originally requested range, this results in bmapi_write returning zero and no mapping. The consequences of this vary -- buffered writes will simply re-call bmapi_write until it can satisfy at least one block from the original request. Direct IO overwrites notice nmaps == 0 and return -ENOSPC through the dio mechanism out to userspace with the weird result that writes fail even when we have enough space because the ENOSPC return overrides any partial write status. For direct CoW writes the situation was disastrous because nobody notices us returning an invalid zero-length wrong-offset mapping to iomap and the write goes off into space. Therefore, if free space is so fragmented that we managed to allocate some space but not enough to map into even a single block of the original allocation request range, we should break the alignment hint in order to guarantee at least some forward progress for the direct write. If we return a short allocation to iomap_apply it'll call back about the remaining blocks. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: check reflink allocation mappingsDarrick J. Wong1-0/+7
There's a really bad bug in xfs_reflink_allocate_cow -- if bmapi_write can return a zero error code but no mappings. This happens if there's an extent size hint (which causes allocation requests to be rounded to extsz granularity internally), but there wasn't a big enough chunk of free space to start filling at the extsz granularity and fill even one block of the range that we actually requested. In any case, if we got no mappings we can't possibly do anything useful with the contents of imap, so we must bail out with ENOSPC here. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29iomap: warn on zero-length mappingsDarrick J. Wong1-0/+2
Don't let the iomap callback get away with feeding us a garbage zero length mapping -- there was a bug in xfs that resulted in those leaking out to hilarious effect. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: treat CoW fork operations as delalloc for quota accountingDarrick J. Wong2-6/+41
Since the CoW fork only exists in memory, it is incorrect to update the on-disk quota block counts when we modify the CoW fork. Unlike the data fork, even real extents in the CoW fork are only delalloc-style reservations (on-disk they're owned by the refcountbt) so they must not be tracked in the on disk quota info. Ensure the i_delayed_blks accounting reflects this too. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: only grab shared inode locks for source file during reflinkDarrick J. Wong1-10/+15
Reflink and dedupe operations remap blocks from a source file into a destination file. The destination file needs exclusive locks on all levels because we're updating its block map, but the source file isn't undergoing any block map changes so we can use a shared lock. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modesDarrick J. Wong4-22/+39
Refactor xfs_lock_two_inodes to take separate locking modes for each inode. Specifically, this enables us to take a SHARED lock on one inode and an EXCL lock on the other. The lock class (MMAPLOCK/ILOCK) must be the same for each inode. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: reflink should break pnfs leases before sharing blocksDarrick J. Wong1-1/+47
Before we share blocks between files, we need to break the pnfs leases on the layout before we start slicing and dicing the block map. The structure of this function sets us up for the lock contention reduction in the next patch. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: don't clobber inobt/finobt cursors when xref with rmapDarrick J. Wong1-2/+2
Even if we can't use the inobt/finobt cursors to count the number of inode btree blocks, we are never allowed to clobber the cursor of the btree being checked, so don't do this. Found by fuzzing level = ones in xfs/364. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: skip CoW writes past EOF when writeback races with truncateDarrick J. Wong1-0/+13
Every so often we blow the ASSERT(type != XFS_IO_COW) in xfs_map_blocks when running fsstress, as we do in generic/269. The cause of this is writeback racing with truncate -- writeback doesn't take the iolock, so truncate can sneak in to decrease i_size and truncate page cache while writeback is gathering buffer heads to schedule writeout. If we hit this race on a block that has a CoW mapping, we'll get a valid imap from the CoW fork but the reduced i_size trims the mapping to zero length (which makes it invalid), so we call xfs_map_blocks to try again. This doesn't do much anyway, since any mapping we get out of that will also be invalid, so we might as well skip the assert and just stop. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: preserve i_rdev when recycling a reclaimable inodeAmir Goldstein1-0/+2
Commit 66f364649d870 ("xfs: remove if_rdev") moved storing of rdev value for special inodes to VFS inodes, but forgot to preserve the value of i_rdev when recycling a reclaimable xfs_inode. This was detected by xfstest overlay/017 with inodex=on mount option and xfs base fs. The test does a lookup of overlay chardev and blockdev right after drop caches. Overlayfs inodes hold a reference on underlying xfs inodes when mount option index=on is configured. If drop caches reclaim xfs inodes, before it relclaims overlayfs inodes, that can sometimes leave a reclaimable xfs inode and that test hits that case quite often. When that happens, the xfs inode cache remains broken (zere i_rdev) until the next cycle mount or drop caches. Fixes: 66f364649d870 ("xfs: remove if_rdev") Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-01-29xfs: refactor accounting updates out of xfs_bmap_btallocDarrick J. Wong1-13/+17
Move all the inode and quota accounting updates out of xfs_bmap_btalloc in preparation for fixing some quota accounting problems with copy on write. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2018-01-29xfs: refactor inode verifier corruption error printingDarrick J. Wong4-10/+50
Refactor inode verifier error reporting into a non-libxfs function so that we aren't encoding the message format in libxfs. This also changes the kernel dmesg output to resemble buffer verifier errors more closely. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: make tracepoint inode number format consistentDarrick J. Wong1-6/+6
Fix all the inode number formats to be consistently (0x%llx) in all trace point definitions. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: always zero di_flags2 when we free the inodeDarrick J. Wong1-0/+1
Always zero the di_flags2 field when we free the inode so that we never end up with an on-disk record for an unallocated inode that also has the reflink iflag set. This is in keeping with the general principle that only files can have the reflink iflag set, even though we'll zero out di_flags2 if we ever reallocate the inode. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: call xfs_qm_dqattach before performing reflink operationsDarrick J. Wong1-0/+5
Ensure that we've attached all the necessary dquots before performing reflink operations so that quota accounting is accurate. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2018-01-29xfs: bmap code cleanupShan Hai1-24/+8
Remove the extent size hint and realtime inode relevant code from the xfs_bmapi_reserve_delalloc since it is not called on the inode with extent size hint set or on a realtime inode. Signed-off-by: Shan Hai <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-01-29Use list_head infra-structure for buffer's log items listCarlos Maiolino9-71/+44
Now that buffer's b_fspriv has been split, just replace the current singly linked list of xfs_log_items, by the list_head infrastructure. Also, remove the xfs_log_item argument from xfs_buf_resubmit_failed_buffers(), there is no need for this argument, once the log items can be walked through the list_head in the buffer. Signed-off-by: Carlos Maiolino <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> [darrick: minor style cleanups] Signed-off-by: Darrick J. Wong <[email protected]>
2018-01-29Split buffer's b_fspriv fieldCarlos Maiolino18-86/+104
By splitting the b_fspriv field into two different fields (b_log_item and b_li_list). It's possible to get rid of an old ABI workaround, by using the new b_log_item field to store xfs_buf_log_item separated from the log items attached to the buffer, which will be linked in the new b_li_list field. This way, there is no more need to reorder the log items list to place the buf_log_item at the beginning of the list, simplifying a bit the logic to handle buffer IO. This also opens the possibility to change buffer's log items list into a proper list_head. b_log_item field is still defined as a void *, because it is still used by the log buffers to store xlog_in_core structures, and there is no need to add an extra field on xfs_buf just for xlog_in_core. Signed-off-by: Carlos Maiolino <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> [darrick: minor style changes] Signed-off-by: Darrick J. Wong <[email protected]>
2018-01-29Get rid of xfs_buf_log_item_t typedefCarlos Maiolino3-52/+56
Take advantage of the rework on xfs_buf log items list, to get rid of ths typedef for xfs_buf_log_item. This patch also fix some indentation alignment issues found along the way. Signed-off-by: Carlos Maiolino <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-34/+10
Signed-off-by: David S. Miller <[email protected]>
2018-01-29btrfs: only dirty the inode in btrfs_update_time if something was changedJeff Layton1-2/+3
At this point, we know that "now" and the file times may differ, and we suspect that the i_version has been flagged to be bumped. Attempt to bump the i_version, and only mark the inode dirty if that actually occurred or if one of the times was updated. Signed-off-by: Jeff Layton <[email protected]> Acked-by: David Sterba <[email protected]> Reviewed-by: Liu Bo <[email protected]>
2018-01-29xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementingJeff Layton1-6/+8
If XFS_ILOG_CORE is already set then go ahead and increment it. Signed-off-by: Jeff Layton <[email protected]> Acked-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]>
2018-01-29fs: only set S_VERSION when updating times if necessaryJeff Layton1-3/+7
We only really need to update i_version if someone has queried for it since we last incremented it. By doing that, we can avoid having to update the inode if the times haven't changed. If the times have changed, then we go ahead and forcibly increment the counter, under the assumption that we'll be going to the storage anyway, and the increment itself is relatively cheap. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Jan Kara <[email protected]>
2018-01-29xfs: convert to new i_version APIJeff Layton5-7/+15
Signed-off-by: Jeff Layton <[email protected]> Acked-by: Darrick J. Wong <[email protected]> Acked-by: Dave Chinner <[email protected]>
2018-01-29ufs: use new i_version APIJeff Layton3-6/+9
Signed-off-by: Jeff Layton <[email protected]>
2018-01-29ocfs2: convert to new i_version APIJeff Layton4-10/+14
Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Jan Kara <[email protected]>
2018-01-29nfsd: convert to new i_version APIJeff Layton1-1/+2
Mostly just making sure we use the "get" wrappers so we know when it is being fetched for later use. Signed-off-by: Jeff Layton <[email protected]>
2018-01-29nfs: convert to new i_version APIJeff Layton6-23/+26
For NFS, we just use the "raw" API since the i_version is mostly managed by the server. The exception there is when the client holds a write delegation, but we only need to bump it once there anyway to handle CB_GETATTR. Tested-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Jeff Layton <[email protected]>
2018-01-29ext4: convert to new i_version APIJeff Layton7-17/+26
Signed-off-by: Jeff Layton <[email protected]> Acked-by: Theodore Ts'o <[email protected]>
2018-01-29ext2: convert to new i_version APIJeff Layton2-6/+8
Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Jan Kara <[email protected]>
2018-01-29exofs: switch to new i_version APIJeff Layton2-5/+7
Signed-off-by: Jeff Layton <[email protected]>
2018-01-29btrfs: convert to new i_version APIJeff Layton3-5/+12
Signed-off-by: Jeff Layton <[email protected]> Acked-by: David Sterba <[email protected]>
2018-01-29afs: convert to new i_version APIJeff Layton2-3/+5
For AFS, it's generally treated as an opaque value, so we use the *_raw variants of the API here. Note that AFS has quite a different definition for this counter. AFS only increments it on changes to the data to the data in regular files and contents of the directories. Inode metadata changes do not result in a version increment. We'll need to reconcile that somehow if we ever want to present this to userspace via statx. Signed-off-by: Jeff Layton <[email protected]>
2018-01-29affs: convert to new i_version APIJeff Layton3-5/+8
Signed-off-by: Jeff Layton <[email protected]>
2018-01-29fat: convert to new i_version APIJeff Layton4-19/+22
Signed-off-by: Jeff Layton <[email protected]>
2018-01-29fs: new API for handling inode->i_versionJeff Layton7-0/+7
Add a documentation blob that explains what the i_version field is, how it is expected to work, and how it is currently implemented by various filesystems. We already have inode_inc_iversion. Add several other functions for manipulating and accessing the i_version counter. For now, the implementation is trivial and basically works the way that all of the open-coded i_version accesses work today. Future patches will convert existing users of i_version to use the new API, and then convert the backend implementation to do things more efficiently. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Jan Kara <[email protected]>
2018-01-28NFS: Fix a race between mmap() and O_DIRECTTrond Myklebust1-1/+1
When locking the file in order to do O_DIRECT on it, we must unmap any mmapped ranges on the pagecache so that we can flush out the dirty data. Fixes: a5864c999de67 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <[email protected]> Cc: [email protected] # v4.8+
2018-01-28fs/cifs/cifsacl.c Fixes typo in a commentAchilles Gaikwad1-1/+1
Signed-off-by: Achilles Gaikwad <[email protected]> Signed-off-by: Steve French <[email protected]>