path: root/fs
Age Commit message Author Files Lines
2019-07-06NFS: Cleanup if nfs_match_client is interruptedBenjamin Coddington1-2/+2
Don't bail out before cleaning up a new allocation if the wait for searching for a matching nfs client is interrupted. Memory leaks. Reported-by: [email protected] Fixes: 950a578c6128 ("NFS: make nfs_match_client killable") Signed-off-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2019-07-06nfs: disable client side deduplicationDarrick J. Wong1-1/+5
The NFS protocol doesn't support deduplication, so turn it off again. Fixes: ce96e888fe48e ("Fix nfs4.2 return -EINVAL when do dedupe operation") Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2019-07-06NFSv4: Add lease_time and lease_expired to 'nfs4:' line of mountstatsDave Wysochanski1-0/+11
On the NFS client there is no low-impact way to determine the nfs4 lease time or whether the lease is expired, so add these to mountstats with times displayed in seconds. If the lease is not expired, display lease_expired=0. Otherwise, display lease_expired=seconds_since_expired, similar to 'age:' line in mountstats. Signed-off-by: Dave Wysochanski <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
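The rule for the two new fields can be sketched as below. This is only an illustration of the behaviour described in the commit message, not the kernel code: the function name `lease_fields` and its parameters are hypothetical, and the exact layout of the real 'nfs4:' line is not reproduced.

```python
def lease_fields(lease_time: int, last_renewal: float, now: float) -> str:
    """Format lease_time and lease_expired, with all times in seconds.

    Hypothetical helper: lease_expired is 0 while the lease is still valid,
    otherwise the number of whole seconds since it expired.
    """
    expires_at = last_renewal + lease_time
    expired_for = 0 if now < expires_at else int(now - expires_at)
    return f"lease_time={lease_time},lease_expired={expired_for}"
```

With a 90 second lease renewed at t=1000, the sketch reports `lease_expired=0` at t=1030 and `lease_expired=10` at t=1100.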
2019-07-06NFS: Clean up writeback codeTrond Myklebust3-10/+5
Now that the VM promises never to recurse back into the filesystem layer on writeback, remove all the GFP_NOFS references etc from the generic writeback code. Signed-off-by: Trond Myklebust <[email protected]>
2019-07-06Merge branch 'multipath_tcp'Trond Myklebust6-11/+45
2019-07-06Merge branch 'containers'Trond Myklebust7-9/+242
2019-07-06NFS: send state management on a single connection.NeilBrown1-9/+13
With NFSv4.1, different network connections need to be explicitly bound to a session. During session startup, this is not possible, so only a single connection must be used for session startup. So add a task flag to disable the default round-robin choice of connections (when nconnect > 1) and force the use of a single connection. Then use that flag on all requests for session management. For consistency, include NFSv4.0 management (SETCLIENTID) and session destruction. Reported-by: Chuck Lever <[email protected]> Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
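A toy model of the idea, assuming a pool of interchangeable connections: requests normally rotate through the pool, while a per-request flag pins the request to one fixed connection. The class `ConnectionPool`, the `no_round_robin` argument and the connection labels are all made up for illustration and do not mirror the kernel's data structures.

```python
class ConnectionPool:
    """Hypothetical pool that hands out connections round-robin."""

    def __init__(self, nconnect: int):
        if nconnect < 1:
            raise ValueError("need at least one connection")
        self.connections = [f"conn{i}" for i in range(nconnect)]
        self._next = 0

    def pick(self, no_round_robin: bool = False) -> str:
        # State-management requests always use the first connection and
        # leave the rotation for ordinary requests untouched.
        if no_round_robin:
            return self.connections[0]
        conn = self.connections[self._next]
        self._next = (self._next + 1) % len(self.connections)
        return conn
```

Ordinary requests cycle through `conn0`, `conn1`, `conn2`, ... while every flagged request lands on `conn0`.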
2019-07-06NFS: Allow multiple connections to a NFSv2 or NFSv3 serverTrond Myklebust1-0/+1
Signed-off-by: Trond Myklebust <[email protected]>
2019-07-06NFS: Display the "nconnect" mount option if it is set.Trond Myklebust1-0/+2
Signed-off-by: Trond Myklebust <[email protected]>
2019-07-06pNFS: Allow multiple connections to the DSTrond Myklebust2-0/+6
If the user specifies -onconnect=<number> mount option, and the transport protocol is TCP, then set up <number> connections to the pNFS data server as well. The connections will all go to the same IP address. Signed-off-by: Trond Myklebust <[email protected]>
2019-07-06NFSv4: Allow multiple connections to NFSv4.x (x>0) serversTrond Myklebust3-2/+12
If the user specifies the -onconnect=<number> mount option, and the transport protocol is TCP, then set up <number> connections to the server. The connections will all go to the same IP address. Signed-off-by: Trond Myklebust <[email protected]>
2019-07-06NFS: Add a mount option to specify number of TCP connections to useTrond Myklebust2-0/+11
Allow the user to specify that the client should use multiple connections to the server. For the moment, this functionality will be limited to TCP and to NFSv4.x (x>0). Signed-off-by: Trond Myklebust <[email protected]>
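How such an option might be interpreted can be sketched as follows. This is a userspace illustration only: `parse_nconnect` is a hypothetical helper, the upper bound `MAX_CONNECTIONS` is an assumed value chosen for the example, and the NFS version restriction mentioned above is not modelled.

```python
MAX_CONNECTIONS = 16  # assumed limit, for illustration only


def parse_nconnect(options: str, proto: str = "tcp") -> int:
    """Return the connection count requested in a comma-separated option string.

    Falls back to a single connection when the option is absent or when the
    transport is not TCP.
    """
    requested = 1
    for opt in options.split(","):
        key, sep, value = opt.strip().partition("=")
        if key == "nconnect" and sep:
            requested = int(value)
            if not 1 <= requested <= MAX_CONNECTIONS:
                raise ValueError(f"nconnect out of range: {requested}")
    return requested if proto == "tcp" else 1
```

For example, `parse_nconnect("rw,vers=4.1,nconnect=4")` yields 4, while the same string with `proto="udp"` yields 1.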
2019-07-06NFS: Add sysfs support for per-container identifierTrond Myklebust4-0/+135
In order to identify containers to the NFS client, we add a per-net sysfs attribute that udev can fill with the appropriate identifier. The identifier could be a unique hostname, but in most cases it will probably be a persisted uuid. Signed-off-by: Trond Myklebust <[email protected]>
2019-07-06NFS: Add deferred cache invalidation for close-to-open consistency violationsTrond Myklebust2-4/+15
If the client detects that close-to-open cache consistency has been violated, and that the file or directory has been changed on the server, then do a cache invalidation when we're done working with the file. The reason we don't do an immediate cache invalidation is that we want to avoid performance problems due to false positives. Also, note that we cannot guarantee cache consistency in this situation even if we do invalidate the cache. Signed-off-by: Trond Myklebust <[email protected]>
2019-07-06NFS: Cleanup - add nfs_clients_exit to mirror nfs_clients_initTrond Myklebust3-8/+13
Add a helper to clean up the struct nfs_net when it is being destroyed. Signed-off-by: Trond Myklebust <[email protected]>
2019-07-06NFS: Create a root NFS directory in /sys/fs/nfsTrond Myklebust4-1/+94
Signed-off-by: Trond Myklebust <[email protected]>
2019-07-06NFSv4: Handle the special Linux file open access modeTrond Myklebust2-1/+2
According to the open() manpage, Linux reserves the access mode 3 to mean "check for read and write permission on the file and return a file descriptor that can't be used for reading or writing." Currently, the NFSv4 code will ask the server to open the file, and will use an incorrect share access mode of 0. Since it has an incorrect share access mode, the client later forgets to send a corresponding close, meaning it can leak stateids on the server. Fixes: ce4ef7c0a8a05 ("NFS: Split out NFS v4 file operations") Cc: [email protected] # 3.6+ Signed-off-by: Trond Myklebust <[email protected]>
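The access mode lives in the low two bits of the open flags, so the special value can be recognised with a simple mask. The sketch below only classifies the mode; the constants are defined locally with their usual Linux values, the function name `classify_access_mode` is hypothetical, and what the NFS client does once it sees the special mode is not shown.

```python
O_RDONLY, O_WRONLY, O_RDWR = 0, 1, 2
O_ACCMODE = 3  # mask covering the access-mode bits

_MODES = {
    O_RDONLY: "read",
    O_WRONLY: "write",
    O_RDWR: "read-write",
    3: "special",  # permission check only; descriptor cannot read or write
}


def classify_access_mode(flags: int) -> str:
    """Map open(2) flags to a label for their access mode."""
    return _MODES[flags & O_ACCMODE]
```

Other flag bits are ignored by the mask, so `classify_access_mode(O_WRONLY | 0o100)` still returns "write".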
2019-07-06NFSv4: Handle open for execute correctlyTrond Myklebust1-8/+20
When mapping the NFSv4 context to an open mode and access mode, we need to treat the FMODE_EXEC flag differently. For the open mode, FMODE_EXEC means we need read share access. For the access mode checking, we need to verify that the user actually has execute access. Signed-off-by: Trond Myklebust <[email protected]>
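The two mappings described above can be sketched separately. This is a simplified model of the commit message, not the kernel implementation: the flag values are illustrative constants, and the helper names `open_share_mode` and `access_to_check` are hypothetical.

```python
# Illustrative flag values for this sketch.
FMODE_READ, FMODE_WRITE, FMODE_EXEC = 0x1, 0x2, 0x20


def open_share_mode(fmode: int) -> int:
    """Share access to request: opening for execute implies read access."""
    mode = fmode & (FMODE_READ | FMODE_WRITE)
    if fmode & FMODE_EXEC:
        mode |= FMODE_READ
    return mode


def access_to_check(fmode: int):
    """Permission to verify: execute for exec opens, read for read opens."""
    if fmode & FMODE_EXEC:
        return "execute"
    if fmode & FMODE_READ:
        return "read"
    return None
```

An exec-only open therefore asks for read share access but is validated against execute permission.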
2019-07-06Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-1/+0
Pull vfs fixlet from Al Viro: "Fix bogus default y in Kconfig (VALIDATE_FS_PARSER) That thing should not be turned on by default, especially since it's not quiet in case it finds no problems. Geert has sent the obvious fix quite a few times, but it fell through the cracks" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: VALIDATE_FS_PARSER should default to n
2019-07-05Merge tag 'nfsd-5.2-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-1/+1
Pull nfsd fixes from Bruce Fields: "Two more quick bugfixes for nfsd: fixing a regression causing mount failures on high-memory machines and fixing the DRC over RDMA" * tag 'nfsd-5.2-2' of git://linux-nfs.org/~bfields/linux: nfsd: Fix overflow causing non-working mounts on 1 TB machines svcrdma: Ignore source port when computing DRC hash
2019-07-05xfs: disable map_sync for async flushPankaj Gupta1-3/+6
Don't support 'MAP_SYNC' with non-DAX files and DAX files with asynchronous dax_device. Virtio pmem provides an asynchronous host page cache flush mechanism. We don't support 'MAP_SYNC' with virtio pmem and xfs. Signed-off-by: Pankaj Gupta <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2019-07-05ext4: disable map_sync for async flushPankaj Gupta1-4/+6
Don't support 'MAP_SYNC' with non-DAX files and DAX files with asynchronous dax_device. Virtio pmem provides an asynchronous host page cache flush mechanism. We don't support 'MAP_SYNC' with virtio pmem and ext4. Signed-off-by: Pankaj Gupta <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2019-07-05xfs: online scrub needn't bother zeroing its temporary bufferDarrick J. Wong1-1/+5
The xattr scrubber functions use the temporary memory buffer either for storing bitmaps or for testing if attribute value extraction works. The bitmap code always zeroes what it needs and the value extraction sets the buffer contents, so it's not necessary to waste CPU time zeroing on allocation. Note that while we never read the contents that the attr value extraction function sets, we do need to call it to check the remote attribute header and CRCs to check for corruption. A flame graph analysis showed that we were spending 7% of a xfs_scrub run (the whole program, not just the attr scrubber itself) allocating and zeroing 64k segments needlessly. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2019-07-05xfs: only allocate memory for scrubbing attributes when we need itDarrick J. Wong2-10/+63
In examining a flame graph of time spent running xfs_scrub on various filesystems, I noticed that we spent nearly 7% of the total runtime on allocating a zeroed 64k buffer for every SCRUB_TYPE_XATTR invocation. We do this even if none of the attribute values were anywhere near 64k in size, even if there were no attribute blocks to check space on, and even if it just turns out there are no attributes at all. Therefore, rearrange the xattr buffer setup code to support reallocating with a bigger buffer and redistribute the callers of that function so that we only allocate memory just prior to needing it, and only allocate as much as we need. If we can't get memory with the ILOCK held we'll bail out with EDEADLOCK which will allocate the maximum memory. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
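The allocate-on-demand pattern can be illustrated with a small grow-only buffer. This is a generic userspace sketch of the strategy, not the scrub code: `ScratchBuffer` and `ensure` are hypothetical names, and the locking and EDEADLOCK retry path are left out.

```python
class ScratchBuffer:
    """Hypothetical buffer that is allocated lazily and only ever grows."""

    def __init__(self):
        self.data = None
        self.allocations = 0  # counts real allocations, for demonstration

    def ensure(self, size: int) -> bytearray:
        # Reuse the existing buffer whenever it is already large enough.
        if self.data is not None and len(self.data) >= size:
            return self.data
        self.data = bytearray(size)
        self.allocations += 1
        return self.data
```

Nothing is allocated until the first `ensure()` call, and a later request for a smaller size reuses the existing buffer.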
2019-07-05xfs: refactor attr scrub memory allocation functionDarrick J. Wong2-9/+26
Move the code that allocates memory buffers for the extended attribute scrub code into a separate function so we can reduce memory allocations in the next patch. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2019-07-05xfs: refactor extended attribute buffer pointer functionsDarrick J. Wong2-9/+72
Replace the open-coded attribute buffer pointer calculations with helper functions to make it more obvious what we're doing with our freeform memory allocation w.r.t. either storing xattr values or computing btree block free space. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2019-07-05xfs: attribute scrub should use seen_enough to pass error valuesDarrick J. Wong2-3/+13
When we're iterating all the attributes using the built-in xattr iterator, we can use the seen_enough variable to pass error codes back to the main scrub function instead of flattening them into 0/1. This will be used in a more exciting fashion in upcoming patches. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
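A minimal model of the pattern: the iterator stops as soon as the callback sets a nonzero value in the shared context, and a negative value is passed back to the caller as an error code instead of being flattened. The names `ListContext`, `walk_attrs` and `check_name` are hypothetical, and the error convention (negative errno) is an assumption made for the sketch.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class ListContext:
    seen_enough: int = 0  # 0 = continue, nonzero = stop, negative = error
    visited: list = field(default_factory=list)


def check_name(ctx: ListContext, name: str) -> None:
    """Example callback: record the name, flag empty names as an error."""
    ctx.visited.append(name)
    if not name:
        ctx.seen_enough = -errno.EINVAL


def walk_attrs(names, callback, ctx: ListContext) -> int:
    """Call callback for each name until it asks to stop; return any error."""
    for name in names:
        callback(ctx, name)
        if ctx.seen_enough:
            break
    return ctx.seen_enough if ctx.seen_enough < 0 else 0
```

Walking `["user.a", "", "user.b"]` with `check_name` stops at the empty name and returns `-errno.EINVAL` without visiting `user.b`.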
2019-07-05btrfs: fix memory leak of path on error return pathColin Ian King1-2/+1
Currently if the allocation of roots or tmp_ulist fails the error handling does not free up the allocation of path causing a memory leak. Fix this and other similar leaks by moving the call of btrfs_free_path from label out to label out_free_ulist. Kudos to David Sterba for spotting the issue in my original fix and suggesting the correct way to fix the leak and Anand Jain for spotting a double free issue. Addresses-Coverity: ("Resource leak") Fixes: 5911c8fe05c5 ("btrfs: fiemap: preallocate ulists for btrfs_check_shared") Reviewed-by: Nikolay Borisov <[email protected]> Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2019-07-05fs: VALIDATE_FS_PARSER should default to nGeert Uytterhoeven1-1/+0
CONFIG_VALIDATE_FS_PARSER is a debugging tool to check that the parser tables are vaguely sane. It was set to default to 'Y' for the moment to catch errors in upcoming fs conversion development. Make sure it is not enabled by default in the final release of v5.1. Fixes: 31d921c7fb969172 ("vfs: Add configuration parser helpers") Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Al Viro <[email protected]>
2019-07-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-16/+26
Merge more fixes from Andrew Morton: "5 fixes" * emailed patches from Andrew Morton <[email protected]>: swap_readpage(): avoid blk_wake_io_task() if !synchronous devres: allow const resource arguments mm/vmscan.c: prevent useless kswapd loops fs/userfaultfd.c: disable irqs for fault_pending and event locks mm/page_alloc.c: fix regression with deferred struct page init
2019-07-05Merge tag 'dax-fix-5.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds1-5/+4
Pull dax fix from Dan Williams: "A single dax fix that has been soaking awaiting other fixes under discussion to join it. As it is getting late in the cycle let's proceed with this fix and save follow-on changes for post-v5.3-rc1. - Fix xarray entry association for mixed mappings" * tag 'dax-fix-5.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Fix xarray entry association for mixed mappings
2019-07-05Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-3/+4
Pull do_move_mount() fix from Al Viro: "Regression fix" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: move_mount: reject moving kernel internal mounts
2019-07-05fs/userfaultfd.c: disable irqs for fault_pending and event locksEric Biggers1-16/+26
When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock. This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by userfaultfd_ctx_read(), which in turn can be waiting for userfaultfd_ctx::fault_pending_wqh.lock or userfaultfd_ctx::event_wqh.lock. But elsewhere the fault_pending_wqh and event_wqh locks are taken with IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep reports that a deadlock is possible. Fix it by always disabling IRQs when taking the fault_pending_wqh and event_wqh locks. Commit ae62c16e105a ("userfaultfd: disable irqs when taking the waitqueue lock") didn't fix this because it only accounted for the fd_wqh lock, not the other locks nested inside it. Link: http://lkml.kernel.org/r/[email protected] Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL") Signed-off-by: Eric Biggers <[email protected]> Reported-by: [email protected] Reported-by: [email protected] Reported-by: [email protected] Reviewed-by: Andrew Morton <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: <[email protected]> [4.19+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-04mnt_init(): call shmem_init() unconditionallyAl Viro1-0/+2
No point having two call sites (earlier in init_rootfs() from mnt_init() in case we are going to use shmem-style rootfs, later from do_basic_setup() unconditionally), along with the logic in shmem_init() itself to make the second call a no-op... Signed-off-by: Al Viro <[email protected]>
2019-07-04constify ksys_mount() string argumentsAl Viro1-2/+2
Signed-off-by: Al Viro <[email protected]>
2019-07-04don't bother with registering rootfsAl Viro1-6/+1
init_mount_tree() can get to rootfs_fs_type directly and that simplifies a lot of things. We don't need to register it, we don't need to look it up *and* we don't need to bother with preventing subsequent userland mounts. That's the way we should've done that from the very beginning. There is a user-visible change, namely the disappearance of "rootfs" from /proc/filesystems. Note that it's been unmountable all along and it didn't show up in /proc/mounts; however, it *is* a user-visible change and theoretically some script might've been using its presence in /proc/filesystems to tell 2.4.11+ from earlier kernels. *IF* any complaints about behaviour change do show up, we could fake it in /proc/filesystems. I very much doubt we'll have to, though. Signed-off-by: Al Viro <[email protected]>
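A script that relied on "rootfs" being listed would be doing something like the check below. This sketch parses text in the /proc/filesystems layout (an optional "nodev" column followed by the filesystem name); the helper name `registered_filesystems` and the sample contents are made up for illustration.

```python
def registered_filesystems(text: str) -> set:
    """Return the filesystem names listed in /proc/filesystems-style text."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            # The name is always the last column; "nodev" may precede it.
            names.add(fields[-1])
    return names


# Hypothetical before/after samples, not captured from a real system.
BEFORE = "nodev\tsysfs\nnodev\trootfs\nnodev\tramfs\n\text4\n"
AFTER = "nodev\tsysfs\nnodev\tramfs\n\text4\n"
```

On a live system the same function could be fed the contents of /proc/filesystems; after this change, `"rootfs" in registered_filesystems(...)` would be expected to become false.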
2019-07-04init_rootfs(): don't bother with init_ramfs_fs()Al Viro1-5/+1
The only thing done by the latter is making ramfs visible to mount(2); we don't need it there - rootfs is separate and, in fact, made visible to mount(2) in the same init_rootfs(). Signed-off-by: Al Viro <[email protected]>
2019-07-04vfs: Convert openpromfs to use the new mount APIDavid Howells1-5/+15
Convert the openpromfs filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
2019-07-04vfs: Convert efivarfs to use the new mount APIDavid Howells1-10/+15
Convert the efivarfs filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. [AV: get rid of efivarfs_sb nonsense - it has never been used] See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <[email protected]> cc: Matthew Garrett <[email protected]> cc: Jeremy Kerr <[email protected]> cc: Ard Biesheuvel <[email protected]> cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2019-07-04vfs: Convert configfs to use the new mount APIDavid Howells1-5/+15
Convert the configfs filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <[email protected]> cc: Joel Becker <[email protected]> cc: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2019-07-04vfs: Convert binfmt_misc to use the new mount APIDavid Howells1-5/+15
Convert the binfmt_misc filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <[email protected]> cc: Alexander Viro <[email protected]> cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2019-07-04convenience helper: get_tree_single()Al Viro2-1/+9
Counterpart of mount_single(); switch fusectl to it. Signed-off-by: Al Viro <[email protected]>
2019-07-04convenience helper get_tree_nodev()Al Viro3-2/+10
Counterpart of mount_nodev(). Switch hugetlb and pseudo to it. Signed-off-by: Al Viro <[email protected]>
2019-07-04fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()Al Viro1-4/+6
Make unhash_mnt() return the mountpoint to be dropped; let callers deal with it. Signed-off-by: Al Viro <[email protected]>
2019-07-04__detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymoreAl Viro1-1/+1
... not since 1e9c75fb9c47 ("mnt: fix __detach_mounts infinite loop") Signed-off-by: Al Viro <[email protected]>
2019-07-04nfs: dget_parent() never returns NULLAl Viro1-4/+2
Signed-off-by: Al Viro <[email protected]>
2019-07-04ceph: don't open-code the check for dead lockrefAl Viro1-1/+1
Signed-off-by: Al Viro <[email protected]>
2019-07-04btrfs: move the subvolume reservation stuff out of extent-tree.cJosef Bacik2-54/+56
This is just two functions, put it in root-tree.c since it involves root items. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2019-07-04btrfs: migrate the delalloc space stuff to its own homeJosef Bacik12-499/+526
We have code for data and metadata reservations for delalloc. There's quite a bit of code here, and it's used in a lot of places so I've separated it out to its own file. inode.c and file.c are already pretty large, and this code is complicated enough to live in its own space. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
2019-07-04btrfs: migrate btrfs_trans_release_chunk_metadataJosef Bacik4-19/+19
Move this into transaction.c with the rest of the transaction related code. Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>