aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2011-03-14pull dropping RCU on success of link_path_walk() into path_lookupat()Al Viro1-18/+12
Signed-off-by: Al Viro <[email protected]>
2011-03-14untangle the "need_reval_dot" messAl Viro1-63/+44
instead of ad-hackery around need_reval_dot(), do the following: set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute symlink traversal, on ".." and on procfs-style symlinks. Clear on normal components, leave unchanged on ".". Non-nested callers of link_path_walk() call handle_reval_path(), which checks that flag is set and that fs does want the final revalidate thing, then does ->d_revalidate(). In link_path_walk() all the return_reval stuff is gone. Signed-off-by: Al Viro <[email protected]>
2011-03-14merge component type recognitionAl Viro1-26/+22
no need to do it in three places... Signed-off-by: Al Viro <[email protected]>
2011-03-14merge path_init and path_init_rcuAl Viro1-83/+35
Actual dependency on whether we want RCU or not is in 3 small areas (as it ought to be) and everything around those is the same in both versions. Since each function has only one caller and those callers are on two sides of if (flags & LOOKUP_RCU), it's easier and cleaner to merge them and pull the checks inside. Signed-off-by: Al Viro <[email protected]>
2011-03-14sanitize path_walk() messAl Viro1-92/+56
New helper: path_lookupat(). Basically, what do_path_lookup() boils to modulo -ECHILD/-ESTALE handler. path_walk* family is gone; vfs_path_lookup() is using link_path_walk() directly, do_path_lookup() and do_filp_open() are using path_lookupat(). Signed-off-by: Al Viro <[email protected]>
2011-03-14take RCU-dependent stuff around exec_permission() into a new helperAl Viro1-11/+14
Signed-off-by: Al Viro <[email protected]>
2011-03-14kill path_lookup()Al Viro2-5/+4
all remaining callers pass LOOKUP_PARENT to it, so flags argument can die; renamed to kern_path_parent() Signed-off-by: Al Viro <[email protected]>
2011-03-14GFS2: Update to AIL list lockingSteven Whitehouse3-1/+5
The previous patch missed a couple of places where the AIL list needed locking, so this fixes up those places, plus a comment is corrected too. Signed-off-by: Steven Whitehouse <[email protected]> Cc: Dave Chinner <[email protected]>
2011-03-13compat breakage in preadv() and pwritev()Al Viro1-2/+6
Fix for a dumb preadv()/pwritev() compat bug - unlike the native variants, the compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g. on pipe the native preadv() will fail with -ESPIPE and compat one will act as readv() and succeed. Not critical, but it's a clear bug with trivial fix, so IMO it's OK for -final. Signed-off-by: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-13compat breakage in preadv() and pwritev()Al Viro1-2/+6
Fix for a dumb preadv()/pwritev() compat bug - unlike the native variants, compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g. on pipe the native preadv() will fail with -ESPIPE and compat one will act as readv() and succeed. Not critical, but it's a clear bug with trivial fix. Signed-off-by: Al Viro <[email protected]>
2011-03-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds5-62/+135
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: break out of shrink_delalloc earlier btrfs: fix not enough reserved space btrfs: fix dip leak Btrfs: make sure not to return overlapping extents to fiemap Btrfs: deal with short returns from copy_from_user Btrfs: fix regressions in copy_from_user handling
2011-03-12Btrfs: break out of shrink_delalloc earlierChris Mason2-12/+32
Josef had changed shrink_delalloc to exit after three shrink attempts, which wasn't quite enough because new writers could race in and steal free space. But it also fixed deadlocks and stalls as we tried to recover delalloc reservations. The code was tweaked to loop 1024 times, and would reset the counter any time a small amount of progress was made. This was too drastic, and with a lot of writers we can end up stuck in shrink_delalloc forever. The shrink_delalloc loop is fairly complex because the caller is looping too, and the caller will go ahead and force a transaction commit to make sure we reclaim space. This reworks things to exit shrink_delalloc when we've forced some writeback and the delalloc reservations have gone down. This means the writeback has not just started but has also finished at least some of the metadata changes required to reclaim delalloc space. If we've got this wrong, we're returning ENOSPC too early, which is a big improvement over the current behavior of hanging the machine. Test 224 in xfstests hammers on this nicely, and with 1000 writers trying to fill a 1GB drive we get our first ENOSPC at 93% full. The other writers are able to continue until we get 100%. This is a worst case test for btrfs because the 1000 writers are doing small IO, and the small FS size means we don't have a lot of room for metadata chunks. Signed-off-by: Chris Mason <[email protected]>
2011-03-11NFS: NFSROOT should default to "proto=udp"Chuck Lever1-15/+14
There have been a number of recent reports that NFSROOT is no longer working with default mount options, but fails only with certain NICs. Brian Downing <[email protected]> bisected to commit 56463e50 "NFS: Use super.c for NFSROOT mount option parsing". Among other things, this commit changes the default mount options for NFSROOT to use TCP instead of UDP as the underlying transport. TCP seems less able to deal with NICs that are slow to initialize. The system logs that have accompanied reports of problems all show that NFSROOT attempts to establish a TCP connection before the NIC is fully initialized, and thus the TCP connection attempt fails. When a TCP connection attempt fails during a mount operation, the NFS stack needs to fail the operation. Usually user space knows how and when to retry it. The network layer does not report a distinct error code for this particular failure mode. Thus, there isn't a clean way for the RPC client to see that it needs to retry in this case, but not in others. Because NFSROOT is used in some environments where it is not possible to update the kernel command line to specify "udp", the proper thing to do is change NFSROOT to use UDP by default, as it did before commit 56463e50. To make it easier to see how to change default mount options for NFSROOT and to distinguish default settings from mandatory settings, I've adjusted a couple of areas to document the specifics. root_nfs_cat() is also modified to deal with commas properly when concatenating strings containing mount option lists. This keeps root_nfs_cat() call sites simpler, now that we may be concatenating multiple mount option strings. Tested-by: Brian Downing <[email protected]> Tested-by: Mark Brown <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Cc: <[email protected]> # 2.6.37 Signed-off-by: Trond Myklebust <[email protected]>
2011-03-11nfs4: remove duplicated #includeHuang Weiyi1-1/+0
Remove duplicated #include('s) in fs/nfs/nfs4proc.c Signed-off-by: Huang Weiyi <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2011-03-11NFSv4: nfs4_state_mark_reclaim_nograce() should be staticTrond Myklebust2-4/+2
There are no more external users of nfs4_state_mark_reclaim_nograce() or nfs4_state_mark_reclaim_reboot(), so mark them as static. Signed-off-by: Trond Myklebust <[email protected]>
2011-03-11NFSv4: Fix the setlk error handlerTrond Myklebust1-9/+4
Signed-off-by: Trond Myklebust <[email protected]>
2011-03-11NFSv4.1: Fix the handling of the SEQUENCE status bitsTrond Myklebust1-3/+8
We want SEQUENCE status bits to be handled by the state manager in order to avoid threading issues. Signed-off-by: Trond Myklebust <[email protected]>
2011-03-11NFSv4/4.1: Fix nfs4_schedule_state_recovery abusesTrond Myklebust3-29/+49
nfs4_schedule_state_recovery() should only be used when we need to force the state manager to check the lease. If we just want to start the state manager in order to handle a state recovery situation, we should be using nfs4_schedule_state_manager(). This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing its use with a set of helper functions that do the right thing. Signed-off-by: Trond Myklebust <[email protected]>
2011-03-11GFS2: introduce AIL lockDave Chinner5-18/+29
The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward. This reduces the impact of log pushing on sequential write throughput. Signed-off-by: Dave Chinner <[email protected]> Signed-off-by: Steven Whitehouse <[email protected]>
2011-03-11GFS2: fix block allocation check for fallocateBenjamin Marzinski1-25/+31
GFS2 fallocate wasn't properly checking if a blocks were already allocated. In write_empty_blocks(), if a page didn't have buffer_heads attached, GFS2 was always treating it as if there were no blocks allocated for that page. GFS2 now calls gfs2_block_map() to check if the blocks are allocated before writing them out. Signed-off-by: Benjamin Marzinski <[email protected]> Signed-off-by: Steven Whitehouse <[email protected]>
2011-03-11GFS2: Optimize glock multiple-dequeue codeBob Peterson1-8/+4
This is a small patch that optimizes multiple glock dequeue operations. It changes the unlock order to be more efficient and makes it easier for lock debugging tools to unravel. It also eliminates the need for the temp variable x, although that would likely be optimized out. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Steven Whitehouse <[email protected]>
2011-03-10NFSv4.1 reclaim complete must wait for completionAndy Adamson1-0/+3
Signed-off-by: Andy Adamson <[email protected]> [Trond: fix whitespace errors] Signed-off-by: Trond Myklebust <[email protected]>
2011-03-10NFSv4: remove duplicate clientid in struct nfs_clientAndy Adamson1-2/+2
Signed-off-by: Andy Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2011-03-10NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAYRicardo Labiaga1-1/+11
Fix bug where we currently retry the EXCHANGEID call again, eventhough we already have a valid clientid. Instead, delay and retry the CREATE_SESSION call. Signed-off-by: Ricardo Labiaga <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2011-03-10(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit ↵Frank Filz1-1/+6
31 or 63 are set in fileid The problem was use of an int32, which when converted to a uint64 is sign extended resulting in a fileid that doesn't fit in 32 bits even though the intent of the function is to fit the fileid into 32 bits. Signed-off-by: Frank Filz <[email protected]> Reviewed-by: Jeff Layton <[email protected]> [Trond: Added an include for compat.h] Signed-off-by: Trond Myklebust <[email protected]>
2011-03-10nfs: fix compilation warningJovi Zhang1-1/+1
this commit fix compilation warning as following: linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Jovi Zhang <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2011-03-10nfs: add kmalloc return value check in decode_and_add_dsStanislav Fomichev1-0/+4
add kmalloc return value check in decode_and_add_ds Signed-off-by: Stanislav Fomichev <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2011-03-10nfs: close NFSv4 COMMIT vs. CLOSE raceJeff Layton1-0/+2
I've been adding in more artificial delays in the NFSv4 commit and close codepaths to uncover races. The kernel I'm testing has the patch to close the race in __rpc_wait_for_completion_task that's in Trond's cthon2011 branch. The reproducer I've been using does this in a loop: mkdir("DIR"); fd = open("DIR/FILE", O_WRONLY|O_CREAT|O_EXCL, 0644); write(fd, "abcdefg", 7); close(fd); unlink("DIR/FILE"); rmdir("DIR"); The above reproducer shouldn't result in any silly-renaming. However, when I add a "msleep(100)" just after the nfs_commit_clear_lock call in nfs_commit_release, I can almost always force one to occur. If I can force it to occur with that, then it can happen without that delay given the right timing. nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait for the task to complete before putting its reference to it, so the last reference get put in rpc_release task and gets queued to a workqueue. In this situation, the last open context reference may be put by the COMMIT release instead of the close() syscall. The close() syscall returns too quickly and the unlink runs while the d_count is still high since the COMMIT release hasn't put its dentry reference yet. Fix this by having rpc_commit_rpcsetup wait for the RPC call to complete before putting the task reference when FLUSH_SYNC is set. With this, the last reference is put by the process that's initiating the FLUSH_SYNC commit and the race is closed. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2011-03-10SUNRPC: Close a race in __rpc_wait_for_completion_task()Trond Myklebust2-3/+3
Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: Trond Myklebust <[email protected]>
2011-03-10btrfs: fix not enough reserved spaceMiao Xie1-2/+3
btrfs_link() will insert 3 items(inode ref, dir name item and dir index item) into the b+ tree and update 2 items(its inode, and parent's inode) in the b+ tree. So we should reserve space for these 5 items, not 3 items. Reported-by: Tsutomu Itoh <[email protected]> Signed-off-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-03-10btrfs: fix dip leakDaniel J Blueman1-0/+1
The btrfs DIO code leaks dip structs when dip->csums allocation fails; bio->bi_end_io isn't set at the point where the free_ordered branch is consequently taken, thus bio_endio doesn't call the function which would free it in the normal case. Fix. Signed-off-by: Daniel J Blueman <[email protected]> Acked-by: Miao Xie <[email protected]> Signed-off-by: Chris Mason <[email protected]>
2011-03-10fs/dcache: allow d_obtain_alias() to return unhashed dentriesJ. Bruce Fields1-2/+24
Without this patch, inodes are not promptly freed on last close of an unlinked file by an nfs client: client$ mount -tnfs4 server:/export/ /mnt/ client$ tail -f /mnt/FOO ... server$ df -i /export server$ rm /export/FOO (^C the tail -f) server$ df -i /export server$ echo 2 >/proc/sys/vm/drop_caches server$ df -i /export the df's will show that the inode is not freed on the filesystem until the last step, when it could have been freed after killing the client's tail -f. On-disk data won't be deallocated either, leading to possible spurious ENOSPC. This occurs because when the client does the close, it arrives in a compound with a putfh and a close, processed like: - putfh: look up the filehandle.  The only alias found for the inode will be DCACHE_UNHASHED alias referenced by the filp this, so it creates a new DCACHE_DISCONECTED dentry and returns that instead. - close: closes the existing filp, which is destroyed immediately by dput() since it's DCACHE_UNHASHED. - end of the compound: release the reference to the current filehandle, and dput() the new DCACHE_DISCONECTED dentry, which gets put on the unused list instead of being destroyed immediately. Nick Piggin suggested fixing this by allowing d_obtain_alias to return the unhashed dentry that is referenced by the filp, instead of making it create a new dentry. Leave __d_find_alias() alone to avoid changing behavior of other callers. Also nfsd doesn't need all the checks of __d_find_alias(); any dentry, hashed or unhashed, disconnected or not, should work. Signed-off-by: J. Bruce Fields <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-03-10Check for immutable/append flag in fallocate pathMarco Stornelli1-0/+8
In the fallocate path the kernel doesn't check for the immutable/append flag. It's possible to have a race condition in this scenario: an application open a file in read/write and it does something, meanwhile root set the immutable flag on the file, the application at that point can call fallocate with success. In addition, we don't allow to do any unreserve operation on an append only file but only the reserve one. Signed-off-by: Marco Stornelli <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-03-10fat: fix d_revalidate oopsen on NFS exportsAl Viro1-2/+2
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <[email protected]>
2011-03-10jfs: fix d_revalidate oopsen on NFS exportsAl Viro1-1/+1
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <[email protected]>
2011-03-10ocfs2: fix d_revalidate oopsen on NFS exportsAl Viro1-1/+1
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <[email protected]>
2011-03-10gfs2: fix d_revalidate oopsen on NFS exportsAl Viro1-1/+1
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <[email protected]>
2011-03-10fuse: fix d_revalidate oopsen on NFS exportsAl Viro1-1/+1
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <[email protected]>
2011-03-10ceph: fix d_revalidate oopsen on NFS exportsAl Viro1-1/+1
can't blindly check nd->flags in ->d_revalidate() Signed-off-by: Al Viro <[email protected]>
2011-03-10reiserfs xattr ->d_revalidate() shouldn't care about RCUAl Viro1-2/+0
... it returns an error unconditionally Signed-off-by: Al Viro <[email protected]>
2011-03-10/proc/self is never going to be invalidated...Al Viro1-30/+0
Signed-off-by: Al Viro <[email protected]>
2011-03-09Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linuxLinus Torvalds3-9/+10
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: nfsd: wrong index used in inner loop nfsd4: fix bad pointer on failure to find delegation NFSD: fix decode_cb_sequence4resok
2011-03-09Merge branch 'for-linus' of ↵Linus Torvalds3-7/+22
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: nd->inode is not set on the second attempt in path_walk() unfuck proc_sysctl ->d_compare() minimal fix for do_filp_open() race
2011-03-09GFS2: Remove potential race in flock codeSteven Whitehouse1-2/+4
This patch ensures that we always wait for glock demotion when dropping flocks on a file in order to prevent any race conditions associated with further flock calls or closing the file. Signed-off-by: Steven Whitehouse <[email protected]>
2011-03-09GFS2: Fix glock deallocation raceSteven Whitehouse4-11/+12
This patch fixes a race in deallocating glocks which was introduced in the RCU glock patch. We need to ensure that the glock count is kept correct even in the case that there is a race to add a new glock into the hash table. Also, to avoid having to wait for an RCU grace period, the glock counter can be decremented before call_rcu() is called. Signed-off-by: Steven Whitehouse <[email protected]>
2011-03-09GFS2: quota allows exceeding hard limitAbhijith Das2-1/+8
Immediately after being synced to disk, cached quotas are zeroed out and a subsequent access of the cached quotas results in incorrect zero values. This meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage values it found in the cached quotas and comparison against warn/limits never triggered a quota violation. This patch adds a new flag QDF_REFRESH that is set after a sync so that the cached quotas are forcefully refreshed from disk on a subsequent access on seeing this flag set. Resolves: rhbz#675944 Signed-off-by: Abhi Das <[email protected]> Signed-off-by: Steven Whitehouse <[email protected]>
2011-03-08nd->inode is not set on the second attempt in path_walk()Al Viro1-0/+1
We leave it at whatever it had been pointing to after the first link_path_walk() had failed with -ESTALE. Things do not work well after that... Signed-off-by: Al Viro <[email protected]>
2011-03-08nfsd: wrong index used in inner looproel1-2/+2
Index i was already used in the outer loop Cc: [email protected] Signed-off-by: Roel Kluin <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2011-03-08Btrfs: make sure not to return overlapping extents to fiemapChris Mason1-6/+27
The btrfs fiemap code was incorrectly returning duplicate or overlapping extents in some cases. cp was blindly trusting this result and we would end up with a destination file that was bigger than the original because some bytes were copied twice. The fix here adjusts our offsets to make sure we're always moving forward in the fiemap results. Signed-off-by: Chris Mason <[email protected]>
2011-03-08unfuck proc_sysctl ->d_compare()Al Viro2-4/+11
a) struct inode is not going to be freed under ->d_compare(); however, the thing PROC_I(inode)->sysctl points to just might. Fortunately, it's enough to make freeing that sucker delayed, provided that we don't step on its ->unregistering, clear the pointer to it in PROC_I(inode) before dropping the reference and check if it's NULL in ->d_compare(). b) I'm not sure that we *can* walk into NULL inode here (we recheck dentry->seq between verifying that it's still hashed / fetching dentry->d_inode and passing it to ->d_compare() and there's no negative hashed dentries in /proc/sys/*), but if we can walk into that, we really should not have ->d_compare() return 0 on it! Said that, I really suspect that this check can be simply killed. Nick? Signed-off-by: Al Viro <[email protected]>