aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2014-10-09ocfs2: call o2quo_exit() if malloc failed in o2net_init()Joseph Qi1-7/+11
In o2net_init, if malloc failed, it directly returns -ENOMEM. Then o2quo_exit won't be called in init_o2nm. Signed-off-by: Joseph Qi <[email protected]> Reviewed-by: joyce.xue <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09ocfs2: fix shift left operations overflowJoseph Qi2-2/+2
ocfs2_inode_info->ip_clusters and ocfs2_dinode->id1.bitmap1.i_total are defined as type u32, so the shift left operations may overflow if volume size is large, for example, 2TB and cluster size is 1MB. Signed-off-by: Joseph Qi <[email protected]> Reviewed-by: Alex Chen <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09ocfs2/dlm: refactor error handling in dlm_alloc_ctxtJoseph Qi1-20/+22
Refactoring error handling in dlm_alloc_ctxt to simplify code. Signed-off-by: Joseph Qi <[email protected]> Reviewed-by: Alex Chen <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09fs/ocfs2/stack_user.c: fix typo in ocfs2_control_release()Andrew Morton1-1/+1
It is supposed to zero pv_minor. Reported-by: Himangi Saraogi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09ntfs: remove bogus spaceAndrea Gelmini1-1/+1
fs/ntfs/debug.c:124: WARNING: space prohibited between function name and open parenthesis '(' Signed-off-by: Andrea Gelmini <[email protected]> Signed-off-by: Anton Altaparmakov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09ntfs: use find_get_page_flags() to mark page accessed as it is no longer ↵Anton Altaparmakov2-3/+4
marked later on Mel Gorman's commit 2457aec63745 ("mm: non-atomically mark page accessed during page cache allocation where possible") removed mark_page_accessed() calls from NTFS without updating the matching find_lock_page() to find_get_page_flags(GFP_LOCK | FGP_ACCESSED) thus causing the page to never be marked accessed. This patch fixes that. Signed-off-by: Anton Altaparmakov <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09fanotify: enable close-on-exec on events' fd when requested in fanotify_init()Yann Droneaud1-1/+1
According to commit 80af258867648 ("fanotify: groups can specify their f_flags for new fd"), file descriptors created as part of file access notification events inherit flags from the event_f_flags argument passed to syscall fanotify_init(2)[1]. Unfortunately O_CLOEXEC is currently silently ignored. Indeed, event_f_flags are only given to dentry_open(), which only seems to care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in open_check_o_direct() and O_LARGEFILE in generic_file_open(). It's a pity, since, according to some lookup on various search engines and http://codesearch.debian.net/, there's already some userspace code which use O_CLOEXEC: - in systemd's readahead[2]: fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME); - in clsync[3]: #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC) int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS); - in examples [4] from "Filesystem monitoring in the Linux kernel" article[5] by Aleksander Morgado: if ((fanotify_fd = fanotify_init (FAN_CLOEXEC, O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0) Additionally, since commit 48149e9d3a7e ("fanotify: check file flags passed in fanotify_init"). having O_CLOEXEC as part of fanotify_init() second argument is expressly allowed. So it seems expected to set close-on-exec flag on the file descriptors if userspace is allowed to request it with O_CLOEXEC. But Andrew Morton raised[6] the concern that enabling now close-on-exec might break existing applications which ask for O_CLOEXEC but expect the file descriptor to be inherited across exec(). In the other hand, as reported by Mihai Dontu[7] close-on-exec on the file descriptor returned as part of file access notify can break applications due to deadlock. So close-on-exec is needed for most applications. More, applications asking for close-on-exec are likely expecting it to be enabled, relying on O_CLOEXEC being effective. If not, it might weaken their security, as noted by Jan Kara[8]. So this patch replaces call to macro get_unused_fd() by a call to function get_unused_fd_flags() with event_f_flags value as argument. This way O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is interpreted and close-on-exec get enabled when requested. [1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294 [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631 https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38 [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/ [6] http://lkml.kernel.org/r/[email protected] [7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l [8] http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yann Droneaud <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed by: Heinrich Schuchardt <[email protected]> Tested-by: Heinrich Schuchardt <[email protected]> Cc: Mihai Don\u021bu <[email protected]> Cc: Pádraig Brady <[email protected]> Cc: Heinrich Schuchardt <[email protected]> Cc: Jan Kara <[email protected]> Cc: Valdis Kletnieks <[email protected]> Cc: Michael Kerrisk-manpages <[email protected]> Cc: Lino Sanfilippo <[email protected]> Cc: Richard Guy Briggs <[email protected]> Cc: Eric Paris <[email protected]> Cc: Al Viro <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09fsnotify: don't put user context if it was never assignedSasha Levin1-2/+4
On some failure paths we may attempt to free user context even if it wasn't assigned yet. This will cause a NULL ptr deref and a kernel BUG. The path I was looking at is in inotify_new_group(): oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL); if (unlikely(!oevent)) { fsnotify_destroy_group(group); return ERR_PTR(-ENOMEM); } fsnotify_destroy_group() would get called here, but group->inotify_data.user is only getting assigned later: group->inotify_data.user = get_current_user(); Signed-off-by: Sasha Levin <[email protected]> Cc: John McCutchan <[email protected]> Cc: Robert Love <[email protected]> Cc: Eric Paris <[email protected]> Reviewed-by: Heinrich Schuchardt <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09fs/notify/group.c: make fsnotify_final_destroy_group() staticAndrew Morton2-4/+1
No callers outside this file. Cc: Sasha Levin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09udf: Fix loading of special inodesJan Kara3-11/+26
Some UDF media have special inodes (like VAT or metadata partition inodes) whose link_count is 0. Thus commit 4071b9136223 (udf: Properly detect stale inodes) broke loading these inodes because udf_iget() started returning -ESTALE for them. Since we still need to properly detect stale inodes queried by NFS, create two variants of udf_iget() - one which is used for looking up special inodes (which ignores link_count == 0) and one which is used for other cases which return ESTALE when link_count == 0. Fixes: 4071b913622316970d0e1919f7d82b4403fec5f2 CC: [email protected] Signed-off-by: Jan Kara <[email protected]>
2014-10-09Merge branch 'timers-core-for-linus' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Nothing really exciting this time: - a few fixlets in the NOHZ code - a new ARM SoC timer abomination. One should expect that we have enough of them already, but they insist on inventing new ones. - the usual bunch of ARM SoC timer updates. That feels like herding cats" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: arm_arch_timer: Consolidate arch_timer_evtstrm_enable clocksource: arm_arch_timer: Enable counter access for 32-bit ARM clocksource: arm_arch_timer: Change clocksource name if CP15 unavailable clocksource: sirf: Disable counter before re-setting it clocksource: cadence_ttc: Add support for 32bit mode clocksource: tcb_clksrc: Sanitize IRQ request clocksource: arm_arch_timer: Discard unavailable timers correctly clocksource: vf_pit_timer: Support shutdown mode ARM: meson6: clocksource: Add Meson6 timer support ARM: meson: documentation: Add timer documentation clocksource: sh_tmu: Document r8a7779 binding clocksource: sh_mtu2: Document r7s72100 binding clocksource: sh_cmt: Document SoC specific bindings timerfd: Remove an always true check nohz: Avoid tick's double reprogramming in highres mode nohz: Fix spurious periodic tick behaviour in low-res dynticks mode
2014-10-09ncpfs: use list_for_each_entry() for d_subdirs walkAl Viro2-17/+3
Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: move getname() from callers to do_mount()Seunghun Lee2-30/+9
It would make more sense to pass char __user * instead of char * in callers of do_mount() and do getname() inside do_mount(). Suggested-by: Al Viro <[email protected]> Signed-off-by: Seunghun Lee <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09gfs2_atomic_open(): skip lookups on hashed dentryAl Viro1-0/+5
hashed dentry can be passed to ->atomic_open() only if a) it has just passed revalidation and b) it's negative Signed-off-by: Al Viro <[email protected]>
2014-10-09jfs: don't hash direct inodeAl Viro1-1/+1
hlist_add_fake(inode->i_hash), same as for the rest of special ones... Signed-off-by: Al Viro <[email protected]>
2014-10-09ecryptfs: ->f_op is never NULLAl Viro1-1/+1
Signed-off-by: Al Viro <[email protected]>
2014-10-09missing annotation in fs/file.cAl Viro1-0/+1
Signed-off-by: Al Viro <[email protected]>
2014-10-09fs: namespace: suppress 'may be used uninitialized' warningsTim Gardner3-23/+15
The gcc version 4.9.1 compiler complains Even though it isn't possible for these variables to not get initialized before they are used. fs/namespace.c: In function ‘SyS_mount’: fs/namespace.c:2720:8: warning: ‘kernel_dev’ may be used uninitialized in this function [-Wmaybe-uninitialized] ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags, ^ fs/namespace.c:2699:8: note: ‘kernel_dev’ was declared here char *kernel_dev; ^ fs/namespace.c:2720:8: warning: ‘kernel_type’ may be used uninitialized in this function [-Wmaybe-uninitialized] ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags, ^ fs/namespace.c:2697:8: note: ‘kernel_type’ was declared here char *kernel_type; ^ Fix the warnings by simplifying copy_mount_string() as suggested by Al Viro. Cc: Alexander Viro <[email protected]> Signed-off-by: Tim Gardner <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09cachefiles_write_page(): switch to __kernel_write()Al Viro2-29/+21
Signed-off-by: Al Viro <[email protected]>
2014-10-099p: switch to %p[dD]Al Viro7-34/+34
Signed-off-by: Al Viro <[email protected]>
2014-10-09cifs: switch to use of %p[dD]Al Viro3-19/+19
Signed-off-by: Al Viro <[email protected]>
2014-10-09fs: make cont_expand_zero interruptibleMikulas Patocka1-0/+5
This patch makes it possible to kill a process looping in cont_expand_zero. A process may spend a lot of time in this function, so it is desirable to be able to kill it. It happened to me that I wanted to copy a piece data from the disk to a file. By mistake, I used the "seek" parameter to dd instead of "skip". Due to the "seek" parameter, dd attempted to extend the file and became stuck doing so - the only possibility was to reset the machine or wait many hours until the filesystem runs out of space and cont_expand_zero fails. We need this patch to be able to terminate the process. Signed-off-by: Mikulas Patocka <[email protected]> Cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2014-10-09fs: Fix theoretical division by 0 in super_cache_scan().Tetsuo Handa1-0/+2
total_objects could be 0 and is used as a denom. While total_objects is a "long", total_objects == 0 unlikely happens for 3.12 and later kernels because 32-bit architectures would not be able to hold (1 << 32) objects. However, total_objects == 0 may happen for kernels between 3.1 and 3.11 because total_objects in prune_super() was an "int" and (e.g.) x86_64 architecture might be able to hold (1 << 32) objects. Signed-off-by: Tetsuo Handa <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: stable <[email protected]> # 3.1+ Signed-off-by: Al Viro <[email protected]>
2014-10-09dcache: Fix no spaces at the start of a line in dcache.cDaeseok Youn1-4/+4
Fixed coding style in dcache.c Signed-off-by: Daeseok Youn <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09[jffs2] kill wbuf_queued/wbuf_dwork_lockAl Viro2-17/+2
schedule_delayed_work() happening when the work is already pending is a cheap no-op. Don't bother with ->wbuf_queued logics - it's both broken (cancelling ->wbuf_dwork leaves it set, as spotted by Jeff Harris) and pointless. It's cheaper to let schedule_delayed_work() handle that case. Reported-by: Jeff Harris <[email protected]> Tested-by: Jeff Harris <[email protected]> Cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2014-10-09handle suicide on late failure exits in execve() in search_binary_handler()Al Viro4-61/+30
... rather than doing that in the guts of ->load_binary(). [updated to fix the bug spotted by Shentino - for SIGSEGV we really need something stronger than send_sig_info(); again, better do that in one place] Signed-off-by: Al Viro <[email protected]>
2014-10-09dcache.c: call ->d_prune() regardless of d_unhashed()Al Viro1-1/+1
the only in-tree instance checks d_unhashed() anyway, out-of-tree code can preserve the current behaviour by adding such check if they want it and we get an ability to use it in cases where we *want* to be notified of killing being inevitable before ->d_lock is dropped, whether it's unhashed or not. In particular, autofs would benefit from that. Signed-off-by: Al Viro <[email protected]>
2014-10-09d_prune_alias(): just lock the parent and call __dentry_kill()Al Viro1-14/+7
The only reason for games with ->d_prune() was __d_drop(), which was needed only to force dput() into killing the sucker off. Note that lock_parent() can be called under ->i_lock and won't drop it, so dentry is safe from somebody managing to kill it under us - it won't happen while we are holding ->i_lock. __dentry_kill() is called only with ->d_lockref.count being 0 (here and when picked from shrink list) or 1 (dput() and dropping the ancestors in shrink_dentry_list()), so it will never be called twice - the first thing it's doing is making ->d_lockref.count negative and once that happens, nothing will increment it. Signed-off-by: Al Viro <[email protected]>
2014-10-09proc: Update proc_flush_task_mnt to use d_invalidateEric W. Biederman1-4/+2
Now that d_invalidate always succeeds and flushes mount points use it in stead of a combination of shrink_dcache_parent and d_drop in proc_flush_task_mnt. This removes the danger of a mount point under /proc/<pid>/... becoming unreachable after the d_drop. Reviewed-by: Miklos Szeredi <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: Remove d_drop calls from d_revalidate implementationsEric W. Biederman3-7/+0
Now that d_invalidate always succeeds it is not longer necessary or desirable to hard code d_drop calls into filesystem specific d_revalidate implementations. Remove the unnecessary d_drop calls and rely on d_invalidate to drop the dentries. Using d_invalidate ensures that paths to mount points will not be dropped. Reviewed-by: Miklos Szeredi <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: Make d_invalidate return voidEric W. Biederman6-31/+12
Now that d_invalidate can no longer fail, stop returning a useless return code. For the few callers that checked the return code update remove the handling of d_invalidate failure. Reviewed-by: Miklos Szeredi <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: Merge check_submounts_and_drop and d_invalidateEric W. Biederman1-33/+22
Now that d_invalidate is the only caller of check_submounts_and_drop, expand check_submounts_and_drop inline in d_invalidate. Reviewed-by: Miklos Szeredi <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: Remove unnecessary calls of check_submounts_and_dropEric W. Biederman5-26/+0
Now that check_submounts_and_drop can not fail and is called from d_invalidate there is no longer a need to call check_submounts_and_drom from filesystem d_revalidate methods so remove it. Reviewed-by: Miklos Szeredi <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: Lazily remove mounts on unlinked files and directories.Eric W. Biederman2-33/+39
With the introduction of mount namespaces and bind mounts it became possible to access files and directories that on some paths are mount points but are not mount points on other paths. It is very confusing when rm -rf somedir returns -EBUSY simply because somedir is mounted somewhere else. With the addition of user namespaces allowing unprivileged mounts this condition has gone from annoying to allowing a DOS attack on other users in the system. The possibility for mischief is removed by updating the vfs to support rename, unlink and rmdir on a dentry that is a mountpoint and by lazily unmounting mountpoints on deleted dentries. In particular this change allows rename, unlink and rmdir system calls on a dentry without a mountpoint in the current mount namespace to succeed, and it allows rename, unlink, and rmdir performed on a distributed filesystem to update the vfs cache even if when there is a mount in some namespace on the original dentry. There are two common patterns of maintaining mounts: Mounts on trusted paths with the parent directory of the mount point and all ancestory directories up to / owned by root and modifiable only by root (i.e. /media/xxx, /dev, /dev/pts, /proc, /sys, /sys/fs/cgroup/{cpu, cpuacct, ...}, /usr, /usr/local). Mounts on unprivileged directories maintained by fusermount. In the case of mounts in trusted directories owned by root and modifiable only by root the current parent directory permissions are sufficient to ensure a mount point on a trusted path is not removed or renamed by anyone other than root, even if there is a context where the there are no mount points to prevent this. In the case of mounts in directories owned by less privileged users races with users modifying the path of a mount point are already a danger. fusermount already uses a combination of chdir, /proc/<pid>/fd/NNN, and UMOUNT_NOFOLLOW to prevent these races. The removable of global rename, unlink, and rmdir protection really adds nothing new to consider only a widening of the attack window, and fusermount is already safe against unprivileged users modifying the directory simultaneously. In principle for perfect userspace programs returning -EBUSY for unlink, rmdir, and rename of dentires that have mounts in the local namespace is actually unnecessary. Unfortunately not all userspace programs are perfect so retaining -EBUSY for unlink, rmdir and rename of dentries that have mounts in the current mount namespace plays an important role of maintaining consistency with historical behavior and making imperfect userspace applications hard to exploit. v2: Remove spurious old_dentry. v3: Optimized shrink_submounts_and_drop Removed unsued afs label v4: Simplified the changes to check_submounts_and_drop Do not rename check_submounts_and_drop shrink_submounts_and_drop Document what why we need atomicity in check_submounts_and_drop Rely on the parent inode mutex to make d_revalidate and d_invalidate an atomic unit. v5: Refcount the mountpoint to detach in case of simultaneous renames. Reviewed-by: Miklos Szeredi <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: Add a function to lazily unmount all mounts from any dentry.Eric W. Biederman2-0/+40
The new function detach_mounts comes in two pieces. The first piece is a static inline test of d_mounpoint that returns immediately without taking any locks if d_mounpoint is not set. In the common case when mountpoints are absent this allows the vfs to continue running with it's same cacheline foot print. The second piece of detach_mounts __detach_mounts actually does the work and it assumes that a mountpoint is present so it is slow and takes namespace_sem for write, and then locks the mount hash (aka mount_lock) after a struct mountpoint has been found. With those two locks held each entry on the list of mounts on a mountpoint is selected and lazily unmounted until all of the mount have been lazily unmounted. v7: Wrote a proper change description and removed the changelog documenting deleted wrong turns. Signed-off-by: Eric W. Biederman <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: factor out lookup_mountpoint from new_mountpointEric W. Biederman1-3/+12
I am shortly going to add a new user of struct mountpoint that needs to look up existing entries but does not want to create a struct mountpoint if one does not exist. Therefore to keep the code simple and easy to read split out lookup_mountpoint from new_mountpoint. Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: Keep a list of mounts on a mount pointEric W. Biederman2-0/+8
To spot any possible problems call BUG if a mountpoint is put when it's list of mounts is not empty. AV: use hlist instead of list_head Reviewed-by: Miklos Szeredi <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: Don't allow overwriting mounts in the current mount namespaceEric W. Biederman3-1/+49
In preparation for allowing mountpoints to be renamed and unlinked in remote filesystems and in other mount namespaces test if on a dentry there is a mount in the local mount namespace before allowing it to be renamed or unlinked. The primary motivation here are old versions of fusermount unmount which is not safe if the a path can be renamed or unlinked while it is verifying the mount is safe to unmount. More recent versions are simpler and safer by simply using UMOUNT_NOFOLLOW when unmounting a mount in a directory owned by an arbitrary user. Miklos Szeredi <[email protected]> reports this is approach is good enough to remove concerns about new kernels mixed with old versions of fusermount. A secondary motivation for restrictions here is that it removing empty directories that have non-empty mount points on them appears to violate the rule that rmdir can not remove empty directories. As Linus Torvalds pointed out this is useful for programs (like git) that test if a directory is empty with rmdir. Therefore this patch arranges to enforce the existing mount point semantics for local mount namespace. v2: Rewrote the test to be a drop in replacement for d_mountpoint v3: Use bool instead of int as the return type of is_local_mountpoint Reviewed-by: Miklos Szeredi <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: More precise tests in d_invalidateEric W. Biederman1-34/+4
The current comments in d_invalidate about what and why it is doing what it is doing are wildly off-base. Which is not surprising as the comments date back to last minute bug fix of the 2.2 kernel. The big fat lie of a comment said: If it's a directory, we can't drop it for fear of somebody re-populating it with children (even though dropping it would make it unreachable from that root, we still might repopulate it if it was a working directory or similar). [AV] What we really need to avoid is multiple dentry aliases of the same directory inode; on all filesystems that have ->d_revalidate() we either declare all positive dentries always valid (and thus never fed to d_invalidate()) or use d_materialise_unique() and/or d_splice_alias(), which take care of alias prevention. The current rules are: - To prevent mount point leaks dentries that are mount points or that have childrent that are mount points may not be be unhashed. - All dentries may be unhashed. - Directories may be rehashed with d_materialise_unique check_submounts_and_drop implements this already for well maintained remote filesystems so implement the current rules in d_invalidate by just calling check_submounts_and_drop. The one difference between d_invalidate and check_submounts_and_drop is that d_invalidate must respect it when a d_revalidate method has earlier called d_drop so preserve the d_unhashed check in d_invalidate. Reviewed-by: Miklos Szeredi <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09vfs: Document the effect of d_revalidate on d_find_aliasEric W. Biederman1-1/+2
d_drop or check_submounts_and_drop called from d_revalidate can result in renamed directories with child dentries being unhashed. These renamed and drop directory dentries can be rehashed after d_materialise_unique uses d_find_alias to find them. Reviewed-by: Miklos Szeredi <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09delayed mntputAl Viro2-19/+57
On final mntput() we want fs shutdown to happen before return to userland; however, the only case where we want it happen right there (i.e. where task_work_add won't do) is MNT_INTERNAL victim. Those have to be fully synchronous - failure halfway through module init might count on having vfsmount killed right there. Fortunately, final mntput on MNT_INTERNAL vfsmounts happens on shallow stack. So we handle those synchronously and do an analog of delayed fput logics for everything else. As the result, we are guaranteed that fs shutdown will always happen on shallow stack. Signed-off-by: Al Viro <[email protected]>
2014-10-09autofs - remove obsolete d_invalidate() from expireIan Kent1-6/+0
Biederman's umount-on-rmdir series changes d_invalidate() to sumarily remove mounts under the passed in dentry regardless of whether they are busy or not. So calling this in fs/autofs4/expire.c:autofs4_tree_busy() is definitely the wrong thing to do becuase it will silently umount entries instead of just cleaning stale dentrys. But this call shouldn't be needed and testing shows that automounting continues to function without it. As Al Viro correctly surmises the original intent of the call was to perform what shrink_dcache_parent() does. If at some time in the future I see stale dentries accumulating following failed mounts I'll revisit the issue and possibly add a shrink_dcache_parent() call if needed. Signed-off-by: Ian Kent <[email protected]> Cc: Al Viro <[email protected]> Cc: Eric W. Biederman <[email protected]> Signed-off-by: Al Viro <[email protected]>
2014-10-09Allow sharing external names after __d_move()Al Viro1-16/+59
* external dentry names get a small structure prepended to them (struct external_name). * it contains an atomic refcount, matching the number of struct dentry instances that have ->d_name.name pointing to that external name. The first thing free_dentry() does is decrementing refcount of external name, so the instances that are between the call of free_dentry() and RCU-delayed actual freeing do not contribute. * __d_move(x, y, false) makes the name of x equal to the name of y, external or not. If y has an external name, extra reference is grabbed and put into x->d_name.name. If x used to have an external name, the reference to the old name is dropped and, should it reach zero, freeing is scheduled via kfree_rcu(). * free_dentry() in dentry with external name decrements the refcount of that name and, should it reach zero, does RCU-delayed call that will free both the dentry and external name. Otherwise it does what it used to do, except that __d_free() doesn't even look at ->d_name.name; it simply frees the dentry. All non-RCU accesses to dentry external name are safe wrt freeing since they all should happen before free_dentry() is called. RCU accesses might run into a dentry seen by free_dentry() or into an old name that got already dropped by __d_move(); however, in both cases dentry must have been alive and refer to that name at some point after we'd done rcu_read_lock(), which means that any freeing must be still pending. Signed-off-by: Al Viro <[email protected]>
2014-10-08NFSv4.1/pnfs: replace broken pnfs_put_lseg_asyncTrond Myklebust3-12/+29
You cannot call pnfs_put_lseg_async() more than once per lseg, so it is really an inappropriate way to deal with a refcount issue. Instead, replace it with a function that decrements the refcount, and puts the final 'free' operation (which is incompatible with locks) on the workqueue. Cc: Weston Andros Adamson <[email protected]> Fixes: e6cf82d1830f: pnfs: add pnfs_put_lseg_async Signed-off-by: Trond Myklebust <[email protected]>
2014-10-08fs: Add a missing permission check to do_umountAndy Lutomirski1-0/+2
Accessing do_remount_sb should require global CAP_SYS_ADMIN, but only one of the two call sites was appropriately protected. Fixes CVE-2014-7975. Signed-off-by: Andy Lutomirski <[email protected]>
2014-10-08NFSv4: Remove dead prototype for nfs4_insert_deviceid_node()Tom Haynes1-1/+0
nfs4_insert_deviceid_node() was removed in 661373b13d0490ff410a2133d4a7a117f2dd037e Signed-off-by: Tom Haynes <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2014-10-08Merge tag 'f2fs-for-3.18' of ↵Linus Torvalds17-786/+1421
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set introduces a couple of new features such as large sector size, FITRIM, and atomic/volatile writes. Several patches enhance power-off recovery and checkpoint routines. The fsck.f2fs starts to support fixing corrupted partitions with recovery hints provided by this patch-set. Summary: - retain some recovery information for fsck.f2fs - enhance checkpoint speed - enhance flush command management - bug fix for lseek - tune in-place-update policies - enhance roll-forward speed - revisit all the roll-forward and fsync rules - support larget sector size - support FITRIM - support atomic and volatile writes And several clean-ups and bug fixes are included" * tag 'f2fs-for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits) f2fs: support volatile operations for transient data f2fs: support atomic writes f2fs: remove unused return value f2fs: clean up f2fs_ioctl functions f2fs: potential shift wrapping buf in f2fs_trim_fs() f2fs: call f2fs_unlock_op after error was handled f2fs: check the use of macros on block counts and addresses f2fs: refactor flush_nat_entries to remove costly reorganizing ops f2fs: introduce FITRIM in f2fs_ioctl f2fs: introduce cp_control structure f2fs: use more free segments until SSR is activated f2fs: change the ipu_policy option to enable combinations f2fs: fix to search whole dirty segmap when get_victim f2fs: fix to clean previous mount option when remount_fs f2fs: skip punching hole in special condition f2fs: support large sector size f2fs: fix to truncate blocks past EOF in ->setattr f2fs: update i_size when __allocate_data_block f2fs: use MAX_BIO_BLOCKS(sbi) f2fs: remove redundant operation during roll-forward recovery ...
2014-10-08Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linuxLinus Torvalds26-331/+866
Pull nfsd updates from Bruce Fields: "Highlights: - support the NFSv4.2 SEEK operation (allowing clients to support SEEK_HOLE/SEEK_DATA), thanks to Anna. - end the grace period early in a number of cases, mitigating a long-standing annoyance, thanks to Jeff - improve SMP scalability, thanks to Trond" * 'for-3.18' of git://linux-nfs.org/~bfields/linux: (55 commits) nfsd: eliminate "to_delegation" define NFSD: Implement SEEK NFSD: Add generic v4.2 infrastructure svcrdma: advertise the correct max payload nfsd: introduce nfsd4_callback_ops nfsd: split nfsd4_callback initialization and use nfsd: introduce a generic nfsd4_cb nfsd: remove nfsd4_callback.cb_op nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence nfsd: fix nfsd4_cb_recall_done error handling nfsd4: clarify how grace period ends nfsd4: stop grace_time update at end of grace period nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls nfsd: serialize nfsdcltrack upcalls for a particular client nfsd: pass extra info in env vars to upcalls to allow for early grace period end nfsd: add a v4_end_grace file to /proc/fs/nfsd lockd: add a /proc/fs/lockd/nlm_end_grace file nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE nfsd: remove redundant boot_time parm from grace_done client tracking op ...
2014-10-08Merge tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds38-3190/+2373
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - fix an NFSv4.1 state renewal regression - fix open/lock state recovery error handling - fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails - fix statd when reconnection fails - don't wake tasks during connection abort - don't start reboot recovery if lease check fails - fix duplicate proc entries Features: - pNFS block driver fixes and clean ups from Christoph - More code cleanups from Anna - Improve mmap() writeback performance - Replace use of PF_TRANS with a more generic mechanism for avoiding deadlocks in nfs_release_page" * tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (66 commits) NFSv4.1: Fix an NFSv4.1 state renewal regression NFSv4: fix open/lock state recovery error handling NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails NFS: Fabricate fscache server index key correctly SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT NFSv3: Fix missing includes of nfs3_fs.h NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() NFS: avoid waiting at all in nfs_release_page when congested. NFS: avoid deadlocks with loop-back mounted NFS filesystems. MM: export page_wakeup functions SCHED: add some "wait..on_bit...timeout()" interfaces. NFS: don't use STABLE writes during writeback. NFSv4: use exponential retry on NFS4ERR_DELAY for async requests. rpc: Add -EPERM processing for xs_udp_send_request() rpc: return sent and err from xs_sendpages() lockd: Try to reconnect if statd has moved SUNRPC: Don't wake tasks during connection abort Fixing lease renewal nfs: fix duplicate proc entries pnfs/blocklayout: Fix a 64-bit division/remainder issue in bl_map_stripe ...
2014-10-08btrfs: Fix compile error when CONFIG_SECURITY is not set.Qu Wenruo1-0/+2
Fix the following compile error when CONFIG_SECURITY is not set: error: 'struct security_mnt_opts' has no member named 'num_mnt_opts' Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: Chris Mason <[email protected]>