Age | Commit message (Collapse) | Author | Files | Lines |
|
no more users left outside of fs/*.c (and very few outside of
fs/namespace.c, actually)
Signed-off-by: Al Viro <[email protected]>
|
|
The handling of mount flags in set_mnt_shared() got a little tangled
up during previous cleanups, with the following problems:
* MNT_PNODE_MASK is defined as a literal constant when it should be a
bitwise xor of other MNT_* flags
* set_mnt_shared() clears and then sets MNT_SHARED (part of MNT_PNODE_MASK)
* MNT_PNODE_MASK could use a comment in mount.h
* MNT_PNODE_MASK is a terrible name, change to MNT_SHARED_MASK
This patch fixes these problems.
Signed-off-by: Al Viro <[email protected]>
|
|
Add __percpu sparse annotations to fs.
These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.
Signed-off-by: Tejun Heo <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Alex Elder <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Alexander Viro <[email protected]>
|
|
This patch speeds up lmbench lat_mmap test by about another 2% after the
first patch.
Before:
avg = 462.286
std = 5.46106
After:
avg = 453.12
std = 9.58257
(50 runs of each, stddev gives a reasonable confidence)
It does this by introducing mnt_clone_write, which avoids some heavyweight
operations of mnt_want_write if called on a vfsmount which we know already
has a write count; and mnt_want_write_file, which can call mnt_clone_write
if the file is open for write.
After these two patches, mnt_want_write and mnt_drop_write go from 7% on
the profile down to 1.3% (including mnt_clone_write).
[AV: mnt_want_write_file() should take file alone and derive mnt from it;
not only all callers have that form, but that's the only mnt about which
we know that it's already held for write if file is opened for write]
Cc: Dave Hansen <[email protected]>
Signed-off-by: Nick Piggin <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up
basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it.
A microbenchmark yes, but it exercises some important paths in the mm.
Before:
avg = 501.9
std = 14.7773
After:
avg = 462.286
std = 5.46106
(50 runs of each, stddev gives a reasonable confidence, but there is quite
a bit of variation there still)
It does this by removing the complex per-cpu locking and counter-cache and
replaces it with a percpu counter in struct vfsmount. This makes the code
much simpler, and avoids spinlocks (although the msync is still pretty
costly, unfortunately). It results in about 900 bytes smaller code too. It
does increase the size of a vfsmount, however.
It should also give a speedup on large systems if CPUs are frequently operating
on different mounts (because the existing scheme has to operate on an atomic in
the struct vfsmount when switching between mounts). But I'm most interested in
the single threaded path performance for the moment.
[AV: minor cleanup]
Cc: Dave Hansen <[email protected]>
Signed-off-by: Nick Piggin <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Add support for explicitly requesting full atime updates. This makes it
possible for kernels to default to relatime but still allow userspace to
override it.
Signed-off-by: Matthew Garrett <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Remove a CVS keyword that wasn't updated for a long time from a comment.
Signed-off-by: Adrian Bunk <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Al Viro <[email protected]>
|
|
- use kstrdup() instead of kmalloc() + memcpy()
- return NULL if allocating ->mnt_devname failed
- mnt_devname should be const
Signed-off-by: Li Zefan <[email protected]>
Acked-by: Cyrill Gorcunov <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Remove the "#ifdef __KERNEL__" tests from unexported header files in
linux/include whose entire contents are wrapped in that preprocessor
test.
Signed-off-by: Robert P. J. Day <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a unique ID to each peer group using the IDR infrastructure. The
identifiers are reused after the peer group dissolves.
The IDR structures are protected by holding namepspace_sem for write
while allocating or deallocating IDs.
IDs are allocated when a previously unshared vfsmount becomes the
first member of a peer group. When a new member is added to an
existing group, the ID is copied from one of the old members.
IDs are freed when the last member of a peer group is unshared.
Setting the MNT_SHARED flag on members of a subtree is done as a
separate step, after all the IDs have been allocated. This way an
allocation failure can be cleaned up easilty, without affecting the
propagation state.
Based on design sketch by Al Viro.
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Add a unique ID to each vfsmount using the IDR infrastructure. The
identifiers are reused after the vfsmount is freed.
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Signed-off-by: Al Viro <[email protected]>
|
|
Originally from: Herbert Poetzl <[email protected]>
This is the core of the read-only bind mount patch set.
Note that this does _not_ add a "ro" option directly to the bind mount
operation. If you require such a mount, you must first do the bind, then
follow it up with a 'mount -o remount,ro' operation:
If you wish to have a r/o bind mount of /foo on bar:
mount --bind /foo /bar
mount -o remount,ro /bar
Acked-by: Al Viro <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
This is the real meat of the entire series. It actually
implements the tracking of the number of writers to a mount.
However, it causes scalability problems because there can be
hundreds of cpus doing open()/close() on files on the same mnt at
the same time. Even an atomic_t in the mnt has massive scalaing
problems because the cacheline gets so terribly contended.
This uses a statically-allocated percpu variable. All want/drop
operations are local to a cpu as long that cpu operates on the same
mount, and there are no writer count imbalances. Writer count
imbalances happen when a write is taken on one cpu, and released
on another, like when an open/close pair is performed on two
Upon a remount,ro request, all of the data from the percpu
variables is collected (expensive, but very rare) and we determine
if there are any outstanding writers to the mount.
I've written a little benchmark to sit in a loop for a couple of
seconds in several cpus in parallel doing open/write/close loops.
http://sr71.net/~dave/linux/openbench.c
The code in here is a a worst-possible case for this patch. It
does opens on a _pair_ of files in two different mounts in parallel.
This should cause my code to lose its "operate on the same mount"
optimization completely. This worst-case scenario causes a 3%
degredation in the benchmark.
I could probably get rid of even this 3%, but it would be more
complex than what I have here, and I think this is getting into
acceptable territory. In practice, I expect writing more than 3
bytes to a file, as well as disk I/O to mask any effects that this
has.
(To get rid of that 3%, we could have an #defined number of mounts
in the percpu variable. So, instead of a CPU getting operate only
on percpu data when it accesses only one mount, it could stay on
percpu data when it only accesses N or fewer mounts.)
[AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount
Acked-by: Al Viro <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
This patch adds two function mnt_want_write() and mnt_drop_write(). These are
used like a lock pair around and fs operations that might cause a write to the
filesystem.
Before these can become useful, we must first cover each place in the VFS
where writes are performed with a want/drop pair. When that is complete, we
can actually introduce code that will safely check the counts before allowing
r/w<->r/o transitions to occur.
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Al Viro <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
... and take it out of ->umount_begin() instances. Call with all locks
already taken (by do_umount()) and leave calling release_mounts() to
caller (it will do release_mounts() anyway, so we can just put into
the same list).
Signed-off-by: Al Viro <[email protected]>
|
|
make propagate_mount_busy() exclude references from the vfsmounts
that had been isolated by umount_tree() and are just waiting for
release_mounts() to dispose of their ->mnt_parent/->mnt_mountpoint.
Signed-off-by: Al Viro <[email protected]>
|
|
Fix the misspellings of "propogate", "writting" and (oh, the shame
:-) "kenrel" in the source tree.
Signed-off-by: Robert P. J. Day <[email protected]>
Signed-off-by: Adrian Bunk <[email protected]>
|
|
I noticed cache misses in touch_atime() that can be avoided if we keep
mnt_count & mnt_expiry_mark in a different cache line than mnt_flags
(mostly read)
mnt_count & mnt_expiry_mark are modified each time a file is opened/closed
in a file system.
touch_atime() is called each time a file is read, and generally needs to
read mnt_flags.
Other fields of struct vfsmount are mostly read so I chose to move
mnt_count & mnt_expiry_mark at the end of struct vfsmount. And adding a
comment so that nobody tries to re-arrange fields to fill the holes :)
On 64bits platforms, the new offsetof(mnt_count) is 0xC0
On 32bits platforms, it is 0x60, so I didnot add a
____cacheline_aligned_in_smp because it would have a too big impact on the
size of this object (in particular if CONFIG_X86_L1_CACHE_SHIFT=7)
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add "relatime" (relative atime) support. Relative atime only updates the
atime if the previous atime is older than the mtime or ctime. Like
noatime, but useful for applications like mutt that need to know when a
file has been read since it was last modified.
A corresponding patch against mount(8) is available at
http://userweb.kernel.org/~akpm/mount-relative-atime.txt
Signed-off-by: Valerie Henson <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Karel Zak <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rename 'struct namespace' to 'struct mnt_namespace' to avoid confusion with
other namespaces being developped for the containers : pid, uts, ipc, etc.
'namespace' variables and attributes are also renamed to 'mnt_ns'
Signed-off-by: Kirill Korotaev <[email protected]>
Signed-off-by: Cedric Le Goater <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Herbert Poetzl <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Conflicts:
fs/nfs/inode.c
fs/super.c
Fix conflicts between patch 'NFS: Split fs/nfs/inode.c' and patch
'VFS: Permit filesystem to override root dentry on mount'
|
|
Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.
This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.
linux/mount.h has been added where necessary to make allyesconfig build
successfully.
Interest has also been expressed for use with the FUSE and XFS filesystems.
Signed-off-by: David Howells <[email protected]>
Acked-by: Al Viro <[email protected]>
Cc: Nathan Scott <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Allow a submount to be marked as being 'shrinkable' by means of the
vfsmount->mnt_flags, and then add a function 'shrink_submounts()' which
attempts to recursively unmount these submounts.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
do_kern_mount() does not allow the kernel to use private mount interfaces
without exposing the same interfaces to userland. The problem is that the
filesystem is referenced by name, thus meaning that it and its mount
interface must be registered in the global filesystem list.
vfs_kern_mount() passes the struct file_system_type as an explicit
parameter in order to overcome this limitation.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Turn noatime and nodiratime into per-mount instead of per-sb flags.
After all the preparations this is a rather trivial patch. The mount code
needs to treat the two options as per-mount instead of per-superblock, and
touch_atime needs to be changed to check the new MNT_ flags in addition to
the MS_ flags that are kept for filesystems that are always
noatime/nodiratime but not user settable anymore. Besides that core code
only nfs needed an update because it's leaving atime updates to the server
and thus sets the S_NOATIME flag on every inode, but needs to know whether
it's a real noatime mount for an getattr optimization.
While we're at it I've killed the IS_NOATIME/IS_NODIRATIME macros that were
only used by touch_atime.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Small cleanups in shared mounts code.
Signed-off-by: Miklos Szeredi <[email protected]>
Cc: Ram Pai <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
An unbindable mount does not forward or receive propagation. Also
unbindable mount disallows bind mounts. The semantics is as follows.
Bind semantics:
It is invalid to bind mount an unbindable mount.
Move semantics:
It is invalid to move an unbindable mount under shared mount.
Clone-namespace semantics:
If a mount is unbindable in the parent namespace, the corresponding
cloned mount in the child namespace becomes unbindable too. Note:
there is subtle difference, unbindable mounts cannot be bind mounted
but can be cloned during clone-namespace.
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A slave mount always has a master mount from which it receives
mount/umount events. Unlike shared mount the event propagation does not
flow from the slave mount to the master.
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This creates shared mounts. A shared mount when bind-mounted to some
mountpoint, propagates mount/umount events to each other. All the
shared mounts that propagate events to each other belong to the same
peer-group.
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A private mount does not forward or receive propagation. This patch
provides user the ability to convert any mount to private.
Signed-off-by: Ram Pai <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The way we currently deal with quota and process accounting that might
keep vfsmount busy at umount time is inherently broken; we try to turn
them off just in case (not quite correctly, at that) and
a) pray umount doesn't fail (otherwise they'll stay turned off)
b) pray nobody doesn anything funny just as we turn quota off
Moreover, LSM provides hooks for doing the same sort of broken logics.
The proper way to deal with that is to introduce the second kind of
reference to vfsmount. Semantics:
- when the last normal reference is dropped, all special ones are
converted to normal ones and if there had been any, cleanup is done.
- normal reference can be cloned into a special one
- special reference can be converted to normal one; that's a no-op if
we'd already passed the point of no return (i.e. mntput() had
converted special references to normal and started cleanup).
The way it works: e.g. starting process accounting converts the vfsmount
reference pinned by the opened file into special one and turns it back
to normal when it gets shut down; acct_auto_close() is done when no
normal references are left. That way it does *not* obstruct umount(2)
and it silently gets turned off when the last normal reference to
vfsmount is gone. Which is exactly what we want...
The same should be done by LSM module that holds some internal
references to vfsmount and wants to shut them down on umount - it should
make them special and security_sb_umount_close() will be called exactly
when the last normal reference to vfsmount is gone.
quota handling is even simpler - we don't use normal file IO anymore, so
there's no need to hold vfsmounts at all. DQUOT_OFF() is done from
deactivate_super(), where it really belongs.
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
kernel/power/disk.c needs a declaration of name_to_dev_t() in scope. mount.h
seems like an appropriate choice.
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch renames _mntput() to something a little more descriptive:
mntput_no_expire().
Signed-off-by: Miklos Szeredi <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch renames vfsmount->mnt_fslink to something a little more
descriptive: vfsmount->mnt_expire.
Signed-off-by: Mike Waychison <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!
|