path: root/include/uapi/linux/fs.h
2014-04-01  vfs: add cross-rename  (Miklos Szeredi, 1 file, +1/-0)
If flags contain RENAME_EXCHANGE then exchange source and destination files. There's no restriction on the type of the files; e.g. a directory can be exchanged with a symlink.

Signed-off-by: Miklos Szeredi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: J. Bruce Fields <[email protected]>
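For reference, a minimal userspace sketch of the new semantics, assuming Linux >= 3.15 and headers that define SYS_renameat2; the raw syscall is used because older glibc has no renameat2() wrapper, and the two path names are placeholders:

    #define _GNU_SOURCE
    #include <fcntl.h>          /* AT_FDCWD */
    #include <stdio.h>          /* perror */
    #include <sys/syscall.h>    /* SYS_renameat2 */
    #include <unistd.h>         /* syscall */

    #ifndef RENAME_EXCHANGE
    #define RENAME_EXCHANGE (1 << 1)    /* from uapi/linux/fs.h */
    #endif

    int main(void)
    {
            /* Atomically swap "current" and "backup"; both must exist,
             * and their types may differ (e.g. directory vs. symlink). */
            if (syscall(SYS_renameat2, AT_FDCWD, "current",
                        AT_FDCWD, "backup", RENAME_EXCHANGE) != 0) {
                    perror("renameat2(RENAME_EXCHANGE)");
                    return 1;
            }
            return 0;
    }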
2014-04-01  vfs: add RENAME_NOREPLACE flag  (Miklos Szeredi, 1 file, +2/-0)
If this flag is specified and the target of the rename exists then the rename syscall fails with EEXIST. The VFS does the existence checking, so it is trivial to enable for most local filesystems. This patch only enables it in ext4. For network filesystems the VFS check is not enough, as there may be a race between a remote create and the rename, so these filesystems need to handle this flag in their ->rename() implementations to ensure atomicity.

Andy writes about why this is useful:

"The trivial answer: to eliminate the race condition from 'mv -i'.

Another answer: there's a common pattern to atomically create a file with contents: open a temporary file, write to it, optionally fsync it, close it, then link(2) it to the final name, then unlink the temporary file. The reason to use link(2) is because it won't silently clobber the destination. This is annoying:

 - It requires an extra system call that shouldn't be necessary.
 - It doesn't work on (IMO sensible) filesystems that don't support hard links (e.g. vfat).
 - It's not atomic -- there's an intermediate state where both files exist.
 - It's ugly.

The new rename flag will make this totally sensible. To be fair, on new enough kernels, you can also use O_TMPFILE and linkat to achieve the same thing even more cleanly."

Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Reviewed-by: J. Bruce Fields <[email protected]>
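A minimal sketch of the atomic-publish pattern Andy describes, using RENAME_NOREPLACE in place of the link(2)+unlink(2) dance; the helper name and paths are illustrative, and the raw syscall is used for the same glibc reason as above:

    #define _GNU_SOURCE
    #include <fcntl.h>          /* AT_FDCWD */
    #include <sys/syscall.h>    /* SYS_renameat2 */
    #include <unistd.h>         /* syscall */

    #ifndef RENAME_NOREPLACE
    #define RENAME_NOREPLACE (1 << 0)   /* from uapi/linux/fs.h */
    #endif

    /* Publish an already-written (and optionally fsynced) temp file
     * under its final name. Fails with EEXIST instead of silently
     * clobbering an existing destination. */
    static int publish(const char *tmp, const char *final)
    {
            return syscall(SYS_renameat2, AT_FDCWD, tmp,
                           AT_FDCWD, final, RENAME_NOREPLACE);
    }

A caller would open the temp file, write and fsync it, close it, then call publish() and unlink the temp file only on success; on EEXIST nothing was clobbered.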
2013-09-10  fs: bump inode and dentry counters to long  (Glauber Costa, 1 file, +3/-3)
This series reworks our current object cache shrinking infrastructure in two main ways:

 * Noticing that a lot of users copy and paste their own version of LRU lists for objects, we put some effort in providing a generic version. It is modeled after the filesystem users: dentries, inodes, and xfs (for various tasks), but we expect that other users could benefit in the near future with little or no modification. Let us know if you have any issues.

 * The underlying list_lru being proposed automatically and transparently keeps the elements in per-node lists, and is able to manipulate the node lists individually. Given this infrastructure, we are able to modify the up-to-now hammer called shrink_slab to proceed with node-reclaim instead of always searching memory from all over like it has been doing. Per-node lru lists are also expected to lead to less contention in the lru locks on multi-node scans, since we are no longer fighting for a global lock. The locks usually disappear from the profilers with this change.

Although we have no official benchmarks for this version - be our guest to independently evaluate this - earlier versions of this series were performance tested (details at http://permalink.gmane.org/gmane.linux.kernel.mm/100537), yielding no visible performance regressions while showing better qualitative behavior on NUMA machines.

With this infrastructure in place, we can use the list_lru entry point to provide memcg isolation and per-memcg targeted reclaim. Historically, those two pieces of work have been posted together. This version presents only the infrastructure work, deferring the memcg work for a later time, so we can focus on getting this part tested. You can see more about the history of such work at http://lwn.net/Articles/552769/

Dave Chinner (18):
  dcache: convert dentry_stat.nr_unused to per-cpu counters
  dentry: move to per-sb LRU locks
  dcache: remove dentries from LRU before putting on dispose list
  mm: new shrinker API
  shrinker: convert superblock shrinkers to new API
  list: add a new LRU list type
  inode: convert inode lru list to generic lru list code.
  dcache: convert to use new lru list infrastructure
  list_lru: per-node list infrastructure
  shrinker: add node awareness
  fs: convert inode and dentry shrinking to be node aware
  xfs: convert buftarg LRU to generic code
  xfs: rework buffer dispose list tracking
  xfs: convert dquot cache lru to list_lru
  fs: convert fs shrinkers to new scan/count API
  drivers: convert shrinkers to new count/scan API
  shrinker: convert remaining shrinkers to count/scan API
  shrinker: Kill old ->shrink API.

Glauber Costa (7):
  fs: bump inode and dentry counters to long
  super: fix calculation of shrinkable objects for small numbers
  list_lru: per-node API
  vmscan: per-node deferred work
  i915: bail out earlier when shrinker cannot acquire mutex
  hugepage: convert huge zero page shrinker to new shrinker API
  list_lru: dynamically adjust node arrays

This patch:

There are situations in very large machines in which we can have a large quantity of dirty inodes, unused dentries, etc. This is particularly true when umounting a filesystem, since every live object will eventually be discarded. Dave Chinner reported a problem with this while experimenting with the shrinker revamp patchset. So we believe it is time for a change. This patch just widens the counters from int to long; machines where it matters will have a 64-bit long anyway.

Signed-off-by: Glauber Costa <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Artem Bityutskiy <[email protected]>
Cc: Arve Hjønnevåg <[email protected]>
Cc: Carlos Maiolino <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gleb Natapov <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: J. Bruce Fields <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Stultz <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Steven Whitehouse <[email protected]>
Cc: Thomas Hellstrom <[email protected]>
Cc: Trond Myklebust <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Al Viro <[email protected]>
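For reference, the uapi/linux/fs.h side of this change is roughly the following; the field names follow the mainline header, but treat the exact layout as illustrative:

    /* /proc/sys/fs/inode-state counters, widened from int to long so
     * they cannot overflow on very large machines: */
    struct inodes_stat_t {
            long nr_inodes;
            long nr_unused;
            long dummy[5];      /* padding kept for sysctl ABI compatibility */
    };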
2013-04-29  mm: make snapshotting pages for stable writes a per-bio operation  (Darrick J. Wong, 1 file, +0/-1)
Walking a bio's page mappings has proved problematic, so create a new bio flag to indicate that a bio's data needs to be snapshotted in order to guarantee stable pages during writeback.

Next, for the one user (ext3/jbd) of snapshotting, hook all the places where writes can be initiated without PG_writeback set, and set BIO_SNAP_STABLE there. We must also flag journal "metadata" bios for stable writeout, since file data can be written through the journal.

Finally, the MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get rid of it.

[[email protected]: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration]
[[email protected]: teeny cleanup]
Signed-off-by: Darrick J. Wong <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Artem Bityutskiy <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
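A hedged sketch of the opt-in this describes: a journalling write path that starts I/O without PG_writeback set passes BIO_SNAP_STABLE down through _submit_bh(). The function name and bio_flags argument follow the commit text; the surrounding helper is illustrative, not the actual jbd code:

    #include <linux/blk_types.h>    /* BIO_SNAP_STABLE */
    #include <linux/buffer_head.h>  /* _submit_bh() */

    /* Illustrative jbd-style caller: ask the block layer to
     * bounce-copy the pages backing this buffer, so a concurrent
     * redirty cannot change the data while the device still reads it. */
    static int journal_submit_stable(int rw, struct buffer_head *bh)
    {
            return _submit_bh(rw, bh, 1 << BIO_SNAP_STABLE);
    }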
2013-02-21  block: optionally snapshot page contents to provide stable pages during write  (Darrick J. Wong, 1 file, +3/-0)
This provides a band-aid to provide stable page writes on jbd without needing to backport the fixed locking and page writeback bit handling schemes of jbd2. The band-aid works by using bounce buffers to snapshot page contents instead of waiting.

For those wondering about the ext3 bandage -- fixing the jbd locking (which was done as part of ext4dev years ago) is a lot of surgery, and setting PG_writeback on data pages when we actually hold the page lock dropped ext3 performance by nearly an order of magnitude. If we're going to migrate iscsi and raid to use stable page writes, the complaints about high latency will likely return. We might as well centralize their page snapshotting thing to one place.

Signed-off-by: Darrick J. Wong <[email protected]>
Tested-by: Andy Lutomirski <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Artem Bityutskiy <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Steven Whitehouse <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Eric Van Hensbergen <[email protected]>
Cc: Ron Minnich <[email protected]>
Cc: Latchesar Ionkov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
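Per the later removal commit above, the uapi/linux/fs.h side of this band-aid is the MS_SNAP_STABLE mount flag; a sketch of the definition, where the exact bit value is illustrative rather than authoritative:

    /* Super-block flag: snapshot page contents during writeback when
     * the backing device requires stable pages (used only by ext3).
     * Bit value illustrative; see the header of this kernel era. */
    #define MS_SNAP_STABLE  (1<<27)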
2012-10-16  Unexport some bits of linux/fs.h  (David Howells, 1 file, +0/-132)
There are some bits of linux/fs.h which are only used within the kernel and shouldn't be in the UAPI. Move these from uapi/linux/fs.h into linux/fs.h.

Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
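A sketch of the include pattern the UAPI split leaves behind, which this cleanup relies on: in-kernel users keep including <linux/fs.h> unchanged, while exported headers carry only the uapi/ half (illustrative excerpt, not the full header):

    /* include/linux/fs.h (kernel-private wrapper) */
    #include <uapi/linux/fs.h>  /* userspace-visible flags and ioctls */

    /* Kernel-only declarations, such as those moved back out of the
     * UAPI header by this commit, live below this point. */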
2012-10-13  UAPI: (Scripted) Disintegrate include/linux  (David Howells, 1 file, +334/-0)
Signed-off-by: David Howells <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Michael Kerrisk <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Acked-by: Dave Jones <[email protected]>