diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/Locking | 2 | ||||
-rw-r--r-- | Documentation/filesystems/cifs/README | 2 | ||||
-rw-r--r-- | Documentation/filesystems/dax.txt | 32 | ||||
-rw-r--r-- | Documentation/filesystems/directory-locking | 32 | ||||
-rw-r--r-- | Documentation/filesystems/nilfs2.txt | 5 | ||||
-rw-r--r-- | Documentation/filesystems/overlayfs.txt | 9 | ||||
-rw-r--r-- | Documentation/filesystems/pohmelfs/design_notes.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/porting | 60 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 1 | ||||
-rw-r--r-- | Documentation/filesystems/qnx6.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/vfs.txt | 2 |
11 files changed, 123 insertions, 26 deletions
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 619af9bfdcb3..75eea7ce3d7c 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -194,7 +194,7 @@ prototypes: void (*invalidatepage) (struct page *, unsigned int, unsigned int); int (*releasepage) (struct page *, int); void (*freepage)(struct page *); - int (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset); + int (*direct_IO)(struct kiocb *, struct iov_iter *iter); int (*migratepage)(struct address_space *, struct page *, struct page *); int (*launder_page)(struct page *); int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long); diff --git a/Documentation/filesystems/cifs/README b/Documentation/filesystems/cifs/README index 2d5622f60e11..a54788405429 100644 --- a/Documentation/filesystems/cifs/README +++ b/Documentation/filesystems/cifs/README @@ -272,7 +272,7 @@ A partial list of the supported mount options follows: same domain (e.g. running winbind or nss_ldap) and the server supports the Unix Extensions then the uid and gid can be retrieved from the server (and uid - and gid would not have to be specifed on the mount. + and gid would not have to be specified on the mount. For servers which do not support the CIFS Unix extensions, the default uid (and gid) returned on lookup of existing files will be the uid (gid) of the person diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt index 7bde64014a89..ce4587d257d2 100644 --- a/Documentation/filesystems/dax.txt +++ b/Documentation/filesystems/dax.txt @@ -79,6 +79,38 @@ These filesystems may be used for inspiration: - ext4: the fourth extended filesystem, see Documentation/filesystems/ext4.txt +Handling Media Errors +--------------------- + +The libnvdimm subsystem stores a record of known media error locations for +each pmem block device (in gendisk->badblocks). If we fault at such location, +or one with a latent error not yet discovered, the application can expect +to receive a SIGBUS. Libnvdimm also allows clearing of these errors by simply +writing the affected sectors (through the pmem driver, and if the underlying +NVDIMM supports the clear_poison DSM defined by ACPI). + +Since DAX IO normally doesn't go through the driver/bio path, applications or +sysadmins have an option to restore the lost data from a prior backup/inbuilt +redundancy in the following ways: + +1. Delete the affected file, and restore from a backup (sysadmin route): + This will free the file system blocks that were being used by the file, + and the next time they're allocated, they will be zeroed first, which + happens through the driver, and will clear bad sectors. + +2. Truncate or hole-punch the part of the file that has a bad-block (at least + an entire aligned sector has to be hole-punched, but not necessarily an + entire filesystem block). + +These are the two basic paths that allow DAX filesystems to continue operating +in the presence of media errors. More robust error recovery mechanisms can be +built on top of this in the future, for example, involving redundancy/mirroring +provided at the block layer through DM, or additionally, at the filesystem +level. These would have to rely on the above two tenets, that error clearing +can happen either by sending an IO through the driver, or zeroing (also through +the driver). + + Shortcomings ------------ diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking index 09bbf9a54f80..c314badbcfc6 100644 --- a/Documentation/filesystems/directory-locking +++ b/Documentation/filesystems/directory-locking @@ -1,30 +1,37 @@ Locking scheme used for directory operations is based on two -kinds of locks - per-inode (->i_mutex) and per-filesystem +kinds of locks - per-inode (->i_rwsem) and per-filesystem (->s_vfs_rename_mutex). - When taking the i_mutex on multiple non-directory objects, we + When taking the i_rwsem on multiple non-directory objects, we always acquire the locks in order by increasing address. We'll call that "inode pointer" order in the following. For our purposes all operations fall in 5 classes: 1) read access. Locking rules: caller locks directory we are accessing. +The lock is taken shared. -2) object creation. Locking rules: same as above. +2) object creation. Locking rules: same as above, but the lock is taken +exclusive. 3) object removal. Locking rules: caller locks parent, finds victim, -locks victim and calls the method. +locks victim and calls the method. Locks are exclusive. 4) rename() that is _not_ cross-directory. Locking rules: caller locks -the parent and finds source and target. If target already exists, lock -it. If source is a non-directory, lock it. If that means we need to -lock both, lock them in inode pointer order. +the parent and finds source and target. In case of exchange (with +RENAME_EXCHANGE in rename2() flags argument) lock both. In any case, +if the target already exists, lock it. If the source is a non-directory, +lock it. If we need to lock both, lock them in inode pointer order. +Then call the method. All locks are exclusive. +NB: we might get away with locking the the source (and target in exchange +case) shared. 5) link creation. Locking rules: * lock parent * check that source is not a directory * lock source * call the method. +All locks are exclusive. 6) cross-directory rename. The trickiest in the whole bunch. Locking rules: @@ -35,11 +42,12 @@ rules: fail with -ENOTEMPTY * if new parent is equal to or is a descendent of source fail with -ELOOP - * If target exists, lock it. If source is a non-directory, lock - it. In case that means we need to lock both source and target, - do so in inode pointer order. + * If it's an exchange, lock both the source and the target. + * If the target exists, lock it. If the source is a non-directory, + lock it. If we need to lock both, do so in inode pointer order. * call the method. - +All ->i_rwsem are taken exclusive. Again, we might get away with locking +the the source (and target in exchange case) shared. The rules above obviously guarantee that all directories that are going to be read, modified or removed by method will be locked by caller. @@ -73,7 +81,7 @@ objects - A < B iff A is an ancestor of B. attempt to acquire some lock and already holds at least one lock. Let's consider the set of contended locks. First of all, filesystem lock is not contended, since any process blocked on it is not holding any locks. -Thus all processes are blocked on ->i_mutex. +Thus all processes are blocked on ->i_rwsem. By (3), any process holding a non-directory lock can only be waiting on another non-directory lock with a larger address. Therefore diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt index 41c3d332acc9..5b21ef76f751 100644 --- a/Documentation/filesystems/nilfs2.txt +++ b/Documentation/filesystems/nilfs2.txt @@ -268,3 +268,8 @@ among NILFS2 files can be depicted as follows: ( regular file, directory, or symlink ) For detail on the format of each file, please see include/linux/nilfs2_fs.h. + +There are no patents or other intellectual property that we protect +with regard to the design of NILFS2. It is allowed to replicate the +design in hopes that other operating systems could share (mount, read, +write, etc.) data stored in this format. diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt index 28091457b71a..d6259c786316 100644 --- a/Documentation/filesystems/overlayfs.txt +++ b/Documentation/filesystems/overlayfs.txt @@ -194,15 +194,6 @@ If a file with multiple hard links is copied up, then this will "break" the link. Changes will not be propagated to other names referring to the same inode. -Symlinks in /proc/PID/ and /proc/PID/fd which point to a non-directory -object in overlayfs will not contain valid absolute paths, only -relative paths leading up to the filesystem's root. This will be -fixed in the future. - -Some operations are not atomic, for example a crash during copy_up or -rename will leave the filesystem in an inconsistent state. This will -be addressed in the future. - Changes to underlying filesystems --------------------------------- diff --git a/Documentation/filesystems/pohmelfs/design_notes.txt b/Documentation/filesystems/pohmelfs/design_notes.txt index 8aef91335701..106d17fbb05f 100644 --- a/Documentation/filesystems/pohmelfs/design_notes.txt +++ b/Documentation/filesystems/pohmelfs/design_notes.txt @@ -29,7 +29,7 @@ Main features of this FS include: * Read request (data read, directory listing, lookup requests) balancing between multiple servers. * Write requests are replicated to multiple servers and completed only when all of them are acked. * Ability to add and/or remove servers from the working set at run-time. - * Strong authentification and possible data encryption in network channel. + * Strong authentication and possible data encryption in network channel. * Extended attributes support. POHMELFS is based on transactions, which are potentially long-standing objects that live diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index f1b87d8aa2da..a5fb89cac615 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -525,3 +525,63 @@ in your dentry operations instead. set_delayed_call() where it used to set *cookie. ->put_link() is gone - just give the destructor to set_delayed_call() in ->get_link(). +-- +[mandatory] + ->getxattr() and xattr_handler.get() get dentry and inode passed separately. + dentry might be yet to be attached to inode, so do _not_ use its ->d_inode + in the instances. Rationale: !@#!@# security_d_instantiate() needs to be + called before we attach dentry to inode. +-- +[mandatory] + symlinks are no longer the only inodes that do *not* have i_bdev/i_cdev/ + i_pipe/i_link union zeroed out at inode eviction. As the result, you can't + assume that non-NULL value in ->i_nlink at ->destroy_inode() implies that + it's a symlink. Checking ->i_mode is really needed now. In-tree we had + to fix shmem_destroy_callback() that used to take that kind of shortcut; + watch out, since that shortcut is no longer valid. +-- +[mandatory] + ->i_mutex is replaced with ->i_rwsem now. inode_lock() et.al. work as + they used to - they just take it exclusive. However, ->lookup() may be + called with parent locked shared. Its instances must not + * use d_instantiate) and d_rehash() separately - use d_add() or + d_splice_alias() instead. + * use d_rehash() alone - call d_add(new_dentry, NULL) instead. + * in the unlikely case when (read-only) access to filesystem + data structures needs exclusion for some reason, arrange it + yourself. None of the in-tree filesystems needed that. + * rely on ->d_parent and ->d_name not changing after dentry has + been fed to d_add() or d_splice_alias(). Again, none of the + in-tree instances relied upon that. + We are guaranteed that lookups of the same name in the same directory + will not happen in parallel ("same" in the sense of your ->d_compare()). + Lookups on different names in the same directory can and do happen in + parallel now. +-- +[recommended] + ->iterate_shared() is added; it's a parallel variant of ->iterate(). + Exclusion on struct file level is still provided (as well as that + between it and lseek on the same struct file), but if your directory + has been opened several times, you can get these called in parallel. + Exclusion between that method and all directory-modifying ones is + still provided, of course. + + Often enough ->iterate() can serve as ->iterate_shared() without any + changes - it is a read-only operation, after all. If you have any + per-inode or per-dentry in-core data structures modified by ->iterate(), + you might need something to serialize the access to them. If you + do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for + that; look for in-tree examples. + + Old method is only used if the new one is absent; eventually it will + be removed. Switch while you still can; the old one won't stay. +-- +[mandatory] + ->atomic_open() calls without O_CREAT may happen in parallel. +-- +[mandatory] + ->setxattr() and xattr_handler.set() get dentry and inode passed separately. + dentry might be yet to be attached to inode, so do _not_ use its ->d_inode + in the instances. Rationale: !@#!@# security_d_instantiate() needs to be + called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack + ->d_instantiate() uses not just ->getxattr() but ->setxattr() as well. diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 7f5607a089b4..e8d00759bfa5 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -225,6 +225,7 @@ Table 1-2: Contents of the status files (as of 4.1) TracerPid PID of process tracing this process (0 if not) Uid Real, effective, saved set, and file system UIDs Gid Real, effective, saved set, and file system GIDs + Umask file mode creation mask FDSize number of file descriptor slots currently allocated Groups supplementary group list NStgid descendant namespace thread group ID hierarchy diff --git a/Documentation/filesystems/qnx6.txt b/Documentation/filesystems/qnx6.txt index 408679789136..4f3d6a882bdc 100644 --- a/Documentation/filesystems/qnx6.txt +++ b/Documentation/filesystems/qnx6.txt @@ -16,7 +16,7 @@ qnx6fs shares many properties with traditional Unix filesystems. It has the concepts of blocks, inodes and directories. On QNX it is possible to create little endian and big endian qnx6 filesystems. This feature makes it possible to create and use a different endianness fs -for the target (QNX is used on quite a range of embedded systems) plattform +for the target (QNX is used on quite a range of embedded systems) platform running on a different endianness. The Linux driver handles endianness transparently. (LE and BE) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 4164bd6397a2..c61a223ef3ff 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -591,7 +591,7 @@ struct address_space_operations { void (*invalidatepage) (struct page *, unsigned int, unsigned int); int (*releasepage) (struct page *, int); void (*freepage)(struct page *); - ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter, loff_t offset); + ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter); /* migrate the contents of a page to the specified target */ int (*migratepage) (struct page *, struct page *); int (*launder_page) (struct page *); |